Post on 08-Jun-2018
Machine Learning Methods and Their
Applications to Autonomous Driving and
e-Healthcare using Video Sensor Data
Prof. Irene Y.H. Gu
Signal Processing Group, Dept. of Signals and Systems
Chalmers University of Technology, Göteborg, Sweden
April 14, 2016
Acknowledgement
Chalmers
ZH Khan, K Fu, Y Yun,
M Bolbat, S Haner, P Strandström,
MH Changrampadi, M Emami, D Moro, DP Kumar,
H Fundin, A Johannesson, P Shams, G Sowulewski
Other collaborators
L Li (I2R, Singapore),
H Aghajan (Stanford, USA),
J Yang, X Long (Shanghai Jiao Tong Univ., China),
M Thordstein, A Flisberg (SU, Sweden)
Contents
1. Why use video/image-based techniques?
2. Applications
3. Demo: our results
4. Our recently developed ML methods
5. Conclusion
1. Why ML using visual sensors
Data from visual sensors (RGB-D and IR
images/videos) provide important information.
Machine learning (ML) using visual data:
* important in its own right for theoretical studies
* wide applications
Our methods focus on the analysis and modeling of video/image information
for machine learning
2. Applications we address
ML for autonomous driving / traffic analysis /
driver assistance:
- road traffic monitoring: speed, frequency, lane, traffic
light, surrounding vehicles …
- traffic sign detection and recognition
- drivers' attention and other status (sleepiness, actions,
identification)
ML for activity recognition and e-healthcare
ML for online learning in tracking/surveillance systems
Cameras can be mounted in different ways:
Static: videos with a static background (though the background in images may change
due to lighting, camera jitter, occlusion…)
Dynamic: videos with a dynamic background in addition to moving objects
Camera types we use:
Visual (RGB)
Thermal IR
Near IR
Depth
3. Demo: Examples from Our
Experimental Results
Demo-1: Traffic applications
+ vehicle tracking
+ road traffic analysis
+ traffic sign recognition
Results: Automatic traffic sign detection and recognition
Results of tracking vehicles (static camera)
Results: traffic monitoring
Demo-2:
video tracking applications
Tracking under a range of complex scenarios
by using single/multiple cameras
+ non-planar (out-of-plane) changes
(using single-camera video)
+ partial/full occlusions
(using single/multiple camera videos)
Tracking using multi-camera videos
(PETS 2006, Scenario 7, 3 cameras)
Tracking using multi-camera videos
(TUG dataset, hard scenario, 3 cameras)
Demo-3: face tracking and analysis
+ tracking: human faces (single camera)
+ analysis/classification: eye states
(sleepy, blinking, open, closed …)
IR:
Detect eye states for early warning: too sleepy to drive?
Demo-4: Point feature-based tracking
+ limb movement
+ video object tracking (with occlusions + intersections)
Tracking limb movement to analyze abrupt movements
related to neurological dysfunction in infants
Demo-5:
Identification of human activities … by fusion and classification from RGB-D videos
Identify human activities from videos (for healthcare, assisted living)
Chalmers RGB-D video dataset:
Falling down, lying down, eating, drinking, reading, playing laptop,
sitting down, walking … (each activity: 500 videos by 19 subjects)
Identify human activities from images (can be used to identify the driver's status: using a cell phone, reading, chatting, eating …)
4. Our recently developed ML methods
4.1. Domain-shift online learning /classification on Riemannian
manifolds
4.2. Enhanced ML from salient object /region detection
4.3. Applications:
Traffic sign recognition
Human fall detection and activity classification
4.1. Domain-Shift learning/classification
Riemannian manifold learning of large-size video
objects with out-of-plane pose changes
What is a manifold?
A set of all low-dimensional subspaces in a high-dimensional
space: $\{\mathbb{R}^k\} \subset \mathbb{R}^n$, $k < n$
Nonlinear, e.g. a curved space
A set of metrics, or a calculus, may be defined on a manifold
Geometry, topology, and some essential properties of
signals are maintained
Locally Euclidean, but not globally
Differentiable (smooth) manifolds are particularly attractive,
e.g. Riemannian and Grassmann manifolds
Motivation: manifold learning and classification
Characterize dynamic signals/objects whose statistics evolve in time
and do not lie in a single vector space
=> employ smoothly shifting domains to characterize such
dynamic objects
e.g. a dynamic object in images with out-of-plane pose changes does not lie in the
same vector space; rather, it lies in a set of subspaces (or, on a manifold).
Efficiently represent a signal by a set of low-dimensional subspaces
e.g. a 1D curve embedded in 3D space; "walking" is cyclic on a manifold.
Domain-shift characterization of dynamic objects:
- object in each image frame → a point on the manifold
- dynamic video object → a curve on the manifold
Riemannian manifolds: some notations
Geodesic: the shortest curve between two points on the manifold
Geodesic distance on a Riemannian manifold (under the log-Euclidean metric):
$d(p, q) = \|\log(p) - \log(q)\|_F$
Metrics: Riemannian metrics are inner products on the manifold's tangent spaces
(symmetric, positive definite; they determine the geodesic distance)
e.g. log-Euclidean metric, affine-invariant metric
Mapping functions (under the log-Euclidean metric; p, q: covariance matrices):
- Exponential mapping
- Logarithmic mapping
Means:
- Riemannian (extrinsic) mean
- Karcher (intrinsic) mean
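The log-Euclidean quantities above can be made concrete with a short NumPy sketch. It assumes symmetric positive-definite (covariance) inputs, computes matrix logarithms and exponentials by eigendecomposition, and uses the standard log-Euclidean forms $d(p,q)=\|\log p-\log q\|_F$ and mean $\exp\big(\tfrac{1}{N}\sum_i \log p_i\big)$. The function names are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def spd_log(p: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a symmetric positive-definite (SPD) matrix."""
    w, v = np.linalg.eigh(p)            # p = V diag(w) V^T with w > 0
    return (v * np.log(w)) @ v.T


def spd_exp(s: np.ndarray) -> np.ndarray:
    """Matrix exponential of a symmetric matrix (maps back onto SPD matrices)."""
    w, v = np.linalg.eigh(s)
    return (v * np.exp(w)) @ v.T


def log_euclidean_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Geodesic distance under the log-Euclidean metric: ||log(p) - log(q)||_F."""
    return float(np.linalg.norm(spd_log(p) - spd_log(q), ord="fro"))


def log_euclidean_mean(mats) -> np.ndarray:
    """Log-Euclidean mean: exp of the arithmetic mean of the matrix logarithms."""
    return spd_exp(np.mean([spd_log(m) for m in mats], axis=0))
```

For example, with p = I and q = diag(e², 1), the distance is 2 and the log-Euclidean mean is diag(e, 1). The Karcher mean under the affine-invariant metric would instead require an iterative algorithm.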
Domain-shift learning method for visual object tracking,
with main novelties:
- sequential Bayesian online learning and tracking on the manifold;
- a dynamic nonlinear (NL) model for object appearance on the manifold:
both the manifold point and its velocity are included in the state vector;
- particle filters extended to the manifold;
- domain-shift online learning with occlusion handling.
Main issues:
ML and classification on Riemannian manifolds.
The method is particularly attractive for tracking large-size
deformable objects with significant out-of-plane pose changes.
NL dynamic modeling on the manifold
- State vector: the manifold point and its velocity
- Piecewise geodesic trajectory
- Constant velocity
Sequential Bayesian estimation of the state vector by employing a
particle filter on the manifold
PF-1 on the manifold for online learning:
- PF-1 weights: likelihood between the new observation (from tracking) and the predicted manifold point, measured along the geodesic
- MMSE estimate: expected value of the weighted particles
- Output: posterior reference object, before occlusion handling
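The PF-1 steps above can be sketched as follows, working in the log domain of SPD matrices. This is a minimal illustration under several assumptions not stated in the talk: positions and velocities are stored as matrix logarithms, constant-velocity propagation adds symmetric Gaussian noise to the velocity, the likelihood is $\exp(-d^2/\sigma^2)$ with $d$ the log-Euclidean geodesic distance, and the MMSE estimate is the weighted mean in the log domain. `pf_step`, `sigma` and `noise` are hypothetical names, and resampling is omitted.

```python
import numpy as np


def spd_log(p):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(p)
    return (v * np.log(w)) @ v.T


def spd_exp(s):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(s)
    return (v * np.exp(w)) @ v.T


def sym_noise(rng, n, scale):
    """Random symmetric perturbation in the tangent (log) domain."""
    a = rng.normal(scale=scale, size=(n, n))
    return (a + a.T) / 2


def pf_step(log_points, velocities, observation, rng, sigma=1.0, noise=0.05):
    """One predict/weight/estimate step of a particle filter on SPD matrices.

    log_points, velocities: (N, n, n) arrays in the log domain (hypothetical layout).
    observation: (n, n) SPD matrix from the tracker.
    Returns updated particles, normalized weights and the MMSE estimate.
    """
    n_part, n, _ = log_points.shape
    # Predict: piecewise-geodesic motion with (noisy) constant velocity.
    velocities = velocities + np.stack([sym_noise(rng, n, noise) for _ in range(n_part)])
    log_points = log_points + velocities
    # Weight: likelihood from the log-Euclidean geodesic distance to the observation.
    d2 = np.linalg.norm(log_points - spd_log(observation), axis=(1, 2)) ** 2
    weights = np.exp(-(d2 - d2.min()) / sigma**2)   # shifted for numerical stability
    weights /= weights.sum()
    # MMSE estimate: weighted mean of the particles in the log domain, mapped back.
    estimate = spd_exp(np.tensordot(weights, log_points, axes=1))
    return log_points, velocities, weights, estimate
```

Working in the log domain keeps every particle a valid SPD matrix after mapping back with `spd_exp`; a full filter would also resample particles when the weights degenerate.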
Occlusion handling
Main challenge: ambiguity about the cause of appearance changes:
- out-of-plane pose changes of the target object?
- other occluding objects/background clutter?
Rationale: similarity between the candidate and the reference object:
- an occluding object/clutter is generally less similar to the target
- a target with slightly changed views is more similar
Similarity measure: geodesic distance between the target and the reference object (short distance = high similarity)
Strategy adopted:
perform reference-object learning only when occlusion is unlikely!
Test: geodesic distance between the reference object at (t-1) and the a posteriori estimated reference object at t
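The gating strategy can be written as a small decision rule. A minimal sketch, assuming covariance-matrix object models, the log-Euclidean geodesic distance, and a hypothetical hand-chosen `threshold`; the talk does not specify how the threshold is set.

```python
import numpy as np


def spd_log(p):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(p)
    return (v * np.log(w)) @ v.T


def update_reference(ref_prev, ref_post, threshold):
    """Decide which reference object to use at time t.

    The reference is replaced by the a posteriori estimate only when the
    log-Euclidean geodesic distance between the reference at t-1 and the
    posterior estimate at t is small, i.e. when occlusion is unlikely.
    Otherwise the previous reference is kept unchanged.
    `threshold` is a hypothetical tuning parameter.
    """
    d = float(np.linalg.norm(spd_log(ref_prev) - spd_log(ref_post), ord="fro"))
    occlusion_unlikely = d <= threshold
    return (ref_post if occlusion_unlikely else ref_prev), d
```

A small change in appearance (e.g. a slightly different view) passes the test and updates the reference, while a large jump, which is more likely caused by an occluder or clutter, freezes learning for that frame.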
Proposed Riemannian manifold online learning
and tracking (learning and tracking in alternation)
Results: with (red) / without (blue) online learning
Results: with/without occlusion handling
Evaluation:
- Euclidean distance between the 4 corners of the bounding box
- average tracking accuracy (ATA): larger ATA values indicate better performance
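The two evaluation measures can be sketched for axis-aligned boxes $(x_1, y_1, x_2, y_2)$. The corner error averages the Euclidean distances between the four corresponding corners (averaging rather than summing is an assumption). For ATA, the sketch assumes the common definition as the mean spatial overlap (intersection over union) between estimated and ground-truth boxes over all frames; the talk does not give the formula.

```python
import numpy as np


def corners(box):
    """Four corners of an axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=float)


def corner_error(box_est, box_gt):
    """Mean Euclidean distance between corresponding corners of two boxes."""
    return float(np.linalg.norm(corners(box_est) - corners(box_gt), axis=1).mean())


def overlap_ratio(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_tracking_accuracy(est_boxes, gt_boxes):
    """Assumed ATA definition: mean overlap ratio over all frames (larger is better)."""
    return float(np.mean([overlap_ratio(e, g) for e, g in zip(est_boxes, gt_boxes)]))
```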
4.2. Enhanced ML
from Salient Object /Region Detection
Motivations
Detect attention-grabbing objects/regions from
image scenes
Improve ML by using segmented objects/regions
Applications may be found in:
Object detection and recognition
Video summarization
Content-based image editing
Image retrieval
Object enhancement through regression
…
Method-A: Saliency Detection by Geodesic Propagation
Estimate the saliency map by propagating salient regions from a coarse
map, based on geodesic distance (a geodesic-based filtering/regression).
Pipeline: original image → superpixels → global contrast → Harris convex hull → coarse saliency → geodesic propagation → merging
Update salient values based on geodesic propagation:
- geodesic distance for region $R_i$: shortest path on the graph
- connectivity measure in [0, 1]
- coarse energy
- area of a soft region
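The geodesic distance used in Method-A is a shortest-path length on the superpixel graph, which Dijkstra's algorithm computes. The sketch below assumes non-negative edge weights (e.g. color differences between adjacent superpixels) and adds a hypothetical propagation rule in which each node takes a geodesic-weighted average of the coarse saliency values; the actual update in Method-A also involves the connectivity measure and coarse energy, which are not reproduced here.

```python
import heapq
import math


def geodesic_distances(adj, source):
    """Shortest-path (geodesic) distances from `source` on a weighted graph.

    adj: dict mapping node -> list of (neighbor, non-negative edge weight).
    """
    dist = {node: math.inf for node in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def propagate_saliency(adj, coarse, sigma=1.0):
    """Hypothetical propagation rule: geodesic-weighted average of coarse
    saliency, with weights exp(-d^2 / sigma^2); unreachable nodes get weight 0."""
    refined = {}
    for i in adj:
        dist = geodesic_distances(adj, i)
        weights = {j: math.exp(-(dist[j] ** 2) / sigma**2) for j in adj}
        total = sum(weights.values())
        refined[i] = sum(weights[j] * coarse[j] for j in adj) / total
    return refined
```

Because the distance accumulates along paths, two superpixels of similar color separated by a strong boundary end up far apart, so saliency does not leak across object boundaries.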
Results and performance
[Figure: precision-recall curves (precision vs. recall) for CA, IT, SR, FT, LC, HC, RC, SF, GS and ours; example maps: global contrast, convex hull, coarse saliency, geodesic saliency propagation]
Method-B: Ncut-based saliency detection by adaptive multi-level region merging (TIP 2015)
Main ideas: apply Ncut to salient region detection, and induce a saliency map from the Ncut eigenvectors for visual clustering.
Method description:
Graph $G_1 = (V, E, W)$ is built from the input image, using superpixels as the basic components (nodes).
Graph edges follow a 2-ring topology: green connections (immediate neighbor superpixels) + blue connections (2nd-layer neighbor superpixels) + brown connections (boundary superpixels).
[Figure: input image; superpixel graph with nodes belonging to the salient object and nodes belonging to the background]
Define the graph edge weight (affinity) from two dissimilarities:
- superpixel color differences
- intervening edge magnitude
The intervening edge magnitude may help delineate object vs. background when the object and background have similar colors but different textures.
E(p): intensity of edge point p on the line connecting 2 superpixels
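One way to turn the two dissimilarities into an edge affinity is sketched below. It assumes that the intervening edge magnitude is the maximum of $E(p)$ sampled along the line joining two superpixel centroids, and that both cues enter a product of Gaussian kernels with hypothetical bandwidths `sigma_c` and `sigma_e`; the exact combination used in Method-B is not given in the talk.

```python
import numpy as np


def intervening_edge(edge_map, p0, p1, n_samples=50):
    """Maximum edge magnitude E(p) sampled along the line from p0 to p1.

    edge_map: 2-D array of edge magnitudes; p0, p1: (row, col) centroids
    of two superpixels (assumed to lie inside the image).
    """
    t = np.linspace(0.0, 1.0, n_samples)
    rows = np.rint(p0[0] + t * (p1[0] - p0[0])).astype(int)
    cols = np.rint(p0[1] + t * (p1[1] - p0[1])).astype(int)
    return float(edge_map[rows, cols].max())


def edge_affinity(color_i, color_j, edge_mag, sigma_c=10.0, sigma_e=0.2):
    """Hypothetical affinity: large color differences or strong intervening
    edges between two superpixels both drive the affinity toward zero."""
    dc = np.linalg.norm(np.asarray(color_i, float) - np.asarray(color_j, float))
    return float(np.exp(-(dc**2) / sigma_c**2 - edge_mag**2 / sigma_e**2))
```

Two superpixels with identical mean color still receive a near-zero affinity when a strong texture or contour edge lies between them, which is the case the slide highlights.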
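Saliency computation: Ncut + adaptive region merging
Ncut generates a partition $\{A_1, \dots, A_k\}$ of $V$ that minimizes the cut cost:
$$\mathrm{Ncut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, V \setminus A_i)}{\mathrm{assoc}(A_i, V)}$$
where $\mathrm{assoc}(A_i, V) = \sum_{v_m \in A_i,\, v_j \in V} w_{mj}$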
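The Ncut cost above can be evaluated directly for a given partition. This minimal NumPy sketch assumes a symmetric affinity matrix $W$ and integer cluster labels; `ncut_cost` is a hypothetical helper.

```python
import numpy as np


def ncut_cost(W, labels):
    """k-way normalized cut of a partition.

    W: symmetric (n, n) affinity matrix; labels: length-n cluster assignment.
    Returns sum_i cut(A_i, V minus A_i) / assoc(A_i, V).
    """
    W = np.asarray(W, float)
    labels = np.asarray(labels)
    cost = 0.0
    for k in np.unique(labels):
        in_k = labels == k
        assoc = W[in_k, :].sum()            # assoc(A_i, V)
        cut = W[np.ix_(in_k, ~in_k)].sum()  # cut(A_i, V minus A_i)
        cost += cut / assoc
    return float(cost)
```

For two tightly connected pairs joined by weak links, splitting between the pairs gives a much lower cost than splitting inside them, which is what the spectral relaxation below approximates.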
Graph spectral analysis to obtain clustering information
Ncut: solve the generalized eigen-decomposition $(D - W)\mathbf{y} = \lambda D\mathbf{y}$ from $G_1$ ($D$: degree matrix of $W$)
Pick $n_{vec}$ (e.g. $n_{vec} = 8$) eigenvectors with the smallest non-zero eigenvalues
Update the graph edge weights $e_{ij}$ in graph-2: $G_2 = (V, E, \mathbf{e})$
Remark: the eigenvectors are soft indicator vectors for the Ncut; $e_{ij}$ is a measure of the inter-cluster distance between nodes
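The generalized eigenproblem is usually solved through its symmetric normalized form $D^{-1/2}(D-W)D^{-1/2}\mathbf{z} = \lambda\mathbf{z}$ with $\mathbf{y} = D^{-1/2}\mathbf{z}$. The sketch below returns the `nvec` eigenvectors with the smallest non-zero eigenvalues; the helper names are hypothetical, and using the Euclidean distance between nodes in this eigenvector embedding as $e_{ij}$ is an assumption, since the talk does not give the formula for $e_{ij}$.

```python
import numpy as np


def ncut_eigenvectors(W, nvec=8, tol=1e-10):
    """Solve (D - W) y = lam * D y; return the nvec smallest non-zero
    eigenvalues and their eigenvectors (as columns of Y)."""
    W = np.asarray(W, float)
    d = W.sum(axis=1)                       # node degrees (assumed positive)
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    lam, Z = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    keep = np.where(lam > tol)[0][:nvec]    # skip the (near-)zero eigenvalues
    Y = d_isqrt[:, None] * Z[:, keep]       # back to generalized eigenvectors
    return lam[keep], Y


def embedding_distances(Y):
    """Hypothetical e_ij: Euclidean distance between rows i and j of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.linalg.norm(diff, axis=2)
```

Nodes that the Ncut would place in the same cluster get similar rows in `Y`, so small $e_{ij}$ values indicate candidate regions for merging.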
Saliency computation by multi-level adaptive merging of graph-2 nodes:
1) Merging starts from the initial superpixels.
2) At level l, two regions are merged if … ≤ $Th$.
3) At the next level l+1: …
4) Continue merging steps 2)-3) until convergence.
5) Final saliency map: sum of the saliency maps over all levels + smoothing: $\mathbf{f} = (D - \beta W)^{-1}\mathbf{s}$
[Figure: graph edges reconstructed from Ncut; cluster information gradually discovered in $G_2$ compared with $G_1$]
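The smoothing step in 5) is a single linear solve. A minimal sketch, assuming a symmetric non-negative $W$ with positive degrees and a hypothetical default $\beta = 0.99$; with $0 \le \beta < 1$ the matrix $D - \beta W$ is strictly diagonally dominant and therefore invertible.

```python
import numpy as np


def smooth_saliency(W, s, beta=0.99):
    """Graph smoothing of a saliency vector: f = (D - beta * W)^{-1} s.

    W: symmetric non-negative affinity matrix with positive row sums;
    s: per-node saliency (e.g. the summed multi-level saliency maps);
    beta: hypothetical smoothing parameter in [0, 1).
    """
    W = np.asarray(W, float)
    D = np.diag(W.sum(axis=1))
    return np.linalg.solve(D - beta * W, np.asarray(s, float))
```

Solving the system directly avoids forming the inverse; a larger $\beta$ spreads saliency further along strongly connected nodes.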
Evaluation: precision-recall curve + F-measure + MAE
Results and performance
Results: quantitative evaluation
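The evaluation measures can be computed per image as follows. A minimal sketch, assuming saliency maps scaled to [0, 1] and binary ground-truth masks; the weighting $\beta^2 = 0.3$ in the F-measure is a common choice in the saliency literature, not a value stated in the talk. A precision-recall curve is obtained by sweeping the threshold.

```python
import numpy as np


def mae(saliency, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    return float(np.mean(np.abs(np.asarray(saliency, float) - np.asarray(gt, float))))


def precision_recall(saliency, gt, threshold):
    """Precision and recall of the saliency map binarized at `threshold`."""
    pred = np.asarray(saliency) >= threshold
    gt = np.asarray(gt).astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    return float(precision), float(recall)


def f_measure(precision, recall, beta2=0.3):
    """Weighted F-measure; beta2 = 0.3 is an assumed, commonly used weighting."""
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```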
Method-C: Manifold Diffusion-based Saliency
Detection by Adaptive Graph Weight Construction
Problem: existing methods use fixed Gaussian bandwidths to measure graph affinity, which does not always reach the optimum for images with different foreground/background (FG/BG) contrast.
Diffusion: $\mathbf{f} = A^{*}\mathbf{s}$ (diffused saliency values $\mathbf{f}$, diffusion matrix $A^{*}$, seed vector $\mathbf{s}$)
The diffusion matrix $A^{*}$ depends strongly on the graph edge weights.
[Figure: saliency seeds and saliency maps for high-contrast and low-contrast images]
Apply MPD to saliency detection in a 2-stage manner:
[Figure: input image → superpixel graph (manifold smoothness and manifold reconstruction between nodes $v_i$, $v_j$) → two-stage detection scheme (MPD, MPDS) → saliency map]
2 graphs (G1: node smoothness, G2: local linear embedding, LLE)
Proposed diffusion MPD: manifold assumptions + adaptive weight construction
1. Compute the edge weights (affinity matrix W) and the reconstruction matrix A that minimizes the reconstruction error by local linear embedding (LLE).
2. The diffusion energy is formulated as: … (given G = (V, E) and the adaptively estimated W and A)
(not nb's)
3. Generate the saliency map using BG seeds, FG seeds and the Harris convex hull, and apply MPD.
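Step 1's reconstruction matrix $A$ follows the standard LLE construction: each node is approximated by an affine combination of its neighbors, $\min \|x_i - \sum_j a_{ij} x_j\|^2$ subject to $\sum_j a_{ij} = 1$. The sketch assumes per-superpixel feature vectors and a given neighbor list, and adds a small hypothetical regularization `reg` to keep the local Gram matrix invertible; it does not reproduce the adaptive weight construction or the MPD diffusion itself.

```python
import numpy as np


def lle_weights(X, neighbors, reg=1e-3):
    """LLE reconstruction matrix A (rows sum to one).

    X: (n, d) node features (e.g. superpixel mean colors);
    neighbors: list of neighbor-index lists, one per node;
    reg: hypothetical Tikhonov regularization of the local Gram matrix.
    """
    X = np.asarray(X, float)
    n = len(X)
    A = np.zeros((n, n))
    for i, nb in enumerate(neighbors):
        Z = X[nb] - X[i]                      # neighbors centered on x_i
        G = Z @ Z.T                           # local Gram matrix
        G += np.eye(len(nb)) * reg * max(np.trace(G), 1e-12)
        a = np.linalg.solve(G, np.ones(len(nb)))
        A[i, nb] = a / a.sum()                # enforce the sum-to-one constraint
    return A
```

A node lying midway between two neighbors in feature space receives weights of 0.5 each, so `A` encodes the local manifold structure that the diffusion is meant to preserve.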
Performance: comparison with Yang'13 (fixed-bandwidth diffusion)
Quantitative evaluation: precision-recall curve + MAE + F-measure
Results
[Figure: comparison with fixed-bandwidth diffusion]
4.3. Application:
(a) Enhanced ML for automatic detection and
recognition of saliency-segmented traffic signs
Aims: automatic traffic sign recognition (ATR)
Applications:
Advanced driver assistance systems
Intelligent autonomous driving
Road/highway maintenance
Sign inventory
Recognizing traffic signs captured from street-view images/videos.
Enhance performance through salient object detection and classification
[Image: Google self-driving car]
Recognition of signs from street-view images
What are street-view images/videos?
Street images are captured by multiple cameras mounted on the top of a moving vehicle.
Images captured in different orientations are stitched to generate a 360 degree full street scene.
Publicly available online street-view data:
Google (covers many countries in the world)
Tencent (covers main cities in China)
[Image: a Google street-view car]
Main challenges:
Appearance distortions of signs due to, e.g., lighting, view-angle changes, image compression, scale changes, occlusion, motion blur …
Background noise, e.g., advertisements, logos, dirt/clutter, partial occlusion …
Similarity within and across sign categories
Saliency-enhanced coarse-to-fine learning/classification
a) Coarse step: detect sign categories
- Sliding-window candidate detection
- Integral channel features (Dollár'09) + discrete AdaBoost
- Non-maximum suppression across categories
[Figure: coarse classification: training samples → learning → category detectors #1, #2, #3 applied with sliding windows → non-maximum suppression among categories; detected signs for sample street-view images #1 and #2 (960×640)]
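Non-maximum suppression across categories lets the category detectors compete for the same image region. A minimal greedy sketch, assuming axis-aligned boxes $(x_1, y_1, x_2, y_2)$, detector scores on a comparable scale, and a hypothetical IoU threshold of 0.5:

```python
import numpy as np


def iou(a, b):
    """Intersection over union of boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_across_categories(boxes, scores, labels, iou_thresh=0.5):
    """Greedy NMS applied jointly to all categories.

    A detection is discarded if it overlaps an already kept, higher-scoring
    detection by more than iou_thresh, regardless of its category.
    Returns (index, label) pairs of the kept detections, best first.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    for idx in order:
        if all(iou(boxes[idx], boxes[k]) <= iou_thresh for k in keep):
            keep.append(int(idx))
    return [(k, labels[k]) for k in keep]
```

Running the suppression jointly, rather than per category, ensures that one physical sign yields at most one category hypothesis before the fine step.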
b) Fine step: saliency-based segmentation and classification of
sign classes within each category
[Figure: robust segmentation of salient sign regions (ROIs) → feature extraction and dimension reduction → fine classification within categories; sign recognition results for images #1 and #2]
Experiments and Results
Study traffic signs in 3 categories.
Performance:
a) Classification of sign categories
b) Classification of signs within each category
Average classification rate (overall classification accuracy), where C is the within-category confusion matrix:
$$\text{rate} = \frac{\sum_i C_{ii}}{\sum_{i,j} C_{ij}}$$
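The average classification rate and the average precision/recall reported with the confusion matrices can be computed as below. A sketch assuming rows are actual classes and columns classified classes; averaging precision and recall only over classes with non-zero counts is an assumption, since the talk does not say how empty classes (e.g. 'must-horn') are treated.

```python
import numpy as np


def classification_metrics(C):
    """Overall rate trace(C)/sum(C), plus class-averaged precision and recall.

    C: confusion matrix, rows = actual classes, columns = classified classes.
    """
    C = np.asarray(C, float)
    diag = np.diag(C)
    rate = diag.sum() / C.sum()
    col = C.sum(axis=0)   # samples classified as each class
    row = C.sum(axis=1)   # samples actually in each class
    precision = diag[col > 0] / col[col > 0]
    recall = diag[row > 0] / row[row > 0]
    return float(rate), float(precision.mean()), float(recall.mean())
```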
Performance: classification of 11 classes within the "Indication"
category on signs (894/905) from the test set (3237 images)
Confusion matrix (rows: actual classes; columns: classified classes, in the same order 1-12):
                        1   2   3   4   5   6   7   8   9  10  11  12
 1 'min100'            42   1   0   0   0   0   0   0   0   0   0   0
 2 'min110'             3  21   0   2   0   1   0   0   0   0   0   0
 3 'min50'              0   0   0   3   0   0   0   0   0   0   0   0
 4 'min60'              1   0   0 524   0   0   0   0   1   0   0   1
 5 'min70'              0   0   0   0   7   2   6   0   0   0   0   1
 6 'min80'              1   0   0  16   0  83   1   0   0   0   0   0
 7 'min90'              1   1   0   5   0   1  77   0   0   0   0   0
 8 'must-horn'          0   0   0   0   0   0   0   0   0   0   0   0
 9 'must-left'          0   0   0   0   0   0   0   0   7   0   0   1
10 'must-right'         0   0   0   0   0   0   0   0   0  27   0   0
11 'must-straight'      0   0   0   0   0   0   0   0   0   0   1   0
12 'unknown'            0   0   0   4   0   0   0   0   0   0   0   7
Average precision = 0.917958
Average recall = 0.842536
Total no. of signs (without/with Unknown class) = 894/905
Performance: classification of 17 classes within the "Warning"
category on signs (536/698) from the test set (3237 images)
Confusion matrix (rows: actual classes; columns: classified classes, in the same order 1-18):
                        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 1 'unknown'          105   5   2   1   8   0   1   0   4   2   2   1   4   3  14   6   1   0
 2 'warn-accident'      0  17   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
 3 'warn-construct'     0   0  37   0   0   0   0   0   0   0   0   0   0   0   1   0   0   1
 4 'warn-cross'         0   0   0   6   0   0   0   0   0   1   0   0   0   0   0   0   0   1
 5 'warn-danger'        0   0   0   0   4   0   1   0   0   0   0   0   0   0   0   0   0   0
 6 'warn-go-right'      1   0   0   0   0   0   0   0   0   0   0   0   3   0   0   0   0   0
 7 'warn-human'         0   0   1   0   1   0   7   0   0   0   0   0   0   0   0   0   0   0
 8 'warn-kids'         10   0   0   0   0   0   0  13   0   0   0   0   0   1   0   2   0   0
 9 'warn-left-T'        2   0   0   0   0   0   2   0   7   0   0   0   0   0   0   0   0   0
10 'warn-left-turn'     0   0   0   0   0   0   0   0   0  16   0   0   0   0   0   0   0   0
11 'warn-narrow'        1   0   0   0   0   0   0   0   0   0  59   0   5   0   0   0   0   0
12 'warn-railway'       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
13 'warn-right-Lane'    0   0   0   0   2   0   0   0   0   1   3   0 247   0   0   0   0   0
14 'warn-right-T'       0   0   0   0   0   0   0   0   0   0   0   0   0  15   0   0   0   0
15 'warn-right-turn'    0   0   0   0   0   0   0   0   0   0   0   0   0   0  25   0   0   0
16 'warn-slow'          1   0   0   0   0   0   0   0   2   0   0   0   0   0   0  17   0   0
17 'warn-tunnel'        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2   0
18 'warn-zzz'           0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   1   0  11
Average precision = 0.754189
Average recall = 0.849863
Total no. of signs (without/with Unknown class) = 536/698
Performance: classification of 33 classes within the "Prohibitory"
category on signs (3494/3549) from the test set (3237 images)
Confusion matrix (rows: actual classes; columns: classified classes, in the same order 1-34):
                        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
 1 'SpLim:10'           0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 2 'SpLim:100'          0 573   0   5   0   0   0   0   3   0   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   3
 3 'SpLim:110'          0   0  36   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
 4 'SpLim:120'          0   6   0 454   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
 5 'SpLim:20'           0   0   0  12  47   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2
 6 'SpLim:30'           0   0   0   1   0  38   0   0   0   0   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2
 7 'SpLim:40'           0   0   0   1   0   0 315   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 8 'SpLim:50'           0   0   0   0   0   0   0   9   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 9 'SpLim:60'           0   1   0   1   0   0   0   0 324   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
10 'SpLim:70'           0   0   0   0   0   0   1   0   0  60   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
11 'SpLim:80'           0   2   1   1   0   0   0   0   1   0 337   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2
12 'SpLim:90'           0   2   0   0   0   0   1   0   2   0   1  88   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
13 'SpLm:5'             0   0   0   0   0   0   0   0   0   0   0   0  54   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
14 'combination'        0   0   0   0   0   1   0   0   0   0   0   0   0  14   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2
15 'enable-overtake'    0   0   0   0   0   0   0   0   0   0   0   0   0   0  12   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
16 'no-U-turn'          0   0   0   6   0   0   0   0   0   0   0   0   0   1   0 522   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   5
17 'no-bike'            0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2
18 'no-bus'             0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  28   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   2
19 'no-car'             0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  14   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
20 'no-entry'           0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  36   0   0   0   0   0   0   0   0   0   0   0   0   0   0
21 'no-explosive'       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   4   0   0   0   0   0   0   0   0   0   0   0   0   0
22 'no-horn'            0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   9   0   0   0   1   0   0   0   0   0   0   0   3
23 'no-left-turn'       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   7   0   0   0   0   0   0   0   0   0   0   2
24 'no-motor-bike'      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   5   0   0   0   0   0   0   0   0   0   1
25 'no-overtake'        0   0   0   0   0   0   0   0   1   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0  60   0   1   0   0   0   0   0   0   0
26 'no-parking'         0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0 164  13   0   0   0   0   0   0   4
27 'no-pass'            0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2  18   0   0   0   0   0   0   2
28 'no-phone'           0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
29 'no-right-turn'      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
30 'no-stopping'        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
31 'no-tractor'         0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   3   0   0   0
32 'no-truck'           0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   4   0   0   0   0   0   0   0   0   1   0   0   0   0  52   0   1
33 'no-walking'         0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   7   4
34 'unknown'            0   2   0   3   1   1   1   1   0   2   4   1   0   1   0   6   0   2   0   0   0   0   0   0   0   0   1   0   2   1   0   0   0  25
Average precision = 0.916468
Average recall = 0.885022
Total no. of signs (without/with Unknown class) = 3494/3549
Comparison: German Traffic Sign Recognition Benchmark (GTSRB)
(preliminary: without optimizing parameters in our training)
http://benchmark.ini.rub.de/?section=gtsrb&subsection=results
     Team       Method                 All signs
[3]  IDSIA      Committee of CNNs      99.46%
[1]  INI-RTCV   Human performance      98.84%
[4]  Sermanet   Multi-scale CNNs       98.31%
[2]  CAOR       Random forests         96.14%
     Ours       Saliency-enhanced      95.80%
[6]  INI-RTCV   LDA on HOG 2           95.68%
[5]  INI-RTCV   LDA on HOG 1           93.18%
[7]  INI-RTCV   LDA on HOG 3           92.34%
4.3. Application:
(b) ML for privacy-preserving fall detection and
activity classification using RGB-D videos
Addressed Problem
Fall detection and activity recognition from RGB-D videos
Privacy preservation: using only low-resolution video or depth
video
Applications:
Automatically detect falls and trigger alarms
Detect falls using a single camera view
Exploit spatial-temporal features of shape, pose + appearance
Privacy-preserving issue
Healthcare, assisted living …
Motivations
Main issues addressed: effective spatio-temporal features:
Global shape + motion from RGB videos
Local shape + motion from depth videos
Combine different features for fall detection
(classify fall vs. lying down)
Study the contribution of each individual feature component to the
overall performance
Exploit low-resolution RGB-D videos for privacy-preserving
fall detection
Riemannian manifold-based classification of a list of activities (ongoing)
The big picture (ongoing work)
Manifold-based
video activity
classification
RGB videos: motion and appearance features
- Normalize ROI size while maintaining the object aspect ratio (filling with background):
$(w, h) \Rightarrow \max(w, h) \Rightarrow (\lambda, \lambda)$
- The appearance feature is represented by HOG
- The motion feature is represented by HOG-OF
Depth videos: shape and dynamic shape features
- Shape features, shape dynamic features, time-dependent features
- Time-dependent feature matrix:
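The ROI normalization $(w, h) \Rightarrow \max(w, h) \Rightarrow (\lambda, \lambda)$ can be sketched as padding to a square and resizing. This assumes a constant fill value standing in for the background and nearest-neighbor resizing; `lam` plays the role of $\lambda$ and its default of 64 is hypothetical.

```python
import numpy as np


def normalize_roi(roi, lam=64, fill=0):
    """Pad an (h, w) or (h, w, c) ROI to a centered square of side max(w, h),
    preserving the aspect ratio, then resize it to (lam, lam) by
    nearest-neighbor sampling. `fill` stands in for the background."""
    roi = np.asarray(roi)
    h, w = roi.shape[:2]
    side = max(h, w)
    square = np.full((side, side) + roi.shape[2:], fill, dtype=roi.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = roi   # center the object
    idx = (np.arange(lam) * side) // lam       # nearest-neighbor source indices
    return square[idx][:, idx]
```

Padding before resizing keeps a person's silhouette from being stretched, so HOG and HOG-OF features of standing and lying poses remain comparable.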
Conclusion
ML methods: several of our recent ML methods (smooth
manifold-based ML, saliency-enhanced ML) have been presented.
ML applications: several applications (traffic sign
recognition, activity classification, visual object tracking) have been
presented.
RGB-D video data from camera sensors has been shown to contain
important information and to be useful for ML.
More ML research attention should be paid to image/video
analysis techniques.