
Page 1: Agenda - WordPress.com · 6) Matthew Tang, “Recognizing Hand Gestures with Microsoft’s Kinect”, Stanford University, 2011 7) Gabrielle Odowichuk, Shawn Trail, Peter Driessen,
Page 2:

Agenda
•  Vision
•  Current State
•  Our Approach - towards three main areas
  o  Dynamic Gesture Recognition
    •  Using Machine Learning
  o  Modeling 3D Objects
    •  Building the environment
    •  Modeling styles
    •  Other features
  o  Improving Accuracy and Interface
    •  IMU/Kinect integration
    •  Mobile interface
•  Conclusion
•  Demo
•  Future Directions
•  References

Page 3:

Vision

“Devise newer interfaces and techniques, which will provide a totally immersive experience for modeling 3D objects in real time.”

Page 4:

Current State •  Wacom & AutoCAD

o  It is the industry standard to create 3D objects in AutoCAD using a Wacom pen tablet.

o  The problem is that we think and draw in 3D, but the tablet's interface is 2D. We can do better by moving towards natural 3D gestures, which take the artist's imagination to the next level.

o  No commercial application yet solves this problem; it is still an area of research. The major challenges are accuracy, precision and control.

Page 5:

Approach

Recognition Modeling Improving

Page 6:

Recognition Modeling Improving

Fig. 1: Training — sensor samples flow through the application into a library that feeds ML training.
Fig. 2: Classification — sensor samples flow through the application to ML recognition, which outputs a data classification.

Page 7:

Recognition Modeling Improving

Fig. 3: Training of a new gesture — the Kinect depth sensor streams depth frames to OpenNI, which extracts 3D coordinates; the application buffers the last k frames in a library and feeds supervised samples to an artificial neural network for training.
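The "library (last k frames)" stage in the training pipeline amounts to a fixed-size buffer of hand coordinates flattened into one input vector for the net. This Python sketch is illustrative only: the class name, method names, and the choice of Python are assumptions, not the authors' actual code (the original system was built on OpenNI/Processing).

```python
from collections import deque

class GestureBuffer:
    """Keep the last k hand positions (x, y, z) as one flat ANN input vector.

    Hypothetical sketch of the 'library (last k frames)' stage.
    """
    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frames fall off automatically

    def push(self, x, y, z):
        self.frames.append((x, y, z))

    def ready(self):
        return len(self.frames) == self.k

    def as_input(self):
        # Flatten k (x, y, z) triples into a 3*k-element vector for the net
        return [c for frame in self.frames for c in frame]

buf = GestureBuffer(k=3)
for t in range(5):
    buf.push(float(t), float(t) * 2, 0.0)
vec = buf.as_input()  # only the last 3 frames survive -> 9 numbers
```

A `deque` with `maxlen` keeps the sliding window constant-size without any manual eviction logic.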

Page 8:

Recognition Modeling Improving

Fig. 4: Classification of a gesture — the Kinect depth sensor streams depth frames to OpenNI, which extracts 3D coordinates; the application passes them to the trained artificial neural network for recognition, which outputs a classification.

Page 9:

Recognition Modeling Improving

ANN

Fig. 5: ANN topology — 3 × k inputs (the x, y, z coordinates at times t, t−1, …, t−k), a hidden layer of n neurons h1…hn, and p outputs G1…Gp, one per gesture.

Page 10:

Recognition Modeling Improving

ANN-Features

•  k -> discrete time window over which the gesture is performed.
•  p -> number of gestures.
•  n -> number of neurons in the hidden layer.
•  Complete graph between the input and hidden layer, and between the hidden layer and the output layer.
•  Tangent sigmoid function used at the neurons: f(x) = (e^x − e^−x) / (e^x + e^−x)
•  Learns using backpropagation of errors.
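A minimal forward pass matching this topology (3·k inputs, a fully connected hidden layer with the tangent sigmoid, p gesture outputs) can be sketched as below. This is an illustrative Python sketch, not the authors' implementation (their system used the Neuroph library); the weights here are random placeholders standing in for values that backpropagation would learn.

```python
import math
import random

random.seed(0)

# Dimensions from the slide: 3*k inputs, n hidden neurons, p gesture outputs
k, n, p = 4, 8, 3
n_in = 3 * k

# Fully connected ("complete graph") layers; randomly initialised here,
# learned by backpropagation of errors in the real system.
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n)]
w_out = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(p)]

def tansig(x):
    # Tangent sigmoid: f(x) = (e^x - e^-x) / (e^x + e^-x), i.e. tanh(x)
    return math.tanh(x)

def forward(inputs):
    hidden = [tansig(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    outputs = [tansig(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return outputs  # one score per gesture G1..Gp

scores = forward([0.1] * n_in)
best = max(range(p), key=lambda i: scores[i])  # index of recognized gesture
```

The tangent sigmoid keeps every activation in (−1, 1), so the gesture with the largest output score is taken as the classification.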

Page 11:

Recognition Modeling Improving

ANN-Limitations

•  The net rebuilds itself upon addition of a new gesture to the library.

•  Needs comparatively large data sets for training. •  Detection rate drops when considering both hands.

Page 12:

Recognition Modeling Improving

Results

Fig. 6: Detection rate vs. number of data sets for a single gesture — the detection rate approaches 79% with no false positives.

Page 13:

Recognition Modeling Improving

Approach

“The vision behind this is to devise newer interfaces and techniques, which will provide a totally immersive experience for modeling 3D objects in real time.”

Page 14:

Recognition Modeling Improving

Create Visualize Share

Page 15:

Create — available actions: Model, Change color, Change texture, Pause, Stop & Save, Load object, Place object

Create Visualize Share

Recognition Modeling Improving

Page 16:

Visualize — available actions: 6 DOF, Rotate, Zoom, Fork and Create, Stop, Load object

Create Visualize Share

Recognition Modeling Improving

Page 17:

Share — export as PDF or DXF, and save in real time

Create Visualize Share

Recognition Modeling Improving

Page 18:

Recognition Modeling Improving

Application

Application stack: OpenNI, NITE, OpenGL, and voice recognition via Voce (Sphinx4)

Page 19:

Rendering loop: voice control sets the perspective, background color, etc.; the hand location is detected and spline points are plotted for the sculpture; OpenGL draws balls, spheres, etc., renders the scene, and displays or saves it.

Recognition Modeling Improving

Page 20:

Recognition Modeling Improving

Results

Page 21:

Recognition Modeling Improving

Approach

“The vision behind this is to devise newer interfaces and techniques, which will provide a totally immersive experience for modeling 3D objects in real time.”

Page 22:

Recognition Modeling Improving

Limitations of our recognizer

•  Trade-off between accuracy and latency. •  Recognition using the Kinect sensor runs at ~20 pps. •  Because a large amount of visual data is processed in real time (pre-processing, feature extraction and classification), the latency of such a system is high.

Page 23:

Recognition Modeling Improving

Solution

•  Analogy with a GPS/IMU system used in airplane navigation: o  GPS – the Kinect depth sensor o  Inertial Measurement Unit (IMU) – smartphone sensors o  Dead reckoning

•  We apply data fusion to the streams from the Kinect and the smartphone sensors. o  The two provide complementary data streams, which helps us better estimate the current state.

•  The user holds a smartphone while the hand is being tracked.

Page 24:

Recognition Modeling Improving

IMU / Smartphone sensor – Position Estimation

Fig.: Gyroscope output is integrated to obtain orientation; the accelerometer readings are rotated into the local-level navigation frame, the effect of gravity is removed, and double integration yields position.
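For a sensor already rotated into the navigation frame, the pipeline in the figure reduces to gravity removal followed by double integration. A hypothetical Python sketch (the function name, sample rate, and constant-rate loop are assumptions for illustration):

```python
# Dead-reckoning sketch: double-integrate body acceleration to position.
# Assumes readings are already rotated into the navigation frame.
GRAVITY = 9.81  # m/s^2, removed from the vertical axis
DT = 0.01       # 100 Hz sample period

def dead_reckon(accel_samples):
    """accel_samples: list of (ax, ay, az) in the navigation frame, m/s^2."""
    vel = [0.0, 0.0, 0.0]
    pos = [0.0, 0.0, 0.0]
    for ax, ay, az in accel_samples:
        a = (ax, ay, az - GRAVITY)  # remove the effect of gravity
        for i in range(3):
            vel[i] += a[i] * DT   # first integration: velocity
            pos[i] += vel[i] * DT # second integration: position
    return pos

# A stationary sensor measuring only gravity should stay at the origin
samples = [(0.0, 0.0, 9.81)] * 100
pos = dead_reckon(samples)
```

The same double integration is what makes the method sensitive to bias: any constant error in `a` grows quadratically in `pos`, which is the limitation discussed on the next slide.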

Page 25:

Recognition Modeling Improving

Limitations of an IMU

•  The major problem is drift, which is worse in low-cost sensors (like those in a smartphone).

•  If one of the accelerometers has a bias error of just 0.001 g (about 0.0098 m/s2), the reported position output would diverge from the true position with an acceleration of 0.0098 m/s2; after a mere 30 seconds, the estimate would have drifted by about 4.4 meters!
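The drift figure follows from double-integrating a constant acceleration error: a bias b produces a position error e(t) = 0.5 · b · t². A quick check of the numbers on this slide:

```python
# Position error from a constant accelerometer bias under double
# integration: e(t) = 0.5 * b * t^2
bias = 0.001 * 9.8   # 0.001 g expressed in m/s^2 (~0.0098)
t = 30.0             # seconds
drift = 0.5 * bias * t ** 2  # meters of position error after t seconds
```

With these values the drift comes out to roughly 4.4 m, matching the slide's order of magnitude.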

Page 26:

Recognition Modeling Improving

Fusion

Fig.: The visual tracker (Kinect) and the IMU feed a Kalman-filter fusion block, which produces error estimates and a corrected position.

Time update:
x_k^- = A x_{k-1} + B u_{k-1}
P_k^- = A P_{k-1} A^T + Q

Measurement update:
K_k = P_k^- H^T (H P_k^- H^T + R)^-1
x_k = x_k^- + K_k (z_k - H x_k^-)
P_k = (I - K_k H) P_k^-
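A scalar (1×1) instance of these update equations, tracking a single position coordinate, can be sketched as follows. The noise parameters Q and R and the measurement sequence are illustrative placeholders, not the tuned values of the actual system (which used the Apache Commons Math library):

```python
# Scalar Kalman filter following the time/measurement updates above.
def kalman_step(x, P, u, z, A=1.0, B=0.0, H=1.0, Q=1e-4, R=1e-2):
    # Time update
    x_pred = A * x + B * u    # x_k^- = A x_{k-1} + B u_{k-1}
    P_pred = A * P * A + Q    # P_k^- = A P_{k-1} A^T + Q
    # Measurement update
    K = P_pred * H / (H * P_pred * H + R)  # Kalman gain K_k
    x_new = x_pred + K * (z - H * x_pred)  # blend prediction and measurement
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Noisy measurements of a true position of 1.0
x, P = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.98, 1.02]:
    x, P = kalman_step(x, P, u=0.0, z=z)
```

After a few measurements the estimate converges near the true position and the covariance P shrinks, which is exactly the behavior the fusion block relies on between Kinect fixes.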

 

Page 27:

Recognition Modeling Improving

Fusion contd.

Fig.: The Android phone's sensors and the Kinect feed the fusion block inside the application; the fused estimate drives the display, with which the user interacts.

Page 28:

Recognition Modeling Improving

Results

Fig.: A quick loop simulation, using only the Kinect (left) and using the Kinect and IMU (right).

Page 29:

Recognition Modeling Improving

Results contd.
•  The Kinect's position was fed at a rate of 10 Hz and checked against the original 20 Hz.
•  Compared to ordinary linear interpolation, our IMU-assisted system improved the location estimates by a factor of 1.37.
•  Further tuning of the initial parameters for the application can decrease the errors.

Page 30:

Conclusion

•  An architecture was presented to recognize dynamic gestures from a depth camera using neural networks, which expanded the ways of interacting with 3D objects. ML

•  We also developed a new set of gestures (e.g. pottery style) for 3D modeling. It resulted in structures of actual significance, which could be imported and used in other applications. HCI

•  Finally, to improve location estimates, we showed how to integrate data from inertial sensors and the Kinect to obtain high-quality results. Data Fusion

Page 31:

Short Video Demo

Page 32:

Tools used

•  Hardware
  o  PC (4 GB RAM, min. 1 GB free space for the environment, 2.3 GHz dual core)
  o  Microsoft Kinect sensor
  o  Android phone (with accelerometers, gyroscope, OS version >= 2.3)

•  Software
  o  OpenNI/NITE
  o  Point Cloud Library (PCL)
  o  OpenCV, OpenGL
  o  Processing, Eclipse IDE
  o  Voce voice recognition
  o  Neuroph
  o  Apache Commons Math library (for the Kalman filter)

Page 33:

Future Directions

Our system is still far from its original vision, but in its current state it can be used for initial abstract designs.

•  More 3D interaction techniques
  o  The survey on interaction by Chris Hand[2] can be used as a starting point to develop more interaction techniques for a non-ambiguous set of gestures that helps users create sculptures in 3D.

•  Deep learning
  o  Deep learning has proved to have higher classification accuracy than traditional ANN techniques, but it requires large data sets and hence computing power. Once trained, however, it performs far better.

•  Others
  o  Many other features are possible, such as network-based multi-user interaction, enhanced brush and color support, integration of a physics engine, and increased interactivity with the user. Multiple users could also draw at the same instance, using robust distributed computing.

Page 34:

References

1)  Roope Raisamo, “Multimodal Human-Computer Interaction: a Constructive and Empirical Study”, Academic Dissertation, University of Tampere, 1999

2)  Chris Hand, “A Survey of 3D Interaction Techniques”, Volume 16 (1997), Number 5, pp. 269–281, Wiley, 1997

3)  Christoph Arndt and Otmar Loffeld, “Information Gained by Data Fusion”, SPIE Conference Volume 2784, 1996

4)  Dipen Dave, Ashirwad Chowriappa and Thenkurussi Kesavadas, “Gesture Interface for 3D CAD Modeling using Kinect”, Computer-Aided Design & Applications, 9(a), 2012

5)  Gabrielle Odowichuk, Shawn Trail, Peter Driessen, Wendy Nie, “Sensor Fusion: Towards a Fully Expressive 3D Music Control Interface”, University of Victoria, 2011

Page 35:

References

6)  Matthew Tang, “Recognizing Hand Gestures with Microsoft’s Kinect”, Stanford University, 2011

7)  Gabrielle Odowichuk, Shawn Trail, Peter Driessen, Wendy Nie, “Sensor Fusion: Towards a Fully Expressive 3D Music Control Interface”, University of Victoria, 2012

8)  Rufeng Meng, Jason Isenhower, Chuan Qin, Srihari Nelakuditi, “Can Smartphone Sensors Enhance Kinect Experience?”, MobiHoc ’12, June 11–14, 2012

Page 36:

Thank  you!

Questions?

End