
Page 1: Agenda - WordPress.com · 6) Matthew Tang, “Recognizing Hand Gestures with Microsoft’s Kinect”, Stanford University, 2011 7) Gabrielle Odowichuk, Shawn Trail, Peter Driessen,
Page 2:

Agenda
•  Vision
•  Current State
•  Our Approach - towards three main areas
  o  Dynamic Gesture Recognition
    •  Using Machine Learning
  o  Modeling 3D Objects
    •  Building the environment
    •  Modeling styles
    •  Other features
  o  Improving Accuracy and Interface
    •  IMU/Kinect integration
    •  Mobile interface
•  Conclusion
•  Demo
•  Future Directions
•  References

Page 3:

Vision

“Devise newer interfaces and techniques, which will provide a totally immersive experience for modeling 3D objects in real time.”

Page 4:

Current State •  Wacom & AutoCAD

o  It is the industry standard to create 3D objects in AutoCAD using a Wacom pen tablet.

o  The problem is that we think and draw in 3D, but the tablet's interface is 2D. We can do better by moving towards natural 3D gestures, which take the artist's imagination to the next level.

o  No commercial application yet solves this problem; it is still an area of research. The major challenges are accuracy, precision and control.

Page 5:

Approach

Recognition Modeling Improving

Page 6:

Recognition Modeling Improving

Fig. 1: Training — sensor samples flow through the application into a library that feeds ML training.
Fig. 2: Classification — sensor samples flow through the application to ML recognition, which outputs a data classification.

Page 7:

Recognition Modeling Improving

Fig. 3: Training of a new gesture — the Kinect depth sensor streams depth frames to OpenNI, which extracts 3D coordinates; the application buffers the last k frames in a library and feeds supervised samples to an artificial neural network for training.
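The "library (last k frames)" stage in the training pipeline amounts to a fixed-size buffer of hand coordinates flattened into one input vector for the net. This Python sketch is illustrative only: the class name, method names, and the choice of Python are assumptions, not the authors' actual code (the original system was built on OpenNI/Processing).

```python
from collections import deque

class GestureBuffer:
    """Keep the last k hand positions (x, y, z) as one flat ANN input vector.

    Hypothetical sketch of the 'library (last k frames)' stage.
    """
    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frames fall off automatically

    def push(self, x, y, z):
        self.frames.append((x, y, z))

    def ready(self):
        return len(self.frames) == self.k

    def as_input(self):
        # Flatten k (x, y, z) triples into a 3*k-element vector for the net
        return [c for frame in self.frames for c in frame]

buf = GestureBuffer(k=3)
for t in range(5):
    buf.push(float(t), float(t) * 2, 0.0)
vec = buf.as_input()  # only the last 3 frames survive -> 9 numbers
```

A `deque` with `maxlen` keeps the sliding window constant-size without any manual eviction logic.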

Page 8:

Recognition Modeling Improving

Fig. 4: Classification of a gesture — the Kinect depth sensor streams depth frames to OpenNI, which extracts 3D coordinates; the application passes them to the trained artificial neural network for recognition, which outputs a classification.

Page 9:

Recognition Modeling Improving

ANN

Fig. 5: ANN topology — 3 × k inputs (the x, y, z coordinates at times t, t−1, …, t−k), a hidden layer of n neurons h1…hn, and p outputs G1…Gp, one per gesture.

Page 10:

Recognition Modeling Improving

ANN-Features

•  k -> discrete time window over which the gesture is performed.
•  p -> number of gestures.
•  n -> number of neurons in the hidden layer.
•  Complete graph between the input and hidden layer, and between the hidden layer and the output layer.
•  Tangent sigmoid function used at the neurons: f(x) = (e^x − e^−x) / (e^x + e^−x)
•  Learns using backpropagation of errors.
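A minimal forward pass matching this topology (3·k inputs, a fully connected hidden layer with the tangent sigmoid, p gesture outputs) can be sketched as below. This is an illustrative Python sketch, not the authors' implementation (their system used the Neuroph library); the weights here are random placeholders standing in for values that backpropagation would learn.

```python
import math
import random

random.seed(0)

# Dimensions from the slide: 3*k inputs, n hidden neurons, p gesture outputs
k, n, p = 4, 8, 3
n_in = 3 * k

# Fully connected ("complete graph") layers; randomly initialised here,
# learned by backpropagation of errors in the real system.
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n)]
w_out = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(p)]

def tansig(x):
    # Tangent sigmoid: f(x) = (e^x - e^-x) / (e^x + e^-x), i.e. tanh(x)
    return math.tanh(x)

def forward(inputs):
    hidden = [tansig(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    outputs = [tansig(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return outputs  # one score per gesture G1..Gp

scores = forward([0.1] * n_in)
best = max(range(p), key=lambda i: scores[i])  # index of recognized gesture
```

The tangent sigmoid keeps every activation in (−1, 1), so the gesture with the largest output score is taken as the classification.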

Page 11:

Recognition Modeling Improving

ANN-Limitations

•  The net rebuilds itself upon addition of a new gesture to the library.

•  Needs comparatively large data sets for training. •  Detection rate drops when considering both hands.

Page 12:

Recognition Modeling Improving

Results

Fig. 6: Detection rate vs. number of data sets for a single gesture — the detection rate approaches 79% with no false positives.

Page 13:

Recognition Modeling Improving

Approach

“The vision behind this is to devise newer interfaces and techniques, which will provide a totally immersive experience for modeling 3D objects in real time.”

Page 14:

Recognition Modeling Improving

Create Visualize Share

Page 15:

Create — available actions: Model, Change color, Change texture, Pause, Stop & Save, Load object, Place object

Create Visualize Share

Recognition Modeling Improving

Page 16:

Visualize — available actions: 6 DOF, Rotate, Zoom, Fork and Create, Stop, Load object

Create Visualize Share

Recognition Modeling Improving

Page 17:

Share — export as PDF or DXF, and save in real time

Create Visualize Share

Recognition Modeling Improving

Page 18:

Recognition Modeling Improving

Application

Application stack: OpenNI, NITE, OpenGL, and voice recognition via Voce (Sphinx4)

Page 19:

Rendering loop: voice control sets the perspective, background color, etc.; the hand location is detected and spline points are plotted for the sculpture; OpenGL draws balls, spheres, etc., renders the scene, and displays or saves it.

Recognition Modeling Improving

Page 20:

Recognition Modeling Improving

Results

Page 21:

Recognition Modeling Improving

Approach

“The vision behind this is to devise newer interfaces and techniques, which will provide a totally immersive experience for modeling 3D objects in real time.”

Page 22:

Recognition Modeling Improving

Limitations of our recognizer

•  Trade-off between accuracy and latency. •  Recognition using the Kinect sensor runs at ~20 pps. •  Because a large amount of visual data is processed in real time (pre-processing, feature extraction and classification), the latency of such a system is high.

Page 23:

Recognition Modeling Improving

Solution

•  Analogy with a GPS/IMU system used in airplane navigation: o  GPS – the Kinect depth sensor o  Inertial Measurement Unit (IMU) – smartphone sensors o  Dead reckoning

•  We apply data fusion to the streams from the Kinect and the smartphone sensors. o  The two provide complementary data streams, which helps us better estimate the current state.

•  The user holds a smartphone while the hand is being tracked.

Page 24:

Recognition Modeling Improving

IMU / Smartphone sensor – Position Estimation

Fig.: Gyroscope output is integrated to obtain orientation; the accelerometer readings are rotated into the local-level navigation frame, the effect of gravity is removed, and double integration yields position.
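For a sensor already rotated into the navigation frame, the pipeline in the figure reduces to gravity removal followed by double integration. A hypothetical Python sketch (the function name, sample rate, and constant-rate loop are assumptions for illustration):

```python
# Dead-reckoning sketch: double-integrate body acceleration to position.
# Assumes readings are already rotated into the navigation frame.
GRAVITY = 9.81  # m/s^2, removed from the vertical axis
DT = 0.01       # 100 Hz sample period

def dead_reckon(accel_samples):
    """accel_samples: list of (ax, ay, az) in the navigation frame, m/s^2."""
    vel = [0.0, 0.0, 0.0]
    pos = [0.0, 0.0, 0.0]
    for ax, ay, az in accel_samples:
        a = (ax, ay, az - GRAVITY)  # remove the effect of gravity
        for i in range(3):
            vel[i] += a[i] * DT   # first integration: velocity
            pos[i] += vel[i] * DT # second integration: position
    return pos

# A stationary sensor measuring only gravity should stay at the origin
samples = [(0.0, 0.0, 9.81)] * 100
pos = dead_reckon(samples)
```

The same double integration is what makes the method sensitive to bias: any constant error in `a` grows quadratically in `pos`, which is the limitation discussed on the next slide.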

Page 25:

Recognition Modeling Improving

Limitations of an IMU

•  The major problem is drift, which is worse in low-cost sensors (like those in a smartphone).

•  If one of the accelerometers has a bias error of just 0.001 g (about 0.0098 m/s2), the reported position output would diverge from the true position with an acceleration of 0.0098 m/s2; after a mere 30 seconds, the estimate would have drifted by about 4.4 meters!
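The drift figure follows from double-integrating a constant acceleration error: a bias b produces a position error e(t) = 0.5 · b · t². A quick check of the numbers on this slide:

```python
# Position error from a constant accelerometer bias under double
# integration: e(t) = 0.5 * b * t^2
bias = 0.001 * 9.8   # 0.001 g expressed in m/s^2 (~0.0098)
t = 30.0             # seconds
drift = 0.5 * bias * t ** 2  # meters of position error after t seconds
```

With these values the drift comes out to roughly 4.4 m, matching the slide's order of magnitude.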

Page 26:

Recognition Modeling Improving

Fusion

Fig.: The visual tracker (Kinect) and the IMU feed a Kalman-filter fusion block, which produces error estimates and a corrected position.

Time update:
x_k^- = A x_{k-1} + B u_{k-1}
P_k^- = A P_{k-1} A^T + Q

Measurement update:
K_k = P_k^- H^T (H P_k^- H^T + R)^-1
x_k = x_k^- + K_k (z_k - H x_k^-)
P_k = (I - K_k H) P_k^-
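A scalar (1×1) instance of these update equations, tracking a single position coordinate, can be sketched as follows. The noise parameters Q and R and the measurement sequence are illustrative placeholders, not the tuned values of the actual system (which used the Apache Commons Math library):

```python
# Scalar Kalman filter following the time/measurement updates above.
def kalman_step(x, P, u, z, A=1.0, B=0.0, H=1.0, Q=1e-4, R=1e-2):
    # Time update
    x_pred = A * x + B * u    # x_k^- = A x_{k-1} + B u_{k-1}
    P_pred = A * P * A + Q    # P_k^- = A P_{k-1} A^T + Q
    # Measurement update
    K = P_pred * H / (H * P_pred * H + R)  # Kalman gain K_k
    x_new = x_pred + K * (z - H * x_pred)  # blend prediction and measurement
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Noisy measurements of a true position of 1.0
x, P = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.98, 1.02]:
    x, P = kalman_step(x, P, u=0.0, z=z)
```

After a few measurements the estimate converges near the true position and the covariance P shrinks, which is exactly the behavior the fusion block relies on between Kinect fixes.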

 

Page 27:

Recognition Modeling Improving

Fusion contd.

Fig.: The Android phone's sensors and the Kinect feed the fusion block inside the application; the fused estimate drives the display, with which the user interacts.

Page 28:

Recognition Modeling Improving

Results

Fig.: A quick loop simulation, using only the Kinect (left) and using the Kinect and IMU (right).

Page 29:

Recognition Modeling Improving

Results contd.
•  The Kinect's position was fed at a rate of 10 Hz and checked against the original 20 Hz.
•  Compared to ordinary linear interpolation, our IMU-assisted system improved the location estimates by a factor of 1.37.
•  Further tuning of the initial parameters for the application can decrease the errors.

Page 30:

Conclusion

•  An architecture was presented to recognize dynamic gestures from a depth camera using neural networks, which expanded the ways of interacting with 3D objects. ML

•  We also developed a new set of gestures (e.g. pottery style) for 3D modeling. It resulted in structures of actual significance, which could be imported and used in other applications. HCI

•  Finally, to improve location estimates, we showed how to integrate data from inertial sensors and the Kinect to obtain high-quality results. Data Fusion

Page 31:

Short Video Demo

Page 32:

Tools used

•  Hardware
  o  PC (4 GB RAM, min. 1 GB free space for the environment, 2.3 GHz dual core)
  o  Microsoft Kinect sensor
  o  Android phone (with accelerometers, gyroscope, OS version >= 2.3)

•  Software
  o  OpenNI/NITE
  o  Point Cloud Library (PCL)
  o  OpenCV, OpenGL
  o  Processing, Eclipse IDE
  o  Voce voice recognition
  o  Neuroph
  o  Apache Commons Math library (for the Kalman filter)

Page 33:

Future Directions

Our system is still far from its original vision, but in its current state it can be used for initial abstract designs.

•  More 3D interaction techniques
  o  The survey on interaction by Chris Hand[2] can be used as a starting point to develop more interaction techniques for a non-ambiguous set of gestures that helps users create sculptures in 3D.

•  Deep learning
  o  Deep learning has proved to have higher classification accuracy than traditional ANN techniques, but it requires large data sets and hence computing power. Once trained, however, it performs far better.

•  Others
  o  Many other features are possible, such as network-based multi-user interaction, enhanced brush and color support, integration of a physics engine, and increased interactivity with the user. Multiple users could also draw at the same instance, using robust distributed computing.

Page 34:

References

1)  Roope Raisamo, “Multimodal Human-Computer Interaction: a Constructive and Empirical Study”, Academic Dissertation, University of Tampere, 1999

2)  Chris Hand, “A Survey of 3D Interaction Techniques”, Volume 16 (1997), Number 5, pp. 269–281, Wiley, 1997

3)  Christoph Arndt and Otmar Loffeld, “Information Gained by Data Fusion”, SPIE Conference Volume 2784, 1996

4)  Dipen Dave, Ashirwad Chowriappa and Thenkurussi Kesavadas, “Gesture Interface for 3D CAD Modeling using Kinect”, Computer-Aided Design & Applications, 9(a), 2012

5)  Gabrielle Odowichuk, Shawn Trail, Peter Driessen, Wendy Nie, “Sensor Fusion: Towards a Fully Expressive 3D Music Control Interface”, University of Victoria, 2011

Page 35:

References

6)  Matthew Tang, “Recognizing Hand Gestures with Microsoft’s Kinect”, Stanford University, 2011

7)  Gabrielle Odowichuk, Shawn Trail, Peter Driessen, Wendy Nie, “Sensor Fusion: Towards a Fully Expressive 3D Music Control Interface”, University of Victoria, 2012

8)  Rufeng Meng, Jason Isenhower, Chuan Qin, Srihari Nelakuditi, “Can Smartphone Sensors Enhance Kinect Experience?”, MobiHoc ’12, June 11–14, 2012

Page 36:

Thank  you!

Questions?

End