Post on 30-May-2018
Copyright © 2017 1
Teaching Cars, Robots and Buildings to Read Human Body Language using Webcams and GPUs
Problem: How to allow Humans and Computers to Interact
Show me, don’t tell me
Humans are highly visual creatures
The majority of communication is non-verbal
  body language
  tone of voice
Dogs are our best friends because they have the unique ability to read our body language
How do we teach computers to read our body language?
Existing Tracking Solutions
Motion Capture
  Marker-based
  Sensor-based
    IMU
    Depth
Shortcomings
  Expensive to own / operate
  Time consuming to set up / use
  Require specialty sensors / suits
  Sensitive to light conditions
Eureka Moment: software “Kinect” using regular cameras and GPUs
Mobile AR Object Tracking
Solution: Webcams, GPUs and Deep Learning
Value Proposition
  No specialty sensors / use standard RGB monocular camera
  No specialty hardware / use GPUs
  Can be deployed in virtually any environment where humans can see
  No setup time
  Durability
  Leverage consumer grade GPUs
  Form factor opens up new markets
  Future proof: system will only get better with more powerful GPUs and more data
BodySLAM™: Human Machine Interaction Engine
User Application
BodySLAM™
In: Video
Out: Who, What, Where, When
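The slide names only the engine's input (video) and its four outputs (who, what, where, when). As a minimal sketch of what one output record could look like to a user application, the Python below defines a hypothetical `Observation` type. The class name, field types, and units are assumptions for illustration and are not taken from BodySLAM.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Observation:
    """Hypothetical output record: who did what, where, and when."""
    who: int                           # tracked person ID (assumed integer)
    what: str                          # activity or gesture label (assumed string)
    where: Tuple[float, float, float]  # 3D location (units assumed: meters)
    when: float                        # seconds since the start of the video


def describe(obs: Observation) -> str:
    """Format one observation as a single log line."""
    x, y, z = obs.where
    return (f"t={obs.when:.2f}s person {obs.who} {obs.what} "
            f"at ({x:.1f}, {y:.1f}, {z:.1f})")


obs = Observation(who=3, what="waving", where=(1.2, 0.0, 2.5), when=4.5)
print(describe(obs))
```

A user application would presumably consume a stream of such records, one or more per video frame.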
BodySLAM™: Unique Features
Deep Learning
  Accurate
  Robust
  Keeps getting smarter
Real-time
  Enables interactivity
No Specialty Hardware
  Consumer grade cameras & GPUs
Live Demos
BodySLAM: High Level Architecture
[Architecture diagram]
Input: 2D Video
Stages: 2D Human Detection, 3D Human Detection, Human Tracker, Activity Recognition, Gesture Recognition
Data labels: Background, 2D Skeletons, 3D Skeletons, ID, Location, Activities, Gestures
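The architecture slide lists five stages and the data they produce. One way to picture how such stages could be chained is sketched below in Python. The strictly sequential order, the shared state dictionary, and every function body are assumptions for illustration: the stages here return placeholder values rather than running any real detector, and none of this is BodySLAM's actual code.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]            # hypothetical per-frame state shared by stages
Stage = Callable[[State], State]


def detect_2d(state: State) -> State:
    # Placeholder: a real stage would run a 2D pose CNN on state["frame"].
    return {**state, "skeletons_2d": [[(120, 80), (118, 140)]]}


def detect_3d(state: State) -> State:
    # Placeholder lift to 3D: attach a dummy depth of 2.0 to every 2D joint.
    lifted = [[(x, y, 2.0) for (x, y) in skel] for skel in state["skeletons_2d"]]
    return {**state, "skeletons_3d": lifted}


def track(state: State) -> State:
    # Placeholder tracker: sequential IDs, location taken from the first joint.
    people = [{"id": i, "location": skel[0]}
              for i, skel in enumerate(state["skeletons_3d"])]
    return {**state, "people": people}


def recognize_activity(state: State) -> State:
    return {**state, "activities": {p["id"]: "unknown" for p in state["people"]}}


def recognize_gesture(state: State) -> State:
    return {**state, "gestures": {p["id"]: "none" for p in state["people"]}}


PIPELINE: List[Stage] = [detect_2d, detect_3d, track,
                         recognize_activity, recognize_gesture]


def run(frame: Any, stages: List[Stage] = PIPELINE) -> State:
    """Pass one frame through every stage in order and return the final state."""
    state: State = {"frame": frame}
    for stage in stages:
        state = stage(state)
    return state


result = run(frame=None)
print(sorted(result.keys()))
```

Keeping each stage as a plain function over a shared state makes it easy to swap a stub for a real model later, or to drop stages an application does not need.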
Applications: Human Monitoring
In Home for elder care, etc.
In Store for retail
In Vehicle for awareness
In Factories for accidents, etc.
Applications: Robot Interaction
Applications: Eyes for Virtual Assistants
Applications: Health & Wellness
Applications: Sport Analytics
Applications: AR / VR
BodySLAM™: Unique Features
Fast: allows real-time interactivity
Accurate: tracks 63 body parts per person, including fingers
Robust: handles large numbers of people in crowded conditions
Tech Stack
Training
  Enhanced COCO
  Proprietary Synthetic Data
Engine
  2D: VGG-19 backbone / 51 layers
  3D: Stacked Hourglass / 134 layers
  C++ Tracking
Inferencing
  TensorRT (NVIDIA GPU)
  wrCNN (MPS iOS)
2D Training Pipeline
Depth Training Pipeline
Synthetic Data: Unity Game Engine
6 Human Models
  3 Male / 3 Female
  Randomized skin color, clothing textures,
50 Animations
100 Backgrounds
Randomized lighting and camera position
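The counts on this slide (6 models, 50 animations, 100 backgrounds) plus randomized appearance, lighting, and camera amount to a domain-randomization setup. The Python sketch below shows one way a scene configuration could be sampled before rendering. Only the three counts come from the slide; the `SceneConfig` fields, value ranges, and sampling distributions are hypothetical, and the actual Unity-side implementation is not described in the source.

```python
import random
from dataclasses import dataclass

NUM_MODELS = 6         # from the slide: 3 male / 3 female
NUM_ANIMATIONS = 50    # from the slide
NUM_BACKGROUNDS = 100  # from the slide


@dataclass
class SceneConfig:
    """Hypothetical description of one synthetic scene to render."""
    model: int
    animation: int
    background: int
    skin_tone: float        # assumed range [0, 1)
    light_intensity: float  # assumed range [0.5, 1.5]
    camera_yaw_deg: float   # assumed range [-180, 180]


def sample_scene(rng: random.Random) -> SceneConfig:
    """Draw one scene uniformly at random from the assumed ranges."""
    return SceneConfig(
        model=rng.randrange(NUM_MODELS),
        animation=rng.randrange(NUM_ANIMATIONS),
        background=rng.randrange(NUM_BACKGROUNDS),
        skin_tone=rng.random(),
        light_intensity=rng.uniform(0.5, 1.5),
        camera_yaw_deg=rng.uniform(-180.0, 180.0),
    )


rng = random.Random(0)  # fixed seed so a dataset can be regenerated
scenes = [sample_scene(rng) for _ in range(1000)]
discrete_combinations = NUM_MODELS * NUM_ANIMATIONS * NUM_BACKGROUNDS
print(discrete_combinations)
```

The discrete choices alone give 6 × 50 × 100 = 30,000 combinations; the continuous parameters multiply the variety further.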
High Level Architecture
Runtime Performance
GPU        CPU                          OS       Total (ms)  2D CNN (ms)  3D CNN (ms)  Misc (ms)
K80        Intel Xeon E5-2686v4 2.3GHz  Linux    131         90           36           5
1080       Intel i7-5930K 3.5GHz        Linux    37          22           12           3
1080       Intel i7-5930K 3.5GHz        Windows  41          24           13           4
Titan XP   Intel i7-5930K 3.5GHz        Linux    30          16           11           3
Titan XP   Intel i7-5930K 3.5GHz        Windows  34          18           12           4
iPhone 6s  -                            iOS      1183        963          200          20
iPhone 7   -                            iOS      854         696          147          11
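The totals in the runtime table are the sums of the 2D CNN, 3D CNN, and Misc columns. Assuming each figure is the latency for one frame and frames are processed one at a time (neither is stated on the slide), the implied frame rate is 1000 divided by the total in milliseconds. The Python below is a small sketch of that arithmetic; the dictionary keys are labels made up for this example.

```python
# (2D CNN, 3D CNN, Misc) timings in milliseconds, copied from the table.
TIMINGS_MS = {
    "K80 / Linux": (90, 36, 5),
    "1080 / Linux": (22, 12, 3),
    "1080 / Windows": (24, 13, 4),
    "Titan XP / Linux": (16, 11, 3),
    "Titan XP / Windows": (18, 12, 4),
    "iPhone 6s / iOS": (963, 200, 20),
    "iPhone 7 / iOS": (696, 147, 11),
}


def total_ms(parts):
    """Sum the per-stage timings to get the end-to-end latency."""
    return sum(parts)


def frames_per_second(parts):
    """Implied throughput, assuming one frame is processed at a time."""
    return 1000.0 / total_ms(parts)


for name, parts in TIMINGS_MS.items():
    print(f"{name}: {total_ms(parts)} ms, {frames_per_second(parts):.1f} fps")
```

Under these assumptions the desktop GPUs land at roughly 24 to 33 frames per second, while the phones take about a second per frame.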
Questions?