The Future of the In-Car Experience
Abdelrahman Mahmoud, Product Manager
Ashutosh Sanan, Computer Vision Scientist
Affectiva Emotion AI
Use cases: Interviewing, Mood Tracking, Social Robots, Drug Efficacy, Banking, Content Management (video / audio), Focus Groups, Customer Analytics, Education, Surveillance, Telehealth, Academic Research, Connected Devices / IoT, Health & Wellness, Social Robotics, MOOCs, Recruiting, Market Research, Legal, Mental Health, Web Conferencing, Healthcare, Real-Time Student Feedback, Video & Photo Organization, Automotive, Fraud Detection, Retail, Virtual Assistants, Online Education, Gaming, Live Streaming, Telemedicine, Security
Emotion recognition from face and voice powers several industries:
• In-market products since 2011: 1/3 of the Fortune Global 100, 1,400 brands, OEMs and Tier 1 suppliers
• Built using real-world data: 6.5M face videos from 87 countries; 42,000 miles of driving quarterly
• Recognized market / AI leader: spun out of the MIT Media Lab; selected for Startup Autobahn and the Partnership on AI
Affectiva Automotive AI
The Problem → The Affectiva Solution

Driver Safety
• Problem: transitions in control in semi-autonomous vehicles (e.g. the L3 handoff problem); current solutions based on steering wheel sensors are irrelevant in autonomous driving.
• Affectiva solution: a next-generation AI-based system to monitor and manage driver capability for safe engagement.

Occupant Experience
• Problem: a differentiated and monetizable in-cab experience (e.g. the L4 luxury car challenge).
• Affectiva solution: the first in-market solution for understanding occupant state and mood to enhance the overall in-cab experience.
People Analytics is context-aware, with Emotion AI as the foundational technology:
• External context: weather, traffic, signs, pedestrians
• Personal context: identity, likes/dislikes & preferences, occupant state history, calendar
• In-cab context: occupant relationships, infotainment content, inanimate objects, cabin environment
• Emotion AI: facial expressions, tone of voice, body posture
• People Analytics metrics: anger, surprise, distraction, drowsiness, intoxication, cognitive load, enjoyment, attention, excitement, stress, discomfort, displeasure
• Personalization: individually customized baseline, adaptive environment, personalization across vehicles
• Safety: next-generation driver monitoring, smart handoff, proactive intervention
• Monetization: differentiation among brands, premium content delivery, purchase recommendations
Affectiva approach to addressing Emotion AI complexities
Data
Our robust and scalable data strategy enables us to acquire large and diverse data sets and annotate them using both manual and automated approaches.

Algorithms
Using a variety of deep learning, computer vision, and speech processing approaches, we have developed algorithms that model complex and nuanced emotional and cognitive states.

Team
Our team of researchers and technologists has deep expertise in machine learning, deep learning, data science, data annotation, computer vision, and speech processing.

Infrastructure
Our deep learning infrastructure allows for rapid experimentation and tuning of models, as well as large-scale data processing and model evaluation.
World’s largest emotion data repository: 87 countries, 6.5M faces analyzed, 3.8B facial frames. Includes people emoting on device and while driving.

Top Countries for Emotion Data
• India: 1,363K
• USA: 1,166K
• China: 562K
• Indonesia: 325K
• United Kingdom: 265K
• Brazil: 194K
• Thailand: 184K
• Philippines: 159K
• Mexico: 150K
• Germany: 148K
• Vietnam: 148K
• Japan: 61K
Data Strategy
To develop a deep understanding of the state of occupants in a car, one needs large amounts of data. With this data we can develop algorithms that sense emotions and gather people analytics in real-world conditions. Foundational proprietary data will drive value and accelerate the data-partner ecosystem.

• Spontaneous occupant data: use Affectiva Driver Kits and Affectiva Moving Labs to collect naturalistic driver and occupant data, developing metrics that are robust to real-world conditions.
• Data partnerships: acquire third-party natural in-cab data through academic and commercial partners (MIT AVT, fleet operators, ride-share companies).
• Simulated data: collect challenging data in a safe lab simulation environment to augment the spontaneous driver dataset and bootstrap algorithms (e.g. drowsiness, intoxication), together with multi-spectral data and transfer learning.
[Diagram: Auto Data Corpus]
Algorithms
Deep learning advancements driving the automotive roadmap

The current SDK consists of deep learning networks that perform:
• Face detection: given an image, detect faces (region proposal network + bounding boxes)
• Landmark localization: given an image + bounding box, detect and track landmarks (regression + confidence)
• Facial analysis: detect facial expressions/emotions/attributes (multi-task CNN/RNN)

[Pipeline diagram: image → face detection (Region Proposal Network over shared convolutional layers → classification + bounding boxes) → landmark localization (landmark estimate → landmark refinement + confidence) → per-face analysis (emotions, temporal expressions, attributes)]
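To make the data flow concrete, here is a minimal sketch of how the three stages could compose at inference time. The function, the stage callables, and the 0.5 confidence cutoff are hypothetical illustrations, not the actual SDK API.

```python
# Hypothetical sketch of the three-stage pipeline; the stage callables and
# the confidence cutoff are illustrative, not the real SDK interface.
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

BBox = Tuple[int, int, int, int]  # x, y, w, h

@dataclass
class FaceResult:
    bbox: BBox             # from the face detector (RPN + bounding boxes)
    landmarks: np.ndarray  # (K, 2) refined landmark positions
    metrics: dict          # per-face expression/emotion/attribute scores

def run_pipeline(image: np.ndarray,
                 detect: Callable,    # image -> list of bounding boxes
                 localize: Callable,  # image, bbox -> (landmarks, confidence)
                 analyze: Callable) -> List[FaceResult]:
    results = []
    for bbox in detect(image):                         # stage 1: face detection
        landmarks, confidence = localize(image, bbox)  # stage 2: landmarks
        if confidence < 0.5:                           # drop low-confidence faces
            continue
        x, y, w, h = bbox
        face = image[y:y + h, x:x + w]                 # crop for the analysis net
        results.append(FaceResult(bbox, landmarks, analyze(face)))  # stage 3
    return results
```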
Task: Facial Action/Emotion Recognition
• Given a face, classify the corresponding visual expression/emotion occurrence.
• Many expressions: facial muscles generate hundreds of facial expressions/emotions.
• Multi-attribute classification: several expressions can be active at once.
• Must be fast enough to run on mobile/embedded devices.

Examples: Joy, Yawn, Eye Brow Raise
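Because several attributes can be active in the same face, the output layer is naturally multi-label rather than multi-class. Below is a minimal PyTorch sketch of such a head, with illustrative layer sizes and the three example labels above; it is not Affectiva's actual architecture.

```python
# Minimal multi-label expression head; sizes and labels are illustrative.
import torch
import torch.nn as nn

LABELS = ["joy", "yawn", "eye_brow_raise"]  # attributes may co-occur

class ExpressionHead(nn.Module):
    def __init__(self, feat_dim: int = 256, n_labels: int = len(LABELS)):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_labels)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # One independent sigmoid per attribute (multi-label), rather than a
        # softmax over mutually exclusive classes.
        return torch.sigmoid(self.fc(features))

head = ExpressionHead()
scores = head(torch.randn(1, 256))  # shape (1, 3): one probability per label
# Training would use nn.BCELoss on these probabilities
# (or nn.BCEWithLogitsLoss on the raw logits).
```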
Is a single image always enough?
Information in Time
The emotional state is a continuously evolving process over time. Adding temporal information makes it easier to detect highly subtle changes in facial state.

How to utilize temporal information:
• Post-process the static classifier's output using previous predictions and images (sketched after the chart below).
• Use recurrent architectures.
[Chart: intensity of expression over time]
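A minimal sketch of the first option, assuming per-frame probabilities from the static classifier: an exponential moving average damps single-frame flickers. The decay value is an illustrative choice, not a tuned parameter from the talk.

```python
# Smooth per-frame static-classifier scores with an exponential moving
# average so brief flickers don't flip the prediction.
import numpy as np

def smooth_scores(frame_scores: np.ndarray, decay: float = 0.8) -> np.ndarray:
    """frame_scores: (T,) per-frame probabilities from the static classifier."""
    smoothed = np.empty_like(frame_scores)
    ema = frame_scores[0]
    for t, s in enumerate(frame_scores):
        ema = decay * ema + (1.0 - decay) * s  # blend new score into history
        smoothed[t] = ema
    return smoothed

raw = np.array([0.1, 0.9, 0.2, 0.85, 0.9, 0.88])  # noisy static outputs
print(smooth_scores(raw))                          # the dip at t=2 is damped
```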
Spatio-Temporal Action Recognition

[Architecture diagram: a temporal sequence of frames goes through a shared CNN for spatial feature extraction; the per-frame features feed an LSTM that learns the temporal structure and outputs frame-level classifications (e.g. 0, 0, 0.5, …, 0.8)]
Yawn Recognition using CNN + LSTM
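A minimal PyTorch sketch of this CNN + LSTM pattern for yawn recognition; the layer sizes, input resolution, and clip length are illustrative assumptions, not the production network.

```python
# Minimal CNN + LSTM for per-frame yawn scores; sizes are illustrative.
import torch
import torch.nn as nn

class YawnNet(nn.Module):
    def __init__(self, feat_dim: int = 128, hidden: int = 64):
        super().__init__()
        # Spatial feature extraction: the same CNN applied to every frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Learning temporal structure over the per-frame features.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return torch.sigmoid(self.head(out))  # per-frame yawn score: (b, t, 1)

scores = YawnNet()(torch.randn(2, 9, 3, 64, 64))  # 9-frame clips -> (2, 9, 1)
```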
Training Challenges & Inference
Data challenges
RNNs expect a continuous temporal sequence while training, but real-world data has gaps:

Missing facial frames
• Bad lighting
• Face out of view
• Face not visible

Missing human annotations
• Facial frames not labeled by humans

Possible solutions:
• Use shorter, fixed-length continuous sequences with no missing data.
• Copy the last state of the sequence (repeat the last tracked frame).
• Mask the missing frames (sketched below).

[Diagram: missing frames in a sequence]
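A sketch of the masking option: step an LSTM cell frame by frame and, wherever the mask marks a missing frame, carry the previous hidden state through instead of updating it. Shapes and sizes are illustrative assumptions.

```python
# Hold the recurrent state constant across masked (missing) frames.
import torch
import torch.nn as nn

def masked_lstm(feats: torch.Tensor, mask: torch.Tensor, cell: nn.LSTMCell):
    """feats: (T, B, F) frame features; mask: (T, B), 1 = present, 0 = missing."""
    h = feats.new_zeros(feats.size(1), cell.hidden_size)
    c = feats.new_zeros(feats.size(1), cell.hidden_size)
    outputs = []
    for t in range(feats.size(0)):
        h_new, c_new = cell(feats[t], (h, c))
        m = mask[t].unsqueeze(1)        # (B, 1), broadcasts over hidden dim
        h = m * h_new + (1 - m) * h     # missing frame: carry state through
        c = m * c_new + (1 - m) * c
        outputs.append(h)
    return torch.stack(outputs)         # (T, B, hidden)

cell = nn.LSTMCell(input_size=128, hidden_size=64)
feats = torch.randn(9, 2, 128)
mask = torch.ones(9, 2)
mask[3:5, 0] = 0                        # frames 3-4 missing in clip 0
out = masked_lstm(feats, mask, cell)
```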
Masking vs. copying the last state
Results indicate that masking works better than copying the last state.

[Chart: ROC-AUC and validation accuracy, masking vs. using the last state]
How to train a spatio-temporal model?
Two approaches to train our model:
• Train both the convolutional and recurrent filters jointly.
• Transfer learning using previously learned convolutional filters (sketched below).

[Diagram: Input A → feature extractors → Expressions; Input B → frozen, transferred feature extractors → Yawn]
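A sketch of the transfer-learning option, reusing the illustrative YawnNet from the earlier sketch: load previously learned convolutional filters, freeze them, and optimize only the recurrent part and the head.

```python
# Freeze the pretrained convolutional filters; train only the LSTM and head.
# YawnNet is the illustrative model from the earlier sketch, and the
# commented state-dict load stands in for real pretrained weights.
import torch

model = YawnNet()
# model.cnn.load_state_dict(pretrained_static_cnn_state)  # learned filters

for p in model.cnn.parameters():
    p.requires_grad = False  # frozen feature extractor

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),  # recurrent part only
    lr=1e-3,
)
```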
Transfer learning for runtime performance

[Chart: ROC-AUC and validation accuracy, fixed weights vs. fully trainable]

[Diagram: shared convolutional layers feeding the emotions, temporal expressions, and attributes heads]

Intelligent filter reuse:
• Increased runtime performance, enough to run on mobile.
• Minimal benefit from tuning the filters from scratch.
• A large real-world dataset backs the pretrained filters.

Transfer learning is used to help with runtime performance.
Does temporal info always help?

[Charts: ROC-AUC performance, temporal vs. static, for Yawn, Smile, and Outer Brow Raiser (AU02)]
Models in Action

[Demo videos: real-time per-frame scores overlaid on faces, e.g. anger, joy, smile, expressiveness, fatigue, eye closure, concentration, fear]
Key Takeaways
• Not all metrics benefit from adding complex temporal information.
• Using all the data (complete & partial sequences) definitely helps the model.
• Masking works better with partial sequences than copying last frames.
• Intelligent filter reuse makes it possible to deploy these models on mobile with real-time performance.
What’s next?
• Analyze the effects of differences in frame rate at deployment vs. training (a resampling sketch follows).
• Use facial markers to create a drowsiness intensity metric.
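As one way the frame-rate question might be probed (an assumption, not Affectiva's stated method): resample clips recorded at the training frame rate down to a target deployment rate, then re-evaluate the temporal model on the resampled clips.

```python
# Subsample a clip to imitate a lower deployment frame rate.
import numpy as np

def resample_clip(frames: np.ndarray, train_fps: float,
                  deploy_fps: float) -> np.ndarray:
    """frames: (T, ...) clip recorded at train_fps; keep ~deploy_fps frames."""
    step = train_fps / deploy_fps
    idx = np.arange(0, len(frames), step).astype(int)
    return frames[idx]

clip = np.random.rand(30, 64, 64, 3)      # 1 s of video at 30 fps
print(resample_clip(clip, 30, 10).shape)  # (10, 64, 64, 3): 1 s at 10 fps
```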
Q&A
Learn more at affectiva.com