recognizing the impact of ai · ai and media metadata management advances in computer vision...
Post on 25-Aug-2020
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
David Pearson, AWS AI Services
May 2017
Recognizing the Impact of AI
– in –
Media and Entertainment
Media Metadata Management
Audience Engagement
Lifelike Speech
AI and Media Metadata Management
Advances in computer vision enable:
• Detection of objects, scenes, and concepts in images
• Estimation of age range, gender and emotion in faces
• Recognition of individuals in images and video
Using AI to Extract Metadata from Visual Content
rich media → objects, scenes, facial attributes, people → index
Deer 98.8%
Wildlife 95.1%
Conifer 95.1%
Spruce 95.1%
Wood 78.3%
Tree 63.5%
Forest 63.5%
Vegetation 61.9%
Pine 60.6%
Outdoors 54.0%
Flower 53.9%
Plant 52.9%
Nature 50.7%
Field 50.7%
Grass 50.7%
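The label list above suggests how detected labels could feed a search index. The following is a minimal sketch, assuming a confidence cutoff decides which labels are indexed; the function name `index_terms` and the 75% default are illustrative assumptions, not something the slides specify.

```python
# Label/confidence pairs copied from the slide above.
LABELS = [
    ("Deer", 98.8), ("Wildlife", 95.1), ("Conifer", 95.1), ("Spruce", 95.1),
    ("Wood", 78.3), ("Tree", 63.5), ("Forest", 63.5), ("Vegetation", 61.9),
    ("Pine", 60.6), ("Outdoors", 54.0), ("Flower", 53.9), ("Plant", 52.9),
    ("Nature", 50.7), ("Field", 50.7), ("Grass", 50.7),
]


def index_terms(labels, min_confidence=75.0):
    """Return label names at or above the cutoff, highest confidence first.

    The cutoff value is an assumption chosen for illustration.
    """
    kept = [(name, conf) for name, conf in labels if conf >= min_confidence]
    # sorted() is stable, so labels with equal confidence keep slide order.
    kept = sorted(kept, key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]


terms = index_terms(LABELS)
print(terms)
```

A lower cutoff indexes more terms at the cost of weaker matches such as Flower or Grass.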
smart cropping & ad overlays
demographic & sentiment analysis
face editing & pixelation
Age Range 38-59
Beard: False 84.3%
Emotion: Happy 86.5%
Eyeglasses: False 99.6%
Eyes Open: True 99.9%
Gender: Male 99.9%
Mouth Open: False 86.2%
Mustache: False 98.4%
Smile: True 95.9%
Sunglasses: False 99.8%
Landmarks: EyeLeft, EyeRight, Nose, MouthLeft, MouthRight, LeftPupil, RightPupil, LeftEyeBrowLeft, LeftEyeBrowRight, LeftEyeBrowUp, …
Audience Analysis
• Touchless data gathering via cameras facing the audience
• Anonymous, high volume demographic and sentiment capture
• Analysis produces usable feedback trends and patterns
AUDIENCE CAMERA
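As a rough illustration of anonymous, high-volume capture, per-face attributes can be reduced to counts so that no identity is kept. This is a minimal sketch: the record fields (`emotion`, `gender`, `smile`), the function name, and the sample frame are hypothetical and only mirror the attribute names shown earlier.

```python
from collections import Counter


def summarize_audience(faces):
    """Aggregate per-face attributes into counts; no identities are stored."""
    total = len(faces)
    smiling = sum(1 for face in faces if face["smile"])
    return {
        "faces": total,
        "emotions": dict(Counter(face["emotion"] for face in faces)),
        "genders": dict(Counter(face["gender"] for face in faces)),
        "smile_rate": smiling / total if total else 0.0,
    }


# Hypothetical detections for a single camera frame.
frame = [
    {"emotion": "Happy", "gender": "Male", "smile": True},
    {"emotion": "Happy", "gender": "Female", "smile": True},
    {"emotion": "Calm", "gender": "Female", "smile": False},
    {"emotion": "Surprised", "gender": "Male", "smile": False},
]

summary = summarize_audience(frame)
print(summary)
```

Summaries like this, collected per frame or per minute, are the kind of input from which trends and patterns could be derived.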
Facial Matching and Recognition
How AI Analyzes Faces
Face Detection
Landmark Feature Extraction
Attributes: estimated age range, gender, and emotion; facial hair, smiling++
Identification/Recognition, Verification/Comparison, Index/Search: face comparison, match, index and search
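The slides do not say how faces are compared or searched. One common approach, assumed here purely for illustration, is to represent each face as a feature vector and rank indexed faces by cosine similarity. The vectors, names, and the 0.9 threshold below are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_index(index, probe, threshold=0.9):
    """Return (name, score) for the best match at or above threshold, else None."""
    best = None
    for name, vector in index.items():
        score = cosine_similarity(vector, probe)
        if score >= threshold and (best is None or score > best[1]):
            best = (name, score)
    return best


# Hypothetical three-dimensional feature vectors; real ones would be far longer.
face_index = {
    "person_a": [0.9, 0.1, 0.3],
    "person_b": [0.1, 0.8, 0.5],
}

match = search_index(face_index, [0.88, 0.12, 0.31])
print(match)
```

Verification (is this the same person?) uses the same similarity against a single stored vector, while search compares against every entry in the index.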
C-SPAN’s Index of
Public Figures
AI and Active Audience Engagement
Advances in chatbot technologies enable:
• Fan exchanges with character bots via social media, mobile and web apps
• Employee conversations with internal support bots for help desk assistance
• Spoken interactions between executives and enterprise information
Conversational Chatbots
User: I’d like to book a flight to London
Bot: Sure! Do you want to fly to Heathrow or Gatwick?
User: Heathrow, please
Destination: LHR
Bot: When would you like to fly?
User: Next Wednesday
Departure: 5/31/2017
Flight Booking
Slots: Origin, Destination, Departure Date

Turn 1: “I’d like to book a flight to London”
• Automatic Speech Recognition → Natural Language Understanding (Utterances, Intent/Slot model)
• Intent: Book Flight (Flight booking); Location: London
• Destination: LHR (London Heathrow); Origin: LAX (User Preferences)
• Prompt: “When would you like to fly?” → Text To Speech

Turn 2: “Next Wednesday”
• Automatic Speech Recognition → Natural Language Understanding (Utterances, Intent/Slot model)
• Departure Date: 05/31/2017
• Slots filled: Origin LAX, Destination LHR, Departure Date 05/31/2017
• Fulfilment, then Confirmation: “Your flight is booked for next Wednesday” → Text To Speech
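The intent/slot flow above can be sketched as a small loop that prompts for the first empty slot and moves to fulfilment once every slot is filled. This is a simplified illustration, not the service's implementation: the slot keys, the `next_action` helper, the origin prompt, and the confirmation wording are assumptions, and speech recognition and language understanding are taken as already done.

```python
SLOT_ORDER = ("Origin", "Destination", "DepartureDate")

PROMPTS = {
    "Origin": "Where are you flying from?",  # hypothetical prompt
    "Destination": "Sure! Do you want to fly to Heathrow or Gatwick?",
    "DepartureDate": "When would you like to fly?",
}


def next_action(slots):
    """Return ("elicit", slot, prompt) for the first empty slot, else ("fulfil", None, message)."""
    for name in SLOT_ORDER:
        if slots.get(name) is None:
            return ("elicit", name, PROMPTS[name])
    message = "Your flight from {Origin} to {Destination} is booked for {DepartureDate}".format(**slots)
    return ("fulfil", None, message)


# Origin is pre-filled, as if taken from stored user preferences.
booking = {"Origin": "LAX", "Destination": None, "DepartureDate": None}

first = next_action(booking)          # asks which London airport
booking["Destination"] = "LHR"        # "Heathrow, please"
second = next_action(booking)         # asks for the date
booking["DepartureDate"] = "05/31/2017"  # "Next Wednesday"
final = next_action(booking)          # all slots filled
print(first, second, final, sep="\n")
```

Each returned prompt or confirmation would then be handed to text to speech, as in the two turns above.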
AI and Lifelike Speech
Advances in text to speech technologies enable:
• Computer-generated natural speech
• Automatic, accurate text processing
• Intelligible and easy to understand
• Semantic additions to text
• Customized pronunciation
Text To Speech Quality
Natural sounding speech
• A subjective measure of how close TTS output is to human speech
Accurate text processing
• Ability of the system to interpret common text formats such as abbreviations, numerical sequences, homographs, etc.
Today in Las Vegas, NV it's 90°F.
"We live for the music", live from the Madison Square Garden.
Highly intelligible
• A measure of how comprehensible speech is
“Peter Piper picked a peck of pickled peppers.”
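To make "accurate text processing" concrete, here is a toy sketch of text normalization for the Las Vegas example: it expands a state abbreviation and a temperature unit before synthesis. The lookup table and the `normalize` function are hypothetical, and homographs such as the two readings of "live" need context-aware handling that this sketch does not attempt.

```python
import re

# Hypothetical abbreviation table; a real system would cover far more cases.
ABBREVIATIONS = {"NV": "Nevada"}


def normalize(text):
    """Expand a few written forms into the words a speaker would say."""
    # "90°F" becomes "90 degrees Fahrenheit".
    text = re.sub(r"(\d+)\s*°F", r"\1 degrees Fahrenheit", text)
    # Expand whole-word abbreviations only.
    for abbreviation, spoken in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(abbreviation)}\b", spoken, text)
    return text


spoken_text = normalize("Today in Las Vegas, NV it's 90°F.")
print(spoken_text)
```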
Improving Text to Speech with SSML
Speech Synthesis Markup Language
• W3C recommendation, an XML-based markup language for speech synthesis applications
<speak>
My name is Kuklinski. It is spelled
<prosody rate='x-slow'>
<say-as interpret-as="characters">Kuklinski</say-as>
</prosody>
</speak>
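Markup like the example above can also be generated programmatically, which keeps it well-formed when the name comes from user data. A minimal sketch using Python's standard XML library follows; the helper name `spell_out_ssml` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def spell_out_ssml(intro, word, rate="x-slow"):
    """Build SSML that speaks `intro`, then spells `word` slowly, letter by letter."""
    speak = ET.Element("speak")
    speak.text = intro + " "
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    say_as = ET.SubElement(prosody, "say-as", {"interpret-as": "characters"})
    say_as.text = word
    # ElementTree escapes any special characters in the text for us.
    return ET.tostring(speak, encoding="unicode")


ssml = spell_out_ssml("My name is Kuklinski. It is spelled", "Kuklinski")
print(ssml)
```

The resulting string can be passed to any speech synthesizer that accepts SSML input.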
Custom Pronunciation with Lexicons
Enables developers to customize the pronunciation of
words or phrases
My daughter’s name is Kaja.
<lexeme>
<grapheme>Kaja</grapheme>
<grapheme>kaja</grapheme>
<grapheme>KAJA</grapheme>
<phoneme>"kaI.@</phoneme>
</lexeme>
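The lexeme above is a fragment; a complete lexicon file wraps it in a root element. The sketch below assumes the W3C Pronunciation Lexicon Specification (PLS) format, that the phoneme string is X-SAMPA (which is what it appears to be), and an `en-US` language tag; the slide itself confirms none of these, and the `build_lexicon` helper is hypothetical.

```python
from xml.sax.saxutils import escape

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(graphemes, phoneme, alphabet="x-sampa", lang="en-US"):
    """Return a PLS document mapping several spellings to one pronunciation."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}" '
        f'alphabet="{alphabet}" xml:lang="{lang}">',
        "  <lexeme>",
    ]
    lines += [f"    <grapheme>{escape(g)}</grapheme>" for g in graphemes]
    lines.append(f"    <phoneme>{escape(phoneme)}</phoneme>")
    lines += ["  </lexeme>", "</lexicon>"]
    return "\n".join(lines)


lexicon_xml = build_lexicon(["Kaja", "kaja", "KAJA"], '"kaI.@')
print(lexicon_xml)
```

Listing each capitalization as its own grapheme, as the slide does, covers the case where matching is case-sensitive.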
Speech Synchronization
Synchronize speech with visual content for more lifelike speech behavior from characters & avatars
• Request an additional stream of TTS metadata containing sentence and word timings
• Use the metadata stream alongside the synthesized speech audio stream to sync audio and visual
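One way to use such a metadata stream is to look up which word is being spoken at the current audio playback position and drive the avatar from that. The sketch below assumes the metadata arrives as one JSON object per line with `time` (milliseconds from the start of the audio), `type`, and `value` fields; that layout and the sample records are assumptions for illustration.

```python
import json

# Hypothetical metadata stream: one JSON object per line, time in milliseconds.
MARKS = """\
{"time": 0, "type": "sentence", "value": "Hello there."}
{"time": 0, "type": "word", "value": "Hello"}
{"time": 420, "type": "word", "value": "there"}
"""


def word_timeline(stream):
    """Parse the stream and keep (start_ms, word) pairs for word records only."""
    marks = [json.loads(line) for line in stream.splitlines() if line.strip()]
    return [(m["time"], m["value"]) for m in marks if m["type"] == "word"]


def word_at(timeline, playback_ms):
    """Return the word being spoken at playback_ms, or None before the first word."""
    current = None
    for start, word in timeline:
        if start > playback_ms:
            break
        current = word
    return current


timeline = word_timeline(MARKS)
print(timeline, word_at(timeline, 500))
```

An animation loop would call `word_at` with the audio player's current position on each frame to update mouth shapes or captions.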
Amazon AI
Intelligent Services Powered By Deep Learning
https://aws.amazon.com/blogs/ai/
https://aws.amazon.com/amazon-ai/
“The future is here,
it’s just not evenly distributed yet”
William Gibson
Thank You!
pearsond@amazon.com