

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

David Pearson, AWS AI Services

May 2017

Recognizing the Impact of AI in Media and Entertainment

• Media Metadata Management
• Audience Engagement
• Lifelike Speech


AI and Media Metadata Management

Advances in computer vision enable:

• Detection of objects, scenes, and concepts in images

• Estimation of age range, gender and emotion in faces

• Recognition of individuals in images and video

Using AI to Extract Metadata from Visual Content

Objects, scenes, facial attributes, and people are extracted from rich media and written to an index.

Example labels detected in a sample image, with confidence scores:

Deer 98.8%
Wildlife 95.1%
Conifer 95.1%
Spruce 95.1%
Wood 78.3%
Tree 63.5%
Forest 63.5%
Vegetation 61.9%
Pine 60.6%
Outdoors 54.0%
Flower 53.9%
Plant 52.9%
Nature 50.7%
Field 50.7%
Grass 50.7%
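A minimal Python sketch of the index step, assuming labels arrive as `Name`/`Confidence` dicts like the `Labels` field of an Amazon Rekognition `DetectLabels` response. `build_label_index`, `search`, the asset IDs, and the 60% cutoff are hypothetical; `clip-001` reuses some of the example labels above and `clip-002` is made up.

```python
from collections import defaultdict


def build_label_index(assets, min_confidence=60.0):
    """Build an inverted index: lowercase label name -> [(asset_id, confidence)].

    `assets` maps a (hypothetical) asset ID to a list of
    {"Name": ..., "Confidence": ...} dicts. Labels below `min_confidence`
    are not indexed.
    """
    index = defaultdict(list)
    for asset_id, labels in assets.items():
        for label in labels:
            if label["Confidence"] >= min_confidence:
                index[label["Name"].lower()].append((asset_id, label["Confidence"]))
    # Most confident matches first in every posting list.
    for postings in index.values():
        postings.sort(key=lambda p: p[1], reverse=True)
    return dict(index)


def search(index, term):
    """Return the asset IDs tagged with `term`, most confident first."""
    return [asset_id for asset_id, _ in index.get(term.lower(), [])]


# Hypothetical assets; clip-001 reuses a few of the example labels.
ASSETS = {
    "clip-001": [
        {"Name": "Deer", "Confidence": 98.8},
        {"Name": "Wildlife", "Confidence": 95.1},
        {"Name": "Tree", "Confidence": 63.5},
        {"Name": "Grass", "Confidence": 50.7},
    ],
    "clip-002": [{"Name": "Tree", "Confidence": 88.0}],
}

if __name__ == "__main__":
    index = build_label_index(ASSETS)
    print(search(index, "Tree"))   # ['clip-002', 'clip-001']
    print(search(index, "grass"))  # [] because 50.7% is below the cutoff
```

Lowercasing names makes searches case-insensitive; the cutoff trades recall for precision, so low-confidence labels such as Grass at 50.7% never reach the index.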

Applications:
• Smart cropping and ad overlays
• Demographic and sentiment analysis
• Face editing and pixelation
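Pixelation needs only a face's bounding box. A minimal sketch, assuming the box is given as fractions of the image size (`Left`, `Top`, `Width`, `Height`, the convention Amazon Rekognition uses for bounding boxes) and that the image is a small grayscale grid stored as a list of rows; `to_pixel_box`, `pixelate`, and the block size are hypothetical.

```python
def to_pixel_box(box, width, height):
    """Convert a relative bounding box (fractions of image size) to pixel bounds.

    Returns (x0, y0, x1, y1) with x1/y1 exclusive, clamped to the image.
    """
    left = int(box["Left"] * width)
    top = int(box["Top"] * height)
    right = min(width, int((box["Left"] + box["Width"]) * width))
    bottom = min(height, int((box["Top"] + box["Height"]) * height))
    return max(0, left), max(0, top), right, bottom


def pixelate(image, pixel_box, block=2):
    """Pixelate a region of a grayscale image (list of rows) in place.

    Each block x block tile inside the box is replaced by its integer average.
    """
    x0, y0, x1, y1 = pixel_box
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            tile = [image[y][x] for y in ys for x in xs]
            avg = sum(tile) // len(tile)
            for y in ys:
                for x in xs:
                    image[y][x] = avg
    return image


if __name__ == "__main__":
    img = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    face_box = {"Left": 0.0, "Top": 0.0, "Width": 0.5, "Height": 0.5}  # hypothetical
    print(pixelate(img, to_pixel_box(face_box, 4, 4)))
```

Relative coordinates keep detection results independent of resolution, so the same box can be applied to a thumbnail or to the full-size frame.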

Example facial attributes for a detected face:

Age Range: 38-59
Beard: False 84.3%
Emotion: Happy 86.5%
Eyeglasses: False 99.6%
Eyes Open: True 99.9%
Gender: Male 99.9%
Mouth Open: False 86.2%
Mustache: False 98.4%
Smile: True 95.9%
Sunglasses: False 99.8%

Landmarks: EyeLeft, EyeRight, Nose, MouthLeft, MouthRight, LeftPupil, RightPupil, LeftEyeBrowLeft, LeftEyeBrowRight, LeftEyeBrowUp, …
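A minimal sketch of turning these attributes into metadata, assuming each detected face is a dict shaped like a `FaceDetails` entry from Amazon Rekognition `DetectFaces` (attribute objects with `Value` and `Confidence`, an `AgeRange` with `Low`/`High`, and a list of `Emotions`). The `summarize_face` helper and the thresholds are hypothetical; the sample values are the ones listed above.

```python
# Sample values from the slide, in the assumed DetectFaces-style layout.
SAMPLE_FACE = {
    "AgeRange": {"Low": 38, "High": 59},
    "Beard": {"Value": False, "Confidence": 84.3},
    "Emotions": [{"Type": "HAPPY", "Confidence": 86.5}],
    "Eyeglasses": {"Value": False, "Confidence": 99.6},
    "EyesOpen": {"Value": True, "Confidence": 99.9},
    "Gender": {"Value": "Male", "Confidence": 99.9},
    "MouthOpen": {"Value": False, "Confidence": 86.2},
    "Mustache": {"Value": False, "Confidence": 98.4},
    "Smile": {"Value": True, "Confidence": 95.9},
    "Sunglasses": {"Value": False, "Confidence": 99.8},
}


def summarize_face(face, min_confidence=80.0):
    """Return only the attribute values whose confidence meets the threshold."""
    summary = {"age_range": (face["AgeRange"]["Low"], face["AgeRange"]["High"])}
    for key, attr in face.items():
        # Attributes shaped {"Value": ..., "Confidence": ...}; AgeRange has no "Value".
        if isinstance(attr, dict) and "Value" in attr and attr["Confidence"] >= min_confidence:
            summary[key] = attr["Value"]
    # Report the single most confident emotion, if it clears the threshold.
    emotions = face.get("Emotions", [])
    if emotions:
        top = max(emotions, key=lambda e: e["Confidence"])
        if top["Confidence"] >= min_confidence:
            summary["emotion"] = top["Type"]
    return summary


if __name__ == "__main__":
    print(summarize_face(SAMPLE_FACE, min_confidence=90.0))
```

Raising the threshold to 90% drops Beard, Mouth Open, and the Happy emotion from this sample, which shows how the cutoff controls how much uncertain metadata reaches downstream systems.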

Audience Analysis

• Touchless data gathering via cameras facing the audience

• Anonymous, high-volume demographic and sentiment capture

• Analysis yields usable feedback in the form of trends and patterns

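The aggregation step can be sketched as follows, assuming each capture is an anonymous record holding only a minute offset, an estimated age range, and the dominant emotion for one face, with no identity stored. The record layout, the 10-year buckets, the sample captures, and all helpers are hypothetical.

```python
from collections import Counter


def age_bucket(low, high, width=10):
    """Map an estimated age range to a fixed-width bucket via its midpoint."""
    start = (low + high) // 2 // width * width
    return f"{start}-{start + width - 1}"


def demographic_counts(captures):
    """Count captured faces per age bucket."""
    return Counter(age_bucket(*c["age_range"]) for c in captures)


def emotion_trend(captures, emotion="HAPPY"):
    """Return {minute: share of faces whose dominant emotion is `emotion`}."""
    totals, hits = Counter(), Counter()
    for c in captures:
        totals[c["minute"]] += 1
        if c["emotion"] == emotion:
            hits[c["minute"]] += 1
    return {minute: hits[minute] / totals[minute] for minute in sorted(totals)}


# Hypothetical anonymous captures from an audience-facing camera.
CAPTURES = [
    {"minute": 0, "age_range": (20, 30), "emotion": "HAPPY"},
    {"minute": 0, "age_range": (38, 59), "emotion": "CALM"},
    {"minute": 1, "age_range": (25, 35), "emotion": "HAPPY"},
    {"minute": 1, "age_range": (40, 50), "emotion": "HAPPY"},
]

if __name__ == "__main__":
    print(demographic_counts(CAPTURES))
    print(emotion_trend(CAPTURES))
```

Aggregating per time window, rather than per person, is what keeps the feedback anonymous while still exposing trends such as rising engagement during a segment.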

Facial Matching and Recognition

How AI Analyzes Faces

Face Detection → Landmark Detection → Feature Extraction → Identification/Recognition

• Attributes: estimated age range, gender, and emotion; facial hair, smiling, and more
• Verification/Comparison: face comparison and matching
• Index/Search: indexing and searching faces

C-SPAN’s Index of Public Figures
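Verification/comparison and index/search both reduce to comparing feature vectors. A minimal sketch using cosine similarity, assuming each face has already been turned into a nonzero numeric feature vector. The toy 3-element vectors, the `person-a`/`person-b` gallery, and the 0.9 threshold are hypothetical; real systems use much longer learned vectors and tuned thresholds, and this is not a description of how C-SPAN's index or any AWS service is implemented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def verify(a, b, threshold=0.9):
    """1:1 verification: do two feature vectors appear to be the same person?"""
    return cosine_similarity(a, b) >= threshold


def identify(probe, gallery, threshold=0.9):
    """1:N search: best-matching gallery name at or above threshold, else None."""
    best_name, best_score = None, threshold
    for name, vector in gallery.items():
        score = cosine_similarity(probe, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical gallery of indexed faces.
GALLERY = {"person-a": [1.0, 0.0, 0.0], "person-b": [0.0, 1.0, 0.0]}

if __name__ == "__main__":
    print(identify([0.9, 0.1, 0.0], GALLERY))  # person-a
    print(identify([0.5, 0.5, 0.7], GALLERY))  # None: no match clears 0.9
```

The linear scan is fine for a sketch; a large index of public figures would use an approximate nearest-neighbor structure instead.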

AI and Active Audience Engagement

Advances in chatbot technologies enable:

• Fan exchanges with character bots via social media, mobile, and web apps

• Employee conversations with internal support bots for help desk assistance

• Spoken interactions between executives and enterprise information systems

Conversational Chatbots

User: I’d like to book a flight to London
Bot: Sure! Do you want to fly to Heathrow or Gatwick?
User: Heathrow, please → Destination: LHR
Bot: When would you like to fly?
User: Next Wednesday → Departure: 5/31/2017

Flight Booking slots: Origin, Destination, Departure Date
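The slot-filling behavior in this dialog can be sketched as a small state object that prompts for the first empty slot. This is a hypothetical illustration, not the Amazon Lex API: the slot names, the Origin and Destination prompt wording, the airport table, and the choice to prefill Origin (LAX) from stored user preferences are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical slot order and prompts for the Flight Booking intent.
SLOT_PROMPTS = {
    "Origin": "Where are you flying from?",
    "Destination": "Where would you like to fly to?",
    "DepartureDate": "When would you like to fly?",
}
# Hypothetical resolution table for the London airports offered in the dialog.
AIRPORT_CODES = {"heathrow": "LHR", "gatwick": "LGW"}


def resolve_airport(answer):
    """Map a free-text answer such as "Heathrow, please" to an airport code."""
    lowered = answer.lower()
    for name, code in AIRPORT_CODES.items():
        if name in lowered:
            return code
    return None


@dataclass
class FlightBooking:
    slots: dict = field(default_factory=dict)

    def fill(self, name, value):
        if name not in SLOT_PROMPTS:
            raise KeyError(f"unknown slot: {name}")
        self.slots[name] = value

    def next_prompt(self):
        """Prompt for the first unfilled slot, or None once the intent can be fulfilled."""
        for name, prompt in SLOT_PROMPTS.items():
            if self.slots.get(name) is None:
                return prompt
        return None


if __name__ == "__main__":
    booking = FlightBooking({"Origin": "LAX"})  # assumed from user preferences
    booking.fill("Destination", resolve_airport("Heathrow, please"))
    print(booking.next_prompt())  # When would you like to fly?
```

Because an unresolved answer stores `None`, the bot keeps asking for that slot instead of booking with a missing value.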

Behind the conversation, each turn passes through the same pipeline:

Turn 1: “I’d like to book a flight to London”
• Automatic Speech Recognition converts the spoken request to text.
• Natural Language Understanding matches the text against the intent/slot model (sample utterances plus slot types such as Location) and selects the Flight booking intent, with London resolved to London Heathrow (LHR).
• Slots so far: Origin LAX (from user preferences), Destination LHR, Departure Date missing.
• A prompt for the missing slot is spoken with Text To Speech: “When would you like to fly?”

Turn 2: “Next Wednesday”
• Automatic Speech Recognition returns the text Next Wednesday.
• Natural Language Understanding resolves it to the Departure Date slot: 05/31/2017.
• With Origin LAX, Destination LHR, and Departure Date 05/31/2017 filled, fulfilment books the flight.
• The confirmation is spoken with Text To Speech: “Your flight is booked for next Wednesday”
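Filling the Departure Date slot requires resolving a relative phrase against the current date. A minimal sketch, assuming "next Wednesday" means the soonest Wednesday strictly after today; the helper `resolve_next_weekday` and the reference date of Thursday, May 25, 2017 are hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_next_weekday(phrase, today):
    """Resolve "next <weekday>" to the first matching date strictly after `today`.

    Assumption: "next Wednesday" means the soonest upcoming Wednesday. Other
    readings (for example, "Wednesday of next week") are equally plausible.
    """
    words = phrase.lower().split()
    if len(words) != 2 or words[0] != "next" or words[1] not in WEEKDAYS:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    # date.weekday(): Monday == 0 ... Sunday == 6; "or 7" skips today itself.
    days_ahead = (WEEKDAYS.index(words[1]) - today.weekday()) % 7 or 7
    return today + timedelta(days=days_ahead)


if __name__ == "__main__":
    # Hypothetical reference date: Thursday, May 25, 2017.
    slot_value = resolve_next_weekday("Next Wednesday", date(2017, 5, 25))
    print(slot_value.strftime("%m/%d/%Y"))  # 05/31/2017
```

The ambiguity of "next" is the real design problem here; a production bot would either follow a documented convention or confirm the date with the user.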

AI and Lifelike Speech

Advances in text-to-speech technologies enable:

• Computer-generated natural speech

• Automatic, accurate text processing

• Highly intelligible, easy-to-understand speech

• Semantic markup added to text (SSML)

• Customized pronunciation

Text To Speech Quality

• Natural-sounding speech: a subjective measure of how close TTS output is to human speech

• Accurate text processing: the ability of the system to interpret common text formats such as abbreviations, numerical sequences, and homographs, for example:

Today in Las Vegas, NV it's 90°F.

"We live for the music", live from the Madison Square Garden.

• Highly intelligible: a measure of how comprehensible the speech is, for example:

“Peter Piper picked a peck of pickled peppers.”
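A toy illustration of the text processing needed for the Las Vegas example, using regular expressions and a hypothetical two-entry lookup table. It is not how any particular TTS service works; homographs such as "live" need sentence context and are not handled here.

```python
import re

# Hypothetical lookup table; a real TTS front end covers far more cases.
STATE_NAMES = {"NV": "Nevada", "NY": "New York"}


def normalize(text):
    """Expand a few non-standard tokens into words a synthesizer can read aloud."""
    # Temperatures: "90°F" -> "90 degrees Fahrenheit", "30°C" -> "30 degrees Celsius".
    text = re.sub(r"(\d+)°F", r"\1 degrees Fahrenheit", text)
    text = re.sub(r"(\d+)°C", r"\1 degrees Celsius", text)
    # Two-letter state codes after a comma, as in "Las Vegas, NV".
    return re.sub(
        r", ([A-Z]{2})\b",
        lambda m: ", " + STATE_NAMES.get(m.group(1), m.group(1)),
        text,
    )


if __name__ == "__main__":
    print(normalize("Today in Las Vegas, NV it's 90°F."))
    # Today in Las Vegas, Nevada it's 90 degrees Fahrenheit.
```

Unknown two-letter codes are left unchanged rather than guessed, which is the safer failure mode for speech output.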

Improving Text to Speech with SSML

Speech Synthesis Markup Language

• W3C recommendation, an XML-based markup language for speech synthesis applications

<speak>
  My name is Kuklinski. It is spelled
  <prosody rate="x-slow">
    <say-as interpret-as="characters">Kuklinski</say-as>
  </prosody>
</speak>
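When SSML is generated from arbitrary text, the text must be XML-escaped so the markup stays well-formed. A minimal sketch that rebuilds the Kuklinski example for any name; `spell_out_ssml` is hypothetical, and the standard library's XML parser is used only to check well-formedness.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def spell_out_ssml(name, rate="x-slow"):
    """Return SSML that says `name`, then spells it out slowly, letter by letter."""
    text = escape(name)  # escape &, < and > so any name keeps the XML well-formed
    return (
        "<speak>"
        f"My name is {text}. It is spelled "
        f"<prosody rate={quoteattr(rate)}>"
        f'<say-as interpret-as="characters">{text}</say-as>'
        "</prosody>"
        "</speak>"
    )


if __name__ == "__main__":
    ssml = spell_out_ssml("Kuklinski")
    ET.fromstring(ssml)  # raises ParseError if the markup were malformed
    print(ssml)
```

With Amazon Polly, SSML input is sent to `synthesize_speech` with `TextType="ssml"`; plain text is the default.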

Custom Pronunciation with Lexicons

Lexicons let developers customize the pronunciation of specific words or phrases, for example:

My daughter’s name is Kaja.

<lexeme>
  <grapheme>Kaja</grapheme>
  <grapheme>kaja</grapheme>
  <grapheme>KAJA</grapheme>
  <phoneme>"kaI.@</phoneme>
</lexeme>
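A lexeme is deployed inside a complete Pronunciation Lexicon Specification (PLS) document. A minimal sketch that wraps the Kaja entry using the standard library; the `build_lexicon` helper is hypothetical, and it assumes the phoneme is written in X-SAMPA (as `"kaI.@` appears to be) for US English.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_lexicon(entries, alphabet="x-sampa", lang="en-US"):
    """Serialize {phoneme: [graphemes]} as a PLS lexicon document."""
    ET.register_namespace("", PLS_NS)  # emit PLS as the default namespace
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, XML_LANG: lang},
    )
    for phoneme, graphemes in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        for spelling in graphemes:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = spelling
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


if __name__ == "__main__":
    print(build_lexicon({'"kaI.@': ["Kaja", "kaja", "KAJA"]}))
```

For Amazon Polly, a lexicon document like this can be stored with `put_lexicon` and applied by listing its name in `LexiconNames` when calling `synthesize_speech`.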

Speech Synchronization

Synchronize speech with visual content for more lifelike speech behavior from characters and avatars:

• Request an additional stream of TTS metadata containing sentence and word timings

• Use the metadata stream alongside the synthesized speech audio stream to synchronize audio and visuals
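Amazon Polly exposes this metadata as "speech marks": newline-delimited JSON objects returned when `synthesize_speech` is called with `OutputFormat="json"` and `SpeechMarkTypes` such as `["sentence", "word"]`. A minimal sketch of the sync step, assuming marks shaped like `{"time": ..., "type": ..., "start": ..., "end": ..., "value": ...}` with `time` in milliseconds from the start of the audio; the sample timings and both helpers are hypothetical.

```python
import bisect
import json

# Hypothetical speech marks for "My name is Kuklinski." (timings are made up).
SAMPLE_MARKS = '''{"time":0,"type":"sentence","start":0,"end":21,"value":"My name is Kuklinski."}
{"time":6,"type":"word","start":0,"end":2,"value":"My"}
{"time":150,"type":"word","start":3,"end":7,"value":"name"}
{"time":400,"type":"word","start":8,"end":10,"value":"is"}
{"time":520,"type":"word","start":11,"end":20,"value":"Kuklinski"}'''


def parse_speech_marks(raw):
    """Parse newline-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def word_at(marks, t_ms):
    """Return the word being spoken at playback time `t_ms`, or None before the first word.

    Only "word" marks are used; they are assumed to be sorted by "time" (ms).
    """
    words = [m for m in marks if m["type"] == "word"]
    times = [m["time"] for m in words]
    i = bisect.bisect_right(times, t_ms) - 1  # last word that started at or before t_ms
    return words[i]["value"] if i >= 0 else None


if __name__ == "__main__":
    marks = parse_speech_marks(SAMPLE_MARKS)
    print(word_at(marks, 200))  # name
```

A player would call `word_at` with its current playback position on each animation frame, for example to highlight the current word or drive an avatar's mouth; in a real loop the word and time lists would be built once rather than on every call.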

Amazon AI

Intelligent Services Powered By Deep Learning

https://aws.amazon.com/blogs/ai/

https://aws.amazon.com/amazon-ai/

“The future is here, it’s just not evenly distributed yet”

William Gibson

Thank You!

pearsond@amazon.com
