The BaBL project: Real-Time Closed-Captioning for WebRTC

Luis Villaseñor Muñoz

[email protected]

30th April 2014

2

BaBL, version 1.0: Project Goal

To develop a proof-of-concept WebRTC conference application that uses the WebRTC data channel to transmit real-time captions.

3

BaBL, version 1.0: Final result

4

BaBL, version 2.0: Project Goal

To develop a WebRTC multiconference application with extra features based on speech recognition, such as real-time closed-captioning, instant translation, and transcription storage.

5

BaBL, version 2.0: Milestones

• Multiconference WebRTC application

• Real-Time Closed-Captioning

• Instant translation

• Transcription storage

6

Multiconference WebRTC application

• WebRTC, what is it?

WebRTC is a free, open project that enables web browsers with Real-Time Communications capabilities.

• Its goal:

To enable rich, high-quality RTC applications to be developed in the browser via simple JavaScript APIs and HTML5.

[1] As stated on WebRTC.org.

7

WebRTC APIs

• MediaStream:

For acquiring audio and video.

• RTCPeerConnection:

For transmitting audio and video.

• RTCDataChannel:

For transmitting data.

8

MediaStream

• navigator.getUserMedia(constraints, successCallback, errorCallback);

[2] Figure by Justin Uberti and Sam Dutton.
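
A minimal sketch of the callback-style call shown on this slide, wrapped in a Promise. The helper name `acquireMedia` and the constraint values are assumptions made for illustration, and the `nav` parameter stands in for the browser's `navigator` object so the function can be exercised outside a browser. Current browsers also offer the promise-based `navigator.mediaDevices.getUserMedia(constraints)`; the legacy form is kept here to match the slide.

```javascript
// Hypothetical helper: wraps the legacy callback-style getUserMedia in a Promise.
// `nav` is expected to look like the browser's `navigator` object.
function acquireMedia(nav, constraints) {
  return new Promise((resolve, reject) => {
    // Chrome exposed the call with a vendor prefix at the time of the talk.
    const getUserMedia = nav.getUserMedia || nav.webkitGetUserMedia;
    if (!getUserMedia) {
      reject(new Error('getUserMedia is not available'));
      return;
    }
    // successCallback receives the MediaStream, errorCallback the failure.
    getUserMedia.call(nav, constraints, resolve, reject);
  });
}

// Example constraints: request both audio and video.
const constraints = { audio: true, video: true };
```

In a page this could be called as `acquireMedia(navigator, constraints)` and the resolved stream attached to a video element.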

9

RTCPeerConnection

• Signaling:

Session description, ICE, STUN, TURN…

• Media engines:

Codecs, echo cancelation, noise reduction, jitter buffering…

• Security:

HTTPS, SRTP, DTLS…

10

WebRTC architecture

[1] Figure from WebRTC.org.

11

Signaling server

• Node.js:

Web server and signaling server.

Fully implemented in JavaScript.

• Socket.IO:

Node.js module that enables WebSocket connections between the clients and the server.
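
The deck does not show the server code, so the following is only a sketch of the bookkeeping a signaling server needs: which clients are in which room. The class name `RoomRegistry` and its methods are hypothetical; a Socket.IO connection handler could call `join` when a client asks to create or join a room and then notify the peers it returns.

```javascript
// Hypothetical in-memory room registry for a signaling server.
class RoomRegistry {
  constructor() {
    this.rooms = new Map(); // room name -> Set of client ids
  }

  // Adds a client to a room (creating it if needed) and returns the ids of
  // the peers that were already there, so they can be told about the newcomer.
  join(room, clientId) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    const members = this.rooms.get(room);
    const existing = [...members].filter((id) => id !== clientId);
    members.add(clientId);
    return existing;
  }

  // Removes a client and drops the room once it is empty.
  leave(room, clientId) {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(clientId);
    if (members.size === 0) this.rooms.delete(room);
  }
}
```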

12

Calling: The establishment

[Figure: call flow between User A, the server, and User B]

1. Each user downloads the webpage (HTTP/HTTPS) and calls getUserMedia.

2. Create room (websocket); Join room (websocket); New user joined (websocket).

3. Each user creates a PeerConnection.

4. createOffer: the offer is relayed through the server (websocket).

5. createAnswer: the answer is relayed through the server (websocket).

6. ICE candidates are exchanged through the server (websocket).

7. Media streams flow between the users (SRTP).
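
The offer/answer exchange in this call flow can be sketched as below. This is not the project's code: the message shape (`type`, `description`, `candidate`) and the function names are assumptions, `pc` is expected to behave like an RTCPeerConnection, and `send` is a hypothetical function that forwards a message over the websocket. The promise-based API forms are used; in 2014 the same calls took callbacks.

```javascript
// Caller side: create an offer and hand it to the signaling channel.
async function startCall(pc, send) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  send({ type: 'offer', description: offer });
}

// Both sides: react to a message relayed by the signaling server.
async function handleSignal(pc, send, message) {
  if (message.type === 'offer') {
    // Callee: store the remote offer, then reply with an answer.
    await pc.setRemoteDescription(message.description);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    send({ type: 'answer', description: answer });
  } else if (message.type === 'answer') {
    // Caller: the answer completes the session description exchange.
    await pc.setRemoteDescription(message.description);
  } else if (message.type === 'candidate') {
    // Either side: add an ICE candidate gathered by the remote peer.
    await pc.addIceCandidate(message.candidate);
  }
}
```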

13

Calling: The mesh

Room A Room B
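
The figure for this slide is not recoverable, so the sketch below assumes that "mesh" means a full mesh, where every participant in a room holds a direct peer connection to every other participant. Under that assumption the connection count grows quadratically with room size; the function names are illustrative.

```javascript
// Total peer connections in a full mesh of n participants: n * (n - 1) / 2.
function meshConnections(n) {
  return (n * (n - 1)) / 2;
}

// Connections each single participant has to maintain.
function connectionsPerPeer(n) {
  return Math.max(n - 1, 0);
}

console.log(meshConnections(4));    // 6
console.log(connectionsPerPeer(4)); // 3
```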

14

Calling: ICE/STUN/TURN

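• Interactive Connectivity Establishment (ICE):

Defined in RFC 5245. Gathers candidate IP addresses and ports for the connection.

• Session Traversal Utilities for NAT (STUN):

A request/response protocol that lets a client discover its public address behind a NAT.

• Traversal Using Relays around NAT (TURN):

An extension of STUN that relays the traffic through a server. Useful but resource-intensive.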
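
As an illustration of how STUN and TURN servers are handed to the browser, here is a sketch that builds the configuration object accepted by `new RTCPeerConnection(config)`. The function name is hypothetical, the STUN entry is Google's public server used only as an example, and the TURN values in the usage note are placeholders, not servers used by the project. Current browsers expect the key `urls`; older Chrome versions used `url`.

```javascript
// Builds an RTCPeerConnection configuration.
// `turn` is optional: { url, username, credential } for a relay server.
function buildIceConfig(turn) {
  // STUN lets the browser learn its public address (example server).
  const iceServers = [{ urls: 'stun:stun.l.google.com:19302' }];
  if (turn) {
    // TURN is the relay fallback when no direct path can be found.
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}
```

For example, `buildIceConfig({ url: 'turn:turn.example.org:3478', username: 'user', credential: 'secret' })` returns a configuration with both entries.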

15

Multiconference accomplished!

16

Real-time closed-captioning

• Web Speech API:

SpeechRecognition interface: For converting speech into text.

• WebRTC data channel:

For sending the text to the other peers.

17

Web Speech API

• Another HTML5 API:

Specification by W3C.

• Only implemented in Chrome:

The voice is sent to Google’s speech recognition web service.

A JSON object with a list of possible matches is returned.

Google uses it for voice search: https://www.google.com/
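
A sketch of how the SpeechRecognition interface delivers text, assuming a helper named `attachCaptioning` (hypothetical) that receives a recognition object and a callback. The properties `continuous`, `interimResults`, `onresult`, `resultIndex`, `transcript`, and `isFinal` come from the Web Speech API; in Chrome the object would be created with `new webkitSpeechRecognition()`.

```javascript
// Configures a SpeechRecognition-like object and reports each caption.
function attachCaptioning(recognition, onCaption) {
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // deliver partial hypotheses too
  recognition.onresult = (event) => {
    // Only the results from resultIndex onward changed in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      // result[0] is the most likely alternative.
      onCaption({ text: result[0].transcript, isFinal: result.isFinal });
    }
  };
  return recognition;
}
```

Calling `start()` on the returned object begins recognition once the user grants microphone access.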

18

RTCDataChannel

• Bidirectional peer to peer:

Really low latency.

• Secure:

Datagram Transport Layer Security.

• Unreliable or reliable:

Unreliable mode favors latency; reliable mode favors accuracy.
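
The reliability choice is made when the channel is created. The sketch below assumes a channel label `'captions'` and helper names that are not from the deck; `createDataChannel`, `ordered`, `maxRetransmits`, `readyState`, and `send` are part of the RTCDataChannel API. Which mode the project used is not stated.

```javascript
// Opens a data channel for captions on an RTCPeerConnection-like object.
function openCaptionChannel(pc, reliable) {
  const options = reliable
    ? { ordered: true }                      // retransmit and deliver in order
    : { ordered: false, maxRetransmits: 0 }; // never retransmit, lowest latency
  return pc.createDataChannel('captions', options);
}

// Sends one caption as JSON; returns false if the channel is not open yet.
function sendCaption(channel, caption) {
  if (channel.readyState !== 'open') return false;
  channel.send(JSON.stringify(caption));
  return true;
}
```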

19

Challenges

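• Subtitles should be switched on/off by the remote user

We relay the remote user’s requests through the signaling server.

• Continuous recognition

We keep a list of users requesting subtitles, and recognition keeps running while anyone is on it.

• Microphone permission

We use HTTPS, so Chrome remembers the microphone permission instead of asking again whenever recognition restarts.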
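
One way to combine the requester list with continuous recognition is sketched below. The class name and its methods are hypothetical, and the restart-on-`onend` strategy is an assumption about how recognition could be kept alive, not a description of the project's code. `start()`, `stop()`, and `onend` belong to the SpeechRecognition interface.

```javascript
// Tracks which remote peers want subtitles and keeps recognition running
// only while at least one of them does.
class SubtitleRequests {
  constructor(recognition) {
    this.recognition = recognition;
    this.requesters = new Set();
    // A recognition session can end on its own; restart it while needed.
    recognition.onend = () => {
      if (this.requesters.size > 0) recognition.start();
    };
  }

  // Called when a remote peer switches subtitles on.
  add(peerId) {
    const wasIdle = this.requesters.size === 0;
    this.requesters.add(peerId);
    if (wasIdle) this.recognition.start();
  }

  // Called when a remote peer switches subtitles off or leaves.
  remove(peerId) {
    const removed = this.requesters.delete(peerId);
    if (removed && this.requesters.size === 0) this.recognition.stop();
  }
}
```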

20

Architecture

[Figure: User A, User B, Google server, and signaling server]

1. Subtitles request

2. Subtitles request

3. Voice

4. Subtitles

5. Subtitles

21

Real-time closed-captioning accomplished!

22

Transcription storage

• Keeping a record of our conversations:

Text is much lighter than audio or video.

And easier to search!

• IndexedDB:

One more HTML5 API.

Local storage on the client side.
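
A sketch of writing one transcript line to IndexedDB. The store name `'transcripts'`, the record fields, and the function name are assumptions; the deck does not describe the schema. `db` is expected to behave like an open IDBDatabase whose object store was created with `autoIncrement: true` during `onupgradeneeded`.

```javascript
// Stores one transcript line and resolves when the transaction completes.
function saveTranscriptLine(db, line) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('transcripts', 'readwrite');
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
    // Hypothetical record layout: who said what, where, and when.
    tx.objectStore('transcripts').add({
      room: line.room,
      speaker: line.speaker,
      text: line.text,
      timestamp: line.timestamp,
    });
  });
}
```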

23

Transcription storage accomplished!

24

Instant translation

• Online translation services:

They are not free.

• Microsoft Translator API:

Free up to 2 million characters/month.

25

Challenges

• Requests should go through the server

My private developer key can’t be on the client side.

• When to request the translation?

Only when the isFinal flag is set. Not so real-time, but much cheaper!
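
The isFinal rule can be sketched as follows. The function `processCaption` and the injected `translate` function are hypothetical: `translate` stands for a request to our own server, which would call the translation service with the private key, so no Microsoft Translator call is shown here.

```javascript
// Requests a translation only for final recognition results.
// Interim results are passed through untranslated to save API characters.
async function processCaption(caption, targetLang, translate) {
  if (!caption.isFinal) {
    return { text: caption.text, translated: null };
  }
  // `translate` asks the server, so the developer key never reaches the browser.
  const translated = await translate(caption.text, targetLang);
  return { text: caption.text, translated };
}
```

Because interim results change several times per sentence, translating only the final one keeps the character count close to the length of what was actually said.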

26

Architecture

[Figure: User A, User B, Google server, signaling server, and Microsoft Translator server]

1. Subtitles request

2. Subtitles request

3. Voice

4. Subtitles

5. Subtitles

6. Subtitles

7. Translated subtitles

8. Translated subtitles

27

“Real-time” translation accomplished!

28

Wait! Last-minute add-on!

29

Spoken translated subtitles

• Speech Synthesis API:

The other interface included in the Web Speech API.

Chrome has some built-in speech engines.
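
A sketch of speaking a translated subtitle with the Speech Synthesis API. The helper name is hypothetical; `synth` stands in for `window.speechSynthesis` and `Utterance` for the `SpeechSynthesisUtterance` constructor, both injected so the function can be exercised outside a browser.

```javascript
// Speaks one subtitle in the given language.
function speakSubtitle(synth, Utterance, text, lang) {
  const utterance = new Utterance(text);
  utterance.lang = lang; // BCP 47 language tag, for example 'es-ES'
  synth.speak(utterance);
  return utterance;
}
```

In a page this could be called as `speakSubtitle(window.speechSynthesis, SpeechSynthesisUtterance, 'Hola', 'es-ES')`.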

30

Spoken translated subtitles accomplished!

31

Conclusion

• Not perfect:

Programmed by just one person.

Using free resources.

These technologies are still under development.

• A little more time, a few more resources:

And Sci-Fi won’t be Sci-Fi anymore!

33

Questions?

34

Acknowledgements

• Don Monte and Nishant Agrawal

• Elias Yousef

• Javier Monte Condeoliva and Miguel Camacho Ruiz

• Tania Arenas de la Rubia

• Carol Davids

35

Thank you