experiments in videoconferencing · 6/14/2005  · why is videoconferencing not ubiquitous...

Experiments in Videoconferencing

Milton Chen, CTO

http://vseelab.com

The VSee Auditorium

desktop interface

15’ x 5’ video wall

VSee

2nd place Stanford-Berkeley Innovator’s award
3rd place Stanford business plan competition

Intel CEO Paul Otellini keynote
Oracle Executive VP Chuck Rozwat keynote

Chuck Rozwat keynote

“the breakthrough that collaboration gurus have been hunting for” - Jack Hirsch, VP of Technology, Shell

“the world’s best videoconferencing system” - Cdr. Eric Rasmussen, Iraq Humanitarian Operations Center, Department of Defense

“uniquely suited for planetwalk” - John Francis, Goodwill Ambassador, United Nations

What if there is no network infrastructure?

Office of Secretary of Defense, State Department, NATO, United Nations …

Strong Angel
Kona, Hawaii
17-22 July 2004

VSee was selected as the real-time communication system

VSee at Strong Angel

Provide global communication from a temporary shelter

VIP presentation between Kona and DC

Ad-hoc peer-to-peer WiFi

[Diagram labels: ~0.5 mile, ~1-10 miles]

Experiment 1: convoy protection

VSee hops from car to car

Can also airdrop arbitrary data

setup
screenshot

Experiment 2: air-to-surface

Experiment 3: ocean search and rescue

The bottom video was from the live underwater camera held by the swimmer. The map with GPS annotation was shared using VSee

setup
screenshot

no pre-existing infrastructure

VSee leverages what you have
– Internet
– Internet2
– Satellite
– WiMax
– Cell phone

VSee ad-hoc peer-to-peer WiFi
– Laptop + wireless card is all you need

Afghanistan

Visual fidelity comparable to high-end hardware
Secure (FIPS 140-2 and triple 256-bit AES)
Never crashes (59-day challenge)
Trivial to use (less than 60 seconds for first-time users)

Kabul, Nov 2004

March 2005
From VSee deployment team

VSee for tsunami relief

UN headquarters in Jakarta

VSee in Darfur for refugee management

CARE International field office
Sudan, Africa

but

Why is videoconferencing not ubiquitous

World’s first videoconferencing system

75 years later
– Technology limitations
– Inadequate visual communication science

April 7, 1927 - Bell Labs
3x2 inch black & white display
1 msec end-to-end latency

VSee
Peer-to-peer wireless

How well can we judge eye contact

“The heart is stirred more slowly by the ear than by the eye.”– Horace

Eye contact stirs us to action

[Sharbat Gula, photographed by McCurry ’83]

Eye contact fires up our brain

[Kampe et al. ’01 Nature]

Eye contact sensitivity is high

Spatial perception task
As good as Snellen acuity

[Gibson and Pick ’63]

[Figure: eye contact (%) vs. angle (deg), axis from -8.5 to 8.5; looker and observer 2 m apart; stdev = 2.8°]

* 6 observers judged 1 looker

Sensitivity is symmetric
[Cline ’67; Kruger and Huckstedt ’69; Anstis, et al. ’69; Stokes ’69; Ellgring ’70]

PicturePhone: camera above display

Hydra: camera below display

Eye contact is difficult

Looking into the camera
Attempting eye contact
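The difficulty can be made concrete with a little geometry. The sketch below estimates the angle between looking into the camera and looking at the remote person's eyes on the display, and compares it with the 2.8° standard deviation reported on the sensitivity slide. The camera offset (10 cm) and viewing distance (60 cm) are hypothetical values chosen for illustration; they do not come from the slides, and the 2.8° figure was measured at 2 m, so the comparison is only indicative.

```python
import math

# Spread of eye contact judgments quoted on the sensitivity slide (degrees).
STDEV_DEG = 2.8


def gaze_offset_deg(camera_offset_cm, viewing_distance_cm):
    """Angle between looking at the camera and looking at the on-screen eyes."""
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))


# Hypothetical desktop setup: camera 10 cm from the displayed eyes,
# viewer sitting 60 cm from the screen.
offset = gaze_offset_deg(camera_offset_cm=10, viewing_distance_cm=60)
print(f"gaze offset: {offset:.1f} deg, {offset / STDEV_DEG:.1f}x the stdev")
```

Under these assumed numbers the offset is several times larger than the quoted standard deviation, which is consistent with the slide's point that a camera mounted above or below the display makes eye contact hard to achieve.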

Solutions to eye contact

Half-silvered mirror [Rosenthal ’47]
MAJIC [Okada, et al. ’94]
ClearBoard [Ishii, et al. ’92]
GazeMaster [Gemmell, et al. ’00]

Methodology

Observers watch videos of looker

Large display with camera at the center

Eye contact?

Sensitivity is asymmetric

* 16 observers judged recorded videos of 1 looker

An anatomical explanation

looking at you
looking sideways
looking up
looking down
eye closing

Illustrations from The Artist’s Guide to Facial Expression [Faigin ’90]

VSee
Eye contact

How well can we judge lip sync

“We shape our tools, and thereafter our tools shape us” - Marshall McLuhan

Why read lips

Improves comprehension
– Background noise [Sumby and Pollack ’54]
– Hearing loss [Binnie, Montgomery, Jackson ’86]

[Yarbus ’67]

Audio ahead of the video

Videoconferencing
– 1 msec to encode audio
– Up to 250 msec to encode MPEG-4

Detectable skew
– 130 msec [Dixon and Spitz ’80]
– 80 msec [Steinmetz ’96]
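A quick worked check of these numbers: the sketch below computes the worst-case audio/video skew from the encode times quoted above and compares it with the two detectability thresholds. It assumes, for illustration only, that network and decode delays are the same for both streams, so encoding is the only source of skew; the function names are hypothetical.

```python
# Encode latencies quoted on the slide (msec).
AUDIO_ENCODE_MS = 1
VIDEO_ENCODE_MS_MAX = 250

# Detectable-skew thresholds quoted on the slide (msec).
THRESHOLDS_MS = {"Dixon and Spitz '80": 130, "Steinmetz '96": 80}


def av_skew_ms(audio_latency_ms, video_latency_ms):
    """Positive result means audio leads video by that many msec."""
    return video_latency_ms - audio_latency_ms


def detectable(skew_ms):
    """Map each threshold to whether the skew magnitude exceeds it."""
    return {name: abs(skew_ms) > limit for name, limit in THRESHOLDS_MS.items()}


worst_case = av_skew_ms(AUDIO_ENCODE_MS, VIDEO_ENCODE_MS_MAX)
print(worst_case)              # 249
print(detectable(worst_case))  # both thresholds exceeded
```

With audio up to 249 msec ahead of video, the skew exceeds both published thresholds, which is why some form of lip synchronization is needed.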

Conventional lip synchronization

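[Diagram: audio (a) and video (v) on a timeline through encode, network, decode. Unsynchronized: a arrives ahead of v. With an audio delay line in a sync stage after decode: a and v play together. Labels: delay, skew]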
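The audio delay line in the conventional approach can be sketched as a simple FIFO that holds each audio frame until the matching video is ready. This is a minimal illustration, not VSee's implementation: the class name, the 20 msec frame size, and the fixed 249 msec skew (250 msec video encode minus 1 msec audio encode) are assumptions.

```python
from collections import deque


class AudioDelayLine:
    """FIFO that releases each audio frame a fixed number of frames late."""

    def __init__(self, skew_ms, frame_ms):
        # Round up so audio is never released ahead of the matching video.
        self.delay_frames = -(-skew_ms // frame_ms)
        self.buffer = deque()

    def push(self, frame):
        """Accept one audio frame; return the frame due for playback, or None."""
        self.buffer.append(frame)
        if len(self.buffer) > self.delay_frames:
            return self.buffer.popleft()
        return None


# Assumed values: 249 msec of skew, 20 msec audio frames.
line = AudioDelayLine(skew_ms=249, frame_ms=20)
played = [line.push(n) for n in range(16)]
print(line.delay_frames)  # 13
print(played)             # 13 Nones, then frames 0, 1, 2
```

The cost of this design is visible in the output: nothing is heard for the first 13 frames, so synchronization is bought with added end-to-end audio delay.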

Attribute delay and skew to remote person

=> person is not believable?
=> person is slow?

[Reeves and Nass ’96]

A new lip sync method

[Diagram: the new method (encode, network, decode, sync) is synchronized with low perceived latency, shown on a timeline against the unsynchronized pipeline and the audio delay line. Label: round trip delay]

Methodology

Recorded 3 speakers
– 44.1 kHz x 16-bit uncompressed audio
– 320x240 at 30 fps uncompressed video
– Sentences consist of easy-to-lipread words

Speaker 1: female native speaker
Speaker 2: male native speaker
Speaker 3: male non-native speaker

Perception of variable AV skew

* 16 subjects judging recorded videos

[Chart: lip synchronization (%), scale 0-100, by initial skew (msec) and stretch period; conditions shown: 200, unsync and 200, new sync]

VSee
Eye contact
Lip sync

What frame rate is necessary

“We express ourselves into existence.” – Iris Murdoch

Minimum required frame rate

Full motion: 10-30 fps
Tolerable: 5 fps [Tang and Isaac ’93]
Lip synchronization: 5 fps [Watson and Sasse ’96]
Content understanding: 5 fps [Ghinea and Thomas ’98]
Sign language recognition: 1 fps [Johnson and Caird ’96]

Gesture Detection Algorithm

input image
frame difference
after erosion

Visualization of algorithm
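The two stages named on the slide, frame difference followed by erosion, can be sketched in a few lines. This is a minimal illustration on grayscale frames stored as nested lists, not the algorithm as implemented in VSee: the threshold of 30, the 3x3 erosion window, and the `min_pixels` cutoff are assumptions made for the example.

```python
def frame_difference(prev, curr, threshold=30):
    """Binary mask: 1 where a pixel changed by more than `threshold`."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def erode(mask):
    """3x3 binary erosion: keep a pixel only if its whole neighborhood is set.

    Border pixels are cleared, since their neighborhood is incomplete.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out


def gesture_detected(prev, curr, threshold=30, min_pixels=1):
    """True if enough changed pixels survive erosion."""
    eroded = erode(frame_difference(prev, curr, threshold))
    return sum(map(sum, eroded)) >= min_pixels


# A 3x3 block of change survives erosion; a single noisy pixel does not.
still = [[0] * 6 for _ in range(6)]
moving = [row[:] for row in still]
for y in range(1, 4):
    for x in range(1, 4):
        moving[y][x] = 200
noise = [row[:] for row in still]
noise[2][2] = 200

print(gesture_detected(still, moving))  # True
print(gesture_detected(still, noise))   # False
```

Erosion is what separates a real gesture from sensor noise here: isolated changed pixels are removed, while a contiguous region of motion leaves at least its interior behind.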

Gesture sensitive transmission allows dynamic discussion

[Chart: speaker changes per minute, scale 0-5, for three conditions: full motion (15 fps), gesture sensitive (~0.2 fps), low update (0.2 fps)]

* 8 groups of 4 people during a discussion
* requires 10% of full motion bandwidth

Other studies

smile recognition time

[Plot: time (msec), scale 0-700, vs. video size (deg of visual angle), scale 0-30]

Importance of f2f interaction

[Chart: share of responses, 0% to 100%, for students, TAs, and faculty; ratings: extremely, very, moderately, somewhat, not]

[Conveying Conversational Cues Through Video, PhD Dissertation, 2003]

When is a smile not a smile

Value of f2f for discussion

Visualizing the pulse of the classroom

VSee
Eye contact
Lip sync
Gesture

Telework

“Laughter is the shortest distance between two people” – Victor Borge

VSee customers

telework => less money and influence

Reasons to telework
Business continuity
Manage by results vs. time…
No commute
Lifestyle

but

no tool is able to bridge the physical distance

VSee Lab experiment

Everybody works from home,
– hotels, cafes, libraries, airports, … since June 2003
– California, Michigan, Scotland, Taiwan, Malaysia

Almost all customer interaction via VSee
Product support via desktop sharing
Product development via application sharing
Availability via presence indicator

Initial results

What doesn’t work
– Still a sense of isolation
  • Company meals and outings are critical!
  • Office of future will be social clubs?
– Remote whiteboard

A surprising bonus
– Uninterrupted time to think
– Building personal relationships

Summary

VSee
Eye contact
Lip sync
Gesture
Telework

I love to hear from you
milton.chen@vseelab.com
