

A Robust Abstraction for First-Person Video Streaming: Techniques, Applications, and Experiments

Neil J. McCurdy, William G. Griswold, Leslie A. Lenert

Department of Computer Science and Engineering

University of California, San Diego


2

Why stream first-person video?

• Remote vision at dangerous job sites
  – Disaster response
  – Hazmat
  – SWAT
• Live streams for remote loved ones
  – My-day live diaries
• Citizen reporting
  – Cell-phone cameras broadcasting newsworthy events
  – Think YouTube, but live
  – No tripods, no expert camera work




8

Challenges of first-person video

• Limited bandwidth "in the wild"
  – Cellular networks (60-80 Kbps)
  – Multiple cameras on 802.11 drops total throughput
• First-person video compression is difficult
  – Low inter-frame overlap reduces compression opportunities
  – Must either reduce frame rate or image quality
  – Low frame-rate video is disorienting: how do the frames relate to one another?
• Aesthetic challenges
  – Blair Witch-type nausea
  – Constant motion is difficult to track
  – Camera operator's interests may not intersect the viewer's interests


9

RealityFlythrough (RFT): A novel solution

What we do
• Reduce frame rate
• Approximately reconstruct camera motion using sensors and image processing

Benefits
• High-quality frames
• Disorientation minimized
• Long dwell time on each frame
• Aesthetically appealing
  – Calm
  – Mesmerizing



11

Roadmap

• Introduction

• Video compression challenges

• How RealityFlythrough works

• Experimental results

• Conclusion


12

Video compression challenges revisited

• High-panning video has little redundancy between frames
  – Most codecs do little better than MJPEG
  – e.g., sizes of different encodings of the 1st clip:
    • mpg4: 364 KB
    • mjpeg: 359 KB
• Of course, with redundancy, mpg4 improves
  – For the 2nd clip:
    • mpg4: 284 KB
    • mjpeg: 386 KB
• Decimating the frame rate to preserve image quality further reduces temporal redundancy, forcing still more decimation of the frame rate
  – Causes confusion and disorientation
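The bitrate arithmetic behind this trade-off can be made concrete. The sketch below is illustrative and not from the paper; the frame sizes are assumed example values.

```python
def frame_budget_kb(bitrate_kbps: float, fps: float) -> float:
    """Per-frame byte budget, in KB, at a given bitrate and frame rate."""
    return (bitrate_kbps / 8) / fps

# On a 60 Kbps cellular link, 1 fps leaves 7.5 KB per frame;
# 5 fps cuts the budget to 1.5 KB, forcing severe quality loss.
```

With little inter-frame redundancy to exploit, every frame must fit in roughly this budget, which is why quality and frame rate trade off so directly.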



15

RFT System Architecture

How RFT Works

[Architecture diagram: each camera runs ImageCapture and SensorCapture; StreamCombine merges them into an H.323 video-conferencing stream (352x288 video resolution), which travels over 802.11 or 1xEVDO cellular (~60 Kbps) to the RFT MCU (Multipoint Control Unit) and on to the RFT Engine on the RFT Server]


16

Simplifying 3d space

• We know the orientation of each frame
• We project the camera's image onto a virtual wall at that same orientation
• When the user's orientation is the same as the camera's, the entire screen is filled with the image
• Results in a 2d simplification of 3d space

How RFT Works
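The virtual-wall idea can be sketched with a pinhole projection: the wall's center sits at the screen center when the viewer's yaw matches the image's yaw, and slides toward the screen edge as the viewer turns away. A toy sketch, not the paper's implementation, assuming yaw-only motion and a symmetric horizontal field of view:

```python
import math

def screen_x(image_yaw: float, viewer_yaw: float, fov: float = 60.0) -> float:
    """Normalized screen x of the virtual wall's center: 0 is the screen
    center, +/-1 are the screen edges for the given horizontal fov."""
    # signed angle from the viewer's forward direction to the wall center
    delta = (image_yaw - viewer_yaw + 180) % 360 - 180
    # pinhole projection onto a plane one unit ahead of the viewer
    return math.tan(math.radians(delta)) / math.tan(math.radians(fov / 2))
```

For example, `screen_x(30, 30)` is 0.0 (the image fills the view), while `screen_x(60, 30)` is 1.0 (the wall's center has reached the right screen edge).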


17

The transition

• A transition between frames is achieved by moving the user's orientation from the point of view of the source frame to the point of view of the destination frame
• The virtual walls are shown in perspective
• Overlapping portions of images are alpha-blended

How RFT Works
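A minimal sketch of such an orientation sweep, assuming yaw-only motion and a linear alpha ramp (the names are illustrative, not the system's API):

```python
def transition_state(t: float, src_yaw: float, dst_yaw: float):
    """State of a transition at progress t in [0, 1]: the viewer's
    interpolated yaw, plus alpha weights for the two overlapping walls."""
    # take the shortest arc between the two orientations
    delta = (dst_yaw - src_yaw + 180) % 360 - 180
    yaw = (src_yaw + t * delta) % 360
    return yaw, 1.0 - t, t  # (viewer_yaw, src_alpha, dst_alpha)
```

At t = 0 only the source wall is visible; at t = 1 only the destination wall is, with the overlap cross-fading in between.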


18

Images are projected inside a sphere

How RFT Works



20

Point matching improves experience

• If frames overlap, point matching allows for more accurate placement
  – Use the SIFT method [Lowe, 2004]; autopano implementation
  – Client device computes the match and transmits meta-data with the frame
• 2d morphing between frames improves the blend
• Works with inter-frame and inter-camera matches

How RFT Works
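A real SIFT pipeline fits a full homography between the matched keypoints; the placement-refinement idea can be illustrated with a simpler translation-only fit over hypothetical match pairs:

```python
def refine_offset(matches):
    """Least-squares translation aligning frame B onto frame A, given
    matched point pairs ((xa, ya), (xb, yb)). For pure translation the
    least-squares fit is just the mean displacement."""
    n = len(matches)
    dx = sum(a[0] - b[0] for a, b in matches) / n
    dy = sum(a[1] - b[1] for a, b in matches) / n
    return dx, dy
```

The estimated offset replaces the coarse sensor-only placement whenever enough matches are found.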


21

Point matching meets sensors

• New point-matched frames join the panorama
• The panorama consists of the 5 most recent frames (older ones are discarded)
• A new panorama is started when a non-point-matched frame arrives; sensor data positions the frame

How RFT Works
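The rolling five-frame panorama described above can be sketched with a bounded queue (a sketch of the bookkeeping only, not the rendering):

```python
from collections import deque

class Panorama:
    """Rolling panorama of the most recent point-matched frames."""
    def __init__(self, max_frames: int = 5):
        self.frames = deque(maxlen=max_frames)  # oldest frames fall off

    def add(self, frame, point_matched: bool) -> None:
        if not point_matched:
            # no overlap with the current panorama: start a new one,
            # positioned by sensor data alone
            self.frames.clear()
        self.frames.append(frame)
```

`deque(maxlen=...)` discards the oldest frame automatically, matching the "5 most recent frames" policy.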


22

Field study

Experimental setup
• Hazmat bulking process
  – Workers wore full hazmat suits
  – Labor-intensive
  – Accurate motion model for the head-mounted camera
• 0.5 fps video transmitted over 1xEVDO
• The hazmat supervisor used the video to explain the bulking process

Results
• Ran for 64 minutes
• Much more camera motion than expected
• The supervisor preferred transitions over other encoding techniques
  – Not because of frame quality
  – Traditional first-person video was too busy ("It interferes with my thinking. Literally, it's messing with my head")
  – 1 fps "video" without transitions was seen as useless

Experimental results




27

Lab study

• Determine whether people may actually prefer transitions to traditional first-person video

Experimental setup
• Three first-person videos encoded in 4 different ways
  – encFast: RFT transitions sampled at 1 fps
  – encSlow: RFT transitions sampled at 0.67 fps
  – encIdeal: Regular video encoded at 11 fps (∞ bitrate)
  – encChoppy: Regular video encoded at 5 fps (same bitrate as encFast/encSlow)
• Subjects did side-by-side comparisons and ranked the encodings in order of preference
• Subjects answered questions to help them arrive at a task-independent ranking

Experimental results


28

Taking out the trash (Experimental results)

[Video stills comparing the encChoppy, encFast, and encIdeal encodings of the same scene]



31

Results and analysis

• 12/14 subjects preferred one of our encodings to encChoppy
• 4/14 subjects preferred our encodings to encIdeal, with 4 more on the fence!
• Our encodings grew on people (4 subjects ranked our encodings higher at the end of the experiment than at the beginning)
• Positives: calm, smooth, slow-motion, sharp, artistic, soft, not-so-dizzy
• Negatives: herky-jerky, artificial, makes me feel detached, insecure

Our encodings gave subjects time to catch up with what the camera operator was seeing. First-person video tends to dart around too much.

Experimental results


35

Conclusion

• First-person video is difficult to compress
• To stream it, we must sacrifice image quality or frame rate
• Very low frame-rate video (< 5 fps) is disorienting
• Video streamed at a low bitrate (e.g., 60 Kbps) loses both frame rate and image quality and can be painful to watch
• Our solution
  – Transmit high-quality, low frame-rate (~1 fps) video along with tilt-sensor meta-data
  – "Reconstruct" intervening frames by inferring camera motion from the meta-data

http://www.realityflythrough.com
[email protected]
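One way to picture the transmitted unit in this solution is a frame bundled with its sensor meta-data. The field names below are illustrative, not the system's actual wire format:

```python
from dataclasses import dataclass, field

@dataclass
class SensedFrame:
    """One ~1 fps frame plus the meta-data used to reconstruct motion."""
    jpeg: bytes            # one high-quality compressed frame
    timestamp: float       # capture time, seconds
    yaw: float             # tilt-sensor orientation, degrees
    pitch: float
    roll: float
    matches: list = field(default_factory=list)  # optional point-match meta-data
```

The receiver interpolates orientation between consecutive `SensedFrame`s to synthesize the intervening camera motion.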



37

Other slides


38

Lab study results


39

Why digital instead of analog?

• RealityFlythrough piggy-backs on the wireless mesh network that first responders deploy on-site
• Varying network conditions can be better managed in the digital domain: frame rates can be throttled and image quality can be degraded
  – Can also guarantee eventual delivery of high-quality data
• Multiple cameras are supported using the same bandwidth-management techniques
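The throttling idea — degrade image quality before frame rate as measured bandwidth varies — can be sketched as follows. The frame sizes are assumed example values, not measurements from the system:

```python
def throttle(measured_kbps: float, frame_kb_options=(30.0, 15.0, 7.5)):
    """Pick the highest-quality frame size (KB, best first) that still
    sustains at least 1 fps on the measured link; if none does, keep the
    lowest quality and let the frame rate drop below 1 fps."""
    budget_kb_per_s = measured_kbps / 8
    for frame_kb in frame_kb_options:
        fps = budget_kb_per_s / frame_kb
        if fps >= 1.0:
            return frame_kb, fps
    return frame_kb_options[-1], budget_kb_per_s / frame_kb_options[-1]
```

On a 60 Kbps link this selects the smallest frame size at 1 fps; on a faster link it upgrades quality first, which matches the digital-domain management argument above.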


40

Related Work

• Panoramic Viewfinder
  – Baudisch, et al.
• Recognizing Panoramas
  – Brown, Lowe
• View Morphing
  – Seitz, Dyer
• Efficient Representations of Video Sequences and their Applications
  – Irani, et al.
• Predictive perceptual compression for real time video communication
  – Komogortsev, Khan