a hardware-friendly bilateral solver for real-time virtual ...amrita/slides/hpg17.pdf · bilateral...

33
A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video Amrita Mazumdar Armin Alaghi Jonathan T. Barron David Gallup Luis Ceze Mark Oskin Steven M. Seitz University of Washington Google University of Washington 1

Upload: others

Post on 30-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality VideoAmrita Mazumdar Armin AlaghiJonathan T. BarronDavid GallupLuis CezeMark OskinSteven M. Seitz

University of Washington

Google

University of Washington

1

Page 2: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

virtual reality video with omnidirectional stereo (ODS)

2

Page 3: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

the Google Jump camera rig can capture ODS video easily

16 GoPros x 4K camera feed 3.6 GB/s raw video

3

Page 4: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

4

the Google Jump camera rig can capture ODS video easily

Anderson et al., SIGGRAPH Asia 2016

Page 5: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

5

the Google Jump camera rig can capture ODS video easily

Anderson et al., SIGGRAPH Asia 2016

Page 6: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

processing video from Google Jump is slow

1 hour of video

10 hours on 1000 cores

6 Anderson et al., SIGGRAPH Asia 2016

Page 7: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

Google Jump pipeline breakdown

sensor download to viewer

pre-processing alignment optical

flowcompositing

7 Anderson et al., SIGGRAPH Asia 2016

Page 8: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

download to viewer

pre-processing alignment optical

flowcompositing

12% 69% 17%2%

the bilateral solver dominates processing

time

8

Google Jump pipeline breakdown

Anderson et al., SIGGRAPH Asia 2016

sensor

Page 9: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

The bilateral solver produces an image that is smooth and accurate.

9

input pair (from two cameras)

blocky flow field upsample into noisy flow field

transform to bilateral grid and

solve

output result: smooth flow field

Anderson et al., SIGGRAPH Asia 2016

Page 10: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

this work: a hardware-friendly

bilateral solver (HFBS)

Page 11: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

The bilateral solver is hard to parallelize

second-order global optimization

global communication prevents aggressive parallelization

high-dimensional, sparse matrices

sparsity results in significant divergence on GPUs

why not a dense grid? too large to store on-chip

11

Page 12: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

Barron Poole 2016 HFBS (our work)

✅ includes color grayscale only

dense matrix too big to fit in memory ✅ dense matrix fits in memory

global communication required ✅ local communication only

iterative bistochastization before solving ✅ partial, non-iterative bistochastization

HFBS is easier to parallelizedetailed

formulation in paper

Page 13: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

HFBS demonstrates imperceptible accuracy loss

task: Ferstl et al., ICCV 2013, data: Middlebury stereo dataset

input image

noisy depth map

Barron Poole 2016

HFBS (this work)13

Page 14: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

algorithm optimizations make it easier to implement bilateral solver in parallel

hardware

14

Page 15: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

15

plan: exploit this parallelism with a custom hardware accelerator

algorithm optimizations make it easier to implement bilateral solver in parallel

hardware

Page 16: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

Mapping HFBS to hardware

16

download to viewer

pre-processing alignment optical

flowcompositingsensor

Page 17: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

17

load video pair

construct bilateral grid per pair

perform hardware-friendly bilateral

solver

slice out solution into output images

CPU FPGA

Mapping HFBS to hardwaredownload to viewer

pre-processing alignment optical

flowcompositingsensor

Page 18: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

microarchitecture

18

CPU

main memory AXI memory interface

HFBS controller

z-axis memory

controller

z-axis memory bank

z-axis memory bank

z-axis memory bank

bilateral filter worker

bilateral filter worker

bilateral filter worker

memory access selector fixed-point

datapath

custom memory layout

Page 19: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

Floating-point resource requirements limit hardware parallelism

float64 32-bit fixed 64-bit fixed 47-bit fixed

DSPs per worker 18 1 16 4

Maximum # workers 379 6840 427 1710

Error (MSE) - 8.3 x 10-4 7.16 x 10-13 6.69 x 10-7

19

Page 20: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

Fixed-point datapath conversion

20

Erro

r (M

SE re

lativ

e to

floa

t64)

1E-12

1E-10

1E-08

1E-06

1E-04

1E-02

Decimal Precision (Fraction of Bitwidth)40% 50% 60% 70% 80% 90%

Max Error

32 64 47Bitwidth

Page 21: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

z-axis slicing for bilateral grid memory layout

21

x:0,y:0,r:255,g:172,b:0 x:0,y:1,r:255,g:172,b:0

.

.

.

.

. x:100,y:100,r:255,g:172,b:0

z = 0

Page 22: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

Evaluation

Page 23: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

23

Evaluation

download to viewer

pre-processing alignment optical

flowcompositing

12% 69% 17%2%

Does HFBS improve runtime? How does parallelization affect power?

sensor

Page 24: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

Experimental SetupCPU: Intel Xeon E5-2620

GPU: NVIDIA GTX 1080 Ti

FPGA: Xilinx Virtex Ultrascale+

Baseline: Barron Poole et al. 2016 (CPU only)

256 iterations of optimization

Varied bilateral grid vertices count ⇒ 4 KB - 1.8 GB grid sizes

24

Page 25: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

HFBS is faster and more scalable than prior work.

25

log

Run

time

(ms)

0.01

1

100

10000

log Bilateral Grid Vertices1,000 100,000 10,000,000

Prior Work (CPU) CPU GPU FPGA

Page 26: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

26

30 FPS and better

log

Run

time

(ms)

0.01

1

100

10000

log Bilateral Grid Vertices1,000 100,000 10,000,000

Prior Work (CPU) CPU GPU FPGA

HFBS is faster and more scalable than prior work.

Page 27: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

HFBS-FPGA is more power-efficient than other platforms

Ops

/ W

att I

mpr

ovem

ent

0

10

20

30

40

Power-efficiency relative to prior work

30.72x

2.12x0.45x1.00x

Prior Work CPU GPU FPGA

27

Page 28: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

building a VR video camera rig with HFBS

Page 29: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

29

this work full system

Page 30: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

 HFBS-FPGA consumes much less power than a GPU for the same task

30

16 GPUs = 4,560 Wfull system16 FPGAs = 400 W

Page 31: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

HFBS makes real-time VR video more feasible with FPGAs

offloaded to cloud

31

on-node with FPGAs

sensor download to viewer

pre-processing alignment optical

flowcompositing

Page 32: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

to conclude

fast, parallel implementation of bilateral solving with little accuracy loss

fixed-point datatypes and a custom bilateral-grid memory layout for improved FPGA performance

hardware-software codesign to reduce latency and improve quality for future VR applications

32

Page 33: A Hardware-Friendly Bilateral Solver for Real-Time Virtual ...amrita/slides/hpg17.pdf · Bilateral Solver for ... Anderson et al., SIGGRAPH Asia 2016. 5 the Google Jump camera rig

parallel algorithm for bilateral solvingFPGA architecture50x faster, 30x more power-efficient

33

A Hardware-Friendly Bilateral Solver for Real-Time Virtual Reality Video