sgm-nets: semi-global matching with neural...
TRANSCRIPT
SGM-Nets: Semi-global matching with neural networksAkihito Seki1 and Marc Pollefeys2,3 1Toshiba Corporation 2ETH Zurich 3Microsoft
1. Introduction
Our contributions:(1) A first learning based penalties estimation for SGM (Sec.3)
(2) New SGM parameterization that separates Pos/Neg disparity changes (Sec.4)
(3) State-of-the-art accuracy (Sec.6)
2. Related workI. Hand tuned penalties. Ex. Smaller penalties on edges = obj. boundaryII. Learning based penalties for MRF.
How to minimize Eq.(1) by SGM ?
C. All costs
(a) Input image
[1] H.Hirschmuller, “Stereo Processing by Semiglobal Matching and Mutual Information”, PAMI 2008.[2] J.Zbontar et al., “Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches”, JMLR2016.
Stereo correspondence (ex. ZNCC, MC-CNN[2])
Stereo images
Matching cost volume
width
heig
ht
SGM(Penalties P1/P2 are needed)
Disparity map
Disparity estimation pipeline
Proposed method
( ) ( ) [ ] [ ]∑ ∑∑
>−+=−+=
∈∈x y
yx
y
yxx
xx
xNND
ddTPddTPdcDE 11,minˆ21
Disparity map Matching cost Penalty for slant surface Penalty for discontinuity
T [] : Kronecker deltaUnary Pairwise
Semi-global matching (SGM)[1] is the most popular regularization method for stereo because of its high accuracy in real-time. However SGM penalties are difficult to be tuned. This work deals with deep neural networks for predicting the penalties.
SGM: P1/P2 control a smoothness / discontinuity of a disparity map.Energy function E for SGM:
(1)
Position x = (u, v)
Dis
parit
y d
(b) Minimum cost path ( )dL ,xr
Accumulated costs(a) 4 paths from all directions r
Position u
Pos
ition
v
0x
Image( ) ( )∑=r
xx dLD rd,minarg 00
Winner-takes-all (WTA) over accumulated costs Lr
0x1x2x3x 1−x 2−x
( ) ( ) ( ){ ,, min, , 100 dLdcdL rr xxx ′+=′ ( ) ,1, 11 PdLr +±′ x ( ) } , min 211PiLrdi
+′±≠
x
Matching cost
(2)
( ) ( )( )∑≠
+−=00
00 ,,,0max 00xx
xx xxgti dd
igtg mdLdLEHinge loss
Flat Slant Border
(3).
0x1x
Position
Dis
parit
y
Accumulation direction
2x3x( )11 xP
( )21 xP
( )22 xP
0
0
0xgtd
14xd
11xd
23xd3
3xd
( )32 xP0
Backward direction
1d
2d
3d
4d
5d 05xd
( ) ( ) ( ) ( ) ( ) ( )223332110032100 ,,,,, xxxxxx xxxxx PdcdcdcdcdL gtgt ++++=
( ) ( ) ( ) ( ) ( ) ( ) ( )2111333241505032100 ,,,,, xxxxxxx xxxxx PPdcdcdcdcdL +++++=
Accumulated cost at pixel of disparity 0xgtd0x
Accumulated cost at pixel of disparity 05xd0x
( ) ( ) ( ) 1,0,1221211
=∂
∂=
∂
∂−=
∂
∂
xxx PE
PE
PE ggg ( ) ( ) 0,, if 00
00 >+− mdLdL igtxx xx
Derivatives of Eq.(3) wrt. Pare needed to train the networks
Derivatives wrt. P
Correctly estimated pixel
<
( )111,xx dL
( ) ( )11211, xx x PdL +
( ) ( )12311, xx x PdL +
)(⋅NBorder case
( ) ( )1211, xx x PdL gt +
0x1x
2d
3d
4d
=1d
1xgtd
0xgtd
Minimum
)(⋅N
4. Signed parameterization
5. SGM-Net architecture
6. Results
Ex.)
( ) ( ) ( )( )∑≠
+⋅−+=00
1121, ,0max
xx
xx x
gti ddgtnb mNPdLE
Margin
A. Path cost B. Neighbor costSparse GT Acceptable UnacceptableInfluenced pixels Beyond neighbor pixels Only neighborpixelsDisparity ambiguity Yes NoInitial parameters Influenced Uninfluenced
∑ ∑∑∑∑
+++=
r flatnf
slantns
bordernbg EEEEE α
( ) ,0or 112
=∂∂
xPEb
( ) 0or 111
−=∂∂
xPEb 0 if >nbEDerivatives wrt. P
nxEB. Neighbor cost
A. Path cost gE
A. Path B. Neighbor
E
+1P
+2P
−1P
−2P
Position
Dis
parit
y
positive
negative
+P
−P
Posit
ion
Disp
arity
Position
−+ > 11 PP
VerticalHorizontal
−+ < 11 PP
Disparity
( ) ( ) [ ] [ ]∑ ∑∑
−=+=+=
∈
−
∈
+
x yy
x
xx
xNND
dTPdTPdcDE 11,minˆ11 δδ
P1/P2 have different penaltiesdepending on either positive ornegative disparity change.Eq.(1) becomes to
[ ] [ ]
−<+>+ ∑∑
∈
−
∈
+
xx yy NNdTPdTP 11 22 δδ
yx ddd −=δSGM-Net is also applicable to the parameterization
Synthetic dataset
Real dataset
(a) GT disparity map
(b) Initial state of SGM-Net (c) SGM-Net (Path cost)
(d) SGM-Net(Neighbor cost)
AB
(e) SGM-Net with all costs
Original image
BA
+1P −
1P
Original imageDisparity map
LargeSmall
• ZNCC / MC-CNN as a matcher• Various settings of SGM.
• MC-CNN as a matcher (67 [sec.] for the matching)• Got 1st rank at the CVPR deadline#
• Standard and Signed SGM-Nets output 8 and 16 values, respectively.
• 3x3 16 filters and 128 dim FCs• Computation takes 0.02 seconds on GPU
#18/10/2016 except anonymous submission
(b) GT disparity map
(c) Winner-tales-all (d) SGM with hand tuned penalties (e) SGM with our SGM-Net
Appropriate SGM penalties are important for accurate disparity maps
Original image Hand tuned SGM
Standard SGM-Net Signed SGM-Net
Percentage of erroneous pixels on non-occluded areas with an error threshold of 3 pixels
KITTI: http://www.cvlibs.net/datasets/kitti/
SceneFlow: https://lmb.informatik.uni-freiburg.de/
KITTI 2012
For semantic segmentation and difficult to apply to SGM.
positive positivenegative negative
(c) Flat/Slant/Border
Penalty map
8.29
3.95 3.60
Set correct path by checking adjacent pixels
Minimize loss over the accumulated path.
From Eq.(2), we formulate
3. SGM-Nets
FC +
ReL
U
Conv
olut
ion
+ Re
LU
Cons
tant
Patch(5x5)
Direction1
Direction2
Direction3
Direction4Normalized patch position
Conc
aten
ate
Conv
olut
ion
+ Re
LU
FC +
ELU
P1+, P1-P2+, P2-
P1+, P1-P2+, P2-
P1+, P1-P2+, P2-
P1+, P1-P2+, P2-
Predict 16 values for each pixel.
SGM-Net : Predict P1/P2 at each pixel with a image
Signed SGM-Net
Path-ambiguity