
3D2PM - 3D Deformable Part Models

Bojan Pepik1   Peter Gehler2   Michael Stark1,3   Bernt Schiele1

1Max Planck Institute for Informatics, Saarbrücken, Germany   2Max Planck Institute for Intelligent Systems, Tübingen, Germany   3Stanford University

3D Deformable Part Models

Motivation
• Objects are inherently 3-dimensional
• 3D object representations provide:
‣ A compact and accurate approximation of the physical world
• Higher-level vision tasks can benefit from expressive object detectors:
‣ Angular accurate viewpoints
‣ 3D parts consistent across views
• State-of-the-art detectors are modeled in 2D
• 3D object detectors lack detection performance

Contributions

➡ A 3D version of the Deformable Part Model [2], capable of:
‣ Richer object hypotheses (beyond a 2D bounding box)
‣ Robust matching to image evidence
➡ Richer object hypotheses:
‣ Viewpoint estimation of arbitrary granularity
‣ Consistent parts across views
➡ Favorable performance:
‣ State-of-the-art viewpoint estimation results
‣ Competitive 2D object localization results
➡ Jointly optimizes for object localization and continuous viewpoint estimation

Experiments

Three-dimensional displacement model

Each part $j$ has a single 3D displacement Gaussian $\mathcal{N}(p_j \mid \mu_j, \Sigma_j)$, which induces one projected distribution per view: $\mathcal{N}(p_j^1 \mid \mu_j^1, \Sigma_j^1),\ \mathcal{N}(p_j^2 \mid \mu_j^2, \Sigma_j^2),\ \dots,\ \mathcal{N}(p_j^V \mid \mu_j^V, \Sigma_j^V)$

• Compact part displacement distribution in 3D
‣ 6 parameters per part (a 3D mean and a diagonal 3D covariance), compared to 4M (4 deformation parameters in each of M components) in the DPM [1,2]

• In 3D: $p(p_j \mid p_o) = \mathcal{N}(p_j \mid \mu_j, \Sigma_j)$
• In-view projected part distribution, obtained with the orthographic projection $Q_j^v$:
$p(p_j^v \mid p_o^v) = \mathcal{N}\!\left(p_j^v \mid Q_j^v \mu_j,\ Q_j^v \Sigma_j (Q_j^v)^T\right)$
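A minimal numpy sketch of this projection step (illustrative only, not the authors' code; the camera parametrization and the azimuth-only rotation are assumptions):

```python
import numpy as np

def project_part_gaussian(mu_j, Sigma_j, azimuth_rad):
    """Project a 3D part Gaussian (mu_j in R^3, Sigma_j 3x3) into the
    image plane of a camera at the given azimuth, using an orthographic
    projection Q (2x3). Returns the 2D mean and 2x2 covariance."""
    c, s = np.cos(azimuth_rad), np.sin(azimuth_rad)
    # Rotate around the vertical (z) axis, then drop the depth
    # coordinate: keep the rotated horizontal axis and the z axis.
    Q = np.array([[c, -s, 0.0],
                  [0.0, 0.0, 1.0]])
    mu_v = Q @ mu_j                  # projected mean: Q mu_j
    Sigma_v = Q @ Sigma_j @ Q.T      # projected covariance: Q Sigma_j Q^T
    return mu_v, Sigma_v

# Example: a part 0.5 units right of the object center with isotropic
# displacement noise, seen from a 30-degree azimuth.
mu_v, Sigma_v = project_part_gaussian(
    np.array([0.5, 0.0, 0.2]), 0.01 * np.eye(3), np.deg2rad(30))
```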

• Structured output SVM with margin rescaling
• Jointly addresses object localization and viewpoint estimation:

$\Delta_{VP}(y, \bar{y}) = \dfrac{\delta(y_v, \bar{y}_v)}{180°}$  (angular viewpoint loss)

$\Delta_{VOC}(y, \bar{y}) = 1 - \dfrac{|y_b \cap \bar{y}_b|}{|y_b \cup \bar{y}_b|}$  (bounding-box overlap loss)

$\Delta(y, \bar{y}) = \alpha\, \Delta_{VOC}(y, \bar{y}) + (1 - \alpha)\, \Delta_{VP}(y, \bar{y})$
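A small Python sketch of this combined margin-rescaling loss (the box and angle representations are assumptions; only the formulas above are from the poster):

```python
def delta_voc(box_a, box_b):
    """1 - intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return 1.0 - inter / union if union > 0 else 1.0

def delta_vp(theta_a, theta_b):
    """Angular viewpoint loss: smallest angle between two azimuths
    (degrees), normalized by 180 so the loss lies in [0, 1]."""
    d = abs(theta_a - theta_b) % 360.0
    return min(d, 360.0 - d) / 180.0

def delta(y, y_bar, alpha=0.5):
    """Joint localization + viewpoint loss; y = (box, azimuth_deg)."""
    return alpha * delta_voc(y[0], y_bar[0]) \
        + (1.0 - alpha) * delta_vp(y[1], y_bar[1])
```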

Fine-grained viewpoint estimation (angular viewpoint estimation)

[Fig. 3: Viewpoint estimation error (angular error) as a function of angular resolution and number of components; left: linear interpolation, right: exponential. The number of components is the number of bins.]

AP / MAE          at 5°        at 10°       at 20°       at 22.5°     at 30°       at 45°
3D2PM-C Lin 8     96.8 / 11.1  95.6 / 12.0  96.1 / 11.7  96.9 / 12.6  97.3 / 13.1  97.8 / 12.6
3D2PM-C Lin 12    98.5 / 7.8   98.6 / 8.0   95.9 / 8.7   97.9 / 8.3   98.3 / 8.5   97.2 / 13.3
3D2PM-C Lin 16    97.8 / 6.9   98.2 / 7.1   96.8 / 8.5   97.5 / 7.4   97.6 / 8.3   95.0 / 13.8
3D2PM-C Lin 18    99.1 / 5.6   99.0 / 5.9   99.3 / 6.3   98.6 / 7.2   98.5 / 8.3   97.0 / 12.9
3D2PM-C Lin 36    99.2 / 4.7   98.8 / 5.1   98.5 / 6.1   98.2 / 7.1   98.0 / 8.0   97.5 / 12.2

Table 4. Fine-grained viewpoint estimation in MAE [26] (EPFL dataset).

Comparing linear vs. exponential appearance interpolation, both models achieve comparable performance at the finer viewpoint resolution levels (5°, 10°, 20°, 22.5°). However, for coarser viewpoint resolutions with wider viewpoint spacing among the bins, exponential interpolation gives worse results than linear interpolation.

Summary. While 3D2PM-C is a simple continuous model, it achieves good performance even when starting from wide angular spacing among the model viewpoint (appearance) bins.

3.4 CAD vs. real image data

We want to explore the performance impact of using CAD data, which have unrealistic appearance but perfect viewpoint annotation. We therefore train models on synthetic data only (synthetic), on real & synthetic data (mixed), and on real data where CAD data is used only for model initialization. We run experiments with 3D2PM-D and 3D2PM-C models with 8, 18, and 36 bins on the EPFL dataset.

Tab. 5 gives the results. Synthetic models with 36 bins achieve very good viewpoint classification performance of 5.9° and 7.9° MAE for 3D2PM-C and 3D2PM-D, respectively, while also achieving good detection results of 96.3% and 95.2% AP. Adding real data (mixed) leads to improved results of 5.8° MAE for 3D2PM-D and 5.1° MAE for 3D2PM-C, while real data alone, at 6.4° and 5.6° MAE, performs worse, speaking in favor of using CAD data with accurate annotation.

3.5 Coarse-to-fine viewpoint inference

As we move towards arbitrarily fine viewpoint estimation with 3D2PM-C, we increase the number of model evaluations for a given position and viewpoint.

[Plot: median angular error as a function of angular resolution and #components]

➡ The 3D2PM model outperforms 3D object models by a large margin
➡ Performance on par with 2D object models

Coarse-grained viewpoint estimation (viewpoint classification)

[Bar chart: median angular error for 8, 16, and 36 viewpoint bins; methods: Glasner et al. [5] (24.8° at 8 bins), 3D2PM-D, 3D2PM-C Lin, 3D2PM-C Exp (best: 4.7°), a reduction of 20.1° relative to [5]]

Object bounding box localization results

[Bar charts: bounding-box AP on Cars and Bicycles. Pascal VOC 2007: Glasner et al. [5], 3D2PM, DPM-3D-Const [1], DPM-VOCVP [1]; 3D2PM improves by +29.2% over [5]. 3D Object Classes: Zia et al. [4], Glasner et al. [5], Liebelt et al. [3], 3D2PM, DPM-VOCVP [1]; improvements of +4.7% and +24.3%]

Ultra-wide baseline matching (quality of part correspondences)

➡ Unlike the state-of-the-art 3D models, 3D2PMs achieve detection performance on par with the 2D object models

[Bar chart: % of correct fundamental (F) matrices at baselines of 45°, 90°, 135°, 180°, and on average (AVG); methods: Zia et al. [4], DPM-3D-Const [1], 3D2PM]

3D pose estimation results

• Deformable part model [2]
‣ Mixture of star CRFs in 2D space
‣ Parts are independent across components
‣ Discrete appearance model
‣ Optimized for 2D bounding box localization

• 3D2PM
‣ A star CRF in 3D space
‣ Parts are in 3D, $p_j = (x_j, y_j, z_j)$, and linked across views
‣ Continuous appearance model
‣ Optimized for object detection and viewpoint estimation

Continuous appearance model

[Figure: root and part appearance templates of the supporting views K-1 and K are interpolated to synthesize root and part appearances at intermediate viewpoints]

• Linear and exponential appearance interpolation schemes (see the sketch below)
• The model can synthesize infinitely many components without the need to learn them all
• Allows arbitrarily fine viewpoint estimation
• Faster inference compared to brute force
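A hedged Python sketch of the two interpolation schemes between the templates of the two supporting views bracketing a query viewpoint (the exact weighting, and the gamma rate in particular, are assumptions; only "linear and exponential interpolation" is from the poster):

```python
import numpy as np

def interpolate_template(w_left, w_right, t, scheme="linear", gamma=5.0):
    """Synthesize an appearance template at fractional position t in [0, 1]
    between supporting views K-1 (w_left) and K (w_right).
    'linear' blends proportionally to angular distance; 'exponential'
    uses weights decaying exponentially with distance to each supporting
    view (gamma is a hypothetical decay rate)."""
    if scheme == "linear":
        a, b = 1.0 - t, t
    else:  # exponential decay with distance to each supporting view
        a, b = np.exp(-gamma * t), np.exp(-gamma * (1.0 - t))
    s = a + b  # normalize so the weights sum to one
    return (a / s) * w_left + (b / s) * w_right
```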

Model training

• Structured output SVM training over M examples: the model scores each example as $\langle \beta, \Phi(I_1, y_1, h_1) \rangle + \dots + \langle \beta, \Phi(I_M, y_M, h_M) \rangle$, with parameters $\beta$, images $I_n$, annotations $y_n$, and part configurations $h_n$

3D part inference

• Part inference in 3D is done per object instance
‣ Across all views of a given object instance in which the part is visible
‣ Projected parts are observed by viewpoint-specific model instantiations
• Discrete 3D part state space (see the sketch below)
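A minimal sketch of such an inference step over a discrete 3D state space (interfaces such as score_fn are hypothetical stand-ins for the viewpoint-specific filter responses, not the authors' code):

```python
import numpy as np

def infer_part_3d(candidates, views, score_fn):
    """candidates: (N, 3) array of discrete 3D part positions.
    views: list of (Q, visible) per annotated view, where Q is a 2x3
    orthographic projection and visible flags part visibility.
    score_fn(view_index, p2d): appearance score of the projected part
    at 2D location p2d in that view."""
    best, best_score = None, -np.inf
    for p3d in candidates:
        total = 0.0
        for i, (Q, visible) in enumerate(views):
            if not visible:
                continue  # skip views where the part is occluded
            total += score_fn(i, Q @ p3d)  # viewpoint-specific observation
        if total > best_score:
            best, best_score = p3d, total
    return best
```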

3D  OBJECT  CLASSES  DATASET  (8  discrete  viewpoint  classes)

[Bar charts: MPPE on Cars and Bicycles. 3D object models: Liebelt et al. [3], Zia et al. [4], Glasner et al. [5], 3D2PM-D; improvements of +10.2% and +20.5%. 2D object models: Payet et al. [6], Lopez et al. [7], DPM-3D-Const. [1], DPM-VOCVP [1]]

EPFL MULTI-VIEW CARS DATASET (angular viewpoint annotations)

➡ State-of-the-art multi-view models can only output a discrete set of viewpoints

➡ Our 3D2PM-C can provide fine viewpoint estimates

➡ 3D2PM-C state-of-the-art on angular viewpoint estimation

Model                           AP / MAE at 5°   #atomic operations
3D2PM-C b36, full inference     99.2 / 4.7       2.20 × 10^10
3D2PM-C b36, coarse-to-fine     99.0 / 7.0       0.48 × 10^10   (×5 fewer)
3D2PM-C b12                     97.6 / 7.5       2.20 × 10^10
3D2PM-C b18                     98.0 / 6.9       2.20 × 10^10

• Coarse-to-fine inference
➡ ×5 faster inference at minimal loss in accuracy (see the sketch below)
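A sketch of the coarse-to-fine idea (the step sizes and the refinement window are assumptions, not the authors' exact scheme): first score a coarse set of viewpoint bins, then synthesize and score fine bins only around the coarse winner instead of evaluating every fine bin (full inference).

```python
def coarse_to_fine_viewpoint(score_at, coarse_step=30.0, fine_step=5.0):
    """score_at(azimuth_deg) -> detection score with the component
    synthesized at that viewpoint (stand-in for a model evaluation)."""
    # Pass 1: evaluate a coarse grid of viewpoint bins over 360 degrees.
    coarse = [a * coarse_step for a in range(int(360 / coarse_step))]
    best_coarse = max(coarse, key=score_at)
    # Pass 2: refine within one coarse bin on each side of the winner.
    lo, hi = best_coarse - coarse_step, best_coarse + coarse_step
    fine = [lo + i * fine_step
            for i in range(int((hi - lo) / fine_step) + 1)]
    return max((a % 360.0 for a in fine), key=score_at)
```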

References

➡ Due to its 3D model, 3D2PM gives accurate part correspondences (+12.2%)
➡ 3D2PM is state-of-the-art on the ultra-wide baseline matching experiment

[1] B. Pepik, M. Stark, P. Gehler, B. Schiele. Teaching 3D Geometry to Deformable Part Models. CVPR'12
[2] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. PAMI'10
[3] J. Liebelt, C. Schmid. Multi-view Object Class Detection with a 3D Geometric Model. CVPR'10
[4] M. Z. Zia, M. Stark, B. Schiele, K. Schindler. Revisiting 3D Geometric Models for Accurate Object Shape and Pose. 3DRR'11
[5] D. Glasner, M. Galun, S. Alpert, R. Basri, G. Shakhnarovich. Viewpoint-Aware Object Detection and Pose Estimation. ICCV'11
[6] N. Payet, S. Todorovic. From Contours to 3D Object Detection and Pose Estimation. ICCV'11
[7] R. J. López-Sastre, T. Tuytelaars, S. Savarese. Deformable Part Models Revisited: A Performance Evaluation for Object Category Pose Estimation. CORP'11

Acknowledgements: This work has been supported by the Max Planck Center for Visual Computing and Communication. We thank M. Zeeshan Zia for his help in conducting the wide baseline matching experiments.

*Note the color-coded part correspondences
