Page 1: Improving Object Localization Using Macrofeature Layout ...cvlab.postech.ac.kr/~bhhan/papers/iccv2011vs_ped.pdf · Improving Object Localization Using Macrofeature Layout Selection

Improving Object Localization Using Macrofeature Layout Selection

Woonhyun Nam, Bohyung Han, Joon Hee Han

Department of Computer Science and Engineering, POSTECH, Republic of Korea

{xgene,bhhan,joonhan}@postech.ac.kr

Abstract

A macrofeature layout selection is proposed for object detection. Macrofeatures [2] are mid-level features that jointly encode a set of low-level features in a neighborhood. Our method employs line, triangle, and pyramid layouts, which are composed of several local blocks in a multi-scale feature pyramid. The method is integrated into boosting for detection, where the best layout is selected for a weak classifier at each iteration. The proposed algorithm is applied to pedestrian detection and compared with several state-of-the-art techniques on public datasets.

1. Introduction

Object detection and recognition deal with similar problems, but they differ in that object detection involves localization (where) as well as identification (what) of a predefined object, while object recognition typically refers to classifying an object with no consideration of localization. To detect an object, a sliding window approach is typically used, in which a detector extracts feature vectors by exhaustively scanning multi-scale windows in a given image and determines whether the object is present in each window.
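The sliding window scan described above can be sketched as follows. This is only an illustration: the 64 × 128 base window, the stride of 8 pixels, and the scale step of 1.2 are assumed values, and `sliding_windows` is a hypothetical helper, not part of the method in this paper.

```python
from typing import Iterator, Tuple


def sliding_windows(img_w: int, img_h: int,
                    win_w: int = 64, win_h: int = 128,
                    stride: int = 8,
                    scale_step: float = 1.2) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) candidate windows in image coordinates.

    All default values are assumptions chosen for illustration.
    """
    scale = 1.0
    # Grow the window until it no longer fits inside the image.
    while win_w * scale <= img_w and win_h * scale <= img_h:
        w = int(round(win_w * scale))
        h = int(round(win_h * scale))
        step = max(1, int(round(stride * scale)))
        # Exhaustively scan every position at this scale.
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield (x, y, w, h)
        scale *= scale_step
```

A detector would extract a feature vector from each yielded window and classify it; the windows that are classified as positive become detections.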

Faces, pedestrians, and vehicles are the most popular objects for detection, and there have been significant efforts to improve detector performance by varying features, classifiers, and datasets. However, research in object detection has focused mainly on identification with a loose localization requirement; the bounding boxes of detected objects are sometimes poorly aligned. Figure 1 illustrates some localization examples among true positives in pedestrian detection. Since localization accuracy in detection is critical for many computer vision applications, it is worthwhile to investigate the object localization issue in a holistic manner. We propose a macrofeature layout selection algorithm in a boosting framework to improve localization performance and apply our technique to pedestrian detection for evaluation.

Figure 1. Localization examples of true positives in pedestrian detection for the INRIA pedestrian dataset. Localization accuracy is measured by the PASCAL VOC [11] overlap ratio. In general, detections are considered correct if the overlap ratio ≥ 0.5. (a) Bad localizations with overlap ratios 0.51 and 0.57. (b) Good localizations with overlap ratios 0.70 and 0.72.

Related work. Object detection has been widely studied in computer vision to identify various objects such as faces [8, 17], pedestrians [1, 4, 5, 7, 8, 9, 14, 15, 16, 18, 19, 21], and others [1, 14]. Recently, pedestrian detection has received much attention and various algorithms have been proposed. Viola and Jones [16] proposed an efficient pedestrian detection framework using a boosted cascade with simple and efficient Haar-like features. As a classifier, AdaBoost [13] was employed to select a number of discriminative features among a huge number of features. Dalal and Triggs [4] proposed the Histograms of Oriented Gradients (HOG) feature and published the INRIA dataset for human detection. The HOG feature has been combined successfully with other types of features in object detection [5, 9, 18, 19]. Dollár et al. [9] introduced the Caltech Pedestrians dataset and benchmarked existing detection algorithms per image, not per window, with several performance metrics. Walk et al. [18] proposed a new feature based on self-similarity of low-level features and combined it with several different features. They also claimed that an iterative retraining procedure with additional hard negative examples is required to improve detector performance.

Finding good features is crucial in many pattern recognition problems. Low-level feature combinations, also called mid-level features, have been widely studied in object detection and recognition. Boureau et al. analyzed several unsupervised learning schemes for mid-level features that jointly encode a set of low-level features in a spatial neighborhood [2]. They proposed a supervised learning method for the features by sparse coding and tested it on object recognition benchmark datasets. A feature mining strategy was proposed in [8], where a pool of informative and complementary features is obtained from the huge feature space and the optimal feature set is selected by AdaBoost. The authors introduced generalized Haar-like features that are similar to the original ones [17] but allow arbitrary configurations and numbers of rectangles. Feature mining [8] was extended to integral channel features [7], which appeared to improve pedestrian detection performance. Multi-scale generalizations of low-level features were introduced in [1], where the multi-scale features outperformed the best single-scale features on object detection.

Our approach. As a feature subset selection strategy, we propose a macrofeature layout selection to improve object localization. The proposed method has the following characteristics:

• Our macrofeature layout selection prioritizes the feature layouts representing lines, triangles, and pyramids. According to our observation, features with local high-order information such as curves and surfaces are more discriminative than features with zero-order information such as points.

• The macrofeature layouts selected in our framework are composed of multiple low-level feature blocks that are located close to each other in a multi-scale feature pyramid.

• The proposed macrofeature layout selection technique is integrated into the pedestrian detection algorithm by boosting. In our experiments, macrofeature layout selection improves the localization performance of pedestrian detection.

The rest of this paper is organized as follows. The boosting framework with feature selection for object detection is described in Section 2. In Section 3, we introduce the macrofeature layouts and discuss how to select the layouts by boosting. In Section 4, we present evaluation results for our macrofeature layout selection strategy and demonstrate the pedestrian detection performance on several challenging datasets.

2. Boosting for object detection

Boosting [13] is a well-known learning method and has been successfully applied to object detection [7, 8, 14, 16, 17, 21]. In this section, we describe a general boosting framework for object detection and present how it is combined with macrofeature layout selection.

The detector is a classifier $y = f(x)$ that estimates a class label $y \in \mathcal{Y} = \{+1, -1\}$ for an observed feature vector $x$ based on training. The training data are a set of pairs of a feature vector $x_i$ and its label $y_i$, $\{(x_i, y_i)\}_{i=1,\dots,N}$, where $N$ is the number of training examples. The overall training procedure by boosting is summarized in Algorithm 1, and more details are described in the rest of this section.

2.1. Boosting algorithm

We train an object detector with the Discrete AdaBoost algorithm [13]. Boosting is an additive learning framework that combines a set of weak classifiers into a strong classifier. The strong classifier $h(\cdot)$ is a weighted sum of weak classifiers $h_t(\cdot)$, given by

$$h(x) = \sum_{t=1}^{T} \alpha_t h_t(x). \tag{1}$$

Weak classifiers are added iteratively, and learning at each iteration focuses more on the data samples that are still misclassified. The weak classifiers are learned to minimize the weighted training error defined by

$$\mathrm{err}_t \triangleq \mathbb{E}_w\left[\mathbf{1}_{y \neq h_t(x)}\right] = w^{\top} \mathbf{1}_{y \neq h_t(x)}, \tag{2}$$

where $\mathbb{E}_w$ denotes the expectation over the sample distribution, $w = (w_1, \dots, w_N)^{\top}$ is a normalized weight vector of samples, and $\mathbf{1}_S$ is the indicator function of the set $S$. The optimal weak classifier with the minimum error is selected and its weight is given by

$$\alpha_t = \ln \frac{1 - \mathrm{err}_t}{\mathrm{err}_t}. \tag{3}$$

At each iteration, the weights of misclassified data samples are increased as

$$w_i \leftarrow w_i \exp\left[\alpha_t \mathbf{1}_{y_i \neq h_t(x_i)}\right], \tag{4}$$

and renormalized so that $\sum_i w_i = 1$.
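One boosting round of Eqs. (2) to (4) can be illustrated with a few lines of Python. This is a minimal sketch; the function name `adaboost_round` and the list-based data layout are assumptions made for the example, and the weak classifier predictions are taken as given.

```python
import math


def adaboost_round(weights, labels, predictions):
    """Apply Eqs. (2)-(4) once; assumes 0 < err < 1.

    weights:      normalized sample weights w_i (sum to 1)
    labels:       true labels y_i in {+1, -1}
    predictions:  weak classifier outputs h_t(x_i) in {+1, -1}
    """
    # Indicator of misclassification for each sample.
    miss = [1 if y != p else 0 for y, p in zip(labels, predictions)]
    # Eq. (2): weighted training error.
    err = sum(w * m for w, m in zip(weights, miss))
    # Eq. (3): weight of the weak classifier.
    alpha = math.log((1.0 - err) / err)
    # Eq. (4): increase the weights of misclassified samples, then renormalize.
    new_w = [w * math.exp(alpha * m) for w, m in zip(weights, miss)]
    total = sum(new_w)
    return err, alpha, [w / total for w in new_w]
```

For example, with five equally weighted samples and a single misclassification, the error is 0.2, the classifier weight is ln 4, and after renormalization the misclassified sample carries half of the total weight.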

2.2. Feature selection

In many pattern recognition problems including classification, it is desirable to select features that are more relevant to the given problem, for accuracy and efficiency. In object detection, feature selection is very important for robustness to partial occlusions, illumination changes, viewpoint changes, and pose variations.

In boosting, feature selection is performed when weak classifiers are learned. Let $\Phi(\cdot)$ be a mapping function for feature selection. The weak classifiers in Eq. (1) are combined with the feature selection, and the strong classifier is given by

$$h(x) = \sum_{t=1}^{T} \alpha_t h_t(\Phi_t(x)), \tag{5}$$

where $h_t$ is the selected weak classifier and $\Phi_t(\cdot)$ is the corresponding mapping function.

Algorithm 1 Training by Discrete AdaBoost [13]

Input: $\{(x_i, y_i)\}_{i=1,\dots,N}$ and $\{\Phi_j(\cdot)\}_{j=1,\dots,M}$

Initialization: $w_i \leftarrow 1/N$

for $t = 1, \dots, T$ do

  for $j = 1, \dots, M$ do

    $h_j = \arg\min_{h_j} \mathrm{err}[h_j]$, where $\mathrm{err}[h_j] \triangleq \sum_{i \mid y_i \neq h_j(\Phi_j(x_i))} w_i$.

  end for

  $\hat{j} = \arg\min_j \mathrm{err}[h_j]$.

  $h_t \leftarrow h_{\hat{j}}$

  $\Phi_t(\cdot) \leftarrow \Phi_{\hat{j}}(\cdot)$

  $\alpha_t = \ln \frac{1 - \mathrm{err}_t}{\mathrm{err}_t}$, where $\mathrm{err}_t \triangleq \mathrm{err}[h_t]$.

  $w_i \leftarrow w_i \exp[\alpha_t]$ if $y_i \neq h_t(\Phi_t(x_i))$.

  Normalize $w_i$ so that $\sum_i w_i = 1$.

end for

Output: $h(x) = \sum_{t=1}^{T} \alpha_t h_t(\Phi_t(x))$.
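The training loop of Algorithm 1 with feature selection, Eq. (5), can be sketched as below. This is a simplified illustration under stated assumptions: each candidate mapping is a hypothetical function returning a single scalar, and the weak learner is a threshold stump (`fit_stump`) used as a stand-in for the WFLD weak classifier described in Section 2.3.

```python
import math


def fit_stump(values, labels, weights):
    """Stand-in weak learner: best threshold and polarity on scalar values."""
    best = (float("inf"), 0.0, 1)  # (weighted error, threshold, polarity)
    for thr in sorted(set(values)):
        for pol in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if (pol if v >= thr else -pol) != y)
            if err < best[0]:
                best = (err, thr, pol)
    return best


def train(samples, labels, mappings, rounds):
    """Discrete AdaBoost that selects one candidate mapping per round."""
    n = len(samples)
    weights = [1.0 / n] * n
    model = []  # entries: (alpha_t, index of Phi_t, threshold, polarity)
    for _ in range(rounds):
        # Inner loop: fit one weak classifier per candidate mapping Phi_j.
        fits = [(fit_stump([phi(x) for x in samples], labels, weights), j)
                for j, phi in enumerate(mappings)]
        # Keep the candidate with the lowest weighted error.
        (err, thr, pol), j = min(fits, key=lambda f: f[0][0])
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = math.log((1.0 - err) / err)
        phi = mappings[j]
        # Increase the weights of misclassified samples and renormalize.
        weights = [w * math.exp(alpha) if (pol if phi(x) >= thr else -pol) != y else w
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        model.append((alpha, j, thr, pol))
    return model


def predict(model, mappings, x):
    """Sign of the strong classifier in Eq. (5)."""
    score = sum(alpha * (pol if mappings[j](x) >= thr else -pol)
                for alpha, j, thr, pol in model)
    return 1 if score >= 0 else -1
```

The clamping of the error is an implementation guard added for the sketch; it is not part of Algorithm 1.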

2.3. Weak classifier

A weak classifier is a binary classifier that has slightly better accuracy than random guessing. To learn weak classifiers with the selected features, we employ the Weighted Fisher Linear Discriminant (WFLD) [14], which is a variant of the Fisher Linear Discriminant (FLD) [12]. The WFLD can handle data samples whose weights change at each iteration of the boosting procedure. Note that applying FLD to weak classifier learning requires resampling, so WFLD is more convenient than FLD with resampling. Classification trees [3], which divide the feature space along axis-aligned directions and assign labels to the partitions, are widely used as weak classifiers. However, FLD and WFLD can find a better linear partition than an axis-aligned one. The WFLD is solved in a similar way to FLD, but weighted means and covariance matrices are used. The optimal linear partition simultaneously maximizes the weighted between-class scatter and minimizes the weighted within-class scatter.
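To make the idea of a weighted Fisher discriminant concrete, the following sketch computes one for two-dimensional points in plain Python. It is not the implementation of [14]: the restriction to two dimensions, the small ridge term, the midpoint threshold, and the names `wfld_2d` and `classify` are all assumptions made for illustration.

```python
def wfld_2d(points, labels, weights, ridge=1e-6):
    """Weighted Fisher discriminant for 2-D points.

    Returns (direction, threshold), where direction = Sw^{-1} (m_pos - m_neg)
    and Sw is the weighted within-class scatter matrix.
    """
    def weighted_mean(cls):
        total = sum(w for y, w in zip(labels, weights) if y == cls)
        mx = sum(w * p[0] for p, y, w in zip(points, labels, weights) if y == cls) / total
        my = sum(w * p[1] for p, y, w in zip(points, labels, weights) if y == cls) / total
        return mx, my

    m_pos, m_neg = weighted_mean(1), weighted_mean(-1)

    # Weighted within-class scatter (symmetric 2x2 matrix).
    sxx = sxy = syy = 0.0
    for p, y, w in zip(points, labels, weights):
        m = m_pos if y == 1 else m_neg
        dx, dy = p[0] - m[0], p[1] - m[1]
        sxx += w * dx * dx
        sxy += w * dx * dy
        syy += w * dy * dy
    sxx += ridge  # assumed regularization to keep the matrix invertible
    syy += ridge

    # Solve Sw * direction = (m_pos - m_neg) with the closed-form 2x2 inverse.
    det = sxx * syy - sxy * sxy
    dmx, dmy = m_pos[0] - m_neg[0], m_pos[1] - m_neg[1]
    a = (syy * dmx - sxy * dmy) / det
    b = (-sxy * dmx + sxx * dmy) / det

    # Assumed decision threshold: midpoint of the projected class means.
    proj_pos = a * m_pos[0] + b * m_pos[1]
    proj_neg = a * m_neg[0] + b * m_neg[1]
    return (a, b), 0.5 * (proj_pos + proj_neg)


def classify(direction, threshold, point):
    """Binary decision of the linear weak classifier."""
    a, b = direction
    return 1 if a * point[0] + b * point[1] >= threshold else -1
```

Because the sample weights enter the means and the scatter directly, the boosting weights can be used as they are, without resampling the training set.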

3. Macrofeature layout selection

We propose a macrofeature layout selection to improve object localization in a general boosting framework for object detection. Macrofeatures [2] are mid-level features that jointly encode a set of low-level features in a neighborhood.

Table 1. Types of local macrofeature layouts used in our method.

| Type | Type 1 | Type 2 | Scale-variant |
|---|---|---|---|
| Layout | A1 A2 A3 (one row) | A1 above A2 A3 | A1, A2, A3 (one block per row) |
| Shape | Line | Triangle | Pyramid |

3.1. Macrofeature layouts

We employ line, triangle, and pyramid layouts in a multi-scale feature pyramid to model high-order structural information. The layouts in our framework are composed of low-level feature blocks that are located close to each other in the feature pyramid; we call such layouts local layouts.

Table 1 illustrates the types of local macrofeature layouts used in our method. The layouts are designed to capture high-order information of local structures across location or scale. All layouts consist of three blocks and capture structural information up to the second order. The Type 1 and Scale-variant layouts capture local structure information along one direction, while the Type 2 layout captures local structure information over the surface defined by the triangle layout.

3.2. Selection algorithm

The macrofeature layout selection is integrated into a general boosting framework for object detection. A macrofeature vector is a concatenation of the low-level feature vectors that come from the blocks of a macrofeature layout. The procedure that obtains a macrofeature vector for a macrofeature layout is represented by the mapping function $\Phi(\cdot)$ in Sec. 2.2. Denote the macrofeature layout candidates for selection by $\{\Phi_j(\cdot)\}_{j=1,\dots,M}$, where $M$ is the number of candidates. Each layout candidate is used to learn a weak classifier by minimizing the weighted training error in Eq. (2). The best layout, which has the lowest error, is selected at each iteration of boosting. The overall procedure is the same as Algorithm 1.
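The mapping from a layout to a macrofeature vector is a plain concatenation, which can be sketched as follows. The indexing of the feature pyramid by `(block_size, y, x)` keys and the helper names `line_layout` and `scale_variant_layout` are hypothetical choices for this example; the text does not specify the data structure.

```python
def macrofeature(pyramid, layout):
    """Concatenate the low-level feature vectors of the blocks in a layout.

    pyramid: dict mapping (block_size, y, x) -> list of floats (block descriptor)
    layout:  sequence of (block_size, y, x) keys, one per block
    """
    vector = []
    for key in layout:
        vector.extend(pyramid[key])
    return vector


def line_layout(size, y, x, step):
    """Three blocks of one scale along a horizontal line (a Type 1 example)."""
    return [(size, y, x - step), (size, y, x), (size, y, x + step)]


def scale_variant_layout(y, x, sizes=(8, 16, 32)):
    """Three blocks of different scales at a single location."""
    return [(size, y, x) for size in sizes]
```

Each candidate $\Phi_j$ then corresponds to one fixed layout, and evaluating it on a window amounts to calling `macrofeature` with that layout.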

3.3. Local layout candidates

We design a number of macrofeature layout candidates and manually search for candidates that improve object localization accuracy. The layout candidates are constructed by varying the number of blocks, the step size between two blocks, and the orientation of the layouts.

Table 2 illustrates the final macrofeature layout candidates for selection after the search. All layouts consist of three low-level feature blocks. Layouts based on one or two blocks are discarded because they were never selected when layouts based on three blocks existed among the candidates. The selected Type 1 and Type 2 layouts represent line segments of four directions and triangles of eight different orientations, respectively. The Scale-variant layout represents a pyramid in scale. The designed layout candidates are used for the macrofeature layout selection.

Table 2. Final candidates of local macrofeature layouts for selection; each layout consists of three low-level feature blocks with overlaps.

| Type | Layouts (diagrams not reproduced) |
|---|---|
| Type 1 | Line segments of four directions |
| Type 2 | Triangles of eight different orientations |
| Scale-variant | Pyramid in scale |

4. Experiment

Our macrofeature layout selection algorithm was evaluated in pedestrian detection with several challenging datasets: INRIA [4], ETH [10], TUD-Brussels [20], and Caltech [9]. We first investigated several issues in designing macrofeature layouts: the number of feature blocks, their spatial layouts, the scales of the blocks, and local layouts in either the spatial or the scale domain. Second, the proposed algorithm was compared with other layout variations to analyze their characteristics in pedestrian detection. Third, we compared the three different types of local layouts that were selected in our boosting classifier for pedestrian detection. Finally, pedestrian detection by our local macrofeature layout selection was compared with other algorithms.

In our implementation, the INRIA training set with HOG features was used for training. Our feature is a modified version of HOG [4], where three different scales of HOG are computed and three local HOG blocks are used to generate macrofeature vectors as in Tables 1 and 2.

4.1. Issues on designing macrofeature layouts

We evaluated local layouts to investigate several issues in designing the layouts. Local layouts of Type 1+2 and Scale-variant layouts were compared with the single block layout and random layouts under the per-window protocol.

The number of blocks. Type 1+2 layouts based on three blocks were compared with the single block layout in the spatial domain only. Every block has a single scale of 16 × 16. Fig. 2(a) illustrates that the Type 1+2 layouts perform better than the single block layout. We also tested different numbers of blocks for the local layouts and concluded that layouts with three blocks are better than those with one or two blocks. In fact, local layouts based on one or two blocks were never selected in our boosting classifier when local layouts based on three blocks existed among the candidates for layout selection.

Figure 2. ROC curves (true positives versus false positives) of comparative evaluations on (a) spatial layouts in a single scale: single block layout, random layouts of three blocks, and local layouts of Type 1+2; the local layouts were the best; (b) Scale-variant layouts in multiple scales; the scales {8 × 8, 16 × 16, 32 × 32} were the best.

Spatial layouts of blocks. Type 1+2 layouts were compared with random layouts in the spatial domain only. Both kinds of layouts consist of three blocks, each with a single scale of 16 × 16. The number of random layout candidates sampled at each iteration of boosting is the same as the number of our local layout candidates. Fig. 2(a) illustrates that the Type 1+2 layouts perform better than the random layouts.

Scales of blocks. Scale-variant layouts were compared while varying the sizes of the blocks. The layouts consist of three blocks of different scales among {8 × 8, 12 × 12, 16 × 16, . . . , 32 × 32}. According to our experiment, the combination of three scales {8 × 8, 16 × 16, 32 × 32} was the best, as shown in Fig. 2(b).

Local layouts in either spatial or scale domain. Scale-variant layouts in the scale domain were compared with Type 1+2 layouts in the spatial domain. Comparing Fig. 2(a) and 2(b), the Scale-variant layouts typically outperform all spatial layouts. This suggests that multi-scale features at a single location are better than multi-location features at a single scale. The same phenomenon was observed in the following evaluations.

4.2. Local layouts in pedestrian detection

Our local macrofeature layout selection was evaluated in pedestrian detection. The proposed method with all Type 1+2 and Scale-variant layouts in Table 2 was compared with other options such as the single block layout and random layouts. For the INRIA and ETH datasets, the algorithms were evaluated by the per-image protocol, where the PASCAL VOC [11] overlap criterion varies from 0.50 to 0.75 and miss rates were measured at one false positive per image (fppi = 1). The overlap criterion is a threshold on the overlap ratio between ground truth and detection bounding boxes.

Table 3. Per-image evaluations for pedestrian localization with layout variations. Miss rate (%) for each overlap criterion was measured at fppi = 1. The value is the minimum miss rate for each overlap criterion. (*) the proposed method with local layouts, (s) single block layout, (r) random layouts.

| Dataset | Algo. | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 |
|---|---|---|---|---|---|---|---|
| INRIA | (*) | 13.6 | 14.9 | 16.1 | 17.5 | 20.4 | 26.7 |
| INRIA | (s) | 17.5 | 18.3 | 19.4 | 20.7 | 25.6 | 31.7 |
| INRIA | (r) | 13.2 | 14.9 | 15.6 | 17.8 | 21.9 | 28.0 |
| ETH | (*) | 28.1 | 29.7 | 31.9 | 35.1 | 42.2 | 59.6 |
| ETH | (s) | 33.8 | 35.2 | 38.0 | 42.0 | 48.8 | 64.0 |
| ETH | (r) | 28.7 | 30.4 | 32.7 | 36.6 | 45.2 | 61.9 |
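The overlap ratio used by this protocol is the area of intersection divided by the area of union of the two boxes. A minimal sketch is given below; it assumes boxes in continuous `(x1, y1, x2, y2)` coordinates, so pixel-inclusive conventions are not modelled, and the function names are hypothetical.

```python
def overlap_ratio(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_correct(detection, ground_truth, criterion=0.5):
    """A detection counts as correct if its overlap ratio reaches the criterion."""
    return overlap_ratio(detection, ground_truth) >= criterion
```

Raising `criterion` from 0.50 to 0.75 keeps only the better aligned detections, which is why miss rates grow with the overlap criterion.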

Localization accuracy. As shown in Table 3, the proposed method with the local layouts is better than or comparable to the others on the INRIA and ETH datasets. Fig. 3 illustrates the relative differences of miss rates between the single block layout and the proposed method. The relative difference of miss rates is computed by

$$\frac{e_{\mathrm{SB}} - e_{\mathrm{Ours}}}{1 - e_{\mathrm{Ours}}}, \tag{6}$$

where $e_{\mathrm{SB}}$ and $e_{\mathrm{Ours}}$ are the miss rates of the single block layout and the proposed method, respectively. The relative differences tend to increase as the overlap criterion increases (Fig. 3); that is, the proposed method has relatively higher localization accuracy. In our observation, using layouts of multiple blocks in multiple scales improves localization, and there are only marginal differences in detection accuracy between local and random layouts. However, training time depends on the number of feature layouts, and the number of local layouts is significantly smaller than the number of all possible layouts, so local layout selection is more advantageous.
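As a worked example, Eq. (6) can be applied to the miss rates reported in Table 3. The dictionary layout and function names below are assumptions for the sketch; the numbers are the (*) and (s) rows of the table, converted from percent to fractions.

```python
CRITERIA = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75]

# Miss rates (%) from Table 3: proposed method (*) and single block layout (s).
TABLE3 = {
    "INRIA": {"ours": [13.6, 14.9, 16.1, 17.5, 20.4, 26.7],
              "single": [17.5, 18.3, 19.4, 20.7, 25.6, 31.7]},
    "ETH": {"ours": [28.1, 29.7, 31.9, 35.1, 42.2, 59.6],
            "single": [33.8, 35.2, 38.0, 42.0, 48.8, 64.0]},
}


def relative_difference(e_sb, e_ours):
    """Eq. (6); both miss rates are fractions in [0, 1)."""
    return (e_sb - e_ours) / (1.0 - e_ours)


def relative_differences(dataset):
    """Relative difference for every overlap criterion of one dataset."""
    rows = TABLE3[dataset]
    return [relative_difference(s / 100.0, o / 100.0)
            for s, o in zip(rows["single"], rows["ours"])]
```

With these numbers the relative difference rises from about 0.045 at criterion 0.50 to about 0.068 at 0.75 on INRIA, and from about 0.079 to about 0.109 on ETH.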

Detection response. Fig. 4 shows image response maps computed by pedestrian detectors with the proposed method and with the single block layout. The response of the proposed method was more stable than that of the single block layout, since the proposed method reduces noise and improves localization by using local structures.

4.3. Local layouts selected in boosting

We analyzed the local layouts that are selected in training a pedestrian detector by boosting. Fig. 5 illustrates the selection rate of each layout type, where the relative occurrences (Fig. 5(a)) and the weighted relative occurrences (Fig. 5(b)) of each selected type are presented; the weight of a feature layout is the weight of the corresponding weak classifier. The Scale-variant layouts are selected more frequently than Type 1 and Type 2, which is consistent with the results in Fig. 2.

Figure 3. Relative difference of miss rates (vertical axis) for each overlap ratio (horizontal axis) between single block layouts and the proposed method in (a) INRIA, (b) ETH. Greater values of the differences mean better performance of the proposed method.

Figure 4. Image response maps computed by pedestrian detectors with (a) the proposed method, (b) single block layout.

Figure 5. Rates of the selected layout types (Type 1, Type 2, Scale-variant) in boosting. (a) Selection rates. (b) Weighted selection rates.

The selection order in boosting iterations is shown in Fig. 6. In early iterations, the selection rates of Type 1 and Type 2 are high, but after the first several iterations, the selection rate of the Scale-variant layouts becomes dominant. The Type 1 and Type 2 layouts are more discriminative at a coarse level, while the Scale-variant layouts capture fine structural information that differentiates pedestrians from hard non-pedestrian examples.
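The two kinds of selection rate can be computed from the list of weak classifiers chosen during boosting. The sketch below assumes that the weighted rate is the sum of weak classifier weights per layout type divided by the sum over all types; the text does not state the normalization, so this is one plausible reading, and `selection_rates` is a hypothetical helper.

```python
from collections import defaultdict


def selection_rates(selected):
    """Relative and weighted relative occurrences of each layout type.

    selected: list of (layout_type, alpha) pairs, one per boosting iteration,
              where alpha is the weight of the corresponding weak classifier.
    """
    counts = defaultdict(float)
    alpha_sums = defaultdict(float)
    for layout_type, alpha in selected:
        counts[layout_type] += 1.0
        alpha_sums[layout_type] += alpha
    n = sum(counts.values())
    total_alpha = sum(alpha_sums.values())
    rates = {k: v / n for k, v in counts.items()}
    weighted_rates = {k: v / total_alpha for k, v in alpha_sums.items()}
    return rates, weighted_rates
```

Restricting `selected` to the first few iterations gives per-iteration curves of the kind discussed for the selection order.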

Figure 6. Rates of the selected layout types (Type 1, Type 2, Scale-variant) over boosting iterations. (a) Relative rates of selection. (b) Relative weighted rates of selection.

Figure 7. Density of the locations of the local feature layouts selected by boosting for each type. (a) Type 1. (b) Type 2. (c) Scale-variant. (d) All layouts. (e) Corresponding alignment of a pedestrian in the window.

The densities of the selected local feature layouts are shown in Fig. 7. Type 1 and Type 2 are complementary; they have strong responses near the head and the torso, respectively. The Scale-variant layouts capture various structures of the human body such as the head, shoulders, torso, and feet.

4.4. Comparison with other detection algorithms

Our algorithm was evaluated and compared with four state-of-the-art algorithms: MultiFtr+CSS [18], ChnFtrs [7], FPDW [6], and HOG [4]. In the evaluations, the PASCAL VOC [11] overlap criteria 0.50 and 0.75 were used.

Per-image evaluations of pedestrian localization in all datasets, with and without considering the number of pedestrians in each dataset, are shown in Fig. 8 and Fig. 9, respectively. Fig. 8 shows that our algorithm is the best when the number of pedestrians is considered, and Fig. 9 suggests that ours is comparable to other techniques when it is not. In both cases, our algorithm has relatively higher localization accuracy: the difference between the second smallest miss rate and ours at fppi = 1 tends to increase as the overlap criterion increases (Fig. 8), and our ranking improves as the overlap criterion increases (Fig. 9). Per-image evaluations of pedestrian localization on each dataset with the overlap criterion 0.70 are shown in Fig. 10. Our algorithm is the best on the ETH dataset and comparable to other techniques on the INRIA and Caltech datasets. Note that all algorithms except HOG and ours are based on multiple heterogeneous features and that all algorithms trained the classifier on the INRIA dataset except MultiFtr+CSS, which is trained on the TUD-MotionPairs dataset. The ranks of our algorithm out of five are presented for each overlap criterion in Table 4, which suggests that our algorithm has relatively higher localization accuracy.

Table 4. Rank of the proposed algorithm in miss rates compared with four state-of-the-art algorithms at fppi = 1. The value in parentheses is the difference between the smallest miss rate of the others and ours at each overlap criterion; greater values mean better performance. †The overall ranking is computed from the average miss rates weighted by the numbers of pedestrians in the datasets, the same as those in Fig. 8.

| Dataset | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 |
|---|---|---|---|---|---|---|
| Overall† | 1 (3.9) | 1 (4.7) | 1 (4.9) | 1 (4.9) | 1 (4.8) | 1 (3.0) |
| INRIA | 4 (-4.9) | 4 (-4.9) | 4 (-5.1) | 4 (-4.1) | 4 (-3.9) | 3 (-4.3) |
| ETH | 1 (5.7) | 1 (7.1) | 1 (9.4) | 1 (9.4) | 1 (8.9) | 1 (3.5) |
| TUD | 4 (-4.3) | 4 (-3.6) | 3 (-4.8) | 2 (-5.9) | 2 (-7.3) | 2 (-9.4) |
| Caltech | 4 (-5.8) | 4 (-4.2) | 4 (-3.5) | 3 (-0.9) | 2 (-0.3) | 1 (1.0) |
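The difference between the two aggregation schemes (weighted by the number of pedestrians, or simply averaged) and the ranking by miss rate can be sketched as follows. The function names and any numbers passed to them are hypothetical; the per-dataset pedestrian counts are not given in this text.

```python
def weighted_average(miss_rates, pedestrian_counts):
    """Average miss rate weighted by the number of pedestrians per dataset."""
    total = sum(pedestrian_counts[d] for d in miss_rates)
    return sum(miss_rates[d] * pedestrian_counts[d] for d in miss_rates) / total


def simple_average(miss_rates):
    """Unweighted mean of the per-dataset miss rates."""
    return sum(miss_rates.values()) / len(miss_rates)


def rank_of(name, scores):
    """Rank of one algorithm by miss rate; rank 1 is the lowest miss rate."""
    return 1 + sum(1 for other, s in scores.items()
                   if other != name and s < scores[name])
```

A dataset with many pedestrians dominates the weighted average, so an algorithm that does well on the largest dataset can rank first overall while ranking lower under the simple average.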

Overall, our algorithm performs comparably to the other algorithms even though we use a single type of feature (HOG). The local feature layout selection improves the performance of our algorithm by finding relevant features efficiently and capturing structural information effectively.

5. Conclusions

We proposed a macrofeature layout selection to improve object localization in a general boosting framework for object detection. The macrofeature layouts are modeled by lines, triangles, and pyramids with three blocks. The structural information involved in the line and triangle features improves localization, and the “local” selection strategy reduces training time by eliminating many irrelevant candidates. Our macrofeature layout selection was successfully employed for pedestrian detection and showed performance comparable to the state-of-the-art techniques even with only a single type of feature, e.g., HOG.

Acknowledgement

This research was supported in part by the Future IT Laboratory of the LG Electronics Corporation, in part by the IT R&D program of MKE/IITA (2008-F-031-01, Development of Computational Photography Technologies for Image and Video Contents), and in part by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0005749). We thank research engineers in LG Electronics Corporation, particularly Jason Yoon, Seungmin Baek, Kwangho An, Youngkyung Park, and Shounan An.

Figure 8. Per-image evaluations of pedestrian localization in all datasets: INRIA, ETH, TUD-Brussels, and Caltech. The miss rate in each dataset was weighted by the number of pedestrians in the dataset and the weighted averages were obtained. FLS (Feature Layout Selection) represents our results. Note that all algorithms except HOG and ours are based on multiple heterogeneous features while ours is only based on HOG and that all algorithms trained the classifier in INRIA dataset except MultiFtr+CSS, which is trained in TUD-MotionPairs dataset. (a) Results with PASCAL VOC overlap criterion = 0.5. (b) Results with PASCAL VOC overlap criterion = 0.7.

Figure 9. Per-image evaluations of pedestrian localization in all datasets: INRIA, ETH, TUD-Brussels, and Caltech. The miss rates from all datasets were simply averaged with no weights. FLS (Feature Layout Selection) represents our results. Note that all algorithms except HOG and ours are based on multiple heterogeneous features while ours is only based on HOG and that all algorithms trained the classifier in INRIA dataset except MultiFtr+CSS, which is trained in TUD-MotionPairs dataset. (a) Results with PASCAL VOC overlap criterion = 0.5. (b) Results with PASCAL VOC overlap criterion = 0.7.
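The two aggregation schemes described in the captions of Figures 8 and 9 (miss rates weighted by the number of pedestrians per dataset, and miss rates averaged with no weights) can be sketched as follows. The dataset names `A` and `B`, the miss rates, and the pedestrian counts are placeholder values chosen for illustration only; they are not results or statistics from the paper, and the function names are hypothetical.

```python
def weighted_average_miss_rate(miss_rates, pedestrian_counts):
    """Average miss rates, weighting each dataset by its pedestrian count."""
    total = sum(pedestrian_counts[name] for name in miss_rates)
    if total == 0:
        raise ValueError("total pedestrian count must be positive")
    weighted_sum = sum(
        miss_rates[name] * pedestrian_counts[name] for name in miss_rates
    )
    return weighted_sum / total


def simple_average_miss_rate(miss_rates):
    """Average miss rates over datasets with equal weight per dataset."""
    if not miss_rates:
        raise ValueError("at least one dataset is required")
    return sum(miss_rates.values()) / len(miss_rates)


# Placeholder inputs (NOT values from the paper).
example_miss_rates = {"A": 0.2, "B": 0.6}
example_counts = {"A": 300, "B": 100}

weighted = weighted_average_miss_rate(example_miss_rates, example_counts)
unweighted = simple_average_miss_rate(example_miss_rates)
```

With these placeholder inputs the weighted average is about 0.30 and the unweighted average is about 0.40: the weighted scheme is dominated by datasets containing many pedestrians, while the unweighted scheme treats every dataset equally.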

References

[1] S. Bileschi. Object detection at multiple scales improves accuracy. In ICPR, 2008. 1, 2

[2] Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. Learning mid-level features for recognition. In CVPR, 2010. 1, 2, 3

[3] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth and Brooks, 1984. 3

[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005. 1, 4, 6

[5] N. Dalal, B. Triggs, and C. Schmid. Human detection using oriented histograms of flow and appearance. In ECCV, 2006. 1

[6] P. Dollar, S. Belongie, and P. Perona. The fastest pedestrian detector in the west. In BMVC, 2010. 6

[7] P. Dollar, Z. Tu, P. Perona, and S. Belongie. Integral Channel Features. In BMVC, 2009. 1, 2, 6

[8] P. Dollar, Z. Tu, H. Tao, and S. Belongie. Feature mining for image classification. In CVPR, 2007. 1, 2

[9] P. Dollar, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: A benchmark. In CVPR, 2009. 1, 4


Figure 10. Per-image evaluations of pedestrian localization on challenging datasets. PASCAL VOC overlap criterion = 0.7 was used. FLS (Feature Layout Selection) represents our results. Note that all algorithms except HOG and ours are based on multiple heterogeneous features while ours is only based on HOG and that all algorithms trained the classifier in INRIA dataset except MultiFtr+CSS, which is trained in TUD-MotionPairs dataset. (a) INRIA dataset. (b) ETH dataset. (c) TUD-Brussels dataset. (d) Caltech dataset.

[10] A. Ess, B. Leibe, K. Schindler, and L. van Gool. A mobile vision system for robust multi-person tracking. In CVPR, 2008. 4

[11] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) challenge. IJCV, 88(2):303–338, June 2010. 1, 4, 6

[12] R. Fisher. The statistical utilization of multiple measurements. In Annals of Eugenics, volume 8, pages 376–386, 1938. 3

[13] J. Friedman, T. Hastie, and R. Tibshirani. Additive Logistic Regression: a Statistical View of Boosting. The Annals of Statistics, 28(2), 2000. 1, 2, 3

[14] I. Laptev. Improvements of Object Detection Using Boosted Histograms. In BMVC, 2006. 1, 2, 3

[15] O. Tuzel, F. Porikli, and P. Meer. Pedestrian detection via classification on Riemannian manifolds. IEEE Trans. PAMI, 30(10):1713–1727, Oct. 2008. 1

[16] P. Viola, M. Jones, and D. Snow. Detecting pedestrians using patterns of motion and appearance. In ICCV, 2003. 1, 2

[17] P. Viola and M. J. Jones. Robust real-time face detection. IJCV, 57:137–154, 2004. 1, 2

[18] S. Walk, N. Majer, K. Schindler, and B. Schiele. New features and insights for pedestrian detection. In CVPR, 2010. 1, 6

[19] X. Wang, T. X. Han, and S. Yan. An HOG-LBP human detector with partial occlusion handling. In ICCV, 2009. 1

[20] C. Wojek, S. Walk, and B. Schiele. Multi-cue onboard pedestrian detection. In CVPR, 2009. 4

[21] Q. Zhu, M.-C. Yeh, K.-T. Cheng, and S. Avidan. Fast human detection using a cascade of histograms of oriented gradients. In CVPR, 2006. 1, 2