
Technovision-2014: 1st International Conference at SITS, Narhe, Pune on April 5-6, 2014

All copyrights Reserved by Technovision-2014, Department of Electronics and Telecommunication Engineering, Sinhgad Institute of Technology and Science, Narhe, Pune. Published by IJECCE (www.ijecce.org)

International Journal of Electronics Communication and Computer Engineering, Volume 5, Issue (4) July, Technovision-2014, ISSN 2249–071X

Human Body Detection using Histogram of Oriented Gradients and SVM

Kishor B. Bhangale
Department of Electronics & Telecommunication
R.M.D Sinhagad School of Engineering, Pune, India
Email: [email protected]

Prof. R. U. Shekokar
Department of Electronics & Telecommunication
R.M.D Sinhagad School of Engineering, Pune, India
Email: [email protected]

Abstract – Human detection is a challenging task in many fields because humans vary widely in appearance and posture. Detecting humans accurately is the first fundamental step for many computer vision applications such as video surveillance, smart vehicles, intersection traffic analysis and so on. This paper addresses efficient human detection in static images using Histogram of Oriented Gradients (HOG) for local feature extraction and support vector machine (SVM) classifiers. The HOG gives an accurate description of the contour of the human body. Based on HOG and SVM theory, a classifier for humans is obtained. We evaluate the performance of the pedestrian HOG descriptor and the support vector machine on images from the INRIA human database.

Keywords – Human Detection, Histogram of Oriented Gradients, Classification, Support Vector Machine.

I. INTRODUCTION

The detection of humans in images, and especially in videos, is an important problem in computer vision and pattern recognition. A robust solution to this problem would have various applications in autonomous driving systems, video surveillance, image retrieval, robotics, and entertainment. In general, the goal of pedestrian detection is to determine the presence of humans in images and videos and to return information about their position. Human detection is a challenging task in many fields because humans are highly deformable objects whose appearance depends on numerous factors:

• Variability of appearance due to the size, color and texture of the clothes, or due to the accessories (umbrellas, bags, etc.) that pedestrians may carry.
• Irregularity of shape: pedestrians may have different heights and weights.
• Variability of the environment in which they appear (pedestrians usually appear against a cluttered background in complex scenes whose look is influenced by illumination or by weather conditions).
• Variability of the actions they may perform and the positions they may take (running, walking, standing, shaking hands, etc.).

In existing human detection methods, feature representation and classifier design are the two main problems being investigated. Many visual feature descriptors have been proposed for human detection, including Haar-like features, HOG, v-HOG, Gabor filter based cortex features, covariance features, Local Binary Pattern (LBP), HOG-LBP [1], Edgelet [2], Shapelet [3], Local Receptive Field (LRF) [4], Multi-Scale Orientation (MSO) [5], Adaptive Local Contour [5], Granularity-tunable Gradients Partition (GGP) descriptors [5], pose-invariant descriptors [7], and Particle Swarm Optimization. Recently, histogram of oriented gradients (HOG) and region covariance features have been preferred for pedestrian detection, as they have been shown to outperform the earlier approaches. HOG is a gray-level image feature formed by a set of normalised gradient histograms. The linear SVM is the most popular classifier, with several reported landmark works on human detection. We select the SVM classifier because it is easy to train and, unlike neural networks, its training is guaranteed to reach the global optimum. The features extracted from labeled samples are usually fed into a classifier for training [11].

However, when multi-view and multi-posture humans must be detected simultaneously in a video system, the performance of a linear SVM often drops significantly. Experiments show that humans with continuous view and posture variations form a manifold, which is difficult to separate linearly from the negatives. An algorithm that requires multi-view and multi-posture humans to be correctly classified by a linear SVM in the training process often leads to over-fitting. Some non-linear classification methods such as Piecewise Linear SVM (PLSVM), Kernel SVM, and Profile SVM are options to handle this problem, but they are generally much more computationally expensive than linear methods.

II. OVERVIEW OF METHOD

The algorithm of Navneet Dalal and Bill Triggs on the Histogram of Oriented Gradients (HOG) is based on evaluating well-normalized local histograms of image gradient orientations in a dense grid [1]. The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge positions. In practice this is implemented by dividing the image window into small spatial regions (cells) and, for each cell, accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell [1]. The combined histogram entries form the representation. For better invariance to illumination, shadowing, etc., it is also useful to contrast-normalize the local responses before using them. This can be done by accumulating a measure of local histogram "energy" over somewhat larger spatial regions (blocks) and using the results to normalize all of the cells in the block.

We will refer to the normalized descriptor blocks as Histogram of Oriented Gradient (HOG) descriptors [1]. Tiling the detection window with a dense (in fact, overlapping) grid of HOG descriptors and using the combined feature vector in a conventional SVM-based window classifier gives our human detection chain.

Fig.1. An overview of our feature extraction and object detection chain. The detector window is tiled with a grid of overlapping blocks in which Histogram of Oriented Gradient feature vectors are extracted. The combined vectors are fed to a linear SVM for object/non-object classification. The detection window is scanned across the image at all positions and scales, and conventional non-maximum suppression is run on the output pyramid to detect object instances, but this paper concentrates on the feature extraction process.

The HOG/SIFT representation has several advantages. It captures edge or gradient structure that is very characteristic of local shape, and it does so in a local representation with an easily controllable degree of invariance to local geometric and photometric transformations: translations or rotations make little difference if they are much smaller than the local spatial or orientation bin size [15]. For human detection, rather coarse spatial sampling, fine orientation sampling and strong local photometric normalization turn out to be the best strategy, presumably because this permits limbs and body segments to change appearance and move from side to side quite a lot provided that they maintain a roughly upright orientation [4].

III. HUMAN DETECTION

The detection of the human body based on HOG includes the following six steps: gamma correction and normalization in RGB space, gradient calculation, statistical analysis of the gradients of a cell, normalization of blocks, generation of the feature vector, and classification based on SVM.

A. Gamma and Color Normalization

We use an exponential gamma correction function to remove the effect of ambient disturbance.

B. Gradient Computation

For gradient computation, the grayscale image is first filtered to obtain the x and y derivatives at each pixel, using the conv2(image,filter,'same') method with the kernels

Ix = [-1 0 1] and Iy = [-1 0 1]^T

After calculating the x and y derivatives (Ix and Iy), the magnitude and orientation of the gradient are computed:

$$|G| = \sqrt{I_x^2 + I_y^2}, \qquad \theta = \operatorname{atan2}(I_y, I_x)$$

One thing to note is that the orientation is calculated with rad2deg(atan2(val)), which returns values in [-180°, 180°]. Since unsigned orientations are desired for this implementation, 180° is added to the values that are less than 0°.
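The gradient step can be sketched in plain Python as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `gradients` is hypothetical, pixels outside the image are assumed to be zero (as with 'same'-size filtering), and the kernel is applied as a correlation, so the sign of each derivative may differ from conv2. That sign difference changes neither the magnitude nor the unsigned orientation.

```python
import math


def gradients(image):
    """Return (magnitude, orientation) grids for a 2-D grayscale image.

    The centered kernel [-1 0 1] is applied along x and along y.
    Orientation is unsigned, in degrees within [0, 180).
    """
    h, w = len(image), len(image[0])

    def px(r, c):
        # Assumption: pixels outside the image count as zero.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    mag = [[0.0] * w for _ in range(h)]
    ori = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            ix = px(r, c + 1) - px(r, c - 1)  # horizontal derivative
            iy = px(r + 1, c) - px(r - 1, c)  # vertical derivative
            mag[r][c] = math.hypot(ix, iy)
            angle = math.degrees(math.atan2(iy, ix))  # in [-180, 180]
            if angle < 0:
                angle += 180.0  # fold negative angles into the unsigned range
            ori[r][c] = angle % 180.0  # maps exactly 180 back to 0
    return mag, ori
```

For example, a horizontal edge (dark rows above bright rows) yields an orientation of 90° at the pixels next to the edge, and a vertical edge yields 0°.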

Fig.2. (a) Original image containing a human in upright position (b) Horizontal gradient of the original image (c) Vertical gradient of the original image.

C. Orientation Binning

The next step is the fundamental nonlinearity of the descriptor. Each pixel calculates a weighted vote for an edge orientation histogram channel based on the orientation of the gradient element centered on it, and the votes are accumulated into orientation bins over local spatial regions that we call cells. Cells can be either rectangular or radial (log-polar sectors). The orientation bins are evenly spaced over 0°–180° (“unsigned” gradient) or 0°–360° (“signed” gradient) [1].

To reduce aliasing, votes are interpolated bilinearly between the neighbouring bin centers in both orientation and position. The vote is a function of the gradient magnitude at the pixel: either the magnitude itself, its square, its square root, or a clipped form of the magnitude representing soft presence/absence of an edge at the pixel.
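A minimal sketch of the voting step for a single cell is given below. It is simplified on purpose: votes are interpolated linearly in orientation only (the interpolation in position described above is omitted), the vote is the gradient magnitude itself, and the bins cover the unsigned range. The function name `cell_histogram` and the default of 9 bins are assumptions made for illustration.

```python
import math


def cell_histogram(magnitudes, orientations, n_bins=9):
    """Accumulate one cell's magnitude-weighted orientation histogram.

    Orientations are unsigned angles in degrees within [0, 180).
    Each vote is split linearly between the two nearest bin centers;
    the bins wrap around, so the last bin is a neighbour of the first.
    """
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for mag_row, ori_row in zip(magnitudes, orientations):
        for mag, angle in zip(mag_row, ori_row):
            # Position in bin units, measured from the first bin center.
            pos = angle / bin_width - 0.5
            lower = math.floor(pos)
            frac = pos - lower
            hist[lower % n_bins] += mag * (1.0 - frac)
            hist[(lower + 1) % n_bins] += mag * frac
    return hist
```

With 9 bins the centers sit at 10°, 30°, ..., 170°, so a gradient at 20° contributes half of its magnitude to each of the first two bins, while a gradient at 0° is shared between the last and the first bin.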

Fig.3. Histogram of orientation gradients. (a) 64 × 128 detection window (the biggest rectangle) in an image. (b) 16 × 16 block consisting of four cells. (c) Histograms of orientation gradients corresponding to the four cells.

D. Descriptor Blocks

In order to account for changes in illumination and contrast, the gradient strengths must be locally normalized, which requires grouping the cells together into larger, spatially connected blocks [5]. The HOG descriptor is then the vector of the components of the normalized cell histograms from all of the block regions. These blocks typically overlap, meaning that each cell contributes more than once to the final descriptor.
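The effect of overlapping blocks on the descriptor size can be checked with a short calculation. The sketch below is illustrative only: the function name `hog_descriptor_length` is hypothetical, and the example values (8 × 8 pixel cells, 2 × 2 cell blocks, a block stride of one cell, 9 bins) are assumptions chosen to match the 16 × 16 block of four cells shown in Fig.3, not parameters stated by the text.

```python
def hog_descriptor_length(win_w, win_h, cell, cells_per_block, stride_cells, n_bins):
    """Length of the HOG vector for a window tiled with square blocks.

    cell            -- cell side in pixels
    cells_per_block -- block side in cells
    stride_cells    -- block step in cells (smaller than the block side
                       means the blocks overlap)
    """
    cells_x = win_w // cell
    cells_y = win_h // cell
    blocks_x = (cells_x - cells_per_block) // stride_cells + 1
    blocks_y = (cells_y - cells_per_block) // stride_cells + 1
    # Every block contributes one histogram per cell it contains.
    return blocks_x * blocks_y * cells_per_block ** 2 * n_bins


# Overlapping blocks: 7 x 15 blocks, each with 4 cells of 9 bins.
print(hog_descriptor_length(64, 128, 8, 2, 1, 9))  # 3780
```

With a stride of two cells the blocks no longer overlap and the same window gives a vector of only 1152 entries, which shows how much of the descriptor comes from cells being counted in several blocks.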

Fig.3. Rectangular HOG

Two main block geometries exist: rectangular R-HOG blocks and circular C-HOG blocks. R-HOG blocks are generally square grids, represented by three parameters: the number of cells per block, the number of pixels per cell, and the number of channels per cell histogram [6]. In the Dalal and Triggs human detection experiment, the optimal parameters were found to be 3 × 3 cell blocks of 6 × 6 pixel cells with 9 histogram channels. Moreover, they found that some minor improvement in performance could be gained by applying a Gaussian spatial window within each block before tabulating histogram votes, in order to weight pixels around the edge of the blocks less. The R-HOG blocks appear quite similar to the scale-invariant feature transform (SIFT) descriptors; however, despite their similar formation, R-HOG blocks are computed in dense grids at some single scale without orientation alignment, whereas SIFT descriptors are computed at sparse, scale-invariant key image points and are rotated to align orientation. In addition, the R-HOG blocks are used in conjunction to encode spatial form information, while SIFT descriptors are used singly.

Fig.4 Circular HOG

C-HOG blocks can be found in two variants: those with a single, central cell and those with an angularly divided central cell. In addition, these C-HOG blocks can be described with four parameters: the number of angular and radial bins, the radius of the center bin, and the expansion factor for the radius of additional radial bins. Dalal and Triggs found that the two main variants provided equal performance, and that two radial bins with four angular bins, a center radius of 4 pixels, and an expansion factor of 2 provided the best performance in their experimentation. Also, Gaussian weighting provided no benefit when used in conjunction with the C-HOG blocks.

E. Block Normalization

For better invariance to illumination and noise, a normalization step is usually used after calculating the histogram vectors. Four different normalization schemes have been proposed: L2-norm, L2-Hys, L1-sqrt, and L1-norm. This analysis used the L2-norm scheme due to its better performance:

$$v \rightarrow \frac{v}{\sqrt{\lVert v \rVert_2^2 + \varepsilon^2}}$$

where ε is a small positive value used for regularization when an empty cell is taken into account, and v stands for the characterization vector [8].

F. Detector Window

As previously mentioned, the detector window size is 64 × 128 pixels. Our 64 × 128 detection window includes about 16 pixels of margin around the person on all four sides. This border provides a significant amount of context that helps detection.
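Before the window descriptor is passed to the classifier, every block in it is normalized as described under Block Normalization. The L2-norm scheme can be sketched as follows. The function name `l2_normalize` and the default value of ε are assumptions made for illustration; the text only requires ε to be a small positive value.

```python
import math


def l2_normalize(block, eps=1e-5):
    """Scale a block vector v to v / sqrt(||v||^2 + eps^2).

    eps keeps the division well defined when the block is empty
    (all histogram entries equal to zero).
    """
    norm = math.sqrt(sum(x * x for x in block) + eps * eps)
    return [x / norm for x in block]
```

An all-zero block stays all-zero instead of causing a division by zero, and for blocks with strong gradients the result is very close to a unit-length vector.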

IV. SUPPORT VECTOR MACHINE

To train our human detector, a simple binary linear SVM is used in this research. The SVM is a useful technique for data classification [10], and a linear SVM is sufficient in the context of the human detection problem. The training method is also very important for the detection result: a reasonable training method improves the result considerably [2]. For a fair comparison of different features, the effect of the training method should therefore be considered. Support vector machines and many boosting methods, such as AdaBoost, LogitBoost and GentleBoost, are widely used in many tasks.

Fig.5. Examples of positive and negative samples from the INRIA image dataset

In our experiment, an SVM is used for training and for comparison. It is effective for learning from small samples in high-dimensional spaces. The objective of the SVM is to find a decision plane that maximizes the interclass margin [12]. The size of the sub-window should be fixed: a variable window size could improve performance, but it is hard to adopt because of its computational cost. For comparison purposes, the SVM is suitable for several reasons. Its training time is less than that of boosting methods. Convergence to the optimum is guaranteed, so differences in performance caused by the optimization can be ignored. The parameters of the SVM are controllable [12], so suitable values can be selected to avoid differences caused by parameter settings.
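To make the margin-maximizing objective concrete, the sketch below trains a linear classifier on labeled feature vectors. The paper does not say which solver was used, so this example substitutes plain subgradient descent on the L2-regularized hinge loss. Unlike an exact SVM solver, it only approximates the optimum after a fixed number of passes. The function names and the values of `lam`, `epochs` and `lr` are hypothetical.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Fit weights w and bias b for a linear SVM.

    samples -- list of feature vectors (e.g. HOG descriptors)
    labels  -- +1 for human, -1 for non-human
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks w, which favours a wide margin.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The sample lies inside the margin or is misclassified.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the decision plane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1
```

In a detector of the kind described here, `samples` would hold the descriptors of the positive and negative training windows, and `predict` would be applied to the descriptor of each scanned window.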

V. CONCLUSION

In this paper, we proposed an efficient implementation of a human detection system using Histogram of Oriented Gradients (HOG) features and the linear Support Vector Machine algorithm. The proposed algorithm detects humans in images and is assessed on the basis of the detection error trade-off and false positives per window.

The detection rate increases with an increase in the size of the orientation bin and for a 64 × 128 descriptor sliding window. The performance of the HOG-SVM detection method is expected to be better than that of previous human detection methods, such as those using principal component analysis, Local Binary Patterns, or image subtraction techniques, on the basis of detection rate and efficiency.

REFERENCES

[1] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Int. Conference on Computer Vision Pattern Recognition, Jun. 2005, pp. 886–893.

[2] Qixiang Ye, Zhenjun Han, Jianbin Jiao, Jianzhuang Liu, “Human Detection in Images via Piecewise Linear Support Vector,” IEEE Transaction on Image Processing, vol. 22, no. 2, February 2013.

[3] Yanwei Pang, He Yan, Konquio Wang, “Robust CoHOG feature Extraction in human centered image/video management system,” IEEE Transaction on Systems, Man, and Cybernetics, vol. 42, no. 2, April 2012, pp. 458–468.

[4] Bongjin Jun, Inho Choi, and Daljin Kim, “Local transform features and hybridization for accurate face and human detection,” IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 35, no. 6, pp. 1423–1436.

[5] Q. Zhu, S. Avidan, M. Yeh, and K. Cheng, “Fast human detection using a cascade of histograms of oriented gradients,” IEEE Int. Conf. Computer Vision Pattern Recognition, Jul. 2006, pp. 1491–1498.

[6] Fen Xu, Ming Gao, “Human detection and tracking based on HOG and particle filter,” IEEE International Congress on Image and Signal Processing, CISP-2010, pp. 1503–1507.

[7] Kelvin Lee, Che Yon Choo, Hui Quing See, and Yunil Lee, “Human detection using histogram of oriented gradients and human body ratio estimation,” IEEE Conference on Comp. Vision, May 2010, pp. 18–22.

[8] R. Xu, B. Zhang, Q. Ye, and J. Jiao, “Cascaded L1-norm Minimization learning (CLML) classifier for human detection,” in Proc. IEEE International Conference on Computer Vision Pattern Recognition, Jun. 2010, pp. 89–96.

[9] P. Viola, M. Jones, and D. Snow, “Detecting pedestrians using Patterns of motion and appearance,” Int. J. Comput. Vis., vol. 63, no. 2, pp. 153–161, 2005.

[10] O. Oladunni and G. Singhal, “Piecewise multi-classification Support vector machines,” in Proc. Int. Joint Conf. Neural Network, Jun. 2009, pp. 2323–2330.

[11] S. Q. Ren, D. Yang, X. Li, and Z. W. Zhuang, “Piecewise support vector machines,” Chin. J. Comput., vol. 32, no. 1, pp. 77–85, 2009.

[12] H. B. Cheng, P.-N. Tan, and R. Jin, “Efficient algorithm for localized support vector machine,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 4, pp. 537–549, Apr. 2010.

[13] Junfeng Ge, Yupin Luo, and Gyomei Tei, “Real time pedestrian detection and tracking at nighttime for driving assistance system,” IEEE Trans. on Intelligent Transportation Systems, no. 2, pp. 283–298, June 2009.

[14] Essam El-Naqa, Yongyi Yang, Miles N. Wernick, Nicolas P. Galatsanos, and Robert M. Nishikawa, “A support vector machine approach for detection of micro-calcification,” IEEE Transaction on Medical Imaging, vol. 21, no. 12, December 2002, pp. 1552–1563.

[15] Boudour Ammar, Ali Wali, Adel M. Alimi, “Incremental learning approach for human detection and tracking,” IEEE International Conference on Innovations in Information Technology, 2008, pp. 128–133.

[16] Sung Tae An, Jeong Jung Kim, and Ju Jang Lee, “SDAT Simultaneous Detection and Tracking of Humans using Particle Swarm Optimization,” IEEE International Conference on Mechatronics and Automation, Aug. 2010, pp. 483–488.
