

Histogram of Oriented Gradients for Detection of Multiple Scene Properties
Maesen Churchill, Adela Fedor

I. Introduction
Content-based image retrieval is an interesting problem within the field of computer vision. There are many situations in which a user might want to query a large set of images to find only images with some set of properties. For example, in order to show photos from a recent vacation to a friend, one might want to search through a large collection of personal photos to find scenes of people on the beach. It would be ideal if multiple properties of a scene could be detected from a single set of features. Ideally, a program would only have to make a one-time calculation of a feature vector upon image upload, and could learn several properties of the image by applying this fixed set of features to multiple classifiers. Furthermore, introducing a new category to this scheme could be done without recomputing the feature vectors for the image set; one could simply apply a new, trained classifier to the existing feature vectors. Thus, it would be convenient if there were some canonical descriptor for different types of images. For our project, we will test the feasibility of using Histogram of Oriented Gradient feature sets as such canonical descriptors. We will evaluate the performance of these descriptors at detecting multiple properties of images by attempting to classify images into one of four categories: scenes with humans, scenes with bikes, scenes with both humans and bikes, and scenes with neither humans nor bikes.

II. Previous Work
Histogram of Oriented Gradient (HOG) descriptors have proven to be effective at object detection, and in particular at human detection [1]. These features have been used to solve a variety of problems, including pedestrian recognition and tracking [7], hand gesture recognition for sign language translation and gesture-based technology interfaces [4], face recognition [3], and body part recognition for tracking [1]. HOG descriptors have also been used in tasks that are not human-centric, such as classification of vehicle orientation, a problem relevant for autonomous vehicles [5]. However, these classification tasks use a sliding window approach to obtain descriptions of areas localized to just the subject of detection. To detect multiple scene properties, descriptors would have to use the entire image as a detection window. Moreover, for each of these tasks, the parameters of the HOG features are tuned to optimize for the specific classification task. If using a single feature vector to learn several properties of an image is to be possible, we will have to use a fixed set of parameters.

III. HOG Descriptors for Image Classification

Part 1: HOG Descriptors for Human Detection

Summary
We first replicated Dalal and Triggs' method for extracting HOG features from images [2]. We then used an SVM classifier to detect humans in a relatively easy dataset.

Figure 1: Pipeline for the computation of HOG descriptors (Dalal and Triggs, 2005).

Dataset

We collected images containing humans from the MIT pedestrian database used by Dalal and Triggs. For negative images, we used the GRAZ01 portion of the INRIA Persons dataset. In total, we used 914 images containing humans, which we partitioned into 831 images for training and 93 images for testing. We also used 273 negative images, which we divided into 246 images for training and 27 images for testing. The images from the MIT database had dimensions of 128 by 64 pixels.

The negative images had various dimensions, all around 200 by 300 pixels. We cropped the negative images to match the size of the positives, mirroring Dalal and Triggs' strategy of creating a human detection window in the images.

Preprocessing and Gradient Computation
Dalal and Triggs computed the gradients for each color channel at each pixel and, for the overall gradient, used the color channel with the maximum gradient magnitude [2]. For simplicity, we convert images to grayscale and compute gradients using the library function provided by Matlab's Image Processing Toolbox. Gradients are computed using centered 1-D derivatives.
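The gradient step described above can be sketched in a few lines of Python with numpy. This is our own illustrative sketch (the report used Matlab's Image Processing Toolbox, and the function and variable names here are hypothetical): centered 1-D derivatives, then per-pixel magnitude and signed orientation.

```python
import numpy as np

def gradients(gray):
    """gray: 2-D float array (grayscale image).
    Returns (magnitude, signed orientation in degrees, [0, 360))."""
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    # Centered 1-D derivatives, i.e. f(x+1) - f(x-1) in each direction;
    # border pixels are left at zero for simplicity.
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0  # signed orientation
    return mag, ang
```

On a simple ramp image, the interior gradients are constant, which makes the sketch easy to sanity-check.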


Figure 2: Examples from the MIT pedestrian dataset.

Figure 3: Average gradient computed by Dalal and Triggs [2].

Weighted voting in cells
We partitioned each image into 6 by 6 pixel cells. For each cell, we computed a histogram of gradient orientations. We used signed orientations (between 0 and 360 degrees) and 9 orientation bins for each histogram. Votes were weighted by gradient magnitude.

Normalization over overlapping spatial blocks
In order to provide some invariance to illumination, histograms were concatenated together and then normalized across larger spatial blocks. We used blocks of 3 cells by 3 cells. Histograms were normalized according to the L2 norm:

v → v / sqrt(‖v‖² + ε²)

For the overall descriptor, we concatenated the normalized histograms for dense, overlapping blocks across the entire image.

Support Vector Machine
For classification, we trained a soft-margin SVM (C = 0.01) with a Gaussian kernel on our training examples.
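The two descriptor steps above (magnitude-weighted voting into 9 signed orientation bins per 6x6 cell, then L2 normalization of dense, overlapping 3x3-cell blocks) can be sketched as follows. This is our own simplified Python sketch, not the report's Matlab code: it uses nearest-bin voting rather than interpolated voting, and the value of ε is our assumption, since the report does not state one.

```python
import numpy as np

def cell_histograms(mag, ang, cell=6, nbins=9):
    """Magnitude-weighted votes into signed (0-360 degree) orientation
    bins, one histogram per cell x cell pixel region (nearest bin only)."""
    ch, cw = mag.shape[0] // cell, mag.shape[1] // cell
    hist = np.zeros((ch, cw, nbins))
    binwidth = 360.0 / nbins
    for i in range(ch * cell):
        for j in range(cw * cell):
            b = int(ang[i, j] // binwidth) % nbins
            hist[i // cell, j // cell, b] += mag[i, j]
    return hist

def hog_descriptor(hist, block=3, eps=1e-3):
    """Concatenate every dense, overlapping block of 3x3 cells and
    normalize each one as v -> v / sqrt(||v||^2 + eps^2)."""
    ch, cw, _ = hist.shape
    out = []
    for i in range(ch - block + 1):       # stride of one cell: overlapping
        for j in range(cw - block + 1):
            v = hist[i:i + block, j:j + block, :].ravel()
            out.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(out)
```

Because the blocks overlap with a stride of one cell, each cell histogram contributes to up to nine normalized blocks, which is what gives the final descriptor its redundancy and illumination invariance.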

Part 1: Results
We were able to classify our test set with 92% accuracy. Full results are shown below:

                      Predicted: Human   Predicted: Non-human
Actual: Human                87                   4
Actual: Non-human             5                  22
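As a quick sanity check (our own arithmetic, not part of the report), the 92% figure can be recomputed from the confusion matrix above:

```python
# Recompute Part 1 test accuracy from the confusion matrix above.
tp, fn = 87, 4   # actual humans predicted human / non-human
fp, tn = 5, 22   # actual non-humans predicted human / non-human
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(round(accuracy, 3))  # 0.924, i.e. about 92%
```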


Part 2: HOG Descriptors for Detection of Multiple Image Properties

Summary
We then assessed the accuracy of HOG descriptors at predicting: 1) whether the image contains a human or not, and 2) whether the image contains a bike or not, by applying two SVM classification schemes. Descriptors were computed across the entirety of each image, not just a spatial window.

Dataset
We used a portion of the INRIA person dataset to create image sets for four object categories: those containing humans, those containing bikes, those containing both humans and bikes, and those containing neither. In total, we had 323 images of humans (partitioned into 291 for training and 32 for testing), 373 images of bikes (336 for training and 37 for testing), 273 images of humans and bikes (246 for training and 27 for testing), and 210 images of neither (186 for training and 24 for testing).

Our dataset was diverse and challenging. It included images taken from a variety of viewpoints, with different illuminations and levels of occlusion. Though it is common practice to eliminate images that are particularly difficult [8], due to extreme occlusion, for example, we did not perform any data sanitization.

Figure 4: Samples from our dataset. Examples (from left to right) of bikes, humans, humans and bikes, and neither humans nor bikes.

Because our descriptors must serve as feature vectors for classifiers that predict the presence of multiple object categories, we did not clip our images or use a sliding detection window at various scales. This marks a departure from Dalal and Triggs, who used a superset of our dataset, clipped to object-sized windows:

Figure 5: Images used by Dalal and Triggs, for comparison.


HOG Computation
For efficiency and accuracy, we used a Matlab library function for HOG computation [6] during this part of our investigation. The HOG implementation that we used normalizes non-overlapping blocks. To prevent over-fitting our classifiers, we used much larger blocks than in Part 1. We tried computing HOG descriptors for our 200 by 300 pixel images with 3 and with 6 blocks per image side.

Support Vector Machines
We used two soft-margin Support Vector Machines (C = 0.01) for classification, each with a Gaussian kernel. For the human classifier, images of humans made up the set of positive samples. For the bike classifier, images of bikes made up the set of positive samples. Thus, for both classifiers, images with both bikes and humans were included as positive samples. After classification by both SVMs, predictions for test images were assigned according to:

                              Bike classifier: positive   Bike classifier: negative
Human classifier: positive    "Human and bike"            "Human"
Human classifier: negative    "Bike"                      "Neither human nor bike"
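The decision rule above composes the two binary SVM outputs into one of the four scene categories. A minimal Python sketch (the function name and boolean-prediction interface are our own; the report's classifiers were Matlab SVMs):

```python
def combine_predictions(human_positive, bike_positive):
    """Map the two binary classifier outputs (True = positive) to one
    of the four scene categories, following the decision table above."""
    if human_positive and bike_positive:
        return "human and bike"
    if human_positive:
        return "human"
    if bike_positive:
        return "bike"
    return "neither human nor bike"

# Example: human SVM predicts positive, bike SVM predicts negative.
print(combine_predictions(True, False))  # prints "human"
```

Note that an error by either SVM moves the image into a different one of the four cells, which is why the four-way accuracy is lower than either binary success rate.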

 

IV. Results
Through this method, we achieved the highest success rates with 6 blocks per image side. The success rate was 77% for human classification and 71% for bike classification. Overall, 58% of images were correctly classified into one of the four categories.

                      Predicted: Human   Predicted: Bike   Predicted: Human & Bike   Predicted: Neither
Actual: Human                22                 6                    0                        4
Actual: Bike                  5                32                    0                        0
Actual: Human & Bike         10                 5                   12                        0
Actual: Neither               8                15                    1                        0

These results demonstrate that our method does a fairly good job of categorizing images containing either just humans or just bikes. Many of the incorrect classifications had obvious challenges that a HOG descriptor cannot account for, such as occlusion, scaling, illumination and orientation.

[Examples of misclassified images exhibiting scaling, orientation, illumination, and occlusion.]

However, our classification has very poor specificity. In particular, images with neither bikes nor humans were consistently labeled inaccurately. A large set of humans makes up the set of negative samples for the bike classifier; likewise, bikes are abundant in the set of negative images for the human classifier. As a result, our model is likely over-fitting, learning the negative samples for bikes as images of humans, and vice versa. To correct for this, we removed a large number of bike images from the negative samples of our human training set. We also removed a large number of human images from our bike training set. We then reran the classifier on descriptors with 6 blocks per side. Using the new training sets, we achieved a success rate of about 76% for classifying humans correctly; however, the rate for classifying bikes correctly fell to about 58%.

                      Predicted: Human   Predicted: Bike   Predicted: Human & Bike   Predicted: Neither
Actual: Human                 3                 1                   16                       12
Actual: Bike                  8                 5                    2                       22
Actual: Human & Bike          5                 3                   13                        6
Actual: Neither               3                 0                    1                       20

Clearly, specificity improves at the expense of sensitivity.

V. Discussion
Our initial classification scheme demonstrates that HOG features do have some descriptive power beyond the detection of a single object. Our dataset was very challenging, and the accuracy we attained was reasonable.

However, our investigation shows that classification schemes can easily become over-fit in a multiple object detection scenario. The performance of our method on the full training set shows that these descriptors would be well suited to differentiating between two object categories, e.g. separating images of bikes from images of humans. However, the lack of specificity in this scheme makes it unsuitable for differentiating multiple separate categories.

Our second investigation, with a more restricted training set, gives a more realistic estimate of the performance we could expect if we were to scale this method to several more object categories, since the set of negative samples for one category is not fit to match the description of the other category. Predictably, classification of bikes is less successful than classification of humans, since HOG descriptors are known to be best at classifying objects that take on a limited set of shapes when pictured from different angles. Though the algorithm's performance was better than random guessing, the lack of a detection window diminished object detection success significantly. Furthermore, since HOG-based classification schemes depend on finding patterns in gradients, we would expect performance to worsen when more difficult object categories (i.e. object categories with more intra-class variability) are added. We would also not expect good performance if HOG descriptors were used to classify images based on scene information (e.g. indoor/outdoor classification).

On this topic, future investigations could include multi-category classification using different types of descriptors, such as SIFT. One could also construct classification schemes for multiple object categories using Implicit Shape Models or ObjectBank heat maps. The performance of HOG descriptors could serve as a baseline for comparison.

References
[1] Corvee, E., 2010. Body Parts Detection for People Tracking Using Trees of Histogram of Oriented Gradient Descriptors. In: IEEE, pp. 469-475.
[2] Dalal, N., Triggs, B., 2005. Histograms of oriented gradients for human detection. In: Proc. CVPR 2005, vol. 1, pp. 886-893. <http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1467360>.
[3] Déniz, O., Bueno, G., Salido, J., De la Torre, F., 2011. Face recognition using Histograms of Oriented Gradients. In: Pattern Recognition Letters, vol. 32, issue 12, pp. 1598-1603.
[4] Freeman, W. T., Roth, M., 1994. Orientation Histograms for Hand Gesture Recognition. In: IEEE.
[5] Huber, D., Morris, D.D., Hoffman, R., 2010. Visual classification of coarse vehicle orientation using Histogram of Oriented Gradients features. In: IEEE, pp. 921-928.
[6] Ludwig, O., 2011. HOG descriptor for Matlab. http://www.mathworks.com/matlabcentral/fileexchange/28689-hog-descriptor-for-matlab/content/HOG.m. Universidade de Coimbra.
[7] Suard, F., Rakotomamonjy, A., Bensrhair, A., Broggi, A., 2006. Pedestrian Detection using Infrared images and Histograms of Oriented Gradients. In: IEEE, pp. 206-212.
[8] Szummer, M., Picard, R. W., 1997. Indoor-Outdoor Image Classification. In: IEEE, pp. 43-51.