
1

Rotation Invariant Face Detection Using Neural Network

Lecturers:

Mehdi Dehghani - Mahdy Bashary

Supervisor: Dr. Bagheri Shouraki

Spring 2007

2

Agenda

What’s face detection?
Usages
Face Detection Techniques in Grayscale Images
Template-Based Face Detection with Neural Network
Structure
Router Network
Detector Network
Arbitration Among Multiple Networks
Empirical Results

3

Face Detection

Face detection is a computer technology that determines the locations and sizes of human faces in arbitrary (digital) images. It detects facial features and ignores anything else, such as buildings, trees, and bodies.

4

Usages

Biometrics: often as part of a face recognition system

Security surveillance (e.g., logging people passing through an area by saving their faces)

Image database management (e.g., making the face pictures in a database uniform by aligning each face at the center of its image)

5

Face Detection Techniques in Grayscale Images

Template-based face detection: these techniques encode facial images directly in terms of pixel intensities. Face appearance can be characterized explicitly by probabilistic models of the set of face images, or implicitly by neural networks or other mechanisms.

6

Face Detection Techniques in Grayscale Images (cont.)

Feature-based face detection: this approach is based on extracting facial features and applying manually or automatically generated rules to evaluate them (e.g., finding the locations of the eyes, mouth, and nose, and checking whether the nose lies inside the triangle formed by the eyes and mouth).

7

Template-based Face Detection

8

Image Pyramid

It’s used to detect faces larger than the window size.

It’s made by repeatedly reducing the size of the input image by subsampling.

The amount of size reduction at each stage is determined by the detector network’s invariance to scale.
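As a rough sketch of the idea, the pyramid can be built by plain nearest-pixel subsampling; the scale factor of 1.2 here is an assumption for illustration, since the real factor depends on the detector's scale invariance:

```python
import numpy as np

def image_pyramid(img, scale=1.2, min_size=20):
    """Repeatedly subsample `img` by `scale` until the next level
    would be smaller than the 20x20 detection window."""
    levels = [img]
    while min(levels[-1].shape) / scale >= min_size:
        h, w = levels[-1].shape
        nh, nw = int(h / scale), int(w / scale)
        ys = (np.arange(nh) * scale).astype(int)  # rows to keep
        xs = (np.arange(nw) * scale).astype(int)  # columns to keep
        levels.append(levels[-1][np.ix_(ys, xs)])
    return levels
```

The detector is then run over a 20×20 sliding window at every pyramid level, so a face of any size eventually fits the window at some level.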

9

Rotation Invariance

Rotation invariance is the ability to detect faces that are rotated in-plane.

10

Rotation Invariance (cont.)

The simplest approach would be to employ the upright face detector, repeatedly rotating the input image in small increments and applying the detector to each rotated image. However, this would be an extremely computationally expensive procedure.

11

Structure

Image Pyramid Router Network Detector Network

12

Router Network

First, the window is preprocessed using histogram equalization and given to the router network. The rotation angle returned by the router is then used to rotate the window, with its potential face, to an upright position. Finally, the derotated window is preprocessed and passed to one or more detector networks.
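The per-window pipeline can be sketched as follows; `histogram_equalize`, `derotate`, and the router/detector arguments are illustrative stand-ins, not the authors' implementation:

```python
import numpy as np

def histogram_equalize(w):
    """Rank-based histogram equalization: spread pixel values over [0, 1]."""
    ranks = np.argsort(np.argsort(w.ravel()))
    return (ranks / (w.size - 1)).reshape(w.shape)

def derotate(w, angle_deg):
    """Nearest-neighbour rotation of the window by -angle about its centre."""
    h, wd = w.shape
    cy, cx = (h - 1) / 2, (wd - 1) / 2
    a = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:wd]
    # inverse mapping: for each output pixel, sample the rotated source pixel
    sx = np.cos(a) * (xs - cx) - np.sin(a) * (ys - cy) + cx
    sy = np.sin(a) * (xs - cx) + np.cos(a) * (ys - cy) + cy
    sx = np.clip(np.round(sx).astype(int), 0, wd - 1)
    sy = np.clip(np.round(sy).astype(int), 0, h - 1)
    return w[sy, sx]

def detect_window(window, router, detector):
    angle = router(histogram_equalize(window))    # estimated in-plane rotation
    upright = derotate(window, angle)             # rotate potential face upright
    return detector(histogram_equalize(upright))  # face / non-face score
```

In practice `router` and `detector` would be the trained networks described in the following slides.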

13

Router Network (cont.)

[Figure: input window → Compute Orientation → Derotator → detector input]

14

Output Angle

Single Unit: the activation of a single output unit (usually between 0 and 1, or between -1 and +1) is mapped linearly onto the range 0°–360° to determine the angle of rotation.

1-of-N Encoding: N units are used to represent the output; each unit represents 360/N degrees. For example, if there were 180 units and unit 30 had the highest activation, this would indicate a rotation of 60°.

15

Output Angle (cont.)

If we imagine a vector from the center of a circle to each output unit, with length equal to that unit’s activation, the direction of the average of these vectors is interpreted as the angle of the face:

$$\theta = \tan^{-1}\left(\frac{\sum_{i=0}^{35} \text{output}_i \,\sin(i \times 10^\circ)}{\sum_{i=0}^{35} \text{output}_i \,\cos(i \times 10^\circ)}\right)$$
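In code, this vector-averaging readout amounts to an `atan2` of two weighted sums (a minimal numpy sketch):

```python
import numpy as np

def router_angle(outputs):
    """Interpret 36 router outputs as vectors spaced 10 degrees apart;
    the direction of their weighted average is the face angle."""
    outputs = np.asarray(outputs, dtype=float)
    angles = np.deg2rad(np.arange(36) * 10)
    x = np.sum(outputs * np.cos(angles))
    y = np.sum(outputs * np.sin(angles))
    return np.degrees(np.arctan2(y, x)) % 360  # angle in [0, 360)
```

Using `arctan2` rather than a plain arctangent keeps the quadrant of the angle correct.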

16

Architecture

The architecture of the router network consists of three layers: an input layer of 400 units, a hidden layer of 15 units, and an output layer of 36 units. Each layer is fully connected to the next. Each unit uses a hyperbolic tangent activation function, and the network is trained using the standard error backpropagation algorithm.
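A minimal forward pass for such a 400–15–36 tanh network might look like this; the random weights are placeholders used only to fix the shapes, since a real router would use trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder weights with the router's layer shapes (400 -> 15 -> 36)
W1 = rng.normal(scale=0.1, size=(400, 15))
b1 = np.zeros(15)
W2 = rng.normal(scale=0.1, size=(15, 36))
b2 = np.zeros(36)

def router_forward(window):
    """Forward pass: 20x20 window -> 15 tanh hidden units -> 36 tanh outputs."""
    x = window.reshape(400)
    h = np.tanh(x @ W1 + b1)
    return np.tanh(h @ W2 + b2)
```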

17

Generating training set

The training examples are generated from a set of manually labelled example images containing 1048 faces. In each face, the eyes, tip of the nose, and the corners and center of the mouth are labelled.

We first compute the average location of each labelled feature over the entire training set. Then, each face is aligned with the average feature locations by computing the rotation, translation, and scaling that minimize the distances between the corresponding features. After iterating these two steps a small number of times, the alignments converge.
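One way to sketch this iterative alignment is a closed-form 2-D similarity fit (complex-number least squares) repeated against the running average; this is an illustrative reconstruction, not the authors' exact procedure:

```python
import numpy as np

def align_to(points, target):
    """Least-squares similarity transform (rotation + scale + translation)
    mapping `points` onto `target`; 2-D points are handled as complex numbers."""
    p = points[:, 0] + 1j * points[:, 1]
    q = target[:, 0] + 1j * target[:, 1]
    pc, qc = p - p.mean(), q - q.mean()
    z = (np.conj(pc) @ qc) / (np.conj(pc) @ pc)  # complex rotation + scale
    a = z * pc + q.mean()
    return np.stack([a.real, a.imag], axis=1)

def align_all(faces, n_iters=5):
    """Repeat: align every labelled face to the current average feature
    locations, then recompute the average, until the alignments settle."""
    faces = [np.asarray(f, dtype=float) for f in faces]
    avg = np.mean(faces, axis=0)
    for _ in range(n_iters):
        faces = [align_to(f, avg) for f in faces]
        avg = np.mean(faces, axis=0)
    return faces, avg
```

The complex-number trick works because multiplying by a complex scalar is exactly a 2-D rotation plus scaling, so the fit reduces to one linear least-squares division.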

18

Generating training set (cont.)

19

Generating training set (cont.)

Example upright frontal face images aligned to one another.

20

Training Router Network

To generate the training set, the faces are rotated to a random orientation.

21

Training Router Network (cont.)

The target activation for output unit i is:

Value[i] = cos(θ − i × 10°),  for i = 0, …, 35
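Generating the 36 target activations for a face rotated by θ is then a one-liner (sketch):

```python
import numpy as np

def router_targets(theta_deg):
    """Target activation for router output unit i: cos(theta - i*10 degrees)."""
    i = np.arange(36)
    return np.cos(np.deg2rad(theta_deg - i * 10))
```

The unit whose preferred angle matches θ gets target 1, and targets fall off smoothly for neighbouring units.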

22

Review

[Figure: input window → Compute Orientation → Derotator → detector input]

$$\theta = \tan^{-1}\left(\frac{\sum_{i=0}^{35} \text{output}_i \,\sin(i \times 10^\circ)}{\sum_{i=0}^{35} \text{output}_i \,\cos(i \times 10^\circ)}\right)$$

23

Detector Network at a glance

It takes a 20×20 pixel region of the image as input and generates an output ranging from -1 to 1, signifying the absence or presence of a face.

24

The Preprocessing

Light Correction: this step equalizes lighting across different parts of the window, compensating for a variety of lighting conditions.

Histogram Equalization: histogram equalization is then performed on the window, compensating for differences in camera input gains.
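The light-correction step can be sketched as fitting a linear brightness function to the window and subtracting it (a minimal numpy sketch; the details of the authors' fit may differ):

```python
import numpy as np

def light_correct(window):
    """Fit a linear brightness function a*x + b*y + c to the window
    and subtract it, flattening overall illumination gradients."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.stack([xs.ravel(), ys.ravel(), np.ones(window.size)], axis=1)
    coef, *_ = np.linalg.lstsq(A, window.ravel().astype(float), rcond=None)
    return window - (A @ coef).reshape(window.shape)
```

A pure left-to-right or top-to-bottom brightness gradient is removed entirely by this step, leaving only the texture of the face itself.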

25

The Preprocessing

26

Detector Neural Network

It is a multi-layer perceptron. There are three types of hidden units: four that look at 10×10 pixel subregions, 16 that look at 5×5 pixel subregions, and six that look at overlapping 20×5 pixel horizontal stripes of pixels.

In particular, the horizontal stripes allow the hidden units to detect features such as mouths or pairs of eyes, while the hidden units with square receptive fields can detect features such as individual eyes, the nose, or corners of the mouth.

27

Training Technique

It uses backpropagation with momentum to train the network.

The detectors have two sets of training examples: images which are faces, and images which are not.

Training a neural network for the face detection task is challenging because of the difficulty of characterizing prototypical “non-face” images.

28

Generating the face training set

Face examples are generated from each original image by randomly rotating the images (about their center points) up to 10°, scaling between 90 and 110 percent, translating up to half a pixel, and mirroring.

The randomization gives the filter invariance to translations of less than a pixel, scalings of 20 percent, and rotations of up to 10°.
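Sampling one set of augmentation parameters might look like this (a sketch; applying the transform to the image is omitted):

```python
import numpy as np

def random_augment_params(rng):
    """Sample one augmentation: rotation up to 10 degrees, scale 90-110%,
    sub-pixel translation, and a random mirror flip."""
    return {
        "angle": rng.uniform(-10, 10),          # degrees, about the centre
        "scale": rng.uniform(0.9, 1.1),
        "shift": rng.uniform(-0.5, 0.5, size=2),  # (dx, dy) in pixels
        "mirror": bool(rng.integers(2)),
    }
```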

29

General non-face images

Practically any image can serve as a non-face example, because the space of non-face images is much larger than the space of face images. However, collecting a “representative” set of non-faces is difficult.

30

A “bootstrap” training algorithm

1. Create an initial set of non-face images by generating 1000 random images.
2. Train the neural network to produce an output of +1.0 for the face examples and -1.0 for the non-face examples. In the first iteration, the network’s weights are initialized randomly. After the first iteration, we use the weights computed by training in the previous iteration as the starting point.
3. Run the system on an image of scenery which contains no faces. Collect subimages in which the network incorrectly identifies a face (an output activation > 0.0).
4. Select up to 250 of these subimages at random, and add them to the training set as negative examples. Go to step 2.
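The four steps above can be sketched as a loop; `train` and `predict` are caller-supplied stand-ins for the network's training and evaluation routines:

```python
import numpy as np

def bootstrap_train(faces, scenery_windows, train, predict, rounds=3, seed=0):
    """Bootstrap negative mining: start from random non-face windows, then
    repeatedly add the scenery windows the current network wrongly calls
    faces. `train(pos, neg)` and `predict(window)` come from the caller."""
    rng = np.random.default_rng(seed)
    negatives = [rng.random((20, 20)) for _ in range(1000)]   # step 1
    for _ in range(rounds):
        train(faces, negatives)                               # step 2
        false_pos = [w for w in scenery_windows
                     if predict(w) > 0.0]                     # step 3
        picks = rng.permutation(len(false_pos))[:250]
        negatives.extend(false_pos[i] for i in picks)         # step 4
    return negatives
```

Each round focuses the negative set on exactly the non-faces the current network gets wrong, which is what makes the scheme efficient.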

31

An Example

32

An Example of Result

33

Refinement

The raw output from a single network will contain a number of false detections.

A strategy is needed to reduce the number of false positives.

There are two ways to improve the reliability of the detector: cleaning-up the outputs from an individual network, and arbitrating among multiple networks.

34

Clean-Up Heuristic

Real faces are detected at multiple nearby positions and scales, while false detections often occur with less consistency. This observation leads to a heuristic that can eliminate many false detections.

If a particular location is correctly identified as a face, then all other detection locations which overlap it are likely to be errors and can therefore be eliminated. So we preserve the locations with the higher number of detections within a small neighborhood, and eliminate locations with fewer detections.
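A minimal sketch of this neighborhood-count heuristic on (x, y) detection locations (scale handling omitted; the `radius` threshold is an assumed parameter):

```python
def clean_up(detections, radius=2):
    """Keep each detection only if it has the most neighbouring detections
    in its vicinity; overlapping weaker locations are dropped."""
    def neighbours(d):
        return sum(1 for e in detections
                   if abs(d[0] - e[0]) <= radius and abs(d[1] - e[1]) <= radius)
    counts = {d: neighbours(d) for d in detections}
    kept = []
    # visit locations from most to least supported
    for d in sorted(detections, key=counts.get, reverse=True):
        if all(abs(d[0] - k[0]) > radius or abs(d[1] - k[1]) > radius
               for k in kept):
            kept.append(d)
    return kept
```

A tight cluster of detections collapses to a single location, while isolated spurious hits can additionally be filtered by requiring a minimum neighbour count.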

35

Illustration For Heuristic

36

The Result

37

Arbitration Among Multiple Networks

To reduce the number of false positives, we can apply multiple networks, and arbitrate between their outputs to produce the final decision. Each network is trained using the same algorithm with the same set of face examples, but with different random initial weights, random initial nonface images, and permutations of the order of presentation of the scenery images.

The detection and false positive rates of the individual networks will be quite close. However, because of different training conditions and because of self-selection of negative training examples, the networks will have different biases and will make different errors.
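One simple arbitration rule is voting among the networks' outputs at a given location (a sketch; the vote threshold is an assumed parameter):

```python
def arbitrate(outputs, min_votes=2):
    """A location counts as a face only if at least `min_votes` of the
    independently trained networks signal a detection there."""
    votes = sum(1 for o in outputs if o > 0.0)
    return votes >= min_votes
```

Because the networks make different errors, requiring agreement suppresses false positives that only one network produces, at a small cost in detection rate.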

38

Arbitration Among Multiple Networks

39

40

Analysis of the Networks

Because the output of the router network is used to derotate the input for the detector, the angular accuracy of the router must be compatible with the angular invariance of the detector. To measure the accuracy of the router, we generated test example images based on the training images, with angles between -30° and 30° at 1° increments.

We applied the detector to the same set of test images as the router, and measured the fraction of faces that were correctly classified as a function of the angle of the face.

Because 92% of the router’s angle errors fall between -10° and 10°, and the detector network detects about 90 percent of faces rotated between -10° and 10°, the two networks are compatible.

41

Empirical Results

Upright Test Set: There are a total of 130 images, with 511 faces (of which 469 are within 10º of upright).

Rotated Test Set: There are 50 images containing 223 faces, of which 210 are at angles of more than 10º from upright.

42

Proposed System

In the current system, the detector network is trained with scenery images fed directly to it.

If we instead train the detector network with scenery images passed through the router network, the performance of the system should increase.

43

Exhaustive Search of Orientations

To demonstrate the effectiveness of the router for rotation invariant detection, we applied the two sets of detector networks described above without the router. The detectors were instead applied at 18 different orientations (in increments of 20°) for each image location.

44

Upright Detection Accuracy

To ensure that adding the capability to detect rotated faces has not come at the expense of accuracy in detecting upright faces, we apply the upright face detector to the upright test set images.

45

Comparison

Our new system has a slightly lower detection rate on upright faces, for two reasons. First, the detector networks cannot recover from all the errors made by the router network. Second, the detector networks trained with derotated negative examples are more conservative in signalling detections; this is because the derotation process makes the negative examples look more like faces, which makes the classification problem harder.

46

47

48

Movie Examples

49

References

1. H.A. Rowley, S. Baluja, and T. Kanade, "Neural Network-Based Face Detection," IEEE Trans. PAMI, vol. 20, pp. 23-38, Jan. 1998.

2. H.A. Rowley, S. Baluja, and T. Kanade, "Rotation Invariant Neural Network-Based Face Detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 38-44, 1998.

3. H.A. Rowley, "Neural Network Face Detection," PhD thesis, May 1999.

4. S. Baluja, "Face Detection with In-Plane Rotation: Early Concepts and Preliminary Results," JPRC-1997-001-1, Justsystem Pittsburgh Research Center, 1997.

50

Any Questions?
