
An Implementation of Model-Free Face Detection and Head Tracking with Morphological Hole

Mapping in C#

ENCM 503: Digital Video Processing Department of Electrical and Computer Engineering

Schulich School of Engineering University of Calgary

Paul Lapides Charles Hateley

December 7, 2007

Table of Contents

Abstract
Procedure
    Binary Skin Map
        RGB Skin Clustering
        YCrCb Skin Clustering
        HSV Skin Clustering
        Observations
    Morphological Face Extraction
        Closing the Skin Regions
        Dilation
        Erosion
        Hole Map
        Labeling
    Identify Faces
Limitations and Improvements
    Lighting Conditions
    False Positives and Recognition Problems
    Imperfect Closing
    Speed
Credits

Abstract

This project implements face detection and tracking using morphological methods as outlined by [1]. An interactive C# program was developed that uses a supplied camera interface [5] for acquiring image data. This interface was the only library used in the implementation; all image-processing functionality was implemented in classes defined within the project.

The proposed method fares quite well at detecting faces under varying conditions, as well as at tracking them. The hole-mapping method successfully detects faces at most rotational angles, as long as some of the face is visible.

Procedure

The algorithm we implemented finds face regions by examining the number of “holes” (e.g. eyes, nose, mouth, ears) in each region. If the number of holes is above a threshold, the region is considered a face.

Regions are created by applying a simple set of rules to each pixel in the image [1]. These

rules determine if the pixel is skin colored and should be considered further. This creates

a black & white image (not grayscale) of candidate skin regions.

Some regions will have black holes inside them that could be eyes, a nose, and so on, and these holes need to be isolated. To isolate them, the skin regions are “closed”, meaning the holes are filled in. Closing is done by two similar operations called dilation and erosion [2,3]. Once the holes are closed, the original skin regions are subtracted from the closed regions, leaving only the holes.

Both the skin and hole regions are then labeled, which assigns each connected region a unique tag [4]. The labeled hole image is overlaid on the labeled skin image and the number of holes over each labeled region is counted. The labeled regions with the most holes are considered faces and are enclosed with a rectangle.

We will examine this photograph:

Binary Skin Map

Three methods of detecting skin were implemented, each generating a binary skin map.

RGB Skin Clustering

For a given input frame, pixels are marked “1” (white) if they satisfy all of the following constraints:

- R > 95 && G > 40 && B > 20
- Max(R,G,B) − Min(R,G,B) > 15
- |R − G| > 15 && R > G && R > B

Otherwise, pixels are labeled “0” (black).
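The RGB rules above amount to a simple per-pixel predicate. The following is a minimal Python sketch of that predicate (the report's implementation is in C#, and the function name here is ours):

```python
def is_skin_rgb(r, g, b):
    """Return True if an 8-bit RGB pixel satisfies the RGB skin rules."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)
```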

YCrCb Skin Clustering

For a given input frame, pixels are marked “1” (white) if they satisfy all of the following constraints:

- Y > 80
- 85 < Cb < 135
- 135 < Cr < 180

Otherwise, pixels are labeled “0” (black).
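As a sketch, the YCrCb rules can be applied to an RGB pixel after a color-space conversion. The conversion below is full-range ITU-R BT.601; the report does not say which conversion its implementation uses, so treat that choice (and the function name) as an assumption:

```python
def is_skin_ycrcb(r, g, b):
    """Apply the YCrCb skin rules to an 8-bit RGB pixel.

    Uses an assumed full-range ITU-R BT.601 RGB-to-YCrCb conversion.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return y > 80 and 85 < cb < 135 and 135 < cr < 180
```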

HSV Skin Clustering

For a given input frame, pixels are marked “1” (white) if they satisfy all of the following constraints:

- 0 < H < 50
- 0.23 < S < 0.68

Otherwise, pixels are labeled “0” (black).
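A Python sketch of the HSV rules, assuming H is measured in degrees (0 to 360) and S on [0, 1], as the quoted bounds suggest:

```python
import colorsys

def is_skin_hsv(r, g, b):
    """Apply the HSV skin rules to an 8-bit RGB pixel."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return 0 < h * 360.0 < 50 and 0.23 < s < 0.68
```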

Observations

The YCrCb and HSV methods produce better binary maps than the RGB method because they operate on intensity and saturation rather than raw color values. HSV performs better in low-light conditions.

Skin maps produced by the RGB rules, YCrCb rules, and HSV rules.

We will use the YCrCb skin map for the remainder of this document.

Morphological Face Extraction

Closing the Skin Regions

Two operations are used in combination to close the binary skin map. Dilation is the

process of making a region “fatter” while erosion makes it thinner. The idea is that when

the skin regions are dilated, the holes inside will be filled in. Erosion is used to return the

regions to the original size, but now without any holes.

The skin map before closing (Original) and after closing (Closed).

Both operations use a filter that controls how much the regions are changed. The filters are also binary images, only a few pixels in size; the bigger the filter, the more the region will be dilated or eroded. A circular filter was used in our implementation, but a square one will be used in the explanations that follow.

Dilation

The filter is centered on each black pixel in the skin image, called the target pixel. If at

least a single white pixel of the filter overlaps with a white pixel of the skin image, then

the target pixel is made white. If none of the filter overlaps, the target pixel remains

black.

Dilation shown with 5x5 rectangular filter. The red dot represents a pixel that remains

unchanged, while the green dot shows one that will change color to gray. The picture on

the right shows the original pixels with a dilation using a 3x3 filter (light gray) and 5x5

filter (lightest gray).
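The dilation rule just described can be sketched as follows. This is a Python rendering, not the report's C# code; the image and filter are lists of 0/1 rows:

```python
def dilate(img, se):
    """Binary dilation: a target pixel becomes white if any white pixel
    of the structuring element, centered on it, overlaps a white pixel
    of the input image."""
    h, w = len(img), len(img[0])
    sh, sw = len(se), len(se[0])
    cy, cx = sh // 2, sw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hit = False
            for j in range(sh):
                for i in range(sw):
                    yy, xx = y + j - cy, x + i - cx
                    if se[j][i] and 0 <= yy < h and 0 <= xx < w and img[yy][xx]:
                        hit = True
            out[y][x] = 1 if hit else 0
    return out
```

For example, dilating a single white pixel with a full 3x3 filter grows it into a 3x3 white block.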

Erosion

This time, the filter is centered on each white pixel in the skin image, again called the

target pixel. If any part of the filter overlaps with a black pixel of the skin image, the

target pixel is made black. If the entire filter is overlapping with white pixels from the

skin image, then the target pixel remains white.

Erosion shown with 3x3 rectangular filter. The green dot represents a pixel that remains

unchanged, while the red dot shows one that will change color to white. The picture on

the right shows the original pixels with an erosion using a 3x3 filter (light gray) and 5x5

filter (darkest gray).
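Erosion mirrors dilation, and closing is simply dilation followed by erosion with the same filter. A Python sketch of the erosion rule (again ours, not the report's C#):

```python
def erode(img, se):
    """Binary erosion: a white target pixel stays white only if every
    white pixel of the structuring element, centered on it, overlaps
    a white pixel of the input image."""
    h, w = len(img), len(img[0])
    sh, sw = len(se), len(se[0])
    cy, cx = sh // 2, sw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ok = img[y][x] == 1
            for j in range(sh):
                for i in range(sw):
                    yy, xx = y + j - cy, x + i - cx
                    if se[j][i] and not (0 <= yy < h and 0 <= xx < w and img[yy][xx]):
                        ok = False
            out[y][x] = 1 if ok else 0
    return out
```

Eroding a 3x3 white block with a full 3x3 filter, for instance, leaves only its center pixel.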

Hole Map

A hole map is generated as the difference between the closed skin map and the skin map.

The operation is performed on a per-pixel basis as follows:

Equation 1

P_hole(x, y) = P_CSkin(x, y) − P_Skin(x, y)

The result is a new binary image representing the “holes” in the skin regions.
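The hole map keeps exactly the pixels that closing filled in, which in Python can be sketched as a per-pixel difference (function name ours):

```python
def hole_map(closed, skin):
    """Holes are pixels set in the closed skin map but not in the
    original skin map: the per-pixel difference of the two maps."""
    return [[1 if c and not s else 0 for c, s in zip(crow, srow)]
            for crow, srow in zip(closed, skin)]
```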

Labeling

Each skin and hole region must be labeled so it can be treated as a single unit instead of simply a group of connected pixels. Labeling is done by assigning unique tags (numbers) to each pixel in the image. Black pixels are given a tag of 0, while white pixels receive tags greater than zero, drawn from a counter that is incremented each time a new tag is issued.

The image is scanned left to right, row by row. Only white pixels are considered for

tagging. A target pixel will have four neighbors that have already been tagged. The left,

up-left, up, and up-right will be used to assign a tag to the target pixel. There are three

possible cases that can occur.

The first case is if none of the neighbor pixels are white. The neighbors will all have tags

of 0. A new unique tag is assigned to the target pixel using the value of the counter. The

counter is incremented.

The second case is if only one of the neighbor pixels is white. This neighbor will have a

non-zero tag that will be assigned to the target pixel since the two pixels are touching

(they are neighbors).

The third case is the most complex. It is when more than one neighbor of the target pixel

is white. In this case, there are two other possibilities. If all of the neighbors have the

same tag, the tag is simply copied to the target pixel. If two neighbors have different tags,

this means that the algorithm has been assigning different tags to a single connected

region that must only have one tag. The different tags must be placed in an equivalence

class that will be resolved later. A list of classes (basically, a list of lists) is used to keep

track of equivalence classes. We will call this the class list.

This situation is only possible if the up-left and up-right neighbors are white while the up neighbor is black, or if the left and up-right pixels are white while the up-left and up pixels are black. In the first case, the left neighbor will have the same tag as the up-left neighbor, since those pixels have already been processed and are themselves neighbors; in the second case, the only two white neighbors are not adjacent to each other and so may carry different tags. Either way, two different tags, tag-l and tag-r, must be resolved using the class list.

The two possible cases of neighbors having different tags.

The class list is searched for tag-l and tag-r. Either zero, one, or two equivalence classes will be returned. If no class is found, neither tag has been encountered before, so both are put into a new class, which is added to the class list. If one class is found, the tag whose class was not found is added to that class. If two classes are found (one for each tag), the classes are combined and one of them is deleted; this is done only when the two classes are distinct, since finding the same class for both tags implies they have already been equated.

The original shape and first and second equivalence class matching.

Once every white pixel in the image has been processed this way, the algorithm does

another pass through each white pixel, this time to equate equivalent tags. If a pixel’s tag

is found in a class, it is assigned the first tag in the class (which is a list). This way, each

pixel whose tag is in the same class will have the same tag.
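The two passes can be sketched as below. For brevity this Python sketch resolves equivalences with a union-find structure instead of the report's explicit class list; the outcome is the same, with every pixel of a connected region ending up with a single tag:

```python
def label(img):
    """Two-pass connected-component labeling with 8-connectivity."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}

    def find(t):
        # Follow parent links to the class representative (path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    next_tag = 1
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            # Previously scanned neighbors: left, up-left, up, up-right.
            neigh = []
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and labels[yy][xx]:
                    neigh.append(labels[yy][xx])
            if not neigh:
                # Case 1: no white neighbors, issue a new tag.
                labels[y][x] = next_tag
                parent[next_tag] = next_tag
                next_tag += 1
            else:
                # Cases 2 and 3: copy a neighbor tag; record equivalences.
                tag = min(neigh)
                labels[y][x] = tag
                for t in neigh:
                    union(tag, t)
    # Second pass: replace every tag with its class representative.
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels
```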

Identify Faces

The closed skin map and hole map are both labeled. The hole map is overlaid on top of

the skin map and each hole that is above a skin region is counted. This leaves the skin

map with a hole count for every labeled region.

Aside: the hole map is also labeled so that each hole is only considered once per skin

region.

What is left is a skin map of labeled regions and their associated hole counts. Regions like walls or clothing will have few holes compared with face regions. Complex statistics could be used to decide whether a region has a “high” hole count; in our implementation, a hole count is considered “high” if it is greater than the average number of holes per region.
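The hole-counting step can be sketched as follows. This is a Python illustration with names of our choosing; both inputs are label images (lists of rows of integer tags, 0 for background) as produced by the labeling step:

```python
def find_faces(skin_labels, hole_labels):
    """Count the distinct hole labels lying over each skin region and
    keep the regions whose count exceeds the average count."""
    holes = {}  # skin tag -> set of hole tags seen over that region
    for srow, hrow in zip(skin_labels, hole_labels):
        for s, hl in zip(srow, hrow):
            if s and hl:
                holes.setdefault(s, set()).add(hl)
    counts = {s: len(tags) for s, tags in holes.items()}
    if not counts:
        return []
    avg = sum(counts.values()) / len(counts)
    return [s for s, c in counts.items() if c > avg]
```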

Finally, as visual output, the regions determined to be faces are enclosed by a rectangle

that is drawn on the original image.

Limitations and Improvements

Lighting Conditions

The rules for identifying skin are based entirely on color. When a poorly lit image is

captured, the colors of the skin pixels are not the same as in a well lit image. Because of

this discrepancy, skin regions are not correctly identified if the image is too dark.

A correction value could be computed from the average brightness or contrast of the image and used to adjust the thresholds in the skin rules.

False Positives and Recognition Problems

Regions that are not faces are sometimes incorrectly identified as faces. This happens when a region is skin colored and has a high number of holes – the exact metrics we use to detect faces. For example, an arm with a tattoo or a beige shirt with a graphic will be classified as skin while the tattoo or graphic will not, creating holes. These holes contaminate the hole count for that region, causing it to be incorrectly detected as a face.

Also, if a face is far away from the camera it will be identified as a small patch of skin. Because it is far away, it is unlikely that the facial features will be captured with enough resolution, resulting in skin with no holes. This causes real faces to go undetected while other skin-colored regions with holes are incorrectly detected.

The size of the holes may be taken into consideration when they are overlaid on top of the skin regions, so that only larger holes are counted.

Hands and walls are detected as faces.

Imperfect Closing

The closing procedure uses dilation and erosion to remove holes from a skin region.

Dilation makes the regions “fatter” and this has the side effect of linking two

disconnected skin regions together. These regions remain connected after the erosion

process, contaminating the image with a large skin region. If the two previously separate

regions were faces, a new, connected, region exists with a very high amount of holes –

twice the normal amount for a face.

This high hole count will bring the average holes per region up and will cause other real

faces to not be identified. As well, two faces will be detected as a single face, due to the

fact that both faces have been combined into a single region.

A better closing routine could solve this problem, but it may be more computationally expensive than dilation and erosion. Smaller filters may be used that do not make the regions much “fatter” during dilation, but then holes may not be completely closed, causing problems when generating the hole map.

The dilation blends many separated regions into a single connected region that will be

classified as a very large face.

Speed

The procedure just outlined performs at approximately 2 frames per second on a dual-

core machine (1.66GHz per core) using frames that are 320 x 240 pixels. This is

sufficient for refreshing the position of the rectangles on top of the original image in real

time. However, higher resolution frames perform much slower and cannot be used for

real time applications.

The skin mapping, dilation, erosion, and hole mapping routines each have a computational complexity of O(n), where n is the number of pixels in the image. However, the labeling routine is slower, with a complexity of O(n^2) due to the equivalence class matching.

Scaling the original image to half its dimensions will result in a loss of detection

accuracy, as the holes of the original image may be lost when the image is scaled.

Credits

[1] Udo Ahlvers, Ruben Rajagopalan, Udo Zölzer. “Model-Free Face Detection and Head Tracking with Morphological Hole Mapping”, in 13th European Signal Processing Conference: EUSIPCO 2005, Antalya, Turkey, September 4-8, 2005.

[2] Robert Fisher, Simon Perkins, Ashley Walker and Erik Wolfart. (2003). Morphology – Dilation. Retrieved Dec. 2007, from HIPR – HyperMedia Image Processing Reference. http://www.cee.hw.ac.uk/hipr/html/dilate.html

[3] Robert Fisher, Simon Perkins, Ashley Walker and Erik Wolfart. (2003). Morphology – Erosion. Retrieved Dec. 2007, from HIPR – HyperMedia Image Processing Reference. http://www.cee.hw.ac.uk/hipr/html/erode.html

[4] Robert Fisher, Simon Perkins, Ashley Walker and Erik Wolfart. (2003). Image Analysis – Connected Components Labeling. Retrieved Dec. 2007, from HIPR2 – Image Processing Learning Resources. http://homepages.inf.ed.ac.uk/rbf/HIPR2/label.htm

[5] EasyImage Camera Library, Saul Greenberg and Mark Watson, iLab, Department of Computer Science, University of Calgary. http://grouplab.cpsc.ucalgary.ca/cookbook/index.php/Toolkits/EasyImage