object detection: a reinforcement learning approach using adversarial … · 2019-05-13 ·...

Object Detection: A Reinforcement Learningapproach using Adversarial Models

Angel A. CantuDepartment of Computer Science

University of Texas Rio Grande ValleyEdinburg, United States of America

[email protected]

Nami AkazawaDepartment of Computer Science

University of Texas Rio Grande ValleyEdinburg, United States of America

[email protected]

Abstract—In this research experiment we attempt toremove the need of a human to label objects in images as apre-processing procedure for object detection algorithms.We come up with a model that uses a reinforcementlearning agent and a artificial neural network to form anadversarial network. This network is used to construct areward system for the agent as the agent performs. Thenetwork will learn which actions are best for the agent asthe network learns the similarities of some dataset and theagent’s output. The agent performs in a dataset of images,where it chooses what dimensions to crop from the image.For now, we consider a fixed bounding box of size 80×80.This cropped image is send to the network for evaluation,where the network will determine if it is similar to someobject.

Index Terms—Machine Learning, Adversarial Networks,Object Detection, Reinforcement Learning

I. INTRODUCTION

Object detection is a computer technology related tocomputer vision that deals with the task of detectinginstances of some particular object in digital imagesand videos. Common uses are facial detection, im-age retrieval, video survelliance and many more. Themost common methods of achieving functional objectdetection generally fall within machine learning-basedalgorithms and deep learning. One of the most populardeep learning algorithms is that of You Only Look Once(YOLO). YOLO’s procedure is to use a neural networkthat divides the image into regions and predicts boundingboxes and probabilities for each region. These boundingboxes are weighted by the predicted probabilities. Figure1 depicts YOLO’s object detection algorithm in detail.Although YOLO has great performance, there is a ex-pensive pre-processing method that is needed in orderto prepare some dataset for training. This pre-processingrequires the researcher to label the objects in the imagesusing a free software distributed by YOLO. For smalldatasets this is easily dealth with; with large datasets,

(a)

(b)

Figure 1: YOLO’s object detection examples.

however, is a long process that usually requires thecombined work of many researchers. The free softwareoffered by YOLO is shown in Figure 2. This software isavailable at their GitHub repository. In this software, youselect some folder which contains all your images. Youuse the buttons in the software to select images one-by-one and use the selection tool to decide the dimensions ofthe bounding box. This produces some .txt file with thedimensions and the image’s file name. This informationis fed into YOLO’s network.

(a)

Figure 2: YOLO’s object detection examples.

II. MOTIVATION

As our technology advances over the years, thereexists some areas of advancement where human interac-tion is necessary. Making accurate pre-processing oftenrequires a human to be involved in the procedure. Wetake the first steps in solving the problem of automizationin data pre-processing. The goal of this research is toautomatize the process of image labeling by using a rein-forcement learning agent in an adversarial environment.We aim to speed up the advancement of technologicalexperiments with the use of intelligent agents in orderto save and create time for humans.

III. METHODS

To address the problem of human labeling, we use astate-of-the-art machine learning-based algorithm calledGenerative Adversarial Networks (GANs). GANs area class of machine learning systems that puts twoneural networks to contest against each other, forminga zero-sum game framework. The system consists oftwo networks called the generator and discriminator.The generator network’s generates candiates while thediscriminator network evaluates them. This zero-sumgame operate in terms of data distributions; typically, thegenerator learns to form a mapping function from latentspace to the data distribution of the real dataset. Thereal dataset consists of the datasets which the generatorwill essentially learn to generate. These generated datasare send to the discriminator for evaluation. If thegenerator “fools” the discriminator, the generator’s errorrate decreases while the discriminator’s increases. Tofool the discriminator is the discriminator classifying thegenerated data as real.

The goal of the agent is to receive an image andproduce a center point where the bounding box should

(a)

Figure 3: GAN network architecture.

be placed. For now, we consider only a bounding box ofsize 80× 80. Formally, the agent’s details are:

• Environment: The agent’s environment is best de-scribed as the images that require annotation.

• Actions: The agent’s actions consists of givingtwo integers that dictate the center location of thebounding box. These two integers are less than orequal to the size of the image.

• Observation: The observation after each action isthe image used as the environment with the bound-ing box placed in the image. We do this to allow theagent to see where the bounding box was placed.

• Reward: The discriminator uses binary classifica-tion to detect which images are real or fake. Realimages are classified as [1] and fake as [0]. After theagent sends the bounding box image to the discrim-inator, the agent uses the prediction as its reward. Ifthe discriminator detects the agent’s cropped imageas real, then the prediction will be of higher value(closer to 1).

By using an Agent as our generator in a GAN systemwe do not only automatize the labeling part for objectdetection but create an agent that is able to detectobjects itself. The environment is depicted in figure 4.

(a)

Figure 4: Environment for the agent is some image.

Of course, the environment is some image where someparticular object exists (a face for this example). Theagent is to receive this fixed sized image and decide

2

at which locations does the object exist in the image.Our environments have dimensions 225× 300. We alsoconsidered converting the images to black and white.After the agent produces two integers of which are used

(a)

Figure 5: The agent performs and crops some fixed spacefrom the image.

to create some bounding box, we send those integersand the environment (image) through some piece ofcode which removes that particular area from the image.Figure 5 depicts this process. Immediately after, we send

(a)

Figure 6: The cropped image is send to the discriminatorfor validation.

the cropped image to the discriminator for evaluation.The discriminator has access to the real dataset, whichconsists of ground-truth images of the particular object.The discriminator essentially is what dictates how goodthe agent may perform. We therefore consider differentmethods to training the discriminator:

• We consider pre-training the discriminator beforeusing it for evaluations. There is one problemthough: although we have the ground-truth forthe real datasets, we do not have access to someground-truth for the non-object data. We have at-tempted pre-training the discriminator on the realdata and on random crops from the discriminator’sdata, though the reward system does not seem towork well this way.

• We also consider training the discriminator concur-rently with the agent. This allows for both actors

to maintain an equal level of performance. Wecurrently use this method to test our model.

(a)

Figure 7: The discriminator produces some predictionvalue x which is used for the agent’s reward system.

After the cropped images are send to the discriminator,the discriminator will make some prediction on thefalsity or truth of the agent’s cropped image. If thecropped image is similar to the real dataset (similar tothe object), then the discriminator should output somehigh integer value to manage the predition to the class[1]. If the cropped image is not similar to the real data,then the prediction will be closer to values of [0].

IV. DATA

We used many images from online resources. Mostof these resources were from universities that offer freeface datasets online. We also use OpenAI’s baselinesalgorithms. We currently test our method using PPO2.

(a) (b) (c)

Figure 8: Real dataset that the discriminator uses to makeevaluations.

(a) (b) (c)

Figure 9: Fake dataset that the agent uses to makebounding boxes.

V. RESULTS

For our first evaluation, we use a simple environment:A single star in a white background, as depicted inFigure 10. As we let our model train longer time, ouragent learns to put a bounding box in a correct position.Once we saw a good result from our first evaluation,we decided to perform face detection using a face

3

(a)

Figure 10: Simple environment consisting of a singlestar.

dataset that is publicly available online. This increasesthe difficulty for our model to learn since every faceimages are under various conditions such as illumination,background, face alignment and facial expression. Thisrequired a longer training than the first experiment forus to see some good results. Despite the fact that ourfixed bounding box size 80× 80 is not large enough tocover the whole face in the image, as we keep trainingour model, it was able to crop. In Figure 11, we see

(a) (b)

Figure 11: Two examples of the agent’s performance.

the results after a few hours of training. However, therewere still many misplacements of bounding boxes onthe faces. We suspect that the requirement of a fixedbounding box limits the agent to what it is capable ofperforming. For our future work, we will allow the agentto choose a larger bounding box, along with multiplebounding boxes.

ACKNOWLEDGMENT

We like to thank Adolfo Gonzalez III for helping uswith experimental questions and idea sharing. We alsolike to thank Dr. Kim for the help and advice towardsthe current state of this project.

REFERENCES

[1] J. Redmon, S. Divvala, R. Girshick, A. Farhadl. “You Only LookOnce: Unified Real-Time Object Detection,” arXiv.org, May 9,2016.

[2] Goodfellow, Ian J., Pouget-Abadie, Jean, Mirza, Mehdi, Xu,Bing, Warde-Farley, David, Ozair, Sherjil, Courville, Aaron C.,and Bengio, Yoshua. Generative adversarial nets. NIPS, 2014

4

object detection: a reinforcement learning approach using adversarial … · 2019-05-13 ·...

Documents