multi-person fever screening using a thermal and a visual camera845466/... · 2015. 8. 11. ·...

Multi-person fever screening using a thermaland a visual camera

Jorgen Ahlberg∗†, Nenad Markus‡, Amanda Berg∗

∗Termisk Systemteknik AB, Linkoping, Swedenwww.termisk.se

Email: {jorgen.ahlberg, amanda.berg}@termisk.se

†Visage Technologies AB, Linkoping, Swedenwww.visagetechnologies.com

Email: [email protected]

‡Human-Oriented Technologies Laboratory,Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia

hotlab.fer.hrEmail: [email protected]

Abstract—We propose a system to automatically measurethe body temperature of persons as they pass. In contrastto exisitng systems, the persons do not need to stop andlook into a camera one-by-one. Instead, their eye cornersare automatically detected and the temperatures thereinmeasured using a thermal camera. The system handlesmultiple simultaneous persons and can thus be used wherea flow of people pass, such as at airport gates.

I. INTRODUCTION

At each outbreak of an epidemic disease, such as Ebolaor Avian influenza, various measures are taken in order tolimit its spread. One such measure is to detect travelerscarrying a fever, and deny them entrance to their desti-nation country before medical examination and possiblyquarantine. Thermal cameras have been used for detectionof travelers with increased temperatures, but has shown tobe impractical since they stop the flow of people. Existingsystems require that each person stops and looks intoa thermal camera where an operator manually approveor stop the traveler. Other systems automatically screenpassing travelers, but anyway requires an operator tobe watching the screen and stop any detected feverishpersons.

In contrast, our system uses both a visual and a thermalcamera. Faces and eye corners are detected using thevisual camera, and the visual imagery is presented to theoperator. Using the coordinates from the visual detections,the eye corners are found in the thermal image as well,and their temperatures measured. If abnormal eye cornertemperatures are found, the visual image of the personis placed on an alarm list, and the operator is notified.Thus, the operator does not need to be looking at thescreen all the time, and the screen might even be placedat a different place than the camera.

In order to build such a system, several subproblemsneed to be solved. First, faces and eye corners need to bedetected in the visual images. Second, the faces need to

be tracked, so that each new image does not result in anew alarm. Third, the correspondence between visual andthermal image coordinates need to be established. Fourth,the temperature needs to be measured and suitable alarmcriteria need to be designed.

A. Outline

In the following section, a short overview of the systemis given. Then, in Section III, the method we use andits individual components are presented. Section IV tellsabout our preliminary experiments, and our conclusionsare discussed in Section V.

II. SYSTEM OVERVIEW

Our system combines a thermal and a visual camerain a single box. The purpose of the visual camera isto present easily interpretable images to the operator –thermal images are much less useful for manual iden-tification of a person than a color image. The purposeof the thermal camera is to measure the temperatures ofthe subjects. The hardware prototype is shown in Fig. 1.The thermal camera is a FLIR A65 with a resolutionof 640 × 512 pixels and a framerate of less than 9frames per second. Even if faster models are available, thisframerate was deemed to good enough, and, moreover, itcomes with no export restrictions. The visual camera isa Microsoft LifeCam Studio with 1080p HD resolutionand a considerably higher framerate. For a final product,a completely different visual camera will be selected.

The cameras are connected to a standard PC where theuser interface shown in Fig. 2 is presented to the user.Detected faces are shown with rectangles, with colorsmarking their status. Yellow indicates that no robust tem-perature measurement is made so far, green indicates thatthe person has a normal temperature, and red indicatesthat the person has a abnormally high temperature.

Fig. 1. The hardware prototype mounted on a tripod. The visual cameraon top and the thermal camera below.

Fig. 2. The user interface prototype. Persons having an abnormallyhigh temperature goes to the alarm list in the lower part of the window(this particular list is obviously fake).

III. PROPOSED METHOD

We propose to use the visual camera for detecting faces.The main reason is that training data for a large varietyof situations and faces are available in the form of visualimagery, but more difficult to come over in the thermaldomain. For the detection, we employ a sliding windowapproach and a random forest classifier described inSection III-A. Faces are kept track of using a multi targettracker described in Section III-B. Next, eye corners arefound by using a random forest regressor in the detectedface area (Section III-C). The eye corners’ positions in thevisual images are transformed to corresponding thermalimage coordinates (Section III-D), and the temperature ismeasured as described in Section III-E

A. Face detection

We use a modification of the standard Viola-Jones facedetection framework [1]. The basic idea is to scan theimage with a cascade of binary classifiers at all reasonablepositions and scales. An image region is classified asa face if it successfully passes all the members of thecascade. Each binary classifier consists of a boostedensemble of decision trees with pixel intensity comparisonbinary tests (”Ix1,y1

< Ix2,y2?”) in their internal nodes.

The system can achieve real-time frame rates because thefirst few classifiers are simple and fast (i.e., they consist ofjust a few decision trees). Thus, the majority of non-facesare rejected by early stages, i.e., with little processing timespent. The details of the developed system are given ina technical report [2] and the source code is available athttps://github.com/nenadmarkus/pico.

In addition, the face pose is estimated using the VisageSDK. The face pose is used for selecting measurementframes (see Section III-E below).

B. Multi-face tracking

Faces are detected in each new frame, independent ofwhat happened in the last frame. If face detections arenot tracked from frame to frame, each person with afever will give rise to multiple fever detections. Therefore,a multi-target tracking framework is used to associateface detections from different frames. A collection ofassociated face detections belonging to the same objectfor a number of frames is hereby called a track.

Each track has a Kalman filter which is used to predictthe position and velocity of the object in the next frame.The Kalman filter is initialized with a static linear obser-vation model and a static linear process model. Constantvelocity is assumed. Measurements are found as

zt =

[1 0 0 00 1 0 0

]xt + v (1)

where z is the measured position at time t and v is themeasurement noise. xt is the state

xt =

pxpyvxvy

(2)

which contains the position (px, py) and velocity (vx, vy)of the object in the image plane. The constant velocitymotion model

xt =

1 0 ∆T 00 1 0 ∆T0 0 1 00 0 0 1

xt−1 + w (3)

used describes how the state is updated. w is the processnoise and ∆T the time difference between the current andthe last frame.

Given the predicted positions of all current tracks,associations of new face detections to existing tracks canbe made. Here, a Global Nearest Neighbour is used,which means that only a single hypothesis, the global

nearest neighbour, is associated to each detection. Facedetections can also be associated to false alarms or newdetections, depending on what is the most probable case.The association method used is the auction algorithmwhich operates on a so called assignment matrix. Formore details, see, for example, the textbook by Blackmanand Popoli [3].

C. Eye corner detectionEye corner detection is performed with a landmark

point localization framework from [4]. The basic ideais to use a multi-scale sequence of regression tree-basedestimators. We assume that the face region is alreadyknown, and the problem is thus rather to localize thanto detect the eye corners, i.e., a regression problem. Thelandmark point position is estimated with an ensembleof regression trees based on pixel comparison binarytests (same ones as in Section III-A). Unfortunately, theaccuracy and robustness of this process critically dependon the scale of the rectangle within which we performthe estimation. If the rectangle is too small, we risk that itwill not contain the landmark at all due to the uncertaintyintroduced by face tracker/detector. If the rectangle is big,the localization is more robust but accuracy suffers. Tominimize these effects, we learn multiple tree ensembles,each for estimation at a different scale. The methodproceeds in a recursive manner, starting with an ensemblelearned for the largest scale (i.e., whole face region).The obtained intermediate result is used to position therectangle for the next ensemble in the chain. The processcontinues until the last one is reached. Its output isaccepted as the final result.

D. Camera correspondenceThe visual camera is positioned on top of the thermal

camera, and the horizontal axes of the two cameras arealigned. Detecting a face in the visual image thus also tellsus the horizontal position of the face in the thermal image,but without knowing the distance to the detected face, thevertical position of the face in the thermal image cannotbe computed. However, knowing the the approximatedistance to the face, say ±50%, the error in the verticalcoordinate will be up to 50% of the distance betweenthe two cameras’ optical axes. Simply by detecting facesonly within a certain size interval (limiting the size of thesliding window in Section III-A) we can achieve at leastthat precision, and thus know the vertical position of theeye corners in the thermal image with a precision of afew centimeters in world coordinates.

In practice, the problem of the cameras not beingsynchronized is worse. If the face is moving to fast inthe image plane, its position will be different when thevisual and thermal images are acquired. Thus, we exploitthe information from the tracker, and do not use imageswhere the in-plane velocity components of the state vectorare too large.

E. Temperature measurementThe camera is radiometric, that is, the pixel values

correspond linearly to estimated temperatures of the

pictured object. These estimates are computed by thecamera, using object reflectance, ambient temperature, airtransmittance and temperature, and distance to the objectas input parameters. Since human skin and eyes haveknown and low reflectance in the used waveband, almostall of the radiation received by the camera is emitted(and not reflected). Thus, the temperature measurement isrelatively insensitive to an erroneous ambient temperature.If the ambient temperature is 10 K higher than theparameter value, the reflection in the skin/eye would resultin a bias in the temperature estimate with around 0.3 K.Moreover, at the short distances that relevant here, theair is essentially transparent, and its temperature does notinfluence the measurement.

A worse problem is the camera itself. As the cameraoperates, its temperature will increase and thus emitmore radiation. In spite of the internal thermometer andcompensation for this, the camera might well drift a fewdegrees. In order to have absolute measurements, we thusneed a reference temperature. Suitably, a patch of a high-emissivity material with a contact thermometer is placedso that it is visible by the thermal camera.

Another approach is to not care about absolute tem-peratures, but instead collect statistics of the measuredtemperatures and warn when a person with a deviatingtemperature appears. For example, the mean µ and vari-ance σ2 of the temperatures could iteratively be estimated,and then a warning be given when a measurement exceedsa certain threshold (for example, t = µ+ c ·σ, for a user-configurable parameter c). We have chosen to implementrelative as well as absolute temperature alarms.

In order to use a thermal image for temperature mea-surement, a set of criteria needs to be met. First, weneed to have high enough resolution on the measuredobject, so that we have pure pixels on the eye corner.As mentioned, we already require the faces to be withina certain distance interval for the camera correspondenceto be valid. Second, frames with large in-plane motionare not used. Third, the face should be approximatelyfrontal. For each detected face, temperature measurementsare accumulated and the maximum is used.

IV. EXPERIMENT

The performance of the face detection and eye cornerdetection have been documented elsewhere [2], [4], andthe experimental evaluation here is rather on a systemlevel. The purpose is twofold: To identify what problemsare critical to the usability of the system, and evaluatethe output of the entire system, that is, the temperaturemeasurements.

We have tried out the system by setting it up in acorridor at our office and letting our colleagues passthe cameras. Before and after each passing, the subject’seye corner temperature was carefully measured manuallyusing a FLIR T650sc high-end handheld thermal camera.This camera has a sensitivity of less than 20 mK, butwithout a reference temperature it can still give a temper-ature measurement error of an entire Kelvin. This is lessof a problem though, since the camera is relatively stable

when it has warmed up. Thus, we can use it as a referencefor the relative temperatures, that is, verify that thetemperature differences between the subjects are correctlymeasured by the system. The absolute temperatures, ifwanted, can then be found using a temperature reference.

After compensating for the absolute temperature dif-ference between the two cameras, we find that we canmeasure the temperature within half a Kelvin. The numberof tests is at the time of writing too small to get anyreliable statistics on the number of failures, i.e., personspassing without being measured.

V. CONCLUSION AND FUTURE WORK

The system presented here is an early prototype, andthere are several more or less obvious improvements tobe addressed. One such is the camera correspondencementioned in Section III-D. By using a visual camera thatcould be triggered by the thermal camera for acquiringan image, the synchronization problem can be solved.Moreover, using a visually transparent thermal mirror, thecameras could be arranged to have a common optical axis,which would solve the correspondence problem.

Another problem is that the multi-target tracker some-times fails in the association. By adding a template-basedtracker (such as DSST [5] or EDFT [6]), this could bemitigated. A third problem is people wearing glasses, asglass is intransparent for thermal radiation in the usedwaveband. Presumably, a detection algorithm for findingglasses in a thermal image is easily implemented.

ACKNOWLEDGMENT

The research leading to these results has receivedfunding from the European Union Seventh FrameworkProgramme (FP7/2007-2013) under grant agreement n°312784. Specifically, the multi-target tracker was imple-mented for the FP7 project P5. Funding for developingthe prototype has been received in the form of a loan fromAlmi. A patent is pending.

REFERENCES

[1] P. Viola and M. Jones, “Rapid object detection using a boostedcascade of simple features,” in CVPR, vol. 1, 2001, pp. I–511–I–518.

[2] N. Markus, M. Frljak, I. S. Pandzic, J. Ahlberg, and R. Forchheimer,“Object detection with pixel intensity comparisons organized indecision trees,” Tech. Rep. [Online]. Available: http://arxiv.org/abs/1305.4537

[3] S. Blackman and R. Popoli, Design and Analysis of ModernTracking Systems. Artech House, Norwood, MA, 1999.

[4] N. Markus, M. F. I. S. Pandzic, J. Ahlberg, and R. Forchheimer,“Eye pupil localization with an ensemble of randomized trees,”Pattern Recognition, vol. 47, pp. 578–578, 2014.

[5] M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg, “Ac-curate scale estimation for robust visual tracking,” in Proceedingsof the British Machine Vision Conference. BMVA Press, 2014.

[6] M. Felsberg, “Enhanced Distribution Field Tracking using ChannelRepresentations,” in Proceedings of the IEEE International Confer-ence on Computer Vision Workshops (ICCVW), 2013. IEEE, 2013,pp. 121–128.

multi-person fever screening using a thermal and a visual camera845466/... · 2015. 8. 11. ·...

Documents