
Pattern Recognition Letters journal homepage: www.elsevier.com

On the Effect of Selfie Beautification Filters on Face Detection and Recognition

Pontus Hedman, Vasilios Skepetzis, Kevin Hernandez-Diaz, Josef Bigun, Fernando Alonso-Fernandez∗∗

School of Information Science, Computer and Electrical Engineering, Halmstad University, Box 823, Halmstad SE 301-18, Sweden


Keywords: Face Detection, Face Recognition, Social Media Filters, Beautification, U-NET, Convolutional Neural Network

ABSTRACT

Beautification and augmented reality filters are very popular in applications that use selfie images captured with smartphones or personal devices. However, they can distort or modify biometric features, severely affecting the capability of recognizing individuals' identity or even detecting the face. Accordingly, we address the effect of such filters on the accuracy of automated face detection and recognition. The social media image filters studied either modify the image contrast or illumination, or occlude parts of the face with, for example, artificial glasses or animal noses. We observe that the effect of some of these filters is harmful both to face detection and identity recognition, especially if they obfuscate the eyes or (to a lesser extent) the nose. To counteract such effects, we develop a method to reconstruct the applied manipulation with a modified version of the U-NET segmentation network. This is observed to contribute to better face detection and recognition accuracy. From a recognition perspective, we employ distance measures and trained machine learning algorithms applied to features extracted with a ResNet-34 network trained to recognize faces. We also evaluate whether incorporating filtered images into the training set of the machine learning approaches is beneficial for identity recognition. Our results show good recognition when filters do not occlude important landmarks, especially the eyes (identification accuracy >99%, EER <2%). The combined effect of the proposed approaches also allows us to mitigate the effect produced by filters that occlude parts of the face, achieving an identification accuracy of >92% with the majority of perturbations evaluated, and an EER <8%. Although there is room for improvement, this contrasts with the case where neither U-NET reconstruction nor training with filtered images is applied, in which the accuracy with filters that severely occlude the eyes is <72% (identification) and the EER is >12%.

© 2021 Elsevier Ltd. All rights reserved.

1. Introduction

Selfie images captured with smartphones enjoy huge popularity and acceptability, and social media platforms centered around sharing such images offer several filters to "beautify" them before uploading. Filtered images are more likely to be viewed and commented on, achieving higher engagement (Bakhshi et al., 2021), and selfies are also increasingly used in security applications, since mobiles have become data hubs used for all types of transactions (Rattani et al., 2019).

A challenge posed by such filters is that facial features may be distorted or concealed. Given their low cost and immediate availability, they are a commodity that many people use daily, not necessarily with the aim of compromising face recognition systems. However, the capability of recognizing individuals may be affected, and even the possibility of detecting the face itself before any recognition can take place. This may be crucial, for example, in crime investigation on social media (Powell and Haynes, 2020), where automatic pre-analysis is necessary given the magnitude of information posted or stored in confiscated devices (Hassan, 2019). There are multiple examples of crimes captured on mobiles (Berman and Hawkins, 2017; Pagones, 2021), with the most striking lately being the use of posted videos of the US Capitol to identify rioters (Morrison, 2021).

∗∗Corresponding author: Tel.: +46-(0)35-167100; fax: +46-(0)35-120348; e-mail: [email protected] (Fernando Alonso-Fernandez)


Fig. 1: Examples of the 9 Instagram filters applied: (a) Aden, (b) Ashby, (c) Dogpatch, (d) Gingham, (e) Hudson, (f) Ludwig, (g) Skyline, (h) Slumber, (i) Valencia.

Fig. 2: Examples of the 4 AR filters applied: (a) dog nose, (b) glasses, (c) shades 95%, (d) shades 100%.

There is, therefore, interest in studying the consequences of different levels of image manipulation and concealment of facial parts due to these "beautification" filters. It would also be of interest to evaluate methods that remove the filters' effect to avoid a decrease in face detection and recognition performance. The purpose and contributions of this work are therefore multi-fold: i) we summarise related work on digital image manipulation, in particular with the purpose of facial beautification; ii) we study the impact of image enhancement and Augmented Reality (AR) filters both on the detection of filtered faces and on the recognition of individuals; iii) we develop a method to reverse some of the applied manipulations (in particular, those that entail obfuscating parts of the face) and evaluate its benefit both for detection and for identity recognition of filtered faces; and iv) we study whether training the identity recognition system with manipulated images helps to increase recognition accuracy when such manipulations are present in test images.

As manipulation methods, we use the most popular selfie filters on Instagram (Canva, 2020), which mainly modify contrast/lighting, as well as AR filters that obfuscate the eyes or nose by adding items like glasses or animal noses. These are applied to 4,324 images of the LFW database (Huang et al., 2007). Face detection is carried out with the CNN of Geitgey (2018), while feature extraction is done with a ResNet-34 pre-trained for face recognition (King, 2017). To reverse the manipulations, we use a modified U-NET (Ronneberger et al., 2015) that we train on 203K images of the CelebA database (Liu et al., 2015). We focus on reversing modifications to the eye region, since these are observed to have the biggest impact on face detection and recognition. For example, obfuscating the eyes reduces face detection accuracy to <90%, while modifying contrast/lighting or obfuscating the nose does not have a significant impact (>98%). A similar trend is observed in face identification and verification experiments. After applying the generative U-NET reconstruction, detection is recovered to the levels of the original unmodified images. This highlights both the impact of eye occlusion on the face detector and the benefit of the reconstruction method employed. t-SNE scatter plots of the vectors given by the feature extraction network are also provided, showing progressively higher intra-class (identity) and smaller inter-class variability as the nose or the eyes become occluded. This mirrors the results of face detection, and suggests that identity separation will be impacted as well.

The rest of the paper is organized as follows. Related work on facial manipulation is outlined in Section 2. The beautification filters are described in Section 3, and the U-NET reconstruction method in Section 4. The employed data and the experimental protocol are then detailed in Section 5, with the obtained results discussed in Section 6. Finally, conclusions are drawn in Section 7.

2. Related Works

Facial manipulation can be done in the physical or the digital domain. Physical manipulation can be achieved, for example, via make-up or surgery, while digital manipulation or retouching is done via software (Rathgeb et al., 2019). Physical manipulation can be permanent (surgery) or non-permanent (make-up). Make-up can be applied quickly, so the same person may appear different even after a short period. Also, given the wide acceptance of cosmetics, it may also appear in enrolment data. Digital retouching allows modifications similar to those of surgery or cosmetics, but in the digital domain, as well as other changes such as re-positioning or resizing facial landmarks. A common aim of these modifications is to improve attractiveness (beautification). Of course, it is also possible that someone pretends to look like somebody else to gain illegitimate access, or hides their own identity to avoid recognition (Ramachandra and Busch, 2017; Scherhag et al., 2019). Another manipulation is the use of facial masks, either surgical due to the current pandemic (Damer et al., 2020) or artificial as used in Presentation Attacks (Ramachandra and Busch, 2017). However, these are out of the scope of this paper, since they are not oriented towards beautification.

Some works focus on detecting retouched images. The methods proposed include Supervised Restricted Boltzmann Machines (SRBM) (Bharati et al., 2016), semi-supervised autoencoders (Bharati et al., 2017), and Convolutional Neural Networks (CNN) (Jain et al., 2018). These works also present new databases such as the ND-IIITD Retouched Faces database (Bharati et al., 2016) or the MDRF Multi-Demographic Retouched Faces database (Bharati et al., 2017), generated with paid and free applications that provide, for example, skin smoothing, face slimming, eye/lip color change, or eye/teeth brightening. The authors of the MDRF database also analyze the impact of gender and ethnicity, showing that detection accuracy can vary greatly with the demographics. The work of Jain et al. (2018) also analyzes the detection of GAN-altered images. All these approaches consider the use of a single image (the retouched one). In contrast, Rathgeb et al. (2020) propose a differential approach where the unaltered image is also available, something which, according to the authors, is plausible in some scenarios (e.g. border control). They use texture and deep features with images of the FERET and FRGCv2 datasets. Retouching is done with free applications from the Google Play Store, arguing that free applications are more likely to be used by consumers.

Fig. 3: (a), (b): Example of landmarks detected on a face image. (c) Example of images with multiple faces detected.

Another set of works analyzes the impact of manipulated images on recognition performance. Dantcheva et al. (2012) gather two databases of Caucasian females with makeup. One is from before/after YouTube tutorials mostly affecting the ocular area, and the other is obtained by modifying FRGC images with lipstick, eye makeup or full makeup. The study employs Gabor features, LBP and a commercial system, showing an increase in error when testing against makeup pictures. They also found that applying LBP to Gabor-filtered images (as opposed to the original image) partly compensates the effect. In (Ferrara et al., 2013), alterations such as barrel distortion or aspect ratio change are studied. They also simulate surgery digitally, such as injectables, wrinkle removal, lip augmentation, etc. They employ the AR database, with two commercial systems and a SIFT-based algorithm, concluding that the systems can overcome limited alterations but stumble on heavy manipulations. Digital retouching is studied in (Bharati et al., 2016) with the ND-IIITD database. They use a commercial system and OpenBR, an open-source face engine, finding that the performance is considerably degraded when testing against retouched images. Image retouching is also examined by Rathgeb et al. (2020) with a commercial system and the open-source ArcFace, showing its negative impact as well.

3. Beautification Filters

We focus on two types of manipulation: image enhancement and Augmented Reality (AR). These modifications, in particular AR filters, have not been addressed in the related literature. For enhancement, we use the 9 most popular selfie Instagram filters (Canva, 2020), which mostly change contrast and lighting (Figure 1). The ranking is based on the number of images with a particular filter and the hashtag "#selfie". Since the Instagram API does not allow applying filters to a large amount of data, the filters are recreated with a four-layer neural network that learns the changes introduced by each filter (Hoppe, 2021). The AR filters, in turn, obfuscate parts of the face that can be critical for recognition (eyes and nose, Figure 2). Such filters are very popular in social media (e.g. Snapchat) and even in conference platforms such as Zoom. In particular, we apply: "Dog nose", "Transparent glasses", "Sunglasses with slight transparency", and "Sunglasses with no transparency". These are merged with the face by using the landmarks (Figure 3b) given by Geitgey (2018).
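To make the filter-recreation idea concrete, the sketch below fits a small per-pixel network that maps original colours to filtered colours from a single before/after image pair. It is only an illustration of the approach under stated assumptions: the actual instafilter package (Hoppe, 2021) may use a different architecture and input features, and the function names here are hypothetical.

```python
# Illustrative sketch: learn an image filter as a per-pixel colour mapping.
import torch
import torch.nn as nn

class FilterNet(nn.Module):
    """Four-layer MLP mapping an input pixel colour to the filtered colour."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # predicted filtered RGB
        )

    def forward(self, rgb):  # rgb: (N, 3) values in [0, 1]
        return self.net(rgb)

def train_filter(original, filtered, epochs=200):
    """original, filtered: (H, W, 3) float tensors in [0, 1] of the same photo
    before and after applying the target filter."""
    x = original.reshape(-1, 3)
    y = filtered.reshape(-1, 3)
    model = FilterNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)  # match the filtered colours
        loss.backward()
        opt.step()
    return model
```

Once trained, the network can be applied to the pixels of any new image to reproduce the colour transform of that filter.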

Fig. 4: U-NET model employed. The blue rectangles are convolutions (compression), the yellow ones are transpose convolutions (expansion), and the green symbolizes the addition of blue and yellow through the add operation.

Table 1: Summary description of the 8 datasets employed in the study. The numbers in brackets indicate the percentage of images of each dataset for which the face is correctly detected (see Section 5.2).

Name                  Manipulation              Images
benchmark             Original images           4,276 (98.9%)
dog                   Dog nose                  4,229 (97.8%)
glasses               Transparent glasses       3,666 (84.8%)
instagram             Instagram filters         4,277 (98.9%)
shades leak           Shades (95% opacity)      3,851 (89.1%)
shades recon leak     Reconstructed (95% op.)   4,288 (99.2%)
shades no leak        Shades (100% opacity)     3,825 (88.5%)
shades recon no leak  Reconstructed (100% op.)  4,271 (98.8%)

4. Image Reconstruction with U-NET

We use a modified version of the U-NET network. Originally presented for image segmentation (Ronneberger et al., 2015), it outperformed more complex networks in accuracy and speed while requiring less training data. It has a compression or encoding path with convolutions and max-pooling, followed by a decompression or decoding path with up-convolutions. This gives the network a U-shape (Figure 4), hence its name. Residual links connect maps of the encoding and decoding paths, with channels concatenated, allowing the model to focus on the parts of the image that change. The original network has been modified here, since the task is different. Inspired by Springenberg et al. (2015), max-pooling and up-convolutions are changed to strided convolutions and transposed convolutions, respectively. Also, map concatenation in the residual links is replaced by addition, which halves the number of channels. With this, we expect to still retain changes of image patches while counteracting over-fitting.
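A minimal sketch of these modifications is shown below: strided convolutions for down-sampling, transposed convolutions for up-sampling, and additive (rather than concatenated) skip connections. The depth and channel widths are assumptions for illustration, not the exact configuration used in the paper.

```python
# Sketch of a small U-NET-like network with the modifications described above.
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 32)               # full resolution
        self.down1 = conv_block(32, 64, stride=2)   # strided conv replaces max-pooling
        self.down2 = conv_block(64, 128, stride=2)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec1 = conv_block(64, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec2 = conv_block(32, 32)
        self.out = nn.Conv2d(32, 3, kernel_size=1)  # reconstructed RGB image

    def forward(self, x):                  # x: (N, 3, H, W), H and W divisible by 4
        e1 = self.enc1(x)
        e2 = self.down1(e1)
        b = self.down2(e2)
        d1 = self.dec1(self.up1(b) + e2)   # addition instead of concatenation
        d2 = self.dec2(self.up2(d1) + e1)  # keeps the channel count unchanged
        return torch.sigmoid(self.out(d2))
```

The additive skip connections keep the decoder focused on what changes between input and output while avoiding the channel growth of the original concatenation scheme.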

5. Experimental Framework

5.1. Databases

We use the version of Labeled Faces in the Wild (LFW) (Huang et al., 2007) aligned by funneling (Huang et al., 2012). It has 13,233 images of 5,749 celebrities from the web, with large variations in pose, light, expression, etc. To ensure a sufficient amount of images per person, we remove people with less than 10 images, resulting in 158 individuals and 4,324 images. Five datasets are then created by applying the Instagram filters and the four AR filters of Section 3. The Instagram dataset is created by applying one filter of Figure 1, chosen randomly, to each unfiltered image. Additionally, images with sunglasses are processed with U-NET, giving two more datasets of reconstructed images. This results in the 8 different datasets of Table 1. U-NET is trained to reconstruct the filters shades leak and shades no leak (Figure 2c, d) with the CelebA dataset (202,599 pictures of 10,177 people) (Liu et al., 2015). We use a batch size of 64, with Adam as optimizer and the MSE between the output and the target (unfiltered) images as loss. CelebA is not used for the biometric recognition experiments, which allows testing the generalization ability of the U-NET model on unseen data.

Fig. 5: Examples of the reconstruction on the shades datasets: (a) shades 95%, (b) shades 100%. The left part of each sub-figure shows the first 10 pixel values of the images with shades (re-scaled to 0-255), while the right part shows the reconstructed image. Shades at 95% opacity preserve some information, so a good reconstruction is still possible. Shades at 100% opacity, on the other hand, destroy the pixels behind the shades.
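The following is a minimal training-loop sketch matching the setup described in Section 5.1 (batch size 64, Adam, MSE loss against the unfiltered target). The dataset wiring is an assumption: the `pairs_dataset` argument is taken to be a PyTorch dataset yielding (filtered, original) CelebA image pairs as tensors in [0, 1].

```python
# Sketch of the reconstruction training loop (batch 64, Adam, MSE loss).
import torch
from torch.utils.data import DataLoader

def train_reconstruction(model, pairs_dataset, epochs=10, device="cpu"):
    loader = DataLoader(pairs_dataset, batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()
    model.to(device).train()
    for _ in range(epochs):
        for filtered, original in loader:
            filtered, original = filtered.to(device), original.to(device)
            opt.zero_grad()
            loss = loss_fn(model(filtered), original)  # reconstruct the unfiltered face
            loss.backward()
            opt.step()
    return model
```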

5.2. Face Detection and Feature Extraction

The images of the 8 datasets of Table 1 are then encoded into a feature vector that will be used for biometric authentication. First, faces are detected with the "face location" function of (Geitgey, 2018). The detector used is the more accurate CNN, rather than the default HOG model. If more than one face or no face is found, the image is discarded (e.g. Figure 3c). Note in Table 1 that the accuracy of the detector varies across the datasets, suggesting that the applied manipulations have different impact (Section 6). Feature extraction is done with a ResNet-34 network with 29 convolutional layers and the number of filters per layer reduced by half (King, 2017). This CNN has been trained from scratch for face recognition using around 3 million faces of 7,485 identities from the VGG dataset, the FaceScrub dataset, and other pictures scraped from the internet. The accuracy of this network on the LFW benchmark is 99.38% (King, 2017). It uses input images of 64×64 pixels, and produces a 128-dimensional vector (taken from the last layer before the classification layer).
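A sketch of this detection and encoding step using the face_recognition library (Geitgey, 2018), which wraps dlib's CNN face detector and ResNet-based 128-dimensional face encoder (King, 2017), could look as follows; the helper name and path handling are illustrative, not the authors' exact code.

```python
# Detect a single face with the CNN model and return its 128-d encoding.
import face_recognition

def encode_face(image_path):
    """Return one 128-d encoding, or None if zero or multiple faces are found."""
    image = face_recognition.load_image_file(image_path)
    locations = face_recognition.face_locations(image, model="cnn")  # CNN, not default HOG
    if len(locations) != 1:
        return None  # discard images with no face or with multiple faces
    return face_recognition.face_encodings(image, known_face_locations=locations)[0]
```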

5.3. Face Identification and Verification

For identification, we carry out closed-set experiments. To find the closest subject of the database, we use both distance measures (Euclidean, Manhattan, and Cosine) and trained approaches (Support Vector Machines, SVM (Cortes and Vapnik, 1995), and Extreme Gradient Boosting, XGBoost (Chen and Guestrin, 2016)). Since the SVM is a binary classifier, we adopt a one-vs-all approach with multiple SVMs, taking the decision of the model that is most confident. XGBoost is multi-class, using softmax with cross-entropy loss as objective. The SVM is a widely employed classifier with good results in biometric authentication (Fierrez et al., 2018), and XGBoost has wide adoption in industry, having obtained top rankings in recent machine learning challenges (DMLC, 2021). Before the experiments, feature vectors are scaled with min-max normalization, so each element is in the [0, 1] range.
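A hedged sketch of how such classifiers can be wired up with scikit-learn and the XGBoost Python package is given below; hyper-parameters are left at their defaults and are therefore assumptions rather than the exact settings used in the experiments.

```python
# Min-max scaled 128-d encodings, one-vs-all SVMs, and a multi-class XGBoost model.
from sklearn.preprocessing import MinMaxScaler
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

def fit_identifiers(X_train, y_train):
    """X_train: (n_samples, 128) face encodings; y_train: identity labels encoded 0..N-1."""
    scaler = MinMaxScaler().fit(X_train)               # scale each feature to [0, 1]
    Xs = scaler.transform(X_train)
    svm = OneVsRestClassifier(SVC()).fit(Xs, y_train)  # one binary SVM per identity
    xgb = XGBClassifier(objective="multi:softmax").fit(Xs, y_train)
    return scaler, svm, xgb

def identify(scaler, model, encoding):
    """Closed-set identification: predict the most likely identity for one encoding."""
    return model.predict(scaler.transform(encoding.reshape(1, -1)))[0]
```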

To measure identification accuracy, we first obtain the confusion matrix, which reports the number of instances of each actual class vs. the instances of each predicted class. The number of true positives (TPi), false positives (FPi), false negatives (FNi), and true negatives (TNi) of each class i is then obtained (i=1,...,N, with N being the number of classes). In a multi-class problem, the results are treated as a collection of one-vs-all binary problems, one for each class. Then, the average TP, FP, FN and TN over all classes is computed (for example, TP is obtained by averaging TP1, ..., TPi, ..., TPN). Finally, the Accuracy, Precision, Recall and F1-Measure metrics are obtained (Lever et al., 2016). In principle, these four metrics should be different, but since we are averaging the TPi, FPi, FNi and TNi of all classes, the four measures turn out to be equal (scikit-learn developers, 2020). For this reason, we report a single identification metric that we call "Accuracy". This method, called micro averaging, is the recommended choice in multi-class classification with unbalanced classes (scikit-learn developers, 2020), as is the case of the LFW database. An alternative would be to first obtain the Accuracy, Precision, Recall and F1-Measure of each individual class, and then average them. This is called macro averaging, since it combines the four overall metrics of each class, and it would result in the metrics being different.
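The equality of the micro-averaged metrics can be checked in a few lines with scikit-learn; the labels below are made up purely for illustration.

```python
# For single-label multi-class predictions, micro-averaged precision, recall and
# F1 all coincide with the plain classification accuracy.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 2, 2, 2]  # toy ground-truth identities
y_pred = [0, 1, 1, 1, 2, 2, 0]  # toy predictions (5 of 7 correct)

acc = accuracy_score(y_true, y_pred)
p = precision_score(y_true, y_pred, average="micro")
r = recall_score(y_true, y_pred, average="micro")
f1 = f1_score(y_true, y_pred, average="micro")
print(acc, p, r, f1)  # all four are equal (5/7 ≈ 0.714)
```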

For the verification experiments, we use the same distance measures as before (Euclidean, Manhattan and Cosine). Accuracy is measured by obtaining the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) at different distance thresholds (Wayman, 2009). Then, we report the Equal Error Rate (EER), which is the error at the threshold where FRR=FAR.
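A simple sketch of how the EER can be estimated from genuine (mated) and impostor (non-mated) distance scores is shown below: the threshold is swept over the observed distances and the operating point where FAR and FRR are closest is taken. This is a standard computation sketched under our own assumptions, not the authors' exact implementation.

```python
# Estimate the EER from genuine and impostor distance scores.
import numpy as np

def equal_error_rate(genuine_dist, impostor_dist):
    """genuine_dist, impostor_dist: 1-D arrays of distances (lower = more similar)."""
    thresholds = np.sort(np.concatenate([genuine_dist, impostor_dist]))
    best_gap, eer = float("inf"), 0.0
    for t in thresholds:
        frr = np.mean(genuine_dist > t)    # genuine pairs rejected
        far = np.mean(impostor_dist <= t)  # impostor pairs accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```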

6. Results and Discussion

Table 1 provides the number of images of each dataset for which a face is detected. The benchmark (unfiltered) dataset has a detection rate of ∼99%. The face is also detected successfully in the case of Instagram filters, which can be expected since they mainly enhance contrast or lighting. Faces with a dog nose are also well detected, suggesting that the detector is not sensitive to nose occlusions. On the other hand, occlusions in the eye region have a high impact, even with transparent glasses. When the reconstruction network of Section 4 is applied to images with shades, detection accuracy is recovered to a large extent, highlighting the benefits of the employed reconstruction. Figure 5 depicts the reconstruction of the images of Figure 2c, d, showing a clear reconstruction of the majority of the eye area in the case of shades with 95% opacity. In the case of 100% opacity, the reconstruction is less successful, although sufficient to obtain a good detection accuracy.

Figure 6 details the class separation of the various datasets after feature extraction by means of t-SNE (van der Maaten and Hinton, 2008) (perplexity 30). Only the five most frequent classes are colored due to the limited number of distinguishable colors. For the benchmark (unmodified) and the Instagram datasets, the clusters appear well separated, which suggests that class (identity) separation is possible with high accuracy.

Fig. 6: Class separation for the various datasets shown via t-SNE scatter plots after feature extraction: (a) Benchmark, (b) Instagram, (c) Dog, (d) Glasses, (e) Shades 95% opacity, (f) Shades 100% opacity.

Table 2: Identification Accuracy. The higher the value, the better.

                      | Distance measures            | SVM                              | XGBoost
Dataset               | Eucl   Manh   Cosine  Mean   | Train=     Train=   Change       | Train=     Train=   Change
                      |                              | Benchmark  Filter                | Benchmark  Filter
benchmark             | 0.930  0.921  0.925   0.925  | 0.993      N/A      N/A          | 0.993      N/A      N/A
dog                   | 0.562  0.543  0.539   0.548  | 0.921      0.966    +0.045       | 0.917      0.974    +0.057
glasses               | 0.479  0.455  0.498   0.477  | 0.879      0.922    +0.043       | 0.892      0.925    +0.033
instagram             | 0.923  0.916  0.921   0.920  | 0.992      0.991    -0.001       | 0.993      0.993    0
shades leak           | 0.435  0.407  0.423   0.421  | 0.708      0.866    +0.158       | 0.722      0.030    -0.692
shades recon leak     | 0.663  0.639  0.655   0.652  | 0.885      0.946    +0.091       | 0.881      0.932    +0.051
shades no leak        | 0.386  0.365  0.379   0.376  | 0.672      0.854    +0.182       | 0.663      0.009    -0.654
shades recon no leak  | 0.368  0.357  0.350   0.358  | 0.594      0.849    +0.255       | 0.619      0.124    -0.495

Clusters of the dog nose dataset are still separated, although closer and with higher intra-class variability. In the datasets with glasses, and especially with shades, the clusters appear much closer. A parallelism can also be seen between the t-SNE plots and the face detection results reported in Table 1, in the sense that faces where the nose or eyes appear obfuscated are more difficult to detect and to recognize.
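A sketch of how such a projection can be produced with scikit-learn (perplexity 30, as above) is given below; the array names and plotting details are illustrative.

```python
# Project 128-d face encodings to 2-D with t-SNE and plot them by identity.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(encodings, labels):
    """encodings: (n_images, 128) array; labels: integer identity labels."""
    xy = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(encodings)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=5, cmap="tab10")
    plt.show()
```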

We now report the face identification experiments in Table 2. With distance measures, the first original (unfiltered) image is used for enrolment, and identification attempts are done with the filtered images of the different datasets. Comparatively, the Euclidean distance performs best, although it is just 1-2% better than the other distance metrics. The performance on the non-filtered benchmark and on the instagram datasets is the highest, with minuscule differences between them (≈92-93% for all distance measures). The dog dataset follows at ≈56%, and the transparent glasses dataset at ≈50%. The performance of the shades leak and shades no leak datasets is poor, especially the latter. After reconstruction, the shades recon leak dataset shows some performance recovery (66.3%). On the other hand, reconstruction with the non-leaking shades (100% opacity) does not contribute to any performance improvement with distance measures.

To carry out identification experiments with trained methods, the datasets are split into 80% (training) and 20% (test). Training is done either with benchmark unfiltered images (denoted as "Train=Benchmark") or with filtered images of the corresponding datasets ("Train=Filter"). Identification tests are always done with filtered images. A first observation is that both trained methods behave similarly on the different datasets, at least on those that do not involve shades (eye obfuscation). For the datasets with shades, SVM is significantly better than XGBoost. Also, both trained methods benefit in the majority of cases from training with filtered images. This is especially true for SVM with the datasets involving shades, where accuracy improvements between 9.1% and 25.5% are observed. Also relevant is the improvement observed between shades leak (86.6%) and shades recon leak (94.6%), highlighting again the benefits of the image reconstruction method. On the other hand, as observed in the previous paragraph too, reconstruction with the non-leaking shades (100% opacity) does not contribute to any performance improvement. A significant difference, however, is that the performance of SVM on the datasets involving shades is much better than with distance measures (compare the best cases of each dataset). Another benefit of the trained methods is that the performance with datasets entailing any type of obfuscation (either nose or eyes) is significantly better. For example, with SVM, the performance with any dataset is above 85%. Notably, the dog dataset reaches 96.6% (56.2% with distance measures), the transparent glasses 92.2% (49.8%), the shades recon leak 94.6% (66.3%), and the shades recon no leak 84.9% (36.8%).

Finally, we report the verification experiments (Table 3). As before, the first original (unfiltered) image is used for enrolment, and the remaining filtered images of the different datasets are used for verification attempts, both genuine (mated) and impostor (non-mated). As can be observed, the Euclidean distance performs best, although the other distances are less than 1% behind. The best EER is obtained for the benchmark and instagram sets at ≈2%, with the rest (in descending order) at 7% (dog), 8.2% (glasses), 12.5% (shades leak), and 14% (shades no leak). The EER for the reconstructed shades leak surpasses the dog results at 6%. Also, as before, reconstruction with the non-leaking shades (shades recon no leak) does not show any improvement.

Table 3: Verification EER. The lower the value, the better.

Dataset               Eucl   Manh   Cosine
benchmark             0.019  0.020  0.020
dog                   0.071  0.070  0.073
glasses               0.082  0.085  0.082
instagram             0.022  0.023  0.023
shades leak           0.125  0.132  0.125
shades recon leak     0.062  0.067  0.060
shades no leak        0.140  0.147  0.144
shades recon no leak  0.144  0.152  0.144

7. Conclusions

Social media platforms offer many different filters to "beautify" selfie images before they are uploaded. Augmented Reality (AR) filters that modify the face image by adding items like "noses" or "glasses" are also popular in social media and video-conference applications. Unfortunately, such filters may distort or occlude facial features, affecting the capability of recognizing individuals or even detecting the face itself. This may hinder, for example, the investigation of crimes on images from social media platforms or confiscated mobile devices, where the huge amount of data to be analyzed demands some automatic pre-screening (Powell and Haynes, 2020; Hassan, 2019).

Accordingly, we are interested in studying the effect of popular selfie "beautification" filters on the accuracy of both face detection and recognition. We use the most popular selfie filters on Instagram (Figure 1) as well as AR filters that add artificial animal noses or glasses to the face (Figure 2). We consider three types of glasses: transparent, sunglasses with 95% opacity, and sunglasses with 100% opacity. The first class of filters mostly alters the contrast or illumination of the image, whereas the second class obfuscates part of the face with the artificial item that is introduced. These are applied to 4,324 images from 158 individuals of the Labeled Faces in the Wild (LFW) database (Huang et al., 2007), after which we apply a CNN-based face detector (Geitgey, 2018). Recognition is done using features from a ResNet-34 network trained to recognize faces (King, 2017).

The effect of some of the employed filters has been observed to be detrimental both to face detection and identity recognition, especially those that obfuscate the eye region. Thus, we also explore methods to reverse the applied manipulations. To do so, we train a modified U-NET network (Ronneberger et al., 2015; Springenberg et al., 2015) on a separate set of 203K images from the CelebA database (Liu et al., 2015) to recover the obfuscated eye region (Figure 5). Regarding face detection, changes to image contrast/illumination or obfuscation of the nose region do not have a great impact (detection accuracy >98%). Obfuscation of the eye region, on the other hand, reduces the accuracy by 10%. This means that the detector relies on the availability of the eye region, even if the rest of the face is visible. This is confirmed after applying the reconstruction U-NET network, when detection accuracy is recovered to the levels of unmodified images. Face occlusion is a difficulty known to make detection systems struggle (Zeng et al., 2021), although our results indicate that eye occlusion is more critical than, for example, nose occlusion.

A similar trend is observed in the face identification and verification experiments. Alterations to the eye region produce a much stronger decline in accuracy than altering the nose area or changing contrast/illumination. This indicates that the employed ResNet model also relies strongly on the eye region to distinguish identities. Recognition experiments are done both with distance measures between feature vectors and with trained classifiers (Cortes and Vapnik, 1995; Chen and Guestrin, 2016). As enrolment images, we evaluate both the use of original unfiltered images and the use of filtered images. In general, trained classifiers outperform distance measures, and the use of filtered images for enrolment also provides better accuracy. In overall terms, by combining the use of filtered images to train the machine learning methods and the reconstruction of eye regions by U-NET, we manage to achieve an identification accuracy above 92% with the majority of image perturbations. In the verification experiments, the EER is kept generally below 8%. The application of sunglasses affects the system most heavily, and in the case of sunglasses with 100% opacity, the recognition performance does not change after applying U-NET. Even if face detection was improved in this case as well, the achieved reconstruction of the eye region (Figure 5) is not sufficient to improve recognition. This can be somewhat expected given that the original pixels behind the sunglasses are completely destroyed. Thus, a faithful reconstruction is not achieved, unlike with the glasses with 95% opacity, where the original information is preserved to a certain extent behind the glasses.

As future work, we are exploring ways to further improve the reconstruction performance, for example by incorporating image translation methods based on adversarial training (Isola et al., 2017; Zhu et al., 2017). We expect to achieve a more realistic and accurate result, especially with non-leaking (100% opacity) AR filters such as the sunglasses, which have been shown in our experiments to have a negative impact on both face detection and identity recognition. We will also explore reconstruction of other face areas, such as the nose or the mouth.

Another direction that we have not addressed in this work is the detection of the applied manipulations. Our results have shown that accuracy can be improved by using filtered images to train the recognition algorithm and by reconstructing the altered face regions. To enable this, knowing which specific alteration has been applied, or which face regions have been modified, is useful. We predict that detecting alterations at the patch level will be a fruitful avenue (Jain et al., 2018). This could be combined with the use of detection and recognition methods based on local analysis, so that if one particular region is occluded or altered, it does not contribute to the task. This is similar to, for example, using detectors of the periocular region that do not rely on the full face being available (Alonso-Fernandez and Bigun, 2016), but applied to different regions of the face.

Acknowledgments

This work has been carried out by P. Hedman and V. Skepetzis in the context of their Master Thesis at Halmstad University (Master's Programme in Network Forensics). Authors K. Hernandez-Diaz, J. Bigun and F. Alonso-Fernandez would like to thank the Swedish Research Council (VR) and the Swedish Innovation Agency (VINNOVA) for funding their research.


References

scikit-learn developers, 2020. Metrics and scoring: quantifying the quality of predictions. URL: https://scikit-learn.org/stable/modules/model_evaluation.html.

Alonso-Fernandez, F., Bigun, J., 2016. A survey on periocular biometrics research. Pattern Recognition Letters 82, 92–105.

Bakhshi, S., Shamma, D., Kennedy, L., Gilbert, E., 2021. Why we filter our photos and how it impacts engagement. Proceedings of the International AAAI Conference on Web and Social Media 9, 12–21.

Berman, M., Hawkins, D., 2017. Hate crime charges filed after 'reprehensible' video shows attack on mentally ill man in Chicago. Washington Post. URL: https://www.washingtonpost.com/news/morning-mix/wp/2017/01/05/4-in-custody-after-group-beats-disabled-man-on-facebook-live-while-shouting-anti-trump-profanities-chicago-police-say/.

Bharati, A., Singh, R., Vatsa, M., Bowyer, K.W., 2016. Detecting Facial Retouching Using Supervised Deep Learning. IEEE Transactions on Information Forensics and Security 11, 1903–1913.

Bharati, A., Vatsa, M., Singh, R., Bowyer, K.W., Tong, X., 2017. Demography-based facial retouching detection using subclass supervised sparse autoencoder, in: IEEE International Joint Conference on Biometrics, IJCB, pp. 474–482.

Canva, 2020. Most popular Instagram filters from around the world. URL: https://www.canva.com/learn/popular-instagram-filters/.

Chen, T., Guestrin, C., 2016. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.

Cortes, C., Vapnik, V., 1995. Support-vector networks. Machine Learning 20, 273–297.

Damer, N., Grebe, J.H., Chen, C., Boutros, F., Kirchbuchner, F., Kuijper, A., 2020. The effect of wearing a mask on face recognition performance: an exploratory study, in: International Conference of the Biometrics Special Interest Group, BIOSIG, pp. 1–6.

Dantcheva, A., Chen, C., Ross, A., 2012. Can facial cosmetics affect the matching accuracy of face recognition systems?, in: IEEE International Conference on Biometrics: Theory, Applications and Systems, BTAS, pp. 391–398.

DMLC, 2021. Distributed (Deep) Machine Learning Community. Awesome XGBoost. URL: https://github.com/dmlc/xgboost/blob/master/demo/README.md#machine-learning-challenge-winning-solutions.

Ferrara, M., Franco, A., Maltoni, D., Sun, Y., 2013. On the Impact of Alterations on Face Photo Recognition Accuracy, in: International Conference on Image Analysis and Processing, ICIAP, pp. 743–751.

Fierrez, J., Morales, A., Vera-Rodriguez, R., Camacho, D., 2018. Multiple classifiers in biometrics. Part 1: Fundamentals and review. Information Fusion 44, 57–64.

Geitgey, A., 2018. face-recognition: Recognize faces from Python or from the command line. URL: https://github.com/ageitgey/face_recognition.

Hassan, N.A., 2019. Gathering Evidence from OSINT Sources, in: Hassan, N.A. (Ed.), Digital Forensics Basics: A Practical Guide Using Windows OS. Apress, Berkeley, CA, pp. 311–322.

Hoppe, T., 2021. thoppe/instafilter. URL: https://github.com/thoppe/instafilter.

Huang, G.B., Mattar, M., Lee, H., Learned-Miller, E., 2012. Learning to align from scratch, in: Advances in Neural Information Processing Systems, NIPS.

Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E., 2007. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Technical Report 07-49, University of Massachusetts, Amherst.

Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A., 2017. Image-to-image translation with conditional adversarial networks, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976.

Jain, A., Singh, R., Vatsa, M., 2018. On detecting GANs and retouching based synthetic alterations, in: IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), pp. 1–7.

King, D., 2017. High Quality Face Recognition with Deep Metric Learning. URL: http://blog.dlib.net/2017/02/high-quality-face-recognition-with-deep.html.

Lever, J., Krzywinski, M., Altman, N., 2016. Classification evaluation. Nature Methods 13, 603–604.

Liu, Z., Luo, P., Wang, X., Tang, X., 2015. Deep learning face attributes in the wild, in: International Conference on Computer Vision, ICCV.

van der Maaten, L., Hinton, G., 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 2579–2605. URL: http://jmlr.org/papers/v9/vandermaaten08a.html.

Morrison, S., 2021. The Capitol rioters put themselves all over social media. Now they're getting arrested. URL: https://www.vox.com/recode/22218963/capitol-photos-legal-charges-fbi-police-facebook-twitter.

Pagones, S., 2021. Live-streamed California double-murder victims include girl, 15, and fiancee of suspect, reports say. Fox News. URL: https://www.foxnews.com/us/live-streamed-california-double-murder-victims.

Powell, A., Haynes, C., 2020. Social Media Data in Digital Forensics Investigations. Springer International Publishing, Cham, pp. 281–303.

Ramachandra, R., Busch, C., 2017. Presentation attack detection methods for face recognition systems: A comprehensive survey. ACM Computing Surveys 50.

Rathgeb, C., Dantcheva, A., Busch, C., 2019. Impact and Detection of Facial Beautification in Face Recognition: An Overview. IEEE Access 7, 152667–152678.

Rathgeb, C., Satnoianu, C.I., Haryanto, N.E., Bernardo, K., Busch, C., 2020. Differential Detection of Facial Retouching: A Multi-Biometric Approach. IEEE Access 8, 106373–106385.

Rattani, A., Derakhshani, R., Ross, A., 2019. Introduction to Selfie Biometrics. Springer International Publishing, Cham, pp. 1–18.

Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer International Publishing, Cham, pp. 234–241.

Scherhag, U., Rathgeb, C., Merkle, J., Breithaupt, R., Busch, C., 2019. Face recognition systems under morphing attacks: A survey. IEEE Access 7, 23012–23026.

Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.A., 2015. Striving for simplicity: The all convolutional net, in: Bengio, Y., LeCun, Y. (Eds.), 3rd International Conference on Learning Representations, ICLR, San Diego, CA, USA, May 7-9, 2015, Workshop Track Proceedings.

Wayman, J.L., 2009. Biometric Verification/Identification/Authentication/Recognition: The Terminology. Springer US, Boston, MA, pp. 153–157.

Zeng, D., Veldhuis, R., Spreeuwers, L., 2021. A survey of face recognition techniques under occlusion. IET Biometrics 10, 581–606.

Zhu, J., Park, T., Isola, P., Efros, A.A., 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks, in: IEEE International Conference on Computer Vision, ICCV, pp. 2242–2251.