


Multi-Source Deep Domain Adaptation for Quality Control in Retail Food Packaging

Mamatha Thota & Stefanos Kollias
School of Computer Science
University of Lincoln
Lincoln, UK
{mthota, skollias}@lincoln.ac.uk

Mark Swainson
National Centre for Food Manufacturing
University of Lincoln
Holbeach Technology Park, UK
[email protected]

Georgios Leontidis
Department of Computing Science
University of Aberdeen
Aberdeen, UK
[email protected]

Abstract—Retail food packaging contains information which informs choice and can be vital to consumer health, including product name, ingredients list, nutritional information, allergens, preparation guidelines, pack weight, storage and shelf life information (use-by / best before dates). The presence and accuracy of such information is critical to ensure a detailed understanding of the product and to reduce the potential for health risks. Consequently, erroneous or illegible labeling has the potential to be highly detrimental to consumers and many other stakeholders in the supply chain. In this paper, a multi-source deep learning-based domain adaptation system is proposed and tested to identify and verify the presence and legibility of use-by date information from food packaging photos taken as part of the validation process as the products pass along the food production line. This is achieved by using multi-source datasets to improve generalization: domain-invariant representations are extracted for all domains, and the distributions of all source–target domain pairs are aligned in a common feature space, along with their class boundaries. The proposed system performed very well in the conducted experiments, automating the verification process and reducing labeling errors that could otherwise threaten public health and contravene legal requirements for food packaging information and accuracy. Comprehensive experiments on our food packaging datasets demonstrate that the proposed multi-source deep domain adaptation method significantly improves classification accuracy and therefore has great potential for application and beneficial impact in food manufacturing control systems.

Index Terms—deep learning, convolutional neural network, multi-source domain adaptation, domain adaptation, optical character verification, retail food packaging

I. INTRODUCTION

Europe’s food and drink sector employs 4.57 million people and has a turnover of €1.1 trillion, making it the largest manufacturing industry in the EU (Source: Data & Trends. EU Food & Drink Industry 2018. FoodDrink Europe 2018). The priority of all nations is to protect and feed their citizens. To assure public health, food safety is a legal requirement across the food supply chain. As part of this control approach, all pre-packaged food products are required to display mandatory information on the food pack label. This information serves to ensure that the consumer can make a clear and informed choice as to the nature of their food purchases and is warned of any particular issues which could affect their health, e.g. product shelf life information, the presence of allergens and also other

warnings such as the potential presence of physical or microbiological hazards associated with the particular food type. Labeling mistakes or legibility issues can therefore create major food safety problems, including: food poisoning, due to the consumption of a product that has exceeded its actual use-by date (because the date on the packaging is either incorrect or illegible); the triggering of an allergic reaction in a consumer who is susceptible to a particular food allergen but, due to a label fault, was not aware that this allergen was present in their chosen food.

Food product traceability is a legal requirement in the EU, typically achieved by the presence of production date, time and process line code information on the food packaging. The presence of this information ensures that in the event of an emergency it is possible to identify and remove the affected food products in the supply chain. As a result, any fault in the accuracy or legibility of the pack traceability information will result in the supply chain stakeholders having to recall far more product than necessary, because the specific batch of product actually affected cannot be individually identified [1]. Such circumstances result in inefficient food recall processes and an unnecessarily high level of food waste, all of which is very expensive to the food manufacturers, both in monetary terms and in reputation. The environmental impact of such events also cannot be overlooked, in terms of the impact of food waste on sector carbon footprints and ultimately in exacerbating climate change.

A common approach to overcome these labeling risks in the food supply chain is to manually read and verify the use-by dates on the pre-packaged products during the manufacturing process. These Quality Assurance (QA) checks are typically conducted by an operator, but as such practices are very laborious and repetitive in nature, they place the operator in an error-prone working environment. Another common approach is to use Optical Character Verification (OCV), where a supervisory system holds the correct date code format, which is transferred both to the printer and to the vision system. The vision system then verifies the date, relying heavily on consistency in the date format, packaging and camera view angle, which is often hard to achieve in the food and drink manufacturing environment. OCV systems also require accurately labeled data to be utilized for training; but labeling

arXiv:2001.10335v1 [cs.LG] 28 Jan 2020


a large number of target samples is overly laborious and a very cost-ineffective process, hence the need for a more robust solution. Previous studies considering deep learning (DL) techniques for OCV have primarily focused on one domain and/or used transfer learning to enhance the performance and generalization of the developed techniques [2]–[6].

Over the last few years, DL [7] has been successfully applied to numerous applications and domains due to the availability of large amounts of labeled data, such as computer vision and image processing [8]–[12], signal processing [13]–[16], time series analysis and forecasting [17]–[21], and autonomous driving [22], [23]. As most applications of DL techniques, such as the aforementioned ones, involve supervised learning, labeling large datasets consumes a lot of time and is very cost-ineffective. In addition, when deploying a trained model to real-life applications, the assumption is that both the source (training set) and the target (application-specific) data are drawn from the same distribution. When this assumption is violated, a DL model trained on the source domain will not generalize well on the target domain, due to the distribution differences between the source and target domains, known as domain shift.

Learning a discriminative model in the presence of domain shift between source and target datasets is known as domain adaptation. Typically, domain adaptation methods are used to adapt labeled data from one single source domain to another, called the target domain. In real-life applications, data from multiple sources and domains do exist and could be leveraged to develop a more robust and generalizable model. The information extracted from these multiple sources can better fit target distribution data that might not match any of the available source data; it is therefore more valuable for performance improvement and is receiving considerable attention in real-world applications like those described in this paper.

One simple approach to predicting the labels of the target domain data is to combine the training samples from all source domains and build a single model based on the pooled training samples. Due to the data expansion, such methods might improve performance; however, this simple approach will not work well in our application, as there are significant conditional probability differences across domains in the food image data. Another approach is to extract multiple domain-invariant representations for each source–target domain pair and map each of them into a specific feature space in order to match their distributions, by training multiple models. However, this would take a lot of time, as it involves training multiple models, so it is necessary to find a better way to make full use of multiple source domains. Here we propose an approach to overcome the above-mentioned issues. To the best of our knowledge, this is the first study to consider multi-source deep learning domain adaptation in retail food packaging control, showing the great potential of advancing related quality assurance systems in the food supply chain, further supporting automation towards Industry 4.0, with high potential to reduce errors and their related costs to the

consumer and food business operators.

The rest of the paper is laid out as follows: Section II presents the related work in single-source and multi-source domain adaptation; Section III describes the dataset and our proposed approach, focusing on multi-source domain adaptation techniques; Section IV presents the experimental results obtained after applying our model to the food packaging data; and Section V concludes the paper.

II. RELATED WORK

In recent years, many single-source domain adaptation methods have been proposed. Discrepancy-, adversarial- and reconstruction-based approaches are the three primary domain adaptation approaches currently being applied to address the distribution shift [24]. Discrepancy-based approaches rely on aligning the distributions in order to minimize the divergence between them. The most commonly used discrepancy-based methods are Maximum Mean Discrepancy (MMD) [25], Correlation Alignment (CORAL) [26] and Kullback–Leibler divergence [27]. Adversarial-based approaches minimize the distance between the source and the target distributions using domain confusion, an adversarial method used in Generative Adversarial Networks. The approach proposed in [28] tries to minimize the feature distribution divergence by integrating a gradient reversal layer, whereas in [29] the aim is to minimize the distance between source and target samples in an adversarial manner using the Wasserstein distance. Another class of approaches, known as reconstruction-based approaches, creates a shared representation between the source and the target domains whilst preserving the individual characteristics of each domain. Rather than minimizing the divergence, the method in [30] learns joint representations that classify the labeled source data and at the same time reconstruct the target domain.

In contrast to single-source domain adaptation techniques, multi-source domain adaptation techniques assume that data is available from multiple source domains. Multi-source domain adaptation is even more challenging than the single-source case, as it needs to handle both the alignment between source and target domains and the alignments between the multiple available sources. The single-source domain adaptation methods above, as the name suggests, only consider a single source and a single target; however, in real-world applications there are multiple source domains available to extract knowledge from.

Learning from multiple different sources originated from early theoretical analysis [31], [32], and has many practical applications. Initially, many shallow models were proposed to tackle the multi-source domain adaptation problem [33], [34]. The work in [32] established a general bound on the expected loss of the model by minimizing the empirical loss on the nearest k sources. Deep Cocktail Network [35] proposed multi-way adversarial learning to minimize the discrepancy between the target and each of the multiple source domains. The work most related to ours is [36], where only an MMD loss was used to minimize the feature


Fig. 1. Overview of our proposed Multi-Source Domain Adaptation model.

discrepancy. Our proposed approach aims at minimizing the feature discrepancy by implementing a new loss function that includes both MMD and CORAL losses for improved generalization. Additionally, the Class Activation Mapping (CAM) component of our model [37] adds an extra step to the algorithm that provides a visualization of which areas in the image contributed the most to the decision-making process, enabling the trust of the end-users.

III. METHODOLOGY

A. Problem Statement

Multi-source domain adaptation with $N$ different source distributions is denoted as $\{p_{s_j}(x, y)\}_{j=1}^{N}$, and the labeled source domain image data $\{(X_{s_j}, Y_{s_j})\}_{j=1}^{N}$ are drawn from these distributions, where $X_{s_j} = \{x_i^{s_j}\}_{i=1}^{c_j}$ represents the image data, with $c_j$ images sampled from source domain $j$, and $Y_{s_j} = \{y_i^{s_j}\}_{i=1}^{c_j}$ the corresponding ground truth labels. We also have the target distribution $p_t(x, y)$, from which the target domain image data $X_t = \{x_i^t\}_{i=1}^{c}$, with a total of $c$ images, are sampled without any label observations $Y_t$. The end goal is to predict the labels of the unlabeled data in the target domain using the labeled multi-source domain data.

B. Dataset Description

The Food Packaging Image dataset used in this study consists of more than 30,000 balanced images (OK vs NOT-OK) from six different locations [Abbeydale (Ab), Burton (Bu), Boston (Bo), Listowel (Li), Windmill-Lane (Wi) and Ossett (Os)], and the task is to automatically verify the quality of printed use-by dates. Initially, three people manually annotated this dataset, with two more annotators further sampling and verifying the manual annotations for quality control, hence keeping the 30,000 images on which both annotators were in full agreement. The main challenges of these datasets are the unavailability of labeled data and the high variability across the datasets, such as heavy distortion, varying background, illumination/blur, date format, angle and orientation of the label, etc. We explore various techniques and propose an approach for adapting knowledge learnt from one dataset to another. The training process was carried out on a 70% sample, with another 10% used for validation. Finally, the remaining 20% of the images were used for evaluating and testing all the methods across the six distinct locations listed above, in order to automatically verify the quality of printed use-by dates and hence detect images of very low quality. Examples of the images used in this study can be seen in figures 2 and 3.
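The 70/10/20 split described above can be sketched as follows (a minimal NumPy illustration only; the function name, seed and the absence of per-location stratification are our assumptions, not the authors' code):

```python
import numpy as np

def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 70% / 10% / 20%
    into train / validation / test, as described in the text."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.7 * n_samples)
    n_val = int(0.1 * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(30000)
```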

C. Overview of our Architecture

In this section, our proposed multi-source DL-based domain adaptation approach is introduced, which aims at improving the binary classification of the food packaging dataset across all locations. We use labeled source data from multiple locations and unlabeled target data from a single location. As shown in figure 1, our model comprises a feature extractor and a classification part. The feature extractor part learns useful representations for all domains, whereas its sub-networks learn features specific to each source–target domain pair. The classification part of the model learns domain-specific attributes for each target image and provides N categorization results. The classification part also aligns the domain-specific classifiers, as target samples near the class boundaries are highly likely to be misclassified because the boundaries are learned by different classifiers. The CAM part of the model helps with the visualization of the CNN's interpretation when making predictions.


Fig. 2. Example images that are considered to be of acceptable quality (OK) for the purposes of our implementations.

Fig. 3. Example images that are considered to be of unacceptable/bad quality (NOT-OK) for the purposes of our implementations.

D. Multi-Source Domain Adaptation

Domain adaptation is challenging due to the domain shift between the source and the target datasets. Multi-source domain adaptation is more challenging still, as it has to deal with domain shift between multiple source datasets. Our model is composed of a feature extractor and source-specific classification parts, and aims at minimizing the feature discrepancy, for learning domain-invariant representations; the class boundary discrepancy, for minimizing the mismatch among all classifiers; and finally improving source data classification by reducing the classification loss, leading to improved generalization on the target dataset.

Feature Discrepancy Loss: We reduce the feature discrepancy by minimizing both the MMD and CORAL losses. The proposed deep domain adaptation architecture jointly adapts features using two popular feature adaptation metrics, combining MMD with CORAL in order to align higher-order statistics along with the first- and second-order statistics.

MMD defines the distance between two distributions via their mean embeddings in a Reproducing Kernel Hilbert Space (RKHS). MMD is a two-sample kernel test used to determine whether to accept or reject the null hypothesis p = q [38], where p and q are the source and target domain probability distributions. Let H be the RKHS with a characteristic kernel k; the squared distance formulation of MMD is shown in equation (1)

$$d_k^2(p, q) = \left\| \mathbb{E}_p[\phi(x^s)] - \mathbb{E}_q[\phi(x^t)] \right\|_{\mathcal{H}}^2 \quad (1)$$

The distance between the distributions with their meanembeddings is given in equation (2)

$$\mathrm{Loss}_{MMD} = \left\| \frac{1}{N} \sum_{i=1}^{N} \phi(x_i^s) - \frac{1}{M} \sum_{j=1}^{M} \phi(x_j^t) \right\|^2 \quad (2)$$

$$= \frac{1}{N^2} \sum_{i=1}^{N} \sum_{i'=1}^{N} \phi(x_i^s)^T \phi(x_{i'}^s) - \frac{2}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \phi(x_i^s)^T \phi(x_j^t) + \frac{1}{M^2} \sum_{j=1}^{M} \sum_{j'=1}^{M} \phi(x_j^t)^T \phi(x_{j'}^t) \quad (3)$$

where N and M are the total number of samples in the source and target domains, respectively.

The kernel trick can be applied, as each term in equation (3) involves inner products between φ vectors, in order to estimate the squared distance between the empirical kernel mean embeddings as follows:

$$\mathrm{Loss}_{MMD} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{i'=1}^{N} k(x_i^s, x_{i'}^s) - \frac{2}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} k(x_i^s, x_j^t) + \frac{1}{M^2} \sum_{j=1}^{M} \sum_{j'=1}^{M} k(x_j^t, x_{j'}^t) \quad (4)$$

The CORAL loss [26] is also used to minimize the discrepancy between source and target data, by reducing the distance between the source and target feature representations. We define the CORAL loss as the distance between the second-order statistics (covariances) of the source and target features:

$$\mathrm{Loss}_{CORAL} = \frac{1}{4d^2} \| C_s - C_t \|_F^2 \quad (5)$$

where $\|\cdot\|_F^2$ denotes the squared matrix Frobenius norm, $d$ is the feature dimension, $C_s$ is the source covariance matrix and $C_t$ is the target covariance matrix.

The covariance matrices of the source and target data are given by:

$$C_s = \frac{1}{N-1} \left( D_s^T D_s - \frac{1}{N} (\mathbf{1}^T D_s)^T (\mathbf{1}^T D_s) \right) \quad (6)$$

$$C_t = \frac{1}{M-1} \left( D_t^T D_t - \frac{1}{M} (\mathbf{1}^T D_t)^T (\mathbf{1}^T D_t) \right) \quad (7)$$

where $\mathbf{1}$ is a column vector with all elements equal to 1, and N and M are the total number of samples in the source and target domains, respectively.

The total feature discrepancy loss is therefore given by equation (8)

$$\mathrm{Loss}_{FD} = \mathrm{Loss}_{MMD} + \mathrm{Loss}_{CORAL} \quad (8)$$
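The feature discrepancy loss of equations (4)–(8) can be sketched in NumPy as follows (an illustration only, not the authors' implementation; the Gaussian kernel and its bandwidth `sigma` are assumed choices, since the paper does not specify the kernel hyperparameters):

```python
import numpy as np

def mmd_loss(Xs, Xt, sigma=1.0):
    """Squared MMD with a Gaussian RBF kernel, as in equation (4).
    Xs: (N, d) source features; Xt: (M, d) target features."""
    def k(A, B):
        # Pairwise squared distances, then RBF kernel values.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    N, M = len(Xs), len(Xt)
    return (k(Xs, Xs).sum() / N**2
            - 2 * k(Xs, Xt).sum() / (N * M)
            + k(Xt, Xt).sum() / M**2)

def coral_loss(Xs, Xt):
    """CORAL loss, equations (5)-(7): squared Frobenius distance
    between source and target feature covariance matrices."""
    d = Xs.shape[1]
    def cov(D):
        n = len(D)
        ones = np.ones((n, 1))
        return (D.T @ D - (ones.T @ D).T @ (ones.T @ D) / n) / (n - 1)
    return np.sum((cov(Xs) - cov(Xt)) ** 2) / (4 * d**2)

def feature_discrepancy(Xs, Xt):
    """Total feature discrepancy loss, equation (8)."""
    return mmd_loss(Xs, Xt) + coral_loss(Xs, Xt)
```

Both terms vanish as the two feature distributions align: MMD compares kernel mean embeddings (all orders of statistics for a characteristic kernel), while CORAL compares second-order statistics only.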

Class Discrepancy Loss: Classifiers are likely to misclassify the target samples near the class boundary, as they are


trained using different source domains, each producing a different target prediction. We therefore aim at minimizing the discrepancy among all classifiers by making their probabilistic outputs similar. The class discrepancy is calculated by equation (9)

$$\mathrm{Loss}_{CD} = \binom{N}{2}^{-1} \sum_{j=1}^{N-1} \sum_{i=j+1}^{N} \left[ \left| E(X_i) - E(X_j) \right| \right] \quad (9)$$

where N is the total number of classifiers.

Classification Loss: The network reduces the discrepancy among classifiers by minimizing the classification loss. We train the network with labeled source data and calculate the empirical loss through minimizing the cross-entropy loss as follows

$$\mathrm{Loss}_{CL} = \frac{1}{N} \sum_{i=1}^{N} V(f(x_i^s), y_i^s) \quad (10)$$

where $V(\cdot, \cdot)$ is the cross-entropy loss function and $f(x_i^s)$ is the conditional probability that the CNN assigns to label $y_i^s$.

Our total loss is made up of the classification loss (CL), the feature discrepancy loss (FD) and the class discrepancy loss (CD). By minimizing each of these losses, the network can classify the source domain data more accurately and reduce the dataset bias and the discrepancy among classifiers. By jointly minimizing these three losses, the network can learn features that generalize and adapt well on the target dataset. The overall objective of our network can be formulated as

$$\mathrm{Loss}_{Total} = \mathrm{Loss}_{CL} + \lambda \mathrm{Loss}_{FD} + \gamma \mathrm{Loss}_{CD} \quad (11)$$

where λ and γ are penalty parameters.
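The class discrepancy of equation (9) and the overall objective of equation (11) can be sketched as follows (a NumPy illustration under the assumption that each classifier's output is a per-sample probability matrix; the λ and γ defaults are placeholders, as the paper does not report its values):

```python
import numpy as np
from itertools import combinations

def class_discrepancy(probs):
    """Equation (9): average absolute difference between the
    probabilistic outputs of every pair of the N classifiers.
    probs: list of N arrays, each (n_samples, n_classes)."""
    pairs = list(combinations(range(len(probs)), 2))
    return sum(np.mean(np.abs(probs[i] - probs[j]))
               for i, j in pairs) / len(pairs)

def total_loss(cls_loss, fd_loss, cd_loss, lam=0.5, gamma=0.5):
    """Equation (11): weighted sum of the three losses."""
    return cls_loss + lam * fd_loss + gamma * cd_loss
```

Making the classifiers' probabilistic outputs agree in this way pushes the domain-specific decision boundaries toward a consensus on the unlabeled target samples.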

E. Class Activation Maps

Deep neural networks are often considered to be black boxes that offer no straightforward way of understanding what a network has learned or which part of an input was responsible for the network's prediction. When such models make predictions, there is no explanation to justify them. Class activation maps [37] are an efficient way to visualize the importance the model places on various regions in an image while training, offering insights that are crucial for the model's usability and explainability. We have incorporated CAM into our approach in order to visualize which areas of the food packaging images contribute the most to the decision-making process of the algorithm.

CAM provides some insight into CNN interpretability and explainability by overlaying a heat map over the original image to demonstrate where the model is paying more attention in its decision-making process. It visually demonstrates how the algorithm comes up with its prediction by highlighting the pixels of the image that trigger the model to associate the image with a particular class. CAM helps us understand which regions of an input image influence the convolutional neural network's output prediction. Such information can be used for examining the bias of the

TABLE I
COMPARISON OF CLASSIFICATION ACCURACY AND AVERAGE ACCURACY (%) ON FOOD PACKAGING TARGET DATASET OSSETT (Os)

Method                           Source     Accuracy  Average
Single-Source                    Ab         84.7      84.70
Source Combined with 2 datasets  Ab,Bo      85.2      85.30
                                 Ab,Bu      86.1
                                 Ab,Li      85.3
                                 Ab,Wi      84.6
Source Combined with 3 datasets  Ab,Bo,Wi   86.2      86.38
                                 Ab,Bu,Li   85.4
                                 Ab,Bu,Bo   86.6
                                 Ab,Bu,Wi   87.2
                                 Ab,Li,Bo   85.6
                                 Ab,Li,Wi   87.3
Multi-Source with 2 datasets     Ab,Bo      88.7      90.35
                                 Ab,Bu      90.9
                                 Ab,Li      89.9
                                 Ab,Wi      91.9
Multi-Source with 3 datasets     Ab,Bo,Wi   93.6      92.61
                                 Ab,Bu,Li   91.5
                                 Ab,Bu,Bo   91.8
                                 Ab,Bu,Wi   92.7
                                 Ab,Li,Bo   93.5
                                 Ab,Li,Wi   92.6

TABLE II
COMPARISON OF CLASSIFICATION ACCURACY AND AVERAGE ACCURACY (%) ON FOOD PACKAGING TARGET DATASET LISTOWEL (Li)

Method                           Source     Accuracy  Average
Single-Source                    Ab         86.2      86.20
Source Combined with 2 datasets  Ab,Bo      84.4      86.92
                                 Ab,Bu      87.4
                                 Ab,Os      87.6
                                 Ab,Wi      88.3
Source Combined with 3 datasets  Ab,Bo,Wi   85.2      87.15
                                 Ab,Bu,Os   88.6
                                 Ab,Bu,Bo   88.1
                                 Ab,Bu,Wi   88.7
                                 Ab,Os,Bo   82.1
                                 Ab,Os,Wi   90.2
Multi-Source with 2 datasets     Ab,Bo      89.6      90.87
                                 Ab,Bu      92.1
                                 Ab,Os      91.3
                                 Ab,Wi      90.5
Multi-Source with 3 datasets     Ab,Bo,Wi   91.2      92.35
                                 Ab,Bu,Os   92.3
                                 Ab,Bu,Bo   92.2
                                 Ab,Bu,Wi   92.6
                                 Ab,Os,Bo   92.1
                                 Ab,Os,Wi   93.7

algorithm and the lack of generalization capabilities, allowing us to take steps to enhance the robustness of the model and potentially increase its accuracy.
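A class activation map can be computed in a few lines (an illustrative NumPy version of the standard CAM computation [37], not the authors' code; the `features` and `fc_weights` shapes are assumptions about the final layers):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: weight the final convolutional feature maps
    by the FC-layer weights of the chosen class, then sum.
    features: (C, H, W) feature maps from the last conv layer.
    fc_weights: (n_classes, C) weights of the final FC layer."""
    w = fc_weights[class_idx]                # (C,)
    cam = np.tensordot(w, features, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                 # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                # normalize to [0, 1]
    return cam
```

The resulting (H, W) map is upsampled to the input resolution and overlaid as a heat map on the original packaging image.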

IV. EXPERIMENTS AND RESULTS

As explained earlier, some of the benefits of applying the proposed approach to this application are the elimination of monotonous and inconsistent manual labor and the reduction of human errors, along with increased speed and productivity. Firstly, we conducted experiments using a labeled single source dataset and an unlabeled single target dataset for all six locations. The goal of this experiment has been to establish a baseline for images that would be classified as readable


TABLE III
COMPARISON OF CLASSIFICATION ACCURACY AND AVERAGE ACCURACY (%) ON FOOD PACKAGING TARGET DATASET BURTON (Bu)

Method                           Source     Accuracy  Average
Single-Source                    Ab         83.5      83.50
Source Combined with 2 datasets  Ab,Bo      84.1      84.95
                                 Ab,Li      84.7
                                 Ab,Os      88.3
                                 Ab,Wi      82.7
Source Combined with 3 datasets  Ab,Bo,Wi   84.95     86.14
                                 Ab,Li,Os   85.4
                                 Ab,Li,Bo   85.1
                                 Ab,Li,Wi   84.9
                                 Ab,Os,Bo   88.6
                                 Ab,Os,Wi   87.9
Multi-Source with 2 datasets     Ab,Bo      90.6      90.60
                                 Ab,Li      88.7
                                 Ab,Os      92.9
                                 Ab,Wi      90.2
Multi-Source with 3 datasets     Ab,Bo,Wi   91.4      92.46
                                 Ab,Li,Os   92.6
                                 Ab,Li,Bo   92.9
                                 Ab,Li,Wi   91.3
                                 Ab,Os,Bo   93.4
                                 Ab,Os,Wi   93.2

TABLE IV
COMPARISON OF CLASSIFICATION ACCURACY AND AVERAGE ACCURACY (%) ON FOOD PACKAGING TARGET DATASET ABBEYDALE (Ab)

Method                           Source     Accuracy  Average
Single-Source                    Bu         84.6      84.60
Source Combined with 2 datasets  Bu,Bo      82.3      85.15
                                 Bu,Li      86.6
                                 Bu,Os      83.8
                                 Bu,Wi      87.9
Source Combined with 3 datasets  Bu,Bo,Wi   83.7      86.75
                                 Bu,Li,Os   87.5
                                 Bu,Li,Bo   88.2
                                 Bu,Li,Wi   87.2
                                 Bu,Os,Bo   85.6
                                 Bu,Os,Wi   88.3
Multi-Source with 2 datasets     Bu,Bo      89.1      89.65
                                 Bu,Li      88.7
                                 Bu,Os      90.6
                                 Bu,Wi      90.2
Multi-Source with 3 datasets     Bu,Bo,Wi   91.3      91.95
                                 Bu,Li,Os   91.5
                                 Bu,Li,Bo   91.9
                                 Bu,Li,Wi   92.1
                                 Bu,Os,Bo   92.3
                                 Bu,Os,Wi   92.6

and acceptable according to human standards. Further experiments were conducted using our proposed multi-source domain adaptation approach this time, i.e. adapting two labeled source datasets with a single unlabeled target domain, and three labeled source datasets with a single unlabeled target domain. We compared the obtained results with the baseline single-source adaptation experiment conducted initially. Additional experiments were carried out by combining the two/three source datasets into a single source dataset. Our overall aim has been to improve the image classification accuracy on the provided food packaging datasets, hence allowing for enhanced quality control in the food supply chain. The combinations tested included all six locations available and were

TABLE V
COMPARISON OF CLASSIFICATION ACCURACY AND AVERAGE ACCURACY (%) ON FOOD PACKAGING TARGET DATASET BOURNE (Bo)

Method                           Source     Accuracy  Average
Single-Source                    Ab         82.6      82.60
Source Combined with 2 datasets  Ab,Bu      84.8      84.70
                                 Ab,Li      83.5
                                 Ab,Os      86.3
                                 Ab,Wi      84.2
Source Combined with 3 datasets  Ab,Bu,Li   86.2      85.68
                                 Ab,Bu,Os   84.9
                                 Ab,Bu,Wi   85.2
                                 Ab,Li,Os   86.9
                                 Ab,Li,Wi   84.1
                                 Ab,Os,Wi   86.8
Multi-Source with 2 datasets     Ab,Bu      90.2      90.70
                                 Ab,Li      91.1
                                 Ab,Os      90.3
                                 Ab,Wi      91.2
Multi-Source with 3 datasets     Ab,Bu,Li   94.2      92.75
                                 Ab,Bu,Os   92.6
                                 Ab,Bu,Wi   92.1
                                 Ab,Li,Os   92.3
                                 Ab,Li,Wi   92.9
                                 Ab,Os,Wi   92.4

TABLE VI
COMPARISON OF CLASSIFICATION ACCURACY AND AVERAGE ACCURACY (%) ON FOOD PACKAGING TARGET DATASET WINDMILL-LANE (Wi)

Method                            Source     Accuracy  Average
Single-Source                     Ab         83.2      83.20
Source Combined with 2 datasets   Ab,Bo      84.5
                                  Ab,Bu      85.6
                                  Ab,Li      82.1
                                  Ab,Os      80.9      83.27
Source Combined with 3 datasets   Ab,Bu,Li   83.6
                                  Ab,Bu,Os   86.1
                                  Ab,Bu,Bo   86.3
                                  Ab,Li,Os   84.1
                                  Ab,Li,Bo   84.6
                                  Ab,Os,Bo   83.7      84.73
Multi-Source with 2 datasets      Ab,Bo      90.3
                                  Ab,Bu      89.5
                                  Ab,Li      91.6
                                  Ab,Os      92.8      91.05
Multi-Source with 3 datasets      Ab,Bu,Li   92.8
                                  Ab,Bu,Os   93.2
                                  Ab,Bu,Bo   92.7
                                  Ab,Li,Os   93.1
                                  Ab,Li,Bo   92.5
                                  Ab,Os,Bo   93.2      92.91

conducted in the following manner:
• Single Source to Single Target
• Combined Source to Single Target
• Multi-Source to Single Target

We trained our model on labeled data from a source domain to achieve better performance on data from a target domain, with access to only unlabeled data in the target domain. We used ResNet-50 [39] pretrained on ImageNet [40] as our backbone network and replaced the last fully connected (FC) layer with a task-specific FC layer. We fine-tuned all the convolutional and pooling layers and trained the classifier layer via backpropagation. The Adam optimizer [41] with a learning rate of 0.001 was used.


A. Single-Source Domain Adaptation

The labeled source and the unlabeled target images were fed through the model, where the discrepancy between the pair of datasets was minimized by jointly reducing the feature discrepancy and the classification losses. We performed multiple experiments with each location as the target domain, and present the results in Tables I through VI, listing the results of each approach for the different sources.
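One common instantiation of such a feature-discrepancy term is the maximum mean discrepancy (MMD) of the kernel two-sample test [38]. A minimal NumPy sketch of the biased squared-MMD estimate with a single Gaussian kernel follows; this is a simplification, and the bandwidth value is an assumption.

```python
# Biased MMD^2 estimate with a Gaussian kernel: one way to measure the
# feature discrepancy between a source batch and a target batch.
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel values between rows of a and rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased estimate of squared MMD between two feature batches."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st
```

Identical batches yield zero discrepancy, while a shifted target distribution yields a positive value; this positive value is the quantity the network is trained to drive down.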

B. Combined Source to Single Target

In the source-combined setting, all the source domains are combined into a single domain, and the experiments are conducted in the traditional single-domain adaptation manner. The labeled source and the unlabeled target images were fed through the model, where the discrepancy between the pair of datasets was minimized by jointly reducing the feature discrepancy and the classification loss. We combined and experimented using:
• Two sources combined
• Three sources combined

In the first set of experiments, two datasets were combined into a single source dataset, whereas in the next set three source datasets were combined into a single source dataset. The results of this method can be seen in Tables I through VI, with each location as the target dataset, listing the results of each approach for the different sources.
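The combined-source construction above amounts to plain dataset concatenation before ordinary single-source adaptation. A possible PyTorch sketch, with small tensor datasets standing in for the actual image datasets (an assumption for illustration):

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

# Toy stand-ins for two labeled source datasets, e.g. two factory sites.
site_a = TensorDataset(torch.randn(100, 16), torch.zeros(100, dtype=torch.long))
site_b = TensorDataset(torch.randn(80, 16), torch.ones(80, dtype=torch.long))

# Combined-source setting: merge the sources into one dataset and then
# run ordinary single-source domain adaptation against the target.
combined_source = ConcatDataset([site_a, site_b])
```

The merged dataset is then fed to the same training loop as in the single-source case; no per-domain alignment is performed.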

C. Multi-Source Domain Adaptation

The labeled sources and the unlabeled target images were fed through the model, where the discrepancy between each pair of datasets was minimized by jointly reducing the feature discrepancy, the class discrepancy and the classification losses, using the techniques described in Section III. We performed multiple experiments with each location as the target domain, and present the results in Tables I through VI. We categorized the experiments as follows:
• Multi-Source with two datasets
• Multi-Source with three datasets

Initially, we performed experiments taking two source datasets as input domains; in the second set of experiments, we took three sources as input domains. The results of this method can be seen in Tables I through VI, with each location as the target domain, listing the results of each classifier for the different sources.
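The shape of the multi-source objective described above, i.e. per-pair feature alignment plus alignment of the domain-specific class boundaries, can be sketched as follows. The classifier-discrepancy term follows the spirit of the multi-source alignment of [36] (mean absolute difference between the domain-specific classifiers' predictions on the same target batch); the trade-off weights `lam` and `gamma` are illustrative assumptions, not the paper's values.

```python
# Skeletal multi-source objective: mean classification loss over the
# source domains, plus mean per-pair feature discrepancy, plus a class
# discrepancy computed across the domain-specific classifiers' softmax
# outputs on the target batch. Plain NumPy, for illustration only.
import numpy as np

def class_discrepancy(probs_per_classifier):
    """Mean absolute difference between all pairs of classifiers'
    softmax outputs on the same target batch."""
    total, pairs = 0.0, 0
    k = len(probs_per_classifier)
    for i in range(k):
        for j in range(i + 1, k):
            total += np.abs(probs_per_classifier[i] - probs_per_classifier[j]).mean()
            pairs += 1
    return total / max(pairs, 1)

def total_loss(cls_losses, feat_discrepancies, target_probs, lam=0.5, gamma=0.5):
    # cls_losses: one classification loss per source domain.
    # feat_discrepancies: one feature-discrepancy value per source-target pair.
    # target_probs: softmax outputs of each domain-specific classifier on target.
    return (np.mean(cls_losses)
            + lam * np.mean(feat_discrepancies)
            + gamma * class_discrepancy(target_probs))
```

When all domain-specific classifiers agree on the target, the class-discrepancy term vanishes and the objective reduces to the pairwise feature-alignment case.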

From the experimental results we can make two clear observations. First, the results of the source-combined method are comparatively better than those of the single-source method, which could be the result of data enrichment, indicating that combining multiple source domains into a single source domain is helpful in most of the tasks. Second, our multi-source domain adaptation approach significantly outperforms both of the above-mentioned baseline methods on most of the tasks, with an average classification accuracy improvement of more than 6%. We can also note that adding more sources and learning domain-invariant features for each source-target

TABLE VII
COMPARISON OF AVERAGE CLASSIFICATION ACCURACY FOR ALL METHODS

Method                          Average (%)
Single Source                   84.14
Two Sources Combined            85.05
Three Sources Combined          86.13
Multi-Source with 2 datasets    90.53
Multi-Source with 3 datasets    92.50

pair, along with exploiting the domain-specific class boundary information, significantly increased the average classification accuracy on each dataset.

The overall average accuracy is 84.14% for the single-source approach; 85.05% and 86.13% for the two- and three-source combined approaches, respectively; and 90.53% and 92.50% for our multi-source approach with two and three sources, respectively.
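For clarity, each reported average is the plain mean of the per-combination accuracies in the corresponding table block. For example, the multi-source three-dataset average for target Ab (Table IV) is reproduced by:

```python
# Reproducing the "Average" column of Table IV (target Ab) for the
# multi-source, three-dataset setting from its per-combination scores.
accuracies = [91.3, 91.5, 91.9, 92.1, 92.3, 92.6]
average = round(sum(accuracies) / len(accuracies), 2)
print(average)  # 91.95
```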

V. CONCLUSION AND FUTURE WORK

In this paper we proposed a multi-source adaptation methodology that attempts to adapt and generalize information from one dataset to another, automating the verification of use-by dates in food packaging datasets. The results shown in Table VII demonstrate that the accuracy of the food packaging classification improved significantly by adding each additional source and learning domain-invariant features, along with exploiting the class alignment. The proposed approach can also be applied to wider aspects of food package control, such as the verification of allergen labeling, barcodes, nutritional information and more. Our future work will extend this study to a much larger dataset, consisting of about half a million food packaging images. In addition, we aim to improve the domain adaptation approach by incorporating adversarial components.

ACKNOWLEDGMENT

We would like to thank OAL (Olympus Automation Limited) for providing us with the data used in this study.

REFERENCES

[1] S. Pearson, D. May, G. Leontidis, M. Swainson, S. Brewer, L. Bidaut, J. G. Frey, G. Parr, R. Maull, and A. Zisman, "Are distributed ledger technologies the panacea for food traceability?" Global Food Security, vol. 20, pp. 145–149, 2019.

[2] F. D. S. Ribeiro, L. Gong, F. Caliva, M. Swainson, K. Gudmundsson, M. Yu, G. Leontidis, X. Ye, and S. Kollias, "An end-to-end deep neural architecture for optical character verification and recognition in retail food packaging," in 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 2376–2380.

[3] S. Suh, H. Lee, Y. O. Lee, P. Lukowicz, and J. Hwang, "Robust shipping label recognition and validation for logistics by using deep neural networks," in 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 4509–4513.

[4] F. D. S. Ribeiro, F. Caliva, M. Swainson, K. Gudmundsson, G. Leontidis, and S. Kollias, "An adaptable deep learning system for optical character verification in retail food packaging," in 2018 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS). IEEE, 2018, pp. 1–8.


[5] N. Katyal, M. Kumar, P. Deshmukh, and N. Ruban, "Automated detection and rectification of defects in fluid-based packaging using machine vision," in 2019 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2019, pp. 1–5.

[6] F. D. S. Ribeiro, F. Caliva, M. Swainson, K. Gudmundsson, G. Leontidis, and S. Kollias, "Deep Bayesian self-training," Neural Computing and Applications, pp. 1–17, 2019.

[7] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[8] A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, and J. Dean, "A guide to deep learning in healthcare," Nature Medicine, vol. 25, no. 1, p. 24, 2019.

[9] F. De Sousa Ribeiro, G. Leontidis, and S. Kollias, "Capsule routing via variational Bayes," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020.

[10] J. A. Carballo, J. Bonilla, M. Berenguel, J. Fernández-Reche, and G. García, "New approach for solar tracking systems based on computer vision, low cost hardware and deep learning," Renewable Energy, vol. 133, pp. 1158–1166, 2019.

[11] S. Khan, N. Islam, Z. Jan, I. U. Din, and J. J. C. Rodrigues, "A novel deep learning based framework for the detection and classification of breast cancer using transfer learning," Pattern Recognition Letters, vol. 125, pp. 1–6, 2019.

[12] D. Kollias, M. Yu, A. Tagaris, G. Leontidis, A. Stafylopatis, and S. Kollias, "Adaptation and contextualization of deep neural network models," in 2017 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2017, pp. 1–8.

[13] F. Caliva, F. S. De Ribeiro, A. Mylonakis, C. Demazière, P. Vinai, G. Leontidis, and S. Kollias, "A deep learning approach to anomaly detection in nuclear reactors," in 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 2018, pp. 1–8.

[14] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, "Deep learning and its applications to machine health monitoring," Mechanical Systems and Signal Processing, vol. 115, pp. 213–237, 2019.

[15] F. D. S. Ribeiro, F. Caliva, D. Chionis, A. Dokhane, A. Mylonakis, C. Demazière, G. Leontidis, and S. Kollias, "Towards a deep unified framework for nuclear reactor perturbation analysis," in 2018 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2018, pp. 120–127.

[16] C. Ieracitano, N. Mammone, A. Bramanti, S. Marino, A. Hussain, and F. C. Morabito, "A time-frequency based machine learning system for brain states classification via EEG signal processing," in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.

[17] M. Längkvist, L. Karlsson, and A. Loutfi, "A review of unsupervised feature learning and deep learning for time-series modeling," Pattern Recognition Letters, vol. 42, pp. 11–24, 2014.

[18] G. Onoufriou, R. Bickerton, S. Pearson, and G. Leontidis, "Nemesyst: A hybrid parallelism deep learning-based framework applied for internet of things enabled food retailing refrigeration systems," Computers in Industry, vol. 113, p. 103133, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0166361519304026

[19] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, "Deep learning for time series classification: a review," Data Mining and Knowledge Discovery, vol. 33, no. 4, pp. 917–963, 2019.

[20] B. Alhnaity, S. Pearson, G. Leontidis, and S. Kollias, "Using deep learning to predict plant growth and yield in greenhouse environments," arXiv preprint arXiv:1907.00624, 2019.

[21] K. Kashiparekh, J. Narwariya, P. Malhotra, L. Vig, and G. Shroff, "ConvTimeNet: A pre-trained deep convolutional neural network for time series classification," in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.

[22] D. Wang, C. Devin, Q.-Z. Cai, F. Yu, and T. Darrell, "Deep object-centric policies for autonomous driving," in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 8853–8859.

[23] S. Ghosh, A. Pal, S. Jaiswal, K. Santosh, N. Das, and M. Nasipuri, "SegFast-V2: Semantic image segmentation with less parameters in deep learning for autonomous driving," International Journal of Machine Learning and Cybernetics, vol. 10, no. 11, pp. 3145–3154, 2019.

[24] M. Wang and W. Deng, "Deep visual domain adaptation: A survey," Neurocomputing, vol. 312, pp. 135–153, 2018.

[25] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," arXiv preprint arXiv:1502.02791, 2015.

[26] B. Sun and K. Saenko, "Deep CORAL: Correlation alignment for deep domain adaptation," in European Conference on Computer Vision. Springer, 2016, pp. 443–450.

[27] F. Zhuang, X. Cheng, P. Luo, S. J. Pan, and Q. He, "Supervised representation learning: Transfer learning with deep autoencoders," in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

[28] Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," arXiv preprint arXiv:1409.7495, 2014.

[29] J. Shen, Y. Qu, W. Zhang, and Y. Yu, "Wasserstein distance guided representation learning for domain adaptation," arXiv preprint arXiv:1707.01217, 2017.

[30] M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li, "Deep reconstruction-classification networks for unsupervised domain adaptation," in European Conference on Computer Vision. Springer, 2016, pp. 597–613.

[31] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, "A theory of learning from different domains," Machine Learning, vol. 79, no. 1-2, pp. 151–175, 2010.

[32] K. Crammer, M. Kearns, and J. Wortman, "Learning from multiple sources," in Advances in Neural Information Processing Systems, 2007, pp. 321–328.

[33] I.-H. Jhuo, D. Liu, D. Lee, and S.-F. Chang, "Robust visual domain adaptation with low-rank reconstruction," in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 2168–2175.

[34] H. Liu, M. Shao, and Y. Fu, "Structure-preserved multi-source domain adaptation," in 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 2016, pp. 1059–1064.

[35] R. Xu, Z. Chen, W. Zuo, J. Yan, and L. Lin, "Deep cocktail network: Multi-source unsupervised domain adaptation with category shift," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3964–3973.

[36] Y. Zhu, F. Zhuang, and D. Wang, "Aligning domain-specific distribution and classifier for cross-domain classification from multiple sources," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 5989–5996.

[37] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2921–2929.

[38] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test," Journal of Machine Learning Research, vol. 13, no. Mar, pp. 723–773, 2012.

[39] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in European Conference on Computer Vision. Springer, 2016, pp. 630–645.

[40] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.

[41] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.