

Hindawi Publishing Corporation
International Journal of Distributed Sensor Networks
Volume 2013, Article ID 382132, 12 pages
http://dx.doi.org/10.1155/2013/382132

Research Article
A Case Study of Sensor Data Collection and Analysis in Smart City: Provenance in Smart Food Supply Chain

Qiannan Zhang,1 Tian Huang,1 Yongxin Zhu,1 and Meikang Qiu2

1 School of Microelectronics, Shanghai Jiao Tong University, Shanghai 200240, China
2 Department of Computer Engineering, San Jose State University, San Jose, CA 95152, USA

Correspondence should be addressed to Yongxin Zhu; zhuyongxin@sjtu.edu.cn

Received 6 July 2013; Accepted 10 September 2013

Academic Editor: Yu Gu

Copyright © 2013 Qiannan Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Accelerated growth of the world's urban population puts increasing stress on metropolitan cities. Smart-city-centric strategies are expected to provide solutions for a sustainable environment and urban life. Playing an indispensable role in the smart city, the IoT (Internet of Things) connects the executive ability of the physical world with the intelligence of the computational world, aiming to enlarge the capabilities of things in the real city and strengthen the practicality of functions in the cyber world. One important application area of IoT in cities is the food industry. Municipal governors face all kinds of food safety issues and are enduring the hardest time ever due to the lack of sufficient guidance and supervision. IoT systems help to monitor, analyze, and manage the real food industry in cities. In this paper, a smart sensor data collection strategy for IoT is proposed, which improves the efficiency and accuracy of provenance while minimizing the size of the data set. We then present algorithms for tracing the contamination source and backtracking potentially infected food in the markets. Our strategy and algorithms are evaluated with a comprehensive evaluation case of this IoT system, which shows that the system performs well even with big data.

1. Introduction

As the urban population and the corresponding need for living necessities keep expanding in modern cities, much higher requirements are set for municipal governors to manage all aspects of urban living. The performance of cities currently depends not only on the city's endowment of hardware infrastructure, but also on the availability and quality of knowledge communication and social infrastructure [1]. The smart city mainly focuses on applying next-generation information technology to all fields of life, embedding sensors in physical objects in every corner of the world [2], and forming the Internet of Things (IoT) via the Internet. The Internet of Things can then be integrated through supercomputers and cloud computing [3, 4].

IoT refers to uniquely identifiable objects and their virtual representations in an Internet-like structure. It gives researchers on smart cities a wider platform and many more possibilities [5] and is expected to substantially support the sustainable development of future smart cities [6]. The aim of IoT is to create a distributed network of intelligent sensor nodes that can measure many parameters to manage the city more efficiently [7]. The term Internet of Things was first used by Kevin Ashton in 1999 [8]. With the rapid development of Radio-Frequency Identification (RFID), people and objects in the physical world are equipped with all kinds of sensors and radio tags to authenticate their identity and status [9]. The introduction of IoT makes people's daily life easier, safer, and more interesting. Businesses may no longer run out of stock or generate waste products, as the involved parties would know which products are required and consumed. Traffic conditions can be obtained directly via cell phones or GPS, so that we can safely keep away from traffic jams or even accidents. All kinds of data are collected and analyzed to entertain people on the Internet, for example, constellation interpretation or social hotspots.

Provenance, a term originally used for works of art, refers to the chronology of the ownership or location of a historical object [10]. With the rapid development of IoT and cloud computing, provenance has been studied in plenty of areas beyond the arts, among which tracing the provenance of an object or entity is a major aspect. The main purpose of tracing provenance is to provide contextual and circumstantial evidence for an object's original production or discovery by establishing, as far as practicable, its later history, especially the sequences of its formal ownership, custody, and places of storage.

Industrialization and the rapid growth of human demand have moved the food supply chain in modern cities beyond regional boundaries to include global participation in importing and exporting. According to the US Census, the imported proportion of US food consumption grew from 7.9% to 9.6% between 1997 and 2005, roughly a 22% gain [11]. The momentum of change grows even faster nowadays. The scale and heterogeneity of the food supply chain limit the capacity of existing regulations and approaches. At this point, IoT is a must as a platform to monitor and manage the food supply chain.

In this paper, we discuss a case of tracing provenance in the food supply chain, which is a feasible application of IoT in smart cities. Our major contributions are as follows.

We propose a Self-adaptive Dynamic Partition Sampling (SDPS) Strategy to collect data from sensors, which mitigates the workload with a minor loss of tracing accuracy. Without loss of performance, our strategy needs only a small portion of the end markets' samples from the huge volume of raw materials and products along all levels of the food supply chain. This is an interesting discovery, as smart sampling has not been explored intensively to manage data in IoT systems for food supply chains, though data collection and modeling have been studied in the IoT domain before.

As a case of SDPS applications, we introduce tracing and backtracking algorithms to achieve provenance reasoning in the food supply chain. These methods can pinpoint the contamination source in the network and identify the potentially problematic products in the markets. We are able to sample a small portion of food only in the end markets and, at the same time, maintain sufficient accuracy of provenance tracing over the whole IoT system.

We further visualize the data flow and contamination conditions for intuitive analysis. Some work on provenance reasoning has already been realized and well modeled [12], but no one has explicitly modeled contamination conditions in cities.

The rest of the paper is organized as follows. In Section 2, we present existing related work. Section 3 briefly gives a view of the system's hierarchy. Section 4 describes the algorithms and approaches in detail. Evaluation results are presented in Section 5, and the conclusion is drawn in Section 6.

2. Related Work

Provenance issues have been studied by researchers in the areas of computer systems as well as management applications in diversified information systems, which comprise part of the information technology (IT) infrastructure of smart city management. Wikipedia on smart city [1] proposed a prototype of Provenance-Aware Storage System (PASS), which could automatically collect provenance at the operating system level. Hasan et al. [13] focused on a thorough analysis of threats to provenance systems. Both methods are metadata models of provenance, but they do not explain how to exploit and process these data models to draw informative conclusions. In the context of food safety management, information systems are important to assist decision making in a short time frame, potentially allowing decisions to be made in real time.

In the smart city management domain, food safety issues caused by contamination have not been studied adequately in terms of modeling and visualization. McMeekin et al. [14] introduced the information systems techniques used in the safety management of the food supply chain. A stochastic state transition simulation model [15] was described to simulate the spread of Salmonella from multiplying through slaughter, with special emphasis on critical control points to prevent or reduce Salmonella contamination. Wein and Liu [16] developed a mathematical model of a cows-to-consumers supply chain associated with a single milk-processing facility that is the victim of a deliberate release of botulinum toxin. Qin established a quality management model for the food supply chain based on game theory [17].

We have modeled and discussed traceability in the food supply chain in [18]. In the current work, the algorithms are further optimized for big data, and self-correction strategies are applied to make sampling and the whole scheme adaptive. Also, contamination conditions can be visualized to make the IoT system more intuitive.

Sampling strategies in IoT systems have attracted intensive study [19–21]; however, some issues remain unsolved, for example, how to exploit a small sample size from a huge volume of food supplies without loss of accuracy.

3. Modeling IoT System Structure for Food Supply Chains

With the growing size and demands of modern cities, the structure of the food supply chain has become huge and complicated. Moreover, due to the huge volume of sensors attached to items travelling along it, it is usually infeasible to collect and process sensing data from all the food at every level. Based on those concerns, to speed up provenance solutions, we gather only a small part of the sensor data at the end nodes of the chain. How to rely on this small portion of sensor data to figure out the contamination source is therefore the central issue in our strategy. Additional concerns arise regarding the loss of accuracy due to the small sample volume and the performance of the tracing scheme. We propose our heuristic approach and algorithms to tackle this problem later in this paper, with additional thoughts on algorithm complexity.

3.1. Physical Structure of IoT Systems for Food Supply Chains. We have sensors at every end node in the supply chain, which provide us with comparable information to determine whether a product is safe or not. With the sensor data and their physical connection, the food supply chain forms an Internet of Things (IoT) network.

Figure 1: An illustration of the IoT system's physical structure modeled for the food supply chain (labels shown: farm, transportation, warehouse, market, sensor/storage units, data bus, sensor data server, display and visualization).

In reality, real-time decision making is critical in food safety issues. If the contamination source remains unknown for one more hour, more people will be exposed to danger. Besides, in the food industry, examination is more or less material-consuming. We cannot take part of every piece of food at every stage of the chain, whether or not there are problems, to the sensors for physical and chemical checks, as it would bring food companies a great economic loss. As a result, we sample only a small portion of the food in the end markets.

After that, with this small part of the products, we manipulate these data to get a whole picture of the contamination conditions in the entire network, such as the contamination source and the other involved foods that need to be recalled.

The physical structure of the system is shown in Figure 1.

3.2. Modeling of Food Supply Chain. Generally, the food supply chain can be divided into seven stages: plantation/cultivation, slaughtering, transportation, inventory, wholesale, retailing, and customers. Although the chain is heterogeneous, we can view it as a flow through the combination and repetition of those stages based on certain rules.

Firstly, it is often impossible to know in advance which physical position (e.g., vehicle or warehouse) a piece of food will be in. In other words, the route of food is almost random.

Secondly, food can access a particular location more than once, and a location can play different roles in the manufacturing of one food product. For instance, pork can be carried by the same vehicle before and after slaughtering, which generates a cycle if we view the chain as a flow.

Thirdly, not all the food at the contamination source will be infected. The percentage of infection is determined by the type of epidemic disease, temperature, density, and other objective aspects.

Finally, other locations which are not the contamination source may also generate new contaminated food due to cross contamination. The classic Reed-Frost model has become a standard for modeling cross contamination conditions [22]. Based on the explicit contamination discussed in this model, we introduce implicit infection to get the average infection probability $P$ in a certain batch and stage location by

$$P = 1 - (1 - P_{\text{exp}})^{\#\text{exp}} \, (1 - P_{\text{imp}})^{\#\text{imp}}. \tag{1}$$

We use # as the mark of number in this paper. In (1), $P_{\text{exp}}$ represents the probability that infection happens if two pieces of food touch each other directly and one of them has been explicitly infected, and #exp is the number of food products that have been explicitly infected. $P_{\text{imp}}$ and #imp mean the same, respectively, in implicit infection cases. Food that has been implicitly infected will not infect the others, but it is counted as contaminated according to its physical and chemical characteristics. As implicit infection is considered here, the model is more realistic. Several extensions of the Reed-Frost model have been published [23]; however, we take only implicit infection into consideration, since (1) describes our case with sufficient accuracy.
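Equation (1) can be evaluated directly. The following is a minimal Python sketch; the function name and the parameter names `n_exp` and `n_imp` (standing for #exp and #imp) are illustrative assumptions, and the numeric values are invented for the example.

```python
def infection_probability(p_exp: float, n_exp: int, p_imp: float, n_imp: int) -> float:
    """Average infection probability P of equation (1).

    n_exp and n_imp stand for #exp and #imp (hypothetical parameter names).
    """
    # Probability that a product escapes every explicit and implicit exposure.
    escape = (1.0 - p_exp) ** n_exp * (1.0 - p_imp) ** n_imp
    return 1.0 - escape


# Illustrative values only: two explicitly infected neighbours, no implicit risk.
p = infection_probability(p_exp=0.5, n_exp=2, p_imp=0.0, n_imp=3)
print(p)  # 0.75
```

With no infected neighbours of either kind (`n_exp = n_imp = 0`), the expression reduces to $P = 0$, as expected.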

The food supply chain is viewed as a Directed Acyclic Graph (DAG), in which each node stands for one location keeping or processing some batches of food for a period. The DAG constructs the relationships within the Internet of Things based on the order and dependency among all the sensor data. The graph is acyclic because the batch number works as a time stamp that distinguishes stages in the chain. In this way, although food may be carried by the same vehicle in two or more stages, these stages have different batch numbers and are regarded as separate nodes in the DAG.

    Input: type of foodborne disease T
    Output: sample set
    Training Phase:
        Look up contamination probability p according to T
        Configuration: topology information (contamination intervals), p
    Sampling Phase:
        Sample a small portion n
        BEST = ∞
        while n ≤ BEST do
            Compute posterior probability based on Bayesian Estimation:
                $P(B \mid A_i) = \binom{n}{k} a_i^{k} (1 - a_i)^{n-k}$
                $P(A_i \mid B) = \dfrac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j) P(A_j)}$
            for i = 1 : (contamination intervals) do
                if $P_{\text{interval}_i}$ ≥ BEST then
                    BEST = $P_{\text{interval}_i}$
                end if
            end for
            if BEST ≤ 80% then
                Find the best sample rate according to its relationship with BEST
                if n ≤ BEST then
                    Sample (BEST − n) products
                else
                    break
                end if
            end if
        end while

Algorithm 1: Sampling algorithm.

Records of each location and product are documented separately. Location records include the batch number, the number of sampled products labeled as GOOD (uninfected) or BAD (infected) in this batch, and the IDs of the polluted samples in this batch. Product records contain the examination result for a piece of food, the order of batches and locations the product passed, and pointers to these location records. The pointers serve as the connection between the two data structures.
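One possible in-memory layout for these two record types is sketched below in Python. The class names, field names, and the `register` helper are hypothetical and are not taken from the authors' C++ simulator; the sketch only illustrates how product records can point at shared location records keyed by (location, batch).

```python
from dataclasses import dataclass, field


@dataclass
class LocationRecord:
    location: str
    batch: int
    good: int = 0                                      # sampled products labeled GOOD
    bad: int = 0                                       # sampled products labeled BAD
    polluted_ids: list = field(default_factory=list)   # IDs of polluted samples


@dataclass
class ProductRecord:
    product_id: int
    infected: bool                                     # examination result
    path: list = field(default_factory=list)           # ordered references to LocationRecord


def register(product_id, infected, stops, table):
    """Record one sampled product; `stops` is its ordered list of (location, batch)."""
    product = ProductRecord(product_id, infected)
    for key in stops:
        # One DAG node per (location, batch) pair, created on first use.
        rec = table.setdefault(key, LocationRecord(*key))
        if infected:
            rec.bad += 1
            rec.polluted_ids.append(product_id)
        else:
            rec.good += 1
        product.path.append(rec)
    return product


table = {}
pork = register(1, True, [("Vehicle 1", 3), ("Factory", 4), ("Vehicle 1", 7)], table)
print(len(table))  # 3
```

In the usage lines, the same vehicle appears under two batch numbers and therefore yields two distinct nodes, which is what keeps the graph acyclic.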

3.3. Logic Structure of IoT System for Food Supply Chains. As shown in Figure 2, the hierarchy of this IoT system contains four layers: the data collection and management layer, the intelligent processing layer, the graphic representation layer, and the self-correction layer. Specific approaches and algorithms are discussed in the following sections.

Figure 2: Logic structure of the IoT system model for the food supply chain (blocks shown: training, LUT, sampling, sample more, tracing origin, back-tracking, data visualization, confidence metric, success metric).

4. Heuristic Provenance Approach and Tracing and Backtracking Algorithms

In this section, detailed approaches and algorithms to solve provenance issues in the food supply chain are introduced. Firstly, we present a Self-adaptive Dynamic Partition Sampling (SDPS) Strategy to improve the efficiency and intelligence of sensor data collection and management. Then, tracing and backtracking algorithms are discussed, respectively, to catch the contamination source and dig out potentially infected food products still circulating in the markets. Finally, we introduce the Self-Correction Method to maintain and update the system, which makes the system adaptive and flexible for particular applications.

4.1. Self-Adaptive Dynamic Partition Sampling Strategy. As mentioned in Section 3.1, to improve the efficiency of manipulating sensor data, the Self-adaptive Dynamic Partition Sampling (SDPS) Strategy is introduced, which cuts down the number of samples a great deal. The pseudocode for the sampling algorithm is shown in Algorithm 1.

4.1.1. Partition Strategy. The partition strategy makes samples more general and representative. In this case, the system divides the whole group of products into several parts according to the batches they belong to in the end markets. The sampled volume for batch $n$ of market $m$ is determined by

$$\text{sample}_{(m,n)} = \text{sample}_{\text{total}} \times \frac{\text{products}_{(m,n)}}{\sum_{m=0}^{M} \sum_{n=0}^{N} \text{products}_{(m,n)}}. \tag{2}$$

Here, $M$ and $N$ are the total numbers of end markets and batches in the network. The subscripts $(m, n)$ and total denote the number of samples or products in batch $n$ of market $m$ and in all batches of all end markets, respectively.
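The proportional allocation of equation (2) can be sketched in a few lines of Python. The function name and the dictionary layout keyed by (market, batch) are assumptions made for illustration, and the product counts are invented; rounding the shares to whole products is left to the caller.

```python
def partition_samples(sample_total, products):
    """Split `sample_total` across (market, batch) cells in proportion to their size.

    `products` maps (market, batch) -> number of products in that cell.
    Returns possibly fractional sample volumes, as in equation (2).
    """
    grand_total = sum(products.values())
    return {cell: sample_total * count / grand_total
            for cell, count in products.items()}


# Hypothetical end-market inventory: 1000 products in three cells.
inventory = {(0, 0): 100, (0, 1): 300, (1, 0): 600}
print(partition_samples(50, inventory))
# {(0, 0): 5.0, (0, 1): 15.0, (1, 0): 30.0}
```

Because each cell receives a share proportional to its size, the shares always sum to `sample_total`.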

4.1.2. Dynamic Strategy. A dynamic strategy based on Bayesian estimation is adopted to achieve a minimal sample volume. According to the infection probability of a particular pathogen, determined by medical experiments, the model can be trained to obtain the distribution of the total infection probability within the whole food supply network, which is the prior probability. The probability density function of the distribution is presented as a function of infection probability intervals.

On the other hand, after sampling a small part and measuring the infection rate of the samples, the posterior probabilities can be obtained. If $k$ infected products are found within $n$ samples under a certain contamination percentage interval with prior probability $a_i$, the conditional probability is given by the binomial distribution:

$$P(B \mid A_i) = \binom{n}{k} a_i^{k} (1 - a_i)^{n-k} = \frac{n!}{k!\,(n-k)!}\, a_i^{k} (1 - a_i)^{n-k}. \tag{3}$$

Here, $A_i$ means the event that the contamination percentage of the whole product set falls into the $i$th interval, with a prior probability of $a_i$, and $B$ means the event that we find $k$ contaminated products in $n$ samples.

After that, the Bayesian formula (4) is applied to combine prior probabilities with posterior probabilities and get revised probabilities, which describe the specific environment better:

$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\, P(A_j)}. \tag{4}$$
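The update in (3) and (4) can be sketched as follows. This is an illustrative Python sketch under two assumptions that go beyond the text: each interval is given a separate representative contamination rate (`rates`) distinct from its prior weight (`priors`), and the sum in the denominator runs over all intervals. The function name and the numbers are hypothetical.

```python
from math import comb


def interval_posteriors(priors, rates, n, k):
    """Posterior P(A_i | B) for each contamination interval.

    priors[i]: prior probability of interval i (from the training phase).
    rates[i]:  representative contamination rate of interval i (assumed here).
    n, k:      number of samples taken and number found infected.
    """
    # Binomial likelihood P(B | A_i), as in equation (3).
    likelihoods = [comb(n, k) * r ** k * (1.0 - r) ** (n - k) for r in rates]
    # Bayes' rule, as in equation (4), normalised over all intervals.
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under every interval")
    return [j / evidence for j in joint]


post = interval_posteriors(priors=[0.5, 0.5], rates=[0.25, 0.75], n=1, k=1)
print(post)  # [0.25, 0.75]
```

With equal priors, a single infected sample shifts the belief toward the interval with the higher contamination rate, in proportion to the two rates.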

4.1.3. Self-Adaptive Strategy. The tracing algorithm, which is discussed in the next section, places some requirements on its input sampling data. If the ratio of infected products to uninfected ones is too high, the tracing algorithm performs poorly, as there are not enough healthy samples to rule out suspects. On the contrary, if the ratio is too low, the noise introduced by the sampling process may dominate the result. In these two extreme cases, more samples than in other cases should be tested to improve the accuracy. Hence, under each sampling rate, there is a relationship between the best tracing algorithm accuracy and the pollution proportion interval for a particular topology. Given all the relationships in a specific interval, for economic reasons, this strategy picks the smallest sampling rate that achieves certain requirements (e.g., 90% accuracy). Then we sample the food in the end markets again at that rate and update the Bayesian estimation to check whether the sampling rate has met the requirements.
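The selection step can be illustrated with a small Python sketch. The lookup table of accuracies per sampling rate is assumed to come from the training phase for the current pollution interval; the function name and all numbers below are hypothetical.

```python
def smallest_adequate_rate(accuracy_by_rate, target=0.90):
    """Return the smallest sampling rate whose trained accuracy meets `target`.

    accuracy_by_rate maps sampling rate -> best tracing accuracy observed in
    training for the current pollution interval. Returns None if no rate
    qualifies, in which case more rates would have to be trained.
    """
    adequate = [rate for rate, acc in accuracy_by_rate.items() if acc >= target]
    return min(adequate) if adequate else None


# Invented accuracies for one pollution interval.
trained = {0.03: 0.72, 0.05: 0.91, 0.10: 0.97}
print(smallest_adequate_rate(trained))  # 0.05
```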

4.2. Tracing Algorithm. Pseudocode of the tracing algorithm is shown in Algorithm 2. After sampling and sensing, we add up the numbers of infected and uninfected food products passing every location and batch. They are stored in two variables, GOOD and BAD, for each place.

Supposing that the samples properly reveal the condition of the whole product set, the criterion to find the suspect sources is set as GOOD < $\varepsilon$ and BAD > 0. This is feasible because the contamination source would be the primary spot generating polluted food, so the number of uninfected samples there is limited. $\varepsilon$, regarded as the error factor, is a small integer which enables the algorithm to remain valid when not all the food passing the source is infected or when there is some disturbance caused by nonideal conditions (e.g., imperfect sampling). The specific value of $\varepsilon$ is decided by the number of samples and the infection probability of the pollutant source, and can be roughly represented as follows:

$$\varepsilon = \frac{\text{samples} \times \text{pollution probability}}{\text{batches}}. \tag{5}$$

This criterion is not strict enough to pinpoint a single contamination source, so extra work is needed to eliminate the remaining confusing suspects. First of all, to improve the speed of the algorithm, suspects with a small BAD value are excluded. Then the system generates a Suspect Tree composed of the suspected locations and batches according to their order in the food supply chain. After that, the Suspect Tree is traversed layer by layer, and the first node that meets the same criterion is picked as the root source, since the original contaminant is always above cross contaminants in the tree.
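The counting and filtering steps can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: instead of building an explicit Suspect Tree, it uses each node's earliest position on any sampled path as its layer, and the function name, the `min_bad` threshold, and the sample data are hypothetical.

```python
from collections import defaultdict


def trace_origin(samples, epsilon=1, min_bad=1):
    """Return the most upstream suspect (location, batch) node, or None.

    samples: list of (path, infected), where path is the ordered list of
             (location, batch) nodes the sampled product passed.
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    layer = {}
    for path, infected in samples:
        for level, node in enumerate(path):
            if infected:
                bad[node] += 1
            else:
                good[node] += 1
            # Earliest position at which this node was seen (stand-in for tree depth).
            layer[node] = min(layer.get(node, level), level)
    # Suspect criterion: GOOD < epsilon and enough BAD samples.
    suspects = [n for n in bad if good[n] < epsilon and bad[n] >= min_bad]
    if not suspects:
        return None
    # Prefer the most upstream suspect; break ties by the larger BAD count.
    return min(suspects, key=lambda n: (layer[n], -bad[n]))


samples = [
    ([("Farm", 1), ("Vehicle 1", 1), ("Market 1", 1)], True),
    ([("Farm", 1), ("Vehicle 2", 1), ("Market 2", 1)], True),
    ([("Farm", 2), ("Vehicle 1", 1), ("Market 1", 1)], False),
    ([("Farm", 2), ("Vehicle 2", 1), ("Market 2", 1)], False),
]
print(trace_origin(samples))  # ('Farm', 1)
```

In the example, only batch 1 of the farm has infected samples and no healthy ones, so it is the single suspect and is returned as the origin.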

4.3. Backtracking Algorithm. To judge the performance of the backtracking algorithm, Hit Rate and False Alarm Rate are put forward to denote the algorithm's ability to capture infected products and the probability of mistakenly reckoning good products as infected ones. Supposing the total numbers of products and infected products are $N$ and $I$, respectively, and the algorithm selects $n$ potentially infected products including $i$ infected ones, we define Hit Rate as $i/I$ and False Alarm Rate as $(n - i)/(N - I)$. Although theoretically both a high Hit Rate and a low False Alarm Rate are expected, there is a tradeoff between them.
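As a worked example of the two definitions, the following Python sketch computes both rates; the function name and the counts are invented for illustration.

```python
def hit_and_false_alarm(N, I, n, i):
    """Hit Rate i/I and False Alarm Rate (n - i)/(N - I).

    N: total products, I: truly infected products,
    n: products flagged by the algorithm, i: flagged products that are infected.
    """
    hit_rate = i / I
    false_alarm_rate = (n - i) / (N - I)
    return hit_rate, false_alarm_rate


# Hypothetical run: 1000 products, 100 infected, 120 flagged, 90 of them infected.
hit, fa = hit_and_false_alarm(N=1000, I=100, n=120, i=90)
print(hit)           # 0.9
print(round(fa, 4))  # 0.0333
```

Flagging more products can only raise the Hit Rate, but every additional healthy product flagged raises the False Alarm Rate, which is the tradeoff noted above.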

The backtracking algorithm is described in Algorithm 3.
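The iterative expansion can be sketched in Python as below. This is an interpretation offered under stated assumptions rather than the authors' code: the location/batch tree is replaced by a set of visited nodes, `is_contaminated` is a hypothetical callback standing in for physically sensing a product, and the function returns both the visited nodes and the products found infected.

```python
def back_track(recheck, paths, is_contaminated):
    """Expand from known contaminated products to other products sharing their nodes.

    recheck:         IDs of products already known to be contaminated.
    paths:           dict product_id -> list of (location, batch) nodes it passed.
    is_contaminated: callable(product_id) -> bool, a stand-in for sensing.
    """
    # Index: node -> IDs of the products that passed it.
    passed_by = {}
    for pid, path in paths.items():
        for node in path:
            passed_by.setdefault(node, set()).add(pid)

    bad_nodes = set()
    infected = set(recheck)
    tested = set(recheck)
    frontier = set(recheck)
    while frontier:
        # Nodes on the paths of newly found contaminated products, not seen before.
        new_nodes = {node for pid in frontier for node in paths[pid]} - bad_nodes
        bad_nodes |= new_nodes
        frontier = set()
        for node in new_nodes:
            for pid in passed_by[node] - tested:
                tested.add(pid)
                if is_contaminated(pid):
                    infected.add(pid)
                    frontier.add(pid)   # its path is explored in the next round
    return bad_nodes, infected


paths = {
    1: ["F1", "V1", "M1"],
    2: ["F1", "V2", "M2"],
    3: ["F2", "V2", "M2"],
    4: ["F2", "V3", "M3"],
}
truth = {1, 2}
nodes, found = back_track({1}, paths, lambda pid: pid in truth)
print(sorted(found))  # [1, 2]
```

The loop ends when a round discovers no new contaminated product. In the example, product 4 is never sensed, because it shares no node with any contaminated product.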

4.4. Self-Correction Method. Two metrics are defined to judge the performance of the system and provide a reference for later parameter settings.

    Input: samples' spatial information and examination results
    Output: contamination origin
    k = 0
    for i = 1 : #samples do
        for j = 1 : (locations/batches on sample_i's path) do
            if sample_i is infected then
                sample_i.location_j.batch_j.BAD++
            else
                sample_i.location_j.batch_j.GOOD++
            end if
        end for
    end for
    for m = 1 : (locations/batches in the entire chain) do
        if location_m.batch_m.GOOD < ε && location_m.batch_m.BAD > 0 then
            Record location/batch into suspect[k]
            k++
        end if
    end for
    Exclude suspects with small BAD
    if #suspect ≥ 1 then
        Get food IDs passed all suspects
        if #ID ≥ 0 then
            Get food IDs passed at least one suspect
        end if
    end if
    Construct "Suspect Tree" of batches according to the paths of these IDs
    for n = 1 : (tree nodes) do
        if (suspect[n].location_n.batch_n.GOOD < ε &&
            suspect[n].location_n.batch_n.BAD > 0) then
            origin = suspect[n]
            break
        end if
    end for

Algorithm 2: Tracing algorithm.

    Input: contaminated samples set Re-check
    Output: infected food products set Bad
    while (1) do
        Construct a tree of location/batch according to
            the paths of contaminated products in Re-check
        Traverse the tree (DFS)
        Record all nodes in Bad
        Empty Re-check
        if node.location.batch is new then
            Find the food IDs passed these nodes
            Sense them
            if food is contaminated then
                Put its ID in Re-check
            end if
        else
            break
        end if
    end while

Algorithm 3: Backtracking algorithm.

Figure 3: Topology (DAG) of a food supply chain case for evaluation (nodes shown: Farm, Factory, Vehicles 1 to 5, and Markets 1 to 3; Vehicles 1 and 2 each appear twice).

The Confidence Metric (CM) is defined as the relative difference between the prior probability and the posterior probability:

$$\text{CM} = \frac{\left| P_{\text{post}} - P_{\text{pri}} \right|}{P_{\text{pri}}}. \tag{6}$$

$P_{\text{post}}$ and $P_{\text{pri}}$ are the posterior and prior probabilities, respectively. If CM is small, we are more confident that the dataset suits the previously trained model, and vice versa. Thus, we can correct (4) and combine the prior and posterior probabilities to get more reasonable infection probabilities $P_{\text{comb}}$ for the entire network:

$$P_{\text{comb}} = \frac{P(A_i \mid B) + \text{CM} \times P(A_i)}{1 + \text{CM}}. \tag{7}$$

The Success Metric (SM) measures the accuracy of the system:

$$\text{SM} = \frac{\text{success}}{\text{total}}. \tag{8}$$

It is defined as the ratio of the number of successful detections to the total number of tests. With a lower SM, the sampling criterion is set stricter, and vice versa.

These two variables help to adjust the sampling algorithm slightly to fit a certain environment and application.
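Equations (6) to (8) translate directly into code. The Python sketch below assumes that $P_{\text{post}}$ and $P_{\text{pri}}$ in (6) are the same quantities as $P(A_i \mid B)$ and $P(A_i)$ in (7); the function names and numeric values are illustrative only.

```python
def confidence_metric(p_post, p_pri):
    """CM of equation (6): relative gap between posterior and prior."""
    return abs(p_post - p_pri) / p_pri


def combined_probability(p_post, p_pri):
    """P_comb of equation (7): CM-weighted blend of posterior and prior."""
    cm = confidence_metric(p_post, p_pri)
    return (p_post + cm * p_pri) / (1.0 + cm)


def success_metric(successes, total):
    """SM of equation (8): fraction of tests in which the source was detected."""
    return successes / total


# Invented values: prior 0.2 for an interval, posterior 0.3 after sampling.
print(round(confidence_metric(0.3, 0.2), 3))     # 0.5
print(round(combined_probability(0.3, 0.2), 4))  # 0.2667
print(success_metric(45, 50))                    # 0.9
```

When the posterior equals the prior, CM is zero and $P_{\text{comb}}$ equals the posterior. As CM grows, the combined value is pulled back toward the prior.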

4.5. Timing and Space Complexities. Suppose there are $m$ samples, $n$ stages, and $l$ batches for all locations in a food supply chain; the time complexities of the tracing and backtracking algorithms are $O(mn) + O(l)$ and $O(m) + O(l)$, respectively. There is a tradeoff between tracing accuracy and time consumption in SDPS: more samples mean a longer run time and better knowledge of the network. Compared with the time spent on chemical testing and sensing, the time consumption of SDPS is negligible.

One record is required for every location and every food product, so the space complexity of the whole IoT system is $O(a + b)$, where $a$ is the number of food products and $b$ is the number of locations in the network.

5. Evaluation Results and Analysis

We set up two specific cases (Figures 3 and 4) to evaluate the proposed system. The first case gives a general evaluation and shows that our SDPS scheme outperforms other sampling methods, while the second focuses on performance on a large system with big data.

5.1. Experimental Setup. In Figure 3, note that vehicles 1 and 2 serve as transportation nodes both from farm to factory and from factory to market. This makes the model closer to reality, as some locations in the chain can play different roles in food processing.

The configuration of the two cases is listed in Table 1. Every location in the chain holds 25 and 500 batches in Cases 1 and 2, respectively. In total, 60 thousand and 800 thousand food products circulate in the network. Which locations a piece of food passes through is entirely random.

Figure 4: Topology (DAG) of a food supply chain case for evaluation (with big data). (Nodes: Farms 1-3, Factories 1-2, Vehicles 1-7, and Markets 1-3.)

We built the simulator in C++. It reads configuration files that describe supply chain topologies, generates simulated contamination behavior, and senses data along the food supply chains. In simulation, the contamination source and the sampling volume can be set explicitly by the user. Under the contamination rules, food can be infected by passing through (1) the contamination source directly or (2) cross-infection spots indirectly, according to our revised Reed-Frost model.

To keep this paper compact, we only show the training process of the first model here. The training process yields the prior probabilities, which depend strongly on the topology and configuration of the chain. Figure 5 shows the distribution of the infected proportion under different contamination probabilities after 300000 tests. Under each contamination probability (which depends on the contamination type), the actual proportion of pollution is approximately Gaussian distributed, which agrees with the discussion in Section 4. Note that a value $x$ on the $x$-axis maps to the pollution proportion interval $[(x - 1) \times 4, x \times 4)$, in percent, since the total space is divided into 25 sections. The relation between the products' contamination percentage and the accuracy of the tracing algorithm under different sample rates is shown in Figure 6. For clarity, only three sampling rates are tested: 3%, 5%, and 10%. To make the sampling strategy more efficient, more rates can be evaluated in real situations. Figure 6 confirms our hypothesis: the source is difficult to detect if only a very small or a very large part of the food is contaminated. In both cases, the flow path of the contamination is easily hidden.
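The mapping from an $x$-axis index in Figure 5 to its pollution proportion interval can be sketched as a small helper. This is only an illustration; the function name and the `sections` parameter are hypothetical, with 25 sections taken from the text.

```python
def proportion_interval(x: int, sections: int = 25) -> tuple:
    """Map a 1-based section index x to its interval [(x-1)*w, x*w) in percent."""
    if not 1 <= x <= sections:
        raise ValueError("x must lie between 1 and the number of sections")
    width = 100 / sections  # 4 percentage points when there are 25 sections
    return ((x - 1) * width, x * width)
```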

Table 1: Configuration of the two cases.

                        Case 1    Case 2
Batches per location    25        500
Total products          60000     800000
Flow rule               Random    Random

Table 2: Simulation results of back tracking algorithm.

Hit rate    False alarm rate
96%         3%

5.2. Evaluation Results. Figure 7 shows the accuracy of the tracing algorithm. Across the different probabilities of infection in the whole chain, the accuracy is no less than 80%. In Figures 8 and 9, the partition and dynamic strategies of SDPS are tested, respectively. For all infection probabilities, the partition strategy achieves higher tracing accuracy than the global sampling strategy (11.8%, 22.7%, 10.9%, and 4.1% higher at infection probabilities of 30%, 67%, 80%, and 90%). Compared with sampling at fixed rates (3%, 5%, and 10%), the dynamic method achieves higher tracing accuracy even with a lower average sampling rate of 7.8%.

For the backtracking part, the simulation result is shown in Table 2. Both the hit rate and the false alarm rate are satisfactory.

Case 2 has a large data scale. We again fetch a few samples each time and let the system tell us how many samples to take next, based on (4). The system's actual sampling rate turns out to be 7.8%. As shown in Figure 10, the accuracy of the tracing algorithm is again higher than 80%, which shows that our proposed approach works well with big data.

Figure 5: Prior probabilities distribution under different contamination probabilities: 30%, 67%, 80%, and 90%. (x-axis: pollution proportion interval $[(x - 1) \times 4, x \times 4]$; y-axis: distribution of 300000 tests.)

Figure 6: The relationship of tracing algorithm accuracy and pollution proportion intervals under different sample rates: 3%, 5%, and 10%. (x-axis: pollution proportion interval (%); y-axis: accuracy of tracing algorithm (%).)

5.3. Contamination Visualization. With the tool introduced in [24] and the information we choose to record, SDPS provides sampling data that can be represented visually after being tested by sensors. Figures 11 and 12 show the data flow of infected and uninfected food products, respectively. In this case, we use the configuration of Case 1 and set the contamination source in advance to be the 4th batch in the factory.

Figure 7: Accuracy of tracing algorithm with different probabilities of infection. (x-axis: probability of infection (%); y-axis: tracing accuracy (%).)

Figure 8: Simulation results of partition sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. (Series: partition random sampling and global random sampling; x-axis: probability of infection (%); y-axis: tracing accuracy (%).)

The vertical lines marked with locations and batch numbers represent the nodes in the data flow. Lines going through these nodes are the traces of food products. As shown in those figures, most of the infected food passed the 4th batch of the factory (the node with a circle around it), while none of the uninfected food did. This batch therefore has a great chance of being the source of contamination, which is also confirmed by our tracing system.

SDPS makes the data concise but still comprehensive, which helps the visualization tool display the useful information.

Apart from aiding the detection of the contamination source, visualization can also help to better understand the contamination conditions (e.g., contamination severity/distribution) of the whole IoT system. For example, a well-managed warehouse or a city with lower temperature may lead to less contamination.

Figure 9: Simulation results of dynamic sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. (Series: 3%, 5%, and 10% sampling proportion and dynamic sampling; x-axis: probability of infection (%); y-axis: tracing accuracy (%).)

Figure 10: Simulation results of tracing accuracy with big data under different infection probabilities. (x-axis: probability of infection (%); y-axis: tracing accuracy (%).)

Information of these kinds can be read directly from the visualization images and helps manufacturers design their food supply chains more scientifically.

Visualization of the contamination conditions in the IoT system makes provenance reasoning in the food supply chain intuitive and informative.

5.4. Performance Estimation in Real Situations. In reality, the food supply chain is more complicated. Many factors, such as the specific food type, environment temperature, manufacturing process, and other parameters, can make the chain difficult to predict. To apply our strategy in real situations, those factors should be taken into account and some parameters should be adjusted accordingly.

The factor that influences the behavior of a food supply chain the most is the type of food. Different foods have their own characteristics, which may dominate the model, the provenance procedure, and the expected results. Firstly, the food type can determine the possible contamination source. In (1), $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ are related to the virus that spreads among the food. For example, avian influenza virus, which causes a common infectious disease among poultry, began to be contagious among human beings. After the mutation, the infectious ability of this virus grew significantly. As a result, $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ of avian influenza virus in chickens would also increase. Secondly, the food type is a deciding factor for its storage pattern and shelf life. Some canned drinks are stacked layer by layer separately, so they would not get cross contaminated. However, raw meat is generally kept together, which provides an easy environment for a virus to spread. Thirdly, the state of the food is also dominant in provenance. One piece of food in solid state can be seen as a unit, while liquid food like yogurt could be ruined by only one deteriorated drop. For yogurt, therefore, the sampling process could be very different, as spatial position should be taken into consideration and the behavior of the virus in liquid should also be studied.

Besides food type, other factors play important roles in real situations. Food is more likely to rot in summer than in winter; some manufacturing factories are more hygienic than others; as time passes, food may become more easily infected; and the types or dosages of food additives may slow down the contamination process.

Although different food supply chains can behave differently, our proposed strategy can cover most cases because it follows the general model of food supply networks and epidemiological principles.

6. Conclusion

In this paper, we present a heuristic approach to tracing contamination sources in large IoT systems for complicated food supply chains, which is a critical issue in metropolitan life. In our approach, the Self-adaptive Dynamic Partition Sampling (SDPS) Strategy is proposed to collect data for sensors, whose input is only a small portion of end-market samples from the huge volume of samples along food supply chains. The approach is illustrated with a case study of an IoT system for provenance in a food supply chain, which can efficiently stop outbreaks of foodborne disease. With the SDPS Strategy, the objects tested by sensors are the most reasonable portion of the entire product set. The efficiency is greatly improved, and the accuracy stays almost the same as when sensing all the objects. SDPS preserves the integrity of the information and approaches nearly real-time examination. We also present a tracing algorithm to find the contamination sources of food supply chains and a backtracking algorithm that provides a strategy for recalling problematic food that remains undiscovered in the chain. Simulation results indicate that our SDPS scheme can achieve a tracing accuracy of up to 97.8% with a smaller average sampling percentage than traditional global random sampling. We managed to sample a small portion of food only in the end markets without loss in the accuracy of provenance tracing over the whole IoT system. In addition, our analytic data and visualized images can clearly model contamination conditions in a food supply chain within the context of an IoT system. This gives clients an intuitive impression of food supply networks in a city.

Figure 11: Visualized data flow of infected food products in food supply chain. (Axes: farm, vehicle, factory, and market locations, batches 0-9 each; the highlighted node is factory batch 4.)

Figure 12: Visualized data flow of uninfected food products in food supply chain. (Axes: farm, vehicle, factory, and market locations, batches 0-9 each; the highlighted node is factory batch 4.)

In this paper, we assume that all provenance information of food products is hosted by a centralized repository and that these provenance metadata are organized in a uniform manner. Our future work is to implement the provenance of a food supply chain in practice, using a community as our test bed for megacity management.


Acknowledgments

This paper is sponsored in part by the Shanghai International Science and Technology Collaboration Program under Grant 13430710400 and the Campus for Research Excellence and Technological Enterprise (CREATE) program of the Singapore National Research Foundation under the joint project on Energy and Environmental Sustainability Solutions for Megacities (R-706-000-101-281) by Shanghai Jiao Tong University (SJTU) and the National University of Singapore (NUS). Professor Qiu is partially supported by NSF CNS-1249223 and NSFC 61071061.

References

[1] "Wikipedia on smart city," http://en.wikipedia.org/wiki/Smart_city.

[2] M. Qiu and E. H. M. Sha, "Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems," ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 2, article 25, 2009.

[3] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, "Online optimization for scheduling preemptable tasks on IaaS cloud systems," Journal of Parallel and Distributed Computing, vol. 72, no. 5, pp. 666–677, 2012.

[4] K. Su, J. Li, and H. Fu, "Smart city and the applications," in Proceedings of the International Conference on Electronics, Communications and Control (ICECC '2011), pp. 1028–1033, Zhejiang, China, September 2011.

[5] X. Tang, J. Pu, K. Cao, Y. Zhang, and Z. Xiong, "Integrated extensible simulation platform for vehicular sensor networks in smart cities," International Journal of Distributed Sensor Networks, vol. 2012, Article ID 860415, 10 pages, 2012.

[6] P. Vlacheas, R. Giaffreda, V. Stavroulaki, et al., "Enabling smart cities through a cognitive management framework for the internet of things," IEEE Communications Magazine, vol. 51, no. 6, pp. 102–111, 2013.

[7] A. Asin, "Smart cities from libelium allows systems integrators to monitor noise, pollution, structural health and waste management," Smart Cities Articles, 2011.

[8] K. Ashton, "That "internet of things" thing," RFID Journal, 2011.

[9] P. Magrassi and T. Berg, "A world of smart objects," Gartner Research Report TR-17-2243, 2002.

[10] Oxford English Dictionary (OED), "The fact of coming from some particular source or quarter; source, derivation," http://en.wikipedia.org/wiki/Provenance.

[11] A. V. Roth, A. A. Tsay, M. E. Pullman, and J. V. Gray, "Unraveling the food supply chain: strategic insights from China and the 2007 recalls," Journal of Supply Chain Management, vol. 44, no. 1, pp. 22–39, 2008.

[12] S. Miles, P. Groth, S. Munroe, and L. Moreau, "Prime: a methodology for developing provenance-aware applications," ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, article 8, 2011.

[13] R. Hasan, R. Sion, and M. Winslett, "The case of the fake Picasso: preventing history forgery with secure provenance," in Proceedings of the 7th Conference on File and Storage Technologies (FAST '09), pp. 1–14, New York, NY, USA, December 2009.

[14] T. A. McMeekin, J. Baranyi, J. Bowman, et al., "Information systems in food safety management," International Journal of Food Microbiology, vol. 112, no. 3, pp. 181–194, 2006.

[15] M. A. van der Gaag, F. Vos, H. W. Saatkamp, M. van Boven, P. van Beek, and R. B. M. Huirne, "A state-transition simulation model for the spread of Salmonella in the pork supply chain," European Journal of Operational Research, vol. 156, no. 3, pp. 782–798, 2004.

[16] L. M. Wein and Y. Liu, "Analyzing a bioterror attack on the food supply: the case of botulinum toxin in milk," Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 28, pp. 9984–9989, 2005.

[17] L. Qin and Q. S. Wang, "Food supply chain quality management model and simulation based on game," in Proceedings of the International Conference on Computer Modeling and Simulation (ICCMS '09), pp. 291–293, Macau, China, February 2009.

[18] Q. Zhang, D. Wang, T. Huang, et al., "Modelling provenance in food supply chain to track and trace foodborne disease," in Proceedings of the International Conference on Computer Modeling and Simulation, pp. 69–75, Hong Kong, China, February 2012.

[19] S. Li, L. Xu, and X. Wang, "Compressed sensing signal and data acquisition in wireless sensor," IEEE Transactions on Industrial Informatics, 2012.

[20] Z. Ding and X. Gao, "A database cluster system framework for managing massive sensor sampling data in the internet of things," Chinese Journal of Computers, vol. 35, no. 6, pp. 1175–1191, 2012.

[21] L. Zhang, J. Liu, and H. Jiang, "Energy-efficient location tracking with smartphones for IoT," in Proceedings of the IEEE Sensors, pp. 1–4, Taipei, China, October 2012.

[22] H. Abbey, "An examination of the Reed-Frost theory of epidemics," Human Biology, vol. 24, no. 3, pp. 201–233, 1952.

[23] L. Elveback, J. P. Fox, and A. Varma, "An extension of the Reed-Frost epidemic model for the study of competition between viral agents in the presence of interference," The American Journal of Epidemiology, vol. 80, no. 3, pp. 356–364, 1964.

[24] X. Yuan, H. Guo, H. Xiao, Z. Wang, and X. Zhang, "High-dimensional data virtualization," in Proceedings of the Communications of the CCF, pp. 13–16, April 2011.


areas beyond arts, among which tracing the provenance of an object or entity is a major aspect. The main purpose of tracing the provenance is to provide contextual and circumstantial evidence for its original production or discovery by establishing, as far as practicable, its later history, especially the sequences of its formal ownership, custody, and places of storage.

Industrialization and the rapid growth of human demands have made the food supply chain in modern cities move beyond regional boundaries and include global participation in importing and exporting. According to the US Census, the imported proportion of US food consumption grew from 7.9% to 9.6% between 1997 and 2005, roughly a 22% gain [11]. The momentum of change grows even faster nowadays. The scale and heterogeneity of the food supply chain limit the capacity of existing regulations and approaches. At this point, IoT is a must for us as a platform to monitor and manage the food supply chain.

In this paper, we discuss a case of tracing provenance in the food supply chain, which is a feasible application of IoT in smart cities. Our major contributions are as follows.

We propose the Self-adaptive Dynamic Partition Sampling (SDPS) Strategy to collect data from sensors, which mitigates the workload with minor loss of tracing accuracy. Without loss of performance, our strategy needs only a small portion of end markets' samples from the huge volume of raw materials and products along all levels of the food supply chain. This is an interesting discovery, as smart sampling has not been explored intensively for managing data in IoT systems for food supply chains, though data collection and modeling have been studied in the IoT domain before.

As a case of SDPS applications, we introduce tracing and backtracking algorithms to achieve provenance reasoning in the food supply chain. These methods can pinpoint the contamination source in the network and identify the potentially problematic products in the markets. We are able to sample a small portion of food only in the end markets and at the same time maintain sufficient accuracy of provenance tracing over the whole IoT system.

We further visualize the data flow and contamination conditions for intuitive analysis. Some work on provenance reasoning has already been realized and well modeled [12], but no one has explicitly modeled contamination conditions in cities.

The rest of the paper is organized as follows. In Section 2, we present existing related work. Section 3 briefly gives a view of the system's hierarchy. Section 4 presents the algorithms and approaches in detail. The evaluation of results is presented in Section 5, and the conclusion is drawn in Section 6.

2. Related Work

Provenance issues have been studied by researchers in the areas of computer systems as well as management applications in diversified information systems, which comprise part of the information technology (IT) infrastructure of smart city management. A prototype of the Provenance-Aware Storage System (PASS), which automatically collects provenance at the operating system level, was proposed in [1]. Hasan et al. [13] focused on a thorough analysis of threats to provenance systems. Both methods are metadata models of provenance, but they do not explain how to exploit and process these data models to draw informative conclusions. In the context of food safety management, information systems are important for assisting decision making in a short time frame, potentially allowing decisions to be made in real time.

In the smart city management domain, food safety issues caused by contamination have not been studied adequately in terms of modeling and visualization. McMeekin et al. [14] introduced the information system techniques used in the safety management of the food supply chain. A stochastic state-transition simulation model [15] was described to simulate the spread of Salmonella from multiplying through slaughter, with special emphasis on critical control points to prevent or reduce Salmonella contamination. Wein and Liu [16] developed a mathematical model of a cows-to-consumers supply chain associated with a single milk-processing facility that is the victim of a deliberate release of botulinum toxin. Qin and Wang [17] established a quality management model for the food supply chain based on game theory.

We have modeled and discussed traceability in the food supply chain in [18]. In the current work, the algorithms are further optimized for big data, and self-correction strategies are applied to make sampling and the whole scheme adaptive. In addition, contamination conditions can be visualized to make the IoT system more intuitive.

Sampling strategies in IoT systems have attracted intensive study [19–21]; however, some issues remain unsolved, for example, how to exploit a small sample from a huge volume of food supplies without loss of accuracy.

3. Modeling IoT System Structure for Food Supply Chains

With the growing size and demands of modern cities, the structure of the food supply chain has become huge and complicated. Moreover, because of the huge number of sensors attached to items travelling along it, it is usually infeasible to collect and process sensing data from all the food at every level. Based on those concerns, and to speed up provenance solutions, we only gather a small part of the sensor data at the end nodes of the chain. How to rely on this small portion of sensor data to figure out the contamination source is therefore the central issue in our strategy. Additional concerns also arise regarding the loss of accuracy due to the small sample volume and the performance of the tracing scheme. We propose our heuristic approach and algorithms to tackle this problem later in this paper, with additional thoughts on algorithm complexity.

3.1. Physical Structure of IoT Systems for Food Supply Chains. We have sensors at every end node in the supply chain, which provide comparable information to determine whether a product is safe or not. With the sensor data and their physical connections, the food supply chain forms an Internet of Things (IoT) network.

Figure 1: An illustration of IoT system's physical structure modeled for food supply chain. (Components: sensor and storage units at the farm, transportation, warehouse, and market stages, connected through a data bus to the sensor data, server, and display and visualization components.)

In reality, real-time decision making is critical in food safety issues. If the contamination source remains unknown for one more hour, more people will be exposed to danger. Besides, in the food industry, examination is more or less material-consuming. We cannot send a part of every piece of food at every stage of the chain, whether or not there is a problem, to the sensors for physical and chemical checks, as it would cause food companies great economic loss. As a result, we would like to sample only a small portion of the food in the end markets.

After that, with this small part of the products, we manipulate these data to get a whole picture of the contamination conditions in the entire network, such as the contamination source and the other involved foods that need to be recalled.

The physical structure of the system is shown in Figure 1.

3.2. Modeling of Food Supply Chain. Generally, a food supply chain can be divided into seven stages: plantation/cultivation, slaughtering, transportation, inventory, wholesale, retailing, and customers. Although the chain is heterogeneous, we can view it as a flow through the combination and repetition of those stages based on certain rules.

Firstly, it is often impossible for us to know in advance which physical position (e.g., vehicle or warehouse) a piece of food will be in. In other words, the route of a food product is almost random.

Secondly, food can access a particular location more than once, and a location can play different roles in the manufacturing of one food product. For instance, pork can be carried by the same vehicle before and after slaughtering, which generates a cycle if we view the chain as a flow.

Thirdly, not all the food at the contamination source will be infected. The percentage of infection is determined by the type of epidemic disease, temperature, density, and other objective aspects.

Finally, other locations that are not contamination sources may also generate new contaminated food due to cross contamination. The classic Reed-Frost model has become a standard for modeling cross contamination conditions [22]. Based on the explicit contamination discussed in this model, we introduce implicit infection to get the average infection probability $P$ in a certain batch and stage location by

$$P = 1 - (1 - P_{\mathrm{exp}})^{\#\mathrm{exp}} (1 - P_{\mathrm{imp}})^{\#\mathrm{imp}}. \tag{1}$$

We use # to denote "number of" in this paper. In (1), $P_{\mathrm{exp}}$ represents the probability that infection happens if two pieces of food touch each other directly and one of them has been explicitly infected, and #exp is the number of food products that have been explicitly infected. $P_{\mathrm{imp}}$ and #imp have the corresponding meanings for implicit infection. Food that has been implicitly infected will not infect others, but it is counted as contaminated according to its physical and chemical characteristics. Because implicit infection is considered here, the model is more realistic. Several extensions of the Reed-Frost model have been published [23]; however, we only take implicit infection into consideration, since (1) describes our case with sufficient accuracy.
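To make (1) concrete, the sketch below evaluates the infection probability for given counts of explicitly and implicitly infected products. The function name, argument names, and input validation are hypothetical; only the formula itself is taken from the text.

```python
def infection_probability(p_exp: float, n_exp: int,
                          p_imp: float, n_imp: int) -> float:
    """Equation (1): P = 1 - (1 - P_exp)^#exp * (1 - P_imp)^#imp."""
    for p in (p_exp, p_imp):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if n_exp < 0 or n_imp < 0:
        raise ValueError("counts must be non-negative")
    # Probability of escaping infection from every infected product.
    escape = (1.0 - p_exp) ** n_exp * (1.0 - p_imp) ** n_imp
    return 1.0 - escape
```

For instance, with $P_{\mathrm{exp}} = 0.5$, two explicitly infected products, and no implicit contribution, the escape probability is $0.5^2 = 0.25$, so $P = 0.75$.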

The food supply chain is viewed as a Directed Acyclic Graph (DAG), in which each node stands for one location keeping or processing some batches of food for a period. The DAG captures the relationships within the Internet of Things based on the order and dependency among all the sensor data. The graph is acyclic because we use the batch number as a time stamp that distinguishes stages in the chain. In this way, although food may be carried by the same vehicle in two or more stages, those stages have different batch numbers and are regarded as separate nodes in the DAG.
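The role of the batch number can be illustrated with a small sketch in which each node is a (location, batch) pair. The helper names and the example route are hypothetical, and the acyclicity check (Kahn's topological ordering) is a standard technique chosen here for illustration, not something specified in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (location name, batch number)


def build_dag(routes: List[List[Node]]) -> Dict[Node, Set[Node]]:
    """Collect directed edges between consecutive nodes of each product route."""
    edges: Dict[Node, Set[Node]] = defaultdict(set)
    for route in routes:
        for src, dst in zip(route, route[1:]):
            edges[src].add(dst)
    return edges


def is_acyclic(edges: Dict[Node, Set[Node]]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be ordered."""
    indegree: Dict[Node, int] = defaultdict(int)
    nodes = set(edges)
    for dsts in edges.values():
        for dst in dsts:
            indegree[dst] += 1
            nodes.add(dst)
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for dst in edges.get(node, ()):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return visited == len(nodes)


# The same vehicle is used before and after the factory, with different batches.
route = [("Farm 1", 0), ("Vehicle 1", 0), ("Factory 1", 0),
         ("Vehicle 1", 1), ("Market 1", 0)]
```

With batch numbers, the route above stays acyclic; if the batch numbers are collapsed so that both vehicle stages map to one node, the same route contains a cycle.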

Input: type of foodborne disease $T$
Output: sample set
Training Phase:
    Look up contamination probability $p$ according to $T$
    Configuration: topology information, #(contamination intervals), $p$
Sampling Phase:
    Sample a small portion $n$
    BEST = $\infty$
    while $n \le$ BEST do
        Compute posterior probability based on Bayesian Estimation:
            $P(B \mid A_i) = \binom{n}{k} a_i^k (1 - a_i)^{n-k}$
            $P(A_i \mid B) = \dfrac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j) P(A_j)}$
        for $i$ = 1 : #(contamination intervals) do
            if $P_{\mathrm{interval}_i} \ge$ BEST then
                BEST = $P_{\mathrm{interval}_i}$
            end if
        end for
        if BEST $\le$ 80% then
            Find the best sample rate according to its relationship with BEST
            if $n \le$ BEST then
                Sample (BEST $- n$) products
            else
                break
            end if
        end if
    end while

Algorithm 1: Sampling algorithm.
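The Bayesian update at the core of Algorithm 1 can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function names are hypothetical, `rates[i]` is assumed to be a representative contamination proportion for interval $i$ (for example, its midpoint), and `priors[i]` is assumed to hold the trained prior $P(A_i)$.

```python
from math import comb
from typing import List


def likelihood(k: int, n: int, rate: float) -> float:
    """Binomial probability of k infected products among n samples."""
    return comb(n, k) * rate ** k * (1.0 - rate) ** (n - k)


def posterior(k: int, n: int, rates: List[float],
              priors: List[float]) -> List[float]:
    """Bayes' rule over contamination intervals: P(A_i | B) for every i."""
    joint = [likelihood(k, n, a) * p for a, p in zip(rates, priors)]
    evidence = sum(joint)  # normalizing constant, summed over all intervals
    if evidence == 0.0:
        raise ValueError("observation has zero probability under all intervals")
    return [j / evidence for j in joint]
```

With three assumed intervals at rates 0.1, 0.5, and 0.9 and equal priors, observing 5 infected products in 10 samples concentrates the posterior on the middle interval.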

Records of each location and each product are documented separately. A location record includes the batch number, the numbers of sampled products labeled GOOD (uninfected) or BAD (infected) in this batch, and the IDs of the polluted samples in this batch. A product record contains the examination result for a piece of food, the ordered batches and locations the product passed, and pointers to the corresponding location records. The pointers connect the two data structures.
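One possible layout of the two record types is sketched below. The class and field names are hypothetical, and Python object references stand in for the pointers described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LocationRecord:
    location: str
    batch: int
    good: int = 0  # sampled products labeled GOOD (uninfected)
    bad: int = 0   # sampled products labeled BAD (infected)
    bad_ids: List[int] = field(default_factory=list)  # IDs of polluted samples


@dataclass
class ProductRecord:
    product_id: int
    infected: bool  # examination result for this piece of food
    # Ordered references to the location records the product passed through.
    path: List[LocationRecord] = field(default_factory=list)


def register_sample(product: ProductRecord) -> None:
    """Update the counters of every location record on the product's path."""
    for record in product.path:
        if product.infected:
            record.bad += 1
            record.bad_ids.append(product.product_id)
        else:
            record.good += 1
```

Because product records hold references rather than copies, updating a location record through one product is visible from every other product that passed the same location and batch.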

3.3. Logic Structure of IoT System for Food Supply Chains. As shown in Figure 2, the hierarchy of this IoT system contains four layers: the data collection and management layer, the intelligent processing layer, the graphic representation layer, and the self-correction layer. Specific approaches and algorithms are discussed in the following sections.

4. Heuristic Provenance Approach and Tracing and Backtracking Algorithms

In this section, detailed approaches and algorithms for solving provenance issues in the food supply chain are introduced. Firstly, we present the Self-adaptive Dynamic Partition Sampling (SDPS) Strategy to improve the efficiency and intelligence of sensor data collection and management. Then, tracing and backtracking algorithms are discussed, respectively, to catch the contamination source and to dig out potentially infected food products still circulating in the markets. Finally, we introduce the Self-Correction Method to maintain and update the system, which makes the system adaptive and flexible for particular applications.

Figure 2: Logic structure of IoT system model for food supply chain. (Layers: data collection and management layer (training, LUT, sampling, sample more); intelligent processing layer (tracing origin, back-tracking); graphic representation layer (data visualization); self-correction layer (confidence metric, success metric).)

41 Self-Adaptive Dynamic Partition Sampling Strategy Aswe mentioned in Section 31 to improve the efficiency ofmanipulating sensor data Self-adaptive Dynamic PartitionSampling Strategy (SDPS) is introduced which cuts downthe number of samples in a great deal The pseudocode forsampling algorithm is shown in Algorithm 1


4.1.1. Partition Strategy. The partition strategy makes samples more general and representative. In this case, the system divides the whole group of products into several parts according to the batches they belong to in the end markets. The sampled volume for batch $n$ of market $m$ is determined by

$$\#\text{sample}_{(m,n)} = \#\text{sample}_{\text{total}} \times \frac{\#\text{products}_{(m,n)}}{\sum_{m=0}^{M} \sum_{n=0}^{N} \#\text{products}_{(m,n)}}. \qquad (2)$$

Here, $M$ and $N$ are the total numbers of end markets and batches in the network. Subscripts $(m,n)$ and total denote the number of samples or products in batch $n$ of market $m$ and in all batches of all end markets, respectively.
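The proportional allocation in (2) can be illustrated with a short Python sketch. The function name `partition_samples` and the dictionary layout are assumptions made for illustration only; rounding the fractional volumes to whole products is left open here because the text does not specify it.

```python
def partition_samples(sample_total, products):
    """Split sample_total across (market, batch) cells in proportion to product counts.

    products maps (m, n) -> number of products in batch n of market m.
    Returns a dict of (possibly fractional) sample volumes per cell.
    """
    total_products = sum(products.values())
    if total_products == 0:
        raise ValueError("the network holds no products")
    return {cell: sample_total * count / total_products
            for cell, count in products.items()}


# Illustrative counts only: two batches in market 0 and one batch in market 1.
products = {(0, 0): 500, (0, 1): 300, (1, 0): 200}
allocation = partition_samples(100, products)
print(allocation)
```

Larger batches receive proportionally more samples, so the allocated volumes always add up to the total sampling budget.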

4.1.2. Dynamic Strategy. A dynamic strategy based on Bayesian estimation is adopted to achieve the minimal sample volume. According to the infection probability of a particular pathogen, determined by medical experiments, the model can be trained to obtain the distribution of the total infection probability within the whole food supply network, which is the prior probability. The probability density function of the distribution is presented as a function of infection probability intervals.

On the other hand, after sampling a small part of the products, the infection rate of the samples, and thus the posterior probabilities, can be obtained. If $k$ infected products are found within $n$ samples under a certain contamination percentage interval with prior probability of $a_i$, the conditional probability is obtained by the binomial distribution as follows:

$$P(B \mid A_i) = \binom{n}{k} a_i^{k} (1 - a_i)^{n-k} = \frac{n!}{k!\,(n-k)!}\, a_i^{k} (1 - a_i)^{n-k}. \qquad (3)$$

Here, $A_i$ means the event that the contamination percentage of the whole product set falls into the $i$th interval with a prior probability of $a_i$, and $B$ means the event that we find $k$ contaminated products in $n$ samples. After that, the Bayesian formula (4) is applied to combine prior probabilities with posterior probabilities and to get revised probabilities, which describe the specific environment better, as follows:

$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\, P(A_j)}. \qquad (4)$$

The sum in the denominator runs over the contamination intervals.
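A minimal Python sketch of the update in (3) and (4) is given below. It rests on an assumption that the text leaves open: each interval $i$ is represented by one contamination proportion (`rates[i]`, used as $a_i$ in the binomial term), while `priors[i]` holds $P(A_i)$. The function name `posterior` and the numbers in the example are illustrative, not values from the evaluation.

```python
from math import comb


def posterior(priors, rates, k, n):
    """Bayes update over contamination intervals.

    priors[i]: prior probability P(A_i) of interval i
    rates[i]:  assumed representative contamination proportion a_i of interval i
    k, n:      number of infected products found and number of samples
    """
    # Binomial likelihood P(B | A_i), as in (3).
    likelihoods = [comb(n, k) * a ** k * (1 - a) ** (n - k) for a in rates]
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("observation has zero probability under every interval")
    # Normalization, as in (4).
    return [value / evidence for value in joint]


# Three illustrative intervals with equal priors; 5 infected out of 10 samples.
post = posterior(priors=[1 / 3, 1 / 3, 1 / 3], rates=[0.1, 0.5, 0.9], k=5, n=10)
print(post)
```

With half of the samples infected, almost all posterior mass moves to the interval whose representative proportion is 0.5, which is the behavior the dynamic strategy relies on when deciding whether more samples are needed.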

4.1.3. Self-Adaptive Strategy. The tracing algorithm, which will be discussed in the next section, has some requirements for its input sampling data. If the ratio of infected products to uninfected ones is too high, the tracing algorithm performs poorly, as there are not enough healthy samples to exclude the suspicions. On the contrary, if the ratio is too low, the noise introduced by the sampling process may dominate the result. In these two extreme cases, more samples should be tested than in other cases to improve the accuracy. Hence, under each sampling rate, there is a relationship between the best tracing algorithm accuracy and the pollution proportion interval for a particular topology. Given all the relationships in a specific interval, for economical reasons, this strategy picks the smallest sampling rate that achieves certain requirements (e.g., 90% accuracy). Then, we sample the food in the end market again under that rate and update the Bayesian estimation to find out whether the sampling rate has met the requirements.
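The selection step can be sketched as a lookup over trained accuracy curves. The sketch below is only an illustration: the table layout, the function name `pick_sampling_rate`, the fallback to the largest rate when no rate qualifies, and all numbers are assumptions rather than details given in the text.

```python
def pick_sampling_rate(lut, interval, required_accuracy=0.90):
    """Return the smallest sampling rate whose trained accuracy meets the requirement.

    lut maps sampling rate -> {pollution proportion interval index -> accuracy}.
    If no rate qualifies, fall back to the largest available rate (an assumption).
    """
    qualifying = [rate for rate, accuracy in lut.items()
                  if accuracy.get(interval, 0.0) >= required_accuracy]
    return min(qualifying) if qualifying else max(lut)


# Made-up accuracies for two intervals; these are not results from the evaluation.
lut = {
    0.03: {0: 0.55, 1: 0.92},
    0.05: {0: 0.80, 1: 0.95},
    0.10: {0: 0.93, 1: 0.97},
}
print(pick_sampling_rate(lut, interval=0))  # hard interval: needs the 10% rate
print(pick_sampling_rate(lut, interval=1))  # easy interval: the 3% rate suffices
```

In the hard interval only the largest rate reaches the target, while in the easy interval the cheapest rate is already sufficient, which is how the strategy saves samples.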

4.2. Tracing Algorithm. The pseudocode of the tracing algorithm is shown in Algorithm 2. After sampling and sensing, we add up the number of infected and uninfected food products passing every location and batch. They are stored in two variables, GOOD and BAD, for each place.

Supposing that the samples properly reveal the condition of the whole product set, the criterion to find the suspect sources is set as GOOD < $\varepsilon$ and BAD > 0. It is feasible because the contamination source would be the primary spot generating polluted food, and the number of uninfected samples is limited there. $\varepsilon$, regarded as the error factor, is a small integer which enables the algorithm to remain valid when not all the food passing the source is infected or when there is some disturbance caused by nonideal problems (e.g., imperfect sampling). The specific $\varepsilon$ value is decided by the number of samples and the infection probability of the pollutant source, which can be roughly represented as follows:

$$\varepsilon = \frac{\#\text{samples} \times \text{pollution probability}}{\#\text{batches}}. \qquad (5)$$

This criterion is not strict enough to pinpoint only one contamination source as the result, so extra work is needed to eliminate the confusing suspects. First of all, to improve the speed of the algorithm, suspects with a small BAD value are excluded. Then, the system generates a Suspect Tree composed of the suspected locations and batches according to their order in the food supply chain. After that, the Suspect Tree is traversed layer by layer, and the first node that meets the same criterion is picked as the root source, since the original contaminant is always above the cross contaminants in the tree.
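The counting, filtering, and selection steps can be sketched in Python as follows. This is a simplified illustration, not the paper's implementation: instead of building an explicit Suspect Tree, it orders suspects by the earliest stage at which they appear on any path, and it breaks ties by the larger BAD count. The names `error_factor`, `trace_origin`, and `min_bad`, as well as the sample data, are assumptions. The suspect test follows the criterion as stated in the text (GOOD < $\varepsilon$ and BAD > 0).

```python
from collections import defaultdict


def error_factor(num_samples, pollution_probability, num_batches):
    """Rough error factor, following (5)."""
    return num_samples * pollution_probability / num_batches


def trace_origin(samples, eps, min_bad=1):
    """samples: list of (path, infected), path being an ordered list of
    (location, batch) nodes. Returns the earliest suspect node, or None."""
    good = defaultdict(int)
    bad = defaultdict(int)
    depth = {}  # earliest stage index at which each node appears
    for path, infected in samples:
        for stage, node in enumerate(path):
            if infected:
                bad[node] += 1
            else:
                good[node] += 1
            depth[node] = min(depth.get(node, stage), stage)
    # Suspect criterion from the text, plus exclusion of suspects with small BAD.
    suspects = [node for node in depth
                if good[node] < eps and bad[node] > 0 and bad[node] >= min_bad]
    if not suspects:
        return None
    # Stand-in for the layer-by-layer traversal: the earliest stage wins.
    return min(suspects, key=lambda node: (depth[node], -bad[node]))


samples = [
    ([("Farm", 0), ("Factory", 4), ("Market 1", 2)], True),
    ([("Farm", 0), ("Factory", 4), ("Market 2", 1)], True),
    ([("Farm", 0), ("Factory", 3), ("Market 1", 2)], False),
    ([("Farm", 1), ("Factory", 4), ("Market 3", 0)], True),
    ([("Farm", 0), ("Factory", 3), ("Market 2", 1)], False),
]
print(trace_origin(samples, eps=1, min_bad=2))
```

In this toy data the node ("Farm", 1) also satisfies the raw criterion with a single infected sample, so excluding suspects with a small BAD value (here `min_bad=2`) is what lets the sketch settle on the factory batch.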

4.3. Backtracking Algorithm. In order to judge the performance of the backtracking algorithm, Hit Rate and False Alarm Rate are put forward to denote the algorithm's ability to capture infected products and the probability of reckoning good products as infected ones by mistake. Supposing that the total numbers of products and infected products are $N$ and $I$, respectively, and that the algorithm selects $n$ potentially infected products including $i$ infected ones, we define Hit Rate as $i/I$ and False Alarm Rate as $(n - i)/(N - I)$. Although theoretically both a high Hit Rate and a low False Alarm Rate are expected, there is a tradeoff between them.

The backtracking algorithm is described in Algorithm 3.
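The two rates can be computed directly from sets of product IDs. The following Python sketch uses a hypothetical function name and made-up IDs, and it assumes $0 < I < N$ so that both denominators are nonzero.

```python
def hit_and_false_alarm(all_products, infected, selected):
    """Return (Hit Rate, False Alarm Rate) for sets of product IDs."""
    total = len(all_products)             # N
    total_infected = len(infected)        # I
    chosen = len(selected)                # n
    caught = len(selected & infected)     # i
    if total_infected == 0 or total_infected == total:
        raise ValueError("rates are undefined unless 0 < I < N")
    hit_rate = caught / total_infected
    false_alarm_rate = (chosen - caught) / (total - total_infected)
    return hit_rate, false_alarm_rate


# 100 products, 10 of them infected; the algorithm flags 8 infected and 3 healthy ones.
all_products = set(range(100))
infected = set(range(10))
selected = set(range(8)) | {50, 51, 52}
hit, false_alarm = hit_and_false_alarm(all_products, infected, selected)
print(hit, false_alarm)
```

Flagging more products can only raise the Hit Rate, but every healthy product among them raises the False Alarm Rate, which is the tradeoff noted above.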

4.4. Self-Correction Method. Two metrics are defined to judge the performance of the system and to provide a reference for later parameter settings.


Input: samples' spatial information and examination results
Output: contamination origin
k = 0
for i = 1 : #samples do
    for j = 1 : #(locations/batches on sample_i's path) do
        if sample_i is infected then
            sample_i.location_j.batch_j.BAD++
        else
            sample_i.location_j.batch_j.GOOD++
        end if
    end for
end for
for m = 1 : #(locations/batches in the entire chain) do
    if location_m.batch_m.GOOD ≤ ε && location_m.batch_m.BAD ≥ 0 then
        Record location/batch into suspect[k]
        k++
    end if
end for
Exclude suspects w/ small BAD
if #suspect ≥ 1 then
    Get food IDs passed all suspects
    if #ID ≥ 0 then
        Get food IDs passed at least one suspect
    end if
end if
Construct "Suspect Tree" of batches according to the paths of these IDs
for n = 1 : #(tree nodes) do
    if (suspect[n].location_n.batch_n.GOOD ≤ ε && suspect[n].location_n.batch_n.BAD ≥ 0) then
        origin = suspect[n]
    end if
end for

Algorithm 2: Tracing algorithm.

Input: contaminated samples set Re-check
Output: infected food products set Bad
while (1) do
    Construct a tree of location/batch according to the paths of contaminated products in Re-check
    Traverse the tree (DFS)
    Record all nodes in Bad
    Empty Re-check
    if node.location.batch is new then
        Find the food IDs passed these nodes
        Sense them
        if food is contaminated then
            Put its ID in Re-check
        end if
    else
        break
    end if
end while

Algorithm 3: Backtracking algorithm.
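One possible reading of Algorithm 3 is sketched below in Python. It is a simplification under stated assumptions: the tree construction and DFS are replaced by set operations over the nodes on contaminated paths, the sensor test is stood in for by a caller-supplied function `is_contaminated`, and the function name `backtrack` and the sample data are hypothetical.

```python
def backtrack(recheck, paths, is_contaminated):
    """Iteratively expand from known contaminated samples.

    recheck:         IDs of samples already known to be contaminated
    paths:           dict product_id -> ordered list of (location, batch) nodes
    is_contaminated: callable product_id -> bool, standing in for a sensor test
    Returns (bad_products, visited_nodes).
    """
    bad_products = set(recheck)
    visited = set()
    frontier = set(recheck)
    while frontier:
        # Nodes on the paths of newly confirmed products that were not seen before.
        new_nodes = {node for pid in frontier for node in paths[pid]} - visited
        if not new_nodes:
            break
        visited |= new_nodes
        # Untested products that passed at least one new node are sensed.
        candidates = {pid for pid, path in paths.items()
                      if pid not in bad_products and new_nodes.intersection(path)}
        frontier = {pid for pid in candidates if is_contaminated(pid)}
        bad_products |= frontier
    return bad_products, visited


paths = {
    1: [("Farm", 0), ("Factory", 4), ("Market 1", 2)],
    2: [("Farm", 0), ("Factory", 4), ("Market 2", 1)],
    3: [("Farm", 1), ("Factory", 3), ("Market 1", 2)],
    4: [("Farm", 1), ("Factory", 4), ("Market 3", 0)],
    5: [("Farm", 1), ("Factory", 3), ("Market 2", 1)],
}
truly_infected = {1, 2, 4}
bad, visited = backtrack({1}, paths, lambda pid: pid in truly_infected)
print(sorted(bad))
```

Starting from a single contaminated sample, the loop stops once a round of sensing finds no further contaminated products, so nodes that only healthy products passed, such as ("Factory", 3) here, are never expanded.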

Figure 3: Topology (DAG) of a food supply chain case for evaluation. (Nodes: Farm, Vehicles 1-5, Factory, and Markets 1-3; Vehicles 1 and 2 each appear twice.)

Confidence Metric (CM) is defined as the relative difference between the prior probability and the posterior probability as follows:

$$\text{CM} = \frac{\left| P_{\text{post}} - P_{\text{pri}} \right|}{P_{\text{pri}}}. \qquad (6)$$

$P_{\text{post}}$ and $P_{\text{pri}}$ are the posterior and prior probabilities, respectively. If CM is small, we are more confident that the dataset suits the model trained previously, and vice versa. Thus, we can correct (4) and combine prior and posterior probabilities to get more reasonable infection probabilities $P_{\text{comb}}$ of the entire network as follows:

$$P_{\text{comb}} = \frac{P(A_i \mid B) + \text{CM} \times P(A_i)}{1 + \text{CM}}. \qquad (7)$$
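A small worked example of (6) and (7) in Python follows. The function names and the input values are illustrative assumptions; the posterior $P(A_i \mid B)$ and the prior $P(A_i)$ are passed in as plain numbers, and the prior is assumed to be nonzero.

```python
def confidence_metric(p_post, p_pri):
    """Relative difference between posterior and prior, following (6)."""
    return abs(p_post - p_pri) / p_pri


def combined_probability(p_post, p_pri):
    """CM-weighted combination of posterior and prior, following (7)."""
    cm = confidence_metric(p_post, p_pri)
    return (p_post + cm * p_pri) / (1 + cm)


# Illustrative values: prior 0.2, posterior 0.3 for the same interval.
cm = confidence_metric(0.3, 0.2)         # about 0.5
p_comb = combined_probability(0.3, 0.2)  # about 0.267
print(cm, p_comb)
```

The combined value always lies between the prior and the posterior: when CM is zero it equals the posterior, and as CM grows the weight of the prior increases.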

Success Metric (SM) measures the accuracy of the system as follows:

$$\text{SM} = \frac{\#\text{success}}{\#\text{total}}. \qquad (8)$$

It is defined as the ratio of the number of successful detections to the total number of tests. With a lower SM, the sampling criterion is set stricter, and vice versa.

These two metrics help to adjust the sampling algorithm slightly to fit it to particular environments and applications.

4.5. Time and Space Complexities. Suppose that there are $m$ samples, $n$ stages, and $l$ batches for all locations in a food supply chain. The time complexities of the tracing and backtracking algorithms are $O(mn) + O(l)$ and $O(m) + O(l)$, respectively. There exists a tradeoff between tracing accuracy and time consumption in SDPS. Obviously, more samples mean a longer time and better knowledge of the network. Compared with the time spent on chemical testing and sensing, the time consumption of SDPS is negligible.

One record is required for every location and every food product, so the space complexity of the whole IoT system is $O(a + b)$, where $a$ is the number of food products and $b$ is the number of locations in the network.

5. Evaluation Results and Analysis

We set up two specific cases (Figures 3 and 4) to evaluate the proposed system. The first case gives a general evaluation and shows that our SDPS scheme outperforms other sampling methods, while the second one focuses on the performance on a large system with big data.

5.1. Experimental Setup. In Figure 3, note that vehicles 1 and 2 serve as transportation nodes both from farm to factory and from factory to market. This makes the model closer to reality, as some locations in the chain can play different roles in food processing.

The configuration of the two cases is listed in Table 1. Every location in the chain holds 25 and 500 batches in Cases 1 and 2, respectively. In total, 60 thousand and 800 thousand food products circulate in the network in the two cases. Which location a piece of food passes is completely random.

Figure 4: Topology (DAG) of a food supply chain case for evaluation (with big data). (Nodes: Farms 1-3, Vehicles 1-7, Factories 1 and 2, and Markets 1-3.)

We build the simulator in C++. It reads in the configuration files that describe supply chain topologies, generates simulated contamination behavior, and senses data along food supply chains. In simulation, the contamination source and sampling volume can be set by the users explicitly. Food contamination rules are set such that food can be infected by passing (1) the contamination source directly or (2) cross infection spots indirectly, according to our revised Reed-Frost Model.

To keep this paper compact, we only show the training process of the first model here. In the training process, we obtain prior probabilities, which highly depend on the topology and configuration of the chain. Figure 5 shows the distribution of the infected proportion under different contamination probabilities after 300,000 tests. Under each contamination probability (which depends on the contamination type), the actual portion of pollution is almost Gaussian distributed, which is the same as what we discussed in Section 4. Note that the value on the $x$ axis ($x$) should be converted to the pollution proportion interval by the function $[(x - 1) \times 4\%, x \times 4\%)$, since the total space is divided into 25 sections. The relation between the products' contamination percentage and the tracing algorithm accuracy under different sample rates is shown in Figure 6. For clarity, only 3 sampling rates are tested: 3%, 5%, and 10%. To make the sampling strategy more efficient, more rates can be evaluated in real situations. Figure 6 confirms the hypothesis we proposed: the source is difficult to detect if only a small part or too large a part of the food is contaminated. In both cases, the flow path of contamination is easily hidden.
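The mapping between the axis value and the pollution proportion interval can be written out as a short Python sketch. The function names are hypothetical, percentages are used as units, and clamping a proportion of exactly 100% into the last section is an assumption made here because the intervals are half-open.

```python
def interval_for(x, width=4):
    """Pollution proportion interval [(x - 1) * width, x * width) in percent."""
    return ((x - 1) * width, x * width)


def section_index(percent, width=4, sections=25):
    """Section number (1-based) that a pollution proportion in percent falls into."""
    return min(int(percent // width) + 1, sections)


print(interval_for(1), interval_for(25))           # (0, 4) (96, 100)
print(section_index(3.9), section_index(4.0))      # 1 2
```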

Table 1: Configuration of the two cases.

                     Case 1    Case 2
#batches/location    25        500
#total products      60000     800000
Flow rule            Random    Random

Table 2: Simulation results of backtracking algorithm.

Hit rate    False alarm rate
96%         3%

5.2. Evaluation Results. Figure 7 shows the accuracy of the tracing algorithm. Under different probabilities of infection in the whole chain, the accuracy is no less than 80%. In Figures 8 and 9, the partition and dynamic strategies in SDPS are tested, respectively. In all infection probability cases, the partition strategy has higher tracing accuracies than the global sampling strategy (11.8%, 22.7%, 10.9%, and 4.1% higher with infection probabilities of 30%, 67%, 80%, and 90%). Compared with sampling at fixed rates (3%, 5%, and 10%), the dynamic method achieves higher tracing accuracy even with a lower average sampling rate of 7.8%.

For the backtracking part, the result of simulation is shown in Table 2. Both Hit Rate and False Alarm Rate are satisfactory.

Case 2 has a large data scale. We again fetch a few samples each time and let the system tell us the number of samples to take next time based on (4). The system's actual sampling rate turns out to be 7.8%. As shown in Figure 10, the accuracy of the tracing algorithm is higher than 80% as well, which shows that our proposed approach works well with big data.

Figure 5: Prior probability distribution under different contamination probabilities: 30%, 67%, 80%, and 90%. (Horizontal axis: pollution proportion interval [(x − 1) × 4, x × 4]; vertical axis: distribution of 300,000 tests.)

Figure 6: The relationship between tracing algorithm accuracy and pollution proportion intervals under different sample rates: 3%, 5%, and 10%. (Horizontal axis: pollution proportion interval (%); vertical axis: accuracy of tracing algorithm (%).)

5.3. Contamination Visualization. With the tool introduced by [24] and the information we choose to record, SDPS provides sampling data that can be represented visually after being tested by sensors. Figures 11 and 12 show the data flow of infected and uninfected food products, respectively. In this case, we use the configuration of Case 1 and set the contamination source to be the 4th batch in the factory in advance.

Figure 7: Accuracy of tracing algorithm with different probabilities of infection. (Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

Figure 8: Simulation results of partition sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. (Series: partition random sampling and global random sampling; horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

The vertical lines marked with locations and batch numbers represent the nodes in the data flow. Lines going through these nodes are the traces of food products. As shown in those figures, most of the infected food, but none of the uninfected food, passed the 4th batch of the factory (the node with a circle around it). So it has a great chance of being the source of contamination, which is also confirmed by our tracing system.

SDPS makes data concise but still comprehensive, which helps the visualization tool display the useful information.

Apart from aiding the detection of the contamination source, visualization can also help to better understand the contamination conditions (e.g., contamination severity/distribution) of the whole IoT system. For example, a well-managed warehouse or a city with lower temperature may lead to less contamination. Information of these kinds can be directly read from visualization images and helps manufacturers to design their food supply chains more scientifically.

Figure 9: Simulation results of dynamic sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. (Series: 3%, 5%, and 10% sampling proportion and dynamic sampling; horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

Figure 10: Simulation results of tracing accuracy with big data under different infection probabilities. (Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

Visualization of the contamination condition in the IoT system makes the provenance reasoning in the food supply chain intuitive and informative.

5.4. Performance Estimation in Real Situations. In reality, the food supply chain is more complicated. Many factors, such as the specific food type, environment temperature, manufacturing process, and other parameters, could make the chain difficult to predict. To apply our strategy to real situations, those factors should be considered, and some parameters should be adjusted accordingly.

The factor that influences the behavior of the food supply chain the most is the type of food. Different food has its own characteristics, which may dominate the model, the provenance procedure, and the expected results. Firstly, the food type can decide the possible contamination source. In (1), $P_{\text{exp}}$ and $P_{\text{imp}}$ are related to the virus that spreads among food. For example, avian influenza virus, which causes a common infectious disease among poultry, began to be contagious among human beings. After the mutation, the infectious ability of this virus grew significantly. As a result, $P_{\text{exp}}$ and $P_{\text{imp}}$ of avian influenza virus in chickens would also increase. Secondly, the food type is a deciding factor for its storage pattern and quality guarantee period. Some canned drinks are stacked layer by layer separately, so they would not get cross contaminated. However, raw meat is generally kept together, which provides an easy environment for a virus to spread. Thirdly, the state of food is also dominant in provenance. One piece of food in solid state can be seen as a unit, while liquid food like yogurt could be ruined by only one deteriorated drop. Thus, for yogurt, the sampling process could be very different, as spatial position should be taken into consideration and the behavior of the virus in liquid should also be studied.

Besides food type, other factors play important roles in real situations. Food is more likely to rot in summer than in winter; some manufacturing factories are more hygienic than others; food may become easier to infect as time passes; and the types or dosages of food additives may slow down the contamination process.

Although different food supply chains can behave differently, our proposed strategy can cover most of the cases because it obeys the general model of the food supply network and epidemiological principles.

6. Conclusion

In this paper, we present a heuristic approach to tracing contamination sources in large IoT systems for complicated food supply chains, which is a critical issue in metropolitan life. In our approach, the Self-adaptive Dynamic Partition Sampling (SDPS) Strategy was proposed to collect data for sensors, whose input is only a small portion of end market samples out of the huge volume of products along food supply chains. The approach was illustrated with a case study of an IoT system for provenance in the food supply chain, which can efficiently stop outbreaks of foodborne disease. With the intelligent SDPS Strategy, the objects tested by sensors are the most reasonable portion of the entire product set. The efficiency is highly improved, and the accuracy stays almost the same as when sensing all the objects at the same time. SDPS keeps the integrity of information and approaches nearly real-time examination. Also, we present a tracing algorithm to find the contamination sources of food supply chains and a backtracking algorithm to provide a strategy for recalling problematic food still undiscovered in the chain. Simulation results indicate that our SDPS scheme can achieve a tracing accuracy of up to 97.8% with a smaller average sampling percentage compared with traditional global random sampling. We managed to sample only a small portion of food in the end market without loss in accuracy of provenance tracing over the whole IoT system. In addition, our analytic data and visualized images can clearly model contamination conditions in the food supply chain within the context of the IoT system. This will give the clients an intuitive impression of food supply networks in a city.

Figure 11: Visualized data flow of infected food products in food supply chain. (Axes shown: Farm, Vehicle 1, Vehicle 2, Vehicle 1 and 2, Factory, Vehicle 3, 4, 5, and Market 1, 2, 3, each with batches 0-9; Factory batch 4 is marked.)

Figure 12: Visualized data flow of uninfected food products in food supply chain. (Axes shown: Farm, Vehicle 1, Vehicle 2, Vehicle 1 and 2, Factory, Vehicle 3, 4, 5, and Market 1, 2, 3, each with batches 0-9; Factory batch 4 is marked.)

In this paper, we assume that all provenance information of food products is hosted by a centralized repository and that these provenance metadata are organized in a uniform manner. Our future work is to carry out a practical implementation of the provenance of the food supply chain in a community as our test bed for megacity management.


Acknowledgments

This paper is sponsored in part by the Shanghai International Science and Technology Collaboration Program under Grant 13430710400 and the Campus for Research Excellence and Technological Enterprise (CREATE) program of the Singapore National Research Foundation under the joint project on Energy and Environmental Sustainability Solutions for Megacities (R-706-000-101-281) by Shanghai Jiao Tong University (SJTU) and National University of Singapore (NUS). Professor Qiu is partially supported by NSF CNS-1249223 and NSFC 61071061.

References

[1] "Wikipedia on smart city," http://en.wikipedia.org/wiki/Smart_city.

[2] M. Qiu and E. H. M. Sha, "Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems," ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 2, article 25, 2009.

[3] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, "Online optimization for scheduling preemptable tasks on IaaS cloud systems," Journal of Parallel and Distributed Computing, vol. 72, no. 5, pp. 666–677, 2012.

[4] K. Su, J. Li, and H. Fu, "Smart city and the applications," in Proceedings of the International Conference on Electronics, Communications and Control (ICECC '2011), pp. 1028–1033, Zhejiang, China, September 2011.

[5] X. Tang, J. Pu, K. Cao, Y. Zhang, and Z. Xiong, "Integrated extensible simulation platform for vehicular sensor networks in smart cities," International Journal of Distributed Sensor Networks, vol. 2012, Article ID 860415, 10 pages, 2012.

[6] P. Vlacheas, R. Giaffreda, V. Stavroulaki et al., "Enabling smart cities through a cognitive management framework for the internet of things," IEEE Communications Magazine, vol. 51, no. 6, pp. 102–111, 2013.

[7] A. Asin, "Smart cities from libelium allows systems integrators to monitor noise, pollution, structural health and waste management," Smart Cities Articles, 2011.

[8] K. Ashton, "That 'internet of things' thing," RFID Journal, 2011.

[9] P. Magrassi and T. Berg, "A world of smart objects," Gartner Research Report TR-17-2243, 2002.

[10] Oxford English Dictionary (OED), "The fact of coming from some particular source or quarter; source, derivation," http://en.wikipedia.org/wiki/Provenance.

[11] A. V. Roth, A. A. Tsay, M. E. Pullman, and J. V. Gray, "Unraveling the food supply chain: strategic insights from China and the 2007 recalls," Journal of Supply Chain Management, vol. 44, no. 1, pp. 22–39, 2008.

[12] S. Miles, P. Groth, S. Munroe, and L. Moreau, "Prime: a methodology for developing provenance-aware applications," ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, article 8, 2011.

[13] R. Hasan, R. Sion, and M. Winslett, "The case of the fake Picasso: preventing history forgery with secure provenance," in Proceedings of the 7th Conference on File and Storage Technologies (FAST '09), pp. 1–14, New York, NY, USA, December 2009.

[14] T. A. McMeekin, J. Baranyi, J. Bowman et al., "Information systems in food safety management," International Journal of Food Microbiology, vol. 112, no. 3, pp. 181–194, 2006.

[15] M. A. van der Gaag, F. Vos, H. W. Saatkamp, M. van Boven, P. van Beek, and R. B. M. Huirne, "A state-transition simulation model for the spread of Salmonella in the pork supply chain," European Journal of Operational Research, vol. 156, no. 3, pp. 782–798, 2004.

[16] L. M. Wein and Y. Liu, "Analyzing a bioterror attack on the food supply: the case of botulinum toxin in milk," Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 28, pp. 9984–9989, 2005.

[17] L. Qin and Q. S. Wang, "Food supply chain quality management model and simulation based on game," in Proceedings of the International Conference on Computer Modeling and Simulation (ICCMS '09), pp. 291–293, Macau, China, February 2009.

[18] Q. Zhang, D. Wang, T. Huang et al., "Modelling provenance in food supply chain to track and trace foodborne disease," in Proceedings of the International Conference on Computer Modeling and Simulation, pp. 69–75, Hong Kong, China, February 2012.

[19] S. Li, L. Xu, and X. Wang, "Compressed sensing signal and data acquisition in wireless sensor," IEEE Transactions on Industrial Informatics, 2012.

[20] Z. Ding and X. Gao, "A database cluster system framework for managing massive sensor sampling data in the internet of things," Chinese Journal of Computers, vol. 35, no. 6, pp. 1175–1191, 2012.

[21] L. Zhang, J. Liu, and H. Jiang, "Energy-efficient location tracking with smartphones for IoT," in Proceedings of the IEEE Sensors, pp. 1–4, Taipei, China, October 2012.

[22] H. Abbey, "An examination of the Reed-Frost theory of epidemics," Human Biology, vol. 24, no. 3, pp. 201–233, 1952.

[23] L. Elveback, J. P. Fox, and A. Varma, "An extension of the Reed-Frost epidemic model for the study of competition between viral agents in the presence of interference," The American Journal of Epidemiology, vol. 80, no. 3, pp. 356–364, 1964.

[24] X. Yuan, H. Guo, H. Xiao, Z. Wang, and X. Zhang, "High-dimensional data virtualization," in Proceedings of the Communications of the CCF, pp. 13–16, April 2011.


Figure 1: An illustration of IoT system's physical structure modeled for food supply chain. (Elements shown: farm, transportation, warehouse, and market stages with sensor storage, a data bus, a sensor data server, and display and visualization.)

connection, the food supply chain forms an Internet of Things (IoT) network.

In reality, real-time decision making is critical in food safety issues. If the contamination source remains unknown for one more hour, more people will be exposed to danger. Besides, in the food industry, examination is more or less material-consuming. We cannot take a part of every piece of food at every stage of the chain, whether or not there are problems, to the sensors for physical and chemical checks, as it would bring food companies a great economic loss. As a result, we would like to sample only a small portion of the food in the end markets.

After that, with this small part of the products, we manipulate these data to get a whole picture of the contamination conditions in the entire network, such as the contamination source and the other involved foods that need to be recalled.

The physical structure of the system is shown in Figure 1.

3.2. Modeling of Food Supply Chain. Generally, the food supply chain can be divided into seven stages: plantation/cultivation, slaughtering, transportation, inventory, wholesale, retailing, and customers. Although the chain is heterogeneous, we can view it as the flow through the combination and repetition of those stages based on certain rules.

Firstly, it is often impossible for us to know in advance which physical position (e.g., vehicle or warehouse) a piece of food will be in. In other words, the route of food is almost random.

Secondly, food can access a particular location more than once, and a location can play different roles in the manufacturing of one food product. For instance, pork can be carried by the same vehicle before and after slaughtering, which will generate a cycle if we view the chain as a flow.

Thirdly, not all the food in the contamination source will be infected. The percentage of infection is determined by the type of epidemic disease, temperature, density, and other objective aspects.

Finally, other locations which are not the contamination sources may also generate new contaminated food due to cross contamination. The classic Reed-Frost Model has become a standard to model cross contamination conditions [22]. Based on the explicit contamination discussed in this model, we introduce implicit infection to get the average infection probability $P$ in a certain batch and stage location by

$$P = 1 - (1 - P_{\text{exp}})^{\#\text{exp}} (1 - P_{\text{imp}})^{\#\text{imp}}. \qquad (1)$$

We will use # as the mark of number in this paper. In (1), $P_{\text{exp}}$ represents the probability that infection happens if two pieces of food touch each other directly and one of them has been explicitly infected, and #exp is the number of food products that have been explicitly infected. $P_{\text{imp}}$ and #imp mean the same, respectively, in implicit infection cases. Food that has been implicitly infected will not infect the others, but it will be counted as contaminated according to its physical and chemical characteristics. As implicit infection has been considered here, the model is much more realistic. Some scholars have published several extension models based on the Reed-Frost Model [23]; however, we only take implicit infection into consideration, since (1) can describe our case better with sufficient accuracy.
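For concreteness, (1) can be evaluated with a few lines of Python. The function and parameter names are hypothetical, and the numbers are illustrative rather than values used in the evaluation.

```python
def infection_probability(p_exp, n_exp, p_imp, n_imp):
    """Average infection probability in a batch and stage location, following (1).

    p_exp, p_imp: per-contact infection probabilities (explicit / implicit)
    n_exp, n_imp: numbers of explicitly / implicitly infected products present
    """
    return 1 - (1 - p_exp) ** n_exp * (1 - p_imp) ** n_imp


# Two explicitly infected products (p = 0.5) and one implicitly infected product (p = 0.1).
p = infection_probability(p_exp=0.5, n_exp=2, p_imp=0.1, n_imp=1)
print(p)  # 1 - 0.25 * 0.9, about 0.775
```

With no infected products present, the expression reduces to zero, and each additional infected product can only raise the probability.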

The food supply chain is viewed as a Directed Acyclic Graph (DAG) in which each node stands for one location keeping or processing some batches of food for a period. The DAG captures the relationships within the Internet of Things based on the order and dependency among all the sensor data. The graph is acyclic because the batch number works as a time stamp that distinguishes stages in the chain. In this way, although food may be carried by the same vehicle in two or more stages, each stage has a different batch number and is regarded as a separate node in the DAG.
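The effect of using the batch number as a time stamp can be sketched in a few lines of Python. This is only an illustration under assumed names (`build_dag`, `is_acyclic`) and an assumed path representation; it shows that keying nodes by location alone can produce a cycle, while keying them by (location, batch) does not.

```python
from collections import defaultdict

def build_dag(paths):
    """Build an adjacency map from product paths (each path is an ordered list of nodes)."""
    edges = defaultdict(set)
    for path in paths:
        for src, dst in zip(path, path[1:]):
            edges[src].add(dst)
    return edges

def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be topologically removed."""
    nodes = set(edges) | {d for dsts in edges.values() for d in dsts}
    indegree = {node: 0 for node in nodes}
    for dsts in edges.values():
        for d in dsts:
            indegree[d] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for d in edges.get(node, ()):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return removed == len(nodes)

# Same vehicle before and after the factory: a cycle if nodes are locations only.
by_location = build_dag([["farm", "vehicle1", "factory", "vehicle1", "market1"]])
# Hypothetical batch numbers acting as time stamps make the two visits distinct nodes.
by_location_and_batch = build_dag(
    [[("farm", 0), ("vehicle1", 1), ("factory", 2), ("vehicle1", 3), ("market1", 4)]]
)
```

Here `is_acyclic(by_location)` is false and `is_acyclic(by_location_and_batch)` is true.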


Input: type of foodborne disease T
Output: sample set
Training Phase:
    Look up contamination probability p according to T
    Configuration: Topology information, (contamination intervals), p
Sampling Phase:
    Sample a small portion n
    BEST = ∞
    while n ≤ BEST do
        Compute posterior probability based on Bayesian Estimation:
            $P(B \mid A_i) = \binom{n}{k} a_i^{k} (1 - a_i)^{n-k}$
            $P(A_i \mid B) = \dfrac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\, P(A_j)}$
        for i = 1 : #(contamination intervals) do
            if P_interval_i ≥ BEST then
                BEST = P_interval_i
            end if
        end for
        if BEST ≤ 80% then
            Find the best sample rate according to its relationship with BEST
            if n ≤ BEST then
                Sample (BEST − n) products
            else
                break
            end if
        end if
    end while

Algorithm 1: Sampling algorithm.

Records of each location and each product are documented separately. A location record includes the batch number, the numbers of sampled products labeled GOOD (uninfected) and BAD (infected) in this batch, and the IDs of polluted samples in this batch. A product record contains the examination result for a piece of food, the ordered batches and locations the product passed, and pointers to the corresponding location records. The pointers connect the two data structures.
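The two record types described above might be laid out as follows. This is a hedged sketch: the class names, field names, and the `register_result` helper are assumptions made for illustration, and Python object references stand in for the pointers mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LocationRecord:
    """One (location, batch) entry with sample counters."""
    location: str
    batch: int
    good: int = 0                                           # sampled products found uninfected
    bad: int = 0                                            # sampled products found infected
    polluted_ids: List[int] = field(default_factory=list)   # IDs of infected samples

@dataclass
class ProductRecord:
    """One food product with its examination result and ordered path."""
    product_id: int
    infected: Optional[bool] = None                         # None until the product is examined
    path: List[LocationRecord] = field(default_factory=list)  # references to location records

def register_result(product: ProductRecord, infected: bool) -> None:
    """Store an examination result and update every location record on the product's path."""
    product.infected = infected
    for record in product.path:
        if infected:
            record.bad += 1
            record.polluted_ids.append(product.product_id)
        else:
            record.good += 1
```

Because each product holds references to shared `LocationRecord` objects, updating a product's result immediately updates the GOOD and BAD counters that the tracing step reads.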

3.3. Logic Structure of IoT System for Food Supply Chains. As shown in Figure 2, the hierarchy of this IoT system contains four layers: the data collection and management layer, the intelligent processing layer, the graphic representation layer, and the self-correction layer. Specific methods and algorithms are discussed in the following sections.

4. Heuristic Provenance Approach and Tracing and Backtracking Algorithms

In this section, detailed approaches and algorithms to solve provenance issues in the food supply chain are introduced. Firstly, we present a Self-adaptive Dynamic Partition Sampling (SDPS) strategy to improve the efficiency and intelligence of sensor data collection and management. Then, tracing and backtracking algorithms are discussed, which respectively catch the contamination source and dig out potentially infected food products still circulating in the markets. Finally, we introduce a self-correction method to maintain and update the system, which makes the system adaptive and flexible for specific applications.

Figure 2: Logic structure of IoT system model for food supply chain. Layers shown: data collection and management layer, intelligent processing layer, graphic representation layer, and self-correction layer. Blocks shown: Training, LUT, Sampling, Sample more, Tracing origin, Back-tracking, Data visualization, Confidence metric, Success metric.

4.1. Self-Adaptive Dynamic Partition Sampling Strategy. As mentioned in Section 3.1, to improve the efficiency of manipulating sensor data, the Self-adaptive Dynamic Partition Sampling (SDPS) strategy is introduced, which greatly cuts down the number of samples. The pseudocode of the sampling algorithm is shown in Algorithm 1.


4.1.1. Partition Strategy. The partition strategy makes samples more general and representative. In this case, the system divides the whole group of products into several parts according to the batches they belong to in the end markets. The sampled volume for batch $n$ of market $m$ is determined by

$$\#\mathrm{sample}_{(m,n)} = \#\mathrm{sample}_{\mathrm{total}} \times \frac{\#\mathrm{products}_{(m,n)}}{\sum_{m=0}^{M} \sum_{n=0}^{N} \#\mathrm{products}_{(m,n)}}. \tag{2}$$

Here $M$ and $N$ are the total numbers of end markets and batches in the network. The subscripts $(m, n)$ and total denote the number of samples or products in batch $n$ of market $m$ and in all batches of all end markets, respectively.
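The proportional allocation in (2) can be sketched as below. The function name and the dictionary layout keyed by (market, batch) are assumptions for illustration; rounding to whole products is also an assumption, since (2) itself yields a real number.

```python
def partition_sample_sizes(products, sample_total):
    """Allocate sample_total across (market, batch) partitions in proportion to
    the number of products in each partition, following equation (2).

    products: dict mapping (market, batch) -> number of products in that partition.
    Returns a dict with the same keys and the (rounded) number of samples to draw.
    """
    total_products = sum(products.values())
    if total_products == 0:
        return {key: 0 for key in products}
    return {
        key: round(sample_total * count / total_products)
        for key, count in products.items()
    }

# Hypothetical inventory: two batches in market 0, one batch in market 1.
example = partition_sample_sizes({(0, 0): 100, (0, 1): 300, (1, 0): 600}, sample_total=50)
```

Note that independent rounding may make the allocated sizes sum to slightly more or less than `sample_total` in general; a production implementation would need a tie-breaking rule, which the paper does not specify.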

4.1.2. Dynamic Strategy. A dynamic strategy based on Bayesian estimation is adopted to achieve minimal sample volume. According to the infection probability of a particular pathogen, determined by medical experiments, the model can be trained to obtain the distribution of the total infection probability within the whole food supply network, which serves as the prior probability. The probability density function of the distribution is presented as a function of infection probability intervals.

On the other hand, after sampling a small part, the infection rate of the samples can be observed and the posterior probabilities can be obtained. If $k$ infected products are found within $n$ samples under a certain contamination percentage interval with prior probability $a_i$, the conditional probability is given by the binomial distribution:

$$P(B \mid A_i) = \binom{n}{k} a_i^{k} (1 - a_i)^{n-k} = \frac{n!}{k!\,(n-k)!}\, a_i^{k} (1 - a_i)^{n-k}. \tag{3}$$

Here $A_i$ denotes the event that the contamination percentage of the whole product set falls into the $i$th interval, with a prior probability of $a_i$, and $B$ denotes the event that we find $k$ contaminated products in $n$ samples.

After that, the Bayesian formula (4) is applied to combine the prior probabilities with the sampling evidence and obtain revised probabilities, which describe the specific environment better:

$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\, P(A_j)}. \tag{4}$$
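The update in (3) and (4) can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation. It assumes that each interval $i$ is summarized by a representative contamination fraction (named `rates` here, for example the interval midpoint) used as the binomial success probability, that `priors` holds $P(A_i)$ from the training phase, and that the sum in the denominator runs over the intervals; all names are hypothetical.

```python
from math import comb

def binomial_likelihood(k: int, n: int, a: float) -> float:
    """Equation (3): probability of k infected products in n samples at contamination fraction a."""
    return comb(n, k) * a ** k * (1.0 - a) ** (n - k)

def posterior(priors, rates, k: int, n: int):
    """Equation (4): posterior probability of each contamination interval.

    priors: P(A_i) for every interval (assumed to come from the training phase).
    rates:  representative contamination fraction of every interval (assumption).
    """
    joint = [binomial_likelihood(k, n, a) * p for a, p in zip(rates, priors)]
    evidence = sum(joint)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under every interval")
    return [value / evidence for value in joint]

# Two hypothetical intervals, equally likely a priori; 5 of 10 samples are infected.
post = posterior(priors=[0.5, 0.5], rates=[0.1, 0.5], k=5, n=10)
```

In this example the posterior mass shifts almost entirely to the interval whose representative fraction is 0.5, since observing 5 infected products out of 10 is far more likely there than at 0.1.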

4.1.3. Self-Adaptive Strategy. The tracing algorithm, which will be discussed in the next section, places some requirements on its input sampling data. If the ratio of infected products to uninfected ones is too high, the tracing algorithm performs poorly, as there are not enough healthy samples to rule out suspects. On the contrary, if the ratio is too low, the noise introduced by the sampling process may dominate the result. In these two extreme cases, more samples should be tested than in other cases to improve the accuracy. Hence, under each sampling rate, there is a relationship between the best tracing accuracy and the pollution proportion interval for a particular topology. Given all these relationships for a specific interval, for economic reasons this strategy picks the smallest sampling rate that meets a given requirement (e.g., 90% accuracy). Then we sample the food in the end markets again under that rate and update the Bayesian estimation to check whether the sampling rate has met the requirement.
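The rate selection step can be pictured as a lookup over a trained accuracy table. The sketch below is hypothetical: the table layout, the function name, and the numbers are invented for illustration and are not results from the paper.

```python
def pick_sampling_rate(accuracy_by_rate, interval, required_accuracy):
    """Return the smallest sampling rate whose trained tracing accuracy for the
    given pollution proportion interval meets the requirement, or None.

    accuracy_by_rate: dict mapping sampling rate -> {interval index -> accuracy}.
    """
    for rate in sorted(accuracy_by_rate):
        if accuracy_by_rate[rate].get(interval, 0.0) >= required_accuracy:
            return rate
    return None

# Illustrative (made-up) accuracy table for interval 5 under three sampling rates.
table = {0.03: {5: 0.70}, 0.05: {5: 0.92}, 0.10: {5: 0.97}}
chosen = pick_sampling_rate(table, interval=5, required_accuracy=0.90)
```

With this made-up table the 5% rate is chosen, because it is the cheapest rate that reaches the 90% requirement; if no rate qualifies, the function returns `None` and the caller would have to sample more or relax the requirement.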

4.2. Tracing Algorithm. The pseudocode of the tracing algorithm is shown in Algorithm 2. After sampling and sensing, we add up the numbers of infected and uninfected food products passing every location and batch. They are stored in two variables, GOOD and BAD, for each place.

Supposing that the samples properly reveal the condition of the whole product set, the criterion for finding suspect sources is set as GOOD < $\varepsilon$ and BAD > 0. This is feasible because the contamination source is the primary spot generating polluted food, so the number of uninfected samples there is limited. $\varepsilon$, regarded as the error factor, is a small integer that keeps the algorithm valid when not all the food passing the source is infected or when there is some disturbance caused by nonideal conditions (e.g., imperfect sampling). The specific value of $\varepsilon$ is decided by the number of samples and the infection probability of the pollutant source, and can be roughly represented as

$$\varepsilon = \frac{\#\mathrm{samples} \times \text{pollution probability}}{\#\mathrm{batches}}. \tag{5}$$

This criterion is not strict enough to pinpoint a single contamination source, so extra work is needed to eliminate confusing suspects. First of all, to speed up the algorithm, suspects with a small BAD value are excluded. Then the system generates a Suspect Tree composed of the suspected locations and batches according to their order in the food supply chain. After that, the Suspect Tree is traversed layer by layer, and the first node that meets the same criterion is picked as the root source, since the original contaminant is always above cross contaminants in the tree.
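The counting and suspect selection described above can be sketched compactly in Python. This is a simplified illustration, not Algorithm 2 itself: instead of building an explicit Suspect Tree, it orders suspects by their earliest position along the sampled paths, which plays the role of the layer-by-layer traversal. The function names, the `min_bad` threshold, and the sample layout are assumptions.

```python
from collections import Counter

def error_factor(num_samples: int, pollution_probability: float, num_batches: int) -> float:
    """Rough value of the error factor from equation (5)."""
    return num_samples * pollution_probability / num_batches

def trace_origin(samples, epsilon, min_bad=1):
    """Return the most upstream (location, batch) node with GOOD < epsilon and BAD > 0.

    samples: iterable of (path, infected), where path is the ordered list of
             (location, batch) nodes a sampled product passed.
    min_bad: suspects with fewer infected samples than this are excluded (assumption).
    """
    good, bad, depth = Counter(), Counter(), {}
    for path, infected in samples:
        for level, node in enumerate(path):
            (bad if infected else good)[node] += 1
            depth[node] = min(level, depth.get(node, level))
    suspects = [
        node for node in depth
        if good[node] < epsilon and bad[node] > 0 and bad[node] >= min_bad
    ]
    if not suspects:
        return None
    # Earliest layer first; among equals, prefer the node with more infected samples.
    return min(suspects, key=lambda node: (depth[node], -bad[node]))

# Hypothetical samples: everything through factory batch 4 is infected.
demo_samples = [
    ([("farm", 0), ("factory", 4), ("market1", 7)], True),
    ([("farm", 0), ("factory", 4), ("market2", 7)], True),
    ([("farm", 0), ("factory", 3), ("market1", 7)], False),
    ([("farm", 0), ("factory", 3), ("market2", 7)], False),
]
origin = trace_origin(demo_samples, epsilon=1)
```

In the demo, the farm and both markets also saw healthy samples, so only factory batch 4 satisfies the criterion and is returned as the origin.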

4.3. Backtracking Algorithm. To judge the performance of the backtracking algorithm, Hit Rate and False Alarm Rate are introduced to denote, respectively, the algorithm's ability to capture infected products and the probability of mistakenly reckoning good products as infected. Supposing that the total numbers of products and infected products are $N$ and $I$, respectively, and that the algorithm selects $n$ potentially infected products including $i$ infected ones, we define Hit Rate as $i/I$ and False Alarm Rate as $(n - i)/(N - I)$. Although in theory both a high Hit Rate and a low False Alarm Rate are desired, there is a tradeoff between them.
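The two rates can be computed directly from sets of product IDs, as in the following sketch. The function name and the set-based interface are assumptions made for illustration.

```python
def hit_and_false_alarm(selected, infected, total):
    """Return (Hit Rate, False Alarm Rate) = (i / I, (n - i) / (N - I)).

    selected: IDs the algorithm flagged as potentially infected (n of them).
    infected: IDs of products that are truly infected (I of them).
    total:    total number of products N.
    """
    selected, infected = set(selected), set(infected)
    hits = len(selected & infected)                      # i
    if not infected or total == len(infected):
        raise ValueError("rates are undefined when I = 0 or I = N")
    hit_rate = hits / len(infected)
    false_alarm_rate = (len(selected) - hits) / (total - len(infected))
    return hit_rate, false_alarm_rate

# Hypothetical run: 10 infected products out of 100; 8 are caught, 2 good ones are flagged.
rates = hit_and_false_alarm(
    selected=set(range(1, 9)) | {50, 51}, infected=set(range(1, 11)), total=100
)
```

Flagging more products can only raise the Hit Rate but tends to raise the False Alarm Rate as well, which is the tradeoff noted above.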

The backtracking algorithm is described in Algorithm 3.

4.4. Self-Correction Method. Two metrics are defined to judge the performance of the system and to provide reference for later parameter settings.


Input: samples' spatial information and examination results
Output: contamination origin
k = 0
for i = 1 : #samples do
    for j = 1 : #(locations/batches on sample_i's path) do
        if sample_i is infected then
            sample_i.location_j.batch_j.BAD++
        else
            sample_i.location_j.batch_j.GOOD++
        end if
    end for
end for
for m = 1 : #(locations/batches in the entire chain) do
    if location_m.batch_m.GOOD ≤ ε && location_m.batch_m.BAD ≥ 0 then
        Record location/batch into suspect[k]
        k++
    end if
end for
Exclude suspects w/ small BAD
if #suspect ≥ 1 then
    Get food IDs passed all suspects
    if #ID ≥ 0 then
        Get food IDs passed at least one suspect
    end if
end if
Construct "Suspect Tree" of batches according to the paths of these IDs
for n = 1 : #(tree nodes) do
    if (suspect[n].location_n.batch_n.GOOD ≤ ε &&
        suspect[n].location_n.batch_n.BAD ≥ 0) then
        origin = suspect[n]
    end if
end for

Algorithm 2: Tracing algorithm.

Input: contaminated samples set Re-check
Output: infected food products set Bad
while (1) do
    Construct a tree of location/batch according to
        the paths of contaminated products in Re-check
    Traverse the tree (DFS)
    Record all nodes in Bad
    Empty Re-check
    if node.location.batch is new then
        Find the food IDs passed these nodes
        Sense them
        if food is contaminated then
            Put its ID in Re-check
        end if
    else
        break
    end if
end while

Algorithm 3: Backtracking algorithm.
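One possible reading of Algorithm 3 is an iterative expansion: start from known contaminated samples, collect the nodes on their paths, test every other product that passed a newly seen node, and repeat until no new node appears. The Python sketch below follows that reading. It is an interpretation under stated assumptions: the function name, the `paths` dictionary, and the `is_contaminated` callback (a stand-in for physical sensing) are hypothetical, and the explicit tree with DFS traversal is replaced by set operations.

```python
def back_track(paths, is_contaminated, recheck):
    """Return the set of product IDs found to be infected.

    paths:           dict mapping product ID -> ordered list of (location, batch) nodes.
    is_contaminated: callable(product_id) -> bool, standing in for sensor testing.
    recheck:         IDs of samples already known to be contaminated.
    """
    bad = set(recheck)
    visited_nodes = set()
    frontier = set(recheck)
    while frontier:
        # Nodes reached by the newest contaminated products that were not seen before.
        new_nodes = {node for pid in frontier for node in paths[pid]} - visited_nodes
        if not new_nodes:
            break
        visited_nodes |= new_nodes
        # Untested products that passed at least one of the new nodes.
        candidates = {
            pid for pid, path in paths.items()
            if pid not in bad and new_nodes.intersection(path)
        }
        frontier = {pid for pid in candidates if is_contaminated(pid)}
        bad |= frontier
    return bad

# Hypothetical chain: products 1, 2, and 3 are infected and overlap pairwise; 4 is unrelated.
demo_paths = {1: ["A", "B"], 2: ["B", "C"], 3: ["C", "D"], 4: ["E"]}
found = back_track(demo_paths, lambda pid: pid in {1, 2, 3}, recheck={1})
```

Starting from product 1 alone, the loop reaches product 2 through node B and product 3 through node C, while product 4 is never tested because it shares no node with an infected product.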


Figure 3: Topology (DAG) of a food supply chain case for evaluation. Node labels in the figure: Farm; Vehicle 1 to Vehicle 5 (Vehicles 1 and 2 appear twice); Factory; Market 1 to Market 3.

Confidence Metric (CM) is defined as the relative difference between the prior probability and the posterior probability:

$$\mathrm{CM} = \frac{\left| P_{\mathrm{post}} - P_{\mathrm{pri}} \right|}{P_{\mathrm{pri}}}. \tag{6}$$

$P_{\mathrm{post}}$ and $P_{\mathrm{pri}}$ are the posterior and prior probabilities, respectively. If CM is small, we are more confident that the dataset suits the previously trained model, and vice versa. Thus, we can correct (4) and combine the prior and posterior probabilities to get a more reasonable infection probability $P_{\mathrm{comb}}$ for the entire network:

$$P_{\mathrm{comb}} = \frac{P(A_i \mid B) + \mathrm{CM} \times P(A_i)}{1 + \mathrm{CM}}. \tag{7}$$
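Equations (6) and (7) translate directly into code. The sketch below uses hypothetical function names and treats $P(A_i \mid B)$ as the posterior and $P(A_i)$ as the prior for a single interval.

```python
def confidence_metric(p_post: float, p_pri: float) -> float:
    """Equation (6): relative difference between posterior and prior probability."""
    if p_pri <= 0.0:
        raise ValueError("prior probability must be positive")
    return abs(p_post - p_pri) / p_pri

def combined_probability(p_post: float, p_pri: float) -> float:
    """Equation (7): CM-weighted combination of posterior and prior probability."""
    cm = confidence_metric(p_post, p_pri)
    return (p_post + cm * p_pri) / (1.0 + cm)
```

When the posterior equals the prior, CM is 0 and the combined value is just the posterior. For a posterior of 0.6 and a prior of 0.4, CM is 0.5 and the combined value is $(0.6 + 0.5 \times 0.4)/1.5 \approx 0.533$, which lies between the two inputs.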

Success Metric (SM) measures the accuracy of the system:

$$\mathrm{SM} = \frac{\#\mathrm{success}}{\#\mathrm{total}}. \tag{8}$$

It is defined as the ratio of the number of successful detections to the total number of tests. With a lower SM, the sampling criterion is set stricter, and vice versa.

These two metrics help to adjust the sampling algorithm slightly to fit particular environments and applications.

4.5. Time and Space Complexities. Suppose there are $m$ samples, $n$ stages, and $l$ batches over all locations in a food supply chain. The time complexities of the tracing and backtracking algorithms are $O(mn) + O(l)$ and $O(m) + O(l)$, respectively. There is a tradeoff between tracing accuracy and time consumption in SDPS: more samples mean a longer running time but better knowledge of the network. Compared with the time spent on chemical testing and sensing, the time consumption of SDPS is negligible.

One record is required for every location and every food product, so the space complexity of the whole IoT system is $O(a + b)$, where $a$ is the number of food products and $b$ is the number of locations in the network.

5. Evaluation Results and Analysis

We set up two specific cases (Figures 3 and 4) to evaluate the proposed system. The first case gives a general evaluation and shows that our SDPS scheme outperforms other sampling methods, while the second one focuses on the performance on a large system with big data.

5.1. Experimental Setup. In Figure 3, note that vehicles 1 and 2 serve as transportation nodes both from farm to factory and from factory to market. This makes the model closer to reality, as some locations in the chain can play different roles in food processing.

The configuration of the two cases is listed in Table 1. Every location in the chain holds 25 and 500 batches in Cases 1 and 2, respectively. In total, 60 thousand and 800 thousand food products are circulating in the network, respectively. Which locations a piece of food passes is completely random.

Figure 4: Topology (DAG) of a food supply chain case for evaluation (with big data). Node labels in the figure: Farm 1 to Farm 3; Vehicle 1 to Vehicle 7 (Vehicles 1, 2, and 3 appear twice); Factory 1 and Factory 2; Market 1 to Market 3.

We built the simulator in C++. It reads configuration files that describe supply chain topologies, generates simulated contamination behavior, and senses data along food supply chains. In simulation, the contamination source and sampling volume can be set explicitly by the user. The contamination rules are that food can be infected by passing (1) the contamination source directly or (2) cross infection spots indirectly, according to our revised Reed-Frost model.

To keep this paper compact, we only show the training process of the first model here. In the training process, we obtain prior probabilities, which highly depend on the topology and configuration of the chain. Figure 5 shows the distribution of the infected proportion under different contamination probabilities after 300,000 tests. Under each contamination probability (which depends on the contamination type), the actual proportion of pollution is almost Gaussian distributed, which agrees with what we discussed in Section 4. Note that the value on the $x$ axis ($x$) should be converted to the pollution proportion interval by the function $[(x - 1) \times 4\%, x \times 4\%)$, since the total space is divided into 25 sections.

The relation between the products' contamination percentage and the tracing algorithm accuracy under different sample rates is shown in Figure 6. For clarity, only three sampling rates are tested: 3%, 5%, and 10%. To make the sampling strategy more efficient, more rates can be evaluated in real situations. Figure 6 confirms the hypothesis we proposed: the source is difficult to detect if too small or too large a part of the food is contaminated. In both cases, the flow path of contamination is easily hidden.
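The mapping between the axis value $x$ and the pollution proportion interval can be made concrete with two small helpers. The names are hypothetical, and placing exactly 100% into the last section is an assumption, since the half-open interval in the text leaves that boundary unspecified.

```python
def interval_bounds(x: int, width: int = 4, sections: int = 25):
    """Return the pollution proportion interval [(x - 1) * 4%, x * 4%) for axis value x."""
    if not 1 <= x <= sections:
        raise ValueError("x must be between 1 and the number of sections")
    return ((x - 1) * width, x * width)

def interval_index(percentage: float, width: float = 4.0, sections: int = 25) -> int:
    """Return the axis value x whose interval contains the given contamination percentage."""
    if not 0.0 <= percentage <= 100.0:
        raise ValueError("percentage must lie in [0, 100]")
    # 100% would otherwise fall into a 26th section; clamp it to the last one (assumption).
    return min(int(percentage // width) + 1, sections)
```

For example, `interval_bounds(1)` gives `(0, 4)` and `interval_bounds(25)` gives `(96, 100)`, while a contamination percentage of exactly 4 belongs to the second section because the intervals are closed on the left.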

Table 1: Configuration of the two cases.

                     Case 1    Case 2
#batches/location    25        500
#total products      60,000    800,000
Flow rule            Random    Random

Table 2: Simulation results of backtracking algorithm.

Hit rate    False alarm rate
96%         3%

5.2. Evaluation Results. Figure 7 shows the accuracy of the tracing algorithm. Under different probabilities of infection in the whole chain, the accuracy is no less than 80%. In Figures 8 and 9, the partition and dynamic strategies in SDPS are tested, respectively. For all infection probabilities, the partition strategy has higher tracing accuracy than the global sampling strategy (11.8%, 22.7%, 10.9%, and 4.1% higher at infection probabilities of 30%, 67%, 80%, and 90%). Compared with sampling at fixed rates (3%, 5%, and 10%), the dynamic method achieves higher tracing accuracy even with a lower average sampling rate of 7.8%.

For the backtracking part, the simulation result is shown in Table 2. Both Hit Rate and False Alarm Rate are satisfactory.

Case 2 has a large data scale. We again fetch a few samples each time and let the system tell us the number of samples to take next time based on (4). The system's actual sampling rate turns out to be 7.8%. As shown in Figure 10, the accuracy of the tracing algorithm is higher than 80% as well, which shows that our proposed approach works well with big data.


Figure 5: Prior probabilities distribution under different contamination probabilities: 30%, 67%, 80%, and 90%. (Horizontal axis: pollution proportion interval $[(x - 1) \times 4, x \times 4]$; vertical axis: distribution of 300,000 tests.)

Figure 6: The relationship between tracing algorithm accuracy and pollution proportion intervals under different sample rates: 3%, 5%, and 10%. (Horizontal axis: pollution proportion interval (%); vertical axis: accuracy of tracing algorithm (%).)

5.3. Contamination Visualization. With the tool introduced in [24] and the information we choose to record, SDPS provides sampling data that can be represented visually after being tested by sensors. Figures 11 and 12 show the data flow of infected and uninfected food products, respectively. In this case, we use the configuration of Case 1 and set the contamination source in advance to be the 4th batch in the factory.

Figure 7: Accuracy of tracing algorithm with different probabilities of infection. (Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

Figure 8: Simulation results of partition sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. (Series: partition random sampling, global random sampling; horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

The vertical lines marked with locations and batch numbers represent the nodes in the data flow. Lines going through these nodes are the traces of food products. As shown in those figures, most of the infected food, but none of the uninfected food, passed the 4th batch of the factory (the node with a circle around it). So it has a great chance of being the source of contamination, which is also confirmed by our tracing system.

SDPS makes the data concise but still comprehensive, which helps the visualization tool display the useful information.

Apart from aiding the detection of the contamination source, visualization also helps to better understand the contamination conditions (e.g., contamination severity/distribution) of the whole IoT system. For example, a well-managed warehouse or a city with lower temperature may lead to less contamination. Information of this kind can be read directly from the visualization images and helps manufacturers design their food supply chains more scientifically.

Figure 9: Simulation results of dynamic sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. (Series: 3% sampling proportion, 5% sampling proportion, 10% sampling proportion, dynamic sampling; horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

Figure 10: Simulation results of tracing accuracy with big data under different infection probabilities. (Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

Visualization of the contamination condition in the IoT system makes provenance reasoning in the food supply chain intuitive and informative.

5.4. Performance Estimation in Real Situations. In reality, the food supply chain is more complicated. Many factors, such as the specific food type, environment temperature, manufacturing process, and other parameters, can make the chain difficult to predict. To apply our strategy in real situations, those factors should be considered and some parameters should be adjusted accordingly.

The factor that influences the behavior of a food supply chain the most is the type of food. Different food has its own characteristics, which may dominate the model, the provenance procedure, and the expected results. Firstly, the food type can determine the possible contamination source. In (1), $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ are related to the virus that spreads among food. For example, avian influenza virus, which causes a common infectious disease among poultry, began to be contagious among human beings. After the mutation, the infectious ability of this virus grew significantly. As a result, $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ of avian influenza virus in chickens would also increase. Secondly, the food type is a deciding factor for its storage pattern and shelf life. Some canned drinks are stacked layer by layer separately, so they would not get cross contaminated. However, raw meat is generally kept together, which provides an easy environment for a virus to spread. Thirdly, the state of food is also dominant in provenance. One piece of food in solid state can be seen as a unit, while liquid food like yogurt could be ruined by only one deteriorated drop. Thus, for yogurt, the sampling process could be very different, as spatial position should be taken into consideration and the behavior of the virus in liquid should also be studied.

Besides food type, other factors play important roles in real situations. Food is more likely to rot in summer than in winter; some manufacturing factories are more hygienic than others; as time passes, food may become more susceptible to infection; and the types or dosages of food additives may slow down the contamination process.

Although different food supply chains can behave differently, our proposed strategy can cover most cases because it follows the general model of food supply networks and epidemiological principles.

6. Conclusion

In this paper, we present a heuristic approach to tracing contamination sources in large IoT systems for complicated food supply chains, which is a critical issue in metropolitan life. In our approach, a Self-adaptive Dynamic Partition Sampling (SDPS) strategy is proposed to collect data for sensors, whose input is only a small portion of end market samples out of the huge volume of samples along food supply chains. The approach is illustrated with a case study of an IoT system for provenance in the food supply chain, which can efficiently stop outbreaks of foodborne disease. With the intelligent SDPS strategy, the objects tested by sensors are the most reasonable portion of the entire product set. Efficiency is highly improved, while the accuracy stays almost the same as when sensing all the objects. SDPS keeps the integrity of information and approaches nearly real-time examination. We also present a tracing algorithm to find the contamination sources of food supply chains and a backtracking algorithm to provide a strategy for recalling problematic food undiscovered in the chain. Simulation results indicate that our SDPS scheme can achieve a tracing accuracy of up to 97.8% with a smaller average sampling percentage than traditional global random sampling. We managed to sample only a small portion of food in the end market without loss in accuracy of provenance tracing over the whole IoT system. In addition, our analytic data and visualized images can clearly model contamination conditions in the food supply chain within the context of the IoT system. This gives clients an intuitive impression of food supply networks in a city.

Figure 11: Visualized data flow of infected food products in food supply chain. (Labels in the figure: Farm batch 0–9; Vehicle 1 and 2 batch 0–9; Factory batch 0–9; Vehicle 1 batch 0–9; Vehicle 2 batch 0–9; Vehicle 3, 4, 5 batch 0–9; Market 1, 2, 3 batch 0–9. Marked node: Factory batch 4.)

Figure 12: Visualized data flow of uninfected food products in food supply chain. (Labels in the figure: Farm batch 0–9; Vehicle 1 and 2 batch 0–9; Factory batch 0–9; Vehicle 1 batch 0–9; Vehicle 2 batch 0–9; Vehicle 3, 4, 5 batch 0–9; Market 1, 2, 3 batch 0–9. Marked node: Factory batch 4.)

In this paper, we assume that all provenance information of food products is hosted by a centralized repository and that the provenance metadata are organized in a uniform manner. Our future work is to implement food supply chain provenance in practice in a community, as a test bed for megacity management.


Acknowledgments

This paper is sponsored in part by the Shanghai International Science and Technology Collaboration Program under Grant 13430710400 and the Campus for Research Excellence and Technological Enterprise (CREATE) program of the Singapore National Research Foundation under the joint project on Energy and Environmental Sustainability Solutions for Megacities (R-706-000-101-281) by Shanghai Jiao Tong University (SJTU) and National University of Singapore (NUS). Professor Qiu is partially supported by NSF CNS-1249223 and NSFC 61071061.

References

[1] “Wikipedia on smart city,” http://en.wikipedia.org/wiki/Smart_city.

[2] M. Qiu and E. H. M. Sha, “Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems,” ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 2, article 25, 2009.

[3] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, “Online optimization for scheduling preemptable tasks on IaaS cloud systems,” Journal of Parallel and Distributed Computing, vol. 72, no. 5, pp. 666–677, 2012.

[4] K. Su, J. Li, and H. Fu, “Smart city and the applications,” in Proceedings of the International Conference on Electronics, Communications and Control (ICECC '2011), pp. 1028–1033, Zhejiang, China, September 2011.

[5] X. Tang, J. Pu, K. Cao, Y. Zhang, and Z. Xiong, “Integrated extensible simulation platform for vehicular sensor networks in smart cities,” International Journal of Distributed Sensor Networks, vol. 2012, Article ID 860415, 10 pages, 2012.

[6] P. Vlacheas, R. Giaffreda, V. Stavroulaki et al., “Enabling smart cities through a cognitive management framework for the internet of things,” IEEE Communications Magazine, vol. 51, no. 6, pp. 102–111, 2013.

[7] A. Asin, “Smart cities from libelium allows systems integrators to monitor noise, pollution, structural health and waste management,” Smart Cities Articles, 2011.

[8] K. Ashton, “That ‘internet of things’ thing,” RFID Journal, 2011.

[9] P. Magrassi and T. Berg, “A world of smart objects,” Gartner Research Report TR-17-2243, 2002.

[10] Oxford English Dictionary (OED), “The fact of coming from some particular source or quarter; source, derivation,” http://en.wikipedia.org/wiki/Provenance.

[11] A. V. Roth, A. A. Tsay, M. E. Pullman, and J. V. Gray, “Unraveling the food supply chain: strategic insights from China and the 2007 recalls,” Journal of Supply Chain Management, vol. 44, no. 1, pp. 22–39, 2008.

[12] S. Miles, P. Groth, S. Munroe, and L. Moreau, “Prime: a methodology for developing provenance-aware applications,” ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, article 8, 2011.

[13] R. Hasan, R. Sion, and M. Winslett, “The case of the fake Picasso: preventing history forgery with secure provenance,” in Proceedings of the 7th Conference on File and Storage Technologies (FAST '09), pp. 1–14, New York, NY, USA, December 2009.

[14] T. A. McMeekin, J. Baranyi, J. Bowman et al., “Information systems in food safety management,” International Journal of Food Microbiology, vol. 112, no. 3, pp. 181–194, 2006.

[15] M. A. van der Gaag, F. Vos, H. W. Saatkamp, M. van Boven, P. van Beek, and R. B. M. Huirne, “A state-transition simulation model for the spread of Salmonella in the pork supply chain,” European Journal of Operational Research, vol. 156, no. 3, pp. 782–798, 2004.

[16] L. M. Wein and Y. Liu, “Analyzing a bioterror attack on the food supply: the case of botulinum toxin in milk,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 28, pp. 9984–9989, 2005.

[17] L. Qin and Q. S. Wang, “Food supply chain quality management model and simulation based on game,” in Proceedings of the International Conference on Computer Modeling and Simulation (ICCMS '09), pp. 291–293, Macau, China, February 2009.

[18] Q. Zhang, D. Wang, T. Huang et al., “Modelling provenance in food supply chain to track and trace foodborne disease,” in Proceedings of the International Conference on Computer Modeling and Simulation, pp. 69–75, Hong Kong, China, February 2012.

[19] S. Li, L. Xu, and X. Wang, “Compressed sensing signal and data acquisition in wireless sensor,” IEEE Transactions on Industrial Informatics, 2012.

[20] Z. Ding and X. Gao, “A database cluster system framework for managing massive sensor sampling data in the internet of things,” Chinese Journal of Computers, vol. 35, no. 6, pp. 1175–1191, 2012.

[21] L. Zhang, J. Liu, and H. Jiang, “Energy-efficient location tracking with smartphones for IoT,” in Proceedings of the IEEE Sensors, pp. 1–4, Taipei, China, October 2012.

[22] H. Abbey, “An examination of the Reed-Frost theory of epidemics,” Human Biology, vol. 24, no. 3, pp. 201–233, 1952.

[23] L. Elveback, J. P. Fox, and A. Varma, “An extension of the Reed-Frost epidemic model for the study of competition between viral agents in the presence of interference,” The American Journal of Epidemiology, vol. 80, no. 3, pp. 356–364, 1964.

[24] X. Yuan, H. Guo, H. Xiao, Z. Wang, and X. Zhang, “High-dimensional data virtualization,” in Proceedings of the Communications of the CCF, pp. 13–16, April 2011.


(1) Input: type of foodborne disease T
(2) Output: sample set
(3) Training Phase:
(4)   Look up contamination probability p according to T
(5)   Configuration: topology information, (contamination intervals), p
(6) Sampling Phase:
(7)   Sample a small portion n
(8)   BEST = ∞
(9)   while n ≤ BEST do
(10)    Compute posterior probability based on Bayesian Estimation:
(11)    $P(B \mid A_i) = \binom{n}{k} a_i^k (1 - a_i)^{n-k}$
(12)    $P(A_i \mid B) = \dfrac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j) P(A_j)}$
(13)    for i = 1 : (contamination intervals) do
(14)      if P_interval_i ≥ BEST then
(15)        BEST = P_interval_i
(16)      end if
(17)    end for
(18)    if BEST ≤ 80% then
(19)      Find the best sample rate according to its relationship with BEST
(20)      if n ≤ BEST then
(21)        Sample (BEST − n) products
(22)      else
(23)        break
(24)      end if
(25)    end if
(26)  end while

Algorithm 1: Sampling algorithm.

Records of each location and product are documented, respectively. Location records include batch numbers, the number of sampled products labeled as GOOD (uninfected) or BAD (infected) in this batch, and the IDs of polluted samples in this batch. Product records contain the examination result for a piece of food, the order of batches and locations the product passed, and the pointers to these location records. The pointers serve as the connection between those two data structures.
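
The two record types described above can be sketched as follows. This is only an illustrative layout, not the paper's implementation: the class names `LocationRecord` and `ProductRecord`, their field names, and the helper `record_sample` are all hypothetical, and Python object references stand in for the pointers mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LocationRecord:
    """One (location, batch) entry; all names here are hypothetical."""
    location: str
    batch: int
    good: int = 0                      # sampled products labeled GOOD (uninfected)
    bad: int = 0                       # sampled products labeled BAD (infected)
    polluted_ids: List[int] = field(default_factory=list)


@dataclass
class ProductRecord:
    """One sampled product; `path` holds references to location records."""
    product_id: int
    infected: bool
    path: List[LocationRecord] = field(default_factory=list)


def record_sample(product: ProductRecord) -> None:
    """Update every location record on the product's path with its test result."""
    for loc in product.path:
        if product.infected:
            loc.bad += 1
            loc.polluted_ids.append(product.product_id)
        else:
            loc.good += 1
```

Because each product keeps references to the location records it passed, one pass over the samples is enough to fill the GOOD and BAD counters used later by the tracing algorithm.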

3.3. Logic Structure of IoT System for Food Supply Chains. As shown in Figure 2, the hierarchy of this IoT system contains four layers: data collection and management layer, intelligent processing layer, graphic representation layer, and self-correction layer. Specific methods and algorithms are discussed in the following sections.

4. Heuristic Provenance Approach and Tracing and Backtracking Algorithms

In this section, detailed approaches and algorithms to solve provenance issues in food supply chain are introduced. Firstly, we present a Self-adaptive Dynamic Partition Sampling (SDPS) Strategy to improve the efficiency and intelligence of sensor data collection and management. Then, tracing and backtracking algorithms are discussed, respectively, to catch the contamination source and to dig out potentially infected food products still circulating in the markets. Finally, we introduce the Self-Correction Method to maintain and update the system, which makes the system adaptive and flexible to particular applications.

Figure 2: Logic structure of IoT system model for food supply chain. [Diagram layers: data collection and management layer, intelligent processing layer, graphic representation layer, self-correction layer. Diagram boxes: Training, LUT, Sampling, Sample more, Tracing origin, Back-tracking, Data visualization, Confidence metric, Success metric.]

4.1. Self-Adaptive Dynamic Partition Sampling Strategy. As mentioned in Section 3.1, the Self-adaptive Dynamic Partition Sampling (SDPS) Strategy is introduced to improve the efficiency of manipulating sensor data; it greatly cuts down the number of samples. The pseudocode for the sampling algorithm is shown in Algorithm 1.


4.1.1. Partition Strategy. Partition strategy makes samples more general and representative. In this case, the system divides the whole group of products into several parts according to the batches they belong to in the end markets. The sampled volume for batch $n$ of market $m$ is determined by

$$\text{sample}_{(m,n)} = \text{sample}_{\text{total}} \times \frac{\text{products}_{(m,n)}}{\sum_{m=0}^{M} \sum_{n=0}^{N} \text{products}_{(m,n)}}. \tag{2}$$

Here, $M$ and $N$ are the total numbers of end markets and batches in the network. The subscripts $(m, n)$ and total denote the number of samples or products in batch $n$ of market $m$ and in all batches of all end markets, respectively.
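
Equation (2) is a proportional allocation, which can be sketched as below. The function name `partition_sample_sizes` and the dictionary layout keyed by (market, batch) are assumptions made for this illustration; rounding to whole products is also an added choice, so the allocated sizes may not sum exactly to the requested total.

```python
from typing import Dict, Tuple


def partition_sample_sizes(
    products: Dict[Tuple[int, int], int], sample_total: int
) -> Dict[Tuple[int, int], int]:
    """Split `sample_total` across (market, batch) cells in proportion to
    the number of products in each cell, following equation (2)."""
    total_products = sum(products.values())
    if total_products == 0:
        return {cell: 0 for cell in products}
    return {
        cell: round(sample_total * count / total_products)
        for cell, count in products.items()
    }
```

For example, with 100, 300, and 600 products in three cells and 50 samples in total, the cells receive 5, 15, and 30 samples.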

4.1.2. Dynamic Strategy. A dynamic strategy based on Bayesian estimation is adopted to achieve minimal sample volume. Given the infection probability of a particular pathogen, determined by medical experiments, the model can be trained to obtain the distribution of the total infection probability within the whole food supply network, which serves as the prior probability. The probability density function of the distribution is presented as a function of infection probability intervals.

On the other hand, after sampling a small part and measuring the infection rate of the samples, the posterior probabilities can be obtained. If $k$ infected products are found within $n$ samples under a certain contamination percentage interval with prior probability of $a_i$, the conditional probability is given by the binomial distribution:

$$P(B \mid A_i) = \binom{n}{k} a_i^k (1 - a_i)^{n-k} = \frac{n!}{k!\,(n-k)!}\, a_i^k (1 - a_i)^{n-k}. \tag{3}$$

Here, $A_i$ means the event that the contamination percentage of the whole product set falls into the $i$th interval with a prior probability of $a_i$, and $B$ means the event that we find $k$ contaminated products in $n$ samples.

After that, Bayes' formula (4) is applied to combine prior probabilities with posterior probabilities and obtain revised probabilities, which describe the specific environment better:

$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\, P(A_j)}. \tag{4}$$
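
A minimal sketch of the update in (3) and (4) is given below. It rests on one interpretive assumption that the text does not spell out: each interval $i$ is given a representative contamination rate (for instance the interval midpoint), passed as `rates`, which plays the role of $a_i$ in the binomial likelihood, while `priors` holds $P(A_i)$ from the training phase. The function name is hypothetical.

```python
from math import comb
from typing import List, Sequence


def posterior_over_intervals(
    priors: Sequence[float], rates: Sequence[float], n: int, k: int
) -> List[float]:
    """Return P(A_i | B) for every interval after finding k infected
    products among n samples (equations (3) and (4))."""
    # Binomial likelihood P(B | A_i) of k infected out of n, equation (3).
    likelihoods = [comb(n, k) * a**k * (1 - a) ** (n - k) for a in rates]
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # denominator of equation (4)
    if evidence == 0:
        raise ValueError("observation has zero probability under every interval")
    return [value / evidence for value in joint]
```

With two equally likely intervals whose representative rates are 0.1 and 0.5, observing 5 infected products in 10 samples moves almost all of the probability mass to the second interval.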

4.1.3. Self-Adaptive Strategy. The tracing algorithm, which will be discussed in the next section, has some requirements for its input sampling data. If the ratio of infected products to uninfected ones is too high, the tracing algorithm performs poorly, as there are not enough healthy samples to rule out suspects. On the contrary, if the ratio is too low, the noise introduced by the sampling process may dominate the result. In these two extreme cases, more samples than in other cases should be tested to improve the accuracy. Hence, under each sampling rate, there is a relationship between the best tracing algorithm accuracy and the pollution proportion interval for a particular topology. Given all the relationships in a specific interval, for economical reasons, this strategy picks the smallest sampling rate that achieves certain requirements (e.g., 90% accuracy). Then, we sample the food in the end market again under that rate and update the Bayesian estimation to check whether the sampling rate has met the requirements.
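
The rate selection step can be sketched as a lookup over a pre-computed accuracy table. The table layout, the function name `smallest_adequate_rate`, and the interval labels are hypothetical; in practice the accuracies would come from the training runs for the specific topology.

```python
from typing import Dict, Optional


def smallest_adequate_rate(
    accuracy_by_rate: Dict[float, Dict[str, float]],
    interval: str,
    required: float = 0.90,
) -> Optional[float]:
    """Return the smallest sampling rate whose tracing accuracy in the given
    pollution proportion interval meets the requirement, or None."""
    for rate in sorted(accuracy_by_rate):
        if accuracy_by_rate[rate].get(interval, 0.0) >= required:
            return rate
    return None
```

Returning `None` when no tabulated rate is adequate leaves the caller free to fall back to a larger sample, which is a design choice of this sketch rather than something the text prescribes.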

4.2. Tracing Algorithm. The pseudocode of the tracing algorithm is shown in Algorithm 2. After sampling and sensing, we add up the numbers of uninfected and infected food products passing every location and batch. They are stored in two variables, GOOD and BAD, respectively, for each place.

Supposing that the samples properly reflect the condition of the whole product set, the criterion for finding suspect sources is set as GOOD < $\varepsilon$ and BAD > 0. This is feasible because the contamination source would be the primary spot generating polluted food, so the number of uninfected samples there is limited. $\varepsilon$, regarded as the error factor, is a small integer that keeps the algorithm valid when not all the food passing the source is infected or when there is some disturbance caused by nonideal conditions (e.g., imperfect sampling). The specific value of $\varepsilon$ is decided by the number of samples and the infection probability of the pollutant source, and can be roughly represented as follows:

$$\varepsilon = \frac{\text{samples} \times \text{pollution probability}}{\text{batches}}. \tag{5}$$

This criterion is not strict enough to pinpoint a single contamination source, so extra work is needed to eliminate the confusing suspects. First of all, to improve the speed of the algorithm, suspects with a small BAD value are excluded. Then, the system generates a Suspect Tree composed of the suspected locations and batches according to their order in the food supply chain. After that, the Suspect Tree is traversed layer by layer, and the first node that meets the same criterion is picked as the root source, since the original contaminant always lies above cross contaminants in the tree.

4.3. Backtracking Algorithm. In order to judge the performance of the backtracking algorithm, Hit Rate and False Alarm Rate are put forward to denote the algorithm's ability to capture infected products and the probability of mistaking good products for infected ones. Supposing that the total numbers of products and infected products are $N$ and $I$, respectively, and that the algorithm selects $n$ potentially infected products including $i$ truly infected ones, we define Hit Rate as $i/I$ and False Alarm Rate as $(n - i)/(N - I)$. Although both a high Hit Rate and a low False Alarm Rate are desired in theory, there is a tradeoff between them.

The backtracking algorithm is described in Algorithm 3.
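
The two rates defined above can be computed directly from sets of product IDs, as in the sketch below. The function name and the use of ID sets are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def hit_and_false_alarm(
    selected: Iterable[int], infected: Iterable[int], total_products: int
) -> Tuple[float, float]:
    """Return (Hit Rate, False Alarm Rate) = (i / I, (n - i) / (N - I))."""
    selected_set, infected_set = set(selected), set(infected)
    i = len(selected_set & infected_set)     # infected products that were selected
    big_i = len(infected_set)                # all infected products, I
    n = len(selected_set)                    # all selected products, n
    healthy = total_products - big_i         # N - I
    if big_i == 0 or healthy <= 0:
        raise ValueError("need at least one infected and one healthy product")
    return i / big_i, (n - i) / healthy
```

Selecting more products can only raise the Hit Rate, but every extra healthy product selected raises the False Alarm Rate, which is the tradeoff noted in the text.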

4.4. Self-Correction Method. Two metrics are defined to judge the performance of the system and to provide a reference for later parameter settings.


(1) Input: samples' spatial information and examination results
(2) Output: contamination origin
(3) k = 0
(4) for i = 1 : samples do
(5)   for j = 1 : (locations/batches on sample_i's path) do
(6)     if sample_i is infected then
(7)       sample_i.location_j.batch_j.BAD++
(8)     else
(9)       sample_i.location_j.batch_j.GOOD++
(10)    end if
(11)  end for
(12) end for
(13) for m = 1 : (locations/batches in the entire chain) do
(14)   if location_m.batch_m.GOOD ≤ ε && location_m.batch_m.BAD ≥ 0 then
(15)     Record location/batch into suspect[k]
(16)     k++
(17)   end if
(18) end for
(19) Exclude suspects w/ small BAD
(20) if suspect ≥ 1 then
(21)   Get food IDs passed all suspects
(22)   if ID ≥ 0 then
(23)     Get food IDs passed at least one suspect
(24)   end if
(25) end if
(26) Construct "Suspect Tree" of batches according to the paths of these IDs
(27) for n = 1 : (tree nodes) do
(28)   if (suspect[n].location_n.batch_n.GOOD ≤ ε &&
(29)       suspect[n].location_n.batch_n.BAD ≥ 0) then
(30)     origin = suspect[n]
(31)   end if
(32) end for

Algorithm 2: Tracing algorithm.
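
The core of the tracing idea can be sketched in a few lines. This is a simplification, not a transcription of Algorithm 2: instead of building an explicit Suspect Tree, it records the earliest position at which each (location, batch) node appears on any sample path and picks the suspect closest to the start of the chain. The function name, the `eps` and `min_bad` parameters, and the tie-breaking rule (larger BAD first) are assumptions of this sketch; the suspect test uses GOOD ≤ ε together with BAD > 0.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Node = Hashable  # e.g. a (location, batch) tuple


def trace_origin(
    samples: Sequence[Tuple[List[Node], bool]], eps: int = 0, min_bad: int = 1
) -> Optional[Node]:
    """samples: (path, infected) pairs, with path ordered along the supply chain."""
    good: Dict[Node, int] = defaultdict(int)
    bad: Dict[Node, int] = defaultdict(int)
    depth: Dict[Node, int] = {}
    for path, infected in samples:
        for position, node in enumerate(path):
            depth[node] = min(position, depth.get(node, position))
            if infected:
                bad[node] += 1
            else:
                good[node] += 1
    # Suspects: few healthy samples passed through, but infected ones did.
    suspects = [n for n in depth if good[n] <= eps and bad[n] >= min_bad]
    if not suspects:
        return None
    # Prefer the suspect nearest the chain start; break ties by larger BAD.
    return min(suspects, key=lambda n: (depth[n], -bad[n]))
```

Raising `min_bad` corresponds to the step that excludes suspects with a small BAD value before the final choice.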

(1) Input: contaminated samples set Re-check
(2) Output: infected food products set Bad
(3) while (1) do
(4)   Construct a tree of location/batch according to
(5)   the paths of contaminated products in Re-check
(6)   Traverse the tree (DFS)
(7)   Record all nodes in Bad
(8)   Empty Re-check
(9)   if node.location.batch is new then
(10)    Find the food IDs passed these nodes
(11)    Sense them
(12)    if food is contaminated then
(13)      Put its ID in Re-check
(14)    end if
(15)  else
(16)    break
(17)  end if
(18) end while

Algorithm 3: Back tracking algorithm.
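
A compact sketch of the iterative re-check loop is shown below. It is an approximation of Algorithm 3 under stated assumptions: the tree and its DFS traversal are replaced by a plain set of visited nodes, the sensor test is represented by a caller-supplied function `is_contaminated`, and the result is the set of infected product IDs. All names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set


def back_track(
    paths: Dict[int, List[Hashable]],
    seed_ids: Iterable[int],
    is_contaminated: Callable[[int], bool],
) -> Set[int]:
    """Expand from known contaminated products to others sharing their nodes."""
    bad: Set[int] = set(seed_ids)
    recheck: Set[int] = set(bad)
    visited: Set[Hashable] = set()
    while recheck:
        # Nodes on the paths of newly found contaminated products.
        new_nodes = {node for pid in recheck for node in paths[pid]} - visited
        recheck = set()
        if not new_nodes:
            break
        visited |= new_nodes
        for pid, path in paths.items():
            # Test only untested products that passed a newly suspicious node.
            if pid not in bad and new_nodes.intersection(path) and is_contaminated(pid):
                bad.add(pid)
                recheck.add(pid)
    return bad
```

The loop stops as soon as a round adds no new nodes, which mirrors the break condition in the pseudocode.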


Figure 3: Topology (DAG) of a food supply chain case for evaluation. [Diagram nodes: Farm, Vehicles 1 to 5, Factory, Markets 1 to 3.]

Confidence Metric (CM) is defined as the relative difference between the prior probability and the posterior probability:

$$\text{CM} = \frac{\left| P_{\text{post}} - P_{\text{pri}} \right|}{P_{\text{pri}}}. \tag{6}$$

$P_{\text{post}}$ and $P_{\text{pri}}$ are the posterior and prior probabilities, respectively. If CM is small, we are more confident that the dataset suits the model trained previously, and vice versa. Thus, we can correct (4) and combine prior and posterior probabilities to get more reasonable infection probabilities $P_{\text{comb}}$ of the entire network:

$$P_{\text{comb}} = \frac{P(A_i \mid B) + \text{CM} \times P(A_i)}{1 + \text{CM}}. \tag{7}$$

Success Metric (SM) measures the accuracy of the system:

$$\text{SM} = \frac{\text{success}}{\text{total}}. \tag{8}$$

It is defined as the ratio of the number of successful detections to the total number of tests. With a lower SM, the sampling criterion is set stricter, and vice versa.

These two variables help to adjust the sampling algorithm slightly to fit particular environments and applications.
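
Equations (6) to (8) translate into three small functions. The sketch assumes that $P_{\text{post}}$ and $P_{\text{pri}}$ in (6) are the same quantities as $P(A_i \mid B)$ and $P(A_i)$ in (7), which is how the surrounding text reads; the function names are hypothetical.

```python
def confidence_metric(p_post: float, p_pri: float) -> float:
    """CM = |P_post - P_pri| / P_pri, equation (6)."""
    if p_pri <= 0:
        raise ValueError("prior probability must be positive")
    return abs(p_post - p_pri) / p_pri


def combined_probability(p_post: float, p_pri: float) -> float:
    """P_comb = (P_post + CM * P_pri) / (1 + CM), equation (7)."""
    cm = confidence_metric(p_post, p_pri)
    return (p_post + cm * p_pri) / (1 + cm)


def success_metric(successes: int, total: int) -> float:
    """SM = success / total, equation (8)."""
    if total <= 0:
        raise ValueError("total number of tests must be positive")
    return successes / total
```

For a prior of 0.2 and a posterior of 0.3, CM is 0.5 and the combined probability is about 0.267, which lies between the two inputs.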

4.5. Timing and Space Complexities. Suppose there are $m$ samples, $n$ stages, and $l$ batches for all locations in a food supply chain; the time complexities of the tracing and backtracking algorithms are $O(mn) + O(l)$ and $O(m) + O(l)$, respectively. There is a tradeoff between tracing accuracy and time consumption in SDPS: more samples mean a longer running time and better knowledge of the network. Compared with the time spent on chemical testing and sensing, the time consumption of SDPS is negligible.

A record is required for every location and every food product, so the space complexity of the whole IoT system is $O(a + b)$, where $a$ is the number of food products and $b$ is the number of locations in the network.

5. Evaluation Results and Analysis

We set up two specific cases (Figures 3 and 4) to evaluate the proposed system. The first case gives a general evaluation and shows that our SDPS scheme outperforms other sampling methods, while the second one focuses on the performance on a large system with big data.

5.1. Experimental Setup. In Figure 3, note that vehicles 1 and 2 serve as transportation nodes both from farm to factory and from factory to market. This makes the model closer to reality, as some locations in the chain can play different roles in food processing.

The configuration of the two cases is listed in Table 1. Every location in the chain holds 25 and 500 batches in Cases 1 and 2, respectively. In total, 60 thousand and 800 thousand food products are circulating in the network. Which locations a piece of food passes is entirely random.

Figure 4: Topology (DAG) of a food supply chain case for evaluation (with big data). [Diagram nodes: Farms 1 to 3, Vehicles 1 to 7, Factories 1 and 2, Markets 1 to 3.]

We build the simulator in C++. It reads in the configuration files that describe supply chain topologies, generates simulated contamination behavior, and senses data along food supply chains. In simulation, the contamination source and sampling volume can be set explicitly by the users. The food contamination rules are that food can be infected by passing (1) the contamination source directly or (2) cross infection spots indirectly, according to our revised Reed-Frost model.

To keep this paper compact, we only show the training process of the first model here. In the training process, we obtain prior probabilities, which depend heavily on the topology and configuration of the chain. Figure 5 shows the distribution of the infected proportion under different contamination probabilities after 300000 tests. Under each contamination probability (which depends on the contamination type), the actual proportion of pollution is almost Gaussian distributed, in line with the discussion in Section 4. Note that the value on the $x$ axis should be converted to the pollution proportion interval $[(x - 1) \times 4,\ x \times 4)$, since the total space is divided into 25 sections. The relation between the products' contamination percentage and the tracing algorithm accuracy under different sample rates is shown in Figure 6. For clarity, only 3 sampling rates are tested: 3%, 5%, and 10%. To make the sampling strategy more efficient, more rates can be evaluated in real situations. Figure 6 confirms the hypothesis we proposed: the source is difficult to detect if only a small part or too large a part of the food is contaminated. In both cases, the flow path of contamination is easily hidden.
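
The mapping between an axis index and its pollution proportion interval can be made explicit with two helper functions. Both names are hypothetical, and clamping a value of exactly 100% into the last section is an added choice, since the half-open intervals in the text do not cover that endpoint.

```python
from typing import Tuple


def pollution_interval(x: int, width: int = 4) -> Tuple[int, int]:
    """Return the half-open interval [(x - 1) * width, x * width) in percent."""
    return (x - 1) * width, x * width


def interval_index(percentage: float, width: int = 4) -> int:
    """Return the 1-based section index for a pollution percentage in [0, 100]."""
    sections = 100 // width            # 25 sections when width is 4
    return min(int(percentage // width) + 1, sections)
```

With the default width of 4, index 1 covers 0% to 4% and index 25 covers 96% to 100%.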

Table 1: Configuration of the two cases.

                    Case 1    Case 2
Batches/location    25        500
Total products      60000     800000
Flow rule           Random    Random

Table 2: Simulation results of back tracking algorithm.

Hit rate    False alarm rate
96%         3%

5.2. Evaluation Results. Figure 7 shows the accuracy of the tracing algorithm. Across different probabilities of infection in the whole chain, the accuracy is no less than 80%. In Figures 8 and 9, the partition and dynamic strategies in SDPS are tested, respectively. For all infection probabilities, the partition strategy achieves higher tracing accuracy than the global sampling strategy (11.8%, 22.7%, 10.9%, and 4.1% higher with infection probabilities of 30%, 67%, 80%, and 90%). Compared with sampling at fixed rates (3%, 5%, and 10%), the dynamic method achieves higher tracing accuracy even with a lower average sampling rate of 7.8%.

For the backtracking part, the simulation result is shown in Table 2. Both Hit Rate and False Alarm Rate are satisfactory.

Case 2 has a large data scale. We again fetch a few samples each time and let the system tell us the number of samples to take next time based on (4). The system's actual sampling rate turns out to be 7.8%. As shown in Figure 10, the accuracy of the tracing algorithm is higher than 80% as well, which shows that our proposed approach works well with big data.


Figure 5: Prior probability distributions under different contamination probabilities: 30%, 67%, 80%, and 90%. [Plot: x axis, pollution proportion interval [(x − 1) * 4, x * 4]; y axis, distribution of 300000 tests; one series per contamination probability.]

Figure 6: The relationship between tracing algorithm accuracy and pollution proportion intervals under different sample rates: 3%, 5%, and 10%. [Plot: x axis, pollution proportion interval (%); y axis, accuracy of tracing algorithm (%); one series per sample rate.]

5.3. Contamination Visualization. With the tool introduced by [24] and the information we choose to record, SDPS provides sampling data that can be represented visually after being tested by sensors. Figures 11 and 12 show the data flow of infected and uninfected food products, respectively. In this case, we use the configuration of Case 1 and set the contamination source to be the 4th batch in the factory in advance.

Figure 7: Accuracy of tracing algorithm with different probabilities of infection. [Plot: x axis, probability of infection (%); y axis, tracing accuracy (%).]

Figure 8: Simulation results of partition sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. [Plot: x axis, probability of infection (%); y axis, tracing accuracy (%); series: partition random sampling, global random sampling.]

The vertical lines marked with locations and batch numbers represent the nodes in the data flow. Lines going through these nodes are the traces of food products. As shown in those figures, most of the infected food but none of the uninfected food passed the 4th batch of the factory (the node with a circle around it). So it has a great chance of being the source of contamination, which is also confirmed by our tracing system.

SDPS makes data concise but still comprehensive, which helps the visualization tool display the useful information.

Apart from helping to detect the contamination source, visualization can also give a better picture of the contamination conditions (e.g., contamination severity and distribution) of the whole IoT system. For example, a well-managed warehouse or a city with lower temperature may lead to less contamination. Information of this kind can be read directly from the visualization images and can help manufacturers design their food supply chains more scientifically.

Figure 9: Simulation results of dynamic sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. [Plot: x axis, probability of infection (%); y axis, tracing accuracy (%); series: 3% sampling proportion, 5% sampling proportion, 10% sampling proportion, dynamic sampling.]

Figure 10: Simulation results of tracing accuracy with big data under different infection probabilities. [Plot: x axis, probability of infection (%); y axis, tracing accuracy (%).]

Visualization of the contamination condition in the IoT system makes the provenance reasoning in food supply chain intuitive and informative.

5.4. Performance Estimation in Real Situations. In reality, food supply chains are more complicated. Many factors, such as the specific food type, environment temperature, manufacturing process, and other parameters, could make the chain difficult to predict. To apply our strategy in real situations, those factors should be taken into account and some parameters should be adjusted accordingly.

The factor that influences the behavior of a food supply chain the most is the type of food. Different foods have their own characteristics, which may dominate the model, the provenance procedure, and the expected results. Firstly, food type can decide the possible contamination source. In (1), $P_{\text{exp}}$ and $P_{\text{imp}}$ are related to the virus that spreads among food. For example, avian influenza virus, which causes a common infectious disease among poultry, began to be contagious among human beings. After the mutation, the infectious ability of this virus grew significantly. As a result, $P_{\text{exp}}$ and $P_{\text{imp}}$ of avian influenza virus in chickens would also increase. Secondly, food type is a deciding factor for its storage pattern and quality guarantee period. Some canned drinks are stacked layer by layer separately, so they would not get cross contaminated. However, raw meat is generally kept together, which provides an easy environment for a virus to spread. Thirdly, the state of food is also dominant in provenance. One piece of food in solid state can be seen as a unit, while liquid food like yogurt could be ruined by only one deteriorated drop. In this way, for yogurt, the sampling process could be very different, as spatial position should be taken into consideration and the behavior of the virus in liquid should also be studied.

Besides food type, there are other factors playing important roles in real situations. Food is more likely to rot in summer than in winter; some manufacturing factories are more hygienic than others; food may become more susceptible to infection as time passes; and the types or dosages of food additives may slow down the contamination process.

Although different food supply chains can behave differently, our proposed strategy can cover most of the cases because it obeys the general model of food supply networks and epidemiological principles.

6. Conclusion

In this paper, we present a heuristic approach to tracing contamination sources in large IoT systems for complicated food supply chains, which is a critical issue in metropolitan life. In our approach, the Self-adaptive Dynamic Partition Sampling (SDPS) Strategy is proposed to collect data for sensors whose input is only a small portion of end market samples from the huge volume of samples along food supply chains. The approach is illustrated with a case study of an IoT system for provenance in food supply chain, which can efficiently stop outbreaks of foodborne disease. With the intelligent SDPS Strategy, the objects tested by sensors are the most reasonable portion of the entire product set. The efficiency is highly improved, and the accuracy stays almost the same as when sensing all the objects at the same time. SDPS keeps the integrity of information and approaches a nearly real-time examination. We also present a tracing algorithm to find the contamination sources of food supply chains and a backtracking algorithm to provide a strategy for recalling problematic food left undiscovered in the chain. The simulation results indicate that our SDPS scheme can achieve a tracing accuracy of up to 97.8% with a smaller average sampling percentage compared with traditional global random sampling. We managed to sample only a small portion of food in the end market without loss in accuracy of provenance tracing over the whole IoT system. In addition, our analytic data and visualized images can clearly model contamination conditions in food supply chain within the context of the IoT system. This will give the clients an intuitive impression of food supply networks in a city.

Figure 11: Visualized data flow of infected food products in food supply chain. [Diagram: parallel vertical axes for farm, vehicle, factory, and market batches 0–9; highlighted node: Factory batch 4.]

Figure 12: Visualized data flow of uninfected food products in food supply chain. [Diagram: parallel vertical axes for farm, vehicle, factory, and market batches 0–9; highlighted node: Factory batch 4.]

In this paper, we assume that all provenance information of food products is hosted by a centralized repository and that these provenance metadata are organized in a uniform manner. Our future work is to build a practical implementation of food supply chain provenance in a community as our test bed for megacity management.


Acknowledgments

This paper is sponsored in part by the Shanghai International Science and Technology Collaboration Program under Grant 13430710400 and the Campus for Research Excellence and Technological Enterprise (CREATE) program of the Singapore National Research Foundation under the joint project on Energy and Environmental Sustainability Solutions for Megacities (R-706-000-101-281) by Shanghai Jiao Tong University (SJTU) and National University of Singapore (NUS). Professor Qiu is partially supported by NSF CNS-1249223 and NSFC 61071061.

References

[1] "Wikipedia on smart city," http://en.wikipedia.org/wiki/Smart_city.

[2] M. Qiu and E. H.-M. Sha, "Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems," ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 2, article 25, 2009.

[3] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, "Online optimization for scheduling preemptable tasks on IaaS cloud systems," Journal of Parallel and Distributed Computing, vol. 72, no. 5, pp. 666–677, 2012.

[4] K. Su, J. Li, and H. Fu, "Smart city and the applications," in Proceedings of the International Conference on Electronics, Communications and Control (ICECC '2011), pp. 1028–1033, Zhejiang, China, September 2011.

[5] X. Tang, J. Pu, K. Cao, Y. Zhang, and Z. Xiong, "Integrated extensible simulation platform for vehicular sensor networks in smart cities," International Journal of Distributed Sensor Networks, vol. 2012, Article ID 860415, 10 pages, 2012.

[6] P. Vlacheas, R. Giaffreda, V. Stavroulaki et al., "Enabling smart cities through a cognitive management framework for the internet of things," IEEE Communications Magazine, vol. 51, no. 6, pp. 102–111, 2013.

[7] A. Asin, "Smart cities from libelium allows systems integrators to monitor noise, pollution, structural health and waste management," Smart Cities Articles, 2011.

[8] K. Ashton, "That "internet of things" thing," RFID Journal, 2011.

[9] P. Magrassi and T. Berg, "A world of smart objects," Gartner Research Report TR-17-2243, 2002.

[10] Oxford English Dictionary (OED), "The fact of coming from some particular source or quarter; source, derivation," http://en.wikipedia.org/wiki/Provenance.

[11] A. V. Roth, A. A. Tsay, M. E. Pullman, and J. V. Gray, "Unraveling the food supply chain: strategic insights from China and the 2007 recalls," Journal of Supply Chain Management, vol. 44, no. 1, pp. 22–39, 2008.

[12] S. Miles, P. Groth, S. Munroe, and L. Moreau, "Prime: a methodology for developing provenance-aware applications," ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, article 8, 2011.

[13] R. Hasan, R. Sion, and M. Winslett, "The case of the fake Picasso: preventing history forgery with secure provenance," in Proceedings of the 7th Conference on File and Storage Technologies (FAST '09), pp. 1–14, New York, NY, USA, December 2009.

[14] T. A. McMeekin, J. Baranyi, J. Bowman et al., "Information systems in food safety management," International Journal of Food Microbiology, vol. 112, no. 3, pp. 181–194, 2006.

[15] M. A. van der Gaag, F. Vos, H. W. Saatkamp, M. van Boven, P. van Beek, and R. B. M. Huirne, "A state-transition simulation model for the spread of Salmonella in the pork supply chain," European Journal of Operational Research, vol. 156, no. 3, pp. 782–798, 2004.

[16] L. M. Wein and Y. Liu, "Analyzing a bioterror attack on the food supply: the case of botulinum toxin in milk," Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 28, pp. 9984–9989, 2005.

[17] L. Qin and Q. S. Wang, "Food supply chain quality management model and simulation based on game," in Proceedings of the International Conference on Computer Modeling and Simulation (ICCMS '09), pp. 291–293, Macau, China, February 2009.

[18] Q. Zhang, D. Wang, T. Huang et al., "Modelling provenance in food supply chain to track and trace foodborne disease," in Proceedings of the International Conference on Computer Modeling and Simulation, pp. 69–75, Hong Kong, China, February 2012.

[19] S. Li, L. Xu, and X. Wang, "Compressed sensing signal and data acquisition in wireless sensor," IEEE Transactions on Industrial Informatics, 2012.

[20] Z. Ding and X. Gao, "A database cluster system framework for managing massive sensor sampling data in the internet of things," Chinese Journal of Computers, vol. 35, no. 6, pp. 1175–1191, 2012.

[21] L. Zhang, J. Liu, and H. Jiang, "Energy-efficient location tracking with smartphones for IoT," in Proceedings of the IEEE Sensors, pp. 1–4, Taipei, China, October 2012.

[22] H. Abbey, "An examination of the Reed-Frost theory of epidemics," Human Biology, vol. 24, no. 3, pp. 201–233, 1952.

[23] L. Elveback, J. P. Fox, and A. Varma, "An extension of the Reed-Frost epidemic model for the study of competition between viral agents in the presence of interference," The American Journal of Epidemiology, vol. 80, no. 3, pp. 356–364, 1964.

[24] X. Yuan, H. Guo, H. Xiao, Z. Wang, and X. Zhang, "High-dimensional data virtualization," in Proceedings of the Communications of the CCF, pp. 13–16, April 2011.


4.1.1. Partition Strategy. Partition strategy makes samples more general and representative. In this case, the system divides the whole group of products into several parts according to the batches they belong to in the end markets. The sampled volume for batch $n$ of market $m$ is determined by

$$\mathrm{sample}_{(m,n)} = \mathrm{sample}_{\mathrm{total}} \times \frac{\mathrm{products}_{(m,n)}}{\sum_{m=0}^{M} \sum_{n=0}^{N} \mathrm{products}_{(m,n)}}. \tag{2}$$

Here, $M$ and $N$ are the total numbers of end markets and batches in the network. The subscript $(m,n)$ denotes the number of samples or products in batch $n$ of market $m$, and the subscript total denotes the number over all batches of all end markets.
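As an illustration of (2), the following Python sketch allocates a total sample budget across (market, batch) groups in proportion to their product counts. The function name `partition_sample_sizes` and the example counts are assumptions made for this sketch and are not taken from the simulator; rounding the fractional volumes to whole products is left to the caller.

```python
def partition_sample_sizes(products, sample_total):
    """Split sample_total across groups in proportion to product counts, as in (2).

    products: dict mapping a (market, batch) pair to its number of products.
    Returns a dict with the (possibly fractional) sample volume per group.
    """
    grand_total = sum(products.values())
    if grand_total == 0:
        raise ValueError("no products to sample")
    return {key: sample_total * count / grand_total
            for key, count in products.items()}


# Hypothetical counts: two batches in market 0 and one batch in market 1.
products = {(0, 0): 100, (0, 1): 300, (1, 0): 600}
sizes = partition_sample_sizes(products, sample_total=50)
```

With these made-up counts, the three groups hold 10%, 30%, and 60% of the products, so they receive the same shares of the 50 samples.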

4.1.2. Dynamic Strategy. A dynamic strategy based on Bayesian estimation is adopted to achieve minimal sample volume. According to the infection probability of a particular pathogen, determined by medical experiments, the model can be trained to obtain the distribution of the total infection probability within the whole food supply network, which is the prior probability. The probability density function of the distribution is presented as a function of infection probability intervals.

On the other hand, after sampling a small part of the products and measuring the infection rate of the samples, the posterior probabilities can be obtained. If $k$ infected products are found within $n$ samples under a certain contamination percentage interval with prior probability of $a_i$, the conditional probability is obtained from the binomial distribution as follows:

$$P(B \mid A_i) = \binom{n}{k} a_i^{k} (1 - a_i)^{n-k} = \frac{n!}{k!\,(n-k)!}\, a_i^{k} (1 - a_i)^{n-k}. \tag{3}$$

Here, $A_i$ means the event that the contamination percentage of the whole product set falls into the $i$th interval with a prior probability of $a_i$, and $B$ means the event that we find $k$ contaminated products in $n$ samples.

After that, the Bayesian formula (4) is applied to combine prior probabilities with posterior probabilities and get revised probabilities, which describe the specific environment better, as follows:

$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\, P(A_j)}, \tag{4}$$

where the sum in the denominator runs over the contamination percentage intervals.
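The update in (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulator's code: each interval is represented by one contamination rate (here its midpoint) in `rates`, the trained prior $P(A_i)$ is passed separately in `priors`, and the function name and the example numbers are hypothetical.

```python
from math import comb


def posterior_over_intervals(rates, priors, n, k):
    """Revise interval probabilities after finding k infected among n samples.

    rates[i]  : representative contamination rate of interval i (assumption:
                the interval midpoint), used in the binomial likelihood (3).
    priors[i] : prior probability P(A_i) of interval i.
    Returns the list of P(A_i | B) from the Bayesian formula (4).
    """
    likelihoods = [comb(n, k) * a**k * (1 - a)**(n - k) for a in rates]
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # denominator of (4)
    if evidence == 0:
        raise ValueError("observation is impossible under every interval")
    return [j / evidence for j in joint]


# Hypothetical example: two equally likely intervals, 5 infected in 10 samples.
post = posterior_over_intervals(rates=[0.1, 0.5], priors=[0.5, 0.5], n=10, k=5)
```

In this made-up example, observing half of the samples infected moves almost all of the probability mass to the interval represented by the rate 0.5.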

4.1.3. Self-Adaptive Strategy. The tracing algorithm, which will be discussed in the next section, has some requirements for its input sampling data. If the ratio of infected products to uninfected ones is too high, the tracing algorithm performs poorly, as there are not enough healthy samples to exclude the suspicions. On the contrary, if the ratio is too low, the noise introduced by the sampling process may dominate the result. In these two extreme cases, more samples than in other cases should be tested to improve the accuracy. Hence, under each sampling rate, there is a relationship between the best tracing algorithm accuracy and the pollution proportion interval for a particular topology. Given all the relationships in a specific interval, for economical reasons, this strategy picks the smallest sampling rate that achieves certain requirements (e.g., 90% accuracy). Then, we sample the food in the end markets again under that rate and update the Bayesian estimation to check whether the sampling rate has met the requirements.
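The selection step of the self-adaptive strategy can be sketched as a lookup over the trained accuracy relationships. The sketch below assumes that, for the current pollution proportion interval, a table from sampling rate to tracing accuracy is available; the function name, the fallback to the largest rate when no rate qualifies, and the accuracy values are all assumptions made for illustration.

```python
def pick_sampling_rate(accuracy_by_rate, target=0.90):
    """Return the smallest sampling rate whose accuracy meets the target.

    accuracy_by_rate: dict mapping sampling rate -> tracing accuracy for the
    current pollution proportion interval (values here are hypothetical).
    If no rate reaches the target, fall back to the largest available rate
    (an assumption of this sketch, not stated in the paper).
    """
    feasible = [rate for rate, acc in accuracy_by_rate.items() if acc >= target]
    return min(feasible) if feasible else max(accuracy_by_rate)


# Made-up accuracies for the three sampling rates used in the evaluation.
table = {0.03: 0.82, 0.05: 0.91, 0.10: 0.97}
rate = pick_sampling_rate(table)  # smallest rate reaching 90% accuracy
```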

4.2. Tracing Algorithm. Pseudocode of the tracing algorithm is shown in Algorithm 2. After sampling and sensing, we add up the numbers of infected and uninfected food products passing every location and batch. They are stored in two variables, GOOD and BAD, for each place.

Supposing that the samples properly reveal the condition of the whole product set, the criterion to find the suspect sources is set as GOOD < $\varepsilon$ and BAD > 0. It is feasible because the contamination source would be the primary spot generating polluted food, and the number of uninfected samples is limited there. $\varepsilon$, regarded as the error factor, is a small integer which enables the algorithm to remain valid when not all the food passing the source is infected or there is some disturbance caused by nonideal problems (e.g., imperfect sampling). The specific value of $\varepsilon$ is decided by the number of samples and the infection probability of the pollutant source, which can be roughly represented as follows:

$$\varepsilon = \frac{\mathrm{samples} \times \mathrm{pollution\ probability}}{\mathrm{batches}}. \tag{5}$$

This criterion is not strict enough to pinpoint only one contamination source as the result. So, extra work should be applied to eliminate these confusing suspects. First of all, to improve the speed of the algorithm, suspects with a small BAD value are excluded. Then, the system generates a Suspect Tree composed of the suspected locations and batches according to their order in the food supply chain. After that, the Suspect Tree is traversed layer by layer, and the first node that meets the same criterion is picked as the root source, since the original contaminant is always above cross contaminants in the tree.

4.3. Backtracking Algorithm. In order to judge the performance of the backtracking algorithm, Hit Rate and False Alarm Rate are put forward to denote the algorithm's ability to capture infected products and the probability of reckoning good products as infected ones by mistake. Supposing that the total numbers of products and infected products are $N$ and $I$, respectively, and the algorithm selects $n$ potentially infected products including $i$ infected ones, we define Hit Rate as $i/I$ and False Alarm Rate as $(n - i)/(N - I)$. Although theoretically both a high Hit Rate and a low False Alarm Rate are expected, there is a tradeoff between them.

The backtracking algorithm is described in Algorithm 3.
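The two rates defined above can be computed directly from sets of product IDs. The following Python sketch is only an illustration; the function name and the example IDs are assumptions, and the variable names mirror the symbols $N$, $I$, $n$, and $i$ of the definition.

```python
def backtracking_rates(all_ids, infected_ids, selected_ids):
    """Return (Hit Rate, False Alarm Rate) = (i / I, (n - i) / (N - I))."""
    all_ids = set(all_ids)
    infected = set(infected_ids) & all_ids
    selected = set(selected_ids) & all_ids
    N, I, n = len(all_ids), len(infected), len(selected)
    i = len(selected & infected)  # infected products the algorithm captured
    if I == 0 or N == I:
        raise ValueError("rates need both infected and uninfected products")
    return i / I, (n - i) / (N - I)


# Hypothetical example: 10 products, 4 infected, 4 selected, 3 of them infected.
hit_rate, false_alarm_rate = backtracking_rates(
    range(10), {0, 1, 2, 3}, {0, 1, 2, 7})
```

Here the Hit Rate is 3/4 and the False Alarm Rate is 1/6, since one of the six healthy products was selected by mistake.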

4.4. Self-Correction Method. Two metrics are defined to judge the performance of the system and provide reference for later parameter settings.


Input: samples' spatial information and examination results
Output: contamination origin
k = 0
for i = 1 : samples do
    for j = 1 : (locations/batches on sample_i's path) do
        if sample_i is infected then
            sample_i.location_j.batch_j.BAD++
        else
            sample_i.location_j.batch_j.GOOD++
        end if
    end for
end for
for m = 1 : (locations/batches in the entire chain) do
    if location_m.batch_m.GOOD ≤ ε && location_m.batch_m.BAD ≥ 0 then
        Record location/batch into suspect[k]
        k++
    end if
end for
Exclude suspects w/ small BAD
if suspect ≥ 1 then
    Get food IDs passed all suspects
    if ID ≥ 0 then
        Get food IDs passed at least one suspect
    end if
end if
Construct “Suspect Tree” of batches according to the paths of these IDs
for n = 1 : (tree nodes) do
    if (suspect[n].location_n.batch_n.GOOD ≤ ε &&
        suspect[n].location_n.batch_n.BAD ≥ 0) then
        origin = suspect[n]
    end if
end for

Algorithm 2: Tracing algorithm.
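The counting and filtering steps of the tracing algorithm can be sketched in Python as follows. This is a simplified reading under stated assumptions rather than the authors' implementation: each node is a (location, batch) pair, every sample carries its ordered path, a suspect must satisfy GOOD ≤ ε and BAD ≥ `min_bad` (with `min_bad = 1` corresponding to BAD > 0), and the layer-by-layer Suspect Tree traversal is approximated by choosing the suspect that appears earliest on the paths, with ties broken by the larger BAD count. All names and the sample data are hypothetical.

```python
from collections import defaultdict


def trace_origin(samples, epsilon, min_bad=1):
    """Return the most upstream suspect node, or None if there is no suspect.

    samples: iterable of (path, infected) pairs, where path is the ordered
    list of (location, batch) nodes a sampled product passed through.
    """
    good = defaultdict(int)   # uninfected samples seen at each node
    bad = defaultdict(int)    # infected samples seen at each node
    depth = {}                # earliest position of each node on any path
    for path, infected in samples:
        for pos, node in enumerate(path):
            if infected:
                bad[node] += 1
            else:
                good[node] += 1
            depth[node] = min(depth.get(node, pos), pos)
    suspects = [node for node in depth
                if good[node] <= epsilon and bad[node] >= min_bad]
    if not suspects:
        return None
    # Stand-in for the Suspect Tree traversal: most upstream suspect first.
    return min(suspects, key=lambda node: (depth[node], -bad[node]))


# Hypothetical samples in which batch 4 of the factory is the source.
samples = [
    ([("farm", 0), ("factory", 4), ("market1", 1)], True),
    ([("farm", 0), ("factory", 4), ("market2", 2)], True),
    ([("farm", 0), ("factory", 3), ("market1", 1)], False),
    ([("farm", 1), ("factory", 3), ("market2", 2)], False),
    ([("farm", 1), ("factory", 4), ("market3", 0)], True),
]
origin = trace_origin(samples, epsilon=0)
```

In this example, both batch 4 of the factory and batch 0 of market 3 meet the criterion, and the upstream node is reported as the origin, which mirrors the rule that the original contaminant sits above cross contaminants.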

Input: contaminated samples set Re-check
Output: infected food products set Bad
while (1) do
    Construct a tree of location/batch according to
        the paths of contaminated products in Re-check
    Traverse the tree (DFS)
    Record all nodes in Bad
    Empty Re-check
    if node.location.batch is new then
        Find the food IDs passed these nodes
        Sense them
        if food is contaminated then
            Put its ID in Re-check
        end if
    else
        break
    end if
end while

Algorithm 3: Back tracking algorithm.
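One possible reading of Algorithm 3 is an iterative expansion: nodes visited by known contaminated products are marked, the untested products that passed newly marked nodes are sensed, and any newly found contaminated products seed the next round. The Python sketch below follows that reading under several assumptions: the tree construction and DFS are replaced by a plain set of visited nodes, the sensor test is modeled by a caller-supplied function `is_contaminated`, and all names and data are hypothetical.

```python
def backtrack(paths, is_contaminated, seeds):
    """Expand from known contaminated products to other infected ones.

    paths: dict mapping product ID -> list of nodes the product passed.
    is_contaminated: callable(product_id) -> bool, standing in for a sensor test.
    seeds: IDs already known to be contaminated (the Re-check set).
    Returns (infected IDs found, nodes marked as visited by infected food).
    """
    recheck = set(seeds)
    infected = set(seeds)
    tested = set(seeds)
    bad_nodes = set()
    while recheck:
        new_nodes = {node for pid in recheck for node in paths[pid]} - bad_nodes
        recheck = set()               # "Empty Re-check"
        if not new_nodes:             # no new location/batch: stop
            break
        bad_nodes |= new_nodes
        for pid, path in paths.items():
            if pid in tested or new_nodes.isdisjoint(path):
                continue
            tested.add(pid)           # sense each product at most once
            if is_contaminated(pid):
                infected.add(pid)
                recheck.add(pid)
    return infected, bad_nodes


# Hypothetical chain: products 1 and 3 are truly contaminated.
paths = {
    1: ["farm0", "factory4", "market1"],
    2: ["farm0", "factory3", "market1"],
    3: ["farm1", "factory4", "market2"],
    4: ["farm1", "factory3", "market3"],
    5: ["farm2", "factory5", "market4"],
}
truth = {1, 3}
found, marked = backtrack(paths, lambda pid: pid in truth, seeds={1})
```

Starting from product 1 alone, the sketch finds product 3 through the shared factory batch and never needs to test product 5, whose path shares no node with infected food.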

Figure 3: Topology (DAG) of a food supply chain case for evaluation. [Figure: nodes Farm, Factory, Markets 1–3, and Vehicles 1–5.]

Confidence Metric (CM) is defined as the difference between the prior probability and the posterior probability, normalized by the prior probability, as follows:

$$\mathrm{CM} = \frac{\left| P_{\mathrm{post}} - P_{\mathrm{pri}} \right|}{P_{\mathrm{pri}}}. \tag{6}$$

$P_{\mathrm{post}}$ and $P_{\mathrm{pri}}$ are the posterior and prior probabilities, respectively. If CM is small, we are more confident that the dataset suits the model trained previously, and vice versa. Thus, we can correct (4) and combine prior and posterior probabilities to get more reasonable infection probabilities $P_{\mathrm{comb}}$ of the entire network, as follows:

$$P_{\mathrm{comb}} = \frac{P(A_i \mid B) + \mathrm{CM} \times P(A_i)}{1 + \mathrm{CM}}. \tag{7}$$

Success Metric (SM) measures the accuracy of the system, as follows:

$$\mathrm{SM} = \frac{\mathrm{success}}{\mathrm{total}}. \tag{8}$$

It is defined as the ratio of the number of successful detections to the total number of tests. With a lower SM, the sampling criterion is set stricter, and vice versa.

These two variables help to adjust the sampling algorithm slightly to fit it to a certain environment and application.
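Equations (6) to (8) translate directly into code. The Python sketch below is a minimal illustration with hypothetical function names and example values; it applies the formulas to a single interval, with `p_post` standing for $P(A_i \mid B)$ and `p_pri` for $P(A_i)$.

```python
def confidence_metric(p_post, p_pri):
    """CM from (6): relative gap between posterior and prior probability."""
    return abs(p_post - p_pri) / p_pri


def combined_probability(p_post, p_pri):
    """P_comb from (7): CM-weighted blend of posterior and prior."""
    cm = confidence_metric(p_post, p_pri)
    return (p_post + cm * p_pri) / (1 + cm)


def success_metric(success, total):
    """SM from (8): fraction of tests in which the source was detected."""
    return success / total


# Hypothetical values: prior 0.2 for an interval, posterior 0.3 after sampling.
cm = confidence_metric(0.3, 0.2)
p_comb = combined_probability(0.3, 0.2)
sm = success_metric(45, 50)
```

Because (7) is a weighted average with weights $1$ and CM, the combined probability always lies between the prior and the posterior; in this example it is about 0.267.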

4.5. Timing and Space Complexities. Suppose there are $m$ samples, $n$ stages, and $l$ batches for all locations in a food supply chain. The time complexities of the tracing and backtracking algorithms are $O(mn) + O(l)$ and $O(m) + O(l)$, respectively. There exists a tradeoff between tracing accuracy and time consumption in SDPS: more samples mean a longer running time but better knowledge of the network. Compared with the time spent on chemical testing and sensing, the time consumption of SDPS is negligible.

One record is required for every location and every food product, so the space complexity of the whole IoT system is $O(a + b)$, where $a$ is the number of food products and $b$ is the number of locations in the network.

5. Evaluation Results and Analysis

We set up two specific cases (Figures 3 and 4) to evaluate the proposed system. The first case gives a general evaluation and shows that our SDPS scheme outperforms other sampling methods, while the second one focuses on the performance on a large system with big data.

5.1. Experimental Setup. In Figure 3, note that vehicles 1 and 2 serve as transportation nodes both from farm to factory and from factory to market. This makes the model closer to reality, as some locations in the chain can play different roles in food processing.

The configuration of the two cases is listed in Table 1. Every location in the chain holds 25 and 500 batches in Cases 1 and 2, respectively. In total, 60 thousand and 800 thousand food products are circulating in the network in the two cases. Which locations a piece of food passes through is completely random.

Figure 4: Topology (DAG) of a food supply chain case for evaluation (with big data). [Figure: nodes Farms 1–3, Factories 1–2, Markets 1–3, and Vehicles 1–7.]

We build the simulator in C++. It reads in configuration files that describe supply chain topologies, generates simulated contamination behavior, and senses data along food supply chains. In simulation, the contamination source and sampling volume can be set explicitly by the user. The food contamination rules are that food can be infected by passing (1) the contamination source directly or (2) cross infection spots indirectly, according to our revised Reed-Frost model.

To keep this paper compact, we only show the training process of the first model here. In the training process, we get prior probabilities, which highly depend on the topology and configuration of the chain. Figure 5 shows the distribution of the infected proportion under different contamination probabilities after 300,000 tests. Under each contamination probability (which depends on the contamination type), the actual proportion of pollution is almost Gaussian distributed, which is the same as what we discussed in Section 4. Note that the value on the $x$ axis ($x$) should be converted to the pollution proportion interval (in percent) by $[(x - 1) \times 4,\ x \times 4)$, since the total space is divided into 25 sections.

The relation between the products' contamination percentage and the tracing algorithm accuracy under different sample rates is shown in Figure 6. For clarity, only 3 sampling rates are tested: 3%, 5%, and 10%. To make the sampling strategy more efficient, more rates can be evaluated in real situations. Figure 6 confirms the hypothesis we proposed: the source is difficult to detect if only a small part or too large a part of the food is contaminated. In both cases, the flow path of contamination is easily hidden.

Table 1: Configuration of the two cases.

                     Case 1     Case 2
Batches/location     25         500
Total products       60,000     800,000
Flow rule            Random     Random

Table 2: Simulation results of back tracking algorithm.

Hit rate     False alarm rate
96%          3%

5.2. Evaluation Results. Figure 7 shows the accuracy of the tracing algorithm. Under different probabilities of infection in the whole chain, the accuracy is no less than 80%. In Figures 8 and 9, the partition and dynamic strategies in SDPS are tested, respectively. In all infection probability cases, the partition strategy has higher tracing accuracy than the global sampling strategy (11.8%, 22.7%, 10.9%, and 4.1% higher with infection probabilities of 30%, 67%, 80%, and 90%, respectively). Compared with sampling at fixed rates (3%, 5%, and 10%), the dynamic method achieves higher tracing accuracy even with a lower average sampling rate of 7.8%.

For the backtracking part, the simulation result is shown in Table 2. Both Hit Rate and False Alarm Rate are satisfactory.

Case 2 has a large data scale. We again fetch a few samples each time and let the system tell us the number of samples to get next time based on (4). The system's actual sampling rate turns out to be 7.8%. As shown in Figure 10, the accuracy of the tracing algorithm is higher than 80% as well, which shows that our proposed approach works well with big data.

Figure 5: Prior probability distributions under different contamination probabilities: 30%, 67%, 80%, and 90%. [Plot: x axis, pollution proportion interval [(x − 1)*4, x*4]; y axis, distribution of 300,000 tests; one curve per contamination probability.]

Figure 6: The relationship between tracing algorithm accuracy and pollution proportion intervals under different sample rates: 3%, 5%, and 10%. [Plot: x axis, pollution proportion interval (%); y axis, accuracy of tracing algorithm (%); one series per sample rate.]

5.3. Contamination Visualization. With the tool introduced by [24] and the information we choose to record, SDPS provides sampling data that can be represented visually after being tested by sensors. Figures 11 and 12 show the data flow of infected and uninfected food products, respectively. In this case, we use the configuration of Case 1 and set the contamination source to be the 4th batch in the factory in advance.

Figure 7: Accuracy of tracing algorithm with different probabilities of infection. [Plot: x axis, probability of infection (%); y axis, tracing accuracy (%).]

Figure 8: Simulation results of partition sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. [Plot: x axis, probability of infection (%); y axis, tracing accuracy (%); series: partition random sampling and global random sampling.]

The vertical lines marked with location and batch numbers represent the nodes in the data flow. Lines going through these nodes are the traces of food products. As shown in those figures, most of the infected food, but none of the uninfected food, passed the 4th batch of the factory (the node with a circle around it). So it has a great chance of being the source of contamination, which is also confirmed by our tracing system.

SDPS makes data concise but still comprehensive, which helps the visualization tool display the useful information.

Apart from aiding the detection of the contamination source, visualization can also help to better understand contamination conditions (e.g., contamination severity/distribution) of the whole IoT system. For example, a well-managed warehouse or a city with lower temperature may lead to less contamination.

Figure 9: Simulation results of dynamic sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. [Plot: x axis, probability of infection (%); y axis, tracing accuracy (%); series: 3%, 5%, and 10% sampling proportion and dynamic sampling.]

Figure 10: Simulation results of tracing accuracy with big data under different infection probabilities. [Plot: x axis, probability of infection (%); y axis, tracing accuracy (%).]

Information of this kind can be read directly from the visualization images and helps manufacturers design their food supply chains more scientifically.

Visualization of the contamination condition in the IoT system makes provenance reasoning in the food supply chain intuitive and informative.

5.4. Performance Estimation in Real Situations. In reality, a food supply chain is more complicated. Many factors, such as the specific food type, environment temperature, manufacturing process, and other parameters, could make the chain difficult to predict. To apply our strategy in real situations, those factors should be considered and some parameters should be adjusted accordingly.

The factor that influences the behavior of a food supply chain the most is the type of food. Different foods have their own characteristics, which may dominate the model, the provenance procedure, and the expected results. Firstly, food type can decide the possible contamination source. In (1), $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ are related to the virus that spreads among food. For example, avian influenza virus, which causes a common infectious disease among poultry, began to be contagious among human beings. After the mutation, the infectious ability of this virus grew significantly. As a result, $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ of avian influenza virus in chickens would also increase. Secondly, food type is a deciding factor for its storage pattern and quality guarantee period. Some canned drinks are stacked layer by layer separately, so they would not get cross contaminated. However, raw meat is generally kept together, which provides an easy environment for a virus to spread. Thirdly, the state of food is also dominant in provenance. One piece of food in solid state can be seen as a unit, while liquid food like yogurt could be ruined by only one deteriorated drop. For yogurt, therefore, the sampling process could be very different, as spatial position should be taken into consideration and the behavior of the virus in liquid should also be studied.

Besides food type, other factors play important roles in real situations. Food is more likely to rot in summer than in winter; some manufacturing factories are more hygienic than others; as time passes, food may become more easily infected; and the types or dosages of food additives may slow down the contamination process.

Although different food supply chains can behave differently, our proposed strategy can cover most of the cases because it obeys the general model of food supply networks and epidemiological principles.

6. Conclusion

In this paper, we present a heuristic approach to tracing contamination sources in large IoT systems for complicated food supply chains, which is a critical issue in metropolitan life. In our approach, a Self-adaptive Dynamic Partition Sampling (SDPS) strategy was proposed to collect data for sensors, whose input is only a small portion of end market samples from the huge volume of samples along food supply chains. The approach was illustrated with a case study of an IoT system for provenance in a food supply chain, which can efficiently stop outbreaks of foodborne disease. With the intelligent SDPS strategy, the objects tested by sensors are the most reasonable portion of the entire product set. The efficiency is highly improved, and the accuracy stays almost the same as when sensing all the objects. At the same time, SDPS keeps the integrity of information and approaches nearly real-time examination. Also, we present a tracing algorithm to find the contamination sources of food supply chains and a backtracking algorithm to provide a strategy for recalling problematic food undiscovered in the chain. Simulation results indicate that our SDPS scheme can achieve a tracing accuracy of up to 97.8% with a smaller average sampling percentage compared with traditional global random sampling. We managed to sample only a small portion of food in the end market without loss in accuracy of provenance tracing over the whole IoT system. In addition, our analytic data and visualized images can clearly model contamination conditions in the food supply chain within the context of an IoT system. This will give clients an intuitive impression of food supply networks in a city.

Figure 11: Visualized data flow of infected food products in food supply chain. [Figure: vertical axes labeled Farm batch 0–9, Vehicle 1 and 2 batch 0–9, Vehicle 1 batch 0–9, Vehicle 2 batch 0–9, Factory batch 0–9, Vehicle 3, 4, 5 batch 0–9, and Market 1, 2, 3 batch 0–9; the node Factory batch 4 is marked.]

Figure 12: Visualized data flow of uninfected food products in food supply chain. [Figure: same axes as Figure 11; the node Factory batch 4 is marked.]

In this paper, we assume that all provenance information of food products is hosted by a centralized repository and that these provenance metadata are organized in a uniform manner. Our future work is to build a practical implementation of food supply chain provenance in a community as our test bed for megacity management.


Acknowledgments

This paper is sponsored in part by the Shanghai International Science and Technology Collaboration Program under Grant 13430710400 and the Campus for Research Excellence and Technological Enterprise (CREATE) program of the Singapore National Research Foundation under the joint project on Energy and Environmental Sustainability Solutions for Megacities (R-706-000-101-281) by Shanghai Jiao Tong University (SJTU) and National University of Singapore (NUS). Professor Qiu is partially supported by NSF CNS-1249223 and NSFC 61071061.

References

[1] “Wikipedia on smart city,” http://en.wikipedia.org/wiki/Smart_city.

[2] M. Qiu and E. H.-M. Sha, “Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems,” ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 2, article 25, 2009.

[3] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, “Online optimization for scheduling preemptable tasks on IaaS cloud systems,” Journal of Parallel and Distributed Computing, vol. 72, no. 5, pp. 666–677, 2012.

[4] K. Su, J. Li, and H. Fu, “Smart city and the applications,” in Proceedings of the International Conference on Electronics, Communications and Control (ICECC ’2011), pp. 1028–1033, Zhejiang, China, September 2011.

[5] X. Tang, J. Pu, K. Cao, Y. Zhang, and Z. Xiong, “Integrated extensible simulation platform for vehicular sensor networks in smart cities,” International Journal of Distributed Sensor Networks, vol. 2012, Article ID 860415, 10 pages, 2012.

[6] P. Vlacheas, R. Giaffreda, V. Stavroulaki et al., “Enabling smart cities through a cognitive management framework for the internet of things,” IEEE Communications Magazine, vol. 51, no. 6, pp. 102–111, 2013.

[7] A. Asin, “Smart cities from libelium allows systems integrators to monitor noise, pollution, structural health and waste management,” Smart Cities Articles, 2011.

[8] K. Ashton, “That ‘internet of things’ thing,” RFID Journal, 2011.

[9] P. Magrassi and T. Berg, “A world of smart objects,” Gartner Research Report TR-17-2243, 2002.

[10] Oxford English Dictionary (OED), “The fact of coming from some particular source or quarter; source, derivation,” http://en.wikipedia.org/wiki/Provenance.

[11] A. V. Roth, A. A. Tsay, M. E. Pullman, and J. V. Gray, “Unraveling the food supply chain: strategic insights from China and the 2007 recalls,” Journal of Supply Chain Management, vol. 44, no. 1, pp. 22–39, 2008.

[12] S. Miles, P. Groth, S. Munroe, and L. Moreau, “Prime: a methodology for developing provenance-aware applications,” ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, article 8, 2011.

[13] R. Hasan, R. Sion, and M. Winslett, “The case of the fake Picasso: preventing history forgery with secure provenance,” in Proceedings of the 7th Conference on File and Storage Technologies (FAST ’09), pp. 1–14, New York, NY, USA, December 2009.

[14] T. A. McMeekin, J. Baranyi, J. Bowman et al., “Information systems in food safety management,” International Journal of Food Microbiology, vol. 112, no. 3, pp. 181–194, 2006.

[15] M. A. van der Gaag, F. Vos, H. W. Saatkamp, M. van Boven, P. van Beek, and R. B. M. Huirne, “A state-transition simulation model for the spread of Salmonella in the pork supply chain,” European Journal of Operational Research, vol. 156, no. 3, pp. 782–798, 2004.

[16] L. M. Wein and Y. Liu, “Analyzing a bioterror attack on the food supply: the case of botulinum toxin in milk,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 28, pp. 9984–9989, 2005.

[17] L. Qin and Q. S. Wang, “Food supply chain quality management model and simulation based on game,” in Proceedings of the International Conference on Computer Modeling and Simulation (ICCMS ’09), pp. 291–293, Macau, China, February 2009.

[18] Q. Zhang, D. Wang, T. Huang et al., “Modelling provenance in food supply chain to track and trace foodborne disease,” in Proceedings of the International Conference on Computer Modeling and Simulation, pp. 69–75, Hong Kong, China, February 2012.

[19] S. Li, L. Xu, and X. Wang, “Compressed sensing signal and data acquisition in wireless sensor,” IEEE Transactions on Industrial Informatics, 2012.

[20] Z. Ding and X. Gao, “A database cluster system framework for managing massive sensor sampling data in the internet of things,” Chinese Journal of Computers, vol. 35, no. 6, pp. 1175–1191, 2012.

[21] L. Zhang, J. Liu, and H. Jiang, “Energy-efficient location tracking with smartphones for IoT,” in Proceedings of the IEEE Sensors, pp. 1–4, Taipei, China, October 2012.

[22] H. Abbey, “An examination of the Reed-Frost theory of epidemics,” Human Biology, vol. 24, no. 3, pp. 201–233, 1952.

[23] L. Elveback, J. P. Fox, and A. Varma, “An extension of the Reed-Frost epidemic model for the study of competition between viral agents in the presence of interference,” The American Journal of Epidemiology, vol. 80, no. 3, pp. 356–364, 1964.

[24] X. Yuan, H. Guo, H. Xiao, Z. Wang, and X. Zhang, “High-dimensional data virtualization,” in Proceedings of the Communications of the CCF, pp. 13–16, April 2011.


(1) Input samplesrsquo spatial information and examination results(2) Output contamination origin(3) 119896 = 0(4) for 119894 = 1samples do(5) for 119895 = 1(locationsbatches on sample

119894rsquos path) do

(6) if sample119894is infected then

(7) sample119894location

119895batch

119895BAD++

(8) else(9) sample

119894location

119895batch

119895GOOD++

(10) end if(11) end for(12) end for(13) for 119898 = 1(locationsbatches in the entire chain) do(14) if location

119898batch

119898GOOD le 120576 ampamp location

119898batch

119898BAD ge 0 then

(15) Record locationbatch into suspect[k](16) k++(17) end if(18) end for(19) Exclude suspects wsmall BAD(20) if suspect ge 1 then(21) Get food IDs passed all suspects(22) if ID ge 0 then(23) Get food IDs passed at least one suspect(24) end if(25) end if(26) Construct ldquoSuspect Treerdquo of batches according to the paths of these IDs(27) for 119899 = 1(tree nodes) do(28) if (suspect[n]location

119899batch

119899GOOD le 120576 ampamp

(29) suspect[n]location119899batch

119899BAD ge 0) then

(30) origin = suspect[119899](31) end if(32) end for

Algorithm 2 Tracing algorithm

(1) Input contaminated samples set Re-check(2) Output infected food products set Bad(3) while (1) do(4) Construct a tree of locationbatch according to(5) the paths of contaminated products in Re-check(6) Traverse the tree DFS(7) Record all nodes in Bad(8) Empty Re-check(9) if nodelocationbatch is new then(10) Find the food IDs passed these nodes(11) Sensor them(12) if food is contaminated then(13) Put its ID in Re-check(14) end if(15) else(16) break(17) end if(18) end while

Algorithm 3 Back tracking algorithm

International Journal of Distributed Sensor Networks 7

Farm

Vehicle 1

Vehicle 1

Vehicle 2

Vehicle 2 Vehicle 3

Factory

Vehicle 4 Vehicle 5

Market 2Market 1 Market 3

Figure 3 Topology (DAG) of a food supply chain case for evaluation

Confidence Metric (CM) is defined as the relative difference between the prior probability and the posterior probability, as follows:

$$\mathrm{CM} = \frac{\left| P_{\mathrm{post}} - P_{\mathrm{pri}} \right|}{P_{\mathrm{pri}}} \tag{6}$$

$P_{\mathrm{post}}$ and $P_{\mathrm{pri}}$ are the posterior and prior probabilities, respectively. If CM is small, we are more confident that the dataset suits the model trained previously, and vice versa. Thus, we can correct (4) and combine the prior and posterior probabilities to get more reasonable infection probabilities $P_{\mathrm{comb}}$ for the entire network, as follows:

$$P_{\mathrm{comb}} = \frac{P(A_i \mid B) + \mathrm{CM} \times P(A_i)}{1 + \mathrm{CM}} \tag{7}$$

Success Metric (SM) measures the accuracy of the system, as follows:

$$\mathrm{SM} = \frac{\text{success}}{\text{total}} \tag{8}$$

It is defined as the ratio of the number of successful detections to the total number of tests. With a lower SM, the sampling criterion is set stricter, and vice versa.

These two variables help to adjust the sampling algorithm slightly to fit particular environments and applications.
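Equations (6)-(8) translate directly into code. The sketch below is illustrative: the function names are hypothetical, and the input values in the usage lines are made-up numbers chosen only to show the arithmetic, not results from the paper.

```python
def confidence_metric(p_post, p_pri):
    """Equation (6): relative gap between posterior and prior."""
    if p_pri <= 0:
        raise ValueError("prior probability must be positive")
    return abs(p_post - p_pri) / p_pri


def combined_probability(p_a_given_b, p_a, cm):
    """Equation (7): CM-weighted blend of posterior P(A_i|B) and prior P(A_i)."""
    return (p_a_given_b + cm * p_a) / (1 + cm)


def success_metric(success, total):
    """Equation (8): share of tests in which detection succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return success / total


# Made-up inputs: prior 0.2, posterior 0.3.
cm = confidence_metric(0.3, 0.2)            # about 0.5
p_comb = combined_probability(0.3, 0.2, cm)  # about 0.267
print(round(cm, 3), round(p_comb, 3), success_metric(9, 10))
```

Note that when CM is 0 the combined value equals the posterior, and the weight on the prior grows as CM grows, exactly as (7) is written.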

4.5. Timing and Space Complexities. Suppose there are $m$ samples, $n$ stages, and $l$ batches over all locations in a food supply chain. The time complexities of the tracing and backtracking algorithms are $O(mn) + O(l)$ and $O(m) + O(l)$, respectively. There is a tradeoff between tracing accuracy and time consumption in SDPS: more samples mean a longer running time but better knowledge of the network. Compared with the time spent on chemical testing and sensing, the time consumed by SDPS itself is negligible.

One record is required for every location and every food product, so the space complexity of the whole IoT system is $O(a + b)$, where $a$ is the number of food products and $b$ is the number of locations in the network.

5. Evaluation Results and Analysis

We set up two specific cases (Figures 3 and 4) to evaluate the proposed system. The first case gives a general evaluation and shows that our SDPS scheme outperforms other sampling methods, while the second focuses on performance with a large system and big data.

5.1. Experimental Setup. In Figure 3, note that vehicles 1 and 2 serve as transportation nodes both from farm to factory and from factory to market. This makes the model closer to reality, as some locations in the chain can play different roles in food processing.

The configuration of the two cases is listed in Table 1. Every location in the chain holds 25 batches in Case 1 and 500 batches in Case 2. In total, 60 thousand and 800 thousand food products circulate in the network in Cases 1 and 2, respectively. The locations that a piece of food passes through are chosen at random.

Figure 4: Topology (DAG) of a food supply chain case for evaluation (with big data). Nodes: Farms 1–3; Vehicles 1–7; Factories 1 and 2; Markets 1–3.

We built the simulator in C++. It reads configuration files that describe supply chain topologies, generates simulated contamination behavior, and senses data along the food supply chains. In the simulation, the contamination source and the sampling volume can be set explicitly by the user. Under the contamination rules, food can be infected (1) directly, by passing through the contamination source, or (2) indirectly, by passing through cross-infection spots, according to our revised Reed-Frost model.
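The two contamination rules can be illustrated with a small Python sketch. It is not the authors' C++ simulator and does not reproduce the revised Reed-Frost model: the function `simulate_chain`, the parameters `p_source` and `p_cross`, and the simplification of the DAG to a linear sequence of stages are all assumptions made for this example.

```python
import random


def simulate_chain(stages, n_products, source, p_source=1.0, p_cross=0.1, seed=0):
    """Generate (path, infected) pairs for products moving through stages.

    stages: list of stages, each a list of (location, batch) nodes.
    source: the (location, batch) node set as the contamination source.
    p_source / p_cross: hypothetical infection probabilities for rule (1)
    (passing the source) and rule (2) (passing a cross-infection spot).
    """
    rng = random.Random(seed)
    tainted = {source}  # the source plus nodes touched by infected food
    products = []
    for _ in range(n_products):
        # Each product picks one node per stage at random.
        path = [rng.choice(stage) for stage in stages]
        infected = False
        for node in path:
            if not infected:
                if node == source:
                    infected = rng.random() < p_source   # rule (1)
                elif node in tainted:
                    infected = rng.random() < p_cross    # rule (2)
            if infected:
                tainted.add(node)  # infected food leaves a cross-infection spot
        products.append((path, infected))
    return products


demo_stages = [
    [("farm", b) for b in range(3)],
    [("factory", b) for b in range(3)],
    [("market", b) for b in range(3)],
]
demo = simulate_chain(demo_stages, 200, source=("factory", 1), p_cross=0.0)
print(sum(infected for _, infected in demo), "of", len(demo), "infected")
```

With `p_cross=0.0` only rule (1) applies, so a product is infected exactly when its path contains the source; raising `p_cross` lets the contamination spread through downstream nodes.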

To keep this paper compact, we only show the training process of the first model here. In the training process we obtain prior probabilities, which depend highly on the topology and configuration of the chain. Figure 5 shows the distribution of the infected proportion under different contamination probabilities after 300,000 tests. Under each contamination probability (which depends on the contamination type), the actual proportion of pollution is almost Gaussian distributed, in line with what we discussed in Section 4. Note that a value $x$ on the $x$-axis should be converted to the pollution proportion interval $[(x - 1) \times 4, x \times 4)$ (in percent), since the total space is divided into 25 sections. The relation between the products' contamination percentage and the accuracy of the tracing algorithm under different sampling rates is shown in Figure 6. For clarity, only three sampling rates are tested: 3%, 5%, and 10%. To make the sampling strategy more efficient, more rates can be evaluated in real situations. Figure 6 confirms our hypothesis: the source is difficult to detect if only a small part, or too large a part, of the food is contaminated. In both cases, the flow path of the contamination is easily hidden.
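The mapping between a section index on the horizontal axis of Figure 5 and its pollution proportion interval is a one-line formula. The helper names below are hypothetical and serve only to make the conversion concrete.

```python
def pollution_interval(x, sections=25):
    """Map section index x (1-based) to its interval [(x-1)*w, x*w) in percent."""
    if not 1 <= x <= sections:
        raise ValueError("section index out of range")
    width = 100 / sections  # 4 percentage points when sections == 25
    return ((x - 1) * width, x * width)


def section_index(proportion, sections=25):
    """Inverse mapping: section that contains a pollution proportion in percent."""
    if not 0 <= proportion <= 100:
        raise ValueError("proportion must be between 0 and 100")
    width = 100 / sections
    # 100% is folded into the last section so the index never exceeds `sections`.
    return min(int(proportion // width) + 1, sections)


print(pollution_interval(1))   # -> (0.0, 4.0)
print(pollution_interval(25))  # -> (96.0, 100.0)
print(section_index(50))       # -> 13
```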

Table 1: Configuration of the two cases.

                    Case 1    Case 2
Batches/location    25        500
Total products      60,000    800,000
Flow rule           Random    Random

Table 2: Simulation results of back tracking algorithm.

Hit rate    False alarm rate
96%         3%

5.2. Evaluation Results. Figure 7 shows the accuracy of the tracing algorithm. Across the different probabilities of infection in the whole chain, the accuracy is no less than 80%. In Figures 8 and 9, the partition and dynamic strategies in SDPS are tested, respectively. In all infection probability cases, the partition strategy has higher tracing accuracy than the global sampling strategy (11.8%, 22.7%, 10.9%, and 4.1% higher at infection probabilities of 30%, 67%, 80%, and 90%, respectively). Compared with sampling at fixed rates (3%, 5%, and 10%), the dynamic method achieves higher tracing accuracy even with a lower average sampling rate of 7.8%.

For the backtracking part, the simulation result is shown in Table 2. Both the hit rate and the false alarm rate are satisfactory.
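The two rates in Table 2 can be computed from sets of product IDs as sketched below. The definitions used here are assumptions, since the text does not spell them out: hit rate is taken as the share of truly infected products that the backtracking step flags, and false alarm rate as the share of clean products that it flags. The function name and the demo IDs are hypothetical.

```python
def hit_and_false_alarm(flagged, infected, all_products):
    """Return (hit_rate, false_alarm_rate) under the assumed definitions."""
    flagged = set(flagged)
    infected = set(infected)
    clean = set(all_products) - infected
    # Guard against empty denominators in degenerate cases.
    hit_rate = len(flagged & infected) / len(infected) if infected else 0.0
    false_alarm = len(flagged & clean) / len(clean) if clean else 0.0
    return hit_rate, false_alarm


# Made-up example: 10 products, 4 infected, 3 of them flagged plus 1 clean one.
hit, fa = hit_and_false_alarm({0, 1, 2, 9}, {0, 1, 2, 3}, range(10))
print(hit, round(fa, 3))  # -> 0.75 0.167
```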

Case 2 has a large data scale. As before, we fetch a few samples each time and let the system tell us how many samples to take next time, based on (4). The system's actual sampling rate turns out to be 7.8%. As shown in Figure 10, the accuracy of the tracing algorithm is again higher than 80%, which shows that our proposed approach works well with big data.


Figure 5: Prior probability distributions under different contamination probabilities (30%, 67%, 80%, and 90%). Horizontal axis: pollution proportion interval [(x − 1) × 4, x × 4]; vertical axis: distribution of 300,000 tests.

Figure 6: Relationship between tracing algorithm accuracy and pollution proportion interval under different sampling rates (3%, 5%, and 10%). Horizontal axis: pollution proportion interval (%); vertical axis: accuracy of tracing algorithm (%).

5.3. Contamination Visualization. With the tool introduced in [24] and the information we choose to record, SDPS provides sampling data that can be represented visually after being tested by sensors. Figures 11 and 12 show the data flow of infected and uninfected food products, respectively. In this case, we use the configuration of Case 1 and set the contamination source in advance to be the 4th batch in the factory.

Figure 7: Accuracy of tracing algorithm with different probabilities of infection. Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).

Figure 8: Simulation results of partition sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. Series: partition random sampling and global random sampling. Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).

The vertical lines marked with locations and batch numbers represent the nodes in the data flow. Lines going through these nodes are the traces of food products. As shown in those figures, most of the infected food, but none of the uninfected food, passed through the 4th batch of the factory (the node with a circle around it). It is therefore very likely to be the source of contamination, which is confirmed by our tracing system.

SDPS keeps the data concise but still comprehensive, which helps the visualization tool display the useful information.

Apart from helping to detect the contamination source, visualization also gives a better view of contamination conditions (e.g., contamination severity and distribution) across the whole IoT system. For example, a well-managed warehouse or a city with lower temperatures may lead to less contamination.


Figure 9: Simulation results of dynamic sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. Series: 3%, 5%, and 10% sampling proportion, and dynamic sampling. Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).

Figure 10: Simulation results of tracing accuracy with big data under different infection probabilities. Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).

Information of this kind can be read directly from the visualization images and helps manufacturers design their food supply chains more scientifically.

Visualization of the contamination conditions in the IoT system makes provenance reasoning in the food supply chain intuitive and informative.

5.4. Performance Estimation in Real Situations. In reality, food supply chains are more complicated. Many factors, such as the specific food type, environmental temperature, manufacturing process, and other parameters, can make the chain difficult to predict. To apply our strategy in real situations, these factors should be taken into account and some parameters adjusted accordingly.

The factor that most influences the behavior of a food supply chain is the type of food. Each food has its own characteristics, which may dominate the model, the provenance procedure, and the expected results. Firstly, food type can determine the possible contamination source. In (1), $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ are related to the virus that spreads among the food. For example, avian influenza virus, which causes a common infectious disease among poultry, began to be contagious among human beings. After the mutation, the infectious ability of this virus grew significantly. As a result, $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ of avian influenza virus in chickens would also increase. Secondly, food type is a deciding factor for the storage pattern and shelf life. Some canned drinks are stacked separately, layer by layer, so they do not get cross-contaminated. Raw meat, however, is generally kept together, which makes it easy for a virus to spread. Thirdly, the state of the food also matters for provenance. One piece of solid food can be seen as a unit, while liquid food like yogurt could be ruined by a single deteriorated drop. For yogurt, therefore, the sampling process could be very different, as spatial position should be taken into consideration and the behavior of the virus in liquid should also be studied.

Besides food type, other factors play important roles in real situations. Food is more likely to rot in summer than in winter; some manufacturing factories are more hygienic than others; as time passes, food may become more susceptible to infection; and the types or dosages of food additives may slow down the contamination process.

Although different food supply chains can behave differently, our proposed strategy can cover most cases because it follows the general model of food supply networks and epidemiological principles.

6. Conclusion

In this paper, we present a heuristic approach to tracing contamination sources in large IoT systems for complicated food supply chains, which is a critical issue in metropolitan life. In our approach, a Self-adaptive Dynamic Partition Sampling (SDPS) strategy is proposed to collect data for sensors, whose input is only a small portion of end-market samples out of the huge volume of samples along food supply chains. The approach is illustrated with a case study of an IoT system for provenance in a food supply chain, which can help stop outbreaks of foodborne disease efficiently. With the SDPS strategy, the objects tested by sensors are the most reasonable portion of the entire product set. Efficiency is greatly improved, while accuracy stays almost the same as when sensing all the objects. SDPS preserves the integrity of the information and approaches near-real-time examination. We also present a tracing algorithm to find the contamination sources of food supply chains and a backtracking algorithm that provides a strategy for recalling problematic food not yet discovered in the chain. Simulation results indicate that our SDPS scheme can achieve a tracing accuracy of up to 97.8% with a smaller average sampling percentage than traditional global random sampling. We managed to sample only a small portion of food in the end market without loss of accuracy in provenance tracing over the whole IoT system. In addition, our analytic data and visualized images can clearly model contamination conditions in the food supply chain within the context of the IoT system. This gives clients an intuitive impression of the food supply networks in a city.

Figure 11: Visualized data flow of infected food products in food supply chain (highlighted node: Factory batch 4).

Figure 12: Visualized data flow of uninfected food products in food supply chain (highlighted node: Factory batch 4).

In this paper, we assume that all provenance information of food products is hosted in a centralized repository and that the provenance metadata are organized in a uniform manner. Our future work is a practical implementation of food supply chain provenance in a community, as a test bed for megacity management.


Acknowledgments

This paper is sponsored in part by the Shanghai International Science and Technology Collaboration Program under Grant 13430710400 and by the Campus for Research Excellence and Technological Enterprise (CREATE) program of the Singapore National Research Foundation under the joint project on Energy and Environmental Sustainability Solutions for Megacities (R-706-000-101-281) by Shanghai Jiao Tong University (SJTU) and the National University of Singapore (NUS). Professor Qiu is partially supported by NSF CNS-1249223 and NSFC 61071061.

References

[1] “Wikipedia on smart city,” http://en.wikipedia.org/wiki/Smart_city.

[2] M. Qiu and E. H. M. Sha, “Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems,” ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 2, article 25, 2009.

[3] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, “Online optimization for scheduling preemptable tasks on IaaS cloud systems,” Journal of Parallel and Distributed Computing, vol. 72, no. 5, pp. 666–677, 2012.

[4] K. Su, J. Li, and H. Fu, “Smart city and the applications,” in Proceedings of the International Conference on Electronics, Communications and Control (ICECC ’2011), pp. 1028–1033, Zhejiang, China, September 2011.

[5] X. Tang, J. Pu, K. Cao, Y. Zhang, and Z. Xiong, “Integrated extensible simulation platform for vehicular sensor networks in smart cities,” International Journal of Distributed Sensor Networks, vol. 2012, Article ID 860415, 10 pages, 2012.

[6] P. Vlacheas, R. Giaffreda, V. Stavroulaki, et al., “Enabling smart cities through a cognitive management framework for the internet of things,” IEEE Communications Magazine, vol. 51, no. 6, pp. 102–111, 2013.

[7] A. Asin, “Smart cities from libelium allows systems integrators to monitor noise, pollution, structural health and waste management,” Smart Cities Articles, 2011.

[8] K. Ashton, “That ‘internet of things’ thing,” RFID Journal, 2011.

[9] P. Magrassi and T. Berg, “A world of smart objects,” Gartner Research Report TR-17-2243, 2002.

[10] Oxford English Dictionary (OED), “The fact of coming from some particular source or quarter; source, derivation,” http://en.wikipedia.org/wiki/Provenance.

[11] A. V. Roth, A. A. Tsay, M. E. Pullman, and J. V. Gray, “Unraveling the food supply chain: strategic insights from China and the 2007 recalls,” Journal of Supply Chain Management, vol. 44, no. 1, pp. 22–39, 2008.

[12] S. Miles, P. Groth, S. Munroe, and L. Moreau, “Prime: a methodology for developing provenance-aware applications,” ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, article 8, 2011.

[13] R. Hasan, R. Sion, and M. Winslett, “The case of the fake Picasso: preventing history forgery with secure provenance,” in Proceedings of the 7th Conference on File and Storage Technologies (FAST ’09), pp. 1–14, New York, NY, USA, December 2009.

[14] T. A. McMeekin, J. Baranyi, J. Bowman, et al., “Information systems in food safety management,” International Journal of Food Microbiology, vol. 112, no. 3, pp. 181–194, 2006.

[15] M. A. van der Gaag, F. Vos, H. W. Saatkamp, M. van Boven, P. van Beek, and R. B. M. Huirne, “A state-transition simulation model for the spread of Salmonella in the pork supply chain,” European Journal of Operational Research, vol. 156, no. 3, pp. 782–798, 2004.

[16] L. M. Wein and Y. Liu, “Analyzing a bioterror attack on the food supply: the case of botulinum toxin in milk,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 28, pp. 9984–9989, 2005.

[17] L. Qin and Q. S. Wang, “Food supply chain quality management model and simulation based on game,” in Proceedings of the International Conference on Computer Modeling and Simulation (ICCMS ’09), pp. 291–293, Macau, China, February 2009.

[18] Q. Zhang, D. Wang, T. Huang, et al., “Modelling provenance in food supply chain to track and trace foodborne disease,” in Proceedings of the International Conference on Computer Modeling and Simulation, pp. 69–75, Hong Kong, China, February 2012.

[19] S. Li, L. Xu, and X. Wang, “Compressed sensing signal and data acquisition in wireless sensor,” IEEE Transactions on Industrial Informatics, 2012.

[20] Z. Ding and X. Gao, “A database cluster system framework for managing massive sensor sampling data in the internet of things,” Chinese Journal of Computers, vol. 35, no. 6, pp. 1175–1191, 2012.

[21] L. Zhang, J. Liu, and H. Jiang, “Energy-efficient location tracking with smartphones for IoT,” in Proceedings of the IEEE Sensors, pp. 1–4, Taipei, China, October 2012.

[22] H. Abbey, “An examination of the Reed-Frost theory of epidemics,” Human Biology, vol. 24, no. 3, pp. 201–233, 1952.

[23] L. Elveback, J. P. Fox, and A. Varma, “An extension of the Reed-Frost epidemic model for the study of competition between viral agents in the presence of interference,” The American Journal of Epidemiology, vol. 80, no. 3, pp. 356–364, 1964.

[24] X. Yuan, H. Guo, H. Xiao, Z. Wang, and X. Zhang, “High-dimensional data virtualization,” in Proceedings of the Communications of the CCF, pp. 13–16, April 2011.



Vehicle 1 and 2 batch 0ndash9Vehicle 1 batch 0ndash9

Figure 11 Visualized data flow of infected food products in food supply chain

Factory batch 4

Farm batch 0ndash9 Vehicle 3 4 5 batch 0ndash9Vehicle 2 batch 0ndash9 Factory batch 0ndash9 Market 1 2 3 batch 0ndash9

Vehicle 1 and 2 batch 0ndash9Vehicle 1 batch 0ndash9

Figure 12 Visualized data flow of uninfected food products in food supply chain

in the end market without loss in accuracy of provenancetracing over the whole IoT system In addition our analyticdata and visualized images can clearly model contaminationconditions in food supply chain within the context of IoTsystem This will give the clients an intuitive impression onfood supply networks in a city

In this paper we assume that all provenance informationof food products is hosted by a centralized repository andthese provenance metadata are organized in a uniform man-ner Our future work is to further make practical implemen-tation of the provenance of food supply chain in a communityas our testing bed for megacity management

12 International Journal of Distributed Sensor Networks

Acknowledgments

This paper is sponsored in part by the Shanghai Interna-tional Science and Technology Collaboration Program underGrant 13430710400 and Campus for Research Excellenceand Technological Enterprise (CREATE) program of Singa-pore National Research Foundation under the joint projecton Energy and Environmental Sustainability Solutions forMegacities (R-706-000-101-281) by Shanghai Jiao Tong Uni-versity (SJTU) and National University of Singapore (NUS)ProfessorQiu is partially supported byNSFCNS-1249223 andNSFC 61071061

References

[1] ldquoWikipedia on smart cityrdquo httpenwikipediaorgwikiSmartcity

[2] M Qiu and E H M Sha ldquoCost minimization while satisfy-ing hardsoft timing constraints for heterogeneous embeddedsystemsrdquoACMTransactions on Design Automation of ElectronicSystems vol 14 no 2 article 25 2009

[3] J Li M Qiu Z Ming G Quan X Qin and Z Gu ldquoOnlineoptimization for scheduling preemptable tasks on IaaS cloudsystemsrdquo Journal of Parallel and Distributed Computing vol 72no 5 pp 666ndash677 2012

[4] K Su J Li and H Fu ldquoSmart city and the applicationsrdquoin Proceedings of the International Conference on ElectronicsCommunications and Control (ICECC rsquo2011) pp 1028ndash1033Zhejiang China September 2011

[5] X Tang J Pu K Cao Y Zhang and Z Xiong ldquoIntegratedextensible simulation platform for vehicular sensor networksin smart citiesrdquo International Journal of Distributed SensorNetworks vol 2012 Article ID 860415 10 pages 2012

[6] P Vlacheas R Giaffreda V Stavroulaki et al ldquoEnabling smartcities through a cognitive management framework for theinternet of thingsrdquo IEEE Communications Magazine vol 51 no6 pp 102ndash111 2013

[7] A Asin ldquoSmart cities from libelium allows systems integratorsto monitor noise pollution structural health and waste man-agementrdquo Smart Cities Articles 2011

[8] Kevin Ashton ldquoThat ldquointernet of thingsrdquo thingrdquo RFID Journal2011

[9] P Magrassi and T Berg ldquoA world of smart objectsrdquo GartnerResearch Report TR-17-2243 2002

[10] Oxford English Dictionary (OED) ldquoThe fact of comingfrom some particular source or quarter source derivationrdquohttpenwikipediaorgwikiProvenance

[11] AV Roth AA TsayM E Pullman and J VGray ldquoUnravelingthe food supply chain strategic insights from China and the2007 recallsrdquo Journal of Supply Chain Management vol 44 no1 pp 22ndash39 2008

[12] S Miles P Groth S Munroe and L Moreau ldquoPrime amethodology for developing provenance-aware applicationsrdquoACM Transactions on Software Engineering and Methodologyvol 20 no 3 article 8 2011

[13] R Hasan R Sion and M Winsltt ldquoThe case of the fakePicasso preventing history forgery with secure provenancerdquo inProceedings of the 7th Conference on File the Storage Technologies(FAST rsquo09) pp 1ndash14 New York NY USA December 2009

[14] T A McMeekin J Baranyi J Bowman et al ldquoInformationsystems in food safety managementrdquo International Journal ofFood Microbiology vol 112 no 3 pp 181ndash194 2006

[15] M A van der Gaag F Vos H W Saatkamp M van Boven Pvan Beek and R B M Huirne ldquoA state-transition simulationmodel for the spread of Salmonella in the pork supply chainrdquoEuropean Journal of Operational Research vol 156 no 3 pp782ndash798 2004

[16] L MWein and Y Liu ldquoAnalyzing a bioterror attack on the foodsupply the case of botulinum toxin in milkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 102 no 28 pp 9984ndash9989 2005

[17] L Qin andQ SWang ldquoFood supply chain qualitymanagementmodel and simulation based on gamerdquo in Proceedings of theInternational Conference on Computer Modeling and Simulation(ICCMS rsquo09) pp 291ndash293 Macau China February 2009

[18] Q Zhang D Wang T Huang et al ldquoModelling provenance infood supply chain to track and trace foodborne diseaserdquo in Pro-ceedings of the International Conference on Computer Modelingand Simulation pp 69ndash75 Hong Kong China February 2012

[19] S Li L Xu and X Wang ldquoCompressed sensing signal and dataacquisitio in wireless sensorrdquo IEEE Transactions on IndustrialInformatics 2012

[20] Z Ding and X Gao ldquoA database cluster system frameworkfor managing massive sensor sampling data in the internet ofthingsrdquo Chinese Journal of Computers vol 35 no 6 pp 1175ndash1191 2012

[21] L Zhang J Liu and H Jiang ldquoEnergy-efficient locationtracking with smartphones for IoTrdquo in Proceedings of the IEEESensors pp 1ndash4 Taipei China October 2012

[22] H Abbey ldquoAn examination of the Reed-Frost theory of epi-demicsrdquo Human Biology vol 24 no 3 pp 201ndash233 1952

[23] L Elveback J P Fox and A Varma ldquoAn extension of tee reed-frost epidemicmodel for the study of competition between viralagents in the presence of interferencerdquoThe American Journal ofEpidemiology vol 80 no 3 pp 356ndash364 1964

[24] X Yuan H Guo H Xiao Z Wang and X Zhang ldquoHigh-dimensional data virtualizationrdquo in Proceedings of the Commu-nications of the CCF pp 13ndash16 April 2011

Figure 4: Topology (DAG) of a food supply chain case for evaluation (with big data). (Node labels in the figure: Farm 1 to 3, Factory 1 and 2, Market 1 to 3, and Vehicles 1 to 7 linking the stages.)

products are circulating in the network. Which locations a piece of food passes through is entirely random.

We built the simulator in C++. It reads configuration files that describe supply chain topologies, generates simulated contamination behavior, and senses data along the food supply chains. In the simulation, the contamination source and the sampling volume can be set explicitly by the user. The contamination rules are that food can be infected (1) directly, by passing through the contamination source, or (2) indirectly, at cross-infection spots, according to our revised Reed-Frost model.
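The two contamination rules can be illustrated with a minimal sketch. It is written in Python for brevity and is not the C++ simulator described above: the topology dictionary, the function names, and the parameter `p_contact` are hypothetical, infection at the source is assumed to be certain, and the classic Reed-Frost infection probability 1 - (1 - p)^I stands in for the revised model, whose exact form is not reproduced here.

```python
import random

def reed_frost_infection_prob(p_contact, infected_count):
    """Classic Reed-Frost: chance that one susceptible item becomes infected
    after exposure to `infected_count` infected items."""
    return 1.0 - (1.0 - p_contact) ** infected_count

def simulate_batch(topology, start, source, n_products, p_contact, rng):
    """Route products through a DAG stage by stage and mark infections.

    topology: dict mapping a location to the list of possible next locations
              (an empty list marks an end market); a hypothetical format.
    source:   the location acting as the contamination source.
    Returns one (path, infected) pair per product.
    """
    products = [{"path": [start], "infected": start == source}
                for _ in range(n_products)]
    active = list(products)
    while active:
        # Each product moves to a randomly chosen next location.
        arrivals = {}
        for prod in active:
            next_locations = topology[prod["path"][-1]]
            if next_locations:
                loc = rng.choice(next_locations)
                prod["path"].append(loc)
                arrivals.setdefault(loc, []).append(prod)
        active = []
        for loc, group in arrivals.items():
            if loc == source:
                # Rule (1): direct infection at the source (assumed certain).
                for prod in group:
                    prod["infected"] = True
            else:
                # Rule (2): cross infection among products sharing a location.
                carriers = sum(p["infected"] for p in group)
                prob = reed_frost_infection_prob(p_contact, carriers)
                for prod in group:
                    if not prod["infected"] and rng.random() < prob:
                        prod["infected"] = True
            active.extend(group)
    return [(tuple(p["path"]), p["infected"]) for p in products]
```

With `p_contact` set to 0, only rule (1) is active, so a product is infected exactly when its path contains the source; raising `p_contact` lets contamination spread to products that merely share a location with infected ones.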

To keep this paper compact, we show only the training process of the first model here. In the training process, we obtain prior probabilities, which depend strongly on the topology and configuration of the chain. Figure 5 shows the distribution of the infected proportion under different contamination probabilities after 300,000 tests. Under each contamination probability (which depends on the contamination type), the actual proportion of pollution is almost Gaussian distributed, which agrees with the discussion in Section 4. Note that a value $x$ on the $x$-axis should be converted to the pollution proportion interval $[(x - 1) \times 4,\ x \times 4)$, in percent, since the total space is divided into 25 sections. The relation between the products' contamination percentage and the accuracy of the tracing algorithm under different sampling rates is shown in Figure 6. For clarity, only three sampling rates are tested: 3%, 5%, and 10%. To make the sampling strategy more efficient, more rates can be evaluated in real situations. Figure 6 confirms our hypothesis: the source is difficult to detect if only a small part, or too large a part, of the food is contaminated. In both cases, the flow path of the contamination is easily hidden.
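The conversion between an $x$-axis value and its pollution proportion interval can be sketched as follows. The function names are hypothetical; the bin width of 4 percentage points follows from dividing the range from 0 to 100% into 25 sections.

```python
N_SECTIONS = 25
BIN_WIDTH = 100 // N_SECTIONS  # 4 percentage points per section

def bin_to_interval(x):
    """Map an x-axis value (1..25) to its half-open interval
    [(x - 1) * 4, x * 4), expressed in percent."""
    if not 1 <= x <= N_SECTIONS:
        raise ValueError("x must be between 1 and 25")
    return ((x - 1) * BIN_WIDTH, x * BIN_WIDTH)

def proportion_to_bin(percent):
    """Inverse mapping: the section that contains a pollution proportion.
    Exactly 100% is placed in the last section (an assumption made here
    so that every value in [0, 100] has a section)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return min(int(percent // BIN_WIDTH) + 1, N_SECTIONS)
```

For example, a pollution proportion of 67% falls in section 17, which covers the interval from 64% up to, but not including, 68%.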

Table 1: Configuration of the two cases.

                    Case 1     Case 2
Batches/location    25         500
Total products      60,000     800,000
Flow rule           Random     Random

Table 2: Simulation results of the backtracking algorithm.

Hit rate    False alarm rate
96%         3%

5.2. Evaluation Results. Figure 7 shows the accuracy of the tracing algorithm. Across the different probabilities of infection in the whole chain, the accuracy is no less than 80%. In Figures 8 and 9, the partition and dynamic strategies in SDPS are tested, respectively. For all infection probabilities, the partition strategy achieves higher tracing accuracy than the global sampling strategy (11.8%, 22.7%, 10.9%, and 4.1% higher for infection probabilities of 30%, 67%, 80%, and 90%, respectively). Compared with sampling at fixed rates (3%, 5%, and 10%), the dynamic method achieves higher tracing accuracy even with a lower average sampling rate of 7.8%.
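The difference between the two sampling schemes compared in Figure 8 can be sketched as follows. This is an illustration under stated assumptions rather than the SDPS implementation: partition sampling is modeled here as plain stratified sampling, the partition key is assumed to be the end market, and all function names are hypothetical.

```python
import random
from collections import defaultdict

def global_random_sample(products, rate, rng):
    """Draw one random sample from the pooled set of all products."""
    k = max(1, round(len(products) * rate))
    return rng.sample(products, k)

def partition_random_sample(products, rate, rng, key):
    """Split the products into partitions with `key`, then sample each
    partition at the same rate, so that no partition is left out."""
    groups = defaultdict(list)
    for prod in products:
        groups[key(prod)].append(prod)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * rate))
        sample.extend(rng.sample(members, k))
    return sample
```

Under global sampling a small partition can be missed entirely by chance, whereas partition sampling always takes at least one item from every partition; this is one plausible reason for a gap in tracing accuracy.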

For the backtracking part, the simulation result is shown in Table 2. Both the hit rate and the false alarm rate are satisfactory.
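As an illustration of how two such rates can be computed, the sketch below scores a set of recalled products against the truly infected ones. The definitions used (hit rate as the fraction of infected products that are recalled, false alarm rate as the fraction of clean products that are recalled) are the common ones and are an assumption here, as are the function and argument names.

```python
def backtracking_rates(recalled, infected, all_products):
    """Return (hit_rate, false_alarm_rate) for a recall decision.

    recalled:     products flagged by the backtracking step
    infected:     products that are actually infected
    all_products: every product still in the chain
    """
    recalled, infected, all_products = set(recalled), set(infected), set(all_products)
    clean = all_products - infected
    hits = len(recalled & infected)        # infected and recalled
    false_alarms = len(recalled & clean)   # clean but recalled
    hit_rate = hits / len(infected) if infected else 0.0
    false_alarm_rate = false_alarms / len(clean) if clean else 0.0
    return hit_rate, false_alarm_rate
```

For instance, recalling 24 of 25 infected products together with 3 of 75 clean ones gives a hit rate of 0.96 and a false alarm rate of 0.04.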

Case 2 has a large data scale. Here, too, we fetch a few samples each time and let the system tell us, based on (4), how many samples to take next. The system's actual sampling rate turns out to be 7.8%. As shown in Figure 10, the accuracy of the tracing algorithm is again higher than 80%, which shows that our proposed approach also works well with big data.

Figure 5: Prior probability distributions under different contamination probabilities: 30%, 67%, 80%, and 90%. (Horizontal axis: pollution proportion interval $[(x - 1) \times 4,\ x \times 4)$; vertical axis: distribution of 300,000 tests; one curve per contamination probability.)

Figure 6: The relationship between tracing algorithm accuracy and pollution proportion intervals under different sample rates: 3%, 5%, and 10%. (Horizontal axis: pollution proportion interval (%); vertical axis: accuracy of tracing algorithm (%); one series per sample rate.)

5.3. Contamination Visualization. With the tool introduced in [24] and the information we choose to record, SDPS provides sampling data that can be represented visually after being tested by sensors. Figures 11 and 12 show the data flow of infected and uninfected food products, respectively. In this case, we use the configuration of Case 1 and set the contamination source in advance to be the 4th batch in the factory.

Figure 7: Accuracy of tracing algorithm with different probabilities of infection. (Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

Figure 8: Simulation results of partition sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. (Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%); series: partition random sampling and global random sampling.)

The vertical lines marked with locations and batch numbers represent the nodes in the data flow. Lines passing through these nodes are the traces of food products. As these figures show, most of the infected food, but none of the uninfected food, passed through the 4th batch of the factory (the node with a circle around it). This batch is therefore very likely to be the source of contamination, which is also confirmed by our tracing system.
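The visual reasoning above (most infected traces, and no uninfected traces, pass through the same node) can be expressed as a simple score. The sketch below only illustrates that reasoning and is not the tracing algorithm of this paper; the trace format, the node labels, and the function name `rank_suspect_nodes` are hypothetical.

```python
from collections import Counter

def rank_suspect_nodes(traces):
    """Rank nodes by how strongly they separate infected from clean traces.

    traces: iterable of (path, infected) pairs, where path is a sequence
            of node labels such as "factory:4".
    Score of a node = share of infected traces passing through it
                      minus share of uninfected traces passing through it.
    Returns (node, score) pairs sorted from most to least suspicious.
    """
    infected_hits, clean_hits = Counter(), Counter()
    n_infected = n_clean = 0
    for path, infected in traces:
        nodes = set(path)  # count each node once per trace
        if infected:
            n_infected += 1
            infected_hits.update(nodes)
        else:
            n_clean += 1
            clean_hits.update(nodes)
    scores = {}
    for node in set(infected_hits) | set(clean_hits):
        share_infected = infected_hits[node] / n_infected if n_infected else 0.0
        share_clean = clean_hits[node] / n_clean if n_clean else 0.0
        scores[node] = share_infected - share_clean
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A node crossed by most infected traces and by no clean trace receives a score close to 1 and appears at the top of the ranking, which matches what the circled node shows in the figures.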

SDPS makes the data concise yet still comprehensive, which helps the visualization tool display the useful information.

Apart from helping to detect the contamination source, visualization can also give a better understanding of the contamination conditions (e.g., contamination severity and distribution) of the whole IoT system. For example, a well-managed warehouse or a city with a lower temperature may lead to less contamination.

Figure 9: Simulation results of dynamic sampling strategy: tracing accuracy under infection probabilities of 30%, 67%, 80%, and 90%. (Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%); series: 3%, 5%, and 10% sampling proportion, and dynamic sampling.)

Figure 10: Simulation results of tracing accuracy with big data under different infection probabilities. (Horizontal axis: probability of infection (%); vertical axis: tracing accuracy (%).)

Information of this kind can be read directly from the visualization images and can help manufacturers design their food supply chains more scientifically.

Visualizing the contamination conditions in the IoT system makes provenance reasoning in the food supply chain intuitive and informative.

5.4. Performance Estimation in Real Situations. In reality, the food supply chain is more complicated. Many factors, such as the specific food type, the environment temperature, the manufacturing process, and other parameters, can make the chain difficult to predict. To apply our strategy in real situations, these factors should be taken into account and some parameters should be adjusted accordingly.

The factor that influences the behavior of the food supply chain the most is the type of food. Each food has its own characteristics, which may dominate the model, the provenance procedure, and the expected results. Firstly, the food type can determine the possible contamination source. In (1), $P_{\text{exp}}$ and $P_{\text{imp}}$ are related to the virus that spreads among the food. For example, avian influenza virus, which causes a common infectious disease among poultry, began to be contagious among human beings. After the mutation, the infectious ability of this virus grew significantly. As a result, $P_{\text{exp}}$ and $P_{\text{imp}}$ of avian influenza virus in chickens would also increase. Secondly, the food type is a deciding factor for its storage pattern and shelf life. Some canned drinks are stacked separately, layer by layer, so they would not become cross-contaminated. Raw meat, however, is generally kept together, which provides an easy environment for a virus to spread. Thirdly, the state of the food is also dominant in provenance. One piece of food in a solid state can be seen as a unit, while liquid food like yogurt could be ruined by only one deteriorated drop. For yogurt, therefore, the sampling process could be very different, as spatial position should be taken into consideration and the behavior of the virus in liquid should also be studied.

Besides food type, other factors play important roles in real situations. Food is more likely to rot in summer than in winter; some manufacturing factories are more hygienic than others; as time passes, food may become more easily infected; and the types or dosages of food additives may slow down the contamination process.

Although different food supply chains can behave differently, our proposed strategy can cover most cases because it follows the general model of food supply networks and epidemiological principles.

6. Conclusion

In this paper, we present a heuristic approach to tracing contamination sources in large IoT systems for complicated food supply chains, which is a critical issue in metropolitan life. In our approach, the Self-adaptive Dynamic Partition Sampling (SDPS) strategy is proposed to collect data for sensors, whose input is only a small portion of end market samples out of the huge volume of samples along food supply chains. The approach is illustrated with a case study of an IoT system for provenance in the food supply chain, which can efficiently stop outbreaks of foodborne disease. With the intelligent SDPS strategy, the objects tested by sensors are the most reasonable portion of the entire product set. The efficiency is greatly improved, and the accuracy stays almost the same as when sensing all the objects at the same time. SDPS keeps the integrity of the information and approaches nearly real-time examination. We also present a tracing algorithm to find the contamination sources of food supply chains and a backtracking algorithm to provide a strategy for recalling problematic food that remains undiscovered in the chain. The simulation results indicate that our SDPS scheme can achieve a tracing accuracy of up to 97.8% with a smaller average sampling percentage than traditional global random sampling. We managed to sample only a small portion of the food in the end market without loss of accuracy in provenance tracing over the whole IoT system. In addition, our analytic data and visualized images can clearly model contamination conditions in the food supply chain within the context of the IoT system. This gives clients an intuitive impression of the food supply networks in a city.

Figure 11: Visualized data flow of infected food products in food supply chain. (Node labels: Farm batch 0–9; Vehicle 1 batch 0–9; Vehicle 1 and 2 batch 0–9; Vehicle 2 batch 0–9; Factory batch 0–9, with Factory batch 4 marked; Vehicle 3, 4, 5 batch 0–9; Market 1, 2, 3 batch 0–9.)

Figure 12: Visualized data flow of uninfected food products in food supply chain. (Node labels as in Figure 11.)

In this paper, we assume that all provenance information of food products is hosted by a centralized repository and that these provenance metadata are organized in a uniform manner. Our future work is to carry out a practical implementation of food supply chain provenance in a community as our test bed for megacity management.

Acknowledgments

This paper is sponsored in part by the Shanghai International Science and Technology Collaboration Program under Grant 13430710400 and by the Campus for Research Excellence and Technological Enterprise (CREATE) program of the Singapore National Research Foundation under the joint project on Energy and Environmental Sustainability Solutions for Megacities (R-706-000-101-281) by Shanghai Jiao Tong University (SJTU) and the National University of Singapore (NUS). Professor Qiu is partially supported by NSF CNS-1249223 and NSFC 61071061.

References

[1] "Wikipedia on smart city," http://en.wikipedia.org/wiki/Smart_city.

[2] M. Qiu and E. H. M. Sha, "Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems," ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 2, article 25, 2009.

[3] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, "Online optimization for scheduling preemptable tasks on IaaS cloud systems," Journal of Parallel and Distributed Computing, vol. 72, no. 5, pp. 666–677, 2012.

[4] K. Su, J. Li, and H. Fu, "Smart city and the applications," in Proceedings of the International Conference on Electronics, Communications and Control (ICECC '2011), pp. 1028–1033, Zhejiang, China, September 2011.

[5] X. Tang, J. Pu, K. Cao, Y. Zhang, and Z. Xiong, "Integrated extensible simulation platform for vehicular sensor networks in smart cities," International Journal of Distributed Sensor Networks, vol. 2012, Article ID 860415, 10 pages, 2012.

[6] P. Vlacheas, R. Giaffreda, V. Stavroulaki et al., "Enabling smart cities through a cognitive management framework for the internet of things," IEEE Communications Magazine, vol. 51, no. 6, pp. 102–111, 2013.

[7] A. Asin, "Smart cities from libelium allows systems integrators to monitor noise, pollution, structural health and waste management," Smart Cities Articles, 2011.

[8] K. Ashton, "That 'internet of things' thing," RFID Journal, 2011.

[9] P. Magrassi and T. Berg, "A world of smart objects," Gartner Research Report TR-17-2243, 2002.

[10] Oxford English Dictionary (OED), "The fact of coming from some particular source or quarter; source, derivation," http://en.wikipedia.org/wiki/Provenance.

[11] A. V. Roth, A. A. Tsay, M. E. Pullman, and J. V. Gray, "Unraveling the food supply chain: strategic insights from China and the 2007 recalls," Journal of Supply Chain Management, vol. 44, no. 1, pp. 22–39, 2008.

[12] S. Miles, P. Groth, S. Munroe, and L. Moreau, "Prime: a methodology for developing provenance-aware applications," ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, article 8, 2011.

[13] R. Hasan, R. Sion, and M. Winslett, "The case of the fake Picasso: preventing history forgery with secure provenance," in Proceedings of the 7th Conference on File and Storage Technologies (FAST '09), pp. 1–14, New York, NY, USA, December 2009.

[14] T. A. McMeekin, J. Baranyi, J. Bowman et al., "Information systems in food safety management," International Journal of Food Microbiology, vol. 112, no. 3, pp. 181–194, 2006.

[15] M. A. van der Gaag, F. Vos, H. W. Saatkamp, M. van Boven, P. van Beek, and R. B. M. Huirne, "A state-transition simulation model for the spread of Salmonella in the pork supply chain," European Journal of Operational Research, vol. 156, no. 3, pp. 782–798, 2004.

[16] L. M. Wein and Y. Liu, "Analyzing a bioterror attack on the food supply: the case of botulinum toxin in milk," Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 28, pp. 9984–9989, 2005.

[17] L. Qin and Q. S. Wang, "Food supply chain quality management model and simulation based on game," in Proceedings of the International Conference on Computer Modeling and Simulation (ICCMS '09), pp. 291–293, Macau, China, February 2009.

[18] Q. Zhang, D. Wang, T. Huang et al., "Modelling provenance in food supply chain to track and trace foodborne disease," in Proceedings of the International Conference on Computer Modeling and Simulation, pp. 69–75, Hong Kong, China, February 2012.

[19] S. Li, L. Xu, and X. Wang, "Compressed sensing signal and data acquisition in wireless sensor," IEEE Transactions on Industrial Informatics, 2012.

[20] Z. Ding and X. Gao, "A database cluster system framework for managing massive sensor sampling data in the internet of things," Chinese Journal of Computers, vol. 35, no. 6, pp. 1175–1191, 2012.

[21] L. Zhang, J. Liu, and H. Jiang, "Energy-efficient location tracking with smartphones for IoT," in Proceedings of the IEEE Sensors, pp. 1–4, Taipei, China, October 2012.

[22] H. Abbey, "An examination of the Reed-Frost theory of epidemics," Human Biology, vol. 24, no. 3, pp. 201–233, 1952.

[23] L. Elveback, J. P. Fox, and A. Varma, "An extension of the Reed-Frost epidemic model for the study of competition between viral agents in the presence of interference," The American Journal of Epidemiology, vol. 80, no. 3, pp. 356–364, 1964.

[24] X. Yuan, H. Guo, H. Xiao, Z. Wang, and X. Zhang, "High-dimensional data virtualization," in Proceedings of the Communications of the CCF, pp. 13–16, April 2011.

International Journal of Distributed Sensor Networks 9

106 8420 12 14 16 18 20 22 24 260

10000

20000

30000

40000

50000

60000

70000

80000

90000

Dist

ribut

ion

of 3

000

00 te

sts

Contamination probability is 30Contamination probability is 67Contamination probability is 80Contamination probability is 90

Pollution proportion interval [(x minus 1)lowast4 xlowast4]

Figure 5 Prior probabilities distribution under different contami-nation probabilities 30 67 80 and 90

0

10

20

30

40

50

60

70

80

90

100

Accu

racy

of t

raci

ng al

gorit

hm (

)

Pollution proportion interval ()

3510

4-5

6-7

8-9

10-1

1

12-1

3

14-1

5

20ndash2

5

30ndash3

5

40ndash4

5

50ndash5

5

60ndash6

5

70ndash7

5

Figure 6 The relationship of tracing algorithm accuracy andpollution proportion intervals under different sample rates 3 5and 10

53 Contamination Visualization With the tool introducedby [24] and the information we choose to record SDPSprovides sampling data that can be represented visually afterbeing tested by sensors The Figures 11 and 12 show the dataflow of infected and uninfected food products respectivelyIn this case we use the configuration in Case 1 and setthe contamination source to be the 4th batch in factory inadvance

10050

0 20 40 60 80

60

70

80

90

100

Probability of infection ()

Trac

ing

accu

racy

()

Figure 7 Accuracy of tracing algorithm with different probabilitiesof infection

40 60 8050

60

70

80

90

100

Probability of infection ()

Partition random samplingGlobal random sampling

Trac

ing

accu

racy

()

Figure 8 Simulation results of partition sampling strategy tracingaccuracy under infection probabilities of 30 67 80 and 90

The vertical lines marked with locations and batchesnumbers represent the nodes in data flow Lines goingthrough these nodes are the traces of foodproducts As shownin those figures most of the infected food while none of theuninfected food passed the 4th batch of factory (the nodewith a circle around it) So it has a great chance to be thesource of contamination which is also proven by our tracingsystem

SDPS makes data concise but still comprehensive whichfacilitates visualization tool displaying the useful informa-tion

Apart from aiding detecting contamination source visu-alization can also help to know contamination conditions(eg contamination severitydistribution) of the whole IoTsystem better For example a well-managed warehouse or acity with lower temperature may lead to less contamination

10 International Journal of Distributed Sensor Networks

Probability of infection ()

Trac

ing

accu

racy

()

3 sampling proportion5 sampling proportion

10 sampling proportionDynamic sampling

40 60 8050

60

70

80

90

100

Figure 9 Simulation results of dynamic sampling strategy tracingaccuracy under infection probabilities of 30 67 80 and 90

Probability of infection ()

Trac

ing

accu

racy

()

500 30 60 90

60

70

80

90

100

Figure 10 Simulation results of tracing accuracywith big data underdifferent infection probabilities

Information of this kind can be read directly from the visualization images and helps manufacturers design their food supply chains more scientifically.

Visualization of the contamination condition in the IoT system makes provenance reasoning in the food supply chain intuitive and informative.

5.4. Performance Estimation in Real Situations. In reality, the food supply chain is more complicated. Many factors, such as the specific food type, environment temperature, manufacturing process, and other parameters, could make the chain difficult to predict. To apply our strategy to real situations, those factors should be considered and some parameters should be adjusted accordingly.

The factor that influences the behavior of the food supply chain the most is the type of food. Different foods have their own characteristics, which may dominate the model, the provenance procedure, and the expected results. Firstly, food type can decide the possible contamination source. In (1), $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ are related to the virus that spreads among food. For example, avian influenza virus, which causes a common infectious disease among poultry, began to be contagious among human beings. After the mutation, the infectious ability of this virus grew significantly. As a result, $P_{\mathrm{exp}}$ and $P_{\mathrm{imp}}$ of avian influenza virus in chickens would also increase. Secondly, food type is a deciding factor for its storage pattern and shelf life (quality guarantee period). Some canned drinks are stacked separately, layer by layer, so they would not get cross-contaminated. However, raw meat is generally kept together, which provides an easy environment for a virus to spread. Thirdly, the state of food is also dominant in provenance. One piece of food in solid state can be seen as a unit, while liquid food like yogurt could be ruined by only one deteriorated drop. Thus, for yogurt, the sampling process could be very different, as spatial position should be taken into consideration and the behavior of the virus in liquid should also be studied.
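To illustrate why a more infectious agent changes what the provenance system should expect, the sketch below uses the classical Reed-Frost chain-binomial model [22, 23] in its deterministic expected-value form. It is only an illustration under stated assumptions: it does not reproduce equation (1) of this paper, the generic per-contact probability `p` is not the same quantity as $P_{\mathrm{exp}}$ or $P_{\mathrm{imp}}$, and the function names are hypothetical.

```python
def reed_frost_step(susceptible, infectious, p):
    """Expected new infections in one generation of the Reed-Frost model.

    Each susceptible unit escapes infection from every infectious unit
    independently with probability (1 - p), so it is infected with
    probability 1 - (1 - p) ** infectious.
    """
    escape = (1.0 - p) ** infectious
    return susceptible * (1.0 - escape)


def expected_outbreak(n_units, initial_infectious, p, generations):
    """Deterministic (mean-value) approximation of a Reed-Frost outbreak.

    Returns the expected number of newly infected units per generation,
    starting with the initial cases. Using expected values instead of
    binomial draws is a simplification made for readability.
    """
    susceptible = float(n_units - initial_infectious)
    infectious = float(initial_infectious)
    history = [infectious]
    for _ in range(generations):
        new_cases = reed_frost_step(susceptible, infectious, p)
        susceptible -= new_cases
        infectious = new_cases
        history.append(new_cases)
    return history
```

Comparing `expected_outbreak(100, 1, 0.01, 5)` with `expected_outbreak(100, 1, 0.05, 5)` shows how a larger per-contact probability, such as the one a mutated virus or closely packed raw meat might imply, produces more expected cases in every early generation.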

Besides food type, other factors play important roles in real situations. Food is more likely to rot in summer than in winter; some manufacturing factories are more hygienic than others; as time passes, food may become easier to infect; and the types or dosages of food additives may slow down the contamination process.

Although different food supply chains can behave differently, our proposed strategy can cover most cases because it follows the general model of the food supply network and epidemiological principles.

6. Conclusion

In this paper, we present a heuristic approach to tracing contamination sources in large IoT systems for complicated food supply chains, which is a critical issue in metropolitan life. In our approach, the Self-adaptive Dynamic Partition Sampling (SDPS) strategy is proposed to collect data for sensors, whose input is only a small portion of end-market samples from the huge volume of samples along food supply chains. The approach is illustrated with a case study of an IoT system for provenance in the food supply chain, which can efficiently stop outbreaks of foodborne disease. With the intelligent SDPS strategy, the objects tested by sensors are the most reasonable portion of the entire product set. The efficiency is highly improved, and the accuracy stays almost the same as when sensing all the objects at the same time. SDPS keeps the integrity of information and approaches nearly real-time examination. We also present a tracing algorithm to find the contamination sources of food supply chains and a backtracing algorithm to provide a strategy for recalling problematic food not yet discovered in the chain. Simulation results indicate that our SDPS scheme can achieve a tracing accuracy of up to 97.8% with a smaller average sampling percentage than traditional global random sampling. We managed to sample a small portion of food only in the end market without loss in accuracy of provenance tracing over the whole IoT system. In addition, our analytic data and visualized images can clearly model contamination conditions in the food supply chain within the context of the IoT system. This will give clients an intuitive impression of the food supply networks in a city.

Figure 11: Visualized data flow of infected food products in food supply chain (node labels: Farm batch 0–9; Vehicle 1 batch 0–9; Vehicle 1 and 2 batch 0–9; Vehicle 2 batch 0–9; Factory batch 0–9; Vehicle 3, 4, 5 batch 0–9; Market 1, 2, 3 batch 0–9; highlighted node: Factory batch 4).

Figure 12: Visualized data flow of uninfected food products in food supply chain (node labels: Farm batch 0–9; Vehicle 1 batch 0–9; Vehicle 1 and 2 batch 0–9; Vehicle 2 batch 0–9; Factory batch 0–9; Vehicle 3, 4, 5 batch 0–9; Market 1, 2, 3 batch 0–9; highlighted node: Factory batch 4).
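The contrast drawn in the conclusion between partition sampling and traditional global random sampling can be sketched as follows. This is a simplified illustration and not the SDPS strategy itself: the self-adaptive and dynamic adjustment of the sampling proportion is not modeled, and the function names, the `key` callback, and the fixed `fraction` parameter are assumptions introduced here.

```python
import random


def global_random_sample(items, fraction, rng):
    """Draw one uniform random sample from the whole product set."""
    k = max(1, round(len(items) * fraction))
    return rng.sample(items, k)


def partition_random_sample(items, key, fraction, rng):
    """Draw a random sample separately inside each partition.

    key maps an item to its partition label (for example, the end market
    where it is sold), so every partition contributes at least one sample.
    """
    partitions = {}
    for item in items:
        partitions.setdefault(key(item), []).append(item)

    sample = []
    for group in partitions.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample
```

With a 5% proportion, a small market holding only a handful of products can be missed entirely by the global draw, whereas the partitioned draw always tests at least one product from it. That coverage guarantee is the property this sketch is meant to show; whether it explains the accuracy gap reported in Figure 8 is not something the sketch can establish.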

In this paper, we assume that all provenance information of food products is hosted by a centralized repository and that these provenance metadata are organized in a uniform manner. Our future work is to implement the provenance of the food supply chain in practice in a community, as our test bed for megacity management.


Acknowledgments

This paper is sponsored in part by the Shanghai International Science and Technology Collaboration Program under Grant 13430710400 and by the Campus for Research Excellence and Technological Enterprise (CREATE) program of the Singapore National Research Foundation under the joint project on Energy and Environmental Sustainability Solutions for Megacities (R-706-000-101-281) by Shanghai Jiao Tong University (SJTU) and National University of Singapore (NUS). Professor Qiu is partially supported by NSF CNS-1249223 and NSFC 61071061.

References

[1] “Wikipedia on smart city,” http://en.wikipedia.org/wiki/Smart_city.

[2] M. Qiu and E. H. M. Sha, “Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems,” ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 2, article 25, 2009.

[3] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, “Online optimization for scheduling preemptable tasks on IaaS cloud systems,” Journal of Parallel and Distributed Computing, vol. 72, no. 5, pp. 666–677, 2012.

[4] K. Su, J. Li, and H. Fu, “Smart city and the applications,” in Proceedings of the International Conference on Electronics, Communications and Control (ICECC ’2011), pp. 1028–1033, Zhejiang, China, September 2011.

[5] X. Tang, J. Pu, K. Cao, Y. Zhang, and Z. Xiong, “Integrated extensible simulation platform for vehicular sensor networks in smart cities,” International Journal of Distributed Sensor Networks, vol. 2012, Article ID 860415, 10 pages, 2012.

[6] P. Vlacheas, R. Giaffreda, V. Stavroulaki et al., “Enabling smart cities through a cognitive management framework for the internet of things,” IEEE Communications Magazine, vol. 51, no. 6, pp. 102–111, 2013.

[7] A. Asin, “Smart cities from libelium allows systems integrators to monitor noise, pollution, structural health and waste management,” Smart Cities Articles, 2011.

[8] K. Ashton, “That ‘internet of things’ thing,” RFID Journal, 2011.

[9] P. Magrassi and T. Berg, “A world of smart objects,” Gartner Research Report TR-17-2243, 2002.

[10] Oxford English Dictionary (OED), “The fact of coming from some particular source or quarter; source, derivation,” http://en.wikipedia.org/wiki/Provenance.

[11] A. V. Roth, A. A. Tsay, M. E. Pullman, and J. V. Gray, “Unraveling the food supply chain: strategic insights from China and the 2007 recalls,” Journal of Supply Chain Management, vol. 44, no. 1, pp. 22–39, 2008.

[12] S. Miles, P. Groth, S. Munroe, and L. Moreau, “Prime: a methodology for developing provenance-aware applications,” ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, article 8, 2011.

[13] R. Hasan, R. Sion, and M. Winslett, “The case of the fake Picasso: preventing history forgery with secure provenance,” in Proceedings of the 7th Conference on File and Storage Technologies (FAST ’09), pp. 1–14, New York, NY, USA, December 2009.

[14] T. A. McMeekin, J. Baranyi, J. Bowman et al., “Information systems in food safety management,” International Journal of Food Microbiology, vol. 112, no. 3, pp. 181–194, 2006.

[15] M. A. van der Gaag, F. Vos, H. W. Saatkamp, M. van Boven, P. van Beek, and R. B. M. Huirne, “A state-transition simulation model for the spread of Salmonella in the pork supply chain,” European Journal of Operational Research, vol. 156, no. 3, pp. 782–798, 2004.

[16] L. M. Wein and Y. Liu, “Analyzing a bioterror attack on the food supply: the case of botulinum toxin in milk,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 28, pp. 9984–9989, 2005.

[17] L. Qin and Q. S. Wang, “Food supply chain quality management model and simulation based on game,” in Proceedings of the International Conference on Computer Modeling and Simulation (ICCMS ’09), pp. 291–293, Macau, China, February 2009.

[18] Q. Zhang, D. Wang, T. Huang et al., “Modelling provenance in food supply chain to track and trace foodborne disease,” in Proceedings of the International Conference on Computer Modeling and Simulation, pp. 69–75, Hong Kong, China, February 2012.

[19] S. Li, L. Xu, and X. Wang, “Compressed sensing signal and data acquisition in wireless sensor,” IEEE Transactions on Industrial Informatics, 2012.

[20] Z. Ding and X. Gao, “A database cluster system framework for managing massive sensor sampling data in the internet of things,” Chinese Journal of Computers, vol. 35, no. 6, pp. 1175–1191, 2012.

[21] L. Zhang, J. Liu, and H. Jiang, “Energy-efficient location tracking with smartphones for IoT,” in Proceedings of the IEEE Sensors, pp. 1–4, Taipei, China, October 2012.

[22] H. Abbey, “An examination of the Reed-Frost theory of epidemics,” Human Biology, vol. 24, no. 3, pp. 201–233, 1952.

[23] L. Elveback, J. P. Fox, and A. Varma, “An extension of the Reed-Frost epidemic model for the study of competition between viral agents in the presence of interference,” The American Journal of Epidemiology, vol. 80, no. 3, pp. 356–364, 1964.

[24] X. Yuan, H. Guo, H. Xiao, Z. Wang, and X. Zhang, “High-dimensional data virtualization,” in Proceedings of the Communications of the CCF, pp. 13–16, April 2011.


Page 10: Research Article A Case Study of Sensor Data Collection ...downloads.hindawi.com/journals/ijdsn/2013/382132.pdf · smart cities. Our major contributionsare as follows. We propose

10 International Journal of Distributed Sensor Networks

Probability of infection ()

Trac

ing

accu

racy

()

3 sampling proportion5 sampling proportion

10 sampling proportionDynamic sampling

40 60 8050

60

70

80

90

100

Figure 9 Simulation results of dynamic sampling strategy tracingaccuracy under infection probabilities of 30 67 80 and 90

Probability of infection ()

Trac

ing

accu

racy

()

500 30 60 90

60

70

80

90

100

Figure 10 Simulation results of tracing accuracywith big data underdifferent infection probabilities

Information of these kinds can be directly read from visual-ization images and help manufacturers to design their foodsupply chain more scientifically

Visualization of the contamination condition in IoTsystemmakes the provenance reasoning in food supply chainintuitive and informative

54 Performance Estimation in Real Situations In realityfood supply chain is more complicated A lot of factors suchas specific food type environment temperaturemanufactureprocess and other parameters could make the chain difficultto predict To implement our strategy into real situationsthose factors should be concerned and some parametersshould be adjusted accordingly

The factor that influences the behavior of food supplychain the most is the type of food Different food has its

own characteristics which may dominate the model theprovenance procedure and expecting results Firstly foodtype can decide the possible contamination source In (1)119875exp and 119875imp are related with the virus that spreads amongfood For example avian influenza virus which is a commoninfectious disease among poultry began to be contiguousamong human beings After the mutation infectious abilityof this virus grew significantly As a result 119875exp and 119875impof avian influenza virus in chickens would also increaseSecondly food type is a deciding factor for its storagepattern and quality guarantee period Some canneddrinks arestacked layer by layer separately so they would not got crosscontaminated However raw meat is generally kept togetherwhich provides an easy environment for virus to spreadThirdly the state of food is also dominant in provenance Onepiece of food in solid state can be seen as a unit while liquidfood like yogurt could be ruined by only one deteriorateddrop In this way for yogurt the sampling process couldbe very different as spatial position should be taken intoconsideration and virusrsquo behavior of liquid should also bestudied

Besides food type there are other factors playing impor-tant roles in real situation Food in summer is more likelyto turn rotten than winter some manufacturing factoriesare more hygienic than others with time passing food mayget easier to be infectious and the types or dosages offood addictives may make the contamination process sloweddown

Although different food supply chain can behave var-iously our proposed strategy can cover most of the casesbecause it obeys the general model of food supply networkand epidemiological principles

6 Conclusion

In this paper we present a heuristic approach to tracingcontamination sources in large IoT systems for complicatedfood supply chains which is a critical issue in metropoli-tan life In our approach Self-adaptive Dynamic PartitionSampling (SDPS) Strategy was proposed to collect data forsensors whose input is only a small portion of end marketsamples from huge volume of samples along food supplychains The approach was illustrated with a case study ofIoT system about provenance in food supply chain whichcan efficiently stop the outbreaks of foodborne disease Withthe intelligent SDPS Strategy objects tested by sensors arethe most reasonable portion of the entire products set Theefficiency is highly improved and the accuracy stays almostthe same as sensing all the objects at the same time SDPSkeeps the integrity of information and approaches a nearlyreal-time examination Also we present a tracing algorithmto find the contamination sources of food supply chainsand a backtracing algorithm to provide strategy for recallingproblematical food undiscovered in the chain It is indicatedin simulation results that our SDPS scheme can achieve up tothe tracing accuracy of 978with a smaller average samplingpercentage compared with traditional global random sam-pling We managed to sample a small portion of food only

International Journal of Distributed Sensor Networks 11

Factory batch 4

Farm batch 0ndash9 Vehicle 3 4 5 batch 0ndash9Vehicle 2 batch 0ndash9 Factory batch 0ndash9 Market 1 2 3 batch 0ndash9

Vehicle 1 and 2 batch 0ndash9Vehicle 1 batch 0ndash9

Figure 11 Visualized data flow of infected food products in food supply chain

Factory batch 4

Farm batch 0ndash9 Vehicle 3 4 5 batch 0ndash9Vehicle 2 batch 0ndash9 Factory batch 0ndash9 Market 1 2 3 batch 0ndash9

Vehicle 1 and 2 batch 0ndash9Vehicle 1 batch 0ndash9

Figure 12 Visualized data flow of uninfected food products in food supply chain

in the end market without loss in accuracy of provenancetracing over the whole IoT system In addition our analyticdata and visualized images can clearly model contaminationconditions in food supply chain within the context of IoTsystem This will give the clients an intuitive impression onfood supply networks in a city

In this paper we assume that all provenance informationof food products is hosted by a centralized repository andthese provenance metadata are organized in a uniform man-ner Our future work is to further make practical implemen-tation of the provenance of food supply chain in a communityas our testing bed for megacity management

12 International Journal of Distributed Sensor Networks

Acknowledgments

This paper is sponsored in part by the Shanghai Interna-tional Science and Technology Collaboration Program underGrant 13430710400 and Campus for Research Excellenceand Technological Enterprise (CREATE) program of Singa-pore National Research Foundation under the joint projecton Energy and Environmental Sustainability Solutions forMegacities (R-706-000-101-281) by Shanghai Jiao Tong Uni-versity (SJTU) and National University of Singapore (NUS)ProfessorQiu is partially supported byNSFCNS-1249223 andNSFC 61071061

References

[1] ldquoWikipedia on smart cityrdquo httpenwikipediaorgwikiSmartcity

[2] M Qiu and E H M Sha ldquoCost minimization while satisfy-ing hardsoft timing constraints for heterogeneous embeddedsystemsrdquoACMTransactions on Design Automation of ElectronicSystems vol 14 no 2 article 25 2009

[3] J Li M Qiu Z Ming G Quan X Qin and Z Gu ldquoOnlineoptimization for scheduling preemptable tasks on IaaS cloudsystemsrdquo Journal of Parallel and Distributed Computing vol 72no 5 pp 666ndash677 2012

[4] K Su J Li and H Fu ldquoSmart city and the applicationsrdquoin Proceedings of the International Conference on ElectronicsCommunications and Control (ICECC rsquo2011) pp 1028ndash1033Zhejiang China September 2011

[5] X Tang J Pu K Cao Y Zhang and Z Xiong ldquoIntegratedextensible simulation platform for vehicular sensor networksin smart citiesrdquo International Journal of Distributed SensorNetworks vol 2012 Article ID 860415 10 pages 2012

[6] P Vlacheas R Giaffreda V Stavroulaki et al ldquoEnabling smartcities through a cognitive management framework for theinternet of thingsrdquo IEEE Communications Magazine vol 51 no6 pp 102ndash111 2013

[7] A Asin ldquoSmart cities from libelium allows systems integratorsto monitor noise pollution structural health and waste man-agementrdquo Smart Cities Articles 2011

[8] Kevin Ashton ldquoThat ldquointernet of thingsrdquo thingrdquo RFID Journal2011

[9] P Magrassi and T Berg ldquoA world of smart objectsrdquo GartnerResearch Report TR-17-2243 2002

[10] Oxford English Dictionary (OED) ldquoThe fact of comingfrom some particular source or quarter source derivationrdquohttpenwikipediaorgwikiProvenance

[11] AV Roth AA TsayM E Pullman and J VGray ldquoUnravelingthe food supply chain strategic insights from China and the2007 recallsrdquo Journal of Supply Chain Management vol 44 no1 pp 22ndash39 2008

[12] S Miles P Groth S Munroe and L Moreau ldquoPrime amethodology for developing provenance-aware applicationsrdquoACM Transactions on Software Engineering and Methodologyvol 20 no 3 article 8 2011

[13] R Hasan R Sion and M Winsltt ldquoThe case of the fakePicasso preventing history forgery with secure provenancerdquo inProceedings of the 7th Conference on File the Storage Technologies(FAST rsquo09) pp 1ndash14 New York NY USA December 2009

[14] T A McMeekin J Baranyi J Bowman et al ldquoInformationsystems in food safety managementrdquo International Journal ofFood Microbiology vol 112 no 3 pp 181ndash194 2006

[15] M A van der Gaag F Vos H W Saatkamp M van Boven Pvan Beek and R B M Huirne ldquoA state-transition simulationmodel for the spread of Salmonella in the pork supply chainrdquoEuropean Journal of Operational Research vol 156 no 3 pp782ndash798 2004

[16] L MWein and Y Liu ldquoAnalyzing a bioterror attack on the foodsupply the case of botulinum toxin in milkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 102 no 28 pp 9984ndash9989 2005

[17] L Qin andQ SWang ldquoFood supply chain qualitymanagementmodel and simulation based on gamerdquo in Proceedings of theInternational Conference on Computer Modeling and Simulation(ICCMS rsquo09) pp 291ndash293 Macau China February 2009

[18] Q Zhang D Wang T Huang et al ldquoModelling provenance infood supply chain to track and trace foodborne diseaserdquo in Pro-ceedings of the International Conference on Computer Modelingand Simulation pp 69ndash75 Hong Kong China February 2012

[19] S Li L Xu and X Wang ldquoCompressed sensing signal and dataacquisitio in wireless sensorrdquo IEEE Transactions on IndustrialInformatics 2012

[20] Z Ding and X Gao ldquoA database cluster system frameworkfor managing massive sensor sampling data in the internet ofthingsrdquo Chinese Journal of Computers vol 35 no 6 pp 1175ndash1191 2012

[21] L Zhang J Liu and H Jiang ldquoEnergy-efficient locationtracking with smartphones for IoTrdquo in Proceedings of the IEEESensors pp 1ndash4 Taipei China October 2012

[22] H Abbey ldquoAn examination of the Reed-Frost theory of epi-demicsrdquo Human Biology vol 24 no 3 pp 201ndash233 1952

[23] L Elveback J P Fox and A Varma ldquoAn extension of tee reed-frost epidemicmodel for the study of competition between viralagents in the presence of interferencerdquoThe American Journal ofEpidemiology vol 80 no 3 pp 356ndash364 1964

[24] X Yuan H Guo H Xiao Z Wang and X Zhang ldquoHigh-dimensional data virtualizationrdquo in Proceedings of the Commu-nications of the CCF pp 13ndash16 April 2011

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 11: Research Article A Case Study of Sensor Data Collection ...downloads.hindawi.com/journals/ijdsn/2013/382132.pdf · smart cities. Our major contributionsare as follows. We propose

International Journal of Distributed Sensor Networks 11

Factory batch 4

Farm batch 0ndash9 Vehicle 3 4 5 batch 0ndash9Vehicle 2 batch 0ndash9 Factory batch 0ndash9 Market 1 2 3 batch 0ndash9

Vehicle 1 and 2 batch 0ndash9Vehicle 1 batch 0ndash9

Figure 11 Visualized data flow of infected food products in food supply chain

Factory batch 4

Farm batch 0ndash9 Vehicle 3 4 5 batch 0ndash9Vehicle 2 batch 0ndash9 Factory batch 0ndash9 Market 1 2 3 batch 0ndash9

Vehicle 1 and 2 batch 0ndash9Vehicle 1 batch 0ndash9

Figure 12 Visualized data flow of uninfected food products in food supply chain

in the end market without loss in accuracy of provenancetracing over the whole IoT system In addition our analyticdata and visualized images can clearly model contaminationconditions in food supply chain within the context of IoTsystem This will give the clients an intuitive impression onfood supply networks in a city

In this paper we assume that all provenance informationof food products is hosted by a centralized repository andthese provenance metadata are organized in a uniform man-ner Our future work is to further make practical implemen-tation of the provenance of food supply chain in a communityas our testing bed for megacity management

12 International Journal of Distributed Sensor Networks

Acknowledgments

This paper is sponsored in part by the Shanghai Interna-tional Science and Technology Collaboration Program underGrant 13430710400 and Campus for Research Excellenceand Technological Enterprise (CREATE) program of Singa-pore National Research Foundation under the joint projecton Energy and Environmental Sustainability Solutions forMegacities (R-706-000-101-281) by Shanghai Jiao Tong Uni-versity (SJTU) and National University of Singapore (NUS)ProfessorQiu is partially supported byNSFCNS-1249223 andNSFC 61071061

References

[1] ldquoWikipedia on smart cityrdquo httpenwikipediaorgwikiSmartcity

[2] M Qiu and E H M Sha ldquoCost minimization while satisfy-ing hardsoft timing constraints for heterogeneous embeddedsystemsrdquoACMTransactions on Design Automation of ElectronicSystems vol 14 no 2 article 25 2009

[3] J Li M Qiu Z Ming G Quan X Qin and Z Gu ldquoOnlineoptimization for scheduling preemptable tasks on IaaS cloudsystemsrdquo Journal of Parallel and Distributed Computing vol 72no 5 pp 666ndash677 2012

[4] K Su J Li and H Fu ldquoSmart city and the applicationsrdquoin Proceedings of the International Conference on ElectronicsCommunications and Control (ICECC rsquo2011) pp 1028ndash1033Zhejiang China September 2011

[5] X Tang J Pu K Cao Y Zhang and Z Xiong ldquoIntegratedextensible simulation platform for vehicular sensor networksin smart citiesrdquo International Journal of Distributed SensorNetworks vol 2012 Article ID 860415 10 pages 2012

[6] P Vlacheas R Giaffreda V Stavroulaki et al ldquoEnabling smartcities through a cognitive management framework for theinternet of thingsrdquo IEEE Communications Magazine vol 51 no6 pp 102ndash111 2013

[7] A Asin ldquoSmart cities from libelium allows systems integratorsto monitor noise pollution structural health and waste man-agementrdquo Smart Cities Articles 2011

[8] Kevin Ashton ldquoThat ldquointernet of thingsrdquo thingrdquo RFID Journal2011

[9] P Magrassi and T Berg ldquoA world of smart objectsrdquo GartnerResearch Report TR-17-2243 2002

[10] Oxford English Dictionary (OED) ldquoThe fact of comingfrom some particular source or quarter source derivationrdquohttpenwikipediaorgwikiProvenance

[11] AV Roth AA TsayM E Pullman and J VGray ldquoUnravelingthe food supply chain strategic insights from China and the2007 recallsrdquo Journal of Supply Chain Management vol 44 no1 pp 22ndash39 2008

[12] S Miles P Groth S Munroe and L Moreau ldquoPrime amethodology for developing provenance-aware applicationsrdquoACM Transactions on Software Engineering and Methodologyvol 20 no 3 article 8 2011

[13] R Hasan R Sion and M Winsltt ldquoThe case of the fakePicasso preventing history forgery with secure provenancerdquo inProceedings of the 7th Conference on File the Storage Technologies(FAST rsquo09) pp 1ndash14 New York NY USA December 2009

[14] T A McMeekin J Baranyi J Bowman et al ldquoInformationsystems in food safety managementrdquo International Journal ofFood Microbiology vol 112 no 3 pp 181ndash194 2006

[15] M A van der Gaag F Vos H W Saatkamp M van Boven Pvan Beek and R B M Huirne ldquoA state-transition simulationmodel for the spread of Salmonella in the pork supply chainrdquoEuropean Journal of Operational Research vol 156 no 3 pp782ndash798 2004

[16] L MWein and Y Liu ldquoAnalyzing a bioterror attack on the foodsupply the case of botulinum toxin in milkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 102 no 28 pp 9984ndash9989 2005

[17] L Qin andQ SWang ldquoFood supply chain qualitymanagementmodel and simulation based on gamerdquo in Proceedings of theInternational Conference on Computer Modeling and Simulation(ICCMS rsquo09) pp 291ndash293 Macau China February 2009

[18] Q Zhang D Wang T Huang et al ldquoModelling provenance infood supply chain to track and trace foodborne diseaserdquo in Pro-ceedings of the International Conference on Computer Modelingand Simulation pp 69ndash75 Hong Kong China February 2012

[19] S Li L Xu and X Wang ldquoCompressed sensing signal and dataacquisitio in wireless sensorrdquo IEEE Transactions on IndustrialInformatics 2012

[20] Z Ding and X Gao ldquoA database cluster system frameworkfor managing massive sensor sampling data in the internet ofthingsrdquo Chinese Journal of Computers vol 35 no 6 pp 1175ndash1191 2012

[21] L Zhang J Liu and H Jiang ldquoEnergy-efficient locationtracking with smartphones for IoTrdquo in Proceedings of the IEEESensors pp 1ndash4 Taipei China October 2012

[22] H Abbey ldquoAn examination of the Reed-Frost theory of epi-demicsrdquo Human Biology vol 24 no 3 pp 201ndash233 1952

[23] L Elveback J P Fox and A Varma ldquoAn extension of tee reed-frost epidemicmodel for the study of competition between viralagents in the presence of interferencerdquoThe American Journal ofEpidemiology vol 80 no 3 pp 356ndash364 1964

[24] X Yuan H Guo H Xiao Z Wang and X Zhang ldquoHigh-dimensional data virtualizationrdquo in Proceedings of the Commu-nications of the CCF pp 13ndash16 April 2011


Acknowledgments

This paper is sponsored in part by the Shanghai International Science and Technology Collaboration Program under Grant 13430710400 and the Campus for Research Excellence and Technological Enterprise (CREATE) program of the Singapore National Research Foundation under the joint project on Energy and Environmental Sustainability Solutions for Megacities (R-706-000-101-281) by Shanghai Jiao Tong University (SJTU) and National University of Singapore (NUS). Professor Qiu is partially supported by NSF CNS-1249223 and NSFC 61071061.

References

[1] “Wikipedia on smart city,” http://en.wikipedia.org/wiki/Smart_city.

[2] M. Qiu and E. H. M. Sha, “Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems,” ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 2, article 25, 2009.

[3] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, “Online optimization for scheduling preemptable tasks on IaaS cloud systems,” Journal of Parallel and Distributed Computing, vol. 72, no. 5, pp. 666–677, 2012.

[4] K. Su, J. Li, and H. Fu, “Smart city and the applications,” in Proceedings of the International Conference on Electronics, Communications and Control (ICECC 2011), pp. 1028–1033, Zhejiang, China, September 2011.

[5] X. Tang, J. Pu, K. Cao, Y. Zhang, and Z. Xiong, “Integrated extensible simulation platform for vehicular sensor networks in smart cities,” International Journal of Distributed Sensor Networks, vol. 2012, Article ID 860415, 10 pages, 2012.

[6] P. Vlacheas, R. Giaffreda, V. Stavroulaki, et al., “Enabling smart cities through a cognitive management framework for the internet of things,” IEEE Communications Magazine, vol. 51, no. 6, pp. 102–111, 2013.

[7] A. Asin, “Smart cities from Libelium allows systems integrators to monitor noise, pollution, structural health and waste management,” Smart Cities Articles, 2011.

[8] K. Ashton, “That ‘internet of things’ thing,” RFID Journal, 2011.

[9] P. Magrassi and T. Berg, “A world of smart objects,” Gartner Research Report TR-17-2243, 2002.

[10] Oxford English Dictionary (OED), “The fact of coming from some particular source or quarter; source, derivation,” http://en.wikipedia.org/wiki/Provenance.

[11] A. V. Roth, A. A. Tsay, M. E. Pullman, and J. V. Gray, “Unraveling the food supply chain: strategic insights from China and the 2007 recalls,” Journal of Supply Chain Management, vol. 44, no. 1, pp. 22–39, 2008.

[12] S. Miles, P. Groth, S. Munroe, and L. Moreau, “Prime: a methodology for developing provenance-aware applications,” ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, article 8, 2011.

[13] R. Hasan, R. Sion, and M. Winslett, “The case of the fake Picasso: preventing history forgery with secure provenance,” in Proceedings of the 7th Conference on File and Storage Technologies (FAST ’09), pp. 1–14, New York, NY, USA, December 2009.

[14] T. A. McMeekin, J. Baranyi, J. Bowman, et al., “Information systems in food safety management,” International Journal of Food Microbiology, vol. 112, no. 3, pp. 181–194, 2006.

[15] M. A. van der Gaag, F. Vos, H. W. Saatkamp, M. van Boven, P. van Beek, and R. B. M. Huirne, “A state-transition simulation model for the spread of Salmonella in the pork supply chain,” European Journal of Operational Research, vol. 156, no. 3, pp. 782–798, 2004.

[16] L. M. Wein and Y. Liu, “Analyzing a bioterror attack on the food supply: the case of botulinum toxin in milk,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 28, pp. 9984–9989, 2005.

[17] L. Qin and Q. S. Wang, “Food supply chain quality management model and simulation based on game,” in Proceedings of the International Conference on Computer Modeling and Simulation (ICCMS ’09), pp. 291–293, Macau, China, February 2009.

[18] Q. Zhang, D. Wang, T. Huang, et al., “Modelling provenance in food supply chain to track and trace foodborne disease,” in Proceedings of the International Conference on Computer Modeling and Simulation, pp. 69–75, Hong Kong, China, February 2012.

[19] S. Li, L. Xu, and X. Wang, “Compressed sensing signal and data acquisition in wireless sensor,” IEEE Transactions on Industrial Informatics, 2012.

[20] Z. Ding and X. Gao, “A database cluster system framework for managing massive sensor sampling data in the internet of things,” Chinese Journal of Computers, vol. 35, no. 6, pp. 1175–1191, 2012.

[21] L. Zhang, J. Liu, and H. Jiang, “Energy-efficient location tracking with smartphones for IoT,” in Proceedings of the IEEE Sensors, pp. 1–4, Taipei, China, October 2012.

[22] H. Abbey, “An examination of the Reed-Frost theory of epidemics,” Human Biology, vol. 24, no. 3, pp. 201–233, 1952.

[23] L. Elveback, J. P. Fox, and A. Varma, “An extension of the Reed-Frost epidemic model for the study of competition between viral agents in the presence of interference,” The American Journal of Epidemiology, vol. 80, no. 3, pp. 356–364, 1964.

[24] X. Yuan, H. Guo, H. Xiao, Z. Wang, and X. Zhang, “High-dimensional data virtualization,” in Proceedings of the Communications of the CCF, pp. 13–16, April 2011.
