protectionofsensitivedatainindustrialinternetbasedon three...

16
Research Article Protection of Sensitive Data in Industrial Internet Based on Three-Layer Local/Fog/Cloud Storage Jing Liu , 1,2 Changbo Yuan, 1 Yingxu Lai , 1,3 and Hua Qin 4 1 College of Computer Science, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China 2 Shaanxi Key Laboratory of Network and System Security, Xidian University, Xian 710071, China 3 Science and Technology on Information Assurance Laboratory, Beijing 100072, China 4 Information Technology Support Center, Beijing University of Technology, Beijing 100124, China Correspondence should be addressed to Yingxu Lai; [email protected] Received 27 October 2019; Revised 3 February 2020; Accepted 5 February 2020; Published 2 April 2020 Academic Editor: Prosanta Gope Copyright©2020JingLiuetal.isisanopenaccessarticledistributedundertheCreativeCommonsAttributionLicense,which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Industrial Internet technology has developed rapidly, and the security of industrial data has received much attention. At present, industrial enterprises lack a safe and professional data security system. us, industries urgently need a complete and effective data protection scheme. is study develops a three-layer framework with local/fog/cloud storage for protecting sensitive industrial data and defines a threat model. For real-time sensitive industrial data, we use the improved local differential privacy algorithm M-RAPPOR to perturb sensitive information. We encode the desensitized data using Reed–Solomon (RS) encoding and then store them in local equipment to realize low cost, high efficiency, and intelligent data protection. For non-real-time sensitive industrial data, we adopt a cloud-fog collaborative storage scheme based on AES-RS encoding to invisibly provide multilayer protection. We adopt the optimal solution of distributed storage in local equipment and the cloud-fog collaborative storage scheme in fog nodes and cloud nodes to alleviate the storage pressure on local equipment and to improve security and re- coverability. According to the defined threat model, we conduct a security analysis and prove that the proposed scheme can provide stronger data protection for sensitive data. Compared with traditional methods, this approach strengthens the protection of sensitive information and ensures real-time continuity of open data sharing. Finally, the feasibility of our scheme is validated through experimental evaluation. 1. Introduction Intelligent manufacturing, which consists of a man-machine integrated intelligent system with intelligent machines and human experts, is an inevitable trend in the continuing development of the global manufacturing industry [1]. In recent years, with the rapid development of Industrial In- ternet technology, industrial data protection has attracted much research interest [2]. In the industrial production process, a large amount of sensitive data is generated, in- cluding data from the manufacturing process of the pro- duction line, product cost information, operations data, operations information, marketing strategy, intellectual property rights, and customer data. If this sensitive data is leaked, it may lead to significant business information loss or even affect the reputation of an enterprise. In recent years, information leakage incidents have occurred repeatedly. For example, in 2017, Equifax (Atlanta, US) announced that a data breach had occurred. Recently, UpGuard (Sydney, Australia), a cybersecurity company, reported that re- searchers found more than 540 million records on Amazon’s S3 server, including Facebook user information such as comments, responses, and account names [3]. erefore, in the context of the Industrial Internet, it is a major challenge to ensure that sensitive information is not leaked. In actual industrial scenarios, some industrial data must be processed in real time, such as predictive maintenance data in current intelligent factories. If computer numerical control (CNC) machine tools fail, delays in the product production cycle as well as economic losses could occur. erefore, to avoid equipment failure, factories should perform predictive maintenance work such as collecting Hindawi Security and Communication Networks Volume 2020, Article ID 2017930, 16 pages https://doi.org/10.1155/2020/2017930

Upload: others

Post on 09-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

Research ArticleProtection of Sensitive Data in Industrial Internet Based onThree-Layer LocalFogCloud Storage

Jing Liu 12 Changbo Yuan1 Yingxu Lai 13 and Hua Qin4

1College of Computer Science Faculty of Information Technology Beijing University of Technology Beijing 100124 China2Shaanxi Key Laboratory of Network and System Security Xidian University Xian 710071 China3Science and Technology on Information Assurance Laboratory Beijing 100072 China4Information Technology Support Center Beijing University of Technology Beijing 100124 China

Correspondence should be addressed to Yingxu Lai laiyingxubjuteducn

Received 27 October 2019 Revised 3 February 2020 Accepted 5 February 2020 Published 2 April 2020

Academic Editor Prosanta Gope

Copyright copy 2020 Jing Liu et al-is is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Industrial Internet technology has developed rapidly and the security of industrial data has received much attention At presentindustrial enterprises lack a safe and professional data security system-us industries urgently need a complete and effective dataprotection scheme -is study develops a three-layer framework with localfogcloud storage for protecting sensitive industrialdata and defines a threat model For real-time sensitive industrial data we use the improved local differential privacy algorithmM-RAPPOR to perturb sensitive information We encode the desensitized data using ReedndashSolomon (RS) encoding and thenstore them in local equipment to realize low cost high efficiency and intelligent data protection For non-real-time sensitiveindustrial data we adopt a cloud-fog collaborative storage scheme based on AES-RS encoding to invisibly provide multilayerprotection We adopt the optimal solution of distributed storage in local equipment and the cloud-fog collaborative storagescheme in fog nodes and cloud nodes to alleviate the storage pressure on local equipment and to improve security and re-coverability According to the defined threat model we conduct a security analysis and prove that the proposed scheme canprovide stronger data protection for sensitive data Compared with traditional methods this approach strengthens the protectionof sensitive information and ensures real-time continuity of open data sharing Finally the feasibility of our scheme is validatedthrough experimental evaluation

1 Introduction

Intelligent manufacturing which consists of a man-machineintegrated intelligent system with intelligent machines andhuman experts is an inevitable trend in the continuingdevelopment of the global manufacturing industry [1] Inrecent years with the rapid development of Industrial In-ternet technology industrial data protection has attractedmuch research interest [2] In the industrial productionprocess a large amount of sensitive data is generated in-cluding data from the manufacturing process of the pro-duction line product cost information operations dataoperations information marketing strategy intellectualproperty rights and customer data If this sensitive data isleaked it may lead to significant business information loss oreven affect the reputation of an enterprise In recent years

information leakage incidents have occurred repeatedly Forexample in 2017 Equifax (Atlanta US) announced that adata breach had occurred Recently UpGuard (SydneyAustralia) a cybersecurity company reported that re-searchers foundmore than 540 million records on AmazonrsquosS3 server including Facebook user information such ascomments responses and account names [3] -erefore inthe context of the Industrial Internet it is a major challengeto ensure that sensitive information is not leaked

In actual industrial scenarios some industrial data mustbe processed in real time such as predictive maintenancedata in current intelligent factories If computer numericalcontrol (CNC) machine tools fail delays in the productproduction cycle as well as economic losses could occur-erefore to avoid equipment failure factories shouldperform predictive maintenance work such as collecting

HindawiSecurity and Communication NetworksVolume 2020 Article ID 2017930 16 pageshttpsdoiorg10115520202017930

sensor data in real time predicting possible situations andadjusting the equipment according to the predicted situationin a timely manner Sensor information such as node lo-cations and the spatial coordinates of the sensors themselvesalso constitute sensitive data We label this as real-timesensitive data To prevent the leakage of sensitive infor-mation and to ensure data security a secure storage methodis needed to store real-time sensitive data includingequipment locations safely and without affecting dataavailability

In addition a large amount of sensitive data is notprocessed in real time including a companyrsquos financial datainventory data production data marketing plans customerinformation intellectual property rights and supplier in-formation [4] We label this as non-real-time sensitive data-is data is usually stored with third-party cloud serviceproviders who may claim that the data is encrypted butcannot verify the encryption To prevent data leakage wecannot store the non-real-time sensitive data in the cloud-us for non-real-time sensitive data it is necessary todesign a data protection scheme that can fully utilize cloudstorage and ensure data security

-e recent development of cloud-fog collaborativecomputing provides a new approach to solve this problemCloud computing combines relatively independent com-puting technology and network technology superimposesdistributed computing power on a converged networkplatform and effectively integrates network and computingresources by virtualization technology which is a majorbreakthrough in IT technology Fog computing is an ex-tension and a powerful complement of cloud computingFigure 1 shows the architecture of cloud-fog computing Itincludes the factory terminal equipment layer fog com-puting layer and cloud computing layer -e main task ofthe factory terminal equipment layer is to collect data andupload it to the fog -e fog computing layer the middlelayer in this architecture plays an important role betweenthe cloud computing layer and the fog nodes -e fog nodeshave a given storage capacity and computing capacity -eintroduction of fog computing can reduce the cloud com-puting layer and improve work efficiency -e cloud com-puting layer has a high storage capacity and computingpower Because non-real-time processed data cannot becompletely stored in the cloud we adopt the cloud-fogcollaborative storage scheme In this study cloud-fog col-laborative storage means that the cloud nodes and the fognodes store corresponding data according to the storagestrategy that is some data is stored in the fog nodes andother data is stored in the cloud nodes

To solve the problem of secure storage for real-time andnon-real-time sensitive industrial data this study designs athree-layer protection framework with localfogcloudstorage for sensitive industrial data For real-time sensitiveindustrial data we use the improved local differential pri-vacy algorithmM-RAPPOR to perturb sensitive information(location information etc) and then encode the maskeddata in local equipment In addition we adopt the optimalscheme of distributed storage in local equipment to realizelow-cost high-efficiency data protection For non-real-time

sensitive industrial data we adopt cloud-fog collaborativestorage to achieve multilevel protection -is frameworkgreatly improves security and restorability and relieves thestorage pressure of local equipment

-e remainder of this paper is organized as followsSection 2 reviews related works Section 3 describes theprotection framework for sensitive data and defines thethreat model Section 4 presents experiments for two dif-ferent schemes and evaluates the results through severalindicators Finally Section 5 presents our conclusions

2 Related Work

With the continuous developments in information tech-nology in the era of big data the problem of data security hasattracted increasing attention How to ensure that sensitivedata is not leaked is a major challenge at present Cloud-fogcollaboration is a new solution to meet the current needs ofInternet of -ings (IoT) applications However there is arisk of sensitive-data leakage in both cloud computing andfog computing At present therefore data security hasemerged as an important issue for both academia and in-dustry As a result experts and researchers have proposedtheir own opinions and data protection programs -issection introduces the current research results from twoaspectsmdashthe protection of sensitive data and cloud-fogcollaborative computing

21 Protection of Sensitive Data

211 Protection of Sensitive Data Based on DifferentialPrivacy Dwork and Lei [5] proposed differential privacytechniques as a privacy protection model that defines an

Fog nodes

Fog server

Cloud

Fog

Factory 1 Factory 2 Factory N Factory 1 Factory N

Figure 1 Architecture of cloud-fog computing

2 Security and Communication Networks

extremely rigorous attack model that operates withoutregard on how much background knowledge the attackerhas Kenthapadi et al [6] used differential privacy to addresssensitive-data protection issues for collective user behaviorMohammed et al [7] proposed a differential privacy algo-rithm for private data distribution and vertical partitiondata In a scenario where third-party data collectors are nottrusted Duchi et al [8] proposed local differential privacywhere every user understands their privacy protectionprocess and can individually process and protect personalsensitive information Erlingsson [9] developed the Ran-domized Aggregatable Privacy-Preserving Ordinal Response(RAPPOR) privacy algorithm that enables Google to collectuser answers to questions such as a browserrsquos default homepage and default search engine to understand the unwel-come or malicious hijacking of user settings Gu et al [10]proposed a privacy protection mining (DIFF-PPM) algo-rithm combined with the Markov chain Monte Carlo al-gorithm to provide privacy protection and maintain dataavailability when the (ε δ)-differential privacy conditions aremet

Most of these studies have aimed to protect usersrsquopersonal sensitive data and they have not applied differentialprivacy to the industrial field In addition the weakness ofdifferential privacy is obvious because the assumptionsabout background knowledge are too strong a large amountof randomization needs to be added to the query resultsresulting in a sharp decline in data availability -ereforehow to balance the level of sensitive-data protection and dataavailability according to specific application scenarios re-mains unclear and challenging in research on differentialprivacy applications

212 Sensitive Data Protection Based on Fog ComputingXu et al [11] introduced local differential privacy into fogcomputing and proposed a framework for local differentialprivacy protection however they did not mention specificimplementation details and privacy algorithms Wang et al[12] proposed a three-layer privacy protection frameworkhowever they did not consider the problem of localequipment failure in which case users would not be able toquery complete data Du et al [13] proposed a fog com-puting support data center query model based on differentialprivacy and proved through rigorous mathematical de-duction that it ensured the reliability and effectiveness ofprivacy protection Lyu et al [14] proposed the PPFA pri-vacy protection aggregation system that uses the stability of aGauss mechanism to ensure the differential privacy of thestatistical results and reduces the loss of privacy by com-bining a stream cipher with a public key cipher to maintainpracticability Huang et al [15] proposed an attribute-basedencryption scheme that makes full use of fog servers -ecollected data is encrypted by fog servers and then out-sourced to cloud servers -e experimental results show thatthe functions of the scheme are limited because it does notsupport efficient data search and the fog server takes up alarge part of the workload Lu et al [16] proposed theLightweight Privacy-preserving Data Aggregation (LPDA)

scheme Experiments indicated that this scheme was highlyefficient in terms of computational cost and communicationoverhead Wang et al [17] proposed a fog computing querymodel based on differential privacy Compared with thetraditional privacy protection model this model affordsdifferent degrees of improvement in computational over-head execution efficiency and energy consumption Kul-karni et al [18] proposed a privacy protection framework toprotect the data security of sensors between the terminalequipment and fog networks -is scheme introduced aselective public key cryptosystem that can resist internalattacks However they did not consider the computationaloverhead and the fog nodes may overload and collapse if thesensor transmits data for a long time

-e above studies introduced fog computing to protectsensitive data however most of them ignored the problemof the limited computational power and storage capacity offog nodes Many data protection schemes based on fogcomputing do a lot of work on the fog side this directly leadsto increased computational overhead and energy con-sumption resulting in jitter or even downtime of fog nodes

213 Protection of Sensitive Data in Cloud ComputingIn recent years data security in cloud storage has attractedattention in industry and academia Many experts and re-searchers have proposed schemes to address this issue Houet al [19] proposed protection schemes that use encrypteddata by the system when data has been transferred by SSLFeng [20] proposed a method in which data is encrypted in aclosed cloud environment to solve the problem of a dataleakage caused by the increased burden of the cloud serverIn addition one-time encryption and multipoint securestorage can be realized However encryption makes it moredifficult to search in the cloud At present a key issue in thefield of cloud computing is searchable encryption Fu et al[21ndash23] and Xia et al [24] provided different solutions to thisproblem that improved accuracy safety and efficiencyKulkarni et al [25] designed a virtual private storage servicebased on a newly developed encryption technology -isservice achieved the best combination of private cloud se-curity and public cloud functionality Wang et al [26]proposed that users do not have actual ownership of out-sourced data this makes data integrity protection in cloudcomputing a difficult task Shen et al [27] proposed anefficient public auditing protocol that includes a doubly-linked information table and a location array Experimentsshow that this protocol can reduce computational andcommunication overheads and achieve certain efficiencies

-e above studies improved the use of cloud storage forthe protection of sensitive data However they suffer from acommon problem they cannot provide a defense againstinternal attacks from the cloud

22 Cloud-Fog Collaborative Computing At present theacademic community mainly solves the problem of taskscheduling through cloud-fog coordination Pham and Huh[28] considered task scheduling in the cloud-fog collabo-ration computing system they proposed a task scheduling

Security and Communication Networks 3

algorithm that guarantees the performance of applicationexecution However they did not consider the fog providerrsquosbudget Deng et al [29] studied the balance between powerconsumption and delay in cloud-fog collaborative com-puting systems -ey first modeled the power consumptionand delay functions of various parts of the system andformulated allocation problems according to the workload-en they developed an approximate solution to theoriginal problem by decomposition and separately formu-lated three subproblems of the corresponding subsystemExperiments showed that fog computing could reducecommunication delay and significantly improve cloudcomputing performance Bierzynski et al [30] believed thatcloud-fog collaboration can solve most problems in IoT anddiscussed four possible ways to distribute workloads be-tween different levels

-e above studies solved the task scheduling problemthrough cloud-fog collaboration however they did notconsider network security and data protection issues Inaddition simply relying on fog computing or cloud com-puting cannot provide more secure protection for industrialdata

In summary the studies discussed in this section solvedthe data protection problem and task scheduling problem tosome extent However for industrial applications with botha large amount of data and a high degree of data protectiontheir results cannot meet the requirements of the IndustrialInternet update speed high data accuracy and large datavolume -us far academic and industrial research on theprotection of sensitive data in the Industrial Internet of-ings (IIoT) remains in its infancy In this study wepropose a protection model for sensitive data that provideshigher protection and is more adaptable to industrialenvironments

3 Sensitive Industrial Data Protection Scheme

Section 31 introduces our protection framework for sen-sitive industrial data and the threat model Sections 32 and33 describe the protection of data for real-time and non-real-time processing respectively

31 Our Work and Contribution

311 Sensitive Industrial Data Protection Framework-is study designs a three-layer protection framework withlocalfogcloud storage for sensitive industrial data asshown in Figure 2 -is framework mainly consists of threemodules M-RAPPOR disturbance module ReedndashSolomon(RS) encoding module and Advanced Encryption Standard(AES) encryption and RS encoding module -is frameworknot only achieves low-cost high-efficiency and intelligentdata protection but also alleviates the storage pressure oflocal equipment through improved recoverability

We divide industrial data into real-time and non-real-time sensitive data depending on its characteristics

(1) For real-time sensitive data we design a data pro-tection scheme based on local differential privacy

and RS encoding We apply the M-RAPPOR algo-rithm to the sensitive data (location information)apply RS encoding to the perturbed data and theundisturbed nonsensitive data and store them inlocal equipment Considering the storage capacity oflocal equipment and the problem that data cannot berecovered owing to local equipment failures weadopt the optimal solution of distributed storage inthe local equipment and add corresponding re-strictions to the RS encoding which not only solvesthe data recovery problem due to the failure of localequipment but also improves the encoding efficiencyand reduces the computational cost If we need toperform predictive maintenance analysis we candecode the data and then analyze it

(2) For non-real-time sensitive data we design a dataprotection scheme based on AES encryption and RSencoding -e local equipment encrypts the non-real-time data using AES uploads the ciphertext tothe fog nodes and performs RS encoding of the dataon the fog nodes Next the encoded data is stored inthe fog nodes and the cloud nodes according to thecloud-fog collaborative storage strategy In this wayif an attempt is made to steal the data from the fognodes or the cloud nodes all the data cannot betaken

312 6reat Model Figure 3 shows the threat model that isdefined according to the data protection framework

-ere are two threats to industrial data

(1) For real-time sensitive data attackers can attack thelocal data center -ey can steal location informationof equipment to infer the general plant layout -elayout is a type of sensitive industrial data andtherefore this attack will lead to the leakage of suchdata

(2) For non-real-time sensitive data attackers (internalpersonnel of the third-party) can abuse data miningto infer customersrsquo private information and thefactoryrsquos business strategy from the factoryrsquos cus-tomer information and marketing plans respec-tively -is will not only threaten customersrsquo privacybut also affect the factoryrsquos overall interests

32 Protection of Real-Time Industrial Data

321 Local Differential Privacy Local differential privacyconsiders scenarios in which third-party data collectors areuntrustworthy Each user perturbs the data according to theprivacy algorithm and then uploads the perturbed data to thedata collector Local differential privacy is formally definedas follows

Definition 1 -ere are n users where each user correspondsto a record A privacy algorithm M is given having a domainDom(M) and a range Ran(M) If algorithm M obtains thesame output result T(TsubeRan(M)) on any two records t and

4 Security and Communication Networks

tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy

Pr[M(t) T]le eε

times Pr M tprime( 1113857 T1113858 1113859 (1)

As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the

AES encryption+RS encoding

Real-timeindustrial

sensitive data

Non-real-timeindustrial

sensitive data

M-RAPPOR disturbance

Sensitivedata

(locationinformation)

Non-sensitive data

Industrialcloud

Fog nodes

Cloud

Fog

Localequipment

LocalRSencoding

Data

Figure 2 Protection framework for sensitive industrial data

General plantlayout

Location data

1 Customer private information

2 Factory business strategy

Sensor Sensor Sensor Sensor SensorSensor

Local data center

Fog

Cloud

Local equipmentLocal equipment

Figure 3 -reat model of sensitive industrial data

Security and Communication Networks 5

output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology

322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values

323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below

We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data

-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment

(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps

(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy

(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process

lowast =

C

B

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

0 0 0 0 1

B11 B13 B14 B15B12 D1

D2

D3

C1

C2

C3

C4

C5

C1

C2

C3

C4

C5

B11 B13 B14 B15B12

B11 B13 B14 B15B12

Figure 4 RS encoding process

6 Security and Communication Networks

324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is

Zi b times Mi

M1 + M2 + middot middot middot + Mn

(2)

To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality

b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

lemlt b (3)

In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is

m b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

(4)

325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability

-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made

(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection

(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding

Masked data

Sensitive data(location space

coordinates)

Nonsensitivedata

Predictivemaintenance

analysis

DesensitizationdataAnalysis result

M-RAPPOR

RS-encoding

Local

Data processing Storage

Industrialproduction

line

Storage

RS decoding

Figure 5 Protection process for real-time data

Security and Communication Networks 7

attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased

-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)

(1) Data Sender

(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1

(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value

P Biprime x( 1113857

f1 x 1

f2 x 0

1 minus f1 minus f2 x Bi

⎧⎪⎪⎨

⎪⎪⎩(5)

To satisfy ε-local differential privacy the privacy budget εis calculated

ε ln1 minus f2

f1 (6)

Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once

(2) Data Collector

(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime

(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows

ni N f2 minus 2f1f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2+

Di f1 minus 1 + f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2

(7)

(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value

In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased

326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data

According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)

attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons

(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation

(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists

(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni

(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location

In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data

33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes

331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment

8 Security and Communication Networks

-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection

332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967

represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc

Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud

nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as

Ttotal Tcloud + Tfog (8)

where Tcloud is the storage time required for the cloud andTfog that for the fog

-e storage time required for each cloud node isexpressed as

Tck tck middot (k minus 1) middotSck

1113936nj1 Scj

(9)

where Tck is the storage time required for cloud node Nck

and tckis the storage time required when cloud node Nck

stores each data block-erefore the storage time Tcloud required for the cloud

is expressed as

Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)

In addition we analyze the two storage methods for thefog nodes as follows

(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as

AES

DATA

Some data

RS encoding

Other data

Local

Equipment 1 Equipment 2 hellip Equipment n

Fog

Cloud

Figure 6 Protection process for data that does not require real-time processing

Security and Communication Networks 9

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 2: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

sensor data in real time predicting possible situations andadjusting the equipment according to the predicted situationin a timely manner Sensor information such as node lo-cations and the spatial coordinates of the sensors themselvesalso constitute sensitive data We label this as real-timesensitive data To prevent the leakage of sensitive infor-mation and to ensure data security a secure storage methodis needed to store real-time sensitive data includingequipment locations safely and without affecting dataavailability

In addition a large amount of sensitive data is notprocessed in real time including a companyrsquos financial datainventory data production data marketing plans customerinformation intellectual property rights and supplier in-formation [4] We label this as non-real-time sensitive data-is data is usually stored with third-party cloud serviceproviders who may claim that the data is encrypted butcannot verify the encryption To prevent data leakage wecannot store the non-real-time sensitive data in the cloud-us for non-real-time sensitive data it is necessary todesign a data protection scheme that can fully utilize cloudstorage and ensure data security

-e recent development of cloud-fog collaborativecomputing provides a new approach to solve this problemCloud computing combines relatively independent com-puting technology and network technology superimposesdistributed computing power on a converged networkplatform and effectively integrates network and computingresources by virtualization technology which is a majorbreakthrough in IT technology Fog computing is an ex-tension and a powerful complement of cloud computingFigure 1 shows the architecture of cloud-fog computing Itincludes the factory terminal equipment layer fog com-puting layer and cloud computing layer -e main task ofthe factory terminal equipment layer is to collect data andupload it to the fog -e fog computing layer the middlelayer in this architecture plays an important role betweenthe cloud computing layer and the fog nodes -e fog nodeshave a given storage capacity and computing capacity -eintroduction of fog computing can reduce the cloud com-puting layer and improve work efficiency -e cloud com-puting layer has a high storage capacity and computingpower Because non-real-time processed data cannot becompletely stored in the cloud we adopt the cloud-fogcollaborative storage scheme In this study cloud-fog col-laborative storage means that the cloud nodes and the fognodes store corresponding data according to the storagestrategy that is some data is stored in the fog nodes andother data is stored in the cloud nodes

To solve the problem of secure storage for real-time andnon-real-time sensitive industrial data this study designs athree-layer protection framework with localfogcloudstorage for sensitive industrial data For real-time sensitiveindustrial data we use the improved local differential pri-vacy algorithmM-RAPPOR to perturb sensitive information(location information etc) and then encode the maskeddata in local equipment In addition we adopt the optimalscheme of distributed storage in local equipment to realizelow-cost high-efficiency data protection For non-real-time

sensitive industrial data we adopt cloud-fog collaborativestorage to achieve multilevel protection -is frameworkgreatly improves security and restorability and relieves thestorage pressure of local equipment

-e remainder of this paper is organized as followsSection 2 reviews related works Section 3 describes theprotection framework for sensitive data and defines thethreat model Section 4 presents experiments for two dif-ferent schemes and evaluates the results through severalindicators Finally Section 5 presents our conclusions

2 Related Work

With the continuous developments in information tech-nology in the era of big data the problem of data security hasattracted increasing attention How to ensure that sensitivedata is not leaked is a major challenge at present Cloud-fogcollaboration is a new solution to meet the current needs ofInternet of -ings (IoT) applications However there is arisk of sensitive-data leakage in both cloud computing andfog computing At present therefore data security hasemerged as an important issue for both academia and in-dustry As a result experts and researchers have proposedtheir own opinions and data protection programs -issection introduces the current research results from twoaspectsmdashthe protection of sensitive data and cloud-fogcollaborative computing

21 Protection of Sensitive Data

211 Protection of Sensitive Data Based on DifferentialPrivacy Dwork and Lei [5] proposed differential privacytechniques as a privacy protection model that defines an

Fog nodes

Fog server

Cloud

Fog

Factory 1 Factory 2 Factory N Factory 1 Factory N

Figure 1 Architecture of cloud-fog computing

2 Security and Communication Networks

extremely rigorous attack model that operates withoutregard on how much background knowledge the attackerhas Kenthapadi et al [6] used differential privacy to addresssensitive-data protection issues for collective user behaviorMohammed et al [7] proposed a differential privacy algo-rithm for private data distribution and vertical partitiondata In a scenario where third-party data collectors are nottrusted Duchi et al [8] proposed local differential privacywhere every user understands their privacy protectionprocess and can individually process and protect personalsensitive information Erlingsson [9] developed the Ran-domized Aggregatable Privacy-Preserving Ordinal Response(RAPPOR) privacy algorithm that enables Google to collectuser answers to questions such as a browserrsquos default homepage and default search engine to understand the unwel-come or malicious hijacking of user settings Gu et al [10]proposed a privacy protection mining (DIFF-PPM) algo-rithm combined with the Markov chain Monte Carlo al-gorithm to provide privacy protection and maintain dataavailability when the (ε δ)-differential privacy conditions aremet

Most of these studies have aimed to protect usersrsquopersonal sensitive data and they have not applied differentialprivacy to the industrial field In addition the weakness ofdifferential privacy is obvious because the assumptionsabout background knowledge are too strong a large amountof randomization needs to be added to the query resultsresulting in a sharp decline in data availability -ereforehow to balance the level of sensitive-data protection and dataavailability according to specific application scenarios re-mains unclear and challenging in research on differentialprivacy applications

212 Sensitive Data Protection Based on Fog ComputingXu et al [11] introduced local differential privacy into fogcomputing and proposed a framework for local differentialprivacy protection however they did not mention specificimplementation details and privacy algorithms Wang et al[12] proposed a three-layer privacy protection frameworkhowever they did not consider the problem of localequipment failure in which case users would not be able toquery complete data Du et al [13] proposed a fog com-puting support data center query model based on differentialprivacy and proved through rigorous mathematical de-duction that it ensured the reliability and effectiveness ofprivacy protection Lyu et al [14] proposed the PPFA pri-vacy protection aggregation system that uses the stability of aGauss mechanism to ensure the differential privacy of thestatistical results and reduces the loss of privacy by com-bining a stream cipher with a public key cipher to maintainpracticability Huang et al [15] proposed an attribute-basedencryption scheme that makes full use of fog servers -ecollected data is encrypted by fog servers and then out-sourced to cloud servers -e experimental results show thatthe functions of the scheme are limited because it does notsupport efficient data search and the fog server takes up alarge part of the workload Lu et al [16] proposed theLightweight Privacy-preserving Data Aggregation (LPDA)

scheme Experiments indicated that this scheme was highlyefficient in terms of computational cost and communicationoverhead Wang et al [17] proposed a fog computing querymodel based on differential privacy Compared with thetraditional privacy protection model this model affordsdifferent degrees of improvement in computational over-head execution efficiency and energy consumption Kul-karni et al [18] proposed a privacy protection framework toprotect the data security of sensors between the terminalequipment and fog networks -is scheme introduced aselective public key cryptosystem that can resist internalattacks However they did not consider the computationaloverhead and the fog nodes may overload and collapse if thesensor transmits data for a long time

-e above studies introduced fog computing to protectsensitive data however most of them ignored the problemof the limited computational power and storage capacity offog nodes Many data protection schemes based on fogcomputing do a lot of work on the fog side this directly leadsto increased computational overhead and energy con-sumption resulting in jitter or even downtime of fog nodes

213 Protection of Sensitive Data in Cloud ComputingIn recent years data security in cloud storage has attractedattention in industry and academia Many experts and re-searchers have proposed schemes to address this issue Houet al [19] proposed protection schemes that use encrypteddata by the system when data has been transferred by SSLFeng [20] proposed a method in which data is encrypted in aclosed cloud environment to solve the problem of a dataleakage caused by the increased burden of the cloud serverIn addition one-time encryption and multipoint securestorage can be realized However encryption makes it moredifficult to search in the cloud At present a key issue in thefield of cloud computing is searchable encryption Fu et al[21ndash23] and Xia et al [24] provided different solutions to thisproblem that improved accuracy safety and efficiencyKulkarni et al [25] designed a virtual private storage servicebased on a newly developed encryption technology -isservice achieved the best combination of private cloud se-curity and public cloud functionality Wang et al [26]proposed that users do not have actual ownership of out-sourced data this makes data integrity protection in cloudcomputing a difficult task Shen et al [27] proposed anefficient public auditing protocol that includes a doubly-linked information table and a location array Experimentsshow that this protocol can reduce computational andcommunication overheads and achieve certain efficiencies

-e above studies improved the use of cloud storage forthe protection of sensitive data However they suffer from acommon problem they cannot provide a defense againstinternal attacks from the cloud

22 Cloud-Fog Collaborative Computing At present theacademic community mainly solves the problem of taskscheduling through cloud-fog coordination Pham and Huh[28] considered task scheduling in the cloud-fog collabo-ration computing system they proposed a task scheduling

Security and Communication Networks 3

algorithm that guarantees the performance of applicationexecution However they did not consider the fog providerrsquosbudget Deng et al [29] studied the balance between powerconsumption and delay in cloud-fog collaborative com-puting systems -ey first modeled the power consumptionand delay functions of various parts of the system andformulated allocation problems according to the workload-en they developed an approximate solution to theoriginal problem by decomposition and separately formu-lated three subproblems of the corresponding subsystemExperiments showed that fog computing could reducecommunication delay and significantly improve cloudcomputing performance Bierzynski et al [30] believed thatcloud-fog collaboration can solve most problems in IoT anddiscussed four possible ways to distribute workloads be-tween different levels

-e above studies solved the task scheduling problemthrough cloud-fog collaboration however they did notconsider network security and data protection issues Inaddition simply relying on fog computing or cloud com-puting cannot provide more secure protection for industrialdata

In summary the studies discussed in this section solvedthe data protection problem and task scheduling problem tosome extent However for industrial applications with botha large amount of data and a high degree of data protectiontheir results cannot meet the requirements of the IndustrialInternet update speed high data accuracy and large datavolume -us far academic and industrial research on theprotection of sensitive data in the Industrial Internet of-ings (IIoT) remains in its infancy In this study wepropose a protection model for sensitive data that provideshigher protection and is more adaptable to industrialenvironments

3 Sensitive Industrial Data Protection Scheme

Section 31 introduces our protection framework for sen-sitive industrial data and the threat model Sections 32 and33 describe the protection of data for real-time and non-real-time processing respectively

31 Our Work and Contribution

311 Sensitive Industrial Data Protection Framework-is study designs a three-layer protection framework withlocalfogcloud storage for sensitive industrial data asshown in Figure 2 -is framework mainly consists of threemodules M-RAPPOR disturbance module ReedndashSolomon(RS) encoding module and Advanced Encryption Standard(AES) encryption and RS encoding module -is frameworknot only achieves low-cost high-efficiency and intelligentdata protection but also alleviates the storage pressure oflocal equipment through improved recoverability

We divide industrial data into real-time and non-real-time sensitive data depending on its characteristics

(1) For real-time sensitive data we design a data pro-tection scheme based on local differential privacy

and RS encoding We apply the M-RAPPOR algo-rithm to the sensitive data (location information)apply RS encoding to the perturbed data and theundisturbed nonsensitive data and store them inlocal equipment Considering the storage capacity oflocal equipment and the problem that data cannot berecovered owing to local equipment failures weadopt the optimal solution of distributed storage inthe local equipment and add corresponding re-strictions to the RS encoding which not only solvesthe data recovery problem due to the failure of localequipment but also improves the encoding efficiencyand reduces the computational cost If we need toperform predictive maintenance analysis we candecode the data and then analyze it

(2) For non-real-time sensitive data we design a dataprotection scheme based on AES encryption and RSencoding -e local equipment encrypts the non-real-time data using AES uploads the ciphertext tothe fog nodes and performs RS encoding of the dataon the fog nodes Next the encoded data is stored inthe fog nodes and the cloud nodes according to thecloud-fog collaborative storage strategy In this wayif an attempt is made to steal the data from the fognodes or the cloud nodes all the data cannot betaken

312 6reat Model Figure 3 shows the threat model that isdefined according to the data protection framework

-ere are two threats to industrial data

(1) For real-time sensitive data attackers can attack thelocal data center -ey can steal location informationof equipment to infer the general plant layout -elayout is a type of sensitive industrial data andtherefore this attack will lead to the leakage of suchdata

(2) For non-real-time sensitive data attackers (internalpersonnel of the third-party) can abuse data miningto infer customersrsquo private information and thefactoryrsquos business strategy from the factoryrsquos cus-tomer information and marketing plans respec-tively -is will not only threaten customersrsquo privacybut also affect the factoryrsquos overall interests

32 Protection of Real-Time Industrial Data

321 Local Differential Privacy Local differential privacyconsiders scenarios in which third-party data collectors areuntrustworthy Each user perturbs the data according to theprivacy algorithm and then uploads the perturbed data to thedata collector Local differential privacy is formally definedas follows

Definition 1 -ere are n users where each user correspondsto a record A privacy algorithm M is given having a domainDom(M) and a range Ran(M) If algorithm M obtains thesame output result T(TsubeRan(M)) on any two records t and

4 Security and Communication Networks

tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy

Pr[M(t) T]le eε

times Pr M tprime( 1113857 T1113858 1113859 (1)

As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the

AES encryption+RS encoding

Real-timeindustrial

sensitive data

Non-real-timeindustrial

sensitive data

M-RAPPOR disturbance

Sensitivedata

(locationinformation)

Non-sensitive data

Industrialcloud

Fog nodes

Cloud

Fog

Localequipment

LocalRSencoding

Data

Figure 2 Protection framework for sensitive industrial data

General plantlayout

Location data

1 Customer private information

2 Factory business strategy

Sensor Sensor Sensor Sensor SensorSensor

Local data center

Fog

Cloud

Local equipmentLocal equipment

Figure 3 -reat model of sensitive industrial data

Security and Communication Networks 5

output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology

322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values

323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below

We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data

-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment

(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps

(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy

(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process

lowast =

C

B

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

0 0 0 0 1

B11 B13 B14 B15B12 D1

D2

D3

C1

C2

C3

C4

C5

C1

C2

C3

C4

C5

B11 B13 B14 B15B12

B11 B13 B14 B15B12

Figure 4 RS encoding process

6 Security and Communication Networks

324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is

Zi b times Mi

M1 + M2 + middot middot middot + Mn

(2)

To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality

b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

lemlt b (3)

In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is

m b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

(4)

325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability

-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made

(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection

(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding

Masked data

Sensitive data(location space

coordinates)

Nonsensitivedata

Predictivemaintenance

analysis

DesensitizationdataAnalysis result

M-RAPPOR

RS-encoding

Local

Data processing Storage

Industrialproduction

line

Storage

RS decoding

Figure 5 Protection process for real-time data

Security and Communication Networks 7

attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased

-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)

(1) Data Sender

(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1

(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value

P Biprime x( 1113857

f1 x 1

f2 x 0

1 minus f1 minus f2 x Bi

⎧⎪⎪⎨

⎪⎪⎩(5)

To satisfy ε-local differential privacy the privacy budget εis calculated

ε ln1 minus f2

f1 (6)

Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once

(2) Data Collector

(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime

(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows

ni N f2 minus 2f1f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2+

Di f1 minus 1 + f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2

(7)

(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value

In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased

326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data

According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)

attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons

(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation

(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists

(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni

(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location

In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data

33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes

331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment

8 Security and Communication Networks

-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection

332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967

represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc

Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud

nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as

Ttotal Tcloud + Tfog (8)

where Tcloud is the storage time required for the cloud andTfog that for the fog

-e storage time required for each cloud node isexpressed as

Tck tck middot (k minus 1) middotSck

1113936nj1 Scj

(9)

where Tck is the storage time required for cloud node Nck

and tckis the storage time required when cloud node Nck

stores each data block-erefore the storage time Tcloud required for the cloud

is expressed as

Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)

In addition we analyze the two storage methods for thefog nodes as follows

(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as

AES

DATA

Some data

RS encoding

Other data

Local

Equipment 1 Equipment 2 hellip Equipment n

Fog

Cloud

Figure 6 Protection process for data that does not require real-time processing

Security and Communication Networks 9

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 3: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

extremely rigorous attack model that operates withoutregard on how much background knowledge the attackerhas Kenthapadi et al [6] used differential privacy to addresssensitive-data protection issues for collective user behaviorMohammed et al [7] proposed a differential privacy algo-rithm for private data distribution and vertical partitiondata In a scenario where third-party data collectors are nottrusted Duchi et al [8] proposed local differential privacywhere every user understands their privacy protectionprocess and can individually process and protect personalsensitive information Erlingsson [9] developed the Ran-domized Aggregatable Privacy-Preserving Ordinal Response(RAPPOR) privacy algorithm that enables Google to collectuser answers to questions such as a browserrsquos default homepage and default search engine to understand the unwel-come or malicious hijacking of user settings Gu et al [10]proposed a privacy protection mining (DIFF-PPM) algo-rithm combined with the Markov chain Monte Carlo al-gorithm to provide privacy protection and maintain dataavailability when the (ε δ)-differential privacy conditions aremet

Most of these studies have aimed to protect usersrsquopersonal sensitive data and they have not applied differentialprivacy to the industrial field In addition the weakness ofdifferential privacy is obvious because the assumptionsabout background knowledge are too strong a large amountof randomization needs to be added to the query resultsresulting in a sharp decline in data availability -ereforehow to balance the level of sensitive-data protection and dataavailability according to specific application scenarios re-mains unclear and challenging in research on differentialprivacy applications

212 Sensitive Data Protection Based on Fog ComputingXu et al [11] introduced local differential privacy into fogcomputing and proposed a framework for local differentialprivacy protection however they did not mention specificimplementation details and privacy algorithms Wang et al[12] proposed a three-layer privacy protection frameworkhowever they did not consider the problem of localequipment failure in which case users would not be able toquery complete data Du et al [13] proposed a fog com-puting support data center query model based on differentialprivacy and proved through rigorous mathematical de-duction that it ensured the reliability and effectiveness ofprivacy protection Lyu et al [14] proposed the PPFA pri-vacy protection aggregation system that uses the stability of aGauss mechanism to ensure the differential privacy of thestatistical results and reduces the loss of privacy by com-bining a stream cipher with a public key cipher to maintainpracticability Huang et al [15] proposed an attribute-basedencryption scheme that makes full use of fog servers -ecollected data is encrypted by fog servers and then out-sourced to cloud servers -e experimental results show thatthe functions of the scheme are limited because it does notsupport efficient data search and the fog server takes up alarge part of the workload Lu et al [16] proposed theLightweight Privacy-preserving Data Aggregation (LPDA)

scheme Experiments indicated that this scheme was highlyefficient in terms of computational cost and communicationoverhead Wang et al [17] proposed a fog computing querymodel based on differential privacy Compared with thetraditional privacy protection model this model affordsdifferent degrees of improvement in computational over-head execution efficiency and energy consumption Kul-karni et al [18] proposed a privacy protection framework toprotect the data security of sensors between the terminalequipment and fog networks -is scheme introduced aselective public key cryptosystem that can resist internalattacks However they did not consider the computationaloverhead and the fog nodes may overload and collapse if thesensor transmits data for a long time

-e above studies introduced fog computing to protectsensitive data however most of them ignored the problemof the limited computational power and storage capacity offog nodes Many data protection schemes based on fogcomputing do a lot of work on the fog side this directly leadsto increased computational overhead and energy con-sumption resulting in jitter or even downtime of fog nodes

213 Protection of Sensitive Data in Cloud ComputingIn recent years data security in cloud storage has attractedattention in industry and academia Many experts and re-searchers have proposed schemes to address this issue Houet al [19] proposed protection schemes that use encrypteddata by the system when data has been transferred by SSLFeng [20] proposed a method in which data is encrypted in aclosed cloud environment to solve the problem of a dataleakage caused by the increased burden of the cloud serverIn addition one-time encryption and multipoint securestorage can be realized However encryption makes it moredifficult to search in the cloud At present a key issue in thefield of cloud computing is searchable encryption Fu et al[21ndash23] and Xia et al [24] provided different solutions to thisproblem that improved accuracy safety and efficiencyKulkarni et al [25] designed a virtual private storage servicebased on a newly developed encryption technology -isservice achieved the best combination of private cloud se-curity and public cloud functionality Wang et al [26]proposed that users do not have actual ownership of out-sourced data this makes data integrity protection in cloudcomputing a difficult task Shen et al [27] proposed anefficient public auditing protocol that includes a doubly-linked information table and a location array Experimentsshow that this protocol can reduce computational andcommunication overheads and achieve certain efficiencies

-e above studies improved the use of cloud storage forthe protection of sensitive data However they suffer from acommon problem they cannot provide a defense againstinternal attacks from the cloud

22 Cloud-Fog Collaborative Computing At present theacademic community mainly solves the problem of taskscheduling through cloud-fog coordination Pham and Huh[28] considered task scheduling in the cloud-fog collabo-ration computing system they proposed a task scheduling

Security and Communication Networks 3

algorithm that guarantees the performance of applicationexecution However they did not consider the fog providerrsquosbudget Deng et al [29] studied the balance between powerconsumption and delay in cloud-fog collaborative com-puting systems -ey first modeled the power consumptionand delay functions of various parts of the system andformulated allocation problems according to the workload-en they developed an approximate solution to theoriginal problem by decomposition and separately formu-lated three subproblems of the corresponding subsystemExperiments showed that fog computing could reducecommunication delay and significantly improve cloudcomputing performance Bierzynski et al [30] believed thatcloud-fog collaboration can solve most problems in IoT anddiscussed four possible ways to distribute workloads be-tween different levels

-e above studies solved the task scheduling problemthrough cloud-fog collaboration however they did notconsider network security and data protection issues Inaddition simply relying on fog computing or cloud com-puting cannot provide more secure protection for industrialdata

In summary the studies discussed in this section solvedthe data protection problem and task scheduling problem tosome extent However for industrial applications with botha large amount of data and a high degree of data protectiontheir results cannot meet the requirements of the IndustrialInternet update speed high data accuracy and large datavolume -us far academic and industrial research on theprotection of sensitive data in the Industrial Internet of-ings (IIoT) remains in its infancy In this study wepropose a protection model for sensitive data that provideshigher protection and is more adaptable to industrialenvironments

3 Sensitive Industrial Data Protection Scheme

Section 31 introduces our protection framework for sen-sitive industrial data and the threat model Sections 32 and33 describe the protection of data for real-time and non-real-time processing respectively

31 Our Work and Contribution

311 Sensitive Industrial Data Protection Framework-is study designs a three-layer protection framework withlocalfogcloud storage for sensitive industrial data asshown in Figure 2 -is framework mainly consists of threemodules M-RAPPOR disturbance module ReedndashSolomon(RS) encoding module and Advanced Encryption Standard(AES) encryption and RS encoding module -is frameworknot only achieves low-cost high-efficiency and intelligentdata protection but also alleviates the storage pressure oflocal equipment through improved recoverability

We divide industrial data into real-time and non-real-time sensitive data depending on its characteristics

(1) For real-time sensitive data we design a data pro-tection scheme based on local differential privacy

and RS encoding We apply the M-RAPPOR algo-rithm to the sensitive data (location information)apply RS encoding to the perturbed data and theundisturbed nonsensitive data and store them inlocal equipment Considering the storage capacity oflocal equipment and the problem that data cannot berecovered owing to local equipment failures weadopt the optimal solution of distributed storage inthe local equipment and add corresponding re-strictions to the RS encoding which not only solvesthe data recovery problem due to the failure of localequipment but also improves the encoding efficiencyand reduces the computational cost If we need toperform predictive maintenance analysis we candecode the data and then analyze it

(2) For non-real-time sensitive data we design a dataprotection scheme based on AES encryption and RSencoding -e local equipment encrypts the non-real-time data using AES uploads the ciphertext tothe fog nodes and performs RS encoding of the dataon the fog nodes Next the encoded data is stored inthe fog nodes and the cloud nodes according to thecloud-fog collaborative storage strategy In this wayif an attempt is made to steal the data from the fognodes or the cloud nodes all the data cannot betaken

312 6reat Model Figure 3 shows the threat model that isdefined according to the data protection framework

-ere are two threats to industrial data

(1) For real-time sensitive data attackers can attack thelocal data center -ey can steal location informationof equipment to infer the general plant layout -elayout is a type of sensitive industrial data andtherefore this attack will lead to the leakage of suchdata

(2) For non-real-time sensitive data attackers (internalpersonnel of the third-party) can abuse data miningto infer customersrsquo private information and thefactoryrsquos business strategy from the factoryrsquos cus-tomer information and marketing plans respec-tively -is will not only threaten customersrsquo privacybut also affect the factoryrsquos overall interests

32 Protection of Real-Time Industrial Data

321 Local Differential Privacy Local differential privacyconsiders scenarios in which third-party data collectors areuntrustworthy Each user perturbs the data according to theprivacy algorithm and then uploads the perturbed data to thedata collector Local differential privacy is formally definedas follows

Definition 1 -ere are n users where each user correspondsto a record A privacy algorithm M is given having a domainDom(M) and a range Ran(M) If algorithm M obtains thesame output result T(TsubeRan(M)) on any two records t and

4 Security and Communication Networks

tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy

Pr[M(t) T]le eε

times Pr M tprime( 1113857 T1113858 1113859 (1)

As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the

AES encryption+RS encoding

Real-timeindustrial

sensitive data

Non-real-timeindustrial

sensitive data

M-RAPPOR disturbance

Sensitivedata

(locationinformation)

Non-sensitive data

Industrialcloud

Fog nodes

Cloud

Fog

Localequipment

LocalRSencoding

Data

Figure 2 Protection framework for sensitive industrial data

General plantlayout

Location data

1 Customer private information

2 Factory business strategy

Sensor Sensor Sensor Sensor SensorSensor

Local data center

Fog

Cloud

Local equipmentLocal equipment

Figure 3 -reat model of sensitive industrial data

Security and Communication Networks 5

output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology

322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values

323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below

We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data

-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment

(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps

(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy

(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process

lowast =

C

B

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

0 0 0 0 1

B11 B13 B14 B15B12 D1

D2

D3

C1

C2

C3

C4

C5

C1

C2

C3

C4

C5

B11 B13 B14 B15B12

B11 B13 B14 B15B12

Figure 4 RS encoding process

6 Security and Communication Networks

324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is

Zi b times Mi

M1 + M2 + middot middot middot + Mn

(2)

To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality

b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

lemlt b (3)

In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is

m b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

(4)

325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability

-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made

(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection

(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding

Masked data

Sensitive data(location space

coordinates)

Nonsensitivedata

Predictivemaintenance

analysis

DesensitizationdataAnalysis result

M-RAPPOR

RS-encoding

Local

Data processing Storage

Industrialproduction

line

Storage

RS decoding

Figure 5 Protection process for real-time data

Security and Communication Networks 7

attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased

-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)

(1) Data Sender

(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1

(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value

P Biprime x( 1113857

f1 x 1

f2 x 0

1 minus f1 minus f2 x Bi

⎧⎪⎪⎨

⎪⎪⎩(5)

To satisfy ε-local differential privacy the privacy budget εis calculated

ε ln1 minus f2

f1 (6)

Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once

(2) Data Collector

(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime

(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows

ni N f2 minus 2f1f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2+

Di f1 minus 1 + f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2

(7)

(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value

In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased

326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data

According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)

attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons

(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation

(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists

(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni

(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location

In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data

33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes

331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment

8 Security and Communication Networks

-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection

332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967

represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc

Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud

nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as

Ttotal Tcloud + Tfog (8)

where Tcloud is the storage time required for the cloud andTfog that for the fog

-e storage time required for each cloud node isexpressed as

Tck tck middot (k minus 1) middotSck

1113936nj1 Scj

(9)

where Tck is the storage time required for cloud node Nck

and tckis the storage time required when cloud node Nck

stores each data block-erefore the storage time Tcloud required for the cloud

is expressed as

Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)

In addition we analyze the two storage methods for thefog nodes as follows

(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as

AES

DATA

Some data

RS encoding

Other data

Local

Equipment 1 Equipment 2 hellip Equipment n

Fog

Cloud

Figure 6 Protection process for data that does not require real-time processing

Security and Communication Networks 9

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 4: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

algorithm that guarantees the performance of applicationexecution However they did not consider the fog providerrsquosbudget Deng et al [29] studied the balance between powerconsumption and delay in cloud-fog collaborative com-puting systems -ey first modeled the power consumptionand delay functions of various parts of the system andformulated allocation problems according to the workload-en they developed an approximate solution to theoriginal problem by decomposition and separately formu-lated three subproblems of the corresponding subsystemExperiments showed that fog computing could reducecommunication delay and significantly improve cloudcomputing performance Bierzynski et al [30] believed thatcloud-fog collaboration can solve most problems in IoT anddiscussed four possible ways to distribute workloads be-tween different levels

-e above studies solved the task scheduling problemthrough cloud-fog collaboration however they did notconsider network security and data protection issues Inaddition simply relying on fog computing or cloud com-puting cannot provide more secure protection for industrialdata

In summary the studies discussed in this section solvedthe data protection problem and task scheduling problem tosome extent However for industrial applications with botha large amount of data and a high degree of data protectiontheir results cannot meet the requirements of the IndustrialInternet update speed high data accuracy and large datavolume -us far academic and industrial research on theprotection of sensitive data in the Industrial Internet of-ings (IIoT) remains in its infancy In this study wepropose a protection model for sensitive data that provideshigher protection and is more adaptable to industrialenvironments

3 Sensitive Industrial Data Protection Scheme

Section 31 introduces our protection framework for sen-sitive industrial data and the threat model Sections 32 and33 describe the protection of data for real-time and non-real-time processing respectively

31 Our Work and Contribution

311 Sensitive Industrial Data Protection Framework-is study designs a three-layer protection framework withlocalfogcloud storage for sensitive industrial data asshown in Figure 2 -is framework mainly consists of threemodules M-RAPPOR disturbance module ReedndashSolomon(RS) encoding module and Advanced Encryption Standard(AES) encryption and RS encoding module -is frameworknot only achieves low-cost high-efficiency and intelligentdata protection but also alleviates the storage pressure oflocal equipment through improved recoverability

We divide industrial data into real-time and non-real-time sensitive data depending on its characteristics

(1) For real-time sensitive data we design a data pro-tection scheme based on local differential privacy

and RS encoding We apply the M-RAPPOR algo-rithm to the sensitive data (location information)apply RS encoding to the perturbed data and theundisturbed nonsensitive data and store them inlocal equipment Considering the storage capacity oflocal equipment and the problem that data cannot berecovered owing to local equipment failures weadopt the optimal solution of distributed storage inthe local equipment and add corresponding re-strictions to the RS encoding which not only solvesthe data recovery problem due to the failure of localequipment but also improves the encoding efficiencyand reduces the computational cost If we need toperform predictive maintenance analysis we candecode the data and then analyze it

(2) For non-real-time sensitive data we design a dataprotection scheme based on AES encryption and RSencoding -e local equipment encrypts the non-real-time data using AES uploads the ciphertext tothe fog nodes and performs RS encoding of the dataon the fog nodes Next the encoded data is stored inthe fog nodes and the cloud nodes according to thecloud-fog collaborative storage strategy In this wayif an attempt is made to steal the data from the fognodes or the cloud nodes all the data cannot betaken

312 6reat Model Figure 3 shows the threat model that isdefined according to the data protection framework

-ere are two threats to industrial data

(1) For real-time sensitive data attackers can attack thelocal data center -ey can steal location informationof equipment to infer the general plant layout -elayout is a type of sensitive industrial data andtherefore this attack will lead to the leakage of suchdata

(2) For non-real-time sensitive data attackers (internalpersonnel of the third-party) can abuse data miningto infer customersrsquo private information and thefactoryrsquos business strategy from the factoryrsquos cus-tomer information and marketing plans respec-tively -is will not only threaten customersrsquo privacybut also affect the factoryrsquos overall interests

32 Protection of Real-Time Industrial Data

321 Local Differential Privacy Local differential privacyconsiders scenarios in which third-party data collectors areuntrustworthy Each user perturbs the data according to theprivacy algorithm and then uploads the perturbed data to thedata collector Local differential privacy is formally definedas follows

Definition 1 -ere are n users where each user correspondsto a record A privacy algorithm M is given having a domainDom(M) and a range Ran(M) If algorithm M obtains thesame output result T(TsubeRan(M)) on any two records t and

4 Security and Communication Networks

tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy

Pr[M(t) T]le eε

times Pr M tprime( 1113857 T1113858 1113859 (1)

As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the

AES encryption+RS encoding

Real-timeindustrial

sensitive data

Non-real-timeindustrial

sensitive data

M-RAPPOR disturbance

Sensitivedata

(locationinformation)

Non-sensitive data

Industrialcloud

Fog nodes

Cloud

Fog

Localequipment

LocalRSencoding

Data

Figure 2 Protection framework for sensitive industrial data

General plantlayout

Location data

1 Customer private information

2 Factory business strategy

Sensor Sensor Sensor Sensor SensorSensor

Local data center

Fog

Cloud

Local equipmentLocal equipment

Figure 3 -reat model of sensitive industrial data

Security and Communication Networks 5

output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology

322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values

323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below

We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data

-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment

(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps

(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy

(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process

lowast =

C

B

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

0 0 0 0 1

B11 B13 B14 B15B12 D1

D2

D3

C1

C2

C3

C4

C5

C1

C2

C3

C4

C5

B11 B13 B14 B15B12

B11 B13 B14 B15B12

Figure 4 RS encoding process

6 Security and Communication Networks

324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is

Zi b times Mi

M1 + M2 + middot middot middot + Mn

(2)

To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality

b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

lemlt b (3)

In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is

m b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

(4)

325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability

-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made

(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection

(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding

Masked data

Sensitive data(location space

coordinates)

Nonsensitivedata

Predictivemaintenance

analysis

DesensitizationdataAnalysis result

M-RAPPOR

RS-encoding

Local

Data processing Storage

Industrialproduction

line

Storage

RS decoding

Figure 5 Protection process for real-time data

Security and Communication Networks 7

attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased

-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)

(1) Data Sender

(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1

(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value

P Biprime x( 1113857

f1 x 1

f2 x 0

1 minus f1 minus f2 x Bi

⎧⎪⎪⎨

⎪⎪⎩(5)

To satisfy ε-local differential privacy the privacy budget εis calculated

ε ln1 minus f2

f1 (6)

Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once

(2) Data Collector

(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime

(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows

ni N f2 minus 2f1f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2+

Di f1 minus 1 + f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2

(7)

(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value

In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased

326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data

According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)

attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons

(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation

(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists

(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni

(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location

In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data

33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes

331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment

8 Security and Communication Networks

-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection

332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967

represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc

Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud

nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as

Ttotal Tcloud + Tfog (8)

where Tcloud is the storage time required for the cloud andTfog that for the fog

-e storage time required for each cloud node isexpressed as

Tck tck middot (k minus 1) middotSck

1113936nj1 Scj

(9)

where Tck is the storage time required for cloud node Nck

and tckis the storage time required when cloud node Nck

stores each data block-erefore the storage time Tcloud required for the cloud

is expressed as

Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)

In addition we analyze the two storage methods for thefog nodes as follows

(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as

AES

DATA

Some data

RS encoding

Other data

Local

Equipment 1 Equipment 2 hellip Equipment n

Fog

Cloud

Figure 6 Protection process for data that does not require real-time processing

Security and Communication Networks 9

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 5: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy

Pr[M(t) T]le eε

times Pr M tprime( 1113857 T1113858 1113859 (1)

As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the

AES encryption+RS encoding

Real-timeindustrial

sensitive data

Non-real-timeindustrial

sensitive data

M-RAPPOR disturbance

Sensitivedata

(locationinformation)

Non-sensitive data

Industrialcloud

Fog nodes

Cloud

Fog

Localequipment

LocalRSencoding

Data

Figure 2 Protection framework for sensitive industrial data

General plantlayout

Location data

1 Customer private information

2 Factory business strategy

Sensor Sensor Sensor Sensor SensorSensor

Local data center

Fog

Cloud

Local equipmentLocal equipment

Figure 3 -reat model of sensitive industrial data

Security and Communication Networks 5

output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology

322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values

323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below

We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data

-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment

(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps

(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy

(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process

lowast =

C

B

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

0 0 0 0 1

B11 B13 B14 B15B12 D1

D2

D3

C1

C2

C3

C4

C5

C1

C2

C3

C4

C5

B11 B13 B14 B15B12

B11 B13 B14 B15B12

Figure 4 RS encoding process

6 Security and Communication Networks

324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is

Zi b times Mi

M1 + M2 + middot middot middot + Mn

(2)

To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality

b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

lemlt b (3)

In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is

m b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

(4)

325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability

-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made

(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection

(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding

Masked data

Sensitive data(location space

coordinates)

Nonsensitivedata

Predictivemaintenance

analysis

DesensitizationdataAnalysis result

M-RAPPOR

RS-encoding

Local

Data processing Storage

Industrialproduction

line

Storage

RS decoding

Figure 5 Protection process for real-time data

Security and Communication Networks 7

attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased

-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)

(1) Data Sender

(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1

(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value

P Biprime x( 1113857

f1 x 1

f2 x 0

1 minus f1 minus f2 x Bi

⎧⎪⎪⎨

⎪⎪⎩(5)

To satisfy ε-local differential privacy the privacy budget εis calculated

ε ln1 minus f2

f1 (6)

Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once

(2) Data Collector

(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime

(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows

ni N f2 minus 2f1f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2+

Di f1 minus 1 + f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2

(7)

(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value

In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased

326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data

According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)

attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons

(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation

(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists

(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni

(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location

In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data

33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes

331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment

8 Security and Communication Networks

-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection

332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967

represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc

Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud

nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as

Ttotal Tcloud + Tfog (8)

where Tcloud is the storage time required for the cloud andTfog that for the fog

-e storage time required for each cloud node isexpressed as

Tck tck middot (k minus 1) middotSck

1113936nj1 Scj

(9)

where Tck is the storage time required for cloud node Nck

and tckis the storage time required when cloud node Nck

stores each data block-erefore the storage time Tcloud required for the cloud

is expressed as

Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)

In addition we analyze the two storage methods for thefog nodes as follows

(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as

AES

DATA

Some data

RS encoding

Other data

Local

Equipment 1 Equipment 2 hellip Equipment n

Fog

Cloud

Figure 6 Protection process for data that does not require real-time processing

Security and Communication Networks 9

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 6: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology

322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values

323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below

We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data

-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment

(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps

(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy

(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process

lowast =

C

B

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

0 0 0 0 1

B11 B13 B14 B15B12 D1

D2

D3

C1

C2

C3

C4

C5

C1

C2

C3

C4

C5

B11 B13 B14 B15B12

B11 B13 B14 B15B12

Figure 4 RS encoding process

6 Security and Communication Networks

324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is

Zi b times Mi

M1 + M2 + middot middot middot + Mn

(2)

To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality

b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

lemlt b (3)

In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is

m b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

(4)

325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability

-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made

(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection

(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding

Masked data

Sensitive data(location space

coordinates)

Nonsensitivedata

Predictivemaintenance

analysis

DesensitizationdataAnalysis result

M-RAPPOR

RS-encoding

Local

Data processing Storage

Industrialproduction

line

Storage

RS decoding

Figure 5 Protection process for real-time data

Security and Communication Networks 7

attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased

-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)

(1) Data Sender

(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1

(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value

P Biprime x( 1113857

f1 x 1

f2 x 0

1 minus f1 minus f2 x Bi

⎧⎪⎪⎨

⎪⎪⎩(5)

To satisfy ε-local differential privacy the privacy budget εis calculated

ε ln1 minus f2

f1 (6)

Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once

(2) Data Collector

(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime

(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows

ni N f2 minus 2f1f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2+

Di f1 minus 1 + f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2

(7)

(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value

In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased

326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data

According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)

attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons

(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation

(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists

(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni

(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location

In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data

33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes

331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment

8 Security and Communication Networks

-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection

332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967

represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc

Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud

nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as

Ttotal Tcloud + Tfog (8)

where Tcloud is the storage time required for the cloud andTfog that for the fog

-e storage time required for each cloud node isexpressed as

Tck tck middot (k minus 1) middotSck

1113936nj1 Scj

(9)

where Tck is the storage time required for cloud node Nck

and tckis the storage time required when cloud node Nck

stores each data block-erefore the storage time Tcloud required for the cloud

is expressed as

Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)

In addition we analyze the two storage methods for thefog nodes as follows

(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as

AES

DATA

Some data

RS encoding

Other data

Local

Equipment 1 Equipment 2 hellip Equipment n

Fog

Cloud

Figure 6 Protection process for data that does not require real-time processing

Security and Communication Networks 9

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 7: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is

Zi b times Mi

M1 + M2 + middot middot middot + Mn

(2)

To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality

b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

lemlt b (3)

In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is

m b times max M1 M2 Mn1113864 1113865

M1 + M2 + middot middot middot + Mn

(4)

325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability

-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made

(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection

(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding

Masked data

Sensitive data(location space

coordinates)

Nonsensitivedata

Predictivemaintenance

analysis

DesensitizationdataAnalysis result

M-RAPPOR

RS-encoding

Local

Data processing Storage

Industrialproduction

line

Storage

RS decoding

Figure 5 Protection process for real-time data

Security and Communication Networks 7

attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased

-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)

(1) Data Sender

(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1

(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value

P Biprime x( 1113857

f1 x 1

f2 x 0

1 minus f1 minus f2 x Bi

⎧⎪⎪⎨

⎪⎪⎩(5)

To satisfy ε-local differential privacy the privacy budget εis calculated

ε ln1 minus f2

f1 (6)

Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once

(2) Data Collector

(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime

(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows

ni N f2 minus 2f1f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2+

Di f1 minus 1 + f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2

(7)

(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value

In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased

326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data

According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)

attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons

(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation

(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists

(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni

(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location

In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data

33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes

331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment

8 Security and Communication Networks

-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection

332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967

represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc

Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud

nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as

Ttotal Tcloud + Tfog (8)

where Tcloud is the storage time required for the cloud andTfog that for the fog

-e storage time required for each cloud node isexpressed as

Tck tck middot (k minus 1) middotSck

1113936nj1 Scj

(9)

where Tck is the storage time required for cloud node Nck

and tckis the storage time required when cloud node Nck

stores each data block-erefore the storage time Tcloud required for the cloud

is expressed as

Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)

In addition we analyze the two storage methods for thefog nodes as follows

(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as

AES

DATA

Some data

RS encoding

Other data

Local

Equipment 1 Equipment 2 hellip Equipment n

Fog

Cloud

Figure 6 Protection process for data that does not require real-time processing

Security and Communication Networks 9

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 8: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased

-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)

(1) Data Sender

(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1

(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value

P Biprime x( 1113857

f1 x 1

f2 x 0

1 minus f1 minus f2 x Bi

⎧⎪⎪⎨

⎪⎪⎩(5)

To satisfy ε-local differential privacy the privacy budget εis calculated

ε ln1 minus f2

f1 (6)

Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once

(2) Data Collector

(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime

(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows

ni N f2 minus 2f1f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2+

Di f1 minus 1 + f2( 1113857

2f1 minus 4f1f2 minus 1 + 2f2

(7)

(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value

In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased

326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data

According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)

attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons

(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation

(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists

(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni

(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location

In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data

33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes

331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment

8 Security and Communication Networks

-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection

332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967

represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc

Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud

nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as

Ttotal Tcloud + Tfog (8)

where Tcloud is the storage time required for the cloud andTfog that for the fog

-e storage time required for each cloud node isexpressed as

Tck tck middot (k minus 1) middotSck

1113936nj1 Scj

(9)

where Tck is the storage time required for cloud node Nck

and tckis the storage time required when cloud node Nck

stores each data block-erefore the storage time Tcloud required for the cloud

is expressed as

Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)

In addition we analyze the two storage methods for thefog nodes as follows

(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as

AES

DATA

Some data

RS encoding

Other data

Local

Equipment 1 Equipment 2 hellip Equipment n

Fog

Cloud

Figure 6 Protection process for data that does not require real-time processing

Security and Communication Networks 9

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 9: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection

332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x

redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967

represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc

Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud

nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as

Ttotal Tcloud + Tfog (8)

where Tcloud is the storage time required for the cloud andTfog that for the fog

-e storage time required for each cloud node isexpressed as

Tck tck middot (k minus 1) middotSck

1113936nj1 Scj

(9)

where Tck is the storage time required for cloud node Nck

and tckis the storage time required when cloud node Nck

stores each data block-erefore the storage time Tcloud required for the cloud

is expressed as

Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)

In addition we analyze the two storage methods for thefog nodes as follows

(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as

AES

DATA

Some data

RS encoding

Other data

Local

Equipment 1 Equipment 2 hellip Equipment n

Fog

Cloud

Figure 6 Protection process for data that does not require real-time processing

Security and Communication Networks 9

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 10: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

Tfog

tf1 middot (x + 1) 0lt x + 1leF1 max

tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max

tfn middot x + 1 minus 1113936nminus1

i1Fi max1113888 1113889 + 1113936

nminus1

i1tfi middot Fi max 1113936

nminus1

i1Fi max ltx + 1le 1113936

n

i1Fi max

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(11)

where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi

(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as

Tfk tfk middot (x + 1) middotSfk

1113936ni1 Sfi

(12)

-erefore the storage time Tfog required for the fog isexpressed as

Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)

After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)

333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix

Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory

34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency

341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field

in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values

Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system

342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as

Est n

n + m (14)

-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes

4 Results and Discussion

41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments

411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a

10 Security and Communication Networks

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 11: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters

412 Experimental Results

(1) M-RAPPOR

(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable

(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR

-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value

-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment

(2) ReedndashSolomon EncodingRS encoding consists of the following steps

(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the

Table 1 Degree of difficulty for cracking encoding matrix

Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248

8 6 2 296

16 6 1 296

16 6 2 2192

Table 2 Comparison of processing time for different word lengths

Word size of Galoisfield

Number of source datablocks k

Time(ms)

8 20 19216 20 1928 200 147116 200 1472

Table 3 Environmental parameters (real-time data)

Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB

RAPPORM-RAPPOR

200 300 400 500 600 700 800 900 1000100Number of data (thousand)

0

20

40

60

80

100

120

Tim

e (m

s)

Figure 7 Computational cost of RAPPOR and M-RAPPOR

Security and Communication Networks 11

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 12: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance

(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In

addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability

(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n

6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According

0

000

001

Prop

ortio

n 002

003

50 100 150 0

000

001

Prop

ortio

n

002

003

50 100 150 200

0

0000

0005

0010

0015

0020

Prop

ortio

n

0025

50 100 150

True valueStatistical value

True valueStatistical value

True valueStatistical value

200

f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2

f1 = 001 f2 = 001 ε = 46

Figure 8 Frequency statistics results under different privacy budgets

12 Security and Communication Networks

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 13: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the

corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed

42 Experiments on Protection of Non-Real-TimeIndustrial Data

421 Experimental Configuration

(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes

(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool

422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60

Equalized storageNonequalized storage

18 21 24 27 30 33 36 39 42 4515Number of data blocks

0

50

100

150

200

250

Dec

odin

g tim

e (m

s)

Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time

Encoding time (ms)Decoding time (ms)

5 10 15 20 25 300Number of data blocks

100

150

200

250

300

350

400

450

Tim

e (m

s)

Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks

1 2 3 4 5 60Damaged device number

80

100

120

140

160

180

200

220

240

260

280

Dec

odin

g tim

e (m

s)

Figure 11 Relationship between different local equipment failuresand decoding time

Table 4 Environmental parameters (non-real-time data)

Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB

Security and Communication Networks 13

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 14: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution

5 Conclusions

At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer

protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations

Data Availability

Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project

Storage method 1Storage method 2

224 284 344 404164Number of cloud-fog collaborative storage data blocks

0

100

200

300

400

500

600

700

800

Tota

l tim

e req

uire

d fo

r clo

ud-fo

gco

llabo

rativ

e sto

rage

(ms)

Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes

14 Security and Communication Networks

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 15: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)

References

[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018

[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020

[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html

[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017

[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009

[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013

[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014

[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013

[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014

[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015

[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018

[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018

[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019

[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018

[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017

[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017

[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017

[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014

[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011

[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015

[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016

[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016

[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c

[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012

[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013

[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017

[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016

[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015

[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017

Security and Communication Networks 15

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks

Page 16: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed

[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017

[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016

16 Security and Communication Networks