protectionofsensitivedatainindustrialinternetbasedon three...
TRANSCRIPT
![Page 1: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/1.jpg)
Research ArticleProtection of Sensitive Data in Industrial Internet Based onThree-Layer LocalFogCloud Storage
Jing Liu 12 Changbo Yuan1 Yingxu Lai 13 and Hua Qin4
1College of Computer Science Faculty of Information Technology Beijing University of Technology Beijing 100124 China2Shaanxi Key Laboratory of Network and System Security Xidian University Xian 710071 China3Science and Technology on Information Assurance Laboratory Beijing 100072 China4Information Technology Support Center Beijing University of Technology Beijing 100124 China
Correspondence should be addressed to Yingxu Lai laiyingxubjuteducn
Received 27 October 2019 Revised 3 February 2020 Accepted 5 February 2020 Published 2 April 2020
Academic Editor Prosanta Gope
Copyright copy 2020 Jing Liu et al-is is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited
Industrial Internet technology has developed rapidly and the security of industrial data has received much attention At presentindustrial enterprises lack a safe and professional data security system-us industries urgently need a complete and effective dataprotection scheme -is study develops a three-layer framework with localfogcloud storage for protecting sensitive industrialdata and defines a threat model For real-time sensitive industrial data we use the improved local differential privacy algorithmM-RAPPOR to perturb sensitive information We encode the desensitized data using ReedndashSolomon (RS) encoding and thenstore them in local equipment to realize low cost high efficiency and intelligent data protection For non-real-time sensitiveindustrial data we adopt a cloud-fog collaborative storage scheme based on AES-RS encoding to invisibly provide multilayerprotection We adopt the optimal solution of distributed storage in local equipment and the cloud-fog collaborative storagescheme in fog nodes and cloud nodes to alleviate the storage pressure on local equipment and to improve security and re-coverability According to the defined threat model we conduct a security analysis and prove that the proposed scheme canprovide stronger data protection for sensitive data Compared with traditional methods this approach strengthens the protectionof sensitive information and ensures real-time continuity of open data sharing Finally the feasibility of our scheme is validatedthrough experimental evaluation
1 Introduction
Intelligent manufacturing which consists of a man-machineintegrated intelligent system with intelligent machines andhuman experts is an inevitable trend in the continuingdevelopment of the global manufacturing industry [1] Inrecent years with the rapid development of Industrial In-ternet technology industrial data protection has attractedmuch research interest [2] In the industrial productionprocess a large amount of sensitive data is generated in-cluding data from the manufacturing process of the pro-duction line product cost information operations dataoperations information marketing strategy intellectualproperty rights and customer data If this sensitive data isleaked it may lead to significant business information loss oreven affect the reputation of an enterprise In recent years
information leakage incidents have occurred repeatedly Forexample in 2017 Equifax (Atlanta US) announced that adata breach had occurred Recently UpGuard (SydneyAustralia) a cybersecurity company reported that re-searchers foundmore than 540 million records on AmazonrsquosS3 server including Facebook user information such ascomments responses and account names [3] -erefore inthe context of the Industrial Internet it is a major challengeto ensure that sensitive information is not leaked
In actual industrial scenarios some industrial data mustbe processed in real time such as predictive maintenancedata in current intelligent factories If computer numericalcontrol (CNC) machine tools fail delays in the productproduction cycle as well as economic losses could occur-erefore to avoid equipment failure factories shouldperform predictive maintenance work such as collecting
HindawiSecurity and Communication NetworksVolume 2020 Article ID 2017930 16 pageshttpsdoiorg10115520202017930
sensor data in real time predicting possible situations andadjusting the equipment according to the predicted situationin a timely manner Sensor information such as node lo-cations and the spatial coordinates of the sensors themselvesalso constitute sensitive data We label this as real-timesensitive data To prevent the leakage of sensitive infor-mation and to ensure data security a secure storage methodis needed to store real-time sensitive data includingequipment locations safely and without affecting dataavailability
In addition a large amount of sensitive data is notprocessed in real time including a companyrsquos financial datainventory data production data marketing plans customerinformation intellectual property rights and supplier in-formation [4] We label this as non-real-time sensitive data-is data is usually stored with third-party cloud serviceproviders who may claim that the data is encrypted butcannot verify the encryption To prevent data leakage wecannot store the non-real-time sensitive data in the cloud-us for non-real-time sensitive data it is necessary todesign a data protection scheme that can fully utilize cloudstorage and ensure data security
-e recent development of cloud-fog collaborativecomputing provides a new approach to solve this problemCloud computing combines relatively independent com-puting technology and network technology superimposesdistributed computing power on a converged networkplatform and effectively integrates network and computingresources by virtualization technology which is a majorbreakthrough in IT technology Fog computing is an ex-tension and a powerful complement of cloud computingFigure 1 shows the architecture of cloud-fog computing Itincludes the factory terminal equipment layer fog com-puting layer and cloud computing layer -e main task ofthe factory terminal equipment layer is to collect data andupload it to the fog -e fog computing layer the middlelayer in this architecture plays an important role betweenthe cloud computing layer and the fog nodes -e fog nodeshave a given storage capacity and computing capacity -eintroduction of fog computing can reduce the cloud com-puting layer and improve work efficiency -e cloud com-puting layer has a high storage capacity and computingpower Because non-real-time processed data cannot becompletely stored in the cloud we adopt the cloud-fogcollaborative storage scheme In this study cloud-fog col-laborative storage means that the cloud nodes and the fognodes store corresponding data according to the storagestrategy that is some data is stored in the fog nodes andother data is stored in the cloud nodes
To solve the problem of secure storage for real-time andnon-real-time sensitive industrial data this study designs athree-layer protection framework with localfogcloudstorage for sensitive industrial data For real-time sensitiveindustrial data we use the improved local differential pri-vacy algorithmM-RAPPOR to perturb sensitive information(location information etc) and then encode the maskeddata in local equipment In addition we adopt the optimalscheme of distributed storage in local equipment to realizelow-cost high-efficiency data protection For non-real-time
sensitive industrial data we adopt cloud-fog collaborativestorage to achieve multilevel protection -is frameworkgreatly improves security and restorability and relieves thestorage pressure of local equipment
-e remainder of this paper is organized as followsSection 2 reviews related works Section 3 describes theprotection framework for sensitive data and defines thethreat model Section 4 presents experiments for two dif-ferent schemes and evaluates the results through severalindicators Finally Section 5 presents our conclusions
2 Related Work
With the continuous developments in information tech-nology in the era of big data the problem of data security hasattracted increasing attention How to ensure that sensitivedata is not leaked is a major challenge at present Cloud-fogcollaboration is a new solution to meet the current needs ofInternet of -ings (IoT) applications However there is arisk of sensitive-data leakage in both cloud computing andfog computing At present therefore data security hasemerged as an important issue for both academia and in-dustry As a result experts and researchers have proposedtheir own opinions and data protection programs -issection introduces the current research results from twoaspectsmdashthe protection of sensitive data and cloud-fogcollaborative computing
21 Protection of Sensitive Data
211 Protection of Sensitive Data Based on DifferentialPrivacy Dwork and Lei [5] proposed differential privacytechniques as a privacy protection model that defines an
Fog nodes
Fog server
Cloud
Fog
Factory 1 Factory 2 Factory N Factory 1 Factory N
Figure 1 Architecture of cloud-fog computing
2 Security and Communication Networks
extremely rigorous attack model that operates withoutregard on how much background knowledge the attackerhas Kenthapadi et al [6] used differential privacy to addresssensitive-data protection issues for collective user behaviorMohammed et al [7] proposed a differential privacy algo-rithm for private data distribution and vertical partitiondata In a scenario where third-party data collectors are nottrusted Duchi et al [8] proposed local differential privacywhere every user understands their privacy protectionprocess and can individually process and protect personalsensitive information Erlingsson [9] developed the Ran-domized Aggregatable Privacy-Preserving Ordinal Response(RAPPOR) privacy algorithm that enables Google to collectuser answers to questions such as a browserrsquos default homepage and default search engine to understand the unwel-come or malicious hijacking of user settings Gu et al [10]proposed a privacy protection mining (DIFF-PPM) algo-rithm combined with the Markov chain Monte Carlo al-gorithm to provide privacy protection and maintain dataavailability when the (ε δ)-differential privacy conditions aremet
Most of these studies have aimed to protect usersrsquopersonal sensitive data and they have not applied differentialprivacy to the industrial field In addition the weakness ofdifferential privacy is obvious because the assumptionsabout background knowledge are too strong a large amountof randomization needs to be added to the query resultsresulting in a sharp decline in data availability -ereforehow to balance the level of sensitive-data protection and dataavailability according to specific application scenarios re-mains unclear and challenging in research on differentialprivacy applications
212 Sensitive Data Protection Based on Fog ComputingXu et al [11] introduced local differential privacy into fogcomputing and proposed a framework for local differentialprivacy protection however they did not mention specificimplementation details and privacy algorithms Wang et al[12] proposed a three-layer privacy protection frameworkhowever they did not consider the problem of localequipment failure in which case users would not be able toquery complete data Du et al [13] proposed a fog com-puting support data center query model based on differentialprivacy and proved through rigorous mathematical de-duction that it ensured the reliability and effectiveness ofprivacy protection Lyu et al [14] proposed the PPFA pri-vacy protection aggregation system that uses the stability of aGauss mechanism to ensure the differential privacy of thestatistical results and reduces the loss of privacy by com-bining a stream cipher with a public key cipher to maintainpracticability Huang et al [15] proposed an attribute-basedencryption scheme that makes full use of fog servers -ecollected data is encrypted by fog servers and then out-sourced to cloud servers -e experimental results show thatthe functions of the scheme are limited because it does notsupport efficient data search and the fog server takes up alarge part of the workload Lu et al [16] proposed theLightweight Privacy-preserving Data Aggregation (LPDA)
scheme Experiments indicated that this scheme was highlyefficient in terms of computational cost and communicationoverhead Wang et al [17] proposed a fog computing querymodel based on differential privacy Compared with thetraditional privacy protection model this model affordsdifferent degrees of improvement in computational over-head execution efficiency and energy consumption Kul-karni et al [18] proposed a privacy protection framework toprotect the data security of sensors between the terminalequipment and fog networks -is scheme introduced aselective public key cryptosystem that can resist internalattacks However they did not consider the computationaloverhead and the fog nodes may overload and collapse if thesensor transmits data for a long time
-e above studies introduced fog computing to protectsensitive data however most of them ignored the problemof the limited computational power and storage capacity offog nodes Many data protection schemes based on fogcomputing do a lot of work on the fog side this directly leadsto increased computational overhead and energy con-sumption resulting in jitter or even downtime of fog nodes
213 Protection of Sensitive Data in Cloud ComputingIn recent years data security in cloud storage has attractedattention in industry and academia Many experts and re-searchers have proposed schemes to address this issue Houet al [19] proposed protection schemes that use encrypteddata by the system when data has been transferred by SSLFeng [20] proposed a method in which data is encrypted in aclosed cloud environment to solve the problem of a dataleakage caused by the increased burden of the cloud serverIn addition one-time encryption and multipoint securestorage can be realized However encryption makes it moredifficult to search in the cloud At present a key issue in thefield of cloud computing is searchable encryption Fu et al[21ndash23] and Xia et al [24] provided different solutions to thisproblem that improved accuracy safety and efficiencyKulkarni et al [25] designed a virtual private storage servicebased on a newly developed encryption technology -isservice achieved the best combination of private cloud se-curity and public cloud functionality Wang et al [26]proposed that users do not have actual ownership of out-sourced data this makes data integrity protection in cloudcomputing a difficult task Shen et al [27] proposed anefficient public auditing protocol that includes a doubly-linked information table and a location array Experimentsshow that this protocol can reduce computational andcommunication overheads and achieve certain efficiencies
-e above studies improved the use of cloud storage forthe protection of sensitive data However they suffer from acommon problem they cannot provide a defense againstinternal attacks from the cloud
22 Cloud-Fog Collaborative Computing At present theacademic community mainly solves the problem of taskscheduling through cloud-fog coordination Pham and Huh[28] considered task scheduling in the cloud-fog collabo-ration computing system they proposed a task scheduling
Security and Communication Networks 3
algorithm that guarantees the performance of applicationexecution However they did not consider the fog providerrsquosbudget Deng et al [29] studied the balance between powerconsumption and delay in cloud-fog collaborative com-puting systems -ey first modeled the power consumptionand delay functions of various parts of the system andformulated allocation problems according to the workload-en they developed an approximate solution to theoriginal problem by decomposition and separately formu-lated three subproblems of the corresponding subsystemExperiments showed that fog computing could reducecommunication delay and significantly improve cloudcomputing performance Bierzynski et al [30] believed thatcloud-fog collaboration can solve most problems in IoT anddiscussed four possible ways to distribute workloads be-tween different levels
-e above studies solved the task scheduling problemthrough cloud-fog collaboration however they did notconsider network security and data protection issues Inaddition simply relying on fog computing or cloud com-puting cannot provide more secure protection for industrialdata
In summary the studies discussed in this section solvedthe data protection problem and task scheduling problem tosome extent However for industrial applications with botha large amount of data and a high degree of data protectiontheir results cannot meet the requirements of the IndustrialInternet update speed high data accuracy and large datavolume -us far academic and industrial research on theprotection of sensitive data in the Industrial Internet of-ings (IIoT) remains in its infancy In this study wepropose a protection model for sensitive data that provideshigher protection and is more adaptable to industrialenvironments
3 Sensitive Industrial Data Protection Scheme
Section 31 introduces our protection framework for sen-sitive industrial data and the threat model Sections 32 and33 describe the protection of data for real-time and non-real-time processing respectively
31 Our Work and Contribution
311 Sensitive Industrial Data Protection Framework-is study designs a three-layer protection framework withlocalfogcloud storage for sensitive industrial data asshown in Figure 2 -is framework mainly consists of threemodules M-RAPPOR disturbance module ReedndashSolomon(RS) encoding module and Advanced Encryption Standard(AES) encryption and RS encoding module -is frameworknot only achieves low-cost high-efficiency and intelligentdata protection but also alleviates the storage pressure oflocal equipment through improved recoverability
We divide industrial data into real-time and non-real-time sensitive data depending on its characteristics
(1) For real-time sensitive data we design a data pro-tection scheme based on local differential privacy
and RS encoding We apply the M-RAPPOR algo-rithm to the sensitive data (location information)apply RS encoding to the perturbed data and theundisturbed nonsensitive data and store them inlocal equipment Considering the storage capacity oflocal equipment and the problem that data cannot berecovered owing to local equipment failures weadopt the optimal solution of distributed storage inthe local equipment and add corresponding re-strictions to the RS encoding which not only solvesthe data recovery problem due to the failure of localequipment but also improves the encoding efficiencyand reduces the computational cost If we need toperform predictive maintenance analysis we candecode the data and then analyze it
(2) For non-real-time sensitive data we design a dataprotection scheme based on AES encryption and RSencoding -e local equipment encrypts the non-real-time data using AES uploads the ciphertext tothe fog nodes and performs RS encoding of the dataon the fog nodes Next the encoded data is stored inthe fog nodes and the cloud nodes according to thecloud-fog collaborative storage strategy In this wayif an attempt is made to steal the data from the fognodes or the cloud nodes all the data cannot betaken
312 6reat Model Figure 3 shows the threat model that isdefined according to the data protection framework
-ere are two threats to industrial data
(1) For real-time sensitive data attackers can attack thelocal data center -ey can steal location informationof equipment to infer the general plant layout -elayout is a type of sensitive industrial data andtherefore this attack will lead to the leakage of suchdata
(2) For non-real-time sensitive data attackers (internalpersonnel of the third-party) can abuse data miningto infer customersrsquo private information and thefactoryrsquos business strategy from the factoryrsquos cus-tomer information and marketing plans respec-tively -is will not only threaten customersrsquo privacybut also affect the factoryrsquos overall interests
32 Protection of Real-Time Industrial Data
321 Local Differential Privacy Local differential privacyconsiders scenarios in which third-party data collectors areuntrustworthy Each user perturbs the data according to theprivacy algorithm and then uploads the perturbed data to thedata collector Local differential privacy is formally definedas follows
Definition 1 -ere are n users where each user correspondsto a record A privacy algorithm M is given having a domainDom(M) and a range Ran(M) If algorithm M obtains thesame output result T(TsubeRan(M)) on any two records t and
4 Security and Communication Networks
tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy
Pr[M(t) T]le eε
times Pr M tprime( 1113857 T1113858 1113859 (1)
As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the
AES encryption+RS encoding
Real-timeindustrial
sensitive data
Non-real-timeindustrial
sensitive data
M-RAPPOR disturbance
Sensitivedata
(locationinformation)
Non-sensitive data
Industrialcloud
Fog nodes
Cloud
Fog
Localequipment
LocalRSencoding
Data
Figure 2 Protection framework for sensitive industrial data
General plantlayout
Location data
1 Customer private information
2 Factory business strategy
Sensor Sensor Sensor Sensor SensorSensor
Local data center
Fog
Cloud
Local equipmentLocal equipment
Figure 3 -reat model of sensitive industrial data
Security and Communication Networks 5
output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology
322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values
323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below
We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data
-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment
(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps
(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy
(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process
lowast =
C
B
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
B11 B13 B14 B15B12 D1
D2
D3
C1
C2
C3
C4
C5
C1
C2
C3
C4
C5
B11 B13 B14 B15B12
B11 B13 B14 B15B12
Figure 4 RS encoding process
6 Security and Communication Networks
324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is
Zi b times Mi
M1 + M2 + middot middot middot + Mn
(2)
To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality
b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
lemlt b (3)
In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is
m b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
(4)
325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability
-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made
(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection
(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding
Masked data
Sensitive data(location space
coordinates)
Nonsensitivedata
Predictivemaintenance
analysis
DesensitizationdataAnalysis result
M-RAPPOR
RS-encoding
Local
Data processing Storage
Industrialproduction
line
Storage
RS decoding
Figure 5 Protection process for real-time data
Security and Communication Networks 7
attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased
-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)
(1) Data Sender
(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1
(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value
P Biprime x( 1113857
f1 x 1
f2 x 0
1 minus f1 minus f2 x Bi
⎧⎪⎪⎨
⎪⎪⎩(5)
To satisfy ε-local differential privacy the privacy budget εis calculated
ε ln1 minus f2
f1 (6)
Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once
(2) Data Collector
(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime
(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows
ni N f2 minus 2f1f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2+
Di f1 minus 1 + f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2
(7)
(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value
In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased
326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data
According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)
attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons
(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation
(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists
(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni
(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location
In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data
33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes
331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment
8 Security and Communication Networks
-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection
332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967
represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc
Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud
nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as
Ttotal Tcloud + Tfog (8)
where Tcloud is the storage time required for the cloud andTfog that for the fog
-e storage time required for each cloud node isexpressed as
Tck tck middot (k minus 1) middotSck
1113936nj1 Scj
(9)
where Tck is the storage time required for cloud node Nck
and tckis the storage time required when cloud node Nck
stores each data block-erefore the storage time Tcloud required for the cloud
is expressed as
Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)
In addition we analyze the two storage methods for thefog nodes as follows
(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as
AES
DATA
Some data
RS encoding
Other data
Local
Equipment 1 Equipment 2 hellip Equipment n
Fog
Cloud
Figure 6 Protection process for data that does not require real-time processing
Security and Communication Networks 9
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 2: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/2.jpg)
sensor data in real time predicting possible situations andadjusting the equipment according to the predicted situationin a timely manner Sensor information such as node lo-cations and the spatial coordinates of the sensors themselvesalso constitute sensitive data We label this as real-timesensitive data To prevent the leakage of sensitive infor-mation and to ensure data security a secure storage methodis needed to store real-time sensitive data includingequipment locations safely and without affecting dataavailability
In addition a large amount of sensitive data is notprocessed in real time including a companyrsquos financial datainventory data production data marketing plans customerinformation intellectual property rights and supplier in-formation [4] We label this as non-real-time sensitive data-is data is usually stored with third-party cloud serviceproviders who may claim that the data is encrypted butcannot verify the encryption To prevent data leakage wecannot store the non-real-time sensitive data in the cloud-us for non-real-time sensitive data it is necessary todesign a data protection scheme that can fully utilize cloudstorage and ensure data security
-e recent development of cloud-fog collaborativecomputing provides a new approach to solve this problemCloud computing combines relatively independent com-puting technology and network technology superimposesdistributed computing power on a converged networkplatform and effectively integrates network and computingresources by virtualization technology which is a majorbreakthrough in IT technology Fog computing is an ex-tension and a powerful complement of cloud computingFigure 1 shows the architecture of cloud-fog computing Itincludes the factory terminal equipment layer fog com-puting layer and cloud computing layer -e main task ofthe factory terminal equipment layer is to collect data andupload it to the fog -e fog computing layer the middlelayer in this architecture plays an important role betweenthe cloud computing layer and the fog nodes -e fog nodeshave a given storage capacity and computing capacity -eintroduction of fog computing can reduce the cloud com-puting layer and improve work efficiency -e cloud com-puting layer has a high storage capacity and computingpower Because non-real-time processed data cannot becompletely stored in the cloud we adopt the cloud-fogcollaborative storage scheme In this study cloud-fog col-laborative storage means that the cloud nodes and the fognodes store corresponding data according to the storagestrategy that is some data is stored in the fog nodes andother data is stored in the cloud nodes
To solve the problem of secure storage for real-time andnon-real-time sensitive industrial data this study designs athree-layer protection framework with localfogcloudstorage for sensitive industrial data For real-time sensitiveindustrial data we use the improved local differential pri-vacy algorithmM-RAPPOR to perturb sensitive information(location information etc) and then encode the maskeddata in local equipment In addition we adopt the optimalscheme of distributed storage in local equipment to realizelow-cost high-efficiency data protection For non-real-time
sensitive industrial data we adopt cloud-fog collaborativestorage to achieve multilevel protection -is frameworkgreatly improves security and restorability and relieves thestorage pressure of local equipment
-e remainder of this paper is organized as followsSection 2 reviews related works Section 3 describes theprotection framework for sensitive data and defines thethreat model Section 4 presents experiments for two dif-ferent schemes and evaluates the results through severalindicators Finally Section 5 presents our conclusions
2 Related Work
With the continuous developments in information tech-nology in the era of big data the problem of data security hasattracted increasing attention How to ensure that sensitivedata is not leaked is a major challenge at present Cloud-fogcollaboration is a new solution to meet the current needs ofInternet of -ings (IoT) applications However there is arisk of sensitive-data leakage in both cloud computing andfog computing At present therefore data security hasemerged as an important issue for both academia and in-dustry As a result experts and researchers have proposedtheir own opinions and data protection programs -issection introduces the current research results from twoaspectsmdashthe protection of sensitive data and cloud-fogcollaborative computing
21 Protection of Sensitive Data
211 Protection of Sensitive Data Based on DifferentialPrivacy Dwork and Lei [5] proposed differential privacytechniques as a privacy protection model that defines an
Fog nodes
Fog server
Cloud
Fog
Factory 1 Factory 2 Factory N Factory 1 Factory N
Figure 1 Architecture of cloud-fog computing
2 Security and Communication Networks
extremely rigorous attack model that operates withoutregard on how much background knowledge the attackerhas Kenthapadi et al [6] used differential privacy to addresssensitive-data protection issues for collective user behaviorMohammed et al [7] proposed a differential privacy algo-rithm for private data distribution and vertical partitiondata In a scenario where third-party data collectors are nottrusted Duchi et al [8] proposed local differential privacywhere every user understands their privacy protectionprocess and can individually process and protect personalsensitive information Erlingsson [9] developed the Ran-domized Aggregatable Privacy-Preserving Ordinal Response(RAPPOR) privacy algorithm that enables Google to collectuser answers to questions such as a browserrsquos default homepage and default search engine to understand the unwel-come or malicious hijacking of user settings Gu et al [10]proposed a privacy protection mining (DIFF-PPM) algo-rithm combined with the Markov chain Monte Carlo al-gorithm to provide privacy protection and maintain dataavailability when the (ε δ)-differential privacy conditions aremet
Most of these studies have aimed to protect usersrsquopersonal sensitive data and they have not applied differentialprivacy to the industrial field In addition the weakness ofdifferential privacy is obvious because the assumptionsabout background knowledge are too strong a large amountof randomization needs to be added to the query resultsresulting in a sharp decline in data availability -ereforehow to balance the level of sensitive-data protection and dataavailability according to specific application scenarios re-mains unclear and challenging in research on differentialprivacy applications
212 Sensitive Data Protection Based on Fog ComputingXu et al [11] introduced local differential privacy into fogcomputing and proposed a framework for local differentialprivacy protection however they did not mention specificimplementation details and privacy algorithms Wang et al[12] proposed a three-layer privacy protection frameworkhowever they did not consider the problem of localequipment failure in which case users would not be able toquery complete data Du et al [13] proposed a fog com-puting support data center query model based on differentialprivacy and proved through rigorous mathematical de-duction that it ensured the reliability and effectiveness ofprivacy protection Lyu et al [14] proposed the PPFA pri-vacy protection aggregation system that uses the stability of aGauss mechanism to ensure the differential privacy of thestatistical results and reduces the loss of privacy by com-bining a stream cipher with a public key cipher to maintainpracticability Huang et al [15] proposed an attribute-basedencryption scheme that makes full use of fog servers -ecollected data is encrypted by fog servers and then out-sourced to cloud servers -e experimental results show thatthe functions of the scheme are limited because it does notsupport efficient data search and the fog server takes up alarge part of the workload Lu et al [16] proposed theLightweight Privacy-preserving Data Aggregation (LPDA)
scheme Experiments indicated that this scheme was highlyefficient in terms of computational cost and communicationoverhead Wang et al [17] proposed a fog computing querymodel based on differential privacy Compared with thetraditional privacy protection model this model affordsdifferent degrees of improvement in computational over-head execution efficiency and energy consumption Kul-karni et al [18] proposed a privacy protection framework toprotect the data security of sensors between the terminalequipment and fog networks -is scheme introduced aselective public key cryptosystem that can resist internalattacks However they did not consider the computationaloverhead and the fog nodes may overload and collapse if thesensor transmits data for a long time
-e above studies introduced fog computing to protectsensitive data however most of them ignored the problemof the limited computational power and storage capacity offog nodes Many data protection schemes based on fogcomputing do a lot of work on the fog side this directly leadsto increased computational overhead and energy con-sumption resulting in jitter or even downtime of fog nodes
213 Protection of Sensitive Data in Cloud ComputingIn recent years data security in cloud storage has attractedattention in industry and academia Many experts and re-searchers have proposed schemes to address this issue Houet al [19] proposed protection schemes that use encrypteddata by the system when data has been transferred by SSLFeng [20] proposed a method in which data is encrypted in aclosed cloud environment to solve the problem of a dataleakage caused by the increased burden of the cloud serverIn addition one-time encryption and multipoint securestorage can be realized However encryption makes it moredifficult to search in the cloud At present a key issue in thefield of cloud computing is searchable encryption Fu et al[21ndash23] and Xia et al [24] provided different solutions to thisproblem that improved accuracy safety and efficiencyKulkarni et al [25] designed a virtual private storage servicebased on a newly developed encryption technology -isservice achieved the best combination of private cloud se-curity and public cloud functionality Wang et al [26]proposed that users do not have actual ownership of out-sourced data this makes data integrity protection in cloudcomputing a difficult task Shen et al [27] proposed anefficient public auditing protocol that includes a doubly-linked information table and a location array Experimentsshow that this protocol can reduce computational andcommunication overheads and achieve certain efficiencies
-e above studies improved the use of cloud storage forthe protection of sensitive data However they suffer from acommon problem they cannot provide a defense againstinternal attacks from the cloud
22 Cloud-Fog Collaborative Computing At present theacademic community mainly solves the problem of taskscheduling through cloud-fog coordination Pham and Huh[28] considered task scheduling in the cloud-fog collabo-ration computing system they proposed a task scheduling
Security and Communication Networks 3
algorithm that guarantees the performance of applicationexecution However they did not consider the fog providerrsquosbudget Deng et al [29] studied the balance between powerconsumption and delay in cloud-fog collaborative com-puting systems -ey first modeled the power consumptionand delay functions of various parts of the system andformulated allocation problems according to the workload-en they developed an approximate solution to theoriginal problem by decomposition and separately formu-lated three subproblems of the corresponding subsystemExperiments showed that fog computing could reducecommunication delay and significantly improve cloudcomputing performance Bierzynski et al [30] believed thatcloud-fog collaboration can solve most problems in IoT anddiscussed four possible ways to distribute workloads be-tween different levels
-e above studies solved the task scheduling problemthrough cloud-fog collaboration however they did notconsider network security and data protection issues Inaddition simply relying on fog computing or cloud com-puting cannot provide more secure protection for industrialdata
In summary the studies discussed in this section solvedthe data protection problem and task scheduling problem tosome extent However for industrial applications with botha large amount of data and a high degree of data protectiontheir results cannot meet the requirements of the IndustrialInternet update speed high data accuracy and large datavolume -us far academic and industrial research on theprotection of sensitive data in the Industrial Internet of-ings (IIoT) remains in its infancy In this study wepropose a protection model for sensitive data that provideshigher protection and is more adaptable to industrialenvironments
3 Sensitive Industrial Data Protection Scheme
Section 31 introduces our protection framework for sen-sitive industrial data and the threat model Sections 32 and33 describe the protection of data for real-time and non-real-time processing respectively
31 Our Work and Contribution
311 Sensitive Industrial Data Protection Framework-is study designs a three-layer protection framework withlocalfogcloud storage for sensitive industrial data asshown in Figure 2 -is framework mainly consists of threemodules M-RAPPOR disturbance module ReedndashSolomon(RS) encoding module and Advanced Encryption Standard(AES) encryption and RS encoding module -is frameworknot only achieves low-cost high-efficiency and intelligentdata protection but also alleviates the storage pressure oflocal equipment through improved recoverability
We divide industrial data into real-time and non-real-time sensitive data depending on its characteristics
(1) For real-time sensitive data we design a data pro-tection scheme based on local differential privacy
and RS encoding We apply the M-RAPPOR algo-rithm to the sensitive data (location information)apply RS encoding to the perturbed data and theundisturbed nonsensitive data and store them inlocal equipment Considering the storage capacity oflocal equipment and the problem that data cannot berecovered owing to local equipment failures weadopt the optimal solution of distributed storage inthe local equipment and add corresponding re-strictions to the RS encoding which not only solvesthe data recovery problem due to the failure of localequipment but also improves the encoding efficiencyand reduces the computational cost If we need toperform predictive maintenance analysis we candecode the data and then analyze it
(2) For non-real-time sensitive data we design a dataprotection scheme based on AES encryption and RSencoding -e local equipment encrypts the non-real-time data using AES uploads the ciphertext tothe fog nodes and performs RS encoding of the dataon the fog nodes Next the encoded data is stored inthe fog nodes and the cloud nodes according to thecloud-fog collaborative storage strategy In this wayif an attempt is made to steal the data from the fognodes or the cloud nodes all the data cannot betaken
312 6reat Model Figure 3 shows the threat model that isdefined according to the data protection framework
-ere are two threats to industrial data
(1) For real-time sensitive data attackers can attack thelocal data center -ey can steal location informationof equipment to infer the general plant layout -elayout is a type of sensitive industrial data andtherefore this attack will lead to the leakage of suchdata
(2) For non-real-time sensitive data attackers (internalpersonnel of the third-party) can abuse data miningto infer customersrsquo private information and thefactoryrsquos business strategy from the factoryrsquos cus-tomer information and marketing plans respec-tively -is will not only threaten customersrsquo privacybut also affect the factoryrsquos overall interests
32 Protection of Real-Time Industrial Data
321 Local Differential Privacy Local differential privacyconsiders scenarios in which third-party data collectors areuntrustworthy Each user perturbs the data according to theprivacy algorithm and then uploads the perturbed data to thedata collector Local differential privacy is formally definedas follows
Definition 1 -ere are n users where each user correspondsto a record A privacy algorithm M is given having a domainDom(M) and a range Ran(M) If algorithm M obtains thesame output result T(TsubeRan(M)) on any two records t and
4 Security and Communication Networks
tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy
Pr[M(t) T]le eε
times Pr M tprime( 1113857 T1113858 1113859 (1)
As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the
AES encryption+RS encoding
Real-timeindustrial
sensitive data
Non-real-timeindustrial
sensitive data
M-RAPPOR disturbance
Sensitivedata
(locationinformation)
Non-sensitive data
Industrialcloud
Fog nodes
Cloud
Fog
Localequipment
LocalRSencoding
Data
Figure 2 Protection framework for sensitive industrial data
General plantlayout
Location data
1 Customer private information
2 Factory business strategy
Sensor Sensor Sensor Sensor SensorSensor
Local data center
Fog
Cloud
Local equipmentLocal equipment
Figure 3 -reat model of sensitive industrial data
Security and Communication Networks 5
output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology
322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values
323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below
We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data
-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment
(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps
(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy
(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process
lowast =
C
B
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
B11 B13 B14 B15B12 D1
D2
D3
C1
C2
C3
C4
C5
C1
C2
C3
C4
C5
B11 B13 B14 B15B12
B11 B13 B14 B15B12
Figure 4 RS encoding process
6 Security and Communication Networks
324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is
Zi b times Mi
M1 + M2 + middot middot middot + Mn
(2)
To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality
b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
lemlt b (3)
In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is
m b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
(4)
325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability
-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made
(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection
(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding
Masked data
Sensitive data(location space
coordinates)
Nonsensitivedata
Predictivemaintenance
analysis
DesensitizationdataAnalysis result
M-RAPPOR
RS-encoding
Local
Data processing Storage
Industrialproduction
line
Storage
RS decoding
Figure 5 Protection process for real-time data
Security and Communication Networks 7
attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased
-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)
(1) Data Sender
(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1
(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value
P Biprime x( 1113857
f1 x 1
f2 x 0
1 minus f1 minus f2 x Bi
⎧⎪⎪⎨
⎪⎪⎩(5)
To satisfy ε-local differential privacy the privacy budget εis calculated
ε ln1 minus f2
f1 (6)
Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once
(2) Data Collector
(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime
(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows
ni N f2 minus 2f1f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2+
Di f1 minus 1 + f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2
(7)
(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value
In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased
326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data
According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)
attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons
(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation
(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists
(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni
(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location
In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data
33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes
331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment
8 Security and Communication Networks
-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection
332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967
represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc
Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud
nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as
Ttotal Tcloud + Tfog (8)
where Tcloud is the storage time required for the cloud andTfog that for the fog
-e storage time required for each cloud node isexpressed as
Tck tck middot (k minus 1) middotSck
1113936nj1 Scj
(9)
where Tck is the storage time required for cloud node Nck
and tckis the storage time required when cloud node Nck
stores each data block-erefore the storage time Tcloud required for the cloud
is expressed as
Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)
In addition we analyze the two storage methods for thefog nodes as follows
(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as
AES
DATA
Some data
RS encoding
Other data
Local
Equipment 1 Equipment 2 hellip Equipment n
Fog
Cloud
Figure 6 Protection process for data that does not require real-time processing
Security and Communication Networks 9
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 3: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/3.jpg)
extremely rigorous attack model that operates withoutregard on how much background knowledge the attackerhas Kenthapadi et al [6] used differential privacy to addresssensitive-data protection issues for collective user behaviorMohammed et al [7] proposed a differential privacy algo-rithm for private data distribution and vertical partitiondata In a scenario where third-party data collectors are nottrusted Duchi et al [8] proposed local differential privacywhere every user understands their privacy protectionprocess and can individually process and protect personalsensitive information Erlingsson [9] developed the Ran-domized Aggregatable Privacy-Preserving Ordinal Response(RAPPOR) privacy algorithm that enables Google to collectuser answers to questions such as a browserrsquos default homepage and default search engine to understand the unwel-come or malicious hijacking of user settings Gu et al [10]proposed a privacy protection mining (DIFF-PPM) algo-rithm combined with the Markov chain Monte Carlo al-gorithm to provide privacy protection and maintain dataavailability when the (ε δ)-differential privacy conditions aremet
Most of these studies have aimed to protect usersrsquopersonal sensitive data and they have not applied differentialprivacy to the industrial field In addition the weakness ofdifferential privacy is obvious because the assumptionsabout background knowledge are too strong a large amountof randomization needs to be added to the query resultsresulting in a sharp decline in data availability -ereforehow to balance the level of sensitive-data protection and dataavailability according to specific application scenarios re-mains unclear and challenging in research on differentialprivacy applications
212 Sensitive Data Protection Based on Fog ComputingXu et al [11] introduced local differential privacy into fogcomputing and proposed a framework for local differentialprivacy protection however they did not mention specificimplementation details and privacy algorithms Wang et al[12] proposed a three-layer privacy protection frameworkhowever they did not consider the problem of localequipment failure in which case users would not be able toquery complete data Du et al [13] proposed a fog com-puting support data center query model based on differentialprivacy and proved through rigorous mathematical de-duction that it ensured the reliability and effectiveness ofprivacy protection Lyu et al [14] proposed the PPFA pri-vacy protection aggregation system that uses the stability of aGauss mechanism to ensure the differential privacy of thestatistical results and reduces the loss of privacy by com-bining a stream cipher with a public key cipher to maintainpracticability Huang et al [15] proposed an attribute-basedencryption scheme that makes full use of fog servers -ecollected data is encrypted by fog servers and then out-sourced to cloud servers -e experimental results show thatthe functions of the scheme are limited because it does notsupport efficient data search and the fog server takes up alarge part of the workload Lu et al [16] proposed theLightweight Privacy-preserving Data Aggregation (LPDA)
scheme Experiments indicated that this scheme was highlyefficient in terms of computational cost and communicationoverhead Wang et al [17] proposed a fog computing querymodel based on differential privacy Compared with thetraditional privacy protection model this model affordsdifferent degrees of improvement in computational over-head execution efficiency and energy consumption Kul-karni et al [18] proposed a privacy protection framework toprotect the data security of sensors between the terminalequipment and fog networks -is scheme introduced aselective public key cryptosystem that can resist internalattacks However they did not consider the computationaloverhead and the fog nodes may overload and collapse if thesensor transmits data for a long time
-e above studies introduced fog computing to protectsensitive data however most of them ignored the problemof the limited computational power and storage capacity offog nodes Many data protection schemes based on fogcomputing do a lot of work on the fog side this directly leadsto increased computational overhead and energy con-sumption resulting in jitter or even downtime of fog nodes
213 Protection of Sensitive Data in Cloud ComputingIn recent years data security in cloud storage has attractedattention in industry and academia Many experts and re-searchers have proposed schemes to address this issue Houet al [19] proposed protection schemes that use encrypteddata by the system when data has been transferred by SSLFeng [20] proposed a method in which data is encrypted in aclosed cloud environment to solve the problem of a dataleakage caused by the increased burden of the cloud serverIn addition one-time encryption and multipoint securestorage can be realized However encryption makes it moredifficult to search in the cloud At present a key issue in thefield of cloud computing is searchable encryption Fu et al[21ndash23] and Xia et al [24] provided different solutions to thisproblem that improved accuracy safety and efficiencyKulkarni et al [25] designed a virtual private storage servicebased on a newly developed encryption technology -isservice achieved the best combination of private cloud se-curity and public cloud functionality Wang et al [26]proposed that users do not have actual ownership of out-sourced data this makes data integrity protection in cloudcomputing a difficult task Shen et al [27] proposed anefficient public auditing protocol that includes a doubly-linked information table and a location array Experimentsshow that this protocol can reduce computational andcommunication overheads and achieve certain efficiencies
-e above studies improved the use of cloud storage forthe protection of sensitive data However they suffer from acommon problem they cannot provide a defense againstinternal attacks from the cloud
22 Cloud-Fog Collaborative Computing At present theacademic community mainly solves the problem of taskscheduling through cloud-fog coordination Pham and Huh[28] considered task scheduling in the cloud-fog collabo-ration computing system they proposed a task scheduling
Security and Communication Networks 3
algorithm that guarantees the performance of applicationexecution However they did not consider the fog providerrsquosbudget Deng et al [29] studied the balance between powerconsumption and delay in cloud-fog collaborative com-puting systems -ey first modeled the power consumptionand delay functions of various parts of the system andformulated allocation problems according to the workload-en they developed an approximate solution to theoriginal problem by decomposition and separately formu-lated three subproblems of the corresponding subsystemExperiments showed that fog computing could reducecommunication delay and significantly improve cloudcomputing performance Bierzynski et al [30] believed thatcloud-fog collaboration can solve most problems in IoT anddiscussed four possible ways to distribute workloads be-tween different levels
-e above studies solved the task scheduling problemthrough cloud-fog collaboration however they did notconsider network security and data protection issues Inaddition simply relying on fog computing or cloud com-puting cannot provide more secure protection for industrialdata
In summary the studies discussed in this section solvedthe data protection problem and task scheduling problem tosome extent However for industrial applications with botha large amount of data and a high degree of data protectiontheir results cannot meet the requirements of the IndustrialInternet update speed high data accuracy and large datavolume -us far academic and industrial research on theprotection of sensitive data in the Industrial Internet of-ings (IIoT) remains in its infancy In this study wepropose a protection model for sensitive data that provideshigher protection and is more adaptable to industrialenvironments
3 Sensitive Industrial Data Protection Scheme
Section 31 introduces our protection framework for sen-sitive industrial data and the threat model Sections 32 and33 describe the protection of data for real-time and non-real-time processing respectively
31 Our Work and Contribution
311 Sensitive Industrial Data Protection Framework-is study designs a three-layer protection framework withlocalfogcloud storage for sensitive industrial data asshown in Figure 2 -is framework mainly consists of threemodules M-RAPPOR disturbance module ReedndashSolomon(RS) encoding module and Advanced Encryption Standard(AES) encryption and RS encoding module -is frameworknot only achieves low-cost high-efficiency and intelligentdata protection but also alleviates the storage pressure oflocal equipment through improved recoverability
We divide industrial data into real-time and non-real-time sensitive data depending on its characteristics
(1) For real-time sensitive data we design a data pro-tection scheme based on local differential privacy
and RS encoding We apply the M-RAPPOR algo-rithm to the sensitive data (location information)apply RS encoding to the perturbed data and theundisturbed nonsensitive data and store them inlocal equipment Considering the storage capacity oflocal equipment and the problem that data cannot berecovered owing to local equipment failures weadopt the optimal solution of distributed storage inthe local equipment and add corresponding re-strictions to the RS encoding which not only solvesthe data recovery problem due to the failure of localequipment but also improves the encoding efficiencyand reduces the computational cost If we need toperform predictive maintenance analysis we candecode the data and then analyze it
(2) For non-real-time sensitive data we design a dataprotection scheme based on AES encryption and RSencoding -e local equipment encrypts the non-real-time data using AES uploads the ciphertext tothe fog nodes and performs RS encoding of the dataon the fog nodes Next the encoded data is stored inthe fog nodes and the cloud nodes according to thecloud-fog collaborative storage strategy In this wayif an attempt is made to steal the data from the fognodes or the cloud nodes all the data cannot betaken
312 6reat Model Figure 3 shows the threat model that isdefined according to the data protection framework
-ere are two threats to industrial data
(1) For real-time sensitive data attackers can attack thelocal data center -ey can steal location informationof equipment to infer the general plant layout -elayout is a type of sensitive industrial data andtherefore this attack will lead to the leakage of suchdata
(2) For non-real-time sensitive data attackers (internalpersonnel of the third-party) can abuse data miningto infer customersrsquo private information and thefactoryrsquos business strategy from the factoryrsquos cus-tomer information and marketing plans respec-tively -is will not only threaten customersrsquo privacybut also affect the factoryrsquos overall interests
32 Protection of Real-Time Industrial Data
321 Local Differential Privacy Local differential privacyconsiders scenarios in which third-party data collectors areuntrustworthy Each user perturbs the data according to theprivacy algorithm and then uploads the perturbed data to thedata collector Local differential privacy is formally definedas follows
Definition 1 -ere are n users where each user correspondsto a record A privacy algorithm M is given having a domainDom(M) and a range Ran(M) If algorithm M obtains thesame output result T(TsubeRan(M)) on any two records t and
4 Security and Communication Networks
tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy
Pr[M(t) T]le eε
times Pr M tprime( 1113857 T1113858 1113859 (1)
As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the
AES encryption+RS encoding
Real-timeindustrial
sensitive data
Non-real-timeindustrial
sensitive data
M-RAPPOR disturbance
Sensitivedata
(locationinformation)
Non-sensitive data
Industrialcloud
Fog nodes
Cloud
Fog
Localequipment
LocalRSencoding
Data
Figure 2 Protection framework for sensitive industrial data
General plantlayout
Location data
1 Customer private information
2 Factory business strategy
Sensor Sensor Sensor Sensor SensorSensor
Local data center
Fog
Cloud
Local equipmentLocal equipment
Figure 3 -reat model of sensitive industrial data
Security and Communication Networks 5
output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology
322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values
323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below
We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data
-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment
(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps
(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy
(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process
lowast =
C
B
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
B11 B13 B14 B15B12 D1
D2
D3
C1
C2
C3
C4
C5
C1
C2
C3
C4
C5
B11 B13 B14 B15B12
B11 B13 B14 B15B12
Figure 4 RS encoding process
6 Security and Communication Networks
324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is
Zi b times Mi
M1 + M2 + middot middot middot + Mn
(2)
To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality
b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
lemlt b (3)
In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is
m b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
(4)
325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability
-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made
(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection
(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding
Masked data
Sensitive data(location space
coordinates)
Nonsensitivedata
Predictivemaintenance
analysis
DesensitizationdataAnalysis result
M-RAPPOR
RS-encoding
Local
Data processing Storage
Industrialproduction
line
Storage
RS decoding
Figure 5 Protection process for real-time data
Security and Communication Networks 7
attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased
-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)
(1) Data Sender
(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1
(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value
P Biprime x( 1113857
f1 x 1
f2 x 0
1 minus f1 minus f2 x Bi
⎧⎪⎪⎨
⎪⎪⎩(5)
To satisfy ε-local differential privacy the privacy budget εis calculated
ε ln1 minus f2
f1 (6)
Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once
(2) Data Collector
(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime
(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows
ni N f2 minus 2f1f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2+
Di f1 minus 1 + f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2
(7)
(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value
In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased
326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data
According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)
attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons
(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation
(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists
(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni
(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location
In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data
33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes
331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment
8 Security and Communication Networks
-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection
332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967
represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc
Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud
nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as
Ttotal Tcloud + Tfog (8)
where Tcloud is the storage time required for the cloud andTfog that for the fog
-e storage time required for each cloud node isexpressed as
Tck tck middot (k minus 1) middotSck
1113936nj1 Scj
(9)
where Tck is the storage time required for cloud node Nck
and tckis the storage time required when cloud node Nck
stores each data block-erefore the storage time Tcloud required for the cloud
is expressed as
Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)
In addition we analyze the two storage methods for thefog nodes as follows
(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as
AES
DATA
Some data
RS encoding
Other data
Local
Equipment 1 Equipment 2 hellip Equipment n
Fog
Cloud
Figure 6 Protection process for data that does not require real-time processing
Security and Communication Networks 9
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 4: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/4.jpg)
algorithm that guarantees the performance of applicationexecution However they did not consider the fog providerrsquosbudget Deng et al [29] studied the balance between powerconsumption and delay in cloud-fog collaborative com-puting systems -ey first modeled the power consumptionand delay functions of various parts of the system andformulated allocation problems according to the workload-en they developed an approximate solution to theoriginal problem by decomposition and separately formu-lated three subproblems of the corresponding subsystemExperiments showed that fog computing could reducecommunication delay and significantly improve cloudcomputing performance Bierzynski et al [30] believed thatcloud-fog collaboration can solve most problems in IoT anddiscussed four possible ways to distribute workloads be-tween different levels
-e above studies solved the task scheduling problemthrough cloud-fog collaboration however they did notconsider network security and data protection issues Inaddition simply relying on fog computing or cloud com-puting cannot provide more secure protection for industrialdata
In summary the studies discussed in this section solvedthe data protection problem and task scheduling problem tosome extent However for industrial applications with botha large amount of data and a high degree of data protectiontheir results cannot meet the requirements of the IndustrialInternet update speed high data accuracy and large datavolume -us far academic and industrial research on theprotection of sensitive data in the Industrial Internet of-ings (IIoT) remains in its infancy In this study wepropose a protection model for sensitive data that provideshigher protection and is more adaptable to industrialenvironments
3 Sensitive Industrial Data Protection Scheme
Section 31 introduces our protection framework for sen-sitive industrial data and the threat model Sections 32 and33 describe the protection of data for real-time and non-real-time processing respectively
31 Our Work and Contribution
311 Sensitive Industrial Data Protection Framework-is study designs a three-layer protection framework withlocalfogcloud storage for sensitive industrial data asshown in Figure 2 -is framework mainly consists of threemodules M-RAPPOR disturbance module ReedndashSolomon(RS) encoding module and Advanced Encryption Standard(AES) encryption and RS encoding module -is frameworknot only achieves low-cost high-efficiency and intelligentdata protection but also alleviates the storage pressure oflocal equipment through improved recoverability
We divide industrial data into real-time and non-real-time sensitive data depending on its characteristics
(1) For real-time sensitive data we design a data pro-tection scheme based on local differential privacy
and RS encoding We apply the M-RAPPOR algo-rithm to the sensitive data (location information)apply RS encoding to the perturbed data and theundisturbed nonsensitive data and store them inlocal equipment Considering the storage capacity oflocal equipment and the problem that data cannot berecovered owing to local equipment failures weadopt the optimal solution of distributed storage inthe local equipment and add corresponding re-strictions to the RS encoding which not only solvesthe data recovery problem due to the failure of localequipment but also improves the encoding efficiencyand reduces the computational cost If we need toperform predictive maintenance analysis we candecode the data and then analyze it
(2) For non-real-time sensitive data we design a dataprotection scheme based on AES encryption and RSencoding -e local equipment encrypts the non-real-time data using AES uploads the ciphertext tothe fog nodes and performs RS encoding of the dataon the fog nodes Next the encoded data is stored inthe fog nodes and the cloud nodes according to thecloud-fog collaborative storage strategy In this wayif an attempt is made to steal the data from the fognodes or the cloud nodes all the data cannot betaken
312 6reat Model Figure 3 shows the threat model that isdefined according to the data protection framework
-ere are two threats to industrial data
(1) For real-time sensitive data attackers can attack thelocal data center -ey can steal location informationof equipment to infer the general plant layout -elayout is a type of sensitive industrial data andtherefore this attack will lead to the leakage of suchdata
(2) For non-real-time sensitive data attackers (internalpersonnel of the third-party) can abuse data miningto infer customersrsquo private information and thefactoryrsquos business strategy from the factoryrsquos cus-tomer information and marketing plans respec-tively -is will not only threaten customersrsquo privacybut also affect the factoryrsquos overall interests
32 Protection of Real-Time Industrial Data
321 Local Differential Privacy Local differential privacyconsiders scenarios in which third-party data collectors areuntrustworthy Each user perturbs the data according to theprivacy algorithm and then uploads the perturbed data to thedata collector Local differential privacy is formally definedas follows
Definition 1 -ere are n users where each user correspondsto a record A privacy algorithm M is given having a domainDom(M) and a range Ran(M) If algorithm M obtains thesame output result T(TsubeRan(M)) on any two records t and
4 Security and Communication Networks
tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy
Pr[M(t) T]le eε
times Pr M tprime( 1113857 T1113858 1113859 (1)
As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the
AES encryption+RS encoding
Real-timeindustrial
sensitive data
Non-real-timeindustrial
sensitive data
M-RAPPOR disturbance
Sensitivedata
(locationinformation)
Non-sensitive data
Industrialcloud
Fog nodes
Cloud
Fog
Localequipment
LocalRSencoding
Data
Figure 2 Protection framework for sensitive industrial data
General plantlayout
Location data
1 Customer private information
2 Factory business strategy
Sensor Sensor Sensor Sensor SensorSensor
Local data center
Fog
Cloud
Local equipmentLocal equipment
Figure 3 -reat model of sensitive industrial data
Security and Communication Networks 5
output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology
322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values
323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below
We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data
-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment
(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps
(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy
(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process
lowast =
C
B
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
B11 B13 B14 B15B12 D1
D2
D3
C1
C2
C3
C4
C5
C1
C2
C3
C4
C5
B11 B13 B14 B15B12
B11 B13 B14 B15B12
Figure 4 RS encoding process
6 Security and Communication Networks
324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is
Zi b times Mi
M1 + M2 + middot middot middot + Mn
(2)
To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality
b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
lemlt b (3)
In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is
m b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
(4)
325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability
-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made
(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection
(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding
Masked data
Sensitive data(location space
coordinates)
Nonsensitivedata
Predictivemaintenance
analysis
DesensitizationdataAnalysis result
M-RAPPOR
RS-encoding
Local
Data processing Storage
Industrialproduction
line
Storage
RS decoding
Figure 5 Protection process for real-time data
Security and Communication Networks 7
attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased
-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)
(1) Data Sender
(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1
(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value
P Biprime x( 1113857
f1 x 1
f2 x 0
1 minus f1 minus f2 x Bi
⎧⎪⎪⎨
⎪⎪⎩(5)
To satisfy ε-local differential privacy the privacy budget εis calculated
ε ln1 minus f2
f1 (6)
Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once
(2) Data Collector
(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime
(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows
ni N f2 minus 2f1f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2+
Di f1 minus 1 + f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2
(7)
(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value
In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased
326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data
According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)
attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons
(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation
(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists
(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni
(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location
In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data
33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes
331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment
8 Security and Communication Networks
-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection
332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967
represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc
Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud
nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as
Ttotal Tcloud + Tfog (8)
where Tcloud is the storage time required for the cloud andTfog that for the fog
-e storage time required for each cloud node isexpressed as
Tck tck middot (k minus 1) middotSck
1113936nj1 Scj
(9)
where Tck is the storage time required for cloud node Nck
and tckis the storage time required when cloud node Nck
stores each data block-erefore the storage time Tcloud required for the cloud
is expressed as
Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)
In addition we analyze the two storage methods for thefog nodes as follows
(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as
AES
DATA
Some data
RS encoding
Other data
Local
Equipment 1 Equipment 2 hellip Equipment n
Fog
Cloud
Figure 6 Protection process for data that does not require real-time processing
Security and Communication Networks 9
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 5: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/5.jpg)
tprime(t tprime isin Dom(M)) and satisfies the following inequalitiesthen M satisfies local differential privacy
Pr[M(t) T]le eε
times Pr M tprime( 1113857 T1113858 1113859 (1)
As can be seen from Definition 1 the local differentialprivacy ensures that the algorithm M satisfies the ε-localdifferential privacy by controlling the similarity of theoutput results of any two records In short according to the
AES encryption+RS encoding
Real-timeindustrial
sensitive data
Non-real-timeindustrial
sensitive data
M-RAPPOR disturbance
Sensitivedata
(locationinformation)
Non-sensitive data
Industrialcloud
Fog nodes
Cloud
Fog
Localequipment
LocalRSencoding
Data
Figure 2 Protection framework for sensitive industrial data
General plantlayout
Location data
1 Customer private information
2 Factory business strategy
Sensor Sensor Sensor Sensor SensorSensor
Local data center
Fog
Cloud
Local equipmentLocal equipment
Figure 3 -reat model of sensitive industrial data
Security and Communication Networks 5
output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology
322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values
323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below
We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data
-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment
(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps
(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy
(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process
lowast =
C
B
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
B11 B13 B14 B15B12 D1
D2
D3
C1
C2
C3
C4
C5
C1
C2
C3
C4
C5
B11 B13 B14 B15B12
B11 B13 B14 B15B12
Figure 4 RS encoding process
6 Security and Communication Networks
324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is
Zi b times Mi
M1 + M2 + middot middot middot + Mn
(2)
To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality
b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
lemlt b (3)
In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is
m b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
(4)
325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability
-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made
(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection
(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding
Masked data
Sensitive data(location space
coordinates)
Nonsensitivedata
Predictivemaintenance
analysis
DesensitizationdataAnalysis result
M-RAPPOR
RS-encoding
Local
Data processing Storage
Industrialproduction
line
Storage
RS decoding
Figure 5 Protection process for real-time data
Security and Communication Networks 7
attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased
-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)
(1) Data Sender
(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1
(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value
P Biprime x( 1113857
f1 x 1
f2 x 0
1 minus f1 minus f2 x Bi
⎧⎪⎪⎨
⎪⎪⎩(5)
To satisfy ε-local differential privacy the privacy budget εis calculated
ε ln1 minus f2
f1 (6)
Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once
(2) Data Collector
(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime
(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows
ni N f2 minus 2f1f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2+
Di f1 minus 1 + f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2
(7)
(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value
In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased
326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data
According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)
attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons
(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation
(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists
(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni
(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location
In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data
33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes
331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment
8 Security and Communication Networks
-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection
332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967
represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc
Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud
nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as
Ttotal Tcloud + Tfog (8)
where Tcloud is the storage time required for the cloud andTfog that for the fog
-e storage time required for each cloud node isexpressed as
Tck tck middot (k minus 1) middotSck
1113936nj1 Scj
(9)
where Tck is the storage time required for cloud node Nck
and tckis the storage time required when cloud node Nck
stores each data block-erefore the storage time Tcloud required for the cloud
is expressed as
Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)
In addition we analyze the two storage methods for thefog nodes as follows
(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as
AES
DATA
Some data
RS encoding
Other data
Local
Equipment 1 Equipment 2 hellip Equipment n
Fog
Cloud
Figure 6 Protection process for data that does not require real-time processing
Security and Communication Networks 9
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 6: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/6.jpg)
output result of the privacy algorithm M it is almostimpossible to infer which record the input data is Withlocal differential privacy technology each user can processindividual data independently In other words the processof privacy processing is transferred from the data collectorto a single user and therefore there is no need for trustedthird-party intervention -is can avoid the attacks thatmay be brought on by untrusted third-party data collectors-e implementation of local differential privacy protectionrequires the intervention of a data disturbance mechanismRandom response technology is the mainstream distur-bance mechanism for local differential privacy protectiontechnology
322 ReedndashSolomon Encoding RS encoding is a matrixoperation As shown in Figure 4 the input data is treated asmatrix C (C1 C2 Cn) and the encoding matrix B ismultiplied by matrix C to obtain data blocks(C1 C2 Cn) and y redundant data blocks (n 5 y 3)-e encoding matrix B must have reversibility of anysubmatrix After RS encoding the data is divided into n partsand y redundant data are generated In the data of thesen + y parts one can recover the total data with at least n datavalues In other words the total data cannot be recoveredwith less than n data values
323 Data Protection Scheme Based on Local DifferentialPrivacy and ReedndashSolomon Encoding With continuousdevelopments in industries many factories have startedusing equipment such as CNC machine tools to increaseproductivity However some problems have also beenexposed For example a serious failure of a CNC machinetool will not only lead to the shutdown of the entireproduction line but also seriously delay the product pro-duction cycle -erefore to avoid this situation the factoryshould analyze the data collected by sensors predictpossible situations and warn operators or adjust theequipment according to the forecasted situation in a timelymanner to avoid an accidental shutdown of the equipmentHowever the data contains not only real-time data col-lected by the sensor but also a large amount of sensitivedata of the factory equipment such as node location andspatial coordinates -is type of sensitive data related tolocations should not be leaked before predictive mainte-nance analysis -erefore it needs to be desensitized beforedata analysis to eliminate the correspondence between theequipment and the location In addition because the real-time data information collected by factory sensors is ofgreat value and high confidentiality we cannot upload thisdata to the fog or the cloud it must be stored in localequipment -e protection scheme for real-time sensitivedata is described in detail below
We use local differential privacy to desensitize real-timesensitive data -en we encode the perturbed data andundisturbed nonsensitive data by RS encoding and store it inlocal equipment Figure 5 shows the protection process forreal-time data
-e real-time data collected by the factory sensors is firstuploaded to the local equipment-en the following processis performed in the local equipment
(1) Data Disturbance with Local Differential Privacy Becauseof the limited computing power of local equipment it cannotperform complex computing tasks -erefore it is very im-portant to design a lightweight privacy protection algorithmLocal differential privacy considers the untrustworthiness of thedata collector and its disturbancemechanismmainly includes arandom response information compression and distortion-e disturbance framework of the random response technologyis simple and intuitive and the disturbance level can be directlyquantified -erefore we choose local differential privacy toprotect the sensitive data We refer to the basic idea of theRAPPOR algorithm which is an open-source project of Googleand improve RAPPOR to reduce the computational cost of localequipment while ensuring privacy and data availability In thispaper we label the improved algorithm asM-RAPPOR and weuse it to disturb the location information Section 325 discussesthe specific disturbance steps
(2) RS Encoding and Storage We apply RS encoding to theperturbed data and undisturbed nonsensitive data and store thedata in local equipment However owing to the continuousproduction of data in factories and the limited storage capacityof local equipment the storage pressure on equipment isgradually increasing In addition if the local equipment failsunexpectedly it will not be possible to completely restore thedata resulting in serious losses to the factory To solve these twoproblems we adopt the optimal solution namely distributedstorage in the local equipment and add corresponding re-strictions to RS -is not only solves the problem of the highstorage pressure on local equipment and the fact that datacannot be recovered owing to the failure of the local equipmentbut also improves the encoding efficiency and reduces thecomputational cost Section 324 discusses the specific dis-tributed storage strategy
(3) RS Decoding and Analysis When predictive maintenanceanalysis is performed the data stored in the local equipmentare decoded and the analysis is performed according to thestatistical location distribution and data Section 325 dis-cusses the specific statistical process
lowast =
C
B
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
B11 B13 B14 B15B12 D1
D2
D3
C1
C2
C3
C4
C5
C1
C2
C3
C4
C5
B11 B13 B14 B15B12
B11 B13 B14 B15B12
Figure 4 RS encoding process
6 Security and Communication Networks
324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is
Zi b times Mi
M1 + M2 + middot middot middot + Mn
(2)
To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality
b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
lemlt b (3)
In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is
m b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
(4)
325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability
-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made
(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection
(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding
Masked data
Sensitive data(location space
coordinates)
Nonsensitivedata
Predictivemaintenance
analysis
DesensitizationdataAnalysis result
M-RAPPOR
RS-encoding
Local
Data processing Storage
Industrialproduction
line
Storage
RS decoding
Figure 5 Protection process for real-time data
Security and Communication Networks 7
attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased
-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)
(1) Data Sender
(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1
(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value
P Biprime x( 1113857
f1 x 1
f2 x 0
1 minus f1 minus f2 x Bi
⎧⎪⎪⎨
⎪⎪⎩(5)
To satisfy ε-local differential privacy the privacy budget εis calculated
ε ln1 minus f2
f1 (6)
Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once
(2) Data Collector
(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime
(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows
ni N f2 minus 2f1f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2+
Di f1 minus 1 + f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2
(7)
(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value
In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased
326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data
According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)
attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons
(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation
(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists
(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni
(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location
In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data
33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes
331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment
8 Security and Communication Networks
-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection
332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967
represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc
Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud
nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as
Ttotal Tcloud + Tfog (8)
where Tcloud is the storage time required for the cloud andTfog that for the fog
-e storage time required for each cloud node isexpressed as
Tck tck middot (k minus 1) middotSck
1113936nj1 Scj
(9)
where Tck is the storage time required for cloud node Nck
and tckis the storage time required when cloud node Nck
stores each data block-erefore the storage time Tcloud required for the cloud
is expressed as
Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)
In addition we analyze the two storage methods for thefog nodes as follows
(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as
AES
DATA
Some data
RS encoding
Other data
Local
Equipment 1 Equipment 2 hellip Equipment n
Fog
Cloud
Figure 6 Protection process for data that does not require real-time processing
Security and Communication Networks 9
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 7: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/7.jpg)
324 Optimal Solution for Distributed Storage In theprevious section we mentioned that distributed storagecan solve the problem of high storage pressure on localequipment However to solve the problem in which thedata cannot be completely recovered owing to the failure ofthe local equipment it is necessary to impose restrictionson the redundant data blocks according to the charac-teristics of the RS encoding to ensure that the completedata can be recovered when any of the local equipmentfails Assuming that the encoded real-time data is b theredundant data block is m We store data block b in dif-ferent local pieces of equipment according to the amountof local equipment -e storage capacity of local equip-ment is different for various industrial environments-erefore we assume that the storage capacities ofequipment 1 equipment 2 equipment n areM1 M2 Mn respectively therefore the data Zi storedby equipment i is
Zi b times Mi
M1 + M2 + middot middot middot + Mn
(2)
To ensure that complete data can still be recovered whenany local equipment fails the redundant datammust satisfythe following inequality
b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
lemlt b (3)
In addition with an increase in the number of redundantdata blocks the coding and decoding efficiency will decreaseas confirmed in Section 4 -erefore m should be theminimum value that is
m b times max M1 M2 Mn1113864 1113865
M1 + M2 + middot middot middot + Mn
(4)
325 M-RAPPOR -e RAPPOR algorithm [9] is dividedinto two parts for the privacy protection of user data datasender and data collector In the predictive maintenancescenario considered in this study the data sender is thelocal equipment that collects sensor data and the datacollector is the local data center that performs the pre-dictive maintenance analysis However the processingcapacity of local equipment is limited and the locationdistribution after statistics is needed in predictive main-tenance analysis therefore the computational cost of localequipment should be reduced while ensuring privacy anddata availability
-is study proposes an improvedM-RAPPOR algorithmbased on RAPPOR -e following improvements are made
(1) Data is disturbed only once this can decrease thecomputational cost of the data sender Because onlyone disturbance occurs the level of privacy pro-tection is decreased -erefore we add a probabilityfactor to the permanent random response (PRR)and we use two probability factors to expand theflexibility of parameter adjustment so as to increasethe level of privacy protection
(2) -e number of hash functions h is set to 1 and k ofthe Bloom filter is set to be equal to two times thenumber of data attribute values (increased scal-ability) After correction the number of 1 on each bitof the Bloom filter is the number of corresponding
Masked data
Sensitive data(location space
coordinates)
Nonsensitivedata
Predictivemaintenance
analysis
DesensitizationdataAnalysis result
M-RAPPOR
RS-encoding
Local
Data processing Storage
Industrialproduction
line
Storage
RS decoding
Figure 5 Protection process for real-time data
Security and Communication Networks 7
attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased
-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)
(1) Data Sender
(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1
(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value
P Biprime x( 1113857
f1 x 1
f2 x 0
1 minus f1 minus f2 x Bi
⎧⎪⎪⎨
⎪⎪⎩(5)
To satisfy ε-local differential privacy the privacy budget εis calculated
ε ln1 minus f2
f1 (6)
Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once
(2) Data Collector
(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime
(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows
ni N f2 minus 2f1f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2+
Di f1 minus 1 + f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2
(7)
(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value
In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased
326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data
According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)
attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons
(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation
(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists
(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni
(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location
In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data
33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes
331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment
8 Security and Communication Networks
-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection
332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967
represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc
Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud
nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as
Ttotal Tcloud + Tfog (8)
where Tcloud is the storage time required for the cloud andTfog that for the fog
-e storage time required for each cloud node isexpressed as
Tck tck middot (k minus 1) middotSck
1113936nj1 Scj
(9)
where Tck is the storage time required for cloud node Nck
and tckis the storage time required when cloud node Nck
stores each data block-erefore the storage time Tcloud required for the cloud
is expressed as
Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)
In addition we analyze the two storage methods for thefog nodes as follows
(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as
AES
DATA
Some data
RS encoding
Other data
Local
Equipment 1 Equipment 2 hellip Equipment n
Fog
Cloud
Figure 6 Protection process for data that does not require real-time processing
Security and Communication Networks 9
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 8: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/8.jpg)
attribute values the lasso regression is not neededand therefore the computation cost of the datacollector is decreased
-e M-RAPPOR algorithm process is divided into thedata sender (local equipment) and the data collector (localdata center)
(1) Data Sender
(1) -e length of the bit array B is set to k and the numberh of the hash function is set to 1-e initial value of allbits is set to 0-e sensitive data v is hashed only onceon the bit array B and the bit value corresponding tothe hash value is changed from 0 to 1
(2) -e PRR is obtained by perturbing each bit of bit arrayB using a random response technique-e disturbanceis performed according to the following equation Weintroduce the second probability factor f2 wheref1 f2 isin [0 1] represents the probability value
P Biprime x( 1113857
f1 x 1
f2 x 0
1 minus f1 minus f2 x Bi
⎧⎪⎪⎨
⎪⎪⎩(5)
To satisfy ε-local differential privacy the privacy budget εis calculated
ε ln1 minus f2
f1 (6)
Compared with RAPPOR the computational cost isdecreased because the data is only mapped using the Bloomfilter and is disturbed once
(2) Data Collector
(1) -e data collector collects all vectors Bprime sent by thedata sender and counts the number Di of 1 valuescorresponding to each bit i on Bprime
(2) Each bit Di is corrected where N is the total numberof data that is sent by the data sender -e correctedresults ni are expressed as follows
ni N f2 minus 2f1f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2+
Di f1 minus 1 + f2( 1113857
2f1 minus 4f1f2 minus 1 + 2f2
(7)
(3) -e corresponding ni is determined according to thehash map of the data attribute value ni is the fre-quency statistic of the attribute value
In the algorithm of the data collector lasso regression isnot conducted therefore the computational cost isdecreased
326 Security Analysis -is section describes the securityanalysis of the proposed threat model and proves that theproposed security storage scheme can really protect sensitiveindustrial data
According to the threat model proposed in Section 312for real-time industrial data (predictive maintenance data)
attackers will attack the local data center and infer thefactoryrsquos internal framework map based on the obtainedlocation data of the factory equipment this will causeeconomic losses to the factory When the proposed datasecurity storage scheme is adopted it is impossible for at-tackers to obtain specific location information for the fol-lowing reasons
(1) We assume that the data center is attacked and thatthe predictive maintenance data including locationinformation is stolen Because the location infor-mation is disturbed by M-RAPPOR on the localequipment the attacker obtains the location infor-mation as a ldquo0-1rdquo string instead of specific locationinformation
(2) From the algorithm of the data collector as describedin Section 324 to obtain the frequency distributionof each location value the list of candidate attributevalues is needed that is it is necessary to know whatthe location attribute values are It is difficult forattackers to collect all attribute value lists
(3) We assume that attackers are intelligent enough tocollect a list of all attribute values According toequation (7) the corrected result ni is directly relatedto f1 and f2 However f1 and f2 are set by us andtherefore it is difficult for attackers to infer thecorrected result ni
(4) We assume that attackers collect a list of all attributevalues and infer f1 and f2 -ey will only obtain thefrequency statistic of each location value and theycannot obtain the corresponding relationship be-tween the factory equipment and the location
In summary it is impossible for attackers to obtain thespecific location information of the factory equipment-erefore the proposed scheme can effectively protectsensitive industrial data
33 Protection of Non-Real-Time Industrial DataIndustrial data includes data processed in real time as well asdata that is not processed in real time Because of the largeamount of non-real-time data it is usually stored with third-party cloud service providers However relying on cloudencrypted storage alone cannot defend against internal at-tacks from the cloud therefore it cannot effectively solve thedata security problem -erefore we design a data protec-tion scheme with AES encryption and RS encoding forcloud-fog collaborative storage According to the cloud-fogcollaborative storage strategy some encoded data is stored inthe fog nodes and the rest of the data is stored in the cloudnodes
331 Data Protection Scheme for Cloud-Fog CollaborativeStorage Figure 6 shows the proposed data protectionscheme for cloud-fog collaborative storage of data that is notprocessed in real time -e scheme includes three layersmdashcloud nodes fog nodes and local equipment
8 Security and Communication Networks
-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection
332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967
represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc
Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud
nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as
Ttotal Tcloud + Tfog (8)
where Tcloud is the storage time required for the cloud andTfog that for the fog
-e storage time required for each cloud node isexpressed as
Tck tck middot (k minus 1) middotSck
1113936nj1 Scj
(9)
where Tck is the storage time required for cloud node Nck
and tckis the storage time required when cloud node Nck
stores each data block-erefore the storage time Tcloud required for the cloud
is expressed as
Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)
In addition we analyze the two storage methods for thefog nodes as follows
(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as
AES
DATA
Some data
RS encoding
Other data
Local
Equipment 1 Equipment 2 hellip Equipment n
Fog
Cloud
Figure 6 Protection process for data that does not require real-time processing
Security and Communication Networks 9
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 9: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/9.jpg)
-e local equipment encrypts the data using AES andthen uploads the ciphertext to the fog nodes After receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks using RS encoding According to thestorage capacity of fog nodes and cloud nodes we design anallocation strategy for cloud-fog collaborative storage thatstores some data in the fog nodes and uploads the rest of thedata to the cloud nodes -is strategy is described in the nextsection
332 Allocation Strategy for Cloud-Fog CollaborativeStorage As noted in the previous section after receivingthe ciphertext the fog nodes generate k data blocks and x
redundant data blocks by RS encoding -en we allocatek + x data blocks according to the storage capacity of the fognodes and the cloud nodes Let Nf Nf1 Nf2 Nfn1113966 1113967
represent fog nodes and Sf Sf1 Sf2 Sfn1113966 1113967 repre-sent the storage capacity of fog nodes Similarly letNc Nc1 Nc2 Ncn1113864 1113865 represent cloud nodes and Sc
Sc1 Sc2 Scn1113864 1113865 represent the storage capacity of cloudnodes In addition because the storage capacity of the cloudnodes is greater than that of the fog nodes we store most ofthe data in the cloud according to the storage capacity of thecloud nodes To prevent data leakage we store x + 1 blockdata blocks in fog nodes and store k minus 1 data blocks in cloud
nodes -e total time Ttotal required for cloud-fog collabo-rative storage is expressed as
Ttotal Tcloud + Tfog (8)
where Tcloud is the storage time required for the cloud andTfog that for the fog
-e storage time required for each cloud node isexpressed as
Tck tck middot (k minus 1) middotSck
1113936nj1 Scj
(9)
where Tck is the storage time required for cloud node Nck
and tckis the storage time required when cloud node Nck
stores each data block-erefore the storage time Tcloud required for the cloud
is expressed as
Tcloud max Tc1 Tc2 Tcn1113864 1113865 (10)
In addition we analyze the two storage methods for thefog nodes as follows
(1) -e data is first stored in the first fog node theremaining data is stored in the next fog node whenthe amount of data in the first node reaches an ac-ceptable maximum and so onAt this time the storage time Tfog required for the fogis expressed as
AES
DATA
Some data
RS encoding
Other data
Local
Equipment 1 Equipment 2 hellip Equipment n
Fog
Cloud
Figure 6 Protection process for data that does not require real-time processing
Security and Communication Networks 9
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 10: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/10.jpg)
Tfog
tf1 middot (x + 1) 0lt x + 1leF1 max
tf1 middot F1 max + tf2 middot x + 1 minus F1 max( 1113857 F1 max ltx + 1leF1 max + F2 max
tfn middot x + 1 minus 1113936nminus1
i1Fi max1113888 1113889 + 1113936
nminus1
i1tfi middot Fi max 1113936
nminus1
i1Fi max ltx + 1le 1113936
n
i1Fi max
⎧⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎩
(11)
where tfi is the storage time required when fog nodeNfi stores each data block and Fi max is the maxi-mum storage required for fog node Nfi
(2) We store data based on the storage capacity of the fognodes At this time the storage time required foreach fog node is expressed as
Tfk tfk middot (x + 1) middotSfk
1113936ni1 Sfi
(12)
-erefore the storage time Tfog required for the fog isexpressed as
Tfog max Tf1 Tf2 Tfn1113966 1113967 (13)
After calculating Tfog we can calculate the total timerequired for cloud-fog collaborative storage by substitutingthe results of Tcloud and Tfog into equation (8)
333 Security Analysis For non-real-time processing ofindustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in thefog nodes and the rest in the cloud nodes to realizemultilayer data protection -is solution solves theproblem of data leakage to a large extent Here we discussthe worst case If the attackers are intelligent enough tosteal k (or more than k) data blocks from the cloud andfog they have to crack the encoding matrix in RSencoding and the AES algorithm to obtain the originaldata However it is very difficult for attackers to crackboth the AES algorithm and the encoding matrix Herewe consider the AES-128 algorithm as an example It takes2127 ns to crack the AES algorithm when decryption isperformed once per nanosecond this is impossible forattackers Table 1 shows the degree of difficulty if attackerswant to crack the encoding matrix
Table 1 shows that it is difficult for attackers to crack theencoding matrix In an industrial scenario the numbers ofsource and redundant data blocks are very large therefore itis impossible for attackers to crack the encoding matrix intheory
34 Efficiency Analysis -is section mainly analyzes theencoding efficiency and storage efficiency
341 Encoding Efficiency -e RS algorithm performs thefour fundamental arithmetic operations in the Galois field
in which the number of elements in the field must be greaterthan the sum of the number of source data blocks andredundant data blocks that is 2w gt n+m where n andm arethe numbers of source data blocks and redundant datablocks respectively When the word size w is fixed the timecomplexity of the RS algorithm with the Vandermondematrix is O(n2) therefore the encoding efficiency decreaseswith an increase in the number of the source data blockGenerally w is 8 or 16 because the computer stores bytes as8 bits Table 2 shows the data recovery performance when thenumber of data blocks n remains unchanged and w takesdifferent values
Table 2 shows that the data recovery performance isalmost unaffected with an increase in w Considering that alarger w value has higher encoding efficiency (processingmore data blocks at a time) and that the number of sourcedata blocks n is very large in the actual industrial scenariowe can try to choose the highest w allowed by the system
342 Storage Efficiency -e storage efficiency is an im-portant index of storage-related algorithms A system withhigh storage efficiency can save the storage capacity as muchas possible According to the Storage Industry NetworkingAssociationrsquos definition of storage efficiency the storageefficiency Est in this study is expressed as
Est n
n + m (14)
-is equation shows that when n is larger and m issmaller the storage efficiency becomes increasingly higherConsidering the large number of source data blocks n in theindustrial scenario a small number of redundant datablocks are needed to improve storage efficiency In thisstudy for real-time industrial data the minimum numberof redundant data blocks can be selected as described inSection 324 For non-real-time industrial data the min-imum number of redundant data blocks can be selectedaccording to the storage capacity of cloud nodes and fognodes
4 Results and Discussion
41 Experiments on Protection of Real-Time Industrial Data-is section verifies the performance of the proposed dataprotection scheme through simulation experiments
411 Experimental Configuration In the simulation ex-periment we use the following 10 groups of data that obey a
10 Security and Communication Networks
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 11: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/11.jpg)
normal distribution as location data 105 2times105 106-e data attribute value set is 1 2 3 100 And we usethe turbofan engine aging dataset provided by the NASAPrognostics Data Repository [31] as nonsensitive data Ta-ble 3 shows the environmental parameters
412 Experimental Results
(1) M-RAPPOR
(1) Performance of M-RAPPOR Figure 7 shows acomparison between the computational cost ofM-RAPPOR and RAPPORWe tested 10 sets of data-is figure shows that the improved M-RAPPORalgorithm has a lower computational cost becauseonly one disturbance occurs in the data sender andlasso regression is not conducted in the data col-lector In addition the communication cost refers tothe cost of data transmission from the data sender tothe data collector M-RAPPOR and RAPPOR havethe same communication cost which is related to thelength k of Bloom filter Communication cost in-creases with the increase of k Because the main goalof this paper is to solve the problem of secure storageof industrial sensitive data we think the commu-nication cost is acceptable
(2) Relationship between Privacy Budget and Level ofPrivacy Protection According to the definition oflocal differential privacy the level of data privacyprotection mainly depends on the setting of theprivacy budget ε and the ε value is directly related tof1 f2 Figure 8 shows the relationship between f1f2 ε and the level of privacy protection-e value ofthe privacy budget ε is varied as 04 2 and 46 -eheight of the blue column represents the true frequencydistribution and that of the orange column the sta-tistical frequency distribution after M-RAPPOR
-is figure shows that with a smaller privacy budget suchas ε 04 the statistical results deviate from the real value toa large extent whereas with a larger privacy budget such asε 46 the statistical results are closer to the real value
-erefore the privacy budget is negatively related to the levelof privacy protection -e lower the privacy budget is thelower the data availability is and the higher the level ofprivacy protection is In the predictive maintenance sce-nario we can appropriately select a higher privacy budgetthat not only ensures data availability but also protects thespecific location information of the factory equipment
(2) ReedndashSolomon EncodingRS encoding consists of the following steps
(1) Determination of Equalized Storage and Non-equalized Storage Figure 9 shows the relationshipsamong the equalized storage nonequalized storageand decoding time -is graph shows that thedecoding time of the data blocks stored in theequalized mode is slightly better than that of theblocks stored in the nonequalizedmode However inan actual industrial scenario not all storage equip-ment has the same storage capacity therefore thebenefits of allocating data blocks according to the
Table 1 Degree of difficulty for cracking encoding matrix
Word size of Galois field Number of source data blocks k Number of redundant data blocks m Cracking time8 6 1 248
8 6 2 296
16 6 1 296
16 6 2 2192
Table 2 Comparison of processing time for different word lengths
Word size of Galoisfield
Number of source datablocks k
Time(ms)
8 20 19216 20 1928 200 147116 200 1472
Table 3 Environmental parameters (real-time data)
Item Parameter valueOperating system LinuxProgramming language PythonCPU Intel Core i5 230GHzMemory 8GBHard disk 1 TB
RAPPORM-RAPPOR
200 300 400 500 600 700 800 900 1000100Number of data (thousand)
0
20
40
60
80
100
120
Tim
e (m
s)
Figure 7 Computational cost of RAPPOR and M-RAPPOR
Security and Communication Networks 11
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 12: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/12.jpg)
configuration of different pieces of equipment aregreater than that of equalization -erefore we adoptnonequalized storage to optimize overall performance
(2) Selection of Optimal Redundant Data BlocksFigure 10 shows the relationship between theencoding time and the decoding time with a changein the number of redundant data blocks We vary thenumber of redundant data blocks from 0 to 30 Withan increase in the number of redundant data blocksthe encoding time increases whereas the decodingtime basically remains unchanged -is is becausethe encoding time is directly related to the encodingmatrix-e encoding matrix comprises a unit matrixand a Vandermonde matrix When the number ofredundant data blocks increases the dimension ofthe Vandermonde matrix increases leading to anincrease in the operation cost and encoding time In
addition because no data block has been removed inthis test it is not necessary to recalculate in thedecoding process data can be extracted directly fromthe equipment and decoding time is basically un-changed -erefore we should select the minimumbased on the data recoverability
(3) Verification of Robustness of Distributed Storage Toreflect the advantages of distributed storage androbustness of the scheme we have designed a localequipment failure scenario Taking Figure 6 as anexample assume that there are six local devices (n
6) and that the storage capacities are 1MB 1MB4MB 6MB 8MB and 10MB respectively -enumber of encoded real-time processing data blocksis 60 (b 60) According to equation (2) thenumber of data blocks stored by each equipmenttype is 2 2 8 12 16 and 20 respectively According
0
000
001
Prop
ortio
n 002
003
50 100 150 0
000
001
Prop
ortio
n
002
003
50 100 150 200
0
0000
0005
0010
0015
0020
Prop
ortio
n
0025
50 100 150
True valueStatistical value
True valueStatistical value
True valueStatistical value
200
f1 = 02 f2 = 07 ε = 04 f1 = 01 f2 = 02 ε = 2
f1 = 001 f2 = 001 ε = 46
Figure 8 Frequency statistics results under different privacy budgets
12 Security and Communication Networks
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 13: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/13.jpg)
to inequality (3) and equation (4) the optimal valueof the number of redundant data blocks m is ob-tained as 20 In our solution if any device fails wecan recover the complete data To verify this weperformed tests six times Each device suffers adevice failure and the test results are shown inFigure 11 -is figure shows that if any one of the sixlocal equipment types fails we can recover thecomplete data In addition we can understand thedata loss caused by equipment failure As the amountof lost data increases the decoding time increases-is is because the lost data need to be solved bythe redundant data block combined with the
corresponding row in the decoding matrix to obtainthe complete equation-e larger the amount of datathat is lost the more difficult to solve the equationsand the greater the amount of time consumed
42 Experiments on Protection of Non-Real-TimeIndustrial Data
421 Experimental Configuration
(1) Dataset We chose an industrial production datasetprovided by Bosch [32] and selected 36MB of non-real-timedata from this dataset We stored x + 1 data blocks in the fognodes -is scheme can protect data and reduce the storagepressure on the fog nodes
(2) Experimental Environment Table 4 shows the environ-mental parameters In addition we used MATLAB(MathWorks Natick MA US) as the data analysis tool
422 Experimental Results -is section presents the resultsof our cloud-fog collaborative storage experiments andcomparisons of the time required for cloud-fog collaborativestorage when different storage methods are adopted for thefog nodes After receiving the ciphertext the fog nodesgenerate k data blocks and x redundant data blocks by RSencoding Considering that the processing capacity of fognodes is better than that of local equipment we set x to 60
Equalized storageNonequalized storage
18 21 24 27 30 33 36 39 42 4515Number of data blocks
0
50
100
150
200
250
Dec
odin
g tim
e (m
s)
Figure 9 Relationship between equalized storage nonequalizedstorage and decoding time
Encoding time (ms)Decoding time (ms)
5 10 15 20 25 300Number of data blocks
100
150
200
250
300
350
400
450
Tim
e (m
s)
Figure 10 Relationship between encoding time and decoding timeas the change of redundant data blocks
1 2 3 4 5 60Damaged device number
80
100
120
140
160
180
200
220
240
260
280
Dec
odin
g tim
e (m
s)
Figure 11 Relationship between different local equipment failuresand decoding time
Table 4 Environmental parameters (non-real-time data)
Item Parameter valueOperating system Windows 10Programming language JavaCPU Intel Core i5 330GHzMemory 8GBHard disk 1 TB
Security and Communication Networks 13
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 14: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/14.jpg)
-us we store x + 1(x + 1 61) data blocks in the fog nodesand upload the remaining k minus 1 data blocks to the cloudnodes We assume that there are three cloud nodesNc1 Nc2 Nc3 where each node has a storage capacity of200MB-e time required for the storage of each data blockis 2ms -ere are three fog nodes Nf1 Nf2 Nf3 havingstorage capacities of 30MB 30MB and 40MB respectivelyand the maximum number of data blocks storingF1 max F2 max F3 max is 60 60 and 70 respectively -etime tf1 tf2 tf3 required for the storage of each data block is6ms 6ms and 55ms respectively From equations(11) ndash(13) we can calculate the storage time required whenthe fog nodes adopt two different storage methodsAccording to equations (9) and (10) we can calculate thestorage time required for the cloud nodes Finally accordingto equation (8) we can calculate the total time required forcloud-fog collaborative storage Figure 12 shows the totaltime required for cloud-fog collaborative storage when thefog nodes adopt storage method 1 and storage method 2-is figure shows that the total time required for cloud-fogcollaborative storage is less for storage method 2 and as thenumber of data blocks that the fog nodes need to encodeincreases the difference in the total storage time required bythe two different storage methods increases that is theadvantage of storage method 2 is greater Considering thelarge amount of non-real-time processing data in the in-dustrial scene we choose storage method 2 in the fog nodesto achieve the best performance for the entire cloud-fogcollaborative storage solution
5 Conclusions
At present most industrial enterprises do not have perfectindustrial data protection measures and lack a complete andeffective industrial data protection solution However re-lying only on fog computing or cloud computing cannotfully protect industrial data We proposed a three-layer
protection framework with localfogcloud storage forsensitive industrial data and defined a corresponding threatmodel For real-time sensitive industrial data we designed adata protection scheme based on the improved local dif-ferential privacy algorithm M-RAPPOR and RS encodingWe desensitized and encoded the data in local equipmentWe then adopted the optimal scheme for distributed storagein local equipment to realize low cost high efficiency andintelligent data protection For non-real-time sensitive in-dustrial data we designed a data protection scheme forcloud-fog collaborative storage based on AES encryptionand RS encoding Some encoded data were stored in the fognodes and the rest in the cloud nodes to realize multilayerdata protection -e feasibility of our scheme has beenvalidated through security analysis and experimentalevaluations
Data Availability
Reference [32] refers to the dataset used in this research-edataset also can be accessed from the web page httpswwwkagglecomcbosch-production-line-performancedata
Conflicts of Interest
-e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
-is research was partly supported by the National NaturalScience Foundation of China (61872015) Qinghai ProvinceNatural Science Foundation (2017-ZJ-91) Beijing NaturalScience Foundation-Haidian Original Innovation JointFund (19L2020) Foundation of Science and Technology onInformation Assurance Laboratory (614211204031117)Industrial Internet Innovation and Development Project
Storage method 1Storage method 2
224 284 344 404164Number of cloud-fog collaborative storage data blocks
0
100
200
300
400
500
600
700
800
Tota
l tim
e req
uire
d fo
r clo
ud-fo
gco
llabo
rativ
e sto
rage
(ms)
Figure 12 Total time required for cloud-fog collaborative storage using storage method 1 and storage method 2 for fog nodes
14 Security and Communication Networks
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 15: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/15.jpg)
(Typical Application and Promotion Project of the SecurityTechnology for the Electronics Industry) of the Ministry ofIndustry and Information Technology of China in 2018Foundation of Shanxi Key Laboratory of Network andSystem Security (NSSOF1900105) and International Re-search Cooperation Seed Fund of Beijing University ofTechnology (2018-B9)
References
[1] M Tseng T Edmunds L Canaran et al Introduction to EdgeComputing in IIoT Industrial Internet Consortium Need-ham MA USA 2018
[2] H Hui C Zhou S Xu and F Lin ldquoA novel secure datatransmission scheme in industrial internet of thingsrdquo ChinaCommunications vol 17 no 1 pp 73ndash88 2020
[3] B Wang ldquoFacebook has another data breach scandal and isquestioned that it cannot fully protect user informationrdquo2019 httpwwwqlmoneycomcontent20190404-347582html
[4] K-S Wong and M H Kim ldquoPrivacy protection for data-driven smart manufacturing systemsrdquo International Journalof Web Services Research vol 14 no 3 pp 17ndash32 2017
[5] C Dwork and J Lei ldquoDifferential privacy and robust sta-tisticsrdquo in Proceedings of the 41st Annual ACM Symposium on6eory of Computing pp 371ndash380 Bethesda MA USAMayndashJune 2009
[6] K Kenthapadi A Korolova I Mironov and N MishraldquoPrivacy via the johnson-lindenstrauss transformrdquo PrivacyConfidentiality vol 5 no 1 pp 39ndash71 2013
[7] N Mohammed D Alhadidi B C M Fung and M DebbabildquoSecure two-party differentially private data release for ver-tically partitioned datardquo IEEE Transactions on Dependableand Secure Computing vol 11 no 1 pp 59ndash71 2014
[8] J C Duchi M I Jordan andM JWainwright ldquoLocal privacyand statistical minimax ratesrdquo in Proceedings of the 54thAnnual IEEE Symposium on Foundations of Computer Science(FOCS) pp 429ndash438 Berkeley CA USA October 2013
[9] U Erlingsson ldquoRAPPOR Randomized aggregatable privacypreserving ordinal responserdquo in Proceedings of the 21st ACMConference on Computer and Communications Securitypp 1054ndash1067 Chicago IL USA October 2014
[10] B Gu V S Sheng Z Wang et al ldquoIncremental learning forv-support vector regressionrdquo Neural Networks vol 67pp 140ndash150 2015
[11] C Xu J Ren D Zhang and Y Zhang ldquoDistilling at the edgea local differential privacy obfuscation framework for IoTdataanalyticsrdquo IEEE Communications Magazine vol 56 no 8pp 20ndash25 2018
[12] T Wang J Zhou X Chen G Wang A Liu and Y Liu ldquoAthree-layer privacy preserving cloud storage scheme based oncomputational intelligence in fog computingrdquo IEEE Trans-actions on Emerging Topics in Computational Intelligencevol 2 no 1 pp 3ndash12 2018
[13] M Du K Wang X Liu S Guo and Y Zhang ldquoA differentialprivacy-based query model for sustainable fog data centersrdquoIEEE Transactions on Sustainable Computing vol 4 no 2pp 145ndash155 2019
[14] L Lyu K Nandakumar B Rubinstein J Jin J Bedo andM Palaniswami ldquoPPFA privacy preserving fog-enabled ag-gregation in smart gridrdquo IEEE Transactions on IndustrialInformatics vol 14 no 8 pp 3733ndash3744 2018
[15] Q Huang Y Yang and L Wang ldquoSecure data access controlwith ciphertext update and computation outsourcing in fogcomputing for Internet of -ingsrdquo IEEE Access vol 5 no 99pp 12941ndash12950 2017
[16] R Lu K Heung A H Lashkari and A A Ghorbani ldquoAlightweight privacy-preserving data aggregation scheme forfog computing-enhanced IoTrdquo IEEE Access vol 5pp 3302ndash3312 2017
[17] T Ghorbani J Zeng M Z A Bhuiyan et al ldquoTrajectoryprivacy preservation based on a fog structure for cloud lo-cation servicesrdquo IEEE Access vol 5 pp 7692ndash7701 2017
[18] S Kulkarni S Saha and R Hockenbury ldquoPreserving privacyin sensor-fog networksrdquo in Proceedings of 6e 9th Interna-tional Conference for Internet Technology and SecuredTransactions (ICITST) vol 96ndash99 London UK December2014
[19] Q Hou Y Wu W Zheng and G Yang ldquoA method onprotection of user data privacy in cloud storage platformrdquoJournal of Computer Research and Development vol 48 no 7pp 1146ndash1154 2011
[20] G Feng ldquoA data privacy protection scheme of cloud storagerdquoSoftware Guide vol 14 no 12 pp 174ndash176 2015
[21] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162016
[22] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2016
[23] Z Fu F Huang X Sun A Vasilakos and C-N YangldquoEnabling semantic search based on conceptual graphs overencrypted out-sourced datardquo IEEE Transactions on ServicesComputing vol 12 no 5 pp 813ndash823 2016c
[24] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016
[25] G Kulkarni R Waghmare R Palwe et al ldquoCloud storagearchitecturerdquo in Proceedings of the 2012 7th InternationalConference on Telecommunication Systems Services andApplications pp 76ndash81 Bali Indonesia October 2012
[26] C Wang S S M Chow Q Wang K Ren and W LouldquoPrivacy-preserving public auditing for secure cloud storagerdquoIEEE Transactions on Computers vol 62 no 2 pp 362ndash3752013
[27] J Shen J Shen X Chen X Huang and W Susilo ldquoAnefficient public auditing protocol with novel dynamic struc-ture for cloud datardquo IEEE Transactions on Information Fo-rensics and Security vol 12 no 10 pp 2402ndash2415 2017
[28] X Q Pham and E N Huh ldquoTowards task scheduling in acloud-fog computing systemrdquo in Proceedings of the 2016 18thAsia-Pacific Network Operations and Management Sympo-sium (APNOMS) pp 1ndash4 Kanazawa Japan October 2016
[29] R Deng R Lu C Lai et al ldquoTowards power consumption-delay tradeoff by workload allocation in cloud-fog comput-ingrdquo in Proceedings of the 2015 IEEE International Conferenceon Communications (ICC) pp 3909ndash3914 London UK June2015
[30] K Bierzynski A Escobar andM Eberl ldquoCloud fog and edgecooperation for the futurerdquo in Proceedings of the 2017 SecondIEEE International Conference on Fog and Mobile EdgeComputing (FMEC) pp 62ndash67 Valencia Spain May 2017
Security and Communication Networks 15
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks
![Page 16: ProtectionofSensitiveDatainIndustrialInternetBasedon Three ...downloads.hindawi.com/journals/scn/2020/2017930.pdf · distributed computing power on a converged network ... proposed](https://reader033.vdocuments.us/reader033/viewer/2022060209/5f044fb17e708231d40d59a3/html5/thumbnails/16.jpg)
[31] V Mathew T Toby V Singh et al ldquoPrediction of remaininguseful lifetime (RUL) of turbofan engine using machinelearningrdquo in Proceedings of the 2017 IEEE InternationalConference on Circuits and Systems (ICCS) pp 306ndash311-iruvananthapuram India December 2017
[32] A Mangal and N Kumar ldquoUsing big data to enhance theBosch production line performance a kaggle challengerdquo inProceedings of the 2016 IEEE Big Data Conferencepp 2029ndash2035 Washington DC USA December 2016
16 Security and Communication Networks