

Anonyropy : An Efficient Anonymization Scheme Using Entropy in Smart Mediator for Mashup Service

Da Eun Lee, YoungKi Kim, Choong Seon Hong Department of Computer Science and Engineering

Kyung Hee University, 446-701 Republic of Korea

{daeunlee, qoo0144, cshong}@khu.ac.kr

Abstract—Recently, mashup services that link multiple separate platforms have appeared on the Internet. To communicate between different platforms, however, these services need a Smart Mediator as a common relaying platform. The Smart Mediator receives data from various platforms and provides a uniform OpenAPI to mashup service developers to ease development. One of the most important tasks of a Smart Mediator is to safeguard users' privacy, so it needs to provide anonymized personal data to the mashup services. In this paper, we propose an efficient anonymization scheme using entropy in the Smart Mediator and evaluate it there. The results show that this scheme is more efficient than the others when attack risk rate and data loss rate are compared. Through this scheme, we expect to protect users' privacy while providing enough personal information to the mashup services.

Keywords—Anonymization; Entropy; Mashup Service; Smart Mediator

I. INTRODUCTION

Recently, many Internet services, such as Google, Facebook and Twitter, have started providing their OpenAPIs [1]. Hence, various mashup services use these OpenAPIs together with public data, such as the Korean Government 3.0 portal. To develop mashup services using OpenAPIs, developers use the smart mediator, which connects to multiple different OpenAPIs. Figure 1 shows the architecture of the smart mediator. In Figure 1, the smart mediator provides developers with the OpenAPIs of various platforms, namely Internet of Things, Cloud, Big Data, Mobile and Security (ICBMS). The smart mediator thus helps them develop mashup services easily across these platforms. Since the smart mediator connects various platforms, it is exposed to many security problems such as linked attacks, distributed denial of service, sniffing and so on.

Therefore, anonymization is an important component of the smart mediator because it protects the privacy of users whose data is collected from the various ICBMS platforms. The smart mediator anonymizes personal data and sensor values before providing them to mashup services, which prevents attacks such as linked attacks and protects users' privacy.

In this paper, we study an efficient anonymization scheme in the smart mediator for mashup services, named anonyropy. It anonymizes personal data while still providing enough personal information by using the optimal k.

The rest of this paper is organized as follows. In Section II, we review k-anonymity and entropy, which are the building blocks of anonyropy. In Section III, we explain the anonyropy architecture and mechanism. We then evaluate anonyropy in Section IV. Finally, we conclude the paper and discuss future work in Section V.

Figure 1. Smart Mediator Architecture

II. RELATED WORK

A. k-anonymity

k-anonymity was proposed by Latanya Sweeney to reduce the correlation of data. A data set satisfies k-anonymity when each combination of quasi-identifier attribute values appears in at least k records of the data set [2].

© Copyright IEICE – The 18th Asia-Pacific Network Operations and Management Symposium (APNOMS) 2016

Latanya Sweeney defined k-anonymity as follows: let $RT(A_1, \ldots, A_n)$ be the data table and $QI_{RT}$ be the quasi-identifier of $RT$. $RT$ achieves k-anonymity only if each sequence of values in $RT[QI_{RT}]$ appears at least $k$ times in $RT[QI_{RT}]$ [3].

Figure 2 shows an example of a data table that satisfies k = 3. In Figure 2, the upper table is the original table and the lower table is the version with anonymized quasi-identifiers, which satisfies k = 3. In this case, records 1, 2 and 3 share the same quasi-identifier values, as do records 4, 5 and 6.

Figure 2. An Example of k-Anonymity (k = 3)
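The k-anonymity condition can be checked mechanically by counting how often each quasi-identifier combination occurs. The following is a minimal sketch of such a check. The table contents, the attribute names (age, zip, disease) and the function name is_k_anonymous are illustrative assumptions and are not taken from Figure 2.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical anonymized table: ages generalized to ranges, ZIP codes truncated.
table = [
    {"age": "20-29", "zip": "446**", "disease": "flu"},
    {"age": "20-29", "zip": "446**", "disease": "cold"},
    {"age": "20-29", "zip": "446**", "disease": "asthma"},
    {"age": "30-39", "zip": "447**", "disease": "flu"},
    {"age": "30-39", "zip": "447**", "disease": "diabetes"},
    {"age": "30-39", "zip": "447**", "disease": "cold"},
]

print(is_k_anonymous(table, ["age", "zip"], 3))  # True: two groups of three records
print(is_k_anonymous(table, ["age", "zip"], 4))  # False: no group has four records
```

As in the example of Figure 2, the six records form two groups of three identical quasi-identifier values, so the table satisfies k = 3 but not k = 4.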

B. Entropy

Entropy is a measure of the unpredictability of a random outcome. In information theory, it is the average information content. The higher the unpredictability, the greater the amount of information and the greater the entropy. For example, a coin toss has two possible outcomes, heads and tails, each with probability 1/2. This is the situation of maximum uncertainty, in which the result of the next toss is hardest to predict, so the entropy is 1. If the two outcomes have different probabilities, however, one side is more likely to come up than the other, so the uncertainty is reduced and the entropy is less than 1 [4].

In information theory, Shannon defined the entropy $H$ of a discrete random variable $X$ with possible values $\{x_1, \ldots, x_n\}$ and probability mass function $P(X)$. $I$ is the information content of $X$, and $I(X)$ is itself a random variable. The entropy can be written as [5]:

$$H(X) = \sum_{i=1}^{n} P(x_i)\, I(x_i) \tag{1}$$

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i) \tag{2}$$
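The coin toss example can be reproduced numerically. The sketch below evaluates equation (2) with a base-2 logarithm for a fair coin and for a biased coin. The function name shannon_entropy and the 0.9/0.1 bias are illustrative choices, not values from the paper.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


fair = shannon_entropy([0.5, 0.5])    # maximum uncertainty for two outcomes
biased = shannon_entropy([0.9, 0.1])  # one side is far more likely

print(fair)              # 1.0
print(round(biased, 3))  # 0.469
```

The fair coin reaches the maximum of 1 bit, while the biased coin stays well below it, which matches the argument that reduced uncertainty means reduced entropy.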

III. ARCHITECTURE AND MECHANISM

We look for the highest entropy value so that an attacker cannot recover the original data from the anonymized data. In this section, we explain the anonyropy architecture and algorithm. Anonyropy is based on entropy and k-anonymity [6]. Using anonyropy, the smart mediator provides anonymized data to the mashup services.

A. Architecture

Figure 3 shows the anonymization architecture in the smart mediator. The smart mediator has five adaptors, one each for Internet of Things, Cloud, Big Data, Mobile and Security (ICBMS). Each ICBMS platform sends its data through its adaptor and is connected to the others to build the mashup service. When an adaptor receives data from an ICBMS platform, it sends the data to the anonymization component, which anonymizes the data using anonyropy.

Figure 3. System Architecture
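The data flow described for Figure 3 (platform, adaptor, anonymization component) can be sketched as follows. All class names, field names and the masking rule are hypothetical. In particular, the placeholder anonymize method only masks a named field and does not implement anonyropy.

```python
from typing import Dict, List

Record = Dict[str, str]


class AnonymizationComponent:
    """Stand-in for the anonymization step; the masking rule is only a placeholder."""

    def __init__(self, sensitive_fields: List[str]) -> None:
        self.sensitive_fields = set(sensitive_fields)

    def anonymize(self, records: List[Record]) -> List[Record]:
        # Replace every sensitive field with "*" and leave other fields unchanged.
        return [
            {key: ("*" if key in self.sensitive_fields else value)
             for key, value in record.items()}
            for record in records
        ]


class Adaptor:
    """One adaptor per ICBMS platform; it forwards received data to the anonymizer."""

    def __init__(self, platform: str, anonymizer: AnonymizationComponent) -> None:
        self.platform = platform
        self.anonymizer = anonymizer

    def receive(self, records: List[Record]) -> List[Record]:
        return self.anonymizer.anonymize(records)


PLATFORMS = ["IoT", "Cloud", "BigData", "Mobile", "Security"]
anonymizer = AnonymizationComponent(["name"])
adaptors = {platform: Adaptor(platform, anonymizer) for platform in PLATFORMS}

out = adaptors["IoT"].receive([{"name": "Alice", "temperature": "21.5"}])
print(out)  # [{'name': '*', 'temperature': '21.5'}]
```

Sharing a single anonymization component across the five adaptors mirrors the description that every adaptor hands its data to the same component.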

B. Mechanism

The anonyropy mechanism is simple. The information record IR is based on the input record from an ICBMS platform in the smart mediator. We first define the probability of an information record, as shown in equation (3). We then normalize this probability using equation (4). From the normalized probability, the entropy is computed by equation (5). Finally, we obtain the optimal k from equation (6).


= (3)

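$$q_i = \frac{p_i}{\sum_{j} p_j} \tag{4}$$

$$H = -\sum_{i=1} q_i \log_2 q_i \tag{5}$$

$$k = \arg\max H \tag{6}$$

Here $p_i$ denotes the probability of the $i$-th information record from equation (3), $q_i$ its normalized value and $H$ the entropy.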

Figure 4 shows the algorithm of anonyropy. As explained earlier, we first compute the probability and its normalized value. Next, using the normalized probability, we compute the entropy. Finally, we obtain the optimal k for k-anonymity. We then anonymize the personal data using this k in the smart mediator, which can then provide anonymized data to mashup services.

Figure 4. Algorithm for Anonyropy
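The steps of equations (4) to (6), namely normalizing, computing the entropy and taking the argmax, can be sketched as below. This is only an illustration under strong assumptions. The probability definition of equation (3) is not reproduced here, so the sketch simply takes raw non-negative weights per candidate k as given. The candidates dictionary, its values and all function names are hypothetical.

```python
import math


def normalize(raw):
    """Scale non-negative raw values so that they sum to 1 (the normalization step)."""
    total = sum(raw)
    if total <= 0:
        raise ValueError("raw values must have a positive sum")
    return [value / total for value in raw]


def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def optimal_k(candidates):
    """Return the candidate k whose normalized distribution has the highest entropy."""
    return max(candidates, key=lambda k: entropy_bits(normalize(candidates[k])))


# Assumed input: for each candidate k, hypothetical raw record weights.
candidates = {
    2: [9, 1],
    3: [4, 3, 3],
    4: [7, 1, 1, 1],
}

best_k = optimal_k(candidates)
print(best_k)  # 3
```

In this toy input the candidate k = 3 has the most even distribution and therefore the highest entropy, so it is selected.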

IV. EVALUATION

In this section, we evaluate our proposal, anonyropy, which was explained in Section III. We use the anonymization and analysis tool ARX [7].

A. Evaluation Scenario

We built a mashup service in the smart mediator by linking the ICBMS platforms. The mashup service sends the user's information to the smart mediator. The smart mediator then anonymizes the user's information using anonyropy, saves it, and provides the anonymized data to the mashup service. Figure 5 shows our evaluation scenario.

Figure 5. Evaluation Scenario

B. Result

Figure 6 and Figure 7 show the attack risk rate and the data loss rate of the original data, two random values and anonyropy. Of the two random values, the one greater than the anonyropy value is named RV1 and the one less than the anonyropy value is named RV2. The attack risk rate is the success rate of attacks such as linked attacks. The data loss rate is the rate of data that becomes unavailable.
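The paper does not state the formulas behind these two rates, so the sketch below only illustrates one possible reading. It assumes that a record's attack risk is one over the size of its quasi-identifier group, a common convention in re-identification risk analysis, and that data loss is the share of quasi-identifier cells suppressed to "*". Function names and sample tables are hypothetical.

```python
from collections import Counter


def risk_rates(records, quasi_identifiers):
    """Return (highest, average) per-record risk under the assumed 1/group-size model."""
    keys = [tuple(r[a] for a in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    per_record = [1.0 / sizes[key] for key in keys]
    return max(per_record), sum(per_record) / len(per_record)


def data_loss_rate(records, quasi_identifiers, suppressed="*"):
    """Share of quasi-identifier cells that were suppressed (assumed loss measure)."""
    cells = [r[a] for r in records for a in quasi_identifiers]
    return sum(1 for cell in cells if cell == suppressed) / len(cells)


# Hypothetical original table: every record is unique on (age, zip).
original = [
    {"age": "23", "zip": "44601"}, {"age": "25", "zip": "44602"},
    {"age": "28", "zip": "44603"}, {"age": "31", "zip": "44701"},
    {"age": "35", "zip": "44702"}, {"age": "38", "zip": "44703"},
]
# Hypothetical anonymized table: ages generalized, ZIP codes suppressed.
anonymized = (
    [{"age": "20-29", "zip": "*"} for _ in range(3)]
    + [{"age": "30-39", "zip": "*"} for _ in range(3)]
)

qi = ["age", "zip"]
print(risk_rates(original, qi))    # (1.0, 1.0): every record is unique
print(risk_rates(anonymized, qi))  # highest and average risk are both about 1/3
print(data_loss_rate(anonymized, qi))  # 0.5: one of two attributes is suppressed
```

The toy tables show the trade-off that the evaluation measures: anonymization lowers the risk from 1 to about 1/3 at the cost of suppressing half of the quasi-identifier cells.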

In Figure 6, an attacker can easily attack the original data, because its highest risk rate is 97.83% and its average risk rate is 39.95%. By contrast, the anonymized data (RV1, RV2 and anonyropy) is difficult to attack. Comparing RV1, RV2 and anonyropy, RV2 has the lowest attack risk rate, whereas in Figure 7 RV1 has the lowest data loss rate.

In Figure 6, the average attack risk rates of anonyropy and RV2 are similar: RV2 is 0.5% higher than anonyropy. Meanwhile, in Figure 7, the average data loss rate of RV1 is 1.5% lower than that of anonyropy. Thus, considering both attack risk rate and data loss rate, we conclude that anonyropy is better than RV1 and RV2.

Figure 6. Attack Risk Rate


Figure 7. Data Loss Rate

V. CONCLUSION AND FUTURE WORK

The smart mediator connects the OpenAPIs of various ICBMS platforms. Hence, it is important for the smart mediator to protect users' privacy from attacks such as linked attacks, sniffing and so on. In this paper, we presented anonyropy, an anonymization scheme using entropy in the smart mediator. Anonyropy anonymizes the original table using the optimal k. In Figure 6 and Figure 7, we compared anonyropy with random values of k. When both attack risk rate and data loss rate are considered, anonyropy is better than the others. Using anonyropy, we can therefore predict the optimal k and anonymize personal data efficiently. As a result, we can provide secure and sufficient personal information to mashup services in the smart mediator while protecting users' privacy.

In this paper, we designed a method for computing the optimal k. However, since the smart mediator deals with various data types and large amounts of data, we need a fast and accurate anonymization algorithm as well as the optimal k. In the future, we plan to improve anonyropy, our anonymization algorithm, using machine learning.

ACKNOWLEDGMENT

This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (R0126-16-1009, Development of Smart Mediator for Mashup Service and Information Sharing among ICBMS Platform).

*Dr. CS Hong is the corresponding author.

REFERENCES

[1] Mashup (OpenAPI/Mashup), http://www.mashup.or.kr/business/main/m-ain.do

[2] Chikwang Hwang, Choong Seon Hong, Jongwon Choe, “A Study on Service-based Secure Anonymization for Data Utility Enhancement,” Journal of KIISE, vol. 42, no. 5, pp. 681-689, May 2015.

[3] Latanya Sweeney, “k-Anonymity: A Model for Protecting Privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.

[4] Entropy (information theory), https://en.wikipedia.org/wiki/Entropy_(information_theory)

[5] C. E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October 1948.

[6] Ben Niu, Qinghua Li, Xiaoyan Zhu, Guohong Cao, Hui Li, “Achieving k-anonymity in Privacy-Aware Location-Based Services,” IEEE INFOCOM, April 2014, pp. 754-762.

[7] ARX, http://arx.deidentifier.org/