![Page 1: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/1.jpg)
1
A Novel Anonymization Technique for Privacy Preserving Data Publishing
![Page 2: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/2.jpg)
2
Introduction
• Data publishing approach may lead to insufficient
protection. So, providing privacy for the micro data
publishing.
• Privacy protection is an important issue in data processing.
Data must be anonymized to protect privacy.
• Privacy preserving data publishing approach provides
methods for publishing the useful information.
![Page 3: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/3.jpg)
3
• Here micro data contains records each of which contains
information about an individual entity like person,
organization.
• The data mining community has focused on hiding sensitive
rules generated from transactional databases.
• Before anonymizing the data, one can analyze the data
characteristics and use those characteristics in data
anonymization.
![Page 4: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/4.jpg)
4
Literature Survey
• In both Generalization and Bucketization approaches
attributes are partitioned into three categories:
1) Some attributes are identifiers that can uniquely identify
an individual, such as Name or Social Security Number
2) Some attributes are Quasi Identifiers (QI), which the
adversary may already know and can potentially identify
an individual, e.g., Birthdate, Sex, and Zip code.
![Page 5: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/5.jpg)
5
3) Some attributes are Sensitive Attributes (SAs), which are unknown to the adversary and are considered sensitive, such as Disease and Salary.
• In generalization and bucketization, one first removes identifiers from the data and then partitions tuples into buckets.
• Generalization transforms the QI-values in each bucket.
• Bucketization, one separates the SAs from the QIs by randomly permuting the SA values in each bucket.
![Page 6: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/6.jpg)
6
Privacy threats
• When publishing microdata, there are three types of privacy disclosure threats. They are as follows
1) Membership disclosure
2) Identity disclosure
3) Attribute disclosure
![Page 7: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/7.jpg)
7
Problem Specification
• The anonymization techniques for privacy preserving microdata publishing are:
1.Generalization
2. Bucketization
• Generalization losses considerable amount of information,
especially for high-dimensional data. This is due to curse of
dimensionality.
• Bucketization has better data utility than generalization but
have some drawbacks.
![Page 8: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/8.jpg)
8
• It does not prevent member ship disclosure.
• In many data sets, it is unclear which attributes are QIs and
which are SAs.
• It requires a clear separation between QIs and SAs.
• By separating the QI attributes and SA here it breaks down
the attribute correlation between attributes.
• To overcome the drawbacks of above approaches a novel
technique called slicing will be implemented.
![Page 9: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/9.jpg)
9
Slicing
• Slicing partitions the dataset both vertically and horizontally.
• Grouping the attributes into columns, each column contains
a subset of attributes, i.e., vertical partition
• Slicing also partition tuples into buckets. Each bucket
contains a subset of tuples, i.e. horizontal partition.
• Within each bucket, values in each column are randomly
permutated to break the linking between different columns.
![Page 10: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/10.jpg)
10
Slicing Algorithms
• Our algorithm consists of three phases they are as
follows:
Attribute partitioning
Column generalization
Tuple partitioning
![Page 11: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/11.jpg)
11
Flow Chart for Project Design
![Page 12: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/12.jpg)
12
Algorithm
• The algorithm maintains two data structures: a queue of buckets Q and a set of sliced buckets SB.
• Initially, Q contains only one bucket which includes all tuples and SB is empty.
• In each iteration, the algorithm removes a bucket from Q and splits the bucket into two buckets.
• If the sliced table after the split satisfies ‘l-diversity, then the algorithm puts the two buckets at the end of the queue Q.
![Page 13: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/13.jpg)
13
Cont..
• Otherwise, we cannot split the bucket anymore and the algorithm puts the bucket into SB.
• When Q becomes empty, we have computed the sliced table.
• The set of sliced buckets is SB. The main part of the tuple-partition algorithm is to check whether a sliced table satisfies ‘l-diversity.
![Page 14: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/14.jpg)
14
Tuple Partitioning Algorithm
Algorithm tuple-partition(T, ℓ)
1. Q = {T}; SB = .∅2. while Q is not empty
3. remove the first bucket B from Q; Q = Q − {B}.
4. split B into two buckets B1 and B2.
5. if diversity-check(T, Q {B1,B2} SB, ℓ)∪ ∪6. Q = Q {B1,B2}.∪7. else SB = SB {B}.∪8. return SB.
![Page 15: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/15.jpg)
15
Diversity Check Algorithm
Algorithm diversity-check(T,T_, ℓ)
1. for each tuple t T, L[t] = .∈ ∅2. for each bucket B in T_
3. record f(v) for each column value v in bucket B.
4. for each tuple t T∈5. calculate p(t,B) and find D(t,B).
6. L[t] = L[t] {(p(t,B),D(t,B))}.∪7. for each tuple t T∈8. calculate p(t, s) for each s based on L[t].
9. if p(t, s) ≥ 1/ℓ, return false.
10. return true.
![Page 16: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/16.jpg)
16
Examples for Anonymization Techniques
• This the original microdata table and its anonymized versions using anonymization techniques
![Page 17: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/17.jpg)
17
• In above figure it consists of QI and SA. Age, sex, zipcode is QI and disease is SA and generalized table that satisfies 4-anonymity
![Page 18: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/18.jpg)
18
• The above dataset shows the bucketized table that satisfies 2-diversity.
![Page 19: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/19.jpg)
19
The above tables shows the Multiset-based generalization, one attribute per column slicing and the below table shows sliced table
![Page 20: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/20.jpg)
20
Design and Implementation
It shows that the activities of the user. The user can provide the privacy to the microdata by generalizing the records, dividing into number of buckets and breaking the correlation between the attributes. The attributes of the table are sliced by performing random permutation and probability function.
Tuple Partitioning
Attribute Clustering
Correlation Measure
Diverse Slicing
User
Data SlicingUse case diagram
![Page 21: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/21.jpg)
21
Class diagram
It shows that how probability functions are calculated and randomly permuted. It having five different classes with their attributes, methods how data is retrieved and methods are applied.
CorrelationMeasureAttributes[]mscdomain[]partitionset[]
RetrieveAttributes()ComputeDomain()CalculateMSC()ApplyDiscretization()PartitionsAttributeDomain()
TuplePartitioningtuples[]probabilityldiversity
CreateQueueBuckets()AssignTuples()RemoveBucket()CheckDiversity()RecordMatchingProbability()
1..*1..*
AttributeClusteringcentroidattribcorellation
RetrieveCorrelation()ComputeCentroids()CheckCorrelation()ApplyClustering()
1..*1..*
DataSlicingpartSet[]tuplesAttribTemp
SelectAttributes()ReplaceMultiset()PartitionsAttributes()PartitionTuples()RandomPermuteTuple()
DiverseSlicingbuckets[]bucketIDprobablity
DetermineMatchingBuckets()AssignProbability()ComputeProbabilitySensitive()CheckPotentialMatch()
1..*1..*
1..*1..*
![Page 22: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/22.jpg)
22
Modules
Data slicing Diverse slicing Correlation measure Attribute clustering Tuple partitioning
![Page 23: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/23.jpg)
23
• The attributes of tables are used for slicing and
bucketized.
• Slicing first partition attributes into columns and
then partition tuples into buckets.
• In diverse slicing has to extend the above analysis
to the general case and introduce the notion of l-
diverse slicing and applying the probability
function to the attributes.
![Page 24: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/24.jpg)
24
• Two Correlation measures are used for measuring correlation between two continuous attributes and two categorical attributes.
• After the correlations for each pair of attributes, we use clustering to partition attributes into columns.
• After that tuple partition will be done. Here tuples are
partitioned into buckets.
![Page 25: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/25.jpg)
25
H\w And S\w Specifications
Hardware:
(1)2GB RAM
(2) 320GB Hard disk
(3) Intel processor(P3)
Software:
(1) Visual Studio
(2) Windows xp/7
![Page 26: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/26.jpg)
26
Results
Retrieving the dataset file
![Page 27: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/27.jpg)
27Showing the dataset files to show path of the file
![Page 28: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/28.jpg)
28Selecting The Dataset Test File
![Page 29: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/29.jpg)
29Displaying the path of the dataset file
![Page 30: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/30.jpg)
30Displaying the message in the textbox when user cancel the browsing
![Page 31: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/31.jpg)
31Displaying the file path of the dataset attribute
![Page 32: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/32.jpg)
32Displaying the attributes from the dataset
![Page 33: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/33.jpg)
33Selecting the attribute set from the dropdown list
![Page 34: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/34.jpg)
34Displaying the attribute domains of the attribute set
![Page 35: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/35.jpg)
35Displaying the tuples
![Page 36: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/36.jpg)
36Entering the number of buckets
![Page 37: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/37.jpg)
37Displaying the generalization values in both text and table format
![Page 38: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/38.jpg)
38Displaying the bucketization values in text format
![Page 39: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/39.jpg)
39Displaying the values which have a clear separation between Quasi-
Identifiers and Sensitive Attributes
![Page 40: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/40.jpg)
40Displaying the diversity mapping based on the salary values
![Page 41: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/41.jpg)
41Displaying sliced values in textbox
![Page 42: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/42.jpg)
42Displaying the sliced sets in the table
![Page 43: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/43.jpg)
43Showing the probability based on the countries and salary of all the
buckets
![Page 44: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/44.jpg)
44Providing security for sliced sets and storing the values in the database
![Page 45: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/45.jpg)
45
![Page 46: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/46.jpg)
46Duplicating the attributes
![Page 47: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/47.jpg)
47Displaying the duplicate attributes in the table format
![Page 48: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/48.jpg)
48
Comparison
Graph showing time consumed for three techniques.
Anonymity Diversity Slicing0
50
100
150
200
250
300
350
400
Tim
e (m
sec)
![Page 49: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/49.jpg)
49
Comparative graph showing time consumed in Enhanced Slicing.
Slicing Enhanced Slicing0
50
100
150
200
250
300
350
400
Tim
e (m
sec)
![Page 50: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/50.jpg)
50
Conclusion• Dataset is taken and performing anonymization techniques to protect
privacy for micro data.
• Attribute values are considered and applying the probability functions.
• Generalization, bucketization and slicing techniques are implemented.
• By using DES it provides the security to the sliced set table.
• Overlapping slicing is done, which duplicates an attribute in more
than one columns.
• By comparing slicing preserves better data utility than generalization
and bucketization based on time consuming.
![Page 51: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/51.jpg)
51
Future Work
• Slicing gives better privacy than generalization and bucketization but still in future there is a scope to increase the privacy for microdata publishing, by using different anonymization techniques.
![Page 52: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/52.jpg)
52
References[1] Tiancheng Li, Ninghui Li, Jian Zhang, and Ian Molloy “Slicing: A New
Approach for Privacy Preserving Data Publishing” ieee transactions on
knowledge and data engineering, vol. 24, no. 3, march 2012.
[2] A. Inan, M. Kantarcioglu, and E. Bertino, “Using Anonymized Data for
Classification,” Proc. IEEE 25th Int’l Conf. Data Eng. (ICDE), pp. 429-440,
2009.
[3] B.-C. Chen, K. LeFevre, and R. Ramakrishnan, “Privacy Skyline: Privacy with
Multidimensional Adversarial Knowledge,” Proc. Int’l Conf. Very Large Data
Bases (VLDB), pp. 770-781, 2007.
[4] C. Dwork, “Differential Privacy: A Survey of Results,” Proc. Fifth Int’l Conf.
Theory and Applications of Models of Computation (TAMC), pp. 1-19, 2008.
![Page 53: A Novel Anonymization Technique for Privacy Preserving Data Publishing](https://reader035.vdocuments.us/reader035/viewer/2022062719/55cf9c3e550346d033a92a7a/html5/thumbnails/53.jpg)
53
Thank you..