p2 project
TRANSCRIPT
BY: MOHAMMED ATHEEQ SHARIEFF
HARSHA VAIDYANATH
AMITH B.K
UNDER GUIDANCE OF: Mr. Rajesh
A project on
Privacy-Preserving Detection of Sensitive Data Exposure
Abstract
The exposure of sensitive data in storage and
transmission poses a serious threat to organizational
and personal security.
Data leak detection aims at scanning content for
exposed sensitive data.
In this project, the system proposes a data-leak detection (DLD) model.
Detection can be outsourced and deployed in a semi-honest detection environment.
This approach works especially well when consecutive data blocks are leaked.
INTRODUCTION
Current applications tend to use personal sensitive information to achieve better quality of service. Since the third parties are not trusted, the data must be protected so that individual data privacy is not compromised while operations on the data remain possible.
The system implements and evaluates a new privacy-preserving data-leak detection system that enables the data owner either to deploy it safely on site or to delegate the traffic-inspection task to DLD providers without exposing the sensitive data.
In our model, the data owner computes a special set of digests or fingerprints from the sensitive data, and then discloses only a small amount of digest information to the DLD provider.
Existing system
The existing system uses the MD5 algorithm.
The MD5 message-digest algorithm is a widely used
cryptographic hash function producing a 128-bit (16-byte) hash
value, typically expressed in text format as a 32 digit
hexadecimal number.
MD5 has been utilized in a wide variety of cryptographic
applications, and is also commonly used to verify data integrity.
Disadvantages
With the existing approach, the customer or data owner must fully trust the DLD provider.
Keywords usually do not cover enough sensitive data segments for data-leak detection.
It does not aim to provide a remote service.
Proposed system
The system proposes a privacy-preserving data-leak detection model for preventing inadvertent data leaks in network traffic.
The DLD provider may learn sensitive information from the traffic, which is inevitable for all deep packet inspection approaches.
The proposed system uses the Secure Hash Algorithm (SHA) to generate short and hard-to-reverse digests through a fast polynomial modulus operation.
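The polynomial modulus operation behind such digests is typically a Rabin-style rolling fingerprint over a sliding window of bytes. A minimal Java sketch of the idea; the base, modulus, and window size below are illustrative assumptions, not the project's actual parameters:

```java
import java.nio.charset.StandardCharsets;

public class RollingFingerprint {
    // Illustrative parameters (assumptions): a prime modulus and byte base.
    static final long MOD = 1_000_000_007L;
    static final long BASE = 257L;

    // Polynomial fingerprint of one window: sum of b[i] * BASE^(len-1-i) mod MOD.
    static long fingerprint(byte[] data, int start, int len) {
        long h = 0;
        for (int i = 0; i < len; i++) {
            h = (h * BASE + (data[start + i] & 0xFF)) % MOD;
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] text = "confidential record".getBytes(StandardCharsets.UTF_8);
        int n = 8;                                  // window size (assumption)
        long pow = 1;                               // BASE^(n-1) mod MOD
        for (int i = 0; i < n - 1; i++) pow = pow * BASE % MOD;

        long h = fingerprint(text, 0, n);
        for (int s = 1; s + n <= text.length; s++) {
            // Slide the window in O(1): drop the oldest byte, append the newest.
            long drop = (text[s - 1] & 0xFF) * pow % MOD;
            h = ((h - drop + MOD) * BASE + (text[s + n - 1] & 0xFF)) % MOD;
            if (h != fingerprint(text, s, n)) throw new AssertionError();
        }
        System.out.println("rolling updates match full recomputation");
    }
}
```

The rolling update is what makes scanning every window of a traffic stream affordable: each shift costs constant time instead of rehashing the whole window.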
Advantages
This strong privacy guarantee yields a powerful application of the fuzzy fingerprint method in the cloud computing environment.
It provides high detection accuracy.
It has a very low false positive rate.
The privacy guarantee of this approach is much higher.
FLOW DIAGRAM
SYSTEM ARCHITECTURE
USE CASE DIAGRAM
CLASS DIAGRAM
SEQUENCE DIAGRAM
MODULES
Data Owner
Fuzzy Fingerprint
DLD
Data Receiver
MODULES DESCRIPTION
Data Owner
The system enables the data owner to securely
delegate the content-inspection task to DLD providers
without exposing the sensitive data.
The data owner computes a special set of digests or
fingerprints from the sensitive data and then discloses
only a small amount of them to the DLD provider.
It is the data owner who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.
The sensitive data is sent by a legitimate user intended
for legitimate purposes. The data owner is aware of
legitimate data transfers and permits such transfers.
So the data owner can tell whether a piece of sensitive data in the network traffic is a leak using legitimate data transfer policies.
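The legitimate-transfer check described above can be sketched as a simple allowlist lookup. The slides do not specify the actual policy format, so the sender/destination keying here is an assumption:

```java
import java.util.HashSet;
import java.util.Set;

public class TransferPolicy {
    // Allowlist of permitted sender -> destination transfers. The real
    // project's policy representation is not given; this is illustrative.
    private final Set<String> permitted = new HashSet<>();

    public void permit(String sender, String destination) {
        permitted.add(sender + "->" + destination);
    }

    // A detected occurrence of sensitive data counts as a real leak only
    // when the transfer is not covered by a legitimate-transfer rule.
    public boolean isRealLeak(String sender, String destination) {
        return !permitted.contains(sender + "->" + destination);
    }

    public static void main(String[] args) {
        TransferPolicy policy = new TransferPolicy();
        policy.permit("hr@corp", "payroll@corp");
        System.out.println(policy.isRealLeak("hr@corp", "payroll@corp"));  // false
        System.out.println(policy.isRealLeak("hr@corp", "evil@example"));  // true
    }
}
```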
Data Owner
Register and login
Send data
Permit transfers
Fuzzy Fingerprint
To achieve the privacy goal, the data owner generates a special type of digest.
These digests are called fuzzy fingerprints.
IMPLEMENTATION
1. Data Encryption Standard (DES)
The DES algorithm is used to encrypt and decrypt data in our project.
• DES works by encrypting groups of 64 message bits (the block size).
• Its key is also 64 bits, of which 56 are key bits and the remaining 8 are parity (check) bits.
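A minimal sketch of DES encryption and decryption using Java's standard cryptography API. ECB mode is used purely for brevity, and DES itself is considered obsolete today; it appears here only because the project uses it:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.nio.charset.StandardCharsets;

public class DesDemo {
    // Encrypts and then decrypts a message with a freshly generated DES key.
    public static String roundTrip(String message) {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("DES");
            SecretKey key = kg.generateKey();        // 64-bit key, 56 effective bits

            Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] ciphertext = cipher.doFinal(message.getBytes(StandardCharsets.UTF_8));

            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] plain = cipher.doFinal(ciphertext);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("sensitive record"));  // prints "sensitive record"
    }
}
```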
2. Secure Hash Algorithm (SHA)
• The SHA-1 message digest is 160 bits (20 bytes), written as 40 hexadecimal digits.
• It has 80 rounds.
• It produces a short and hard-to-reverse hash value.
• Algorithm structure:
• Step 1: Padding bits
• Step 2: Appending the length as a 64-bit unsigned integer
• Step 3: Buffer initialization
• Step 4: Processing of the message
• Step 5: Output
• For example, the SHA-256 hash of “www.mytecbits.com” is 575f62a15889fa8ca55514a10754d2f98e30c57c4538f0f3e39dc53114533857.
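The digest computation can be reproduced with Java's built-in MessageDigest class. This sketch uses SHA-256 to match the example above; SHA-1 works the same way via getInstance("SHA-1") and yields 40 hex digits instead:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShaDemo {
    // Hex-encoded SHA-256 digest of a string (64 hex digits = 256 bits).
    public static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("www.mytecbits.com"));
    }
}
```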
Fuzzification prevents the DLD provider from learning the exact fingerprint value.
The data owner transforms each fingerprint into a fuzzy fingerprint.
All fuzzy fingerprints are collected to form the output of this operation.
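A simplified sketch of fuzzification: hiding the low-order bits of each fingerprint so the disclosed value maps to a whole equivalence class of possible fingerprints. The mask width is an illustrative assumption, and the real scheme (Shu and Yao's fuzzy fingerprints) is more elaborate than this bit-masking:

```java
import java.util.Random;

public class FuzzyFingerprint {
    // Fuzziness parameter: how many low-order digest bits are hidden.
    // The width (8 bits) is an illustrative assumption.
    static final int FUZZY_BITS = 8;
    static final long MASK = (1L << FUZZY_BITS) - 1;

    // Fuzzify: replace the low-order bits with random bits, so the DLD
    // provider only learns an equivalence class of 2^FUZZY_BITS digests.
    static long fuzzify(long fingerprint, Random rnd) {
        return (fingerprint & ~MASK) | (rnd.nextLong() & MASK);
    }

    // Matching up to fuzziness: two values match if they fall in the
    // same equivalence class (identical high-order bits).
    static boolean fuzzyMatch(long trafficFp, long fuzzyFp) {
        return (trafficFp & ~MASK) == (fuzzyFp & ~MASK);
    }

    public static void main(String[] args) {
        long sensitive = 0x1234_5678_9ABCL;
        long disclosed = fuzzify(sensitive, new Random());
        System.out.println(fuzzyMatch(sensitive, disclosed));                     // true
        System.out.println(fuzzyMatch(sensitive + (1L << FUZZY_BITS), disclosed)); // false
    }
}
```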
Fuzzy Fingerprint
Generate digests
Hide sensitive data
Prevent the DLD
DLD
The DLD provider computes fingerprints from
network traffic and identifies potential leaks in
them.
To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noise.
It is the data owner who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.
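This division of labor, where the provider reports a noisy candidate set and the owner filters it against the exact fingerprints, can be sketched as follows. The bucket-based matching is a simplification of the fuzzy-fingerprint test, and the fuzziness width is an assumption:

```java
import java.util.*;

public class LeakDetection {
    static final int FUZZY_BITS = 8;             // illustrative fuzziness (assumption)

    // Equivalence class ("bucket") a fingerprint falls into once its
    // low-order FUZZY_BITS are hidden by fuzzification.
    static long bucket(long fp) { return fp >>> FUZZY_BITS; }

    // DLD provider side: report every traffic fingerprint whose bucket
    // matches a disclosed fuzzy fingerprint. The result mixes real leaks
    // with noise, which is what shields the sensitive data.
    static List<Long> providerScan(List<Long> trafficFps, Set<Long> fuzzyBuckets) {
        List<Long> candidates = new ArrayList<>();
        for (long fp : trafficFps)
            if (fuzzyBuckets.contains(bucket(fp))) candidates.add(fp);
        return candidates;
    }

    // Data-owner side: post-process the candidates against the exact
    // sensitive fingerprints to separate real leaks from noise.
    static List<Long> ownerPostProcess(List<Long> candidates, Set<Long> exactFps) {
        List<Long> leaks = new ArrayList<>();
        for (long fp : candidates)
            if (exactFps.contains(fp)) leaks.add(fp);
        return leaks;
    }

    public static void main(String[] args) {
        Set<Long> sensitive = Set.of(0x1234_5600L);
        Set<Long> fuzzyBuckets = new HashSet<>();
        for (long fp : sensitive) fuzzyBuckets.add(bucket(fp));

        // Traffic contains one real leak, one noisy near-match, one miss.
        List<Long> traffic = List.of(0x1234_5600L, 0x1234_56FFL, 0x9999_9999L);
        List<Long> candidates = providerScan(traffic, fuzzyBuckets);
        List<Long> leaks = ownerPostProcess(candidates, sensitive);
        System.out.println(candidates.size() + " candidates, " + leaks.size() + " real leak(s)");
    }
}
```

Note that only the owner ever compares against the exact fingerprints; the provider sees nothing finer than the fuzzy buckets.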
DLD
The DLD server detects sensitive data within each packet on the basis of a stateless filtering system.
The DLD provider inspects the network traffic for potential data leaks.
The inspection can be performed offline without
causing any real-time delay in routing the packets.
However, the DLD provider may attempt to gain
knowledge about the sensitive data.
DLD
Identify leaks
Compute fingerprints
Inspect the network traffic
Data receiver
This operation is run by the data receiver on
each piece of sensitive data.
The data receiver receives the data in encrypted format.
The data is decrypted and the text is obtained.
Data receiver
Receive
Collect each packet
Compute
System Requirements
Software Requirements:
• O/S : Windows XP / 7 / 8 / 10
• Language : Java
• IDE : Eclipse
• Database : MySQL
Hardware Requirements:
• System : Pentium IV 2.4 GHz or above
• Hard Disk : 160 GB
• Monitor : 15-inch VGA color
• Mouse : Logitech
• Keyboard : 110-key enhanced
• RAM : 2 GB
LITERATURE SURVEY
Title: Data leak detection as a service
Year: 2012
Authors: Xiaokui Shu, Danfeng (Daphne) Yao
Methodology: Proposes a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small amount of specialized digests is needed.
Advantages: Provides a quantifiable method to measure the privacy guarantee offered by the fuzzy fingerprint framework.
Disadvantages: It is not efficient enough for practical data-leak inspection in this setting.
Title: Quantifying Information Leaks in Outbound Web Traffic
Year: 2009
Authors: Kevin Borders, Atul Prakash
Methodology: Presents an approach for quantifying information-leak capacity in network traffic. Instead of trying to detect the presence of sensitive data, an impossible task in the general case, the goal is to measure and constrain its maximum volume.
Advantages: Makes it possible to identify smaller leaks.
Disadvantages: Traffic measurement does not completely stop information leaks from slipping by undetected.
Title: Panorama: Capturing system-wide information flow for malware detection and analysis
Year: 2007
Authors: H. Yin, D. Song, M. Egele, C. Kruegel, E. Kirda
Methodology: Proposes a system, Panorama, to detect and analyze malware by capturing this fundamental trait. In extensive experiments, Panorama successfully detected all the malware samples and had very few false positives.
Advantages: It does send back sensitive information to remote servers in certain settings.
Disadvantages: Existing approaches to detecting malware and analyzing unknown code samples are insufficient and have significant shortcomings.
Title: Protecting confidential data on personal computers with storage capsules
Year: 2009
Authors: K. Borders, E. V. Weele, B. Lau, A. Prakash
Methodology: Introduces Storage Capsules, a new approach for protecting confidential files on a personal computer. Storage Capsules are encrypted file containers that allow a compromised machine to securely view and edit sensitive files without malware being able to steal confidential data.
Advantages: The system achieves this goal by taking a checkpoint of the current system state and disabling device output before allowing access to a Storage Capsule.
Disadvantages: It does not rely on high integrity.
Title: Preventing accidental data disclosure in modern operating systems
Year: 2013
Authors: A. Nadkarni, W. Enck
Methodology: Presents Aquifer, a policy framework and system for preventing accidental information disclosure in modern operating systems. In Aquifer, application developers define secrecy restrictions that protect the entire user-interface workflow defining the user task.
Advantages: The lack of application separation did not expose it as a concern.
Disadvantages: It may not be trusted with that data.
Title: Revolver: An automated approach to the detection of evasive web-based malware
Year: 2013
Authors: A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, G. Vigna
Methodology: Presents Revolver, a novel approach to automatically detect evasive behavior in malicious JavaScript. Revolver uses efficient techniques to identify similarities between a large number of JavaScript programs (despite their use of obfuscation techniques such as packing, polymorphism, and dynamic code generation) and to automatically interpret their differences to detect evasions.
Advantages: Revolver has identified several techniques that attackers use to evade existing detection tools by continuously running in parallel with a honeyclient.
Disadvantages: This approach was defeated by static detection of the malicious code using signatures.
Title: Gyrus: A framework for user-intent monitoring of text-based networked applications
Year: 2014
Authors: Y. Jang, S. P. Chung, B. D. Payne, W. Lee
Methodology: Proposes a way to break this cycle by ensuring that a system's behavior matches the user's intent. Since the approach is attack-agnostic, it scales better than traditional security systems.
Advantages: Gyrus is very efficient and introduces no noticeable delay to a user's interaction with the protected applications.
Disadvantages: Gyrus solves the problem by relying on the semantics, but not the timing, of user-generated events.
Title: Privacy-preserving scanning of big content for sensitive data exposure with MapReduce
Year: 2015
Authors: F. Liu, X. Shu, D. Yao, A. R. Butt
Methodology: The solution uses the MapReduce framework for detecting exposed sensitive content, because it has the ability to arbitrarily scale and utilize public resources for the task, such as Amazon EC2. New MapReduce algorithms are designed for computing collection intersection for data-leak detection.
Advantages: The transformation supports the secure outsourcing of data-leak detection to untrusted MapReduce and cloud providers.
Disadvantages: A significant portion of incidents are caused by unintentional mistakes of employees or data owners.
Title: Fuzzy keyword search over encrypted data in cloud computing
Year: 2010
Authors: J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, W. Lou
Methodology: For the first time, formalizes and solves the problem of effective fuzzy keyword search over encrypted cloud data while maintaining keyword privacy.
Advantages: The proposed solution is secure and privacy-preserving, while correctly realizing the goal of fuzzy keyword search.
Disadvantages: Exact keyword search is unsuitable in cloud computing, as it greatly affects system usability, rendering user search experiences very frustrating and system efficacy very low.
Title: Towards practical avoidance of information leakage in enterprise networks
Year: 2011
Authors: J. Croft, M. Caesar
Methodology: Proposes a network-wide method of confining and controlling the flow of sensitive data within a network. The approach is based on black-box differencing: two logical copies of the network are run, one with private data scrubbed, and the outputs of the two are compared to determine if and when private data is being leaked.
Advantages: Proposes schemes that leverage black-box differencing to mitigate leakage of private data.
Disadvantages: It may not be able to monitor encrypted traffic without the encryption keys, or information flows that are intentionally obfuscated by attackers.
Conclusion
Preventing sensitive data from being compromised is an
important and practical research problem.
The proposed system used the Secure Hash Algorithm (SHA) to generate short and hard-to-reverse digests through a fast polynomial modulus operation.
Using special digests, the exposure of the sensitive data is kept to a minimum during the detection.
References
[1] X. Shu and D. Yao, “Data leak detection as a service,” in Proc. 8th Int. Conf. Secur. Privacy Commun. Netw., 2012, pp. 222–240.
[2] K. Borders and A. Prakash, “Quantifying information leaks in outbound web traffic,” in Proc. 30th IEEE Symp. Secur. Privacy, May 2009, pp. 129–140.
[3] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: Capturing system-wide information flow for malware detection and analysis,” in Proc. 14th ACM Conf. Comput. Commun. Secur., 2007, pp. 116–127.