p2 project
TRANSCRIPT
BY: MOHAMMED ATHEEQ SHARIEFF
HARSHA VAIDYANATH
AMITH B.K
UNDER GUIDANCE OF: Mr. Rajesh
A project on
Privacy-Preserving Detection of Sensitive Data Exposure
Abstract
The exposure of sensitive data in storage and
transmission poses a serious threat to organizational
and personal security.
Data leak detection aims at scanning content for
exposed sensitive data.
In this project, the system proposes a data-leak detection (DLD) model.
Detection can be outsourced and deployed in a semi-honest detection environment.
This approach works especially well when consecutive data blocks are leaked.
INTRODUCTION
Current applications tend to use personal sensitive information to achieve better quality of service. Since the third parties are not trusted, the data must be protected so that individual data privacy is not compromised while operations on the data remain possible.
The system implements and evaluates a new privacy-preserving data-leak detection system that enables the data owner either to deploy it safely on site or to delegate the traffic-inspection task to DLD providers without exposing the sensitive data.
In our model, the data owner computes a special set of digests or fingerprints from the sensitive data, and then discloses only a small amount of digest information to the DLD provider.
Existing system
The existing system uses the MD5 algorithm.
The MD5 message-digest algorithm is a widely used
cryptographic hash function producing a 128-bit (16-byte) hash
value, typically expressed in text format as a 32 digit
hexadecimal number.
MD5 has been utilized in a wide variety of cryptographic
applications, and is also commonly used to verify data integrity.
Disadvantages
With the existing approach, the customer or data owner must fully trust the DLD provider.
Keywords usually do not cover enough sensitive data segments for data-leak detection.
It does not aim to provide a remote service.
Proposed system
The system proposes a privacy-preserving data-leak detection model for preventing inadvertent data leaks in network traffic.
The DLD provider may learn sensitive information from the traffic, which is inevitable for all deep packet inspection approaches.
The proposed system uses the Secure Hash Algorithm (SHA) to generate short and hard-to-reverse digests through a fast polynomial modulus operation.
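The polynomial modulus operation behind such digests is typically a Rabin-style rolling fingerprint over a sliding window of bytes. A minimal Java sketch of the idea; the base, modulus, and window size below are illustrative assumptions, not the project's actual parameters:

```java
import java.nio.charset.StandardCharsets;

public class RollingFingerprint {
    // Illustrative parameters (assumptions): a prime modulus and byte base.
    static final long MOD = 1_000_000_007L;
    static final long BASE = 257L;

    // Polynomial fingerprint of one window: sum of b[i] * BASE^(len-1-i) mod MOD.
    static long fingerprint(byte[] data, int start, int len) {
        long h = 0;
        for (int i = 0; i < len; i++) {
            h = (h * BASE + (data[start + i] & 0xFF)) % MOD;
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] text = "confidential record".getBytes(StandardCharsets.UTF_8);
        int n = 8;                                  // window size (assumption)
        long pow = 1;                               // BASE^(n-1) mod MOD
        for (int i = 0; i < n - 1; i++) pow = pow * BASE % MOD;

        long h = fingerprint(text, 0, n);
        for (int s = 1; s + n <= text.length; s++) {
            // Slide the window in O(1): drop the oldest byte, append the newest.
            long drop = (text[s - 1] & 0xFF) * pow % MOD;
            h = ((h - drop + MOD) * BASE + (text[s + n - 1] & 0xFF)) % MOD;
            if (h != fingerprint(text, s, n)) throw new AssertionError();
        }
        System.out.println("rolling updates match full recomputation");
    }
}
```

The rolling update is what makes scanning every window of a traffic stream affordable: each shift costs constant time instead of rehashing the whole window.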
Advantages
This strong privacy guarantee yields a powerful application of the fuzzy fingerprint method in the cloud computing environment.
It provides high detection accuracy.
It has a very low false positive rate.
The privacy guarantee of this approach is much higher.
FLOW DIAGRAM
SYSTEM ARCHITECTURE
USE CASE DIAGRAM
CLASS DIAGRAM
SEQUENCE DIAGRAM
MODULES
Data Owner
Fuzzy Fingerprint
DLD
Data Receiver
MODULES DESCRIPTION
Data Owner
The system enables the data owner to securely
delegate the content-inspection task to DLD providers
without exposing the sensitive data.
The data owner computes a special set of digests or
fingerprints from the sensitive data and then discloses
only a small amount of them to the DLD provider.
It is the data owner who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.
The sensitive data is sent by a legitimate user intended
for legitimate purposes. The data owner is aware of
legitimate data transfers and permits such transfers.
So the data owner can tell whether a piece of sensitive data in the network traffic is a leak using legitimate data transfer policies.
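The legitimate-transfer check described above can be sketched as a simple allowlist lookup. The slides do not specify the actual policy format, so the sender/destination keying here is an assumption:

```java
import java.util.HashSet;
import java.util.Set;

public class TransferPolicy {
    // Allowlist of permitted sender -> destination transfers. The real
    // project's policy representation is not given; this is illustrative.
    private final Set<String> permitted = new HashSet<>();

    public void permit(String sender, String destination) {
        permitted.add(sender + "->" + destination);
    }

    // A detected occurrence of sensitive data counts as a real leak only
    // when the transfer is not covered by a legitimate-transfer rule.
    public boolean isRealLeak(String sender, String destination) {
        return !permitted.contains(sender + "->" + destination);
    }

    public static void main(String[] args) {
        TransferPolicy policy = new TransferPolicy();
        policy.permit("hr@corp", "payroll@corp");
        System.out.println(policy.isRealLeak("hr@corp", "payroll@corp"));  // false
        System.out.println(policy.isRealLeak("hr@corp", "evil@example"));  // true
    }
}
```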
Data Owner
Register and login
Send data
Permit transfers
Fuzzy Fingerprint
To achieve the privacy goal, the data owner generates a special type of digest.
These digests are called fuzzy fingerprints.
IMPLEMENTATION
1. Data Encryption Standard (DES)
The DES algorithm is used to encrypt and decrypt data in our project.
• DES works by encrypting groups of 64 message bits (the block size).
• Its key is also 64 bits, of which 56 are key bits and the remaining 8 are parity (check) bits.
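A minimal sketch of DES encryption and decryption using Java's standard cryptography API. ECB mode is used purely for brevity, and DES itself is considered obsolete today; it appears here only because the project uses it:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.nio.charset.StandardCharsets;

public class DesDemo {
    // Encrypts and then decrypts a message with a freshly generated DES key.
    public static String roundTrip(String message) {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("DES");
            SecretKey key = kg.generateKey();        // 64-bit key, 56 effective bits

            Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] ciphertext = cipher.doFinal(message.getBytes(StandardCharsets.UTF_8));

            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] plain = cipher.doFinal(ciphertext);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("sensitive record"));  // prints "sensitive record"
    }
}
```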
2. Secure Hash Algorithm (SHA)
• The SHA-1 message digest is 160 bits (20 bytes), written as 40 hexadecimal digits.
• It has 80 rounds.
• It produces a short and hard-to-reverse hash value.
• Algorithm structure:
• Step 1: Padding bits
• Step 2: Appending the length as a 64-bit unsigned integer
• Step 3: Buffer initialization
• Step 4: Processing of the message
• Step 5: Output
• For example, the SHA-256 hash of “www.mytecbits.com” is 575f62a15889fa8ca55514a10754d2f98e30c57c4538f0f3e39dc53114533857.
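The digest computation can be reproduced with Java's built-in MessageDigest class. This sketch uses SHA-256 to match the example above; SHA-1 works the same way via getInstance("SHA-1") and yields 40 hex digits instead:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShaDemo {
    // Hex-encoded SHA-256 digest of a string (64 hex digits = 256 bits).
    public static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("www.mytecbits.com"));
    }
}
```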
Fuzzification prevents the DLD provider from learning the exact fingerprint value.
The data owner transforms each fingerprint into a fuzzy fingerprint.
All fuzzy fingerprints are collected to form the output of this operation.
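A simplified sketch of fuzzification: hiding the low-order bits of each fingerprint so the disclosed value maps to a whole equivalence class of possible fingerprints. The mask width is an illustrative assumption, and the real scheme (Shu and Yao's fuzzy fingerprints) is more elaborate than this bit-masking:

```java
import java.util.Random;

public class FuzzyFingerprint {
    // Fuzziness parameter: how many low-order digest bits are hidden.
    // The width (8 bits) is an illustrative assumption.
    static final int FUZZY_BITS = 8;
    static final long MASK = (1L << FUZZY_BITS) - 1;

    // Fuzzify: replace the low-order bits with random bits, so the DLD
    // provider only learns an equivalence class of 2^FUZZY_BITS digests.
    static long fuzzify(long fingerprint, Random rnd) {
        return (fingerprint & ~MASK) | (rnd.nextLong() & MASK);
    }

    // Matching up to fuzziness: two values match if they fall in the
    // same equivalence class (identical high-order bits).
    static boolean fuzzyMatch(long trafficFp, long fuzzyFp) {
        return (trafficFp & ~MASK) == (fuzzyFp & ~MASK);
    }

    public static void main(String[] args) {
        long sensitive = 0x1234_5678_9ABCL;
        long disclosed = fuzzify(sensitive, new Random());
        System.out.println(fuzzyMatch(sensitive, disclosed));                     // true
        System.out.println(fuzzyMatch(sensitive + (1L << FUZZY_BITS), disclosed)); // false
    }
}
```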
Fuzzy Fingerprint
Generate digests
Hide sensitive data
Prevent the DLD
DLD
The DLD provider computes fingerprints from
network traffic and identifies potential leaks in
them.
To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noise.
It is the data owner who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.
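This division of labor, where the provider reports a noisy candidate set and the owner filters it against the exact fingerprints, can be sketched as follows. The bucket-based matching is a simplification of the fuzzy-fingerprint test, and the fuzziness width is an assumption:

```java
import java.util.*;

public class LeakDetection {
    static final int FUZZY_BITS = 8;             // illustrative fuzziness (assumption)

    // Equivalence class ("bucket") a fingerprint falls into once its
    // low-order FUZZY_BITS are hidden by fuzzification.
    static long bucket(long fp) { return fp >>> FUZZY_BITS; }

    // DLD provider side: report every traffic fingerprint whose bucket
    // matches a disclosed fuzzy fingerprint. The result mixes real leaks
    // with noise, which is what shields the sensitive data.
    static List<Long> providerScan(List<Long> trafficFps, Set<Long> fuzzyBuckets) {
        List<Long> candidates = new ArrayList<>();
        for (long fp : trafficFps)
            if (fuzzyBuckets.contains(bucket(fp))) candidates.add(fp);
        return candidates;
    }

    // Data-owner side: post-process the candidates against the exact
    // sensitive fingerprints to separate real leaks from noise.
    static List<Long> ownerPostProcess(List<Long> candidates, Set<Long> exactFps) {
        List<Long> leaks = new ArrayList<>();
        for (long fp : candidates)
            if (exactFps.contains(fp)) leaks.add(fp);
        return leaks;
    }

    public static void main(String[] args) {
        Set<Long> sensitive = Set.of(0x1234_5600L);
        Set<Long> fuzzyBuckets = new HashSet<>();
        for (long fp : sensitive) fuzzyBuckets.add(bucket(fp));

        // Traffic contains one real leak, one noisy near-match, one miss.
        List<Long> traffic = List.of(0x1234_5600L, 0x1234_56FFL, 0x9999_9999L);
        List<Long> candidates = providerScan(traffic, fuzzyBuckets);
        List<Long> leaks = ownerPostProcess(candidates, sensitive);
        System.out.println(candidates.size() + " candidates, " + leaks.size() + " real leak(s)");
    }
}
```

Note that only the owner ever compares against the exact fingerprints; the provider sees nothing finer than the fuzzy buckets.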
DLD
The DLD server detects sensitive data within each packet on the basis of a stateless filtering system.
The DLD provider inspects the network traffic for potential data leaks.
The inspection can be performed offline without
causing any real-time delay in routing the packets.
However, the DLD provider may attempt to gain
knowledge about the sensitive data.
DLD
Identify leaks
Compute fingerprints
Inspect the network traffic
Data receiver
This operation is run by the data receiver on
each piece of sensitive data.
The data receiver receives the data in encrypted format.
The data is decrypted and the text is obtained.
Data receiver
Receive
Collect each packet
Compute
System Requirements
Software Requirements:
• O/S : Windows XP / 7 / 8 / 10
• Language : Java
• IDE : Eclipse
• Database : MySQL
Hardware Requirements:
• System : Pentium IV 2.4 GHz or above
• Hard Disk : 160 GB
• Monitor : 15-inch VGA color
• Mouse : Logitech
• Keyboard : 110-key enhanced
• RAM : 2 GB
LITERATURE SURVEY
Title: Data leak detection as a service
Year: 2012
Authors: Xiaokui Shu, Danfeng (Daphne) Yao
Methodology: Proposes a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small amount of specialized digests is needed.
Advantages: Provides a quantifiable method to measure the privacy guarantee offered by the fuzzy fingerprint framework.
Disadvantages: It is not efficient enough for practical data-leak inspection in this setting.
Title: Quantifying Information Leaks in Outbound Web Traffic
Year: 2009
Authors: Kevin Borders, Atul Prakash
Methodology: Presents an approach for quantifying information-leak capacity in network traffic. Instead of trying to detect the presence of sensitive data, an impossible task in the general case, the goal is to measure and constrain its maximum volume.
Advantages: Makes it possible to identify smaller leaks.
Disadvantages: Traffic measurement does not completely stop information leaks from slipping by undetected.
Title: Panorama: Capturing system-wide information flow for malware detection and analysis
Year: 2007
Authors: H. Yin, D. Song, M. Egele, C. Kruegel, E. Kirda
Methodology: Proposes a system, Panorama, to detect and analyze malware by capturing this fundamental trait. In extensive experiments, Panorama successfully detected all the malware samples and had very few false positives.
Advantages: It does send back sensitive information to remote servers in certain settings.
Disadvantages: Existing approaches to detecting malware and analyzing unknown code samples are insufficient and have significant shortcomings.
Title: Protecting confidential data on personal computers with storage capsules
Year: 2009
Authors: K. Borders, E. V. Weele, B. Lau, A. Prakash
Methodology: Introduces Storage Capsules, a new approach for protecting confidential files on a personal computer. Storage Capsules are encrypted file containers that allow a compromised machine to securely view and edit sensitive files without malware being able to steal confidential data.
Advantages: The system achieves this goal by taking a checkpoint of the current system state and disabling device output before allowing access to a Storage Capsule.
Disadvantages: It does not rely on high integrity.
Title: Preventing accidental data disclosure in modern operating systems
Year: 2013
Authors: A. Nadkarni, W. Enck
Methodology: Presents Aquifer, a policy framework and system for preventing accidental information disclosure in modern operating systems. In Aquifer, application developers define secrecy restrictions that protect the entire user-interface workflow defining the user task.
Advantages: The lack of application separation did not expose it as a concern.
Disadvantages: It may not be trusted with that data.
Title: Revolver: An automated approach to the detection of evasive web-based malware
Year: 2013
Authors: A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, G. Vigna
Methodology: Presents Revolver, a novel approach to automatically detect evasive behavior in malicious JavaScript. Revolver uses efficient techniques to identify similarities between a large number of JavaScript programs (despite their use of obfuscation techniques such as packing, polymorphism, and dynamic code generation) and to automatically interpret their differences to detect evasions.
Advantages: Revolver has identified several techniques that attackers use to evade existing detection tools by continuously running in parallel with a honeyclient.
Disadvantages: This approach was defeated by static detection of the malicious code using signatures.
Title: Gyrus: A framework for user-intent monitoring of text-based networked applications
Year: 2014
Authors: Y. Jang, S. P. Chung, B. D. Payne, W. Lee
Methodology: Proposes a way to break this cycle by ensuring that a system's behavior matches the user's intent. Since the approach is attack-agnostic, it scales better than traditional security systems.
Advantages: Gyrus is very efficient and introduces no noticeable delay to a user's interaction with the protected applications.
Disadvantages: Gyrus solves the problem by relying on the semantics, but not the timing, of user-generated events.
Title: Privacy-preserving scanning of big content for sensitive data exposure with MapReduce
Year: 2015
Authors: F. Liu, X. Shu, D. Yao, A. R. Butt
Methodology: The solution uses the MapReduce framework for detecting exposed sensitive content, because it has the ability to arbitrarily scale and utilize public resources for the task, such as Amazon EC2. New MapReduce algorithms are designed for computing collection intersection for data-leak detection.
Advantages: The transformation supports the secure outsourcing of data-leak detection to untrusted MapReduce and cloud providers.
Disadvantages: A significant portion of incidents are caused by unintentional mistakes of employees or data owners.
Title: Fuzzy keyword search over encrypted data in cloud computing
Year: 2010
Authors: J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, W. Lou
Methodology: For the first time, formalizes and solves the problem of effective fuzzy keyword search over encrypted cloud data while maintaining keyword privacy.
Advantages: The proposed solution is secure and privacy-preserving, while correctly realizing the goal of fuzzy keyword search.
Disadvantages: Exact keyword search is unsuitable in cloud computing, as it greatly affects system usability, rendering user search experiences very frustrating and system efficacy very low.
Title: Towards practical avoidance of information leakage in enterprise networks
Year: 2011
Authors: J. Croft, M. Caesar
Methodology: Proposes a network-wide method of confining and controlling the flow of sensitive data within a network. The approach is based on black-box differencing: two logical copies of the network are run, one with private data scrubbed, and the outputs of the two are compared to determine if and when private data is being leaked.
Advantages: Proposes schemes that leverage black-box differencing to mitigate leakage of private data.
Disadvantages: It may not be able to monitor encrypted traffic without the encryption keys, or information flows that are intentionally obfuscated by attackers.
Conclusion
Preventing sensitive data from being compromised is an
important and practical research problem.
The proposed system used the Secure Hash Algorithm (SHA) to generate short and hard-to-reverse digests through a fast polynomial modulus operation.
Using special digests, the exposure of the sensitive data is kept to a minimum during the detection.
References
[1] X. Shu and D. Yao, “Data leak detection as a service,” in Proc. 8th Int. Conf. Secur. Privacy Commun. Netw., 2012, pp. 222–240.
[2] K. Borders and A. Prakash, “Quantifying information leaks in outbound web traffic,” in Proc. 30th IEEE Symp. Secur. Privacy, May 2009, pp. 129–140.
[3] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: Capturing system-wide information flow for malware detection and analysis,” in Proc. 14th ACM Conf. Comput. Commun. Secur., 2007, pp. 116–127.