p2 project


Post on 14-Jan-2017


Page 1: P2 Project

BY: MOHAMMED ATHEEQ SHARIEFF

HARSHA VAIDYANATH

AMITH B.K

UNDER GUIDANCE OF: Mr.RAJESH

A project on

Privacy-Preserving Detection of Sensitive Data Exposure


Abstract

The exposure of sensitive data in storage and transmission poses a serious threat to organizational and personal security.

Data-leak detection aims to scan content for exposed sensitive data.

In this project, we propose a data-leak detection (DLD) system.

It can be outsourced and deployed in a semi-honest detection environment.

This approach works especially well when consecutive data blocks are leaked.

INTRODUCTION

Current applications tend to use sensitive personal information to improve the quality of their services. Because third parties are not trusted, the data must be protected so that individual privacy is not compromised while the data remains usable for the required operations.

We implement and evaluate a new privacy-preserving data-leak detection system that enables the data owner to safely deploy it locally, or to delegate the traffic-inspection task to DLD providers, without exposing the sensitive data.

In our model, the data owner computes a special set of digests or fingerprints from the sensitive data, and then discloses only a small amount of digest information to the DLD provider.

Existing system

The existing system uses the MD5 algorithm.

The MD5 message-digest algorithm is a widely used cryptographic hash function producing a 128-bit (16-byte) hash value, typically expressed in text format as a 32-digit hexadecimal number.

MD5 has been used in a wide variety of cryptographic applications and is also commonly used to verify data integrity.
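
As a small illustration of the digest size described above, the following Java sketch computes an MD5 digest with the standard `java.security.MessageDigest` API and prints it as 32 hexadecimal digits. The class and method names (`Md5Demo`, `md5Hex`) are chosen for illustration only and are not taken from the project.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Demo {
    // Returns the MD5 digest of the UTF-8 bytes of text as lowercase hexadecimal.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8)); // 16 bytes = 128 bits
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```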

Disadvantages

The customer or data owner has to fully trust the DLD provider, which is not required with our approach.

Keywords usually do not cover enough sensitive data segments for data-leak detection.

It does not aim to provide a remote service.

Proposed system

We propose a privacy-preserving data-leak detection model for preventing inadvertent data leaks in network traffic.

The DLD provider may learn sensitive information from the traffic, which is inevitable for all deep packet inspection approaches.

The proposed system uses the Secure Hash Algorithm (SHA) to generate short and hard-to-reverse digests. Note that the fast polynomial modulus operation is a feature of Rabin fingerprinting, not of SHA.

Advantages

This strong privacy guarantee yields a powerful application of the fuzzy fingerprint method in the cloud computing environment.

It provides high accuracy.

It has a very low false-positive rate.

The privacy guarantee of this approach is much higher.

[Figure: Flow diagram]

[Figure: System architecture]

[Figure: Use case diagram]

[Figure: Class diagram]

[Figure: Sequence diagram]

MODULES

Data Owner

Fuzzy Fingerprint

DLD

Data Receiver

MODULES DESCRIPTION

Data Owner

The system enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data.

The data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small amount of them to the DLD provider.
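
The slides do not say how the sensitive data is split before it is digested. The sketch below assumes a sliding window of fixed width over the text and keeps the first 32 bits of a SHA-256 digest as the fingerprint of each window. The window width, the truncation to 32 bits, and the names `FingerprintDemo`, `fingerprint`, and `fingerprints` are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashSet;
import java.util.Set;

final class FingerprintDemo {
    // Hashes one window of text and keeps the first 4 digest bytes as a 32-bit fingerprint.
    static int fingerprint(String window) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(window.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
                    | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Slides a window of `width` characters over the data and fingerprints every window.
    // Data shorter than the window yields an empty set.
    static Set<Integer> fingerprints(String data, int width) {
        Set<Integer> result = new LinkedHashSet<>();
        for (int i = 0; i + width <= data.length(); i++) {
            result.add(fingerprint(data.substring(i, i + width)));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> prints = fingerprints("card number 1234-5678", 8);
        System.out.println(prints.size() + " distinct fingerprints");
    }
}
```

Because every window overlaps its neighbours, a leak of consecutive data blocks produces a run of matching fingerprints, which fits the abstract's remark that the approach works well in that case.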

It is the data owner who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.

Sensitive data may also be sent by a legitimate user for legitimate purposes. The data owner is aware of legitimate data transfers and permits such transfers.

So the data owner can tell whether a piece of sensitive data in the network traffic is a leak using legitimate data transfer policies.
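
The slides do not define the format of a legitimate data transfer policy. As a minimal sketch, assume a policy is a set of permitted sender and recipient pairs. A detected transfer of sensitive data then counts as a leak only if its pair has not been permitted. The class `TransferPolicyDemo` and its methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class TransferPolicyDemo {
    // Permitted transfers, stored as "sender->recipient" strings (an assumed encoding).
    private final Set<String> permitted = new HashSet<>();

    // The data owner registers a legitimate transfer.
    void permit(String sender, String recipient) {
        permitted.add(sender + "->" + recipient);
    }

    // A detected transfer of sensitive data is a leak unless it was permitted.
    boolean isLeak(String sender, String recipient) {
        return !permitted.contains(sender + "->" + recipient);
    }

    public static void main(String[] args) {
        TransferPolicyDemo policy = new TransferPolicyDemo();
        policy.permit("alice", "bob");
        System.out.println(policy.isLeak("alice", "bob"));     // false
        System.out.println(policy.isLeak("alice", "mallory")); // true
    }
}
```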

[Figure: Data Owner module. Operations: register and login, send data, permit transfers]

Fuzzy Fingerprint

To achieve the privacy goal, the data owner generates a special type of digest, called a fuzzy fingerprint.

IMPLEMENTATION

1. Data Encryption Standard (DES)

The DES algorithm is used to encrypt and decrypt data in our project.

• DES encrypts data in blocks of 64 bits.

• The key is 64 bits long, of which 56 are key bits and the remaining 8 are check (parity) bits.
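
The following sketch shows a DES encrypt and decrypt round trip using the standard `javax.crypto` API. The slides do not state the cipher mode or padding, so `DES/CBC/PKCS5Padding` with a random 8-byte IV is an assumption, as are the names `DesDemo`, `newKey`, `encrypt`, and `decrypt`. DES is used here only because the project names it; its 56-bit key is considered too short for new designs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

final class DesDemo {
    // Generates a random DES key (8 encoded bytes: 56 key bits plus 8 parity bits).
    static SecretKey newKey() {
        try {
            return KeyGenerator.getInstance("DES").generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("DES not available", e);
        }
    }

    // Runs the cipher in the given mode; DES works on 64-bit blocks, so the IV is 8 bytes.
    private static byte[] run(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("DES/CBC/PKCS5Padding");
            cipher.init(mode, key, new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("DES operation failed", e);
        }
    }

    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plain) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plain);
    }

    static byte[] decrypt(SecretKey key, byte[] iv, byte[] encrypted) {
        return run(Cipher.DECRYPT_MODE, key, iv, encrypted);
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] iv = new byte[8];
        new SecureRandom().nextBytes(iv);
        byte[] encrypted = encrypt(key, iv, "sensitive data".getBytes(StandardCharsets.UTF_8));
        byte[] decrypted = decrypt(key, iv, encrypted);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8)); // sensitive data
    }
}
```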

2. Secure Hash Algorithm (SHA)

• The SHA-1 message digest is 160 bits (20 bytes), written as 40 hexadecimal digits.

• It has 80 rounds.

• It produces a short and hard-to-reverse hash value.
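
The digest sizes can be checked with the standard `java.security.MessageDigest` API. The sketch below hashes the same input with SHA-1 (160 bits, 40 hexadecimal digits) and with SHA-256 (256 bits, 64 hexadecimal digits). The names `ShaDigestDemo` and `hex` are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ShaDigestDemo {
    // Hashes the UTF-8 bytes of text with the named algorithm and returns lowercase hex.
    static String hex(String algorithm, String text) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm)
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hex("SHA-1", "abc"));   // a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(hex("SHA-256", "abc")); // 64 hexadecimal digits
    }
}
```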

• Algorithm structure:

• Step 1: Padding bits

• Step 2: Appending the length as a 64-bit unsigned integer

• Step 3: Buffer initialization

• Step 4: Processing of the message

• Step 5: Output

• For example, the SHA-256 hash code for “www.mytecbits.com” (64 hexadecimal digits, since SHA-256 produces a 256-bit digest) is

• 575f62a15889fa8ca55514a10754d2f98e30c57c4538f0f3e39dc53114533857.
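
Steps 1 and 2 of the structure above can be sketched as follows for SHA-1 and SHA-256, which both process 64-byte (512-bit) blocks: append a single 1 bit (the byte 0x80), fill with zero bits, and store the original message length in bits as a 64-bit big-endian integer at the end of the last block. The class `ShaPaddingDemo` and its `pad` method are written for illustration; in practice `MessageDigest` performs this padding internally.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ShaPaddingDemo {
    // Returns the message padded to a multiple of 64 bytes as SHA-1 and SHA-256 require.
    static byte[] pad(byte[] message) {
        // Smallest multiple of 64 that fits the message, the 0x80 byte, and the 8 length bytes.
        int paddedLength = ((message.length + 8) / 64 + 1) * 64;
        ByteBuffer buf = ByteBuffer.allocate(paddedLength); // zero-filled, big-endian by default
        buf.put(message);
        buf.put((byte) 0x80);                                     // Step 1: a 1 bit, then zero bits
        buf.putLong(paddedLength - 8, (long) message.length * 8); // Step 2: length in bits
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] padded = pad("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(padded.length); // 64
        System.out.println(padded[63]);    // 24, the length of "abc" in bits
    }
}
```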

A fuzzy fingerprint prevents the DLD provider from learning the exact value of the underlying fingerprint.

The data owner transforms each fingerprint into a fuzzy fingerprint.

All fuzzy fingerprints are collected and form the output of this operation.
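
The slides do not describe the transformation itself. One possible construction, assumed here for illustration, is to randomize the lowest bits of each 32-bit fingerprint: the high bits still identify a group of candidate fingerprints, while the exact value stays hidden from the DLD provider. The parameter `fuzzyBits` and the names `FuzzyFingerprintDemo`, `fuzzify`, and `sameBucket` are hypothetical.

```java
import java.security.SecureRandom;

final class FuzzyFingerprintDemo {
    // Replaces the lowest fuzzyBits bits (0 <= fuzzyBits < 32) of a fingerprint with random noise.
    static int fuzzify(int fingerprint, int fuzzyBits, SecureRandom rng) {
        int mask = (1 << fuzzyBits) - 1;  // selects the low bits to randomize
        int noise = rng.nextInt() & mask; // random value confined to those bits
        return fingerprint ^ noise;       // high bits are left unchanged
    }

    // Two values fall in the same group if they agree on all bits above fuzzyBits.
    static boolean sameBucket(int a, int b, int fuzzyBits) {
        return (a >>> fuzzyBits) == (b >>> fuzzyBits);
    }

    public static void main(String[] args) {
        int exact = 0x12345678;
        int fuzzy = fuzzify(exact, 8, new SecureRandom());
        System.out.println(sameBucket(exact, fuzzy, 8)); // true
    }
}
```

A larger `fuzzyBits` hides more about each fingerprint but makes the provider report more noise, so the value trades privacy against post-processing work for the data owner.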

[Figure: Fuzzy Fingerprint module. Operations: generate digests, hide sensitive data, prevent the DLD]

DLD

The DLD provider computes fingerprints from network traffic and identifies potential leaks in them.

To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noise.

It is the data owner who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.
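
The two roles can be sketched as follows, assuming 32-bit fingerprints whose lowest `fuzzyBits` bits were randomized by the data owner. The DLD provider reports every traffic fingerprint that agrees with some fuzzy fingerprint on the high bits, which yields real leaks mixed with noise. The data owner then keeps only the reports that equal one of the exact fingerprints. The matching rule and the names `DetectionDemo`, `potentialLeaks`, and `realLeaks` are assumptions, not the project's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DetectionDemo {
    // DLD provider side: report traffic fingerprints whose high bits match a fuzzy fingerprint.
    static List<Integer> potentialLeaks(List<Integer> traffic, Set<Integer> fuzzy, int fuzzyBits) {
        Set<Integer> buckets = new HashSet<>();
        for (int f : fuzzy) {
            buckets.add(f >>> fuzzyBits);
        }
        List<Integer> hits = new ArrayList<>();
        for (int t : traffic) {
            if (buckets.contains(t >>> fuzzyBits)) {
                hits.add(t); // may be a real leak or noise; the provider cannot tell
            }
        }
        return hits;
    }

    // Data owner side: keep only the reports that equal an exact fingerprint.
    static List<Integer> realLeaks(List<Integer> potential, Set<Integer> exact) {
        List<Integer> real = new ArrayList<>();
        for (int p : potential) {
            if (exact.contains(p)) {
                real.add(p);
            }
        }
        return real;
    }

    public static void main(String[] args) {
        Set<Integer> exact = new HashSet<>(Arrays.asList(0x1200));
        Set<Integer> fuzzy = new HashSet<>(Arrays.asList(0x12AB)); // 0x1200 with noisy low byte
        List<Integer> traffic = Arrays.asList(0x1200, 0x1234, 0x9900);
        List<Integer> potential = potentialLeaks(traffic, fuzzy, 8);
        System.out.println(potential.size());                   // 2
        System.out.println(realLeaks(potential, exact).size()); // 1
    }
}
```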

The DLD server detects the sensitive data within each packet on the basis of a stateless filtering system.

The DLD provider inspects the network traffic for potential data leaks.

The inspection can be performed offline without causing any real-time delay in routing the packets.

However, the DLD provider may attempt to gain knowledge about the sensitive data.

[Figure: DLD module. Operations: identify leaks, compute fingerprints, inspect the network traffic]

Data Receiver

This operation is run by the data receiver on each piece of sensitive data.

The data receiver receives the data in encrypted format.

The data is decrypted and the plain text is obtained.

[Figure: Data Receiver module. Operations: receive, collect each packet, compute]

System Requirements

Software Requirements:

• O/S : Windows XP / 7 / 8 / 10

• Language : Java

• IDE : Eclipse

• Database : MySQL

Hardware Requirements:

• System : Pentium IV 2.4 GHz and above

• Hard Disk : 160 GB

• Monitor : 15-inch VGA color

• Mouse : Logitech

• Keyboard : 110 keys enhanced

• RAM : 2 GB

LITERATURE SURVEY

Title: Data leak detection as a service

Year: 2012

Author: Xiaokui Shu, Danfeng (Daphne) Yao

Methodology: The authors propose a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small amount of specialized digests is needed.

Advantages: Provides a quantifiable method to measure the privacy guarantee offered by the fuzzy fingerprint framework.

Disadvantages: It is not efficient enough for practical data-leak inspection in this setting.

Title: Quantifying Information Leaks in Outbound Web Traffic

Year: 2009

Author: Kevin Borders, Atul Prakash

Methodology: The authors present an approach for quantifying information leak capacity in network traffic. Instead of trying to detect the presence of sensitive data (an impossible task in the general case), the goal is to measure and constrain its maximum volume.

Advantages: Makes it possible to identify smaller leaks.

Disadvantages: Traffic measurement does not completely stop information leaks from slipping by undetected.

Title: Panorama: Capturing system-wide information flow for malware detection and analysis

Year: 2007

Author: H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda

Methodology: The authors propose a system, Panorama, to detect and analyze malware by capturing this fundamental trait. In their extensive experiments, Panorama successfully detected all the malware samples and had very few false positives.

Advantages: It does send back sensitive information to remote servers in certain settings.

Disadvantages: Existing techniques for detecting malware and analyzing unknown code samples are insufficient and have significant shortcomings.

Title: Protecting confidential data on personal computers with storage capsules

Year: 2009

Author: K. Borders, E. V. Weele, B. Lau, and A. Prakash

Methodology: This paper introduces Storage Capsules, a new approach for protecting confidential files on a personal computer. Storage Capsules are encrypted file containers that allow a compromised machine to securely view and edit sensitive files without malware being able to steal confidential data.

Advantages: The system achieves this goal by taking a checkpoint of the current system state and disabling device output before allowing access to a Storage Capsule.

Disadvantages: It does not rely on high integrity.

Title: Preventing accidental data disclosure in modern operating systems

Year: 2013

Author: A. Nadkarni and W. Enck

Methodology: This paper presents Aquifer as a policy framework and system for preventing accidental information disclosure in modern operating systems. In Aquifer, application developers define secrecy restrictions that protect the entire user interface workflow defining the user task.

Advantages: The lack of application separation did not expose it as a concern.

Disadvantages: It may not be trusted with that data.

Title: Revolver: An automated approach to the detection of evasive web-based malware

Year: 2013

Author: A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna

Methodology: The authors present Revolver, a novel approach to automatically detect evasive behavior in malicious JavaScript. Revolver uses efficient techniques to identify similarities between a large number of JavaScript programs (despite their use of obfuscation techniques, such as packing, polymorphism, and dynamic code generation), and to automatically interpret their differences to detect evasions.

Advantages: Revolver has identified several techniques that attackers use to evade existing detection tools by continuously running in parallel with a honeyclient.

Disadvantages: This approach was defeated by static detection of the malicious code using signatures.

Title: Gyrus: A framework for user-intent monitoring of text-based networked applications

Year: 2014

Author: Y. Jang, S. P. Chung, B. D. Payne, and W. Lee

Methodology: The authors propose a way to break this cycle by ensuring that a system’s behavior matches the user’s intent. Since the approach is attack-agnostic, it will scale better than traditional security systems.

Advantages: Gyrus is very efficient and introduces no noticeable delay to a user’s interaction with the protected applications.

Disadvantages: Gyrus solves the problem by relying on the semantics, but not the timing, of user-generated events.

Title: Privacy-preserving scanning of big content for sensitive data exposure with MapReduce

Year: 2015

Author: F. Liu, X. Shu, D. Yao, and A. R. Butt

Methodology: The solution uses the MapReduce framework for detecting exposed sensitive content, because it has the ability to arbitrarily scale and utilize public resources for the task, such as Amazon EC2. The authors design new MapReduce algorithms for computing collection intersection for data-leak detection.

Advantages: This transformation supports the secure outsourcing of data-leak detection to untrusted MapReduce and cloud providers.

Disadvantages: A significant portion of the incidents are caused by unintentional mistakes of employees or data owners.

Title: Fuzzy keyword search over encrypted data in cloud computing

Year: 2010

Author: J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, and W. Lou

Methodology: In this paper, the authors for the first time formalize and solve the problem of effective fuzzy keyword search over encrypted cloud data while maintaining keyword privacy.

Advantages: The proposed solution is secure and privacy-preserving, while correctly realizing the goal of fuzzy keyword search.

Disadvantages: Unsuitable in cloud computing, as it greatly affects system usability, rendering user searching experiences very frustrating and system efficacy very low.

Title: Towards practical avoidance of information leakage in enterprise networks

Year: 2011

Author: J. Croft and M. Caesar

Methodology: The authors propose a network-wide method of confining and controlling the flow of sensitive data within a network. The approach is based on black-box differencing: two logical copies of the network are run, one with private data scrubbed, and the outputs of the two are compared to determine if and when private data is being leaked.

Advantages: purpose schemes that leverage black-box differencing to mitigate leakage of private data.

Disadvantages: It may not be able to monitor encrypted traffic without encryption keys, or information flows that are intentionally obfuscated by attackers.

Conclusion

Preventing sensitive data from being compromised is an important and practical research problem.

The proposed system uses the Secure Hash Algorithm (SHA) to generate short and hard-to-reverse digests.

Using special digests, the exposure of the sensitive data is kept to a minimum during the detection.

References

[1] X. Shu and D. Yao, “Data leak detection as a service,” in Proc. 8th Int. Conf. Secur. Privacy Commun. Netw., 2012, pp. 222–240.

[2] K. Borders and A. Prakash, “Quantifying information leaks in outbound web traffic,” in Proc. 30th IEEE Symp. Secur. Privacy, May 2009, pp. 129–140.

[3] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: Capturing system-wide information flow for malware detection and analysis,” in Proc. 14th ACM Conf. Comput. Commun. Secur., 2007, pp. 116–127.