fast detection of transformed data leaks[mithun_p_c]

FAST DETECTION OF TRANSFORMED DATA LEAKS Submitted By, JOSNA KRISHNA S7 CSE ROLL No.:35

Upload: mithunpchandra

Post on 14-Jan-2017




CONTENTS:
•INTRODUCTION
•SENSITIVE DATA IN COMPANIES
•DATA LEAKAGE: HOW?
•DANGER
•TOWARDS SECURITY
•EXISTING SYSTEM
•PROPOSED SYSTEM
•INTO THE ALGORITHMS
•CONCLUSION


INTRODUCTION

DATA LEAKAGE: Data leakage is the unauthorized transmission of sensitive data or information from within an organization to an external destination.


SENSITIVE DATA OF COMPANIES INCLUDES:

•Intellectual property
•Financial information
•Patient information
•Personal credit card data
•Other information, depending on the business and the industry


DATA LEAKAGE: HOW?
•In the course of business, data must be handed over to trusted third parties for some operations.

•Sometimes these trusted third parties may act as points of data leakage.

•Data leakage mainly happens due to human error.


EXAMPLES:

•A hospital may give patient records to researchers who will devise new treatments.
•A company may have partnerships with other companies that require sharing of customer data.
•An enterprise may outsource its data processing, so data must be given to various other companies.


DANGER:
•The number of leaked sensitive data records has grown tenfold in recent years.
•Accidental data leakage poses a greater risk than vulnerable software does.
•Sensitive data leakage is more likely where there is no end-to-end encryption (for example, PGP: Pretty Good Privacy).


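TOWARDS SECURITY:
•Prevent clear-text sensitive data from direct access.
•Deploy a screening tool:

-to scan computer file systems
-to scan server storage
-to inspect outbound network traffic

•Data leak detection differs from antivirus software and network intrusion detection systems (AV & NIDS): those look for known malicious code or attack patterns, whereas data leak detection looks for the organization's own sensitive data.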


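DATA LEAK DETECTION HAS:

->New security requirements, and

->Algorithmic challenges.

Algorithmic challenges:
-Data transformation
-Scalability

•Direct use of automata-based string matching is not possible, because such matching finds only exact patterns, and transformed leaked data no longer matches the sensitive data exactly.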
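The data-transformation challenge can be seen in a minimal Python sketch. The sensitive record and payloads are hypothetical, and Python's `in` operator stands in for an exact matcher (an automaton such as Aho-Corasick gives the same yes/no answer for a single pattern). A trivial formatting change is enough to hide the leak.

```python
def exact_match(pattern: str, content: str) -> bool:
    """Exact substring search; stands in for an automaton-based matcher."""
    return pattern in content

# Hypothetical sensitive record and outbound payloads (illustration only).
sensitive = "4111-1111-1111-1111"
plain_leak = "card=4111-1111-1111-1111;exp=12"
transformed_leak = "card=4111 1111 1111 1111;exp=12"  # '-' replaced by ' '

print(exact_match(sensitive, plain_leak))        # True
print(exact_match(sensitive, transformed_leak))  # False: the leak is missed
```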


EXISTING SYSTEM: It is based on set intersection. The operation is performed on two sets of n-grams: one from the content and one from the sensitive data. The same method is used to detect:

• similar documents on the web
• shared malicious traffic patterns
• malware
• e-mail spam
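The set-intersection idea can be sketched in Python as below. This is an illustration, not any listed product's implementation: it uses character n-grams and, as an assumption, normalizes by the number of distinct n-grams in the sensitive data (other normalizations, such as Jaccard similarity, are also common).

```python
def ngrams(text: str, n: int) -> set[str]:
    """Set of distinct character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def set_intersection_score(sensitive: str, content: str, n: int = 3) -> float:
    """Fraction of the sensitive data's n-grams that also appear in the content."""
    s = ngrams(sensitive, n)
    if not s:
        return 0.0
    return len(s & ngrams(content, n)) / len(s)

print(set_intersection_score("abcdef", "xx abcdef yy"))  # 1.0
print(set_intersection_score("abcdef", "zzzzzz"))        # 0.0
```

An alert would be raised when the score exceeds some threshold; the threshold value is not given on the slides.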


EXISTING SYSTEMS INCLUDE:

Symantec DLP, Identity Finder, Global Velocity, GoCloud DLP, etc.


DISADVANTAGES OF EXISTING SYSTEM:

Set intersection is orderless (the ordering of shared n-grams is not analyzed).

It generates false alerts when n is set to a small value.
It cannot detect partial data leakage.
It is not an adequate method.
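The first two disadvantages can be reproduced with a character n-gram score (restated here so the snippet runs on its own; the inputs are made up). Scrambled content that never contains the sensitive string still scores 1.0, and with n = 1 a reversed string is a perfect match, which is a false alert.

```python
def ngrams(text: str, n: int) -> set[str]:
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def set_intersection_score(sensitive: str, content: str, n: int) -> float:
    """Fraction of the sensitive data's n-grams found in the content."""
    s = ngrams(sensitive, n)
    return len(s & ngrams(content, n)) / len(s) if s else 0.0

# Shared bigrams in a scrambled order still give a perfect score,
# although "abcd" never appears in the content.
print(set_intersection_score("abcd", "cd-ab-bc", n=2))  # 1.0
# With n = 1, a reversed string matches completely (a false alert) ...
print(set_intersection_score("secret", "terces", n=1))  # 1.0
# ... while with n = 3 no overlap is seen at all.
print(set_intersection_score("secret", "terces", n=3))  # 0.0
```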


PROPOSED SYSTEM: It uses a sequence alignment algorithm, executed on:
• the sampled sensitive data sequence
• the sampled content being inspected

The alignment measures the amount of sensitive data in the content, which makes detection more accurate.


FACING THE CHALLENGES:

The scalability issue is solved by sampling both the sensitive data and the content sequence before aligning them.

A pair of algorithms is used:

• Comparable Sampling Algorithm

• Sampling Oblivious Alignment Algorithm

The method achieves high detection specificity and tolerates both pervasive and localized modifications.


ABOUT THE ALGORITHMS:

o The Comparable Sampling Algorithm yields consistent samples of a sequence wherever the sampling starts and ends.

o The Sampling Oblivious Alignment Algorithm infers the similarity between the original, unsampled sequences from their samples, using dynamic programming techniques.


CONTINUATION: In this method, both the sensitive data and the content sequence are sampled, and the alignment is performed on the sampled sequences. Here, the ‘Comparable Sampling’ property is used. Both algorithms run faster on a GPU than on a CPU, which promises high-speed security scanning.


INTO THE ALGORITHMS


COMPARABLE SAMPLING ALGORITHM

Requirements:

Definition 1: A substring is a consecutive segment of the original string.

Definition 2: A subsequence does not require its items to be consecutive in the original string.
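Definitions 1 and 2 can be checked mechanically. The two helpers below are an illustrative Python sketch; their names are hypothetical.

```python
def is_substring(x: str, y: str) -> bool:
    """Definition 1: x occurs in y as one consecutive segment."""
    return x in y

def is_subsequence(x: str, y: str) -> bool:
    """Definition 2: x's items occur in y in the same order, gaps allowed."""
    it = iter(y)
    # Each membership test consumes y up to the matching item,
    # so later items of x must be found further along in y.
    return all(ch in it for ch in x)

print(is_substring("bcd", "abcde"), is_substring("ace", "abcde"))      # True False
print(is_subsequence("ace", "abcde"), is_subsequence("aec", "abcde"))  # True False
```

Every substring is also a subsequence, but not the other way round.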


Definition 3: Given that string x is a substring of y, comparable sampling on x and y yields x’ and y’ such that x’ is similar to a substring of y’.

Definition 4: Given x as a substring of y, subsequence-preserving sampling on x and y yields two subsequences x’ and y’ such that x’ is a substring of y’.
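To make Definition 4 concrete, here is a deliberately trivial sampler in Python (hypothetical, and not the Comparable Sampling Algorithm): it keeps every item below a threshold t. Because each keep/drop decision depends only on the item itself, the items kept from a substring x of y form a consecutive run inside the items kept from y, which is the subsequence-preserving property. The Comparable Sampling Algorithm instead decides using the N smallest items in a sliding window.

```python
def threshold_sample(seq: list[int], t: int) -> list[int]:
    """Hypothetical toy sampler: keep every item smaller than t."""
    return [v for v in seq if v < t]

def is_contiguous_sublist(x: list[int], y: list[int]) -> bool:
    """True if x appears in y as a consecutive run."""
    n = len(x)
    return any(y[i:i + n] == x for i in range(len(y) - n + 1))

y = [1, 5, 1, 9, 8, 5, 3, 2, 4, 8]
x = y[2:8]                      # a substring of y: [1, 9, 8, 5, 3, 2]
xs, ys = threshold_sample(x, 4), threshold_sample(y, 4)
print(xs, ys)                   # [1, 3, 2] [1, 1, 3, 2]
print(is_contiguous_sublist(xs, ys))  # True: x' is a substring of y'
```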


ADVANTAGES: It is deterministic and subsequence-preserving.

This algorithm is unbiased.

It yields consistent samples of a sequence wherever the sampling starts and ends.


COMPARABLE SAMPLING ALGORITHM:
Input: an array S of items, a size |w| for a sliding window w, and a selection function f(w, N) that selects the N smallest items from a window w, i.e., f = min(w, N)
Output: a sampled array T
1: initialize T as an empty array of size |S|
2: w ← read(S, |w|)
3: let w.head and w.tail be the indices in S corresponding to the higher-indexed end and the lower-indexed end of w, respectively
4: collection mc ← min(w, N)
5: while w is within the boundary of S do
6:   mp ← mc
7:   move w toward the high index by 1
8:   mc ← min(w, N)
9:   if mc ≠ mp then
10:    item en ← collectionDiff(mc, mp)
11:    item eo ← collectionDiff(mp, mc)
12:    if en < eo then
13:      write value en to T at w.head’s position
14:    else
15:      write value eo to T at w.tail’s position
16:    end if
17:  end if
18: end while


ALGORITHM ANALYSIS: We set up our sampling procedure with a sliding window of size 6 (i.e., |w| = 6) and N = 3. The input sequence is 1,5,1,9,8,5,3,2,4,8. The initial sliding window is w = [1,5,1,9,8,5] and the collection mc = {1,1,5}.
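The pseudocode can be transcribed into Python and run on the worked example above. The sketch rests on assumptions that the slides do not state: step 9 is read as mc ≠ mp (otherwise collectionDiff would be empty); items are tracked as (value, index) pairs so equal values are told apart by position, and en < eo compares values first, then indices; the window moves only while the next window still fits in S; and unsampled entries of T are None. min(w, N) is recomputed with `heapq.nsmallest` at every step, which is simpler but slower than the incremental maintenance the complexity slide describes.

```python
import heapq
from typing import Optional


def comparable_sample(S: list[int], w: int, N: int) -> list[Optional[int]]:
    """Literal transcription of the slide pseudocode (assumptions in the text).

    Items are (value, index) pairs, so equal values are distinguished by
    position; None marks positions of S that were not sampled.
    """
    def min_n(tail: int) -> set[tuple[int, int]]:
        # f = min(w, N): the N smallest (value, index) pairs in S[tail:tail+w]
        return set(heapq.nsmallest(N, ((S[i], i) for i in range(tail, tail + w))))

    T: list[Optional[int]] = [None] * len(S)     # step 1
    if len(S) < w:
        return T                                 # no full window fits
    tail = 0                                     # steps 2-3: w covers S[0:w]
    mc = min_n(tail)                             # step 4
    while tail + w < len(S):                     # step 5 (next window fits)
        mp = mc                                  # step 6
        tail += 1                                # step 7: slide w by one
        mc = min_n(tail)                         # step 8
        if mc != mp:                             # step 9, read as mc ≠ mp
            # One item enters and one leaves the min set per move.
            (en,) = mc - mp                      # step 10
            (eo,) = mp - mc                      # step 11
            if en < eo:                          # step 12
                T[en[1]] = en[0]                 # step 13: en sits at w.head
            else:
                T[eo[1]] = eo[0]                 # step 15: eo just left w
    return T


T = comparable_sample([1, 5, 1, 9, 8, 5, 3, 2, 4, 8], w=6, N=3)
print(T)  # [1, None, 1, None, None, None, None, 2, None, None]
```

On this input the initial min set is {1, 1, 5}, matching the slide, and the sketch writes 1 at index 0, 1 at index 2 and 2 at index 7. The listed steps contain no final step that writes items still in mc when the window reaches the end of S, so under this literal reading 3 and 4 from the last window are not emitted.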


COMPLEXITY:

The complexity of the selection function is O(n log |w|) or O(n), where n is the size of the input and |w| is the size of the window.

The O(log |w|) factor comes from maintaining the N smallest items within the window.
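A rough accounting behind the O(n log |w|) figure, assuming (the slides do not name a data structure) that the window's items are kept in a balanced ordered structure, so that each slide of the window costs one insertion, one deletion and one order-statistic lookup, each O(log |w|):

```latex
\underbrace{O(|w| \log |w|)}_{\text{first window}}
\;+\; \underbrace{(n - |w|)}_{\text{window moves}} \cdot \underbrace{O(\log |w|)}_{\text{per move}}
\;=\; O(n \log |w|), \qquad |w| \le n
```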


SAMPLING OBLIVIOUS ALIGNMENT ALGORITHM

Requirements:

The algorithm runs on the compact sampled sequences L.

Extra fields are kept in the scoring-matrix cells used by dynamic programming.

An extra step in the recurrence relation updates the null region (a run of positions left empty by sampling).

A complex weight function computes the similarity between two null regions.


SECURITY ADVANTAGES:

Order-aware comparison

High tolerance to pattern variation

Capability of detecting partial leaks

Consistent


SAMPLING OBLIVIOUS ALIGNMENT ALGORITHM
Input: a weight function fw; the visited cells in the H matrix that are adjacent to H(i, j), namely H(i−1, j−1), H(i, j−1) and H(i−1, j); and the i-th and j-th items La_i and Lb_j of the two sampled sequences La and Lb, respectively.
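For orientation, the three neighbouring cells H(i−1, j−1), H(i, j−1) and H(i−1, j) are the ones used by the textbook Smith-Waterman local alignment recurrence. The Python sketch below implements only that standard recurrence on unsampled strings; it is not the Sampling Oblivious Alignment algorithm, which, per the requirements listed earlier, adds extra cell fields, a null-region update step and the weight function fw. The scores (match +2, mismatch −1, gap −1) are assumptions chosen for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score of a and b (linear gaps)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                       # start a new local alignment
                H[i - 1][j - 1] + s,     # align a[i-1] with b[j-1]
                H[i][j - 1] + gap,       # gap in a
                H[i - 1][j] + gap,       # gap in b
            )
            best = max(best, H[i][j])
    return best


print(local_alignment_score("secret", "xxsecretyy"))        # 12
print(local_alignment_score("secretdata", "xx secret yy"))  # 12
print(local_alignment_score("secret", "terces") < 12)       # True
```

The embedded string scores 12 (six matches), the partial leak of "secretdata" is still found with the same score, and the reversed string scores lower because alignment, unlike set intersection, respects order.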


CONCLUSION:

•Presented here is a content inspection technique for detecting sensitive data leakage.
•The detection approach is based on aligning two samples for similarity comparison.
•Our alignment method is useful for common data scenarios.
