
Demo, May 2005

Privacy Preserving Database Application Testing

Xintao Wu, Yongge Wang, Yuliang Zheng, UNC Charlotte

Demo 2

Overview

• Milestone

Initial investigation from May 2002 to Dec 2002

Official start in Sept 2003, supported by NSF CCR-0310974 (200k, Sept 2003 – August 2005)

The prototype system was finished in April 2005. Developed using C++ and Oracle, with 22K lines of source code

Demo at several banks, May 2005

…

• Personnel

Faculty: Xintao Wu, Yongge Wang, Yuliang Zheng

Current graduate students: Songtao Guo, Ying Wu, Chintan Sanghvi, Guodong Jiao

Previous graduate students: Jing Jin, Amol Kedar

Several senior undergraduate students

• More Info

http://www.cs.uncc.edu/~xwu/privacy

[email protected]

Demo 3

Motivation

• To generate synthetic data for DB application testing, especially performance testing.

Many applications involve large-scale databases with sensitive information.

Complete testing is essential for database applications to function correctly and to provide acceptable performance.

Demo 4

Our Approach

• To generate synthetic databases based on a-priori knowledge about the current production databases

The needed a-priori knowledge is generally available from the ER model, the DDL, and the data dictionary: schema, data integrity rules, and basic statistical information

Detailed statistical information can be extracted if the original data or samples from the production database are available

The generated data can match realistic amounts or be scaled to any amount

Better controllability, observability, and privacy
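The approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's code (which the slides describe as C++ with Oracle): the table, column names, weights, and statistics in `ACCOUNT_SPEC` are hypothetical, and `generate_rows` is a made-up helper that turns domain and basic statistical knowledge into rows of any requested amount.

```python
import random

# Hypothetical a-priori knowledge for one table: column domains plus basic
# statistics. None of these names or numbers come from the original system.
ACCOUNT_SPEC = {
    "account_id": {"kind": "key"},
    "branch": {"kind": "categorical",
               "values": ["north", "south", "east"],
               "weights": [0.5, 0.3, 0.2]},
    "balance": {"kind": "numeric", "mean": 2500.0, "stddev": 800.0,
                "min": 0.0, "max": 10000.0},
}


def generate_rows(spec, n_rows, seed=0):
    """Generate n_rows synthetic rows that respect the domains in spec."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        row = {}
        for column, info in spec.items():
            if info["kind"] == "key":
                row[column] = i + 1  # unique, so a primary-key rule holds
            elif info["kind"] == "categorical":
                row[column] = rng.choices(info["values"], info["weights"])[0]
            else:
                value = rng.gauss(info["mean"], info["stddev"])
                # Clamp to the declared domain so range constraints hold.
                row[column] = min(max(value, info["min"]), info["max"])
        rows.append(row)
    return rows


rows = generate_rows(ACCOUNT_SPEC, 1000)
```

Because only the specification is read, no production row is ever copied, and `n_rows` can be set to a realistic volume or to any other size needed for a test.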

Demo 5

Three Characteristics of Synthetic Data

• Valid

The synthetic data need to satisfy all the same constraints and business rules as the live data

Necessary for functional testing

• Privacy preserving

No disclosure of any confidential information that needs to be protected

• Resembling real data

The synthetic data need to have statistical distributions or patterns similar to those of the live data

Necessary for performance testing, as the statistical nature of the data determines query performance

We will show that if the data distributions are not similar, the execution time of the same workload may be totally different.
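A small Python sketch can illustrate why resemblance matters for performance testing. It is not the experiment the authors refer to; it assumes a hypothetical skewed "live" column and compares how many rows a range predicate returns on that column, on a uniform synthetic column over the same domain, and on a synthetic column drawn with the estimated mean. The number of rows a predicate returns is one of the things that drives how much work a query does.

```python
import random


def selectivity(values, threshold):
    """Fraction of rows that a predicate `value > threshold` would return."""
    return sum(1 for v in values if v > threshold) / len(values)


rng = random.Random(42)
n = 100_000

# Hypothetical "live" column: heavily skewed toward small values (mean 100).
live = [rng.expovariate(1 / 100.0) for _ in range(n)]

# Synthetic column A: same domain, but uniform (ignores the distribution).
lo, hi = min(live), max(live)
uniform = [rng.uniform(lo, hi) for _ in range(n)]

# Synthetic column B: same distribution family, using the estimated mean.
mean = sum(live) / n
similar = [rng.expovariate(1 / mean) for _ in range(n)]

threshold = 500.0
sel_live = selectivity(live, threshold)
sel_uniform = selectivity(uniform, threshold)
sel_similar = selectivity(similar, threshold)
```

On the skewed column the predicate matches well under 1% of the rows, while on the uniform column it matches a large share of them, so the same query would touch very different amounts of data. The column that follows the estimated distribution stays close to the live selectivity.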

Demo 6

Architecture

[Figure: system architecture. Inputs: ER, DDL, Catalog, Data. Components: Schema & Domain Filter (producing Schema' and Domain'), R / NR / S, General Location Model, Disclosure Assessment, Performance Assessment, Data Generator. Output: synthetic database.]

Demo 7

Building a Project

Demo 8

Data Dictionary Information

Demo 9

Statistical Information Extraction: Basic

Demo 10

Statistical Information Extraction: Advanced

Demo 11

Generating Meta & Data File

Demo 12

Generating Confidential File

Demo 13

Disclosure Analysis - Categorical

Demo 14

Numerical Disclosure: Basic, Batch Mode

Demo 15

Numerical Disclosure: Basic, Single Mode

Demo 16

Creating Final Categorical File

Demo 17

Creating Final Rule File (GLM Format)

Demo 18

Generating Data
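The slides do not show the rule file format or the generator itself, so the following is only a sketch of how data could be drawn from a general location model. It assumes the standard statistical definition of that model: each combination of categorical values forms a cell with its own probability, and numeric attributes are normally distributed within a cell, with a cell-specific mean and a spread shared across cells. The cells, probabilities, means, and the helper `sample_glm` are all hypothetical, and a single numeric attribute is used to keep the example short.

```python
import random

# Hypothetical GLM parameters: each cell is one combination of categorical
# values, with a cell probability and a cell-specific mean for one numeric
# attribute. The standard deviation is shared across all cells.
CELLS = {
    ("checking", "north"): {"prob": 0.4, "mean": 1500.0},
    ("checking", "south"): {"prob": 0.3, "mean": 1200.0},
    ("savings", "north"): {"prob": 0.2, "mean": 6000.0},
    ("savings", "south"): {"prob": 0.1, "mean": 5000.0},
}
SHARED_STDDEV = 400.0


def sample_glm(cells, stddev, n_rows, seed=0):
    """Draw rows as (categorical values..., numeric value) tuples."""
    rng = random.Random(seed)
    keys = list(cells)
    probs = [cells[k]["prob"] for k in keys]
    rows = []
    for _ in range(n_rows):
        # Step 1: pick a cell according to the cell probabilities.
        cell = rng.choices(keys, weights=probs)[0]
        # Step 2: draw the numeric attribute from that cell's normal.
        value = rng.gauss(cells[cell]["mean"], stddev)
        rows.append((*cell, value))
    return rows


synthetic = sample_glm(CELLS, SHARED_STDDEV, 5000)
```

Under these assumptions, the generated rows reproduce the cell frequencies and per-cell means of the model without containing any original record.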