datprof test data management (data privacy & data subsetting) - english

32
Test Data Management Harald Kikkers, Maarten Urbach & Bert Nienhuis

Upload: datprof

Post on 12-Apr-2017

140 views

Category:

Software


3 download

TRANSCRIPT

Page 1: DATPROF Test data Management (data privacy & data subsetting) - English

Test Data ManagementHarald Kikkers, Maarten Urbach & Bert Nienhuis

Page 2: DATPROF Test data Management (data privacy & data subsetting) - English

DATPROF

Data IntegrationTest Data Management

• Dutch Software supplier

• Founded in 1998

• Partners: ITCG, Sogeti, …

Page 3: DATPROF Test data Management (data privacy & data subsetting) - English

…and many more!

Page 4: DATPROF Test data Management (data privacy & data subsetting) - English

MANYORGANISATIONSUSE MULTIPLE COPIES OFPRODUCTION DATABASES

Page 5: DATPROF Test data Management (data privacy & data subsetting) - English

PURPOSES:• TESTING

• DEVELOPMENT

• OUTSOURCING

• MARKETING

• TRAINING

Page 6: DATPROF Test data Management (data privacy & data subsetting) - English

Agile Development • Building the right product• Room for change• Every 2-4 weeks working increments of the software• Progress in development

Page 7: DATPROF Test data Management (data privacy & data subsetting) - English

Test Automation

How to test all these iterations?

And… what data to use?

?

Page 8: DATPROF Test data Management (data privacy & data subsetting) - English

Team 1 Team 2 Team 3

6 TB 500 GBProduction

10 GB

6 TB 500 GBTest

10 GB 6 TB 500 GBDevelopment

10 GB

Total19,53 TB

Page 9: DATPROF Test data Management (data privacy & data subsetting) - English

Team 1 Team 2 Team 3

6 TB 500 GBProduction

10 GB

6 TB 500 GBTest

10 GB 6 TB 500 GBDevelopment

10 GB

Total19,53 TB

Team 1 Team 2 Team 3

Test

Team 1 Team 2 Team 3

Development

Page 10: DATPROF Test data Management (data privacy & data subsetting) - English

Team 1 Team 2 Team 3

6 TB 500 GBProduction

10 GB

6 TB 500 GB

Development

10 GB

6 TB 500 GBTest

10 GB

6 TB 500 GB

Development

10 GB

6 TB 500 GBTest

10 GB

6 TB 500 GB

Development

10 GB

6 TB 500 GBTest

10 GB

Total45,57 TB

Page 11: DATPROF Test data Management (data privacy & data subsetting) - English

Team 1 Team 2 Team 3

6 TB 500 GBProduction

10 GB

600 GB 50 GB

Development

1 GB

600 GB 50 GBTest

1 GB

600 GB 50 GB

Development

1 GB

600 GB 50 GBTest

1 GB

600 GB 50 GB

Development

1 GB

600 GB 50 GBTest

1 GB

Total10.4 TB

10 % Subset 10 % Subset 10 % Subset

Page 12: DATPROF Test data Management (data privacy & data subsetting) - English

Development

Test

Development

Test

Development

Test

How to protectsensitive customer data?

Page 13: DATPROF Test data Management (data privacy & data subsetting) - English

Test Test Test

Development Development Development

Page 14: DATPROF Test data Management (data privacy & data subsetting) - English

Minimize data usage

Save on hardware & infra

Reduce throughput times

Efficient data management

Protect customer information

Comply with regislation

Prevent brand damage

Maintain competitive advantages

Subsetting AnonymizingAdvantages of subsetting data Advantages of scrambling & masking data

Page 15: DATPROF Test data Management (data privacy & data subsetting) - English

DBA Tools ETL Suites100$ tools IBM, Informatica, Oracle

Page 16: DATPROF Test data Management (data privacy & data subsetting) - English

DBA Tools ETL Suites?

Page 17: DATPROF Test data Management (data privacy & data subsetting) - English

DBA Tools ETL Suites

- User Experience

- Default templates

- Easy to maintain

- Smart functionality

- Chain support

Page 18: DATPROF Test data Management (data privacy & data subsetting) - English

DBA Tools ETL Suites

Page 19: DATPROF Test data Management (data privacy & data subsetting) - English

Production Test/DevelopmentSource Database Target Database

Page 20: DATPROF Test data Management (data privacy & data subsetting) - English

Data model classification

Subset – Process dataExample: Customers, Orders, Contracts, Invoices, Transactions

Full – Master dataExample: Application data, configuration, master tables

Embty – Logging, non relevant historyExample: Logging tables, temp tabellen

Determine data to be subsetted

Page 21: DATPROF Test data Management (data privacy & data subsetting) - English
Page 22: DATPROF Test data Management (data privacy & data subsetting) - English

Chain of systemsMethod for deriving consistent subsets from multiple systems

Production Test/Development

Start FilterAll customers from The

Netherlands

Start FilterAll orders from customers in

the previous subset.

Page 23: DATPROF Test data Management (data privacy & data subsetting) - English

ImportMeta data Classification Deployment

Page 24: DATPROF Test data Management (data privacy & data subsetting) - English

Anonymization of sensitive data

Page 25: DATPROF Test data Management (data privacy & data subsetting) - English

- Bank account balance

- Dept

- Medication

- Illness

- Religion

- Political preference

- Salary

- Phone history

- Et cetera…

- Name

- Date of birth

- Email

- Bank account number

- Social security number

- Adress

- Insurance number

- Cellphone number

- Et cetera..

Personal data

Identifying Characteristics

“Any information relating to an identified or identifiable natural person ("data subject")Source: Data Protection Directive - Directive 95/46/EC

Page 26: DATPROF Test data Management (data privacy & data subsetting) - English

Techniques

Page 27: DATPROF Test data Management (data privacy & data subsetting) - English

ShuffleShuffle values within same column

ConditionalManipulate specified rows+

First name Last name Type

John

Max

Joe

Clark

Smith

Williams

DATPROF

Customer

Customer

Customer

Company

Page 28: DATPROF Test data Management (data privacy & data subsetting) - English

321First name Last name Type Comment E-Mail

John

Max

Joe

Smith

Williams

Clark

BlankDelete values from columns

ScrambleReplace existing characters

[email protected]

[email protected]

[email protected]

“Brother of J. Clark”

“Has dept”

Customer

Customer

Customer

CompanyDATPROF

Page 29: DATPROF Test data Management (data privacy & data subsetting) - English

Nr. First name Last name Type Co.. E-mail Date of Birth

John

Max

Joe

Smith

Williams

Clark

DATPROF

123

Customer

Customer

Customer

Company

321

789

456

First dayChange dates to first day within same month and year

01-02-1954

01-11-1984

01-03-1974

Postal code

Date of Birth 1st day of month 1st day of year

87% 3.7% 0.04%Source: research anonimity by Prof. Dr. Latanya Sweeney (Harvard University)

x.xxxxx@xxxx...

Xxxxx_xxx@xx...

x_xx@XxxXxxx...

Page 30: DATPROF Test data Management (data privacy & data subsetting) - English

Nr. First name Last name Type .. E-mail Date of birth

123

321

789

01-02-1954

01-11-1984

01-03-1974

Look-upReplace values with values from a lookup table

James

Adrian

Thomas

John

Max

Joe

First names

Chris

Thomas

James

Ruben

Adrian

Michael

David

Reference data

Smith

Williams

Clark

DATPROF

Customer

Customer

Customer

Company

x.xxxxx@xxxx...

Xxxxx_xxx@xx...

x_xx@XxxXxxx...

Page 31: DATPROF Test data Management (data privacy & data subsetting) - English

Nr. First name Last name Type Comment E-mail Date of birth

Thomas

James

Adrian

Smith

Williams

Clark

DATPROF

123

Customer

Customer

Customer

Company

321

789

456

01-02-1954

01-11-1984

01-03-1974

ExpressionUse custom made functions

Scrambled [email protected]

[email protected]

[email protected]

Scrambled

Scrambled

Page 32: DATPROF Test data Management (data privacy & data subsetting) - English

ImportMeta data

Define masking rules 3. Deployment