
Page 1: Security Methods for Statistical Databases by Karen Goodwin

Security Methods for Statistical Databases

by Karen Goodwin

Page 2: Security Methods for Statistical Databases by Karen Goodwin

Introduction

Statistical databases containing medical information are often used for research.

Some of the data is protected by laws to help protect the privacy of the patient.

Proper security precautions must be implemented to comply with laws and respect the sensitivity of the data.

Page 3: Security Methods for Statistical Databases by Karen Goodwin

Accuracy vs. Confidentiality

Accuracy: Researchers want to extract accurate and meaningful data.

Confidentiality: Patients, laws and database administrators want to maintain the privacy of patients and the confidentiality of their information.

Page 4: Security Methods for Statistical Databases by Karen Goodwin

Laws

Health Insurance Portability and Accountability Act (HIPAA), Privacy Rule

Covered organizations must comply by April 14, 2003.

Designed to improve the efficiency of the healthcare system through electronic exchange of data while maintaining security.

Covered entities (health plans, healthcare clearinghouses, healthcare providers) may not use or disclose protected information except as permitted or required.

The Privacy Rule establishes a “minimum necessary standard” for the purpose of making covered entities evaluate their current regulations and security precautions.

Page 5: Security Methods for Statistical Databases by Karen Goodwin

HIPAA Compliance

Companies offer 3rd party certification of covered entities.

Such companies will check your company and its associated companies for compliance with HIPAA.

They can help with rapid implementation of, and compliance with, HIPAA regulations.

Page 6: Security Methods for Statistical Databases by Karen Goodwin

Types of Statistical Databases

Static: a static database is made once and never changes.

Example: U.S. Census

Dynamic: a dynamic database changes continuously to reflect real-time data.

Example: most online research databases

Page 7: Security Methods for Statistical Databases by Karen Goodwin

Security Methods

Access Restriction

Query Set Restriction

Microaggregation

Data Perturbation

Output Perturbation

Auditing

Random Sampling

Page 8: Security Methods for Statistical Databases by Karen Goodwin

Access Restriction

Databases normally have different access levels for different types of users.

User IDs and passwords are the most common methods for restricting access.

In a medical database:

Doctors/Healthcare Representatives: full access to information

Researchers: access to partial information only (e.g. aggregate information)
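The two access levels above could be sketched roughly as follows. This is only an illustration: the role names, record fields, and the `view_for` function are hypothetical and do not come from the presentation.

```python
from statistics import mean

# Hypothetical patient records; the field names are illustrative only.
RECORDS = [
    {"name": "A. Smith", "age": 57, "diagnosis": "asthma"},
    {"name": "B. Jones", "age": 54, "diagnosis": "asthma"},
    {"name": "C. Brown", "age": 59, "diagnosis": "diabetes"},
]


def view_for(role, records):
    """Return the view of the data that the given role is allowed to see."""
    if role == "doctor":
        # Full access: copies of the individual records.
        return [dict(r) for r in records]
    if role == "researcher":
        # Partial access: aggregate information only, no individual rows.
        return {
            "count": len(records),
            "mean_age": mean(r["age"] for r in records),
        }
    # Any other role gets nothing.
    raise PermissionError(f"role {role!r} has no access")


print(view_for("researcher", RECORDS))
```

In a real system the role would come from an authenticated user ID rather than a string argument.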

Page 9: Security Methods for Statistical Databases by Karen Goodwin

Query Set Restriction

A query-set size control sets the minimum number of records that must be in the result set.

Query results are displayed only if the size of the query set satisfies this condition.

Setting a minimum query-set size can help protect against the disclosure of individual data.

Page 10: Security Methods for Statistical Databases by Karen Goodwin

Query Set Restriction

Let K represent the minimum number of records that must be present in the query set.

Let R represent the size of the query set.

The query set can only be displayed if

K ≤ R
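A minimal sketch of this check, assuming the statistic requested is a mean age. The function name, the exception class, and the sample records are hypothetical; only the rule that results are released when the query set holds at least K records comes from the slide.

```python
class QuerySetTooSmall(Exception):
    """Raised when a query set holds fewer than K records."""


def restricted_mean_age(records, predicate, k):
    """Return the mean age of the matching records, but only if K <= R."""
    if k < 1:
        raise ValueError("k must be at least 1")
    query_set = [r for r in records if predicate(r)]
    r_size = len(query_set)  # R, the size of the query set
    if r_size < k:
        raise QuerySetTooSmall(
            f"query set has {r_size} records, fewer than K={k}"
        )
    return sum(r["age"] for r in query_set) / r_size


# Hypothetical records.
records = [{"age": a} for a in (10, 12, 13, 57, 54, 59)]

# Three records match, so with K=3 the statistic is released.
print(restricted_mean_age(records, lambda r: r["age"] < 20, k=3))

# Only one record matches, so the query is refused.
try:
    restricted_mean_age(records, lambda r: r["age"] > 58, k=3)
except QuerySetTooSmall as err:
    print("refused:", err)
```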

Page 11: Security Methods for Statistical Databases by Karen Goodwin

Query Set Restriction

[Diagram: Query 1 and Query 2 are run against the Original Database, and each query's results are checked against the minimum size K.]

Page 12: Security Methods for Statistical Databases by Karen Goodwin

Microaggregation

Raw (individual) data is grouped into small aggregates before publication.

The average value of the group replaces each individual value.

Data with the most similarities are grouped together to maintain data accuracy.

Helps to prevent disclosure of individual data.

Page 13: Security Methods for Statistical Databases by Karen Goodwin

Microaggregation

The National Agricultural Statistics Service (NASS) publishes data about farms.

To protect against data disclosure, data is only released at the county level.

Farms in each county are averaged together to preserve as much accuracy as possible while still protecting against disclosure.

Page 14: Security Methods for Statistical Databases by Karen Goodwin

Microaggregation

Age    Microaggregated Age
10     11.67
12     11.67
13     11.67
57     56.67
54     56.67
59     56.67

Each group of three ages is replaced by its average (11.67 and 56.67).
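The age example can be reproduced with a short sketch. It assumes the simplest possible grouping rule: sort one variable and cut it into groups of k. The `microaggregate` function is hypothetical, and the presentation does not say which grouping rule it has in mind.

```python
def microaggregate(values, k):
    """Replace each value with the mean of its group of about k similar values."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort positions by value so that similar values land in the same group.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Fold a short trailing group into the previous one so that
    # no group has fewer than k members.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    result = [None] * len(values)
    for group in groups:
        average = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = average
    return result


ages = [10, 12, 13, 57, 54, 59]
print([round(a, 2) for a in microaggregate(ages, k=3)])
# [11.67, 11.67, 11.67, 56.67, 56.67, 56.67]
```

The output keeps the original record order, so each microaggregated value lines up with the age it replaces.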

Page 15: Security Methods for Statistical Databases by Karen Goodwin

Microaggregation

[Diagram: the Original Data is averaged to produce the Microaggregated Data; the user's query and its results go through the Microaggregated Data.]

Page 16: Security Methods for Statistical Databases by Karen Goodwin

Data Perturbation

Perturbed data is raw data with noise added.

Pro: With perturbed databases, if unauthorized data is accessed, the true value is not disclosed.

Con: Data perturbation runs the risk of presenting biased data.
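As a rough sketch of the idea, the stored values can be perturbed once and users then query only the perturbed copy. Zero-mean Gaussian noise, the `sigma` parameter, and the function name are assumptions made for illustration; the presentation does not specify a noise model.

```python
import random


def perturb_database(values, sigma, seed=None):
    """Return a perturbed copy of the data: each value plus random noise."""
    rng = random.Random(seed)
    # Zero-mean Gaussian noise is an assumed choice, not the slide's.
    return [v + rng.gauss(0.0, sigma) for v in values]


original = [57, 54, 59]
perturbed = perturb_database(original, sigma=2.0, seed=1)

# Users only ever see statistics computed from the perturbed copy.
print(sum(perturbed) / len(perturbed))
```

Because the noise is added once and stored, repeating a query returns the same answer.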

Page 17: Security Methods for Statistical Databases by Karen Goodwin

Data Perturbation

[Diagram: noise is added to the Original Database to produce the Perturbed Database; User 1 and User 2 send queries to the Perturbed Database and receive their results from it.]

Page 18: Security Methods for Statistical Databases by Karen Goodwin

Output Perturbation

Instead of the raw data being transformed as in data perturbation, only the output or query results are perturbed.

The bias problem is less severe than with data perturbation.
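In contrast to perturbing the stored data, a sketch of output perturbation computes the statistic on the original values and adds noise only to the answer. The Gaussian noise model and the `perturbed_mean` function are illustrative assumptions.

```python
import random


def perturbed_mean(values, sigma, rng=None):
    """Compute the true mean on the original data, then add noise to the result."""
    rng = rng or random.Random()
    true_mean = sum(values) / len(values)
    # Only the query result is perturbed; the stored values are untouched.
    return true_mean + rng.gauss(0.0, sigma)


ages = [57, 54, 59]
print(perturbed_mean(ages, sigma=1.0, rng=random.Random(7)))
```

If fresh noise is drawn for every query, as in this sketch, a user who repeats the same query many times can average the answers to estimate the true value, so a practical design would need to account for that.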

Page 19: Security Methods for Statistical Databases by Karen Goodwin

Output Perturbation

[Diagram: User 1 and User 2 query the Original Database; noise is added to the query results before they are returned to the users.]

Page 20: Security Methods for Statistical Databases by Karen Goodwin

Auditing

Auditing is the process of keeping track of all queries made by each user.

Usually done with up-to-date logs.

Each time a user issues a query, the log is checked to see if the user is querying the database maliciously.
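One way to picture the log check is the sketch below. The rule for what counts as "malicious" is an assumption chosen for illustration: a query is refused when its query set differs from an earlier answered query by the same user in only a few records, since subtracting the two answers could isolate individuals. The `QueryAuditor` class and its `min_difference` parameter are hypothetical.

```python
from collections import defaultdict


class QueryAuditor:
    """Keep a per-user log of query sets and vet each new query against it."""

    def __init__(self, min_difference):
        self.min_difference = min_difference
        # user -> list of (query set, whether the query was answered)
        self.log = defaultdict(list)

    def check_and_log(self, user, record_ids):
        """Log the query and return True if it may be answered."""
        current = frozenset(record_ids)
        allowed = all(
            # Refuse if the sets differ, but by fewer than min_difference records.
            not (0 < len(current ^ previous) < self.min_difference)
            for previous, was_allowed in self.log[user]
            if was_allowed
        )
        self.log[user].append((current, allowed))
        return allowed


auditor = QueryAuditor(min_difference=2)
print(auditor.check_and_log("user1", {1, 2, 3, 4}))  # True: first query
print(auditor.check_and_log("user1", {1, 2, 3}))     # False: isolates record 4
print(auditor.check_and_log("user2", {1, 2, 3}))     # True: separate user log
```

Every query is logged, including refused ones, which matches the idea of tracking all queries made by each user.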

Page 21: Security Methods for Statistical Databases by Karen Goodwin

Random Sampling

Only a sample of the records meeting the requirements of the query is shown.

Must maintain consistency by giving the exact same results to the same query.

Weakness: Logically equivalent queries can result in a different query set.
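The consistency requirement and the weakness can both be seen in a small sketch. It assumes each record is kept or dropped according to a keyed hash of the query text and the record ID; this scheme, the `sampled_query` function, and the demo secret are illustrative assumptions rather than the presentation's method.

```python
import hashlib


def sampled_query(records, query_text, predicate, rate, secret=b"demo-secret"):
    """Return the IDs of a repeatable pseudo-random sample of matching records."""
    selected = []
    for record_id, record in records.items():
        if not predicate(record):
            continue
        # The decision depends only on the secret, the query text and the
        # record ID, so the same query always yields the same sample.
        material = secret + b"|" + query_text.encode() + b"|" + str(record_id).encode()
        digest = hashlib.sha256(material).digest()
        # Map the first 8 bytes of the digest to a number in [0, 1).
        fraction = int.from_bytes(digest[:8], "big") / 2**64
        if fraction < rate:
            selected.append(record_id)
    return selected


# Hypothetical records keyed by ID.
records = {i: {"age": 20 + i} for i in range(200)}

first = sampled_query(records, "age >= 30", lambda r: r["age"] >= 30, rate=0.5)
again = sampled_query(records, "age >= 30", lambda r: r["age"] >= 30, rate=0.5)
print(first == again)  # same query text, same sample

# A logically equivalent query written differently is hashed differently,
# so it can draw a different sample from the same matching records.
other = sampled_query(records, "not (age < 30)", lambda r: not r["age"] < 30, rate=0.5)
print(len(first), len(other))
```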

Page 22: Security Methods for Statistical Databases by Karen Goodwin

Comparison Methods

The following criteria are used to determine the most effective methods of statistical database security:

Security: possibility of exact disclosure, partial disclosure, robustness

Richness of Information: amount of non-confidential information eliminated, bias, precision, consistency

Costs: initial implementation cost, processing overhead per query, user education

Page 23: Security Methods for Statistical Databases by Karen Goodwin

A Comparison of Methods

Method                  Security       Richness of Information   Costs
Query-set Restriction   Low            Low [1]                   Low
Microaggregation        Moderate       Moderate                  Moderate
Data Perturbation       High           High-Moderate             Low
Output Perturbation     Moderate       Moderate-Low              Low
Auditing                Moderate-Low   Moderate                  High
Sampling                Moderate       Moderate-Low              Moderate

[1] Quality is low because a lot of information can be eliminated if the query does not meet the requirements.

Page 24: Security Methods for Statistical Databases by Karen Goodwin

Sources

This presentation is posted on http://www.cs.jmu.edu/users/aboutams

Adam, Nabil R.; Wortmann, John C.; Security-Control Methods for Statistical Databases: A Comparative Study; ACM Computing Surveys, Vol. 21, No. 4, December 1989 (http://delivery.acm.org/10.1145/80000/76895/p515-adam.pdf?key1=76895&key2=1947043301&coll=portal&dl=ACM&CFID=4702747&CFTOKEN=83773110)

Official HIPAA (http://cms.hhs.gov/hipaa/)

Bernstein, Stephen W.; Impact of HIPAA on BioTech/Pharma Research: Rules of the Road (http://www.privacyassociation.org/docs/3-02bernstein.pdf)

Service Bureau; 3rd Party Testing (http://hipaatesting.com/service_bureau.html)