
Data Warehousing
Data Mining
Privacy

Reading

Bhavani Thuraisingham, Murat Kantarcioglu, and Srinivasan Iyer. 2007. Extended RBAC-design and implementation for a secure data warehouse. Int. J. Bus. Intell. Data Min. 2, 4 (December 2007), 367-382. https://www.utdallas.edu/~bxt043000/Publications/Technical-Reports/UTDCS-35-07.pdf

Sweeney L, Abu A, and Winn J. Identifying Participants in the Personal Genome Project by Name. Harvard University. Data Privacy Lab. White Paper 1021-1. April 24, 2013. http://dataprivacylab.org/projects/pgp/1021-1.pdf

Farkas CSCE 824 - Spring 2015

Data Warehousing

Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
– Data mart (single subject area)
– Enterprise data warehouse (integrated data marts)
– Metadata

OLAP Analysis

Aggregation functions
Factual data access
Complex criteria
Visualization
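
As an illustration of the aggregation functions item, the sketch below rolls a small fact table up along chosen dimensions. The rows, the dimension names, and the `roll_up` helper are assumptions made for this example; they do not come from the slides or from any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
SALES = [
    ("East", "laptop", "Q1", 1200.0),
    ("East", "phone", "Q1", 800.0),
    ("West", "laptop", "Q1", 1500.0),
    ("West", "laptop", "Q2", 700.0),
]

# Dimension names, in the same order as the leading fields of each row.
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, group_by):
    """Sum the sales measure, grouping only by the dimensions in group_by."""
    positions = [DIMENSIONS.index(name) for name in group_by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)


# Coarse view (one dimension) and finer view (two dimensions).
by_region = roll_up(SALES, ["region"])
by_region_product = roll_up(SALES, ["region", "product"])
```

Dropping a dimension from `group_by` gives a coarser summary; adding one back corresponds to drilling down.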

Warehouse Evaluation

Enterprise-wide support
Consistency and integration across diverse domains
Security support
Support for operational users
Flexible access for decision makers

Data Integration

Data access
Data federation
Change capture
Need ETL (extraction, transformation, load)
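
A minimal ETL sketch, assuming two hypothetical operational sources (`CRM_ROWS` and `BILLING_ROWS`) that store the same kind of record in different formats. The table layout and function names are invented for illustration; an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational sources with inconsistent formats.
CRM_ROWS = [{"id": "17", "name": " Alice ", "joined": "2015-01-03"}]
BILLING_ROWS = [("18", "BOB", "03/02/2015")]  # (id, name, MM/DD/YYYY)


def extract():
    """Extraction: pull raw records and tag each with its source."""
    for record in CRM_ROWS:
        yield "crm", record
    for record in BILLING_ROWS:
        yield "billing", record


def transform(source, record):
    """Transformation: clean names and convert dates to ISO format."""
    if source == "crm":
        cid, name, joined = record["id"], record["name"], record["joined"]
    else:
        cid, name, raw_date = record
        month, day, year = raw_date.split("/")
        joined = f"{year}-{month}-{day}"
    return int(cid), name.strip().title(), joined, source


def load(conn, rows):
    """Load: write the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE customer "
        "(id INTEGER PRIMARY KEY, name TEXT, joined TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, [transform(source, record) for source, record in extract()])
warehouse_rows = conn.execute(
    "SELECT id, name, joined, source FROM customer ORDER BY id"
).fetchall()
```

Keeping the `source` column is one simple way to retain where each warehouse row came from.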

Data Warehouse Users

Internal users
– Employees
– Managerial
External users
– Reporting and auditing
– Research

Data Mining

Databases to be mined
Knowledge to be mined
Techniques used
Applications supported

Data Mining Task

DM: mostly automated
Prediction tasks
– Use some variables to predict unknown or future values of other variables
Description tasks
– Find human-interpretable patterns that describe the data

Common Tasks

Classification [Predictive]
Clustering [Descriptive]
Association Rule Mining [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]
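
To make one of the descriptive tasks concrete, the sketch below computes the two standard measures used in association rule mining, support and confidence, over a handful of made-up market-basket transactions. The data and helper names are assumptions for illustration only.

```python
# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule A -> C."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies in this data
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


rule_support = support({"diapers", "beer"}, TRANSACTIONS)
rule_confidence = confidence({"diapers"}, {"beer"}, TRANSACTIONS)
```

Here the rule "diapers -> beer" holds in two of the three transactions that contain diapers, which is the kind of human-interpretable pattern a descriptive task reports.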

Security for Data Warehousing

Establish the organization's security policies and procedures
Implement logical access control
Restrict physical access
Establish internal control and auditing

Data Warehousing Issues: Integrity

Poor quality data: inaccurate, incomplete, missing meta-data
Loss of traditional consistency, e.g., keys
Source data quality vs. derived data quality
– Trust in the result of analysis?
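
Two of the integrity issues above, lost key consistency and incomplete data, can be checked mechanically after integration. The following is a small sketch under assumed data: `MERGED` and both helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical rows merged from two sources; "id" was a key in each source
# separately, but nothing guarantees uniqueness after the merge.
MERGED = [
    {"id": 1, "name": "Alice", "zip": "29208"},
    {"id": 2, "name": "Bob", "zip": None},
    {"id": 1, "name": "Alicia", "zip": "29201"},
]


def duplicate_keys(rows, key):
    """Return the key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def incomplete_rows(rows, required):
    """Return rows where any required field is missing or empty."""
    return [
        row for row in rows
        if any(row.get(field) in (None, "") for field in required)
    ]


clashing_ids = duplicate_keys(MERGED, "id")
rows_with_gaps = incomplete_rows(MERGED, ["name", "zip"])
```

Checks like these only flag problems in the source data; whether results derived from it can be trusted remains a separate question.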

Big Data Security and Privacy

Amount of data being considered
Privacy-preserving analytics
Granular access control
– Flat, two-dimensional tables
Transaction logs and auditing
Real-time monitoring

Big Data Integrity

Data accuracy
Source provenance
End-point filtering and validation

Access Control

Layered defense:
– Access to processes that extract operational data
– Access to data and process that transforms operational data
– Access to data and meta-data in the warehouse
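
The three layers above can be sketched as a simple role-based check. This is an illustrative toy under assumed names, not the extended RBAC design from the Thuraisingham et al. reading: the roles, users, and layer names are all hypothetical.

```python
# Hypothetical mapping from each layer to the roles allowed to touch it.
LAYER_PERMISSIONS = {
    "extract": {"etl_operator"},
    "transform": {"etl_operator", "data_steward"},
    "warehouse": {"data_steward", "analyst"},
}

# Hypothetical user-to-role assignment.
USER_ROLES = {
    "dana": {"etl_operator"},
    "raj": {"analyst"},
}


def can_access(user, layer):
    """Allow access only if the user holds a role permitted for the layer."""
    roles = USER_ROLES.get(user, set())
    allowed = LAYER_PERMISSIONS.get(layer, set())
    return bool(roles & allowed)
```

For example, `can_access("raj", "warehouse")` is true while `can_access("raj", "extract")` is false, so an analyst can query the warehouse without reaching the extraction processes. Unknown users and unknown layers are denied by default.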

Access Control Issues

Mapping from local to warehouse policies
How to handle “new” data
Scalability
Identity management

Inference Problem

Data mining: discover “new knowledge”; how to evaluate security risks?
Example security risks:
– Prediction of sensitive information
– Misuse of information
Assurance of “discovery”

Privacy and Sensitivity

Large volume of private (personal) data
Need:
– Proper acquisition, maintenance, usage, and retention policy
– Integrity verification
– Control of analysis methods (aggregation may reveal sensitive data)
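
The remark that aggregation may reveal sensitive data can be illustrated with a differencing attack: two permitted aggregate queries whose results differ by exactly one person's value. The salary table, the `guarded_total` function, and its minimum group size are assumptions made up for this sketch.

```python
# Hypothetical salary table; individual salaries are the sensitive data.
SALARIES = {"alice": 82000, "bob": 67000, "carol": 91000, "dave": 74000}


def guarded_total(names, min_group_size=3):
    """Aggregate query with a naive control: refuse very small groups."""
    group = set(names)
    if len(group) < min_group_size:
        raise PermissionError("group too small for an aggregate query")
    return sum(SALARIES[name] for name in group)


# Both queries pass the size check, yet their difference is one salary.
everyone = list(SALARIES)
all_but_carol = [name for name in everyone if name != "carol"]
inferred = guarded_total(everyone) - guarded_total(all_but_carol)
```

A direct query for Carol alone is refused, but `inferred` still equals her salary. A size limit on each query is therefore not enough; the combination of queries has to be controlled as well.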

Privacy

What is the difference between confidentiality and privacy?
Identity, location, activity, etc.
Anonymity vs. accountability

Legislation

Privacy Act of 1974, U.S. Department of Justice (http://www.usdoj.gov/oip/04_7_1.html)
Family Educational Rights and Privacy Act (FERPA), U.S. Department of Education (http://www.ed.gov/policy/gen/guid/fpco/ferpa/index.html)
Health Insurance Portability and Accountability Act of 1996 (HIPAA) (http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act)
Telecommunications Consumer Privacy Act (http://www.answers.com/topic/electronic-communications-privacy-act)

Online Social Network

Social relationship
– Communication context changes social relationships
– Social relationships maintained through different media grow at different rates and to different depths
– No clear consensus on which medium is best

Internet and Social Relationships

Internet
– Bridges distance at a low cost
– New participants tend to “like” each other more
– Less stressful than face-to-face meeting
– People focus on communicating their “selves” (except a few malicious users)

Social Network

Description of the social structure between actors
Connections: various levels of social familiarity, e.g., from casual acquaintance to close familial bonds
Support online interaction and content sharing

Social Network Analysis

The mapping and measuring of relationships and flows between people, groups, organizations, computers or other information processing entities
Behavioral profiling
Note: social network signatures
– User names may change, family and friends are more difficult to change
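
One way to picture a social network signature is to treat an account's set of contacts as its fingerprint and compare fingerprints with Jaccard similarity. The sketch below is an assumption-laden toy: the handles, friend lists, and `best_match` helper are hypothetical, and the slides do not name a specific matching technique.

```python
# Hypothetical friend list of a known account (its "signature").
OLD_SIGNATURE = {"ann", "ben", "cho", "dev", "eli"}

# Hypothetical friend lists of accounts registered under new user names.
CANDIDATES = {
    "new_handle_1": {"ann", "ben", "cho", "dev", "fay"},
    "new_handle_2": {"gus", "hal", "eli"},
}


def jaccard(a, b):
    """Size of the overlap divided by size of the union of two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(signature, candidates):
    """Return the candidate whose friend set is most similar."""
    return max(candidates, key=lambda name: jaccard(signature, candidates[name]))


likely_same_user = best_match(OLD_SIGNATURE, CANDIDATES)
```

The user name changed, but four of five contacts did not, so the first candidate scores 4/6 against 1/7 for the second. This is why a changed handle alone offers little protection against profiling.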

Interesting Read:

M. Chew, D. Balfanz, B. Laurie, (Under)mining Privacy in Social Networks, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4468

Next

Web application insecurity: risk to databases
