Data Warehousing, Data Mining, Privacy
TRANSCRIPT
Reading
Bhavani Thuraisingham, Murat Kantarcioglu, and Srinivasan Iyer. 2007. Extended RBAC-design and implementation for a secure data warehouse. Int. J. Bus. Intell. Data Min. 2, 4 (December 2007), 367-382. https://www.utdallas.edu/~bxt043000/Publications/Technical-Reports/UTDCS-35-07.pdf
Sweeney L, Abu A, and Winn J. Identifying Participants in the Personal Genome Project by Name. Harvard University. Data Privacy Lab. White Paper 1021-1. April 24, 2013. http://dataprivacylab.org/projects/pgp/1021-1.pdf
Farkas CSCE 824 - Spring 2015
Data Warehousing
- Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
  – Data mart (single subject area)
  – Enterprise data warehouse (integrated data marts)
  – Metadata
OLAP Analysis
- Aggregation functions
- Factual data access
- Complex criteria
- Visualization
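As a rough illustration of aggregation functions combined with selection criteria, the sketch below rolls up a tiny fact table along one dimension. The table contents, the column order, and the `roll_up` helper are assumptions made for this example, not part of the lecture material.

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, sales). Values are made up.
FACTS = [
    ("East", "Q1", 100),
    ("East", "Q2", 150),
    ("West", "Q1", 200),
    ("West", "Q2", 50),
]


def roll_up(facts, dim_index, predicate=lambda row: True):
    """Sum the sales measure per value of one dimension column,
    considering only the rows that satisfy the predicate."""
    totals = defaultdict(int)
    for row in facts:
        if predicate(row):
            totals[row[dim_index]] += row[2]
    return dict(totals)


# Aggregate over quarters, then over regions.
by_region = roll_up(FACTS, 0)
by_quarter = roll_up(FACTS, 1)

# A selection criterion applied before aggregating: only sales of at least 100.
big_sales_by_region = roll_up(FACTS, 0, predicate=lambda row: row[2] >= 100)
```

A real OLAP engine would precompute and index such aggregates; the loop here only shows what a roll-up computes.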
Warehouse Evaluation
- Enterprise-wide support
- Consistency and integration across diverse domains
- Security support
- Support for operational users
- Flexible access for decision makers
Data Integration
- Data access
- Data federation
- Change capture
- Need ETL (extraction, transformation, load)
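The three ETL steps can be sketched as follows. The two source systems, their field names, and their date formats are hypothetical; the point is only that records from differently shaped sources end up in one standardized format, each tagged with where it came from.

```python
from datetime import datetime


def extract():
    """Pull raw records from two hypothetical operational sources."""
    crm = [{"name": " Alice ", "joined": "2015-01-12"}]
    billing = [{"customer": "BOB", "joined": "03/02/2015"}]
    return crm, billing


def transform(crm, billing):
    """Normalize names and dates into one standardized record layout."""
    rows = []
    for record in crm:
        rows.append({
            "name": record["name"].strip().title(),
            "joined": record["joined"],  # already in ISO format
            "source": "crm",
        })
    for record in billing:
        # Billing dates are assumed to be month/day/year.
        joined = datetime.strptime(record["joined"], "%m/%d/%Y").date()
        rows.append({
            "name": record["customer"].strip().title(),
            "joined": joined.isoformat(),
            "source": "billing",
        })
    return rows


def load(rows, warehouse):
    """Append the cleaned rows to the warehouse table; return the row count."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
loaded = load(transform(*extract()), warehouse)
```

Keeping a `source` field on every row is one simple way to retain provenance after integration.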
Data Warehouse Users
- Internal users
  – Employees
  – Managerial
- External users
  – Reporting and auditing
  – Research
Data Mining
- Databases to be mined
- Knowledge to be mined
- Techniques used
- Applications supported
Data Mining Task
- DM: mostly automated
- Prediction Tasks
  – Use some variables to predict unknown or future values of other variables
- Description Tasks
  – Find human-interpretable patterns that describe the data
Common Tasks
- Classification [Predictive]
- Clustering [Descriptive]
- Association Rule Mining [Descriptive]
- Regression [Predictive]
- Deviation Detection [Predictive]
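To make one of these tasks concrete, the sketch below computes the two standard measures used in association rule mining, support and confidence, over a handful of made-up market-basket transactions. The transactions and function names are assumptions for illustration; practical miners such as Apriori add candidate generation and pruning on top of these measures.

```python
# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies in this data set
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


bread_milk_support = support(TRANSACTIONS, {"bread", "milk"})
bread_to_milk = confidence(TRANSACTIONS, {"bread"}, {"milk"})
```

Here two of the four transactions contain both bread and milk, and two of the three bread transactions also contain milk.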
Security for Data Warehousing
- Establish the organization's security policies and procedures
- Implement logical access control
- Restrict physical access
- Establish internal control and auditing
Data Warehousing Issues: Integrity
- Poor quality data: inaccurate, incomplete, missing meta-data
- Loss of traditional consistency, e.g., keys
- Source data quality vs. derived data quality
  – Trust in the result of analysis?
Big Data Security and Privacy
- Amount of data being considered
- Privacy-preserving analytics
- Granular access control
  – Flat, two-dimensional tables
- Transaction logs and auditing
- Real-time monitoring
Big Data Integrity
- Data accuracy
- Source provenance
- End-point filtering and validation
Access Control
- Layered defense:
  – Access to processes that extract operational data
  – Access to data and processes that transform operational data
  – Access to data and meta-data in the warehouse
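One way to picture the layered defense is a plain role-based check in which each layer (extraction, transformation, warehouse data, meta-data) is a separately protected resource. The roles, users, and permission names below are hypothetical, and this is a minimal RBAC sketch rather than the extended RBAC design from the assigned reading.

```python
# Hypothetical role-to-permission mapping; each permission is (layer, action).
ROLE_PERMISSIONS = {
    "etl_operator": {("extract", "run"), ("transform", "run")},
    "analyst": {("warehouse", "read")},
    "dba": {("warehouse", "read"), ("warehouse", "write"), ("metadata", "read")},
}

# Hypothetical user-to-role assignment.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"etl_operator"},
}


def is_allowed(user, layer, action):
    """Return True if any role held by the user grants (layer, action)."""
    return any(
        (layer, action) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

With this assignment, the analyst can read the warehouse but cannot run extraction, and the ETL operator can run the pipeline but cannot query the cleaned data. Unknown users are denied by default.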
Access Control Issues
- Mapping from local to warehouse policies
- How to handle “new” data
- Scalability
- Identity management
Inference Problem
- Data mining discovers “new knowledge”: how to evaluate security risks?
- Example security risks:
  – Prediction of sensitive information
  – Misuse of information
- Assurance of “discovery”
Privacy and Sensitivity
- Large volume of private (personal) data
- Need:
  – Proper acquisition, maintenance, usage, and retention policy
  – Integrity verification
  – Control of analysis methods (aggregation may reveal sensitive data)
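The remark that aggregation may reveal sensitive data can be illustrated with a simple differencing attack. In the sketch below, the salary table and the `guarded_total` query interface are invented for the example; the interface refuses aggregates over fewer than two people, yet two permitted queries still expose one individual's value.

```python
# Hypothetical sensitive table: name -> salary (made-up values).
SALARIES = {"ann": 50, "bob": 60, "cat": 70}


def guarded_total(names, min_size=2):
    """Return the salary total for a group, refusing very small groups."""
    names = set(names)
    if len(names) < min_size:
        raise PermissionError("query set too small")
    return sum(SALARIES[name] for name in names)


# Both queries satisfy the minimum group size...
everyone = guarded_total(["ann", "bob", "cat"])
without_cat = guarded_total(["ann", "bob"])

# ...but their difference is exactly one person's salary.
inferred_cat = everyone - without_cat
```

This is why controlling individual queries is not enough: the analysis method as a whole, including combinations of queries, has to be considered.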
Privacy
- What is the difference between confidentiality and privacy?
- Identity, location, activity, etc.
- Anonymity vs. accountability
Legislations
- Privacy Act of 1974, U.S. Department of Justice (http://www.usdoj.gov/oip/04_7_1.html)
- Family Educational Rights and Privacy Act (FERPA), U.S. Department of Education (http://www.ed.gov/policy/gen/guid/fpco/ferpa/index.html)
- Health Insurance Portability and Accountability Act of 1996 (HIPAA) (http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act)
- Telecommunications Consumer Privacy Act (http://www.answers.com/topic/electronic-communications-privacy-act)
Online Social Network
- Social Relationship
  – Communication context changes social relationships
  – Social relationships maintained through different media grow at different rates and to different depths
  – No clear consensus on which medium is best
Internet and Social Relationships
- Internet
  – Bridges distance at a low cost
  – New participants tend to “like” each other more
  – Less stressful than face-to-face meeting
  – People focus on communicating their “selves” (except a few malicious users)
Social Network
- Description of the social structure between actors
- Connections: various levels of social familiarity, e.g., from casual acquaintance to close familial bonds
- Support online interaction and content sharing
Social Network Analysis
- The mapping and measuring of relationships and flows between people, groups, organizations, computers or other information processing entities
- Behavioral Profiling
- Note: Social Network Signatures
  – User names may change, family and friends are more difficult to change
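The idea of a social network signature can be sketched by comparing friend sets: an account that reappears under a new user name may still be recognizable by who it is connected to. The accounts, friend lists, and the choice of Jaccard similarity below are assumptions for illustration only, not a method taken from the lecture or its readings.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Friend set of a known (old) account; all names are made up.
known_friends = {"amy", "ben", "cara", "dan"}

# Friend sets of accounts that appeared later under unrelated user names.
candidates = {
    "anon42": {"amy", "ben", "cara", "eve"},
    "xyz": {"fay", "gus"},
}

# Score every candidate and pick the one whose friend set overlaps most.
scores = {name: jaccard(known_friends, friends) for name, friends in candidates.items()}
best_match = max(scores, key=scores.get)
```

Even though the user name changed, three of the four original friends are still present, so the renamed account stands out.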
Interesting Read:
M. Chew, D. Balfanz, B. Laurie, (Under)mining Privacy in Social Networks, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4468