8/12/2019 Lecture26 (1)
Dr. Bhavani Thuraisingham
The University of Texas at Dallas (UTD)
October 2011
Cloud-based
Assured Information Sharing and
Identity Management
-
Team Members
Sponsor: Air Force Office of Scientific Research
The University of Texas at Dallas
Faculty: Dr. Murat Kantarcioglu; Dr. Latifur Khan; Dr. Kevin
Hamlen; Dr. Zhiqiang Lin
Sub-contractors
Prof. Elisa Bertino (Purdue)
Ms. Anita Miller, Dr. Bob Johnson (North Texas Fusion Center)
Collaborators
Dr. Steve Barker, King's College London (EOARD)
Dr. Barbara Carminati; Dr. Elena Ferrari, U of Insubria (EOARD)
Prof. Peng Liu, Penn State
Prof. Ting Yu, NC State
-
Outline
Objectives
Layered Framework
Data Security Issues for Clouds
Our Research
FY11
Cloud-based Assured Information Sharing Demonstration
RDF-based Policy Engine on the Cloud
Secure Query Processing in Hybrid Cloud
CloudMask: Purdue University
Stream-based Malware Detection on the Cloud
Hypervisor (e.g., Xen) Integrity Issues and Forensics in the Cloud
Preliminary Investigation of Identity Management
FY10
Secure Querying and Storing Relational Data with HIVE
Secure Querying and Storing RDF in Hadoop with SPARQL
XACML Implementation for Hadoop
Amazon.com Web Services and Security
Accountability and Access Control (Joint with Purdue)
Acknowledgement: Research Funded by Air Force Office of Scientific Research
-
Objectives
Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.
Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
Our research on cloud computing is based on Hadoop, MapReduce, and Xen
Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
Xen is a virtual machine monitor (VMM) developed at the University of Cambridge, England
Our goal is to build a secure cloud infrastructure for assured information sharing applications
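The MapReduce model that Hadoop implements at scale can be sketched in miniature. This is a hypothetical in-memory toy (the function names are illustrative, not Hadoop's Java API), showing the map, shuffle, and reduce phases with the classic word-count job:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def reduce_phase(pairs, reducer):
    """Group pairs by key (the 'shuffle') and reduce each group."""
    pairs.sort(key=itemgetter(0))
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}

# Word count: the mapper emits (word, 1); the reducer sums the counts.
def word_mapper(line):
    return [(word, 1) for word in line.split()]

def count_reducer(word, counts):
    return sum(counts)

lines = ["secure cloud storage", "secure query processing"]
counts = reduce_phase(map_phase(lines, word_mapper), count_reducer)
# counts == {"cloud": 1, "processing": 1, "query": 1, "secure": 2, "storage": 1}
```

Hadoop distributes exactly this pattern: the map phase runs in parallel over HDFS blocks, and the framework performs the shuffle before the reducers run.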
-
Layered Framework
Figure 2. Layered Framework for Assured Cloud: layers comprise the User Interface; HIVE/SPARQL/Query; Hadoop/MapReduce/Storage; and XEN/Linux/VMM with a Secure Virtual Network Monitor, alongside cross-cutting components for XACML policies, risks/costs, QoS resource allocation, and cloud monitors.
-
Secure Query Processing with Hadoop/MapReduce
We have studied clouds based on Hadoop
Query rewriting and optimization techniques designed and
implemented for two types of data
(i) Relational data: Secure query processing with HIVE
(ii) RDF data: Secure query processing with SPARQL
Demonstrated with XACML policies
Joint demonstration with King's College London and the University of Insubria
First demo (2011): Each party submits its data and policies; our cloud manages the data and policies
Second demo (2012): Multiple clouds
-
Fine-grained Access Control with Hive: System Architecture
Table/view definition and loading:
Users can create tables as well as load data into tables. Further, they can also upload XACML policies for the tables they are creating.
Users can also create XACML policies for tables/views.
Users can define views only if they have permissions for all tables specified in the query used to create the view. They can also either specify or create XACML policies for the views they are defining.
CollaborateCom 2010
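The view-definition rule above can be sketched as a simple permission check. This is a hypothetical illustration (the policy representation is invented for the example, not Hive's or XACML's actual format): a user may create a view only if the policy grants read access to every base table the view's defining query references.

```python
def can_create_view(user, base_tables, policy):
    """policy maps (user, table) -> set of permitted actions.

    The view may be created only if 'read' is granted on every
    base table referenced by the view's defining query.
    """
    return all("read" in policy.get((user, t), set()) for t in base_tables)

# Toy policy: alice may read two tables but has no grant on 'watchlist'.
policy = {
    ("alice", "incidents"): {"read"},
    ("alice", "suspects"): {"read", "write"},
}

assert can_create_view("alice", ["incidents", "suspects"], policy)
assert not can_create_view("alice", ["incidents", "watchlist"], policy)
```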
-
SPARQL Query Optimizer for Secure RDF Data Processing
Architecture: a web interface feeds a server backend with two paths. New data passes through a data preprocessor (N-Triples converter, prefix generator, predicate-based splitter, predicate-object-based splitter); queries pass through the MapReduce framework (parser, query validator and rewriter, XACML PDP, query rewriter by policy, plan generator, plan executor) to produce the answer.
Goals: build an efficient storage mechanism using Hadoop for large amounts of data (e.g., a billion triples); build an efficient query mechanism for data stored in Hadoop; integrate with Jena
Developed a query optimizer and query rewriting techniques for RDF data with XACML policies, implemented on top of Jena
IEEE Transactions on Knowledge and Data Engineering, 2011
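The policy-based rewriting step can be illustrated with a toy triple store. This is a hypothetical sketch (the triple and policy representations are invented; the real system evaluates SPARQL over Hadoop with an XACML PDP): the effective query is restricted so that triples with forbidden predicates never reach the answer.

```python
def rewrite_and_run(triples, pattern, allowed_predicates):
    """Match a triple pattern, but only over policy-permitted predicates.

    pattern is (s, p, o); None acts as a wildcard, like a SPARQL variable.
    The policy restriction is folded into the query itself, mimicking
    rewrite-by-policy rather than post-filtering results.
    """
    s, p, o = pattern
    return [(ts, tp, to) for ts, tp, to in triples
            if tp in allowed_predicates
            and (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

triples = [
    ("ex:doc1", "ex:title", "Report"),
    ("ex:doc1", "ex:classification", "secret"),
]

# Policy permits this user to see titles but not classifications.
result = rewrite_and_run(triples, ("ex:doc1", None, None), {"ex:title"})
assert result == [("ex:doc1", "ex:title", "Report")]
```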
-
Demonstration: Concept of Operation
User Interface Layer
Fine-grained Access Control
with Hive
SPARQL Query Optimizer
for Secure RDF Data
Processing
Relational Data RDF Data
Agency 1 Agency 2 Agency n
-
RDF-based Policy Engine on the Cloud
Architecture: a User Interface Layer accepts a high-level policy specification; a policy translator and Policy Parser Layer produce a low-level representation; a Policy Transformation Layer applies policy/graph transformation rules (access control/redaction policies, as in traditional mechanisms) through data and provenance controllers over DB, RDF, and XML sources, returning the query result.
A testbed for evaluating different policy sets over different data representations; also supports provenance as a directed graph and graphical viewing of policy outcomes
Determines how access is granted to a resource as well as how a document is shared
Users specify policies, e.g., access control, redaction, and release policies
Parses a high-level policy into a low-level representation
Supports graph operations and visualization; policies are executed as graph operations
Executes policies as SPARQL queries over large RDF graphs on Hadoop
Supports policies over traditional data and its provenance
IFIP Data and Applications Security, 2010; ACM SACMAT, 2011
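A redaction policy executed as a graph operation can be sketched as follows. This is a hypothetical toy (the graph is a flat list of triples and the rule format is invented for the example): edges whose predicate matches a redaction rule have their object masked before the provenance graph is released.

```python
def redact(graph, redacted_predicates, mask="REDACTED"):
    """Return a copy of the graph with the objects of matching
    predicates replaced by a mask, leaving other edges intact."""
    return [(s, p, mask if p in redacted_predicates else o)
            for s, p, o in graph]

# Toy provenance graph: who generated the report, and from what.
provenance = [
    ("report", "generatedBy", "analyst42"),
    ("report", "derivedFrom", "raw_intel"),
]

# Policy: hide the identity of the generating analyst on release.
released = redact(provenance, {"generatedBy"})
assert released == [("report", "generatedBy", "REDACTED"),
                    ("report", "derivedFrom", "raw_intel")]
```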
-
Integration with
Assured Information Sharing:
User Interface Layer
RDF Data Preprocessor
Policy Translation and
Transformation Layer
MapReduce Framework for
Query Processing
Hadoop HDFS
Agency 1 Agency 2 Agency n
RDF Data
and Policies
SPARQL Query
Result
-
Secure Storage and Query Processing in a
Hybrid Cloud: Problem Motivation
The use of hybrid clouds is an emerging trend in cloud computing
Ability to exploit public resources for high throughput
Yet, better able to control costs and data privacy
Several key challenges
Data Design: how to store data in a hybrid cloud?
Solution must account for the data representation used (unencrypted/encrypted), public cloud monetary costs, and query workload characteristics
Query Processing: how to execute a query over a hybrid cloud?
Solution must provide query rewrite rules that ensure the
correctness of a generated query plan over the hybrid cloud
-
Research Results
Data Design: A user submits data, a query workload, and monetary and confidentiality constraints
Solve the data partitioning problem in four phases:
Partition the data into several public (Ppu) and private (Ppr) components
For each partition, Ppu and Ppr, obtain the associated statistics
Estimate the execution cost of the given query workload based on the user's choice of confidentiality level as well as the statistics associated with the partition
Select the best partition as the one that minimizes query workload execution cost without violating the monetary and confidentiality constraints
Query Processing: A user submits a query Q
Solve the query processing problem in four phases:
Query Rearrangement: Use query rewrite rules to transform the original query Q into public (Qpu) and private (Qpr) queries
Public Cloud Execution: Execute Qpu on the public cloud
Private Cloud Execution: Execute Qpr on the private cloud
Post-Processing: Combine the results of executing Qpu and Qpr into the final result
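The partition-selection phase of the data design can be sketched as a constrained minimization. This is a hypothetical toy (the candidate structure and cost fields are invented for the example; the real system estimates costs from workload statistics): among candidate partitionings, pick the one with the lowest estimated query-workload cost that stays within the monetary budget and does not expose confidential data on the public cloud.

```python
def best_partition(candidates, budget):
    """Pick the feasible candidate (within budget, no confidentiality
    violation) that minimizes estimated query workload cost."""
    feasible = [c for c in candidates
                if c["monetary_cost"] <= budget and not c["leaks_sensitive"]]
    return min(feasible, key=lambda c: c["query_cost"], default=None)

# Toy candidates: all-public is fast but leaks; all-private is slow but free.
candidates = [
    {"name": "all-public",  "query_cost": 10, "monetary_cost": 50, "leaks_sensitive": True},
    {"name": "split",       "query_cost": 25, "monetary_cost": 30, "leaks_sensitive": False},
    {"name": "all-private", "query_cost": 60, "monetary_cost": 0,  "leaks_sensitive": False},
]

assert best_partition(candidates, budget=40)["name"] == "split"
```

With a budget of zero, only the all-private design remains feasible, which matches the intuition that a tighter budget pushes data onto the private cloud.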
-
Hypervisor Integrity and Forensics in the Cloud
Cloud integrity and forensics across the stack: applications and guest OSes (Linux, Solaris, XP, MacOS) run over the virtualization layer (Xen, vSphere) on the hardware layer
Secure control flow of hypervisor code
Integrity via in-lined reference monitor
Forensics data extraction in the cloud
Multiple VMs
De-mapping (isolating) each VM's memory from physical memory
-
Cloud-based Malware Detection
ACM Transactions on Management Information Systems
Binary feature extraction involves
Enumerating binary n-grams from the binaries and selecting the best n-grams
based on information gain
For a training set of 3,500 executables, the number of distinct 6-grams can exceed 200 million
On a single machine, this may take hours, depending on available computing resources, which is not acceptable for training from a stream of binaries
We use the cloud to overcome this bottleneck
A cloud MapReduce framework is used to extract and select features from each chunk
A 10-node cloud cluster is 10 times faster than a single node
Very effective in a dynamic framework, where malware characteristics
change rapidly
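The feature-selection step described above can be sketched in a few lines. This is a hypothetical single-machine toy (the real system distributes the work via MapReduce): enumerate byte n-grams per binary, then rank each n-gram by its information gain with respect to the malware/benign label.

```python
import math

def ngrams(data, n):
    """Distinct byte n-grams occurring in one binary."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def entropy(pos, neg):
    """Binary entropy of a (malware, benign) count pair, in bits."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def info_gain(feature, samples):
    """samples: list of (ngram_set, is_malware). Gain = H(label)
    minus the weighted entropy after splitting on feature presence."""
    pos = sum(1 for _, m in samples if m)
    base = entropy(pos, len(samples) - pos)
    with_f = [(s, m) for s, m in samples if feature in s]
    without = [(s, m) for s, m in samples if feature not in s]
    def weighted(part):
        p = sum(1 for _, m in part if m)
        return (len(part) / len(samples)) * entropy(p, len(part) - p)
    return base - weighted(with_f) - weighted(without)

# Toy corpus: two "malware" byte strings sharing an n-gram, two benign ones.
samples = [(ngrams(b"\x90\x90\xcd\x80", 2), True),
           (ngrams(b"\x90\x90\x00\x00", 2), True),
           (ngrams(b"hello", 2), False),
           (ngrams(b"world", 2), False)]

# b"\x90\x90" occurs in exactly the malware samples, so it perfectly
# separates the classes and its gain is the full 1 bit.
assert abs(info_gain(b"\x90\x90", samples) - 1.0) < 1e-9
```

In the deployed pipeline, the map tasks extract n-grams from chunks of binaries in parallel and the reduce tasks aggregate per-n-gram counts before the gains are computed.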
-
Fine-grained attribute-based privacy-preserving access control
Fine-grained access control: different parts of the data can be covered by different access control policies
Attribute-based access control: access control policies are expressed in terms of the identity attributes of the subjects accessing the data
Privacy-preserving: the cloud learns nothing about the contents of the data or the values of users' identity attributes
System developed: CloudMask (Elisa Bertino, Purdue University, and Murat Kantarcioglu, UT Dallas)
Joint paper at CollaborateCom 2011
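The attribute-based half of this model can be sketched with a trivial policy check. This is a hypothetical illustration (the policy format is invented for the example; CloudMask additionally keeps attribute values hidden from the cloud via cryptographic techniques): a policy is a predicate over the subject's identity attributes, and access is granted only when every condition holds.

```python
def abac_permits(policy, attributes):
    """Grant access only if every attribute condition in the
    policy is satisfied by the subject's attributes."""
    return all(attributes.get(name) == value
               for name, value in policy.items())

# Toy policy: the document may be read by secret-cleared
# fusion-center personnel.
policy = {"clearance": "secret", "agency": "fusion-center"}

assert abac_permits(policy, {"clearance": "secret",
                             "agency": "fusion-center",
                             "role": "analyst"})
assert not abac_permits(policy, {"clearance": "public",
                                 "agency": "fusion-center"})
```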
-
Directions
Secure VMM (Virtual Machine Monitor) and VNM (Virtual Network Monitor)
Exploring the Xen VMM and examining security issues
Developing automated techniques for VMM introspection
Will examine VNM issues in January 2012
Integrate Secure Storage Algorithms into Hadoop (FY
2012)
Identity Management (FY 2012)
Technology Transfer through Knowledge and Security
Analytics, LLC
-
October 2011
Identity Management
for the Cloud
Kevin Hamlen, Peng Liu, Murat Kantarcioglu,
Bhavani Thuraisingham, Ting Yu
-
Identity Management Considerations in a
Cloud
A trust model that handles (i) various trust relationships, (ii) access control policies based on roles and attributes, (iii) real-time provisioning, (iv) authorization, and (v) auditing and accountability.
Several technologies have to be examined to develop the trust model:
Service-oriented technologies; standards such as SAML and XACML; and identity management technologies such as OpenID.
Does one size fit all?
Can we develop a trust model that will be applicable to all types of clouds, such as private, public, and hybrid clouds?
The identity architecture has to be integrated into the cloud architecture.