Accelerating Multilevel Secure Database Queries using P-Tree Technology
Imad Rahal and Dr. William Perrizo
Computer Science Department, North Dakota State University
Outline
- Introduction
  1- What are MLS/DBSs?
  2- The Mandatory Access Control (MAC) Policy
- Attempts
  - The SeaView model (Secure Data View) and the PRISM model [6]
  - PRISM is based on SeaView but eliminates spurious tuples during recovery
  - Deficiencies of SeaView/PRISM (mainly speed)
- Query Acceleration using P-trees
  - Replace the recovery data structure of PRISM
  - Advantages: time efficiency
What are MLS/DBSs?
- DBSs that implement secure access control policies to protect their data
- Each user or process is called a subject
- Each data item (column value or tuple) is called an object
- The security hardware and software are stored in a TCB (Trusted Computing Base), sometimes referred to as a Reference Monitor or Security Kernel
- R(A1, C1, A2, C2, ..., An, Cn, TC) is a multilevel relation or view
  - The Ai are fields
  - The Ci are their respective sensitivity levels (which form a lattice)
- By convention, (A1, C1) is the apparent key
  - The apparent key is not unique on its own, but becomes a key when combined with all the security fields
  - (A1, C1, C2, ..., Cn) is the primary key
- TC is the classification level of the tuple
- Notice that TC = highest Ci over all i, and C1 = lowest Ci over all i
The Mandatory Access Control (MAC) policy
- Each subject has a clearance level
- Each object has a sensitivity level
- Bell-LaPadula restrictions:
  - Simple Security Property for READs (read down, i.e., a subject can read at its own level or below)
  - *-Property for WRITEs (write up, i.e., a subject can write at its own level or above)
- X (a subject) dominates Y (an object) means X’s classification level is equal to or exceeds Y’s classification level
- A simple example is the DoD classification levels (in descending order):
  1- Top Secret (TS)
  2- Secret (S)
  3- Confidential (C)
  4- Unclassified (U)
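The read-down and write-up rules can be sketched in a few lines. This is only an illustration, assuming the four DoD levels form a simple total order (a general lattice would need a partial-order test); the names `Level`, `dominates`, `can_read` and `can_write` are hypothetical and do not come from the slides.

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed total order of the four DoD levels (higher value = more sensitive)."""
    U = 0   # Unclassified
    C = 1   # Confidential
    S = 2   # Secret
    TS = 3  # Top Secret


def dominates(x: Level, y: Level) -> bool:
    """x dominates y when x's level is equal to or exceeds y's level."""
    return x >= y


def can_read(subject: Level, obj: Level) -> bool:
    """Read down: the subject's clearance must dominate the object's level."""
    return dominates(subject, obj)


def can_write(subject: Level, obj: Level) -> bool:
    """Write up: the object's level must dominate the subject's clearance."""
    return dominates(obj, subject)
```

Under these assumptions a Secret subject may read a Confidential object but may not write to it, while a Confidential subject may write to a Secret object but may not read it.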
Attempts
- SeaView Model (Secure Data View)
  - Sponsored by RADC
  - Joint effort by SRI, Gemini and Oracle
  - Objective: build an A1 (very secure) MLS/DBMS
- The PRISM Model improves on SeaView by automatically eliminating spurious tuples during recovery, using a bit-vector approach to mask spurious tuples
- Some other models
  - LDV (Lock Data View)
  - ASD (Advanced Secure DBMS)
SeaView Model
- Multilevel relations exist at the logical level only (as views of single-level relations, which are stored and managed by the TCB)
- The decomposition algorithm creates single-level relations from a multilevel relation
- The recovery algorithm creates an output multilevel relation from a set of physically stored single-level relations
Decomposition algorithm
- Let A1 = key and Ai = any non-key attribute
- Let x denote a classification of A1
- Let y denote a classification of Ai
- For every x, create R A1,x (A1), or just R A1,x
  - i.e., for the key, we partition vertically by attribute and horizontally by security level
- For every y, create R Ai,x,y (A1, Ai), x ≤ y, or just R Ai,x,y
  - i.e., for non-keys, we partition vertically by attribute and key, and horizontally by attribute classification level and key classification level
Example multilevel relation Missiles (* marks the apparent key):

  Name*  C1   Range  C2   Speed  C3   TC
  MT1    U    350    U    750    C    C
  NT5    U    450    U    750    U    U
  NT5    U    480    C    1000   C    C
  FD7    C    450    C    900    C    C

Resulting decomposed single level relations are:

  Relation       Attributes       Tuples
  R name,u       (Name)           MT1; NT5
  R name,c       (Name)           FD7
  R range,u,u    (Name, Range)    (MT1, 350); (NT5, 450)
  R range,u,c    (Name, Range)    (NT5, 480)
  R range,c,c    (Name, Range)    (FD7, 450)
  R speed,u,u    (Name, Speed)    (NT5, 750)
  R speed,u,c    (Name, Speed)    (MT1, 750); (NT5, 1000)
  R speed,c,c    (Name, Speed)    (FD7, 900)
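The decomposition step can be illustrated on the Missiles example. The sketch below is not the SeaView implementation; it only assumes that each tuple is given as (Name, C1, Range, C2, Speed, C3) and groups values by key level x and attribute level y. The function name `decompose` and the dictionary layout are hypothetical.

```python
from collections import defaultdict

# Tuples of the Missiles example: (Name, C1, Range, C2, Speed, C3).
# TC is omitted because it is derived as the highest Ci.
MISSILES = [
    ("MT1", "u", 350, "u", 750, "c"),
    ("NT5", "u", 450, "u", 750, "u"),
    ("NT5", "u", 480, "c", 1000, "c"),
    ("FD7", "c", 450, "c", 900, "c"),
]


def decompose(rows, attrs=("range", "speed")):
    """Split a multilevel relation into single-level relations.

    Returns (key_rels, attr_rels):
      key_rels[("name", x)]   -> set of keys classified at level x
      attr_rels[(attr, x, y)] -> set of (key, value) pairs with the key at
                                 level x and the attribute value at level y
    """
    key_rels = defaultdict(set)
    attr_rels = defaultdict(set)
    for name, x, *rest in rows:
        key_rels[("name", x)].add(name)
        for i, attr in enumerate(attrs):
            value, y = rest[2 * i], rest[2 * i + 1]
            attr_rels[(attr, x, y)].add((name, value))
    return dict(key_rels), dict(attr_rels)


key_rels, attr_rels = decompose(MISSILES)
```

With this input, `key_rels` holds the two key relations (levels u and c) and `attr_rels` holds the six attribute relations listed in the example.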
Deficiencies of the SeaView/PRISM Models
- The deficiencies of the SeaView model (in its recovery algorithm):
  - Creation of spurious tuples (due to polyinstantiation)
  - Space cost of temporary tables
  - Time cost of unions
  - Time cost of joins
- PRISM solves the spurious tuple problem, but still suffers from the time cost problems
Recovery acceleration using P-trees
- Based on the SeaView/PRISM model
- Uses its decomposition algorithm
- New recovery algorithm using P-tree technology (given a query, it creates an output multilevel relation from the single-level relations)
- The main contribution is in addressing the space and time cost problems
Recovery Algorithm
1. For every relation R Ai,x,y (the single-level relation containing all entries from the multilevel relation whose keys are at classification level x and whose Ai attribute values are at classification level y), excluding base relations (those containing the key only), create a P-tree, P Ai,x,y, denoting the presence or absence of the keys at level x.

The recovery algorithm is closely analogous to the PRISM solution, but addresses time costs (and, to some extent, space costs: the space savings due to P-tree compression are the main reason for the time savings).
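Step 1 can be pictured as building one presence bit per key. The sketch below is an assumption-laden illustration: it uses an uncompressed bit list in place of a real P-tree, and it assumes the keys at level x are taken in some fixed order from the base relation. The function name `presence_vector` is hypothetical.

```python
def presence_vector(base_keys, relation):
    """Return one bit per key in base_keys: 1 if the key occurs in relation, else 0.

    base_keys: ordered keys at level x (from the base relation R A1,x)
    relation:  iterable of (key, value) pairs of one relation R Ai,x,y
    """
    present = {key for key, _ in relation}
    return [1 if key in present else 0 for key in base_keys]


# Keys at level u in the Missiles example, in an assumed fixed order.
keys_u = ["MT1", "NT5"]
p_range_u_c = presence_vector(keys_u, [("NT5", 480)])
p_range_u_u = presence_vector(keys_u, [("MT1", 350), ("NT5", 450)])
```

In the actual approach the bit sequence would be stored as a compressed P-tree rather than a flat list.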
Next we introduce P-trees.
bSQ Format
- Split each numeric attribute into separate bit files (one for each bit position)
- Reasons for using bSQ format:
  - Different bits contribute to the value differently
  - bSQ format facilitates the representation of a precision hierarchy (from 1-bit precision upwards)
  - bSQ format facilitates the creation of an efficient data structure, the P-tree, along with the P-tree algebra and T-cube
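Splitting an attribute into bit files can be sketched as follows. This is a minimal illustration, assuming non-negative integers of a known bit width and bit position 0 meaning the most significant bit; the helper names `to_bsq` and `from_bsq` are hypothetical.

```python
def to_bsq(values, nbits):
    """Split integer values into nbits bit files.

    File j holds bit j of every value, with j = 0 the most significant bit.
    """
    return [[(v >> (nbits - 1 - j)) & 1 for v in values] for j in range(nbits)]


def from_bsq(bit_files):
    """Reassemble the original values from the bit files (the split is lossless)."""
    nbits = len(bit_files)
    n = len(bit_files[0])
    return [
        sum(bit_files[j][i] << (nbits - 1 - j) for j in range(nbits))
        for i in range(n)
    ]


files = to_bsq([5, 3, 7, 0], nbits=3)
```

Reading only the first k files gives each value at k-bit precision, which is one way to see the precision hierarchy mentioned above.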
The “tabular” formats (inverted list)
- BSQ and bSQ are “tabular” formats
  - BSQ consists of a separate table for each feature attribute
  - bSQ consists of a separate table for each bit
- One can view it this way:
  - The data set is initially one relation or table, R(K1, ..., Kk, A1, ..., An)
  - K1, ..., Kk are structure attributes and the Ai are feature attributes
  - The structure attributes of a 2-D image are the X, Y coordinates of the pixels (rows)
  - The structure attribute of a relation is a 1-D structure consisting of the key
  - In BSQ, we separate each feature into a separate file (similar to the Decomposition Storage Model (DSM), Copeland et al., SIGMOD 85, pp. 268-279)
  - In bSQ, we separate each bit of each feature into a separate file, with a consistent structural order assumed (similar to the Bit Transpose File (BTF) model, Wong et al., VLDB 85, pp. 448-457)
Peano Count Tree (P-tree)
A basic P-tree is a representation of a bSQ file in a recursive, segmentized (quadrant-by-quadrant in images) arrangement.
The basic P-trees provide a compressed, lossless, easily-manipulated representation of the original data.
An example P-tree for one bSQ file of an image

64-pixel bSQ raster image file (equivalently, a 64-tuple bSQ file):

  1 1 1 1 1 1 0 0
  1 1 1 1 1 0 0 0
  1 1 1 1 1 1 0 0
  1 1 1 1 1 1 1 0
  1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1
  0 1 1 1 1 1 1 1

P-tree (count of 1-bits per quadrant, quadrants taken in Peano or Z-order):

  Level 0 (root count):  55
  Level 1:               16   8   15   16
  Level 2:               under 8:  3 0 4 1     under 15:  4 4 3 4
  Level 3 (leaf bits):   under 3:  1110    under 1:  0010    under 3:  1101

Terms illustrated: Peano or Z-ordering; pure (pure-1/pure-0) quadrant; root count; level; fan-out; QID (quadrant ID).
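The count tree for this raster can be reproduced with a short recursive sketch. It is an illustration under stated assumptions, not the authors' implementation: the image is a square list of lists whose side is a power of two, quadrants are visited in the order upper-left, upper-right, lower-left, lower-right, and pure quadrants are not expanded. The name `build_ptree` is hypothetical.

```python
RASTER = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1],
]


def build_ptree(img, r0=0, c0=0, size=None):
    """Return (count, children) for the size x size quadrant at (r0, c0).

    children is empty for pure quadrants (all 0s or all 1s) and single pixels.
    """
    if size is None:
        size = len(img)
    count = sum(img[r][c]
                for r in range(r0, r0 + size)
                for c in range(c0, c0 + size))
    if size == 1 or count == 0 or count == size * size:
        return (count, [])
    h = size // 2
    # Z-order: upper-left, upper-right, lower-left, lower-right.
    offsets = ((0, 0), (0, h), (h, 0), (h, h))
    children = [build_ptree(img, r0 + dr, c0 + dc, h) for dr, dc in offsets]
    return (count, children)


tree = build_ptree(RASTER)
level1 = [child[0] for child in tree[1]]
```

Because pure quadrants stop the recursion, large uniform regions cost a single node, which is where the compression comes from.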
An example of P-tree quadrant addressing
- Terms: Peano or Z-ordering; pure (pure-1/pure-0) quadrant; root count; level; fan-out; QID (quadrant ID)
- The four quadrants at each level are numbered 0, 1, 2, 3
- The pixel at (7, 1) has binary coordinates (111, 001)
- Interleaving the coordinate bits gives 10.10.11, so its QID is 2.2.3
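The QID computation is a bit interleave of the two coordinates. The sketch below assumes the first coordinate is the row and the second the column, which is consistent with quadrant 2 being the lower-left quadrant of the example raster; the function name `qid` is hypothetical.

```python
def qid(row, col, nbits):
    """Return the quadrant path of a pixel as a list of digits 0..3.

    At each level, from the most significant bit down, the digit is
    2 * (row bit) + (column bit).
    """
    path = []
    for k in range(nbits - 1, -1, -1):
        path.append(2 * ((row >> k) & 1) + ((col >> k) & 1))
    return path


path = qid(7, 1, nbits=3)
```

For the pixel (7, 1) of an 8 x 8 image this gives the digits 2, 2, 3, matching the QID 2.2.3 on the slide.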
P-tree variation: PM-tree
- The Peano Mask tree (PM-tree) uses masks instead of counts: 1 denotes pure-1, 0 denotes pure-0 and m denotes mixed
- It provides an efficient way of ANDing
- Predicate tree: a node is 1 iff the predicate is true for the quadrant
  - E.g., Pure1-tree (predicate: the quadrant is all 1’s)
  - Most compact form (all are lossless)

PM-tree for the example raster:

  Level 0:  m
  Level 1:  1 m m 1
  Level 2:  m 0 1 m     1 1 m 1
  Level 3:  1110   0010   1101

Pure1-tree for the example raster:

  Level 0:  0
  Level 1:  1 0 0 1
  Level 2:  0 0 1 0     1 1 0 1
  Level 3:  1110   0010   1101
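A mask tree can be built the same way as a count tree, storing only "1", "0" or "m" per node. This sketch makes the same assumptions as before (square image, power-of-two side, quadrants in upper-left, upper-right, lower-left, lower-right order); `build_pmtree` is a hypothetical name.

```python
RASTER = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1],
]


def build_pmtree(img, r0=0, c0=0, size=None):
    """Return (mask, children): mask is "1" (pure-1), "0" (pure-0) or "m" (mixed)."""
    if size is None:
        size = len(img)
    count = sum(img[r][c]
                for r in range(r0, r0 + size)
                for c in range(c0, c0 + size))
    if count == size * size:
        return ("1", [])
    if count == 0:
        return ("0", [])
    h = size // 2
    # Only mixed quadrants are expanded, in Z-order.
    offsets = ((0, 0), (0, h), (h, 0), (h, h))
    return ("m", [build_pmtree(img, r0 + dr, c0 + dc, h) for dr, dc in offsets])


pm = build_pmtree(RASTER)
pm_level1 = [child[0] for child in pm[1]]
```

Replacing every "m" with "0" at the internal levels yields the Pure1-tree masks shown for the same raster.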
P-tree Algebra
- Operations: AND, OR, Complement, and others (XOR, etc.)

Complement (each count c over a quadrant of s pixels becomes s - c):

                Level 0   Level 1       Level 2               Level 3
  P-tree:       55        16 8 15 16    3 0 4 1 ; 4 4 3 4     1110 0010 1101
  Complement:   9         0 8 1 0       1 4 0 3 ; 0 0 1 0     0001 1101 0010

P-tree ANDing operation:

                Level 0   Level 1    Level 2               Level 3
  PM-tree1:     m         1 m m 1    m 0 1 m ; 1 1 m 1     1110 0010 1101
  PM-tree2:     m         1 0 m 0    1 1 1 m               0100
  Result:       m         1 0 m 0    1 1 m m               1101 0100

Depth-first pure-1 path code:

  PM-tree1:  0 100 101 102 12 132 20 21 220 221 223 23 3
  PM-tree2:  0 20 21 22 231
  Result:    0 20 21 220 221 223 231
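ANDing two trees given as pure-1 path codes reduces to prefix matching: a region is pure-1 in the result when it lies inside a pure-1 region of both operands. The sketch below is a simple quadratic illustration of that idea, with paths written as digit strings; it is not presented as the authors' algorithm, and `and_pure1_paths` is a hypothetical name.

```python
def and_pure1_paths(a, b):
    """AND two P-trees given as lists of pure-1 quadrant paths (digit strings).

    If one path is a prefix of the other, the longer (smaller) quadrant is
    pure-1 in both trees and is kept in the result.
    """
    result = set()
    for p in a:
        for q in b:
            if q.startswith(p):
                result.add(q)
            elif p.startswith(q):
                result.add(p)
    return sorted(result)


tree1 = ["0", "100", "101", "102", "12", "132",
         "20", "21", "220", "221", "223", "23", "3"]
tree2 = ["0", "20", "21", "22", "231"]
anded = and_pure1_paths(tree1, tree2)
```

A production version would walk both sorted lists in a single merge pass instead of comparing every pair.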
Basic, Value and Tuple P-trees
- Basic P-trees (a Pure1-tree, i.e., a predicate tree, for the target bit position of the target attribute)
  e.g., P11, P12, ..., P18, P21, ..., P28, ..., P71, ..., P78
- Value P-trees (predicate: quad is purely the target value in the target attribute), obtained by ANDing basic P-trees
  e.g., P1,5 = P1,101 = P11 AND P12’ AND P13 (where ’ denotes the complement)
- Tuple P-trees (predicate: quad is purely the target tuple), obtained by ANDing value P-trees
  e.g., P(1,2,3) = P(001,010,111) = P1,001 AND P2,010 AND P3,111
- Cube P-trees (predicate: quad is purely in the target cube, a product of intervals), obtained by AND/OR of value P-trees
  e.g., P([1..3], , [0..2]) = (P1,1 OR P1,2 OR P1,3) AND (P3,0 OR P3,1 OR P3,2)
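The value P-tree formula can be checked on flat bit vectors. The sketch below assumes uncompressed bit lists stand in for the basic P-trees (real P-trees apply the same AND and complement on the compressed trees) and that bit position 1 is the most significant bit; `basic_bits` and `value_mask` are hypothetical names.

```python
def basic_bits(values, nbits):
    """Bit vector j holds bit j of every value, j = 0 being the most significant."""
    return [[(v >> (nbits - 1 - j)) & 1 for v in values] for j in range(nbits)]


def value_mask(values, target, nbits):
    """AND the basic bit vectors, complementing those where the target bit is 0.

    For target 5 = 101 this computes P11 AND P12' AND P13.
    """
    basics = basic_bits(values, nbits)
    mask = [1] * len(values)
    for j in range(nbits):
        want = (target >> (nbits - 1 - j)) & 1
        column = basics[j] if want else [1 - bit for bit in basics[j]]
        mask = [m & c for m, c in zip(mask, column)]
    return mask


mask5 = value_mask([5, 3, 5, 7, 1], target=5, nbits=3)
```

The resulting mask has a 1 exactly at the positions whose value equals the target; a tuple mask would AND one such value mask per attribute.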
Using P-trees for MLS data (key = structure attribute)
- If we have the following query: “Select name, dev-by, length from R where range 35”
Time improvements to the recovery process using P-trees
[Chart comparing PRISM and P-Tree. Horizontal axis: number of records (in thousands), 100 to 1700. Vertical axis: 0 to 12.]
Advantages
- Acceleration results from operating on P-trees and restricting I/O to only those fields that are involved in the output of the query
- Space efficiency due to P-tree compression
- Correct output results (no spurious tuples in the output table)