
Accelerating Multilevel Secure Database Queries using P-Tree Technology
Imad Rahal and Dr. William Perrizo
Computer Science Department, North Dakota State University

Outline
- Introduction
  1- What are MLS/DBSs?
  2- The Mandatory Access Control (MAC) Policy
- Attempts
  - The SeaView model (Secure Data View) and the PRISM model [6]
  - PRISM is based on SeaView but eliminates spurious tuples during recovery
- Deficiencies of SeaView/PRISM (mainly speed)
- Query Acceleration using P-trees
  - Replace the Recovery data structure of PRISM
  - Advantages: time efficiency

What are MLS/DBSs?
- Multilevel secure database systems (MLS/DBSs) are DBSs that implement secure access control policies to protect their data
- Each user or process is called a subject
- Each data item (column value or tuple) is called an object
- The security hardware and software are kept in a TCB (Trusted Computing Base), sometimes referred to as a Reference Monitor or Security Kernel

- R(A1, C1, A2, C2, ..., An, Cn, TC) is a multilevel relation or view
  - The Ai are fields
  - The Ci are their respective sensitivity levels (the levels form a lattice)
- By convention, (A1, C1) is the apparent key
  - The apparent key is not unique by itself, but it becomes a key when all the security fields are combined with it
  - (A1, C1, C2, ..., Cn) is the primary key
- TC is the classification level of the tuple
- Notice that TC = highest Ci over all i, and C1 = lowest Ci over all i
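The relation layout above can be sketched in a few lines of Python. This is only an illustration: the names `Cell`, `RANK`, `tuple_class` and `primary_key` are hypothetical, and the levels are assumed to be totally ordered (the simplest kind of lattice).

```python
from dataclasses import dataclass

# Assumed ordering, low to high. A total order is the simplest lattice.
RANK = {"U": 0, "C": 1, "S": 2, "TS": 3}

@dataclass(frozen=True)
class Cell:
    value: object  # attribute value Ai
    level: str     # its sensitivity level Ci

def tuple_class(cells):
    """TC: the highest Ci among the cells of one tuple."""
    return max((c.level for c in cells), key=RANK.__getitem__)

def primary_key(cells):
    """(A1, C1, C2, ..., Cn): key value followed by every sensitivity level."""
    return (cells[0].value,) + tuple(c.level for c in cells)

# Illustrative tuple: key "MT1" at U, then two attributes at U and C.
row = [Cell("MT1", "U"), Cell(350, "U"), Cell(750, "C")]
print(tuple_class(row))   # C
print(primary_key(row))   # ('MT1', 'U', 'U', 'C')
```

Two tuples may share the apparent key value and still differ in the primary key, because the sensitivity levels are part of it.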

The Mandatory Access Control (MAC) policy
- Each subject has a clearance level
- Each object has a sensitivity level
- Bell-LaPadula restrictions:
  - Simple Security Property for READs (read down, i.e., a subject can read at its own level or below)
  - *-Property for WRITEs (write up, i.e., a subject can write at its own level or above)
- X (a subject) dominates Y (an object) means that X's classification level equals or exceeds Y's classification level
- A simple example of DoD classification levels (in descending order):
  1- Top Secret (TS)
  2- Secret (S)
  3- Confidential (C)
  4- Unclassified (U)
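A minimal sketch of the two Bell-LaPadula checks, assuming the four levels above are totally ordered and abbreviating Top Secret as TS. The function names are illustrative, not part of any model specification.

```python
LEVELS = ["U", "C", "S", "TS"]                  # ascending order
RANK = {name: i for i, name in enumerate(LEVELS)}

def dominates(x, y):
    """True when level x equals or exceeds level y."""
    return RANK[x] >= RANK[y]

def can_read(subject_level, object_level):
    # Read down: the subject must dominate the object.
    return dominates(subject_level, object_level)

def can_write(subject_level, object_level):
    # Write up: the object must dominate the subject.
    return dominates(object_level, subject_level)

print(can_read("S", "C"), can_write("S", "C"))   # True False
```

A subject can both read and write only objects at exactly its own level.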

Attempts
- SeaView Model (Secure Data View)
  - Sponsored by RADC
  - Joint effort by SRI, Gemini and Oracle
  - Objective: build an A1 (very secure) MLS/DBMS
- PRISM Model
  - Improves on SeaView by automatically eliminating spurious tuples during recovery, using a bit-vector approach to mask spurious tuples
- Some other models
  - LDV (Lock Data View)
  - ASD (Advanced Secure DBMS)

SeaView Model
- Multilevel relations exist at the logical level only (as views of single-level relations, which are stored and managed by the TCB)
- The decomposition algorithm creates single-level relations from a multilevel relation
- The recovery algorithm creates an output multilevel relation from a set of physically stored single-level relations

Decomposition algorithm
- Let A1 = key and Ai = any non-key attribute
- Let x denote a classification of A1
- Let y denote a classification of Ai
- For every x, create R_A1,x(A1), or just R_A1,x
  - i.e., for the key, we partition vertically by attribute and horizontally by security level
- For every y with x ≤ y, create R_Ai,x,y(A1, Ai), or just R_Ai,x,y
  - i.e., for non-keys, we partition vertically by attribute and key, and horizontally by attribute and key classification level
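The partitioning rule can be sketched as follows. This is a hedged illustration rather than the SeaView implementation: the row encoding and the function name `decompose` are assumptions, and the sample rows are read from the Missiles example used in these slides.

```python
from collections import defaultdict

def decompose(rows):
    """Split multilevel tuples into single-level relations.

    Each row is a list of (attribute, value, level) triples whose first
    entry is the key A1 with its level x.
    """
    key_rels = defaultdict(set)    # (A1, x)    -> {key}
    attr_rels = defaultdict(set)   # (Ai, x, y) -> {(key, value)}
    for row in rows:
        key_attr, key, x = row[0]
        key_rels[(key_attr, x)].add(key)
        for attr, value, y in row[1:]:
            attr_rels[(attr, x, y)].add((key, value))
    return dict(key_rels), dict(attr_rels)

missiles = [
    [("name", "MT1", "U"), ("range", 350, "U"), ("speed", 750, "C")],
    [("name", "NT5", "U"), ("range", 450, "U"), ("speed", 750, "U")],
    [("name", "NT5", "U"), ("range", 480, "C"), ("speed", 1000, "C")],
    [("name", "FD7", "C"), ("range", 450, "C"), ("speed", 900, "C")],
]

key_rels, attr_rels = decompose(missiles)
for name in sorted(attr_rels):
    print(name, sorted(attr_rels[name]))
```

The two NT5 tuples share the key relation entry but contribute to different attribute relations, which is where polyinstantiation shows up.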

Example: the multilevel relation Missiles (Name* is the key; each C column gives the classification of the value to its left)

Name*  C   Range  C   Speed  C   TC
MT1    U   350    U   750    C   C
NT5    U   450    U   750    U   U
NT5    U   480    C   1000   C   C
FD7    C   450    C   900    C   C

Resulting decomposed single level relations are:

R_name,u(Name):            MT1; NT5
R_name,c(Name):            FD7
R_range,u,u(Name, Range):  (MT1, 350); (NT5, 450)
R_range,u,c(Name, Range):  (NT5, 480)
R_range,c,c(Name, Range):  (FD7, 450)
R_speed,u,u(Name, Speed):  (NT5, 750)
R_speed,u,c(Name, Speed):  (MT1, 750); (NT5, 1000)
R_speed,c,c(Name, Speed):  (FD7, 900)

Deficiencies of the SeaView/PRISM Models
- Deficiencies of the SeaView model (in its recovery algorithm):
  - Creation of spurious tuples (due to polyinstantiation)
  - Space cost of temporary tables
  - Time cost of unions
  - Time cost of joins
- PRISM solves the spurious tuple problem, but still suffers from the time cost problems

Recovery acceleration using P-trees
- Based on the SeaView/PRISM model
- Uses its decomposition algorithm
- New recovery algorithm using P-tree technology (given a query, it creates an output multilevel relation from the single-level relations)
- Main contribution: addressing the space and time cost problems

Recovery Algorithm
1. For every relation R_Ai,x,y (the single-level relation containing all entries from the multilevel relation that have keys at classification level x and Ai attribute values at classification level y), excluding base relations (those containing the key only), create a P-tree, P_Ai,x,y, denoting the presence or absence of the keys at level x.

The recovery algorithm is closely analogous to the PRISM solution, but it addresses time costs and, to some extent, space costs (the space savings due to P-tree compression are the main reason for the time savings).

Next we introduce P-trees.
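Before the P-tree structure itself is introduced, step 1 can be pictured with plain bit lists standing in for P-trees. This sketch assumes the keys at level x are taken in one fixed order; `presence_bits` is a hypothetical helper, and the sample relations come from the Missiles example.

```python
def presence_bits(keys_at_x, relation):
    """One bit per key at level x: 1 if the key occurs in the relation."""
    present = {key for key, _value in relation}
    return [1 if key in present else 0 for key in keys_at_x]

keys_u = ["MT1", "NT5"]                         # keys at level U, fixed order
r_range_u_c = [("NT5", 480)]                    # range values at level C
r_speed_u_c = [("MT1", 750), ("NT5", 1000)]     # speed values at level C

print(presence_bits(keys_u, r_range_u_c))   # [0, 1]
print(presence_bits(keys_u, r_speed_u_c))   # [1, 1]
```

A P-tree would store the same bits in compressed, quadrant-based form, so that they can be combined without scanning every position.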

bSQ Format
- Split each numeric attribute into separate bit files (one for each bit position)
- Reasons for using the bSQ format:
  - Different bits contribute to the value differently
  - bSQ format facilitates the representation of a precision hierarchy (from 1-bit precision upwards)
  - bSQ format facilitates the creation of an efficient data structure, the P-tree, along with the P-tree algebra and T-cube
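As a small illustration of the split, the sketch below turns one numeric column into per-bit lists and back. The names `to_bsq` and `from_bsq` are made up for this example, and putting the most significant bit in file 0 is an assumption.

```python
def to_bsq(values, n_bits):
    """Return n_bits bit files; file 0 holds the most significant bit."""
    return [[(v >> (n_bits - 1 - j)) & 1 for v in values]
            for j in range(n_bits)]

def from_bsq(bit_files):
    """Rebuild the original values from the bit files (lossless)."""
    n_bits = len(bit_files)
    values = [0] * len(bit_files[0])
    for j, bits in enumerate(bit_files):
        for i, bit in enumerate(bits):
            values[i] |= bit << (n_bits - 1 - j)
    return values

files = to_bsq([5, 2, 7, 0], 3)
print(files)             # [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 1, 0]]
print(from_bsq(files))   # [5, 2, 7, 0]
```

Reading only the first file gives 1-bit precision, the first two files give 2-bit precision, and so on, which is the precision hierarchy mentioned above.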

The “tabular” formats (inverted list)
- BSQ and bSQ are “tabular” formats
  - BSQ consists of a separate table for each feature attribute
  - bSQ consists of a separate table for each bit
- One can view it this way: the data set is initially one relation or table, R(K1, ..., Kk, A1, ..., An)
  - K1, ..., Kk are structure attributes and the Ai are feature attributes
  - The structure attributes of a 2-D image are the X, Y coordinates of the pixels (rows)
  - The structure attribute of a relation is a 1-D structure consisting of the key
- In BSQ, we separate each feature into a separate file (similar to the Decomposition Storage Model (DSM), Copeland et al., SIGMOD 85, pp. 268-279)
- In bSQ, we separate each bit of each feature into a separate file, with a consistent structural order assumed (similar to the Bit Transpose File (BTF) model, Wong et al., VLDB 85, pp. 448-457)

Peano Count Tree (P-tree)

A basic P-tree is a representation of a bSQ file in a recursive, segmentized (quadrant-by-quadrant in images) arrangement.

The basic P-trees provide a compressed, lossless, easily-manipulated representation of the original data.

An example P-tree for one bSQ file of an image

[Figure: a 64-pixel bSQ raster image file (equivalently, a 64-tuple bSQ file) and its P-tree. Labeled terms: Peano or Z-ordering, pure (pure-1/pure-0) quadrant, root count, level, fan-out, QID (quadrant ID).]

Bit matrix of the image:
1 1 1 1 1 1 0 0
1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1

P-tree counts:
- Root count: 55
- Quadrant counts: 16 8 15 16
- Sub-quadrant counts: 3 0 4 1 (under 8) and 4 4 3 4 (under 15)
- Leaf bits of the mixed sub-quadrants: 1110 and 0010 (under 8), 1101 (under 15)
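The counts in this example can be reproduced with a short recursive sketch. The node encoding `(count, children)` and the name `build` are assumptions made for illustration; quadrants are visited in the order upper-left, upper-right, lower-left, lower-right.

```python
IMAGE = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1],
]

def build(img, r=0, c=0, size=None):
    """Return (count, children); children is None for a pure quadrant."""
    if size is None:
        size = len(img)
    if size == 1:
        return (img[r][c], None)
    half = size // 2
    kids = [build(img, r + dr, c + dc, half)
            for dr, dc in ((0, 0), (0, half), (half, 0), (half, half))]
    count = sum(k[0] for k in kids)
    if count == 0 or count == size * size:
        return (count, None)          # pure-0 or pure-1: prune the subtree
    return (count, kids)

tree = build(IMAGE)
print(tree[0])                         # 55
print([k[0] for k in tree[1]])         # [16, 8, 15, 16]
```

Pruning pure quadrants is what makes the structure compressed while remaining lossless: a pure node's count determines every bit beneath it.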

An example of P-tree: quadrant IDs

[Figure: the same 64-pixel image with its quadrants numbered 0 1 2 3 in Peano (Z) order. Labeled terms: Peano or Z-ordering, pure (pure-1/pure-0) quadrant, root count, level, fan-out, QID (quadrant ID).]

Example QID: the pixel at (7, 1) is (111, 001) in binary; interleaving the bits gives 10.10.11, so its QID is 2.2.3.
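The QID in this example comes from interleaving the coordinate bits. A minimal sketch, assuming the first coordinate contributes the high bit of each pair (the function name `qid` is illustrative):

```python
def qid(first, second, n_bits):
    """Interleave two n_bits-wide coordinates into one quadrant digit per level."""
    digits = []
    for j in range(n_bits - 1, -1, -1):     # most significant bit first
        hi = (first >> j) & 1
        lo = (second >> j) & 1
        digits.append(2 * hi + lo)
    return digits

print(".".join(str(d) for d in qid(7, 1, 3)))   # 2.2.3
```

Each digit selects one of the four quadrants at the next level down, so the digit sequence is also the path from the root to the pixel.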

P-tree variation: PM-tree
- A Peano Mask tree (PM-tree) uses masks instead of counts: 1 denotes pure-1, 0 denotes pure-0 and m denotes mixed
- It provides an efficient way of ANDing
- Predicate tree: a node is 1 iff the predicate is true for the quadrant
  - E.g., Pure1-tree (predicate: quad is all 1's)
  - Most compact form (all forms are lossless)

[Figure: the 8x8 bit matrix of the earlier example, with its PM-tree and its Pure1-tree.]

PM-tree:
- Root: m
- Quadrants: 1 m m 1
- Sub-quadrants: m 0 1 m (under the first m) and 1 1 m 1 (under the second m)
- Leaf bits: 1110 0010 1101

Pure1-tree:
- Root: 0
- Quadrants: 1 0 0 1
- Sub-quadrants: 0 0 1 0 and 1 1 0 1
- Leaf bits: 1110 0010 1101
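Both trees can be derived from the bit matrix with one recursive pass. In this sketch a node is encoded as 0 (pure-0), 1 (pure-1) or a list of four children (mixed); that encoding and the helper names are assumptions made for illustration.

```python
IMAGE = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1],
]

def build_pm(img, r=0, c=0, size=None):
    """Return 0, 1, or a list of four child nodes for a mixed quadrant."""
    if size is None:
        size = len(img)
    if size == 1:
        return img[r][c]
    half = size // 2
    kids = [build_pm(img, r + dr, c + dc, half)
            for dr, dc in ((0, 0), (0, half), (half, 0), (half, half))]
    if all(k == 0 for k in kids):
        return 0
    if all(k == 1 for k in kids):
        return 1
    return kids

def mask_label(node):
    """PM-tree label: '1', '0' or 'm'."""
    return "m" if isinstance(node, list) else str(node)

def pure1_label(node):
    """Pure1-tree label: 1 only when the quadrant is entirely 1's."""
    return 1 if node == 1 else 0

pm = build_pm(IMAGE)
print([mask_label(k) for k in pm])     # ['1', 'm', 'm', '1']
print([pure1_label(k) for k in pm])    # [1, 0, 0, 1]
```

The Pure1-tree labels follow from the PM-tree by mapping both m and 0 to 0, while the leaf bits stay the same.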

P-tree Algebra
- And
- Or
- Complement
- Other (XOR, etc.)

P-tree (counts):
- Root: 55
- Quadrants: 16 8 15 16
- Sub-quadrants: 3 0 4 1 (under 8) and 4 4 3 4 (under 15)
- Leaf bits: 1110 (under the first 3), 0010 (under 1), 1101 (under the second 3)

Complement:
- Root: 9
- Quadrants: 0 8 1 0
- Sub-quadrants: 1 4 0 3 (under 8) and 0 0 1 0 (under 1)
- Leaf bits: 0001 (under the first 1), 1101 (under 3), 0010 (under the second 1)
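The complement of a count tree replaces each count by the quadrant size minus that count. A hedged sketch, with the example tree written out by hand as `(count, children)` pairs (an encoding assumed for illustration, with `None` marking a pure node):

```python
def leaf(bits):
    """Four single-pixel nodes from a 4-character bit string."""
    return [(int(b), None) for b in bits]

TREE = (55, [
    (16, None),
    (8, [(3, leaf("1110")), (0, None), (4, None), (1, leaf("0010"))]),
    (15, [(4, None), (4, None), (3, leaf("1101")), (4, None)]),
    (16, None),
])

def complement(node, size):
    """Complement a count node covering a size-by-size quadrant."""
    count, kids = node
    flipped = size * size - count
    if kids is None:
        return (flipped, None)
    return (flipped, [complement(k, size // 2) for k in kids])

comp = complement(TREE, 8)
print(comp[0])                        # 9
print([k[0] for k in comp[1]])        # [0, 8, 1, 0]
```

The tree shape is unchanged: pure-1 nodes become pure-0 nodes and mixed nodes stay mixed.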

P-tree ANDing Operation

PM-tree1:
- Root: m
- Quadrants: 1 m m 1
- Sub-quadrants: m 0 1 m (under the first m) and 1 1 m 1 (under the second m)
- Leaf bits: 1110 and 0010 (under the first m), 1101 (under the second m)

PM-tree2:
- Root: m
- Quadrants: 1 0 m 0
- Sub-quadrants: 1 1 1 m (under m)
- Leaf bits: 0100

Result:
- Root: m
- Quadrants: 1 0 m 0
- Sub-quadrants: 1 1 m m (under m)
- Leaf bits: 1101 0100

ANDing with depth-first pure-1 path codes:
- PM-tree1: 0 100 101 102 12 132 20 21 220 221 223 23 3
- PM-tree2: 0 20 21 22 231
- Result: 0 20 21 220 221 223 231
- Matches: 0 & 0 = 0; 20 & 20 = 20; 21 & 21 = 21; (220 221 223) & 22 = 220 221 223; 23 & 231 = 231
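The AND of the two example PM-trees can be checked with a short recursive sketch. Nodes are encoded as 0, 1 or a list of four children (mixed); this encoding and the names `p_and` and `pure1_paths` are assumptions for illustration, not the authors' implementation.

```python
def p_and(a, b):
    """AND two PM-tree nodes that cover the same quadrant."""
    if a == 0 or b == 0:
        return 0                 # a pure-0 operand forces pure-0
    if a == 1:
        return b                 # pure-1 is the identity for AND
    if b == 1:
        return a
    kids = [p_and(x, y) for x, y in zip(a, b)]
    return 0 if all(k == 0 for k in kids) else kids

def pure1_paths(node, prefix=""):
    """Depth-first list of the paths that end in a pure-1 node."""
    if node == 1:
        return [prefix]
    if node == 0:
        return []
    paths = []
    for i, kid in enumerate(node):
        paths += pure1_paths(kid, prefix + str(i))
    return paths

tree1 = [1, [[1, 1, 1, 0], 0, 1, [0, 0, 1, 0]], [1, 1, [1, 1, 0, 1], 1], 1]
tree2 = [1, 0, [1, 1, 1, [0, 1, 0, 0]], 0]

result = p_and(tree1, tree2)
print(pure1_paths(result))   # ['0', '20', '21', '220', '221', '223', '231']
```

Whole subtrees are skipped whenever one operand is pure, which is why the operation needs far fewer steps than a bit-by-bit AND on well-compressed data.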

Basic, Value and Tuple P-trees
- Basic P-trees (a Pure1-tree predicate tree for the target bit position of the target attribute), e.g., P11, P12, ..., P18, P21, ..., P28, ..., P71, ..., P78 (subscripts: target attribute, target bit position)
- Value P-trees (predicate: quad is purely the target value in the target attribute), built by ANDing basic P-trees, e.g., P1,5 = P1,101 = P11 AND P12' AND P13, where ' denotes complement (subscripts: target attribute, target value)
- Tuple P-trees (predicate: quad is purely the target tuple), built by ANDing value P-trees, e.g., P(1, 2, 3) = P(001, 010, 011) = P1,001 AND P2,010 AND P3,011
- Cube P-trees (predicate: quad is purely in the target cube, i.e., a product of intervals), built by AND/OR of value P-trees, e.g., P([1,3], , [0,2]) = (P1,1 OR P1,2 OR P1,3) AND (P3,0 OR P3,1 OR P3,2)
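The way value and tuple P-trees are composed can be illustrated with uncompressed bit lists standing in for the trees. The helper names and sample data are hypothetical; the logic mirrors P1,5 = P11 AND P12' AND P13.

```python
def basic_bits(values, n_bits):
    """One bit list per bit position, most significant bit first."""
    return [[(v >> (n_bits - 1 - j)) & 1 for v in values]
            for j in range(n_bits)]

def value_mask(values, target, n_bits):
    """AND each basic bit list, complemented where the target bit is 0."""
    mask = [1] * len(values)
    for j, bits in enumerate(basic_bits(values, n_bits)):
        want = (target >> (n_bits - 1 - j)) & 1
        mask = [m & (b if want else 1 - b) for m, b in zip(mask, bits)]
    return mask

def tuple_mask(columns, targets, n_bits):
    """AND the value masks of every attribute."""
    mask = [1] * len(columns[0])
    for values, target in zip(columns, targets):
        mask = [m & v for m, v in zip(mask, value_mask(values, target, n_bits))]
    return mask

print(value_mask([5, 2, 5, 7], 5, 3))                             # [1, 0, 1, 0]
print(tuple_mask([[1, 1, 2], [2, 2, 0], [3, 7, 3]], (1, 2, 3), 3))  # [1, 0, 0]
```

With real P-trees the same ANDs run on the compressed trees, so pure quadrants are resolved without visiting individual rows.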

Using P-trees for MLS data (key = structure attribute)
- If we have the following query: “Select name, dev-by, length from R where range 35”

Time improvements to the recovery process using P-trees

[Figure: chart comparing PRISM and P-Tree. X-axis: number of records (in thousands), from 100 to 1700. Y-axis: ticks from 0 to 12 (axis label not recoverable).]

Advantages
- Acceleration results from operating on P-trees and restricting I/O to only those fields that are involved in the output of the query
- Space efficiency due to P-tree compression
- Correct output results (no spurious tuples in the output table)
