
Securing Materialized Views: A Rewriting-Based Approach ∗

Sarah Nait-Bahloul, Emmanuel Coquery, Mohand-Saïd Hacid
Université de Lyon

Université Claude Bernard Lyon 1, LIRIS CNRS UMR 5205
43, bd du 11 novembre 1918

69622 Villeurbanne cedex, France
sarah.nait-bahloul,emmanuel.coquery,[email protected]

Summary

Several models and techniques have been proposed to control access to data. In this paper, we address the problem of automatically generating, from the access control policies defined on base relations, the access control policies to be attached to materialized views. We chose to express fine-grained access control rules through authorization views. Our approach adapts query rewriting techniques to the specifics of our problem. We focus mainly on views expressed as conjunctive queries with equalities. We show that our approach is secure (no information disclosure) and, under certain conditions, maximal (all data that the base rules authorize to be accessed is also authorized by the generated rules).

Keywords: Materialized views, authorization views, query rewriting.

Abstract

Several techniques and models have been proposed to control access to databases. However, the problem of automatically generating, from access control policies defined over base relations, the access control policies needed to control access to materialized views has not been investigated so far. We choose to express fine-grained access control through authorization views. We investigate this problem by adapting query rewriting techniques. We provide a novel approach to automatically derive access control policies for materialized views from access control policies defined on base relations when views can be expressed as conjunctive queries. We show that our approach is secure (no information disclosure) and, under some conditions, maximal (all data authorized by the base policies remains accessible through the generated policies).

Keywords: Materialized views, authorization views, query rewriting.

∗ This work is partially supported by the Partner University Fund project "Cyberspace Threat Identification, Analysis and Proactive Response" (2012-2015: http://www.facecouncil.org/puf/grantees-on-home-page/grantees-2012-2015/).


1 Introduction

An important requirement of any information management system is to protect data and resources against unauthorized disclosure (confidentiality) and unauthorized or improper modifications (integrity), while at the same time ensuring their accessibility to legitimate users (availability). Enforcing protection therefore requires that every access to a system and its resources be controlled and that all and only authorized accesses can take place [3].

This work deals with data confidentiality in the context of materialized views. Several techniques and models have been proposed in the literature (see, among others, [16, 9, 13, 23, 7]). The general idea is the following: whenever a subject tries to access a data object, the access control mechanism checks against a set of authorizations whether the subject can perform the particular action on the object. The authorizations are usually stated by a security administrator according to the access control policies of the organization. Other techniques are used jointly to ensure data confidentiality; the most common one is encryption [8].

With the use of large systems such as Data Warehouses ([24, 21]) or Distributed Database Systems [5], new security issues arise ([15, 20]). In our work, we focus on the problem of filtering access to materialized views. A materialized view stores the results returned by the corresponding query in a physical table. In many settings, views are materialized in order to optimize access. For instance, in Data Warehouses, they can be used to precompute and store complex aggregations. In distributed data management systems, materialized views are used to replicate data at distributed sites and to synchronize updates performed at several sites. Thus, users can query a materialized view like any other base relation. In this context, ensuring security at the level of materialized views is as important as ensuring security at the level of base relations. The question is then: how should security policies be specified and enforced at the level of a materialized view? In Oracle [1], the owner of the materialized view establishes the new security policies. Users who access the materialized view are subject to these new policies and are not additionally subject to the policies of the underlying base relations of the materialized view. This can cause information disclosure. Our approach consists in defining access control rules on materialized views that comply with the basic ones, i.e., the administrator (or the owner) builds new access rules on a materialized view by taking into account those defined on the base tables used to define the materialized view. In a system containing tens or hundreds of tables controlled by tens or hundreds of rules, it becomes practically impossible for humans to handle such large sets of rules and to identify all the relevant ones.

In this paper, we build on our previous work ([11, 12]) and propose a novel approach that facilitates the administration of access control rules to ensure data confidentiality at the level of materialized views. We derive new access rules from existing access rules over base relations. We consider fine-grained authorization policies that are defined and enforced in the database through authorization views [16]. Authorization views specify the accessible data by projecting specific columns in addition to selecting rows. We primarily focus on conjunctive queries with equalities.

Figure 1 summarizes our approach. The idea is the following: given a set of base relations (with the corresponding authorizations) and a set of materialized view definitions, synthesize a set of authorization views to be attached to the materialized views, such that querying the materialized views through those authorization views does not deliver more information than querying the database through the original ones¹. We propose to generate new authorization views by filtering data using two sets of views: the materialized view definitions and the basic authorization views. For this, we rely on an adaptation of a query rewriting algorithm [14] to the security context.

The rest of this paper is organized as follows. Section 2 discusses authorization views. Section 3 describes our approach. Section 4 introduces the rewriting and atom trees used to analyze the algorithm. In Section 5, we discuss some computational properties of the algorithm; proofs of properties can be found in the appendix. Section 6 summarizes related work, and Section 7 concludes.

Figure 1: Security policies for materialized views: System architecture

2 Datalog for Authorization

The problem of access control to data has been investigated by the data management community. As a consequence, several techniques and models have been proposed to improve data security and ensure confidentiality ([16, 9, 13]). Among those models, we are particularly interested in content-based and fine-grained access control.

Essentially, content-based access control requires that access control decisions be based on data contents. Support for this type of access control has been made possible by the fact that SQL is a language in which most data management operations, such as queries, are based on declarative conditions against data contents. In particular, a common mechanism adopted by relational DBMSs to support content-based access control is the use of views. A view can be considered as a dynamic window able to select subsets of columns and rows; those subsets are specified by a query, referred to as the view definition, which is associated with the name of the view. Authorization views ([16, 25]) are a well-known database technique that provides content-based and fine-grained access control. They are logical tables that specify exactly the accessible data, drawn either from a single table or from multiple tables. Another advantage of authorization views is their flexibility: they can be parameterized ([13, 16]) with information specific to a session, such as the user-id, location, date, or time. These views provide a rule-based framework, where one view definition applies across several users. This allows the administrator to avoid encoding the same policies for each user.

Different ways to use authorization views have been proposed. Traditionally, the user directly queries the appropriate authorization views [9]. An alternative approach achieves "authorization transparency" by using views in a different way [16]: the user writes the query in terms of the base relations, and the system checks the query for validity by determining whether it can be completely rewritten using the authorization views.

¹ Those associated with the base relations from which the views are built.

The goal of our work is to automatically determine the set of "authorization views" that will be attached to materialized views. In this way, users can either query their appropriate authorization views directly, or query the materialized views and let the system rewrite the query using the authorization views. Our proposal is thus independent of the way the materialized views are accessed. Nevertheless, to facilitate the understanding of our approach and without loss of generality, we assume that users can query only their appropriate authorization views.

As we restrict ourselves to conjunctive queries with equality, we use non-recursive Datalog without negation [2] as a formal framework for expressing access control rules.

We assume the existence of three types of symbols: variables, constants and predicate names. p(t1, ..., tn) is a literal, where p is a predicate name with arity n and each ti, for 1 ≤ i ≤ n, is either a constant or a variable. We call the sequence (t1, ..., tn) a tuple with arity n. A rule is a statement of the form:

p(X)← q1(X1), ..., qn(Xn).

where p and each qi, for 1 ≤ i ≤ n, are relation names and X1, ..., Xn are tuples of appropriate arities. Each variable occurring in X must occur in at least one of X1, ..., Xn. We call p(X) the head of the rule, and q1(X1), ..., qn(Xn) the atoms in the body of the rule. A Datalog program is a finite set of Datalog rules [2].

Now, we recall the notion of deductive databases. We limit our discussion to facts and deductive rules. A relation schema is a relation name associated with an arity. Let r be a relation with arity n. The fact that a tuple (t1, ..., tn) is a member of the relation r is expressed by the literal r(t1, ..., tn). A relation instance over r is a finite set of facts over r. Such facts define the extensional database, edb, and r is called an extensional (stored) relation. A database schema R is a finite set of relation schemas. A database instance is a finite set I that is the union of relation instances over r, for r ∈ R. The intensional database, idb, contains deductive rules. Relations defined by deductive rules are called intensional relations [2]. For example:

q(X)← r1(X1), . . . , rn(Xn).

is a rule that defines the (intensional) relation q in terms of the (extensional) relations r1, ..., rn. The interpretation of the rule, denoted q(I), is defined as follows: whenever we can find values for the variables of the rule such that the body holds in I, we deduce the corresponding head fact. In relational database terminology, view definitions and queries are expressed as deductive rules. The semantics of a Datalog program P can be found in [2]. In the following, we recall some definitions.
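The interpretation q(I) can be made concrete with a small evaluator. The following Python sketch is ours, not part of the approach: it assumes that every argument of the rule is a variable (equalities are written by repeating a variable), that the rule is safe (every head variable occurs in the body), and that an instance is a dictionary from relation names to sets of tuples. The name `evaluate_rule` and this encoding are illustrative assumptions; the toy instance reuses the relations of Example 1 (Section 3).

```python
def evaluate_rule(head_vars, body, db):
    """Compute q(I) for a safe rule q(head_vars) <- body over the instance db.

    body is a list of (relation, argument-variables) pairs; db maps each
    relation name to a set of tuples (its facts).
    """
    answers = set()

    def extend(i, binding):
        if i == len(body):
            # The whole body holds in db: deduce the head fact.
            answers.add(tuple(binding[v] for v in head_vars))
            return
        relation, args = body[i]
        for fact in db.get(relation, set()):
            if len(fact) != len(args):
                continue
            candidate = dict(binding)
            # Bind unbound variables; reject facts that contradict a binding.
            if all(candidate.setdefault(v, c) == c for v, c in zip(args, fact)):
                extend(i + 1, candidate)

    extend(0, {})
    return answers


db = {
    "patients": {(1, "ann"), (2, "bob")},
    "emergency": {(1,)},
}
# av(b) <- patients(a, b)
print(sorted(evaluate_rule(("b",), [("patients", ("a", "b"))], db)))  # [('ann',), ('bob',)]
# mv(b) <- patients(a, b), emergency(a)
print(sorted(evaluate_rule(("b",), [("patients", ("a", "b")), ("emergency", ("a",))], db)))  # [('ann',)]
```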

Definition 1 (Query containment) A query q1 is contained in, or subsumed by, a query q2, written q1 ⊑ q2, if for any database instance I, the set of answers to q1 on I is a subset of the set of answers to q2 on I. The two queries are equivalent, denoted q1 ≡ q2, if q1 ⊑ q2 and q2 ⊑ q1.
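For conjunctive queries, containment can be decided syntactically with the classical containment-mapping test (Chandra and Merlin): q1 ⊑ q2 holds if and only if some mapping from the variables of q2 to those of q1 sends the head of q2 onto the head of q1 and every body atom of q2 onto a body atom of q1. The Python sketch below implements this test by backtracking. It assumes queries without constants and uses a hypothetical encoding ((name, head_vars), body); it is only a stand-in and is not claimed to be the subsumption algorithm [4] on which the approach relies.

```python
def is_contained(q1, q2):
    """Decide q1 ⊑ q2 for conjunctive queries via a containment mapping.

    A query is ((name, head_vars), body), body being a list of
    (relation, argument-variables) pairs; constants are not supported.
    """
    (_, head1), body1 = q1
    (_, head2), body2 = q2
    if len(head1) != len(head2):
        return False
    # The mapping h (variables of q2 -> variables of q1) must send head(q2) to head(q1).
    h = {}
    for v2, v1 in zip(head2, head1):
        if h.setdefault(v2, v1) != v1:
            return False

    def search(i, h):
        if i == len(body2):
            return True  # every atom of q2 has an image in body(q1)
        relation, args = body2[i]
        for relation1, args1 in body1:
            if relation1 != relation or len(args1) != len(args):
                continue
            extended = dict(h)
            if all(extended.setdefault(a, b) == b for a, b in zip(args, args1)):
                if search(i + 1, extended):
                    return True
        return False  # no image for atom i under h: backtrack

    return search(0, h)
```

With the views of Example 1, avmv(b) ← patients(a, b), emergency(a) is contained in av(b) ← patients(a, b), but not the converse, since emergency(a) has no image in the body of av.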

Definition 2 (Query expansion) Let q be a query of the form:

$$q(X) \leftarrow v_1(X_1), \ldots, v_n(X_n). \qquad (1)$$

defined in terms of a set of views V, where, for 1 ≤ i ≤ n, each view vi is defined by a rule of the form:

$$v_i(Y_i) \leftarrow r^i_1(Y^i_1), \ldots, r^i_{l_i}(Y^i_{l_i}). \qquad (2)$$

The expansion of (1) using the views of V, denoted expansion(q), is obtained from (1) by replacing every view atom with its definition given by (2) (i.e., the views are unfolded). More precisely, for each atom vi(Xi) in (1), let σ be a mapping that maps the variables Yi to the arguments Xi and every other variable of (2) to a fresh variable; then vi(Xi) is replaced in q by the body of σ(vi).
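One possible implementation of Definition 2 is sketched below in Python, under assumptions of ours: views are given as a dictionary from view names to ((name, head_vars), body), the head variables of each view are pairwise distinct, and fresh variables are generated as `<var>_<k>` (we assume such names do not already occur in the query). The function name `expansion` mirrors the notation of the definition; it is not an existing API.

```python
import itertools


def expansion(query, views):
    """Unfold every view atom of query into the body of its definition."""
    head, body = query
    fresh = itertools.count(1)
    unfolded = []
    for view_name, args in body:
        (_, view_head), view_body = views[view_name]
        # sigma maps the head variables Yi of the view to the arguments Xi ...
        sigma = dict(zip(view_head, args))
        for _, view_args in view_body:
            for v in view_args:
                if v not in sigma:
                    # ... and every other variable of the view to a fresh variable.
                    sigma[v] = f"{v}_{next(fresh)}"
        unfolded.extend(
            (relation, tuple(sigma[v] for v in view_args))
            for relation, view_args in view_body
        )
    return head, unfolded


# Example 3 (Section 3.3): MV and the full-access query q2(y, v) <- mv2(y, v).
MV = {
    "mv1": (("mv1", ("x", "y")), [("patients", ("x", "y"))]),
    "mv2": (("mv2", ("y", "v")), [("treatments", ("y", "z")), ("doctors", ("z", "v"))]),
}
q2 = (("q2", ("y", "v")), [("mv2", ("y", "v"))])
print(expansion(q2, MV)[1])  # [('treatments', ('y', 'z_1')), ('doctors', ('z_1', 'v'))]
```

The result corresponds to the expansion qexp2 of Example 3, with z_1 playing the role of the fresh variable z1.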

3 Rewriting series

In this section, we present our approach for generating a set of authorization views to be attached to materialized views. More formally, we describe our problem as follows: let R be a set of extensional² relations and AV be a set of intensional relations defined in terms of the extensional relations R. AV represents the set of authorization views, that is, AV defines which data users are allowed to access.

Now, suppose we have new intensional relations MV (defined in terms of the extensional relations R). By querying MV instead of AV, the user may get additional information. Our aim is to derive a set of intensional relations AVMV defined in terms of the intensional relations MV. AVMV represents the set of authorization views, that is, AVMV states which data contained in MV users are allowed to access.

Note that the approach must not only give access to data that is accessible through both MV and AV, but also guarantee that the generated views do not give access to more information. The following example shows the difference.

Example 1 Consider the following two sets of views:

MV : mv(b) ← patients(a, b), emergency(a).
AV : av(b) ← patients(a, b).

The view mv returns information about patients who are admitted to the emergency service, and the view av returns information about all patients. Consider the following view:

AVMV : avmv(b)← patients(a, b), emergency(a).

In this case, for all database instances we have avmv ⊑ av and avmv ⊑ mv. However, the view avmv provides new information compared to the view av. Indeed, avmv gives access to the information of only those patients who are admitted to the emergency service. Even though this information is a subset of the information returned by av, it cannot be computed from av: since the attribute a does not appear in the head of av, it is impossible to filter patients according to their admission service. Therefore, having access to avmv gives more information. This is the type of information disclosure we want to prevent.

In this example, there is no set of views AVMV that can give access to information available through both AV and MV.

² Note that our proposal can be extended to allow the use of non-recursive intensional relations in R: it suffices to replace (unfold) each intensional relation by its expansion to obtain a set of extensional relations.


The generated views that express access rules (defined on MV) must be consistent with the access rules expressed by AV. That is, the generated views should not give access to more information than AV. Moreover, they should give access to as much information as possible. This can be summarized by the following two requirements:

– Secure: The generated views should not give access to information that is not allowed by the basic authorization views. We have to guarantee that for each intensional relation qAVMV defined in terms of AVMV, there exist an intensional relation qAV defined in terms of AV and an intensional relation qMV defined in terms of MV such that:

qAVMV ≡ qAV and qAVMV ≡ qMV

– Maximal: The generated views should return as much information as possible while satisfying the first requirement. We have to guarantee that for each intensional relation qR defined in terms of R, if there exist an intensional relation qAV defined in terms of AV and an intensional relation qMV defined in terms of MV such that:

qR ≡ qAV and qR ≡ qMV

then there exists an intensional relation qAVMV defined in terms of AVMV such that:

qR ≡ qAVMV

3.1 Double rewriting

We propose an approach based on an adaptation of a query rewriting algorithm, namely MiniCon [14]. The general idea is to apply the query rewriting algorithm through a series of double rewritings, using the two sets of views AV and MV. Double rewriting with the two sets of views filters the data that are accessible from both AV and MV.

Informally, the algorithm first defines a set of queries Q on MV that provides full access to the data of MV, i.e., for each mvi in MV we define a query qi of the form qi(Xi) ← mvi(Xi). Then, using the query rewriting algorithm, it filters the data of Q with respect to AV and MV. The first step filters the queries Q by removing the data that are not accessible from AV, i.e., by rewriting Q using AV. The generated rewritings RW determine the data of Q accessible from AV. The second step checks whether the data of RW is accessible from MV, by rewriting RW using MV. This second step is necessary because RW may give access to a subset of the data of Q by filtering it with a relation that is not defined in MV. We denote by RW′ the rewritings generated in the second step. Symmetrically, the second rewriting can introduce relations for which no access permission is defined in AV; in other words, we must also check that the data of RW′ is accessible from AV. The algorithm repeats this double rewriting until the data of the generated views is accessible from both MV and AV.

To do this, we propose an adaptation of the MiniCon algorithm to the security context, which adds mechanisms for capturing specific security requirements.

3.2 HMiniCon: adaptation of MiniCon algorithm

The MiniCon algorithm [14] was initially proposed as an efficient method for answering queries using views. It takes as input a query q and a set of views V and computes all possible rewritings of q using the views in V.


In this section, we present the HMiniCon algorithm, which is based on the MiniCon algorithm. We make two modifications to accommodate our requirements. The first one relaxes the condition on the head variables in the MCD formation step. The second one adds some variables to the heads of the rewritings in order to ensure the maximality property. Thus, the HMiniCon algorithm proceeds in three steps:

1. Formation of MCDs³: In this first step, the algorithm determines, for each atom g of a query q, the views that cover g. An atom g′ of a view v covers an atom g of q if there is a mapping δ from vars(q) to vars(v) such that δ(g) = g′. Once the algorithm finds such a partial mapping, it considers the variables of the query, mainly those involved in joins, and tries to find the minimal additional set of query atoms that must be mapped to atoms of v, given that g is mapped to g′. This set of atoms together with the mapping information is called a MiniCon Description (MCD). An MCD C of a query with respect to some view v is represented by a tuple (hC, v(Y)C, ϕC, GC), where:
– hC is a head homomorphism on v;
– v(Y)C is the result of applying hC to v;
– ϕC is a partial mapping from vars(q) to hC(vars(v));
– GC is a subset of the atoms of q that are covered by some atom of hC(v) using the mapping ϕC.
In the basic MiniCon algorithm, an MCD can be used only if it satisfies the following two conditions:

(C1) For each head variable x of q that is in the domain of ϕC, ϕC(x) is a head variable of hC(v).

(C2) If ϕC(x) is an existential variable of hC(v), then for every atom g of q that includes x: (1) all the variables of g are in the domain of ϕC; and (2) ϕC(g) ∈ hC(v).

Unlike MiniCon, HMiniCon uses an MCD as soon as it satisfies condition C2 alone. In other words, if x is a head variable of q, ϕC(x) does not necessarily appear among the head variables of hC(v).

2. Combination of MCDs: This step is the same as in the MiniCon algorithm. Given a query q, a set of views V, and the set of MCDs C for q over the views in V, the only combinations of MCDs that can result in non-redundant rewritings of q are of the form C1, ..., Cl, where:

(D1) G1 ∪ ... ∪ Gl = atoms(q), and

(D2) for all i ≠ j, Gi ∩ Gj = ∅.

3. Determining head variables: After the generation of the rewritings, the third step determines, for each rewriting rw, which variables are in the head of rw. First, for every head variable x of the original query q such that ϕC(x) is a head variable of hC(v), ϕC(x) is a head variable of rw. Then, the variables newly introduced by the views used in rw are also head variables of rw. More precisely, a variable y is said to be newly introduced by a view v if y ∈ headvars(hC(v)) and y ∈ vars(g), where g ∈ atoms(hC(v)) and g is not mapped to any atom of q (y does not appear in the image of ϕ). Formally, for each rewriting rw generated by the algorithm and for each variable y ∈ vars(rw), if y ∉ ϕ(vars(q)), then y is added as a head variable of rw. This modification is required to ensure the maximality of the approach (see Section 5.3).

³ MiniCon Descriptions.
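The combination step amounts to enumerating the families of MCDs whose covered sets G partition atoms(q). The following Python sketch does this by always extending a partial combination with an MCD that covers the lowest uncovered atom, so that each combination is produced exactly once. For illustration only, it reduces an MCD to a pair (name, G), with G a set of atom indices; the other components (h, v(Y), ϕ) are ignored, and building the actual rewriting from a combination is not shown.

```python
def combine_mcds(mcds, n_atoms):
    """Return the combinations of MCDs satisfying (D1) and (D2).

    mcds is a list of (name, G) pairs, G being the set of indices of the
    query atoms covered by the MCD; the query has atoms 0 .. n_atoms - 1.
    """
    all_atoms = frozenset(range(n_atoms))
    combinations = []

    def search(covered, chosen):
        if covered == all_atoms:          # (D1): the union is atoms(q)
            combinations.append(chosen)
            return
        first = min(all_atoms - covered)  # lowest atom still uncovered
        for name, g in mcds:
            # (D2): the new MCD must not overlap what is already covered.
            if first in g and not (g & covered):
                search(covered | g, chosen + [name])

    search(frozenset(), [])
    return combinations
```

For instance, with MCDs C1 covering atom 0, C2 covering atom 1 and C3 covering atoms 0 and 1 of a two-atom query, `combine_mcds` returns [['C1', 'C2'], ['C3']].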


Let us illustrate by examples why we need to relax the condition on the head variables (Example 2) and why some variables must be added to the heads of the rewritings to ensure the maximality of the results (Example 3, Section 3.3).

Example 2 Consider the following query q, which makes a copy of the table patients, and the authorization view av, which grants access to the information (here, b) of the patients table.

q(x, y) ← patients(x, y).
av(b) ← patients(a, b).

We propose to rewrite q using the authorization view av in order to determine, given av, which information in q is accessible. If we apply the original MiniCon algorithm, the authorization view av is considered irrelevant, because the condition on the head variables (C1) is not satisfied: the variable x appears in the atom patients of q and in the head of q, but ϕ(x) does not appear in the head of av, where ϕ = {x → a, y → b} is the mapping of the atom patients(x, y) of q to the atom patients(a, b) of av. If the authorization view is not considered relevant, no rewriting is generated, i.e., no information in q can be accessed. This is too restrictive, since y is accessible through av, which keeps b in its head.
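Conditions C1 and C2 can be checked mechanically. The Python sketch below uses our own encoding and assumes that the head homomorphism hC is the identity, so that hC(v) is the view itself; ϕC is a dictionary from query variables to view variables. On the query and view of Example 2, it confirms that C1 fails while C2 holds, which is why MiniCon discards av whereas HMiniCon keeps it.

```python
def apply_mapping(phi, atom):
    relation, args = atom
    return relation, tuple(phi[a] for a in args)


def satisfies_c1(query_head, phi, view_head):
    """(C1): each head variable of q in dom(phi) maps to a head variable of the view."""
    return all(phi[x] in view_head for x in query_head if x in phi)


def satisfies_c2(query_body, phi, view_head, view_body):
    """(C2): if phi(x) is existential in the view, every atom g of q containing x
    has all its variables in dom(phi), and phi(g) is an atom of the view."""
    for x, y in phi.items():
        if y in view_head:
            continue  # phi(x) is a head variable: C2 says nothing about x
        for atom in query_body:
            if x in atom[1]:
                if not all(v in phi for v in atom[1]):
                    return False
                if apply_mapping(phi, atom) not in view_body:
                    return False
    return True


# Example 2: q(x, y) <- patients(x, y) and av(b) <- patients(a, b).
phi = {"x": "a", "y": "b"}
print(satisfies_c1(("x", "y"), phi, ("b",)))  # False: phi(x) = a is not in the head of av
print(satisfies_c2([("patients", ("x", "y"))], phi, ("b",), [("patients", ("a", "b"))]))  # True
```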

3.3 HMiniCon+

In this section, we present our algorithm. We illustrate through an example the iterative application of the double rewriting algorithm and the need to add some variables to the heads of the rewritings in order to ensure the maximality of the generated views.

Example 3 Let us consider the following two sets of views (MV and AV):

MV : mv1(x, y) ← patients(x, y).
     mv2(y, v) ← treatments(y, z), doctors(z, v).

AV : av1(a, c) ← patients(a, b), treatments(b, c).
     av2(c, d) ← doctors(c, d).

The algorithm starts by defining a set Q of queries that gives full access to MV and rewrites it using the authorization views AV. Let Q be the following set:

Q : q1(x, y) ← mv1(x, y).
    q2(y, v) ← mv2(y, v).

In order to perform the rewriting, the queries in Q must be expanded, i.e., each atom in each query is replaced by its definition. We have:

qexp1(x, y) ← patients(x, y).
qexp2(y, v) ← treatments(y, z1), doctors(z1, v).

where z1 is a fresh variable introduced by the expansion. In the following, we use the notation var_i (e.g., y2, v1, z4, ...) for fresh variables.

In the following, we consider only the rewriting process of the query q1 (the process is the same for each query qi in Q). We describe two scenarios of the rewriting process of q1. The first one (Figure 2) is the application of HMiniCon without the third step. The second scenario (Figure 3) is the application of HMiniCon with the third step, i.e., adding some variables to the heads of the rewritings.

Figure 2: Scenario 1: HMiniCon+ without the third step

Figure 3: Scenario 2: HMiniCon+ with the third step

Only one rewriting of qexp1 is computed (line 2). The rewriting body is the same in the two scenarios, but the head differs: in the second scenario, we add the variable z2 to the head. We consider z2 as a new variable introduced by the view (it does not appear in vars(q1)): it was introduced by the atom treatments, which was itself introduced by the view (this atom is not mapped to any atom of the query).
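Step 3 of HMiniCon can be sketched in Python as follows. The sketch assumes that the view variables have already been renamed in the rewriting, so that ϕ maps query variables directly to the variables of rw, and that the body of rw consists of view atoms (hence every variable of rw is a head variable of some hC(v)). The rewriting rw1 ← av1(x, z2) used in the usage line is our reading of the rewriting discussed above, with the existential variable b of av1 renamed b1; it is illustrative only.

```python
def rewriting_head(query_head, phi, rewriting_body):
    """Head variables of a rewriting rw, following step 3 of HMiniCon.

    phi maps query variables to the variables used in rw; rewriting_body
    lists the view atoms of rw (so their variables are view head variables).
    """
    rw_vars = []
    for _, args in rewriting_body:
        for v in args:
            if v not in rw_vars:
                rw_vars.append(v)
    head = []
    # Head variables x of q whose image phi(x) is a head variable of the view.
    for x in query_head:
        if x in phi and phi[x] in rw_vars and phi[x] not in head:
            head.append(phi[x])
    # Variables newly introduced by the views: those outside phi(vars(q)).
    image = set(phi.values())
    for y in rw_vars:
        if y not in image and y not in head:
            head.append(y)
    return tuple(head)


# qexp1(x, y) <- patients(x, y) rewritten with av1: y is mapped to the existential b1.
print(rewriting_head(("x", "y"), {"x": "x", "y": "b1"}, [("av1", ("x", "z2"))]))  # ('x', 'z2')
```

Without the second loop (the third step), only x would be kept, as in scenario 1.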

In the same manner, the algorithm rewrites the expansion of rw1 using the set MV. Note that in scenario 2 (line 4) the variable z2 is not in the head. This is because the views used in this rewriting do not project out the variable.

In order to determine whether, after the double rewriting, the algorithm has achieved its goal (i.e., determined the data accessible from the two sets of views), we check for each generated rewriting whether it contains the query, i.e., whether q1 ⊑ rw2. For this, we resort to a subsumption algorithm [4]. In this case, one double rewriting is not sufficient (rw2 does not subsume the query q1), so the double rewriting is applied again (lines 6 to 9). At this step, the rewriting subsumes the query. The algorithm stops and returns the rewriting as a correct view.

The obtained result (line 9) shows the necessity of keeping some variables in the heads of the rewritings. Indeed, in scenario 1, we conclude that only the variable x can be accessed, whereas in scenario 2 both variables x and v1 can be accessed. The second scenario reflects the fact that (x, v1) can be accessed from the two sets of views.

More generally, the algorithm takes as input the two sets of views AV and MV. For the first iteration, we define the set Q that specifies full access to MV. The algorithm starts by rewriting each qi in Q using AV; the result is a set of rewritings RWqi. This first step determines which data of qi is accessible from AV. The second step checks whether this data is also accessible from MV. For this, the algorithm rewrites each rwj of RWqi using MV. We denote by RW′qi the rewritings generated in this second rewriting step. This double rewriting step is performed by Algorithm 1.

Algorithm 1: Double rewriting
Input:  q: the query to rewrite
        AV: set of authorization views
        MV: set of materialized views
Output: RW: set of rewritings

RW = ∅
qexp = expansion(q)
RWq = HMiniCon(qexp, AV)
foreach rewriting rwj of RWq do
    rwexpj = expansion(rwj)
    RWrwj = HMiniCon(rwexpj, MV)
    add RWrwj to RW
end
return RW
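Algorithm 1 translates almost line by line into Python. In this sketch, HMiniCon and the expansion function are passed in as parameters, since neither is provided here; their signatures (`hminicon(query, views)` returning a list of rewritings, `expansion(query)` returning a query) are assumptions made for illustration.

```python
def double_rewriting(q, av, mv, hminicon, expansion):
    """Algorithm 1: rewrite q with AV, then rewrite every result with MV.

    hminicon and expansion are caller-supplied placeholders, not a library API.
    """
    rw = []                                # RW
    q_exp = expansion(q)
    for rw_j in hminicon(q_exp, av):       # RWq: data of q accessible from AV
        rw_exp_j = expansion(rw_j)
        # RWrwj: the part of that data which is also accessible from MV
        rw.extend(hminicon(rw_exp_j, mv))
    return rw
```

With stub functions that only record the calls, e.g. `expansion = lambda q: f"exp({q})"` and `hminicon = lambda q, views: [f"{views}[{q}]"]`, the call `double_rewriting("q", "AV", "MV", hminicon, expansion)` returns `["MV[exp(AV[exp(q)])]"]`, which makes the AV-then-MV order explicit.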

After this step, the algorithm checks whether the double rewriting using AV and MV has filtered out some data of Q. For this, for each rewriting generated by the double rewriting of qi, we check whether it contains qi⁴. Recall that HMiniCon generates rewritings that do not necessarily have the same schema as the query (we relaxed the condition on the head variables and added some variables). Therefore, to check containment, the algorithm selects only the comparable rewritings (CRW) and verifies for each comparable rewriting crwi whether crwi subsumes qi. A comparable rewriting [22] is a rewriting that has the same schema as the query. The subsumption test verifies that no tuple has been filtered out by the double rewriting; in other words, any tuple of qi can be accessed through both AV and MV. In this case, crwi is returned as a new authorization view on MV. If crwi does not subsume qi, then one of the two rewriting steps has filtered out some data. In this case, the double rewriting algorithm is applied again, taking as the set of queries to be rewritten each crwi that does not subsume its query, together with the other (non-comparable) rewritings. This iteration of double rewriting is performed by Algorithm 2.

4 Rewriting and atom trees

In order to investigate some properties of our algorithm, we need to introduce two tree structures resulting from the application of the HMiniCon+ algorithm. The first one is the rewriting tree: a rewriting tree RT is associated with each query to be rewritten. The second one is the atom tree.

⁴ We rely on the subsumption algorithm of [4] for this check.

Algorithm 2: HMiniCon+ Algorithm
Input:  AV: set of authorization views on base relations
        MV: set of materialized views
Output: AVMV: set of authorization views on MV

Define Q: set of queries which give full access to MV
AVMV = ∅
while Q is not empty do
    pick and remove qi from Q
    RWqi = DoubleRewriting(qi, AV, MV)
    compute CRWqi
    foreach rewriting rwk of RWqi do
        if rwk in CRWqi then
            if rwk subsumes qi then add rwk to AVMV
            else add rwk to Q
        else add rwk to Q
    end
end
return AVMV
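Algorithm 2 can likewise be sketched as a worklist loop in Python. Here `double_rewriting`, `comparable` and `subsumes` are caller-supplied functions (assumptions of this sketch), `subsumes(rw, q)` meaning q ⊑ rw. The `max_steps` guard is our addition: termination is not guaranteed in general, and Section 5.2 studies the conditions under which it holds.

```python
from collections import deque


def hminicon_plus(full_access_queries, double_rewriting, comparable, subsumes,
                  max_steps=10_000):
    """Algorithm 2 (HMiniCon+): iterate double rewritings until a rewriting is
    comparable with the query it comes from and subsumes it."""
    queue = deque(full_access_queries)   # Q
    avmv = []                            # AVMV
    steps = 0
    while queue:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("no fixpoint reached within max_steps")
        q_i = queue.popleft()            # pick (and remove) qi from Q
        for rw_k in double_rewriting(q_i):
            if comparable(rw_k, q_i) and subsumes(rw_k, q_i):
                avmv.append(rw_k)        # nothing of qi was filtered out
            else:
                queue.append(rw_k)       # rewrite rw_k again
    return avmv
```

A toy run with integers standing in for queries, `hminicon_plus([4], lambda q: [q // 2], lambda rw, q: True, lambda rw, q: rw == q)`, passes through 2, 1 and 0 before 0 "subsumes" itself, and returns `[0]`.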

Definition 3 (Rewriting tree RT) Let q be a query to be rewritten, and let AV and MV be two sets of views. The rewriting tree associated with q is defined as follows:

– The root is the query q.
– The nodes of depth k + 1 are the rewritings generated by HMiniCon by rewriting the nodes of depth k using the set AV or MV. A node nk+1 is a child of a node nk if nk+1 is a rewriting of nk.

In order to optimize the rewriting process, we propose to replace, at each step of the rewriting process, some rewritings by equivalent queries. This modification does not change the result that can be obtained with the approach: rewriting a query q or an equivalent query q′ using the same set of views returns the same information. We will see in Section 5.2 the importance of this modification for guaranteeing the termination of the approach under some conditions.

From each branch⁵ of a rewriting tree associated with a query, we define an atom tree. This tree contains information about the rewriting of each atom (e.g., the view used to rewrite it, the query atom it was mapped to). This information is used to ensure the termination of our algorithm.

Definition 4 (Atom tree) Given a branch X = B0, B1, . . . of a rewriting tree RT, the atom tree AT(RT), or simply AT, of RT is defined as follows:

– The root is an anonymous node r.
– Nodes at depth k + 1 are occurrences of atoms of Bk, denoted gk.
– gk+1 is a child of gk of type:
  – Direct: if there is an MCD C from the rewriting Bk+1 such that ϕC(gk) = gk+1;
  – Indirect: if gk+1 belongs to the expansion of the view v used to rewrite gk and gk+1 has no Direct parent.

Since in the HMiniCon algorithm a view can be used to rewrite more than one atom, a node gk+1 can have multiple Indirect parents. In this case, in order to remain in a tree configuration, we arbitrarily choose one Indirect parent of gk+1 from GC, where GC is the set of atoms covered by the view v. We note:
– view(gk+1) = v;
– cpos(gk+1) the position of the atom matching gk+1 in v;
– ppos(gk+1) the position of the atom matching gk in v;
– type(gk+1) = Direct or Indirect.

⁵ A path from the root to a leaf.

When parent(g′) = g and type(g′) = Direct (resp. type(g′) = Indirect), we use the notation $g \xrightarrow{dir.} g'$ (resp. $g \xrightarrow{ind.} g'$), and $\xrightarrow{dir.}^{*}$ stands for the reflexive transitive closure of $\xrightarrow{dir.}$.
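An atom tree can be stored with an ordinary node structure. The following Python sketch is one possible encoding, with field names chosen by us to mirror view(g), cpos(g), ppos(g) and type(g). The function `direct_closure(g)` returns the nodes g′ such that g →dir.* g′: it follows only Direct edges and includes g itself (reflexivity).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # nodes are compared by identity, not by content
class AtomNode:
    """A node g of an atom tree (Definition 4); field names are illustrative."""
    atom: tuple                     # the atom occurrence, e.g. ("r1", ("x", "y"))
    view: Optional[str] = None      # view(g)
    cpos: Optional[int] = None      # cpos(g)
    ppos: Optional[int] = None      # ppos(g)
    kind: Optional[str] = None      # type(g): "Direct", "Indirect" or None
    children: List["AtomNode"] = field(default_factory=list)

    def add_child(self, child: "AtomNode") -> "AtomNode":
        self.children.append(child)
        return child


def direct_closure(g: AtomNode) -> List[AtomNode]:
    """All g' such that g dir.->* g' (reflexive transitive closure of dir.->)."""
    reached, stack = [g], [g]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child.kind == "Direct":
                reached.append(child)
                stack.append(child)
    return reached
```

Because the structure is a tree, no visited set is needed: each node is reached at most once.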

Example 4 Assume the following two sets of views:

MV: mv1(x, y) ← r1(x, y), r3(y, z).
    mv2(x, y) ← r2(x, y).
    mv3(x, y) ← r3(x, y).

AV: av1(x, y) ← r1(x, y), r2(y, z).
    av2(x, y) ← r2(x, y).
    av3(x, y) ← r3(x, y).

The above MV and AV, together with the content of Figure 4, will be used as an example to illustrate the upcoming definitions. Figure 4 shows the atom tree of the query q(x, y) ← r1(x, y), r3(y, z) for some branch of the rewriting tree.

5 Properties of HMiniCon+ Algorithm

5.1 Security property

Property 1 (Security of views) Let AVMV be a set of generated views by HMiniCon+

and qAVMV be a query defined on AVMV. Then, there exists a query qAV defined on AVand a query qMV defined on MV such that qAVMV ≡ qAV ≡ qMV .

Proof Let $rw^l$ be a view generated by HMiniCon+. By definition,

$$rw^{l-2} \equiv rw^{l-1} \equiv rw^{l} \tag{3}$$

where $rw^{l-2}$ is an equivalent rewriting defined on MV and $rw^{l-1}$ is an equivalent rewriting defined on AV. Recall that $l-2$, $l-1$, $l$ are depths in the rewriting tree (Definition 3).

Now we have to show that for each query $q^{AVMV}$ on AVMV, there exist an equivalent query $q^{MV}$ on MV and an equivalent query $q^{AV}$ on AV. Consider the following query on AVMV:

$$q^{AVMV}: \; q(x) \leftarrow avmv_1(x_1), \ldots, avmv_n(x_n).$$

We define $q^{MV}$ as follows:

$$q^{MV} = expansion(q^{AVMV}).$$

Recall that the generated views are defined on MV. Hence, $q^{AVMV} \equiv q^{MV}$.

From (3), for each $avmv_i(x_i)$ we have $avmv_i(x_i) \equiv rw^{l-1}_i(x_i)$, where $rw^{l-1}_i(x_i)$ is defined on AV. We then obtain the query $q'(x) \leftarrow rw^{l-1}_1(x_1), \ldots, rw^{l-1}_n(x_n)$, with

$$q^{AVMV} \equiv q' \tag{4}$$

We define $q^{AV}$ as follows: $q^{AV} = expansion(q')$. We have

$$q' \equiv q^{AV} \tag{5}$$

From (4) and (5), it follows that $q^{AVMV} \equiv q^{AV}$, where $q^{AV}$ is a query defined on AV. □
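Property 1 and most of the proofs below reason about equivalence of conjunctive queries, witnessed by morphisms (containment mappings) in both directions [2]. As a hedged illustration, here is the classical backtracking test for constant-free conjunctive queries; the query encoding and the names `find_hom` and `equivalent` are ours, not part of HMiniCon+.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Query = Tuple[Tuple[str, ...], List[Atom]]   # (head variables, body atoms)

def _extend(mapping: Dict[str, str], src: Tuple[str, ...],
            dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Extend `mapping` so that src[i] -> dst[i] for every i, or return None on a clash."""
    new = dict(mapping)
    for s, d in zip(src, dst):
        if new.setdefault(s, d) != d:
            return None
    return new

def find_hom(src: Query, dst: Query) -> Optional[Dict[str, str]]:
    """Containment mapping from src to dst: the head of src goes onto the head of dst
    and every body atom of src onto some body atom of dst (backtracking search)."""
    if len(src[0]) != len(dst[0]):
        return None
    start = _extend({}, src[0], dst[0])

    def search(i: int, mapping: Optional[Dict[str, str]]) -> Optional[Dict[str, str]]:
        if mapping is None:
            return None
        if i == len(src[1]):
            return mapping
        pred, args = src[1][i]
        for dpred, dargs in dst[1]:
            if dpred == pred and len(dargs) == len(args):
                found = search(i + 1, _extend(mapping, args, dargs))
                if found is not None:
                    return found
        return None

    return search(0, start)

def equivalent(q1: Query, q2: Query) -> bool:
    """Two conjunctive queries are equivalent iff mappings exist in both directions."""
    return find_hom(q1, q2) is not None and find_hom(q2, q1) is not None
```

For example, q(x, y) ← r1(x, y), r3(y, z1) and q(x, y) ← r1(x, y), r3(y, z), r3(y, w) are equivalent (the second contains a redundant atom), whereas dropping r3 altogether yields a strictly larger query. The search is exponential in the worst case, as expected for conjunctive query containment.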

5.2 Identifying termination conditions

It is important to show that HMiniCon+ terminates. To this end, we have to prove that the rewriting tree associated with each query is finite. We characterize a specific set of atoms and show that the query obtained by removing them is equivalent to the original one. As stated in the definition of the rewriting tree, we then replace the query by this equivalent query. To characterize this set of atoms, we introduce the following definitions and properties.

The first definition introduces a partitioning function on variables, which records how the variables of a given atom are unified.

Definition 5 (Partitioning function) Let $g$ be a node in $AT$⁶, with arguments $vars(g) = x_1, \ldots, x_n$. $Part_g$ is the partitioning function of $vars(g)$ defined as follows:

$$Part_g = \big\{ \{\, j \mid x_i = x_j,\ 1 \le j \le n \,\} \;\big|\; 1 \le i \le n \big\}$$

Remark: for a given atom, the number of possible partitionings is finite; for an atom with $n$ arguments, it is at most the number of partitions of an $n$-element set.
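Definition 5 and the remark above can be checked mechanically. A small sketch, assuming an atom is given by the tuple of its arguments and positions are numbered from 1; `part` and `all_partitionings` are hypothetical helper names. Enumerating all argument tuples over $n$ symbols shows that an atom with $n$ arguments admits exactly $B_n$ partitionings (5 for $n = 3$).

```python
from itertools import product
from typing import FrozenSet, Hashable, Set, Tuple

Partition = FrozenSet[FrozenSet[int]]

def part(args: Tuple[Hashable, ...]) -> Partition:
    """Part_g: for each position i, the set of positions j that hold the same variable."""
    n = len(args)
    return frozenset(
        frozenset(j for j in range(1, n + 1) if args[j - 1] == args[i - 1])
        for i in range(1, n + 1)
    )

def all_partitionings(n: int) -> Set[Partition]:
    """Every Part_g that an atom with n arguments can have (n variable names suffice)."""
    return {part(args) for args in product(range(n), repeat=n)}
```

For instance, `part(("x", "y", "x"))` groups positions 1 and 3 together and leaves position 2 alone.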

We now introduce the node history. For each node, we keep track of its ancestors that were generated indirectly (additional atoms) and of the unifications made throughout the rewriting process.

Definition 6 (History) For each node $g$ in $AT$ other than the root, $History(g)$ is a list defined as follows:

– If $g$ is a child of the root, then $History(g) = [(pos, Part_g)]$, where $pos$ is the position of $g$ in the query;

– If $type(g) = Indirect$, then $History(g) = History(parent(g)) + [(H_g, Part_g)]$, where $H_g = [view(g), cpos(g), ppos(g)]$;

– If $type(g) = Direct$, then:
  – if $Part_g = Part_{parent(g)}$, then $History(g) = History(parent(g))$;
  – otherwise, let $History(parent(g)) = L + [(H_{parent(g)}, Part_{parent(g)})]$; then $History(g) = L + [(H_{parent(g)}, Part_g)]$.

6. See Definition 4.
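Definition 6 translates almost line by line into a recursive function. The following sketch is our own reading of it: the `Node` class is hypothetical, $H_g$ is stored as a tuple (rather than a list) so that history entries are hashable, and `has_duplicate_pair` tests the hypothesis used in Theorem 1.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Partition = FrozenSet[FrozenSet[int]]

@dataclass(eq=False)
class Node:
    """Hypothetical atom-tree node carrying only what Definition 6 needs."""
    part: Partition                       # Part_g (Definition 5)
    parent: Optional["Node"] = None       # None for the children of the root
    kind: Optional[str] = None            # "Direct" or "Indirect"
    pos: int = 0                          # position in the query (children of the root)
    view: str = ""                        # view(g)
    cpos: int = 0                         # cpos(g)
    ppos: int = 0                         # ppos(g)

def history(g: Node) -> List[Tuple[object, Partition]]:
    if g.parent is None:                                  # child of the root
        return [(g.pos, g.part)]
    parent_hist = history(g.parent)
    if g.kind == "Indirect":                              # append (H_g, Part_g)
        return parent_hist + [((g.view, g.cpos, g.ppos), g.part)]
    if g.part == g.parent.part:                           # Direct, same unifications
        return parent_hist
    *prefix, (h_parent, _) = parent_hist                  # Direct, new unifications:
    return prefix + [(h_parent, g.part)]                  # replace the last partition

def has_duplicate_pair(hist: List[Tuple[object, Partition]]) -> bool:
    """True iff some (H, Part) pair occurs twice, which Theorem 1 excludes."""
    return len(hist) != len(set(hist))
```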

Figure 4 shows the histories of the nodes r3(y, z5) and r3(y, z3).

Theorem 1 (Termination under constraints) Consider a query $q$ and two sets of views AV and MV. If, for every branch $X$ of the rewriting tree $RT(q)$ generated by HMiniCon+(q, AV, MV) and for every node $g$ of the atom tree $AT$ of $X$, $History(g)$ does not contain any duplicate pair, then $RT$ is finite.

In order to prove theorem 1, we introduce a series of intermediate definitions and lemmas.

Definition 7 (Virtual and real nodes) Given a branch $X$ of a tree $RT$ and the atom tree $AT(X)$ of $X$, a node $g^k_i$, where $k$ is the depth of the node in $AT$, is said to be a virtual node if there is another node $g^k_j$ ($j \ne i$) such that $History(g^k_i) = History(g^k_j)$ and $type(g^k_i) = Indirect$. A node $g^k_i$ is said to be a real node if it is not a virtual node.

In Figure 4, virtual nodes are shown as colored nodes and real nodes as uncolored nodes.

Having classified nodes as real or virtual, we define the effective rewriting tree, which replaces, at each step, every rewriting with its real rewriting, that is, the rewriting obtained by removing the atoms associated with virtual nodes.

Definition 8 (Effective rewriting tree) Consider a branch $X$ of a tree $RT$, the atom tree $AT(X)$ of $X$, and an element $rw^l$ of $X$. Each atom of $rw^l$ is associated with a node of $AT(X)$. We define $real(AT(X)) = AT(X) - G_V$, where $G_V$ is the set of virtual nodes of $AT(X)$. The effective rewriting tree, denoted $real(RT)$, is obtained by replacing each rewriting $rw^l$ in $RT$ by $real(rw^l)$, where each atom of $real(rw^l)$ is associated with a node $g^l_i$ of $real(AT(X))$.
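Definitions 7 and 8 can be sketched as a grouping step followed by a filter. In this illustration, which is not the authors' code, each node is given as a hypothetical tuple (identifier, depth, hashable history, type), and a rewriting is a list of (atom, node identifier) pairs.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

NodeInfo = Tuple[str, int, Hashable, str]     # (node id, depth, history, type)

def virtual_nodes(nodes: List[NodeInfo]) -> Set[str]:
    """Definition 7: an Indirect node is virtual if another node at the same depth
    has the same history."""
    groups: Dict[Tuple[int, Hashable], List[str]] = defaultdict(list)
    for node_id, depth, hist, _ in nodes:
        groups[(depth, hist)].append(node_id)
    return {node_id for node_id, depth, hist, kind in nodes
            if kind == "Indirect" and len(groups[(depth, hist)]) > 1}

def real(rewriting: List[Tuple[str, str]], virtual: Set[str]) -> List[Tuple[str, str]]:
    """Definition 8: real(rw) keeps the (atom, node id) pairs whose node is real."""
    return [(atom, node_id) for atom, node_id in rewriting if node_id not in virtual]
```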

Property 2 (Effective rewriting characterization) Given a branch $X$ of a tree $RT(q)$ and an element $rw^l$ of $X$, $rw^l \equiv real(rw^l)$.

To prove $real(rw^l) \Rightarrow rw^l$, we define a mapping $\gamma$ from $rw^l$ to $real(rw^l)$ and prove that it is a morphism [2]. To build $\gamma$, we introduce the mappings induced by the rewriting and atom trees.

Definition 9 (Mapping from $rw^k$ to $rw^{k+1}$) From the rewriting step, we define a mapping $\delta_k$ from $rw^k$ (the query) to $rw^{k+1}$ (the rewriting) such that $rw^{k+1} \Rightarrow \delta_k(rw^k)$. For each atom $g^k$ in $rw^k$, there is an atom $g^{k+1}$ in $rw^{k+1}$ with $g^k \xrightarrow{dir.} g^{k+1}$ and $g^{k+1} = \delta_k(g^k)$, where $\delta_k$ maps the variables of $g^k$ to variables of $g^{k+1}$ as follows, with $v = view(g^{k+1})$ and $\sigma$ the expansion mapping of $v$:
– $\delta_k(x) = x$ if $x \in HeadVars(\sigma(v))$;
– $\delta_k(x) = y$ if $\sigma(x) = y$ and $y \in HeadVars(\sigma(v))$;
– $\delta_k(x) = y$, where $y$ is a fresh variable, otherwise.

Definition 10 (Mapping from $rw^k$ to $rw^l$) Let $rw^k$ and $rw^l$ be two rewritings in $X$, where $X$ is a branch of a tree $RT(q)$ and $k < l$. From Definition 9, we deduce that there is a mapping $\delta_{k \to l}$ from $rw^k$ to $rw^l$, with

$$\delta_{k \to l}(rw^k) = \delta_{l-1} \circ \delta_{l-2} \circ \cdots \circ \delta_k(rw^k),$$

such that $rw^l \Rightarrow \delta_{k \to l}(rw^k)$.

Let $g^l$ be an atom in $rw^l$ and $g^k$ an atom in $rw^k$ such that $type(g^i) = Direct$ for all $k < i \le l$ (i.e., $g^k \xrightarrow{dir.}{}^{*} g^l$). Then $g^l = \delta_{k \to l}(g^k)$.
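The composed mapping of Definition 10 is an ordinary composition of substitutions. A minimal sketch, assuming each $\delta_i$ is given as a Python dictionary on variable names and that variables outside a dictionary's domain are left unchanged; `delta_k_to_l` and the example substitutions are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Tuple

Subst = Dict[str, str]
Atom = Tuple[str, Tuple[str, ...]]

def then(f: Subst, g: Subst) -> Subst:
    """g ∘ f on the domain of f; variables outside dom(g) are left unchanged by g."""
    return {x: g.get(y, y) for x, y in f.items()}

def delta_k_to_l(deltas: List[Subst]) -> Subst:
    """delta_{k->l} = delta_{l-1} ∘ ... ∘ delta_k, for deltas = [delta_k, ..., delta_{l-1}] (k < l)."""
    return reduce(then, deltas)

def apply(delta: Subst, atom: Atom) -> Atom:
    pred, args = atom
    return pred, tuple(delta.get(a, a) for a in args)
```

Here `reduce(then, [d1, d2, d3])` evaluates `then(then(d1, d2), d3)`, so the leftmost substitution is applied first, as in $\delta_{l-1} \circ \cdots \circ \delta_k$.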

Definition 11 (Mapping from $g^l$ to $g^k$) Let $g^l$ be an atom of $rw^l$ corresponding to a virtual node. By definition, there exists an atom $g'^l$ in $rw^l$ such that $History(g^l) = History(g'^l)$ and $type(g'^l) = Direct$. There is an atom $g^k$ in $rw^k$, with $k < l$ and $g^k \xrightarrow{dir.}{}^{*} g'^l$, such that $History(g^k) = History(g'^l)$ and $type(g^k) = Indirect$. Hence $History(g^k) = History(g^l)$ and $type(g^k) = type(g^l) = Indirect$, which by definition means that $view(g^l) = view(g^k) = v$. We define a mapping $\delta^{l \to k}_{g^l}$ from $Vars(g^l)$ to $Vars(g^k)$ such that $\delta^{l \to k}_{g^l}(g^l) = g^k$ as follows:
– if $x$ is not a fresh variable in $rw^l$, then $\delta^{l \to k}_{g^l}(x) = x$;
– otherwise, $x$ was introduced as a fresh variable in the expansion step of $v$, and $\delta^{l \to k}_{g^l}(x) = y$, where $y$ is the fresh variable of $g^k$ introduced in the same way in the expansion of $v$.

Figure 4 shows an example of such a mapping, from r3(y, z4) to r3(y, z3).
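A sketch of $\delta^{l \to k}_{g^l}$ from Definition 11, applied to the Figure 4 case r3(y, z4) to r3(y, z3). The definition pairs each fresh variable of $g^l$ with the fresh variable of $g^k$ "introduced in the same way"; here we assume this correspondence is given by argument position, which is our simplification. `delta_back` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]

def delta_back(g_l: Atom, g_k: Atom, fresh: Set[str]) -> Dict[str, str]:
    """Non-fresh variables are kept; a fresh variable of g^l goes to the variable at the
    same argument position of g^k (positional correspondence is an assumption)."""
    (pred_l, args_l), (pred_k, args_k) = g_l, g_k
    if pred_l != pred_k or len(args_l) != len(args_k):
        raise ValueError("g^l and g^k must be occurrences of the same predicate")
    mapping: Dict[str, str] = {}
    for a_l, a_k in zip(args_l, args_k):
        if a_l not in fresh and a_l != a_k:
            raise ValueError(f"non-fresh variable {a_l} must be kept unchanged")
        target = a_k if a_l in fresh else a_l
        if mapping.setdefault(a_l, target) != target:
            raise ValueError(f"variable {a_l} would get two images")
    return mapping
```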

Lemma 3 If $x$ is a fresh variable in $rw^l$ and $x$ appears in two distinct atoms $g^l$ and $g'^l$ that correspond to virtual nodes, then $\delta^{l \to k}_{g^l}(x) = \delta^{l \to k}_{g'^l}(x)$.

Definition 12 (Mapping $\gamma$) The mapping $\gamma$ from $vars(rw^l)$ to $vars(real(rw^l))$ is defined as follows:
– if $x$ is not a fresh variable, or if there is an atom $g^l$ corresponding to a real node such that $x \in vars(g^l)$, then $\gamma(x) = x$;
– otherwise, $x \in vars(g^l)$ for some $g^l$ corresponding to a virtual node, and we set $\gamma(x) = \delta_{k \to l}(\delta^{l \to k}_{g^l}(x))$; by Lemma 3, this value does not depend on the choice of $g^l$.

Figure 4 illustrates the mappings defined above (e.g., $\delta_{1 \to 2}$, $\delta^{5 \to 3}_{r_3}$, $\gamma$).
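The construction of $\gamma$ in Definition 12 can be sketched as follows, with dictionaries standing for the mappings of Definitions 10 and 11. All inputs are hypothetical: in particular, the variable `zk` standing for the fresh variable of $g^k$ is invented for the example and does not come from Figure 4. The final check in the usage mirrors the proof of Property 2: every atom of $rw^l$ is sent by $\gamma$ to an atom of $real(rw^l)$.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]

def build_gamma(real_atoms: List[Atom], virtual_atoms: List[Atom], fresh: Set[str],
                back: Dict[Atom, Dict[str, str]], forward: Dict[str, str]) -> Dict[str, str]:
    """gamma(x) = x if x is not fresh or occurs in a real atom; otherwise
    gamma(x) = forward(back[g](x)) for a virtual atom g containing x.
    `back[g]` plays the role of delta^{l->k}_g and `forward` of delta_{k->l}."""
    in_real = {x for _, args in real_atoms for x in args}
    gamma: Dict[str, str] = {}
    for atom in real_atoms + virtual_atoms:
        for x in atom[1]:
            if x in gamma:
                continue                      # Lemma 3: any virtual atom gives the same value
            if x not in fresh or x in in_real:
                gamma[x] = x
            else:
                y = back[atom][x]
                gamma[x] = forward.get(y, y)
    return gamma

def apply(mapping: Dict[str, str], atom: Atom) -> Atom:
    pred, args = atom
    return pred, tuple(mapping[a] for a in args)
```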

Proof of Property 2⁷ We have to prove that $rw^l \Rightarrow real(rw^l)$ and $real(rw^l) \Rightarrow rw^l$.

$rw^l \Rightarrow real(rw^l)$: since the identity mapping is a morphism from $real(rw^l)$ to $rw^l$, we have

$$rw^l \Rightarrow real(rw^l). \tag{6}$$

$real(rw^l) \Rightarrow rw^l$: let $g^l$ be an atom of $rw^l$ that corresponds to a real node. By Definition 12, every variable of $g^l$ is mapped to itself, so $\gamma(g^l) = g^l \in real(rw^l)$.

Now let $g^l$ be an atom of $rw^l$ that corresponds to a virtual node. From Definition 11, there is a mapping from $vars(g^l)$ to $vars(g^k)$, where $g^k$ is some atom of $rw^k$ with $k < l$, such that

$$\delta^{l \to k}_{g^l}(g^l) = g^k. \tag{a}$$

From Definition 10, there is a mapping from $Vars(g^k)$ to $Vars(g'^l)$ such that

$$\delta_{k \to l}(g^k) = g'^l. \tag{b}$$

Since $type(g'^l) = Direct$, $g'^l$ corresponds to a real node. Hence every atom $g^l$ that corresponds to a virtual node can be mapped to some atom $g'^l$ of $real(rw^l)$. Using Lemma 3, we have $\gamma(g^l) = \delta_{k \to l}(\delta^{l \to k}_{g^l}(g^l))$. From (a) and (b), $\delta_{k \to l}(\delta^{l \to k}_{g^l}(g^l)) = g'^l \in real(rw^l)$. Therefore $\gamma(g^l) \in real(rw^l)$. It follows that

$$real(rw^l) \Rightarrow \gamma(rw^l). \tag{7}$$

From (6) and (7), we conclude that $rw^l \equiv real(rw^l)$. □

7. Effective rewriting characterization.

Figure 4: Mappings in an atom tree

Following Property 2, we replace, at each step, the rewritings by their equivalent real rewritings. Consequently, atoms that correspond to virtual nodes no longer need to be rewritten.

A sufficient condition for the termination of the algorithm is that no node of the atom trees has a duplicate pair in its history. The following property states that real nodes in an atom tree are uniquely identified by their history.

Property 4 (Uniqueness of real nodes) For any two real nodes $g^k_i$ and $g^k_j$ in $AT$, if $History(g^k_i) = History(g^k_j)$, then $g^k_i = g^k_j$.

Sketch of proof The proof proceeds by induction on $k$, with a case analysis on the type of the nodes (Direct or Indirect); the full proof is given in the Appendices. □

Proof of Theorem 1⁸ From Property 4 and the condition stated in Theorem 1, we conclude that

$$\text{the set of real nodes is bounded.} \tag{8}$$

Let us show that each branch $X$ of $RT$ is finite. Suppose that $X$ is infinite. For each $rw^i$ in $X$, we have $rw^i \equiv real(rw^i)$ by Property 2. Since the set of real nodes is bounded (8), the set of possible real rewritings (combinations of real nodes) is finite, so $X$ contains two rewritings $rw^i$ and $rw^{i+k}$ such that $real(rw^i) = real(rw^{i+k})$. Therefore,

$$rw^i \equiv rw^{i+k}. \tag{9}$$

From the properties of the query rewriting algorithm (each rewriting is contained in the query it rewrites), we have

$$rw^{i+j} \Rightarrow rw^i, \quad 1 \le j \le k. \tag{10}$$

From (9) and (10), $rw^i \equiv rw^{i+j}$ for $1 \le j \le k$. In particular,

$$rw^i \equiv rw^{i+2}. \tag{11}$$

By the definition of the rewriting tree, $rw^{i+2}$ is a rewriting of $rw^i$ obtained by a double rewriting step, and by (11) this is the stopping condition of the algorithm, which contradicts the assumption that $X$ is infinite. Hence the branch $X$ is finite. We conclude that HMiniCon+ terminates when no history of an atom in the atom trees contains a duplicate pair. □

8. Termination under constraints.

From the termination property, we estimate an upper bound on the complexity. Given an atom occurrence $g$ in an atom tree, if $History(g)$ does not contain more than one occurrence of the same pair, the maximal size $msizeHistory(g)$ of the history is

$$msizeHistory(g) = \sum_{v \in (AV \cup MV)} size(v) \times (size(v) - 1) \times B_n$$

where $B_n$ is the $n$th Bell number⁹, that is, the number of partitions of a set with $n = |vars(g)|$ elements.

It follows that an upper bound on the complexity is

$$2^{\left(\max_{g \in atoms(q)} msizeHistory(g)\right)!}$$
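The bound on the history size is straightforward arithmetic. In this sketch we assume that $size(v)$ is the number of atoms in the body of $v$, and we compute $B_n$ with the Bell triangle; `bell` and `msize_history` are illustrative names. With the six views of Example 4 (body sizes 2, 1, 1, 2, 1, 1) and a binary atom, the sum is $(2 + 2) \times B_2 = 8$, which gives a complexity bound of $2^{8!} = 2^{40320}$ for a query whose atoms are all binary.

```python
from typing import Iterable

def bell(n: int) -> int:
    """n-th Bell number via the Bell triangle (B_0 = B_1 = 1, B_2 = 2, B_3 = 5, ...)."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]            # each row starts with the last entry of the previous one
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

def msize_history(view_sizes: Iterable[int], arity: int) -> int:
    """Sum over views v of size(v) * (size(v) - 1) * B_n, with n = |vars(g)| = arity."""
    return sum(s * (s - 1) for s in view_sizes) * bell(arity)
```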

The proposed approach is not complete: it does not handle the case where the rewriting trees are infinite. The formal characterization of this case is part of our ongoing research.

5.3 Maximality property

In this section, we show the maximality of our approach when the rewriting tree is finite. More formally, if there are two queries $q^{AV}$ and $q^{MV}$, defined respectively on AV and MV, such that $q^{AV} \equiv q^{MV}$, then there is a query $q^{AVMV}$ defined on AVMV such that $q^{AVMV} \equiv q^{AV} \equiv q^{MV}$. To this end, we define an undirected graph from the two queries $q^{AV}$ and $q^{MV}$ and prove that each connected component corresponds to an instantiation of one view generated by HMiniCon+.

The following property allows us to restrict ourselves to simple equivalence morphisms.

9. http://www-history.mcs.st-and.ac.uk/Miscellaneous/StirlingBell/bell.html


Property 5 (Mapping reduction) Let $\mu_{MV \to AV}$ and $\mu_{AV \to MV}$ be the two equivalence morphisms between $expansion(q^{MV})$ and $expansion(q^{AV})$. There are two morphisms $\mu_1$ and $\mu_2$ between $expansion(q^{MV})$ and $expansion(q^{AV})$ such that

$$\mu_1 = \mu_1 \circ \mu_2 \circ \mu_1 \quad \text{and} \quad \mu_2 = \mu_2 \circ \mu_1 \circ \mu_2 \tag{12}$$

Sketch of proof $\mu_1$ and $\mu_2$ can be obtained by composing $\mu_{MV \to AV}$ and $\mu_{AV \to MV}$ a suitable number of times. □
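The sketch of proof above only says that $\mu_1$ and $\mu_2$ are obtained by composing the two equivalence morphisms "a suitable number of times". One possible choice, which is ours and not stated in the paper, is shown below for finite maps $f = \mu_{MV \to AV}$ and $g = \mu_{AV \to MV}$: with $h = g \circ f$, take $\mu_1 = f \circ h^a$ and $\mu_2 = h^b \circ g$, where $a, b \ge t$ and $a + b + 1$ is a multiple of $p$, $t$ and $p$ being the index and period of $h$. Then $\mu_1 \circ \mu_2 \circ \mu_1 = f \circ h^{2a+b+1} = f \circ h^a$, and symmetrically for $\mu_2$; as compositions of morphisms, both are again morphisms. The helper names are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Map = Dict[Hashable, Hashable]

def then(f: Map, g: Map) -> Map:
    """The composition g ∘ f, i.e. x -> g(f(x))."""
    return {x: g[f[x]] for x in f}

def power(h: Map, n: int) -> Map:
    """h^n for a self-map h; h^0 is the identity on dom(h)."""
    result: Map = {x: x for x in h}
    for _ in range(n):
        result = then(result, h)
    return result

def index_and_period(h: Map) -> Tuple[int, int]:
    """Smallest t >= 0 and p >= 1 with h^(t+p) = h^t (they exist since dom(h) is finite)."""
    keys = sorted(h, key=repr)
    seen: Dict[tuple, int] = {}
    current: Map = {x: x for x in h}
    i = 0
    while True:
        snapshot = tuple(current[x] for x in keys)
        if snapshot in seen:
            return seen[snapshot], i - seen[snapshot]
        seen[snapshot] = i
        current = then(current, h)
        i += 1

def reduce_morphisms(f: Map, g: Map) -> Tuple[Map, Map]:
    """Given f: A -> B and g: B -> A, return (mu1, mu2) = (f ∘ h^a, h^b ∘ g), h = g ∘ f,
    satisfying mu1 = mu1 ∘ mu2 ∘ mu1 and mu2 = mu2 ∘ mu1 ∘ mu2."""
    h = then(f, g)
    t, p = index_and_period(h)
    n = 1
    while n * p - 1 < t:
        n += 1
    a, b = n * p - 1, n * p    # a, b >= t and a + b + 1 = 2np is a multiple of p
    return then(power(h, a), f), then(g, power(h, b))
```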

In the rest of this section, we assume that the equivalence morphisms satisfy equation (12).

Definition 13 ($G(q^{MV}, q^{AV})$ graph) Let $q^{MV}$ and $q^{AV}$ be two equivalent queries. By the definition of equivalence, there is a morphism $\mu_1$ from the atoms of $expansion(q^{MV})$ to the atoms of $expansion(q^{AV})$ and a morphism $\mu_2$ from the atoms of $expansion(q^{AV})$ to the atoms of $expansion(q^{MV})$. The corresponding graph $G(q^{MV}, q^{AV})$ is constructed as follows:
– the nodes are the atoms of $q^{MV}$ and $q^{AV}$;
– there is an edge between an atom $g^{MV} \in q^{MV}$ and an atom $g^{AV} \in q^{AV}$ if there are an atom $g^{MV}_{exp}$ in the expansion of $g^{MV}$ and an atom $g^{AV}_{exp}$ in the expansion of $g^{AV}$ such that either $g^{MV}_{exp} = \mu_2(g^{AV}_{exp})$ or $g^{AV}_{exp} = \mu_1(g^{MV}_{exp})$.
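Definition 13, together with the traversal of connected components used below, amounts to building a bipartite graph and computing its connected components. In this sketch, which abstracts away variables, atoms are hypothetical string identifiers (MV and AV identifiers are assumed distinct), `src_mv`/`src_av` record which query atom each expansion atom comes from, and $\mu_1$, $\mu_2$ are given directly as maps between expansion atoms.

```python
from collections import deque
from typing import Dict, List, Set

def build_graph(src_mv: Dict[str, str], src_av: Dict[str, str],
                mu1: Dict[str, str], mu2: Dict[str, str]) -> Dict[str, Set[str]]:
    """Edge g^MV -- g^AV whenever an expansion atom of one is mapped by mu1/mu2
    onto an expansion atom of the other."""
    adj: Dict[str, Set[str]] = {}
    for g in list(src_mv.values()) + list(src_av.values()):
        adj.setdefault(g, set())                      # every query atom is a node
    edges = [(src_mv[e], src_av[f]) for e, f in mu1.items()]   # f = mu1(e)
    edges += [(src_mv[e], src_av[f]) for f, e in mu2.items()]  # e = mu2(f)
    for g_mv, g_av in edges:
        adj[g_mv].add(g_av)
        adj[g_av].add(g_mv)
    return adj

def connected_components(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Breadth-first search from each unvisited node."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        component: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components
```

In the usage data, an MV atom `m1` whose expansion has two atoms matched by two single-atom AV atoms `a1` and `a2` ends up in one component with them, while `m2` and `a3` form a second component.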

5.3.1 Correspondence between $RT$ and $G(q^{MV}, q^{AV})$

The idea of this section is to show that traversing the graph from a node $g^{MV}$ to compute a connected component amounts to traversing a branch of $RT(q)$, where $atoms(q) = \{g^{MV}\}$. We use the following lemma to switch between the rewriting tree and the graph.

Lemma 6 Let $rw^i$ be a query to be rewritten, $\beta_i$ a mapping defined on $vars(rw^i)$, and $q = \beta_i(rw^i)$. Let $rw$ be a rewriting of $q$ generated by HMiniCon using the set of views $V'$. Then there is a rewriting of $rw^i$, denoted $rw^{i+1}$, and a mapping $\beta_{i+1}$ defined on $vars(rw^{i+1})$ such that $rw = \beta_{i+1}(rw^{i+1})$.

Sketch of proof By following the construction of the rewritings $rw$ and $rw^{i+1}$ in HMiniCon, we can exhibit the construction of $\beta_{i+1}$ (see the Appendices). □

5.3.2 Traversing a connected component in $G(q^{MV}, q^{AV})$

In this section, we define queries on MV and AV that correspond to connected components, and we prove that these queries are instantiations of views generated by HMiniCon+. We also prove that these queries can be combined, which allows us to construct a query equivalent to $q^{MV}$ and $q^{AV}$.

Definition 14 (Queries from a connected component) Let $G(q^{MV}, q^{AV})$ be the graph of Definition 13 and let $C$ be a connected component of this graph. We define two queries $q^{MV}_C$ and $q^{AV}_C$ from $C$ as follows: for every $g^{MV} \in C$, $g^{MV} \in atoms(q^{MV}_C)$, and for every $g^{AV} \in C$, $g^{AV} \in atoms(q^{AV}_C)$. The head variables of $q^{MV}_C$ are defined as follows: $x \in headvars(q^{MV}_C)$ if $x \in vars(q^{MV}_C)$, $\mu_1(x) \in vars(q^{AV}_C)$, and $\mu_2(\mu_1(x)) \in vars(q^{MV}_C)$. Symmetrically, $x \in headvars(q^{AV}_C)$ if $x \in vars(q^{AV}_C)$, $\mu_2(x) \in vars(q^{MV}_C)$, and $\mu_1(\mu_2(x)) \in vars(q^{AV}_C)$.
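The head-variable condition of Definition 14 is a simple filter. A minimal sketch, assuming $\mu_1$ and $\mu_2$ are given as dictionaries on variables; the function name `head_vars` and the example data are hypothetical.

```python
from typing import Dict, Set

def head_vars(vars_src: Set[str], vars_dst: Set[str],
              there: Dict[str, str], back: Dict[str, str]) -> Set[str]:
    """x is a head variable if x, there(x), and back(there(x)) all stay in the component.
    For q^MV_C pass (vars(q^MV_C), vars(q^AV_C), mu1, mu2); swap the roles for q^AV_C."""
    return {x for x in vars_src
            if there.get(x) in vars_dst and back.get(there[x]) in vars_src}
```

For example, with $vars(q^{MV}_C) = \{x, y, z\}$, $vars(q^{AV}_C) = \{x, y\}$ and $\mu_1(z) = w \notin vars(q^{AV}_C)$, only $x$ and $y$ are head variables.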


Property 7 (Views as connected components) Let $G(q^{MV}, q^{AV})$ be the graph corresponding to $q^{MV}$ and $q^{AV}$, and let $C$ be a connected component. Then there is a mapping $\beta_n$ such that $q^{MV}_C = \beta_n(rw^n)$, where $rw^n$ is a view generated by HMiniCon+.

Proof To prove Property 7, we define a starting point $q^{MV}_1$ and show iteratively how to traverse the graph in order to compute $q^{MV}_C$.

Let $q^{MV}_1$ be the query defined as follows:
– $atoms(q^{MV}_1) = \{g^{MV}\}$, for some node $g^{MV}$ of the graph $G(q^{MV}, q^{AV})$;
– $headvars(q^{MV}_1) = vars(g^{MV})$.

We define $A_1 = \{atoms(q^{MV}_1)\}$. There is a rewriting $rw^1$ defined on MV in the rewriting tree such that $q^{MV}_1$ is an instantiation of $rw^1$: $q^{MV}_1 = \beta_1(rw^1)$. Recall that HMiniCon+ takes as input a set of views defined on MV; therefore, $rw^1$ is a rewriting in the rewriting tree. Indeed, $rw^1$ is defined as follows:
– $atoms(rw^1) = \{mv_i\}$, where $mv_i$ is the head of some materialized view definition and $g^{MV} = \beta_1(mv_i)$;
– $headvars(rw^1) = vars(mv_i)$.

Let $q^{MV}_i$ be the query defined at the $i$th step, where $i$ is odd. Rewriting $q^{MV}_i$ consists in rewriting $expansion(q^{MV}_i)$. One of the possible rewritings of $expansion(q^{MV}_i)$ using the views AV is the query $q^{AV}_{i+1}$ defined as follows:

– $g^{AV} \in atoms(q^{AV}_{i+1})$ if there is an edge between $g^{AV}$ and some atom $g^{MV}$ of $atoms(q^{MV}_i)$. Indeed, from the equivalence we have: for every $g^{MV}_{exp} \in expansion(q^{MV}_i)$, there is $g'^{AV}_{exp} \in expansion(q^{AV}_{i+1})$ such that $g'^{AV}_{exp} = \mu_1(g^{MV}_{exp})$.
– If $x \in headvars(q^{MV}_i)$ and $\mu_1(x) \in headvars(q^{AV})$, then $\mu_1(x) \in headvars(q^{AV}_{i+1})$. From the definition of the query $q^{AV}$, $x \in headvars(q^{AV})$ implies that the views project out the variable $x$. Therefore, $\mu_1(x) \in headvars(q^{AV})$.

According to Lemma 6, there is a rewriting of $rw^i$, denoted $rw^{i+1}$, such that $q^{AV}_{i+1} = \beta_{i+1}(rw^{i+1})$. Recall that the same views are used to rewrite $rw^i$. This implies that if $x \in headvars(rw^i)$ and $\mu_1(\beta_i(x)) \in headvars(q^{AV}_{i+1})$, then $\theta_{i+1}(x) \in headvars(rw^{i+1})$, where $\theta_{i+1}$ is the rewriting mapping from $vars(expansion(rw^i))$ to $vars(expansion(rw^{i+1}))$. We define $A_{i+1} = A_i \cup \{atoms(q^{AV}_{i+1})\}$.

In the same manner, one of the possible rewritings of $q^{AV}_{i+1}$ using the views MV is the query $q^{MV}_{i+2}$ defined as follows:
– $g^{MV} \in atoms(q^{MV}_{i+2})$ if there is an edge between $g^{MV}$ and some atom $g^{AV}$ of $atoms(q^{AV}_{i+1})$;
– if $x \in headvars(q^{AV}_{i+1})$ and $\mu_2(x) \in headvars(q^{MV})$, then $\mu_2(x) \in headvars(q^{MV}_{i+2})$.

According to Lemma 6, there is a rewriting of $rw^{i+1}$, denoted $rw^{i+2}$, such that $q^{MV}_{i+2} = \beta_{i+2}(rw^{i+2})$. We define $A_{i+2} = A_{i+1} \cup \{atoms(q^{MV}_{i+2})\}$.

If $A_{k+2} = A_k$, then $A_k$ is a connected component. In other words, no atom is added by the rewriting process and the same views are used; therefore,

$$q^{MV}_{k+2} = q^{MV}_k = q^{MV}_C.$$

Since the rewriting tree is finite, there are two rewritings $rw^n$ and $rw^{n+2}$ in the rewriting tree, with $n \ge k$ and $rw^n \equiv rw^{n+2}$, such that

$$q^{MV}_k = \beta_n(rw^n) \quad \text{and} \quad q^{MV}_{k+2} = \beta_{n+2}(rw^{n+2}).$$

This equivalence is the stopping condition of HMiniCon+: the rewriting $rw^{n+2}$ is returned as a view on MV, and its instantiation corresponds to the query $q^{MV}_C$, a connected component of the graph. □

The following property shows that the queries associated with connected components can be combined in order to define $q^{AVMV}$.

Property 8 (Join variables and connected components) If $x \in vars(q^{MV}_{C_i}) \cap vars(q^{AV}_{C_j})$ with $i \ne j$, then $x \in headvars(q^{MV}_{C_i})$ and $x \in headvars(q^{AV}_{C_j})$.

Sketch of proof Following Definition 14 of the queries $q^{MV}_{C_i}$ and $q^{AV}_{C_j}$, one can show that join variables are also head variables (see the Appendices). □

Theorem 2 (Maximality of HMiniCon+) Let $q^{MV}$ and $q^{AV}$ be two queries defined respectively on MV and AV such that $q^{MV} \equiv q^{AV}$. Then there is a query $q^{AVMV}$ defined on AVMV such that $q^{AVMV} \equiv q^{MV} \equiv q^{AV}$.

Sketch of proof By composing the equivalence morphisms defined between each connected component and the corresponding generated view, we exhibit morphisms between $q^{MV}$ and $q^{AVMV}$ (see the Appendices). □

6 Related Work

Rosenthal and Sciore ([17, 18]) have considered the problem of automatically coordinating the access rights of a warehouse with those of its sources. They proposed a theory that allows the automated inference of many permissions for the warehouse through a natural extension of the standard SQL grant/revoke model¹⁰ to systems with redundant and derived data. They also defined the notion of witness, which covers the use of views: a user may be allowed to access only the part of a table $T$ represented by a view. Such a user has clearance to see the values of $T$ that contribute to the view (information permission) and is allowed to execute a query that physically accesses $T$, but only for the purpose of computing the view (physical permission). The notion of equivalence plays a central role in that work. Assume, for instance, that we define a materialized view $mv_1$ as the join of two tables $T_1$ and $T_2$, and that two views $v_1$ and $v_2$ specify the data one has the right to access in $T_1$ and $T_2$, respectively. To determine whether a user has the right to access $mv_1$, one must find a query $q$ equivalent to $mv_1$ that uses only $T_1$ and $T_2$ and contributes to $v_1$ and $v_2$, respectively. If the inference mechanism does not find such an equivalent query, the framework concludes that the user has no right to access $mv_1$, even if the user has the right to access a part of $mv_1$. Our approach is more flexible: by inferring the set of authorization views that control access to the materialized view, we allow users to access part of the materialized view. In short, the framework of [17, 18] only determines whether a user has the right to access a derived table (based on explicit permissions), whereas our proposal also determines which part of the derived table the user has the right to access. Moreover, the authors of [17, 18] stated the inference rules at a high level; the properties of the underlying inference system and the efficiency of the proposed algorithm were not investigated and remain an open research issue.

10. http://www.techonthenet.com/oracle/grant revoke.php

In [6], the authors build on [10] to select the access control rules to be attached to materialized view definitions, based on access control rules over base relations. They rely on the basic form of the bucket algorithm, which does not derive all relevant access control rules. Another limitation of this work is that, since it only deals with the selection of rules, the framework remains strongly dependent on the base relations: the body of the derived rules involves base relations only. In the present work, we synthesize new rules from existing ones, where the body of the new rules refers to materialized views.

In [7], the authors propose, instead of authorization views, a graph-based model to define access control rules and query profiles. Query profiles capture the information content of a query through graphs. The authors use coloring, composition, and traversal of graph paths to define an efficient and effective access control model. A permission $p$ is a rule of the form $[A, R] \rightarrow s$, which states that subject $s$ can view the sub-tuples over the set of attributes $A$ belonging to the join among the relations $R$. The major drawback of this approach is that permissions cannot be defined on a subset of tuples (selection): the model only regulates access to subsets of attributes, so there is no way to express content-based access control.

7 Conclusion

Large organizations commonly manage tens or hundreds of datasets, and ensuring data confidentiality in the presence of materialized views is important in this setting. In this paper, we presented an automated method to derive authorization views to be attached to materialized views. We also presented HMiniCon+, an adaptation of a query rewriting algorithm to the security context.

We stated the correctness criteria and showed that our algorithm satisfies our security requirement. We identified cases in which the rewriting process terminates and showed that, in these cases, the algorithm satisfies the maximality property. In the other cases, the approach can generate an infinite set of authorization views; characterizing these views is part of our ongoing research.

We estimated an upper bound on the complexity; the exact complexity of our approach remains an open problem.

In this paper, we mainly investigated conjunctive queries with equalities. In large systems (e.g., data warehouses), materialized views can be used to precompute and store aggregated data (e.g., the sum of sales). We plan to extend this framework to other query languages: conjunctive queries with arithmetic comparisons and conjunctive queries with aggregation. In these cases, we will investigate the potential of algorithms for rewriting aggregate queries using views (see, e.g., [19]).


References

[1] Oracle database: Data warehousing guide. http://docs.oracle.com/cd/B28359_01/server.111/b28313.pdf. Accessed: 2007-09.

[2] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.

[3] E. Bertino and R. Sandhu. Database security: concepts, approaches, and challenges. IEEE Trans. Dependable Secur. Comput., 2(1):2–19, 2005.

[4] U. S. Chakravarthy, J. Grant, and J. Minker. Logic-based approach to semantic query optimization. ACM Trans. Database Syst., 15(2):162–207, 1990.

[5] L. W. F. Chaves, E. Buchmann, F. Hueske, and K. Böhm. Towards materialized view selection for distributed databases. In EDBT, pages 1088–1099, 2009.

[6] A. Cuzzocrea, M.-S. Hacid, and N. Grillo. Effectively and efficiently selecting access control rules on materialized views over relational databases. In IDEAS, pages 225–235, 2010.

[7] S. D. C. di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati. Assessing query privileges via safe and efficient permission composition. In ACM Conference on Computer and Communications Security, pages 311–322, 2008.

[8] B. Iyer, S. Mehrotra, E. Mykletun, G. Tsudik, and Y. Wu. A framework for efficient storage security in RDBMS. In EDBT, 2004.

[9] A. Motro. An access authorization model for relational databases based on algebraic manipulation of view definitions. In ICDE, pages 339–347, 1989.

[10] S. Nait-Bahloul. Inference of security policies on materialized views. Master 2 research report. http://liris.cnrs.fr/~snaitbah/wiki, 2009.

[11] S. Nait-Bahloul, E. Coquery, and M.-S. Hacid. Access control to materialized views: an inference-based approach. In EDBT/ICDT Ph.D. Workshop, pages 19–24, 2011.

[12] S. Nait-Bahloul, E. Coquery, and M.-S. Hacid. Authorization policies for materialized views. In SEC, pages 525–530, 2012.

[13] L. E. Olson, C. A. Gunter, and P. Madhusudan. A formal framework for reflective database access control policies. In ACM Conference on Computer and Communications Security, pages 289–298, 2008.

[14] R. Pottinger and A. Y. Levy. A scalable algorithm for answering queries using views. In VLDB, pages 484–495, 2000.

[15] T. Priebe and G. Pernul. A pragmatic approach to conceptual modeling of OLAP security. In Proc. ER, pages 311–324. Springer-Verlag, 2001.

[16] S. Rizvi, A. O. Mendelzon, S. Sudarshan, and P. Roy. Extending query rewriting techniques for fine-grained access control. In SIGMOD Conference, pages 551–562, 2004.

[17] A. Rosenthal and E. Sciore. View security as the basis for data warehouse security. In CAiSE Workshop on Design and Management of Data Warehouses, pages 5–6, 2000.

[18] A. Rosenthal and E. Sciore. Administering permissions for distributed data: Factoring and automated inference. In Proc. of IFIP WG11.3 Conf., 2001.

[19] D. Srivastava, S. Dar, H. V. Jagadish, and A. Y. Levy. Answering queries with aggregation using views. In VLDB, pages 318–329, 1996.

[20] J. Steger, H. Günzel, and A. Bauer. Identifying security holes in OLAP applications. In B. M. Thuraisingham, R. P. van de Riet, K. R. Dittrich, and Z. Tari, editors, DBSec, volume 201 of IFIP Conference Proceedings, pages 283–294. Kluwer, 2000.

[21] D. Theodoratos and T. Sellis. Dynamic data warehouse design. In 1st Int. Conf. on DaWaK '99, pages 1–10. Springer-Verlag, 1999.

[22] J. Wang, M. Maher, and R. Topor. Rewriting unions of general conjunctive queries using views. In Proc. Conf. on Extending Database Technology, LNCS 2287, 2002.

[23] Q. Wang, T. Yu, N. Li, J. Lobo, E. Bertino, K. Irwin, and J.-W. Byun. On the correctness criteria of fine-grained access control in relational databases. In VLDB '07: Proceedings of the 33rd International Conference on Very Large Data Bases, pages 555–566. VLDB Endowment, 2007.

[24] J. Yang, K. Karlapalem, and Q. Li. Algorithms for materialized view design in data warehousing environment. In VLDB, pages 136–145, 1997.

[25] Z. Zhang and A. O. Mendelzon. Authorization views and conditional query containment. In Database Theory - ICDT 2005, 10th International Conference, pages 259–273, 2005.

Appendices

Proof of Property 4 By induction on $k$. For any two distinct nodes $g^0_i$ and $g^0_j$ in $AT$, we have, by definition, $History(g^0_i) \ne History(g^0_j)$. Indeed, the history of a child of the root records the position of the atom in the query, and two distinct atoms cannot have the same position.

Assume that for any two real nodes $g^k_i$ and $g^k_j$ in $AT$, $History(g^k_i) = History(g^k_j)$ implies $g^k_i = g^k_j$. We have to prove that for any two real nodes $g^{k+1}_i$ and $g^{k+1}_j$ in $rw^{k+1}$, $History(g^{k+1}_i) = History(g^{k+1}_j)$ implies $g^{k+1}_i = g^{k+1}_j$.

– $type(g^{k+1}_i) = Direct$ and $type(g^{k+1}_j) = Direct$. This implies that $History(parent(g^{k+1}_i)) = History(parent(g^{k+1}_j))$, where $parent(g^{k+1}_i)$ and $parent(g^{k+1}_j)$ are real nodes in $rw^k$. By the induction hypothesis, $parent(g^{k+1}_i) = parent(g^{k+1}_j)$. Therefore, $g^{k+1}_i = g^{k+1}_j$.

– $type(g^{k+1}_i) = Indirect$ and $type(g^{k+1}_j) = Indirect$. This implies that $History(parent(g^{k+1}_i)) = History(parent(g^{k+1}_j))$, where $parent(g^{k+1}_i)$ and $parent(g^{k+1}_j)$ are real nodes in $rw^k$. By the induction hypothesis, $parent(g^{k+1}_i) = parent(g^{k+1}_j)$. Therefore, $g^{k+1}_i = g^{k+1}_j$.

– $type(g^{k+1}_i) = Direct$ and $type(g^{k+1}_j) = Indirect$. In this case, by definition, $g^{k+1}_j$ is a virtual node, which contradicts the assumption that it is real.

□

Proof of Lemma 3 Let $x$ be a fresh variable in $rw^l$ that appears in two distinct atoms $g^l$ and $g'^l$ corresponding to virtual nodes. This implies that $x$ was introduced as a fresh variable in the expansion step. Therefore, $g^l$ and $g'^l$ were generated by the same view $v$. From Definition 11, there exist mappings $\delta^{l \to k}_{g^l}$ and $\delta^{l \to k}_{g'^l}$ such that $\delta^{l \to k}_{g^l}(g^l) = g^k$ and $\delta^{l \to k}_{g'^l}(g'^l) = g'^k$.

Since $x$ was introduced as a fresh variable at the expansion step of $v$, and, by definition, $g^k$ and $g'^k$ were introduced by the same view, we conclude that $\delta^{l \to k}_{g^l}(x) = \delta^{l \to k}_{g'^l}(x)$. □

Proof of Lemma 6 Let $rw^i$ be a query, $\beta_i$ a mapping on $vars(rw^i)$, and $q = \beta_i(rw^i)$ a query. Let $rw$ be a rewriting of $q$. There is a rewriting $rw'$ of $rw^i$ constructed as follows. Let $MCDs_{rw} = \{mcd_1, \ldots, mcd_m\}$ be the MCDs used in the construction of the rewriting $rw$. For each $mcd_j$ in $MCDs_{rw}$ that covers $k$ atoms of $q$, the algorithm uses $mcd^j_1, \ldots, mcd^j_n$ to rewrite the $k$ atoms of $rw^i$ that correspond to the atoms covered by $mcd_j$. The MCDs $mcd^j_1, \ldots, mcd^j_n$ are constructed from the same view as $mcd_j$.

There is a mapping $\beta'$ from $expansion(rw')$ to $expansion(rw)$ defined as follows, for every $x \in vars(expansion(rw'))$:

– If $x \in Codomain(\theta)$, then $\beta'(x) = \theta'(\beta_i(\theta^{-1}(x)))$, where $\theta$ (resp. $\theta'$) is the rewriting mapping of the query $rw^i$ (resp. $q$). Let $x$ and $y$ be two distinct variables of $vars(rw^i)$ such that $\theta(x) = \theta(y) = z$. This implies that the view unifies the two variables. Recall that the same view is used to rewrite the set of atoms of $q$ that correspond to atoms of $rw^i$. Therefore, $\theta'(\beta_i(x)) = \theta'(\beta_i(y))$, and $\beta'(z) = \theta'(\beta_i(x)) = \theta'(\beta_i(y))$ is well defined.

– Otherwise, $x$ was introduced as a fresh variable. In this case, we extend the mapping $\beta'$ to the variables of $rw'$ that were introduced as fresh variables (denoted $FV$). The extension is defined as follows: let $y \in FV$ with $y \in vars(v_k)$, where $v_k$ is the view of the MCD $mcd^i_j$; then $\beta'(y) = x$, where $x \in vars(v_k)$ for the view $v_k$ of the MCD $mcd_i$ and $pos(y, v_k) = pos(x, v_k)$, $pos(var, a)$ being the function that returns the position of the variable $var$ in the atom $a$.

We conclude that $\beta'(rw') = rw$. Recall that we replace the rewriting $rw'$ by its equivalent query $rw^{i+1} = real(rw')$. By the definition of equivalence, we have two equivalence morphisms $\varepsilon_{rw' \to rw^{i+1}}$ and $\varepsilon_{rw^{i+1} \to rw'}$. We define $\beta_{i+1}$ from $rw^{i+1}$ to $rw$ as follows: $\beta_{i+1} = \beta' \circ \varepsilon_{rw^{i+1} \to rw'}$. Then $\beta_{i+1}(rw^{i+1}) = rw$. □

Proof of Property 8 If $x \in vars(q^{MV}_{C_i}) \cap vars(q^{MV}_{C_j})$ with $i \ne j$, then $\mu_1(x) = y$ and $y \in vars(q^{AV}_{C_i}) \cap vars(q^{AV}_{C_j})$. Indeed, if $y \notin vars(q^{AV}_{C_i}) \cap vars(q^{AV}_{C_j})$, then $y$ was introduced as a fresh variable in the expansion step, so $y$ can appear in only one connected component. Therefore, $x$ can also appear in only one connected component, which contradicts our assumption.

In the same manner, if $y \in vars(q^{AV}_{C_i}) \cap vars(q^{AV}_{C_j})$ with $i \ne j$, then $\mu_2(y) = z$ and $z \in vars(q^{MV}_{C_i}) \cap vars(q^{MV}_{C_j})$.

We then conclude that if $x \in vars(q^{MV}_{C_i}) \cap vars(q^{MV}_{C_j})$ with $i \ne j$, then $x \in headvars(q^{MV}_{C_i}) \cap headvars(q^{MV}_{C_j})$. Indeed, from Definition 14, if $x \in vars(q^{MV}_C)$, $\mu_1(x) \in vars(q^{AV}_C)$, and $\mu_2(\mu_1(x)) \in vars(q^{MV}_C)$, then $x \in headvars(q^{MV}_C)$. □

Proof of Theorem 2 Let $q^{MV}$ and $q^{AV}$ be two queries defined respectively on MV and AV such that $q^{MV} \equiv q^{AV}$. From Definition 14, we have $q^{MV} = q^{MV}_{C_1}, q^{MV}_{C_2}, \ldots, q^{MV}_{C_n}$ (the body of $q^{MV}$ is the conjunction of the bodies of the component queries).

Let $q^{AVMV}$ be the query defined as follows: $q^{AVMV} = \beta_{C_1}(avmv_1), \beta_{C_2}(avmv_2), \ldots, \beta_{C_n}(avmv_n)$, where for $1 \le i \le n$, $avmv_i$ is a view generated by HMiniCon+ and $q^{MV}_{C_i} \equiv \beta_{C_i}(avmv_i)$. We have to show that $q^{AVMV} \equiv q^{MV}$, that is, that there are a morphism $\omega_1$ from $vars(q^{MV})$ to $vars(expansion(q^{AVMV}))$ and a morphism $\omega_2$ from $vars(expansion(q^{AVMV}))$ to $vars(q^{MV})$.

From $q^{MV}_{C_i} \equiv \beta_{C_i}(avmv_i)$, there are a morphism $\omega^i_1$ from $vars(q^{MV}_{C_i})$ to $vars(expansion(\beta_{C_i}(avmv_i)))$, which is the identity on $headvars(q^{MV}_{C_i})$, and a morphism $\omega^i_2$ from $vars(expansion(\beta_{C_i}(avmv_i)))$ to $vars(q^{MV}_{C_i})$, which is the identity on $headvars(\beta_{C_i}(avmv_i))$.

– Morphism $\omega_1$: let $x \in vars(q^{MV})$. From Property 8, if $x \in vars(q^{MV}_{C_i}) \cap vars(q^{MV}_{C_j})$, then $x \in headvars(q^{MV}_{C_i}) \cap headvars(q^{MV}_{C_j})$; in that case we define $\omega_1(x) = x$. Otherwise, $x$ appears only in $vars(q^{MV}_{C_i})$ and we define $\omega_1(x) = \omega^i_1(x)$.
Let $g \in atoms(q^{MV})$ with $g \in atoms(q^{MV}_{C_i})$. There is an atom $g' \in atoms(expansion(q^{AVMV}))$ such that $\omega_1(g) = \omega^i_1(g) = g'$. Indeed, let $x \in vars(g)$: if $x \in headvars(q^{MV}_{C_i})$, then $\omega_1(x) = \omega^i_1(x) = x$; otherwise, $x$ appears only in $vars(q^{MV}_{C_i})$ and hence $\omega_1(x) = \omega^i_1(x)$.

– Morphism $\omega_2$: let $x \in vars(expansion(q^{AVMV}))$. If $x \in vars(q^{AVMV})$, then $\omega_2(x) = x$. Otherwise, $x$ was introduced as a fresh variable in the expansion step of some view $avmv_i$, and we set $\omega_2(x) = \omega^i_2(x)$.
Let $g' \in atoms(expansion(q^{AVMV}))$ with $g' \in atoms(expansion(\beta_{C_i}(avmv_i)))$. There is an atom $g \in atoms(q^{MV})$ such that $\omega_2(g') = \omega^i_2(g') = g$. Indeed, let $x \in vars(g')$: if $x \in headvars(\beta_{C_i}(avmv_i))$, then $\omega_2(x) = \omega^i_2(x) = x$; otherwise, $x$ appears only in the expansion of $\beta_{C_i}(avmv_i)$ and hence $\omega_2(x) = \omega^i_2(x)$.

From $\omega_1$ and $\omega_2$, we conclude that $q^{MV} \equiv q^{AVMV}$. □
