
Inf Syst Front (2012) 14:617–632 DOI 10.1007/s10796-010-9289-2

Secure XML querying based on authorization graphs

Artem Chebotko · Seunghan Chang · Shiyong Lu · Farshad Fotouhi

Published online: 5 November 2010 © Springer Science+Business Media, LLC 2010

Abstract XML is rapidly emerging as a standard for data representation and exchange over the World Wide Web, and an increasing amount of sensitive business data is processed in XML format. Therefore, it is critical to have control mechanisms that restrict a user to accessing only the parts of XML documents that she is authorized to access. In this paper, we propose the first DTD-based access control model that employs graph matching to analyze whether an input query is fully acceptable, fully rejectable, or partially acceptable. In this way, there is no further security overhead for the processing of fully acceptable and fully rejectable queries. For partially acceptable queries, we propose a graph-matching based authorization model for an optimized rewriting procedure in which a recursive query (a query with the descendant axis ‘//’) is rewritten into an equivalent recursive query if possible and into a non-recursive one only if necessary, resulting in queries that can fully take advantage of structural join based query optimization techniques. Moreover, we propose an index structure for XML element types to speed up the query rewriting procedure, a facility that is potentially useful for applications with large DTDs. Our performance study results showed that our algorithms armed with rewriting indexes are promising.

A. Chebotko (✉)
Department of Computer Science, University of Texas-Pan American, 1201 W. University Drive, Edinburg, TX 78539-2999, USA
e-mail: [email protected]

S. Chang · S. Lu · F. Fotouhi
Department of Computer Science, Wayne State University, 5143 Cass Avenue, Detroit, MI 48202, USA

S. Chang
e-mail: [email protected]

S. Lu
e-mail: [email protected]

F. Fotouhi
e-mail: [email protected]

Keywords XML · Access control · Security · Authorization graph · Authorization model

1 Introduction

XML (eXtensible Markup Language) (W3C 2006a) is rapidly emerging as a standard for data representation and exchange over the Web. As a result, the problem of secure querying of XML documents is becoming increasingly important, particularly in business, where it is critical to protect trading and financial information and to ensure that sensitive business information can be accessed only by authorized users.

Numerous access control models have been proposed for secure querying of XML documents (Damiani et al. 2002; Gabillon and Bruno 2001; Kudo and Hada 2000; Miklau and Suciu 2003; Wang and Osborn 2004; Diao et al. 2003; Luo et al. 2004; Murata et al. 2003; Qi et al. 2005; Yu et al. 2002; Cho et al. 2002; Fan et al. 2004). These models can be broadly classified into two categories: XPath-based access control models (Damiani et al. 2002; Gabillon and Bruno 2001; Kudo and Hada 2000; Miklau and Suciu 2003; Wang and Osborn 2004; Diao et al. 2003; Luo et al. 2004; Murata et al. 2003; Qi et al. 2005) and DTD-based access control models (Yu et al. 2002; Cho et al. 2002; Fan et al. 2004). While the former are applicable to XML documents with or without schema information, the latter are particularly appealing when the DTDs (Document Type Definitions) (W3C 2006a) or XML Schemas (W3C 2004) for XML documents are available, since access control policies can then be naturally integrated with the structure of XML documents.

To understand DTD-based access control models better and to motivate our research, consider the following example. Suppose that in a university, the transcript information of all students is stored as XML documents that conform to the DTD shown in Fig. 1a. Each production rule in the DTD describes, for each parent element type, all its child element types, their cardinality (* for zero or many, ? for zero or one, and + for one or many), and the order of the children. For example, the second production rule in Fig. 1a says that each Transcript element contains one Person element, followed by a History element, and then by an optional TestResult element. An instance of an XML document that conforms to this DTD is sketched in Fig. 1b, omitting the contents of Person, History, and TestResult for brevity. A DTD can be viewed as a DTD graph in which nodes represent element types and edges represent parent-child relationships. Figure 1c shows the DTD graph corresponding to the DTD in Fig. 1a. Finally, based on the DTD, an access control policy for a group of users can be specified by associating edges in the document DTD graph with security annotations including ‘Y’, ‘N’, or an XPath qualifier ‘[q]’, corresponding to accessible, inaccessible, and conditionally accessible element types, respectively. For example, one possible access control policy is specified in Fig. 1d, which imposes the following access restrictions:

1. To access the transcript information of a student, the user has to be from the same department in which the student is majoring.
2. The user cannot access a student’s SSN.
3. The user cannot access a student’s past major program.
4. The user cannot access information about the courses a student has taken.
5. The user cannot access a student’s test result information.

Fig. 1 a DTD, b XML document, c DTD graph, and d security specification graph

(a)
Transcripts Transcript*;
Transcript (Person, History, TestResult?);
Person (Name, ID, SSN, Major+);
History (Major+, Semester*, CumGPA);
TestResult Test*;
Major (Dept, Prog);
Semester (Term, Class*, GPA);
Class (CNum, Credit, Grade);
Test (TName, TScore);

(b)
<Transcripts>
  <Transcript>
    <Person> ... </Person>
    <History> ... </History>
    <TestResult> ... </TestResult>
  </Transcript>
  <Transcript>
    <Person> ... </Person>
    <History> ... </History>
  </Transcript>
  ...
</Transcripts>

(c) [DTD graph of (a): one node per element type and one edge per parent-child pair; the edges Transcripts → Transcript, History → Semester, Semester → Class, and TestResult → Test are labeled ‘*’, both edges into Major (from Person and from History) are labeled ‘+’, and Transcript → TestResult is labeled ‘?’]

(d) [Security specification graph: the graph of (c) with the annotation q: [./Person/Major/Dept = $dept] on Transcripts → Transcript; N on Person → SSN, History → Major, Semester → Class, and Transcript → TestResult; and Y on Major → Dept]

Although several DTD-based access control models (Cho et al. 2002; Fan et al. 2004; Yu et al. 2002) have been proposed, there is no query analysis technique that can decide whether an input query is fully acceptable, fully rejectable, or partially acceptable. Such a technique enables the elimination of further overhead for the processing of fully acceptable and fully rejectable queries. The main contributions of this paper are as follows.

1. We propose the first DTD-based access control model that employs graph matching to analyze if an input query is fully acceptable, fully rejectable, or partially acceptable. In this way, there will be no further security overhead for the processing of fully acceptable and rejectable queries.

2. For partially acceptable queries, we propose a graph-matching based authorization model for an optimized rewriting procedure in which a recursive query (query with descendant axis ‘//’) will be rewritten into an equivalent recursive one if possible and into a non-recursive one only if necessary; this enables resulting queries to fully take advantage of structural join based query optimization techniques.

3. We propose to use an index structure for XML element types to speed up the query rewriting procedure, such that a last test node in an XPath query can be efficiently substituted with an entry from the rewriting index. Our performance study results showed that our algorithms armed with rewriting indexes are promising.

This work extends our workshop paper (Chang et al. 2007) with the formalization and comprehensive description of our proposed access control model, additional illustrative examples, and the performance study of our proposed algorithms to derive an authorization model for a security specification graph and to analyze and rewrite an XPath query into a secure XPath query.

Organization The rest of this paper is organized as follows. Section 2 presents a survey of related work on XML access controls. Section 3 reviews Document Type Definitions and XPath queries. Section 4 describes how access control policies are specified in our access control model. Section 5 deals with security enforcement and presents algorithms to derive an authorization model and to perform XPath query analysis and rewriting. Section 6 presents our performance study and Section 7 concludes the paper.

2 Related work

Numerous access control models for XML documents have been proposed to restrict a user to accessing only the XML elements that she/he is authorized to access. According to the specification scheme of access control policies, existing models can be classified into two categories: XPath-based access control models (Damiani et al. 2002; Gabillon and Bruno 2001; Kudo and Hada 2000; Miklau and Suciu 2003; Wang and Osborn 2004; Diao et al. 2003; Luo et al. 2004; Murata et al. 2003; Qi et al. 2005) and DTD or XML Schema based access control models (Yu et al. 2002; Cho et al. 2002; Fan et al. 2004). An XPath-based access control model uses XPath expressions to specify the XML elements that a user is allowed or denied access to. Therefore, in such a model, each access control policy is specified by a set of XPath-based grant or denial rules. One advantage of using XPath for the specification of access control policies is that XPath is a standard XML query language with well-defined syntax and semantics. Meanwhile, a DTD-based access control model uses DTD security annotations to specify the XML element types that a user is allowed or denied access to. One limitation of DTD-based access control models is that they require the DTDs of XML documents to be available, which might not always be the case. In the absence of XML schema information, an XPath-based access control model can be used.

Another dimension of classification is the enforcement mechanism of access control policies. Along this dimension, XML access control models can be classified into two major categories: document-based-enforcement models and query-based-enforcement models. A document-based-enforcement model (Damiani et al. 2002; Bertino et al. 2001, 2002; Gabillon and Bruno 2001; Kudo and Hada 2000; Miklau and Suciu 2003; Yu et al. 2002; Wang and Osborn 2004; Diao et al. 2003; Gabillon 2005; Stoica and Farkas 2002; Cuppens et al. 2005, 2007; Duong and Zhang 2008; Sasaki et al. 2008; Kocatürk and Gündem 2008; Finance et al. 2005; Bouganim et al. 2004) enforces access control policies either by preprocessing XML documents into secure views or by postprocessing query results to filter out inaccessible information. In contrast, a query-based-enforcement model (Luo et al. 2004; Murata et al. 2003; Qi et al. 2005; Cho et al. 2002; Fan et al. 2004; Kuper et al. 2005; Byun and Park 2006; Mohan et al. 2005, 2007; Damiani et al. 2008) rewrites a user query q into a secure query q′ using the information of access control policies and then evaluates q′ over the original documents; this returns all and only those XML elements in the result of q that the user is authorized to access. In the document preprocessing approach, also called the materialized view based approach in the literature, XML documents are preprocessed so that, for each group of users, a view that consists of only accessible elements is calculated in advance. During execution, each query presented by the user is evaluated over the materialized view rather than over the original XML document. The advantage of this approach is efficient query processing, as each query can be processed over the view without any further special consideration of policy enforcement. The disadvantage is that in a dynamic environment, in which XML documents and access control policies change frequently, the materialization and maintenance of views are computationally expensive. The first materialized view based approach was proposed by Damiani et al. (2000). In the document postprocessing approach, on the other hand, a query is first evaluated over the original documents, special care is then taken to remove those XML elements that are not accessible according to the access control policies, and finally only those XML elements that the user is authorized to access are returned to the user. The advantage of this approach is that the postprocessing procedure can dynamically reflect changes in access control policies. However, the postprocessing procedure can become a performance bottleneck when the intermediate query result contains a large volume of unauthorized XML elements.
More seriously, the postprocessing approach might not always guarantee security. Because no security check is performed during query evaluation, the user is permitted to check conditions on XML elements that she is not authorized to access, and can thereby infer information about these inaccessible elements, leading to a security leak. An example of such a security leak is illustrated in Cho et al. (2002).

In this paper, we propose an XML access control model that falls into both the DTD-based and the query-based-enforcement categories. In particular, in our approach, access control policies are specified as security annotations over DTDs, and XML queries are rewritten into secure queries. While a number of query rewriting techniques have been proposed (e.g., the DFA-based technique of Damiani et al. 2008 and the NFA-based technique of Luo et al. 2004), our approach stands out by using a graph matching based authorization model for rewriting. Most closely related to our work is the security view approach proposed by Fan et al. (2004). In this approach, an access control policy for a group of users is specified by associating element types in the document DTD with security annotations including ‘Y’, ‘N’, or an XPath qualifier ‘[q]’, corresponding to accessible, inaccessible, and conditionally accessible elements, respectively. A view DTD, which includes only the data accessible w.r.t. the access control policy, is then calculated automatically and provided to the users authorized by the policy, who formulate queries over the view DTD. Each query formulated by an authorized user over the view DTD is rewritten into an equivalent query over the original DTD, which is further optimized and then executed. The returned result consists of only the XML elements that the user is authorized to access according to the access control policy. Although we use the same DTD annotation language proposed by Fan et al. (2004) for the specification of access control policies, our framework differs from and improves over Fan et al. (2004) in a number of ways for security enforcement. First, instead of exposing only a view DTD, we expose the full original DTD to all users, supporting the argument that the availability of the original DTD is critical for the interoperability and correctness of business applications (Damiani et al. 2002; Bertino and Ferrari 2002; Luo et al. 2004; Murata et al. 2003; Qi et al. 2005). Second, while in Fan et al. (2004) rewriting is needed for each input query, we introduce a graph matching based static analysis technique to determine whether an input query is fully acceptable, fully rejectable, or partially acceptable. Rewriting is necessary only for partially acceptable queries. Although a similar static analysis technique has been proposed by Murata et al. (2003), to the best of our knowledge, we are the first to propose such a static analysis technique for DTD-based access control models.
Third, in Fan et al. (2004) each recursive query (a query with the descendant axis ‘//’) is rewritten into an equivalent non-recursive one, which can be very complex and inefficient. Our rewriting procedure does so only when necessary and thus leaves more room for XML query optimization techniques that are applicable to recursive axes (Atay et al. 2007). Finally, we propose to use an index structure for XML element types to speed up the query rewriting procedure, such that a last test node in an XPath query can be efficiently substituted with an entry from the rewriting index.

Our work is also related to the secure XML broadcasting problem (Bertino and Ferrari 2002; Kundu and Bertino 2008; Ko et al. 2007; Lee and Whang 2006), where the focus is the secure dissemination of XML documents to authorized users. In these frameworks, besides access controls, encryption and decryption methods are frequently used to protect against information leaking in communication channels.

3 Preliminaries

In the following, we review Document Type Definitions (DTDs) (W3C 2006a) and XPath (W3C 2006b) queries considered in this paper.

3.1 Document type definition

A DTD document consists of a set of declarations that describe a class of XML documents in terms of constraints on the structure of those XML documents. Similarly to Fan et al. (2004), we formalize a DTD document, or simply a DTD, as follows.

A DTD D is a triple (Ele, P, root), where Ele is a finite set of element types; root is a distinguished type in Ele, called the root type; and P is a function that defines the element types, such that for any A in Ele, P(A) is a regular expression of the form:

\[ \gamma ::= \mathit{str} \mid \varepsilon \mid B_1^{*} \mid B_1^{+} \mid B_1? \mid B_1, \ldots, B_n \mid B_1 \vee \cdots \vee B_n, \qquad (1) \]

where str denotes PCDATA, ε is the empty word, and \(B_i\) is an element type in Ele that is referred to as a subelement type of A; ‘*’, ‘+’ and ‘?’ denote ‘zero or many’, ‘one or many’ and ‘zero or one’ occurrences of the subelement type under the element type, respectively; ‘,’ and ‘∨’ denote concatenation and disjunction, respectively. We refer to A → γ or A → P(A) as the production of A.

A DTD D can be represented as a directed graph, referred to as a DTD graph GD of D. The graph contains a node for each element type A in D, referred to as the A node, and the edges depict the parent-child relationships among the nodes. Specifically, for each production A → γ, there is an edge from the A node to the B node for each element type B in γ. If γ = B*, γ = B+, or γ = B?, then the edge has a ‘*’ (zero or many), ‘+’ (one or many) or ‘?’ (zero or one) label, respectively, indicating how many B elements can be immediately nested within an A element. When it is clear from the context, we shall use the DTD and its graph interchangeably, referred to as D and GD, respectively; similarly for the A element type and the A node.

In this paper, we assume that a DTD D is non-recursive, such that P(A) contains no A directly or indirectly, and thus a DTD graph GD has no cycles.
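The construction of a DTD graph and the non-recursiveness check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. Productions are given as lists of subelement types with an optional trailing cardinality marker. Only the concatenation form used in Fig. 1a is handled (no disjunction, str, or ε). The names `PRODUCTIONS`, `dtd_graph`, and `is_acyclic` are hypothetical. Note that Major becomes a single node with two incoming edges.

```python
from collections import defaultdict

# Productions of Fig. 1a. Each subelement type may carry a trailing
# cardinality marker: '*' (zero or many), '+' (one or many), '?' (zero or one).
PRODUCTIONS = {
    "Transcripts": ["Transcript*"],
    "Transcript": ["Person", "History", "TestResult?"],
    "Person": ["Name", "ID", "SSN", "Major+"],
    "History": ["Major+", "Semester*", "CumGPA"],
    "TestResult": ["Test*"],
    "Major": ["Dept", "Prog"],
    "Semester": ["Term", "Class*", "GPA"],
    "Class": ["CNum", "Credit", "Grade"],
    "Test": ["TName", "TScore"],
}


def dtd_graph(productions):
    """Return the DTD graph as {(parent, child): label}, label in {'*', '+', '?', ''}."""
    edges = {}
    for parent, children in productions.items():
        for item in children:
            label = item[-1] if item[-1] in "*+?" else ""
            child = item[:-1] if label else item
            edges[(parent, child)] = label
    return edges


def is_acyclic(edges):
    """Check the non-recursiveness assumption: depth-first search for a back edge."""
    succ = defaultdict(list)
    for parent, child in edges:
        succ[parent].append(child)
    state = {}  # node -> "visiting" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "visiting"
        for nxt in succ[node]:
            if state.get(nxt) == "visiting":
                return False  # back edge: some P(A) would contain A indirectly
            if nxt not in state and not visit(nxt):
                return False
        state[node] = "done"
        return True

    return all(visit(n) for n in list(succ) if n not in state)


graph = dtd_graph(PRODUCTIONS)
print(graph[("History", "Semester")])  # prints *
print(is_acyclic(graph))               # prints True
```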

A sample DTD and its DTD graph are shown in Fig. 1a and c, respectively.

Note that while we choose to use DTD as our XML schema language for simplicity of presentation, our research is valid for XML Schema as well. Semantically, XML Schema (W3C 2004) and DTD (W3C 2006a) provide similar constructs for describing XML document structures, although XML Schema provides the following additional enhancements: (1) XML Schema provides a richer data type system; (2) XML Schema supports namespaces; and (3) XML Schema supports more complex constraints, such as cardinality constraints. However, all these enhancements are orthogonal to the access control model proposed in this paper. Therefore, our technique is readily applicable to XML Schema as well.

3.2 XPath queries

In this paper, we use the same class of XPath (W3C 2006b) queries as defined by Fan et al. (2004):

\[ p ::= \varepsilon \mid l \mid * \mid p/p \mid //p \mid p \cup p \mid p[q], \qquad (2) \]

where ε, l, and ∗ denote the empty path, a label (in Ele), and a wildcard, respectively; ‘∪’, ‘/’ and ‘//’ stand for union, the child axis and the descendant-or-self (or recursive) axis, respectively; and finally, q in p[q] is called a qualifier and is defined by:

\[ q ::= p \mid p = c \mid q \vee q \mid q \wedge q \mid \neg q, \qquad (3) \]

where c is a constant, p is as defined above, and ‘∨’, ‘∧’ and ‘¬’ denote disjunction, conjunction and negation, respectively.
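One way to make grammars (2) and (3) concrete is an abstract syntax tree. The Python sketch below is an illustration only, and all class and function names are hypothetical. It renders a tree back to XPath syntax and detects whether a query is recursive in the sense used in this paper, that is, whether it contains the descendant axis ‘//’ anywhere, including inside qualifiers.

```python
from dataclasses import dataclass, fields, is_dataclass

# Path expressions, grammar (2)
@dataclass(frozen=True)
class Empty:          # ε, the empty path
    pass

@dataclass(frozen=True)
class Label:          # l, an element type in Ele
    name: str

@dataclass(frozen=True)
class Wildcard:       # *
    pass

@dataclass(frozen=True)
class Child:          # p / p
    left: object
    right: object

@dataclass(frozen=True)
class Descendant:     # //p, the descendant-or-self (recursive) axis
    path: object

@dataclass(frozen=True)
class UnionPath:      # p ∪ p
    left: object
    right: object

@dataclass(frozen=True)
class Filter:         # p[q]
    path: object
    qual: object

# Qualifiers, grammar (3)
@dataclass(frozen=True)
class Exists:         # q ::= p
    path: object

@dataclass(frozen=True)
class Equals:         # q ::= p = c
    path: object
    const: str

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Not:
    qual: object


def is_recursive(node):
    """True if the expression uses '//' anywhere, qualifiers included."""
    if isinstance(node, Descendant):
        return True
    return any(is_recursive(getattr(node, f.name))
               for f in fields(node) if is_dataclass(getattr(node, f.name)))


def to_xpath(node):
    """Render an AST node in XPath concrete syntax."""
    if isinstance(node, Empty):
        return "."
    if isinstance(node, Label):
        return node.name
    if isinstance(node, Wildcard):
        return "*"
    if isinstance(node, Child):  # p1 / (//p2) is written p1//p2
        sep = "" if isinstance(node.right, Descendant) else "/"
        return f"{to_xpath(node.left)}{sep}{to_xpath(node.right)}"
    if isinstance(node, Descendant):
        return f"//{to_xpath(node.path)}"
    if isinstance(node, UnionPath):
        return f"({to_xpath(node.left)} | {to_xpath(node.right)})"
    if isinstance(node, Filter):
        return f"{to_xpath(node.path)}[{to_xpath(node.qual)}]"
    if isinstance(node, Exists):
        return to_xpath(node.path)
    if isinstance(node, Equals):
        return f'{to_xpath(node.path)} = "{node.const}"'
    if isinstance(node, (Or, And)):
        op = "or" if isinstance(node, Or) else "and"
        return f"({to_xpath(node.left)} {op} {to_xpath(node.right)})"
    if isinstance(node, Not):
        return f"not({to_xpath(node.qual)})"
    raise TypeError(f"not an XPath AST node: {node!r}")


q = Child(Filter(Descendant(Label("Person")),
                 Equals(Child(Label("Major"), Label("Dept")), "CS")),
          Label("Name"))
print(to_xpath(q), is_recursive(q))  # prints //Person[Major/Dept = "CS"]/Name True
```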

XPath expressions are commonly used in XML query languages to access specific parts of an XML document. For example, the XQuery (W3C 2007) language uses XPath to retrieve XML data and supplements it with additional operations, such as projection and join, to further process this data. Therefore, from the security perspective, access control is required for XPath expressions, while the additional operations can be performed on already secure data resulting from XPath evaluation.

4 Security specification

Our role-based access control for XML documents treats each XML element as an object to which access is controlled by the corresponding access control policies on a DTD. In this section, we formalize how to specify a user access request and an access control policy in our model.

In order to access XML data, a user submits an access request to a system as formalized in the following.

Definition 1 (Access Request) An access request ar from a user is a triple (r, D, q), where

– r is a role of the user,
– D is a DTD of XML documents that the user requests to access,
– q is an XPath query that the user uses to access XML data.

Example 1 (Access request for Transcripts.dtd) An access request from a secretary in a computer science department, who wants to collect all student names in the department from the transcripts XML dataset (see Fig. 1), can be represented as ar = (r: secretary, D: Transcripts.dtd, q: //Name[..//Dept = “CS”]).

To specify an access control policy for a role, we use the notion of a security specification S, which extends a DTD D by associating security annotations with productions of D. More formally, we define a security specification as follows.

Definition 2 (Security Specification) A security specification S is a tuple (r, D, ann), where:

– r is a role in a system,
– D is a DTD of XML documents,
– ann is a partial mapping from an element type A in Ele and its subelement type B in P(A) to a security annotation α, which we concisely denote as \(A \xrightarrow{\alpha} B\), and α is defined as α ::= Y | [q] | N, where [q] is a qualifier.

Values Y, [q], and N indicate that the B children of A elements in an instantiation of D are accessible, conditionally accessible, and inaccessible, respectively. If \(A \xrightarrow{\alpha} B\) is not explicitly defined, then B inherits the accessibility of A. On the other hand, if \(A \xrightarrow{\alpha} B\) is explicitly defined, it always overrides the inherited accessibility of B.

Example 2 (Security specification for Transcripts.dtd) A security specification for the secretary from Example 1 is defined as S = (r, D, ann), where r is secretary, D is Transcripts.dtd, and ann is listed as follows:

– \(\text{Transcripts} \xrightarrow{\texttt{[./Person/Major/Dept=\$dept]}} \text{Transcript}\): A secretary in department $dept can only access transcript information for students who study in the same department $dept.
– \(\text{Person} \xrightarrow{N} \text{SSN}\): A secretary cannot access a student’s SSN.
– \(\text{History} \xrightarrow{N} \text{Major}\): A secretary cannot access a student’s past major under the History element.
– \(\text{Major} \xrightarrow{Y} \text{Dept}\): A secretary can access a student’s current or past department information.
– \(\text{Semester} \xrightarrow{N} \text{Class}\): A secretary cannot access information about the courses that a student has taken.
– \(\text{Transcript} \xrightarrow{N} \text{TestResult}\): A secretary cannot access a student’s test result information.

Additionally, we define a security specification graph GS as a DTD graph with security annotations on its edges as defined in S. When it is clear from the context, we shall use the security specification and its security specification graph interchangeably, referred to as S and GS, respectively.

Example 3 (Security specification graph for Transcripts.dtd) The security specification graph GS for the security specification S defined in Example 2 is shown in Fig. 1d. Note that edges with empty security annotations recursively inherit the annotations of their “parent” edges, such that \(\text{TestResult} \rightarrow \text{Test}\) inherits N from \(\text{Transcript} \xrightarrow{N} \text{TestResult}\), \(\text{Test} \rightarrow \text{TName}\) inherits N from \(\text{TestResult} \xrightarrow{N} \text{Test}\), and so forth. An interesting case occurs when a node, such as Major, has multiple incoming edges with conflicting annotations. While we provide a solution to this inheritance conflict later in the paper, intuitively, this case denotes that Major under Person is accessible, but Major under History is inaccessible.

Finally, we introduce the notions of accessible XML element and secure query in our model.

Definition 3 (Accessible XML Element) Given an XML instance T of a DTD D, an element e in T is accessible w.r.t. a security specification S if and only if either (1) the security annotation for e is Y, or it is [q] and the qualifier [q] is true at e, and, moreover, for all ancestors a of e whose security annotation is [q′], the qualifier [q′] is true at a; or (2) the security annotation for e is not explicitly defined but the parent of e is accessible w.r.t. S.

Definition 4 (Secure Query) Given an XML instance T of a DTD D, a query q is secure w.r.t. a security specification S if all the elements returned by the execution of query q over T are accessible.
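To make Definitions 3 and 4 concrete, here is a minimal Python sketch that marks the accessible elements of an instance using the standard `xml.etree.ElementTree` module. The following assumptions apply:

- The qualifier of Example 2 is modeled as a Python predicate, with $dept bound to "CS".
- The root element is treated as accessible.
- The sample data are made up.
- `dept_is`, `accessible_elements`, and `is_secure` are hypothetical names.
- `is_secure` evaluates queries with ElementTree's limited XPath subset rather than the full fragment of Section 3.2.

```python
import xml.etree.ElementTree as ET


def dept_is(dept):
    """Qualifier [./Person/Major/Dept = $dept] with $dept bound to a fixed value."""
    return lambda e: any(d.text == dept for d in e.findall("./Person/Major/Dept"))


# Example 2 annotations for a secretary whose $dept is "CS" (assumed for the demo).
ANN = {
    ("Transcripts", "Transcript"): dept_is("CS"),
    ("Person", "SSN"): "N",
    ("History", "Major"): "N",
    ("Major", "Dept"): "Y",
    ("Semester", "Class"): "N",
    ("Transcript", "TestResult"): "N",
}

# A small made-up instance of the DTD in Fig. 1a (sample data, not from the paper).
XML = """<Transcripts>
  <Transcript>
    <Person><Name>Ann</Name><ID>1</ID><SSN>111</SSN>
      <Major><Dept>CS</Dept><Prog>BS</Prog></Major></Person>
    <History><Major><Dept>EE</Dept><Prog>MS</Prog></Major><CumGPA>3.9</CumGPA></History>
    <TestResult><Test><TName>GRE</TName><TScore>330</TScore></Test></TestResult>
  </Transcript>
  <Transcript>
    <Person><Name>Bob</Name><ID>2</ID><SSN>222</SSN>
      <Major><Dept>EE</Dept><Prog>MS</Prog></Major></Person>
    <History><Major><Dept>EE</Dept><Prog>MS</Prog></Major><CumGPA>3.1</CumGPA></History>
  </Transcript>
</Transcripts>"""


def accessible_elements(root, ann):
    """Elements accessible w.r.t. ann per Definition 3; the root counts as accessible."""
    result = [root]

    def walk(parent, parent_ok, quals_ok):
        # quals_ok: every qualifier on `parent` or one of its ancestors is true there
        for elem in parent:
            a = ann.get((parent.tag, elem.tag))
            if a is None:          # case (2): inherit from the parent
                ok, q = parent_ok, quals_ok
            elif a == "N":         # explicitly inaccessible
                ok, q = False, quals_ok
            elif a == "Y":         # case (1), annotation Y overrides the parent
                ok, q = quals_ok, quals_ok
            else:                  # case (1), qualifier [q] evaluated at elem
                ok = q = quals_ok and a(elem)
            if ok:
                result.append(elem)
            walk(elem, ok, q)

    walk(root, True, True)
    return result


def is_secure(root, ann, path):
    """Definition 4 for queries in ElementTree's XPath subset (not full XPath)."""
    allowed = {id(e) for e in accessible_elements(root, ann)}
    return all(id(e) in allowed for e in root.findall(path))


root = ET.fromstring(XML)
print(is_secure(root, ANN, "./Transcript[1]/Person/Name"))  # prints True
print(is_secure(root, ANN, "./Transcript/Person/Name"))     # prints False
```

In this sketch, the Y annotation on Major → Dept overrides the N inherited from History → Major. A CS student's past department therefore stays visible while the past program does not, in line with restriction 3 and Example 2.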

5 Security enforcement

A security specification must be enforced for a user query. Our security enforcement mechanism includes the following two steps:

1. Deriving an authorization model from a security specification graph.

2. XPath query analysis and rewriting.

These steps are elaborated in the following two subsections.

5.1 Deriving authorization model

To efficiently enforce an access control policy defined by a security specification, we derive an authorization model from a security specification graph GS. Our model requires the notions of accessible and inaccessible nodes: an accessible node is a node whose incoming edges are all annotated with ‘Y’, and an inaccessible node is a node whose incoming edges are all annotated with ‘N’. The definition of our authorization model is as follows.

Definition 5 (Authorization Model) An authorization model A is a tuple (GA, Pt, Ii, Ia), where GA is an authorization graph, Pt is a predicate table, and Ii and Ia are rewriting indexes for inaccessible and accessible nodes, respectively. Authorization graph GA is a fully annotated security specification graph, derived from security specification graph GS, such that every edge in GA is annotated with ‘Y’ or ‘N’, every node in GA is classified as an accessible or inaccessible node, and inaccessible leaf nodes are recursively pruned. Predicate table Pt is a set of tuples, such that each tuple (e, p) relates an edge e in GA and its predicate p. Rewriting index Ii (Ia) is a hash table that, for each inaccessible (accessible) node n in GA, contains an XPath query q that retrieves all accessible information under n.

The algorithm deriveAuthorizationModel, which derives an authorization model from a security specification graph GS, is presented in Fig. 2. The algorithm shows how two other algorithms are called to create the authorization model: (1) deriveAuthorizationGraph is called first to derive authorization graph GA from GS and construct predicate table Pt, and (2) createRewritingIndexes is called on GA and Pt to create the query rewriting indexes.

The algorithm deriveAuthorizationGraph is presented in Fig. 3. First (lines 05–06), the algorithm copies security specification graph GS to authorization graph GA and creates a virtual parent r′ of GA’s root r and an edge \(r' \xrightarrow{Y} r\), since the root is always considered accessible. Second (lines 07–10), it creates predicate table Pt by placing edges and their corresponding predicates in the table, and replaces all predicate annotations with ‘Y’ annotations in GA. Third (lines 11–38), the algorithm fully annotates GA: whenever it finds a node n in GA with all incoming edges annotated and at least one outgoing edge with no annotation, it proceeds according to the following three cases:

Case 1 (lines 13–27). If node n has incoming edges with both ‘Y’ and ‘N’ annotations, then n is cloned to a new node n′; n’s incoming edges with ‘N’ annotations $p_i \xrightarrow{N} n$ are replaced with $p_i \xrightarrow{N} n'$; n’s outgoing edges $n \rightarrow c_i$ with no annotations are annotated with ‘Y’ as $n \xrightarrow{Y} c_i$ and cloned as $n' \xrightarrow{N} c_i$; and all the other outgoing edges $n \xrightarrow{\alpha} c_i$ are cloned as $n' \xrightarrow{\alpha} c_i$. In other words, the incoming edges are split between n and n′, such that n retains the edges with ‘Y’ annotations and n′ adopts the edges with ‘N’ annotations. The outgoing edges are retained by n with empty annotations replaced with ‘Y’, and the same edges are copied for n′, but with empty annotations replaced with ‘N’. Note that in the following we denote nodes n and n′ as an accessible node and an inaccessible node with the same label n.

Case 2 (lines 28–32). If all of n’s incoming edges are annotated with ‘Y’, then all outgoing edges with empty annotations are annotated with ‘Y’.

Fig. 2 Algorithm deriveAuthorizationModel

01 Algorithm: deriveAuthorizationModel
02 Input: security specification graph GS
03 Output: authorization graph GA, predicate table Pt, query rewriting indexes Ia and Ii
04 Begin
05   GA, Pt = deriveAuthorizationGraph(GS)
06   Ii, Ia = createRewritingIndexes(GA, Pt)
07   Return (GA, Pt, Ii, Ia)
08 End Algorithm


Fig. 3 Algorithm deriveAuthorizationGraph

Case 3 (lines 32–36). If all of n’s incoming edges are annotated with ‘N’, then all outgoing edges with empty annotations are annotated with ‘N’.
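The annotation phase (Cases 1–3) can be sketched in Python as a fixpoint loop, assuming that GA is stored as a dictionary from `(parent, child)` edges to ‘Y’, ‘N’, or `None` for an empty annotation, and that a clone n′ is named by appending `'` to n. The function name `annotate` and this naming scheme are hypothetical; the sketch also reuses an existing clone if a node has to be split a second time, a detail that Fig. 3 may handle differently.

```python
from typing import Dict, Optional, Tuple

Edge = Tuple[str, str]  # (parent, child); a cloned node n' is named n + "'"


def annotate(spec: Dict[Edge, Optional[str]]) -> Dict[Edge, Optional[str]]:
    """Propagate 'Y'/'N' annotations top-down following Cases 1-3.

    `spec` maps every edge to 'Y', 'N', or None (empty annotation); predicate
    annotations are assumed to have been replaced by 'Y' already (lines 07-10).
    """
    g = dict(spec)
    changed = True
    while changed:
        changed = False
        for n in sorted({v for e in g for v in e}):
            ins = [e for e in g if e[1] == n]
            outs = [e for e in g if e[0] == n]
            if not ins or not outs or any(g[e] is None for e in ins):
                continue  # virtual root, leaf (never split), or not ready yet
            kinds = {g[e] for e in ins}
            pending = [e for e in outs if g[e] is None]
            if kinds == {"Y", "N"}:  # Case 1: split n into n (accessible) and n'
                clone = n + "'"
                fresh = not any(e[0] == clone for e in g)
                for e in ins:
                    if g[e] == "N":  # n' adopts the 'N' incoming edges
                        del g[e]
                        g[(e[0], clone)] = "N"
                for e in outs:
                    if fresh:  # n' copies n's outgoing edges; empty ones become 'N'
                        g[(clone, e[1])] = "N" if g[e] is None else g[e]
                    if g[e] is None:  # n keeps its outgoing edges; empty ones become 'Y'
                        g[e] = "Y"
                changed = True
            elif pending:  # Case 2 (all 'Y') or Case 3 (all 'N')
                (ann,) = kinds
                for e in pending:
                    g[e] = ann
                changed = True
    return g
```

On a fragment of the Transcripts example in which History → Major is annotated ‘N’ and Major → Dept is annotated ‘Y’, the sketch annotates Person → Major with ‘Y’, redirects History → Major to Major′, and gives Major′ the edges ‘N’ to Prog and ‘Y’ to Dept.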

The same lines 13–27 also split nodes that end up with all of their incoming and outgoing edges annotated but with two incoming edges that carry different annotations. After all edges are annotated, GA may still have leaf nodes with incoming edges annotated with both ‘N’ and ‘Y’, because the algorithm does not split leaf nodes. Such leaves can be classified neither as accessible (only ‘Y’ annotations on incoming edges) nor as inaccessible (only ‘N’ annotations on incoming edges); this is resolved in the following pruning step. Fourth (lines 39–45), the algorithm simplifies the fully annotated GA by recursively removing incoming edges of leaf nodes that carry ‘N’ annotations and pruning leaf nodes all of whose incoming edges have been removed. After this simplification, GA contains only accessible and inaccessible nodes, and all of its leaf nodes are accessible. Fifth (lines 46–48), leaf nodes of GA that are not also leaf nodes in GS are recursively removed (even though they are accessible), because they bear no useful information. Therefore, GA preserves all the paths of GS to accessible leaf nodes, while precisely differentiating between accessible and inaccessible nodes. Finally (line 49), GA and Pt are returned.
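Steps four and five (lines 39–48) amount to two fixpoint loops over the edge set. The following Python sketch assumes that the fully annotated GA is stored as a dictionary from `(parent, child)` edges to ‘Y’ or ‘N’; the function name `prune` and the parameter `gs_leaves` (the set of leaf labels in GS) are illustrative. A node is treated as removed as soon as no remaining edge mentions it.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (parent, child) in the fully annotated GA


def prune(ga: Dict[Edge, str], gs_leaves: Set[str]) -> Dict[Edge, str]:
    """Sketch of steps four and five of deriveAuthorizationGraph (lines 39-48)."""
    g = dict(ga)

    def is_leaf(node: str) -> bool:
        return not any(parent == node for parent, _ in g)

    # Step 4: drop 'N' edges into leaves; a leaf left with no incoming edge
    # disappears, which may turn its parent into a new leaf (hence the loop).
    changed = True
    while changed:
        changed = False
        for (parent, child), ann in list(g.items()):
            if ann == "N" and is_leaf(child):
                del g[(parent, child)]
                changed = True

    # Step 5: recursively drop (accessible) leaves that are not leaves of GS.
    changed = True
    while changed:
        changed = False
        for parent, child in list(g):
            if is_leaf(child) and child not in gs_leaves:
                del g[(parent, child)]
                changed = True
    return g
```

In the test data, the nodes `Extra` and `Hidden` are hypothetical and exist only to exercise step five.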

The algorithm createRewritingIndexes, which creates rewriting indexes Ii and Ia, is shown in Fig. 4. First (line 05), createRewritingIndexes creates Ia index entries for every leaf node. Since all leaf nodes are accessible and have no children, the corresponding queries simply retrieve the XML instances of these nodes. Then (lines 06–17), it creates index entries for the other nodes, each time processing a node all of whose children already have entries, until every node has an entry. This bottom-up order allows previously computed index entries to be reused, which simplifies the algorithm (our original version was recursive) and saves computation. To create an index entry for an accessible (inaccessible) node n, the algorithm computes the union of the queries “/n” + p + Ia(ci) for each accessible child ci of n and “/n” + p + Ii(ci) for each inaccessible child ci of n, and assigns it to Ia(n) (Ii(n)), where p is the predicate for edge $n \rightarrow c_i$ in Pt, if any. Note that a predicate for $n \rightarrow c_i$ in Pt specifies access to ci and thus may include XPath paths relative to ci; in this case, the algorithm makes such paths relative to n by adding “./ci” in front of each relative path. In summary, given a node n, which can be accessible or inaccessible, its index entry retrieves all accessible (secure) information under n.
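The bottom-up construction of Fig. 4 can be sketched in Python as below, assuming GA is given as a child-list dictionary, `accessible` is the set of accessible node ids, and a clone n′ is named `n'` but shares the label n. Each index entry is kept as a list of union branches, to be joined with the XPath union operator `|`. One deliberate deviation, named plainly: this sketch attaches a Pt predicate to the child's own step (e.g. `/Transcripts/Transcript[pred]/...`), which matches the shape of the queries shown in Example 5, instead of rewriting the predicate relative to n with a “./ci” prefix as lines 08–10 describe. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def label(node: str) -> str:
    """A cloned inaccessible node n' shares the label n (hypothetical naming)."""
    return node.rstrip("'")


def create_rewriting_indexes(
    children: Dict[str, List[str]],
    accessible: Set[str],
    pt: Dict[Tuple[str, str], str],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Bottom-up construction of Ii and Ia; each entry is a list of union branches."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    entry: Dict[str, List[str]] = {}
    for n in nodes:  # line 05: every leaf retrieves itself
        if not children.get(n):
            entry[n] = ["/" + label(n)]
    while len(entry) < len(nodes):  # lines 06-17: nodes whose children are all indexed
        ready = [n for n in sorted(nodes - set(entry)) if all(c in entry for c in children[n])]
        if not ready:
            raise ValueError("GA must be acyclic")
        for n in ready:
            branches = []
            for c in children[n]:
                pred = pt.get((n, c))
                head = "/" + label(c)
                for q in entry[c]:  # every branch of c's entry starts with head
                    step = head + (f"[{pred}]" if pred else "")
                    branches.append("/" + label(n) + step + q[len(head):])
            entry[n] = branches
    ia = {label(n): e for n, e in entry.items() if n in accessible}
    ii = {label(n): e for n, e in entry.items() if n not in accessible}
    return ii, ia
```

Distributing “/n” over the branches of each child's entry is equivalent to prefixing “/n” to their union, so keeping lists avoids building nested parenthesized expressions.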

Note that an authorization model, like a security specification, is constructed only once for a particular user role. The authorization model for a role is later used to analyze and rewrite any XPath query issued by a user with this role.

Example 4 (Authorization model) Given the security specification graph GS in Fig. 1d, algorithm deriveAuthorizationModel computes (GA, Pt, Ii, Ia) by calling deriveAuthorizationGraph and createRewritingIndexes. The deriveAuthorizationGraph algorithm performs the following operations:

– It assigns GS to authorization graph GA and makes the root Transcripts accessible by adding $\text{Transcripts}' \xrightarrow{Y} \text{Transcripts}$ (lines 05–06).

– It inserts the predicate “./Person/Major/Dept=$dept” on the edge Transcripts → Transcript into predicate table Pt and annotates the edge with ‘Y’ (lines 07–10). The resulting predicate table is shown in Fig. 5b.

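– It adds security annotations to all edges with empty annotations (lines 11–38). The algorithm selects the edge Transcript → Person and annotates it with ‘Y’, since the edge Transcripts → Transcript has the ‘Y’ security annotation. Similarly, it uses inheritance to add the annotations $\text{Transcript} \xrightarrow{Y} \text{History}$, $\text{Person} \xrightarrow{Y} \text{Name}$, $\text{Person} \xrightarrow{Y} \text{ID}$, $\text{Person} \xrightarrow{Y} \text{Major}$, $\text{History} \xrightarrow{Y} \text{Semester}$, $\text{History} \xrightarrow{Y} \text{CumGPA}$, $\text{Semester} \xrightarrow{Y} \text{Term}$, $\text{Semester} \xrightarrow{Y} \text{GPA}$, $\text{Class} \xrightarrow{N} \text{CNum}$, $\text{Class} \xrightarrow{N} \text{Credit}$, $\text{Class} \xrightarrow{N} \text{Grade}$, $\text{TestResult} \xrightarrow{N} \text{Test}$, $\text{Test} \xrightarrow{N} \text{TName}$, and $\text{Test} \xrightarrow{N} \text{TScore}$. The only edge left with an empty annotation is Major → Prog, whose annotation could be inherited from either $\text{Person} \xrightarrow{Y} \text{Major}$ or $\text{History} \xrightarrow{N} \text{Major}$. This case is processed in lines 14–26 of the algorithm.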

Fig. 4 Algorithm createRewritingIndexes

01 Algorithm: createRewritingIndexes
02 Input: authorization graph GA, predicate table Pt
03 Output: query rewriting indexes Ii and Ia for inaccessible and accessible nodes in GA, respectively
04 Begin
05   For each leaf l in GA do Ia(l) = “/l” /* all leaves are accessible */ End For
06   While there exists a non-indexed node n ∈ GA all of whose children have been indexed do
07     For each child ci of n do
08       If n → ci ∈ Pt then
09         p = “[” + Pt(n → ci) + “]”
10         If p includes relative path(s) then add “./ci” in front of each relative path End If
11       Else p = “” End If
12       If ci is an accessible node and ci ∈ Ia then qi = “/n” + p + Ia(ci)
13       Else If ci ∈ Ii then qi = “/n” + p + Ii(ci) End If
14     End For
15     If n is an accessible node then Ia(n) = ∪i (qi)
16     Else Ii(n) = ∪i (qi) End If
17   End While
18   Return Ii, Ia
19 End Algorithm


Fig. 5 Authorization model derived from the security specification for Transcripts.dtd: (a) authorization graph over the nodes Transcripts, Transcript, Person, Name, ID, Major, Major′, Dept, Prog, History, Semester, CumGPA, Term, and GPA; (b) predicate table and rewriting indexes Ii and Ia

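Major is cloned as Major′, $\text{History} \xrightarrow{N} \text{Major}$ is removed, $\text{History} \xrightarrow{N} \text{Major}'$ is added, $\text{Major} \xrightarrow{Y} \text{Prog}$ is annotated, and $\text{Major}' \xrightarrow{N} \text{Prog}$ and $\text{Major}' \xrightarrow{Y} \text{Dept}$ are added.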

– It recursively removes incoming edges of leaf nodes that carry ‘N’ annotations and prunes leaf nodes all of whose incoming edges have been removed (lines 39–45). Thus, nodes SSN, CNum, Credit, Grade, Class, TName, TScore, Test, and TestResult, along with their incoming edges, are pruned, and edge $\text{Major}' \xrightarrow{N} \text{Prog}$ is removed. The resulting authorization graph is shown in Fig. 5a, where all the nodes are accessible (have ‘Y’ annotations on all of their incoming edges), except for the rectangle node Major′, which is inaccessible.

– It does not perform any action in lines 46–48, since all leaf nodes in GA (see Fig. 5a) are also leaves in GS (see Fig. 1d).

– It returns GA and Pt as shown in Fig. 5.

The createRewritingIndexes algorithm takes the newly computed GA and Pt and computes indexes Ii and Ia. Since all the leaves are accessible, each one is added to Ia, e.g., Ia(Dept) = /Dept, Ia(Prog) = /Prog, and so forth. Then, for each node all of whose children are indexed, the algorithm computes the node’s index entry by prefixing the node to each child’s entry and taking the union. For example, for the accessible node Major, Ia(Major) = /Major/Dept ∪ /Major/Prog, and for the inaccessible node Major′, Ii(Major) = /Major/Dept; note that there is only one inaccessible node in the graph, and thus Ii has only one entry. The algorithm continues to generate index entries in a similar fashion until all the nodes have entries in Ii or Ia. The predicate table is used to insert predicates into the XPath expressions for edges found on the path from a node to a leaf. Some more complex index entries generated by the algorithm are shown in Fig. 5b.

5.2 XPath query analysis and rewriting

Given a user query and a precomputed authorization model, the final step of our security enforcement mechanism is to analyze the query and rewrite it into a secure XPath query that retrieves only authorized XML data. To achieve this goal, we design algorithm enforceSecurity, shown in Fig. 6. The rewriting indexes Ia and Ii greatly simplify this task, since they already contain secure XPath queries for the last (output) node test of the user query. Our algorithm employs the notion of an XPath query graph, which is derived from a DTD and contains all possible paths for the user query in any valid XML document that conforms to the DTD. To construct an XPath query graph, we use the GetXPGraph algorithm proposed by Bottcher and Steinmetz (2003). Note that predicates in a user query are different from predicates in an authorization model: the predicates of an XPath query are used as edge labels in the XPath query graph. The availability of such a graph allows us to efficiently match it against the authorization graph to determine the information accessible to the user.


Fig. 6 Algorithm enforceSecurity

The input of the enforceSecurity algorithm consists of an authorization model (GA, Pt, Ii, Ia), a DTD graph GD, and a user XPath query q. The output is a secure (rewritten) query, which is empty if q asks only for XML data that is not authorized by GA. First (lines 05–06), the algorithm checks whether the last node test τ(q) in q is not a “*” and has no entry in Ia or Ii. In other words, τ(q) is an element name that does not appear in GA, since all of GA’s nodes have entries in the indexes. If this is the case, q is simply rejected and the empty value ø is returned. Second (lines 07–08), the algorithm constructs XPath query graph Gq. Third (lines 09–11), since Gq may have many leaf nodes (e.g., when τ(q) is a “*”), the algorithm checks whether any of the leaves has an entry in the indexes; if none does, the query is rejected. Fourth (lines 12–13), if all the leaves in Gq are leaves in both GD and GA (and leaves of GA are always accessible), and there are no predicates in Pt to be inserted into the query, q is a fully acceptable query and is returned as it is, without rewriting. Fifth (line 14), the algorithm constructs the intersection graph of Gq and GA, such that all nodes and edges that are in both Gq and GA are copied to a new graph G′q, preserving the predicate labels on Gq’s edges. Due to the intersection, G′q may have leaf nodes that are not leaves in Gq; these must be recursively removed from G′q (line 15). Note that G′q cannot be empty; otherwise, the algorithm would already have exited in line 11. Finally (lines 17–31), after the above simple query analysis and the construction of the intersection graph, the algorithm proceeds with the rewriting of the query, since at least partial information is accessible to the user:

– If G′q has only one leaf l and no edge in G′q has a predicate entry in Pt, then the algorithm accepts the query (lines 17–20). τ(q) is replaced with the index entry for l (note that τ(q) can be a “*”), which ensures that only authorized results will be retrieved by q. Since GA may have an accessible and an inaccessible node with the same label l, the algorithm decides which index(es) to use based on the paths from the root to l in G′q and GA. After the replacement, the query is returned.

– Otherwise, if G′q has multiple leaves (in this case, each leaf requires its own replacement from the indexes) or G′q’s edges have predicates in Pt (in this case, the predicates must be added to the query), the algorithm rewrites the query (lines 21–31). The rewriting requires enumerating all paths in G′q, replacing leaves (last node tests) with the corresponding index entries, and adding predicates from Pt to the corresponding nodes. The union of the obtained queries (paths) is returned as a secure query.
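Lines 05–16 of enforceSecurity reduce to a few set tests followed by a graph intersection. The Python sketch below assumes that Gq has already been produced (GetXPGraph is not reimplemented here) and is given as a child-list dictionary over element labels, that GA is a child-list dictionary over node ids in which a clone n′ is named `n'`, and that `indexed` is the set of labels with an entry in Ia or Ii. The function `analyze` and its string results are hypothetical; the sketch also rejects the query if the intersection ends up empty, a defensive check it adds on its own.

```python
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]  # node -> list of children


def label(node: str) -> str:
    return node.rstrip("'")  # hypothetical naming: clone n' has label n


def analyze(last_test: str, gq: Graph, gq_root: str, ga: Graph, ga_root: str,
            gd_leaves: Set[str], indexed: Set[str],
            pt: Dict[Tuple[str, str], str]) -> Tuple[str, Optional[Graph]]:
    """Sketch of lines 05-16 of enforceSecurity; Gq is assumed to be given."""
    if last_test != "*" and last_test not in indexed:            # lines 05-06
        return "reject", None
    gq_nodes = {gq_root} | {c for kids in gq.values() for c in kids}
    gq_leaves = {n for n in gq_nodes if not gq.get(n)}
    if not gq_leaves & indexed:                                  # lines 09-11
        return "reject", None
    ga_nodes = set(ga) | {c for kids in ga.values() for c in kids}
    ga_leaves = {label(n) for n in ga_nodes if not ga.get(n)}
    needs_pt = any(label(c) in gq.get(label(p), []) for p, c in pt)
    if gq_leaves <= gd_leaves and gq_leaves <= ga_leaves and not needs_pt:
        return "accept", None                                    # lines 12-13
    # line 14: walk GA from its root along edges whose labels also occur in Gq
    sub: Graph = {}
    stack = [ga_root] if label(ga_root) == gq_root else []
    while stack:
        n = stack.pop()
        if n not in sub:
            sub[n] = [c for c in ga.get(n, []) if label(c) in gq.get(label(n), [])]
            stack.extend(sub[n])
    # line 15: recursively drop leaves of G'q that are not leaves of Gq
    while True:
        dead = {n for n, kids in sub.items() if not kids and label(n) not in gq_leaves}
        if not dead:
            return ("rewrite", sub) if sub else ("reject", None)
        sub = {n: [c for c in kids if c not in dead] for n, kids in sub.items() if n not in dead}
```

With the authorization graph of Example 4, the sketch rejects //TestResult at line 06, reduces /Transcripts to a one-node G′q, and for //Transcript/*/Major/* keeps Major′ with only the child Dept.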

In the following, we provide a detailed example of how enforceSecurity works for three sample XPath queries.

Example 5 (Query analysis and rewriting) Given the authorization model (GA, Pt, Ii, Ia) computed in Example 4 and DTD graph GD in Fig. 1c, we apply the enforceSecurity algorithm to the XPath queries:

q1: //TestResult
q2: /Transcripts
q3: //Transcript/*/Major/*

Query q1 is fully rejectable based on line 06 of the algorithm, because its output node test τ(q1) = TestResult has an entry in neither Ia nor Ii.

Query q2 is partially acceptable with substitution from the indexes only (line 20). The algorithm computes XPath query graph Gq2 for q2 (line 07); the graph has only one node, Transcripts, and no edges. Next, G′q2 = Gq2 ∩ GA is computed (line 14), resulting in the same one-node graph. Finally, lines 17–20 are executed, and τ(q2) = Transcripts is substituted with the index entry Ia(Transcripts). The resulting query is /Transcripts/(Transcript[./Person/Major/Dept = $dept]/(Person/Name ∪ Person/ID ∪ Person/(Major/Dept ∪ Major/Prog)) ∪ Transcript[./Person/Major/Dept = $dept]/(History/Major/Dept ∪ History/(Semester/Term ∪ Semester/GPA) ∪ History/CumGPA)); it retrieves all accessible information under Transcripts.

Query q3 is partially acceptable with rewriting (line 30). The algorithm computes XPath query graph Gq3 for q3 (line 07); the graph is shown in Fig. 7. Next, G′q3 = Gq3 ∩ GA is computed (line 14), resulting in the same graph as Gq3. Finally, lines 22–30 are executed, and the three root-to-leaf paths in G′q3 that are also in GA are computed, resulting in the query /Transcripts/Transcript[./Person/Major/Dept = $dept]/Person/Major/Dept ∪ /Transcripts/Transcript[./Person/Major/Dept = $dept]/Person/Major/Prog ∪ /Transcripts/Transcript[./Person/Major/Dept = $dept]/History/Major/Dept; it retrieves all accessible information under Major.
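The rewriting in lines 21–31 enumerates the root-to-leaf paths of G′q. A Python sketch, assuming G′q is a child-list dictionary over GA node ids (a clone n′ named `n'` but printed with its label n), `index` maps each G′q leaf to its Ia or Ii entry as a list of branches, and Pt predicates are attached to the child's step; `rewrite` is a hypothetical name, and `|` is the XPath spelling of the union written ∪ in the text. Applied to G′q3, it produces the same three paths as the rewriting of q3, written as absolute paths.

```python
from typing import Dict, List, Optional, Tuple


def label(node: str) -> str:
    return node.rstrip("'")  # hypothetical naming: clone n' has label n


def rewrite(sub: Dict[str, List[str]], root: str,
            pt: Dict[Tuple[str, str], str], index: Dict[str, List[str]]) -> str:
    """Enumerate root-to-leaf paths of G'q, substitute each leaf's index entry,
    attach Pt predicates, and join the branches with the XPath union '|'."""
    branches: List[str] = []

    def walk(node: str, prefix: str, pred: Optional[str]) -> None:
        step = "/" + label(node)
        filt = f"[{pred}]" if pred else ""
        kids = sub.get(node, [])
        if not kids:  # leaf of G'q: replace the last node test with its index entry
            for q in index[node]:  # each branch starts with this node's own step
                branches.append(prefix + step + filt + q[len(step):])
            return
        for c in kids:
            walk(c, prefix + step + filt, pt.get((node, c)))

    walk(root, "", None)
    return " | ".join(branches)
```

The same function covers the single-leaf case of lines 17–20: for G′q2, the only leaf is Transcripts, so the result is just the branches of its index entry.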

Fig. 7 XPath query graph for //Transcript/*/Major/* (nodes Transcripts, Transcript, Person, History, Major, Dept, and Prog)

6 Performance study

This section reports the performance experiments conducted using our deriveAuthorizationModel, deriveAuthorizationGraph, createRewritingIndexes, and enforceSecurity algorithms. In the following, we describe our experimental setup, datasets, test queries, and three experiments.

Experimental setup All the algorithms were implemented in Java. To evaluate XPath queries over sample XML documents, we used the X-Hive/DB 8.1.2 system (X-Hive 2008). The experiments were conducted on a PC with one 1.7 GHz Pentium M CPU and 512 MB of main memory running MS Windows XP Professional. To measure algorithm running time, we ran each algorithm 10 or more times and took the mean of these trials.
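The measurement protocol above (at least 10 runs, mean reported) can be mirrored with a small harness. The original implementation was in Java, so the Python helper `mean_runtime_us` below is only an illustrative assumption of how such a measurement might look, using `time.perf_counter` for wall-clock time and reporting microseconds to match the unit of Table 1.

```python
import statistics
import time
from typing import Callable


def mean_runtime_us(fn: Callable[[], object], runs: int = 10) -> float:
    """Call fn() `runs` times and return the mean wall-clock time in microseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.mean(samples)
```

Reporting the mean follows the setup described above; a median would be less sensitive to occasional outliers such as garbage-collection pauses.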

Datasets We used two DTDs in our experiments: (1) the transcript DTD Transcripts.dtd presented throughout the paper and (2) the auction DTD auction.dtd from the XMark benchmark (Schmidt et al. 2002). The DTD graph of Transcripts.dtd was rather simple, with 22 nodes and 23 edges. The DTD graph of auction.dtd was more complicated, with over 200 nodes and edges after the removal of several recursive edges, since our algorithms require an acyclic DTD graph. The security specification for Transcripts.dtd was as defined in Fig. 1d. The security specification for auction.dtd allowed access to a closed auction only to the winner of the auction ($\text{closed\_auctions} \xrightarrow{\texttt{.//buyer/@person=\$userid}} \text{closed\_auction}$) and did not allow access to credit card information ($\text{person} \xrightarrow{N} \text{creditcard}$). To evaluate XPath queries over sample XML documents, we used two XML documents: (1) Transcripts.xml of size 256 KB, which conforms to Transcripts.dtd, and (2) auction.xml of size 10 MB, which conforms to auction.dtd.

Test queries We used the following 10 XPath queries (Q1–Q5 are defined for Transcripts.dtd and Q6–Q10 for auction.dtd):

Q1 = /Transcripts/Transcript/Person/Name
Q2 = //Transcript/*/Major/Prog
Q3 = /Transcripts/Transcript/Person
Q4 = /Transcripts/Transcript/History/Semester

Table 1 Performance of algorithm deriveAuthorizationModel

DTD               Time (μs)
Transcripts.dtd   204
auction.dtd       732


Fig. 8 Performance comparison of our rewriting procedure enforceSecurity and the rewriting procedure by Fan et al. (2004): rewriting time (ms) for queries Q1–Q10, our rewriting procedure vs. Fan’s rewriting procedure

Q5 = //Transcript/History[//Dept = ‘CS’]
Q6 = /site/regions/*/item
Q7 = //item[./region = ‘America’]
Q8 = //open_auction[./price > 100]
Q9 = //closed_auction[./price > 100]
Q10 = //person[./@id = /site//seller/@person]

These queries were carefully selected such that their rewriting with algorithm enforceSecurity results in the same secure queries as those obtained with the rewriting procedure by Fan et al. (2004). All the test queries were partially acceptable and required rewriting. For fully acceptable or fully rejectable queries, our algorithm enforceSecurity responded instantly, since no rewriting is needed.

Experiment 1: derivation of an authorization model In this experiment, we derived the authorization models for the security specification graphs of Transcripts.dtd and auction.dtd. The results are shown in Table 1. Constructing the models with algorithm deriveAuthorizationModel, which called deriveAuthorizationGraph and createRewritingIndexes, took less than one millisecond (204 μs and 732 μs, respectively). The algorithms performed fastest on the graph of Transcripts.dtd, since it was significantly smaller than the graph of auction.dtd. Note that the authorization model derivation needs to be performed only once for a particular security specification (or user role). Thus, it did not influence the query response time in the following experiments.

Experiment 2: query analysis and rewriting We used the enforceSecurity algorithm to analyze and rewrite the 10 test XPath queries listed above into their secure counterparts. In addition, we compared the performance of our algorithm with that of the rewriting algorithm presented by Fan et al. (2004). For a fair comparison, we did not apply evaluation optimization (Fan et al. 2004) in either implementation, even though it can be beneficial and can be performed for both approaches. The results of our performance comparison are shown in Fig. 8. For queries Q1 and Q2, the running times of both algorithms were the same, because these queries retrieved XML leaf nodes, and thus the substitutions from our rewriting indexes did not speed up query rewriting. On the other hand, our rewriting procedure was much faster for queries Q3 through Q10, because our approach rewrote those queries by simply referencing the rewriting indexes, while Fan’s approach rewrote the same queries by collecting all accessible paths in a DTD.

Fig. 9 Query response time for our approach to secure XML querying using the X-Hive XQuery implementation: (a) queries Q1–Q5 over the 256 KB Transcripts.xml; (b) queries Q6–Q10 over the 10 MB auction.xml (y-axis: time (ms); series: query evaluation and our rewriting procedure)


Experiment 3: secure XML query evaluation To explore the effect of our approach to secure XML querying on query response time, we stored XML document Transcripts.xml (256 KB) and XML document auction.xml (10 MB) in the X-Hive/DB system and measured both rewriting and evaluation performance for our test queries. While the rewriting procedure had a significant effect on overall query response time for the smaller XML document (see Fig. 9a), this effect was negligible for the larger XML document (see Fig. 9b), since query evaluation time was much larger than query rewriting time.

7 Conclusions and future work

As XML becomes the most common data representation over the World Wide Web, we need effective and efficient access control mechanisms to restrict a user to accessing only the parts of XML documents that his/her access rights authorize. To address this requirement, several DTD-based XML access control models exist; however, there is a lack of query analysis techniques that can decide whether an input query is fully acceptable, fully rejectable, or partially acceptable. Such a technique eliminates further overhead for the processing of fully acceptable and fully rejectable queries. In this paper, we proposed the first DTD-based access control model that employs graph matching to analyze whether an input query is fully acceptable, fully rejectable, or partially acceptable. In this way, there is no further security overhead for the processing of fully acceptable and rejectable queries. For partially acceptable queries, we proposed an optimized rewriting procedure in which a recursive query is rewritten into an equivalent recursive one if possible and into a non-recursive one only if necessary, resulting in queries that can fully take advantage of structural-join-based query optimization techniques. Finally, we proposed an index structure for XML element types to speed up the query rewriting procedure. Our performance study results showed that our algorithms armed with rewriting indexes are promising.

Recently, XML has also become the de facto standard for scientific datasets used in scientific workflow environments (Hastings et al. 2005; Moreau et al. 2005). In such environments, a scientist is usually willing to share only a portion of her datasets with a particular workflow task under a particular project context (e.g., permissions are given only to a particular role of users and only for a particular workflow run). There is therefore a great need for access control for scientific workflows that is fine-grained not only on the subject side (workflows, tasks, ports, and data channels) but also on the dataset side (parent data elements, child elements, and descendant elements). To facilitate and control the secure sharing of and access to XML-based scientific datasets in scientific workflow environments, we will extend the access control mechanism proposed in this paper to scientific workflow environments.

References

Atay, M., Chebotko, A., Lu, S., & Fotouhi, F. (2007). XML-to-SQL query mapping in the presence of multi-valued schema mappings and recursive XML schemas. In Proceedings of the international conference on database and expert systems applications (DEXA) (pp. 603–616).

Bertino, E., Castano, S., & Ferrari, E. (2001). Securing XML documents with Author-X. IEEE Internet Computing, 5(3), 21–31.

Bertino, E., Castano, S., Ferrari, E., & Mesiti, M. (2002). Protection and administration of XML data sources. Data and Knowledge Engineering, 43(3), 237–260.

Bertino, E., & Ferrari, E. (2002). Secure and selective dissemination of XML documents. ACM Transactions on Information and System Security, 5(3), 290–331.

Bottcher, S., & Steinmetz, R. (2003). A DTD graph based XPath query subsumption test. In Proceedings of the international XML database symposium (pp. 85–99).

Bouganim, L., Ngoc, F. D., & Pucheral, P. (2004). Client-based access control management for XML documents. In Proceedings of the international conference on very large data bases (VLDB) (pp. 84–95).

Byun, C., & Park, S. (2006). An efficient yet secure XML access control enforcement by safe and correct query modification. In Proceedings of the international conference on database and expert systems applications (DEXA) (pp. 276–285).

Chang, S., Chebotko, A., Lu, S., & Fotouhi, F. (2007). Graph matching based authorization model for efficient secure XML querying. In Proceedings of the international conference on advanced information networking and applications (AINA), workshops proceedings (pp. 473–478).

Cho, S., Amer-Yahia, S., Lakshmanan, L. V. S., & Srivastava, D. (2002). Optimizing the secure evaluation of twig queries. In Proceedings of the international conference on very large data bases (VLDB) (pp. 490–501).

Cuppens, F., Cuppens-Boulahia, N., & Sans, T. (2005). Protection of relationships in XML documents with the XML-BB model. In Proceedings of the international conference on information systems security (ICISS) (pp. 148–163).

Cuppens, F., Cuppens-Boulahia, N., & Sans, T. (2007). XML-BB: A model to handle relationships protection in XML documents. In Proceedings of the international conference on knowledge-based intelligent information and engineering systems (pp. 1107–1114).

Damiani, E., De Capitani di Vimercati, S., Paraboschi, S., & Samarati, P. (2000). Securing XML documents. In Proceedings of the international conference on extending database technology (EDBT) (pp. 121–135).

Damiani, E., di Vimercati, S. D. C., Paraboschi, S., & Samarati, P. (2002). A fine-grained access control system for XML documents. ACM Transactions on Information and System Security, 5(2), 169–202.


Damiani, E., Fansi, M., Gabillon, A., & Marrara, S. (2008). A general approach to securely querying XML. Computer Standards and Interfaces, 30(6), 379–389.

Diao, Y., Altinel, E., Franklin, M. J., Zhang, H., & Fischer, P. (2003). Path sharing and predicate evaluation for high-performance XML filtering. ACM Transactions on Database Systems, 28(4), 467–516.

Duong, M., & Zhang, Y. (2008). An integrated access control for securely querying and updating XML data. In Proceedings of the Australasian database conference (ADC) (pp. 75–83).

Fan, W., Chan, C.-Y., & Garofalakis, M. (2004). Secure XML querying with security views. In Proceedings of the SIGMOD international conference on management of data (pp. 587–598).

Finance, B., Medjdoub, S., & Pucheral, P. (2005). The case for access control on XML relationships. In Proceedings of the international conference on information and knowledge management (CIKM) (pp. 107–114).

Gabillon, A. (2005). A formal access control model for XML databases. In Proceedings of the international workshop on secure data management (SDM) (pp. 86–103).

Gabillon, A., & Bruno, E. (2001). Regulating access to XML documents. In Proceedings of the annual working conference on database and application security (pp. 299–314).

Hastings, S., Ribeiro, M., Langella, S., Oster, S., Çatalyürek, Ü. V., Pan, T., et al. (2005). XML database support for distributed execution of data-intensive scientific workflows. SIGMOD Record, 34(3), 50–55.

Ko, H.-K., Kim, M.-J., & Lee, S. (2007). On the efficiency of secure XML broadcasting. Information Sciences, 177(24), 5505–5521.

Kocatürk, M. M., & Gündem, T. I. (2008). A fine-grained access control system combining MAC and RBACK models for XML. Informatica, Lith. Acad. Sci., 19(4), 517–534.

Kudo, M., & Hada, S. (2000). XML document security based on provisional authorization. In Proceedings of the ACM conference on computer and communications security (pp. 87–96).

Kundu, A., & Bertino, E. (2008). A new model for secure dissemination of XML content. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 38(3), 292–301.

Kuper, G. M., Massacci, F., & Rassadko, N. (2005). Generalized XML security views. In Proceedings of the symposium on access control models and technologies (SACMAT) (pp. 77–84).

Lee, J.-G., & Whang, K.-Y. (2006). Secure query processing against encrypted XML data using query-aware decryption. Information Sciences, 176(13), 1928–1947.

Luo, B., Lee, D., Lee, W.-C., & Liu, P. (2004). QFilter: Fine-grained run-time XML access control via NFA-based query rewriting. In Proceedings of the ACM international conference on information and knowledge management (CIKM) (pp. 543–552).

Miklau, G., & Suciu, D. (2003). Controlling access to published data using cryptography. In Proceedings of the international conference on very large data bases (VLDB) (pp. 898–909).

Mohan, S., Sengupta, A., & Wu, Y. (2005). Access control for XML: A dynamic query rewriting approach. In Proceedings of the international conference on information and knowledge management (CIKM) (pp. 251–252).

Mohan, S., Sengupta, A., & Wu, Y. (2007). A rewrite based approach for enforcing access constraints for XML. In Proceedings of the international conference on knowledge-based intelligent information and engineering systems (pp. 1081–1089).

Moreau, L., Zhao, Y., Foster, I. T., Vöckler, J.-S., & Wilde, M. (2005). XDTM: The XML data type and mapping for specifying datasets. In Proceedings of the European grid conference (EGC) (pp. 495–505).

Murata, M., Tozawa, A., Kudo, M., & Hada, S. (2003). XML access control using static analysis. In Proceedings of the ACM conference on computer and communications security (pp. 73–84).

Qi, N., Kudo, M., Myllymaki, J., & Pirahesh, H. (2005). A function-based access control model for XML databases. In Proceedings of the ACM international conference on information and knowledge management (CIKM) (pp. 115–122).

Sasaki, T., Fukushima, T., Park, D., & Toyama, M. (2008). Fine-grained access control in hybrid relational-XML database. In Proceedings of the international conference on digital information management (ICDIM) (pp. 599–604).

Schmidt, A., Waas, F., Kersten, M. L., Carey, M. J., Manolescu, I., & Busse, R. (2002). XMark: A benchmark for XML data management. In Proceedings of the international conference on very large data bases (VLDB) (pp. 974–985).

Stoica, A., & Farkas, C. (2002). Secure XML views. In Proceedings of the IFIP WG11.3 working conference on database and application security.

W3C (2004). XML schema part 0: Primer (2nd ed.). http://www.w3.org/XML/Schema. Accessed October 2004.

W3C (2006a). Extensible markup language (XML) 1.0 (4th ed.). http://www.w3.org/TR/REC-xml/. Accessed August 2006.

W3C (2006b). XML path language (XPath) 2.0. http://www.w3.org/TR/xpath20/. Accessed November 2006.

W3C (2007). XQuery 1.0: An XML query language. http://www.w3.org/TR/xquery/. Accessed January 2007.

Wang, J., & Osborn, S. L. (2004). A role-based approach toaccess control for XML databases. In Proceedings of theACM symposium on access control models and technologies(SACMAT) (pp. 70–77).

X-Hive (2008). http://www.x-hive.com.

Yu, T., Srivastava, D., Lakshmanan, L. V. S., & Jagadish, H. V. (2002). Compressed accessibility map: Efficient access control for XML. In Proceedings of the international conference on very large data bases (VLDB) (pp. 478–489).

Artem Chebotko received the PhD degree in computer science from Wayne State University in 2008. He is currently an assistant professor in the Department of Computer Science, University of Texas-Pan American. His research interests include semantic web data management, scientific workflow provenance metadata management, scientific workflows and services computing. He has published over 30 papers in refereed journals and conference proceedings and currently serves as a program committee member of several international conferences and workshops. He is a member of the IEEE.

Seunghan Chang received the PhD degree in computer science from Wayne State University in 2008. His research interests include workflow and XML data security. He is currently an active officer of the Republic of Korea Armed Forces.

Shiyong Lu received the PhD degree in computer science from the State University of New York at Stony Brook in 2002, ME from the Institute of Computing Technology of Chinese Academy of Sciences at Beijing in 1996, and BE from the University of Science and Technology of China at Hefei in 1993. He is currently an associate professor in the Department of Computer Science, Wayne State University, and the director of the Scientific Workflow Research Laboratory (SWR Lab). His research interests include scientific workflows and databases. He has published more than 90 papers in refereed international journals and conference proceedings. He is the founder and currently a program co-chair of the IEEE International Workshop on Scientific Workflows (2007–2011), and an editorial board member for the International Journal of Semantic Web and Information Systems and the International Journal of Healthcare Information Systems and Informatics. He is a senior member of the IEEE.

Farshad Fotouhi received the PhD degree in computer science from Michigan State University in 1988. In August 1988, he joined the faculty of Computer Science at Wayne State University, where he is currently a professor and the chair of the department. His research interests include databases, query optimization, and multimedia systems. He has published more than 100 papers in refereed journals and conference proceedings and has served as a program committee member of various database-related conferences. He is on the Editorial Boards of the IEEE Multimedia Magazine and International Journal on Semantic Web and Information Systems. He is a member of the IEEE.