a comprehensive survey of watermarking relational ... · 1 a comprehensive survey of watermarking...

1

A Comprehensive Survey of WatermarkingRelational Databases Research

M. Kamran and Muddassar Farooq

Abstract—Watermarking and fingerprinting of relational databases are quite proficient for ownership protection, tamper proofing, andproving data integrity. In past few years several such techniques have been proposed. A survey of almost all the work done, till date,in these fields has been presented in this paper. The techniques have been classified on the basis of how and where they embed thewatermark. The analysis and comparison of these techniques on different merits has also been provided. In the end, this paper pointsout the direction of future research in these fields.

Index Terms—Database watermarking, ownership protection, database security, tamper detection, data integrity, data usability.

F

1 INTRODUCTIONThe data is an important asset for its owner. Digitaldata sharing is becoming an emerging trend both at thepersonal and organization level. Information and com-munication technology (ICT) systems generate and useenormous amount of data that contains useful knowl-edge which needs to be extracted by using differentdata mining and warehousing techniques. In a col-laborative environment, individuals and organizationsneed to share their data using different resources. Theshared data must adhere to secrecy (for preventingunauthorized disclosure of data), integrity (maliciousdata modifications), and availability (error recovery) [1].The digital data can be copied, altered and may be re-distributed for different purposes, and this fact mayviolate the copyrights of the data owner.

With an ever increase in sharing of data, the digitaltechnology faces the challenge of preventing piracy ofprecious and sensitive data in different formats like text,software, image, video, audio, spatial trajectory data [2],[3], [4], [5], [6] to ensure that the data is used in anauthorized manner only. Only the relevant and necessarydata should be shared across private databases and theshared data should also be right protected. To tackle thissituation, there is a need of an efficient right protectionmechanism for the digital data. In this context, issueslike ownership protection and illegitimate circulationof digital content can be controlled by Digital RightsProtection (DRP).

Cryptographic techniques usually do not associate thecryptographic information with digital contents; there-fore, cryptography cannot provide the ownership infor-

• M. Kamran is with the Department of Computer Science, COMSATSInstitute of Information Technology Wah Campus, Wah Cantt, Pakistan.E-mail: [email protected]

• Muddassar Farooq is with the Muslim Youth University(MYU), Islam-abad, Pakistan.E-mail: [email protected]

mation [7]. Moreover, the cryptosystem (a suite consist-ing of three algorithms: (i) a key generation algorithm;(ii) an encryption algorithm; and (iii) a decryption al-gorithm) does not provide any guaranteed mechanismfor tracing the redistribution and/or alteration of thedigital content. Consequently, they alone are not wellsuited for DRP. Some techniques such as [8] can be usedfor detecting and identifying the changes made in thedata – by organizing the indexing structure of storeddata. Digital watermarking, on the other hand, associatespermanent information with the digital contents. Thisinformation can then be used to prove the ownershipof the digital content. Watermarking of digital data hasbeen quite widely used for right protection of digitalintellectual property (IP) [9]. A watermark is basicallya piece of secret information, which is inserted into aspecific location inside the digital data as a copyrightinformation.

Watermarking has – so far – found its applicationin right protection of images, multimedia, natural lan-guage text, softwares, semi-structured data like XML,map information, relational databases, electronic medicalrecords (EMR), and Geographical Information Systems(GIS) databases. Among these areas, relational databaseswatermarking is a relatively fresh area but is of great im-portance because outsourced databases need to be rightprotected to stop its illegal (or unauthorized) use. Thewatermarking of relational databases is different fromother data formats. As opposed to other data formats,in database watermarking one has to deal with multipletuples simultaneously, the ordering of database tuplesdoes not matter, and a subset of the database is alsouseable. Moreover, a database may contain attributeswith different data types like numeric, text, and discretedata. So watermarking such data requires valid marksfor every data type.

A database watermarking scheme should be robustagainst all kinds of malicious attacks that are launchedfor deteriorating the watermark. The watermarking

arX

iv:1

801.

0827

1v1

[cs

.CR

] 2

5 Ja

n 20

18

2

scheme should also be imperceptible, which requires thatthere should not be any apparent difference betweenthe original and the watermarked content. Furthermore,the watermark should be such that it can be detectedusing some secret parameters like secret key etc. Agood watermarking technique is also blind, that is, itshould not need the original data or watermark todetect the watermark from the altered data. Moreover,the embedded watermark should only be detectable bythe data owner. Furthermore, the watermark should notdeteriorate the original data and data usability should beensured during the process of watermark insertion. Thewatermarking of databases should also take into accountsuch methods which ensure that the query results andassociated statistics should be preserved. If a watermarkhas to be embedded multiple times, then the watermarksshould not affect each other.

A fragile watermark, which is affected even by mi-nor malicious attacks, is also a requirement for tam-per proofing and data integrity. Fragile watermarkingtechniques are suitable for applications that require dataintegrity and therefore fragile watermarks are usually bevulnerable to even benign updates because he attacker–Mallory– does not want to disturb the embedded water-mark while trying to tamper the watermarked dataset.

Fingerprinting of relational databases is also similar towatermarking with a major difference: in fingerprintinga database is marked with a different watermark forevery different user of the fingerprinted database, whilein watermarking usually the same watermark is usedfor all the users of the watermarked database. Onesuch technique is presented in [10]. The fingerprintingtechniques are used in the applications domains wherethe data owner –Alice– wants to identify the guilty agentwho was responsible for the data leakage(the guiltyagent is represented as Mallory in such scenarios.).

To summarize, the three main type of watermarktechniques have different applications as:• The robust watermarking techniques are suitable for

applications that require ownership protection.• The fragile watermarking techniques serve the pur-

pose for data tamperproofing and data integritycheck.

• The fingerprinting techniques provide a mechanismfor identifying the guilty user whose data was com-promised.

Arguably, the major requirement of watermarkingscheme – for ownership protection and fingerprinting– is its robustness against malicious attacks. Therefore,the data owner Alice1, would want to insert the mostpossible robust watermark in the database. Alice hastwo options to embed watermark into her data. Shecan either modify the value of some selected bits ofthe data, or embed watermark into the data statistics.The later approach is more robust to malicious attacks

1. Throughout this document, we refer Alice(female) as the dataowner.

as the former can easily be attacked by simple datamanipulations, such as shifting of some least significantbits (LSBs). Also, the embedding of watermark in largenumber of tuples makes it more resilient. But Alice canonly insert the watermark bounded by the data usabilityconstraints. These usability constraints are applicationdependent, hence every time Alice wants to watermarkher data, she has to define the usability constraints.Once Alice has watermarked her data, she would makeit public or share (or sale) it to some party. The at-tacker, Mallory2, would want to corrupt the embeddedwatermark or even remove the watermark from thewatermarked data. Since the attacker has no access tothe original database; therefore, he might deterioratethe usability of the data during these attacks. But it issupposed that he can locate the watermark if he hasaccess to the secret parameters (such as a secret key). Theattacker can perform every type of manipulation on thewatermarked data because he has unrestricted access tothe watermarked data. He can attack the data using thesemanipulations, so these manipulations define the type ofattack. After attacking the database Mallory yields a newdatabase.

Alice has to make her watermark robust against theseattacks so that they cannot affect the embedded water-mark. These attacks can be broadly classified as:• A1(Insertion attacks): Mallory may insert α new

records in the watermarked database.• A2(Deletion attacks): Mallory may delete α records

from the watermarked database.• A3(Alteration attacks): Mallory may alter α records

from the watermarked database.• A4(Multifaceted attacks): Mallory may delete α

records, alter β records, and insert γ new recordsto form a new database.

• A5(Additive (Re-watermarking) attacks): In such at-tacks, Mallory inserts his own watermark in thealready marked database.

Similarly, for fragile watermarking technique, the ma-jor requirement for the watermark is to be vulnerableto even minor modifications of the dataset while alsofulfilling the additional requirements of characterizationand localization of attacks.

After the pioneer work by Agrawal et al. [11] theproblem of relational databases watermarking has beenaddressed in several techniques, with each techniquehaving its own benefits, drawbacks, issues and lim-itations. Some of these techniques are reversible (forinstance, [12], [13], [59]) while other are irreversible(for instance, [15], [16], [17], [18]) depending upon thenature of their intended applications. Similarly, sometechniques are meant for ownership protection (for in-stance, [53], [19], and [21]) while others are designedfor data integrity and verification (for instance, [22] and[23]). For database watermarking, usually bit strings [24],

2. Throughout this document, we refer Mallory (male) as the mali-cious attacker.

3

images [25], speech signals [26], character strings [27] areused as a watermark.

In [28] Halder et al. presented a survey for classifi-cation and comparison of some of these watermarkingtechniques. We believe that a detailed survey of rela-tional database watermarking techniques is an importantneed in order to facilitate researchers to explore this areaof research by providing them access to state-of-the-artresearch. This paper presents a comprehensive surveyof relational database watermarking and fingerprintingtechniques – developed to date – with the objective ofdeveloping an understanding about: (1) how and wherethe watermark is inserted; (2) their resilience againstmalicious attacks; (3) the shortcomings of the existingwork. Moreover, the paper will also act as a referenceto provide pointers to latest research in this field. As aresult, it will enable researchers to design an effectivewatermarking technique for their application domain.

We classify the existing work done into three broadcategories based on how and where the watermark isinserted. We also present a comparison of major wa-termarking techniques – belonging to the same classi-fication hierarchy in our paper – so that a reader canlearn about their evolutionary history. For this purpose,we short-listed only those papers for comparison whichwere published in reputable computer science Confer-ences and Journals3 and for brevity, we also did notcompare all of them.

For collecting the research articles related to relationaldatabase watermarking, we have used the concept ofcitation graph by making two way transitions: (1) thefirst one is to select a pioneer paper and look at relevantreferences in its list; and (2) use Google Scholar toidentify all articles (published after this article) that citeit. As a result, we have short listed more than 100 papers.The literature search was exhaustive and every effort ismade to include papers published in prestigious Journalsand Conferences.

We have organized the rest of the paper as follows.We present the brief description of the related work inSection 2. A generic framework of relational databasewatermarking technique is given in Section 3 and acomprehensive review and classification of watermark-ing and fingerprinting techniques in Section 4 (for easeof reference figure 1 shows the hierarchical structureof review and classification of watermarking and fin-gerprinting techniques). Then we give a comparison ofdifferent classes of watermarking schemes and point outthe future directions in Section 5 and conclude in Section6.

2 MOTIVATIONS AND RELATED WORKThe major motivation behind this work is to develop abetter understanding of the relational data watermarking

3. The list of computer science Confer-ences and Journals ranking can be found athttp://academic.research.microsoft.com/?SearchDomain=2&entitytype=2(link last accessed on April 04, 2016).

techniques for real world problems. Moreover, we arealso interested in the vulnerability of these techniquesagainst malicious attacks. We also analyze the processof defining data usability constraints and their relevanceto relational data.

In [29], Sion et al. define some attacking scenariosavailable for the attacker and suggest design choices toincrease watermark safety but they did not investigatethe attacker model in detail and also their objective wasnot to present a comparison of watermarking techniques.In [28], Halder et al. presented a classification andcomparison of some relational database watermarkingtechniques. In [30], authors presented the idea of twomodels regarding leakage of secret watermarking con-tents. These models are based on different combinationof keys and databases. In [31], [32], and [33] the surveyof few selected database watermarking techniques hasbeen presented.

But the focus of these efforts are also not towardscomparing different watermarking techniques under dif-ferent attack scenarios. On the other hand, the focus ofour work is to present the first (comprehensive) surveyby critically reviewing almost all relational databasewatermarking techniques – presented so far – by: (1)highlighting issues related to watermark embedding;(2) investigating their robustness; (3) exploring theirapplicability for real world problems; and (4) suggestingthe future directions of research.

3 A GENERIC FRAMEWORK FOR RELATIONALDATABASE WATERMARKING/ FINGERPRINTINGTECHNIQUES

The watermarking or fingerprinting technique needs tomeet certain common requirements, however, they dostill differ in many other aspects. This section entailsa brief description of these common requirements andalso highlights different decision choices among thewatermarking (and fingerprinting) techniques. The ad-vantage of this description is two folds: (1) it providesa generic unified reference framework for describingand comparing different watermarking and fingerprint-ing techniques; (2) this framework can also be used asa guideline for developing meta-techniques for water-marking relational data. It consists of three modulesand a several submodules that can be used to designand implement a watermarking technique. The top levelmodules include: (1) watermark encoding; (2) attackerchannel; and (3) watermark decoding. The frameworkin figure 2 depicts the characteristics of the differentmodules and their inter-module and intra-module col-laborations. The functionality and characteristic of eachmodule and its subsequent components is given in thenext subsection.

3.1 Watermark EncodingThe main purpose of this module is embedding own-ership information in the dataset. This module may

http://academic.research.microsoft.com/?SearchDomain=2

4

Fig. 1. Diagram for depicting hierarchical structure of review and classification of watermarking and fingerprintingtechniques.

include: (1) preprocessing steps like data partitioning,defining secret parameters etc.; (2) strategies to selectdata for watermarking while using an optimization strat-egy to ensure best watermark encoding; (3) definingusability constraints to ensure quality of data duringwatermark embedding; (4) embedding watermark sub-ject to usability constraints; (5) computing a number ofparameters for use in watermark decoding stage, andfinally (6) generating the watermarked dataset for deliv-ering it to the intended recipients. The brief descriptionof each submodule is given in the following.

Preprocessing. The optional preprocessing steps mayinvolve conversion of watermark information (images,owner name etc.) to bit strings. The data is also usuallydivided into logical groups using a particular givencriterion. Data partitioning is performed using secretparameters to ensure that only the data owner knowsthe information about the data partitioning.

Strategies. Strategies for selecting attribute(s), tuple(s),watermark optimization, and watermark embeddingare different components of this sub-module. It is notmandatory that a watermarking technique will utilizeall of the above-mentioned strategies.

Usability. Usability constraints are defined by the dataowner to identify bandwidth for watermarking. Thesole purpose of this sub-module is to ensure that thewatermarked data remains usable.

Watermark embedding. This sub-module embeds the

watermark in accordance with the encoding strategiesand data usability. The watermark embedding has tocater a number of challenges by answering questions likewhether: (1) an inserted watermark brings distortions inthe original data; (2) an inserted watermark is fragile(watermark is corrupted after even minor maliciousattacks), semi-fragile(watermark may tolerate minor ma-licious attacks, but cannot tolerate major attacks), orrobust against attacks (can tolerate malicious attacks to acertain level); (3) the inserted watermark is reversible orirreversible (that is original data may or not be recoveredafter watermark decoding); (4) an inserted watermark isperceptible or imperceptible (but it is desired that rela-tional database watermarking should be imperceptible).

Decoding parametersSome of the parameters of watermark embedding

are usually required for watermark decoding, that arecalculated and saved by this submodule.

Watermarked datasetThe main output of watermark encoding module is

a watermarked dataset that subsequently delivered tointended recipients.

3.2 Attacker Channel

An attacker – Mallory – can launch a number of ma-licious attacks on the watermarked data. The attackerchannel module refers to all such possible attacks. The

5

Fig. 2. Diagram for depicting different components of a watermarking technique.

6

attacks can modify some contents of the data with an aimto disturb the embedded watermark. The description ofthis module has already been given in Section 1.

3.3 Watermark DecodingThis module is used to decode the embedded watermarkfrom the suspicious data. Usually, it involves: (1) somepreprocessing steps for dataset partitioning (if it wasperformed during watermark embedding); (2) decodingstrategies for recovering watermark; (3) decoding stepsfor marked attribute(s), marked tuple(s), and markedbit identification; (4) error correction and recovery steps;and (5) recovery of original data (for reversible water-marking techniques only). The brief description of eachcomponent of this module is as follows.

Preprocessing. If dataset partitioning was performedduring watermark encoding, the same steps are repeatedto generate data partitions.

Decoding Strategies. The watermark decoding strate-gies include: (1) blindness of watermark decoding al-gorithm – a blind decoding algorithm does not requireany part of original data or embedded watermark; butsemi blind decoding usually requires the embeddedwatermark during decoding; (2) the watermark decodingmay be probabilistic or deterministic in nature; and (3)watermark detection may be public or private depend-ing on the security requirements.

Decoding steps. The marked attribute(s) and tuple(s)are identified and watermark bits are decoded by thissubmodule.

Post processing. The main responsibility of this sub-module is to employ error correction mechanism (ifused) on the decoded watermark bits, and convert thesebits into the watermark information that was embeddedas the watermark (ownership information).

Recovered dataset. After watermark decoding theoriginal data may be recovered by reversible watermark-ing techniques only.

4 WATERMARKING/FINGERPRINTINGTECHNIQUES CLASSIFICATION

Figure 3 is used to provide a basis for classification ofwatermarking and fingerprinting techniques. In this fig-ure the blocks containing label ”Watermark Information”and ”Watermark Embedding Method” are used to definedifferent classes and their hierarchies. In this paper,we classify the watermarking/fingerprinting techniquesinto three broad categories based upon these two blocks.They are:• Bit-resetting techniques(BRT). In these techniques,

selected bits (mostly LSBs) are reset by followinga systematic process.

• Data statistics-modifying techniques(DSMT): Datastatistics such as mean, variance, distribution areused to embed watermark in these techniques.

• Constrained data content-modifying tech-niques(CDCMT). These techniques are based

on modifying the contents of the data, for example,techniques based on the ordering of the tuples(suchas in [34]), insertion of extra spaces in attributevalues (such as in [35]) such that watermarkeddata still remains useful. We also place the zero-watermarking techniques (like [36]) under thiscategory.

Now we give a review of techniques that belong to thesetop level techniques.

4.1 Bit-resetting techniques

We further classify such techniques into three categories:• Bit pattern based bit-resetting techniques(BPBBRT):

In such techniques a bit pattern is used as a wa-termark and embedded in the selected bits of thedata.

• Image based bit-resetting techniques(IBBRT): Imageis used as a watermark, for embedding in the se-lected bits of the data, in such techniques.

• Bit-resetting fingerprinting techniques(BRFT): Thesetechniques employ fingerprinting phenomenon formarking selected bits of the relational databases.

4.1.1 Bit pattern based bit-resetting techniques

4.1.1.1 BPBBRT with no specific watermark infor-mation: In [11] Agrawal and Kiernan proposed a pio-neering work of watermarking relational databases. Aparameter γ is used to compute the fraction of tuplesto be watermarked. In watermark embedding phase, atuple is selected for watermarking if the hash value ofits primary key modulo γ is equal to zero. In the nextstep, an attribute and the bit position to be watermarkedis selected. A watermark bit is embedded in a tuple bycomputing a hash value after applying a hash functionon primary key and a secret key. If this hash value iseven, jth LSB of the attribute values is set to 0; otherwise,it is set to 1. The watermark detection algorithm startsby identifying marked tuples, the marked attribute andmarked bit. This bit is matched with the embedded bitand a threshold for matching bit count is computed.If the match count is greater than or equal to thisthreshold, watermark detection is said to be successful.To extend this work, Agrawal et al. in [37], [2] used apseudo-random sequence generator to select tuples forwatermarking with a user specified gap parameter. Theauthors did not use any mechanism for data usability af-ter watermark embedding and used one bit watermark.As a result, such techniques are vulnerable to simpleattacks. For instance, shifting of only one bit results indeletion of watermark bits without a significant loss ofdata usability. As a result, such techniques are not well-suited for datasets which can tolerate changes in the LSBpositions. A number of recent techniques like [38], [39],[40], [41], [42], [43], [44], [45] extend the work of [11] andembed multi-bit watermark in selected LSBs; while, [46]embeds watermark in third most significant bit (MSB) of

7

Fig. 3. The building block for classification of watermarking techniques.

the numeric attribute. Similarly, in [51] the candidate bitfor watermarking is computed.

The watermarking techniques in [47] and [48] pro-vided mechanism for providing ownership protectionfor categorical data. The tuples ”fit” for watermarkingare selected by using a message authentication code(MAC) which also takes a secret key k1 as an inputparameter. In the next step, a secret value for a cate-gorical attribute in ”fit” tuples is generated and its LSBis set to a value based on securely computed position.This position is computed by another secret key k2. Indecoding phase, the ”fit” tuples from the watermarkeddata are again computed using the same procedure as inthe watermark encoding phase. Watermark bits are thendecoded by setting wm data[msb(H(Tj(K), k2), b(N

e ))] =t&1; where wmdata is watermarked table, Tj denotesthe jth tuple, K is the primary key, N is the mappingof categorical data into bit strings, e determines thetuples considered for watermarking, and t represents abit value. An error correction code (e.g., majority voting)is also applied to eliminate the watermark decoding er-rors. Though these are first major contributions towardswatermarking of categorical attributes but these tech-niques are not suited for sensitive categorical data (e.g.medical data) where changing the value of a categoricalattribute may lead to incorrect information regardingan important entity. Moreover, if data is updated, thewatermark may be disturbed specially if a query altersa large number of tuples. For numeric data, Cui et al. in[55] modified the technique of [48].

In [49], a unique id is calculated for each tuple of thetable using a hash function. The next step is to partitionthe data into p partitions with each partition having thesame number of records and then mark them subjectto usability constraints. The usability constraints includestatistical measurement constraints, semantic constraintsand structural constraints. The authors use mean andvariance of the data as statistical usability constraintswhile the semantic and structural constraints are speci-fied by SQL statements. If these constraints are violatedfor any selected tuple, that tuple is excluded during wa-termark detection phase to improve watermark decodingaccuracy. In watermark decoding stage, the tuples’s ids

are again computed using the same hash function anddata is partitioned. The bits are detected from each par-tition probabilistically and the majority voting is used aserror correction mechanism. The technique suffers fromsynchronization errors due to insertion and deletion ofnew tuples because ids of tuples forming the partitionboundaries will no longer remain the same as during thewatermark encoding stage.

In [50], Bertino et al. proposed a hierarchical water-marking for outsourced medical data. They use the sim-ilar approach for watermarking as in [48] after binningthe medical data and embed the watermark in differenthierarchies. The hierarchies are defined with the help ofuser roles. Generalization (trees)nodes are used to depictthese hierarchies.

In a fragile watermarking technique proposed in [52],watermark embedding is done in such a way that itforms a grid – which aids in identifying modificationsmade to the watermarked database relation. Duringwatermark decoding stage, tuples are again grouped andtwo verification vectors are formed for each group todetect the two embedded watermarks. These two vectorsare also used to identify any modifications made to thewatermarked data.

In [56] Cui et al. proposed a watermarking techniqueusing combination of private key and public key. Atrusted third party IPR is informed about the publickey and the data owner is the only one who knowsthe secret private key. In watermark insertion phase,a mixed code is formed using the public key and theprivate key. For watermark embedding, the appropriatebit position is determined using the hash function andthe LSB of the attribute is marked. During watermarkdecoding, the mixed code is computed again and themajority voting scheme is applied to correctly locate themarked bits. Finally, a map transformation is used todecode the embedded watermark.

Wang et al. [57] applied v different watermarkingschemes on a database relation R as a preprocessing stepand then used the best one for watermark embedding. Ascheme is selected on the basis of its impact on the data– the lower impact is preferred in the objective function.

Meng et al. [58] used genetic algorithm (GA) for

8

generation of watermark. The chromosome is a bit stringof zeros and ones. The best chromosome is used asa watermark. The watermark encoding and decodingalgorithm is same as proposed in [55].

A shadowed watermark based technique was pro-posed in [60] by Xian et al. The data is watermarkedwith a secret watermark key and a secret shadowed keywhich is different for every data user. A trusted water-mark server (TWS) watermarks the data and assigns theshadowed key to the data user. When illegal data leakageis detected, the shadowed key of the data user is usedto identify the actual source of data leakage; as a result,the innocent user is not falsely accused of data leakage.Although this techniques tends to protect an innocentuser but it also suffers from the same shortcomings assuffered by the technique presented in [2].

Table 1 shows the summary and a comparison ofwell known BPBBRTs with no specific watermark infor-mation. This table compares different techniques basedon certain parameters depicted in Figure 2. Note thatthe following nomenclature has been used in tables 1through 12.

Column 1: Sch. provides a reference to a watermark-ing technique.

Column 2: MAT refers to marked attribute data type.This column can have values: All (all data types), N(numeric data), C (categorical data), and AN (alpha-numeric data).

Column 3: ASM is acronym for attribute selectionmethod and refers to the procedure of selecting at-tribute(s) for watermarking. This column can have val-ues: Arbitrary (attribute(s) for watermarking is/are ar-bitrarily chosen by data owner), Weighted (based onweights of attribute(s) weight), PRSG based (pseudo-random sequence generator (PRSG) based), All (all at-tributes are selected for watermarking), SHF based (se-cure hash function (SHF) based), Property (based onsome particular properties of attribute), information gain(IG) based, mutual information (MI) based, and Indexbased (database Index based).

Column 4: TSM stands for tuple selection methodand refers to the procedure of selecting tuple(s) forwatermarking. This column can have values: Ar-birary (tuple(s) for watermarking is/are arbitrarily cho-sen), PRSG based (pseudo-random sequence generator(PRSG) based), All (all tuples are selected for watermark-ing), SHF based (secure hash function (SHF) based), andIndex based (Database Index based).

Column 5(for robust techniques): GL is granularitylevel and refers to the basic unit where the watermarkis embedded. This column can have values: Bit (bitlevel), Value (attribute value), Tuple (tuple level), Table(database table level), and Index (database index level).

Column 5(for fragile techniques): Loc. refers to local-ization levels (if any) to which a fragile technique is ableto locate the tempered data. This column can have val-ues: Group (referring to a data block or partition level),Tuple(record level), , and - (referring to the fact that

the technique is not able to localize the data temperingattacks).

Column 6: DMB shows whether a decoding method isblind or not. This column can have values: Blind, Semi-blind, and Not blind (original data is required duringwatermark decoding).

Column 7: ECM is abbreviation of error correctionmechanism and refers to error correction mechanismadapted by the watermark decoding scheme. This col-umn can have values: - (no error detection mechanismis used by the decoding scheme), Voting (the majorityvoting scheme is used for error detection), CRC (thecyclic redundancy check is used as an error detectionmechanism), and EPC (the even parity check is used asan error detection mechanism).

Column 8: Dep. refers to dependencies (if any) of thewatermark decoding scheme for successful watermarkdecoding. This column can have values:

• PK: The successful watermark decoding depends onprimary key,

• Order: The successful watermark decoding dependson the order of tuples in the watermarked database,if the order of tuples is changed then the watermarkdecoding accuracy may decrease,

• Value: The successful watermark decoding dependson the watermarked attribute(s) value, if the valueof the attribute(s) is changed then the watermarkdecoding accuracy may decrease,

• Markers: The successful watermark decoding de-pends on special marker tuples which are used toidentify the boundaries of data partitions in the wa-termarked database, if the position of these markertuples is changed then the watermark decodingaccuracy may decrease,

• Unique Column: The successful watermark decod-ing depends on unique identifying column(s), ifthe value of the unique identifying column(s) ischanged then the watermark decoding accuracymay decrease.

• Information gain (IG): The successful watermarkdecoding depends on information gain of the at-tributes, if the value of the information gain ofattributes is changed then the watermark decodingaccuracy may decrease.

Column 9: Dist. is used for to show distortions intro-duced by watermark embedding scheme. This columncan have values: Yes (the watermark embedding resultsin distortions in original data), and No (the watermarkembedding does not introduce distortions in originaldata).

Column 10: Opti. refers to any optimization strategyused in the proposed technique. This column can havevalues: Yes, and No.

Column 11 (for robust techniques): Rev. refers to thefact that whether the proposed technique is reversible ornot. This column can have values: Yes(reversible), andNo(irreversible).

9

Column 11 (for fragile techniques): Char. refers tothe fact that whether the proposed fragile technique isable to identify (characterize) the type of attack (tuple in-sertion, deletion, and alteration attack). This column canhave values: Yes(the technique is able to characterize theattack), and No(the technique is not able to characterizethe attack).

4.1.1.2 BPBBRT with specific watermark informa-tion: In [61], Zhang et al. used the database contentcharacteristics for watermarking relational databases. Inthis technique, a tuple is selected for watermarkingbased on a random number between 0 and 1. A functionInterString(Abi,K, S) is used to extract a substringfrom Abi (binary equivalent of the selected numericattribute for watermarking), where K is a secret key, Sis the length of substring. The mark bit is embedded atthe end of the binary value of attribute A2. In water-mark detection phase, InterString is again computedand the mark bit is matched with the last bit of A2;if the mark bit is correctly detected, a match count isupdated. If this match count is greater than a certainthreshold, the watermark is successfully detected. Thedecoding accuracy of this scheme suffers if an attackerby launching alteration attacks alters the attribute valueand as a consequence its binary representation.

Qin et al. [62] used the idea of chaotic random num-bers for embedding watermarks in relational databaseto overcome the shortcomings of [2]. Authors generatethese numbers by using logistic equation and processonly the integral part of these numbers. The logisticchaos equation (LCE) used is Xn+1 = µxn(1−xn); where,µε[1, 4] and n is the number of tuples in the database.

In [63], Guo et al. converted a meaningful watermarkto a bit string. The authors then locate the candidateattributes and candidate bits. The database is partitionedinto variable but similar sized groups. The ith bit of thewatermark is embedded into tuples of ith group.

The authors have also proposed a recovery algorithm,if watermark is located in the data. For this purpose,the detected bits for the subsets subset 0 (zero bits) andsubset 1 (one bits) are counted and the bit having largertotal count is supposed to be actual embedded bit.

In [64], a technique addressing additive attacks on thewatermarked relational databases has been proposed. Inthe watermark insertion phase of this technique, a realvalued attribute is converted into its binary equivalentand a bit is extracted from the integral part of a realvalued attribute (A). This bit is then embedded as thewatermark into fractional part of the same attribute if(len(int(A))) > (pos1+ε) and len(frac(A)) < cap, where,pos1 is the LSB of the integer part at position 1, ε isa parameter used to control the amount of distortionintroduced by bit embedding and cap is the numberof bits allocated for fraction part of the number. In thewatermark decoding stage, the value of the fractionalpart is stored by locating the watermark bit in thefractional part and replacing it by the oldBit whichwas marked in the watermark embedding phase. The

secondary watermark attacks are handled by computingthe watermarks of each party (claiming the ownershipof the data) and a party pi is considered as the legitimateowner of a data xi if and only if their watermark isdetected in xi. This technique is able to provide a goodsolution to an important problem of additive attacks butis not very resilient to other attacks.

Other such techniques are presented in [65], [66], [26],and [67] but for brevity, we omit their details becausethey apply small modifications to the techniques dis-cussed above.

Table 2 shows the summary and comparison of wellknown BPBBRTs with specific watermark informationbased on different parameters depicted in Figure 2.

4.1.2 Image based bit-resetting techniquesYong et al. in [68] presented a image based watermarkingtechnique to verify ownership on relational databases.First a subset of tuples is selected for watermarkingusing a secure hash function and then ordered RGBvalues of the image are embedded as a watermark inthe database relation. For watermark detection, the samehash function is applied to identify the marked tuplesand embedded pixel values are decoded. A similar ap-proach was recently proposed in [69].

Zhou et al. [70] used BMP image as a watermark inthe database. The BMP image is first split into two parts:the header part and the main information of the image,i.e., data array. Bose-Chaudhuri-Hocquenhem (BCH) [71]code is then generated as a watermark from the dataarray of the image. Authors show the resilience of pro-posed technique again additive-attacks. They suggest theidea of a trusted third party (TTP) to investigate aboutthe actual owner of the database.

In [72], Wang et al. proposed a watermarking tech-nique for relational data by applying Arnold transform[73] on an image with data partitioning based watermarkembedding and decoding algorithms. This technique isvulnerable to tuple insertion, alteration and deletion at-tacks. Furthermore, a small bit flipping attack effects thewatermark decoding accuracy as the modulo operationyields different results.

Hu et al. also presented an image based watermarkingalgorithm for relational databases in [74]. Almost similarwork is presented by Sun et al. in [75]. They transformthe image into a bit flow which is used as a watermark.In [76], authors use image elements to embed an image-based watermark in the database relation. Like [72], suchtechniques technique also face the same challenges ofrobustness.

The technique in [42] also use image watermark fortextual attributes and numerical attributes. In [77], au-thors use SVR along with FP-tree for reversible databasewatermarking. Another such technique was proposed in[19] where authors use image (other files such as audiocan also be used for watermarking) to create watermarkbits and then embed the same using a partitioned basedapproach.

10

TABLE 1Summary and comparison of well known BPBBRTs with no specific watermark information.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.[11] N Arbitrary SHF Bit Blind - PK, Yes No No

based Value[2] N Arbitrary PRSG Bit Blind - PK, Yes No No

based Value[48] C Arbitrary SHF Bit Blind Voting PK, Yes No No

based Value[38] N Arbitrary SHF Bit Blind PRSG PK, Yes No No

based based Value[57] N Arbitrary SHF Bit Blind Voting PK, Yes No No

based ValueFragile Techniques

Sch. MAT ASM TSM Loc. DMB ECM Dep. Dist. Opti. Char.[52] N SHF SHF Tuple Blind - PK, Yes No Yes

based based Value

TABLE 2Summary and comparison of well known BPBBRTs with specific watermark information.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.[62] N All Arbitrary Bit Blind - PK, Value Yes No No[63] N Arbitrary SHF Bit Blind Voting PK, Value Yes No No

based[64] N Arbitrary SHF Bit Not Blind - PK, Value Yes No No

Table 3 shows the summary of well known IBBRTsbased on certain parameters shown in Figure 2.

4.1.3 Bit-resetting fingerprinting techniquesFingerprinting techniques in [78] embed a meaning-ful fingerprint of the data to protect the interests ofbuyer. The authors used the MAC on a secret key anda user(buyer) identifier to compute the fingerprint oflength L with L > logN and N being the numberof buyers. Database relations generally have a primarykey attribute which may be used (along with otherrequired parameters for hash function) for calculatingthe fingerprints. The authors also propose three differentapproaches to fingerprint relations that do not have anyprimary key. In their first approach(S-Scheme), the au-thors have considered the candidates(for watermarking)bits of a single numeric attribute as the virtual primarykey. This approach suffers from two main problems: (1)The candidate bits may not be unique for each tuple(duplicate problem); and (2) if the virtual primary keyis dropped by the attacker, the fingerprint informationis lost (deletion problem). In their second approach(E-Scheme), Li et al. investigate the value of each numericattribute in a tuple. Next, a virtual primary key isconstructed from each attribute. This approach solves thedeletion problem but the duplicate problem still persistsin this approach. In the third approach(M-Scheme), bitpositions are dynamically selected to construct a virtualprimary key. In this approach, different attributes are se-lected for different tuples and hence duplication problemis resolved. Moreover, the fingerprint is not embedded inonly one attribute; therefore, dropping selected attributesdoes not result in the fingerprint deletion problem.

In [79], Li et al. have selected tuples for fingerprintingusing a hash function. The mark embedding and detec-tion scheme is same as proposed in [70].

Liu et al. [80] used blocks of bits that were availablefor marking to embed for fingerprint.

In [81] and [82], the idea of query optimization forfingerprinting relational databases subject to meetingusability constraints was proposed. The authors usedeclarative language to define usability constraints forwatermarking and fingerprinting relational databases.They optimize the watermark embedding by search-ing for specific patterns such as aggregates and joincomputations. The idea of using query optimizationfor fingerprinting is interesting but the dependency offingerprint decoding on the usability constraints maylead to incorrect fingerprint detection after an attack.

Fingerprinting techniques proposed by Guo et al. in[83] embed fingerprints in the database at two levels. Atthe first level, the data is partitioned into m partitionsand m length fingerprint is embedded in the data byembedding ith fingerprint bit in the selected tuples of ith

partition. At the second level, the fingerprint is extractedfrom the fingerprinted tuples and a confidence level isassigned to the extracted fingerprint. This fingerprint isused as a secret key for the second level. The tuplesalready fingerprinted are not selected for fingerprintembedding at this level to avoid conflict between thefingerprint embedding at both levels.

In [84], proposed a fingerprinting technique to identifya malicious traitor, who is redistributing digital con-tent. When a user access the database a fingerprintingprocess is invoked. A manager module performs thefingerprinting task and a parameter manager module isused to store the fingerprinting parameters. Authenticityof users is verified by the manager module; if the useris unauthentic, the fingerprinting process is executed.The encoding and decoding algorithm is the same asproposed in [63].

Table 4 shows the comparison and summary of wellknown BRFTs based on different parameters shown inFigure 2.

11

TABLE 3Summary of well known IBBRTs.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.[70] N Arbitrary SHF Bit Semi Blind Voting PK, Value Yes No No

based[19] N Arbitrary SHF Bit Blind Voting Value Yes Yes No

based

TABLE 4Comparison and summary of well known BRFTs.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.[78] N PRSG PRSG Bit Blind Voting PK, Value Yes No No

based based[79] N PRSG PRSG Bit Blind Voting PK, Value Yes No No

based based[80] N All All Bit Yes - PK, Value Yes No No[83] N Arbitrary SHF Bit Blind - PK, Value Yes No No

based[84] N Arbitrary SHF Bit Blind - PK, Value Yes No No

based[82] N Arbitrary PRSG Bit Blind Voting PK, Value Yes No No

based

4.2 Data statistics-modifying techniques

We further classify such techniques into two categories:

• Bit pattern based data statistics-modifying tech-niques(BPBDSMT): In such techniques, a bit patternis used as a watermark and these bits are embeddedinto data statistics.

• Image based data statistics-modifyingtechniques(IBDSMT): Image is used for watermarkembedding in data statistics in such techniques.

4.2.1 Bit pattern based data statistics-modifying tech-niques

4.2.1.1 BPBDSMT with no specific watermark in-formation: The watermarking technique in [24] usesspecial marker tuples to virtually partition the data intomaximal number of subsets. The number of partitionsshould be equal to number of subsets. One watermarkbit is embedding into each subset subject to usabil-ity constraints. If the number of subsets are greaterthan the number of bits in the watermark, an errorcorrection mechanism (e.g., majority voting) is usedto make the watermarking technique resilient to themalicious attacks. The marker tuples are used in thewatermark decoding stage for data partitioning andsuccessful decoding of the watermark. Two arbitrarilychosen thresholds are used to decode the watermarkbits. This technique has two important shortcomings: (1)use of marker tuples for data partitioning makes thetechnique vulnerable to synchronization errors duringwatermark decoding because the tuples deletion andinsertion attacks on the position of these tuples willchange the boundaries of the partitions and hence theembedded bit in those partitions would not be correctlydetected; and (2) storing marker tuples for their use indecoding stage makes this technique not-blind. More-over, the thresholds, used for decoding an embedded bit,are chosen arbitrarily that also result in decoding errors.Another such technique as presented by Sebe et al. in

[85] for numerical datasets while preserving the meanand variance of the watermarked attribute.

In [86], the database is first partitioned into m non-overlapping logical groups. In this technique, bit encod-ing was modeled as an optimization problem and thenoptimization algorithms were used to optimize one-bitwatermark embedding subject to usability constraintsspecified by the data owner. Authors used a thresh-old based approach and a majority voting scheme todecode the watermark. This scheme is robust againsttuple deletion, alteration and insertion attacks but it doesnot specify how to select attribute for watermarking.Deshpande et al. also proposed a similar watermarkingtechnique in [87], that is a combination of [24] and [86].

Huang et al. [88] presented a cluster based watermark-ing algorithm for relational databases. During the em-bedding phase, the data is first clustered into k clustersusing k-means clustering algorithm [89]. Such techniquesare suspectable to alteration attacks if an attribute valueis modified by the attacker to an extent where the clusterof a attribute is changed.

Kamran and Farooq coined the idea of information-preserving watermarking in [90]. This technique is fo-cused on watermarking of relational databases contain-ing patient records ( such databases are also termedas electronic health records (EHR) or electronic medi-cal records (EMR)). Authors first identify the numericattributes for watermarking using information gain asa measure. Then they select attributes with low in-formation gain for watermarking such that change intheir values does not have any impact on the diagnosisprediction by the data mining algorithms. Authors alsospecify equations for calculating the bounds for changesin the numeric features. The change is then optimizedand embedded in the selected attributes in every rowof the EMR subject to usability constraints. Watermarkdecoding also utilizes the knowledge of information gainof the attributes. A watermark decoder is used to decodewatermark bits from each row of the marked attributes.

12

A majority voting scheme is applied to combat decodingerrors. Another interesting feature of this technique isthat it does not require any secret key for watermarkembedding or decoding yet this scheme is resilient totuple insertion, deletion and alteration attacks becausewatermark is embedded in each row of the table.

For minimizing (or controlling) data distortions, Kam-ran et al. in [18] limit the to-be-watermarked tuples bysuing a threshold and even hash values; while for highrobustness is achieved through once-for-all usability con-straints. Similarly, In [14], Saman et al. extended thework of [90] to come up with a reversible techniquewhile considering mutual information MI instead ofinformation gain IG for attribute selection. Other suchtechniques are presented in [21] and [17] where au-thors consider the histogram modulation (in [21]) andsemantic properties (in [17]) of numeric attributes duringwatermark embedding to control the data distortions.

Table 5 shows the comparison and summary of wellknown BPBDSMTs with no specific watermark informa-tion based on different parameters shown in Figure 2.

4.2.1.2 BPBDSMT with specific watermark infor-mation: In [91], Zhang et al. proposed a watermark-ing scheme using cloud models. They form the cloudmodel using different parameter and later watermark thedatabase using these parameters. In the detection phase,a backward cloud generation algorithm is used to extractthe embedded watermark. Finally, a cloud algorithm isused for matching clouds of watermark embedding anddecoding. The watermark decoding is not blind becausethe original database is required during the decodingphase.

In [92], Zhang et al. proposed a histogram expan-sion based reversible watermarking scheme for relationaldatabases. For watermark embedding, authors use thehistogram expansion techniques employing overheadinformation. The overhead information is used to distin-guish the original digits and the watermarked digits. Au-thors use the Haar wavelet transform for watermarkeddatabase attribute identification. For embedded decod-ing, the inverse of watermarking process is applied. Suchtechniques need to keep a track of overhead information.

Fu et al. [93] used spread spectrum for selectingdatabase tuples for watermarking. A function W (t, k) isalso used to yield the watermark signal W = +1,−1by using direct sequence spread spectrum and thenthis signal is embedded in the data as watermark. Inthe watermark decoding phase, tuples are grouped, at-tributes are sorted and watermark is detected using evenparity check. Finally, majority voting scheme is appliedto eventually decode the watermark bits. This techniqueis vulnerable to alteration attacks that may alter thevalues of marked attributes.

In [94] a discrete wavelet transform (DWT) [95] basedwatermarking technique for relational databases wasproposed. A data X is watermarked as CW = C +αφ(β)Ω, where C is the original carrier, CW is thecarrier after watermark embedding, Ω is the watermark

template sequence, α is the intensive factor and L is thelength of watermark string. Ω is generated, by a secretkey, as a pseudo random sequence. This scheme suffersform lack of robustness as embedding of watermarkin only high frequency coefficients make it easy forthe attacker to target one particular group of tuples tocorrupt the watermark.

Table 6 shows the comparison and summary of wellknown BPBDSMTs with specific watermark informationbased on different parameters shown in Figure 2.

4.2.2 Image based data statistics-modifying techniquesIn [96], Zhang et al. first divided into logical groups.Each group has its size equal to the size of an image. Thepixel values (v0, v1, v2, ....., vm) are embedded into thelogical groups of the data. A pixel value vi is embeddedin such a way that if vi = 255, the attribute value modulo3 is set to 1 and if vi = 0, the attribute value modulo3 is set to 2; otherwise, the pixel value is added tothe attribute value. In the watermark detection phase,the attribute value modulo 3 is computed. If it returns1 then vi = 255 is detected and if vi = 0 then 2 isdetected; otherwise, a pixel value vi is calculated bysubtracting the integer part of the attribute value Ai froman overall attribute value Ai. The result is multipliedby 255 yielding the value of vi, and hence watermark isdecoded. If the attribute value is changed by an attacker,the watermark decoding accuracy is degraded.

In [97] Tsai et al. also use image for fragile watermark-ing of database using support vector regression (SVR).The purpose of such techniques is to detect maliciousdatabase tampering.

In [98] Odeh and Al-Haj proposed a watermarkingscheme for relational databases using an image. In wa-termark embedding phase, an image is converted tobit strings, bits are grouped with each group havingsize of 5 bits. The bits are then converted to a decimalnumber. A time attribute is selected for watermark em-bedding. For this purpose, seconds portion of the timeattribute is marked. In watermarking detection phase,the marked tuples are identified by using secret key,and the appended decimal number is extracted fromseconds part of time attribute. The number is convertedto bits, and bits are grouped in a group of 5 bits. Image isagain constructed using these bits. For experiments, theauthors have created their own database with multipletime attributes but in practice most databases usually donot have a great number of such attributes.

In [99], an image is first converted into a stream ofwatermark bits consisting of 0’s, and 1’s. The watermarkis embedded in the fractional part of the numeric at-tribute. A hash function is applied to select the tuplesfor watermarking. For a user ID, a database DBname,with version version, the secret key K is computed usingthe relation: K = H(ID||DBname||version|...). Authorsassume two intensities for each pixel, the original inten-sity and the predicted intensity and use their differenceto encode watermark.

13

TABLE 5Comparison and summary of well known BPBDSMTs with no specific watermark information.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.[24] N Arbitrary SHF Value Semi Blind Voting PK, Yes No No

based Markers[85] N Arbitrary Arbitrary Value Semi Blind - Value Yes No No[86] N Arbitrary All Value Blind Voting PK Yes Yes No[90] N IG All Value Semi Blind Voting IG Yes Yes No[18] N Arbitrary SHF Value Blind Voting Value Yes No No[21] N Arbitrary SHF Value Blind Voting Value Yes No Yes[14] N MI All Value Semi Blind Voting MI Yes Yes Yes[17] N Arbitrary SHF Value Blind - Value Yes No No

TABLE 6Comparison and summary of well known BPBDSMTs with specific watermark information.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.[91] N Arbitrary Arbitrary Value Not - PK Yes No No

Blind[92] N Arbitrary Arbitrary Value Blind - PK, Value Yes No Yes[93] N Arbitrary SHF Value Blind EPC, Voting PK Yes No No

based[67] N SHF SHF Value Blind Voting PK, Value Yes No No

based based

Table 7 shows the comparison and summary of wellknown IBDSMTs based on different parameters shownin Figure 2.

4.3 Constrained Data Content-ModifyingTechniquesWe further classify such techniques into two categories:• Tuple based constrained data content-modifying

techniques(TBCDCMT): Such techniques modifydatabase contents modification at the tuple level.

• Attribute based constrained data content-modifyingtechniques(ABCDCMT): Such techniques involvemodification of database content at an attributelevel.

• An image based constrained data content-modifyingtechniques(IBCDCMT): Image is used, for water-mark embedding in such techniques.

4.3.1 Tuple based constrained data content-modifyingtechniques

4.3.1.1 TBCDCMT with no specific watermark in-formation: Gross-Amblard [100] proposed a query-preserving watermarking of XML documents and rela-tional databases. For database watermarking, the authorused Gaifman graph4 while preserving the database localqueries. These queries are expressed in plain SQL fordatabase operations like adding grouping and aggregatefunctions. The database tuples are marked in such a waythat the query result remains unaltered. Such techniques,however, are vulnerable to tuple insertion, deletion andalteration attacks because the values of the marked tuplemay be disturbed by these attacks.

Li et al. [102] proposed a watermarking scheme fordetecting alterations in the database relation containingcategorical data based on the tuples order. For increasing

4. “A Gaifman graph is the graph in which nodes are the variablesof the problem and an edge joins a pair of variables if the two variablesoccur together in a constraint” [101].

the watermark embedding capacity in such algorithms,authors suggest to use Myrvold and Ruskeys linearpermutation unranking algorithm [103] for exchangingthe position of tuples for distortion free watermarking[104]. Such schemes are aimed at watermark fragility;therefore, they can not be used for ownership protection.Another distortion-free Fragile Watermarking techniquein [34] works for categorical data. A similar techniqueis proposed in [105] and it uses tuple pairs for frag-ile distortion-free watermarking. Other such techniqueshave been presented in [106] and [107].

Kamel [108] proposed the idea of using R-tree for wa-termarking relational databases to protect the integrity ofdatabases. In this technique R-tree is used to represententries in the relation as minimum bounding rectangle(MBR). In the watermark encoding phase, a one-to-one mapping function between all permutations of theentries and the numeric value of watermark is per-formed. A factorial number system is used to convert thedecimal value of watermark to factorial form. Watermarkembedding starts by sorting the entries in an R-tree nodeon the basis of a reference order Ek. The sorting ordermay be on the basis of: (1) x value at the MBR; (2) thearea of MBR; and (3) the perimeter of MBR. A circular-left-shift operation is applied on the selected entries ofR-tree for mark encoding. The integrity of databasesis checked by verifying the order of MBR coordinates(reference orders). If an attack aims at modifying thisorder, the watermark decoding accuracy is affected. In[109], Kamel et al. used the idea of a sensitive attributefor tuple ordering.

Hamadou et al. proposed a zero-watermarking tech-nique in [36] to counter additive watermarking attacks.The technique is named zero-watermarking because wa-termark is not actually inserted in the database insteadits information is registered with the certification author-ity (CA). Another technique to counter additive attackis presented in [110].

Table 8 shows the comparison and summary of well

14

TABLE 7Comparison and summary of well known IBDSMTs.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.[96] N Arbitrary All Value Blind - PK Yes No No[99] N SHF SHF Value Blind Voting PK Yes No No

based based

known TBCDCMTs with no specific watermark informa-tion based on different parameters shown in Figure 2.

4.3.1.2 TBCDCMT with specific watermark infor-mation: A publicly verifiable watermarking scheme wasproposed by Li et. al in [111]. This technique works forwatermarking attributes with any data type includingnumeric, strings or boolean. The watermark key is a publickey that is generated by applying one-way hash func-tions on personal information (e.g., owner’s identity) ofthe author and the database information (e.g., databasename and version). The watermark W is generated usingthis secret key. The watermark is also a database relationand has the same number of attributes and tuples as inthe original database relation R. The number of water-mark bits are decided by using a watermark generationparameter. A pseudo random sequence generator is usedfor watermark embedding. This scheme can detect anymodifications made to MSBs but changes made in otherbits cannot be detected using this technique.

Bhattacharya et al. [112] proposed a modified versionof watermarking technique presented in [111] by using aprivate secret key instead of a public key. The techniqueperforms the data partitioning and extracts a binarywatermark as an image for proving the data ownership.

Other such techniques are presented in [113], and[114] but they are slight modifications of the techniquesdiscussed above.

Table 9 shows the summary and comparison of wellknown TBCDCMTs with specific watermark informationbased on different parameters shown in Figure 2.

4.3.2 Attribute based constrained data content-modifying techniques

4.3.2.1 ABCDCMT with no specific watermark in-formation: In [115] Halder et al. introduced the notion ofpersistent watermarking of relational databases. For thispurpose, this technique identifies two type of featuresfrom the database: (1) Stable cells are the database cellsthat are not affected by any set of queries Q (Select,Insert, Update, and Delete) at all; and (2) Semantic prop-erties. The stable part and semantic properties are wa-termarked separately. In watermark embedding phase,database is first partitioned into m non-overlappingpartitions. The watermark is converted into a bit stringof length m. The stable part and semantic propertiesare watermarked such that the usability constraints aresatisfied. In the watermark decoding phase, data par-titioning is performed and embedded bits are detectedfrom semantic properties and the stable cell. Only thetuples having same primary keys in watermarked andsuspected databases are used for watermark detection.

In [116] Halder et al. extend their work by embeddingboth the private and the public watermark. The purposeof private watermarking is proving the ownership of thedata and the public key is used to check the authenticityof data without worrying for revealing secret parame-ters.

In [117] a query preserving watermarking was beenproposed. This technique selects alphanumeric attributesfor watermarking and alters the case of the attributevalue such that the results of a query are preserved.A secret key is used to select tuple and alphanumericattribute for watermark embedding. The letters of se-lected attribute values are changed to a sentence case if awatermark bit is 0; otherwise, it the value is converted totitle case. The watermark detection is the inverse processof watermark embedding. This technique is not resilientagainst tuple alteration attacks because the attacker canchange the case of the attribute values without degrad-ing the data quality.

In [20], a distortion-free watermarking technique hasbeen proposed that works on all type of attributes andalso considers the importance of attributes for differ-ent application domains. The watermark encoding anddecoding is based on secret ordering for non-numericattributes while for numeric-feature the technique of [90]is extended by using a secret threshold for the informa-tion gain IG of the attributes. Other such techniques arepresented in [118], [119], and [114] but for brevity (andto adhere to the page limit of the Journal), they are notdiscussed here.

Table 10 shows the comparison and summary of wellknown ABCDCMTs with no specific watermark infor-mation based on different parameters shown in Figure2.

4.3.2.2 ABCDCMT with specific watermark infor-mation: The techniques like [120], [121] are based onaddition of a new column in the database relation. Thevalues in the new column are calculated by aggregatingthe numeric columns present in a database relation. Theauthors suggest to lock this column using a secret keyto ensure watermark security. But they do not clearlyhighlight the locking mechanism. The major drawbackof such techniques is that the watermark information islost by dropping only one attribute from the databaserelation without loss of data usability.

Bedi et al. [122] proposed a watermarking approachfor non-numeric data. They first generate a secret keyfrom eigen values of tuple-Relation matrix for a tuple.A low impact attribute (e.g., address) is selected forwatermark embedding. In another technique, Bedi et al.[123] use abbreviations of the words for watermarking.

15

TABLE 8Comparison and summary of well known TBCDCMTs with no specific watermark information.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.

Fragile TechniquesSch. MAT ASM TSM Loc. DMB ECM Dep. Dist. Opti. Char.[102] C Arbitrary SHF Tuple Blind - PK, Order No No No

based[108] All Index Index - Blind - Reference No No No

based based order

TABLE 9Summary and comparison of well known TBCDCMTs with specific watermark information.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.[111] All All All Table Blind - PK, Value No No No[112] All All All Table Blind - PK, Value No No No

Similarly, in [124], vowels are used to embed the wa-termark. Such techniques are vulnerable to alterationattacks and even benign updates (changing of valuesof marked attributes) might result in watermark losswithout significantly degrading the data quality.

Another such technique for medical data is proposedin [125]. This technique is based on shifting of attributesbased on the histogram constructed from the tuple val-ues. This technique works with categorical attributes andis aimed to ensure data integrity.

The technique in [23] uses numeric attributes to gener-ate and verify a distortion-free fragile watermark for de-tecting and characterizing the malicious modifications inthe database relations. For numerical attibutes, authorsuse digit frequency, length and range of data values forwatermark generation and verification. This techniqueis also able to characterize the malicious attacks (ormodifications).

Table 11 summarizes the well known ABCDCMTs.

4.3.3 Image based constrained data content-modifyingtechniquesIn [126], a table T with attributes (P,A1, A2, ....., Aj) iswatermarked to detect tampering with the database. Awatermark WM of length N is computed from an image.An image SD is also generated from the certificationverification code to use it for watermark detection. Forthe certification code verification, watermark detectionprocess starts by computing certification codes by ap-plying public key on the image SD. All features arecombined to yield a vector C ′. Finally, an XOR operationis applied on C ′ and certification code to compute thewatermark WM ′. If it matches with embedded water-mark then database integrity is proved otherwise it istampered.

In [35], Al-Haj et al. proposed a watermarking schemefor non-numeric databases based on insertion of adouble-space in multi-word attribute. This technique isnot resilient to alteration attack because an attacker caneasily remove double spaces from the database withoutaffecting the data usability. Moreover, benign updatesalso corrupt the embedded watermark for such tech-niques.

In [127], database is first partitioned and a pseudorandom sequence generator is used to select an attributefor watermarking. This technique does not reset any bitof the original data instead it uses m most significant bits(MSBs) and n LSBs of the selected field to generate thewatermark for that field such that m + n = 8. The wa-termark for a particular partition is converted to a grayscale image, so the value of each cell ranges between 0and 255. The watermark does not bring any distortionin the database. In the watermark detection process,the database is again partitioned and the watermark isgenerated using the same procedure as in watermarkembedding. The watermark bits are matched. If thematch count is equal to ω (with ω = rows × columns),the watermark is successfully decoded. A zero-distortionauthentication watermarking [128] is used for authenti-cating the table without any distortions. This techniqueis only useful for proving integrity of data.

Table 12 shows the comparison and summary of wellknown IBCDCMTs based on different parameters shownin Figure 2.

5 APPLICATIONS, COMPARISON AND FUTUREDIRECTION FOR DIFFERENT TECHNIQUES

In this section, we compare the three classes – BRT,DSMT, and CDCMT – of techniques using two con-texts: (1) robustness against malicious attacks; and (2)data usability (again there parameters were short-listedfrom different parameters listed in Figure 2). In thefirst context, we report the resilience of the watermark-ing/fingerprinting techniques against malicious attacksthat have been discussed in section 1. In the data usabil-ity context, we report methods of ensuring data quality(used in the watermarking/fingerprinting techniques)and depict scenarios where usability is a serious concern.The robustness of each type of techniques is depicted inTable 13. In this table R1, R2, R3, R4, and R5 denote therobustness of watermarking techniques against A1, A2,A3, A4, and A5 attacks respectively.

As depicted in table 13, the DSMT techniques arerelatively better against most type of attacks. ut attacksof type A4 and A5 still expose vulnerabilities of these

16

TABLE 10Comparison and summary of well known ABCDCMTs with no specific watermark information.

Robust TechniquesSch. MAT ASM TSM GL DMB ECM Dep. Dist. Opti. Rev.[20] All IG All Value Blind Voting IG No Yes No

Fragile TechniquesSch. MAT ASM TSM Loc. DMB ECM Dep. Dist. Opti. Char.[116] N Property PRSG - Blind Voting PK No No No

based[117] AN SHF SHF - Blind - PK No No No

based based

TABLE 11Comparison and summary of well known ABCDCMTs.

Fragile TechniquesSch. MAT ASM TSM Loc. DMB ECM Dep. Dist. Opti. Char.[23] N Property All - Blind - Value No No Yes

techniques. In A4 type attacks of the attacker launchesa combination of the A1, A2, A3 attacks. Kamran andFarooq address this attack in [90] but most of othertechniques do not consider these type of attacks intheir robustness study. However, most of the techniques donot consider these type of attacks in their robustness study;therefore, next generation watermarking techniques shouldincorporate mechanisms that counter A4 attacks.

In A5 type attack, Mallory claims false ownership ofthe data by inserting his watermark into Alice’s wa-termarked data. The techniques presented in [70], [39],[110], [36], [129], [64] provide mechanisms to countersuch attacks, but these techniques require the role ofthird party with assumptions like: (1) an attacker will notchange the name of the attributes in a database relation;(2) an attacker would not want to disturb the Alice’swatermark to combat data errors; and (3) the additivewatermark will significantly degrade the data quality; asa consequence, Alice does not need to claim the owner-ship of this type of compromised data. It is important toemphasize that Mallory can surpass these assumptionswhile mantaining, for example, simply changing thenames of attributes does not affect the data quality.Moreover, Mallory may also want to corrupt the Alice’swatermark during A5 type attack. Additionally, data hasintrinsic value; therefore, it is not a great idea for Aliceto withheld ownership claim of the data. So, A5 typeof attacks also pose a new challenge to researchers of thisfield. Such challenges are also valid for fingerprintingtechniques because they also require the watermark tobe robust.

The fragile watermarking techniques provide databaseintegrity and tamper proofing in various applicationdomains because they are sensitive to low intensitymalicious attacks. But in some situations it can favor theattacking scenarios in which Mallory can safely claim theownership of databases watermarked by such techniquesby attacking a small part of the marked database. So, wethink this also pose a unique challenge to the researchersof this field to come with fragile watermarking tech-niques that can also provide the mechanism for owner-ship protection for more generic applications that mightrequire both tamperproofing and ownership protection.

In order to emphasize the problem of defining usabil-ity constraints, Table 14 shows the usability constraintsdefining method and applicability of BRT, DSMT, andCDCMT type of techniques. All techniques require anowner to manually define the usability constraints andthis process is application dependent; therefore an ownermight need to define different usability constraints foreach type of intended use of the data. This method iscumbersome because the data owner has to model adifferent set of usability constraints after mutual con-sensus with the intended user. Moreover, he will haveto watermark a given database for every type of user.This activity might expose the watermarked rows to anattacker because he may easily compare the rows of twodatasets and guess the watermarked rows by calculatingthe difference between the corresponding rows of twodatasets. Therefore, there is a need of a model that automati-cally defines the data usability constraints such that the dataquality is ensured and the data owner is not required to wastehis resources to come up with an agreed (after mutual con-sensus with the intended data user) data usability constraints.This model should ideally be application independent;therefore, it is used for proving data ownership, tamperproofing, and data integrity, and for other applicationsof watermarking and fingerprinting. This model shouldencompass the intelligence to assure the data quality ofthe watermarked database for its potential use in everytype of intended application. Moreover, the significanceof a meaningful watermark (for instance, a biometricfeature like voice, image and so on) can also be testedand emphasized.

6 CONCLUSION

In this paper, a comprehensive review of the databasewatermarking and fingerprinting techniques, proposedto date, has been presented. The classification of tech-niques has been done on the basis of watermarkingtechnique and the orientation of the inserted watermark.Using this nomenclature, three types of top level classesare presented: BRT, DSMT, and CDCMT. A comparisonof these techniques – on the basis of robustness againstmalicious attacks, and the method for defining usabilityconstraints defining method – has also been given. The

17

TABLE 12Comparison and summary of well known IBCDCMTs.

Fragile TechniquesSch. MAT ASM TSM Loc. DMB ECM Dep. Dist. Opti. Char.[126] All All All Tuple Blind - PK Yes No No[127] All PRSG All - Blind - PK No No No

based

TABLE 13Robustness of watermarking techniques against malicious attacks

Class of Techniques R1 R2 R3 R4 R5

Usually resilient except Usually resilient except Usually not resilient Usually not resilient Usually not resilienttechniques that techniques that: because small due to facts depicted apart from techniques

BRT use specific order use specific order manipulations can easily in R1, R2, and R3. that are specifically(or position) of tuples (or position) of tuples disturb the marked bits designed for tacklingfor watermark for watermark insertion. and even multi-bit attacks of type A5

insertion. watermarks can be corrupted. and they still havelimitations.

Usually resilient except Usually resilient except Usually resilient. Usually techniques Usually not resilienttechniques that techniques that that mark more tuples apart from techniquesuse specific order use specific order are more robust to that are specifically(or position) of tuples (or position) of tuples such attacks. But still designed for tackling

DSMT for watermark for watermark insertion. this type of attack attacks of type A5

insertion. needs to be studied and they still havein depth because most of limitations.the techniques proposedso far do notdiscuss this attack.

Usually resilient except Usually not resilient Usually not resilient Usually not resilient Usually not resilienttechniques that because deletion attack because small due to the facts depicted apart from techniquesuse specific order may cause deletion of manipulations can easily in R1, R2, and R3. that are specifically

CDCMT (or position) of tuples newly inserted content disturb the marked bits designed for tacklingor tuples for if new content is inserted without degrading the attacks of type A5

watermark insertion. during watermark data usability significantly. and they still haveembedding. limitations.

DSMT techniques appear to be the best among differentavailable alternatives, but they do have certain limita-tions in different contexts. We have also pointed out thedirections for future research in this and related areas.To conclude, intelligent watermarking technique thatensure data quality and also are efficient and effectiveagainst malicious attacks is an important requirement.

REFERENCES[1] E. Bertino and R. Sandhu, “Database security - concepts, ap-

proaches, and challenges,” IEEE Transactions on Dependable andsecure computing, vol. 2, no. 1, pp. 2–19, 2005.

[2] R. Agrawal, P. Haas, and J. Kiernan, “Watermarking relationaldata: framework, algorithms and analysis,” The VLDB Journal,vol. 12, no. 2, pp. 157–169, 2003.

[3] R. Sion, “Rights assessment for relational data,” in Data Manage-ment in Decentralized Systems. T. Yu and S. Jajodia, Eds. Springer,2007, pp. 427–457.

[4] J. Palsberg, S. Krishnaswamy, M. Kwon, D. Ma, Q. Shao, andY. Zhang, “Experience with software watermarking,” in 16thAnnual Conference on Computer Security Applications ACSAC’00.IEEE, 2000, pp. 308–316.

[5] M. Atallah, V. Raskin, C. Hempelmann, M. Karahan, R. Sion,U. Topkara, and K. Triezenberg, “Natural language watermark-ing and tamperproofing,” in Information Hiding. Springer, 2003,pp. 196–212.

[6] X. Jin, Z. Zhang, J. Wang, and D. Li, “Watermarking spatial tra-jectory database,” in Database Systems for Advanced Applications.Springer, 2005, pp. 56–67.

[7] Z. Yan, “Mobile digital rights management,” in Seminar onNetwork Security 2001. HUT TML, 2001, pp. 1–16.

[8] M. Goodrich, M. Atallah, and R. Tamassia, “Indexing informa-tion for data forensics,” in Applied Cryptography and NetworkSecurity. Springer, 2005, pp. 206–221.

[9] A. Abdel-Hamid, S. Tahar, and E. Aboulhamid, “A survey onip watermarking techniques,” Design Automation for EmbeddedSystems, vol. 9, no. 3, pp. 211–227, 2004.

[10] Y. J. Kumar, M. Chinna Rao et al., “Unauthorized data accessdetection by insertion of duplicate data records,” InternationalJournal of Research in Computer and Communication Technology,vol. 1, no. 7, pp. 538–542, 2013.

[11] R. Agrawal and J. Kiernan, “Watermarking relational databases,”in Proceedings of the 28th international conference on Very Large DataBases. VLDB Endowment, 2002, pp. 155–166.

[12] K. Jawad and A. Khan, “Genetic algorithm and difference ex-pansion based reversible watermarking for relational databases,”Journal of Systems and Software, vol. 86, no. 11, pp. 2742–2753,2013.

[13] Z. Li, J. Liu, and W. Tao, “A novel relational database watermark-ing algorithm based on clustering and polar angle expansion,”International Journal of Security and Its Applications, vol. 7, no. 2,pp. 1–14, 2013.

[14] S. Iftikhar, M. Kamran, and Z. Anwar, “Rrw - a robust and re-versible watermarking technique for relational data,” Knowledgeand Data Engineering, IEEE Transactions on, vol. 27, no. 4, pp.1132–1145, 2015.

[15] P. A. Kharade, V. M. Pokale, V. D. Chougule, and V. V. Man-gave, “Speech based watermarking for non-numeric relationaldatabase,” International Journal of Innovative Research in Engineer-ing & Science, vol. 4, no. 2, pp. 10–16, 2013.

[16] L. Gao, D. Wang, and A. Hamadou, “New fragile database wa-termarking scheme with restoration using reed-solomon codes,”Journal of Computational and Theoretical Nanoscience, vol. 10.

[17] J. Franco-Contreras and G. Coatrieux, “Robust watermarking ofrelational databases with ontology-guided distortion control,”IEEE Transactions on Information Forensics and Security, vol. 10,no. 9, pp. 1939–1952, 2015.

[18] M. Kamran, S. Suhail, and M. Farooq, “A robust, distortionminimizing technique for watermarking relational databasesusing once-for-all usability constraints,” Knowledge and DataEngineering, IEEE Transactions on, vol. 25, no. 12, pp. 2694–2707,2013.

[19] V. Khanduja, O. P. Verma, and S. Chakraverty, “Watermarkingrelational databases using bacterial foraging algorithm,” Multi-media Tools and Applications, vol. 74, no. 3, pp. 813–839, 2015.

[20] M. Kamran and M. Farooq, “A formal usability constraints

18

TABLE 14Usability constraints defining method and applicability of watermarking techniques

Class of Techniques Usability constraints defining method ApplicabilityBRT Application dependent, defined by the mutual Proving data ownership. Fragile

consensus of the data owner and the data user. techniques can be used for temper detection.DSMT Application dependent, defined by the mutual Proving data ownership. Fragile

consensus of the data owner and the data user. techniques can be used for temper detection.CDCMT Application dependent, defined by the mutual Proving data ownership

consensus of the data owner and the data user. and data integrity.Also, protect against tampering.

model for watermarking of outsourced datasets,” InformationForensics and Security, IEEE Transactions on, vol. 8, no. 6, pp. 1061–1072, 2013.

[21] J. Franco-Contreras, G. Coatrieux, F. Cuppens, N. Cuppens-Boulahia, and C. Roux, “Robust lossless watermarking of rela-tional databases based on circular histogram modulation,” IEEETransactions on Information Forensics and Security, vol. 9, no. 3, pp.397–410, 2014.

[22] L. Camara, J. Li, R. Li, F. Kagorora, and D. Hanyurwimfura,“Block-based scheme for database integrity verification,” Inter-national Journal of Security and Its Applications, vol. 8, no. 6, pp.25–40, 2014.

[23] A. Khan and S. A. Husain, “A fragile zero watermarking schemeto detect and characterize malicious modifications in databaserelations,” The Scientific World Journal, vol. 2013, no. 3, pp. 1–16,2013.

[24] R. Sion, M. Atallah, and S. Prabhakar, “Rights protection for re-lational data,” IEEE transactions on knowledge and data engineering,vol. 16, no. 12, pp. 1509–1525, 2004.

[25] A. Al-Haj, A. Odeh, and S. Masadeh, “Copyright protectionof relational database systems,” in International Conference onNetworked Digital Technologies. Springer, 2010, pp. 143–150.

[26] H. Wang, X. Cui, and Z. Cao, “A speech based algorithm forwatermarking relational databases,” in International Symposiumson Information Processing (ISIP), 2008. IEEE, 2008, pp. 603–606.

[27] Z. Yong, N. Xia-mu, A. Khan, L. Qiong, and H. Qi, “A novelmethod of watermarking relational databases using characterstring,” in Proceedings of the 24th IASTED international conferenceon Artificial intelligence and applications. ACTA Press, 2006, pp.120–124.

[28] R. Halder, S. Pal, and A. Cortesi, “Watermarking techniquesfor relational databases: Survey, classification and comparison,”Journal of Universal Computer Science, vol. 16, no. 21, pp. 3164–3190, 2010.

[29] R. Sion and M. Atallah, “Attacking digital watermarks,” inElectronic Imaging 2004. International Society for Optics andPhotonics, 2004, pp. 848–858.

[30] J. Lafaye, “An analysis of database watermarking security,” inInformation Assurance and Security, 2007. IAS 2007. Third Interna-tional Symposium on. IEEE, 2007, pp. 462–467.

[31] M. K. Rathva and G. Sahani, “Watermarking relationaldatabases,” International Journal of Computer Science, Engineeringand Applications, vol. 3, no. 1, pp. 71–79, 2013.

[32] B. B. Mehta and H. D. Aswar, “Watermarking for security indatabase: A review,” in Conference on IT in Business, Industry andGovernment (CSIBIG), 2014. IEEE, 2014, pp. 1–6.

[33] S. Iftikhar, M. Kamran, and Z. Anwar, “A survey on reversiblewatermarking techniques for relational databases,” Security andCommunication Networks, vol. 8, no. 15, pp. 2580–2603, 2015.

[34] S. Bhattacharya and A. Cortesi, “A distortion free watermarkframework for relational databases,” in Proceedings of the 4thInternational Conference on Software and Data Technologies (ICSOFT2009), 2009, pp. 229–234.

[35] A. Al-Haj and A. Odeh, “Robust and blind watermarking ofrelational database systems,” Journal of Computer Science, vol. 4,no. 12, pp. 1024–1029, 2008.

[36] A. Hamadou, X. Sun, L. Gao, and S. A. Shah, “A fragilezero-watermarking technique for authentication of relationaldatabases,” International Journal of Digital Content Technology andits Applications, vol. 5, no. 5, pp. 189–200, 2011.

[37] R. Agrawal, P. Haas, and J. Kiernan, “A system for watermarkingrelational databases,” in Proceedings of the 2003 ACM SIGMOD

International Conference on Management of Data. ACM, 2003, pp.674–674.

[38] X. Xiao, X. Sun, and M. Chen, “Second-lsb-dependent robustwatermarking for relational database,” in Third InternationalSymposium on Information Assurance and Security. IEEE, 2007,pp. 292–300.

[39] R. Manjula and N. Settipalli, “A new relational watermarkingscheme resilient to additive attacks,” International Journal ofComputer Applications, vol. 10, no. 5, pp. 1–7, 2010.

[40] A. Hamadou, X. Sun, S. A. Shah, and L. Gao, “A weight-basedsemi-fragile watermarking scheme for integrity verification ofrelational data,” International Journal of Digital Content Technologyand its Applications, vol. 5, no. 8, pp. 148–157, 2011.

[41] V. Khanduja and O. Verma, “Identification and proof of owner-ship by watermarking relational databases,” International Journalof Information and Electronics Engineering, vol. 2, no. 2, pp. 274–277, 2012.

[42] L. Zhang, W. Gao, N. Jiang, L. Zhang, and Y. Zhang, “Relationaldatabases watermarking for textual and numerical data,” inMechatronic Science, Electric Engineering and Computer (MEC), 2011International Conference on. IEEE, 2011, pp. 1633–1636.

[43] B. Rao and M. Prasad, “Subset selection approach for water-marking relational databases,” in International Conference on DataEngineering and Management. Springer, 2012, pp. 181–188.

[44] S. Iqbal, A. Rauf, S. Mahfooz, S. Khusro, and S. H. Shah, “Self-constructing fragile watermark algorithm for relational databaseintegrity proof,” World Applied Sciences Journal, vol. 19, no. 9, pp.1273–1277, 2012.

[45] W. Yanmin and G. Yuxi, “The digital watermarking algorithm ofthe relational database based on the effective bits of numericalfield,” in World Automation Congress (WAC), 2012. IEEE, 2012,pp. 1–4.

[46] R. K. Tiwari and G. Sahoo, “A novel watermarking scheme forsecure relational databases,” Information Security Journal: A GlobalPerspective, vol. 22, no. 3, pp. 105–116, 2013.

[47] R. Sion, “Proving ownership over categorical data,” in DataEngineering, 2004. Proceedings. 20th International Conference on.IEEE, 2004, pp. 584–595.

[48] R. Sion, M. Atallah, and S. Prabhakar, “Rights protection forcategorical data,” IEEE Transactions on Knowledge and Data En-gineering, vol. 17, no. 7, pp. 912–926, 2005.

[49] M. Huang, J. Cao, Z. Peng, and Y. Fang, “A new watermarkmechanism for relational data,” in Computer and InformationTechnology, 2004. CIT’04. The Fourth International Conference on.IEEE, 2004, pp. 946–950.

[50] E. Bertino, B. Ooi, Y. Yang, and R. Deng, “Privacy and ownershippreserving of outsourced medical data,” in Proceedings of 21stInternational Conference on Data Engineering (ICDE), 2005. IEEE,2005, pp. 521–532.

[51] T.-L. Hu, G. Chen, K. Chen, and J.-X. Dong, “Garwm: Towards ageneralized and adaptive watermark scheme for relational data,”in International Conference on Web-Age Information Management.Springer, 2005, pp. 380–391.

[52] H. Guo, Y. Li, A. Liu, and S. Jajodia, “A fragile watermarkingscheme for detecting malicious modifications of database rela-tions,” Information Sciences, vol. 176, no. 10, pp. 1350–1378, 2006.

[53] C. Jin, Y. Fu, and F. Tao, “The watermarking model for relationaldatabase based on watermarking sharing,” in Proceedings of the2006 International Conference on Intelligent Information Hiding andMultimedia Signal Processing (IIH-MSP’06). IEEE, 2006, pp. 677–680.

[54] A. Shamir, “How to share a secret,” Communications of the ACM,vol. 22, no. 11, pp. 612–613, 1979.

19

[55] X. Cui, X. Qin, G. Sheng, and J. Zheng, “A robust algorithm forwatermark numeric relational databases,” in Intelligent Controland Automation. Springer, 2006, pp. 810–815.

[56] H. Cui, X. Cui, and M. Meng, “A public key cryptography basedalgorithm for watermarking relational databases,” in Interna-tional Conference on Intelligent Information Hiding and MultimediaSignal Processing. IEEE, 2008, pp. 1344–1347.

[57] Y. Wang, Z. Zhu, F. Liang, and G. Jiang, “Watermarking relationaldata based on adaptive mechanism,” in International Conferenceon Information and Automation, 2008. ICIA 2008. IEEE, 2008, pp.131–134.

[58] M. Meng, X. Cui, and H. Cui, “The approach for optimizationin watermark signal of relational databases by using geneticalgorithms,” in International Conference on Computer Science andInformation Technology 2008. IEEE, 2008, pp. 448–452.

[59] G. Gupta and J. Pieprzyk, “Reversible and blind database wa-termarking using difference expansion,” in Proceedings of the1st international conference on Forensic applications and techniquesin telecommunications, information, and multimedia and workshop.ICST (Institute for Computer Sciences, Social-Informatics andTelecommunications Engineering), 2008, pp. 24–29.

[60] H. Xian and D. Feng, “Leakage identification for secret relationaldata using shadowed watermarks,” in International Conference onCommunication Software and Networks, 2009. ICCSN’09. IEEE,2009, pp. 473–478.

[61] Y. Zhang, X. Niu, D. Zhao, J. Li, and S. Liu, “Relationaldatabases watermark technique based on content characteristic,”in Proceedings of the First International Conference on InnovativeComputing, Information and Control - Volume 3. IEEE ComputerSociety, 2006, pp. 677–680.

[62] Z. Qin, Y. Ying, L. Jia-jin, and L. Yi-shu, “Watermark based copy-right protection of outsourced database,” in 10th InternationalDatabase Engineering and Applications Symposium, 2006. IDEAS’06.IEEE, 2006, pp. 301–308.

[63] F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li, “An improvedalgorithm to watermark numeric relational data,” in InternationalWorkshop on Information Security Applications. Springer, 2006, pp.138–149.

[64] “Database relation watermarking resilient against secondary wa-termarking attacks,” in Gupta, G. and Pieprzyk, J. Springer, 2009,pp. 222–236.

[65] X. Cui, X. Qin, and G. Sheng, “A weighted algorithm forwatermarking relational databases,” Wuhan university journal ofnatural sciences, vol. 12, no. 1, pp. 79–82, 2007.

[66] Y. Ali and B. Mahdi, “Watermarking for relational databaseby using threshold generator,” Engineering & Technology Journal,vol. 29, no. 1, pp. 33–44, 2011.

[67] M. E. Farfoura, S.-J. Horng, J.-L. Lai, R.-S. Run, R.-J. Chen,and M. K. Khan, “A blind reversible method for watermarkingrelational databases based on a time-stamping protocol,” ExpertSystems with Applications, vol. 39, no. 3, pp. 3185–3196, 2012.

[68] Y. Zhang, X. Niu, D. Wu, L. Zhao, J. Liang, and W. Xu, “Amethod of verifying relational databases ownership with imagewatermark,” in The 6th International Symposium on Test and Mea-surement, Dalian, PR China, 2005, pp. 6316–6319.

[69] U. P. Rao, D. R. Patel, and P. M. Vikani, “Relational database wa-termarking for ownership protection,” in International Conferenceon Communication, Computing & Security, 2012, pp. 988–995.

[70] X. Zhou, M. Huang, and Z. Peng, “An additive-attack-proofwatermarking mechanism for databases’ copyrights protectionusing image,” in Proceedings of the 2007 ACM symposium onApplied computing. ACM, 2007, pp. 254–258.

[71] P. Shankar, “On bch codes over arbitrary integer rings,” IEEETransactions on Information Theory, vol. 25, no. 4, pp. 480–483,1979.

[72] C. Wang, J. Wang, M. Zhou, G. Chen, and D. Li, “Atbam:An arnold transform based method on watermarking relationaldata,” in International Conference on Multimedia and UbiquitousEngineering. IEEE, 2008, pp. 263–270.

[73] V. Arnol’d and A. Avez, Ergodic problems of classical mechanics.Benjamin New York, 1968.

[74] Z. Hu, Z. Cao, and J. Sun, “An image based algorithm forwatermarking relational databases,” in Proceedings of the 2009International Conference on Measuring Technology and MechatronicsAutomation - Volume 01. IEEE Computer Society, 2009, pp. 425–428.

[75] J. Sun, Z. Cao, and Z. Hu, “Multiple watermarking relationaldatabases using image,” in International Conference on MultiMediaand Information Technology. IEEE, 2008, pp. 373–376.

[76] H. Sardroudi and S. Ibrahim, “A new approach for relationaldatabase watermarking using image,” in 5th International Confer-ence on Computer Sciences and Convergence Information Technology(ICCIT), 2010. IEEE, 2010, pp. 606–610.

[77] J.-N. Chang and H.-C. Wu, “Reversible fragile database water-marking technology using difference expansion based on svrprediction,” in International Symposium on Computer, Consumerand Control (IS3C), 2012. IEEE, 2012, pp. 690–693.

[78] Y. Li, V. Swarup, and S. Jajodia, “Constructing a virtual primarykey for fingerprinting relational data,” in ACM Digital RightsManagement Workshop. ACM, 2003, pp. 133–141.

[79] Y. Li, V. Swarup, and S. Jajodia, “Fingerprinting relationaldatabases: Schemes and specialties,” IEEE Transactions on De-pendable and Secure Computing, vol. 2, no. 1, pp. 34–45, 2005.

[80] S. Liu, S. Wang, R. Deng, and W. Shao, “A block orientedfingerprinting scheme in relational database,” in InternationalConference on Information Security and Cryptology –ICISC 2004.Springer, 2005, pp. 455–466.

[81] C. Constantin, D. Gross-Amblard, and M. Guerrouani, “Water-mill: an optimized fingerprinting system for highly constraineddata,” in Proceedings of the 7th workshop on Multimedia and security.ACM, 2005, pp. 143–155.

[82] J. Lafaye, D. Gross-Amblard, C. Constantin, and M. Guerrouani,“Watermill: An optimized fingerprinting system for databasesunder constraints,” IEEE Transactions on Knowledge and DataEngineering, pp. 532–546, 2008.

[83] F. Guo, J. Wang, and D. Li, “Fingerprinting relational databases,”in Proceedings of the 2006 ACM symposium on Applied computing.ACM, 2006, pp. 487–492.

[84] M. Zhou, J. Wang, C. Wang, and D. Li, “A novel fingerprintingarchitecture for relational data,” in IEEE International Conferenceon Digital EcoSystems and Technologies Conference. IEEE, 2007, pp.477–480.

[85] F. Sebe, J. Domingo-Ferrer, and A. Solanas, “Noise-robust water-marking for numerical datasets,” in International Conference onModelling Decisions for Artificial Intelligence. Springer, 2005, pp.134–143.

[86] M. Shehab, E. Bertino, and A. Ghafoor, “Watermarking relationaldatabases using optimization-based techniques,” IEEE Transac-tions on Knowledge and Data Engineering, vol. 20, no. 1, pp. 116–129, 2008.

[87] A. Deshpande and J. Gadge, “New watermarking technique forrelational databases,” in 2nd International Conference on EmergingTrends in Engineering and Technology (ICETET), 2009. IEEEComputer Society, 2009, pp. 664–669.

[88] K. Huang, M. Yue, P. Chen, Y. He, and X. Chen, “A cluster-based watermarking technique for relational database,” in FirstInternational Workshop on Database Technology and Applications,2009. IEEE, 2009, pp. 107–110.

[89] S. Lloyd, “Least squares quantization in pcm,” IEEE Transactionson Information Theory, vol. 28, no. 2, pp. 129–137, 1982.

[90] M. Kamran and M. Farooq, “An information-preserving wa-termarking scheme for right protection of emr systems,” IEEETransactions on Knowledge and Data Engineering, vol. 24, no. 11,pp. 1950–1962, 2012.

[91] Y. Zhang, X. Niu, and D. ZHAO, “A method of protecting rela-tional databases copyright with cloud watermark,” InternationalJournal of Information Technology, vol. 1, no. 3, pp. 206–210, 2005.

[92] Y. Zhang, B. Yang, and X. Niu, “Reversible watermarking forrelational database authentication,” International Journal of Infor-mation Technology, vol. 17, no. 2, pp. 29–66, 2006.

[93] Y. Fu, C. Jin, and C. Ma, “A novel relational database watermark-ing algorithm,” in Intelligence and security informatics: Pacific AsiaWorkshop, PAISI 2007, Chengdu, China. Springer, 2007, pp. 208–219.

[94] C. Jiang, X. Chen, and Z. Li, “Watermarking relational databasesfor ownership protection based on dwt,” in Fifth International-Conference on Information Assurance and Security, 2009. IAS’09.,vol. 1. IEEE, 2009, pp. 305–308.

[95] M. Shensa, “The discrete wavelet transform: Wedding the a trousand mallat algorithms,” IEEE Transactions on Signal Processing,vol. 40, no. 10, pp. 2464–2482, 1992.

[96] Z. Zhang, X. Jin, J. Wang, and D. Li, “Watermarking relationaldatabase using image,” in Proceedings of the Third International

20

Conference on Machine Learning and Cybernetics 2004. IEEE, 2004,pp. 1739–1744.

[97] M. Tsai, F. Hsu, J. Chang, and H. Wu, “Fragile database wa-termarking for malicious tamper detection using support vectorregression,” in Proceedings of the 3rd International Conference onInternational Information Hiding and Multimedia Signal Processing(IIH-MSP 07). IEEE Computer Society, 2007, pp. 493–496.

[98] A. Odeh and A. Al-Haj, “Watermarking relational databasesystems,” in First International Conference on the Applications ofDigital Information and Web Technologies, 2008. ICADIWT 2008.IEEE, 2008, pp. 270–274.

[99] M. Farfoura and S. Horng, “A novel blind reversible methodfor watermarking relational databases,” in IEEE InternationalSymposium on Parallel and Distributed Processing with Applications.IEEE, 2010, pp. 563–569.

[100] D. Gross-Amblard, “Query-preserving watermarking of rela-tional databases and xml documents,” in ACM SIGMOD-SIGACT-SIGART Conference on Principles of Database Systems.ACM, 2003, pp. 191–201.

[101] F. Rossi, P. Van Beek, and T. Walsh, Handbook of constraintprogramming. Elsevier, 2006.

[102] Y. Li, H. Guo, and S. Jajodia, “Tamper detection and localizationfor categorical data using fragile watermarks,” in ACM DigitalRights Management Workshop. ACM, 2004, pp. 73–82.

[103] W. Myrvold and F. Ruskey, “Ranking and unranking permuta-tions in linear time,” Information Processing Letters, vol. 79, no. 6,pp. 281–284, 2001.

[104] Y. Li, “Database watermarking: A systematic view,” in Handbookof Database Security. Springer, 2008, pp. 329–355.

[105] R. Arun, K. Praveen, D. Chandra Bose, and H. Nath, “A dis-tortion free relational database watermarking using patch workmethod,” in Proceedings of the International Conference on Informa-tion Systems Design and Intelligent Applications 2012 (INDIA 2012)held in Visakhapatnam, India, January 2012. Springer, 2012, pp.531–538.

[106] J. Guo, “Fragile watermarking scheme for tamper detection ofrelational database,” in International Conference on Computer andManagement (CAMAN), 2011. IEEE, 2011, pp. 1–4.

[107] M. Li, W. Zhao, and J. Guo, “An asymmetric watermarkingscheme for relational database,” in IEEE 3rd International Con-ference on Communication Software and Networks (ICCSN), 2011.IEEE, 2011, pp. 180–184.

[108] I. Kamel, “A schema for protecting the integrity of databases,”Computers & Security, vol. 28, no. 7, pp. 698–709, 2009.

[109] I. Kamel and K. Kamel, “Toward protecting the integrity ofrelational databases,” in World Congress on Internet Security(WorldCIS), 2011. IEEE, 2011, pp. 258–261.

[110] X. Chen, P. Chen, Y. He, and L. Li, “A self-resilience digitalimage watermark based on relational database,” in InternationalSymposium on Knowledge Acquisition and Modeling, 2008. KAM’08.IEEE, 2008, pp. 698–702.

[111] Y. Li and R. Deng, “Publicly verifiable ownership protection forrelational databases,” in Proceedings of the 2006 ACM Symposiumon Information, Computer and Communications Security. ACM,2006, pp. 78–89.

[112] S. Bhattacharya and A. Cortesi, “A generic distortion free water-marking technique for relational databases,” International Confer-ence on Information Systems Security, pp. 252–264, 2009.

[113] H. El-Bakry and N. Mastorakis, “A new watermark approach forprotection of databases,” in Proceedings of the 9th WSEAS interna-tional conference on Applied informatics and communications. WorldScientific and Engineering Academy and Society (WSEAS), 2009,pp. 243–248.

[114] H. El-Bakry and M. Hamada, “A novel watermark techniquefor relational databases,” International Conference on ArtificialIntelligence and Computational Intelligence, pp. 226–232, 2010.

[115] R. Halder and A. Cortesi, “Persistent watermarking of relationaldatabases,” in Proceedings of the 2010 International Conferenceon Advances in Communication, Network, and Computing. IEEEComputer Society, 2010, pp. 46–52.

[116] R. Halder and A. Cortesi, “A persistent public watermarkingof relational databases,” International Conference on InformationSystems Security, pp. 216–230, 2011.

[117] S. Shah, S. Xingming, H. Ali, and M. Abdul, “Query preservingrelational database watermarking,” Informatica, Journal of Com-puting and Informatics, vol. 35, no. 3, pp. 391–396, 2011.

[118] D. Hanyurwimfura, Y. Liu, and Z. Liu, “Text format based re-lational database watermarking for non-numeric data,” in Inter-national Conference on Computer Design and Applications (ICCDA),2010. IEEE, pp. 312–316.

[119] E. Uzun and B. Stephenson, “Security of relational databasesin business outsourcing,” Technical Report HPL-2008-168, p. 168,2008.

[120] G. Gamal, M. Rashad, and M. Mohamed, “A simple watermarktechnique for relational database,” Mansoura Journal for ComputerScience and Information Systems, vol. 4, no. 4, pp. 1–7, 2008.

[121] V. Prasannakumari, “A robust tamperproof watermarking fordata integrity in relational databases,” Research Journal of Infor-mation Technology, vol. 1, no. 3, pp. 115–121, 2009.

[122] R. Bedi, A. Thengade, and M. Vijay, “A new watermarking ap-proach for non-numeric relational database,” International Journalof Computer Applications, vol. 13, no. 7, pp. 37–40, 2011.

[123] K. Rajneesh, G. Purva, G. Poonam, and K. Ashish, “A uniqueapproach for watermarking non-numeric relational database,”International Journal of Computer Applications, vol. 36, no. 7, pp.9–14, 2011.

[124] V. Khanduja, A. Khandelwal, A. Madharaia, D. Saraf, andT. Kumar, “A robust watermarking approach for non numericrelational database,” in International Conference on Communication,Information & Computing Technology (ICCICT), 2012, 2012, pp. 1–5.

[125] G. Coatrieux, E. Chazard, R. Beuscart, and C. Roux, “Losslesswatermarking of categorical attributes for verifying medical database integrity,” in IEEE Annual International Conference of theEngineering in Medicine and Biology Society, EMBC, 2011. IEEE,2011, pp. 8195–8198.

[126] M. Tsai, H. Tseng, and C. Lai, “A database watermarkingtechnique for temper detection,” in Proceedings of the 2006 JointConference on Information Sciences (JCIS 2006), October, 2006, pp.8–11.

[127] S. Bhattacharya and A. Cortesi, “Database authentication bydistortion free watermarking,” in 5th International Conference onSoftware and Data Technologies, ICSOFT 2010, 2010, pp. 219–226.

[128] Y. Wu, “Zero-distortion authentication watermarking,” Interna-tional Conference on Information Security, pp. 325–337, 2003.

[129] Y. Li, V. Swarup, and S. Jajodia, “Defending against addi-tive attacks with maximal errors in watermarking relationaldatabases,” Research Directions in Data and Applications SecurityXVIII, pp. 81–94, 2004.

M. Kamran got his BS and MS degree in Com-puter Science in 2005 and 2008 respectively.Currently he is working as Assistant Professor ofComputer Science at COMSATS Institute of In-formation Technology Wah Campus, Wah Cantt,Pakistan. His research interests include the useof machine learning, evolutionary computations,big data analytic, data security, and decisionsupport systems.

Muddassar Farooq completed his D.Sc. in in-formatics from Technical University of Dortmund,Germany, in 2006. Currently, he is working asProfessor and Vice Chancellor at Muslim YouthUniversity, Islamabad, Pakistan. He is also thedirector of Next Generation Intelligent NetworksResearch Center (nexGIN RC). His research in-terests include nature inspired applied systems,nature inspired computing, and network securitysystems. He has authored several papers andbook chapters in these areas.

a comprehensive survey of watermarking relational ... · 1 a comprehensive survey of watermarking...

Documents