Theoretical Modeling and Analysis of Content Fingerprinting
Avinash L. Varna, Student Member, IEEE, and Min Wu, Senior Member, IEEE
Abstract
Multimedia identification via content fingerprints is used in many applications, such as content
filtering on user-generated content websites, automatic multimedia identification and tagging. A compact
“fingerprint” is computed for each multimedia signal that captures robust and unique properties of the
perceptual content, which is used for identifying the multimedia. Several different multimedia
fingerprinting schemes have been proposed in the literature and have been evaluated through experiments.
To complement these experimental evaluations and provide guidelines for choosing system parameters
and designing better schemes, this paper develops models for content fingerprinting and provides an
analysis of the identification performance under these models. Firstly, fingerprinting schemes that generate
independent and equally likely fingerprint bits are examined and bounds are obtained on the identification
accuracy. Guidelines for choosing the fingerprint length to attain a desired accuracy are derived, and it
is shown that identification with a false alarm requirement is similar to joint source-channel coding.
A Markov Random Field based model for fingerprints with correlated components is proposed and a
statistical physics inspired approach for computing the probability of detection is described. The analysis
shows that the commonly used Hamming distance detection criterion is susceptible to correlations among
fingerprint bits, whereas the optimal log-likelihood ratio decision rule yields 5-20% improvement in the
accuracy over a wide range of correlations. Simulation results demonstrate the validity of the theoretical
predictions.
Index Terms
Content fingerprinting, content identification, error exponents, Markov Random Fields, Wang-Landau
density of states estimation.
The authors are with the Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA (Email: [email protected], [email protected]).
DRAFT
I. INTRODUCTION
In recent years, user generated content (UGC) websites such as YouTube have grown in popularity
and revolutionized multimedia consumption and distribution. Increasingly, the Internet is being seen as
a medium for delivering multimedia content to consumers. These new distribution channels have also
raised concerns about the posting of copyrighted content on UGC websites [1]. Several UGC websites are
deploying content filtering schemes to identify and filter such copyrighted videos. These filtering schemes
rely on an emerging technology called content fingerprinting to identify multimedia content uploaded to
the UGC sites.
A fingerprint is a compact signature that represents robust and unique characteristics of the multimedia
and can be used to identify the document. Content fingerprints are used for identifying multimedia in a
variety of applications in multimedia management. Fingerprints are employed by services such as Shazam,
Midomi, VCAST, etc. to perform automatic music identification. Given a noisy recording of an audio
captured using a mobile phone, these services identify the original audio track and provide metadata
information, such as the album, where to buy the track, etc. Fingerprints have also been used to perform
automatic tagging of audio collections and create automatic playlists based on user preferences [2].
Watermarking, which is a proactive technique wherein a special watermark signal is embedded into the
host at the time of content creation, can also be used for content identification. This embedded signal can
later be extracted and used to identify the content and retrieve associated metadata [3]. Watermarking
techniques are suitable if the embedder has control over the content creation stage. This requirement
may be difficult to satisfy in many practical applications, including content filtering on UGC sites. In
particular, a large volume of existing multimedia content does not have embedded watermarks and cannot
be identified using this approach. Content fingerprints, on the other hand, do not require access to the
content at the time of creation and can be used to identify existing multimedia content that does not have
embedded information.
Content fingerprints are designed to be robust to minor content preserving operations while being able
to discriminate between different multimedia objects. At the same time, the fingerprints must be compact
to allow for efficient matching with databases containing millions of multimedia works with different
content. In this respect, content fingerprinting shares similarities with robust hashing [4], [5]. Traditionally,
robust hashing was studied in the context of authentication, where the main objective was to prevent an
adversary from forging an image that has the same hash as a given image or video. In contrast, while
collisions or false alarms are also a concern in content fingerprinting, the main threat model is an adversary
making minor modifications to a given multimedia document that would result in a significantly different
fingerprint and prevent identification. Another difference between fingerprinting and robust hashing is that
fingerprinting applications typically involve large databases with several millions of hours of video and
audio, whereas traditional applications of image hashing typically focus on authenticating a smaller set
of images. However, many hashing schemes with good robustness properties can be adapted for content
identification purposes and hence the terms “content fingerprinting” and “robust hashing” are often used
interchangeably in the literature.
Multimedia identification also shares some similarities with content-based multimedia retrieval [6], [7]
where multimedia objects are retrieved from a database based on their perceptual similarity to a query.
Such ideas have also been built into the MPEG-7 standard to facilitate similarity comparison, search
and retrieval of video [8]. However, the concept of perceptual similarity is not well-defined and cannot
be expressed in objective terms. In contrast, the multimedia identification problem is better defined, as
two objects are considered similar if they can both be obtained from the same underlying object through
content-preserving transformations. Moreover, in many practical applications involving fingerprinting,
there are stringent requirements on scalability and computational complexity that are less of a concern
in many retrieval applications.
Content fingerprinting has received a lot of interest in the research community and several different
approaches for fingerprinting have been proposed, some of which are reviewed in Section I-A. Most of
these works addressed the problem of designing fingerprinting schemes that are robust to different kinds
of processing. This paper focuses on developing a theoretical model and analyzing the performance of
fingerprinting schemes. Such a theoretical framework would complement existing experimental
evaluations and allow the prediction of how the identification accuracy scales with the size of the database.
Theoretical analysis can also allow us to determine fundamental limits on the performance and provide
guidelines for designing better fingerprinting schemes.
In this paper, we examine content identification under a hypothesis testing framework and examine the
influence of various system parameters on the identification performance. We then derive bounds on the
error probabilities of an identification scheme that generates fingerprints with independent and identically
distributed (i.i.d.) bits and provide guidelines for choosing the hash length to achieve a desired accuracy.
As many practical schemes generate fingerprints with correlated components, we propose a Markov
Random Field (MRF) model to capture local dependencies among the bits and use techniques inspired
by statistical physics to determine the influence of the correlation among fingerprint components on the
probability of detection.
A. Related Prior Work
Content fingerprinting has attracted a lot of research and several audio and video fingerprinting
techniques have been proposed in the literature. A robust fingerprinting technique for audio identification
based on the signs of the differences between the energy in different frequency bands of overlapping
frames was proposed in [9]. A similar approach for video, coupled with efficient indexing strategies was
proposed in [10]. Ranks of the block average luminance of sub-sampled frames were used as fingerprints
in [11], while signs of significant wavelet coefficients of spectrograms were used to construct fingerprints
in [12]. Moment invariants that capture appearance and motion were proposed as features for fingerprints
in [13].
In the robust hashing literature, hash generation by quantizing projections of images onto smooth
random patterns was proposed in [4], which is used as a building block in many fingerprint constructions
such as [13]. Hashes resilient to geometric transforms based on properties of Fourier transform coefficients
were proposed in [5]. Spatiotemporal video hashes based on 3-D transforms were proposed in [14].
Several other hashing schemes with different robustness properties have been proposed in the literature.
The reader is referred to [15] and the references therein for a more exhaustive review and comparison
of various fingerprinting and hashing techniques.
Regarding theoretical aspects of fingerprinting, qualitative guidelines for designing multimedia hash
functions were provided in [16], with a focus on bit assignment and the use of suitable error-correcting
codes to improve the robustness. Robust hashing was considered as a classification problem in [17]. As
a null-hypothesis and false alarms were not explicitly considered in the formulation of [17], the analysis
cannot be directly applied to the problem of content identification. In the related field of biometrics, the
capacity of biometrics-based identification was studied in [18]. Capacity was defined as the maximum
rate $R$ such that $2^{LR}$ distinct biometrics could be identified with an asymptotic error probability of zero,
as the length of the fingerprints $L \to \infty$. However, as noted in the paper, while designing practical
systems, we are more interested in determining the best performance obtainable using a given length
of the fingerprint, which is one of the contributions of this paper. We note that while our results are
presented in the context of multimedia content identification, the results are equally applicable in other
related areas such as the biometrics-based identification considered in [18].
B. Organization of the Paper
Section II provides a brief overview of the framework of hypothesis testing adopted in this paper. In
Section III we examine the impact of various system parameters on the identification performance of
binary i.i.d. fingerprints and bounds on the accuracy are derived in Section IV. An MRF based model
for binary fingerprints with correlated bits is proposed in Section V and the impact of the correlation
on the performance is examined. Section VI presents simulation results and Section VII summarizes the
findings of the paper.
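[Figure 1 omitted. (a) Database creation: fingerprints $\mathbf{X}^{(1)}, \mathbf{X}^{(2)}, \ldots, \mathbf{X}^{(N)}$ are computed from the videos $V_1, V_2, \ldots, V_N$. (b) Detection stage: the query $Z$, which is either a distorted version of some $V_j$ or a video $W \notin \{V_i\}$, is passed through the fingerprint computation to obtain $\mathbf{Y}$.]
Fig. 1. System Model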
II. HYPOTHESIS TESTING FRAMEWORK
Hypothesis testing has been commonly used to model identification and classification problems [19].
We adopt a similar framework in this paper for analyzing content identification. For ease of presentation,
we describe the framework using the example of a video identification application, but the analysis and
results apply to other identification tasks as well.
The system model for a fingerprint-based video identification scheme is shown in Fig. 1. Suppose that
the detector has a collection of $N$ videos $V_1, V_2, \ldots, V_N$ which serve as a reference database for
identifying query videos. For example, in a UGC website application, the videos $\{V_i\}$ may correspond to
copyrighted videos that should not be uploaded to the website by users. In the initial creation stage, the
fingerprint $\mathbf{X}^{(i)}$ corresponding to video $V_i$ is computed and stored in the database as shown in Fig. 1(a).
Given a query video $Z$ that needs to be identified, the detector computes the fingerprint $\mathbf{Y}$ of the
uploaded video and compares $\mathbf{Y}$ with the fingerprints $\{\mathbf{X}^{(i)}\}_{i=1}^{N}$ stored in its database. In general, the
query $Z$ may be some video $W$ that does not correspond to any video in the database, or a possibly
distorted version of some video $V_i$ in the database. These distortions may be caused by incidental
changes that occur during transmission and storage, such as compression and transcoding, or they may
be intentional distortions introduced by an attacker to prevent the identification of the content.
We consider two different detection objectives based on the requirements of different applications. In
some applications, such as a video sharing website implementing content filtering, it may be sufficient
to determine if the content is subject to copyright protection or not. In this case, the detector is only
interested in determining whether a given video is present in a database of copyrighted material or not.
We refer to this scenario as the detection problem, which can be formulated as a binary hypothesis test:
$$
\begin{aligned}
H_0 &: Z \text{ does not correspond to any video in } \{V_1, V_2, \ldots, V_N\}, \\
H_1 &: Z \text{ corresponds to some video in } \{V_1, V_2, \ldots, V_N\}.
\end{aligned} \tag{1}
$$
Under this setting, the performance of a particular fingerprinting scheme with the associated decision
rule $\delta_D(\cdot)$ can be evaluated using the probability of false alarm $P_f = \Pr(\delta_D = 1 \mid H_0)$ and the probability
of correct detection $P_d = \Pr(\delta_D = 1 \mid H_1)$. In some situations, it may be more convenient to work with
the probability of false negative $P_{fn} = 1 - P_d$ instead of $P_d$.
In some applications, such as automatic tagging of content, the detector is further interested in
identifying the original video corresponding to a query video. We refer to this scenario as the identification
problem. The identification problem can be modeled as a multiple hypothesis test with each hypothesis
corresponding to one original video and a null hypothesis corresponding to the case that the uploaded
video is not present in the database:
$$
\begin{aligned}
H_0 &: Z \text{ is not from the database } \{V_1, V_2, \ldots, V_N\}, \\
H_k &: Z \text{ is a (possibly distorted) version of } V_k, \quad k = 1, 2, \ldots, N.
\end{aligned} \tag{2}
$$
In this scenario, the probability of correctly identifying a query video $P_c$, the probability of
misclassifying a video $P_m$, and the probability of false alarm $P_f$ can be used to quantify the performance of a
given fingerprinting scheme and the corresponding detector. In the remainder of this paper, we examine
the performance of binary fingerprinting schemes under this hypothesis testing framework.
III. FINGERPRINTS WITH INDEPENDENT BITS
Binary strings are commonly employed in fingerprinting schemes such as [9], [10] since comparison
of binary strings can be performed efficiently. From the designer’s point of view, it is desirable for the
fingerprint bits to be independent of each other, so that an attacker cannot alter a significant number of
fingerprint bits at once by making minor changes to the content. Further, if the bits are equally likely to be
0 or 1, the overall entropy is maximized and each bit conveys the maximum amount of information. If the
bits are not equally likely to be 0 or 1, they can be compressed into a shorter vector with equiprobable bits,
in order to meet the compactness requirement of the fingerprint. Also, from a game-theoretic perspective,
it has been shown that using equally likely bits is advantageous for the designer [20]. Binary strings
with independent and identically distributed (i.i.d.) bits also arise in biometric identification [21]. Hence,
in this and the next section, we focus our analysis on the performance of fingerprinting schemes with
i.i.d. equally likely bits and assume that each fingerprint $\mathbf{X}^{(i)}$ consists of $L$ bits that are distributed i.i.d.
according to a Bernoulli(0.5) distribution. Binary fingerprints with correlated bits will be examined later
in Section V.
Distortions introduced into the content translate into changes in the fingerprint of the content. By
a suitable choice of features used for constructing the fingerprint and appropriate preprocessing and
synchronization, such attacks can be modeled as additive noise $\mathbf{n}$ in the hash space [16]. Since the hash
bits considered in this section are designed to be i.i.d., we model the effect of attacks on the multimedia
content as altering each bit of the hash independently with probability $p < 0.5$, i.e. the components of
$\mathbf{n}$ are i.i.d. Bernoulli($p$). The maximum possible value of $p$ is proportional to the maximum amount of
distortion that may be introduced into the multimedia content and will be referred to as the distortion
parameter in the rest of the paper.
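The noise model described above can be illustrated with a short simulation. The following is a minimal Python sketch and not code from the paper: the helper names (`random_fingerprint`, `distort`, `hamming`), the seed, and the values L = 256 and p = 0.2 are illustrative assumptions.

```python
import random

def random_fingerprint(L, rng):
    """Draw L i.i.d. Bernoulli(0.5) bits (the assumed fingerprint model)."""
    return [rng.randint(0, 1) for _ in range(L)]

def distort(x, p, rng):
    """Return Y = X xor n, where each noise bit is 1 with probability p."""
    return [b ^ (1 if rng.random() < p else 0) for b in x]

def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(u != v for u, v in zip(a, b))

rng = random.Random(0)            # fixed seed, chosen arbitrarily
L, p = 256, 0.2                   # illustrative values only
x = random_fingerprint(L, rng)    # fingerprint stored in the database
y = distort(x, p, rng)            # fingerprint of a distorted copy
w = random_fingerprint(L, rng)    # fingerprint of unrelated content

d_match = hamming(x, y)           # concentrates around p * L
d_other = hamming(x, w)           # concentrates around L / 2
```

Under this model the distance to the true original is Binomial(L, p), while the distance to unrelated content is Binomial(L, 0.5), which is what makes thresholding the Hamming distance a natural detector.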
A. Detection Problem
Under the assumptions outlined above, the detection problem, where the detector is only interested in
identifying whether a given content is present in a database or not, becomes:
$$
\begin{aligned}
H_0 &: \mathbf{Y} \neq \mathbf{X}^{(i)} + \mathbf{n} \text{ for } i = 1, 2, \ldots, N, \\
H_1 &: \mathbf{Y} = \mathbf{X}^{(i)} + \mathbf{n} \text{ for some } i \in \{1, 2, \ldots, N\},
\end{aligned} \tag{3}
$$
where $\mathbf{Y}$, $\mathbf{X}^{(i)}$, $i = 1, 2, \ldots, N$, and the noise $\mathbf{n}$ are all binary vectors of length $L$. Under hypothesis $H_0$,
$\mathbf{Y}$ can take any value with equal probability, since the fingerprint bits are i.i.d. with equal probability of
being 0 or 1, so that $\Pr(\mathbf{Y} = \mathbf{y} \mid H_0) = \frac{1}{2^L}$, $\forall \mathbf{y} \in \{0, 1\}^L$. The distribution of the fingerprint $\mathbf{Y}$, given that
it is a modified version of $\mathbf{X}^{(i)}$, $\Pr(\mathbf{Y} \mid \mathbf{X}^{(i)})$, can be specified by considering their Hamming distance.
Let $d_i = d(\mathbf{Y}, \mathbf{X}^{(i)})$ be the Hamming distance between the fingerprint of the query video and a given
fingerprint $\mathbf{X}^{(i)}$ in the database. Since the probability of a bit being altered due to the noise is $p$, the
probability that exactly the $d_i$ bits in which $\mathbf{Y}$ and $\mathbf{X}^{(i)}$ differ are altered is $\Pr(\mathbf{Y} \mid \mathbf{X}^{(i)}) = p^{d_i}(1 - p)^{L - d_i}$.
The alternative hypothesis $H_1$ is thus a composite hypothesis, as the computed fingerprint $\mathbf{Y}$ can have
different distributions depending on which original fingerprint it corresponds to. The optimal decision
rule for composite hypothesis testing is given as [19]:
$$
\text{Decide } H_1 \text{ if } \frac{p(\mathbf{Y} \mid H_1)}{p(\mathbf{Y} \mid H_0)} > \tau'' \tag{4}
$$
where the threshold $\tau''$ can be chosen to satisfy some optimality criterion. If the priors of the hypotheses
and the associated costs are known, then $\tau''$ can be computed so as to minimize the expected Bayes
risk. If the costs are known, but the priors are unknown, the threshold $\tau''$ can be chosen to minimize
the maximum expected risk. In this paper, we use a Neyman-Pearson approach [19] to maximize the
probability of detection $P_d$ subject to the constraint that the probability of false alarm $P_f \leq \alpha$.
To simplify the analysis, we assume that all videos in the database are equally likely to correspond to
a query. In situations where some popular videos may be queried more often than others, the analysis
can be applied by appropriately modifying the prior probabilities. With this assumption, the likelihood
ratio test in Eqn. (4) becomes:
$$
\frac{\sum_{i=1}^{N} p(\mathbf{Y} \mid \mathbf{X}^{(i)}) \, p(\mathbf{X}^{(i)} \mid H_1)}{p(\mathbf{Y} \mid H_0)} > \tau''.
$$
Substituting $p(\mathbf{Y} \mid H_0) = \frac{1}{2^L}$, $p(\mathbf{Y} \mid \mathbf{X}^{(i)}) = p^{d_i}(1 - p)^{L - d_i}$, and $p(\mathbf{X}^{(i)} \mid H_1) = \frac{1}{N}$, we get:
$$
\sum_{i=1}^{N} \left( p^{\frac{d_i}{L}} (1 - p)^{1 - \frac{d_i}{L}} \right)^{L} > \tau' \tag{5}
$$
where the constants have been absorbed into the threshold $\tau'$. We note that the left hand side is a sum
of exponentials, and for a reasonably large $L$, only the largest term would be relevant. Further, since
$p^x(1 - p)^{1 - x}$ is a decreasing function of $x$ for $p < 0.5$, the largest term in the left hand side of Eqn. (5)
would be the one with the smallest value of $d_i$. Thus, we arrive at the decision rule:
$$
\delta_D =
\begin{cases}
1 & \text{if } d_{\min} < \tau, \\
1 \text{ with probability } q & \text{if } d_{\min} = \tau, \\
0 & \text{otherwise,}
\end{cases} \tag{6}
$$
where $d_{\min} = \min_{i = 1, 2, \ldots, N} d_i$. Here $\tau$ is an integer threshold expressed in terms of the Hamming distance,
and $\tau$ and $q$ are chosen to achieve a desired probability of false alarm $\alpha$. Based on this decision rule, the
query is detected as being present in the database ($\delta_D = 1$), if the minimum Hamming distance between
the fingerprint of the query and the fingerprints in the database is less than a specified threshold $\tau$.
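The decision rule in Eqn. (6) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: fingerprints are assumed to be stored as Python integers (one bit per fingerprint bit), and the names `hamming` and `detect` are hypothetical.

```python
import random

def hamming(a, b):
    """Hamming distance between two fingerprints stored as integers."""
    return bin(a ^ b).count("1")

def detect(y, database, tau, q=0.0, rng=random):
    """Randomized minimum-distance detector in the form of Eqn. (6).

    Returns 1 (query declared present in the database) or 0.
    `q` is the randomization probability used when d_min equals tau.
    """
    d_min = min(hamming(y, x) for x in database)
    if d_min < tau:
        return 1
    if d_min == tau:
        return 1 if rng.random() < q else 0
    return 0

# Toy 4-bit example (values chosen only for illustration).
database = [0b0000, 0b1111]
decision = detect(0b0001, database, tau=2)
```

The linear scan over the database is used only for clarity; for databases with millions of entries an indexing structure would be needed, which is outside the scope of this sketch.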
1) Computing $P_d$ and $P_f$: The probability of false alarm $P_f$ for a threshold $\tau$ is given by
$P_f(\tau) = \Pr(d_{\min} < \tau \mid H_0) + q \Pr(d_{\min} = \tau \mid H_0)$. To compute the value of $P_f(\tau)$, consider the Hamming distance
between $\mathbf{Y}$ and $\mathbf{X}^{(i)}$, which can be expressed as $d_i = d(\mathbf{Y}, \mathbf{X}^{(i)}) = \mathrm{wt}(\mathbf{Y} \oplus \mathbf{X}^{(i)})$, where $\mathrm{wt}(\cdot)$ denotes
the Hamming weight of a binary vector and $\oplus$ denotes addition over the binary field (XOR). Under
$H_0$, since each bit of $\mathbf{Y}$ and $\mathbf{X}^{(i)}$ is equally likely to be 0 or 1, each component of $\mathbf{Y} \oplus \mathbf{X}^{(i)}$ is also
Bernoulli(0.5). The probability distribution of $d_i = \mathrm{wt}(\mathbf{Y} \oplus \mathbf{X}^{(i)})$ thus corresponds to the weight of a
random binary vector with i.i.d. uniform entries, which is a binomial distribution with parameters $L$ and
0.5. Denote the probability mass function (p.m.f.) of a binomial random variable with parameters $L$ and
0.5 by $f_0(k) \triangleq \frac{1}{2^L} \binom{L}{k}$ and the tail probability by $F_0(k) \triangleq \sum_{j=k}^{L} f_0(j)$. Then $\Pr(d_i = k \mid H_0) = f_0(k)$
and $\Pr(d_i \geq k \mid H_0) = F_0(k)$.
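As the fingerprints $\mathbf{X}^{(i)}$, $i = 1, 2, \ldots, N$ are independent, we have
$\Pr(d_{\min} \geq \tau \mid H_0) = \prod_{i=1}^{N} \Pr(d_i \geq \tau \mid H_0) = [F_0(\tau)]^N$. The probability of false alarm can now be written as
$$
\begin{aligned}
P_f(\tau) &= \left(1 - [F_0(\tau)]^N\right) + q\left([F_0(\tau)]^N - [F_0(\tau + 1)]^N\right) \\
&= 1 - (1 - q)[F_0(\tau)]^N - q[F_0(\tau + 1)]^N.
\end{aligned} \tag{7}
$$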
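Eqn. (7) can be evaluated numerically as follows. This is a hedged sketch, not the authors' code: the function names are hypothetical, and the use of `log1p` to evaluate $[F_0(\tau)]^N$ for very large $N$ is an implementation choice made here to limit floating-point round-off.

```python
from math import comb, exp, log1p

def lower_tail_half(L, k):
    """Pr(Binomial(L, 1/2) < k), i.e. 1 - F0(k), as a float."""
    return sum(comb(L, j) for j in range(min(k, L + 1))) / 2 ** L

def pow_one_minus(eps, N):
    """(1 - eps)**N, evaluated via log1p so tiny eps and huge N behave well."""
    if eps >= 1.0:
        return 0.0
    return exp(N * log1p(-eps))

def false_alarm(L, N, tau, q=1.0):
    """Eqn. (7): Pf = 1 - (1 - q) * F0(tau)**N - q * F0(tau + 1)**N."""
    f0_tau_pow = pow_one_minus(lower_tail_half(L, tau), N)        # F0(tau)^N
    f0_next_pow = pow_one_minus(lower_tail_half(L, tau + 1), N)   # F0(tau+1)^N
    return 1.0 - (1.0 - q) * f0_tau_pow - q * f0_next_pow

# Illustrative call with parameters of the size discussed in the text.
pf_example = false_alarm(L=256, N=2 ** 30, tau=70, q=1.0)
```

For a Neyman-Pearson design one would search over integer `tau` and then solve for `q` so that `false_alarm` equals the target $\alpha$; that search is not shown here.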
To compute the probability of detection, denote the p.m.f. of a binomial random variable with
parameters $L$ and $p$ by $f_1(k) \triangleq \binom{L}{k} p^k (1 - p)^{L - k}$ and the tail probability by $F_1(k) \triangleq \sum_{j=k}^{L} f_1(j)$.
The probability of detection is given as $P_d(\tau) = \Pr(d_{\min} < \tau \mid H_1) + q \Pr(d_{\min} = \tau \mid H_1)$. Suppose
that $H_1$ is true and that the query video is actually a distorted version of video $V_s$. As the noise is
assumed to change each fingerprint bit independently with probability $p$, $\Pr(d_s = k \mid H_1, V_s) = f_1(k)$ and
$\Pr(d_s \geq \tau \mid H_1, V_s) = F_1(\tau)$. For $i \neq s$, since $\mathbf{X}^{(i)}$ is independent of $\mathbf{Y}$ and has i.i.d. equally likely bits,
$\mathbf{Y} \oplus \mathbf{X}^{(i)}$ has i.i.d. Bernoulli(0.5) components. Thus the distance $d_i = \mathrm{wt}(\mathbf{Y} \oplus \mathbf{X}^{(i)})$, $i \neq s$, follows a
binomial distribution with parameters $L$ and 0.5, which is the same as the distribution under $H_0$. Now
consider
$$
\begin{aligned}
\Pr(d_{\min} \geq \tau \mid H_1, V_s) &= \Pr(d_s \geq \tau \mid H_1, V_s) \prod_{i \neq s} \Pr(d_i \geq \tau \mid H_1, V_s) \\
&= F_1(\tau)[F_0(\tau)]^{N-1}.
\end{aligned}
$$
The probability of detection can then be written as
$$
\begin{aligned}
P_d(\tau) &= 1 - F_1(\tau)[F_0(\tau)]^{N-1} + q\left( F_1(\tau)[F_0(\tau)]^{N-1} - F_1(\tau + 1)[F_0(\tau + 1)]^{N-1} \right) \\
&= 1 - (1 - q) F_1(\tau)[F_0(\tau)]^{N-1} - q F_1(\tau + 1)[F_0(\tau + 1)]^{N-1}.
\end{aligned} \tag{8}
$$
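A direct numerical evaluation of Eqn. (8) might look like the sketch below. The function names are hypothetical, and plain floating-point arithmetic is assumed, so the sketch is intended for moderate values of $L$ and $N$ rather than the $N = 2^{30}$ regime.

```python
from math import comb

def upper_tail(L, p, k):
    """Pr(Binomial(L, p) >= k); F0 uses p = 0.5 and F1 uses the noise level p."""
    return sum(comb(L, j) * p ** j * (1 - p) ** (L - j)
               for j in range(max(k, 0), L + 1))

def detection_prob(L, N, p, tau, q=1.0):
    """Eqn. (8): Pd = 1 - (1-q) F1(tau) F0(tau)^(N-1) - q F1(tau+1) F0(tau+1)^(N-1)."""
    miss_at_tau = upper_tail(L, p, tau) * upper_tail(L, 0.5, tau) ** (N - 1)
    miss_at_next = (upper_tail(L, p, tau + 1)
                    * upper_tail(L, 0.5, tau + 1) ** (N - 1))
    return 1.0 - (1.0 - q) * miss_at_tau - q * miss_at_next

# Small illustrative example (parameter values are arbitrary).
pd_small = detection_prob(L=16, N=10, p=0.1, tau=4)
```

Setting `p = 0.5` makes $F_1 = F_0$, in which case this expression reduces to Eqn. (7), consistent with the statement that $P_d$ approaches $P_f$ as $p$ approaches 0.5.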
2) Numerical Results: In Fig. 2, we show the receiver operating characteristics (ROC) computed using
Eqns. (7) and (8) for various values of the parameters $L$, $N$, and $p$. Fig. 2(a) shows the ROC curves as
the distortion parameter $p$ is increased from 0.2 to 0.3 for $N = 2^{30}$ fingerprints in the database, each of
length 256 bits. We observe that as the distortion parameter $p$ increases, the probability $P_d$ of detecting a
copyrighted video reduces for a given probability of false alarm $P_f$. As $p$ approaches 0.5, the probability
of detection approaches the lower bound $P_d = P_f$. Fig. 2(b) examines the influence of the number of
fingerprints in the database $N$ on the detector performance for a fixed fingerprint length $L = 256$ bits
and distortion parameter $p = 0.3$. As $N$ increases, the probability of false alarm increases. As a result,
for a given $P_d$, the $P_f$ is higher, or equivalently, for a fixed $P_f$, the probability of detection is lower.
Fig. 2(c) shows that under a given distortion, the detector performance can be improved by using a longer
fingerprint. As the fingerprint length is increased, $P_d$ increases for a given $P_f$.
B. Identification Problem
We now consider the identification problem for binary fingerprinting schemes, where the detector is
interested in identifying the specific video that the query corresponds to. As discussed in Section II, this
scenario can be modeled as a multiple hypothesis test:
$$
\begin{aligned}
H_0 &: \mathbf{Y} \neq \mathbf{X}^{(i)} + \mathbf{n}, \text{ for } i = 1, 2, \ldots, N, \\
H_i &: \mathbf{Y} = \mathbf{X}^{(i)} + \mathbf{n}, \quad i = 1, 2, \ldots, N.
\end{aligned} \tag{9}
$$
[Figure 2 omitted. ROC curves, $P_d$ versus $P_f$: (a) $N = 2^{30}$, $L = 256$ bits, for $p = 0.2, 0.25, 0.3$; (b) $p = 0.3$, $L = 256$ bits, for $N = 2^{10}, 2^{20}, 2^{30}$; (c) $N = 2^{30}$, $p = 0.3$, for $L = 128, 256, 512$.]
Fig. 2. Receiver Operating Characteristics (ROC) for the binary hypothesis testing problem obtained from theoretical analysis.
As before, we assume that the fingerprint bits are i.i.d. and equally likely to be 0 or 1, the noise
independently changes each bit with probability $p$ and that the prior probability of each hypothesis is the
same. Under this model, the Maximum Likelihood (ML) decision rule can be derived as:
$$
\delta_I =
\begin{cases}
i & \text{if } d_i \leq \tau \text{ and } i = \arg\min_{j = 1, 2, \ldots, N} d_j, \\
0 & \text{otherwise,}
\end{cases} \tag{10}
$$
where $d_i = d(\mathbf{Y}, \mathbf{X}^{(i)})$. If fingerprints of several copyrighted videos have the same distance to the
fingerprint of the query video $\mathbf{Y}$, one of them is chosen randomly as the match.
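The identification rule in Eqn. (10), including the random tie-break, can be sketched as follows. The sketch assumes fingerprints stored as Python integers and uses hypothetical function names; it is meant only to make the rule concrete.

```python
import random

def hamming(a, b):
    """Hamming distance between two fingerprints stored as integers."""
    return bin(a ^ b).count("1")

def identify(y, database, tau, rng=random):
    """Rule of the form of Eqn. (10).

    Returns the 1-based index of the matched fingerprint, or 0 when the
    nearest fingerprint is farther than tau (query declared not in database).
    Ties at the minimum distance are broken uniformly at random.
    """
    dists = [hamming(y, x) for x in database]
    d_min = min(dists)
    if d_min > tau:
        return 0
    ties = [i + 1 for i, d in enumerate(dists) if d == d_min]
    return rng.choice(ties)

# Toy 8-bit database (values chosen only for illustration).
database = [0b00000000, 0b11110000, 0b11111111]
match = identify(0b11100000, database, tau=2)
```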
We now compute the performance metrics for the ML detector $\delta_I$. The probability of false alarm $P_f$
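is given by
$$
\begin{aligned}
P_f(\tau) &= \Pr(\text{at least one of } d_1, d_2, \ldots, d_N \leq \tau \mid H_0) \\
&= 1 - \Pr(\text{none of } d_1, d_2, \ldots, d_N \leq \tau \mid H_0) \\
&= 1 - [F_0(\tau + 1)]^N.
\end{aligned}
$$
As the fingerprints $\{\mathbf{X}^{(i)}\}$ are identically distributed and equally likely to be queried, and the distribution
of the noise $\mathbf{n}$ under each of the hypotheses is the same, the overall probability of correct identification
$P_c$ will be equal to the probability of correct identification under any given hypothesis, for example $H_1$.
Under this hypothesis, $d_1$ has p.m.f. $f_1$ and $d_i$, $i \neq 1$, has p.m.f. $f_0$, so that:
$$
\begin{aligned}
P_c(\tau) &= \Pr(\delta_I = 1 \mid H_1) \\
&= \Pr\Big(d_1 \leq \tau \wedge d_1 < \min_{i > 1} d_i \,\Big|\, H_1\Big) + \Pr\Big(\min_{i > 1} d_i = d_1 \wedge d_1 \leq \tau \wedge \delta_I = 1 \,\Big|\, H_1\Big) \\
&= \sum_{j=0}^{\tau} f_1(j) \left[ \{F_0(j + 1)\}^{N-1} + \sum_{k=1}^{N-1} \frac{1}{k + 1} \binom{N - 1}{k} [f_0(j)]^k [F_0(j + 1)]^{N-1-k} \right].
\end{aligned}
$$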
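Similarly, the probability of misclassification can be computed as:
$$
\begin{aligned}
P_m(\tau) &= \Pr(\delta_I \in \{2, 3, \ldots, N\} \mid H_1) \\
&= \Pr\Big(\min_{i > 1} d_i \leq \tau \wedge \min_{i > 1} d_i < d_1 \,\Big|\, H_1\Big) + \Pr\Big(\min_{i > 1} d_i = d_1 \wedge d_1 \leq \tau \wedge \delta_I > 1 \,\Big|\, H_1\Big) \\
&= \sum_{j=0}^{\tau} \left[ \sum_{k=1}^{N-1} \left\{ \binom{N - 1}{k} [f_0(j)]^k [F_0(j + 1)]^{N-1-k} \times \left( F_1(j + 1) + \frac{k}{k + 1} f_1(j) \right) \right\} \right].
\end{aligned}
$$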
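The expressions for $P_c(\tau)$ and $P_m(\tau)$ can be checked numerically for small parameters. The sketch below uses exact rational arithmetic so that a consistency identity can be tested without round-off: under $H_1$, the events "correct identification" and "misclassification" together make up the event $d_{\min} \leq \tau$, so $P_c + P_m$ should equal $1 - F_1(\tau + 1)[F_0(\tau + 1)]^{N-1}$. The function names are hypothetical and the code is only practical for small $L$ and $N$.

```python
from fractions import Fraction
from math import comb

def pmf(L, p, k):
    """Binomial(L, p) probability mass at k, with p given as a Fraction."""
    if not 0 <= k <= L:
        return Fraction(0)
    return comb(L, k) * p ** k * (1 - p) ** (L - k)

def tail(L, p, k):
    """Pr(Binomial(L, p) >= k)."""
    return sum((pmf(L, p, j) for j in range(max(k, 0), L + 1)), Fraction(0))

def identification_probs(L, N, p, tau):
    """Return (Pc, Pm) from the closed-form sums, as exact Fractions."""
    p = Fraction(p)
    half = Fraction(1, 2)
    Pc, Pm = Fraction(0), Fraction(0)
    for j in range(min(tau, L) + 1):
        f0, f1 = pmf(L, half, j), pmf(L, p, j)
        G0, G1 = tail(L, half, j + 1), tail(L, p, j + 1)
        Pc += f1 * G0 ** (N - 1)                  # true match strictly closest
        for k in range(1, N):                     # k other fingerprints at distance j
            ways = comb(N - 1, k) * f0 ** k * G0 ** (N - 1 - k)
            Pc += f1 * ways / (k + 1)             # tie, true match picked
            Pm += ways * (G1 + Fraction(k, k + 1) * f1)
    return Pc, Pm

# Small illustrative case (parameter values are arbitrary).
Pc, Pm = identification_probs(L=6, N=4, p=Fraction(1, 5), tau=2)
```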
Fig. 3 shows the influence of the various parameters on the identification accuracy of the ML detector in
Eqn. (10). Fig. 3(a) shows the influence of the distortion parameter $p$. We observe that as $p$ increases, the
probability of correct identification $P_c$ at a given false alarm probability $P_f$ reduces, and the probability of
misclassification $P_m$ increases. The influence of the number of videos $N$ on the accuracy of identification
is shown in Fig. 3(b). As the number of videos in the database increases, the probability of false alarm
increases, or equivalently, at a given $P_f$, the value of $P_c$ is lower. Fig. 3(c) shows that the probability of
correct identification under a given distortion $p$ and a given $P_f$ can be increased by increasing the hash
length. Thus, given the number of videos $N$ and a desired probability of false alarm $P_f$, the content
identification system can be made more robust by choosing a longer hash length $L$. These results are
similar to those obtained for the detection problem in the previous section.
[Figure 3 omitted. $P_c$ and $P_m$ versus $P_f$: (a) $N = 2^{30}$, $L = 256$ bits, for $p = 0.2, 0.25, 0.27$; (b) $p = 0.25$, $L = 256$ bits, for $N = 2^{20}, 2^{30}$; (c) $N = 2^{30}$, $p = 0.25$, for $L = 256, 1024$.]
Fig. 3. ROC curves for the multiple hypothesis testing problem obtained from theoretical analysis.
IV. ERROR EXPONENTS AND PERFORMANCE BOUNDS
In Section III, we have derived expressions for the probability of correct identification and false alarm
for a given set of parameters and examined the tradeoff between identification accuracy, robustness and
the fingerprint length. In practice, we are often interested in choosing system parameters to ensure that the
probability of error is below a certain threshold. While the expressions for $P_d$ and $P_f$ in Section III can
be used to choose the parameters, the equations are non-linear and cannot be solved easily. Hence, in this
section, we derive bounds on the achievable error probabilities using fingerprints of a given length and
provide guidelines for choosing the fingerprint length required to achieve a desired detection accuracy.
We provide an intuitive interpretation of these bounds and show that content identification with a false
alarm requirement is analogous to the problem of joint source channel coding.
A. Error Exponents
Consider the detection problem where the detector is only interested in deciding whether a query
video is a modified version of some video in the database or not. As before, we examine the case of
i.i.d. binary fingerprints with the corresponding decision rule given by Eqn. (6). As we are interested
in deriving bounds, we assume, for simplicity, that $q = 1$ in the decision rule. The probability of false
alarm is given by
$$
\begin{aligned}
P_f(\tau) &= \Pr\left( \bigcup_{i=1}^{N} \{ d(\mathbf{Y}, \mathbf{X}^{(i)}) < \tau \} \,\Big|\, H_0 \right) \\
&\leq \sum_{i=1}^{N} \Pr\left( d(\mathbf{Y}, \mathbf{X}^{(i)}) < \tau \mid H_0 \right) \\
&= N \Pr\left( d(\mathbf{Y}, \mathbf{X}^{(1)}) < \tau \mid H_0 \right),
\end{aligned} \tag{11}
$$
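where we have used the union bound and the fact that the fingerprints $\mathbf{X}^{(i)}$ are i.i.d. As discussed in
the previous section, under $H_0$, $\mathbf{Y}$ and $\mathbf{X}^{(1)}$ are independent with each component being equally likely
to be 0 or 1. Thus, the XOR of $\mathbf{Y}$ and $\mathbf{X}^{(1)}$ is uniformly distributed over all binary strings of length
$L$. The Hamming distance $d(\mathbf{Y}, \mathbf{X}^{(1)}) = \mathrm{wt}(\mathbf{Y} \oplus \mathbf{X}^{(1)})$ and as a result,
$$
\Pr\big( d(\mathbf{Y}, \mathbf{X}^{(1)}) < \tau \mid H_0 \big) = \frac{1}{2^L} \sum_{\mathbf{x} \in \{0,1\}^L} \mathbb{1}(\{\mathrm{wt}(\mathbf{x}) < \tau\}) = \frac{1}{2^L} S_{L,\tau},
$$
where $\mathbb{1}(\cdot)$ is the indicator function and $S_{L,\tau}$ is the number of
binary vectors within a sphere of radius $\tau$ in $\{0,1\}^L$. Let $\lambda = \frac{\tau}{L}$
be the normalized radius. The volume
of the sphere $S_{L,L\lambda}$, for $\lambda \le \frac{1}{2}$, can be bounded as
$$
S_{L,L\lambda} \le 2^{L h(\lambda)},
$$
where $h(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ is the entropy function [22]. By combining this result with
Eqn. (11), the probability of false alarm can be bounded from above as
$$
\begin{aligned}
P_f(L\lambda) &\le N 2^{-L} S_{L,\tau} \\
&\le N 2^{-L(1 - h(\lambda))} \qquad (12)
\end{aligned}
$$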
where $\tau = L\lambda$. The same result can be obtained by applying the Chernoff bound to upper bound
$\Pr(d(\mathbf{Y}, \mathbf{X}^{(1)}) < L\lambda)$ for $\lambda < \frac{1}{2}$, with $d(\mathbf{Y}, \mathbf{X}^{(1)})$ being a binomial random variable with parameters
$L$ and $\frac{1}{2}$ [23]. However, we prefer this approach as it provides an intuitive explanation of the bounds,
which is discussed in Section IV-C.
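As a numerical illustration of Eqn. (12), the following minimal sketch counts the sphere volume $S_{L,\tau}$ exactly and compares the union bound $N 2^{-L} S_{L,\tau}$ with the entropy bound $N 2^{-L(1-h(\lambda))}$. The function names and the example values of $L$, $\tau$ and $N$ are illustrative assumptions, not values taken from the paper.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy function h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def sphere_volume(L: int, tau: int) -> int:
    """S_{L,tau}: number of length-L binary strings with Hamming weight < tau."""
    return sum(math.comb(L, w) for w in range(tau))

def false_alarm_bounds(L: int, tau: int, N: int):
    """Return (N * S_{L,tau} / 2^L, N * 2^{-L(1 - h(tau/L))}) as in Eqn. (12)."""
    lam = tau / L
    union = N * sphere_volume(L, tau) / 2**L
    entropy = N * 2.0 ** (-L * (1 - binary_entropy(lam)))
    return union, entropy

if __name__ == "__main__":
    # Hypothetical parameter choices, all with lambda = tau / L = 0.25.
    for L, tau, N in [(64, 16, 1000), (128, 32, 10**6), (256, 64, 10**9)]:
        union, entropy = false_alarm_bounds(L, tau, N)
        print(f"L={L} tau={tau} N={N}: union={union:.3e} entropy={entropy:.3e}")
```

For $\lambda \le \frac{1}{2}$ the first value never exceeds the second, which is the inequality used to pass from the first line of Eqn. (12) to the second.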
We next consider the probability of a false negative (missed detection) $P_{fn} = 1 - P_d$. Suppose that
$\mathbf{X}^{(i)}$ is the fingerprint of a video $V_i$ in the database and that $\mathbf{Y}$ is the fingerprint of a modified version of
$V_i$. A false negative occurs if no fingerprint in the database is within a distance $\tau$ of the query fingerprint
$\mathbf{Y}$. The probability of a false negative can thus be bounded by the probability that the distance between
$\mathbf{Y}$ and the original fingerprint $\mathbf{X}^{(i)}$ is larger than $\tau$:
$$
P_{fn}(\tau) \le \Pr\big( d(\mathbf{Y}, \mathbf{X}^{(i)}) > \tau \mid H_1 \big).
$$
Since $\mathbf{Y}$ is generated by flipping each bit of $\mathbf{X}^{(i)}$ with a probability $p$, $d(\mathbf{Y}, \mathbf{X}^{(i)})$ is distributed according
to a binomial random variable with parameters $L$ and $p$, so that $P_{fn} \le \Pr(\mathrm{Binomial}(L, p) > \tau)$. By the
Chernoff bound [23], the tail probability of the binomial distribution can be bounded as
$$
\Pr(\mathrm{Binomial}(L, p) \ge L\lambda) \le 2^{-L D(\lambda \| p)},
$$
where $D(\lambda \| p)$ is the Kullback-Leibler distance between two Bernoulli distributions with parameters $\lambda$
and $p$ respectively. Thus, the probability of false negative when $\tau = L\lambda$ can be bounded as
$$
P_{fn}(L\lambda) \le 2^{-L D(\lambda \| p)}. \qquad (13)
$$
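The bound in Eqn. (13) can be checked against the exact binomial tail. The sketch below is illustrative only: the helper names and the example values ($L = 128$, $p = 0.2$, $\lambda = 0.35$) are assumptions, and the Chernoff bound is applied in the regime $\lambda > p$, where the threshold lies above the mean $Lp$.

```python
import math

def kl_bernoulli(lam: float, p: float) -> float:
    """D(lam || p) in bits between Bernoulli(lam) and Bernoulli(p), 0 < p < 1."""
    total = 0.0
    for a, b in ((lam, p), (1 - lam, 1 - p)):
        if a > 0:  # the term 0 * log(0 / b) is taken as 0
            total += a * math.log2(a / b)
    return total

def binomial_upper_tail(L: int, p: float, k: int) -> float:
    """Exact Pr(Binomial(L, p) >= k)."""
    return sum(math.comb(L, j) * p**j * (1 - p) ** (L - j) for j in range(k, L + 1))

def chernoff_bound(L: int, p: float, lam: float) -> float:
    """Right-hand side of Eqn. (13): 2^{-L D(lam || p)}, used here for lam > p."""
    return 2.0 ** (-L * kl_bernoulli(lam, p))

if __name__ == "__main__":
    L, p, lam = 128, 0.2, 0.35  # hypothetical example values
    exact = binomial_upper_tail(L, p, math.ceil(lam * L))
    print(f"exact tail = {exact:.3e}, Chernoff bound = {chernoff_bound(L, p, lam):.3e}")
```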
Eqns. (12) and (13) show the tradeoff between the probability of false alarm $P_f$, the probability of
missed detection $P_{fn}$ and the number of fingerprints $N$ in the database. For example, given $N$ videos,
reducing $P_f$ would require $1 - h(\lambda)$ to be as large as possible, or equivalently, $\lambda$ must be as small
as possible. However, reducing $\lambda$ leads to an increase in $P_{fn}$. To further examine this tradeoff, let us
define the rate $R$ through $N = 2^{LR}$, the false alarm error exponent as $E_f = 1 - h(\lambda) - R$, and the false negative
error exponent as $E_{fn} = D(\lambda \| p)$, so that $P_f \le 2^{-L E_f}$ and $P_{fn} \le 2^{-L E_{fn}}$. From the properties of the
Chernoff bound, we know that these bounds are asymptotically tight, i.e., $\lim_{L \to \infty} -\frac{1}{L} \log_2 P_{fn} = E_{fn}$
as defined above. In the Neyman-Pearson setting, given a certain number of videos $N$ and fingerprint length
$L$, suppose we wish to ensure that $P_f \le \epsilon = 2^{-L\Delta}$ and minimize $P_{fn}$. This is equivalent to maximizing
$E_{fn}$ for a fixed rate $R$ while ensuring that $E_f \ge \Delta$:
$$
\max_{\lambda} \; E_{fn} = D(\lambda \| p) \quad \text{subject to} \quad 1 - h(\lambda) - R \ge \Delta. \qquad (14)
$$
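The optimization in Eqn. (14) can be solved numerically with a short sketch such as the one below. It assumes $p < \frac{1}{2}$ and restricts $\lambda$ to $[p, \frac{1}{2}]$; on that interval both $h(\lambda)$ and $D(\lambda \| p)$ are increasing, so the constraint is active at the optimum and $\lambda^*$ solves $h(\lambda^*) = 1 - R - \Delta$. The function names, the bisection tolerance, and the convention of returning an exponent of 0 when the constraint forces $\lambda \le p$ are illustrative choices, not part of the paper.

```python
import math

def binary_entropy(x: float) -> float:
    """Entropy function h(x) in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def kl_bernoulli(lam: float, p: float) -> float:
    """D(lam || p) in bits, for 0 < lam < 1 and 0 < p < 1."""
    return lam * math.log2(lam / p) + (1 - lam) * math.log2((1 - lam) / (1 - p))

def inverse_entropy(target: float, tol: float = 1e-12) -> float:
    """Return lam in [0, 1/2] with h(lam) = target, found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def max_false_negative_exponent(p: float, R: float, delta: float):
    """Solve Eqn. (14); returns (E_fn, lambda*). Assumes 0 < p < 1/2."""
    budget = 1.0 - R - delta          # constraint reads h(lambda) <= budget
    if budget <= binary_entropy(p):
        return 0.0, p                 # threshold cannot exceed the mean Lp
    lam = inverse_entropy(min(budget, 1.0))
    return kl_bernoulli(lam, p), lam

if __name__ == "__main__":
    for R in (0.001, 0.01, 0.05):     # rates used in Fig. 4
        e_fn, lam = max_false_negative_exponent(p=0.3, R=R, delta=0.005)
        print(f"R={R}: lambda*={lam:.4f}, E_fn={e_fn:.4f}")
```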
Fig. 4 shows the maximum achievable false negative error exponent $E_{fn}$ as a function of the false
alarm error exponent $\Delta$, for a fixed rate $R$, when $p = 0.3$. From the figure, we observe that at a given
rate $R$, $E_{fn}$ reduces as a function of $\Delta$, which implies that for a fixed number of fingerprints in the
database, reducing the false alarms leads to an increase in the number of missed detections, and vice
versa. From the figure, we also observe that for a fixed value of $\Delta$, $E_{fn}$ reduces as $N$ increases. This
trend matches the results presented in Section III.
[Figure 4: plot of $E_{fn}$ (vertical axis, 0.01 to 0.11) versus $\Delta$ (horizontal axis, 0 to 0.01) for $R = 0.001$, $R = 0.01$ and $R = 0.05$.]
Fig. 4. Error exponent for false negatives as a function of the rate for different values of the false positive error exponent.
To ensure that $P_{fn} < 0.5$, the decision threshold $\tau = L\lambda$ should be greater than the mean of the
binomial distribution $Lp$. As the entropy function $h(\lambda)$ is monotonically increasing for $\lambda < 0.5$, this
would in turn imply that the false alarm exponent $\Delta = 1 - h(\lambda) - R \le 1 - h(p) - R$. Hence, to ensure
that $P_f \le \epsilon = 2^{-L\Delta}$, we require that $R + \Delta \le 1 - h(p)$, or equivalently,
$$
\frac{1}{L} \log_2 \frac{N}{\epsilon} \le 1 - h(p). \qquad (15)
$$
Thus, given a video database of size $N$, to ensure that the probability of false alarm $P_f \le \epsilon$ when the
attack alters on average a fraction $p$ of the hash bits, the length of the fingerprints used for identification
should be chosen large enough to satisfy Eqn. (15). The corresponding probability of false negative is
then guaranteed to be less than $2^{-L E_{fn}}$, where $E_{fn}$ can be computed from Eqn. (14).
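Rearranging Eqn. (15) gives the guideline $L \ge \log_2(N/\epsilon) / (1 - h(p))$. A minimal sketch of this calculation follows; the function name and the example database size and false alarm target are hypothetical.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy function h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def min_fingerprint_length(N: int, eps: float, p: float) -> int:
    """Smallest integer L with (1/L) * log2(N / eps) <= 1 - h(p), per Eqn. (15)."""
    margin = 1.0 - binary_entropy(p)
    if margin <= 0:
        raise ValueError("1 - h(p) must be positive (p must differ from 1/2)")
    return math.ceil(math.log2(N / eps) / margin)

if __name__ == "__main__":
    # Hypothetical example: about a million videos, P_f <= 2^-10, p = 0.3.
    print(min_fingerprint_length(N=2**20, eps=2**-10, p=0.3))
```

The required length grows only logarithmically in $N/\epsilon$, but it increases sharply as $p$ approaches $\frac{1}{2}$, where $1 - h(p)$ tends to zero.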
B. Bounds on the Error Probabilities for the Identification Problem
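Similar bounds may be derived on the various errors that may occur in the identification problem.
As the expression for the false alarm in the identification problem is identical to that in the detection
problem, the bound on the false alarm remains the same, i.e., $P_f \le 2^{-L(1 - h(\lambda) - R)}$. Now consider the
probability of correct classification $P_c$. Given that $H_i$ is true for some $i \ne 0$, the detector does
not make a correct identification if $d(\mathbf{Y}, \mathbf{X}^{(i)}) > L\lambda$ or if $d(\mathbf{Y}, \mathbf{X}^{(j)}) < d(\mathbf{Y}, \mathbf{X}^{(i)})$ for some $j \ne i$.
Thus the probability of not making a correct decision can be bounded as
$$
\begin{aligned}
1 - P_c &\le \Pr\big( d(\mathbf{Y}, \mathbf{X}^{(i)}) > L\lambda \mid H_i \big) + \Pr\big( d(\mathbf{X}^{(i)}, \mathbf{X}^{(j)}) < 2L\lambda \text{ for some } j \ne i \big) \\
&\le \Pr\big( d(\mathbf{Y}, \mathbf{X}^{(i)}) > L\lambda \mid H_i \big) + \sum_{j \ne i} \Pr\big( d(\mathbf{X}^{(i)}, \mathbf{X}^{(j)}) < 2L\lambda \big) \\
&\le 2^{-L D(\lambda \| p)} + 2^{-L(1 - h(2\lambda) - R)},
\end{aligned}
$$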
where the second term corresponds to the bound on the probability that two uniformly chosen binary
strings of length $L$ have a distance less than $2L\lambda$ and is obtained using the Chernoff bound, or by
considering the volume of a binary sphere of radius $2L\lambda$. We also note that $P(\delta_I = 0 \mid H_i, i \ne 0) + P_m + P_c = 1$,
as these are the only possible decisions that a detector can make. Therefore,
$P_m < 1 - P_c \le 2^{-L D(\lambda \| p)} + 2^{-L(1 - h(2\lambda) - R)}$.
C. A Joint Source Channel-Coding Perspective
In the previous subsections, we have examined the relation between the rate $R$, the false negative error
exponent $E_{fn}$, and the false alarm error exponent $\Delta$. We now provide an intuitive explanation of the
theoretical results obtained.
Consider the space of all binary strings of length $L$, represented by the dashed circle in Fig. 5. Let the
$N$ binary fingerprints $\mathbf{X}^{(i)}$, $i = 1, 2, \ldots, N$ present in the database be represented by the solid dots in
the figure and the circles around the dots represent the detection regions for the respective fingerprints.
Any query fingerprint that falls within such a sphere is identified as the fingerprint represented by the
center of the sphere. The number of such spheres controls the rate $R$, and the volume of the spheres
determines the probability of false alarm and missed detections.
To ensure a low probability of false negatives when the probability of a bit flipping is $p$, the detection
region around each fingerprint should include all binary strings that are within a Hamming distance
$Lp$ from the fingerprint. The volume of such a sphere of radius $Lp$ is $S_{L,Lp}$, which for large $L$ is
approximately $S_{L,Lp} \approx 2^{L h(p)}$. As we have assumed that in the null hypothesis, the fingerprints of the
videos absent from the database are uniformly distributed over the entire space of binary strings of length
$L$, the probability of false alarm is approximately
$$
P_f = \frac{N \times S_{L,Lp}}{2^L} \;\Rightarrow\; \epsilon \approx \frac{N 2^{L h(p)}}{2^L}, \qquad (16)
$$
which upon rearrangement gives Eqn. (15). To achieve a higher rate, we would like to pack more such
spheres into the binary space, but this would increase the probability of false alarms. Similarly, to reduce
the probability of false negatives, the volume of the decoding region around each fingerprint has to be
increased, which would also increase $P_f$ and reduce the number of spheres that can be packed into the
binary space.
We see that the fingerprinting problem shares some analogies with source and channel coding. In
channel coding, to achieve capacity, we are interested in packing as many spheres as possible into the
binary space such that their overlap is minimum. In source coding with a fidelity criterion (rate-distortion
theory), we are interested in covering the entire space with as few spheres of fixed size as possible. Here,
to minimize the probability of false alarms, we would like to cover the space as sparsely as possible,
but the conflicting objective of increasing the rate requires packing as many spheres as possible. Thus,
fingerprinting can be thought of as being similar to joint source-channel coding.
[Figure 5: diagram with the labels "Space of all binary strings of length L" and "Ball of radius Lp".]
Fig. 5. Error exponent for false negatives as a function of the rate for different values of the false positive error exponent.
V. BINARY FINGERPRINTS WITH CORRELATED BITS
Our analysis so far has been focused on content identification using binary fingerprints with i.i.d. equally
likely bits. This analysis provides useful guidelines for designing and characterizing the performance
bounds of fingerprinting schemes. Many practical fingerprinting schemes, however, generate fingerprints
with correlation among components. While it is possible to include an explicit decorrelation stage to
remove such dependencies and obtain a shorter fingerprint with independent bits, large-scale practical
deployments often have stringent computational requirements. In such cases, it may be preferable to use the
correlated fingerprint bits directly and avoid the additional computation for decorrelation. To capture the correlations
in the fingerprints and the noise introduced through distortion of the content, we propose a Markov
Random Field based model and study the impact of the correlation on the identification performance.
We then describe an approach inspired by statistical physics to compute the probability of errors.
A. Markov Random Fields
Markov Random Fields (MRFs) are a generalization of Markov chains in which time indices are
replaced by space indices [24]. MRFs are undirected graphical models and represent conditional
independence relations among random variables. In this section, we briefly review key concepts of MRFs
that are relevant to modeling content fingerprints.
An MRF consists of an undirected graph $G = (\mathcal{V}, \mathcal{E})$ with a set of nodes $\mathcal{V}$ and a set of edges $\mathcal{E}$ between
nodes. Each node $X \in \mathcal{V}$ represents a random variable, and we will use $X$ to denote the node and the
random variable interchangeably. The vector $\mathbf{X}$ denotes all random variables represented by the MRF. Two
nodes $X_i$ and $X_j$ are said to be neighbors if there is an edge between them, i.e., $(i, j) \in \mathcal{E}$. A set of nodes
$C$ is called a maximal clique if every pair of nodes in $C$ are neighbors and no node in $\mathcal{V} \setminus C$ is a neighbor
of every node in $C$. An energy function $E_C(\{x_C\})$ is associated with every maximal clique $C$ that maps
the values $\{x_C\}$ of the nodes in $C$ to a real number. The joint probability distribution of all the random
variables represented by the MRF is then given as $p(\mathbf{X} = \mathbf{x}) = \frac{1}{Z} \exp\big( -\sum_C E_C(\{x_C\}) \big)$, where $Z$ is a
normalization constant called the partition function. The term in the exponent, $E(\mathbf{x}) = \sum_C E_C(\{x_C\})$,
is sometimes referred to as the energy of the configuration $\mathbf{x}$.
[Figure 6: two panels, (a) and (b); panel (b) carries the labels "Original Fingerprint" and "Noise in Fingerprint".]
Fig. 6. Markov Random Field model for (a) fingerprint components and (b) fingerprint and noise.
MRFs have been used in image processing [25] and computer vision [26] as they can represent local
correlations among random variables. In the next section, we develop a model for content fingerprints
using MRFs to capture local dependencies.
B. Model for a Block-based Fingerprinting Scheme
We model content fingerprints as a Markov Random Field where each fingerprint value is represented
as a node in the MRF, and pairs of nodes that have dependencies are joined by edges. We illustrate
our model using a representative fingerprinting scheme that partitions each video frame into blocks
and extracts one bit from each block [15]. While we use a simple two-dimensional model for ease of
illustration, the analysis can be extended to three-dimensional and more complex models.
Suppose that each video frame of size $PH \times QW$ is partitioned into $PQ$ blocks of size $H \times W$ each
and one bit of the fingerprint is extracted from each block. For example, the fingerprint bit could be
obtained by thresholding the average luminance of a block. Due to underlying correlations among the
blocks of the frame, these bits are likely to be correlated. We represent the bit extracted from each block
as a node in a graph $G_0 = (\mathcal{V}_0, \mathcal{E}_0)$, with the node $X_{i,j}$ representing the bit from the $(i, j)$th block. Each
node may take one of two values $\pm 1$, with bit 'b' represented as $(-1)^b$, and is connected to the four
nearest neighbors, so that the overall graph satisfies 4-connectivity as shown in Fig. 6(a). For convenience,
we use a vector $\mathbf{X}$ to represent the bits $X_{i,j}$, $1 \le i \le P$, $1 \le j \le Q$, which could be obtained by any
form of reordering, such as raster scanning.
As described in Section V-A, the joint probability distribution of the fingerprint can be specified by
defining an energy function for the model. We use the energy function that has been commonly used for
modeling binary images [26]:
$$
E_0(\mathbf{x}) = -h \sum_{i} x_i - \eta \sum_{(j,k) \in \mathcal{E}_0} x_j x_k. \qquad (17)
$$
This corresponds to the 2-D Ising model that has been widely used in statistical physics to model
ferromagnetism arising out of interactions between individual spins. Here $\eta$ controls the correlation
between nodes that are connected and $h$ determines the marginal distribution of the individual bits. A
higher value for $\eta$ would increase the correlation among neighboring bits, and large $h$ would bias the
bits to be $+1$. The joint distribution can then be written as $p_0(\mathbf{x}) = \frac{1}{Z_0} \exp(-E_0(\mathbf{x}))$, with $Z_0$ being the
normalization constant to ensure that the distribution sums to 1.
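For a very small grid, the distribution $p_0(\mathbf{x})$ defined by Eqn. (17) can be evaluated by exhaustive enumeration, which makes the roles of $h$ and $\eta$ concrete. The sketch below is illustrative: the function names, the raster-order node indexing and the $3 \times 3$ example are assumptions, and the enumeration is only feasible for a handful of bits.

```python
import itertools
import math

def grid_edges(P: int, Q: int):
    """Edges of a P x Q 4-connected grid without wrap-around, nodes in raster order."""
    edges = []
    for i in range(P):
        for j in range(Q):
            k = i * Q + j
            if j + 1 < Q:
                edges.append((k, k + 1))   # right neighbour
            if i + 1 < P:
                edges.append((k, k + Q))   # lower neighbour
    return edges

def energy(x, edges, h: float, eta: float) -> float:
    """E_0(x) = -h * sum_i x_i - eta * sum_{(j,k)} x_j x_k, as in Eqn. (17)."""
    return -h * sum(x) - eta * sum(x[j] * x[k] for j, k in edges)

def joint_distribution(P: int, Q: int, h: float, eta: float):
    """Return {configuration: p_0(x)} by brute force over all 2^(PQ) configurations."""
    edges = grid_edges(P, Q)
    configs = list(itertools.product((-1, 1), repeat=P * Q))
    weights = [math.exp(-energy(x, edges, h, eta)) for x in configs]
    Z0 = sum(weights)                      # partition function
    return {x: w / Z0 for x, w in zip(configs, weights)}

if __name__ == "__main__":
    dist = joint_distribution(3, 3, h=0.0, eta=0.3)
    all_plus = (1,) * 9
    checkerboard = tuple((-1) ** (k % 2) for k in range(9))
    print(dist[all_plus], dist[checkerboard])
```

With $\eta > 0$ the uniform configurations are the most probable and the checkerboard pattern the least, reflecting the positive correlation between neighboring bits.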
The above model describes the fingerprint bits obtained from the original video frame. In many practical
applications, fingerprints are extracted from possibly modified versions of the video and may be noisy.
The noise components may be correlated and also dependent on the fingerprint bits. To accommodate
such modifications, we propose a joint model for the noise bits and the fingerprint bits of the original
unmodified video, which is shown in Fig. 6(b). The filled circles represent the noise bits and the open
circles represent the fingerprint bits. The solid edges capture the dependencies among the fingerprint
components, while the dashed and dotted edges represent the local correlations among the noise bits.
The dashed edges capture the dependence between the noise bits and the fingerprint bits. The noise may
be causally dependent on the fingerprint of the original video, but the fingerprint bits of the original
video should not be influenced by the noise. However, the addition of these undirected edges makes the
graph symmetric with respect to the fingerprint and noise bits and does not accurately reflect the causal
dependence. Factor graphs can be used to represent this dependence, and will be addressed in our future
work.
In this paper, we consider the case where the noise bits may be mutually dependent, but are independent
of the fingerprint bits, implying that the dashed edges between the noise bits and the fingerprint bits are
absent. In this case, the model for the noise bits $\{N_{i,j}\}$ reduces to a 2-D Ising model $G_1 = (\mathcal{V}_1, \mathcal{E}_1)$
similar to that for the fingerprints. The energy function for a configuration $\mathbf{n}$ can be defined as:
$$
E_1(\mathbf{n}) = -\alpha \sum_{i} n_i - \gamma \sum_{(j,k) \in \mathcal{E}_1} n_j n_k, \qquad (18)
$$
and the distribution is specified as $p_1(\mathbf{n}) = \frac{1}{Z_1} \exp(-E_1(\mathbf{n}))$. The parameters $\alpha$ and $\gamma$ control the marginal
distribution and the pairwise correlation among the noise bits, respectively.
The above MRF can be used to model block based binary video fingerprints computed on a frame
by frame basis. For other fingerprinting schemes, different graphs can be used to capture the local
dependencies among the fingerprint components.
C. Hypothesis Testing
As discussed in Section II, content identification using fingerprints can be modeled as a multiple
hypothesis test. Under the MRF model, since it is difficult to compute the error probabilities for the
$(N+1)$-ary hypothesis test, we approximate the multiple hypothesis test as a series of binary hypothesis
tests, where the detector compares the query fingerprint sequentially with each fingerprint in the database.
The first fingerprint that matches with the query is output as the response to the query. While this decision
rule is suboptimal compared to the Maximum Likelihood decision for the multiple hypothesis test, it is
often a good approximation in practice, as the fingerprints in the database are expected to be well
separated and the probability that a query fingerprint is close to two distinct fingerprints in the database
is small. With this approximation, we first examine a simpler binary hypothesis test of comparing the
query fingerprint to a given fingerprint in the database and then use the results for the binary hypothesis
test to compute the error probabilities for the overall identification process.
Given a query video $Z$ and a reference video $V$ in its database, consider the problem where the
detector has to decide whether $Z$ is derived from $V$ or whether the two videos are unrelated. To do so,
the detector computes the fingerprints $\mathbf{y}$ and $\mathbf{x}$ from the videos $Z$ and $V$, respectively. The detector then
performs a binary hypothesis test with the null hypothesis $H_0$ that the two fingerprints are independent
and the alternate hypothesis $H_1$ that the fingerprint $\mathbf{y}$ is a noisy version of $\mathbf{x}$:
$$
\begin{aligned}
H_0 &: (\mathbf{x}, \mathbf{y}) \sim p_0(\mathbf{x})\, p_0(\mathbf{y}), \\
H_1 &: (\mathbf{x}, \mathbf{y}) \sim p_0(\mathbf{x})\, p_1(\mathbf{n}), \qquad (19)
\end{aligned}
$$
where $p_0(\cdot)$ is the distribution of the fingerprints, $p_1(\cdot)$ is the distribution of the noise and the noise is
the element-wise product of the two fingerprints $\mathbf{n} = \mathbf{x} \otimes \mathbf{y}$, with the fingerprint bits being represented
using $\pm 1$.
We consider a Neyman-Pearson setting, where the detector seeks to maximize the probability of
detection $P_d^{(b)}$ under the constraint that the probability of false alarm $P_f^{(b)} \le \epsilon$. The optimal decision rule
is obtained by comparing the log likelihood ratio (LLR) to a threshold:
$$
\mathrm{LLR}(\mathbf{x}, \mathbf{y}) = E_0(\mathbf{y}) - E_1(\mathbf{n}) \underset{H_0}{\overset{H_1}{\gtrless}} \tau, \qquad (20)
$$
where the constants have been absorbed into the threshold $\tau$, which is chosen such that $P_f^{(b)} = \epsilon$. In
cases where the LLR is discrete, it may be necessary to incorporate randomization when the LLR equals
the threshold.
For example, for the block-based binary fingerprinting scheme model described in Section V-B, the
LLR is given by:
$$
\mathrm{LLR}(\mathbf{x}, \mathbf{y}) = -h \sum_{i} y_i - \eta \sum_{(j,k) \in \mathcal{E}_0} y_j y_k + \alpha \sum_{i} n_i + \gamma \sum_{(j,k) \in \mathcal{E}_1} n_j n_k.
$$
If the fingerprint bits are i.i.d. and equally likely to be $\pm 1$, corresponding to $\eta = h = 0$, and the noise bits
are independent ($\gamma = 0$), the optimum decision rule reduces to a comparison of the Hamming distance
between $\mathbf{x}$ and $\mathbf{y}$ to a threshold, as derived in [27]. However, when the bits are not independent, a decision
rule that compares the Hamming distance to a threshold is suboptimal. We would like to quantify the
accuracy using the optimal decision rule and the performance loss when using the Hamming distance as
opposed to using the optimal decision rule.
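The reduction to the Hamming distance can be seen directly: with $\eta = h = \gamma = 0$ the LLR equals $\alpha \sum_i n_i = \alpha (L - 2 d(\mathbf{x}, \mathbf{y}))$, since $n_i = x_i y_i$ is $-1$ exactly where the two fingerprints differ. The following sketch evaluates the LLR for the grid model and checks this identity. The function names, the $4 \times 4$ grid and the parameter values are illustrative assumptions.

```python
import math

def grid_edges(P: int, Q: int):
    """Edges of a P x Q 4-connected grid without wrap-around, nodes in raster order."""
    edges = []
    for i in range(P):
        for j in range(Q):
            k = i * Q + j
            if j + 1 < Q:
                edges.append((k, k + 1))
            if i + 1 < P:
                edges.append((k, k + Q))
    return edges

def llr(x, y, edges, h, eta, alpha, gamma) -> float:
    """LLR(x, y) = E_0(y) - E_1(n) with n = x * y element-wise (bits in +/-1)."""
    n = [a * b for a, b in zip(x, y)]

    def pair_sum(v):
        return sum(v[j] * v[k] for j, k in edges)

    return -h * sum(y) - eta * pair_sum(y) + alpha * sum(n) + gamma * pair_sum(n)

def hamming(x, y) -> int:
    """Number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

if __name__ == "__main__":
    edges = grid_edges(4, 4)
    x = [1] * 16
    clustered = list(x)
    clustered[0] = clustered[1] = -1      # two adjacent bits flipped
    scattered = list(x)
    scattered[0] = scattered[15] = -1     # two distant bits flipped
    # Independent bits and noise: the LLR is a function of the Hamming distance only.
    v = llr(x, clustered, edges, 0.0, 0.0, 0.3, 0.0)
    print(math.isclose(v, 0.3 * (16 - 2 * hamming(x, clustered))))
    # Correlated noise (gamma > 0): equal Hamming distance, different LLR.
    print(llr(x, clustered, edges, 0.0, 0.0, 0.3, 0.1),
          llr(x, scattered, edges, 0.0, 0.0, 0.3, 0.1))
```

In the second comparison both queries are at Hamming distance 2 from $\mathbf{x}$, yet the LLR differs, which is why a Hamming-distance threshold cannot reproduce the optimal rule when $\gamma \ne 0$.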
Define the probability of detection under this binary hypothesis test as $P_d^{(b)} = \Pr(\mathrm{LLR}(\mathbf{x}, \mathbf{y}) > \tau \mid H_1)$
and the probability of false alarm as $P_f^{(b)} = \Pr(\mathrm{LLR}(\mathbf{x}, \mathbf{y}) > \tau \mid H_0)$. A main challenge in accurately estimating
these tail probabilities is that these events have small probability of occurrence and are rarely observed
in a typical Markov Chain Monte Carlo simulation. We take a different approach inspired by statistical
physics to first estimate the so-called density of states and then utilize this information to estimate these
probabilities.
D. Computing $P_d^{(b)}$ and $P_f^{(b)}$
For ease of illustration, we again use the example of the binary fingerprint model described in
Section V-B. If we define $M(\mathbf{x}) = \sum_i x_i$ and $E_{corr}(\mathbf{x}) = -\sum_{(j,k) \in \mathcal{E}_0} x_j x_k$, the LLR in Eqn. (20)
can be written as $\mathrm{LLR}(\mathbf{x}, \mathbf{y}) = -h M(\mathbf{y}) + \eta E_{corr}(\mathbf{y}) + \alpha M(\mathbf{n}) - \gamma E_{corr}(\mathbf{n})$, since $\mathcal{E}_0 = \mathcal{E}_1$ in this model.
Similarly, the energy for the fingerprint bits and the noise, $E_0(\mathbf{x})$ and $E_1(\mathbf{n})$, described in Eqns. (17)
and (18) can be rewritten in terms of these functions. Thus, the tuple
$$
S(\mathbf{x}, \mathbf{y}) = \big( M(\mathbf{x}), E_{corr}(\mathbf{x}), M(\mathbf{y}), E_{corr}(\mathbf{y}), M(\mathbf{n}), E_{corr}(\mathbf{n}) \big)
$$
captures all necessary information regarding the configuration $(\mathbf{x}, \mathbf{y})$. Define $g(s) = g(m_x, e_x, m_y, e_y, m_n, e_n)$
as the number of configurations $(\mathbf{x}, \mathbf{y})$ that have $M(\mathbf{x}) = m_x$, $E_{corr}(\mathbf{x}) = e_x$, $M(\mathbf{y}) = m_y$, $E_{corr}(\mathbf{y}) = e_y$,
$M(\mathbf{n}) = m_n$, and $E_{corr}(\mathbf{n}) = e_n$. The function $g$ is referred to as the "density of states" in the physics
literature. It depends only on the underlying graphical model and is independent of the parameters
$(h, \eta, \alpha, \gamma)$ of the distributions.
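For a tiny grid, the density of states $g(s)$ can be tabulated exactly by enumerating every pair $(\mathbf{x}, \mathbf{y})$, which illustrates how many configurations collapse onto the same tuple $s$. The sketch below is a brute-force illustration under assumed names and a $2 \times 2$ grid; it is not the estimation procedure used in the paper, which relies on the Wang-Landau algorithm for larger models.

```python
import itertools
from collections import Counter

def grid_edges(P: int, Q: int):
    """Edges of a P x Q 4-connected grid without wrap-around, nodes in raster order."""
    edges = []
    for i in range(P):
        for j in range(Q):
            k = i * Q + j
            if j + 1 < Q:
                edges.append((k, k + 1))
            if i + 1 < P:
                edges.append((k, k + Q))
    return edges

def density_of_states(P: int, Q: int) -> Counter:
    """Count configurations (x, y) per tuple s = (m_x, e_x, m_y, e_y, m_n, e_n)."""
    edges = grid_edges(P, Q)

    def e_corr(v):
        return -sum(v[j] * v[k] for j, k in edges)

    configs = list(itertools.product((-1, 1), repeat=P * Q))
    g = Counter()
    for x in configs:
        sx = (sum(x), e_corr(x))
        for y in configs:
            n = tuple(a * b for a, b in zip(x, y))   # noise n = x * y element-wise
            g[sx + (sum(y), e_corr(y), sum(n), e_corr(n))] += 1
    return g

if __name__ == "__main__":
    g = density_of_states(2, 2)
    print(len(g), "distinct states for", sum(g.values()), "configurations")
```

Because $g$ is computed without reference to $(h, \eta, \alpha, \gamma)$, the same table can be reused for every parameter setting.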
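The probability of detection $P_d^{(b)}$ can then be rewritten as:
$$
\begin{aligned}
P_d^{(b)}(\tau) &= \sum_{(\mathbf{x}, \mathbf{y})} \mathbb{1}\{\mathrm{LLR}(\mathbf{x}, \mathbf{y}) > \tau\}\, p_0(\mathbf{x})\, p_1(\mathbf{n}) \qquad (21) \\
&= \sum_{s} g(s)\, \mathbb{1}\{\mathrm{LLR}(s) > \tau\}\, p_0(s)\, p_1(s), \qquad (22)
\end{aligned}
$$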
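where the summation in Eqn. (22) is over all possible values of $s = (m_x, e_x, m_y, e_y, m_n, e_n)$ and
$p_0(s)\, p_1(s)$ is the probability under $H_1$ of any configuration $(\mathbf{x}, \mathbf{y})$ with $S(\mathbf{x}, \mathbf{y}) = s$. Similarly,
$$
P_f^{(b)}(\tau) = \sum_{s} g(s)\, \mathbb{1}\{\mathrm{LLR}(s) > \tau\}\, p_0(s)\, p_0(s). \qquad (23)
$$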
As the LLR and the probabilities $p_1(\mathbf{n})$ and $p_0(\mathbf{x})$ depend only on $s$, knowledge of $g(s)$ allows us to
compute $P_d^{(b)}$ and $P_f^{(b)}$. Moreover, the number of states is a polynomial function of the number of bits
and thus the summations in Eqns. (22) and (23) have manageable computational complexity. The problem
of computing $P_d^{(b)}$ and $P_f^{(b)}$ has thus been converted into one of estimating the density of states $g(s)$. An
algorithm to estimate the density of states was proposed by Wang and Landau in [28] and is summarized
in the Appendix. The main idea is to construct a Markov chain whose stationary distribution is proportional to $\frac{1}{g(s)}$,
so that all states are visited approximately equally often. An advantage of this “Wang-Landau”
algorithm is that states with low probability of occurrence are also visited as often as high probability
states, enabling us to estimate their probabilities accurately. We first use this algorithm [28] to estimate
the density of states $g(s)$ and then compute $P_d^{(b)}$ and $P_f^{(b)}$ using Eqns. (22) and (23).
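To make the idea concrete, the following is a simplified sketch of a Wang-Landau style random walk, not the procedure summarized in the Appendix. It makes several loudly stated assumptions: the state is reduced to the single energy $E_{corr}(\mathbf{x})$ rather than the full tuple $s$, the grid is $3 \times 3$ without periodic boundaries, and the usual histogram-flatness test is replaced by a fixed number of steps per stage, after which the modification factor $\ln f$ is halved. All function names and default parameters are hypothetical.

```python
import itertools
import math
import random
from collections import Counter

def grid_edges(P, Q):
    """Edges of a P x Q 4-connected grid without wrap-around, nodes in raster order."""
    edges = []
    for i in range(P):
        for j in range(Q):
            k = i * Q + j
            if j + 1 < Q:
                edges.append((k, k + 1))
            if i + 1 < P:
                edges.append((k, k + Q))
    return edges

def exact_density(P, Q):
    """Brute-force number of configurations per energy E_corr (small grids only)."""
    edges = grid_edges(P, Q)
    return Counter(-sum(x[j] * x[k] for j, k in edges)
                   for x in itertools.product((-1, 1), repeat=P * Q))

def wang_landau(P, Q, stages=16, steps_per_stage=20000, seed=0):
    """Estimate g(E) with a simplified Wang-Landau walk (fixed-length stages)."""
    rng = random.Random(seed)
    edges = grid_edges(P, Q)
    nbrs = [[] for _ in range(P * Q)]
    for j, k in edges:
        nbrs[j].append(k)
        nbrs[k].append(j)
    x = [rng.choice((-1, 1)) for _ in range(P * Q)]
    E = -sum(x[j] * x[k] for j, k in edges)
    log_g = {}      # running estimate of ln g(E); unseen levels count as 0
    log_f = 1.0     # modification factor ln f
    for _ in range(stages):
        for _ in range(steps_per_stage):
            i = rng.randrange(P * Q)
            # Flipping x[i] changes E_corr by 2 * x[i] * (sum of its neighbours).
            E_new = E + 2 * x[i] * sum(x[k] for k in nbrs[i])
            diff = log_g.get(E, 0.0) - log_g.get(E_new, 0.0)
            # Accept with probability min(1, g(E) / g(E_new)).
            if diff >= 0 or rng.random() < math.exp(diff):
                x[i] = -x[i]
                E = E_new
            log_g[E] = log_g.get(E, 0.0) + log_f
        log_f /= 2.0
    # Normalise so that the estimates sum to the 2^(PQ) configurations.
    top = max(log_g.values())
    log_total = top + math.log(sum(math.exp(v - top) for v in log_g.values()))
    shift = P * Q * math.log(2) - log_total
    return {E: math.exp(v + shift) for E, v in sorted(log_g.items())}

if __name__ == "__main__":
    estimate, exact = wang_landau(3, 3), exact_density(3, 3)
    for E in sorted(exact):
        print(E, exact[E], round(estimate.get(E, 0.0), 2))
```

Because moves into rarely visited energy levels are favoured, the walk spends comparable time at the rare extreme energies and at the common central ones, which is the property the text relies on for estimating small tail probabilities.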
E. Error Probabilities for Overall Matching
Given the values of $P_d^{(b)}$ and $P_f^{(b)}$ obtained using the above technique, we now compute the probability
of correct identification for the overall matching process. Consider the probability of false alarm $P_f$. When
the multiple hypothesis test is approximated by a series of binary hypothesis tests, there is a false alarm
in the overall matching if there is a false alarm in any of the individual binary hypothesis tests. The false
alarm probability is thus given by
$$
P_f = 1 - (1 - P_f^{(b)})^N \approx N P_f^{(b)}.
$$
Now suppose that the query video is actually a modified version of $V_i$. A misclassification can occur if a
false alarm occurs in the binary hypothesis test comparing the fingerprint of the query video $\mathbf{Y}$ to $\mathbf{X}^{(j)}$
for any $j < i$. Thus, the probability of misclassification can be bounded as:
$$
P_m \le 1 - (1 - P_f^{(b)})^{N-1} \approx (N - 1) P_f^{(b)}.
$$
An incorrect decision happens when either a misclassification occurs, or a missed detection occurs in the
binary hypothesis test involving $\mathbf{X}^{(i)}$, implying that
$$
\begin{aligned}
1 - P_c &\le 1 - P_d^{(b)} + P_m \\
\Rightarrow \; P_c &\ge P_d^{(b)} - N P_f^{(b)}.
\end{aligned}
$$
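These relations are straightforward to evaluate numerically. The sketch below computes the overall false alarm probability, the misclassification bound and the lower bound on $P_c$ from the per-test probabilities, and inverts the first relation to obtain the per-test false alarm level needed for a target overall $P_f$. The function names and example numbers are illustrative assumptions; `log1p` and `expm1` are used only to keep precision when $P_f^{(b)}$ is very small.

```python
import math

def overall_false_alarm(pf_b: float, N: int) -> float:
    """P_f = 1 - (1 - P_f^(b))^N."""
    return -math.expm1(N * math.log1p(-pf_b))

def misclassification_bound(pf_b: float, N: int) -> float:
    """Upper bound on P_m: 1 - (1 - P_f^(b))^(N - 1)."""
    return -math.expm1((N - 1) * math.log1p(-pf_b))

def correct_identification_lower_bound(pd_b: float, pf_b: float, N: int) -> float:
    """Lower bound P_c >= P_d^(b) - N * P_f^(b)."""
    return pd_b - N * pf_b

def per_test_false_alarm(pf_target: float, N: int) -> float:
    """Per-test P_f^(b) that yields an overall false alarm probability pf_target."""
    return -math.expm1(math.log1p(-pf_target) / N)

if __name__ == "__main__":
    N = 10**6                                  # hypothetical database size
    pf_b = per_test_false_alarm(1e-3, N)
    print(pf_b, overall_false_alarm(pf_b, N))
    print(correct_identification_lower_bound(0.99, pf_b, N))
```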
[Figure 7: plot of the relative error $\varepsilon(g_I(E))$ (vertical axis, 0 to 4, scale $\times 10^{-3}$) versus the energy $E$ (horizontal axis, $-40$ to $40$).]
Fig. 7. Relative error in the estimation of density of states for a 4x4 Ising model with periodic boundary conditions.
[Figure 8: three panels (a), (b) and (c) with a scale from 0 to 1; a '*' marks one bit in each panel.]
Fig. 8. Typical correlation structure among the various fingerprinting bits. Correlation coefficients for the (a) $(1, 1)$th bit, (b) $(2, 1)$th bit, (c) $(2, 2)$th bit and the remaining bits. The '*' denotes the bit under consideration.
Thus, given a desired overall probability of correct identification and false alarm, suitable values of $P_f^{(b)}$
and $P_d^{(b)}$ can be computed and used to choose the appropriate threshold in the binary hypothesis test.
VI. NUMERICAL EVALUATION
We use the MRF model coupled with the technique for computing $P_d^{(b)}$ and $P_f^{(b)}$ described in the
previous section to study the influence of correlation among the fingerprint components on the overall
detection performance. We focus on binary fingerprinting schemes and provide numerical results for the
model described in Section V-B. As most binary fingerprint schemes generate equally likely (but not
independent) bits, we set the parameter $h = 0$ in our simulations. This also helps reduce the parameter
space from a 6-D space $(m_x, e_x, m_y, e_y, m_n, e_n)$ to a 4-D space $(e_x, e_y, m_n, e_n)$, as the expressions for
the LLR and probability distributions will not involve $m_x$ and $m_y$.
A. Density of States Estimation
We evaluate the accuracy of the estimation algorithm using known exact results for the density of
energy states $g_I(E)$ for the 2-D Ising model [29]. To enable comparison, periodic boundary conditions
are imposed on the graph $G_0$ - the nodes $X_{1,j}$ in the top row are connected to the corresponding nodes
$X_{P,j}$ in the bottom row, and the nodes in the first column are similarly connected to the nodes in the
last column, so that every node is 4-connected. 4-connectivity is similarly achieved for the noise nodes
$\{N_{i,j}\}$. We then use the Wang-Landau algorithm to estimate the density of states $g(s) = g(e_x, e_y, m_n, e_n)$
by performing a random walk in the 4-D parameter space [28] and use the obtained $g(s)$ to estimate the
density of energy states $g_I(E)$ by summing over all other variables and normalization,
$$
g_I(E) = \frac{1}{2^{PQ}} \sum_{(e_y, m_n, e_n)} g(E, e_y, m_n, e_n).
$$
In our simulations, we use the parameters suggested in [28] and the maximum number of iterations is
capped at $10^{10}$.
We measure the accuracy of estimation by computing the relative error $\varepsilon(g_I(E))$ in the estimate of
the density of states, defined as $\varepsilon(x) = \frac{|x - x_{est}|}{x}$. Fig. 7 shows the relative error in the estimation of the
density of states for a 2-D Ising model of size $4 \times 4$ with periodic boundary conditions. We observe
from the figure that the maximum relative error is approximately 0.37%, and the mean relative error is
0.1%. These results demonstrate that accurate estimates of the density of states can be obtained using
the Wang-Landau algorithm. The estimation accuracy can be improved by suitably altering parameters
in the algorithm as necessary.
B. Performance of Correlated Fingerprints
To examine the performance of correlated fingerprints, we use the model without periodic boundary
conditions, as practical fingerprints are not expected to have such periodic relationships. The nodes at
the corners are only connected to their two closest neighbors, the remaining nodes at the borders are
connected to their three closest neighbors, and all the other nodes are 4-connected.
The correlation among the fingerprint components is estimated from $10^8$ MCMC iterations by retaining
only 1 out of 100 iterations to reduce the effect of correlations between successive iterates in the MCMC
simulations. Fig. 8 shows the correlation among the fingerprint bits for a $4 \times 4$ model, obtained by
setting $\eta = 0.3$, $\alpha = 0.3$, and $\gamma = 0.1$. Fig. 8(a) shows the correlation between the $(1, 1)$th bit (top left
corner) and every other bit while Fig. 8(b) and (c) show the same for the $(2, 1)$th bit and the $(2, 2)$th bit,
respectively. Due to symmetry, other bits in corresponding positions have similar correlations. We observe
that the correlation coefficient between each bit and its nearest neighbor is $\rho_x \approx 0.3$ and the correlation
decays with distance. This is the typical correlation behavior observed in our model and reflects the
correlation expected in practice - bits extracted from adjacent blocks are expected to be more correlated
than bits extracted from blocks farther apart. We observe a similar correlation among the noise bits $N_{i,j}$
as well, as the models for these bits are similar.
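For grids this small, the pairwise correlations can also be cross-checked without sampling. The sketch below swaps the MCMC estimate described above for exhaustive enumeration over all $2^{PQ}$ configurations with $h = 0$, in which case each bit has zero mean and unit variance, so the correlation coefficient equals $E[x_a x_b]$. The function names and raster-order indexing are assumptions, and no claim is made that the output reproduces the values plotted in Fig. 8.

```python
import itertools
import math

def grid_edges(P: int, Q: int):
    """Edges of a P x Q 4-connected grid without wrap-around, nodes in raster order."""
    edges = []
    for i in range(P):
        for j in range(Q):
            k = i * Q + j
            if j + 1 < Q:
                edges.append((k, k + 1))
            if i + 1 < P:
                edges.append((k, k + Q))
    return edges

def pair_correlation(P: int, Q: int, eta: float, a: int, b: int) -> float:
    """E[x_a x_b] under p_0 with h = 0, by exhaustive enumeration."""
    edges = grid_edges(P, Q)
    Z = 0.0
    acc = 0.0
    for x in itertools.product((-1, 1), repeat=P * Q):
        # exp(-E_0(x)) with h = 0, i.e. exp(eta * sum of neighbouring products)
        w = math.exp(eta * sum(x[j] * x[k] for j, k in edges))
        Z += w
        acc += w * x[a] * x[b]
    return acc / Z

if __name__ == "__main__":
    # Correlation of the top-left bit (index 0) with every other bit of a 4 x 4 grid.
    for b in range(1, 16):
        print(b, round(pair_correlation(4, 4, 0.3, 0, b), 3))
```

The enumeration visits 65,536 configurations for the $4 \times 4$ grid, so it is practical only as a sanity check on small models.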
Using the estimated density of states, we compute the probabilities $P_d^{(b)}$ and $P_f^{(b)}$ as described in
Section V-D and study the effect of different parameters on the detection performance. Although errors
in the estimation of the density of states will also affect the accuracy of the estimates of $P_d^{(b)}$ and $P_f^{(b)}$,
as we have shown in Section VI-A, these errors are small, and the accuracy can be improved by obtaining
a better estimate of the density of states.
[Figure 9: ROC curves of $P_d^{(b)}$ (vertical axis, 0 to 1) versus $P_f^{(b)}$ (horizontal axis, 0 to 1) for LLR $p_n = 0.12$, Hamming $p_n = 0.12$, LLR $p_n = 0.2$ and Hamming $p_n = 0.2$.]
Fig. 9. Influence of $p_n$ on the detection performance for $4 \times 4$ bits per frame, $\rho_x = 0.2$ and $\rho_n = 0.2$.
First, we examine the effect of the noise on the detection accuracy. We characterize the noise by the
probability $p_n$ of a noise bit being '$-1$', which is the equivalent of a binary '1' bit, and the correlation
among the noise bits $\rho_n$, which are estimated from the MCMC trials. Fig. 9 shows the ROC curves for
a fingerprint of size $4 \times 4$ bits with correlation $\rho_x = 0.2$ under two different $p_n$ and fixed $\rho_n = 0.2$, for
detection using the Log Likelihood Ratio (LLR) statistic and the Hamming distance statistic. We observe
that for a given noise level, the LLR statistic gives 5-10% higher $P_d^{(b)}$ at a given $P_f^{(b)}$ compared to the
Hamming distance detector. As expected, the performance for any given detector is worse when there is
a higher probability of the noise changing the fingerprint bits.
Fig. 10 shows the influence of the noise correlation on the detection performance. The figure indicates
that for a fixed correlation among the fingerprint bits $\rho_x = 0.2$ and a fixed marginal probability of the
noise bits $p_n = 0.3$, detection using the LLR statistic is not significantly affected by the noise correlation.
This is due to the fact that the LLR takes into account the correlation among the noise bits. On the other
hand, using the Hamming distance leads to some degradation in the performance as the correlation
increases. This can be explained by the fact that as the noise correlation increases, noise vectors with
large Hamming weights become more probable, leading to higher missed detections.
Next, we examine the influence of the correlation among the fingerprint bits on the detection accuracy.
Fig. 11 shows the ROC curves for content identification using fingerprints of size $4 \times 4$ for different
correlations, where the noise parameters $p_n = \rho_n = 0.2$. We again observe that detection using the LLR
statistic, which compensates for the correlation among the fingerprint bits, is not significantly affected by
the correlation. For the Hamming distance statistic, there is an increase in false alarms at a given $P_d^{(b)}$ as
[Figure: ROC curves, $P_d^{(b)}$ versus $P_f^{(b)}$, for the LLR and Hamming detectors at $\rho_n$ = 0.1, 0.2 and 0.3.]
Fig. 10. ROC for different noise correlation $\rho_n$ at fixed $p_e = 0.3$ and $\rho_x = 0.2$.
[Figure: ROC curves, $P_d^{(b)}$ versus $P_f^{(b)}$, for the LLR and Hamming detectors at $\rho_x$ = 0.2, 0.3 and 0.4.]
Fig. 11. Influence of correlation of the fingerprint bits on the detection performance ($p_n = \rho_n = 0.2$).
the correlation among the fingerprints increases, as similar configurations with smaller distances become
more probable.
C. Simulation Results using Image Database
We compare the performance predicted by the theoretical analysis with simulation results obtained
using an image database. For our experiments, we use a database of 1000 images downloaded from the
Flickr photo hosting service by searching for the tag “panda”. For extracting the fingerprints, each image
is divided into 16 blocks in a $4 \times 4$ grid and the average luminance within each block is computed.
The average luminance of each block is then quantized to one-bit accuracy according to whether it is
greater or less than the grayscale value of 128, giving a 16-bit fingerprint for each image. Histogram
equalization is then performed on the luminance portion of the image, and the fingerprint is recomputed to
obtain the noisy version of the fingerprint. The hypothesis test described in Sec. V-C is then performed
[Figure: ROC curves, $P_d^{(b)}$ versus $P_f^{(b)}$; curves: LLR (theory), LLR, Hamming, Hamming (theory).]
Fig. 12. Comparison of theoretical and simulation results for a database consisting of 1000 images.
using the noisy fingerprints. Additionally, 1000 pairs of original fingerprints are randomly chosen and
compared to each other to obtain an estimate of the false alarm probability. We also estimate the Ising
model parameters for the fingerprints and the noise using the least squares method proposed in [30] and
obtain the theoretical predictions for the ROC curves as described in Section V-D.
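The fingerprint extraction and the histogram-equalization distortion described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, bits are stored as 0/1 instead of ±1, the equalization is a plain CDF remapping of 8-bit luminance values, and a random array stands in for an image from the database.

```python
import numpy as np

def block_mean_fingerprint(luma, grid=4, threshold=128):
    """Return grid*grid bits: 1 where the block's mean luminance exceeds threshold."""
    luma = np.asarray(luma, dtype=float)
    h, w = luma.shape
    # Block boundaries; linspace also copes with sizes not divisible by grid
    rows = np.linspace(0, h, grid + 1).astype(int)
    cols = np.linspace(0, w, grid + 1).astype(int)
    bits = np.empty(grid * grid, dtype=np.uint8)
    for i in range(grid):
        for j in range(grid):
            block = luma[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
            bits[i * grid + j] = block.mean() > threshold
    return bits

def equalize(luma):
    """Histogram-equalize an 8-bit luminance image by remapping through its CDF."""
    luma = np.asarray(luma, dtype=np.uint8)
    hist = np.bincount(luma.ravel(), minlength=256)
    cdf = hist.cumsum() / luma.size
    return np.round(255 * cdf[luma]).astype(np.uint8)

def hamming_distance(a, b):
    """Number of positions in which two fingerprints differ."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

# Synthetic stand-in for one database image (assumption: 64 x 64, 8-bit luminance)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
original = block_mean_fingerprint(image)
noisy = block_mean_fingerprint(equalize(image))
print(original, noisy, hamming_distance(original, noisy))
```

Comparing each noisy fingerprint with its original gives samples under the "match" hypothesis, while comparing randomly chosen pairs of original fingerprints gives samples for the false alarm estimate.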
Fig. 12 compares the ROC curves obtained from theory and simulation for the LLR detector and the
Hamming distance based detector. The figure shows that the simulation results agree very well with
the theoretical predictions. In simulations with some other databases, we observed that when the MRF
model does not accurately capture the distribution of the fingerprints, there is a discrepancy between the
theoretical predictions and the simulation results. In our future work, we will develop better techniques
to predict the performance of practical fingerprinting schemes.
VII. CONCLUSIONS
In this paper, we have analyzed content identification using fingerprints under a hypothesis testing
framework. We first considered the case of fingerprinting schemes that generate i.i.d. equally likely
bits and modeled distortions on the host content as altering each fingerprint bit independently with
probability $p$. We derived expressions for the probability of correct identification under this model and
studied the tradeoff between the number of fingerprints, the robustness, the identification performance
and the length of the fingerprints. To understand the fundamental limits on the identification capability,
we next derived bounds on the achievable error probabilities and characterized the tradeoff between the
detection probability and the number of fingerprints in terms of the error exponents. We then derived
guidelines for choosing the fingerprint length to attain a desired performance objective and provided an
interpretation of our results from a joint source-channel coding perspective.
To better predict the performance of practical fingerprinting schemes that have correlated components,
we proposed a Markov Random Field model that captures local correlations among individual fingerprint
bits. Under this model, we examined fingerprint matching as a hypothesis testing problem and proposed
a statistical physics inspired approach to compute the probability of detection and the probability of false
alarm. Our analysis showed that Hamming distance based detection, which is commonly employed in
many applications, is suboptimal in this setting and is susceptible to correlations among the fingerprint
bits or the noise. The optimal log-likelihood ratio detector provides 5−20% higher detection probability
and the detection accuracy is relatively stable for different correlations among the fingerprint and noise
components. Simulation results using an image database corroborate our theoretical results.
Establishing models to facilitate performance studies on practical fingerprints and developing an
understanding of the limits of the identification accuracy achievable using fingerprints are the main
contributions of this paper. Results from our modeling and analysis not only provide qualitative guidelines
for content fingerprinting algorithm and system design, but to the best of our knowledge, these studies
for the first time allow a quantitative understanding of the impact of major design and operational
parameters on the identification performance. The results have been mainly presented in the context
of content fingerprinting, but they are also applicable in many other applications, such as biometrics
based identification. For example, Vetro et al. recently showed that by suitably transforming features
extracted for human fingerprint matching using minutiae, the biometric data can be transformed into i.i.d.
Bernoulli(0.5) bits and that the distortions caused while recapturing the fingerprint can be modeled by a
binary symmetric channel [21]. The results derived in Section III would then be directly applicable for
analyzing this biometric identification problem.
ACKNOWLEDGEMENT
The authors thank Mr. Suriyanarayanan Vaikuntanathan at the University of Maryland for suggesting
the reference on the Wang-Landau algorithm for estimating the density of states.
REFERENCES
[1] Wall Street Journal, “YouTube removes 30,000 files amid Japanese copyright concerns.” [Online]. Available: http://online.wsj.com/article/SB116133637777798831.html
[2] C.-W. Chen, R. Cook, M. Cremer, and P. DiMaria, “Content identification in consumer applications,” in Proc. of IEEE Int. Conf. on Multimedia & Expo, Jul. 2009, pp. 1536–1539.
[3] M. Barni and F. Bartolini, “Data hiding for fighting piracy,” IEEE Signal Process. Mag., vol. 21, no. 2, pp. 28–39, Mar. 2004.
[4] J. Fridrich and M. Goljan, “Robust hash functions for digital watermarking,” in Int. Conf. on Information Technology: Coding and Computing, 2000, pp. 178–183.
[5] A. Swaminathan, Y. Mao, and M. Wu, “Robust and secure image hashing,” IEEE Trans. on Information Forensics and Security, vol. 1, no. 2, pp. 215–230, Jun. 2006.
[6] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image retrieval: Ideas, influences, and trends of the new age,” ACM Computing Surveys, no. 2, pp. 1–60, 2008.
[7] S.-F. Chang, Q. Huang, T. Huang, A. Puri, and B. Shahraray, “Multimedia search and retrieval,” in Multimedia Systems, Standards, and Networks, A. Puri and T. Chen, Eds. New York: Marcel Dekker, 2000.
[8] T. Sikora, “The MPEG-7 visual standard for content description - an overview,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, pp. 696–702, Jun. 2001.
[9] J. Haitsma, T. Kalker, and J. Oostveen, “Robust audio hashing for content identification,” in Int. Workshop on Content-Based Multimedia Indexing, Brescia, Italy, Sept. 2001.
[10] J. Oostveen, T. Kalker, and J. Haitsma, “Feature extraction and a database strategy for video fingerprinting,” in Proc. of the Int. Conf. on Recent Advances in Visual Information Systems, Lecture Notes in Computer Science, vol. 2314, 2002, pp. 117–128.
[11] R. Mohan, “Video sequence matching,” in IEEE Conf. on Acoustic, Speech and Signal Processing, vol. 6, May 1998, pp. 3697–3700.
[12] S. Baluja and M. Covell, “Content fingerprinting using wavelets,” in Proc. IET Conf. on Multimedia, London, England, Nov. 2006.
[13] R. Radhakrishnan and C. Bauer, “Video fingerprinting based on moment invariants capturing appearance and motion,” in Proc. of IEEE Int. Conf. on Multimedia & Expo, 2009, pp. 1532–1535.
[14] B. Coskun, B. Sankur, and N. Memon, “Spatio-temporal transform based video hashing,” IEEE Trans. on Multimedia, vol. 8, no. 6, pp. 1190–1208, Dec. 2006.
[15] J. Lu, “Video fingerprinting for copy identification: From research to industry applications,” in Proc. SPIE/IS&T Media Forensics and Security, San Jose, CA, Jan. 2009.
[16] E. McCarthy, F. Balado, G. Silvestre, and N. Hurley, “A framework for soft hashing and its application to robust image hashing,” in IEEE Int. Conf. on Image Processing, vol. 1, Oct. 2004, pp. 397–400.
[17] S. Voloshynovskiy, O. Koval, F. Beekhof, and T. Pun, “Robust perceptual hashing as classification problem: Decision-theoretic and practical considerations,” in IEEE Workshop on Multimedia Signal Processing, Oct. 2007, pp. 345–348.
[18] F. Willems, T. Kalker, J. Goseling, and J.-P. Linnartz, “On the capacity of a biometrical identification system,” in IEEE Int. Symp. on Information Theory, June 2003, p. 82.
[19] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed. Springer, 1994.
[20] A. L. Varna and M. Wu, “Theoretical modeling and analysis of content identification,” Proc. of IEEE Int. Conf. on Multimedia & Expo, Jul. 2009.
[21] A. Vetro, S. C. Draper, S. Rane, and J. S. Yedidia, “Securing biometric data,” in Distributed Source Coding, P. L. Dragotti and M. Gastpar, Eds. Academic Press, Jan. 2009, ch. 11, pp. 293–323.
[22] R. M. Roth, Introduction to Coding Theory. Cambridge University Press, 2006.
[23] A. Shwartz and A. Weiss, Large Deviations for Performance Analysis. Chapman and Hall, 1995.
[24] R. Kinderman and J. L. Snell, Markov Random Fields and their Applications. American Mathematical Society, 1980.
[25] A. K. Jain, Fundamentals of Digital Image Processing. Prentice-Hall, 1989.
[26] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[27] A. L. Varna, A. Swaminathan, and M. Wu, “A decision-theoretic framework for analyzing binary hash-based content identification systems,” in Proc. ACM Workshop on Digital Rights Management, Oct. 2008, pp. 67–76.
[28] F. Wang and D. P. Landau, “Efficient, multiple-range random walk algorithm to calculate the density of states,” Physical Review Letters, vol. 86, no. 10, pp. 2050–2053, Mar. 2001.
[29] P. D. Beale, “Exact distribution of energies in the two-dimensional Ising model,” Physical Review Letters, vol. 76, pp. 78–81, 1996.
[30] H. Derin and H. Elliott, “Modeling and segmentation of noisy and textured images using Gibbs random fields,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 9, no. 1, pp. 39–55, Jan. 1987.
APPENDIX
WANG-LANDAU SAMPLING
In statistical physics, the density of states $g(E)$ is an important quantity that enables the computation
and characterization of various thermodynamic properties of a physical system. Given a physical system
which may exist in different configurations or states, the density of states $g(E)$ is defined as the number
of states that have a given energy $E$. An advantage of using the density of states for determining
properties of the physical systems is that $g(E)$ is independent of temperature and can be used to compute
thermodynamic properties of the system at any temperature. Traditional MCMC methods do not allow for
accurate estimation of the density of states $g(E)$ directly. Wang and Landau [28] proposed a technique
for estimating the density of states by performing a random walk in the energy space that results in a
flat histogram of energies visited. We illustrate the algorithm using the example of a physical system that
has spins $\pm 1$.
Initialize the system randomly and start with an initial value for the density of states, e.g. $g(E) = 1$.
At each iteration, a random spin is flipped with probability
$$p(E \to E') = \min\left(\frac{g(E)}{g(E')},\, 1\right),$$
where $E$ is the energy of the current state and $E'$ would be the energy of the resultant state if the spin were to
be flipped. After this trial, if $E^*$ is the energy of the resultant state, the density of states is updated as
$g(E^*) \leftarrow g(E^*) \times f_i$, where $f_i$ is an update factor. Initially, $f_0$ is chosen to be large enough so that all
the energy levels are visited quickly, e.g. $f_0 = e$. A histogram of the number of times each energy level
has been visited is also stored. Once this histogram is “flat enough”, $f_i$ is reduced and the histogram is
reset. This process is continued until $f_i$ becomes small enough, e.g. $f_i < \exp(10^{-8})$. The $g(E)$ obtained
after convergence is relative and is normalized to obtain an estimate of the density of states.
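The procedure above can be sketched in a few lines of Python. Several details in this sketch are assumptions rather than statements from the text: the system is an $L \times L$ lattice of $\pm 1$ spins with periodic boundaries and energy $E = -\sum_{\langle i,j \rangle} x_i x_j$, the histogram is called flat when its minimum is at least 80% of its mean, the update factor is reduced as $f_{i+1} = \sqrt{f_i}$ (a common choice), and the result is normalized so that the counts sum to $2^{L^2}$. The function name `wang_landau` and its parameters are hypothetical, and all arithmetic is done on $\ln g$ and $\ln f$ to avoid overflow.

```python
import math
import random

def wang_landau(L=4, ln_f_final=1e-8, flatness=0.8, check_every=1000, seed=0):
    """Estimate the density of states g(E) of an L x L periodic Ising lattice."""
    rng = random.Random(seed)
    n = L * L
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    # Neighbour indices of each site in the order (up, down, left, right)
    nbrs = [(((r - 1) % L) * L + c, ((r + 1) % L) * L + c,
             r * L + (c - 1) % L, r * L + (c + 1) % L)
            for r in range(L) for c in range(L)]
    # E = -sum over bonds; each bond is counted once via "down" and "right"
    energy = -sum(spins[i] * (spins[nb[1]] + spins[nb[3]])
                  for i, nb in enumerate(nbrs))
    ln_g = {energy: 0.0}   # unseen levels implicitly start at ln g = 0, i.e. g = 1
    hist = {}
    ln_f = 1.0             # f_0 = e
    steps = 0
    while ln_f > ln_f_final:
        i = rng.randrange(n)
        new_energy = energy + 2 * spins[i] * sum(spins[j] for j in nbrs[i])
        # Flip with probability min(g(E) / g(E'), 1)
        delta = ln_g[energy] - ln_g.get(new_energy, 0.0)
        if delta >= 0 or rng.random() < math.exp(delta):
            spins[i] = -spins[i]
            energy = new_energy
        # Update the resultant state, whether or not the flip was accepted
        ln_g[energy] = ln_g.get(energy, 0.0) + ln_f
        hist[energy] = hist.get(energy, 0) + 1
        steps += 1
        if steps % check_every == 0:
            # Flatness is judged over every level seen so far; this sketch
            # assumes all levels are discovered during the first stage,
            # which holds for small lattices
            counts = [hist.get(E, 0) for E in ln_g]
            if min(counts) >= flatness * sum(counts) / len(counts):
                ln_f /= 2.0    # f <- sqrt(f)
                hist = {}
    # Normalize the relative estimate so that sum_E g(E) = 2^n
    top = max(ln_g.values())
    log_total = top + math.log(sum(math.exp(v - top) for v in ln_g.values()))
    shift = n * math.log(2.0) - log_total
    return {E: math.exp(v + shift) for E, v in sorted(ln_g.items())}
```

As a sanity check, the $2 \times 2$ periodic lattice under this energy definition has exactly 2, 12 and 2 states at energies −8, 0 and 8, so `wang_landau(L=2, ln_f_final=1e-4)` should return values close to these counts. The pure-Python loop is slow, so the stopping value $10^{-8}$ from the text is practical only for small lattices.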
The above algorithm can also be used to estimate the density of states in multiple parameters, e.g.
$g(M, E)$ as a function of the magnetization $M = \sum_i x_i$ and energy $E$, by performing the random
walk in the appropriate parameter space. For large systems, the parameter space can be divided into
several regions and independent random walks can be performed over each of these regions for faster
convergence. The overall density of states can then be reconstructed from these individual estimates by
ensuring continuity at the boundaries [28].
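For very small lattices, the joint density $g(M, E)$ can be obtained by brute-force enumeration, which gives a ground truth against which a random-walk estimate could be checked. The sketch below assumes an $L \times L$ lattice of $\pm 1$ spins with periodic boundaries and $E = -\sum_{\langle i,j \rangle} x_i x_j$; the function name is hypothetical, and the cost grows as $2^{L^2}$, so this is feasible only for small $L$.

```python
from collections import Counter
from itertools import product

def exact_joint_density(L=3):
    """Count the states of an L x L periodic Ising lattice by (M, E)."""
    n = L * L
    counts = Counter()
    for spins in product((-1, 1), repeat=n):
        magnetization = sum(spins)
        # Each bond is counted once via the "down" and "right" neighbours
        energy = -sum(spins[r * L + c] * (spins[((r + 1) % L) * L + c]
                                          + spins[r * L + (c + 1) % L])
                      for r in range(L) for c in range(L))
        counts[(magnetization, energy)] += 1
    return counts

g = exact_joint_density(3)
for (M, E), count in sorted(g.items()):
    print(M, E, count)
```

Summing the counts over $M$ recovers $g(E)$, and the total over all $(M, E)$ pairs equals $2^{L^2}$.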