
Citation: Pekala, Russell F. 2019. Computational Entropy and Cryptographic Constructions. Bachelor's thesis, Harvard College.

Permanent link: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364591


Computational Entropy and Cryptographic Constructions

An Expository Senior Thesis for the Joint Concentration Requirements of Math and Computer Science

Russell Pekala, Harvard College 2019

March 29, 2019

In the field of cryptography, the primitive of a one-way function is a powerful tool that can be effectively wielded to construct such tools as pseudorandom generators (Haastad et al., 1999), commitment schemes (Vadhan and Zheng, 2011), and universal one-way hash functions (Haitner et al., 2010). When first discovered, these constructions were intricate, complicated, and seemingly unmotivated. Recent work has formalized several new notions of computational entropy, including next-block pseudoentropy and inaccessible entropy. These entropy formulations make cryptographic constructions from one-way functions more modular and better expose the similarities between pseudorandom generators, commitment schemes, and universal one-way hash functions. This thesis primarily explores the construction of pseudorandom generators (PRGs), but also touches on commitment schemes and universal one-way hash functions. The focus of this thesis is on understanding the definitions of computational entropy, which means that some of the specifics of Vadhan and Zheng, 2011's construction of a PRG from an arbitrary one-way function are treated only briefly. In an effort to be self-contained, this paper also devotes time to the "historical" development of computational entropy and to the necessary mathematical and statistical background.

Contents

1 Background
  1.1 Classical Definition of Information
  1.2 A Historical Perspective of Randomness
    1.2.1 The Statistical Interpretation
    1.2.2 The Generative Interpretation
    1.2.3 The Indistinguishability Interpretation
  1.3 Cryptography Background
    1.3.1 Basic Encryption Scheme
    1.3.2 Pseudorandom Generators
    1.3.3 Commitment Schemes
  1.4 Mathematical Tools
    1.4.1 One-Way Functions
    1.4.2 Statistical Divergence
  1.5 Statistics Review

2 Previous Results
  2.1 Next-Bit Unpredictability and Yao's Theorem
  2.2 Goldreich-Levin Theorem
  2.3 Expanding Goldreich-Levin

3 Formalizing Pseudoentropy
  3.1 Formal Indistinguishability
    3.1.1 Understanding the Sampling Oracle
  3.2 Next-Block Pseudoentropy
  3.3 Pseudoentropy as KL Divergence
  3.4 Distinguishers
  3.5 Universal Distinguishers
  3.6 KL-hard Implies Conditional Pseudoentropy
  3.7 Pseudoentropy Implies KL-hardness

4 Entropy Conversions
  4.1 Getting Next-Block Pseudoentropy
  4.2 Ω(log n) Entropy Gap

5 Constructing a PRG
  5.1 Next-Bit Pseudoentropy to Next-Bit Pseudo-Min-Entropy
  5.2 Next-Block Pseudoentropy to Pseudorandomness
  5.3 Construction Outline
  5.4 Formal Explanation

1 Background

1.1 Classical Definition of Information

The study of computational entropy is the computational analogue of Shannon's study of information, so it is natural that we start here. Shannon's study of information was motivated by the question of securely and efficiently transmitting information over a channel that could potentially be listened to by adversaries. The efficient transmission of information requires the compression of that data so that it can be represented in as few bits as possible. If there is a deterministic one-to-one function mapping one stream of bits to another (through some sort of compression scheme), then ideally there would be a way to measure that the two messages contain the same amount of underlying "information", or "entropy" as we'll refer to it throughout the rest of this paper.¹

We'll first state Shannon's definition and then explain the intuition for it.

Definition 1.1. The Shannon sample-entropy of x ∈ supp(X) is defined to be

    H_X(x) = -log P[X = x]    (1)

Remark. Since 0 < P[X = x] ≤ 1, we have 0 ≤ H_X(x) < ∞. Also note that more probable outputs of X have lower sample-entropy.

We can extend this definition by defining the Shannon entropy of a random variable to be the expectation of the sample-entropy over the random variable's support.

Definition 1.2. The Shannon entropy of a random variable X, denoted H(X), is

    H(X) = E_{x←X}[ -log P[X = x] ]    (2)

where the notation x ← X denotes that x is sampled according to the distribution of X.

¹ Later, when we define computational entropy, we will start from this point but reduce our requirement to say that streams of information have the same computational entropy only if such a map can be efficiently computed.

Remark. When the logarithm in the above definition is taken base 2, the alphabet is Σ = {0, 1}, and X is equally likely to produce 1's as 0's, then X has entropy 1. This denotes that each symbol carries one bit of information.

Remark. It can easily be checked that the Shannon entropy is maximized when X is a uniform random variable.

Conditional entropy follows as expected: the conditional Shannon entropy is simply the expected Shannon entropy of the conditional random variable. Formally, we have the following:

Definition 1.3. The conditional Shannon entropy of the random variable B with respect to X, defined when B and X are jointly distributed, is given by

    H(B|X) = E_{x←X}[ H(B|X=x) ]    (3)

These definitions are not an accident or an arbitrary choice. Shannon, 1948 justified this definition by proving the following beautiful result.

Theorem. (Shannon entropy is well-defined): For a random variable X with finite support S = {v_i} = supp(X), if p_i denotes the probability of the associated v_i, then Definition 1.2 is the only definition, up to constant scaling, that satisfies the following properties.

1. H(X) is continuous with respect to each p_i.

2. If all the p_i are equal, then H is a monotonically increasing function of |Σ|. This implies that with more symbols, it is possible to express more information.

3. The entropy of a random variable decomposes consistently no matter how you group its outcomes. That is, for any partition P = {S_k} of S, letting I denote the (random) index k for which X ∈ S_k,

    H(X) = H(I) + Σ_{S_k ∈ P} P[X ∈ S_k] · H(X | X ∈ S_k)    (4)

Proof. The proof makes use of basic properties of logarithms and probabilities, but is still pretty clever and unfortunately long. The proof is included in Shannon, 1948.

Sometimes, especially when trying to prove bounds on entropy, it is convenient to have the following additional definitions of entropy.

Definition 1.4. The Shannon min-entropy of a random variable X, denoted H_∞(X), is

    H_∞(X) = min_{x∈supp(X)} H_X(x)    (5)

Definition 1.5. The Shannon max-entropy of a random variable X, denoted H_0(X), is

    H_0(X) = max_{x∈supp(X)} H_X(x)    (6)

Remark. It is a bit confusing that H_0 denotes max-entropy while H_∞ denotes min-entropy.


It is easily verifiable from the definitions above that

    H_∞(X) ≤ H(X) ≤ H_0(X)    (7)

with equality only when X is flat.² This shows why min-entropy is useful when establishing lower bounds on H(X) and max-entropy is useful when establishing upper bounds on H(X). In our analysis of pseudorandom generators we will be especially interested in lower bounds on entropy, and in our analysis of commitment schemes we will be interested in upper bounds on entropy. The reasons for this will become clear in later sections.

² A random variable is flat if its probability distribution is spread evenly over its support.
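For concreteness, here is a minimal Python sketch (an illustration of ours, not part of the thesis's development) that computes the three entropy quantities for a finite distribution given as a probability vector and checks inequality (7):

```python
import math

def shannon_entropy(p):
    """H(X) = E[-log2 P[X=x]], ignoring zero-probability outcomes."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def min_entropy(p):
    """H_inf(X) = minimum sample entropy -log2 P[X=x] over the support."""
    return min(-math.log2(q) for q in p if q > 0)

def max_entropy(p):
    """H_0(X) = maximum sample entropy -log2 P[X=x] over the support (Definition 1.5)."""
    return max(-math.log2(q) for q in p if q > 0)

X = [0.5, 0.25, 0.125, 0.125]            # a non-flat distribution on four outcomes
print(min_entropy(X), shannon_entropy(X), max_entropy(X))   # 1.0, 1.75, 3.0
assert min_entropy(X) <= shannon_entropy(X) <= max_entropy(X)   # inequality (7)
```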

There is a variant of the Law of Large Numbers that we will use to relate Shannon entropy to min-entropy.

Proposition 1.1. You can convert a random variable X taking values in U and having a given Shannon entropy to a random variable X^t taking values in U^t that is within a small statistical distance of a random variable with a given Shannon min-entropy. Formally,

    P[ |H_{X^t}(x) - t·H(X)| ≤ O( √(t · log(1/ε)) · log(|U| · t) ) ] ≥ 1 - ε - 2^{-Ω(t)}    (8)

Proof. The proof uses Chernoff-Hoeffding bounds and some fairly technical calculations. For details, see Haitner, Reingold, and Vadhan, 2010.

Another important result is the following.

Proposition 1.2. It is impossible to increase the Shannon entropy of a random variable by passing it through a deterministic function:

    H(g(X)) ≤ H(X)    (10)

Proof. We will only prove this for the case when |supp(X)| < ∞, which is sufficient for our purposes. Enumerate supp(X) = {v_i} and for i ∈ [|supp(X)|] let p_i = P[X = v_i]. Then

    H(g(X)) = E_{y←g(X)}[ H_{g(X)}(y) ]
            = Σ_{y∈supp(g(X))} P[g(X) = y] · H_{g(X)}(y)
            = Σ_{y∈supp(g(X))} -P[g(X) = y] · log P[g(X) = y]
            = Σ_y -( Σ_{x∈g^{-1}(y)} P[X = x] ) · log( Σ_{x∈g^{-1}(y)} P[X = x] )
            ≤ Σ_{y∈supp(g(X))} Σ_{x∈g^{-1}(y)} -P[X = x] · log P[X = x]
            = Σ_{x∈supp(X)} -P[X = x] · log P[X = x]
            = Σ_{x∈supp(X)} P[X = x] · H_X(x)
            = E_{x←X}[ H_X(x) ]
            = H(X)

The crucial inequality above follows from the fact that the function t ↦ t log t is convex and equals 0 at t = 0, so that for 0 < x, y with x + y ≤ 1,

    (x + y) log(x + y) ≥ x log x + y log y    (11)

and hence -(x + y) log(x + y) ≤ -x log x - y log y.
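As a quick numerical sanity check of Proposition 1.2 (an illustration of ours; the example distribution and the merging function g are arbitrary), the following sketch computes H(X) and H(g(X)) for a small distribution and a deterministic, non-injective g:

```python
import math
from collections import defaultdict

def shannon_entropy(dist):
    """dist maps outcomes to probabilities; returns H in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def push_forward(dist, g):
    """Distribution of g(X): merge the probability mass of outcomes with equal image."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[g(x)] += p
    return dict(out)

X = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
g = lambda x: x % 2                     # deterministic and non-injective
gX = push_forward(X, g)                 # {0: 0.625, 1: 0.375}
assert shannon_entropy(gX) <= shannon_entropy(X)   # Proposition 1.2: ~0.954 <= 1.75
```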

It's important to keep Proposition 1.2 in mind, since its result will not hold for computational definitions of entropy.

The following proposition decomposes conditional Shannon entropy. It will have a statistical analogue (1.8) and a pseudoentropic one (3.4) later on.

Proposition 1.3. (Chain rule)

    H(X, B) = H(X) + H(B|X)    (12)

Proof. This proof runs very similarly to the proof of Proposition 1.2, and will not be included.

Remark. If B were a function f(X) of X, then we would have

    H(X, f(X)) = H(X) = H(X) + H(f(X)|X)    (13)

since H(f(X)|X) = 0. The quantity H(X|f(X)) is called the degeneracy of f.

1.2 A Historical Perspective of Randomness

As we'll see with our introduction to encryption schemes in Section 1.3.1, classical notions of entropy are not enough to be useful cryptographically. Instead, we will need to consider pseudoentropy, whose theory was developed by the search for a good definition of randomness. I find this topic fascinating, so this section is probably longer than it has to be for our purposes.

Page 3 of 26

Page 5: Computational Entropy and Cryptog raphic Constructions

Computational Entropy and Cryptographic Constructions

The question of what "randomness" is has been asked and answered in many different forms. This paper will expand on the three definitions that got the most attention, and will show why the last of these notions was chosen as the basis for cryptography. These three frameworks measure the randomness of sequences according to

1. how the observed frequencies of finite substrings in a particular sequence compare to the expected frequencies if the sequence were random noise;

2. how much information must be given for an algorithm to generate the sequence;

3. how much computation must be used in order to distinguish the sequence from the uniform distribution.

There are similarities in these definitions. In particular, definitions (1) and (3) each use the uniform distribution as a baseline for randomness, while definitions (2) and (3) use computational difficulty as a yardstick for measuring it. Before diving into the modern definition of pseudorandomness, definition (3), we will briefly explain the motivations and limitations of the other two methods.

1.2.1 The Statistical Interpretation

The first interpretation seeks to characterize a random sequence as one that satisfies certain statistical properties of truly random noise. Truly random binary noise, for example, would be expected to have the same frequency of 1's and 0's. Given a (finite or infinite) binary sequence s, it is possible to sample the sequence in order to establish a confidence interval for the frequency of 1's in s. Thus it is possible to conduct statistical tests that accept or reject candidate random sequences with a certain degree of confidence. Additional statistical tests, for example sampling the distribution of length-n substrings of s, could also be added to the hypothesis test for s being random.

This definition of randomness was highly motivated by the concept of a normal number, which can be viewed as a sequence that will pass all simple statistical randomness tests like those described above, despite not itself being random.

Definition 1.6. A number x ∈ R written in an alphabet Σ is called a normal number if for any finite string w ∈ Σ* the number of times w appears as a substring of the first n digits of x, denoted N_x(w, n), converges to its statistical expectation:

    lim_{n→∞} N_x(w, n) / n = 1 / |Σ|^{|w|}    (14)

Remark. Although Borel, 1909 used measure theory to show that almost all numbers are normal, very few numbers have been proven to be normal. In particular, it is an open question whether π, e, and √2 are normal.

Remark. Recent research in the theory of normal numbers has been inspired by the theory of pseudorandom number generators. In particular, Bailey and Crandall, 2001 showed that if there exist pseudorandom generators with certain properties, then several famous mathematical constants, including π, would be normal.
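As a small empirical illustration of Definition 1.6 (an addition of ours, not from the thesis), the sketch below estimates N_x(w, n)/n for Champernowne's constant 0.123456789101112..., one of the few numbers proven normal in base 10, and compares it to 1/|Σ|^{|w|}:

```python
def champernowne_digits(n):
    """First n decimal digits of Champernowne's constant (after the decimal point)."""
    digits, i = [], 1
    while len(digits) < n:
        digits.extend(str(i))
        i += 1
    return "".join(digits[:n])

def substring_frequency(s, w):
    """Empirical N_x(w, n)/n: occurrences of w in s, overlaps allowed."""
    count = sum(1 for i in range(len(s) - len(w) + 1) if s[i:i + len(w)] == w)
    return count / len(s)

s = champernowne_digits(1_000_000)
for w in ["7", "42", "123"]:
    print(w, substring_frequency(s, w), 10 ** -len(w))   # empirical vs. 1/|Sigma|^|w|
```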

Although normal numbers offer an "ideal" of this type of randomness, they demonstrate the limitations of basing cryptography on statistical randomness alone. Explicitly:

1. It is hard to develop an exhaustive set of tests against which we want to guarantee a string looks random.

2. It is not clear whether making a string statistically random will ensure that it would be impossible for an adversary to discover the function generating the random sequence, and hence "break" the code.

1.2.2 The Generative Interpretation

The next interpretation of entropy brings in the element of computation; its formulation is due to Kolmogorov, 1968. Informally, the Kolmogorov entropy of a sequence s is the length of the shortest possible program p that outputs the string s. Since not all real numbers are computable,³ it is possible that (infinite) sequences have infinite Kolmogorov entropy. The formal definition is below.

Definition 1.7. The Kolmogorov entropy K of a (finite or infinite) sequence s relative to the programming paradigm φ is

    K_φ(s) = min_{p : φ(p)=s} |p|,  and K_φ(s) = ∞ if no p satisfies φ(p) = s.    (15)

Although this definition of entropy is beautiful and intuitive, there are a few reasons why it is an intractable starting point for cryptographic applications.

1. As can be seen from Definition 1.7, the Kolmogorov entropy depends on fixing a particular programming language φ.

2. The Kolmogorov entropy (even for finite sequences) is uncomputable.⁴

3. Even if we knew that a sequence s had Kolmogorov entropy k < ∞, there is no guarantee that the program represented by k bits of information could efficiently produce the sequence s from a complexity point of view. It could, for example, take exponential time.

³ For a number to be computable, a finite program must output that number on its Turing Machine tape. But there are only countably many finite programs and uncountably many real numbers, so there must exist real numbers that are not computable.

⁴ This is a fun problem to think about. Its proof is beyond the scope of this paper.


1.2.3 The Indistinguishability Interpretation

The third notion of randomness to be developed historically was that of computational indistinguishability. Like the notion of randomness measured by Kolmogorov entropy, this definition of randomness comes with its own, computationally grounded definition of entropy.

Though it will take some significant work to define the notion of "computationally indistinguishable" used in the definition below, we follow Vadhan, 2012 by informally defining pseudoentropy first.

Definition 1.8. (Informal) We say that the pseudoentropy, sometimes called "computational entropy", of a random variable X is at least k if and only if there exists a random variable Y such that the following two conditions hold:

1. Y ≡_c X. The notation Y ≡_c X means that X is computationally indistinguishable from Y.

2. H(Y) ≥ k, where H(·) denotes Shannon entropy from Definition 1.2.

This definition makes a lot of intuitive sense, since it defines the computational entropy of a random variable X to be the supremum of the entropies of random variables that look computationally identical to X. In practical situations, it will thus be impossible for any polynomial-time algorithm to treat X any differently than it would treat the (potentially higher Shannon entropy) random variable Y.

Pseudorandomness, as defined in cryptography and from now on in this paper, is not equivalent to pseudoentropy. To be pseudorandom, a random variable must not merely have pseudoentropy exceeding its real entropy; it must also look uniform.

Definition 1.9. (Informal) We say that a function g is a pseudorandom generator if

1. its outputs are longer than its inputs;

2. its outputs are computationally indistinguishable from being uniformly distributed.

All pseudorandom generators are also pseudoentropy generators, but the converse does not hold.

Example. If g is a PRG, then g′ defined to be g with a prepended 1 (g′(x) = 1‖g(x)) is pseudoentropic but not pseudorandom. Clearly it is not pseudorandom, because the first bit of its output is always 1 and hence its output does not look uniform. But it is still pseudoentropic, since g′ is still stretching and it is still just as hard to recover the input from the output.

Section 3 will focus on formalizing Definition 1.8, and Section 4.1 will show how to make entropy of this kind useful in the construction of pseudorandom generators, the topic of Section 5.

1.3 Cryptography Background

There are three important primitives from cryptography that are necessary to explain in order to motivate the formulation of pseudoentropy. These are (i) encryption schemes, (ii) pseudorandom generators, and (iii) commitment schemes. We'll give a quick survey of each of these areas.

1.3.1 Basic Encryption Scheme

To motivate the development of pseudorandom generators in the next section, we will give a cryptographic use case for them within the context of encryption algorithms. We use definitions aligning with those in Arora and Barak, 2009.

Definition 1.10. An encryption scheme is an ordered pair (E, D) of deterministic encryption and decryption algorithms E and D that, when given a private key k, operate in the following way.

1. In the encryption phase, E_k operates on x, the plaintext (also called the message), to produce y, the ciphertext: E_k(x) = y.

2. In the decryption phase, D_k operates on y to recover x: D_k(y) = x.

Definition 1.11. An encryption scheme (E, D) is perfectly secret if it is impossible for an adversary to acquire any information about a plaintext message from observing its ciphertext. This means that if E_k : {0,1}^n → {0,1}^m with k ∈ {0,1}^l, then for all messages x, x′ ∈ {0,1}^n, E_{U_l}(x) and E_{U_l}(x′) are identical as distributions over {0,1}^m.

Remark. Note that it is assumed that k is private but that E and D are potentially known to any attacker.

Example. The one-time pad encryption scheme is perfectly secret. This encryption scheme takes plaintexts in {0,1}^n and produces ciphertexts in {0,1}^n. Its secret key is k ∈ {0,1}^n. E and D are defined by the following equations:

    E_k(x) = k ⊕ x
    D_k(y) = k ⊕ y

where a ⊕ b denotes the bit-wise XOR of a and b with |a| = |b|. We must check two things:

• We need to ensure that every ciphertext can be decrypted: D_k(E_k(x)) = D_k(k ⊕ x) = k ⊕ (k ⊕ x) = (k ⊕ k) ⊕ x = 0^n ⊕ x = x.

• We need to ensure that the encryption scheme has perfect secrecy. Since each bit of k is independent of the other bits of k, each bit of y is as well. Thus the encryption scheme has perfect secrecy if any one bit has perfect secrecy. But if k_j, a single bit of k, is uniformly distributed over {0,1}, then y_j = x_j ⊕ k_j is also uniformly distributed.


Remark. A one-time pad can only be used with perfect secrecy once. Otherwise, if y_1 = x_1 ⊕ k and y_2 = x_2 ⊕ k, then y_1 ⊕ y_2 = (x_1 ⊕ k) ⊕ (x_2 ⊕ k) = x_1 ⊕ x_2, which gives information about the distributions of x_1 and x_2.
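The following toy sketch (our illustration; the byte-level encoding and the sample messages are arbitrary) shows the one-time pad in action and, per the remark above, how reusing the key leaks x_1 ⊕ x_2:

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

# One-time pad: encryption and decryption are the same XOR with the key.
x = b"HEADS"
k = secrets.token_bytes(len(x))      # uniformly random key of the same length
y = xor_bytes(k, x)                  # E_k(x) = k XOR x
assert xor_bytes(k, y) == x          # D_k(E_k(x)) = x

# Two-time pad: reusing k leaks the XOR of the two plaintexts.
x2 = b"TAILS"
y2 = xor_bytes(k, x2)
assert xor_bytes(y, y2) == xor_bytes(x, x2)   # y1 XOR y2 = x1 XOR x2, independent of k
```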

In fact, as proven by Shannon, 1949, the one-time pad of the example above is the best we can do if we require perfect secrecy. Any scheme with |k| < n allows adversaries to obtain a posteriori information about the plaintext.

Proposition 1.4. Assume an encryption scheme (E, D) with E_k : {0,1}^n → {0,1}^m and key length |k| = l < n. Then (E, D) cannot be perfectly secret.

Proof. For any x ∈ {0,1}^n, define

    L_x = { y ∈ {0,1}^m : y = E_k(x) for some k ∈ {0,1}^l }    (16)

If the scheme were perfectly secret, the distributions E_{U_l}(x′) and E_{U_l}(x′′) would be identical for all x′, x′′, so in particular their supports would agree: L_{x′} = L_{x′′} = L for all x′, x′′ ∈ {0,1}^n.

Then we count. Since there are only 2^l keys, |L| ≤ 2^l. On the other hand, fix any single key k. Since E_k must be injective if ciphertexts are to be decryptable, the 2^n values E_k(x), one per plaintext x, are all distinct, and each of them lies in L. Hence |L| ≥ 2^n > 2^l ≥ |L|, a contradiction. Thus (E, D) cannot be perfectly secret.

The motivation for pseudorandom generators is the following. If we are able to transform the key k in a way that preserves the usefulness of its randomness but increases its size, then it may be possible to at least make it computationally hard to distinguish E_{U_l}(x) from E_{U_l}(x′). Thus we could at least have some guarantees about the security of encryption schemes against efficient adversaries.

1.3.2 Pseudorandom Generators

The formal definition of a pseudorandom generator (PRG) is based on the informal one we gave in Section 1.2.3.

Definition 1.12. Let l : N → N be a polynomial-time computable function such that l(n) > n for every n, and let G : {0,1}^n → {0,1}^{l(n)} be a polynomial-time computable function. We say that G is a pseudorandom generator of stretch l(n) if for every probabilistic poly-time algorithm A there exists a negligible function ε : N → [0,1] such that

    | P[A(G(U_n)) = 1] - P[A(U_{l(n)}) = 1] | < ε(n)    (17)

Remark. Sometimes PRGs are described according to their stretch and sometimes according to their seed length. The stretch of a PRG is the number of output bits produced by G as a function of the number of input bits. The seed length is the number of input bits required.

In essence, a PRG is an efficiently computable function that takes a small number of bits and produces a longer stream of bits that cannot be distinguished from uniform bits by any computationally efficient algorithm. The condition that the PRG output be indistinguishable from the uniform distribution seems like a tough condition to meet, but it can actually be ensured by meeting a much weaker-looking condition, that of unpredictability (this is Yao's Theorem, the subject of Section 2.1).

1.3.3 Commitment Schemes

A commitment scheme is inspired by the problem of how to flip a coin over the phone. Imagine two parties: a sender S who must "call the flip", and a receiver R who will perform the coin flip after hearing what S called. Clearly, if S does not trust R, then S cannot send its guess to R without being worried that an adversarial R would simply report that the coin flip did not go in S's favor.

Knowing this in advance, R and S agree on a hash function h : {0,1}^n → {0,1}^m with m < n and agree that S will send h(x) to R, where x encodes its guess for the coin flip. Additionally, R and S agree on some convention for which strings in {0,1}^n correspond to heads and tails (perhaps according to the parity of the number of 1's in the message). Then, since h(·) is a hash function, R will not be able to invert h(x) to recover x. Thus R must be honest about the coin flip it performs.

Now comes the second tricky part: if S was correct in its call, then S must prove this to R. It does this by sending R its original value of x. Should R trust that this was the same x that S came up with before the coin was flipped? The answer depends on how difficult it is for S to find another value x′ such that h(x′) = h(x) yet x′ corresponds to the opposite guess for the coin flip. It turns out that there exist functions h for which finding such an x′ is computationally infeasible, giving R a computational guarantee on the integrity of the protocol.

Having motivated their development, we now formally define a commitment scheme along with the two properties that define its usefulness.

Definition 1.13. A commitment scheme (S, R) is a two-party, two-stage protocol between a sender S and a receiver R, with a security parameter 1^n available to both S and R.

1. In the first stage of the protocol, S commits to a private message m ∈ {0,1}^n. The result of this commitment is a joint output c, the commitment, and a private output d to S, the decommitment.

2. In the second stage of the protocol, S sends (d, m) to R, and R either accepts or rejects.

Remark. We can think of d as random coins available to S.
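To make Definition 1.13 and the coin-flipping motivation concrete, here is a toy sketch of ours using SHA-256 as a stand-in for the function h; whether this gives the hiding and binding properties defined next is only a heuristic assumption about SHA-256, not something established here.

```python
import hashlib
import secrets

def commit(message: bytes):
    """Commit stage: output the commitment c and keep the decommitment d private."""
    d = secrets.token_bytes(32)                   # random coins held by the sender
    c = hashlib.sha256(d + message).digest()
    return c, d

def verify(c: bytes, d: bytes, message: bytes) -> bool:
    """Reveal stage: the receiver checks that (d, m) opens the commitment c."""
    return hashlib.sha256(d + message).digest() == c

# Coin flip over the phone: S commits to its call, R flips, then S reveals.
c, d = commit(b"heads")                            # S -> R: c
coin = secrets.choice([b"heads", b"tails"])        # R flips after seeing only c
assert verify(c, d, b"heads")                      # S -> R: (d, m); R accepts the opening
print("S wins" if coin == b"heads" else "R wins")
```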


Figure 1: A very basic schematic of a commitment scheme. Commitment schemes involve only two interactions. They are useful when events (like the coin flip problem) occur between the commit and reveal phases.

Definition 1.14. A commitment scheme (S, R) is complete if R always accepts when interacting with an honest S.

Definition 1.15. A commitment scheme (S, R) is hiding if there does not exist an adversarial R* that is able to get information about m from the commit stage alone. This means that for all security parameter values n ∈ N, for all messages m, m′, and for all R*,

    View_{R*}(S(m), R*) ≡ View_{R*}(S(m′), R*)

where S(m) denotes the sender committing to message m.

Remark. The notation View_T(S, R) denotes all the information about m that T could possibly obtain through its interactions with S and R.

Definition 1.16. A commitment scheme is binding if no S* is able to reveal different information than m after it has committed to m by sending c to R. This means that every S* succeeds in the following game with only negligible probability in n.

GAME CHEAT-COMMIT

1. On security parameter 1^n, S* interacts with R in the commit stage, resulting in a commitment c.

2. S* outputs two pairs (d, m) and (d′, m′) with m ≠ m′ and R(c, d, m) = R(c, d′, m′) = ACCEPT.

The above definitions are imprecise because they don't specify what S* and R* are allowed to be. This is because the hiding and binding properties of a commitment scheme may each come at two⁵ different strength levels: computational or statistical. For computational security, we require that the adversarial algorithm S* or R* be a PPT. For statistical security, we allow S* or R* to have computationally unbounded power. Statistical security is thus a higher standard of security, and implies computational security.

⁵ There is also a stronger notion, that of perfect security, which is stronger than even statistical security. It is not relevant for this paper. Roughly, perfectly secure schemes require that S* and R* be able to do no better than guessing, which is stronger than saying they can succeed only with negligible probability.

As intuition might tell you, commitment schemes cannot be both statistically binding and statistically hiding. This means the following.

Proposition 1.5. It is impossible for a commitment scheme (S, R) to be both statistically binding and statistically hiding.

Proof. Assume (S, R) is statistically hiding. This means that even a computationally unbounded R is unable to get any information about m from the commitment c. Since the process of computing a commitment is known to both parties, there must exist (m, d) ≠ (m′, d′) such that both (m, d) and (m′, d′) produce c as a commitment; if this were not the case, R could recover a unique (m, d) from c. But the existence of these multiple openings shows that a computationally unbounded S can produce either (m, d) or (m′, d′) in the reveal phase, meaning that (S, R) cannot be statistically binding.

Because of the above result, it is redundant to qualify both the hiding and the binding strength of a commitment scheme. Commitment schemes which are statistically binding and computationally hiding are simply called statistically binding, and commitment schemes which are statistically hiding and computationally binding are simply called statistically hiding.

1.4 Mathematical Tools

There are several mathematical tools that will come in handy and should be described before they are utilized.

1.4.1 One-Way Functions

One-way functions (OWFs) are the cryptographic primitive that we will use to construct computational entropy in later sections. They are functions that are hard to invert yet still easy to compute.

First, a quick related definition.

Definition 1.17. A function ε(n) is called negligible if it is eventually smaller than every inverse polynomial. Formally, we say ε(n) is negligible if

    ∀c ∃N_c s.t. ∀n > N_c : |ε(n)| < 1/n^c    (18)

Remark. Oftentimes ε(n) is referred to simply as ε, and the reader is expected to understand from context that ε is a function and not just a small constant. For example, ε(n) = 2^{-n} is negligible, while ε(n) = 1/n^{100} is not.

Proposition 1.6.

    ε is negligible ⟺ ∀c ∈ N : lim_{n→∞} ε(n) · n^c = 0    (19)


Remark. The proof of this proposition is not very insightful and thus will not be included. It is a useful proposition since limits are somewhat more intuitive than universal and existential quantifiers.

Now we move on to the main definition of this section.

Definition 1.18. A deterministic polynomial-time computable function f : {0,1}* → {0,1}* is said to be a one-way function (OWF) if for every probabilistic polynomial-time algorithm A there is a negligible function ε : N → [0,1] such that for every n ∈ N

    P_{x←{0,1}^n}[ f(A(f(x))) = f(x) ] < ε(n)    (20)

Remark. You might wonder why negligible functions in particular are chosen for this bound. In lieu of ε(n), it would be tougher to require a 2^{-n} bound, which would correspond to A not being able to do any better than guessing. Bounds this strong, however, are not necessary for remaining secure against computationally bounded adversaries, who can only repeat processes a polynomial number of times. Thus, using negligible functions is the more natural option.

Despite OWFs being a building block for the rest of our analysis, their existence has not been proven.

Proposition 1.7. Complexity theory has not yet been able to determine which of the following worlds⁶ we live in:

1. P = NP and no one-way functions exist.
2. P ≠ NP and no one-way functions exist.
3. P ≠ NP and one-way functions exist.

Most complexity theorists believe that OWFs exist, and their confidence is rooted in the fact that many well-studied functions appearing naturally in math and computation seem to be one-way: they have thus far evaded efficient inversion algorithms. These include multiplication, the RSA function, and the computation of quadratic residues (Arora and Barak, 2009).

In working with OWFs, it is sometimes helpful to have a large collection of OWFs which differ only in the setting of several initial parameters.

Definition 1.19. A one-way function family F is a finite collection of functions {f_i : {0,1}^n → {0,1}^m}_{i∈I} such that each f_i is a OWF.

Let's look at a well-known example of such a function family.

Definition 1.20. A collision-resistant hash function family is a function family F = {h_i} such that each h_i : {0,1}^n → {0,1}^m ∈ F is compressing (satisfies m < n) and it is hard to find collisions: for every PPT A there is a negligible ε such that

    P_{x_0}[ A(h_i, x_0) = x_1 with x_0 ≠ x_1 and h_i(x_0) = h_i(x_1) ] < ε(n)

⁶ Russell Impagliazzo's famous "five worlds" of possible complexity universes partitions the set of possible worlds even further (Impagliazzo, 1995).

Remark. We will use these functions later in the construction of commitment schemes.

Definition 1.21. A family of functions F_k = {F_z : {0,1}^{n(k)} → {0,1}^{m(k)}}_{z∈{0,1}^k} is a family of universal one-way hash functions (UOWHFs) if it satisfies:

1. Efficiency: Given z ∈ {0,1}^k, F_z(x) can be evaluated in time poly(n(k), k).

2. Shrinking: m(k) < n(k).

3. Target Collision Resistance: For all PPTs A, P[A succeeds] < ε(k), where ε is negligible and the game is the following:

GAME MAKE-COLLISION

(a) A receives security input 1^k and a description of the family F (but not yet a specific key).

(b) A may choose any x.

(c) A key z is given to A, drawn uniformly at random from {0,1}^k.

(d) A succeeds if it outputs x′ ≠ x such that F_z(x) = F_z(x′).

Remark. UOWHFs differ from CRHFs because in a CRHF scheme A is allowed to find any pair of distinct preimages that evaluate to the same image after being given the specific key, whereas in a UOWHF scheme A must decide which value to find a collision for before getting the key.

As with the construction of PRGs from OWFs, UOWHFs were first constructed from one-way permutations, and this method was then generalized to handle OWFs in general (Naor and Yung, 1989).

A general OWF f : {0,1}^n → {0,1}^m may have m > n, m < n, or m = n; Definition 1.18 places no constraint on the relative sizes of the domain and range, since the algorithm A only needs to find some preimage of y = f(x). When f is an OWF with m = n that is also a bijection, we call f a one-way permutation, defined below. Assuming the existence of one-way permutations is useful in many constructions that use OWFs, since it avoids the need for randomness extractors (to be defined later).

Definition 1.22. A one-way permutation (OWP) f is a OWF that permutes its domain, i.e. a bijection f : {0,1}^n → {0,1}^n.

Remark. It is not known whether one-way permutations exist, or whether their existence is implied by the existence of OWFs. Several known function families are believed to be OWP families, one of which is the RSA function (see the example below).

Example. The RSA function family is believed to be a family of OWPs. It is defined by the following.

• The index set I = {(n, e) : n = pq for primes p, q, and gcd(e, φ(n)) = 1}.

• An index can be sampled in probabilistic polynomial time by

  1. generating random k-bit primes p and q (this can be done efficiently because primality testing is in polynomial time);

  2. computing n = pq;

  3. generating a random e < n and checking that gcd(e, φ(n)) = 1.

• The function f_{(n,e)} is defined by

    f_{(n,e)}(x) = x^e mod n    (21)
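As an illustration of Equation (21) only (toy parameters chosen by us, far too small to be one-way in practice), the following sketch builds an RSA index, evaluates f_{(n,e)}, and inverts it using the factorization, showing that f is a permutation of Z_n^* that is easy to invert given the trapdoor:

```python
from math import gcd

# Toy RSA index: n = pq with gcd(e, phi(n)) = 1 (tiny primes, illustration only).
p, q, e = 61, 53, 17
n = p * q
phi = (p - 1) * (q - 1)
assert gcd(e, phi) == 1

def f(x: int) -> int:
    """f_{(n,e)}(x) = x^e mod n."""
    return pow(x, e, n)

# With the factorization, the permutation is easy to invert: d = e^{-1} mod phi(n).
d = pow(e, -1, phi)
x = 1234
assert gcd(x, n) == 1
y = f(x)
assert pow(y, d, n) == x   # easy to invert *given* p and q; conjecturally hard without them
```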

We bring up OWPs because although PRGs and UOWHFs can be constructed from the minimal assumption that OWFs exist, many things are often easier for OWPs. In particular, the Goldreich-Levin Theorem (see Section 2.2) constructs a PRG from a OWP in a way that does not generalize to arbitrary OWFs.

1.4.2 Statistical Divergence

We will use statistical methods to characterize pseudoentropy in Section 3.3. In particular, we need to be able to describe precisely the distance between distributions. The most natural way to do this is through statistical distance.

Definition 1.23. The statistical distance between two distributions X and Y taking values in a universe U is defined to be

    Δ(X, Y) = max_{T ⊆ U} | P[X ∈ T] - P[Y ∈ T] |    (22)

This definition of statistical distance will come in handy in Section 5.1, when we show how to create a distribution statistically close in this sense to another distribution with a bound on its min-entropy.

We will find another notion of distance between distributions to be useful when proving properties about pseudoentropy.

Definition 1.24. For random variables X and Y, the Kullback-Leibler divergence (KL divergence) from X to Y is

    KL(X‖Y) = E_{x←X}[ log( P[X = x] / P[Y = x] ) ]    (23)

with KL(X‖Y) := ∞ if there exists x ∈ supp(X) such that P[X = x] ≠ 0 while P[Y = x] = 0.

Remark. This is not a metric. It is easy to see that it is not even symmetric.

For reasons that will become clear in Section 3, KL divergence behaves similarly to entropy in many ways. One such way is that it has a chain rule. To see this in action we need to define conditional KL divergence.

Definition 1.25. For jointly distributed pairs (X, A) and (Y, B), the conditional KL divergence from A|X to B|Y is defined to be

    KL( (A|X) ‖ (B|Y) ) = E_{(x,a)←(X,A)}[ log( P[A = a | X = x] / P[B = a | Y = x] ) ]

Proposition 1.8. (Chain rule) For jointly distributed (X, A) and (Y, B), the following identity holds:

    KL( (X, A) ‖ (Y, B) ) = KL(X ‖ Y) + KL( (A|X) ‖ (B|Y) )

Proof. We use simple conditioning to acquire the following:

    P[X = x ∧ A = a] = P[X = x] · P[A = a | X = x]
    P[Y = x ∧ B = a] = P[Y = x] · P[B = a | Y = x]

From Definition 1.24 we can write

    KL( (X, A) ‖ (Y, B) ) = E_{(x,a)←(X,A)}[ log( P[X = x ∧ A = a] / P[Y = x ∧ B = a] ) ]

Plugging in the formulas acquired through conditioning and splitting the logarithm of a product into a sum, we acquire our result. Slight care must be taken in the case that some KL divergence is infinite.
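The following sketch of ours computes the quantities from Definitions 1.23-1.25 for small finite distributions (given as dictionaries of probabilities, a representation we choose for illustration) and numerically verifies the chain rule of Proposition 1.8:

```python
import math

def statistical_distance(P, Q):
    """Delta(X, Y) = max_T |P[X in T] - P[Y in T]| = (1/2) * sum_x |P(x) - Q(x)|."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0) - Q.get(x, 0)) for x in support)

def kl(P, Q):
    """KL(X || Y) in bits; infinite if supp(P) is not contained in supp(Q)."""
    total = 0.0
    for x, p in P.items():
        if p == 0:
            continue
        if Q.get(x, 0) == 0:
            return math.inf
        total += p * math.log2(p / Q[x])
    return total

def marginal(J):
    """Marginal of the first coordinate of a joint distribution over pairs (x, a)."""
    M = {}
    for (x, a), p in J.items():
        M[x] = M.get(x, 0) + p
    return M

def conditional_kl(P, Q):
    """KL((A|X) || (B|Y)) = sum_{x,a} P(x,a) * log2( P(a|x) / Q(a|x) )."""
    Px, Qx = marginal(P), marginal(Q)
    return sum(p * math.log2((p / Px[x]) / (Q[(x, a)] / Qx[x]))
               for (x, a), p in P.items() if p > 0)

# P plays the role of (X, A), Q of (Y, B), both over pairs (x, a).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
Q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

lhs = kl(P, Q)
rhs = kl(marginal(P), marginal(Q)) + conditional_kl(P, Q)
assert abs(lhs - rhs) < 1e-9            # Proposition 1.8 (chain rule)
print(statistical_distance(P, Q), lhs)
```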

The last result we will preemptively prove about KL divergence is that it satisfies an analogous property to entropy when passed through a functional transformation.

Proposition 1.9. For any random variables A, B and any function g,

    KL( g(A) ‖ g(B) ) ≤ KL( A ‖ B )    (24)

Proof. The proof follows the general steps of the proof of Proposition 1.2, and thus will not be repeated.

1.5 Statistics Review

Since we will use it later in the proof of the Goldreich-Levin Theorem in Section 2.2, we are going to prove Chebyshev's Inequality. To do so, we will use Markov's Inequality, which we shall prove first. We will prove both theorems in the case of discrete random variables rather than continuous ones, since that is the use case we will have in the theorems to follow.

Theorem. (Markov) If X is a non-negative random variable and a > 0, then

    P[X ≥ a] ≤ E[X] / a    (25)

Proof. Let I_{X≥a} be the indicator random variable for the event that X ≥ a. Then we trivially have the inequality

    X ≥ a · I_{X≥a}    (26)

We can take expectations of both sides, since both sides are integrable, to get

    E[X] ≥ a · E[I_{X≥a}] = a · ( 1·P[X ≥ a] + 0·P[X < a] ) = a · P[X ≥ a]    (27)


Dividing both sides by a does not reverse the inequality, since a > 0 by assumption, so

    P[X ≥ a] ≤ E[X] / a    (30)

Proposition 1.10. (Chebyshev) For a random variable X with finite variance, the following inequality holds:

    P[ |X - E[X]| ≥ k·√(Var[X]) ] ≤ 1/k²    (31)

Proof. We apply Markov's Inequality to the non-negative variable Y = (X - E[X])² with the constant a = k² · Var[X]. Since E[Y] = Var[X], this gives P[Y ≥ k² · Var[X]] ≤ 1/k², and taking a square root inside the event yields the inequality we sought.

There’s another proposition that will be useful.

Proposition 1.11. If the random variables X_1, X_2, ..., X_n are pairwise independent, then

    Var[ Σ_j X_j ] = Σ_j Var[X_j]    (32)

Proof.

    Var[ Σ_j X_j ] = Σ_i Σ_j Cov[X_i, X_j]
                   = Σ_i Var[X_i] + Σ_{i≠j} Cov[X_i, X_j]
                   = Σ_i Var[X_i]

where we used the fact that for X_i, X_j pairwise independent (i ≠ j), Cov[X_i, X_j] = 0.

2 Previous Results

There are two major results that preceded the formulation of computational entropy but are too significant to relegate to the background section. These are Yao, 1982's equivalence of pseudorandomness and next-bit unpredictability, and the Goldreich-Levin Theorem.

2.1 Next-Bit Unpredictability and Yao's Theorem

The definition of a PRG from Section 1.3.2 guarantees a condition on the statistical distribution of its output. Another way of thinking about pseudorandomness is to think about how difficult it would be to predict the next bit of output from the previous bits. Yao's Theorem equates this computational unpredictability with our familiar notion of computational indistinguishability.

A function generating "unpredictable" bits is defined below.

Definition 2.1. Let G : {0,1}^n → {0,1}^{l(n)} be a polynomial-time computable function with stretch l(n). G is next-bit unpredictable if for every probabilistic polynomial-time B there is a negligible function ε : N → [0,1] such that for every i ∈ [l(n)]

    P_{x←{0,1}^n, y=G(x)}[ B(1^n, y_1, ..., y_{i-1}) = y_i ] ≤ 1/2 + ε(n)    (33)

Remark. A polynomial-time computable function B, if it were able to break the condition above, would be called a predictor of G.

It seems that this notion of next-bit unpredictability is weaker than that of pseudorandomness provided in Definition 1.12. If a function were a pseudorandom generator under that definition, then clearly it would also be next-bit unpredictable.

Lemma. Let G : {0,1}^n → {0,1}^{l(n)} be a PRG with stretch l(n). Then G is next-bit unpredictable.

Proof. Suppose there existed a predictor B for predicting y_i from y_1, ..., y_{i-1} such that

    P_{x←{0,1}^n, y=G(x)}[ B(1^n, y_1, ..., y_{i-1}) = y_i ] > 1/2 + ε(n)    (34)

for every negligible function ε. Then we could use B to distinguish between U_{l(n)} and G(U_n). If w is the string we want to classify as drawn from U_{l(n)} or from G(U_n), we simply evaluate b = B(1^n, w_1, ..., w_{i-1}) and guess that w was drawn from G(U_n) if b = w_i, and that it was drawn from U_{l(n)} otherwise. By Equation (34), the probability that this classification is correct is large enough to violate Condition (17) from the definition of a PRG. Hence G cannot be a PRG unless it is next-bit unpredictable.

Surprisingly, we can go the other direction as well. We'll now move on to showing this equivalence.

Following the strategy of Arora and Barak, 2009, we will first show the following lemma.

Lemma. For every PPT algorithm A there exists a PPT B such that for every n ∈ N and ε > 0, if

    P[A(G(U_n)) = 1] - P[A(U_{l(n)}) = 1] ≥ ε

then there exists i ∈ [l(n)] such that

    P_{x←{0,1}^n, y=G(x)}[ B(1^n, y_1, ..., y_{i-1}) = y_i ] ≥ 1/2 + ε/l(n)    (35)

Proof. The strategy is to use our algorithm A to construct an algorithm B with the properties we claim.

First, let's define our predictor B. On input 1^n, i ∈ [l(n)], and y_1, ..., y_{i-1}, B chooses


z_i, z_{i+1}, ..., z_{l(n)} independently and uniformly at random (which it can do, being probabilistic). It then computes

    a = A(y_1, ..., y_{i-1}, z_i, z_{i+1}, ..., z_{l(n)})    (36)

If a = 1, then B guesses that z_i was the correct guess for the next bit and outputs z_i. Otherwise it outputs the complement 1 - z_i.

Now we need to show that B satisfies the condition in Equation 35. It helps to introduce some notation. For 0 ≤ i ≤ l(n), define the distribution D_i to be the distribution formed by taking x ← {0,1}^n, letting y = G(x), and outputting y_1, ..., y_i, z_{i+1}, ..., z_{l(n)}, where the z_k's are generated independently and uniformly at random. This construction is chosen so that D_0 = U_{l(n)} and D_{l(n)} = G(U_n), with the intermediate distributions providing a transition between the two.

By the condition on the algorithm A, we know that

    P[A(D_{l(n)}) = 1] - P[A(D_0) = 1] ≥ ε  ⟹  E_i[ P[A(D_i) = 1] - P[A(D_{i-1}) = 1] ] ≥ ε / l(n)

where the expectation is over a uniformly random i ∈ [l(n)] (the differences telescope). By this expectation, B will satisfy the condition in Equation 35 for some i provided that, for every i, we have

    P[ B(1^n, y_1, ..., y_{i-1}) = y_i ] ≥ 1/2 + ( P[A(D_i) = 1] - P[A(D_{i-1}) = 1] )    (37)

B predicts the value y_i correctly if either a from Equation 36 equals 1 and y_i = z_i, or a = 0 and y_i = 1 - z_i. We can condition on whether z_i = y_i; the probability of this event, since z_i is chosen independently and uniformly, is 1/2. Note that conditioned on z_i = y_i, the input to A is distributed exactly as D_i. Thus

    p_correct = (1/2)·P[a = 1 | z_i = y_i] + (1/2)·(1 - P[a = 1 | z_i ≠ y_i])
              = (1/2)·P[A(D_i) = 1] + (1/2)·(1 - P[a = 1 | z_i ≠ y_i])

To get rid of the remaining conditional term, we notice another relationship. Not conditioning on z_i = y_i gives us

    P[A(D_{i-1}) = 1] = (1/2)·P[a = 1 | z_i = y_i] + (1/2)·P[a = 1 | z_i ≠ y_i]
                      = (1/2)·P[A(D_i) = 1] + (1/2)·P[a = 1 | z_i ≠ y_i]

Combining these results gives us

    p_correct = (1/2)·P[A(D_i) = 1] + (1/2)·(1 - P[a = 1 | z_i ≠ y_i])
              = 1/2 + P[A(D_i) = 1] - P[A(D_{i-1}) = 1]

This lets us conclude that the condition in Equation 37 is satisfied, and hence our construction of B works.

Now that we have shown this lemma, we can prove what we claimed earlier.

Theorem. Let G : {0,1}^n → {0,1}^{l(n)} be next-bit unpredictable. Then G is a PRG with stretch l(n).

Proof. We prove this claim by contradiction. Assume that G is next-bit unpredictable but is not a PRG. Then there is some PPT algorithm A and some non-negligible function ε such that for infinitely many n

    | P[A(G(U_n)) = 1] - P[A(U_{l(n)}) = 1] | ≥ ε(n)    (38)

We can remove the absolute values by considering the algorithm that returns 1 - A if necessary. For each of these n, by the lemma above we can construct a polynomial-time predictor B that succeeds with probability at least 1/2 + ε(n)/l(n), which is non-negligibly greater than 1/2. This contradicts the assumption that G is next-bit unpredictable; hence, if G is next-bit unpredictable, it must be a PRG.

Yao's Theorem is immediately useful because it provides another way to prove that a function is a PRG. More than that, though, it is an encouraging example of how understanding a property of entropy (computational indistinguishability equals computational unpredictability) leads to a statement about cryptography. In much the same way, Section 3 develops insights about pseudoentropy that are then used in constructions later on.

2.2 Goldreich-Levin Theorem

Up until this point, we haven't shown how OWFs are connected to pseudorandomness. In this section, we show a way to construct a PRG from a OWP. Many parts of this method will be similar to the general case described in Section 5. For example, as Vadhan and Zheng, 2011 note, constructing a "hardcore" bit in Theorem 2.2 can be interpreted as a special case of the hashing used in the method of Haastad et al., 1999.

Theorem. (Goldreich-Levin '89) Let f : {0,1}^n → {0,1}^n be a OWP. Then for every PPT algorithm A there is a negligible function ε such that

    P_{x,r←{0,1}^n}[ A(f(x), r) = x ⊙ r ] ≤ 1/2 + ε(n)    (39)

where x ⊙ r := Σ_{i=1}^n x_i r_i mod 2.

Proof. We follow Arora and Barak, 2009 for this proof. Assume, seeking a contradiction, that there exists a PPT A such that for infinitely many n and for every negligible function ε

    P_{x,r←{0,1}^n}[ A(f(x), r) = x ⊙ r ] ≥ 1/2 + ε(n)    (40)

Our goal is to use this fact to derive an inverter for f that runs in time polynomial in the security parameter n, which would contradict our assumption that f is one-way.


Figure 2: A simplification of the joint probability from Equation 40, marginalized over x. Shaded areas represent places where the hypothesized algorithm A successfully predicts x ⊙ r. The red area has width ε/2 and height 1/2, and represents a lower bound on the fraction of "good" x's, since the total shaded area must be at least 1/2 + ε(n). This explains Equation 41.

First, we notice that for at least an ε(n)/2 fraction of the inputs x ∈ {0,1}^n, the following must hold:

    P_{r←{0,1}^n}[ A(f(x), r) = x ⊙ r ] > 1/2 + ε(n)/2    (41)

This can be seen from Figure 2, which marginalizes the probability space over x. We call this subset of the x's "good". We will now show an algorithm that inverts f on every good x with high probability. The idea for inverting y is to sample values of r from {0,1}^n, use some clever arithmetic to make guesses about the bits of x, and then apply Chebyshev's Inequality (Proposition 1.10) to bound the probability of recovering f^{-1}(y) correctly.

To keep this proof modular, we'll describe three procedures. The first, CHOOSE, is used to choose pairwise independent samples of r. The second, GUESS, takes those samples and derives a guess, x̃, for f^{-1}(y). The third, MAIN, carefully chooses constants for the number of truly random samples and the number of r vectors to construct and, from this, implements an ingenious enumeration. We've separated these algorithms' descriptions into three boxes so they can be analyzed in a modular fashion.

Algorithm CHOOSE(k, m) takes as arguments (1) a number k of truly random strings to sample and (2) a number m of pairwise independent vectors to construct out of the k truly independent samples. Furthermore, it is assumed that 2^{k-1} < m ≤ 2^k - 1.

1. Choose k strings s_1, ..., s_k independently at random from {0,1}^n.

2. By the constraint on m's size, every element of [m] can be written as a nonzero binary number with k digits (perhaps with leading zeros). Define T_j ⊆ [k] as

    T_j := { i : the i-th digit of j, written as a k-digit binary number, is 1 }

3. Define

    r_j := ⊕_{t∈T_j} s_t    (42)

where string summation is taken bitwise mod 2.

RETURN r_1, r_2, ..., r_m.

Although the outputs {r_j}_{j∈[m]} of CHOOSE are clearly not fully independent, they form a pairwise independent set, since any two distinct vectors differ in whether they depend on at least one of the fully independent and uniform s_i.
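A direct transcription of CHOOSE into Python (ours; encoding bit strings as integers is an implementation choice) makes the trick concrete: each r_j is the XOR of the seed strings selected by the binary expansion of j.

```python
import secrets

def choose(k, m, n):
    """Return m pairwise-independent n-bit strings r_1..r_m built from k truly random ones.

    Requires 1 <= m <= 2**k - 1. r_j is the XOR of the seeds s_t for which bit t of j is 1,
    so distinct j, j' use distinct nonempty index sets T_j != T_{j'}.
    """
    assert 1 <= m <= 2 ** k - 1
    s = [secrets.randbits(n) for _ in range(k)]      # s_1..s_k, uniform in {0,1}^n
    r = []
    for j in range(1, m + 1):
        rj = 0
        for t in range(k):
            if (j >> t) & 1:                         # t is in T_j
                rj ^= s[t]
        r.append(rj)
    return r

# Example: 7 pairwise-independent 16-bit strings from only 3 truly random ones.
print([format(rj, "016b") for rj in choose(3, 7, 16)])
```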

Next, we explain the guessing part of the algorithm.

Algorithm GUESS(r_1, r_2, ..., r_m, w) takes the m samples r_j ∈ {0,1}^n and the current string w ∈ {0,1}^k (interpreted as guesses w_t for the values x ⊙ s_t). Its goal is to come up with a guess for x = f^{-1}(y).

FOR 1 ≤ i ≤ n:

1. FOR 1 ≤ j ≤ m, compute

    z_{ij} = ⊕_{t∈T_j} w_t  (a guess for x ⊙ r_j = ⊕_{t∈T_j} x ⊙ s_t)    (43)

    z′_{ij} = A(y, r_j ⊕ e_i)    (44)

where e_i is the i-th standard basis vector.

2. Guess that x̃_i is the majority value of {z_{ij} ⊕ z′_{ij}}_{1≤j≤m}. The tilde indicates that x̃_i is just a guess for x_i and may not equal x_i.

RETURN x̃ = x̃_1 x̃_2 ··· x̃_n.

Next, we claim that if the r_j put into the GUESS algorithm are pairwise independent, then we can obtain a statement about the variance of the resulting guesses (Proposition 1.11).

There is one more hard part to this proof: we are only allowed to sample so many things.


Algorithm MAIN(y) receives as input y ∈ {0,1}^n, where it can be assumed that y = f(x) for a "good" x ∈ {0,1}^n.

1. Select m = 200n/ε² and k = ⌈log₂(m + 1)⌉.

2. Let r_1, r_2, ..., r_m = CHOOSE(k, m).

3. FOR every string w ∈ {0,1}^k:

(a) Interpret w_t as a guess for x ⊙ s_t, where the s_t were the building blocks of the r_j within CHOOSE. Note: we don't know that these guesses are correct, but since we enumerate over every w ∈ {0,1}^k we can be sure that in some iteration they all hold.

(b) Run GUESS with the r_j's from Step 2 and the w from this round of iteration to obtain x̃.

(c) Compute ỹ = f(x̃). If ỹ = y, output x̃ and halt.

What guarantees do we have about the guesses for the bits of x? We will analyze this through the lens of statistics. Fix a bit position i and consider the iteration of Step 3 of MAIN in which w_t = x ⊙ s_t for every t (such an iteration exists because we enumerate over all w ∈ {0,1}^k); in this iteration z_{ij} = x ⊙ r_j for every index j. Define the random variables

    Z_j := 1 if z′_{ij} = A(y, r_j ⊕ e_i) = x ⊙ (r_j ⊕ e_i), and 0 otherwise    (45)

    Z := Z_1 + Z_2 + ··· + Z_m    (46)

Whenever Z_j = 1 (in this iteration), z_{ij} ⊕ z′_{ij} = x_i; hence if Z > m/2 the majority vote in Step 2 of GUESS recovers x_i correctly.    (47)

By Equation 41, E[Z_j] ≥ 1/2 + ε/2 for each j, since r_j ⊕ e_i is uniformly distributed. We want to use this to show that the majority vote in Step 2 of GUESS is unlikely to be incorrect for this particular iteration of w. The bound we will show is P[Z ≤ m/2] ≤ 1/(50n). Using Chebyshev's Inequality, the fact that the Z_j are identically distributed and pairwise independent (so that Proposition 1.11 applies), and the fact that Var[Z_1] ≤ 1, we have for any λ > 0

    P[ |Z - E[Z]| ≥ λ·√(Var[Z]) ] ≤ 1/λ²    (48)
    ⟹ P[ |Z - E[Z]| ≥ λ·√(m·Var[Z_1]) ] ≤ 1/λ²    (49)
    ⟹ P[ |Z - E[Z]| ≥ λ·√m ] ≤ 1/λ²    (50)
    ⟹ P[ Z ≤ E[Z] - λ·√m ] + P[ Z ≥ E[Z] + λ·√m ] ≤ 1/λ²    (51)
    ⟹ P[ Z ≤ E[Z] - λ·√m ] ≤ 1/λ²    (52)

Since E[Z] ≥ m(1/2 + ε/2), the carefully chosen value λ := ε√m/2 gives E[Z] - λ√m ≥ m/2. Plugging in our value of m from Step 1 of MAIN, we get

    P[Z ≤ m/2] ≤ 4/(mε²) = 1/(50n)    (53)

This shows that, in the iteration of w where all the guesses w_t are correct, each bit x̃_i is wrong with probability at most 1/(50n). By a union bound over the n bits, the PPT algorithm MAIN successfully inverts a good value y = f(x) with probability at least 1 - 1/50. But then, by conditioning on x being good, we have

    P[MAIN inverts y] ≥ P[MAIN(y) = x | x good] · P_{x←{0,1}^n}[x is good] ≥ (ε(n)/2)·(1 - 1/50) := δ(n)    (54)

Now, it is not hard to show that the function δ is negligible exactly when ε is negligible. Since Equation 40 was assumed to hold for every negligible ε, the inversion probability in Equation 54 exceeds every negligible function δ; hence, by Definition 1.18, f cannot be a one-way function. This is a contradiction. Thus the bit x ⊙ r is unpredictable given r and f(x), which is the statement of the theorem.

Armed with this theorem, it’s easy to construct apseudorandom generator from a OWP. More formally,

Proposition 2.1. If f : 0, 1n → 0, 1n is a one-way permutation, then g : 0, 12n → 0, 12n+1 is apseudorandom generator where g is defined by

gGL(x, r) = 〈f(x), r, x r〉 (55)

Proof. If it were not a pseudorandom generator, thenby Theorem 2.1 it would be predictable. The first 2nbits are clearly not predictable since f is a permuta-tion and x, r are allowed to be anything in 0, 1n. Thismeans the last bit must be predictable. But, since f is as-sumed to be one-way, this would violate the Goldreich-Levin Theorem. Thus gGL must be a PRG.

This was the first construction given for a PRG from an arbitrary one-way permutation. It has two major limitations, which will be addressed in the sections to follow. First, the output of g_GL is only one bit longer than its input, making it barely a pseudorandom generator. Second, it assumes the existence of a one-way permutation f. As mentioned in the remark after Definition 1.22, we would like to weaken the assumptions for constructing a PRG to only the existence of OWFs.

2.3 Expanding Goldreich-Levin

We may wonder whether it is possible to use the tools of the Goldreich-Levin Theorem to construct more than one unpredictable bit. It turns out that this is possible, and it relies on the simple observation that if f is a one-way permutation, then so is f² = f ∘ f, and so on for any f^i with i ∈ N. This makes x ⊙ r unpredictable given ⟨f^i(x), r⟩.

Lemma. Given x, r ∈ {0,1}^n as in Theorem 2.2, we can construct a PRG G : {0,1}^{2n} → {0,1}^{n+N} with stretch l(2n) = n + N, where N can be arbitrarily large so long as N ∼ poly(n). This function G can be defined as:

    G(x, r) = ⟨r, f^N(x) ⊙ r, f^{N-1}(x) ⊙ r, ..., f(x) ⊙ r⟩    (56)

Proof. By Yao's Theorem (Section 2.1), it is sufficient to show that G as constructed in Equation 56 is next-bit unpredictable. Assume, by way of contradiction, that there existed a PPT algorithm B that was a predictor for G. Then for at least some fixed integer i ∈ [N] and for every negligible function ε,

    B_output := B( r, f^N(x) ⊙ r, ..., f^{i+1}(x) ⊙ r )    (57)

    P_{x,r←{0,1}^n}[ B_output = f^i(x) ⊙ r ] ≥ 1/2 + ε    (58)

We claim we can use this predictor to violate the Goldreich-Levin Theorem (Theorem 2.2) by constructing a PPT algorithm A that guesses x ⊙ r from ⟨f(x), r⟩ with probability non-negligibly better than half. On input ⟨r, f(x)⟩, A chooses i ∈ {1, ..., N} at random, computes f(f(x)), f²(f(x)), ..., f^{N-i-1}(f(x)), and then outputs b where

    b = B( r, f^{N-i-1}(f(x)) ⊙ r, ..., f^0(f(x)) ⊙ r )    (59)

The key here is to see that there is symmetry built into this problem due to f being a permutation. Since the probability in Equation 58 is taken over all of {0,1}^n, the distribution of the value b in Equation 59 is the same as the distribution obtained by choosing x′ ∈ {0,1}^n such that x = f^i(x′); hence B will predict f^i(x′) ⊙ r = x ⊙ r with probability at least 1/2 + ε by Equation 58. This shows that G cannot be predictable without violating the Goldreich-Levin Theorem, and hence G must be unpredictable and thus a PRG.

The requirement that N ∼ poly(n) comes from the fact that we need G to be computable in polynomial time for it to fit the definition of a PRG.

As can be seen in Figure 3, the general strategy for constructing pseudorandom bits (which come from inner products with the uniformly distributed r) is to use the pseudoentropy amassed through repeated applications of the OWF f to x. In fact, the reason we applied f so many times was just to get multiple instances of the same distribution 〈f^k(U_n), U_n〉; there is no other purpose for f than generating separate random variables with these distributions.

Figure 3: A diagram showing roughly how the Goldreich-Levin algorithm manages to construct pseudorandom bits.
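A minimal Python sketch of the expanded construction, under the same caveat as before: toy_f below stands in for a genuine one-way permutation, and the function G iterates f and emits one hardcore bit per iterate, matching Equation 56.

def ip2(a, b):
    """Inner product of two bit-vectors over GF(2)."""
    return sum(x & y for x, y in zip(a, b)) % 2

def G(x_bits, r_bits, f, N):
    """Output r followed by the N hardcore bits f^N(x)·r, ..., f(x)·r."""
    iterates, y = [], x_bits
    for _ in range(N):          # compute f(x), f^2(x), ..., f^N(x)
        y = f(y)
        iterates.append(y)
    return r_bits + [ip2(y, r_bits) for y in reversed(iterates)]

toy_f = lambda bits: bits[1:] + bits[:1]          # toy permutation, NOT one-way
print(G([1, 0, 1, 1], [0, 1, 1, 0], toy_f, N=6))  # n + N = 10 output bits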

3 Formalizing Pseudoentropy

This section is mostly just a smattering of definitions, nearly all following the conventions of Vadhan and Zheng, 2011. Be patient: these definitions will come in handy in the next section even if they seem a bit unmotivated for now. One of the reasons this section is so technical is to handle the case of both uniform and non-uniform models of computation. (A uniform model of computation, like the Turing machine model, is one where a single set of logic must work for all input lengths. A non-uniform model of computation, like a circuit family, is one where different logic can be applied for each input length.)

We follow Vadhan and Zheng, 2011 closely in this section, occasionally changing small things to fit the notation and conventions established thus far, most of which were adopted from Arora and Barak, 2009.

3.1 Formal Indistinguishability

Let's begin by defining the foundational principle of pseudoentropy, as motivated in Section 1.2.3: that of computational indistinguishability.

Definition 3.1. Let n be a security parameter. Then two {0,1}^n-valued random variables X_n and Y_n are (T, ε) indistinguishable, where T(n) and ε(n) are functions of n, if for all time-T randomized algorithms A and all sufficiently large n,

|P[A(X) = 1] − P[A(Y) = 1]| < ε(n)    (60)

If ε(n) is negligible, we simply say that X_n and Y_n are computationally indistinguishable and denote this X ≡_c Y.

Although the above definition is completely general in the allowed behaviors of T and ε, for the rest of this paper we will only be interested in the case where T is restricted to polynomial functions and ε is negligible. Refer back to the discussion in Section 1.4.1 for a description of why this case models real-world adversaries well. Each of the two definitions below gives special treatment to this case.
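Since this polynomial-time case is the one we will use, a small Monte-Carlo sketch may help make the quantity |P[A(X) = 1] − P[A(Y) = 1]| from Definition 3.1 tangible. The samplers and the test A below are toy placeholders chosen only to illustrate the estimate.

import random

def estimate_advantage(A, sample_X, sample_Y, trials=100_000):
    """Monte-Carlo estimate of the distinguishing advantage of test A on X vs. Y."""
    hits_X = sum(A(sample_X()) for _ in range(trials))
    hits_Y = sum(A(sample_Y()) for _ in range(trials))
    return abs(hits_X - hits_Y) / trials

# Example: X is uniform on 8 bits, Y always has its first bit set.
n = 8
uniform = lambda: [random.randint(0, 1) for _ in range(n)]
biased  = lambda: [1] + [random.randint(0, 1) for _ in range(n - 1)]
first_bit_test = lambda bits: bits[0]            # A outputs its input's first bit
print(estimate_advantage(first_bit_test, uniform, biased))  # close to 0.5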

Proposition 3.1. For X, Y computationally indistinguishable and any deterministic, polynomial-time computable function f, we have

f(X) ≡_c f(Y)    (61)


Proof. If f(X) and f(Y) were distinguishable, then there would exist a PPT A such that

|P[A(f(X)) = 1] − P[A(f(Y)) = 1]| ≥ ε    (62)

for some non-negligible ε. But then we could construct a PPT B that computes A ∘ f and thereby distinguishes X and Y.

Definition 3.2. (PE, Nonuniform) A random variable X has (T, ε) non-uniform pseudoentropy at least k if there exists a random variable Y with H(Y) ≥ k such that X and Y are (T, ε) non-uniformly indistinguishable. To separate out polynomial indistinguishability as special, we say X has non-uniform pseudoentropy at least k = k(n) if for every constant c, X_n has (n^c, 1/n^c) non-uniform pseudoentropy at least k(n) − 1/n^c.

Conditional pseudoentropy is defined in a very similar way. To be thorough, here is the definition:

Definition 3.3. (CPE, Nonuniform) Let B be a random variable jointly distributed with X. We say that B has (T, ε) nonuniform conditional pseudoentropy at least k given X if there exists a random variable C jointly distributed with X such that the following holds:

1. H(C|X) ≥ k.
2. (X,B) and (X,C) are (T, ε) indistinguishable.

If B = B(n) for a security parameter n, then we say that B has nonuniform conditional pseudoentropy at least k = k(n) given X if for every constant c, B(n) has (n^c, 1/n^c) nonuniform conditional pseudoentropy at least k(n) − 1/n^c given X(n) for all sufficiently large n.

The uniform case is more complicated. We will briefly describe why sampling complications are important considerations after we give the uniform definition of pseudoentropy.

Definition 3.4. (PE, Uniform) Let n be a security parameter that is allowed to influence T = T(n), ε = ε(n), k = k(n), q = q(n). Let X be a [q]-valued random variable. We say X has (T, ε) uniform pseudoentropy at least k if for all time-T randomized oracle algorithms A there exists a random variable Y jointly distributed with X such that

1. H(Y) ≥ k
2. X and Y are indistinguishable by an oracle Turing machine A^{O_{X,Y}} having access to an oracle O_{X,Y} for sampling the joint distribution of X and Y.

We say that X has uniform pseudoentropy at least k if for every constant c, X(n) has (n^c, 1/n^c) uniform pseudoentropy at least k(n) − 1/n^c.

Remark. There are two differences between this definition and the one given for the non-uniform model:

1. In the uniform model, we consider supp X to be indexed by [q], with q = q(n) a security parameter. We do this because in certain cases we will need to explicitly bound q.
2. In the uniform model, we require that the distinguisher A^{O_{X,Y}} be an oracle Turing machine rather than just a standard Turing machine, as in the non-uniform definition. We will explore the need for this in the upcoming Section 3.1.1.

Acknowledging and carrying on with these two additional pain points for the uniform case, we move to define conditional pseudoentropy with no additional complication.

Definition 3.5. (CPE, Uniform) Let n be a security parameter that is allowed to influence T = T(n), ε = ε(n), k = k(n), q = q(n). Let B be a [q]-valued random variable jointly distributed with X. We say B has (T, ε) uniform conditional pseudoentropy at least k given X if for all time-T randomized oracle algorithms A there exists a random variable C jointly distributed with (X,B) such that

1. H(C|X) ≥ k
2. (X,B) and (X,C) are indistinguishable by an oracle Turing machine A^{O_{X,B,C}} having access to an oracle O_{X,B,C} for sampling the joint distribution of (X,B,C).

We say that B has uniform conditional pseudoentropy at least k given X if for every constant c, B(n) has (n^c, 1/n^c) uniform conditional pseudoentropy at least k(n) − 1/n^c given X(n).

Remark. Results to come will show that in applications like the construction of PRGs, we can assume that B takes values in a polynomial-sized set and that (X,B) is polynomial-time samplable. This will remove the need for the oracle in this definition. Nevertheless, the definition is kept in generality.

3.1.1 Understanding the Sampling Oracle

At first glance, the need for oracles in Definitions 3.4 and 3.5 seems unwieldy and abstract. This section hopes to give a brief yet sufficient explanation for why oracle machines are necessary to formally define computational indistinguishability in the uniform model.

Definition 3.6. A probability ensemble {X_n}_{n∈N} is a set of probability distributions indexed by a countable set.

Remark. Some definitions of probability ensembles require that they grow in their support as n → ∞, and for practical reasons this is almost always how they show up. For our case, however, it is not important to define a probability ensemble this way.

We will consider the following situation, which reveals a vulnerability in determining indistinguishability.


Proposition 3.2. Consider the problem of finding a probability ensemble X_n that is indistinguishable from a uniform ensemble U_n according to a halting PPT algorithm M : {0,1}* → {0,1}. For any such M, there exists a probability ensemble X_n that is indistinguishable from U_n and satisfies |supp X_n| = 2 for all n.

Proof. Following Goldreich and Meyer, 1998, we use the following strategy:

1. First compute α = P[M(U_n) = 1].
2. Next, find x, y ∈ {0,1}^n such that M(x) = 0 and M(y) = 1.
3. Let X_n be the distribution whereby P[X_n = y] = α and P[X_n = x] = 1 − α. This trivially fools M.

Remark. See Goldreich and Meyer, 1998 for an explanation of how to extend this result to fool collections of Turing machines simultaneously, and for a description of how to construct such probability ensembles X_n (rather than just prove that they exist).
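A small sketch of the Goldreich-Meyer trick for one fixed test M, following the three steps of the proof. Estimating α by sampling (rather than computing it exactly) and the toy test M are simplifications introduced here purely for illustration.

import random

def fooling_distribution(M, n, trials=200_000):
    """Return (x, y, alpha): output y with prob. alpha and x with prob. 1 - alpha."""
    # Step 1: estimate alpha = P[M(U_n) = 1].
    alpha = sum(M([random.randint(0, 1) for _ in range(n)])
                for _ in range(trials)) / trials
    # Step 2: find strings x, y with M(x) = 0 and M(y) = 1 (assumed to exist).
    x = y = None
    while x is None or y is None:
        z = [random.randint(0, 1) for _ in range(n)]
        if M(z) == 0 and x is None:
            x = z
        if M(z) == 1 and y is None:
            y = z
    return x, y, alpha   # Step 3: X_n puts mass alpha on y and 1 - alpha on x

M = lambda bits: bits[0] ^ bits[1]        # a toy test
print(fooling_distribution(M, 8))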

Now that we've shown an example of how indistinguishers can break down, we will demonstrate that sampling distinguishers don't fall into this same trap.

Proposition 3.3. Let ≡_c mean indistinguishable according to a PPT B. Then it is possible that X ≡_c Y and yet, for i.i.d. X_1, X_2 ∼ X and i.i.d. Y_1, Y_2 ∼ Y, (X_1, X_2) ≢_c (Y_1, Y_2).

Proof. We will show an example. Fix B, let X ∼ U_n, and let Y be constructed according to Proposition 3.2. Then an algorithm to distinguish (X_1, X_2) from (Y_1, Y_2) is the following:

DISTINGUISH
On input a pair (z_1, z_2), compare z_1 to z_2. If z_1 = z_2, reject. Otherwise accept.

Since Y has support size two, if it is not uniform over its support then the probability that DISTINGUISH rejects a sample of (Y_1, Y_2) is greater than half. Computing probabilities, we see that

P[DISTINGUISH(Y_1, Y_2) = REJECT] = α² + (1 − α)²    (63)
  = 1/2 + d    (64)
  ≥ 1/2 + ε(n)    (65)

for all negligible ε. Since α ≠ 1/2, it follows that d > 0, and since d is a constant it is not negligible. On the other hand, P[DISTINGUISH(X_1, X_2) = REJECT] = 2^{−n}, so DISTINGUISH separates the two pairs with non-negligible advantage.

A sampling oracle, though, makes the trick from Proposition 3.2 invalid.

Proposition 3.4. Let ≡_c mean indistinguishable according to a PPT B with access to the sampling oracle O_{X,Y}. Then if X ≡_c Y, with i.i.d. X_1, X_2 ∼ X and i.i.d. Y_1, Y_2 ∼ Y, it follows that (X_1, X_2) ≡_c (Y_1, Y_2).

Remark. This shows that composability, a basic feature we would want of a formally-defined notion of pseudoentropy, requires access to oracles: for the notion of pseudoentropy to compose, the model of computation for the distinguisher must have access to an oracle for both distributions.

With pseudoentropy well-defined, we will move on to a generalization of pseudoentropy that imagines that a random variable is not being seen all at once but is instead being revealed piece by piece.

3.2 Next-Block Pseudoentropy

Next-block pseudoentropy comes from Haitner, Reingold, and Vadhan, 2010. It functions somewhat like next-bit unpredictability (Definition 2.1), but in fact it is a weaker notion of entropy than simple pseudoentropy.

Definition 3.7. Let n be a security parameter, k = k(n), and X = (X_1, X_2, · · · , X_m) be a random variable broken into m blocks. We say X has next-block pseudoentropy at least k if there exists a set of random variables Y_1, · · · , Y_m, each jointly distributed with X, such that

• High aggregate conditional pseudoentropy:

  ∑_{i=1}^m H(Y_i | X_1, · · · , X_{i−1}) ≥ k    (66)

• Hard to differentiate: for every PPT oracle distinguisher D and every i ∈ [m],

  E[ P[D^{O_{X,Y}}(X_1, · · · , X_i) = 1] − P[D^{O_{X,Y}}(X_1, · · · , X_{i−1}, Y_i) = 1] ] ≤ L · ε

  where L is a bound on the number of oracle calls and ε is negligible.

If the size of each X_i is one bit, then we call next-block pseudoentropy next-bit pseudoentropy.

Remark. This is a generalization of pseudoentropy which incorporates the importance of order. It is easy to check that (U_n, f(U_n)) only has next-block pseudoentropy n, even though (as will be a major result proven in the next section) (f(U_n), U_n) has next-block pseudoentropy n + Ω(log n).

Remark. It is also clear that the pseudoentropy of a random variable is always at most the next-block pseudoentropy of a broken-up representation of that random variable: it never hurts distinguishing algorithms to be able to see more bits at once.

Like with Shannon entropy, there is a related notion of next-block pseudo-min-entropy.

Definition 3.8. Let n be a security parameter, k = k(n), and X = (X_1, X_2, · · · , X_m) be a random variable broken into m blocks. We say X has next-block pseudo-min-entropy at least α if there exists a set of random variables Y_1, · · · , Y_m, each jointly distributed with X, such that

• High conditional pseudo-min-entropy: for every i ∈ [m],

  H_∞(Y_i | X_1, · · · , X_{i−1}) ≥ α    (67)

• Hard to differentiate: for every PPT oracle distinguisher D and every i ∈ [m],

  E[ P[D^{O_{X,Y}}(X_1, · · · , X_i) = 1] − P[D^{O_{X,Y}}(X_1, · · · , X_{i−1}, Y_i) = 1] ] ≤ L · ε

  where L is a bound on the number of oracle calls and ε is negligible.

If the size of each X_i is one bit, then we call this quantity next-bit pseudo-min-entropy.

3.3 Pseudoentropy as KL Divergence

Here we will expand on the connection alluded to so far between the KL divergence of distributions and the conditional pseudoentropy between them. The result of this section will be to show, for the non-uniform and uniform models respectively, that KL-hardness between random variables is equivalent to measurable pseudoentropy between those random variables (the two PE = KL-hardness theorems below). Since pseudoentropy is defined according to computational indistinguishability and KL divergence is related to expectations over random variables, we need to develop a language to connect these two concepts.

Our first step in this journey is to use a tool from Vadhan and Zheng, 2011 which converts a function into a jointly-distributed random variable. Differing from Vadhan and Zheng, 2011, we choose to give a name and definition to this tool since it is so useful.

Definition 3.9. Let P : {0,1}^n × [q] → (0, ∞). We define the conditioner of P, denoted C_P, to be the random variable jointly distributed with X whose p.m.f. is proportional to P:

C_P(a|x) = P(x, a) / ∑_b P(x, b)    (68)

Remark. The conditioner is an operator that takes a function and converts it into a random variable. Since P operates on {0,1}^n × [q], it can be thought of as an algorithm that tries to predict the probability of B taking particular values in [q] conditioned on X taking a value in {0,1}^n, for jointly distributed random variables X and B. In fact, as the next definition shows, if it can do a good job at this prediction, we call P a predictor.

Definition 3.10. (KL-Predictor) Let (X,B) be a {0,1}^n × [q]-valued random variable and P : {0,1}^n × [q] → (0, ∞) a deterministic function. We say that P is a δ-KL predictor of B given X if

KL(X,B || X,C_P) ≤ δ    (69)

In the case that P is randomized, we view P as a distribution over deterministic functions p : {0,1}^n × [q] → (0, ∞). In this case, being a δ-KL predictor means the following:

E_{p∼P}[ KL(X,B || X,C_p) ] ≤ δ    (70)

Remark. Note how this definition of a KL-predictor is similar to the next-bit predictors from Definition 2.1.

Just like we defined pseudorandomness with respect to indistinguishability in Definition 3.1, we will define KL-hardness with reference to KL-predictability. As with defining conditional pseudoentropy previously, it is important to handle the uniform and non-uniform models slightly differently when giving definitions of KL-hardness.

Definition 3.11. (KL-Hard, Nonuniform) Let (X,B) be a {0,1}^n × [q]-valued random variable. We say that B is non-uniformly (t, δ) KL-hard given X if there is no circuit P of size t that is a δ-KL predictor of B given X.

We say that B is non-uniformly δ KL-hard given X if for every constant c, B is non-uniformly (n^c, δ − 1/n^c) KL-hard given X for all sufficiently large n.

Definition 3.12. (KL-Hard, Uniform) Let n be a security parameter, δ = δ(n) > 0, t = t(n) ∈ N, q = q(n). Let (X,B) be a {0,1}^n × [q]-valued random variable. We say that B is (t, δ) uniformly KL-hard given X if for all time-t randomized oracle algorithms P : {0,1}^n × [q] → (0, ∞) and all sufficiently large n, P^{O_{X,B}} is not a δ-KL predictor of B given X (where the randomness of P^{O_{X,B}} comes both from oracle calls to (X,B) and from internal coin tosses).

We say that B is uniformly δ KL-hard given X if for every constant c, B is uniformly (n^c, δ − 1/n^c) KL-hard given X.

Remark. Note that if P(x, a) = 1 for all (x, a), we get C_P = U_{[q]}, which implies that KL(X,B || X,C_P) = log q − H(B|X) ≤ log q. This shows that it is only worth talking about KL-hardness for δ ≤ log q.

Definition 3.13. Let (X,B) be a {0,1}^n × [q]-valued random variable. We say B is non-uniformly (t, δ) KL-hard for sampling given X if for all size-t randomized circuits S : {0,1}^n → [q] it holds that

KL(X,B || X,S(X)) > δ    (71)

Definition 3.14. Let n be a security parameter, δ = δ(n) > 0, t = t(n) ∈ N, q = q(n). Let (X,B) be a {0,1}^n × [q]-valued random variable. We say B is uniformly (t, δ) KL-hard for sampling given X if for all time-t randomized oracle algorithms S and all sufficiently large n, it holds that

KL(X,B || X,S^{O_{X,B}}(X)) > δ    (72)

It turns out that these two notions of hardness are the same up to a polynomial factor. We explore this relationship in the next lemma.

Lemma. (KL-Hard ∼ KL-Hard for Sampling, Nonuniform) Let (X,B) be a {0,1}^n × [q]-valued random variable. If B is non-uniformly (t, δ) KL-hard for sampling given X, then B is non-uniformly (Ω(t/q), δ) KL-hard given X. Conversely, if B is non-uniformly (t, δ) KL-hard given X, then B is non-uniformly (t′, δ − ε) KL-hard for sampling given X for t′ = t/poly(n, q, 1/ε), for every ε > 0.

Remark. Intuitively it makes sense that KL-hardness for sampling is related to KL-hardness. The role the predictor P plays is very similar to the role that the sampler S plays in approximating the distribution of B from X.

Proof. Let's first prove that KL-hardness for sampling given X implies KL-hardness given X. We will do a proof by contradiction.

Suppose B is non-uniformly KL-hard for sampling given X but not non-uniformly KL-hard given X. Then, since B is not non-uniformly KL-hard given X, there must exist a size-t′ circuit P that is a δ-KL predictor of B given X. Mathematically, this means

∃ P : {0,1}^n × [q] → R^+ such that KL(X,B || X,C_P) ≤ δ

We want to use this circuit P to build a sampling circuit S : {0,1}^n → [q] that is capable of sampling B well given X. Here is our construction for S:

ALGORITHM S
On input x ∈ {0,1}^n:

1. Use the circuit P to compute P(x, b) for each b ∈ [q].
2. Use the randomness allowed in S to sample s from the distribution

   s ∼ C_P(·|x), i.e., P[s = a] = P(x, a) / ∑_b P(x, b)    (73)

RETURN the sample s.

We were able to do Step 1 because q = |supp B|, so it is possible to hardcode values of P into the circuit S if we allow the size of S to increase to O(q · t′). Doing this, we have for this hardcoded S

KL(X,B || X,S(X)) = KL(X,B || X,C_P) ≤ δ

The above contradicts the fact that B is non-uniformly (t, δ) KL-hard for sampling, for t′ = Ω(t/q).

Now we will prove the other direction, again by contradiction. For this direction, assume we have a size-t′ sampling circuit S such that

KL(X,B || X,S(X)) < δ − ε    (74)

We will construct a size-t randomized δ-KL predictor P. Here is our algorithm for such a construction:

ALGORITHM DISTRIBUTION P
On input (x, a) ∈ {0,1}^n × [q]:

1. Let γ be an error-tolerance parameter. We will show how to choose γ later (see Equation 78).
2. Choose a minimum estimate e. To make inequalities work out, choose e = ε/(cq).
3. Draw N samples of S(x), where N is carefully chosen to be

   N = O( (n + log q + log(1/γ)) · q²/ε⁴ )    (75)

4. Define the estimator E to be

   E(x, a) = (number of samples that equaled a) / N    (76)

RETURN max{ E(x, a), ε/(cq) }
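Conversely, ALGORITHM DISTRIBUTION P can be sketched as an empirical estimator: run the sampler S on x many times, estimate P[S(x) = a], and floor the estimate at ε/(cq). The constants and the toy sampler below are placeholders; the proof's exact choice of N is the one in Equation 75.

import random

def predictor_from_sampler(S, x, a, q, eps, c=4, N=10_000):
    """Estimate P[S(x) = a] empirically, never returning less than eps/(c*q)."""
    hits = sum(S(x) == a for _ in range(N))
    return max(hits / N, eps / (c * q))

toy_S = lambda x: random.choice([0, 0, 1])       # toy sampler over [q] = {0, 1}
print(predictor_from_sampler(toy_S, [1, 0, 1], a=0, q=2, eps=0.1))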

Claim. With probability 1 − γ, the following holds:

|P[S(x) = a] − E(x, a)| ≤ ε²/(c²q)    (77)

We view P as a distribution over deterministic functions p : {0,1}^n × [q] → [ε/(cq), 1]. Consider any p ∈ supp P such that Equation 77 holds. Notice that ∑_b p(x, b) ≤ 1 + q · (ε/(cq)) = 1 + ε/c.

Now we consider two disjoint events to acquire bounds whose usefulness will become apparent very soon.

1. If P[S(x) = a] > ε/(cq), then

   log( P[S(x) = a] / C_p(a|x) ) ≤ log( (p(x, a) + ε²/(c²q)) / p(x, a) ) + log( ∑_b p(x, b) )
     ≤ log(1 + ε/c) + log(1 + ε/c)
     ≤ ε/2

2. While if P[S(x) = a] ≤ ε/(cq), then

   log( P[S(x) = a] / C_p(a|x) ) ≤ log( P[S(x) = a] / p(x, a) ) + log( ∑_b p(x, b) )
     ≤ log(1 + ε/c)
     ≤ ε/2

Thus, we have the formula

KL(X,B || X,C_p) = E_{x∼X, a∼B}[ log( P[X = x, B = a] / P[X = x, C_p = a] ) ]

  = E_{x∼X, a∼B}[ log( (P[X = x, B = a] · P[X = x, S(x) = a]) / (P[X = x, C_p = a] · P[X = x, S(x) = a]) ) ]

  = KL(X,B || X,S(X)) + E_{x∼X}[ ∑_a B(a|x) · log( P[S(x) = a] / C_p(a|x) ) ]

On the other hand, for every p : {0,1}^n × [q] → [ε/(cq), 1] it holds that

KL(X,B || X,C_p) = E[ ∑_a B(a|X) log( B(a|X) / C_p(a|X) ) ]

  ≤ max_{x,a} log( 1 / C_p(a|x) )

  = O(log q − log ε)

Combining the two case bounds above with Equation 74, whenever Equation 77 holds we get KL(X,B || X,C_p) ≤ (δ − ε) + ε/2 = δ − ε/2. Thus

E_{p∼P}[ KL(X,B || X,C_p) ] ≤ (1 − γ) · (δ − ε/2) + γ · O(log q − log ε) ≤ δ

with the last inequality holding so long as we choose γ such that

γ = O( ε / (log q − log ε) )    (78)

Furthermore, P has circuit size O(t′ · N) = t, where N is the number of samples drawn in Step 3. Thus B is not non-uniformly (t, δ) KL-hard given X, and we have our contradiction.

Remark. This is Lemma 3.6 of Vadhan and Zheng, 2011.

Lemma. Let n be a security parameter, and let δ = δ(n) > 0, t = t(n) ∈ N, ε = ε(n) > 0, and q = q(n) all be computable in polynomial time in n. Let (X,B) be a {0,1}^n × [q]-valued random variable. If B is uniformly (t, δ) KL-hard for sampling given X, then B is uniformly (Ω(t/(q + n)), δ) KL-hard given X. Conversely, if B is uniformly (t, δ) KL-hard given X, then B is uniformly (t′, δ − ε) KL-hard for sampling given X for t′ = t/poly(n, q, 1/ε).

Proof. This proof is nearly identical to the proof of the previous lemma. The only change is in the first direction, where oracle Turing machines are used instead of circuits of particular sizes. See Lemma 3.7 of Vadhan and Zheng, 2011 for more details.

The next theorem is the main result of Vadhan and Zheng, 2011.

Theorem. (PE = KL-Hardness, Non-Uniform) Let (X,B) be a {0,1}^n × [q]-valued random variable and δ > 0.

1. If B is non-uniformly (t, δ) KL-hard given X, then for every ε > 0, B has non-uniform (t′, ε) conditional pseudoentropy at least H(B|X) + δ − ε for t′ = t^{Ω(1)}/poly(n, log q, 1/ε).
2. Conversely, if B has non-uniform (t, ε) conditional pseudoentropy at least H(B|X) + δ, then for every σ > 0, B is non-uniformly (t′, δ′) KL-hard given X for t′ = min{ t^{Ω(1)}/polylog(1/σ), Ω(σ/ε) } and δ′ = δ − σ.

Corollary. Let (X,B) be a {0,1}^n × [q]-valued random variable. Then B has non-uniform conditional pseudoentropy at least H(B|X) + δ if and only if B is non-uniformly δ KL-hard given X.

By letting B be independent of X we get an even more explicit pattern.

Corollary. An n-bit random variable B has non-uniform pseudoentropy at least H(B) + δ if and only if B is non-uniformly δ KL-hard.

These results mostly translate over to the uniform case, but instead of a polylogarithmic dependence on q we have a polynomial dependence on q. As a result, the corollary above only carries over to the uniform case when q is polynomial in n.

Theorem. (PE = KL-Hardness, Uniform) Let n be a security parameter, and let δ = δ(n) > 0, t = t(n) ∈ N, ε = ε(n) > 0, q = q(n), and σ = σ(n) all be polynomial-time (in n) computable functions. Let (X,B) be a {0,1}^n × [q]-valued random variable. Then

1. If B is uniformly (t, δ) KL-hard given X, then B has uniform (t′, ε) conditional pseudoentropy at least H(B|X) + δ − ε for t′ = t^{Ω(1)}/poly(n, q, 1/ε).
2. Conversely, if B has uniform (t, ε) conditional pseudoentropy at least H(B|X) + δ, then B is uniformly (t′, δ′) KL-hard given X for t′ = min{ t^{Ω(1)}/poly(n, log(1/σ)), Ω(σ/ε) } and δ′ = δ − σ.

Corollary. Let n be a security parameter, and let δ = δ(n) > 0 and q = poly(n) both be computable in time poly(n). Let (X,B) be a {0,1}^n × [q]-valued random variable. Then B has uniform conditional pseudoentropy at least H(B|X) + δ if and only if B is uniformly δ KL-hard given X.

3.4 Distinguishers

In order to prove the above results, we will need to define an object called a generalized distinguisher. Distinguishers are very similar to predictors (think of algorithm B from Definition 2.1), since they are functions D : {0,1}^n → [0, 1] where D(x) denotes the probability that the function outputs 1 on input x ∈ {0,1}^n.

Definition 3.15. A generalized distinguisher D is an R^+-valued randomized function. We use D(x) to denote the expectation of the output on input x.

The distinguishing advantage of a generalized distinguisher is defined to be

Adv_D(X, Y) = | E[D(X)] − E[D(Y)] |    (79)

Remark. The space of generalized distinguishers is closed under addition and multiplication by non-negative scalars: if D_1 and D_2 are generalized distinguishers, then so is k_1 · D_1 + k_2 · D_2 where k_1, k_2 ≥ 0.

For a given D, Adv_D has some interesting structure. In particular, it is almost a metric.

Lemma. With respect to X and Y, Adv_D(X, Y) satisfies:

1. Symmetry
2. The triangle inequality
3. Non-negativity

Proof. Showing all of this is very straightforward.

1. Symmetry:

   Adv_D(X, Y) = | E[D(X)] − E[D(Y)] | = | E[D(Y)] − E[D(X)] | = Adv_D(Y, X)

2. Triangle inequality: for any Z,

   Adv_D(X, Y) = | E[D(X)] − E[D(Y)] | ≤ | E[D(X)] − E[D(Z)] | + | E[D(Y)] − E[D(Z)] | = Adv_D(X, Z) + Adv_D(Y, Z)

3. Non-negativity:

   Adv_D(X, Y) = | E[D(X)] − E[D(Y)] | ≥ 0

Adv_D is not a metric, though, because

Adv_D(X, Y) = 0 does not imply that X and Y are identically distributed.    (80)

In particular, if D is a generalized distinguisher that just sends everything to a constant, then Adv_D(X, Y) = 0 for all X, Y.

We now move on to defining a random variable that will be of special use later on.

Definition 3.16. Let the power operator of a distinguisher D : {0,1}^n × [q] → R^+ be the random variable 2^D, jointly distributed with X, characterized by

2^D(a|x) = 2^{D(x,a)} / ∑_b 2^{D(x,b)}    (81)

Remark. Note that

C_{2^D}(a|x) = 2^D(a|x) / ∑_b 2^D(b|x)    (82)
  = 2^D(a|x) / ∑_b [ 2^{D(x,b)} / ∑_{b′} 2^{D(x,b′)} ]    (83)
  = 2^D(a|x) / 1    (84)
  = 2^D(a|x)    (85)
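Computationally, the power operator is just a base-2 softmax over the second argument, which the following sketch makes explicit (toy_D is an illustrative placeholder).

def power_operator(D, x, q):
    """Return the p.m.f. a -> 2^D(x,a) / sum_b 2^D(x,b)."""
    weights = [2.0 ** D(x, a) for a in range(q)]
    total = sum(weights)
    return [w / total for w in weights]

toy_D = lambda x, a: float(a == sum(x) % 2)      # D "scores" agreement with the parity of x
print(power_operator(toy_D, [1, 0, 1], q=2))     # e.g. [1/3, 2/3]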

We are able to do some real magic with this due to the following lemma.

Lemma. Let (X,B) be a {0,1}^n × [q]-valued random variable and D be a generalized distinguisher. Then

KL( (X,B) || (X, 2^D) ) = H(2^D|X) − H(B|X) − Adv_D( (X,B), (X, 2^D) )

Proof. By basic properties of the conditional KL-divergence we have

KL(X,B || X, 2^D) = E_{x∼X, a∼B}[ log( P[B = a | X = x] / P[2^D = a | X = x] ) ]

  = E_X[ ∑_a B(a|X) log( B(a|X) / 2^D(a|X) ) ]

  = −H(B|X) + E_X[ ∑_a B(a|X) log( 1 / 2^D(a|X) ) ]

Now we get a little bit clever. We see that

H(2^D|X) = E_X[ H(2^D | X = x) ] = E_X[ ∑_a 2^D(a|X) log( 1 / 2^D(a|X) ) ]

which means we can pull an H(2^D|X) term out of the expectation in exchange for replacing the factor B(a|X) inside the expectation by B(a|X) − 2^D(a|X). This yields

KL(X,B || X, 2^D) = H(2^D|X) − H(B|X) + E_X[ ∑_a ( B(a|X) − 2^D(a|X) ) log( 1 / 2^D(a|X) ) ]    (86)

But using the definition of 2^D from Definition 3.16, we know

log( 1 / 2^D(a|X) ) = log( ∑_b 2^{D(X,b)} ) − D(X, a)

The left term of the above equation does not depend on a. This means that, by the law of total probability,

∑_a ( B(a|X) − 2^D(a|X) ) log( ∑_b 2^{D(X,b)} ) = 0

These two observations allow us to simplify Equation 86 and prove the formula:

KL(X,B || X, 2^D) = H(2^D|X) − H(B|X) − E_X[ ∑_a ( B(a|X) − 2^D(a|X) ) D(X, a) ]

  = H(2^D|X) − H(B|X) − E[ D(X,B) − D(X, 2^D) ]

  = H(2^D|X) − H(B|X) − Adv_D( (X,B), (X, 2^D) )

3.5 Universal Distinguishers

Definition 3.17. Let (X,B) be a {0,1}^n × [q]-valued random variable where H(B|X) ≤ log q − δ for some δ ≥ 0. Call a distinguisher D a universal distinguisher with advantage ε > 0 if Adv_D((X,B), (X,C)) > ε for all C with H(C|X) ≥ H(B|X) + δ.

Lemma. Let (X,B) be a {0,1}^n × [q]-valued random variable as in Definition 3.17. If D is a universal distinguisher with advantage ε, then there exists k ∈ [0, log q/ε] such that KL(X,B || X, 2^{kD}) ≤ δ.


Proof. We will first show that there exists a k ∈ [0, log q/ε] such that

H(2^{kD}|X) = H(B|X) + δ    (87)

For simplicity, denote k_0 = log q/ε. By the decomposition of distinguishing advantage in Lemma 3.4, applied to the scaled distinguisher k_0 · D (this scaling is the basic substitution rule from calculus at work: the expectation is an integral over x, and substituting D → k_0 · D multiplies the advantage by k_0), we get

Adv_D((X,B), (X, 2^{k_0 D})) = (1/k_0) ( H(2^{k_0 D}|X) − H(B|X) − KL(X,B || X, 2^{k_0 D}) )

Since the random variable 2^{k_0 D}|X takes values in [q], its maximum Shannon entropy is log q. Furthermore, the subtracted terms are non-negative. Thus

Adv_D((X,B), (X, 2^{k_0 D})) ≤ log q / k_0 = ε    (88)

By the assumption on our distinguisher, this means that

H(2^{k_0 D}|X) < H(B|X) + δ    (89)

(otherwise 2^{k_0 D} would have conditional entropy at least H(B|X) + δ and the universal distinguisher D would achieve advantage greater than ε against it, contradicting Equation 88).

Now, let's get a bound going in the other direction. Consider k = 0:

H(2^{0·D}|X) = −∑_a 2^0(a|x) log 2^0(a|x)    (90)
  = −∑_a (1/q) log(1/q)    (91)
  = log q    (92)
  ≥ H(B|X) + δ    (93)

with the last inequality holding because of the assumption H(B|X) ≤ log q − δ in the lemma's statement. Since H(2^{kD}|X) varies continuously with k, Equations 89 and 93 together with the intermediate value theorem give some k ∈ [0, k_0] for which Equation 87 holds. For this k, applying the decomposition of distinguishing advantage to the distinguisher k · D gives

KL(X,B || X, 2^{kD}) = H(2^{kD}|X) − H(B|X) − Adv_{kD}((X,B), (X, 2^{kD})) = δ − Adv_{kD}((X,B), (X, 2^{kD})) ≤ δ

since the advantage is non-negative. This completes the proof.

3.6 KL-Hard Implies Conditional Pseudoentropy

We will prove this theorem only for the non-uniform case; the uniform case is a bit more complex. (We will not restate the theorem statement here.)

Proof. We'll do a proof by contradiction. Assume that B does not have (t, ε) conditional pseudoentropy at least H(B|X) + δ − ε. This means that for every C with H(C|X) ≥ H(B|X) + δ − ε there must be a distinguisher of size at most t that is able to tell (X,B) and (X,C) apart to at least the following extent:

Adv_D((X,B), (X,C)) > ε    (94)

Now comes the fun part. Remember that we are trying to show that B does not have non-uniform (t, δ) KL-hardness given X, and this means that we want to find a predictor for it. But this is hard, since we know almost nothing about the distribution (X,B) besides the fact that it has conditional pseudoentropy. For any distribution with conditional pseudoentropy, can we find a predictor? The adversarial nature of this problem lends itself to game theory.

GAME: Conditional Pseudoentropy vs. Distinguisher

In pure-strategy terms, the game proceeds as follows:

1. Player 1 chooses a [q]-valued random variable C such that

   H(C|X) ≥ H(B|X) + δ − ε    (95)

2. Player 2 picks a size-t′ distinguisher D.

The payoff for Player 2, denoted v_2, is

v_2 = Adv_D((X,B), (X,C)) = −v_1    (96)

Allowing for mixed strategies, we observe the following:

1. A mixed strategy for Player 1 is just a distribution over random variables with the required conditional entropy. But such a mixture is itself just another (pure) random variable with the required conditional entropy, so Player 1 will always lose at least ε to Player 2.
2. A mixed strategy for Player 2 is a distribution over distinguishers. On the surface, it seems hard to believe that Player 2 will be able to use one distribution of distinguishers to counter anything Player 1 might put out.

By the Min-Max Theorem, we know that the mixed-strategy version of this game has a well-defined value. This, coupled with the observations above, means that Player 2 must have a mixed strategy over size-t′ distinguishers that distinguishes any distribution with conditional entropy at least H(B|X) + δ − ε from (X,B) with advantage at least ε.

Not thinking about efficiency (for now), Player 2's mixed strategy must be a universal distinguisher in the sense of Definition 3.17. This means we can apply the lemma of Section 3.5 to get that there exists k ∈ [0, log q/ε] such that

KL(X,B || X, 2^{kD}) ≤ δ − ε    (97)

We can use this optimal k to construct a predictor P(x, a) = 2^{k·D(x,a)}. For this predictor,

KL(X,B || X,C_P) ≤ δ − ε    (98)

which contradicts the assumption that B is KL-hard given X.

Unfortunately, this isn't enough yet. The P whose existence we proved may not have small circuits. We have to show that we can efficiently approximate the steps we took above to get a concrete predictor P from just a distribution of distinguishers. This is a bit of a messy construction, and we will leave the reader to check Vadhan and Zheng, 2011 for full details.


3.7 Pseudoentropy Implies KL-Hardness (Non-Uniform Version)

Proof. Suppose for contradiction that B were not (t′, δ − σ) KL-hard. Then there would be a (δ − σ)-KL predictor P : {0,1}^n × [q] → (0, ∞) computable by a circuit of size t′.

Consider the following distinguisher:

D(x, a) = (1/(2t′)) ( log P(x, a) + t′ )    (99)

D has the following properties:

1. D(x, a) ∈ [0, 1]. This holds since 2^{−t′} ≤ P(x, a) ≤ 2^{t′}: a circuit of size t′ cannot compute a number it cannot express.
2. We have 2^{2t′D} = C_P:

   2^{2t′D}(a|x) = 2^{2t′D(x,a)} / ∑_b 2^{2t′D(x,b)}    (100)
     = 2^{log P(x,a) + t′} / ∑_b 2^{log P(x,b) + t′}    (101)
     = 2^{t′} P(x, a) / ( 2^{t′} ∑_b P(x, b) )    (102)
     = C_P(a|x)    (103)

By the decomposition of distinguishing advantage, and writing λ := δ − σ (so that KL(X,B || X,C_P) ≤ λ because P is a (δ − σ)-KL predictor and C_P = 2^{2t′D}), we have

H(2^{2t′D}|X) − H(B|X) − Adv_{2t′D}((X,B), (X, 2^{2t′D})) = KL(X,B || X, 2^{2t′D}) ≤ λ

Now we consider any C such that

H(C|X) ≥ H(B|X) + δ    (104)

For such a C, the same decomposition gives

H(2^{2t′D}|X) − H(C|X) − Adv_{2t′D}((X,C), (X, 2^{2t′D})) = KL(X,C || X, 2^{2t′D}) ≥ 0

The similarity of these two formulas allows us to compare:

Adv_D((X,B), (X,C)) = Adv_D((X,B), (X, 2^{2t′D})) − Adv_D((X,C), (X, 2^{2t′D}))

  = (1/(2t′)) [ Adv_{2t′D}((X,B), (X, 2^{2t′D})) − Adv_{2t′D}((X,C), (X, 2^{2t′D})) ]

  ≥ (1/(2t′)) ( H(C|X) − H(B|X) − λ )

  ≥ (δ − λ)/(2t′) = σ/(2t′)

Since D achieves this distinguishing advantage against every C satisfying Equation 104, B cannot have the assumed conditional pseudoentropy, and we have shown a contradiction.

4 Entropy Conversions

The main result of Vadhan and Zheng, 2011 was showing that the random variable (f(U_n), U_n), viewed as two blocks, has next-block pseudoentropy (from Section 3.2) at least n + log n.

4.1 Getting Next-Block Pseudoentropy

Lemma. Let n be a security parameter and f : {0,1}^n → {0,1}^m be one-way. Then U_n is uniformly KL-hard for sampling given f(U_n).

Proof. We'll do a proof by contradiction. If this were not the case, then there would exist a randomized oracle algorithm S such that for some δ(n) negligible in n we had

KL( f(U_n), U_n || f(U_n), S^{O_{f(U_n),U_n}}(f(U_n)) ) ≤ δ(n)

To relate this to OWFs, we need to turn this into a statement about S being correct. We will do this by defining an indicator function c : {0,1}^m × {0,1}^n → {0,1} that checks whether the second input maps to the first under f. Using indicator notation I,

c(y, x) = I(f(x) = y)    (105)

Then, since applying a deterministic function cannot increase KL-divergence (by Proposition 1.9), we have

δ(n) ≥ KL( f(U_n), U_n || f(U_n), S^{O_{f(U_n),U_n}}(f(U_n)) )

  ≥ KL( c(f(U_n), U_n) || c(f(U_n), S^{O_{f(U_n),U_n}}(f(U_n))) )

  ≥ KL( B_1 || B_{ε(n)} )

where B_p denotes the Bernoulli random variable with probability of success p. The last line above follows because we know c(·, ·) is a zero/one random variable whose success probabilities in our two cases equal

P[ c(f(U_n), U_n) = 1 ] = P_{x∈{0,1}^n}[ f(U_n) = f(U_n) ] = 1

and

P[ c(f(U_n), S^{O_{f(U_n),U_n}}(f(U_n))) = 1 ] = P_{x∈{0,1}^n}[ f(U_n) = f(S^{O_{f(U_n),U_n}}(f(U_n))) ] ≤ ε(n)

where the last inequality holds because S is an oracle PPT and f is assumed to be an OWF (see Definition 1.18). Then we have

KL( B_1 || B_{ε(n)} ) = −∑_{x∈supp B_1} B_1(x) log( B_{ε(n)}(x) / B_1(x) )    (106)

  = −B_1(1) log( B_{ε(n)}(1) / B_1(1) )    (107)

  = −log ε(n)    (108)


Since ε(n) is negligible, this means that for every c ∈ N, −log ε(n) > c log n for all sufficiently large n. This contradicts the assumption that the KL divergence above is at most the negligible function δ(n), and so U_n must be uniformly KL-hard for sampling given f(U_n).

This shows that the next-block pseudoentropy of (f(U_n), U_n) is strictly greater than its real entropy n; that is, the entropy gap is positive.

4.2 Ω(log n) Entropy Gap

More work can be done through this method, continued in Vadhan and Zheng, 2011, to show that the next-block pseudoentropy of (f(U_n), U_n) is at least n + log n.

5 Constructing a PRG

Although the definitions of pseudoentropy in the previous sections seem natural in their own right, their role in the following construction of a PRG is what motivates their development.

5.1 Next-Bit Pseudoentropy to Next-Bit Pseudo-Min-Entropy

Lemma. Let X be a random variable over {0,1}^m where each block of X has (T, ε) next-block pseudoentropy at least α. Then every block of X^t has (T′, ε′) next-block pseudo-min-entropy at least α′, where

1. T′ = T − O(m · t),
2. ε′ = t²(ε + 2^{−κ} + 2^{−c·t}) for a universal constant c, and
3. α′ = α · t − Γ(t, κ) for Γ(t, κ) ∈ O(√t · κ · log t).

Proof. This proof makes use of some notation that can be tough to visualize. To help, we write out the matrix for X^t. (Note that Haitner, Reingold, and Vadhan, 2010 wrote this matrix with the opposite convention regarding rows and columns.)

X = ( X_1  X_2  · · ·  X_m )    (109)

X^t =
  ( (X_1)^{(1)}  · · ·  (X_i)^{(1)}  · · ·  (X_m)^{(1)} )
  (      ⋮                  ⋮                  ⋮       )
  ( (X_1)^{(j)}  · · ·  (X_i)^{(j)}  · · ·  (X_m)^{(j)} )
  (      ⋮                  ⋮                  ⋮       )
  ( (X_1)^{(t)}  · · ·  (X_i)^{(t)}  · · ·  (X_m)^{(t)} )    (110)

(Y_i)^t =
  ( (Y_i)^{(1)} )
  (      ⋮      )
  ( (Y_i)^{(j)} )
  (      ⋮      )
  ( (Y_i)^{(t)} )    (111)

As indicated above by alignment, for each j the entry (Y_i)^{(j)} is jointly distributed with X^{(j)}, the jth row of X^t.

This proof makes use of hybridization (like the proof of Theorem 2.1). We will assume that X^t does not have next-block pseudo-min-entropy α′ and derive a contradiction. If this were the case, then we would have a distinguisher D_t such that the conditions of Definition 3.8 are violated, meaning:

E[ P[D_t^{O_{X^t,(Y_i)^t}}(X^t_{1,··· ,i}) = 1] − P[D_t^{O_{X^t,(Y_i)^t}}(X^t_{1,··· ,i−1}, (Y_i)^t) = 1] ] > L · ε′    (112–113)

for some block index i, where X^t_{1,··· ,i} denotes the first i columns of X^t (this notation is made precise below).

We will show a way to use this distinguisher to contradict the next-block pseudoentropy of X.

Distinguisher D for X
On input (x_1, · · · , x_{i−1}, z):

1. D samples j uniformly from [t].
2. D samples (x^t, y^t_i) from (X^t, (Y_i)^t) using the oracle O_{X,Y}, writing x^t = (x^{(r)}_l)_{r∈[t], l∈[m]} and y^t_i = (y^{(1)}_i, · · · , y^{(t)}_i).
3. D performs some replacement: in row j of x^t it places its own input, writing (x_1, · · · , x_{i−1}, z) over the first i entries, and in every row below j it replaces the ith entry by the corresponding sampled entry of y^t_i:

  ( x^{(1)}_1    · · ·  x^{(1)}_{i−1}    x^{(1)}_i    · · ·  x^{(1)}_m   )
  (     ⋮                   ⋮                ⋮                   ⋮      )
  ( x_1          · · ·  x_{i−1}          z            · · ·  x^{(j)}_m   )
  ( x^{(j+1)}_1  · · ·  x^{(j+1)}_{i−1}  y^{(j+1)}_i  · · ·  x^{(j+1)}_m )
  (     ⋮                   ⋮                ⋮                   ⋮      )
  ( x^{(t)}_1    · · ·  x^{(t)}_{i−1}    y^{(t)}_i    · · ·  x^{(t)}_m   )    (114–115)

4. D returns the output of D_t^{O_{X^t,(Y_i)^t}} applied to the first i columns of this matrix:

  D_t^{O_{X^t,(Y_i)^t}} (
    x^{(1)}_1    · · ·  x^{(1)}_{i−1}    x^{(1)}_i
        ⋮                   ⋮                ⋮
    x_1          · · ·  x_{i−1}          z
    x^{(j+1)}_1  · · ·  x^{(j+1)}_{i−1}  y^{(j+1)}_i
        ⋮                   ⋮                ⋮
    x^{(t)}_1    · · ·  x^{(t)}_{i−1}    y^{(t)}_i
  )    (116)

A few additional points must be made. D is allowed to sample from O_{X^t,(Y_i)^t} using its given sampling ability of O_{X,Y}, by simply sampling from X and Y a total of t times. Thus D runs in the same time as D_t, except that we construct this matrix in time O(t · m) first.

Additionally, for notation, let X^t_{1,··· ,i} denote the first i columns of X^t:

X^t_{1,··· ,i} =
  ( x^{(1)}_1  · · ·  x^{(1)}_i )
  (     ⋮               ⋮      )
  ( x^{(t)}_1  · · ·  x^{(t)}_i )    (117)

and similarly for other index ranges. By Proposition 1.1, we have a relationship between the Shannon entropy of X and the min-entropy of X^t. This guarantees the existence of a random variable W such that, for our choice of κ, we have

Δ( (X^t_{1,··· ,i−1}, (Y_i)^t), (X^t_{1,··· ,i−1}, W) ) ≤ 2^{−κ} + 2^{−c·t}

and, for all x in the support of X^t_{1,··· ,i−1},

H_∞( W | X^t_{1,··· ,i−1} = x ) ≥ α · t − Γ(t, κ) = α′    (118)

Then we start the hybridization analysis. Given an i, denote by Z[j] the hybrid input obtained when D samples the row index j, so that Z[t] corresponds to the all-real input X^t_{1,··· ,i} and Z[0] corresponds to replacing the whole ith column by (Y_i)^t. Then we have

δ^D_{X,Y,i} = (1/t) ∑_{j∈[t]} ( P[D_t^{O_{X^t,(Y_i)^t}}(Z[j]) = 1] − P[D_t^{O_{X^t,(Y_i)^t}}(Z[j−1]) = 1] )

  = (1/t) ( P[D_t^{O_{X^t,(Y_i)^t}}(Z[t]) = 1] − P[D_t^{O_{X^t,(Y_i)^t}}(Z[0]) = 1] )

  = (1/t) ( P[D_t^{O_{X^t,(Y_i)^t}}(X^t_{1,··· ,i}) = 1] − P[D_t^{O_{X^t,(Y_i)^t}}(X^t_{1,··· ,i−1}, (Y_i)^t) = 1] )

  ≥ (1/t) ( P[D_t^{O_{X^t,W}}(X^t_{1,··· ,i}) = 1] − P[D_t^{O_{X^t,W}}(X^t_{1,··· ,i−1}, W) = 1] − L′(2^{−κ} + 2^{−c·t}) )

  = (1/t) ( δ^{D_t}_{X^t,W,i} − L′(2^{−κ} + 2^{−c·t}) )

where L′ bounds the number of oracle calls made by D_t. Now we can assess the distinguishing power of D, our separator for X and Y.

δ^D_{X,Y} ≥ (1/t) δ^{D_t}_{X^t,W}

  ≥ (1/t) ( t² L′ (ε + 2^{−κ} + 2^{−c·t}) − L′ (2^{−κ} + 2^{−c·t}) )

  ≥ L · ε

This contradicts our assumption about the next-block pseudoentropy of X.

We will use this result in the next section's construction.

5.2 Next-Block Pseudoentropy to Pseudorandomness

We will start by choosing a universal hash function s : {0,1}^t → {0,1}^{α−κ}, where s is describable in t bits. From this, we define the randomness extractor E to be

E(x, s) := (s, s(x_1), s(x_2), · · · , s(x_m))    (119)
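As a rough illustration of this kind of extractor, the sketch below applies one randomly chosen GF(2)-linear map to every block and concatenates the outputs. The hash family and the parameter choices here are assumptions made only for the example, not the exact family used in the construction.

import random

def random_linear_hash(t, out_len):
    """Pick s as a random out_len x t binary matrix; s(x) = Mx over GF(2)."""
    M = [[random.randint(0, 1) for _ in range(t)] for _ in range(out_len)]
    return lambda x: [sum(mi & xi for mi, xi in zip(row, x)) % 2 for row in M]

def extract(blocks, s):
    """Apply the same hash s to every block and concatenate the outputs."""
    out = []
    for x in blocks:
        out.extend(s(x))
    return out

s = random_linear_hash(t=8, out_len=5)
blocks = [[random.randint(0, 1) for _ in range(8)] for _ in range(3)]  # x_1, ..., x_3
print(extract(blocks, s))    # 3 * 5 = 15 extracted bits (description of s omitted here)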

Since S is drawn from a universal hash family, we can apply the leftover hash lemma (Haastad et al., 1999) to show that S_1 and S_2, defined below, have statistical difference bounded by 2^{−κ/2}:

S_1 := (S, S(X_1), · · · , S(X_{i−1}), S(Y_i), U_{(α−κ)·(m−i)})    (120)

S_2 := (S, S(X_1), · · · , S(X_{i−1}), U_{(α−κ)·(m−i+1)})    (121)

To make statements about the distribution of E(X, S) easier, we will define the helper function Z on a random-variable input W to be

Z[i](W) = (S, S(X_1), · · · , S(X_{i−1}), S(W), U_{(α−κ)·(m−i)})    (122)

With this setup and notation, we will proceed to a proof by contradiction. Assume that E is not pseudorandom but X_1, · · · , X_m has next-bit pseudo-min-entropy α. Since E is not pseudorandom, there exists a PPT D_1 such that for all negligible ε,

δ_1 = P[D_1(Z[m](X_m)) = 1] − P[D_1((S, U_{(α−κ)·m})) = 1] ≥ ε(n)    (123)

where Z[m](X_m) = E(X, S) is the real output of the extractor and (S, U_{(α−κ)·m}) is a truly uniform string of the same length.

On the other hand, since X_1, · · · , X_m has next-bit pseudo-min-entropy α, there exists a set of random variables Y_1, · · · , Y_m jointly distributed with X such that for every x ∈ supp X,

H_∞(Y_i | X_{1,··· ,i−1} = x_{1,··· ,i−1}) ≥ α    (124)

But now we can use this information to derive another expression for δ_1 by telescoping across the hybrids Z[i]:

δ_1 = P[D_1(Z[m](X_m)) = 1] − P[D_1(Z[0](·)) = 1]    (125)

  = ∑_{i=1}^m ( P[D_1(Z[i](X_i)) = 1] − P[D_1(Z[i−1](X_{i−1})) = 1] )    (126)

where Z[0](·) denotes the fully uniform hybrid (S, U_{(α−κ)·m}).

Using the earlier noted consequence of the leftover hash lemma (Equations 120 and 121), we have

δ_1 ≤ ∑_{i=1}^m ( P[D_1(Z[i](X_i)) = 1] − P[D_1(Z[i](Y_i)) = 1] + 2^{−κ/2} )

  = m · (δ^D_{X,Y} + 2^{−κ/2})

  ≤ m · (ε + 2^{−κ/2})

This is negligible, which contradicts our earlier assertion that δ_1 exceeds every negligible function of n. Therefore, it follows that E(X, S) is pseudorandom so long as X has next-bit pseudo-min-entropy at least α.


Figure 4: A diagram, reproduced from Vadhan and Zheng,2011, which shows the general strategy behind pseu-doentropy generation from an arbitrary OWF.

5.3 Construction Outline

It's finally time to outline the construction of PRGs. As summarized by Vadhan and Zheng, 2011, the strategy by which this has been done, from the original construction in Haastad et al., 1999 up through Haitner, Reingold, and Vadhan, 2010 and Vadhan and Zheng, 2011, follows these four steps:

1. Use an OWF to construct a pseudoentropy generator (PEG). This is what we proved we could do with U_n → (f(U_n), U_n).
2. Convert the pseudoentropy generated by the PEG into pseudo-min-entropy and increase the entropy gap.
3. Extract randomness from this stream through the use of an appropriate hash function.
4. Use an enumerative method to concatenate several PRGs together to produce a final PRG.

The first three of the above steps can be seen in Figure 4. As claimed in Section 2.2, this can roughly be thought of as a generalization of the pseudorandom generative process within Goldreich and Levin, 1989. Compare Figure 4 to Figure 3. Both methods incorporate the parallel production of pseudoentropy, which then has to be manipulated in a dimensionally reducing way in order to become pseudorandom.

In fact, the method of Haastad et al., 1999 (not explained here in depth) is even more similar to the Goldreich and Levin construction. In particular, Haastad et al., 1999 construct pseudoentropy from functions that stretch by just one bit. The recent efficiency gains from Haitner, Reingold, and Vadhan, 2010 and Vadhan and Zheng, 2011 come mostly from using log n bits of next-block pseudoentropy (from Section 4) instead of 1 bit of pseudoentropy in the constructions.

5.4 Formal Explanation

The construction will work as follows:

MAKE PRG
On input an OWF f : {0,1}^n → {0,1}^m:

1. Define the random variable G = (f(U_n), U_n).
2. Choose u = O(n/Δ).
3. Choose t = O(n/Δ)².
4. Select a shift parameter for each row of the construction in Figure 4. Formally, for r ∈ [t], select s_r from [m + n]. This will take O(t log(m + n)) bits of entropy.
5. For each row r, define G_r to be the concatenation of u copies of G, with the first s_r bits and the last m + n − s_r bits truncated.
6. Use k bits of randomness to choose a specific hash function h : {0,1}^t → {0,1}^{t′}. We will be able to find such a hash function family with k = O(n).
7. Evaluate h on each of the columns of the matrix and concatenate the results.

If G_r denotes the random variable for the rth row of the matrix, and G^j_r denotes the jth index of G_r, then our PRG becomes

g(x_1, x_2, · · · , x_{u·t}, s_1, s_2, · · · , s_t, h)
  = 〈 h, h(G^1_1 G^1_2 · · · G^1_t), h(G^2_1 G^2_2 · · · G^2_t), · · · 〉    (127–129)
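The following is a very loose, non-cryptographic sketch of the tableau portion of MAKE PRG: stack t randomly shifted rows of concatenated (f(x), x) blocks and hash each length-t column. It ignores the careful parameter accounting above (Δ, k, t′) and uses placeholder toy_f and toy_hash (with m = n), so it is meant only to show the data flow, not to be a faithful implementation.

import random

def one_block(f, n):
    """One copy of the block G = (f(U_n), U_n)."""
    x = [random.randint(0, 1) for _ in range(n)]
    return f(x) + x

def shifted_row(f, n, u, shift):
    """Concatenate u copies of G, then drop `shift` leading bits and keep (u-1) blocks' worth."""
    block_len = 2 * n                      # toy f maps n bits to n bits, so m = n here
    row = []
    for _ in range(u):
        row.extend(one_block(f, n))
    return row[shift: shift + (u - 1) * block_len]

def make_prg_output(f, n, u, t, hash_fn):
    """Stack t shifted rows and hash every length-t column of the resulting tableau."""
    block_len = 2 * n
    rows = [shifted_row(f, n, u, random.randrange(block_len)) for _ in range(t)]
    out = []
    for j in range((u - 1) * block_len):   # walk over columns
        column = [rows[r][j] for r in range(t)]
        out.extend(hash_fn(column))        # hash each column and concatenate
    return out

toy_f = lambda bits: bits[1:] + bits[:1]        # placeholder permutation, NOT one-way
toy_hash = lambda bits: bits[: len(bits) // 2]  # placeholder "hash", NOT a universal family
print(len(make_prg_output(toy_f, n=8, u=3, t=4, hash_fn=toy_hash)))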

Proof. We now proceed to prove the construction correct, following the proof outline in Vadhan and Zheng, 2011 but expanding it using techniques from Haastad et al., 1999. For organization, it will proceed as a series of claims and sub-proofs, some much more difficult than others.

Claim. Every bit in Step 5 has next-bit pseudoentropy at least k/(m + n) if we take u = O(n/Δ).

Proof. This is the step that Haitner, Reingold, and Vadhan, 2010 call "Entropy Equalization". It is based on the fact that, since the shift s_r ∈ [m + n] is chosen randomly, every bit position has at least the average next-bit pseudoentropy, which is trivially k/(m + n). Since we truncate some bits in constructing G_r from u copies of G, we end up losing some entropy. To make this lost entropy negligible, we need O(n/Δ) copies of G.

Remark. This step is clearly costly, since it requires an extra factor of n in the seed length of the pseudorandom generator and requires seed bits to indicate the random offsets for each row. In Vadhan and Zheng, 2011 it is shown how to improve the efficiency of this step.

Claim. Each row of the tableau in Figure 4 (when viewed as a complete row) has pseudo-min-entropy given the previous rows.

Proof. We took t = O(n/Δ)² copies of G to create the tableau. The analysis in Section 5.1 shows that taking copies of a random variable like G with known next-block pseudoentropy creates a new random variable with known next-block pseudo-min-entropy (in our case this is the random variable whose entries are rows of the tableau). We do not do the work to prove that this requires t = O(n²/Δ²), but intuitively it can be understood as a consequence of the √t factor in the Γ(t, κ) term of the lemma in Section 5.1.

Claim. Specifically, for any i ∈ [t],

〈 h(G^i_1 G^i_2 · · · G^i_t), h 〉 ≡_c 〈 U_{t′}, U_k 〉    (130)

Proof. We consider a description of h to be k bits. Thus, this follows from Proposition 3.1.

Claim. If every bit going into the hash function h has next-bit pseudo-min-entropy, then the output of the hash function is pseudorandom.

Proof. This follows the work of Section 5.2. For the exact constants that can be extracted, see the work of Haitner, Reingold, and Vadhan, 2010.

Let's analyze this construction. How should its efficiency be measured? We consider the minimal seed length of the pseudorandom generator that we are able to create. The algorithm requires randomness for the following purposes:

1. Each copy of G requires n bits of randomness.
2. We need O(u · t) copies of G.
3. We need k bits of randomness to describe the hash function h.
4. We need O(t log(m + n)) bits of randomness to describe the random shifts for each row.

Adding this all together, we get that this construction needs O(n⁴) bits of randomness. Thus the shortest-seed PRG that can be constructed using this method has seed length O(n⁴), where n is the input dimension of the OWF f. For practical purposes, this is not very good at all (candidate OWFs like those described in Section 2 have input dimensions in the several hundreds of bits). The optimal efficiency of this construction, in terms of the asymptotic seed length of the PRG constructed, is not known, though as referenced earlier, Vadhan and Zheng, 2011 made a slight improvement that brings the seed length down to O(n³ log n).

6 Conclusion

Looking at definitions of entropy based on indistinguishability and unpredictability is very fruitful for developing tools that help with the construction of cryptographic protocols.


Bibliography

Arora, Sanjeev and Boaz Barak (2009). Computational Complexity: A Modern Approach. Cambridge University Press.

Bailey, David and Richard Crandall (2001). "On the Random Character of Fundamental Constant Expansions". In: Experimental Mathematics.

Borel, Émile (1909). "Les probabilités dénombrables et leurs applications arithmétiques". In: Rendiconti del Circolo Matematico di Palermo.

Goldreich, Oded and Leonid Levin (1989). "A Hard-Core Predicate for all One-Way Functions". In: Proceedings of the 21st Annual ACM Symposium on Theory of Computing.

Goldreich, Oded and Bernd Meyer (1998). "Computational Indistinguishability: Algorithms vs. Circuits". In: Theoretical Computer Science.

Haastad, Johan et al. (1999). "A Pseudorandom Generator from Any One-Way Function". In: SIAM J. Comput., pp. 1364–1396.

Haitner, Iftach, Omer Reingold, and Salil Vadhan (2010). "Efficiency Improvements in Constructing Pseudorandom Generators from One-Way Functions". In: SIAM J. Comput., 42(3), pp. 1405–1430.

Haitner, Iftach et al. (2010). "Universal One-Way Hash Functions via Inaccessible Entropy". In: EUROCRYPT 2010. Lecture Notes in Computer Science, vol. 6110.

Impagliazzo, Russell (1995). "A Personal View of Average-Case Complexity". In: Proceedings of Structure in Complexity Theory, Tenth Annual IEEE Conference.

Kolmogorov, Andrey (1968). "Three Approaches to the Quantitative Definition of Information". In: International Journal of Computer Mathematics.

Naor, Moni and Moti Yung (1989). "Universal One-Way Hash Functions and their Cryptographic Applications". In: STOC '89: Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing.

Shannon, Claude E. (1948). "A Mathematical Theory of Communication". In: Bell Systems Technical Journal 27.

Shannon, Claude E. (1949). "Communication Theory of Secrecy Systems". In: Bell Systems Technical Journal 28, pp. 656–715.

Vadhan, Salil (2012). "Computational Entropy". Rajeev Motwani Distinguished Lecture Series at Stanford, Palo Alto, CA.

Vadhan, Salil and Colin Jia Zheng (2011). "Characterizing Pseudoentropy and Simplifying Pseudorandom Generator Constructions". In: Electronic Colloquium on Computational Complexity.

Yao, Andrew C. (1982). "Theory and Applications of Trapdoor Functions". In: 23rd Annual Symposium on Foundations of Computer Science, IEEE.
