
Page 1: Robust 1-bit Compressive Sensing with Partial Gaussian

1

Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative Priors

Zhaoqiang Liu, Subhroshekhar Ghosh, Jun Han, Jonathan Scarlett

Abstract

In 1-bit compressive sensing, each measurement is quantized to a single bit, namely the sign of a linear function of an unknown vector, and the goal is to accurately recover the vector. While it is most popular to assume a standard Gaussian sensing matrix for 1-bit compressive sensing, using structured sensing matrices such as partial Gaussian circulant matrices is of significant practical importance due to their faster matrix operations. In this paper, we provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing with randomly signed partial Gaussian circulant matrices and generative models. Under suitable assumptions, we match guarantees that were previously only known to hold for i.i.d. Gaussian matrices that require significantly more computation. We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our theoretical results.

I. INTRODUCTION

The goal of compressive sensing (CS) [1], [2] is to recover a signal vector from a small number of linear

measurements, by making use of prior knowledge on the structure of the vector in the relevant domain. The structure

is typically represented by sparsity in a suitably-chosen basis, or generative priors motivated by recent success of

deep generative models [3]. The CS problem has been popular over the past 1–2 decades, and there is a vast amount

of literature with theoretical guarantees, including sharp performance bounds for both practical algorithms [4]–[7]

and potentially intractable information-theoretically optimal algorithms [8]–[15].

For the sake of low-cost implementation in hardware and robustness to certain nonlinear distortions [16], instead of

assuming infinite-precision real-valued measurements as in CS, it has also been popular to consider 1-bit CS [17]–[21],

in which the unknown vector is accurately recovered from binary-valued measurements.

Z. Liu is with the Department of Computer Science, National University of Singapore (email: [email protected], [email protected]).

S. Ghosh is with the Department of Mathematics, National University of Singapore (email: [email protected]).

J. Han is with the Platform and Content Group, Tencent (email: [email protected]).

J. Scarlett is with the Department of Computer Science, the Department of Mathematics and the Institute of Data Science, National University

of Singapore (email: [email protected]).

This work was supported by the Singapore National Research Foundation (NRF) under grant R-252-000-A74-281, and the Singapore Ministry

of Education (MoE) under grants R-146-000-250-133 and R-146-000-312-114.


Most of the relevant work for 1-bit CS has focused on standard Gaussian sensing matrices, and unlike linear

CS, 1-bit CS may easily fail when using a sensing matrix that is not standard Gaussian [22]. However, since such

matrices are unstructured, the associated matrix operations (e.g., multiplication) used in the decoding procedure can

be prohibitively expensive when the problem size is large. This motivates the use of structured matrices permitting

fast matrix operations, and a notable example is a subsampled Gaussian circulant sensing matrix [23], which can be

implemented via the fast Fourier transform (FFT) [24], [25]. Motivated by recent progress on 1-bit CS with partial

Gaussian circulant matrices [26], [27] and successful applications of deep generative models, we provide recovery

guarantees for a correlation-based optimization algorithm for robust 1-bit CS with partial Gaussian circulant matrices

and generative priors.

A. Related Work

It is established in [28] that an ℓ1/ℓ2-restricted isometry property (RIP) of the sensing matrix guarantees accurate uniform recovery for sparse vectors via a hard thresholding algorithm and a linear programming algorithm. Based on this result, the authors of [26] provide recovery guarantees for noiseless 1-bit CS with partial Gaussian circulant matrices by proving the required RIP. In particular, they show that O(s ε^{−4} log(n/(εs))) noiseless measurements suffice for the reconstruction of the direction of any s-sparse vector in n-dimensional space up to accuracy ε if the sparsity level s satisfies s = O(√(εn/log n)). The authors also consider a dithering technique [29]–[32] to enable the estimation of the norm of the signal vector.

The authors of [27] establish an optimal (up to logarithmic factors) sample complexity for robust 1-bit sparse

recovery using a correlation-based algorithm with `2-norm regularization. The setup and results are related to ours,

but bear several important differences, including (i) the use of regularization; (ii) the focus on sub-Gaussian noise

added before quantization; (iii) the use of dithering-type thresholds in the measurements; (iv) the consideration of

sparse signals instead of generative priors; and (v) the use of a random (rather than fixed) index set in the partial

circulant matrix.

Some theoretical guarantees for circulant binary embeddings, which are closely related to 1-bit CS with partial

Gaussian circulant matrices, have been presented in [24], [33]–[35]. On the other hand, based on successful

applications of deep generative models, instead of assuming the signal vector is sparse, it has been recently popular

to assume that the signal vector lies in the range of a generative model in the problem of CS [3], [36]–[43]. In

particular, 1-bit CS with generative priors has also been studied in [44]–[46], but none of these papers considers

structured sensing matrices that permit fast matrix operations.

B. Contribution

We consider the problem of recovering a signal vector in the range of a generative model via noisy 1-bit

measurements using a correlation-based optimization objective and a randomly signed partial Gaussian circulant

sensing matrix. We characterize the number of measurements sufficient to attain an accurate recovery of an underlying


ℓ∞-norm bounded signal¹ in the range of a Lipschitz continuous generative model. In particular, for an L-Lipschitz continuous generative model with bounded k-dimensional input and range contained in the unit sphere, we show that roughly O((k log L)/ε²) samples suffice for ε-accurate recovery. This matches the scaling obtained previously for i.i.d. Gaussian matrices [44]. We also consider the impact of adversarial corruptions on the measurements, and show that our scheme is robust to adversarial noise. We make use of a practical iterative algorithm proposed in [44], and perform some simple numerical experiments on real datasets to support our theoretical results.

C. Notation

We use upper and lower case boldface letters to denote matrices and vectors respectively. We write [N] = {0, 1, 2, . . . , N − 1} for a positive integer N. For any vector t = [t0, t1, . . . , tn−1]^T ∈ C^n, we let Ct ∈ C^{n×n} be a circulant matrix corresponding to t. That is, Ct(i, j) = t_{j−i} for all i, j ∈ [n], where here and subsequently, j − i is understood modulo n. Let F ∈ C^{n×n} be the (normalized) Fourier matrix, i.e., F(i, j) = (1/√n) e^{−2πij√(−1)/n} for all i, j ∈ [n]. We define the ℓ2-ball B^n_2(r) := {x ∈ R^n : ‖x‖2 ≤ r}, and we use S^{n−1} := {x ∈ R^n : ‖x‖2 = 1} to denote the unit sphere in R^n. For a vector x = [x0, x1, . . . , xn−1]^T ∈ R^n and an index i, we denote by s→i(x) the vector shifted to the right by i positions, i.e., the j-th entry of s→i(x) is the ((j − i) mod n)-th entry of x. Similarly, we denote by si←(x) the vector shifted to the left by i positions. A Rademacher variable takes the values +1 or −1, each with probability 1/2, and a sequence of independent Rademacher variables is called a Rademacher sequence. The operator norm of a matrix from ℓp into ℓp is defined as ‖A‖_{p→p} := max_{‖x‖p=1} ‖Ax‖p. We use g to represent a standard normal random variable, i.e., g ∼ N(0, 1). The symbols C, C′, C′′, c are absolute constants whose values may differ from line to line. Some additional notation that is used specifically for the proof of Lemma 1 is presented in a table in Appendix B.

II. PROBLEM SETUP

Except where stated otherwise, we consider the following setup and assumptions:

• Let x∗ = [x∗_0, x∗_1, . . . , x∗_{n−1}]^T ∈ R^n be the underlying signal vector, assumed to have unit norm (since recovery of the norm is impossible under our measurement model). We write x̄∗ = Dξ x∗, where Dξ is a diagonal matrix corresponding to a Rademacher sequence ξ = [ξ0, ξ1, . . . , ξn−1]^T ∈ R^n.

• Fixing any (deterministic) index set Ω ⊆ [n] with |Ω| = m, we set the sensing matrix A ∈ R^{m×n} as

        A = RΩ Cg Dξ,   (1)

where RΩ ∈ R^{m×n} is the matrix that restricts an n-dimensional vector to the coordinates indexed by Ω and g = [g0, g1, . . . , gn−1]^T ∼ N(0, In). That is, A is a partial Gaussian circulant matrix with random column sign flips. For concreteness, we fix the index set Ω as Ω = [m], but the same analysis holds for any other fixed set. We also write Ā = RΩ Cg, i.e., a partial Gaussian circulant matrix without random column sign flips, so that A = Ā Dξ. (An FFT-based sketch of applying A and A^T is given at the end of this section.)

¹While the assumption about the ℓ∞ norm bound appears to be restrictive, we may remove this assumption by multiplying by a randomly signed Hadamard matrix for pre-processing. See Remarks 1 and 2 for detailed explanations.


• We assume that the measurements, or response variables, bi ∈ {−1, 1}, i ∈ [m], are generated according to

        bi = θ(⟨ai, x∗⟩),   (2)

where ai^T is the i-th row of the sensing matrix A ∈ R^{m×n}, and θ is a (possibly random) function with θ(z) ∈ {−1, +1}. We focus on the following widely-adopted noisy 1-bit observation models:

  – Random bit flips: θ(x) = τ · sign(x), where τ is a ±1-valued random variable with P(τ = −1) = p < 1/2, independent of A.

  – Random noise before quantization: θ(x) = sign(x + e), where e represents a random noise term added before quantization, independent of A. For example, when e ∼ N(0, σ²), this corresponds to a special case of the probit model; when e is a logit noise term, this recovers the logistic regression model.

• We assume that

        E[θ(g)g] = λ   (3)

for some λ > 0, where g is a standard normal random variable. Note that the expectation is taken with respect to both θ and g. Then, it is easy to calculate that E[bi ai] = λx∗ [20]. For the above choices of θ, we have [47, Sec. 3]:

  – Random bit flips: λ = (1 − 2p)√(2/π).

  – Random noise before quantization: For any x ∈ R, we set

        φ(x) = E[θ(x)] = 1 − 2P(e < −x).   (4)

Then, we have φ′(x) = 2fe(−x), where fe is the probability density function (pdf) of e. Thus, we have

        λ = E[θ(g)g] = E[E[θ(g)g | g]] = E[φ(g)g] = E[φ′(g)] = 2E[fe(−g)] > 0.   (5)

In particular, when e ∼ N(0, σ²), we have fe(x) = (1/(√(2π)σ)) e^{−x²/(2σ²)} and λ = √(2/(π(1 + σ²))). When e is a logit noise term, we have φ(x) := tanh(x/2), and λ = E[φ′(g)] = (1/2) E[sech²(g/2)].

• For the case of random noise added before quantization, we additionally make the following assumptions:

  – Let Fe be the cumulative distribution function (cdf) of e. We assume that Fe(x) + Fe(−x) = 1 for any x ∈ R.²

  – For any a > 0, let θa(x) = sign(x + e/a). We assume that λa := E[θa(g)g] < C1 a + C2, where g ∼ N(0, 1) and C1, C2 are absolute constants.

The assumption Fe(x) + Fe(−x) = 1 ensures that sign(g + e/a) is a Rademacher random variable. We show in Lemma 9 that the above assumptions are satisfied when e corresponds to the Gaussian or logit noise term.

• In addition to any random noise in θ, we allow for adversarial noise. In this case, instead of observing b = [b0, . . . , bm−1]^T ∈ R^m directly, we only assume access to b̃ = [b̃0, . . . , b̃m−1]^T ∈ R^m satisfying

        (1/√m)‖b̃ − b‖2 ≤ ς   (6)


for some parameter ς ≥ 0. Note that the corruptions of b yielding b̃ may depend on A.

²Or equivalently, fe(x) = fe(−x) for any x ∈ R.

• The signal vector x∗ lies in a set of structured signals K. We focus on the case that K = Range(G) for some L-Lipschitz continuous generative model G : B^k_2(r) → R^n (e.g., see [3]). We will also assume that Range(G) ⊆ S^{n−1}, but this can easily be generalized to the case that Range(G) ⊆ R^n and the recovery guarantee is only given up to scaling [44], [46].

• To derive an estimate of x∗, we seek x maximizing a constrained correlation-based objective function:

        x̂ := arg max_{x∈K} b̃^T(Ax).   (7)

Such an optimization problem has been considered in [47], with the main purpose of deriving a convex optimization problem for the case that K is a convex set. We focus on the case that K = Range(G), which in general leads to a non-convex optimization problem. This can be approximately solved using gradient-based algorithms [44] or by projecting A^T b̃ onto K [48], [49]. In this paper, we provide theoretical guarantees for the recovery in the ideal case that the global maximizer is found. We also provide experimental results in Section V.
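For concreteness, the following Python sketch (all names are illustrative and the code is not from any released implementation) applies the randomly signed partial Gaussian circulant matrix A = RΩ Cg Dξ and its transpose via the FFT, generates noisy 1-bit measurements with Gaussian noise added before quantization, and numerically checks that ⟨(1/m)A^T b, x∗⟩ ≈ λ with λ = √(2/(π(1+σ²))), in line with E[bi ai] = λx∗ above.

import numpy as np

def make_operator(n, m, rng):
    # Randomly signed partial Gaussian circulant matrix A = R_Omega C_g D_xi with
    # Omega = [m]; both A and A^T are applied in O(n log n) time via the FFT.
    g = rng.standard_normal(n)              # generator of the circulant matrix C_g
    xi = rng.choice([-1.0, 1.0], n)         # Rademacher sequence for D_xi
    g_hat = np.fft.fft(g)

    def A(x):
        # C_g(i, j) = g_{j-i}, so C_g x is the circular cross-correlation of g and x.
        return np.fft.ifft(np.conj(g_hat) * np.fft.fft(xi * x)).real[:m]

    def AT(y):
        # Adjoint: D_xi C_g^T R_Omega^T y, where C_g^T acts as circular convolution.
        y_pad = np.zeros(n)
        y_pad[:m] = y
        return xi * np.fft.ifft(g_hat * np.fft.fft(y_pad)).real

    return A, AT

rng = np.random.default_rng(0)
n, m, sigma = 4096, 1024, 0.1
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)            # unit-norm signal

A, AT = make_operator(n, m, rng)
b = np.sign(A(x_star) + sigma * rng.standard_normal(m))   # noisy 1-bit measurements

lam = np.sqrt(2.0 / (np.pi * (1.0 + sigma ** 2)))          # lambda for Gaussian noise
est = AT(b) / m
print("<(1/m) A^T b, x*> =", est @ x_star, " (lambda =", lam, ")")
print("ell_inf error ||(1/m) A^T b - lambda x*||_inf =", np.max(np.abs(est - lam * x_star)))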

III. MAIN RESULTS

Recall that the i-th observation is bi = θ(⟨ai, x∗⟩), x̄∗ = Dξ x∗, and Ā = RΩ Cg, i.e., a partial Gaussian circulant matrix without random column sign flips. We have the following important lemma.

Lemma 1. For any t > 0 satisfying m = Ω(t + log n), suppose that the unit vector x∗ satisfies ‖x∗‖∞ ≤ ρ with ρ = O(1/(m(t + log m))). Suppose that b = θ(Ax∗) with θ(x) = τ · sign(x) or θ(x) = sign(x + e) (cf. Sec. II), and A is a partial Gaussian circulant matrix with random column sign flips. Then, with probability at least 1 − e^{−Ω(t)} − e^{−Ω(√m)},

        ‖(1/m) Ā^T b − λ x̄∗‖∞ ≤ O(√((t + log n)/m)).   (8)

Based on Lemma 1, and following the proof techniques used in [46] (see Section IV-C for the details), we have

the following theorem concerning the recovery guarantee for noisy 1-bit CS with partial Gaussian circulant matrices

and generative priors.

Theorem 1. Consider any x∗ ∈ K with K = G(B^k_2(r)) for some L-Lipschitz generative model G : B^k_2(r) → S^{n−1}. Fix δ > 0 satisfying δ = O(√(k log(Lr/δ))/(λ√m) + ς/λ), and suppose that Lr = Ω(δn), n = Ω(m), and the signal x∗ satisfies ‖x∗‖∞ ≤ ρ = O(1/(m k log(Lr/δ))). Suppose that all other conditions are the same as those in Lemma 1. Let x̂ be a solution to the correlation-based optimization problem (7). Then, if

        m = Ω(k log(Lr/δ) · (1 + 1(ς > 0) S_ς)),   (9)

where S_ς = (log² n) · (log²(k log(Lr/δ))) is a sample complexity term concerning the adversarial noise, with probability at least 1 − e^{−Ω(k log(Lr/δ))} − e^{−Ω(√m)} − 1(ς > 0) · e^{−Ω((log²(k log(Lr/δ)))(log² n))}, it holds that

        ‖x̂ − x∗‖2 ≤ O(√(k log(Lr/δ))/(λ√m) + ς/λ).   (10)


Note that the extra logarithmic terms log²(k log(Lr/δ)) and log² n in (9) are only required for deriving an upper bound corresponding to the adversarial noise. If there is no adversarial noise, i.e., ς = 0, the sample complexity in (9) becomes m = Ω(k log(Lr/δ)), and it follows from (10) that O((k log(Lr/δ))/(λ²ε²)) samples suffice for an ε-accurate recovery. This matches the scaling derived for i.i.d. Gaussian measurements in [44].

Remark 1. As mentioned in the remark in [24, Page 8], for “typical” unit vectors, we have ‖x∗‖∞ = O(log n/√n). Thus, assuming m = Θ(k log(Lr/δ)), we find that the condition ρ = O(1/(m k log(Lr/δ))) holds for typical signals when 1/(k log(Lr/δ))² ≳ (log n)/√n. Assuming Lr/δ = poly(n) [3], it in turn suffices that k = O(n^θ) for some θ ∈ (0, 1/4). In addition, for an arbitrary unit vector x∗, we can use a simple trick to ensure the ℓ∞ bound with high probability: Instead of using a randomly signed partial Gaussian circulant sensing matrix A, we can use AH as the sensing matrix, where H ∈ R^{n×n} is a randomly signed Hadamard matrix that also allows fast matrix-vector multiplications. Then, the observation vector becomes b = θ(AHx∗). Since Hx∗ satisfies the desired ℓ∞ bound with high probability [50, Lemma 3.1], for any x̂ that solves (7), ‖x̂ − Hx∗‖2 satisfies the upper bound in (10). Because H is symmetric and orthogonal, we have the same upper bound for ‖Hx̂ − x∗‖2.
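The flattening step in Remark 1 can be illustrated with the following small sketch (names are illustrative; for large n one would use a fast Walsh-Hadamard transform instead of forming the dense matrix): a maximally spiky unit vector is multiplied by a normalized Hadamard matrix with random sign flips, and the resulting ℓ∞ norm is compared with the typical O(√(log n/n)) level.

import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
n = 1024                                  # hadamard() requires a power of 2
H = hadamard(n) / np.sqrt(n)              # normalized Hadamard matrix (orthogonal)
eps = rng.choice([-1.0, 1.0], n)          # random sign flips

x = np.zeros(n)
x[0] = 1.0                                # worst case for the ell_inf bound: ||x||_inf = 1

y = H @ (eps * x)                         # randomly signed Hadamard transform of x
print("||x||_inf  =", np.max(np.abs(x)))
print("||Hx||_inf =", np.max(np.abs(y)), " vs sqrt(2 log(2n)/n) =", np.sqrt(2 * np.log(2 * n) / n))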

Remark 2. The use of random column sign flips and a randomly signed Hadamard matrix is most suited to

applications where one has the freedom to choose any measurement matrix, but is limited by computation. In this

sense, the design is more practical than the widespread i.i.d. Gaussian choice, though it may remain unsuitable in

applications such as medical imaging where one must use subsampled Fourier measurements without sign flips or

the Hadamard operator.

IV. PROOFS OF MAIN RESULTS

In this section, we provide proofs for Lemma 1 and Theorem 1. We first present some useful auxiliary results

that are general, and then some that are specific to our setup.

A. General Auxiliary Results

Definition 1. A random variable X is said to be sub-Gaussian if there exists a positive constant C such that (E[|X|^p])^{1/p} ≤ C√p for all p ≥ 1. The sub-Gaussian norm of a sub-Gaussian random variable X is defined as ‖X‖_{ψ2} := sup_{p≥1} p^{−1/2} (E[|X|^p])^{1/p}.

We have the following concentration inequality for sub-Gaussian random variables.

Lemma 2. (Hoeffding-type inequality [51, Proposition 5.10]) Let X1, . . . , XN be independent zero-mean sub-Gaussian random variables, and let K = max_i ‖Xi‖_{ψ2}. Then, for any α = [α1, α2, . . . , αN]^T ∈ R^N and any t ≥ 0, it holds that

        P(|Σ_{i=1}^{N} αi Xi| ≥ t) ≤ exp(1 − ct²/(K²‖α‖2²)).   (11)

In particular, we have the following lemma concerning Hoeffding’s inequality for (real) Rademacher sums.


Lemma 3. ([52, Proposition 6.11]) Let b ∈ R^N and ε = (εj)_{j∈[N]} be a Rademacher sequence. Then, for u > 0,

        P(|Σ_{j=1}^{N} εj bj| ≥ ‖b‖2 u) ≤ 2 exp(−u²/2).   (12)

We also use the following well-established fact based on the Riesz-Thorin interpolation theorem.

Lemma 4. ([53]) For any matrix A,

        ‖A‖_{2→2} ≤ max{‖A‖_{1→1}, ‖A‖_{∞→∞}}.   (13)

Next, we present the Hanson-Wright inequality, which gives a tail bound for quadratic forms in independent

Gaussian variables.

Lemma 5. (Hanson-Wright inequality [54]) If g ∈ R^N is a standard Gaussian vector and B ∈ R^{N×N}, then for any t > 0,

        P(|g^T B g − E[g^T B g]| ≥ t) ≤ exp(−c · min{t²/‖B‖_F², t/‖B‖_{2→2}}).   (14)

B. Proof of Lemma 1

Recall that Ā = RΩ Cg is a partial Gaussian circulant matrix without random column sign flips and x̄∗ = Dξ x∗. We have A = Ā Dξ, and ai = Dξ āi, where āi^T is the i-th row of Ā. Then,

        bi = θ(⟨ai, x∗⟩) = θ(⟨āi, x̄∗⟩)   (15)
           = θ(⟨s→i(g), x̄∗⟩)   (16)
           = θ(⟨g, si←(Dξ x∗)⟩).   (17)

Let si = si←(Dξ x∗). Since in general, the vectors s0, s1, . . . , sm−1 are not jointly orthogonal, the observations b0, b1, . . . , bm−1 are not independent. However, the sequence of observations can be approximated by a sequence of independent random variables using the notion of (m, β)-orthogonality defined in the following.

Definition 2. A sequence of m unit vectors t0, t1, . . . , tm−1 is said to be (m, β)-orthogonal if there exists a decomposition

        ti = ui + ri (∀i)   (18)

satisfying the following properties:

1) ui is orthogonal to span{uj : j ≠ i};

2) max_i ‖ri‖2 ≤ β.

We have the following useful lemma regarding Dξ.

Lemma 6. ([24, Lemma 7]) Let p, q ∈ R^n be vectors that satisfy ‖p‖2 = 1 and ‖q‖∞ ≤ ρ for some parameter ρ. Then, for any 0 < j < n and ε > 0, we have

        P(|⟨Dξ p, s→j(Dξ q)⟩| > ε) ≤ e^{−ε²/(8ρ²)},   (19)


where the probability is over the Rademacher sequence ξ.

Based on Lemma 6, and similar to [24, Lemma 9], we have the following lemma ensuring that the (m, β)-orthogonality condition is satisfied by a certain sequence of unit vectors with high probability. The proof is given in Appendix A for completeness.

Lemma 7. For any ν ∈ (0, 1), suppose that x is a unit vector with ‖x‖∞ ≤ ρ < 1/(4√2 · m√(log(m²/ν))). Let ti = si←(Dξ x) for i ∈ [m]. Then, with probability at least 1 − ν over the choice of ξ, the sequence t0, t1, . . . , tm−1 is (m, β)-orthogonal for β = 2√ρ.

In addition, we have the following two lemmas for the case of random noise added before quantization.

Lemma 8. Let g ∼ N(0, 1) and e be any random noise that is independent of g. Then, for any b ≥ 0 and a, η > 0,

        P(|ag + be| ≤ η) ≤ √(2/π) · (η/a).   (20)

Proof. Let fe be the pdf of e and Fg be the cdf of g, i.e., Fg(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt. We have max_{x∈R} F′g(x) = max_{x∈R} (1/√(2π)) e^{−x²/2} = 1/√(2π). Then,

        P(|ag + be| ≤ η) = ∫_{−∞}^{∞} fe(t) ∫_{(−η−bt)/a}^{(η−bt)/a} (1/√(2π)) e^{−x²/2} dx dt   (21)
                         = ∫_{−∞}^{∞} fe(t) (Fg((η − bt)/a) − Fg((−η − bt)/a)) dt   (22)
                         ≤ (2η/a) · max_{x∈R} F′g(x) · ∫_{−∞}^{∞} fe(t) dt   (23)
                         = √(2/π) · (η/a).   (24)
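As a quick numerical sanity check of the anti-concentration bound in Lemma 8 (purely illustrative; the noise distribution below is an arbitrary heavy-tailed choice, since the bound does not depend on the law of e), one can compare the empirical probability P(|ag + be| ≤ η) with √(2/π)·η/a:

import numpy as np

rng = np.random.default_rng(1)
N = 2_000_000
a, b, eta = 0.8, 3.0, 0.05

g = rng.standard_normal(N)
e = rng.standard_t(df=2, size=N)          # heavy-tailed noise, independent of g

empirical = np.mean(np.abs(a * g + b * e) <= eta)
bound = np.sqrt(2 / np.pi) * eta / a      # Lemma 8 bound
print("empirical probability:", empirical, " bound:", bound)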

The following lemma shows that when the random noise e added before quantization is Gaussian or logit, then

the additional assumptions corresponding to e made in Section II are satisfied.

Lemma 9. When e ∼ N(0, σ²) or e is a logit noise term, for any x ∈ R and any a > 0, we have

        Fe(x) + Fe(−x) = 1   (25)

and

        λa := E[θa(g)g] < C1 a + C2,   (26)

where θa(x) := sign(x + e/a), and C1, C2 are absolute constants.

Proof. When e ∼ N(0, σ²), by the symmetry of the pdf of a zero-mean Gaussian distribution, it is easy to see that (25) holds. Note that e/a ∼ N(0, σ²/a²). From λ = E[θ(g)g] = √(2/(π(1 + σ²))) (cf. Section II), we obtain

        λa = E[θa(g)g] = √(2/(π(1 + σ²/a²))) < √(2/π).   (27)


When e is a logit noise term, we have

        P(θ(x) = 1) = e^x/(e^x + 1),   P(θ(x) = −1) = 1/(e^x + 1).   (28)

From θ(x) = sign(x + e), we obtain

        Fe(−x) = P(e < −x) = 1/(e^x + 1),   (29)

or equivalently,

        Fe(x) = 1/(e^{−x} + 1).   (30)

This gives Fe(x) + Fe(−x) = 1 and

        fe(x) = F′e(x) = e^{−x}/(e^{−x} + 1)² ≤ 1/4.   (31)

For any fixed x, we have

        φa(x) := E[θa(x)] = 1 − 2P(e < −ax).   (32)

Then, for g ∼ N(0, 1), we obtain

        λa := E[θa(g)g] = E[E[θa(g)g | g]]   (33)
            = E[φa(g)g] = E[φ′a(g)] = 2aE[fe(−ag)] ≤ a/2.   (34)
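The closed-form values of λ used above can also be verified by Monte Carlo, as in the following sketch (illustrative only), which estimates λ = E[θ(g)g] directly for Gaussian and logistic noise added before quantization and compares with √(2/(π(1+σ²))) and (1/2)E[sech²(g/2)] respectively.

import numpy as np

rng = np.random.default_rng(2)
N = 2_000_000
g = rng.standard_normal(N)

# Gaussian noise before quantization (probit-type model)
sigma = 0.5
lam_mc = np.mean(np.sign(g + sigma * rng.standard_normal(N)) * g)
lam_closed = np.sqrt(2.0 / (np.pi * (1.0 + sigma ** 2)))
print("Gaussian noise:", lam_mc, "vs", lam_closed)

# Logistic (logit) noise before quantization
e = rng.logistic(loc=0.0, scale=1.0, size=N)
lam_mc = np.mean(np.sign(g + e) * g)
lam_closed = 0.5 * np.mean(1.0 / np.cosh(g / 2.0) ** 2)    # (1/2) E[sech^2(g/2)]
print("Logistic noise:", lam_mc, "vs", lam_closed)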

Overview of the proof: With the above auxiliary results in place, the main ideas for proving Lemma 1 are outlined as follows: Let si = si←(Dξ x∗). From Lemma 7, the sequence of unit vectors s0, s1, . . . , sm−1 is (m, β)-orthogonal with high probability. We write si = ui + ri as in (18), and then bi = θ(⟨g, si⟩) = θ(⟨g, ui⟩ + ⟨g, ri⟩). Because ‖ri‖2 = o(1), for the case of random bit flips with θ(x) = τ · sign(x), we have that for most i ∈ [m], the term |⟨g, ui⟩| is larger than the term |⟨g, ri⟩|, and thus bi = θ(⟨g, ui⟩ + ⟨g, ri⟩) = θ(⟨g, ui⟩).³ Because of the orthogonality of u0, u1, . . . , um−1, the sequence {θ(⟨g, ui⟩)}_{i∈[m]} is independent. Therefore, for any j ∈ [n], the j-th entry of (1/m)Ā^T b − λx̄∗ is

        (1/m) Σ_{i∈[m]} (āij bi − λx̄∗_j) ≈ (1/m) Σ_{i∈[m]} (g_{j−i} θ(⟨g, ui⟩) − λ ui(j − i)),   (35)

where we use āij = g_{j−i}, bi ≈ θ(⟨g, ui⟩) and x̄∗_j = si(j − i) ≈ ui(j − i). The sequence {g_{j−i} θ(⟨g, ui⟩) − λ ui(j − i)}_{i∈[m]} is still not independent, but defining hi = ⟨g, ui/‖ui‖2⟩, we have that h0, h1, . . . , hm−1 are i.i.d. standard Gaussian random variables. We use these to represent g_{j−i}, such that the sum (1/m) Σ_{i∈[m]} (g_{j−i} θ(⟨g, ui⟩) − λ ui(j − i)) can be decomposed into three terms.⁴ We control these three terms separately by making use of Lemmas 2, 3 and 5.

We now proceed with the formal proof. A table of notation is presented in Appendix B for convenience.

³Similarly, for random noise added before quantization with θ(x) = sign(x + e), from Lemma 8, we have that for most i ∈ [m], the term |⟨g, ui⟩ + ei| is larger than the term |⟨g, ri⟩|, and thus we also have bi = θ(⟨g, ui⟩).
⁴For random noise added before quantization, it is decomposed into four terms.


Proof of Lemma 1. Useful high-probability events: Recall that from (17), we have bi = θ(⟨g, si←(Dξ x∗)⟩). Let si = si←(Dξ x∗) for i ∈ [m]. Because ‖x∗‖∞ ≤ ρ with ρ ≤ O(1/(m(t + log m))), setting ν = e^{−t} in Lemma 7, we have that with probability at least 1 − e^{−t}, the sequence s0, . . . , sm−1 is (m, β)-orthogonal with β = 2√ρ. We write si = ui + ri with ‖ri‖2 ≤ β for all i. In the following, we always assume the (m, β)-orthogonality. Because ‖ri‖2 ≤ β, from Lemma 2 and a union bound over i ∈ [m], we have that for any η > 0, with probability at least 1 − m e^{−η²/(2β²)},

        max_{i∈[m]} |⟨g, ri⟩| ≤ η.   (36)

In addition, for β = o(1), we have ‖ui‖2 ∈ [1 − β, 1 + β], so that ‖ui‖2 ≥ √(2/π). By Lemma 8, we obtain for any i ∈ [m] and η > 0 that

        P(|⟨g, ui⟩ + ωei| ≤ η) = P(|‖ui‖2 ⟨g, ui/‖ui‖2⟩ + ωei| ≤ η)   (37)
                               ≤ √(2/π) · (η/‖ui‖2) ≤ η,   (38)

where ω = 0 corresponds to random bit flips with θ(x) = τ · sign(x), and ω = 1 corresponds to random noise added before quantization with θ(x) = sign(x + e). By the orthogonality of u0, . . . , um−1, the sequence ⟨g, u0⟩, . . . , ⟨g, um−1⟩ is independent. Let

        Iη = {i ∈ [m] : |⟨g, ui⟩ + ωei| ≤ η},   (39)

and let Icη = [m]\Iη be the complement of Iη. Using a standard Chernoff bound [24], we have for any ζ > 0 that

        P(|Iη| ≥ mη + mζ) < e^{−mζ²/(2η+ζ)}.   (40)

Note that when (36) holds, for i ∈ Icη, we have

        bi = θ(⟨g, si⟩) = θ(⟨g, ui⟩),   (41)

since the term ⟨g, ui⟩ + ωei of magnitude exceeding η dominates the other term ⟨g, ri⟩ of magnitude at most η. In addition, because si = si←(x̄∗), we have for any fixed j ∈ [n] that

        si(j − i) = x̄∗_j.   (42)

Another useful property is the following bound on the maximum of n independent Gaussians (e.g., using [2, Eq. (2.9)] and a union bound): With probability at least 1 − e^{−t},

        max_{j∈[n]} |gj| ≤ √(2(t + log n)).   (43)

Combining (36), (40), and (43), we find that the event

        E : { max_{i∈[m]} |⟨g, ri⟩| ≤ η;  |Iη| < m(η + ζ);  max_{j∈[n]} |gj| ≤ √(2(t + log n)) }   (44)

occurs with probability at least 1 − e^{−t} − m e^{−η²/(2β²)} − e^{−mζ²/(2η+ζ)}.


Dividing 1mATb− λx∗ into several terms: By the definitions of Iη and Icη , for any j ∈ [n], we have[

1

mATb− λx∗

]j

=1

m

m−1∑i=0

(aijbi − λx∗j ) (45)

=1

m

∑i∈Icη

(aijbi − λx∗j ) +1

m

∑i∈Iη

(aijbi − λx∗j ) (46)

=1

m

∑i∈Icη

(aijθ(〈g,ui〉)− λsi(j − i)) +1

m

∑i∈Iη

(aijbi − λx∗j ) (47)

=1

m

∑i∈Icη

(gj−iθ(〈g,ui〉)− λ

ui(j − i)‖ui‖2

)

− λ

m

∑i∈Icη

(ri(j − i) + ui(j − i)−

ui(j − i)‖ui‖2

)

+1

m

∑i∈Iη

(gj−ibi − λx∗j ), (48)

where (47) is from (41) and (42), and (48) is from si = ui + ri. Therefore, if we denote Yij = gj−iθ(〈g,ui〉)−

λui(j−i)‖ui‖2 , Z1 := 2(η + ζ)

√2(t+ log n), Z2 := C

√t+lognm for some sufficiently large absolute constant C, and let

Z3 := Z2 + 2β + 2Z1, we obtain

P

(maxj∈[n]

∣∣∣∣∣ 1

m

m−1∑i=0

(aijbi − λx∗j )

∣∣∣∣∣ > Z3

)≤ P(Ec) + P

(maxj∈[n]

∣∣∣∣∣ 1

m

m−1∑i=0

(aijbi − λx∗j )

∣∣∣∣∣ > Z3, E

)(49)

≤ e−t +me− η2

2β2 + e−mζ2

2η+ζ + P

(maxj∈[n]

∣∣∣∣∣ 1

m

m−1∑i=0

(aijbi − λx∗j )

∣∣∣∣∣ > Z3, E

)(50)

≤ e−t +me− η2

2β2 + e−mζ2

2η+ζ + P

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈Icη

Yij

∣∣∣∣∣∣ > Z2 + Z1, E

+ P

maxj∈[n]

∣∣∣∣∣∣ λm∑i∈Icη

(si(j − i)−

ui(j − i)‖ui‖2

∣∣∣∣ > 2β, E

+ P

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈Iη

(gj−ibi − λx∗j )

∣∣∣∣∣∣ > Z1, E

(51)

≤ e−t +me− η2

2β2 + e−mζ2

2η+ζ + P

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈Iη

Yij

∣∣∣∣∣∣ > Z1

∣∣∣E+ P

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈[m]

Yij

∣∣∣∣∣∣ > Z2

+ P

maxj∈[n]

∣∣∣∣∣∣ λm∑i∈Icη

(si(j − i)−

ui(j − i)‖ui‖2

)∣∣∣∣∣∣ > 2β∣∣∣E+ P

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈Iη

(aijbi − λx∗j )

∣∣∣∣∣∣ > Z1

∣∣∣E , (52)

where (50) follows from (44), (51) follows from (48) and the definition of Z3, and (52) uses the triangle inequality

and the fact that P(A, E) ≤ minP(A|E),P(A). Noting that aij = gj−i, λ < 1 (cf. Section II), |x∗j | ≤ ρ < 1,

‖ri‖2 ≤ β, and |1− ‖ui‖2| ≤ β, conditioned on E , we have

maxj∈[n]

∣∣∣∣∣∣ λm∑i∈Icη

(ri(j − i) + ui(j − i)−

ui(j − i)‖ui‖2

)∣∣∣∣∣∣ ≤ 2β, (53)

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈Iη

(gj−ibi − λx∗j )

∣∣∣∣∣∣ ≤ Z1, (54)


and

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈Iη

(gj−iθ(〈g,ui〉)− λ

ui(j − i)‖ui‖2

)∣∣∣∣∣∣ ≤ Z1. (55)

Bounding the remaining term: In the following, we focus on deriving an upper bound for maxj∈[n]

∣∣ 1m

∑i∈[m] Yij

∣∣ =

maxj∈[n]

∣∣ 1m

∑i∈[m]

(gj−iθ(〈g,ui〉)− λui(j−i)

‖ui‖2

)∣∣. Let hi =⟨g, ui‖ui‖2

⟩. Then, h0, h1, . . . , hm−1 are i.i.d. standard

Gaussian random variables. For i, ` ∈ [m], j ∈ [n], we denote uj,i,` := u`(j−i)‖u`‖2 . We have from the definition of h`

that Cov[gj−i, h`] = uj,i,`, and for any fixed j ∈ [n], gj−i can be written as

gj−i =∑`∈[m]

uj,i,`h` + wj,itj,i, (56)

where wj,i =√

1−∑`∈[m] u

2j,i,` and tj,i ∼ N (0, 1) is independent of h0, h1, . . . , hm−1. Note that for random

noise added before quantization, we have

θ(〈ui,g〉) = θ(‖ui‖2hi) = sign(‖ui‖2hi + ei) (57)

= sign

(hi +

ei‖ui‖2

)= θi(hi), (58)

where θi(x) := sign(x + e‖ui‖2 ). Let λi = E[θi(g)g] for g ∼ N (0, 1). From Lemma 9, we have that λi <

C1‖ui‖2 + C2 = C, and θi(g) is a Rademacher random variable. Therefore, we obtain5

1

m

∑i∈[m]

(gj−iθ(〈g,ui〉)− λ

ui(j − i)‖ui‖2

)=

1

m

∑i∈[m]

(gj−iθi(hi)− λuj,i,i) (59)

=1

m

∑i∈[m]

∑`∈[m]

uj,i,`h`θi(hi) + wj,itj,iθi(hi)− λuj,i,i

(60)

=1

m

∑i∈[m]

uj,i,i (hiθi(hi)− λi) +1

m

∑i∈[m]

uj,i,i(λi − λ)

+1

m

∑i∈[m]

θi(hi)∑` 6=i

uj,i,`h` +1

m

∑i∈[m]

wj,itj,iθi(hi), (61)

where (60) follows from (56). We control the four summands in (61) separately as follows:

• Because uj,i,ihiθi(hi) − λiuj,i,i, i ∈ [m], are independent, zero-mean, sub-Gaussian random variables with

sub-Gaussian norm upper bounded by an absolute constant c, from Lemma 2 and a union bound over j ∈ [n],

we have with probability at least 1− e−t that

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈[m]

uj,i,i (hiθi(hi)− λi)

∣∣∣∣∣∣ ≤ O(√

t+ log n

m

). (62)

• For the case of random bit flips, the second term vanishes. For the case of random noise added before

quantization, from 0 < λ, λi < C and (68), we obtain

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈[m]

uj,i,i(λi − λ)

∣∣∣∣∣∣ ≤ C maxi∈[m],j∈[n]

|uj,i,i| = O

(1√

m(t+ logm)

). (63)

5For the case of random bit flips with θ(x) = τ · sign(x+ e), we set θi = θ (and thus λi = λ) for all i ∈ [m].


• We have

maxj,`

∑i∈[m]

u2j,i,` ≤ 4 max

j,`

∑i∈[m]

u`(j − i)2 (64)

= 4 maxj,`

∑i∈[m]

(s`(j − i)− r`(j − i))2 (65)

≤ 8 maxj,`

∑i∈[m]

(s`(j − i)2 + r`(j − i)2

)(66)

≤ 8 maxj,`

(m‖x∗‖2∞ + ‖r`‖22

)(67)

= O

(1

m(t+ logm)

), (68)

where we use ‖u`‖2 = 1 + o(1) in (64), s` = u` + r` in (65), (a− b)2 ≤ 2(a2 + b2) in (66), and ‖s`‖∞ =

‖x∗‖∞ ≤ ρ = O(

1m(t+logm)

)as well as ‖r`‖2 ≤ 2

√ρ in (67) and (68). Then, from Proposition 1 below and

a union bound over j ∈ [n], we have with probability at least 1−O(e−t)

that

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑`∈[m]

h`∑i 6=`

uj,i,`θi(hi)

∣∣∣∣∣∣ ≤ O(√

t+ log n+ logm

m

)= O

(√t+ log n

m

). (69)

• For a fixed j ∈ [n] and 0 ≤ i1 6= i2 < m, we have

Cov [wj,i1tj,i1 , wj,i2tj,i2 ] = Cov

gj−i1 − ∑`∈[m]

uj,i1,`h`, gj−i2 −∑`∈[m]

uj,i2,`h`

(70)

= −∑`∈[m]

uj,i1,`uj,i2,`. (71)

Then, [wj,0tj,0, . . . , wj,m−1tj,m−1]T ∈ Rm is a Gaussian vector with zero mean and covariance matrix

Σj , where Σj,i,i = w2j,i = 1 −

∑`∈[m] u

2j,i,` ≤ 1 and from (67), we have for i1 6= i2 that

∣∣Σj,i1,i2∣∣ =∣∣−∑`∈[m] uj,i1,`uj,i2,`∣∣ ≤ √(

∑`∈[m] u

2j,i1,`

)(∑`∈[m] u

2j,i2,`

) = O(

1m(t+logm)

)= O

(1m

). When m = Ω(t+

log n), taking a union bound over j ∈ [n], and setting ε = O(√

t+lognm

)in Proposition 2 below, we obtain

with probability at least 1−O(e−t)

that

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈[m]

wj,itj,iθi(hi)

∣∣∣∣∣∣ ≤ O(√

t+ log n

m

). (72)

Combining (61) with (62), (63), (69), (72), we obtain with probability at least 1−O (e−t) that

maxj∈[n]

∣∣∣∣∣∣ 1

m

∑i∈[m]

(gj−iθ(〈g,ui〉)− λ

ui(j − i)‖ui‖2

)∣∣∣∣∣∣ ≤ O(√

t+ log n

m

). (73)

Combining terms: Finally, combining the (m,β)-orthogonality and (52), (53), (54), (55), and (73), with probability

at least 1−O (e−t)−me−η2

2β2 − e−mζ2

2η+ζ , we have∥∥∥∥ 1

mATb− λx∗

∥∥∥∥∞

= maxj∈[n]

∣∣∣∣∣ 1

m

m−1∑i=0

(aijbi − λx∗j )

∣∣∣∣∣ ≤ O(√

t+ log n

m

)+ 2β + 4(η + ζ)

√2(t+ log n). (74)


Setting η = β√

2(t+ logm) = O(√ρ) ·

√2(t+ logm) = O

(1√m

)and ζ = C√

m, we obtain with probability at

least 1− e−Ω(t) − e−Ω(√m) that ∥∥∥∥ 1

mATb− λx∗

∥∥∥∥∞≤ O

(√t+ log n

m

). (75)

We now move on to the statements and proofs of Propositions 1 and 2 used above. Throughout the following, we

use the θi defined in (58).6

Proposition 1. Let z ∈ R^m be a standard Gaussian vector. For any t > 0, suppose that W ∈ R^{m×m} satisfies max_{ℓ∈[m]} Σ_{i∈[m]} w²_{i,ℓ} = O(1/(m(t + log m))). Then, we have with probability at least 1 − O(e^{−t}) that

        |(1/m) Σ_{i∈[m]} θi(zi) Σ_{ℓ≠i} w_{i,ℓ} zℓ| ≤ O(√((t + log m)/m)).   (76)

Proof. For any fixed ` ∈ [m] and any v > 0, from Lemma 2, we have with probability at least 1−O(e−v

2)that

|z`| ≤ v. (77)

Therefore, with probability at least 1−O(e−v

2),

|z`|√∑

i 6=`

w2i,` = O

(v√

m(t+ logm)

). (78)

Note that θi(z0), θi(z1), . . . , θi(zm−1) form a Rademacher sequence. Then, for any u > 0 and a sufficiently large

absolute constant C, we have

P

∣∣∣∣∣∣z`∑i 6=`

wi,`θi(zi)

∣∣∣∣∣∣ > Cv√m(t+ logm)

u

≤ P(|z`| > v) + P

∣∣∣∣∣∣z`∑i 6=`

wi,`θi(zi)

∣∣∣∣∣∣ > Cv√m(t+ logm)

u∣∣∣ |z`| ≤ v

(79)

≤ O(e−u

2)+O

(e−v

2), (80)

where we use (77), (78) and Lemma 3 to derive (80). Taking a union bound over ` ∈ [m], we obtain with probability

at least 1−O(m(e−u

2

+ e−v2))

that

max`∈[m]

∣∣∣∣∣∣z`∑i 6=`

wi,`θi(zi)

∣∣∣∣∣∣ ≤ Cv√m(t+ logm)

u. (81)

Then, we have ∣∣∣∣∣∣ 1

m

∑i∈[m]

θi(zi)∑` 6=i

wi,`z`

∣∣∣∣∣∣ =

∣∣∣∣∣∣ 1

m

∑`∈[m]

z`∑i 6=`

wi,`θi(zi)

∣∣∣∣∣∣ (82)

≤ max`∈[m]

∣∣∣∣∣∣z`∑i 6=`

wi,`θi(zi)

∣∣∣∣∣∣ ≤ Cv√m(t+ logm)

u. (83)

6For the case of random bit flips with θ(x) = τ · sign(x+ e), we set θi = θ for i ∈ [m].


Setting u = C ′√t+ logm and v = C ′′

√t+ logm, we obtain the desired result.

Proposition 2. Let z ∈ R^m be a standard Gaussian vector. Let t ∼ N(0, Σ) be independent of z, where Σ ∈ R^{m×m} satisfies Σ_{i,i} ≤ 1 for i ∈ [m] and Σ_{i1,i2} = O(1/m) for i1 ≠ i2. Then, for any ε ∈ (0, 1), with probability 1 − e^{−Ω(mε²)},

        |(1/m) Σ_{i∈[m]} ti θi(zi)| ≤ ε.   (84)

Proof. We have

‖Σ‖2F =∑i∈[m]

Σ2i,i +

∑i1 6=i2

Σ2i1,i2 ≤ m+m2 ·O

( 1

m2

)= O(m), (85)

and from Lemma 4,

‖Σ‖2→2 ≤ ‖Σ‖1→1 = O(1). (86)

Then, setting B = Σ in Lemma 5, we obtain with probability at least 1− e−Ω(m) that∑i∈[m]

t2i ≤ 2m. (87)

Again, note that θi(z0), θi(z1), . . . , θi(zm−1) form a Rademacher sequence. For any u > 0 and a sufficiently large

absolute constant C, we have

P

∣∣∣∣∣∣ 1

m

∑i∈[m]

tiθi(zi)

∣∣∣∣∣∣ > C√mu

≤ P

∑i∈[m]

t2i > 2m

+ P

∣∣∣∣∣∣ 1

m

∑i∈[m]

tiθi(zi)

∣∣∣∣∣∣ > C√mu∣∣∣ ∑i∈[m]

t2i ≤ 2m

(88)

≤ O(e−u

2)+ e−Ω(m), (89)

where we use Lemma 3 and (87) to derive (89). Setting ε = C√mu, and we obtain the desired result.

C. Proof of Theorem 1

Before proving Theorem 1, we provide some further lemmas.

Lemma 10. Suppose that the assumptions in Lemma 1 are satisfied. Then, for any v ∈ R^n and t > 0 satisfying m = Ω(t + log n), with probability 1 − e^{−Ω(t)} − e^{−Ω(√m)}, we have

        |⟨(1/m) Ā^T b − λx̄∗, Dξ v⟩| ≤ O(‖v‖2 √((t + log n)/m)).   (90)

Proof. Using the same notations for si,ui, ri, hi as those in the proof of Lemma 1, we have⟨1

mATb− λx∗,Dξv

⟩=‖v‖2m

∑i∈[m]

(〈ai,Dξv〉bi − λ〈x∗,Dξv〉) (91)

=‖v‖2m

∑i∈Jη

(〈g,wi〉θ(〈g,ui〉)− λ

⟨wi,

ui‖ui‖2

⟩)

− λ‖v‖2m

∑i∈Jη

(〈x∗,Dξv〉 −

⟨wi,

ui‖ui‖2

⟩)

+‖v‖2m

∑i∈Iη

(〈g,wi〉bi − λ〈x∗,Dξv〉) , (92)


where v := v‖v‖2 and wi := si←(Dξv) so that 〈x∗,Dξv〉 = 〈si←(x∗), si←(Dξv)〉 = 〈si,wi〉. Note that (92) is

identical to (48), except that we use 〈g,wi〉 to replace gj−i therein. Similarly to (52), we know that we only need to

focus on bounding 1m

∑i∈[m]

(〈g,wi〉θ(〈g,ui〉)− λ

⟨wi,

ui‖ui‖2

⟩). Setting αi,` := Cov[〈g,wi〉, h`] =

⟨wi,

u`‖u`‖2

⟩,

similarly to (56), 〈g,wi〉 can be written as

〈g,wi〉 =∑`∈[m]

αi,`h` + γ`f`, (93)

where γ` =√

1−∑`∈[m] α

2i,` and f` ∼ N (0, 1) is independent of h0, h1, . . . , hm. Note that by Lemma 6, we

have with probability at least 1− e−t that

maxi 6=`〈wi, s`〉 ≤ ρ ·O(

√t+ logm) = O

(1

m√t+ logm

), (94)

where we use ρ = O(

1m(t+logm)

)in (94). By the (m,β)-orthogonality (cf. the first paragraph in the proof of

Lemma 1 for the cases of random flips and Gaussian noise), for all i, ` ∈ [m], we have∑j∈[n]

v2j r`(j + i)2 ≤ ‖r`‖2∞ = O

(1

m2(t+ logm)

). (95)

Setting u = O(√t+ logm) in Lemma 3 and taking a union bound over i, ` ∈ [m], we obtain with probability at

least 1− e−t that

maxi,`∈[m]

|〈wi, r`〉| = maxi,`∈[m]

|〈Dξv, s→i(r`)〉| (96)

= maxi,`∈[m]

|∑j∈[n]

vjr`(j + i)ξj | ≤ O( 1

m

). (97)

Therefore, we obtain from (94) and (97) that

maxi 6=`|αi,`| ≤ 2 max

i6=`|〈wi,u`〉| = 2 max

i6=`|〈wi, s` − r`〉| = O

( 1

m

), (98)

and we are able to derive inequalities similar to (85), (86) and (87). Following the same proof techniques as those

for Lemma 1, we obtain the desired result.

Based on Lemmas 1 and 10, and using a well-established chaining argument that we do not repeat here (e.g.,

see [46, Lemma 3]), we attain the following lemma. Note that we use the (m,β)-orthogonality and the event E

(cf. (44)) only once, and do not need to take any union bound for them.

Lemma 11. Under the conditions in Theorem 1, with probability at least 1 − e^{−Ω(k log(Lr/δ))} − e^{−Ω(√m)}, it holds that

        |⟨(1/m) Ā^T b − λx̄∗, Dξ x̂ − x̄∗⟩| ≤ O(√(k log(Lr/δ)/m)) · (‖x̂ − x∗‖2 + δ).   (99)

The restricted isometry property (RIP) is widely used in compressive sensing (CS), and we present its definition

in the following.

Definition 3. A matrix Φ ∈ Cm×n is said to have the restricted isometry property of order s and level δ ∈ (0, 1)

(equivalently, (s, δ)-RIP) if

(1− δ)‖x‖22 ≤ ‖Φx‖22 ≤ (1 + δ)‖x‖22 (100)


for all s-sparse vectors in Cn. The restricted isometry constant δs is defined as the smallest value of δ for which (100)

holds.

We have the following lemma that guarantees the RIP for partial Gaussian circulant matrices.

Lemma 12. (Adapted from [55, Theorem 4.1]) For s ≤ n and η, δ ∈ (0, 1), if

        m ≥ Ω((s/δ²)(log² s)(log² n)),   (101)

then with probability at least 1 − e^{−Ω((log² s)(log² n))}, the restricted isometry constant of (1/√m) Ā = (1/√m) RΩ Cg satisfies δs ≤ δ.

Based on the RIP, we have the Johnson-Lindenstrauss embedding according to the following lemma.

Lemma 13. ([56, Theorem 3.1]) Fix η > 0 and ε ∈ (0, 1), and consider a finite set E ⊆ C^n of cardinality |E| = p. Set s ≥ 40 log(4p/η), and suppose that Φ ∈ C^{m×n} satisfies the RIP of order s and level δ ≤ ε/4. Then, with probability at least 1 − η, we have

        (1 − ε)‖x‖2² ≤ ‖Φ Dξ x‖2² ≤ (1 + ε)‖x‖2²   (102)

uniformly for all x ∈ E, where Dξ = Diag(ξ) is a diagonal matrix with respect to a Rademacher sequence ξ.

Combining Lemmas 12 and 13 with setting s = Ω(log p) and η = O(1/p), we derive the following corollary.

Corollary 1. Fix ε ∈ (0, 1), and consider a finite set E ⊆ C^n of cardinality |E| = p satisfying n = Ω(log p). Suppose that

        m = Ω((log p/ε²)(log²(log p))(log² n)).   (103)

Then, with probability 1 − e^{−Ω(log p)} − e^{−Ω((log²(log p))(log² n))}, we have

        (1 − ε)‖x‖2² ≤ ‖(1/√m) A x‖2² ≤ (1 + ε)‖x‖2²   (104)

for all x ∈ E, where we recall that A = RΩ Cg Dξ = Ā Dξ (cf. (1)).
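The norm preservation in (104) can be checked empirically with the following sketch (illustrative; a dense circulant matrix is formed for clarity rather than using the FFT), which draws a randomly signed partial Gaussian circulant matrix and evaluates ‖(1/√m)Ax‖2²/‖x‖2² over a small finite set of test vectors; the ratios should concentrate around 1.

import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(3)
n, m, p = 2048, 600, 20

g = rng.standard_normal(n)
C = circulant(g).T                  # C[i, j] = g[(j - i) mod n], matching C_g(i, j) = g_{j-i}
xi = rng.choice([-1.0, 1.0], n)
A = C[:m, :] * xi                   # A = R_Omega C_g D_xi with Omega = [m]

E = rng.standard_normal((p, n))     # a finite set of p test vectors
ratios = [np.linalg.norm(A @ x) ** 2 / (m * np.linalg.norm(x) ** 2) for x in E]
print("min ratio:", min(ratios), " max ratio:", max(ratios))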

Note that from Lemma 2 and a union bound over [n], and by Lemma 4, we have that for any t > 0, with probability at least 1 − e^{−Ω(t)}, ‖(1/√m) A‖_{2→2} ≤ n√(t + log n)/√m. Based on this bound for the spectral norm of (1/√m) A and Corollary 1, and by again using a well-established chaining argument (e.g., see [3, Lemma 4.1], [57, Section 3.4]), we obtain the following lemma that guarantees the two-sided Set-Restricted Eigenvalue Condition (S-REC) [3] for partial Gaussian circulant matrices. This lemma will only be used to derive an upper bound for the term corresponding to adversarial noise.

Lemma 14. For any δ > 0 satisfying Lr = Ω(δn) and n = Ω(k log(Lr/δ)), and any α ∈ (0, 1), if

        m = Ω((k log(Lr/δ)/α²)(log²(k log(Lr/δ)))(log² n)),   (105)

then with probability 1 − e^{−Ω(k log(Lr/δ))} − e^{−Ω((log²(k log(Lr/δ)))(log² n))}, it holds that

        (1 − α)‖x1 − x2‖2 − δ ≤ ‖(1/√m) A(x1 − x2)‖2 ≤ (1 + α)‖x1 − x2‖2 + δ,   (106)

for all x1, x2 ∈ G(B^k_2(r)).

We now provide the proof of Theorem 1.

Proof of Theorem 1. Because x̂ is a solution to (7) and x∗ ∈ K := G(B^k_2(r)), we have

        (1/m) b̃^T(A x̂) ≥ (1/m) b̃^T(A x∗).   (107)

Recall that Ā = RΩ Cg and x̄∗ = Dξ x∗ (cf. Section III). We have A = RΩ Cg Dξ = Ā Dξ and

        (1/m) b̃^T(Ā Dξ x̂) ≥ (1/m) b̃^T(Ā x̄∗),   (108)

which can equivalently be expressed as

        ⟨(1/m) Ā^T b̃ − λx̄∗, Dξ x̂ − x̄∗⟩ + ⟨λx̄∗, Dξ x̂ − x̄∗⟩ ≥ 0.   (109)

By the assumption that G(B^k_2(r)) ⊆ S^{n−1}, we obtain that ‖Dξ x̂‖2 = ‖x̂‖2 = 1, and

        ⟨λx̄∗, x̄∗ − Dξ x̂⟩ = λ⟨x∗, x∗ − x̂⟩ = (λ/2)‖x∗ − x̂‖2².   (110)

Therefore, we obtain

        (λ/2)‖x∗ − x̂‖2² ≤ |⟨(1/m) Ā^T(b̃ − b), Dξ x̂ − x̄∗⟩| + |⟨(1/m) Ā^T b − λx̄∗, Dξ x̂ − x̄∗⟩|   (111)
                         ≤ ‖(1/√m)(b̃ − b)‖2 · ‖(1/√m) A(x̂ − x∗)‖2 + |⟨(1/m) Ā^T b − λx̄∗, Dξ x̂ − x̄∗⟩|.   (112)

Setting α = 1/2 in Lemma 14, we have that if m = Ω(k log(Lr/δ)(log²(k log(Lr/δ)))(log² n)), with probability at least 1 − e^{−Ω(k log(Lr/δ))} − e^{−Ω((log²(k log(Lr/δ)))(log² n))},

        ‖(1/√m) A(x̂ − x∗)‖2 ≤ O(‖x̂ − x∗‖2 + δ).   (113)

Moreover, by Lemma 11, with probability 1 − e^{−Ω(k log(Lr/δ))} − e^{−Ω(√m)}, we have

        |⟨(1/m) Ā^T b − λx̄∗, Dξ x̂ − x̄∗⟩| ≤ O(√(k log(Lr/δ)/m)) · (‖x̂ − x∗‖2 + δ).   (114)

Recall that we assume (1/√m)‖b̃ − b‖2 ≤ ς. Combining (112), (113) and (114), we obtain with probability at least 1 − e^{−Ω(k log(Lr/δ))} − e^{−Ω((log²(k log(Lr/δ)))(log² n))} − e^{−Ω(√m)} that

        (λ/2)‖x∗ − x̂‖2² ≤ O(√(k log(Lr/δ)/m) + ς) · (‖x̂ − x∗‖2 + δ).   (115)

Then, by considering both possible cases of which of the two terms in (115) is larger, we find that the desired result holds when δ = O(√(k log(Lr/δ))/(λ√m) + ς/λ).


V. NUMERICAL EXPERIMENTS

We empirically evaluate the correlation-based decoder on the MNIST [58] and celebA [59] datasets. The MNIST

dataset consists of 60,000 handwritten images of size n = 28× 28 = 784. The CelebA dataset is a large-scale face

attributes dataset of more than 200,000 images of celebrities. The input images were cropped to a 64× 64 RGB

image, yielding a dimensionality of n = 64× 64× 3 = 12288. For simplicity, we assume that there is no adversarial

noise. We make use of the practical iterative algorithm proposed in [44]:

        x^{(t+1)} = PG(x^{(t)} + µ A^T(b − sign(A x^{(t)}))),   (116)

where PG(·) is the projection function onto G(Bk2 (r)), b is the uncorrupted observation vector, x(0) = 0, and

µ > 0 is a parameter. As the performance difference among all three noisy 1-bit observation models (random bit

flips, and Gaussian or logit noise added before quantization; cf. Section II) is mild, we only present the numerical

results corresponding to the case of Gaussian noise.
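A minimal sketch of the iteration (116) is given below; the sensing operator, the toy linear generative model G(z) = Wz, and the least-squares projection are all illustrative stand-ins for the VAE/DCGAN generators and the Adam-based projection used in the actual experiments.

import numpy as np

def one_bit_pgd(A, AT, b, project_G, n, mu=0.2, iters=300):
    # x^{(t+1)} = P_G(x^{(t)} + mu * A^T (b - sign(A x^{(t)}))), cf. (116)
    x = np.zeros(n)
    for _ in range(iters):
        x = project_G(x + mu * AT(b - np.sign(A(x))))
    return x

rng = np.random.default_rng(4)
n, k, m = 512, 10, 200
W = rng.standard_normal((n, k)) / np.sqrt(n)     # toy linear "generative model" G(z) = W z

def project_G(x):
    z, *_ = np.linalg.lstsq(W, x, rcond=None)    # least-squares projection onto Range(W)
    return W @ z

A_mat = rng.standard_normal((m, n))              # dense Gaussian matrix, for simplicity
A = lambda x: A_mat @ x
AT = lambda y: A_mat.T @ y

x_star = project_G(rng.standard_normal(n))
x_star /= np.linalg.norm(x_star)
b = np.sign(A(x_star) + 0.1 * rng.standard_normal(m))

x_hat = one_bit_pgd(A, AT, b, project_G, n)
print("cosine similarity:", x_hat @ x_star / np.linalg.norm(x_hat))   # ideally close to 1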

We compare with the 1-bit Lasso [60], and BIHT [61], which are sparsity-based algorithms. The sensing matrix

A is either a randomly signed partial Gaussian circulant matrix or a standard Gaussian matrix. On the MNIST

dataset, a generative model based on variational auto-encoder (VAE) is used and thus our method is denoted by

1b-VAE-C or 1b-VAE-G, where “C” refers to the partial Gaussian circulant matrix and “G” refers to the standard

Gaussian matrix. The latent dimension of the VAE is k = 20. On the celebA dataset, a generative model based

on Deep Convolutional Generative Adversarial Networks (DCGAN) is used and thus our method is denoted by

1b-DCGAN-C or 1b-DCGAN-G, respectively. The latent dimension of the DCGAN is k = 100.

We follow [44] to perform projected gradient descent (PGD) to maximize a constrained correlation-based objective

function. The number of total iterations is 300 with µ = 0.2. We use Adam optimizer to perform the projection

with adaptive number of steps and learning rates based on the changes of x(t). When x(t) changes dramatically in

the beginning phase of the optimization, we use a large number of Adam optimization steps with a large learning

rate. When x(t) has minor change in the later phase of optimization, we use a small number of Adam optimization

steps with a small learning rate. Specifically, in the first 20 iterations, the number of Adam optimization steps is 30

with learning rate 0.3. From the 20-th iteration to 100-th iteration, the number of Adam optimization steps is 10

with learning rate 0.2. From the 100-th iteration to 300-th iteration, the number of Adam optimization steps is 5

with learning rate 0.1. The generative models we use in our experiments have the same architectures as those of

[3]. The experiments are done on test images that were not used when training the generative models. To reduce

the impact of local minima, we choose the best estimation among 5 random restarts. On the MNIST dataset, the

standard deviation of the Gaussian noise σ is chosen as 0.1. On the celebA dataset, the standard deviation σ is

chosen as 0.01.
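The projection step with the staged Adam schedule described above can be sketched as follows in PyTorch (the generator G, latent dimension k, and radius r are placeholders for the trained VAE/DCGAN; this is a minimal illustration rather than the exact experimental code):

import torch

def project_G(G, x, outer_iter, k, r, device="cpu"):
    # Approximate P_G(x): minimize ||G(z) - x||_2 over z in B_2^k(r) with Adam,
    # using more steps and a larger learning rate in the early outer iterations.
    if outer_iter < 20:
        steps, lr = 30, 0.3
    elif outer_iter < 100:
        steps, lr = 10, 0.2
    else:
        steps, lr = 5, 0.1

    z = torch.zeros(k, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    x = x.detach()
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.norm(G(z) - x)
        loss.backward()
        opt.step()
        with torch.no_grad():                 # keep the latent variable inside B_2^k(r)
            norm = torch.norm(z)
            if norm > r:
                z.mul_(r / norm)
    return G(z).detach()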

We observe from Figures 1, 2, and 3 that on both datasets, the sparsity-based methods 1-bit Lasso and BIHT

attain poor reconstructions, while the generative model based method (116) attains mostly accurate reconstructions

when the number of measurements is small. From Figures 1 and 3-a, we observe that for the MNIST dataset, the

reconstruction performances of using a partial Gaussian circulant matrix and a standard Gaussian matrix are almost


[Figure 1: Reconstruction results on MNIST with 200 measurements. Rows (top to bottom): Original, 1b-Lasso-C, BIHT-C, 1b-VAE-G, 1b-VAE-C.]

[Figure 2: Reconstruction results on celebA with 1000 measurements. Rows (top to bottom): Original, 1b-Lasso-C, BIHT-C, 1b-GAN-G, 1b-GAN-C.]

the same. For the celebA dataset, according to Figures 2 and 3-b, the reconstruction performances are again very

similar.

We also compare the running time of each iteration of (116) for different sensing matrices (standard Gaussian or

partial Gaussian circulant). When A is a standard Gaussian matrix, the time complexity of calculating Ax(t) and

ATb is O(mn). When A is a partial Gaussian circulant matrix, we can use FFT to accelerate the matrix-vector


[Figure 3: Reconstruction error (per pixel) versus the number of measurements; (a) results on MNIST (1b-Lasso-C, BIHT-C, 1b-VAE-G, 1b-VAE-C), (b) results on celebA (1b-Lasso-C, BIHT-C, 1b-DCGAN-G, 1b-DCGAN-C). The vertical bars indicate 95% confidence intervals. The reconstruction error is calculated over 1000 images by averaging the per-pixel error in terms of ℓ2 norm.]

                 MNIST (n = 784)           celebA (n = 12288)
                 m = 200     m = 700       m = 200     m = 5000
    Gaussian       51.4       113.6          704.8       5632.6
    Circulant      33.6        57.8          378.3       1367.5

Table I: Average time comparison (ms) of different sensing matrices per iteration in (116). We compare the running time on a single 2.9GHz CPU core. A GPU can also be applied to further speed up the computation.

multiplications, and the time complexity is reduced to O(n log n). Since the dimensionality of the image vector is

relatively small, and the projection operator PG(·) is also time consuming, in our experiments, the speedup of using

a partial Gaussian circulant matrix is not as significant as that in [24, Table 2]. However, from Table I, we can

still observe that using a partial Gaussian circulant matrix is typically about 2 times faster than using a standard

Gaussian matrix, and for the celebA dataset with ambient dimension n = 12288, when m = 5000, using partial

Gaussian circulant matrices leads to around 4 times speedup compared with using standard Gaussian matrices. We

expect a greater speedup for settings with higher dimensionality.
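The timings in Table I can be reproduced in spirit with a micro-benchmark along the following lines (sizes match the celebA setting; the absolute numbers are illustrative and depend on the BLAS and FFT libraries in use):

import time
import numpy as np

rng = np.random.default_rng(5)
n, m = 12288, 5000
x = rng.standard_normal(n)

# Dense standard Gaussian sensing matrix: O(mn) per matrix-vector product.
A_dense = rng.standard_normal((m, n))
t0 = time.perf_counter()
for _ in range(10):
    _ = A_dense @ x
t_dense = (time.perf_counter() - t0) / 10

# Partial Gaussian circulant matrix applied via the FFT: O(n log n) per product.
g_hat = np.conj(np.fft.fft(rng.standard_normal(n)))
xi = rng.choice([-1.0, 1.0], n)
t0 = time.perf_counter()
for _ in range(10):
    _ = np.fft.ifft(g_hat * np.fft.fft(xi * x)).real[:m]
t_fft = (time.perf_counter() - t0) / 10

print(f"dense: {1e3 * t_dense:.1f} ms   circulant (FFT): {1e3 * t_fft:.1f} ms")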

VI. CONCLUSION AND FUTURE WORK

We have established a sample complexity upper bound for noisy 1-bit compressed sensing for the case of using a

randomly signed partial Gaussian circulant sensing matrix and a generative prior. The sample complexity upper

bound matches that obtained previously for i.i.d. Gaussian sensing matrices, and our scheme is robust to adversarial

noise.

We focused only on non-uniform recovery guarantees, and uniform recovery is an immediate direction for further

research. Relaxing the ℓ∞ assumption and allowing for general scalings of k (not only k ≪ n^{1/4}) would also be


of significant interest. Finally, handling Fourier measurements without column sign flips would be of significant

interest for applications where the measurement matrix is constrained as such.

APPENDIX A

PROOF OF LEMMA 7

For any ε > 0, from Lemma 6 and the union bound, with probability at least 1−m2e− ε2

8ρ2 , it holds that

|〈si, sj〉| ≤ ε (117)

for all 0 ≤ i 6= j < m. Setting m2e− ε2

8ρ2 = ν, we obtain

ε = 2√

√log

m2

ν. (118)

Let B = [s0, s1, . . . , sm−1] ∈ Rn×m. Then, all the diagonal entries of BTB are 1, and the magnitude of all the

off-diagonal entries are upper bounded by ε. By the Gershgorin circle theorem [62], we have

σm(BTB) ≥ 1− (m− 1)ε > 1−mε, (119)

where σm(·) denotes the m-th largest singular value of a matrix. For any i ∈ 1, . . . ,m − 2, and any unit

vector t in the span of s0, . . . , si−1, we write t as t =∑j<i αjsj . Let αi = [α0, . . . , αi−1]T ∈ Ri and Bi =

[s0, s1, . . . , si−1] ∈ Rn×i. Similarly to (119), we have

σi(BTi Bi) ≥ 1− (i− 1)ε > 1−mε. (120)

Therefore, we obtain

1 = ‖Biαi‖22 ≥ σ2i (Bi)‖αi‖22 = σi(B

Ti Bi)‖αi‖22 ≥ (1−mε)‖αi‖22, (121)

and hence

‖αi‖22 ≤1

1−mε. (122)

In addition,

〈si, t〉2 =

∑j<i

αj〈si, sj〉

2

≤ ‖αi‖22

∑j<i

〈si, sj〉2 (123)

≤ 1

1−mε· iε2 <

mε2

1−mε. (124)

By definition, the squared length of the projection of si onto the span of s0, . . . , si−1 is equal to max〈t, si〉2 :

t ∈ spans0, . . . , si−1, ‖t‖2 = 1. Hence, the projection of si onto the span of s0, . . . , si−1 has squared length at

most mε2

1−mε . Then, if

2√

2mρ logm2

ν<

1

2, (125)

recalling that ε = 2√

2ρ√

log m2

ν , we also have mε < 12 , and

mε2

1−mε< 2mε2 = 16mρ2 log

m2

ν(126)

= (2√

2ρ)× (4√

2mρ logm2

ν) ≤ 2

√2ρ < 4ρ. (127)


We can now establish Lemma 7 by performing the following procedure on the vectors (essentially Gram-Schmidt

orthogonalization):

1) Initialize: u0 = s0, r0 = 0.

2) For i = 1, 2, . . . , m − 1, we set ri to be the projection of si onto the span of s0, . . . , si−1, and ui = si − ri.

From (118) and (127), we obtain the desired result.
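The orthogonalization procedure above can be carried out numerically as in the following sketch (illustrative sizes; for these sizes the condition of Lemma 7 on ρ is not literally enforced, but the residuals still stay well below β = 2√ρ), which builds the shifted vectors si = si←(Dξ x), computes each ri as the projection of si onto the span of the previous vectors, and reports max_i ‖ri‖2.

import numpy as np

rng = np.random.default_rng(6)
n, m = 4096, 64
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                        # a "typical" unit vector with small ell_inf norm
xi = rng.choice([-1.0, 1.0], n)

s = np.stack([np.roll(xi * x, -i) for i in range(m)])   # s_i = left-shift of D_xi x by i

Q = (s[0] / np.linalg.norm(s[0]))[:, None]    # orthonormal basis of span{s_0} (u_0 = s_0, r_0 = 0)
res_norms = [0.0]
for i in range(1, m):
    r = Q @ (Q.T @ s[i])                      # projection of s_i onto span{s_0, ..., s_{i-1}}
    u = s[i] - r
    res_norms.append(np.linalg.norm(r))
    Q = np.hstack([Q, (u / np.linalg.norm(u))[:, None]])

rho = np.max(np.abs(x))
print("max_i ||r_i||_2 =", max(res_norms), "  beta = 2*sqrt(rho) =", 2 * np.sqrt(rho))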

APPENDIX B

TABLE OF NOTATION FOR THE PROOF OF LEMMA 1

For ease of reference, the notation used in proving our most technical result, Lemma 1, is shown in Table II.

REFERENCES

[1] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing. Springer New York, 2013.

[2] M. J. Wainwright, High-dimensional statistics: A non-asymptotic viewpoint. Cambridge University Press, 2019, vol. 48.

[3] A. Bora, A. Jalal, E. Price, and A. G. Dimakis, “Compressed sensing using generative models,” in Int. Conf. Mach. Learn. (ICML), 2017,

pp. 537–546.

[4] M. Wainwright, “Sharp thresholds for high-dimensional and noisy sparsity recovery using `1-constrained quadratic programming (Lasso),”

IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2183–2202, May 2009.

[5] D. L. Donoho, A. Javanmard, and A. Montanari, “Information-theoretically optimal compressed sensing via spatial coupling and approximate

message passing,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7434–7464, Nov. 2013.

[6] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp, “Living on the edge: Phase transitions in convex programs with random data,” Inf.

Inference, vol. 3, no. 3, pp. 224–294, 2014.

[7] J. Wen, Z. Zhou, J. Wang, X. Tang, and Q. Mo, “A sharp condition for exact support recovery with orthogonal matching pursuit,” IEEE

Trans. Sig. Proc., vol. 65, no. 6, pp. 1370–1382, 2016.

[8] M. Wainwright, “Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting,” IEEE Trans. Inf. Theory,

vol. 55, no. 12, pp. 5728–5741, Dec. 2009.

[9] E. Arias-Castro, E. J. Candes, and M. A. Davenport, “On the fundamental limits of adaptive sensing,” IEEE Trans. Inf. Theory, vol. 59,

no. 1, pp. 472–481, Jan. 2013.

[10] E. J. Candes and M. A. Davenport, “How well can we estimate a sparse vector?” Appl. Comp. Harm. Analysis, vol. 34, no. 2, pp. 317–323,

2013.

[11] J. Scarlett and V. Cevher, “Limits on support recovery with probabilistic models: An information-theoretic framework,” IEEE Trans. Inf.

Theory, vol. 63, no. 1, pp. 593–620, 2017.

[12] ——, “An introductory guide to fano’s inequality with applications in statistical estimation,” https://arxiv.org/1901.00555, 2019.

[13] Z. Liu and J. Scarlett, “Information-theoretic lower bounds for compressive sensing with generative models,” IEEE J. Sel. Areas Inf. Theory,

vol. 1, no. 1, pp. 292–303, 2020.

[14] A. Kamath, S. Karmalkar, and E. Price, “On the power of compressed sensing with generative models,” in Int. Conf. Mach. Learn. (ICML),

2020, pp. 5101–5109.

[15] Z. Liu, S. Ghosh, and J. Scarlett, “Towards sample-optimal compressive phase retrieval with sparse and generative priors,”

https://arxiv.org/2106.15358, 2021.

[16] P. T. Boufounos, “Reconstruction of sparse signals from distorted randomized measurements,” in Int. Conf. Acoust. Sp. Sig. Proc. (ICASSP),

2010, pp. 3998–4001.

[17] P. T. Boufounos and R. G. Baraniuk, “1-bit compressive sensing,” in Conf. Inf. Sci. Syst. (CISS), 2008, pp. 16–21.

[18] A. Gupta, R. Nowak, and B. Recht, “Sample complexity for 1-bit compressed sensing and sparse classification,” in Int. Symp. Inf. Theory

(ISIT). IEEE, 2010, pp. 1553–1557.

[19] R. Zhu and Q. Gu, “Towards a lower sample complexity for robust one-bit compressed sensing,” in Int. Conf. Mach. Learn. (ICML), 2015,

pp. 739–747.



[20] L. Zhang, J. Yi, and R. Jin, “Efficient algorithms for robust one-bit compressive sensing,” in Int. Conf. Mach. Learn. (ICML), 2014, pp.

820–828.

[21] S. Gopi, P. Netrapalli, P. Jain, and A. Nori, “One-bit compressed sensing: Provable support and vector recovery,” in Int. Conf. Mach. Learn.

(ICML), 2013, pp. 154–162.

[22] A. Ai, A. Lapanowski, Y. Plan, and R. Vershynin, “One-bit compressed sensing with non-Gaussian measurements,” Linear Algebra Appl.,

vol. 441, pp. 222–239, 2014.

[23] J. Romberg, “Compressive sensing by random convolution,” SIAM J. Imaging Sci., vol. 2, no. 4, pp. 1098–1128, 2009.

[24] F. X. Yu, A. Bhaskara, S. Kumar, Y. Gong, and S.-F. Chang, “On binary embedding using circulant matrices,” J. Mach. Learn. Res., vol. 18,

no. 1, pp. 5507–5536, 2017.

[25] R. M. Gray, “Toeplitz and circulant matrices: A review,” Comm. Inf. Theory, vol. 2, no. 3, pp. 155–239, 2005.

[26] S. Dirksen, H. C. Jung, and H. Rauhut, “One-bit compressed sensing with partial Gaussian circulant matrices,” Inf. Inference, 2017.


[27] S. Dirksen and S. Mendelson, “Robust one-bit compressed sensing with partial circulant matrices,” https://arxiv.org/abs/1812.06719, 2018.

[28] S. Foucart, “Flavors of compressive sensing,” in Int. Conf. Approx. Theory (ICAT). Springer, 2016, pp. 61–104.

[29] K. Knudson, R. Saab, and R. Ward, “One-bit compressive sensing with norm estimation,” IEEE Trans. Inf. Theory, vol. 62, no. 5, pp.

2748–2758, 2016.

[30] C. Xu and L. Jacques, “Quantized compressive sensing with RIP matrices: The benefit of dithering,” Inf. Inference, vol. 9, no. 3, pp.

543–586, 2020.

[31] L. Jacques and V. Cambareri, “Time for dithering: Fast and quantized random embeddings via the restricted isometry property,” Inf.

Inference, vol. 6, no. 4, pp. 441–476, 2017.

[32] S. Dirksen and S. Mendelson, “Non-Gaussian hyperplane tessellations and robust one-bit compressed sensing,”

https://arxiv.org/abs/1805.09409, 2018.

[33] X. Yi, C. Caramanis, and E. Price, “Binary embedding: Fundamental limits and fast algorithm,” in Int. Conf. Mach. Learn. (ICML), 2015,

pp. 2162–2170.

[34] S. Oymak, C. Thrampoulidis, and B. Hassibi, “Near-optimal sample complexity bounds for circulant binary embedding,” in Int. Conf.

Acoust. Sp. Sig. Proc. (ICASSP). IEEE, 2017, pp. 6359–6363.

[35] S. Dirksen and A. Stollenwerk, “Fast binary embeddings with Gaussian circulant matrices: Improved bounds,” Discrete Comput. Geom.,

vol. 60, no. 3, pp. 599–626, 2018.

[36] P. Hand, O. Leong, and V. Voroninski, “Phase retrieval under a generative prior,” in Conf. Neur. Inf. Proc. Sys. (NeurIPS), 2018.

[37] D. Van Veen, A. Jalal, M. Soltanolkotabi, E. Price, S. Vishwanath, and A. G. Dimakis, “Compressed sensing with deep image prior and

learned regularization,” https://arxiv.org/1806.06438, 2018.

[38] R. Heckel et al., “Deep decoder: Concise image representations from untrained non-convolutional networks,” in Int. Conf. Learn. Repr.

(ICLR), 2019.

[39] P. E. Hand and B. Joshi, “Global guarantees for blind demodulation with generative priors,” Conf. Neur. Inf. Proc. Sys. (NeurIPS), vol. 32,

2019.

[40] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, “Deep learning techniques for inverse problems in

imaging,” IEEE J. Sel. Areas Inf. Theory, vol. 1, no. 1, pp. 39–56, 2020.

[41] A. Jalal, L. Liu, A. G. Dimakis, and C. Caramanis, “Robust compressed sensing using generative models,” Conf. Neur. Inf. Proc. Sys.

(NeurIPS), vol. 33, 2020.

[42] J. Whang, Q. Lei, and A. G. Dimakis, “Compressed sensing with invertible generative models and dependent noise,”

https://arxiv.org/2003.08089, 2020.

[43] J. Cocola, P. Hand, and V. Voroninski, “Nonasymptotic guarantees for spiked matrix recovery with generative priors,” in Conf. Neur. Inf.

Proc. Sys. (NeurIPS), vol. 33, 2020.

[44] Z. Liu, S. Gomes, A. Tiwari, and J. Scarlett, “Sample complexity bounds for 1-bit compressive sensing and binary stable embeddings with

generative priors,” in Int. Conf. Mach. Learn. (ICML), 2020, pp. 6216–6225.

[45] S. Qiu, X. Wei, and Z. Yang, “Robust one-bit recovery via ReLU generative networks: Improved statistical rates and global landscape

analysis,” in Int. Conf. Mach. Learn. (ICML), 2020, pp. 7857–7866.

[46] Z. Liu and J. Scarlett, “The generalized Lasso with nonlinear observations and generative priors,” in Conf. Neur. Inf. Proc. Sys. (NeurIPS),

vol. 33, 2020.

[47] Y. Plan and R. Vershynin, “Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach,” IEEE Trans.

Inf. Theory, vol. 59, no. 1, pp. 482–494, 2012.

[48] L. Jacques and C. D. Vleeschouwer, “Quantized iterative hard thresholding: Bridging 1-bit and high-resolution quantized compressed

sensing,” in Int. Conf. Samp. Theory Apps. (SampTA), 2013.

[49] Y. Plan, R. Vershynin, and E. Yudovina, “High-dimensional estimation with geometric constraints,” Inf. Inference, vol. 6, no. 1, pp. 1–40,

2017.

[50] J. Vybíral, “A variant of the Johnson–Lindenstrauss lemma for circulant matrices,” J. Funct. Anal., vol. 260, no. 4, pp. 1096–1105, 2011.

[51] R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” https://arxiv.org/abs/1011.3027, 2010.

[52] H. Rauhut, “Compressive sensing and structured random matrices,” Theoretical foundations and numerical methods for sparse recovery,

vol. 9, pp. 1–92, 2010.

[53] J. Bergh and J. Löfström, Interpolation spaces: An introduction. Springer Science & Business Media, 2012, vol. 223.


[54] D. L. Hanson and F. T. Wright, “A bound on tail probabilities for quadratic forms in independent random variables,” Ann. Math. Stat.,

vol. 42, no. 3, pp. 1079–1083, 1971.

[55] F. Krahmer, S. Mendelson, and H. Rauhut, “Suprema of chaos processes and the restricted isometry property,” Comm. Pure Appl. Math,

vol. 67, no. 11, pp. 1877–1904, 2014.

[56] F. Krahmer and R. Ward, “New and improved Johnson–Lindenstrauss embeddings via the restricted isometry property,” SIAM J. Math.

Anal., vol. 43, no. 3, pp. 1269–1281, 2011.

[57] G. Daras, J. Dean, A. Jalal, and A. G. Dimakis, “Intermediate layer optimization for inverse problems using deep generative models,”

https://arxiv.org/2102.07364, 2021.

[58] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11,

pp. 2278–2324, 1998.

[59] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in Int. Conf. Comput. Vis. (ICCV), 2015, pp. 3730–3738.

[60] Y. Plan and R. Vershynin, “One-bit compressed sensing by linear programming,” Comm. Pure Appl. Math., vol. 66, no. 8, pp. 1275–1297,

2013.

[61] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk, “Robust 1-bit compressive sensing via binary stable embeddings of sparse

vectors,” IEEE Trans. Inf. Theory, vol. 59, no. 4, pp. 2082–2102, 2013.

[62] C. F. Van Loan and G. H. Golub, Matrix computations. Johns Hopkins University Press Baltimore, 1983.
