Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative Priors

Zhaoqiang Liu, Subhroshekhar Ghosh, Jun Han, Jonathan Scarlett
Abstract
In 1-bit compressive sensing, each measurement is quantized to a single bit, namely the sign of a linear function
of an unknown vector, and the goal is to accurately recover the vector. While it is most popular to assume a standard
Gaussian sensing matrix for 1-bit compressive sensing, using structured sensing matrices such as partial Gaussian
circulant matrices is of significant practical importance due to their faster matrix operations. In this paper, we provide
recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing with randomly
signed partial Gaussian circulant matrices and generative models. Under suitable assumptions, we match guarantees
that were previously only known to hold for i.i.d. Gaussian matrices that require significantly more computation. We
make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our
theoretical results.
I. INTRODUCTION
The goal of compressive sensing (CS) [1], [2] is to recover a signal vector from a small number of linear
measurements, by making use of prior knowledge on the structure of the vector in the relevant domain. The structure
is typically represented by sparsity in a suitably-chosen basis, or generative priors motivated by recent success of
deep generative models [3]. The CS problem has been popular over the past 1–2 decades, and there is a vast amount
of literature with theoretical guarantees, including sharp performance bounds for both practical algorithms [4]–[7]
and potentially intractable information-theoretically optimal algorithms [8]–[15].
For the sake of low-cost implementation in hardware and robustness to certain nonlinear distortions [16], instead of assuming infinite-precision real-valued measurements as in CS, it has also been popular to consider 1-bit CS [17]–[21], in which the goal is to accurately recover the unknown vector from binary-valued measurements.
Z. Liu is with the Department of Computer Science, National University of Singapore (email: [email protected], [email protected]).
S. Ghosh is with the Department of Mathematics, National University of Singapore (email: [email protected]).
J. Han is with the Platform and Content Group, Tencent (email: [email protected]).
J. Scarlett is with the Department of Computer Science, the Department of Mathematics and the Institute of Data Science, National University
of Singapore (email: [email protected]).
This work was supported by the Singapore National Research Foundation (NRF) under grant R-252-000-A74-281, and the Singapore Ministry
of Education (MoE) under grants R-146-000-250-133 and R-146-000-312-114.
August 10, 2021 DRAFT
arXiv:2108.03570v1 [cs.LG] 8 Aug 2021
Most of the relevant work for 1-bit CS has focused on standard Gaussian sensing matrices, and unlike linear
CS, 1-bit CS may easily fail when using a sensing matrix that is not standard Gaussian [22]. However, since such
matrices are unstructured, the associated matrix operations (e.g., multiplication) used in the decoding procedure can
be prohibitively expensive when the problem size is large. This motivates the use of structured matrices permitting
fast matrix operations, and a notable example is a subsampled Gaussian circulant sensing matrix [23], which can be
implemented via the fast Fourier transform (FFT) [24], [25]. Motivated by recent progress on 1-bit CS with partial
Gaussian circulant matrices [26], [27] and successful applications of deep generative models, we provide recovery
guarantees for a correlation-based optimization algorithm for robust 1-bit CS with partial Gaussian circulant matrices
and generative priors.
A. Related Work
It is established in [28] that an ℓ1/ℓ2-restricted isometry property (RIP) of the sensing matrix guarantees accurate
uniform recovery for sparse vectors via a hard thresholding algorithm and a linear programming algorithm. Based
on this result, the authors of [26] provide recovery guarantees for noiseless 1-bit CS with partial Gaussian circulant
matrices by proving the required RIP. In particular, they show that O(ε^{−4} s log(n/(εs))) noiseless measurements suffice for the reconstruction of the direction of any s-sparse vector in n-dimensional space up to accuracy ε, provided that the sparsity level s satisfies s = O(√(εn/log n)). The authors also consider a dithering technique [29]–[32] to enable the estimation
of the norm of the signal vector.
The authors of [27] establish an optimal (up to logarithmic factors) sample complexity for robust 1-bit sparse
recovery using a correlation-based algorithm with ℓ2-norm regularization. The setup and results are related to ours,
but bear several important differences, including (i) the use of regularization; (ii) the focus on sub-Gaussian noise
added before quantization; (iii) the use of dithering-type thresholds in the measurements; (iv) the consideration of
sparse signals instead of generative priors; and (v) the use of a random (rather than fixed) index set in the partial
circulant matrix.
Some theoretical guarantees for circulant binary embeddings, which are closely related to 1-bit CS with partial
Gaussian circulant matrices, have been presented in [24], [33]–[35]. On the other hand, based on successful
applications of deep generative models, instead of assuming the signal vector is sparse, it has been recently popular
to assume that the signal vector lies in the range of a generative model in the problem of CS [3], [36]–[43]. In
particular, 1-bit CS with generative priors has also been studied in [44]–[46], but none of these papers considers
structured sensing matrices that permit fast matrix operations.
B. Contribution
We consider the problem of recovering a signal vector in the range of a generative model via noisy 1-bit
measurements using a correlation-based optimization objective and a randomly signed partial Gaussian circulant
sensing matrix. We characterize the number of measurements sufficient to attain an accurate recovery of an underlying
ℓ∞-norm bounded signal¹ in the range of a Lipschitz continuous generative model. In particular, for an L-Lipschitz continuous generative model with bounded k-dimensional input and range contained in the unit sphere, we show that roughly O((k log L)/ε²) samples suffice for ε-accurate recovery. This matches the scaling obtained previously for i.i.d. Gaussian matrices [44]. We also consider the impact of adversarial corruptions on the measurements, and show that our scheme is robust to adversarial noise. We make use of a practical iterative algorithm proposed in [44], and perform some simple numerical experiments on real datasets to support our theoretical results.
C. Notation
We use upper- and lower-case boldface letters to denote matrices and vectors respectively. We write [N] = {0, 1, 2, ..., N − 1} for a positive integer N. For any vector t = [t_0, t_1, ..., t_{n−1}]^T ∈ C^n, we let C_t ∈ C^{n×n} be the circulant matrix corresponding to t; that is, C_t(i, j) = t_{j−i} for all i, j ∈ [n], where here and subsequently, j − i is understood modulo n. Let F ∈ C^{n×n} be the (normalized) Fourier matrix, i.e., F(i, j) = (1/√n) e^{−2π√−1·ij/n} for all i, j ∈ [n]. We define the ℓ2-ball B_2^n(r) := {x ∈ R^n : ‖x‖_2 ≤ r}, and we use S^{n−1} := {x ∈ R^n : ‖x‖_2 = 1} to denote the unit sphere in R^n. For a vector x = [x_0, x_1, ..., x_{n−1}]^T ∈ R^n and an index i, we denote by s_{→i}(x) the vector shifted to the right by i positions, i.e., the j-th entry of s_{→i}(x) is the ((j − i) mod n)-th entry of x. Similarly, we denote by s_{i←}(x) the vector shifted to the left by i positions. A Rademacher variable takes the values +1 or −1, each with probability 1/2, and a sequence of independent Rademacher variables is called a Rademacher sequence. The operator norm of a matrix from ℓp into ℓp is defined as ‖A‖_{p→p} := max_{‖x‖_p=1} ‖Ax‖_p. We use g to represent a standard normal random variable, i.e., g ∼ N(0, 1). The symbols C, C′, C″, c denote absolute constants whose values may differ from line to line. Some additional notation used specifically for the proof of Lemma 1 is presented in a table in Appendix B.
II. PROBLEM SETUP
Except where stated otherwise, we consider the following setup and assumptions:
• Let x∗ = [x∗_0, x∗_1, ..., x∗_{n−1}]^T ∈ R^n be the underlying signal vector, assumed to have unit norm (since recovery of the norm is impossible under our measurement model). We define x̄∗ = Dξ x∗ (equivalently, x∗ = Dξ x̄∗, since Dξ² = I_n), where Dξ is the diagonal matrix corresponding to a Rademacher sequence ξ = [ξ_0, ξ_1, ..., ξ_{n−1}]^T ∈ R^n.
• Fixing any (deterministic) index set Ω ⊆ [n] with |Ω| = m, we set the sensing matrix A ∈ R^{m×n} as
A = R_Ω C_g Dξ, (1)
where R_Ω ∈ R^{m×n} is the matrix that restricts an n-dimensional vector to the coordinates indexed by Ω, and g = [g_0, g_1, ..., g_{n−1}]^T ∼ N(0, I_n). That is, A is a partial Gaussian circulant matrix with random column sign flips. For concreteness, we fix the index set Ω as Ω = [m], but the same analysis holds for any other fixed set. We also write Ā = R_Ω C_g, i.e., the partial Gaussian circulant matrix without random column sign flips.

¹While the assumption of an ℓ∞-norm bound may appear restrictive, we can remove it by multiplying by a randomly signed Hadamard matrix as a pre-processing step. See Remarks 1 and 2 for detailed explanations.
• We assume that the measurements, or response variables, b_i ∈ {−1, 1}, i ∈ [m], are generated according to
b_i = θ(⟨a_i, x∗⟩), (2)
where a_i^T is the i-th row of the sensing matrix A ∈ R^{m×n}, and θ is a (possibly random) function with θ(z) ∈ {−1, +1}. We focus on the following widely-adopted noisy 1-bit observation models:
– Random bit flips: θ(x) = τ·sign(x), where τ is a ±1-valued random variable with P(τ = −1) = p < 1/2, independent of A.
– Random noise before quantization: θ(x) = sign(x + e), where e represents a random noise term added before quantization, independent of A. For example, when e ∼ N(0, σ²), this corresponds to a special case of the probit model; when e is a logit noise term, this recovers the logistic regression model.
• We assume that
E[θ(g)g] = λ (3)
for some λ > 0, where g is a standard normal random variable. Note that the expectation is taken with respect to both θ and g. Then, it is easy to calculate that E[b_i a_i] = λ x∗ [20]. For the above choices of θ, we have [47, Sec. 3]:
– Random bit flips: λ = (1 − 2p)√(2/π).
– Random noise before quantization: For any x ∈ R, we set
φ(x) = E[θ(x)] = 1 − 2P(e < −x). (4)
Then, we have φ′(x) = 2f_e(−x), where f_e is the probability density function (pdf) of e. Thus, we have
λ = E[θ(g)g] = E[E[θ(g)g|g]] = E[φ(g)g] = E[φ′(g)] = 2E[f_e(−g)] > 0. (5)
In particular, when e ∼ N(0, σ²), we have f_e(x) = (1/(√(2π)σ)) e^{−x²/(2σ²)} and λ = √(2/(π(1 + σ²))). When e is a logit noise term, we have φ(x) = tanh(x/2), and λ = E[φ′(g)] = (1/2) E[sech²(g/2)].
• For the case of random noise added before quantization, we additionally make the following assumptions:
– Let F_e be the cumulative distribution function (cdf) of e. We assume that F_e(x) + F_e(−x) = 1 for any x ∈ R.²
– For any a > 0, let θ_a(x) = sign(x + e/a). We assume that λ_a := E[θ_a(g)g] < C_1 a + C_2, where g ∼ N(0, 1) and C_1, C_2 are absolute constants.
The assumption F_e(x) + F_e(−x) = 1 ensures that sign(g + e/a) is a Rademacher random variable. We show in Lemma 9 that the above assumptions are satisfied when e corresponds to the Gaussian or logit noise term.
• In addition to any random noise in θ, we allow for adversarial noise. In this case, instead of observing b = [b_0, ..., b_{m−1}]^T ∈ R^m directly, we only assume access to b̃ = [b̃_0, ..., b̃_{m−1}]^T ∈ R^m satisfying
(1/√m)‖b̃ − b‖_2 ≤ ς (6)

²Or equivalently, f_e(x) = f_e(−x) for any x ∈ R.
for some parameter ς ≥ 0. Note that the corruptions of b yielding b̃ may depend on A.
• The signal vector x∗ lies in a set of structured signals K. We focus on the case that K = Range(G) for some L-Lipschitz continuous generative model G : B_2^k(r) → R^n (e.g., see [3]). We will also assume that Range(G) ⊆ S^{n−1}, but this can easily be generalized to the case that Range(G) ⊆ R^n, with the recovery guarantee then only given up to scaling [44], [46].
• To derive an estimate of x∗, we seek x̂ maximizing a constrained correlation-based objective function:
x̂ := arg max_{x ∈ K} b̃^T (Ax). (7)
Such an optimization problem has been considered in [47], with the main purpose of deriving a convex optimization problem for the case that K is a convex set. We focus on the case that K = Range(G), which in general leads to a non-convex optimization problem. This can be approximately solved using gradient-based algorithms [44] or by projecting A^T b̃ onto K [48], [49]. In this paper, we provide theoretical guarantees for recovery in the ideal case that the global maximizer is found. We also provide experimental results in Section V.
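To make the setup above concrete, the following sketch simulates the measurement model (1)–(2) in the noiseless case (θ = sign), using the FFT to apply the circulant matrix in O(n log n) time. The dense construction of A is only used as a cross-check, and all variable names and problem sizes are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1024, 256

# Unit-norm signal (a stand-in for an element of Range(G)).
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)

g = rng.standard_normal(n)            # Gaussian generator of the circulant C_g
xi = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs defining D_xi

def circulant_apply(t, v):
    # Computes C_t v with C_t(i, j) = t_{(j-i) mod n}, via the correlation
    # theorem: O(n log n) with the FFT instead of O(n^2).
    return np.fft.ifft(np.fft.fft(v) * np.conj(np.fft.fft(t))).real

def measure(x):
    # b = sign(R_Omega C_g D_xi x) with Omega = [m] (first m coordinates).
    return np.sign(circulant_apply(g, xi * x)[:m])

b = measure(x_star)

# Cross-check against an explicit dense A = R_Omega C_g D_xi.
C = np.array([[g[(j - i) % n] for j in range(n)] for i in range(n)])
A = C[:m] * xi
assert np.allclose(np.sign(A @ x_star), b)
print(b[:8])
```

Only the length-n FFT of g, the signs ξ, and the index set Ω need to be stored, which is the practical appeal of this design over an i.i.d. Gaussian matrix.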
III. MAIN RESULTS
Recall that the i-th observation is b_i = θ(⟨a_i, x∗⟩), that x̄∗ = Dξ x∗, and that Ā = R_Ω C_g is a partial Gaussian circulant matrix without random column sign flips. We have the following important lemma.

Lemma 1. For any t > 0 satisfying m = Ω(t + log n), suppose that the unit vector x∗ satisfies ‖x∗‖∞ ≤ ρ with ρ = O(1/(m(t + log m))). Suppose that b = θ(Ax∗) with θ(x) = τ·sign(x) or θ(x) = sign(x + e) (cf. Sec. II), and A is a partial Gaussian circulant matrix with random column sign flips. Then, with probability at least 1 − e^{−Ω(t)} − e^{−Ω(√m)},
‖(1/m) A^T b − λ x∗‖∞ ≤ O(√((t + log n)/m)). (8)
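Lemma 1 can be probed numerically. The snippet below is a hedged Monte Carlo check of the underlying identity E[b_i a_i] = λ x∗ (Section II) under random bit flips with λ = (1 − 2p)√(2/π); for simplicity it uses i.i.d. Gaussian rows rather than the circulant construction (the identity holds for both), and the problem sizes are our own choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 256, 20_000, 0.1   # heavy oversampling to expose the mean

x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)

A = rng.standard_normal((m, n))               # i.i.d. Gaussian rows
tau = np.where(rng.random(m) < p, -1.0, 1.0)  # bit flip with probability p
b = tau * np.sign(A @ x_star)

lam = (1 - 2 * p) * np.sqrt(2 / np.pi)        # lambda for random bit flips
err = np.max(np.abs(A.T @ b / m - lam * x_star))
print(err)   # shrinks roughly like sqrt(log(n) / m)
```

In words: the back-projected measurements (1/m)A^T b concentrate around λx∗ in the ℓ∞ norm, which is exactly what (8) quantifies.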
Based on Lemma 1, and following the proof techniques used in [46] (see Section IV-C for the details), we have
the following theorem concerning the recovery guarantee for noisy 1-bit CS with partial Gaussian circulant matrices
and generative priors.
Theorem 1. Consider any x∗ ∈ K with K = G(B_2^k(r)) for some L-Lipschitz generative model G : B_2^k(r) → S^{n−1}. Fix δ > 0 satisfying δ = O(√(k log(Lr/δ))/(λ√m) + ς/λ), and suppose that Lr = Ω(δn), n = Ω(m), and the signal x∗ satisfies ‖x∗‖∞ ≤ ρ = O(1/(mk log(Lr/δ))). Suppose that all other conditions are the same as those in Lemma 1. Let x̂ be a solution to the correlation-based optimization problem (7). Then, if
m = Ω(k log(Lr/δ) · (1 + 1(ς > 0) S_ς)), (9)
where S_ς = (log² n)·(log²(k log(Lr/δ))) is a sample complexity term concerning the adversarial noise, with probability at least 1 − e^{−Ω(k log(Lr/δ))} − e^{−Ω(√m)} − 1(ς > 0)·e^{−Ω((log²(k log(Lr/δ)))(log² n))}, it holds that
‖x̂ − x∗‖_2 ≤ O(√(k log(Lr/δ))/(λ√m) + ς/λ). (10)
Note that the extra logarithmic terms log²(k log(Lr/δ)) and log² n in (9) are only required for deriving an upper bound corresponding to the adversarial noise. If there is no adversarial noise, i.e., ς = 0, the sample complexity in (9) becomes m = Ω(k log(Lr/δ)), and it follows from (10) that O((k log(Lr/δ))/(λ²ε²)) samples suffice for an ε-accurate recovery. This matches the scaling derived for i.i.d. Gaussian measurements in [44].
Remark 1. As mentioned in the remark in [24, Page 8], for "typical" unit vectors, we have ‖x∗‖∞ = O(√(log n/n)). Thus, assuming m = Θ(k log(Lr/δ)), we find that the condition ρ = O(1/(mk log(Lr/δ))) holds for typical signals when 1/(k log(Lr/δ))² = Ω(√(log n/n)). Assuming Lr/δ = poly(n) [3], it in turn suffices that k = O(n^θ) for some θ ∈ (0, 1/4). In addition, for an arbitrary unit vector x∗, we can use a simple trick to ensure the ℓ∞ bound with high probability: Instead of using a randomly signed partial Gaussian circulant sensing matrix A, we can use AH as the sensing matrix, where H ∈ R^{n×n} is a randomly signed Hadamard matrix that also allows fast matrix-vector multiplications. Then, the observation vector becomes b = θ(AHx∗). Since Hx∗ satisfies the desired ℓ∞ bound with high probability [50, Lemma 3.1], for any x̂ that solves (7), ‖x̂ − Hx∗‖_2 satisfies the upper bound in (10). Because H is symmetric and orthogonal, we have the same upper bound for ‖Hx̂ − x∗‖_2.
Remark 2. The use of random column sign flips and a randomly signed Hadamard matrix is most suited to
applications where one has the freedom to choose any measurement matrix, but is limited by computation. In this
sense, the design is more practical than the widespread i.i.d. Gaussian choice, though it may remain unsuitable in
applications such as medical imaging where one must use subsampled Fourier measurements without sign flips or
the Hadamard operator.
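The pre-processing trick discussed in Remarks 1 and 2 relies on the fact that a randomly signed Hadamard matrix can be applied in O(n log n) time and flattens the ℓ∞ norm while preserving the ℓ2 norm. A minimal sketch, where the in-place fast Walsh–Hadamard transform (FWHT) implementation and the spike test vector are our own:

```python
import numpy as np

def fwht(v):
    # Fast Walsh-Hadamard transform, O(n log n); n must be a power of 2.
    v = v.copy()
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v / np.sqrt(n)   # normalized so the transform is orthogonal

rng = np.random.default_rng(2)
n = 1024
eps = np.sign(rng.standard_normal(n))   # random signs (D_eps)
x = np.zeros(n); x[0] = 1.0             # worst case: ||x||_inf = 1 spike

# H x with H a randomly signed (normalized) Hadamard matrix.
Hx = fwht(eps * x)
print(np.linalg.norm(Hx), np.abs(Hx).max())  # norm 1; entries of size 1/sqrt(n)
```

Even for the worst-case spike, the transformed vector has entries of magnitude 1/√n, so the ℓ∞ assumption of Lemma 1 holds after this pre-processing.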
IV. PROOFS OF MAIN RESULTS
In this section, we provide proofs for Lemma 1 and Theorem 1. We first present some useful auxiliary results
that are general, and then some that are specific to our setup.
A. General Auxiliary Results
Definition 1. A random variable X is said to be sub-Gaussian if there exists a positive constant C such that (E[|X|^p])^{1/p} ≤ C√p for all p ≥ 1. The sub-Gaussian norm of a sub-Gaussian random variable X is defined as ‖X‖_{ψ2} := sup_{p≥1} p^{−1/2} (E[|X|^p])^{1/p}.
We have the following concentration inequality for sub-Gaussian random variables.
Lemma 2. (Hoeffding-type inequality [51, Proposition 5.10]) Let X_1, ..., X_N be independent zero-mean sub-Gaussian random variables, and let K = max_i ‖X_i‖_{ψ2}. Then, for any α = [α_1, α_2, ..., α_N]^T ∈ R^N and any t ≥ 0, it holds that
P(|Σ_{i=1}^N α_i X_i| ≥ t) ≤ exp(1 − ct²/(K²‖α‖_2²)). (11)
In particular, we have the following lemma concerning Hoeffding's inequality for (real) Rademacher sums.
Lemma 3. ([52, Proposition 6.11]) Let b ∈ R^N and ε = (ε_j)_{j∈[N]} be a Rademacher sequence. Then, for u > 0,
P(|Σ_{j∈[N]} ε_j b_j| ≥ ‖b‖_2 u) ≤ 2 exp(−u²/2). (12)
We also use the following well-established fact based on the Riesz-Thorin interpolation theorem.
Lemma 4. ([53]) For any matrix A,
‖A‖_{2→2} ≤ max{‖A‖_{1→1}, ‖A‖_{∞→∞}}. (13)
Next, we present the Hanson-Wright inequality, which gives a tail bound for quadratic forms in independent
Gaussian variables.
Lemma 5. (Hanson–Wright inequality [54]) If g ∈ R^N is a standard Gaussian vector and B ∈ R^{N×N}, then for any t > 0,
P(|g^T B g − E[g^T B g]| ≥ t) ≤ exp(−c·min{t²/‖B‖_F², t/‖B‖_{2→2}}). (14)
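As a small empirical illustration of Lemma 5, the following sketch estimates the tail probability of the deviation of g^T B g from its mean E[g^T B g] = trace(B) for a random B; the matrix scaling and the deviation level are our own choices.

```python
import numpy as np

rng = np.random.default_rng(6)
N, trials = 200, 5000

B = rng.standard_normal((N, N)) / N       # a fixed matrix with ||B||_F ~ 1
G = rng.standard_normal((trials, N))      # one standard Gaussian vector per row

q = ((G @ B) * G).sum(axis=1)             # g^T B g for each trial
dev = np.abs(q - np.trace(B))             # E[g^T B g] = trace(B)

fro = np.linalg.norm(B, 'fro')
thresh = 3 * fro                          # deviation of a few Frobenius norms
tail = np.mean(dev >= thresh)
print(tail)                               # a small tail, as (14) predicts
```

The observed tail at a few multiples of ‖B‖_F is small, consistent with the sub-Gaussian regime of the bound.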
B. Proof of Lemma 1
Recall that Ā = R_Ω C_g is a partial Gaussian circulant matrix without random column sign flips and x̄∗ = Dξ x∗. We have A = Ā Dξ, and a_i = Dξ ā_i, where ā_i^T is the i-th row of Ā. Then,
b_i = θ(⟨a_i, x∗⟩) = θ(⟨ā_i, x̄∗⟩) (15)
= θ(⟨s_{→i}(g), x̄∗⟩) (16)
= θ(⟨g, s_{i←}(Dξ x∗)⟩). (17)
Let s_i = s_{i←}(Dξ x∗). Since in general the vectors s_0, s_1, ..., s_{m−1} are not jointly orthogonal, the observations b_0, b_1, ..., b_{m−1} are not independent. However, the sequence of observations can be approximated by a sequence of independent random variables using the notion of (m, β)-orthogonality defined in the following.
Definition 2. A sequence of m unit vectors t_0, t_1, ..., t_{m−1} is said to be (m, β)-orthogonal if there exists a decomposition
t_i = u_i + r_i (∀i) (18)
satisfying the following properties:
1) u_i is orthogonal to span{u_j : j ≠ i};
2) max_i ‖r_i‖_2 ≤ β.
We have the following useful lemma regarding Dξ.
Lemma 6. ([24, Lemma 7]) Let p, q ∈ R^n be vectors that satisfy ‖p‖_2 = 1 and ‖q‖∞ ≤ ρ for some parameter ρ. Then, for any 0 < j < n and ε > 0, we have
P(|⟨Dξ p, s_{→j}(Dξ q)⟩| > ε) ≤ e^{−ε²/(8ρ²)}, (19)
where the probability is over the Rademacher sequence ξ.
Based on Lemma 6, and similar to [24, Lemma 9], we have the following lemma ensuring that the (m,β)-
orthogonality condition is satisfied by a certain sequence of unit vectors with high probability. The proof is given in
Appendix A for completeness.
Lemma 7. For any ν ∈ (0, 1), suppose that x is a unit vector with ‖x‖∞ ≤ ρ < 1/(4√(2m log(m²/ν))). Let t_i = s_{i←}(Dξ x) for i ∈ [m]. Then, with probability at least 1 − ν over the choice of ξ, the sequence t_0, t_1, ..., t_{m−1} is (m, β)-orthogonal for β = 2√ρ.
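The mechanism behind Lemmas 6 and 7 can be seen empirically: when ‖x‖∞ is small, the shifted sign-flipped vectors s_{i←}(Dξ x) are nearly orthonormal. A small Monte Carlo illustration, with dimensions of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4096, 32

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                 # random unit vector: small ||x||_inf
xi = np.sign(rng.standard_normal(n))   # Rademacher sequence xi

# t_i = s_{i<-}(D_xi x): left-circular shifts of the sign-flipped signal.
T = np.array([np.roll(xi * x, -i) for i in range(m)])

G = T @ T.T                            # Gram matrix of the shifts
off = np.abs(G - np.eye(m)).max()
print(np.abs(x).max(), off)            # both small: near-orthonormal shifts
```

The diagonal of the Gram matrix is exactly 1 (shifts preserve the norm), and the off-diagonal entries are on the order of ‖x‖∞, which is the (m, β)-orthogonality phenomenon that the lemma formalizes.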
In addition, we have the following two lemmas for the case of random noise added before quantization.
Lemma 8. Let g ∼ N(0, 1) and let e be any random noise independent of g. Then, for any b ≥ 0 and a, η > 0,
P(|ag + be| ≤ η) ≤ √(2/π) · η/a. (20)
Proof. Let f_e be the pdf of e and let F_g be the cdf of g, i.e., F_g(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt. We have max_{x∈R} F_g′(x) = max_{x∈R} (1/√(2π)) e^{−x²/2} = 1/√(2π). Then,
P(|ag + be| ≤ η) = ∫_{−∞}^∞ f_e(t) ∫_{(−η−bt)/a}^{(η−bt)/a} (1/√(2π)) e^{−x²/2} dx dt (21)
= ∫_{−∞}^∞ f_e(t) (F_g((η − bt)/a) − F_g((−η − bt)/a)) dt (22)
≤ (2η/a) · max_{x∈R} F_g′(x) · ∫_{−∞}^∞ f_e(t) dt (23)
= √(2/π) · η/a. (24)
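A quick Monte Carlo sanity check of the anti-concentration bound (20), here with logistic noise; the parameter values are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000
a, b_coef, eta = 2.0, 1.5, 0.3   # arbitrary a > 0, b >= 0, eta > 0

g = rng.standard_normal(N)
e = rng.logistic(size=N)         # any noise independent of g is allowed

emp = np.mean(np.abs(a * g + b_coef * e) <= eta)
bound = np.sqrt(2 / np.pi) * eta / a
print(emp, bound)                # empirical probability stays below the bound
```

The bound only uses the Gaussian part of the sum, so it is typically loose when b > 0, as the comparison shows.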
The following lemma shows that when the random noise e added before quantization is Gaussian or logit, the additional assumptions on e made in Section II are satisfied.
Lemma 9. When e ∼ N(0, σ²) or e is a logit noise term, for any x ∈ R and any a > 0, we have
F_e(x) + F_e(−x) = 1 (25)
and
λ_a := E[θ_a(g)g] < C_1 a + C_2, (26)
where θ_a(x) := sign(x + e/a), and C_1, C_2 are absolute constants.

Proof. When e ∼ N(0, σ²), by the symmetry of the pdf of a zero-mean Gaussian distribution, it is easy to see that (25) holds. Note that e/a ∼ N(0, σ²/a²). From λ = E[θ(g)g] = √(2/(π(1 + σ²))) (cf. Section II), we obtain
λ_a = E[θ_a(g)g] = √(2/(π(1 + σ²/a²))) < √(2/π). (27)
When e is a logit noise term, we have
P(θ(x) = 1) = e^x/(e^x + 1), P(θ(x) = −1) = 1/(e^x + 1). (28)
From θ(x) = sign(x + e), we obtain
F_e(−x) = P(e < −x) = 1/(e^x + 1), (29)
or equivalently,
F_e(x) = 1/(e^{−x} + 1). (30)
This gives F_e(x) + F_e(−x) = 1 and
f_e(x) = F_e′(x) = e^{−x}/(e^{−x} + 1)² ≤ 1/4. (31)
For any fixed x, we have
φ_a(x) := E[θ_a(x)] = 1 − 2P(e < −ax). (32)
Then, for g ∼ N(0, 1), we obtain
λ_a := E[θ_a(g)g] = E[E[θ_a(g)g|g]] (33)
= E[φ_a(g)g] = E[φ_a′(g)] = 2a E[f_e(−ag)] ≤ a/2. (34)
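The constants in Lemma 9 can be verified numerically: for Gaussian noise, λ_a = √(2/(π(1 + σ²/a²))), and for logit noise, λ_a = 2aE[f_e(−ag)] ≤ a/2. A hedged Monte Carlo sketch, with parameter values of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 2_000_000
g = rng.standard_normal(N)
a, sigma = 1.5, 0.8

# Gaussian noise: lambda_a = E[sign(g + e/a) g] with e ~ N(0, sigma^2).
e = sigma * rng.standard_normal(N)
lam_a_mc = np.mean(np.sign(g + e / a) * g)
lam_a_formula = np.sqrt(2 / (np.pi * (1 + sigma**2 / a**2)))
print(lam_a_mc, lam_a_formula)   # close, up to Monte Carlo error

# Logit noise: lambda_a = 2a * E[f_e(-a g)], f_e(x) = e^{-x}/(1+e^{-x})^2.
fe = lambda x: np.exp(-np.abs(x)) / (1 + np.exp(-np.abs(x)))**2  # f_e is even
lam_a_logit = 2 * a * np.mean(fe(-a * g))
print(lam_a_logit, a / 2)        # the bound lambda_a <= a/2 holds
```

Using |x| inside the logistic density exploits the evenness f_e(x) = f_e(−x) and avoids overflow for large negative arguments.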
Overview of the proof: With the above auxiliary results in place, the main ideas for proving Lemma 1 are outlined as follows. Let s_i = s_{i←}(Dξ x∗). From Lemma 7, the sequence of unit vectors s_0, s_1, ..., s_{m−1} is (m, β)-orthogonal with high probability. We write s_i = u_i + r_i as in (18), and then b_i = θ(⟨g, s_i⟩) = θ(⟨g, u_i⟩ + ⟨g, r_i⟩). Because ‖r_i‖_2 = o(1), for the case of random bit flips with θ(x) = τ·sign(x), we have that for most i ∈ [m], the term |⟨g, u_i⟩| is larger than the term |⟨g, r_i⟩|, and thus b_i = θ(⟨g, u_i⟩ + ⟨g, r_i⟩) = θ(⟨g, u_i⟩).³ Because of the orthogonality of u_0, u_1, ..., u_{m−1}, the random variables θ(⟨g, u_i⟩), i ∈ [m], are independent. Therefore, for any j ∈ [n], the j-th entry of (1/m)Ā^T b − λx̄∗ is
(1/m) Σ_{i∈[m]} (ā_{ij} b_i − λ x̄∗_j) ≈ (1/m) Σ_{i∈[m]} (g_{j−i} θ(⟨g, u_i⟩) − λ u_i(j − i)), (35)
where we use ā_{ij} = g_{j−i}, b_i ≈ θ(⟨g, u_i⟩), and x̄∗_j = s_i(j − i) ≈ u_i(j − i). The sequence {g_{j−i} θ(⟨g, u_i⟩) − λ u_i(j − i)}_{i∈[m]} is still not independent, but defining h_i = ⟨g, u_i/‖u_i‖_2⟩, we have that h_0, h_1, ..., h_{m−1} are i.i.d. standard Gaussian random variables. We use these to represent g_{j−i}, so that the sum (1/m) Σ_{i∈[m]} (g_{j−i} θ(⟨g, u_i⟩) − λ u_i(j − i)) can be decomposed into three terms.⁴ We control these three terms separately by making use of Lemmas 2, 3, and 5.
We now proceed with the formal proof. A table of notations is presented in Appendix B for convenience.

³Similarly, for random noise added before quantization with θ(x) = sign(x + e), from Lemma 8, we have that for most i ∈ [m], the term |⟨g, u_i⟩ + e_i| is larger than the term |⟨g, r_i⟩|, and thus we also have b_i = θ(⟨g, u_i⟩).
⁴For random noise added before quantization, it is decomposed into four terms.
Proof of Lemma 1. Useful high-probability events: Recall from (17) that b_i = θ(⟨g, s_{i←}(Dξ x∗)⟩). Let s_i = s_{i←}(Dξ x∗) for i ∈ [m]. Because ‖x∗‖∞ ≤ ρ with ρ = O(1/(m(t + log m))), setting ν = e^{−t} in Lemma 7, we have that with probability at least 1 − e^{−t}, the sequence s_0, ..., s_{m−1} is (m, β)-orthogonal with β = 2√ρ. We write s_i = u_i + r_i with ‖r_i‖_2 ≤ β for all i. In the following, we always assume the (m, β)-orthogonality. Because ‖r_i‖_2 ≤ β, from Lemma 2 and a union bound over i ∈ [m], we have that for any η > 0, with probability at least 1 − m e^{−η²/(2β²)},
max_{i∈[m]} |⟨g, r_i⟩| ≤ η. (36)
In addition, for β = o(1), we have ‖u_i‖_2 ∈ [1 − β, 1 + β], and in particular ‖u_i‖_2 ≥ √(2/π). By Lemma 8, we obtain for any i ∈ [m] and η > 0 that
P(|⟨g, u_i⟩ + ω e_i| ≤ η) = P(|‖u_i‖_2 ⟨g, u_i/‖u_i‖_2⟩ + ω e_i| ≤ η) (37)
≤ √(2/π) · η/‖u_i‖_2 ≤ η, (38)
where ω = 0 corresponds to random bit flips with θ(x) = τ·sign(x), and ω = 1 corresponds to random noise added before quantization with θ(x) = sign(x + e). By the orthogonality of u_0, ..., u_{m−1}, the random variables ⟨g, u_0⟩, ..., ⟨g, u_{m−1}⟩ are independent. Let
I_η = {i ∈ [m] : |⟨g, u_i⟩ + ω e_i| ≤ η}, (39)
and let I_η^c = [m]\I_η be the complement of I_η. Using a standard Chernoff bound [24], we have for any ζ > 0 that
P(|I_η| ≥ mη + mζ) < e^{−mζ²/(2η+ζ)}. (40)
Note that when (36) holds, for i ∈ I_η^c, we have
b_i = θ(⟨g, s_i⟩) = θ(⟨g, u_i⟩), (41)
since the term ⟨g, u_i⟩ + ω e_i of magnitude exceeding η dominates the other term ⟨g, r_i⟩ of magnitude at most η. In addition, because s_i = s_{i←}(x̄∗), we have for any fixed j ∈ [n] that
s_i(j − i) = x̄∗_j. (42)
Another useful property is the following bound on the maximum of n independent Gaussians (e.g., using [2, Eq. (2.9)] and a union bound): With probability at least 1 − e^{−t},
max_{j∈[n]} |g_j| ≤ √(2(t + log n)). (43)
Combining (36), (40), and (43), we find that the event
E : { max_{i∈[m]} |⟨g, r_i⟩| ≤ η, |I_η| < m(η + ζ), max_{j∈[n]} |g_j| ≤ √(2(t + log n)) } (44)
occurs with probability at least 1 − e^{−t} − m e^{−η²/(2β²)} − e^{−mζ²/(2η+ζ)}.
Dividing (1/m)Ā^T b − λx̄∗ into several terms: By the definitions of I_η and I_η^c, for any j ∈ [n], we have
[(1/m) Ā^T b − λ x̄∗]_j = (1/m) Σ_{i=0}^{m−1} (ā_{ij} b_i − λ x̄∗_j) (45)
= (1/m) Σ_{i∈I_η^c} (ā_{ij} b_i − λ x̄∗_j) + (1/m) Σ_{i∈I_η} (ā_{ij} b_i − λ x̄∗_j) (46)
= (1/m) Σ_{i∈I_η^c} (ā_{ij} θ(⟨g, u_i⟩) − λ s_i(j − i)) + (1/m) Σ_{i∈I_η} (ā_{ij} b_i − λ x̄∗_j) (47)
= (1/m) Σ_{i∈I_η^c} (g_{j−i} θ(⟨g, u_i⟩) − λ u_i(j − i)/‖u_i‖_2)
− (λ/m) Σ_{i∈I_η^c} (r_i(j − i) + u_i(j − i) − u_i(j − i)/‖u_i‖_2)
+ (1/m) Σ_{i∈I_η} (g_{j−i} b_i − λ x̄∗_j), (48)
where (47) is from (41) and (42), and (48) is from s_i = u_i + r_i. Therefore, if we denote Y_{ij} = g_{j−i} θ(⟨g, u_i⟩) − λ u_i(j − i)/‖u_i‖_2, Z_1 := 2(η + ζ)√(2(t + log n)), Z_2 := C√((t + log n)/m) for some sufficiently large absolute constant C, and Z_3 := Z_2 + 2β + 2Z_1, we obtain
P( max_{j∈[n]} |(1/m) Σ_{i=0}^{m−1} (ā_{ij} b_i − λ x̄∗_j)| > Z_3 )
≤ P(E^c) + P( max_{j∈[n]} |(1/m) Σ_{i=0}^{m−1} (ā_{ij} b_i − λ x̄∗_j)| > Z_3, E ) (49)
≤ e^{−t} + m e^{−η²/(2β²)} + e^{−mζ²/(2η+ζ)} + P( max_{j∈[n]} |(1/m) Σ_{i=0}^{m−1} (ā_{ij} b_i − λ x̄∗_j)| > Z_3, E ) (50)
≤ e^{−t} + m e^{−η²/(2β²)} + e^{−mζ²/(2η+ζ)}
+ P( max_{j∈[n]} |(1/m) Σ_{i∈I_η^c} Y_{ij}| > Z_2 + Z_1, E )
+ P( max_{j∈[n]} |(λ/m) Σ_{i∈I_η^c} (s_i(j − i) − u_i(j − i)/‖u_i‖_2)| > 2β, E )
+ P( max_{j∈[n]} |(1/m) Σ_{i∈I_η} (g_{j−i} b_i − λ x̄∗_j)| > Z_1, E ) (51)
≤ e^{−t} + m e^{−η²/(2β²)} + e^{−mζ²/(2η+ζ)}
+ P( max_{j∈[n]} |(1/m) Σ_{i∈I_η} Y_{ij}| > Z_1 | E ) + P( max_{j∈[n]} |(1/m) Σ_{i∈[m]} Y_{ij}| > Z_2 )
+ P( max_{j∈[n]} |(λ/m) Σ_{i∈I_η^c} (s_i(j − i) − u_i(j − i)/‖u_i‖_2)| > 2β | E )
+ P( max_{j∈[n]} |(1/m) Σ_{i∈I_η} (ā_{ij} b_i − λ x̄∗_j)| > Z_1 | E ), (52)
where (50) follows from (44), (51) follows from (48) and the definition of Z_3, and (52) uses the triangle inequality and the fact that P(A, E) ≤ min{P(A|E), P(A)}. Noting that ā_{ij} = g_{j−i}, λ < 1 (cf. Section II), |x̄∗_j| ≤ ρ < 1, ‖r_i‖_2 ≤ β, and |1 − ‖u_i‖_2| ≤ β, conditioned on E, we have
max_{j∈[n]} |(λ/m) Σ_{i∈I_η^c} (r_i(j − i) + u_i(j − i) − u_i(j − i)/‖u_i‖_2)| ≤ 2β, (53)
max_{j∈[n]} |(1/m) Σ_{i∈I_η} (g_{j−i} b_i − λ x̄∗_j)| ≤ Z_1, (54)
and
max_{j∈[n]} |(1/m) Σ_{i∈I_η} (g_{j−i} θ(⟨g, u_i⟩) − λ u_i(j − i)/‖u_i‖_2)| ≤ Z_1. (55)
Bounding the remaining term: In the following, we focus on deriving an upper bound for max_{j∈[n]} |(1/m) Σ_{i∈[m]} Y_{ij}| = max_{j∈[n]} |(1/m) Σ_{i∈[m]} (g_{j−i} θ(⟨g, u_i⟩) − λ u_i(j − i)/‖u_i‖_2)|. Let h_i = ⟨g, u_i/‖u_i‖_2⟩. Then, h_0, h_1, ..., h_{m−1} are i.i.d. standard Gaussian random variables. For i, ℓ ∈ [m] and j ∈ [n], we denote u_{j,i,ℓ} := u_ℓ(j − i)/‖u_ℓ‖_2. We have from the definition of h_ℓ that Cov[g_{j−i}, h_ℓ] = u_{j,i,ℓ}, and for any fixed j ∈ [n], g_{j−i} can be written as
g_{j−i} = Σ_{ℓ∈[m]} u_{j,i,ℓ} h_ℓ + w_{j,i} t_{j,i}, (56)
where w_{j,i} = √(1 − Σ_{ℓ∈[m]} u_{j,i,ℓ}²) and t_{j,i} ∼ N(0, 1) is independent of h_0, h_1, ..., h_{m−1}. Note that for random noise added before quantization, we have
θ(⟨u_i, g⟩) = θ(‖u_i‖_2 h_i) = sign(‖u_i‖_2 h_i + e_i) (57)
= sign(h_i + e_i/‖u_i‖_2) = θ_i(h_i), (58)
where θ_i(x) := sign(x + e/‖u_i‖_2). Let λ_i = E[θ_i(g)g] for g ∼ N(0, 1). From Lemma 9, we have that λ_i < C_1 ‖u_i‖_2 + C_2 = C, and θ_i(g) is a Rademacher random variable. Therefore, we obtain⁵
(1/m) Σ_{i∈[m]} (g_{j−i} θ(⟨g, u_i⟩) − λ u_i(j − i)/‖u_i‖_2) = (1/m) Σ_{i∈[m]} (g_{j−i} θ_i(h_i) − λ u_{j,i,i}) (59)
= (1/m) Σ_{i∈[m]} (Σ_{ℓ∈[m]} u_{j,i,ℓ} h_ℓ θ_i(h_i) + w_{j,i} t_{j,i} θ_i(h_i) − λ u_{j,i,i}) (60)
= (1/m) Σ_{i∈[m]} u_{j,i,i} (h_i θ_i(h_i) − λ_i) + (1/m) Σ_{i∈[m]} u_{j,i,i} (λ_i − λ)
+ (1/m) Σ_{i∈[m]} θ_i(h_i) Σ_{ℓ≠i} u_{j,i,ℓ} h_ℓ + (1/m) Σ_{i∈[m]} w_{j,i} t_{j,i} θ_i(h_i), (61)
where (60) follows from (56). We control the four summands in (61) separately as follows:
• Because u_{j,i,i} h_i θ_i(h_i) − λ_i u_{j,i,i}, i ∈ [m], are independent, zero-mean, sub-Gaussian random variables with sub-Gaussian norm upper bounded by an absolute constant c, from Lemma 2 and a union bound over j ∈ [n], we have with probability at least 1 − e^{−t} that
max_{j∈[n]} |(1/m) Σ_{i∈[m]} u_{j,i,i} (h_i θ_i(h_i) − λ_i)| ≤ O(√((t + log n)/m)). (62)
• For the case of random bit flips, the second term vanishes. For the case of random noise added before quantization, from 0 < λ, λ_i < C and (68), we obtain
max_{j∈[n]} |(1/m) Σ_{i∈[m]} u_{j,i,i} (λ_i − λ)| ≤ C max_{i∈[m], j∈[n]} |u_{j,i,i}| = O(1/√(m(t + log m))). (63)

⁵For the case of random bit flips with θ(x) = τ·sign(x), we set θ_i = θ (and thus λ_i = λ) for all i ∈ [m].
• We have
max_{j,ℓ} Σ_{i∈[m]} u_{j,i,ℓ}² ≤ 4 max_{j,ℓ} Σ_{i∈[m]} u_ℓ(j − i)² (64)
= 4 max_{j,ℓ} Σ_{i∈[m]} (s_ℓ(j − i) − r_ℓ(j − i))² (65)
≤ 8 max_{j,ℓ} Σ_{i∈[m]} (s_ℓ(j − i)² + r_ℓ(j − i)²) (66)
≤ 8 max_{j,ℓ} (m‖x̄∗‖∞² + ‖r_ℓ‖_2²) (67)
= O(1/(m(t + log m))), (68)
where we use ‖u_ℓ‖_2 = 1 + o(1) in (64), s_ℓ = u_ℓ + r_ℓ in (65), (a − b)² ≤ 2(a² + b²) in (66), and ‖s_ℓ‖∞ = ‖x̄∗‖∞ ≤ ρ = O(1/(m(t + log m))) as well as ‖r_ℓ‖_2 ≤ 2√ρ in (67) and (68). Then, from Proposition 1 below and a union bound over j ∈ [n], we have with probability at least 1 − O(e^{−t}) that
max_{j∈[n]} |(1/m) Σ_{ℓ∈[m]} h_ℓ Σ_{i≠ℓ} u_{j,i,ℓ} θ_i(h_i)| ≤ O(√((t + log n + log m)/m)) = O(√((t + log n)/m)). (69)
• For a fixed j ∈ [n] and 0 ≤ i_1 ≠ i_2 < m, we have
Cov[w_{j,i_1} t_{j,i_1}, w_{j,i_2} t_{j,i_2}] = Cov[g_{j−i_1} − Σ_{ℓ∈[m]} u_{j,i_1,ℓ} h_ℓ, g_{j−i_2} − Σ_{ℓ∈[m]} u_{j,i_2,ℓ} h_ℓ] (70)
= −Σ_{ℓ∈[m]} u_{j,i_1,ℓ} u_{j,i_2,ℓ}. (71)
Then, [w_{j,0} t_{j,0}, ..., w_{j,m−1} t_{j,m−1}]^T ∈ R^m is a Gaussian vector with zero mean and covariance matrix Σ_j, where Σ_j(i, i) = w_{j,i}² = 1 − Σ_{ℓ∈[m]} u_{j,i,ℓ}² ≤ 1 and, from (68), we have for i_1 ≠ i_2 that |Σ_j(i_1, i_2)| = |Σ_{ℓ∈[m]} u_{j,i_1,ℓ} u_{j,i_2,ℓ}| ≤ √((Σ_{ℓ∈[m]} u_{j,i_1,ℓ}²)(Σ_{ℓ∈[m]} u_{j,i_2,ℓ}²)) = O(1/(m(t + log m))) = O(1/m). When m = Ω(t + log n), taking a union bound over j ∈ [n] and setting ε = O(√((t + log n)/m)) in Proposition 2 below, we obtain with probability at least 1 − O(e^{−t}) that
max_{j∈[n]} |(1/m) Σ_{i∈[m]} w_{j,i} t_{j,i} θ_i(h_i)| ≤ O(√((t + log n)/m)). (72)
Combining (61) with (62), (63), (69), and (72), we obtain with probability at least 1 − O(e^{−t}) that
max_{j∈[n]} |(1/m) Σ_{i∈[m]} (g_{j−i} θ(⟨g, u_i⟩) − λ u_i(j − i)/‖u_i‖_2)| ≤ O(√((t + log n)/m)). (73)
Combining terms: Finally, combining the (m, β)-orthogonality with (52), (53), (54), (55), and (73), with probability at least 1 − O(e^{−t}) − m e^{−η²/(2β²)} − e^{−mζ²/(2η+ζ)}, we have
‖(1/m) A^T b − λ x∗‖∞ = ‖(1/m) Ā^T b − λ x̄∗‖∞ = max_{j∈[n]} |(1/m) Σ_{i=0}^{m−1} (ā_{ij} b_i − λ x̄∗_j)| ≤ O(√((t + log n)/m)) + 2β + 4(η + ζ)√(2(t + log n)), (74)
where the first equality holds because A^T b − λx∗ = Dξ(Ā^T b − λx̄∗) and Dξ only flips signs.
Setting η = β√(2(t + log m)) = O(√ρ)·√(2(t + log m)) = O(1/√m) and ζ = C/√m, we obtain with probability at least 1 − e^{−Ω(t)} − e^{−Ω(√m)} that
‖(1/m) A^T b − λ x∗‖∞ ≤ O(√((t + log n)/m)). (75)

We now move on to the statements and proofs of Propositions 1 and 2 used above. Throughout the following, we use the θ_i defined in (58).⁶
Proposition 1. Let z ∈ R^m be a standard Gaussian vector. For any t > 0, suppose that W ∈ R^{m×m} satisfies max_{ℓ∈[m]} Σ_{i∈[m]} w_{i,ℓ}² = O(1/(m(t + log m))). Then, we have with probability at least 1 − O(e^{−t}) that
|(1/m) Σ_{i∈[m]} θ_i(z_i) Σ_{ℓ≠i} w_{i,ℓ} z_ℓ| ≤ O(√((t + log m)/m)). (76)
Proof. For any fixed ` ∈ [m] and any v > 0, from Lemma 2, we have with probability at least 1−O(e−v
2)that
|z`| ≤ v. (77)
Therefore, with probability at least 1−O(e−v
2),
|z`|√∑
i 6=`
w2i,` = O
(v√
m(t+ logm)
). (78)
Note that $\theta_0(z_0), \theta_1(z_1), \ldots, \theta_{m-1}(z_{m-1})$ form a Rademacher sequence. Then, for any $u > 0$ and a sufficiently large absolute constant $C$, we have
$$\mathbb{P}\bigg(\Big|z_\ell\sum_{i\ne\ell} w_{i,\ell}\,\theta_i(z_i)\Big| > \frac{Cvu}{\sqrt{m(t+\log m)}}\bigg) \le \mathbb{P}\big(|z_\ell|>v\big) + \mathbb{P}\bigg(\Big|z_\ell\sum_{i\ne\ell} w_{i,\ell}\,\theta_i(z_i)\Big| > \frac{Cvu}{\sqrt{m(t+\log m)}}\,\bigg|\, |z_\ell|\le v\bigg) \tag{79}$$
$$\le O\big(e^{-u^2}\big) + O\big(e^{-v^2}\big), \tag{80}$$
where we use (77), (78) and Lemma 3 to derive (80). Taking a union bound over $\ell\in[m]$, we obtain with probability at least $1-O\big(m(e^{-u^2}+e^{-v^2})\big)$ that
$$\max_{\ell\in[m]}\Big|z_\ell\sum_{i\ne\ell} w_{i,\ell}\,\theta_i(z_i)\Big| \le \frac{Cvu}{\sqrt{m(t+\log m)}}. \tag{81}$$
Then, we have
$$\Big|\frac{1}{m}\sum_{i\in[m]}\theta_i(z_i)\sum_{\ell\ne i} w_{i,\ell}\, z_\ell\Big| = \Big|\frac{1}{m}\sum_{\ell\in[m]} z_\ell\sum_{i\ne\ell} w_{i,\ell}\,\theta_i(z_i)\Big| \tag{82}$$
$$\le \max_{\ell\in[m]}\Big|z_\ell\sum_{i\ne\ell} w_{i,\ell}\,\theta_i(z_i)\Big| \le \frac{Cvu}{\sqrt{m(t+\log m)}}. \tag{83}$$
6 For the case of random bit flips, where $\theta(x) = \tau\cdot\mathrm{sign}(x)$, we set $\theta_i = \theta$ for $i\in[m]$.
Setting $u = C'\sqrt{t+\log m}$ and $v = C''\sqrt{t+\log m}$, we obtain the desired result.
Proposition 2. Let $\mathbf{z}\in\mathbb{R}^m$ be a standard Gaussian vector. Let $\mathbf{t}\sim\mathcal{N}(\mathbf{0},\boldsymbol{\Sigma})$ be independent of $\mathbf{z}$, where $\boldsymbol{\Sigma}\in\mathbb{R}^{m\times m}$ satisfies $\Sigma_{i,i}\le 1$ for $i\in[m]$ and $\Sigma_{i_1,i_2} = O\big(\frac{1}{m}\big)$ for $i_1\ne i_2$. Then, for any $\varepsilon\in(0,1)$, with probability $1-e^{-\Omega(m\varepsilon^2)}$,
$$\Big|\frac{1}{m}\sum_{i\in[m]} t_i\,\theta_i(z_i)\Big|\le\varepsilon. \tag{84}$$
Proof. We have
$$\|\boldsymbol{\Sigma}\|_F^2 = \sum_{i\in[m]}\Sigma_{i,i}^2 + \sum_{i_1\ne i_2}\Sigma_{i_1,i_2}^2 \le m + m^2\cdot O\Big(\frac{1}{m^2}\Big) = O(m), \tag{85}$$
and from Lemma 4,
$$\|\boldsymbol{\Sigma}\|_{2\to 2} \le \|\boldsymbol{\Sigma}\|_{1\to 1} = O(1). \tag{86}$$
Then, setting $\mathbf{B}=\boldsymbol{\Sigma}$ in Lemma 5, we obtain with probability at least $1-e^{-\Omega(m)}$ that
$$\sum_{i\in[m]} t_i^2 \le 2m. \tag{87}$$
Again, note that $\theta_0(z_0),\theta_1(z_1),\ldots,\theta_{m-1}(z_{m-1})$ form a Rademacher sequence. For any $u>0$ and a sufficiently large absolute constant $C$, we have
$$\mathbb{P}\bigg(\Big|\frac{1}{m}\sum_{i\in[m]} t_i\,\theta_i(z_i)\Big| > \frac{Cu}{\sqrt{m}}\bigg) \le \mathbb{P}\bigg(\sum_{i\in[m]} t_i^2 > 2m\bigg) + \mathbb{P}\bigg(\Big|\frac{1}{m}\sum_{i\in[m]} t_i\,\theta_i(z_i)\Big| > \frac{Cu}{\sqrt{m}}\,\bigg|\,\sum_{i\in[m]} t_i^2\le 2m\bigg) \tag{88}$$
$$\le O\big(e^{-u^2}\big) + e^{-\Omega(m)}, \tag{89}$$
where we use Lemma 3 and (87) to derive (89). Setting $\varepsilon = \frac{Cu}{\sqrt{m}}$, we obtain the desired result.
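As a quick numerical sanity check of Proposition 2 (an illustration only, not part of the proof), the sketch below takes $\boldsymbol{\Sigma} = \mathbf{I}_m$, which satisfies the stated conditions, and uses the noiseless sign function in place of $\theta_i$; the sizes and seed are our own toy choices.

```python
import numpy as np

# Sanity check of Proposition 2 with Sigma = I_m (so Sigma_ii <= 1 and the
# off-diagonal entries are 0 = O(1/m)) and theta_i = sign (toy setup).
rng = np.random.default_rng(0)
m = 10_000
t = rng.standard_normal(m)   # t ~ N(0, I_m)
z = rng.standard_normal(m)   # independent standard Gaussian vector
signs = np.sign(z)           # theta_i(z_i) forms a Rademacher sequence
avg = np.abs(np.mean(t * signs))
# avg concentrates at the rate 1/sqrt(m), i.e. around 0.01 at this size
```

Repeating this over many draws, the fraction of trials with `avg` above a fixed ε decays as the proposition predicts.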
C. Proof of Theorem 1
Before proving Theorem 1, we provide some further lemmas.
Lemma 10. Suppose that the assumptions in Lemma 1 are satisfied. Then, for any $\mathbf{v}\in\mathbb{R}^n$ and $t>0$ satisfying $m=\Omega(t+\log n)$, with probability $1-e^{-\Omega(t)}-e^{-\Omega(\sqrt{m})}$, we have
$$\Big|\Big\langle\frac{1}{m}\bar{\mathbf{A}}^T\mathbf{b}-\lambda\bar{\mathbf{x}}^*,\mathbf{D}_\xi\mathbf{v}\Big\rangle\Big| \le O\Big(\|\mathbf{v}\|_2\sqrt{\frac{t+\log n}{m}}\Big). \tag{90}$$
Proof. Using the same notation for $\mathbf{s}_i,\mathbf{u}_i,\mathbf{r}_i,h_i$ as in the proof of Lemma 1, we have
$$\Big\langle\frac{1}{m}\bar{\mathbf{A}}^T\mathbf{b}-\lambda\bar{\mathbf{x}}^*,\mathbf{D}_\xi\mathbf{v}\Big\rangle = \frac{\|\mathbf{v}\|_2}{m}\sum_{i\in[m]}\big(\langle\mathbf{a}_i,\mathbf{D}_\xi\bar{\mathbf{v}}\rangle b_i - \lambda\langle\bar{\mathbf{x}}^*,\mathbf{D}_\xi\bar{\mathbf{v}}\rangle\big) \tag{91}$$
$$= \frac{\|\mathbf{v}\|_2}{m}\sum_{i\in J_\eta}\Big(\langle\mathbf{g},\mathbf{w}_i\rangle\theta(\langle\mathbf{g},\mathbf{u}_i\rangle) - \lambda\Big\langle\mathbf{w}_i,\frac{\mathbf{u}_i}{\|\mathbf{u}_i\|_2}\Big\rangle\Big) - \frac{\lambda\|\mathbf{v}\|_2}{m}\sum_{i\in J_\eta}\Big(\langle\bar{\mathbf{x}}^*,\mathbf{D}_\xi\bar{\mathbf{v}}\rangle - \Big\langle\mathbf{w}_i,\frac{\mathbf{u}_i}{\|\mathbf{u}_i\|_2}\Big\rangle\Big) + \frac{\|\mathbf{v}\|_2}{m}\sum_{i\in I_\eta}\big(\langle\mathbf{g},\mathbf{w}_i\rangle b_i - \lambda\langle\bar{\mathbf{x}}^*,\mathbf{D}_\xi\bar{\mathbf{v}}\rangle\big), \tag{92}$$
where $\bar{\mathbf{v}} := \frac{\mathbf{v}}{\|\mathbf{v}\|_2}$ and $\mathbf{w}_i := s_{i\leftarrow}(\mathbf{D}_\xi\bar{\mathbf{v}})$, so that $\langle\bar{\mathbf{x}}^*,\mathbf{D}_\xi\bar{\mathbf{v}}\rangle = \langle s_{i\leftarrow}(\bar{\mathbf{x}}^*), s_{i\leftarrow}(\mathbf{D}_\xi\bar{\mathbf{v}})\rangle = \langle\mathbf{s}_i,\mathbf{w}_i\rangle$. Note that (92) is identical to (48), except that we use $\langle\mathbf{g},\mathbf{w}_i\rangle$ to replace $g_{j-i}$ therein. Similarly to (52), we know that we only need to focus on bounding $\frac{1}{m}\sum_{i\in[m]}\big(\langle\mathbf{g},\mathbf{w}_i\rangle\theta(\langle\mathbf{g},\mathbf{u}_i\rangle) - \lambda\big\langle\mathbf{w}_i,\frac{\mathbf{u}_i}{\|\mathbf{u}_i\|_2}\big\rangle\big)$. Setting $\alpha_{i,\ell} := \mathrm{Cov}\big[\langle\mathbf{g},\mathbf{w}_i\rangle, h_\ell\big] = \big\langle\mathbf{w}_i,\frac{\mathbf{u}_\ell}{\|\mathbf{u}_\ell\|_2}\big\rangle$,
similarly to (56), $\langle\mathbf{g},\mathbf{w}_i\rangle$ can be written as
$$\langle\mathbf{g},\mathbf{w}_i\rangle = \sum_{\ell\in[m]}\alpha_{i,\ell}\, h_\ell + \gamma_i f_i, \tag{93}$$
where $\gamma_i = \sqrt{1-\sum_{\ell\in[m]}\alpha_{i,\ell}^2}$ and $f_i\sim\mathcal{N}(0,1)$ is independent of $h_0,h_1,\ldots,h_{m-1}$. Note that by Lemma 6, we have with probability at least $1-e^{-t}$ that
$$\max_{i\ne\ell}\langle\mathbf{w}_i,\mathbf{s}_\ell\rangle \le \rho\cdot O\big(\sqrt{t+\log m}\big) = O\Big(\frac{1}{m\sqrt{t+\log m}}\Big), \tag{94}$$
where we use $\rho = O\big(\frac{1}{m(t+\log m)}\big)$ in (94). By the $(m,\beta)$-orthogonality (cf. the first paragraph in the proof of Lemma 1 for the cases of random flips and Gaussian noise), for all $i,\ell\in[m]$, we have
$$\sum_{j\in[n]} \bar{v}_j^2\, r_\ell(j+i)^2 \le \|\mathbf{r}_\ell\|_\infty^2 = O\Big(\frac{1}{m^2(t+\log m)}\Big). \tag{95}$$
Setting $u = O(\sqrt{t+\log m})$ in Lemma 3 and taking a union bound over $i,\ell\in[m]$, we obtain with probability at least $1-e^{-t}$ that
$$\max_{i,\ell\in[m]}|\langle\mathbf{w}_i,\mathbf{r}_\ell\rangle| = \max_{i,\ell\in[m]}|\langle\mathbf{D}_\xi\bar{\mathbf{v}}, s_{\to i}(\mathbf{r}_\ell)\rangle| \tag{96}$$
$$= \max_{i,\ell\in[m]}\Big|\sum_{j\in[n]}\bar{v}_j\, r_\ell(j+i)\,\xi_j\Big| \le O\Big(\frac{1}{m}\Big). \tag{97}$$
Therefore, we obtain from (94) and (97) that
$$\max_{i\ne\ell}|\alpha_{i,\ell}| \le 2\max_{i\ne\ell}|\langle\mathbf{w}_i,\mathbf{u}_\ell\rangle| = 2\max_{i\ne\ell}|\langle\mathbf{w}_i,\mathbf{s}_\ell-\mathbf{r}_\ell\rangle| = O\Big(\frac{1}{m}\Big), \tag{98}$$
and we are able to derive inequalities similar to (85), (86) and (87). Following the same proof techniques as those for Lemma 1, we obtain the desired result.
Based on Lemmas 1 and 10, and using a well-established chaining argument that we do not repeat here (e.g.,
see [46, Lemma 3]), we attain the following lemma. Note that we use the (m,β)-orthogonality and the event E
(cf. (44)) only once, and do not need to take any union bound for them.
Lemma 11. Under the conditions in Theorem 1, with probability at least $1-e^{-\Omega(k\log\frac{Lr}{\delta})}-e^{-\Omega(\sqrt{m})}$, it holds that
$$\Big|\Big\langle\frac{1}{m}\bar{\mathbf{A}}^T\mathbf{b}-\lambda\bar{\mathbf{x}}^*,\mathbf{D}_\xi\hat{\mathbf{x}}-\bar{\mathbf{x}}^*\Big\rangle\Big| \le O\Big(\sqrt{\frac{k\log\frac{Lr}{\delta}}{m}}\Big)\big(\|\hat{\mathbf{x}}-\mathbf{x}^*\|_2+\delta\big). \tag{99}$$
The restricted isometry property (RIP) is widely used in compressive sensing (CS), and we present its definition in the following.

Definition 3. A matrix $\boldsymbol{\Phi}\in\mathbb{C}^{m\times n}$ is said to have the restricted isometry property of order $s$ and level $\delta\in(0,1)$ (equivalently, $(s,\delta)$-RIP) if
$$(1-\delta)\|\mathbf{x}\|_2^2 \le \|\boldsymbol{\Phi}\mathbf{x}\|_2^2 \le (1+\delta)\|\mathbf{x}\|_2^2 \tag{100}$$
for all $s$-sparse vectors $\mathbf{x}\in\mathbb{C}^n$. The restricted isometry constant $\delta_s$ is defined as the smallest value of $\delta$ for which (100) holds.
We have the following lemma that guarantees the RIP for partial Gaussian circulant matrices.
Lemma 12. (Adapted from [55, Theorem 4.1]) For $s\le n$ and $\eta,\delta\in(0,1)$, if
$$m \ge \Omega\Big(\frac{s}{\delta^2}\big(\log^2 s\big)\big(\log^2 n\big)\Big), \tag{101}$$
then with probability at least $1-e^{-\Omega((\log^2 s)(\log^2 n))}$, the restricted isometry constant of $\frac{1}{\sqrt{m}}\bar{\mathbf{A}} = \frac{1}{\sqrt{m}}\mathbf{R}_\Omega\mathbf{C}_g$ satisfies $\delta_s\le\delta$.
Based on the RIP, we have the Johnson-Lindenstrauss embedding according to the following lemma.
Lemma 13. ([56, Theorem 3.1]) Fix $\eta>0$ and $\varepsilon\in(0,1)$, and consider a finite set $E\subseteq\mathbb{C}^n$ of cardinality $|E|=p$. Set $s\ge 40\log\frac{4p}{\eta}$, and suppose that $\boldsymbol{\Phi}\in\mathbb{C}^{m\times n}$ satisfies the RIP of order $s$ and level $\delta\le\frac{\varepsilon}{4}$. Then, with probability at least $1-\eta$, we have
$$(1-\varepsilon)\|\mathbf{x}\|_2^2 \le \|\boldsymbol{\Phi}\mathbf{D}_\xi\mathbf{x}\|_2^2 \le (1+\varepsilon)\|\mathbf{x}\|_2^2 \tag{102}$$
uniformly for all $\mathbf{x}\in E$, where $\mathbf{D}_\xi = \mathrm{Diag}(\boldsymbol{\xi})$ is a diagonal matrix with respect to a Rademacher sequence $\boldsymbol{\xi}$.
Combining Lemmas 12 and 13 with the settings $s = \Omega(\log p)$ and $\eta = O\big(\frac{1}{p}\big)$, we derive the following corollary.
Corollary 1. Fix $\varepsilon\in(0,1)$, and consider a finite set $E\subseteq\mathbb{C}^n$ of cardinality $|E|=p$ satisfying $n=\Omega(\log p)$. Suppose that
$$m = \Omega\Big(\frac{\log p}{\varepsilon^2}\big(\log^2(\log p)\big)\big(\log^2 n\big)\Big). \tag{103}$$
Then, with probability $1-e^{-\Omega(\log p)}-e^{-\Omega((\log^2(\log p))(\log^2 n))}$, we have
$$(1-\varepsilon)\|\mathbf{x}\|_2^2 \le \Big\|\frac{1}{\sqrt{m}}\mathbf{A}\mathbf{x}\Big\|_2^2 \le (1+\varepsilon)\|\mathbf{x}\|_2^2 \tag{104}$$
for all $\mathbf{x}\in E$, where we recall that $\mathbf{A} = \mathbf{R}_\Omega\mathbf{C}_g\mathbf{D}_\xi = \bar{\mathbf{A}}\mathbf{D}_\xi$ (cf. (1)).
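As an illustration of the near-isometry above (a numerical sanity check with our own toy parameters and real-valued vectors, not part of the analysis), one can form a small randomly signed partial Gaussian circulant matrix explicitly and compare $\frac{1}{\sqrt{m}}\|\mathbf{A}\mathbf{x}\|_2$ with $\|\mathbf{x}\|_2$:

```python
import numpy as np

# Build a small randomly signed partial Gaussian circulant matrix
# A = R_Omega C_g D_xi with (C_g)_{ij} = g_{j-i} (indices mod n),
# and check that (1/sqrt(m)) * ||A x|| is close to ||x||.
rng = np.random.default_rng(0)
n, m = 512, 256
g = rng.standard_normal(n)                    # generator of C_g
C = np.array([[g[(j - i) % n] for j in range(n)] for i in range(n)])
xi = rng.choice([-1.0, 1.0], size=n)          # Rademacher signs D_xi
omega = rng.choice(n, size=m, replace=False)  # rows kept by R_Omega
A = C[omega] * xi
x = rng.standard_normal(n)
ratio = np.linalg.norm(A @ x) / (np.sqrt(m) * np.linalg.norm(x))
# ratio is close to 1 at these sizes
```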
Note that from Lemma 2 and a union bound over $[n]$, and by Lemma 4, we have that for any $t>0$, with probability $1-e^{-\Omega(t)}$, $\big\|\frac{1}{\sqrt{m}}\mathbf{A}\big\|_{2\to 2}\le \frac{n\sqrt{t+\log n}}{\sqrt{m}}$. Based on this bound for the spectral norm of $\frac{1}{\sqrt{m}}\mathbf{A}$ and Corollary 1, and by again using a well-established chaining argument (e.g., see [3, Lemma 4.1], [57, Section 3.4]), we obtain the following lemma, which guarantees the two-sided Set-Restricted Eigenvalue Condition (S-REC) [3] for partial Gaussian circulant matrices. This lemma will only be used to derive an upper bound for the term corresponding to adversarial noise.
Lemma 14. For any $\delta>0$ satisfying $Lr = \Omega(\delta n)$ and $n = \Omega\big(k\log\frac{Lr}{\delta}\big)$, and any $\alpha\in(0,1)$, if
$$m = \Omega\Big(\frac{k\log\frac{Lr}{\delta}}{\alpha^2}\Big(\log^2\Big(k\log\frac{Lr}{\delta}\Big)\Big)\big(\log^2 n\big)\Big), \tag{105}$$
then with probability $1-e^{-\Omega(k\log\frac{Lr}{\delta})}-e^{-\Omega((\log^2(k\log\frac{Lr}{\delta}))(\log^2 n))}$, it holds that
$$(1-\alpha)\|\mathbf{x}_1-\mathbf{x}_2\|_2-\delta \le \Big\|\frac{1}{\sqrt{m}}\mathbf{A}(\mathbf{x}_1-\mathbf{x}_2)\Big\|_2 \le (1+\alpha)\|\mathbf{x}_1-\mathbf{x}_2\|_2+\delta, \tag{106}$$
for all $\mathbf{x}_1,\mathbf{x}_2\in G(B_2^k(r))$.
We now provide the proof of Theorem 1.
Proof of Theorem 1. Because $\hat{\mathbf{x}}$ is a solution to (7) and $\mathbf{x}^*\in K := G(B_2^k(r))$, we have
$$\frac{1}{m}\tilde{\mathbf{b}}^T(\mathbf{A}\hat{\mathbf{x}}) \ge \frac{1}{m}\tilde{\mathbf{b}}^T(\mathbf{A}\mathbf{x}^*). \tag{107}$$
Recall that $\bar{\mathbf{A}} = \mathbf{R}_\Omega\mathbf{C}_g$ and $\bar{\mathbf{x}}^* = \mathbf{D}_\xi\mathbf{x}^*$ (cf. Section III). We have $\mathbf{A} = \mathbf{R}_\Omega\mathbf{C}_g\mathbf{D}_\xi = \bar{\mathbf{A}}\mathbf{D}_\xi$ and
$$\frac{1}{m}\tilde{\mathbf{b}}^T(\bar{\mathbf{A}}\mathbf{D}_\xi\hat{\mathbf{x}}) \ge \frac{1}{m}\tilde{\mathbf{b}}^T(\bar{\mathbf{A}}\bar{\mathbf{x}}^*), \tag{108}$$
which can equivalently be expressed as
$$\Big\langle\frac{1}{m}\bar{\mathbf{A}}^T\tilde{\mathbf{b}}-\lambda\bar{\mathbf{x}}^*,\mathbf{D}_\xi\hat{\mathbf{x}}-\bar{\mathbf{x}}^*\Big\rangle + \langle\lambda\bar{\mathbf{x}}^*,\mathbf{D}_\xi\hat{\mathbf{x}}-\bar{\mathbf{x}}^*\rangle \ge 0. \tag{109}$$
By the assumption that $G(B_2^k(r))\subseteq\mathcal{S}^{n-1}$, we obtain that $\|\mathbf{D}_\xi\hat{\mathbf{x}}\|_2 = \|\hat{\mathbf{x}}\|_2 = 1$, and since $\|\mathbf{x}^*\|_2 = 1$ as well,
$$\langle\lambda\bar{\mathbf{x}}^*,\bar{\mathbf{x}}^*-\mathbf{D}_\xi\hat{\mathbf{x}}\rangle = \lambda\langle\mathbf{x}^*,\mathbf{x}^*-\hat{\mathbf{x}}\rangle = \frac{\lambda}{2}\|\mathbf{x}^*-\hat{\mathbf{x}}\|_2^2. \tag{110}$$
Therefore, we obtain
$$\frac{\lambda}{2}\|\mathbf{x}^*-\hat{\mathbf{x}}\|_2^2 \le \Big|\Big\langle\frac{1}{m}\bar{\mathbf{A}}^T(\tilde{\mathbf{b}}-\mathbf{b}),\mathbf{D}_\xi\hat{\mathbf{x}}-\bar{\mathbf{x}}^*\Big\rangle\Big| + \Big|\Big\langle\frac{1}{m}\bar{\mathbf{A}}^T\mathbf{b}-\lambda\bar{\mathbf{x}}^*,\mathbf{D}_\xi\hat{\mathbf{x}}-\bar{\mathbf{x}}^*\Big\rangle\Big| \tag{111}$$
$$\le \Big\|\frac{1}{\sqrt{m}}(\tilde{\mathbf{b}}-\mathbf{b})\Big\|_2\cdot\Big\|\frac{1}{\sqrt{m}}\mathbf{A}(\hat{\mathbf{x}}-\mathbf{x}^*)\Big\|_2 + \Big|\Big\langle\frac{1}{m}\bar{\mathbf{A}}^T\mathbf{b}-\lambda\bar{\mathbf{x}}^*,\mathbf{D}_\xi\hat{\mathbf{x}}-\bar{\mathbf{x}}^*\Big\rangle\Big|. \tag{112}$$
Setting $\alpha=\frac{1}{2}$ in Lemma 14, we have that if $m = \Omega\big(k\log\frac{Lr}{\delta}\big(\log^2\big(k\log\frac{Lr}{\delta}\big)\big)(\log^2 n)\big)$, then with probability at least $1-e^{-\Omega(k\log\frac{Lr}{\delta})}-e^{-\Omega((\log^2(k\log\frac{Lr}{\delta}))(\log^2 n))}$,
$$\Big\|\frac{1}{\sqrt{m}}\mathbf{A}(\hat{\mathbf{x}}-\mathbf{x}^*)\Big\|_2 \le O\big(\|\hat{\mathbf{x}}-\mathbf{x}^*\|_2+\delta\big). \tag{113}$$
Moreover, by Lemma 11, with probability $1-e^{-\Omega(k\log\frac{Lr}{\delta})}-e^{-\Omega(\sqrt{m})}$, we have
$$\Big|\Big\langle\frac{1}{m}\bar{\mathbf{A}}^T\mathbf{b}-\lambda\bar{\mathbf{x}}^*,\mathbf{D}_\xi\hat{\mathbf{x}}-\bar{\mathbf{x}}^*\Big\rangle\Big| \le O\Big(\sqrt{\frac{k\log\frac{Lr}{\delta}}{m}}\Big)\big(\|\hat{\mathbf{x}}-\mathbf{x}^*\|_2+\delta\big). \tag{114}$$
Recall that we assume $\frac{1}{\sqrt{m}}\|\tilde{\mathbf{b}}-\mathbf{b}\|_2 \le \varsigma$. Combining (112), (113) and (114), we obtain with probability at least $1-e^{-\Omega(k\log\frac{Lr}{\delta})}-e^{-\Omega((\log^2(k\log\frac{Lr}{\delta}))(\log^2 n))}-e^{-\Omega(\sqrt{m})}$ that
$$\frac{\lambda}{2}\|\mathbf{x}^*-\hat{\mathbf{x}}\|_2^2 \le O\Big(\sqrt{\frac{k\log\frac{Lr}{\delta}}{m}}+\varsigma\Big)\big(\|\hat{\mathbf{x}}-\mathbf{x}^*\|_2+\delta\big). \tag{115}$$
Then, by considering both possible cases of which of the two terms in (115) is larger, we find that the desired result holds when $\delta = O\Big(\frac{\sqrt{k\log\frac{Lr}{\delta}}}{\lambda\sqrt{m}}+\frac{\varsigma}{\lambda}\Big)$.
V. NUMERICAL EXPERIMENTS
We empirically evaluate the correlation-based decoder on the MNIST [58] and celebA [59] datasets. The MNIST
dataset consists of 60,000 images of handwritten digits, each of size 28 × 28, so that n = 28 × 28 = 784. The CelebA dataset is a large-scale face attributes dataset of more than 200,000 images of celebrities. The input images are cropped to 64 × 64 RGB images, yielding a dimensionality of n = 64 × 64 × 3 = 12288. For simplicity, we assume that there is no adversarial noise. We make use of the practical iterative algorithm proposed in [44]:
$$\mathbf{x}^{(t+1)} = \mathcal{P}_G\big(\mathbf{x}^{(t)} + \mu\mathbf{A}^T\big(\mathbf{b} - \mathrm{sign}(\mathbf{A}\mathbf{x}^{(t)})\big)\big), \tag{116}$$
where $\mathcal{P}_G(\cdot)$ is the projection function onto $G(B_2^k(r))$, $\mathbf{b}$ is the uncorrupted observation vector, $\mathbf{x}^{(0)} = \mathbf{0}$, and $\mu > 0$ is a step-size parameter. As the performance differences among the three noisy 1-bit observation models (random bit flips, and Gaussian or logit noise added before quantization; cf. Section II) are mild, we only present the numerical results corresponding to the case of Gaussian noise.
We compare with 1-bit Lasso [60] and BIHT [61], which are sparsity-based algorithms. The sensing matrix A is either a randomly signed partial Gaussian circulant matrix or a standard Gaussian matrix. On the MNIST dataset, a generative model based on a variational auto-encoder (VAE) is used, and our method is accordingly denoted by 1b-VAE-C or 1b-VAE-G, where "C" refers to the partial Gaussian circulant matrix and "G" refers to the standard Gaussian matrix. The latent dimension of the VAE is k = 20. On the celebA dataset, a generative model based on Deep Convolutional Generative Adversarial Networks (DCGAN) is used, and our method is denoted by 1b-DCGAN-C or 1b-DCGAN-G, respectively. The latent dimension of the DCGAN is k = 100.
We follow [44] in performing projected gradient descent (PGD) to maximize a constrained correlation-based objective function. The total number of iterations is 300, with µ = 0.2. We use the Adam optimizer to perform the projection, with the number of steps and the learning rate adapted to the changes of $\mathbf{x}^{(t)}$: when $\mathbf{x}^{(t)}$ changes dramatically in the early phase of the optimization, we use a larger number of Adam optimization steps with a larger learning rate, and when $\mathbf{x}^{(t)}$ changes little in the later phase, we use a smaller number of steps with a smaller learning rate. Specifically, in the first 20 iterations, the number of Adam optimization steps is 30 with learning rate 0.3; from the 20th to the 100th iteration, it is 10 with learning rate 0.2; and from the 100th to the 300th iteration, it is 5 with learning rate 0.1. The generative models used in our experiments have the same architectures as those of [3]. The experiments are performed on test images that were not used when training the generative models. To reduce the impact of local minima, we choose the best estimate among 5 random restarts. On the MNIST dataset, the standard deviation σ of the Gaussian noise is chosen as 0.1; on the celebA dataset, σ is chosen as 0.01.
We observe from Figures 1, 2, and 3 that on both datasets, the sparsity-based methods 1-bit Lasso and BIHT attain poor reconstructions, while the generative model based method (116) attains mostly accurate reconstructions when the number of measurements is small. From Figures 1 and 3-a, we observe that for the MNIST dataset, the reconstruction performances of using a partial Gaussian circulant matrix and a standard Gaussian matrix are almost the same. For the celebA dataset, according to Figures 2 and 3-b, the reconstruction performances are again very similar.

[Figure 1 omitted: example reconstructions.]
Figure 1: Reconstruction results on MNIST with 200 measurements. (Rows: Original, 1b-Lasso-C, BIHT-C, 1b-VAE-G, 1b-VAE-C.)

[Figure 2 omitted: example reconstructions.]
Figure 2: Reconstruction results on celebA with 1000 measurements. (Rows: Original, 1b-Lasso-C, BIHT-C, 1b-GAN-G, 1b-GAN-C.)
[Figure 3 omitted: two panels of reconstruction error versus number of measurements.]
Figure 3: (a) Results on MNIST; (b) Results on celebA. Per-pixel reconstruction error versus the number of measurements for 1b-Lasso-C, BIHT-C, and the generative-prior methods (1b-VAE-G/1b-VAE-C on MNIST, 1b-DCGAN-G/1b-DCGAN-C on celebA). The vertical bars indicate 95% confidence intervals. The reconstruction error is calculated over 1000 images by averaging the per-pixel error in terms of the $\ell_2$ norm.

We also compare the running time of each iteration of (116) for different sensing matrices (standard Gaussian or partial Gaussian circulant). When A is a standard Gaussian matrix, the time complexity of calculating $\mathbf{A}\mathbf{x}^{(t)}$ and $\mathbf{A}^T\mathbf{b}$ is $O(mn)$. When A is a partial Gaussian circulant matrix, we can use the FFT to accelerate the matrix-vector multiplications, and the time complexity is reduced to $O(n\log n)$. Since the dimensionality of the image vector is relatively small, and the projection operator $\mathcal{P}_G(\cdot)$ is also time-consuming, the speedup from using a partial Gaussian circulant matrix in our experiments is not as significant as that in [24, Table 2]. However, from Table I, we can still observe that using a partial Gaussian circulant matrix is typically about 2 times faster than using a standard Gaussian matrix, and for the celebA dataset with ambient dimension n = 12288 and m = 5000, using partial Gaussian circulant matrices leads to around a 4 times speedup compared with standard Gaussian matrices. We expect a greater speedup for settings with higher dimensionality.

            MNIST (n = 784)        celebA (n = 12288)
            m = 200    m = 700     m = 200    m = 5000
Gaussian    51.4       113.6       704.8      5632.6
Circulant   33.6       57.8        378.3      1367.5

Table I: Average time comparison (ms) of different sensing matrices per iteration in (116). We compare the running time on a single 2.9GHz CPU core; a GPU can also be applied to further speed up the computation.
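The FFT-based acceleration can be sketched as follows. This is an illustrative snippet (not the code used in our experiments) under the entry convention $(C_g)_{ij} = g_{j-i}$ with indices taken mod $n$:

```python
import numpy as np

# Sketch of applying the partial Gaussian circulant matrix R_Omega C_g,
# with (C_g)_{ij} = g_{j-i} (indices mod n), via the FFT in O(n log n)
# time, versus O(mn) for an explicit matrix-vector product.
def make_partial_circulant_matvec(n, m, rng):
    g = rng.standard_normal(n)                             # generator of C_g
    omega = np.sort(rng.choice(n, size=m, replace=False))  # rows kept by R_Omega
    g_hat_conj = np.conj(np.fft.fft(g))
    def matvec(x):
        # (C_g x)_i = sum_j g_{j-i} x_j is a circular correlation with g,
        # i.e. a frequency-domain multiplication by conj(fft(g)).
        full = np.fft.ifft(g_hat_conj * np.fft.fft(x)).real
        return full[omega]                                 # keep m sampled rows
    return matvec, g, omega
```

The randomly signed matrix $\mathbf{R}_\Omega\mathbf{C}_g\mathbf{D}_\xi$ is then applied as `matvec(xi * x)` for a Rademacher vector `xi`.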
VI. CONCLUSION AND FUTURE WORK
We have established a sample complexity upper bound for noisy 1-bit compressed sensing for the case of using a
randomly signed partial Gaussian circulant sensing matrix and a generative prior. The sample complexity upper
bound matches that obtained previously for i.i.d. Gaussian sensing matrices, and our scheme is robust to adversarial
noise.
We focused only on non-uniform recovery guarantees, and uniform recovery is an immediate direction for further research. Relaxing the $\ell_\infty$ assumption and allowing for general scalings of $k$ (not only $k \lesssim n^{1/4}$) would also be of significant interest. Finally, handling Fourier measurements without column sign flips would be valuable for applications where the measurement matrix is constrained as such.
APPENDIX A
PROOF OF LEMMA 7
For any $\varepsilon>0$, from Lemma 6 and the union bound, with probability at least $1-m^2 e^{-\frac{\varepsilon^2}{8\rho^2}}$, it holds that
$$|\langle\mathbf{s}_i,\mathbf{s}_j\rangle| \le \varepsilon \tag{117}$$
for all $0\le i\ne j<m$. Setting $m^2 e^{-\frac{\varepsilon^2}{8\rho^2}} = \nu$, we obtain
$$\varepsilon = 2\sqrt{2}\,\rho\sqrt{\log\frac{m^2}{\nu}}. \tag{118}$$
Let $\mathbf{B} = [\mathbf{s}_0,\mathbf{s}_1,\ldots,\mathbf{s}_{m-1}]\in\mathbb{R}^{n\times m}$. Then, all the diagonal entries of $\mathbf{B}^T\mathbf{B}$ are 1, and the magnitudes of all the off-diagonal entries are upper bounded by $\varepsilon$. By the Gershgorin circle theorem [62], we have
$$\sigma_m(\mathbf{B}^T\mathbf{B}) \ge 1-(m-1)\varepsilon > 1-m\varepsilon, \tag{119}$$
where $\sigma_m(\cdot)$ denotes the $m$-th largest singular value of a matrix. For any $i\in\{1,\ldots,m-1\}$, and any unit vector $\mathbf{t}$ in the span of $\{\mathbf{s}_0,\ldots,\mathbf{s}_{i-1}\}$, we write $\mathbf{t}$ as $\mathbf{t} = \sum_{j<i}\alpha_j\mathbf{s}_j$. Let $\boldsymbol{\alpha}_i = [\alpha_0,\ldots,\alpha_{i-1}]^T\in\mathbb{R}^i$ and $\mathbf{B}_i = [\mathbf{s}_0,\mathbf{s}_1,\ldots,\mathbf{s}_{i-1}]\in\mathbb{R}^{n\times i}$. Similarly to (119), we have
$$\sigma_i(\mathbf{B}_i^T\mathbf{B}_i) \ge 1-(i-1)\varepsilon > 1-m\varepsilon. \tag{120}$$
Therefore, we obtain
$$1 = \|\mathbf{B}_i\boldsymbol{\alpha}_i\|_2^2 \ge \sigma_i^2(\mathbf{B}_i)\|\boldsymbol{\alpha}_i\|_2^2 = \sigma_i(\mathbf{B}_i^T\mathbf{B}_i)\|\boldsymbol{\alpha}_i\|_2^2 \ge (1-m\varepsilon)\|\boldsymbol{\alpha}_i\|_2^2, \tag{121}$$
and hence
$$\|\boldsymbol{\alpha}_i\|_2^2 \le \frac{1}{1-m\varepsilon}. \tag{122}$$
In addition,
$$\langle\mathbf{s}_i,\mathbf{t}\rangle^2 = \Big(\sum_{j<i}\alpha_j\langle\mathbf{s}_i,\mathbf{s}_j\rangle\Big)^2 \le \|\boldsymbol{\alpha}_i\|_2^2\sum_{j<i}\langle\mathbf{s}_i,\mathbf{s}_j\rangle^2 \tag{123}$$
$$\le \frac{1}{1-m\varepsilon}\cdot i\varepsilon^2 < \frac{m\varepsilon^2}{1-m\varepsilon}. \tag{124}$$
By definition, the squared length of the projection of $\mathbf{s}_i$ onto the span of $\{\mathbf{s}_0,\ldots,\mathbf{s}_{i-1}\}$ is equal to $\max\{\langle\mathbf{t},\mathbf{s}_i\rangle^2 : \mathbf{t}\in\mathrm{span}\{\mathbf{s}_0,\ldots,\mathbf{s}_{i-1}\},\ \|\mathbf{t}\|_2 = 1\}$. Hence, the projection of $\mathbf{s}_i$ onto the span of $\{\mathbf{s}_0,\ldots,\mathbf{s}_{i-1}\}$ has squared length at most $\frac{m\varepsilon^2}{1-m\varepsilon}$. Then, if
$$2\sqrt{2}\, m\rho\log\frac{m^2}{\nu} < \frac{1}{2}, \tag{125}$$
recalling that $\varepsilon = 2\sqrt{2}\,\rho\sqrt{\log\frac{m^2}{\nu}}$, we also have $m\varepsilon < \frac{1}{2}$, and
$$\frac{m\varepsilon^2}{1-m\varepsilon} < 2m\varepsilon^2 = 16m\rho^2\log\frac{m^2}{\nu} \tag{126}$$
$$= \big(2\sqrt{2}\,\rho\big)\times\Big(4\sqrt{2}\, m\rho\log\frac{m^2}{\nu}\Big) \le 2\sqrt{2}\,\rho < 4\rho. \tag{127}$$
We can now establish Lemma 7 by performing the following procedure on the vectors (essentially Gram-Schmidt orthogonalization):
1) Initialize: $\mathbf{u}_0 = \mathbf{s}_0$, $\mathbf{r}_0 = \mathbf{0}$.
2) For $i = 1, 2, \ldots, m-1$, set $\mathbf{r}_i$ to be the projection of $\mathbf{s}_i$ onto the span of $\{\mathbf{s}_0,\ldots,\mathbf{s}_{i-1}\}$, and $\mathbf{u}_i = \mathbf{s}_i - \mathbf{r}_i$.
From (118) and (127), we obtain the desired result.
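For concreteness, the two-step procedure above can be sketched as follows; this is an illustrative implementation using least-squares projections, not code from the paper.

```python
import numpy as np

# Sketch of the Appendix A procedure: r_i is the projection of s_i onto
# span{s_0, ..., s_{i-1}} and u_i = s_i - r_i (essentially Gram-Schmidt,
# keeping the decomposition s_i = u_i + r_i rather than normalizing).
def split_orthogonal(S):
    """S has columns s_0, ..., s_{m-1}; returns (U, R) with S = U + R."""
    n, m = S.shape
    U = np.zeros_like(S)
    R = np.zeros_like(S)
    U[:, 0] = S[:, 0]                       # u_0 = s_0, r_0 = 0
    for i in range(1, m):
        B_i = S[:, :i]                      # previous vectors s_0, ..., s_{i-1}
        coef, *_ = np.linalg.lstsq(B_i, S[:, i], rcond=None)
        R[:, i] = B_i @ coef                # projection onto their span
        U[:, i] = S[:, i] - R[:, i]         # residual, orthogonal to the span
    return U, R
```

By construction, each $\mathbf{u}_i$ is orthogonal to all earlier $\mathbf{s}_j$, which is the property used in the proof of Lemma 7.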
APPENDIX B
TABLE OF NOTATION FOR THE PROOF OF LEMMA 1
For ease of reference, the notation used in proving our most technical result, Lemma 1, is shown in Table II.
REFERENCES
[1] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing. Springer New York, 2013.
[2] M. J. Wainwright, High-dimensional statistics: A non-asymptotic viewpoint. Cambridge University Press, 2019, vol. 48.
[3] A. Bora, A. Jalal, E. Price, and A. G. Dimakis, “Compressed sensing using generative models,” in Int. Conf. Mach. Learn. (ICML), 2017,
pp. 537–546.
[4] M. Wainwright, “Sharp thresholds for high-dimensional and noisy sparsity recovery using `1-constrained quadratic programming (Lasso),”
IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2183–2202, May 2009.
[5] D. L. Donoho, A. Javanmard, and A. Montanari, “Information-theoretically optimal compressed sensing via spatial coupling and approximate
message passing,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7434–7464, Nov. 2013.
[6] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp, “Living on the edge: Phase transitions in convex programs with random data,” Inf.
Inference, vol. 3, no. 3, pp. 224–294, 2014.
[7] J. Wen, Z. Zhou, J. Wang, X. Tang, and Q. Mo, “A sharp condition for exact support recovery with orthogonal matching pursuit,” IEEE
Trans. Sig. Proc., vol. 65, no. 6, pp. 1370–1382, 2016.
[8] M. Wainwright, “Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting,” IEEE Trans. Inf. Theory,
vol. 55, no. 12, pp. 5728–5741, Dec. 2009.
[9] E. Arias-Castro, E. J. Candes, and M. A. Davenport, “On the fundamental limits of adaptive sensing,” IEEE Trans. Inf. Theory, vol. 59,
no. 1, pp. 472–481, Jan. 2013.
[10] E. J. Candes and M. A. Davenport, “How well can we estimate a sparse vector?” Appl. Comp. Harm. Analysis, vol. 34, no. 2, pp. 317–323,
2013.
[11] J. Scarlett and V. Cevher, “Limits on support recovery with probabilistic models: An information-theoretic framework,” IEEE Trans. Inf.
Theory, vol. 63, no. 1, pp. 593–620, 2017.
[12] ——, "An introductory guide to Fano's inequality with applications in statistical estimation," https://arxiv.org/1901.00555, 2019.
[13] Z. Liu and J. Scarlett, “Information-theoretic lower bounds for compressive sensing with generative models,” IEEE J. Sel. Areas Inf. Theory,
vol. 1, no. 1, pp. 292–303, 2020.
[14] A. Kamath, S. Karmalkar, and E. Price, “On the power of compressed sensing with generative models,” in Int. Conf. Mach. Learn. (ICML),
2020, pp. 5101–5109.
[15] Z. Liu, S. Ghosh, and J. Scarlett, “Towards sample-optimal compressive phase retrieval with sparse and generative priors,”
https://arxiv.org/2106.15358, 2021.
[16] P. T. Boufounos, “Reconstruction of sparse signals from distorted randomized measurements,” in Int. Conf. Acoust. Sp. Sig. Proc. (ICASSP),
2010, pp. 3998–4001.
[17] P. T. Boufounos and R. G. Baraniuk, “1-bit compressive sensing,” in Conf. Inf. Sci. Syst. (CISS), 2008, pp. 16–21.
[18] A. Gupta, R. Nowak, and B. Recht, “Sample complexity for 1-bit compressed sensing and sparse classification,” in Int. Symp. Inf. Theory
(ISIT). IEEE, 2010, pp. 1553–1557.
[19] R. Zhu and Q. Gu, “Towards a lower sample complexity for robust one-bit compressed sensing,” in Int. Conf. Mach. Learn. (ICML), 2015,
pp. 739–747.
Notation and explanations:
$\bar{\mathbf{A}} = \mathbf{R}_\Omega\mathbf{C}_g$: a partial Gaussian circulant matrix.
$\mathbf{A} = \mathbf{R}_\Omega\mathbf{C}_g\mathbf{D}_\xi$: a randomly signed partial Gaussian circulant matrix.
$i$: an integer in $[m]$.
$j$: an integer in $[n]$.
$\ell$: an integer in $[m]$.
$\theta$: $\theta(x) = \tau\cdot\mathrm{sign}(x)$ with $\mathbb{P}(\tau = -1) = p$ for random flips, and $\theta(x) = \mathrm{sign}(x+e)$ with $e\sim\mathcal{N}(0,\sigma^2)$ for Gaussian noise.
$\lambda$: $\lambda = \mathbb{E}[g\theta(g)]$ for $g\sim\mathcal{N}(0,1)$; $\lambda = (1-2p)\sqrt{\frac{2}{\pi}}$ for random flips and $\lambda = \sqrt{\frac{2}{\pi(1+\sigma^2)}}$ for Gaussian noise.
$b_i = \theta(\langle\mathbf{a}_i,\bar{\mathbf{x}}^*\rangle)$: the $i$-th observation (no adversarial noise).
$\bar{\mathbf{x}}^* = \mathbf{D}_\xi\mathbf{x}^*$: the product of a diagonal Rademacher matrix and the signal vector $\mathbf{x}^*$.
$\mathbf{g}$: a standard Gaussian vector in $\mathbb{R}^n$.
$a_{ij}$: the $(i,j)$-th entry of $\bar{\mathbf{A}}$; $a_{ij} = g_{j-i}$.
$\rho$: an upper bound for $\|\bar{\mathbf{x}}^*\|_\infty$; $\rho = O\big(\frac{1}{m(t+\log m)}\big)$.
$\beta = 2\sqrt{\rho}$: used in the $(m,\beta)$-orthogonality condition; $\beta = O\big(\frac{1}{\sqrt{m(t+\log m)}}\big)$.
$\mathbf{s}_i = s_{i\leftarrow}(\bar{\mathbf{x}}^*)$: the version of $\bar{\mathbf{x}}^*$ shifted to the left by $i$ positions.
$\mathbf{u}_i$: orthogonal component from the $(m,\beta)$-orthogonality of $\{\mathbf{s}_\ell\}_{\ell\in[m]}$.
$\mathbf{r}_i$: residual component from the $(m,\beta)$-orthogonality of $\{\mathbf{s}_\ell\}_{\ell\in[m]}$; $\max_i\|\mathbf{r}_i\|_2\le\beta$.
$e_i$: the $i$-th noise term for the case of Gaussian noise; $e_i\sim\mathcal{N}(0,\sigma^2)$.
$\omega\in\{0,1\}$: $\omega = 0$ for the case of random flips and $\omega = 1$ for the case of Gaussian noise.
$I_\eta^c$: the index set of $i\in[m]$ such that $|\langle\mathbf{g},\mathbf{u}_i\rangle+\omega e_i| > |\langle\mathbf{g},\mathbf{r}_i\rangle|$ with high probability.
$I_\eta$: $I_\eta = [m]\setminus I_\eta^c$.
$E$: high-probability event regarding $\mathbf{g}$ and $I_\eta$; see (44).
$Y_{ij}$: $Y_{ij} = a_{ij}\theta(\langle\mathbf{g},\mathbf{u}_i\rangle) - \lambda\frac{u_i(j-i)}{\|\mathbf{u}_i\|_2}$.
$Z_1, Z_2, Z_3$: notation used to shorten the expressions in (49)–(55).
$h_i$: $h_i = \big\langle\mathbf{g},\frac{\mathbf{u}_i}{\|\mathbf{u}_i\|_2}\big\rangle\sim\mathcal{N}(0,1)$.
$u_{j,i,\ell}$: $u_{j,i,\ell} = \frac{u_\ell(j-i)}{\|\mathbf{u}_\ell\|_2} = \mathrm{Cov}[g_{j-i}, h_\ell]$.
$w_{j,i}$: $w_{j,i} = \sqrt{1-\sum_{\ell\in[m]} u_{j,i,\ell}^2}$.
$t_{j,i}$: a standard Gaussian random variable that is independent of $\{h_\ell\}_{\ell\in[m]}$.
$\theta_i$: for random bit flips, $\theta_i = \theta$; for random noise added before quantization, $\theta_i(x) = \mathrm{sign}\big(x + \frac{e_i}{\|\mathbf{u}_i\|_2}\big)$.
$\lambda_i$: $\lambda_i := \mathbb{E}[\theta_i(g)g]$ for $g\sim\mathcal{N}(0,1)$.
Table II: A table of notation used in the proof of Lemma 1.
[20] L. Zhang, J. Yi, and R. Jin, “Efficient algorithms for robust one-bit compressive sensing,” in Int. Conf. Mach. Learn. (ICML), 2014, pp.
820–828.
[21] S. Gopi, P. Netrapalli, P. Jain, and A. Nori, “One-bit compressed sensing: Provable support and vector recovery,” in Int. Conf. Mach. Learn.
(ICML), 2013, pp. 154–162.
[22] A. Ai, A. Lapanowski, Y. Plan, and R. Vershynin, “One-bit compressed sensing with non-Gaussian measurements,” Linear Algebra Appl.,
vol. 441, pp. 222–239, 2014.
[23] J. Romberg, “Compressive sensing by random convolution,” SIAM J. Imaging Sci., vol. 2, no. 4, pp. 1098–1128, 2009.
[24] F. X. Yu, A. Bhaskara, S. Kumar, Y. Gong, and S.-F. Chang, “On binary embedding using circulant matrices,” J. Mach. Learn. Res., vol. 18,
no. 1, pp. 5507–5536, 2017.
[25] R. M. Gray, “Toeplitz and circulant matrices: A review,” Comm. Inf. Theory, vol. 2, no. 3, pp. 155–239, 2005.
[26] S. Dirksen, H. C. Jung, and H. Rauhut, “One-bit compressed sensing with partial Gaussian circulant matrices,” Inf. Inference, 2017.
[27] S. Dirksen and S. Mendelson, “Robust one-bit compressed sensing with partial circulant matrices,” https://arxiv.org/abs/1812.06719, 2018.
[28] S. Foucart, “Flavors of compressive sensing,” in Int. Conf. Approx. Theory (ICAT). Springer, 2016, pp. 61–104.
[29] K. Knudson, R. Saab, and R. Ward, “One-bit compressive sensing with norm estimation,” IEEE Trans. Inf. Theory, vol. 62, no. 5, pp.
2748–2758, 2016.
[30] C. Xu and L. Jacques, “Quantized compressive sensing with RIP matrices: The benefit of dithering,” Inf. Inference, vol. 9, no. 3, pp.
543–586, 2020.
[31] L. Jacques and V. Cambareri, “Time for dithering: Fast and quantized random embeddings via the restricted isometry property,” Inf.
Inference, vol. 6, no. 4, pp. 441–476, 2017.
[32] S. Dirksen and S. Mendelson, “Non-Gaussian hyperplane tessellations and robust one-bit compressed sensing,”
https://arxiv.org/abs/1805.09409, 2018.
[33] X. Yi, C. Caramanis, and E. Price, “Binary embedding: Fundamental limits and fast algorithm,” in Int. Conf. Mach. Learn. (ICML), 2015,
pp. 2162–2170.
[34] S. Oymak, C. Thrampoulidis, and B. Hassibi, “Near-optimal sample complexity bounds for circulant binary embedding,” in Int. Conf.
Acoust. Sp. Sig. Proc. (ICASSP). IEEE, 2017, pp. 6359–6363.
[35] S. Dirksen and A. Stollenwerk, “Fast binary embeddings with Gaussian circulant matrices: Improved bounds,” Discrete Comput. Geom.,
vol. 60, no. 3, pp. 599–626, 2018.
[36] P. Hand, O. Leong, and V. Voroninski, “Phase retrieval under a generative prior,” in Conf. Neur. Inf. Proc. Sys. (NeurIPS), 2018.
[37] D. Van Veen, A. Jalal, M. Soltanolkotabi, E. Price, S. Vishwanath, and A. G. Dimakis, “Compressed sensing with deep image prior and
learned regularization,” https://arxiv.org/1806.06438, 2018.
[38] R. Heckel et al., “Deep decoder: Concise image representations from untrained non-convolutional networks,” in Int. Conf. Learn. Repr.
(ICLR), 2019.
[39] P. E. Hand and B. Joshi, “Global guarantees for blind demodulation with generative priors,” Conf. Neur. Inf. Proc. Sys. (NeurIPS), vol. 32,
2019.
[40] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, “Deep learning techniques for inverse problems in
imaging,” IEEE J. Sel. Areas Inf. Theory, vol. 1, no. 1, pp. 39–56, 2020.
[41] A. Jalal, L. Liu, A. G. Dimakis, and C. Caramanis, “Robust compressed sensing using generative models,” Conf. Neur. Inf. Proc. Sys.
(NeurIPS), vol. 33, 2020.
[42] J. Whang, Q. Lei, and A. G. Dimakis, “Compressed sensing with invertible generative models and dependent noise,”
https://arxiv.org/2003.08089, 2020.
[43] J. Cocola, P. Hand, and V. Voroninski, “Nonasymptotic guarantees for spiked matrix recovery with generative priors,” in Conf. Neur. Inf.
Proc. Sys. (NeurIPS), vol. 33, 2020.
[44] Z. Liu, S. Gomes, A. Tiwari, and J. Scarlett, “Sample complexity bounds for 1-bit compressive sensing and binary stable embeddings with
generative priors,” in Int. Conf. Mach. Learn. (ICML), 2020, pp. 6216–6225.
[45] S. Qiu, X. Wei, and Z. Yang, “Robust one-bit recovery via ReLU generative networks: Improved statistical rates and global landscape
analysis,” in Int. Conf. Mach. Learn. (ICML), 2020, pp. 7857–7866.
[46] Z. Liu and J. Scarlett, “The generalized Lasso with nonlinear observations and generative priors,” in Conf. Neur. Inf. Proc. Sys. (NeurIPS),
vol. 33, 2020.
[47] Y. Plan and R. Vershynin, “Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach,” IEEE Trans.
Inf. Theory, vol. 59, no. 1, pp. 482–494, 2012.
[48] L. Jacques and C. D. Vleeschouwer, “Quantized iterative hard thresholding: Bridging 1-bit and high-resolution quantized compressed
sensing,” in Int. Conf. Samp. Theory Apps. (SampTA), 2013.
[49] Y. Plan, R. Vershynin, and E. Yudovina, “High-dimensional estimation with geometric constraints,” Inf. Inference, vol. 6, no. 1, pp. 1–40,
2017.
[50] J. Vybíral, “A variant of the Johnson–Lindenstrauss lemma for circulant matrices,” J. Funct. Anal., vol. 260, no. 4, pp. 1096–1105, 2011.
[51] R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” https://arxiv.org/abs/1011.3027, 2010.
[52] H. Rauhut, “Compressive sensing and structured random matrices,” Theoretical foundations and numerical methods for sparse recovery,
vol. 9, pp. 1–92, 2010.
[53] J. Bergh and J. Löfström, Interpolation spaces: An introduction. Springer Science & Business Media, 2012, vol. 223.
[54] D. L. Hanson and F. T. Wright, “A bound on tail probabilities for quadratic forms in independent random variables,” Ann. Math. Stat.,
vol. 42, no. 3, pp. 1079–1083, 1971.
[55] F. Krahmer, S. Mendelson, and H. Rauhut, “Suprema of chaos processes and the restricted isometry property,” Comm. Pure Appl. Math,
vol. 67, no. 11, pp. 1877–1904, 2014.
[56] F. Krahmer and R. Ward, “New and improved Johnson–Lindenstrauss embeddings via the restricted isometry property,” SIAM J. Math.
Anal., vol. 43, no. 3, pp. 1269–1281, 2011.
[57] G. Daras, J. Dean, A. Jalal, and A. G. Dimakis, “Intermediate layer optimization for inverse problems using deep generative models,”
https://arxiv.org/2102.07364, 2021.
[58] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11,
pp. 2278–2324, 1998.
[59] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in Int. Conf. Comput. Vis. (ICCV), 2015, pp. 3730–3738.
[60] Y. Plan and R. Vershynin, “One-bit compressed sensing by linear programming,” Comm. Pure Appl. Math., vol. 66, no. 8, pp. 1275–1297,
2013.
[61] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk, “Robust 1-bit compressive sensing via binary stable embeddings of sparse
vectors,” IEEE Trans. Inf. Theory, vol. 59, no. 4, pp. 2082–2102, 2013.
[62] C. F. Van Loan and G. H. Golub, Matrix computations. Johns Hopkins University Press Baltimore, 1983.