TRANSCRIPT
Sample Complexity of Estimating Entropy
Himanshu Tyagi, Indian Institute of Science, Bangalore
Joint work with Jayadev Acharya, Ananda Theertha Suresh, and Alon Orlitsky
Measuring Randomness in Data
Estimating the randomness of observed data arises in:
- Neural signal processing
- Feature selection for machine learning
- Image registration

Approach: estimate the "entropy" of the generating distribution.

Shannon entropy: $H(p) \stackrel{\mathrm{def}}{=} -\sum_x p_x \log p_x$
![Page 5: Sample Complexity of Estimating Entropyhtyagi/talks/talk_Simons.pdfSample Complexity of Estimating Shannon Entropy Focus only on the dependence of S(, ,k) on k I Asymptotically consistent](https://reader035.vdocuments.us/reader035/viewer/2022081615/5fe340e8125b8b5c796dbc53/html5/thumbnails/5.jpg)
Estimating Shannon Entropy
For an (unknown) distribution p with (unknown) support size k:
How many samples are needed to estimate H(p)?

PAC framework (large-deviation guarantees):
Let $X^n = X_1, \ldots, X_n$ denote n independent samples from p.
The performance of an estimator $\hat H$ is measured by

$S_{\hat H}(\delta, \epsilon, k) \stackrel{\mathrm{def}}{=} \min\big\{ n : p^n\big(|\hat H(X^n) - H(p)| < \delta\big) > 1 - \epsilon,\ \forall\, p \text{ with support size } k \big\}$

The sample complexity of estimating Shannon entropy is defined as

$S(\delta, \epsilon, k) \stackrel{\mathrm{def}}{=} \min_{\hat H} S_{\hat H}(\delta, \epsilon, k)$
Sample Complexity of Estimating Shannon Entropy
Focus only on the dependence of $S(\delta, \epsilon, k)$ on k:
- Asymptotically consistent and normal estimators: [Miller55], [Mokkadem89], [AntosK01]
- [Paninski03] For the empirical estimator $\hat H_e$: $S_{\hat H_e}(k) \le O(k)$
- [Paninski04] There exists an estimator $\hat H$ with $S_{\hat H}(k) \le o(k)$
- [ValiantV11] $S(k) = \Theta(k / \log k)$
  - The proposed estimator is constructive and is based on a linear program
  - See also [WuY14], [JiaoVW14] for new proofs

But we can estimate the distribution itself using O(k) samples.
Is it easier to estimate some other entropy?
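The empirical estimator referenced above is simply the plug-in rule: compute the Shannon entropy of the empirical frequencies. A minimal sketch (the function name is ours, for illustration; this is not the [ValiantV11] estimator):

```python
import math
from collections import Counter

def empirical_shannon(samples):
    """Plug-in estimate: the Shannon entropy (in nats) of the empirical distribution."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

# For a sample that is exactly balanced over two symbols, the estimate is log 2:
print(empirical_shannon([0, 1] * 500))  # -> 0.6931...
```

With n much larger than k the estimate is accurate; at n on the order of k it is biased downward, which is why going below O(k) samples requires more than the plug-in rule.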
Estimating Rényi Entropy
Definition. The Rényi entropy of order $\alpha$, $0 < \alpha \ne 1$, of a distribution p is given by

$H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_x p_x^\alpha$

Sample Complexity of Estimating Rényi Entropy

The performance of an estimator $\hat H$ is measured by

$S^\alpha_{\hat H}(\delta, \epsilon, k) \stackrel{\mathrm{def}}{=} \min\big\{ n : p^n\big(|\hat H(X^n) - H_\alpha(p)| < \delta\big) > 1 - \epsilon,\ \forall\, p \in \mathcal{P}_k \big\}$

The sample complexity of estimating Rényi entropy of order $\alpha$ is given by

$S_\alpha(\delta, \epsilon, k) \stackrel{\mathrm{def}}{=} \min_{\hat H} S^\alpha_{\hat H}(\delta, \epsilon, k)$

We mainly seek to characterize the dependence of $S_\alpha(\delta, \epsilon, k)$ on k and $\alpha$.
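The definition is easy to compute directly for a known distribution. A minimal sketch (function name ours): for the uniform distribution $H_\alpha = \log k$ for every order, and as $\alpha \to 1$ the value approaches the Shannon entropy.

```python
import math

def renyi_entropy(p, alpha):
    """H_alpha(p) = (1/(1-alpha)) * log(sum_x p_x^alpha), for 0 < alpha != 1 (in nats)."""
    assert alpha > 0 and alpha != 1
    return math.log(sum(px ** alpha for px in p)) / (1 - alpha)

# Uniform over 8 symbols: H_alpha = log 8 for any order alpha.
print(renyi_entropy([1 / 8] * 8, 2.0))  # -> 2.0794...
```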
Which Rényi Entropy is the Easiest to Estimate?
Notation:
- $S_\alpha(k) \ge \widetilde\Omega(k^\beta)$ means: for every $\eta > 0$ and all $\delta, \epsilon$ small, $S_\alpha(\delta, \epsilon, k) \ge k^{\beta-\eta}$ for all k large.
- $S_\alpha(k) \le O(k^\beta)$ means: there is a constant c depending on $\delta, \epsilon$ such that $S_\alpha(\delta, \epsilon, k) \le c\, k^\beta$ for all k large.
- $S_\alpha(k) = \Theta(k^\beta)$ means $\Omega(k^\beta) \le S_\alpha(k) \le O(k^\beta)$.

Theorem
- For every $0 < \alpha < 1$: $\widetilde\Omega(k^{1/\alpha}) \le S_\alpha(k) \le O(k^{1/\alpha} / \log k)$
- For every $1 < \alpha \notin \mathbb{N}$: $\widetilde\Omega(k) \le S_\alpha(k) \le O(k / \log k)$
- For every $1 < \alpha \in \mathbb{N}$: $S_\alpha(k) = \Theta(k^{1-1/\alpha})$
[Plot: the exponent $\log S_\alpha(k) / \log k$ as a function of $\alpha$, illustrating the theorem.]
![Page 16: Sample Complexity of Estimating Entropyhtyagi/talks/talk_Simons.pdfSample Complexity of Estimating Shannon Entropy Focus only on the dependence of S(, ,k) on k I Asymptotically consistent](https://reader035.vdocuments.us/reader035/viewer/2022081615/5fe340e8125b8b5c796dbc53/html5/thumbnails/16.jpg)
Related Work
The $\alpha$-th power sum of a distribution p is given by

$P_\alpha(p) \stackrel{\mathrm{def}}{=} \sum_x p_x^\alpha$

Estimating Rényi entropy with small additive error is the same as estimating the power sum with small multiplicative error.

- [Bar-YossefKS01] Integer moments of frequencies in a sequence, with multiplicative and additive accuracies
- [JiaoVW14] Estimating power sums with small additive error

For $\alpha < 1$: additive- and multiplicative-accuracy estimation have roughly the same sample complexity.
For $\alpha > 1$: additive-accuracy estimation requires only a constant number of samples.
The Estimators
Empirical or Plug-in Estimator
Given n samples $X_1, \ldots, X_n$, let $N_x$ denote the empirical frequency of x, and set

$\hat p_n(x) \stackrel{\mathrm{def}}{=} \frac{N_x}{n}, \qquad \hat H^e_\alpha \stackrel{\mathrm{def}}{=} \frac{1}{1-\alpha} \log \sum_x \hat p_n(x)^\alpha$

Theorem
For $\alpha > 1$: $S^\alpha_{\hat H^e}(\delta, \epsilon, k) \le O\big( \frac{k}{\delta^{\max\{4,\, 1/(\alpha-1)\}}} \log \frac{1}{\epsilon} \big)$
For $\alpha < 1$: $S^\alpha_{\hat H^e}(\delta, \epsilon, k) \le O\big( \frac{k^{1/\alpha}}{\delta^{\max\{4,\, 2/\alpha\}}} \log \frac{1}{\epsilon} \big)$

Proof?
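The plug-in rule above, sketched from sample counts (an illustrative implementation, names ours):

```python
import math
from collections import Counter

def plugin_renyi(samples, alpha):
    """Plug-in estimate H^e_alpha: Renyi entropy of the empirical frequencies N_x / n."""
    n = len(samples)
    power_sum = sum((c / n) ** alpha for c in Counter(samples).values())
    return math.log(power_sum) / (1 - alpha)

# A balanced two-symbol sample has power sum 2 * (1/2)^alpha; for alpha = 2
# this gives -log(1/2) = log 2:
print(plugin_renyi([0, 1] * 500, 2.0))  # -> 0.6931...
```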
Rényi Entropy Estimation to Power Sum Estimation
Estimating Rényi entropy with small additive error
is the same as
estimating the power sum with small multiplicative error.

By a well-known sequence of steps, it suffices to show that the bias and variance of $\hat p_n$ are multiplicatively small.
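The reduction can be checked numerically: a multiplicative $(1+\eta)$ error in the power sum becomes an additive $\log(1+\eta)/|1-\alpha|$ error in the entropy, independent of the value of $P_\alpha(p)$. A minimal sketch (the numbers are arbitrary):

```python
import math

alpha = 2.0
P_true = 0.05              # a hypothetical power sum P_alpha(p)
eta = 0.1                  # multiplicative error in the power-sum estimate
P_hat = (1 + eta) * P_true

H_true = math.log(P_true) / (1 - alpha)
H_hat = math.log(P_hat) / (1 - alpha)

# Additive entropy error = log(1 + eta) / |1 - alpha|, regardless of P_true.
assert abs(abs(H_hat - H_true) - math.log(1 + eta) / abs(1 - alpha)) < 1e-12
```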
Poisson Sampling
The empirical frequencies $N_x$ are correlated.
Suppose instead $N \sim \mathrm{Poi}(n)$ and $X_1, \ldots, X_N$ are independent samples from p. Then:
1. $N_x \sim \mathrm{Poi}(n p_x)$
2. $\{N_x,\ x \in \mathcal{X}\}$ are mutually independent
3. For each estimator $\hat H$, there is a modified estimator $\hat H'$ such that

$P\big(|H_\alpha(p) - \hat H'(X^n)| > \delta\big) \le P\big(|H_\alpha(p) - \hat H(X^N)| > \delta\big) + \frac{\epsilon}{2},$

where $N \sim \mathrm{Poi}(n/2)$ and $n \ge 8 \log(2/\epsilon)$.

It suffices to bound the error probability under Poisson sampling.
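Poisson sampling can be simulated symbol by symbol, since under $N \sim \mathrm{Poi}(n)$ the counts $N_x \sim \mathrm{Poi}(n p_x)$ are mutually independent. A sketch using Knuth's Poisson sampler (helper names ours):

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; fine for moderate lam."""
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

def poissonized_counts(n, probs, rng):
    """Counts under Poisson sampling: N_x ~ Poi(n * p_x), mutually independent."""
    return [sample_poisson(n * px, rng) for px in probs]

rng = random.Random(0)
trials = [poissonized_counts(100, [0.5, 0.3, 0.2], rng) for _ in range(2000)]
means = [sum(t[i] for t in trials) / len(trials) for i in range(3)]
# means is close to [50, 30, 20], matching E[N_x] = n * p_x.
```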
Performance of the Empirical Estimator
For the empirical estimator $\hat p_n$:

$\frac{1}{P_\alpha(p)} \Big| \mathbb{E}\Big[\frac{\sum_x N_x^\alpha}{n^\alpha}\Big] - P_\alpha(p) \Big| \le
\begin{cases}
c_1 \max\Big\{ \big(\frac{k}{n}\big)^{\alpha-1},\ \sqrt{\frac{k}{n}} \Big\}, & \alpha > 1, \\
c_2 \big(\frac{k^{1/\alpha}}{n}\big)^{\alpha}, & \alpha < 1
\end{cases}$

$\frac{1}{P_\alpha(p)^2} \mathrm{Var}\Big[\sum_x \frac{N_x^\alpha}{n^\alpha}\Big] \le
\begin{cases}
c_1' \max\Big\{ \big(\frac{k}{n}\big)^{2\alpha-1},\ \sqrt{\frac{k}{n}} \Big\}, & \alpha > 1, \\
c_2' \max\Big\{ \big(\frac{k^{1/\alpha}}{n}\big)^{\alpha},\ \sqrt{\frac{k}{n}},\ \frac{1}{n^{2\alpha-1}} \Big\}, & \alpha < 1
\end{cases}$

Theorem
For $\alpha > 1$: $S^\alpha_{\hat H^e}(\delta, \epsilon, k) \le O\big( \frac{k}{\delta^{\max\{4,\, 1/(\alpha-1)\}}} \log \frac{1}{\epsilon} \big)$
For $\alpha < 1$: $S^\alpha_{\hat H^e}(\delta, \epsilon, k) \le O\big( \frac{k^{1/\alpha}}{\delta^{\max\{4,\, 2/\alpha\}}} \log \frac{1}{\epsilon} \big)$
A Bias-Corrected Estimator
Consider an integer $\alpha > 1$, and let

$n^{\underline{\alpha}} = n(n-1)\cdots(n-\alpha+1)$, the $\alpha$-th falling power of n.

Claim: For $X \sim \mathrm{Poi}(\lambda)$, $\mathbb{E}[X^{\underline{\alpha}}] = \lambda^\alpha$.

Under Poisson sampling, an unbiased estimator of $P_\alpha(p)$ is

$\hat P^u_n \stackrel{\mathrm{def}}{=} \sum_x \frac{N_x^{\underline{\alpha}}}{n^\alpha}$

Our estimator for $H_\alpha(p)$ is $\hat H^u_n \stackrel{\mathrm{def}}{=} \frac{1}{1-\alpha} \log \hat P^u_n$
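The falling-power identity behind the bias correction can be verified directly from the Poisson pmf, and the estimator then follows. A sketch (function names ours):

```python
import math

def falling_power(x, a):
    """x^(a) = x * (x - 1) * ... * (x - a + 1), the a-th falling power."""
    out = 1
    for i in range(a):
        out *= x - i
    return out

def poisson_falling_moment(lam, a, terms=100):
    """E[X^(a)] for X ~ Poi(lam), by direct (truncated) summation over the pmf."""
    pmf, total = math.exp(-lam), 0.0
    for x in range(terms):
        total += falling_power(x, a) * pmf
        pmf *= lam / (x + 1)          # P(X = x+1) from P(X = x)
    return total

def unbiased_power_sum(counts, n, a):
    """sum_x N_x^(a) / n^a: unbiased for P_a(p) under Poisson sampling,
    since E[N_x^(a)] = (n * p_x)^a."""
    return sum(falling_power(c, a) for c in counts) / n ** a

# The claim E[X^(a)] = lam^a, checked numerically:
print(poisson_falling_moment(4.0, 2))  # close to 16.0
```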
Performance of the Bias-Corrected Estimator
For the bias-corrected estimator $\hat P^u_n$ and an integer $\alpha > 1$,

$\frac{1}{P_\alpha(p)^2} \mathrm{Var}[\hat P^u_n] \le \sum_{r=0}^{\alpha-1} \Big( \frac{\alpha^2 k^{1-1/\alpha}}{n} \Big)^{\alpha-r}$

Theorem
For integer $\alpha > 1$: $S^\alpha_{\hat H^u_n}(\delta, \epsilon, k) \le O\Big( \frac{k^{1-1/\alpha}}{\delta^2} \log \frac{1}{\epsilon} \Big)$

To summarize:
For every $0 < \alpha < 1$: $S_\alpha(k) \le O(k^{1/\alpha})$
For every $1 < \alpha \notin \mathbb{N}$: $S_\alpha(k) \le O(k)$
For every $1 < \alpha \in \mathbb{N}$: $S_\alpha(k) \le O(k^{1-1/\alpha})$
Constants are Small in Practice
Figure 1: Estimation of Rényi entropy of order 2 and order 1.5 using the bias-corrected estimator and the empirical estimator, respectively, for samples drawn from a uniform distribution. The boxplots display the estimated values for 100 independent experiments.
| | $\beta < 1$ | $\beta = 1$ | $\beta > 1$ |
|---|---|---|---|
| $\alpha\beta < 1$ | $\log k$ | $\frac{1-\alpha\beta}{1-\alpha} \log k$ | $\frac{1-\alpha\beta}{1-\alpha} \log k$ |
| $\alpha\beta = 1$ | $\frac{\alpha(1-\beta)}{\alpha-1} \log k$ | $\frac{1}{2} \log k$ | $\frac{1}{1-\alpha} \log \log k$ |
| $\alpha\beta > 1$ | $\frac{\alpha(1-\beta)}{\alpha-1} \log k$ | $\frac{\alpha}{\alpha-1} \log \log k$ | constant |

Table 1: The leading terms g(k) in the approximations $H_\alpha(Z_{\beta,k}) \sim g(k)$ for different values of $\alpha\beta$ and $\beta$. The case $\alpha\beta = 1$ and $\beta = 1$ corresponds to the Shannon entropy of $Z_{1,k}$.
In particular, for $\alpha > 1$,

$H_\alpha(Z_{1,k}) = \frac{\alpha}{\alpha-1} \log \log k + \Theta\Big(\frac{1}{k^{\alpha-1}}\Big) + c(\alpha),$

and the difference $|H_2(p) - H_{2+\epsilon}(p)|$ is $O(\epsilon \log \log k)$. Therefore, even for very small $\epsilon$ this difference is unbounded and approaches infinity in the limit as k goes to infinity. Figure 2 shows the performance of our estimators for samples drawn from $Z_{1,k}$.
Figures 1 and 2 above illustrate the estimation of Rényi entropy for $\alpha = 2$ and $\alpha = 1.5$ using the bias-corrected and the empirical estimators, respectively. As expected, for $\alpha = 2$ the estimation works quite well at $n = \sqrt{k}$, while it requires roughly k samples to work well for $\alpha = 1.5$. Note that the empirical estimator is negatively biased for $\alpha > 1$, and the figures above confirm this. Our goal in this work is to find the exponent of k in $S_\alpha(k)$, and as our results show, for noninteger $\alpha$ the empirical estimator attains the optimal exponent; we do not consider the possible improvement in performance from reducing the bias of the empirical estimator.

1.6 Organization

The rest of the paper is organized as follows. Section 2 presents basic properties of power sums of distributions and moments of Poisson random variables, which may be of independent interest.
Figure 2: Estimation of Rényi entropy of order 2 and order 1.5 using the bias-corrected estimator and the empirical estimator, respectively, for samples drawn from $Z_{1,k}$. The boxplots display the estimated values for 100 independent experiments.

The estimation algorithms are analyzed in Section 3: in Section 3.1 we show results on the empirical or plug-in estimator, in Section 3.2 we provide optimal results for integral $\alpha$, and we then provide an improved estimator for non-integral $\alpha > 1$. Finally, the lower bounds on the sample complexity of estimating Rényi entropy are established in Section 4.
2 Technical preliminaries

2.1 Bounds on power sums

Consider a distribution p over $[k] = \{1, \ldots, k\}$. Since Rényi entropy is a measure of randomness (see [Ren61] for a detailed discussion), it is maximized by the uniform distribution, and the following inequalities hold:

$0 \le H_\alpha(p) \le \log k, \qquad \alpha \ne 1,$

or equivalently

$1 \le P_\alpha(p) \le k^{1-\alpha},\ \alpha < 1 \qquad \text{and} \qquad k^{1-\alpha} \le P_\alpha(p) \le 1,\ \alpha > 1. \qquad (2)$

Furthermore, for $\alpha > 1$, $P_{\alpha+\beta}(p)$ and $P_{\alpha-\beta}(p)$ can be bounded in terms of $P_\alpha(p)$, using the monotonicity of norms and of Hölder means (see, for instance, [HLP52]).

Lemma 1. For every $0 < \alpha$, $P_{2\alpha}(p) \le P_\alpha(p)^2$. Further, for $\alpha > 1$ and $0 \le \beta \le \alpha$,

$P_{\alpha+\beta}(p) \le k^{(\alpha-1)(\alpha-\beta)/\alpha}\, P_\alpha(p)^2,$

and

$P_{\alpha-\beta}(p) \le k^{\beta}\, P_\alpha(p).$
Lower Bounds
The General Approach
To show $S_\alpha(\delta, \epsilon, k) \ge g(k)$ for all $\delta, \epsilon$ sufficiently small:
Show that there exist two distributions p and q such that
1. The support size of both p and q is k;
2. $|H_\alpha(p) - H_\alpha(q)| > \delta$;
3. For all $n < g(k)$, the variation distance $\|p^n - q^n\|$ is small.

We can replace $X^n$ with a sufficient statistic of $X^n$, replacing (3) with:
For all $n < g(k)$, the variation distance between the distributions of the sufficient statistic under p and q is small.
Distance between Profile Distributions
Definition. The profile of $X^n$ is $\Phi = (\Phi_1, \ldots, \Phi_n)$, where

$\Phi_i = \text{number of symbols appearing } i \text{ times in } X^n = \sum_x \mathbb{1}(N_x = i)$

Two simple observations:
1. The profile is a sufficient statistic for the probability multiset of p.
2. We can assume Poisson sampling without loss of generality.

Let $p^\Phi$ and $q^\Phi$ denote the distributions of the profile under Poisson sampling.

Theorem (Valiant08)
Given distributions p and q such that $\max_x \max\{p_x, q_x\} \le \frac{\epsilon}{40 n}$, for Poisson sampling with $N \sim \mathrm{Poi}(n)$, it holds that

$\|p^\Phi - q^\Phi\| \le \frac{\epsilon}{2} + 5 \sum_a n^a\, |P_a(p) - P_a(q)|.$
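Computing the profile of a sample is a two-line exercise (function name ours):

```python
from collections import Counter

def profile(samples):
    """Phi: maps i to the number of distinct symbols appearing exactly i times."""
    counts = Counter(samples)        # N_x for each observed symbol
    return Counter(counts.values())  # i -> Phi_i

# "abracadabra": 'a' appears 5 times, 'b' and 'r' twice, 'c' and 'd' once.
assert profile("abracadabra") == Counter({1: 2, 2: 2, 5: 1})
```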
Derivation of our Lower Bounds
For distributions p and q:
- $\|p^\Phi - q^\Phi\| \lesssim 5 \sum_a n^a\, |P_a(p) - P_a(q)|$
- $|H_\alpha(p) - H_\alpha(q)| = \frac{1}{|1-\alpha|} \Big| \log \frac{P_\alpha(p)}{P_\alpha(q)} \Big|$

Choose p and q to be mixtures of d uniform distributions, as follows:

$p_{ij} = \frac{|x_i|}{k\, \|x\|_1}, \qquad q_{ij} = \frac{|y_i|}{k\, \|y\|_1}, \qquad 1 \le i \le d,\ 1 \le j \le k$

Thus,

$\|p^\Phi - q^\Phi\| \lesssim 5 \sum_a \Big( \frac{n}{k^{1-1/a}} \Big)^a \bigg| \Big( \frac{\|x\|_a}{\|x\|_1} \Big)^a - \Big( \frac{\|y\|_a}{\|y\|_1} \Big)^a \bigg|$

$|H_\alpha(p) - H_\alpha(q)| = \frac{\alpha}{|1-\alpha|} \bigg| \log \Big( \frac{\|x\|_\alpha}{\|y\|_\alpha} \cdot \frac{\|y\|_1}{\|x\|_1} \Big) \bigg|$
Derivation of our Lower Bounds: Key Construction
Distributions with $\|x\|_r = \|y\|_r$ for all $1 \le r \le m-1$ cannot be distinguished with fewer than $k^{1-1/m}$ samples.
Distributions with $\|x\|_\alpha \ne \|y\|_\alpha$ have different $H_\alpha$.

Lemma
For every $d \ge 2$ and non-integer $\alpha$, there exist positive vectors $x, y \in \mathbb{R}^d$ such that

$\|x\|_r = \|y\|_r, \quad 1 \le r \le d-1,$
$\|x\|_d \ne \|y\|_d,$
$\|x\|_\alpha \ne \|y\|_\alpha.$
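For d = 2 the lemma is easy to instantiate concretely; the pair below is our own toy example, not the paper's construction: equal $\ell_1$ norms, but different $\ell_2$ and $\ell_\alpha$ norms.

```python
def norm(v, r):
    """The l_r norm (sum |v_i|^r)^(1/r)."""
    return sum(abs(t) ** r for t in v) ** (1.0 / r)

x = [1.0, 1.0]
y = [0.5, 1.5]
alpha = 2.5  # a non-integer order

assert norm(x, 1) == norm(y, 1)                     # ||x||_1 = ||y||_1 = 2
assert abs(norm(x, 2) - norm(y, 2)) > 1e-6          # ||x||_2 != ||y||_2
assert abs(norm(x, alpha) - norm(y, alpha)) > 1e-6  # ||x||_alpha != ||y||_alpha
```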
In Closing ...
[Plot: the exponent $\log S_\alpha(k) / \log k$ as a function of $\alpha$.]

Rényi entropy of order 2 is the "easiest" entropy to estimate, requiring only $O(\sqrt{k})$ samples.

Sample complexity of estimating other information measures