
arXiv:2108.07324v1 [cs.IT] 16 Aug 2021

First-Order Theory of Probabilistic Independence and Single-Letter Characterizations of Capacity Regions

Cheuk Ting Li
Department of Information Engineering
The Chinese University of Hong Kong
Email: [email protected]

Abstract

We consider the first-order theory of random variables with the probabilistic independence relation, which concerns statements consisting of random variables, the probabilistic independence symbol, logical operators, and existential and universal quantifiers. Although probabilistic independence is the only non-logical relation included, this theory is surprisingly expressive, and is able to interpret the true first-order arithmetic over natural numbers (and hence is undecidable). We also construct a single-letter characterization of the capacity region for a general class of multiuser coding settings (including broadcast channel, interference channel and relay channel), using a first-order formula. We then introduce the linear entropy hierarchy to classify single-letter characterizations according to their complexity.

I. INTRODUCTION

In this paper, we study the first-order theory of random variables with the probabilistic independence relation. We first review some fragments of the theory studied in the literature. The probabilistic independence implication problem studied by Geiger, Paz, and Pearl [1] and Matúš [2] concerns the problem of deciding whether a list of probabilistic independence statements among random variables implies another probabilistic independence statement, e.g. deciding whether X ⊥⊥ Y ∧ XY ⊥⊥ Z ⇒ X ⊥⊥ YZ, where ⊥⊥ denotes probabilistic independence, and juxtaposition XY denotes the joint random variable of X and Y. It was shown in [1] that probabilistic independence implication is finitely axiomatizable (the previous example is one of the axioms), and hence is algorithmically decidable.

The conditional independence implication problem [3], [4], [5], [6] generalizes the probabilistic independence implication problem by considering probabilistic conditional independence. Pearl and Paz [6] introduced the semi-graphoid axioms, which were proved to be incomplete by Studený [7]. As shown in [8], no finite axiomatization of probabilistic conditional independence is possible. Nevertheless, the semi-graphoid axioms are complete for saturated conditional independence statements [9], [10].

It is unknown whether the conditional independence implication problem is decidable [11], though some variants of this problem have been proved to be decidable or undecidable. If the cardinalities of all random variables are bounded, then it was shown by Niepert [12] that the problem is decidable (also see [13]). However, if only the cardinalities of some random variables are bounded, then it was proved by Li [14] that the problem is undecidable. Khamis, Kolaitis, Ngo and Suciu [11] showed that the general conditional independence implication problem is at most in Π⁰₁ in the arithmetical hierarchy.

Linear information inequalities [15] concern linear inequalities among entropy terms on the random variables. Pippenger [16] raised the question whether the axiom I(X; Y|Z) = H(XZ) + H(YZ) − H(XYZ) − H(Z) ≥ 0 is sufficient to characterize every true linear information inequality. This was answered by Zhang and Yeung [17], [18] in the negative, who showed the existence of non-Shannon-type inequalities not implied by the axiom. More non-Shannon-type inequalities were given in [19], [20], [21], [22], [23]. Linear information inequalities are closely related to the problem of finding the capacity region in network coding [24], [25], [26]. General logical combinations of linear inequalities (using ∧, ∨, ¬) are investigated in [11]. It is unknown whether the verification of conditional linear information inequalities is decidable [27], [28], [11], though if the problem is extended to allow affine inequalities, then it was shown in [14] that the problem is undecidable.

While all aforementioned problems are not existential (they are purely universal statements in the form ∀X^n.P(X^n), where P is a predicate, and all random variables X^n = (X1, . . . , Xn) are universally quantified, i.e., they are in the ∀∗-fragment of the first-order theory of random variables), existential results on random variables (concerning predicates on X^n in the form ∃U^m.P(X^n, U^m)) are widely used in information theory. For example, in network information theory [29], capacity regions are often expressed as statements concerning the existence of some auxiliary random variables. Some examples of useful existential formulae include the double Markov property [30] (X ⊥⊥ Z|Y denotes conditional independence)

X ⊥⊥ Z|Y ∧ Y ⊥⊥ Z|X ⇒ ∃U.(U ⊥⊥ U|X ∧ U ⊥⊥ U|Y ∧ XY ⊥⊥ Z|U),

the copy lemma [18], [23], and the functional representation lemma [29] (also see [31]). The existential theory of random variables with information inequalities has been studied systematically in [32].

There are examples of rate regions and bounds in network information theory expressed in a nested "for all, there exists" form (i.e., a predicate on X^n in the form ∀Y^k.P(X^n, Y^k) → ∃U^m.Q(X^n, Y^k, U^m)), e.g. the outer bound for multiterminal source coding in [33], the upper bound on key agreement in [34, Cor. 2], the outer bound for the source broadcast problem in [35, Thm 1], and the auxiliary receivers in [36]. In these examples, there are both existentially and universally quantified auxiliary random variables. They can be considered as formulae in (the ∀∗∃∗-fragment of) the first-order theory of random variables, with a larger depth of quantifier alternation compared to existential formulae.

Although the aforementioned special cases have been studied extensively, to the best of the author's knowledge, there has not been a systematic treatment of the general first-order theory of random variables with arbitrary depth of quantifier alternation (refer to the related work section). In this paper, we investigate the first-order theory of random variables with probabilistic independence (FOTPI), which concerns formulae consisting of variables (which represent random variables), the probabilistic independence symbol ⊥⊥, logical operators (∧, ∨, ¬) and existential and universal quantifiers (∃, ∀).¹ Even though probabilistic independence is the only non-logical relation included, we can use it to define other concepts in probability such as conditional independence, functional dependency, uniformity, cardinality and entropy. Therefore, most of the aforementioned problems can be considered as fragments of FOTPI. Since cardinality bounds can be defined using first-order formulae, as a corollary of [14], FOTPI is undecidable. Also, we show that FOTPI can interpret the true first-order arithmetic over natural numbers.

Furthermore, we prove that for any setting within a general class of multiuser coding settings (including broadcast channel [37], interference channel [38], relay channel [39], and finite-state Markov channel [40]²), the capacity region has a single-letter characterization that can be stated as a formula in FOTPI (here "single-letter" means the number of random variables in the formula is fixed). Whether this can be regarded as a solution to the open problem of finding single-letter characterizations of the capacity regions of the broadcast channel and interference channel depends on the definition of "single-letter characterization". While there is no generally accepted definition of what constitutes a single-letter characterization [44], it can be argued that if "for all, there exists" statements (e.g. [33], [34], [35], [36]) are considered single-letter, then there is no reason to exclude statements with a larger depth of quantifier alternation.

The single-letter characterization given in this paper is very complex (its depth of quantifier alternation is 17), which defeats the purpose of single-letter characterizations. Perhaps, instead of asking for any single-letter characterization of the capacity region of the broadcast/interference/relay channel, a more precise question is to find the simplest single-letter characterization, which will hopefully provide more insight into the optimal coding scheme.³ We propose a classification of first-order formulae, called the linear entropy hierarchy, according to their depth of quantifier alternation. We can then rigorously state the open problems of finding single-letter characterizations of the capacity regions of the aforementioned channels in the lowest possible level of the linear entropy hierarchy.

This paper is organized as follows. In Section II, we define some relations over random variables in FOTPI. In Section III, we show that FOTPI can interpret first-order arithmetic. In Section IV, we investigate definability in FOTPI. In Section V, we study a representation of events in FOTPI. In Section VI, we study a representation of random sequences in FOTPI. In Section VII, we present the main result, which is a single-letter characterization of the capacity region for a general class of multiuser coding settings. In Section VIII, we propose the linear entropy hierarchy as a classification of first-order formulae according to their complexity. In Section IX, we study the extension to continuous random variables.

A. Related Work

A computer program called PSITIP is described in [32], which is capable of expressing and verifying some first-order statements on random variables, though the paper [32] is focused only on existential and implication problems.

Regarding inner bounds of capacity regions of multiuser coding settings, several general inner bounds were studied in [45], [46], [47], [48]. These bounds can be regarded as purely existential formulae in FOTPI. The author is unaware of any general result on outer bounds of multiuser coding settings, though Gallager's strategy [49], a standard method for proving outer bounds, can be applied automatically by the PSITIP software [32] to discover outer bounds for general multiuser settings. The resultant bounds are also purely existential.

Khamis, Kolaitis, Ngo and Suciu [11] studied several classes of statements about conditional independence and linear inequalities on entropy (which are all purely universal statements), and gave upper bounds on their hardness in the arithmetical hierarchy, which is, loosely speaking, characterized by the depth of quantifier alternation in a first-order formula on natural numbers. In other words, [11] attempted to express purely universal first-order formulae on random variables as first-order formulae on natural numbers with the lowest depth of quantifier alternation (general first-order formulae on random variables are not studied in [11]). In comparison, this paper tries to express capacity regions as first-order formulae on random variables with the lowest depth of quantifier alternation.

¹We remark that this paper does not take an axiomatic approach. Theorems are not derived from a finite set of axioms of the first-order theory of probabilistic independence, but rather from the underlying model of random variables in a probability space (i.e., the axioms of the theory are taken to be the set of true first-order sentences about random variables). Since FOTPI can interpret the true first-order arithmetic, by Gödel's first incompleteness theorem, FOTPI is not recursively axiomatizable.

²This is perhaps surprising, considering that the capacity of the finite-state Markov channel is uncomputable [41], [42]. This is used in [43] to show that the capacity of the finite-state Markov channel does not admit a single-letter characterization in a certain form.

³Note that even purely existential predicates (which include most existing single-letter formulae for capacity regions) can be undecidable without cardinality bounds. See [14] and Proposition 14. Therefore, simplicity of a formula does not necessarily imply ease of computation.

Another undecidability result in information theory is the capacity of the finite-state Markov channel [41], [42]. Using this fact, Agarwal [43] showed that the capacity of the finite-state Markov channel does not admit a single-letter characterization expressible as a conjunction of linear inequalities in the form R ≤ ∑_i α_i I(U_{A_i}; U_{B_i} | U_{C_i}) (where U_{A_i} = {U_a}_{a∈A_i} and A_i, B_i, C_i ⊆ [n]) and polynomial constraints on the joint probability mass function, where the alphabets of all auxiliary random variables are fixed. This definition of single-letter characterization shares some similarities with the first existential level of the linear entropy hierarchy in this paper (refer to Section VIII and Remark 16 for the similarities and differences).

We also remark that the graphoid and separoid axioms are expressed using first-order languages in the work by Córdoba-Sánchez, Bielza, and Larranaga [50], though [50] has not studied the expressive power of a language with only probabilistic independence.

The notion of normalized entropy vectors was introduced in [51], which used it to give a characterization of the capacity region of a communication network, which is single-letter in a certain sense. However, it appears that representing a discrete memoryless multiuser channel in the setting in [51] is not entirely straightforward.⁴

Note that FOTPI in this paper is not the same as the first-order probabilistic logic [52], [53], [54], which is used in knowledge representation. We are interested in the information contained in the random variables instead of their values (i.e., random variables are considered equivalent under relabelling), whereas first-order probabilistic logic concerns the knowledge on the values of the random variables.

Notations

We write N+ := {1, 2, . . .}, N0 := {0, 1, 2, . . .}, [a..b] := [a, b] ∩ Z, [n] := [1..n]. The uniform distribution over the set S is denoted as Unif(S). The Bernoulli distribution is denoted as Bern(a) (which is 1 with probability a, 0 with probability 1 − a).

II. RELATIONS OVER RANDOM VARIABLES

For simplicity, all random variables are assumed to be discrete unless otherwise stated (the case for general random variables is discussed in Section IX). Since we only consider discrete random variables, we may assume that they are all defined in the standard probability space ([0, 1], F, P) (where P is the Lebesgue measure, and F is the σ-algebra of Lebesgue measurable subsets of [0, 1]). Therefore, in this paper, the set of all random variables is taken to be the set of measurable functions from [0, 1] to N+. Let this set be M. Since the labelling of the random variable does not matter in the probabilistic independence relation, we may also regard a random variable as a finite or countably-generated σ-subalgebra of F, though we will use random variables instead of σ-subalgebras in this paper for notational simplicity.

The FOTPI is the first-order theory of (M, ⊥⊥), where ⊥⊥ stands for probabilistic independence between two random variables. This theory consists of all first-order sentences ψ where M |= ψ (i.e., it is the complete theory Th(M, ⊥⊥) of the system of random variables M).

We write X ι≤ Y for the condition that X is (almost surely) a function of Y (i.e., there exists a function f such that X = f(Y) almost surely). This can be expressed using independence as

X ι≤ Y := ∀U.(U ⊥⊥ Y → U ⊥⊥ X). (1)

It is clear that X ι≤ Y implies ∀U.(U ⊥⊥ Y → U ⊥⊥ X). For the other direction, if X ι≤ Y does not hold, then there exist x0, y0 such that 0 < pX|Y(x0|y0) < 1. Let U ∈ {0, 1} be a random variable such that

pU|X,Y(1|x, y) = 1 if y = y0, x = x0;  0 if y = y0, x ≠ x0;  pX|Y(x0|y0) if y ≠ y0.

It is clear that U ⊥⊥ Y and U is not independent of X.
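As a numerical illustration (not part of the paper; the joint pmf of (X, Y) below is an arbitrary example), the following sketch builds U according to the recipe above and verifies that U ⊥⊥ Y while U and X are dependent.

```python
# Example joint pmf of (X, Y): X is not a function of Y since p(X|Y=0) is not degenerate.
p_xy = {(0, 0): 0.25, (1, 0): 0.25, (0, 1): 0.5}
x0, y0 = 0, 0
p_x0_given_y0 = p_xy[(x0, y0)] / sum(p for (x, y), p in p_xy.items() if y == y0)

def p_u1(x, y):
    # P(U = 1 | X = x, Y = y) as in the construction above
    if y == y0:
        return 1.0 if x == x0 else 0.0
    return p_x0_given_y0

# Joint pmf of (U, X, Y)
p_uxy = {(u, x, y): p * (p_u1(x, y) if u == 1 else 1 - p_u1(x, y))
         for (x, y), p in p_xy.items() for u in (0, 1)}

def marginal(keep):
    # marginal pmf over the coordinates named in `keep` (letters from "uxy", in that order)
    m = {}
    for (u, x, y), p in p_uxy.items():
        key = tuple(v for v, name in zip((u, x, y), "uxy") if name in keep)
        m[key] = m.get(key, 0.0) + p
    return m

def independent(a, b):
    # check pairwise independence of coordinates a and b (a must precede b in "uxy")
    pa, pb, pab = marginal(a), marginal(b), marginal(a + b)
    return all(abs(pab.get(ka + kb, 0.0) - qa * qb) < 1e-12
               for ka, qa in pa.items() for kb, qb in pb.items())

print(independent("u", "y"))  # True:  U is independent of Y
print(independent("u", "x"))  # False: U is not independent of X
```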

We write X ι= Y for the condition that X is informationally equivalent to Y (i.e., there exists an injective function f such that X = f(Y) almost surely). This can be expressed as

X ι= Y := X ι≤ Y ∧ Y ι≤ X.

⁴It appears that the channel constraint in [51] is for one-shot communication. While one may apply [51] on the infinite product channel, and use the cardinality of the output to normalize the rate against the number of channel uses, such an approach would not be considered single-letter in the usual sense.

We use juxtaposition XY to denote the joint random variable of X and Y (while we assume random variables take values over natural numbers, we may apply any bijection from pairs of values of X, Y to the natural numbers, since any two bijections are equivalent under ι=). The joint random variable can be characterized by the lattice join operation [55]:

Z ι= XY ⇔ X ι≤ Z ∧ Y ι≤ Z ∧ ∀U.((X ι≤ U ∧ Y ι≤ U) → Z ι≤ U). (2)

Therefore, it is not necessary to include the joint random variable in the language of FOTPI, though we will still use the notation XY for the Z satisfying the above formula for notational simplicity.

Mutual independence among random variables X1, . . . , Xn can be expressed as

X1 ⊥⊥ · · · ⊥⊥ Xn ⇔ ∧_{i=2}^{n} (Xi ⊥⊥ (X1 · · · Xi−1)).

Write X ⊥⊥ Y|Z for the condition that X, Y are conditionally independent given Z. Using the functional representation lemma [29], we can express conditional independence as

X ⊥⊥ Y|Z ⇔ ∃U. U ⊥⊥ XZ ∧ Y ι≤ ZU. (3)

We also define

X ι= Y := X ι≤ Y ∧ Y ι≤ X,
X ι≠ Y := ¬(X ι= Y),
X ι< Y := X ι≤ Y ∧ ¬(Y ι≤ X).

III. REPRESENTATION OF INTEGERS

In this section, we describe a representation of integers as random variables. We show that the first-order theory of arithmetic over nonnegative integers is interpretable in the first-order theory of probabilistic independence.

A. Uniformity

To test whether X is uniformly distributed over its support, we use the result in [17] that if X, Y, Z are discrete random variables such that any one of them is a function of the other two, and they are pairwise independent, then they are all uniformly distributed over their supports, which have the same size. Using the notations in [14], the condition that X is uniformly distributed over its support can be expressed as

unif(X) := ∃Y, Z. triple(X, Y, Z),

where

triple(X, Y, Z) := X ι≤ YZ ∧ Y ι≤ XZ ∧ Z ι≤ XY ∧ X ⊥⊥ Y ∧ X ⊥⊥ Z ∧ Y ⊥⊥ Z.
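As a numerical illustration (not from the paper; the alphabet size k and the modular-sum witness are an assumed example), the canonical witness for triple(X, Y, Z) is X, Y independent and uniform with Z = (X + Y) mod k: each variable is a function of the other two, and the three are pairwise independent.

```python
from itertools import product

k = 5
# Joint pmf of (X, Y, Z) with X, Y ~ Unif{0..k-1} independent and Z = (X + Y) mod k.
p = {(x, y, (x + y) % k): 1.0 / k**2 for x, y in product(range(k), repeat=2)}

def marginal(idx):
    m = {}
    for key, pr in p.items():
        sub = tuple(key[i] for i in idx)
        m[sub] = m.get(sub, 0.0) + pr
    return m

def pairwise_independent(i, j):
    pi, pj, pij = marginal((i,)), marginal((j,)), marginal((i, j))
    return all(abs(pij.get((a, b), 0.0) - pa * pb) < 1e-12
               for (a,), pa in pi.items() for (b,), pb in pj.items())

# Each variable is a function of the other two (Z = (X+Y) mod k, X = (Z-Y) mod k,
# Y = (Z-X) mod k), and they are pairwise independent, so by the result of [17]
# all three are uniform over supports of the same size k.
print(all(pairwise_independent(i, j) for i, j in [(0, 1), (0, 2), (1, 2)]))   # True
pz = marginal((2,))
print(len(pz) == k and all(abs(v - 1.0 / k) < 1e-12 for v in pz.values()))    # True: Z uniform
```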

B. Cardinality

We write 𝒳 for the support of X, and |𝒳| for the cardinality of 𝒳. To test whether X is (at most) a binary random variable (i.e., |𝒳| ≤ 2), note that any random variable with strictly less information than X must be degenerate, and hence the condition that X is (at most) a binary random variable can be expressed as

card≤2(X) := ∀U.(U ι< X → U ι= ∅).

By U ι= ∅, we mean that U is informationally equivalent to the constant random variable. This can be expressed without introducing a new constant ∅ by U ι= ∅ ⇔ U ⊥⊥ U.

If X has cardinality at most n, then any random variable with strictly less information than X has cardinality at most n − 1. Therefore, the condition that |𝒳| ≤ n (n ≥ 2) can be defined recursively as

card≤n(X) := ∀U.(U ι< X → card≤n−1(U)), (4)
card≤1(X) := (X ι= ∅).

We can then define

card=n(X) := card≤n(X) ∧ ¬card≤n−1(X), (5)
card≥n(X) := ¬card≤n−1(X).
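The recursion (4) can be unrolled mechanically. The small sketch below (an illustration only, using ad-hoc ASCII notation "i<" and "i=" for ι< and ι=, and arbitrary fresh variable names U0, U1, . . .) prints the fully expanded first-order formula for a given n.

```python
def card_le(n: int, var: str = "X", depth: int = 0) -> str:
    """Unroll card_{<=n}(X) from (4): every U with strictly less information than
    X must satisfy card_{<=n-1}(U); the base case card_{<=1}(X) says X is
    informationally equivalent to a constant."""
    if n == 1:
        return f"({var} i= const)"
    u = f"U{depth}"
    return f"(forall {u}. ({u} i< {var}) -> {card_le(n - 1, u, depth + 1)})"

print(card_le(3))
# (forall U0. (U0 i< X) -> (forall U1. (U1 i< U0) -> (U1 i= const)))
```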

C. Relations over Integers

Given the tests for uniformity and cardinality, a natural way to represent a positive integer k as a random variable is to represent it as a uniformly distributed random variable with cardinality k. In this section, we express several relations over positive integers using first-order formulae. Note that some of these formulae have appeared in [14].

• (Equality) The formula for checking |𝒳| = |𝒴| for uniform X, Y is given by [14]:

ueq(X, Y) := ∃U1, U2, U3. triple(X, U1, U2) ∧ triple(Y, U1, U3).

We also define

ueqn(X) := unif(X) ∧ card=n(X)

to check for equality against constants.

• (Multiplication) The formula for checking |𝒵| = |𝒳||𝒴| for uniform X, Y is given by [14]:

uprod(X, Y, Z) := ∃X̄, Ȳ.(ueq(X, X̄) ∧ ueq(Y, Ȳ) ∧ X̄ ⊥⊥ Ȳ ∧ X̄Ȳ ι= Z).

• (Comparison) The formula for checking |𝒳| ≤ |𝒴| for uniform X, Y is given by [14] (with slight modification):

ule(X, Y) := ∃G, Ȳ.(uprod(X, Y, G) ∧ ueq(Y, Ȳ) ∧ G ι≤ YȲ).

We briefly repeat the reason given in [14] here. Note that uprod(X, Y, G) ∧ ueq(Y, Ȳ) ∧ G ι≤ YȲ implies |𝒳||𝒴| = |𝒢| ≤ |𝒴|², which implies |𝒳| ≤ |𝒴|. For the other direction, assume 𝒳 = {0, . . . , |𝒳| − 1}, 𝒴 = {0, . . . , |𝒴| − 1}, |𝒳| ≤ |𝒴|. Take G = (Y, X̄), where X̄ ∼ Unif(𝒳) is independent of Y. Take Ȳ = (X̄ + Y) mod |𝒴|. It is clear that G ι≤ YȲ (a numerical sketch of this construction is given after this list). Therefore, the formula for strict inequality |𝒳| < |𝒴| is

ult(X, Y) := ¬ule(Y, X).

Also define uge and ugt similarly. We define

ulen(X) := ∃U. ueqn(U) ∧ ule(X, U),

and similarly for ultn, ugen, ugtn.

• (Divisibility) The condition that |𝒴| is divisible by |𝒳| for uniform X, Y can be expressed as

udiv(X, Y) := ∃U. uprod(X, U, Y).

The condition that |𝒳| is a prime number is

uprime(X) := ¬∃U, V.(U ι≠ ∅ ∧ V ι≠ ∅ ∧ uprod(U, V, X)).

• (Successor) The condition that |𝒴| = |𝒳| + 1 for uniform X, Y can be expressed as

usucc(X, Y) := ult(X, Y) ∧ ∀U.(ult(X, U) → ule(Y, U)).
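As promised in the comparison item above, here is a small numerical check (an example, not from the paper) of the construction behind ule(X, Y): with |𝒳| ≤ |𝒴|, taking G = (Y, X̄) with X̄ ∼ Unif(𝒳) independent of Y, and Ȳ = (X̄ + Y) mod |𝒴|, the pair (Y, Ȳ) determines G.

```python
from itertools import product

nx, ny = 3, 5          # |X| <= |Y|, arbitrary example sizes
recover = {}
ok = True
for y, xbar in product(range(ny), range(nx)):
    g = (y, xbar)                   # G = (Y, Xbar)
    ybar = (xbar + y) % ny          # Ybar = (Xbar + Y) mod |Y|
    # G must be a function of (Y, Ybar): the same (y, ybar) never maps to two different g.
    if (y, ybar) in recover and recover[(y, ybar)] != g:
        ok = False
    recover[(y, ybar)] = g

print(ok)                           # True: G is a function of (Y, Ybar)
print(len(recover) == nx * ny)      # True: |G| = |X||Y| realizations, at most |Y|^2
```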

D. Interpreting First-order Arithmetic

In order to interpret the first-order theory of arithmetic, it is left to define addition. Given a discrete random variable X, we call Y a single-mass indicator of X if there exists x such that P(X = x) > 0 and Y ι= 1{x}(X) (i.e., Y is the indicator function of X = x). This can be characterized by the first-order formula

smi(X, Y) := (X ι= Y ι= ∅) ∨ (Y ι≤ X ∧ card=2(Y) ∧ ∀U.(U ι≤ X ∧ card=4(U) → ¬∃V.(card≤2(V) ∧ U ι≤ YV))). (6)

To check this, note that if Y ι= 1{x}(X) and U ι≤ X ∧ card=4(U), then there are at least 3 possible values of U given Y = 0, and there does not exist V with card≤2(V) ∧ U ι≤ YV. For the other direction, assume Y ∈ {0, 1} is binary but is not a single-mass indicator of X. We can therefore construct U ι≤ X which takes two possible values given Y = 0 or Y = 1, and there exists V with card≤2(V) ∧ U ι≤ YV (which indicates which of the two values U takes).

Consider

frac(X, Y, Z, U) := (ueq2(U) ∧ uprod(X, U, Z) ∧ uprod(Y, U, Z)) ∨ ∃X̄, Ȳ.(ueq(X, X̄) ∧ ueq(Y, Ȳ) ∧ unif(Z) ∧ card=2(U) ∧ ¬unif(U) ∧ U ι≤ Z ∧ X̄ ⊥⊥ Ȳ ⊥⊥ U ∧ Z ι≤ X̄ȲU ∧ ∀V.(smi(Z, V) → smi(X̄U, V) ∨ smi(ȲU, V))). (7)

We will show that frac(X, Y, Z, U) holds if and only if X, Y, Z are uniform, |𝒵| = |𝒳| + |𝒴|, and U ∼ Bern(|𝒳|/(|𝒳| + |𝒴|)). Note that the first case ueq2(U) ∧ uprod(X, U, Z) ∧ uprod(Y, U, Z) checks for |𝒳| = |𝒴| = |𝒵|/2. The second case checks for |𝒵| = |𝒳| + |𝒴|, |𝒳| ≠ |𝒴|. To check the second case, note that if |𝒵| = |𝒳| + |𝒴| (assume 𝒳 = [|𝒳|] and the same for 𝒴, 𝒵), then we can let U ∼ Bern(|𝒳|/(|𝒳| + |𝒴|)), X̄ ∼ Unif[|𝒳|], Ȳ ∼ Unif[|𝒴|], X̄ ⊥⊥ Ȳ ⊥⊥ U and Z = X̄ if U = 1, Z = Ȳ + |𝒳| if U = 0. It is straightforward to check Z ∼ Unif[|𝒳| + |𝒴|], Z ι≤ X̄ȲU and ∀V.(smi(Z, V) → smi(X̄U, V) ∨ smi(ȲU, V)).

For the other direction, assume frac(X, Y, Z, U) holds and |𝒳| ≠ |𝒴|. Let U ∼ Bern(θ), θ ∈ (0, 1)\{1/2}. Note that if smi(Z, V), then V ∼ Bern(1/|𝒵|). Also, smi(X̄U, V) ∨ smi(ȲU, V) holds only if V ∼ Bern(φ) where φ = θ/|𝒳|, θ/|𝒴|, (1 − θ)/|𝒳|, or (1 − θ)/|𝒴|. Since θ/|𝒳| ≠ (1 − θ)/|𝒳|, we either have 1/|𝒵| = θ/|𝒳| = (1 − θ)/|𝒴|, 1/|𝒵| = θ/|𝒳| = 1 − (1 − θ)/|𝒴| or 1/|𝒵| = 1 − θ/|𝒳| = 1 − (1 − θ)/|𝒴| (the other cases are similar by symmetry). The first case gives θ = |𝒳|/(|𝒳| + |𝒴|), |𝒵| = |𝒳| + |𝒴|. For the second case, it implies |𝒵| ≥ 2, and hence |𝒴| = 1. Since θ/|𝒳| = 1 − (1 − θ)/|𝒴|, we have |𝒳| = 1. Since Z ι≤ X̄ȲU, we have |𝒵| = 2, and U ∼ Bern(1/2), giving a contradiction. For the third case, it implies |𝒵| ≥ 2, |𝒳| = |𝒴| = 1. Since 1 − θ/|𝒳| = 1 − (1 − θ)/|𝒴|, we have θ = 1/2, giving a contradiction. Hence, only the first case is possible, and |𝒵| = |𝒳| + |𝒴|.

We can therefore define addition as follows. The formula for checking |𝒵| = |𝒳| + |𝒴| for uniform X, Y, Z is given by

usum(X, Y, Z) := ∃U. frac(X, Y, Z, U).
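The construction used in the argument for (7) is easy to check numerically. The sketch below (an example with |𝒳| = 2, |𝒴| = 3, chosen arbitrarily; not from the paper) builds Z from X̄, Ȳ, U and verifies that Z is uniform over |𝒳| + |𝒴| values and determined by (X̄, Ȳ, U).

```python
from fractions import Fraction
from itertools import product

nx, ny = 2, 3
theta = Fraction(nx, nx + ny)            # U ~ Bern(|X|/(|X|+|Y|))

# Joint pmf of (Xbar, Ybar, U, Z) with Z = Xbar if U = 1, and Z = Ybar + |X| if U = 0.
p = {}
for x, y, u in product(range(nx), range(ny), (0, 1)):
    z = x if u == 1 else y + nx
    pr = Fraction(1, nx) * Fraction(1, ny) * (theta if u == 1 else 1 - theta)
    p[(x, y, u, z)] = p.get((x, y, u, z), Fraction(0)) + pr

# Marginal of Z: uniform over {0, ..., nx + ny - 1}.
pz = {}
for (x, y, u, z), pr in p.items():
    pz[z] = pz.get(z, Fraction(0)) + pr
print(len(pz) == nx + ny and all(v == Fraction(1, nx + ny) for v in pz.values()))  # True

# Z is a function of (Xbar, Ybar, U) by construction, matching Z i<= Xbar Ybar U.
```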

We now show that true arithmetic [56] (i.e., the theory Th(N0, +, ·, <) containing all true first-order sentences over nonnegative integers with addition, multiplication and comparison) is interpretable in the first-order theory of probabilistic independence.

Theorem 1. True arithmetic is interpretable in the first-order theory of probabilistic independence.

Proof: We represent a ∈ N+ by a uniform random variable with cardinality a. Note that true arithmetic concerns N0 instead of N+, so we need a special representation for 0. We represent 0 by a random variable with distribution Bern(1/3) (up to relabeling). This distribution can be checked by observing that X ∼ Bern(1/3) (up to relabeling) if and only if

is0(X) := ∃U.(ueq3(U) ∧ ∅ ι< X ι< U).

The following formula checks whether X is the representation of an integer in N0:

isnat(X) := is0(X) ∨ unif(X). (8)

It is straightforward to modify the definitions of usum and uprod to accommodate this special value of 0.

As a result, by Tarski’s undefinability theorem [57], [56], FOTPI is not arithmetically definable.

IV. DEFINABLE DISTRIBUTIONS

In this section, we investigate the concept of definability in FOTPI.

Definition 2 (Definability). We use the following definitions of definability:

• (Definability of distributions) We call a probability mass function p definable in FOTPI if there exists a first-order formula P(X) such that P(X) holds if and only if X follows the distribution p up to relabeling (i.e., there exists an injective function f such that f(X) ∼ p). We call a set S ⊆ M^n definable in FOTPI if there exists a first-order formula P(X1, . . . , Xn) which holds if and only if (X1, . . . , Xn) ∈ S (note that a relation is a special case of a set for n = 2).

• (Bernoulli definability over reals) We call a real number θ ∈ [0, 1/2] Bernoulli-definable in FOTPI if the probability mass function of the Bernoulli distribution Bern(θ) is definable in FOTPI. We call a set S ⊆ [0, 1/2]^n Bernoulli-definable in FOTPI if there exists a first-order formula P(X1, . . . , Xn) which holds if and only if Xi ∼ Bern(θi) (up to relabelling), θi ∈ [0, 1/2], and (θ1, . . . , θn) ∈ S. For S ⊆ [0, 1/2]^n, we call a function f : S → [0, 1/2] Bernoulli-definable in FOTPI if its graph {(θ1, . . . , θn, f(θ1, . . . , θn)) : (θ1, . . . , θn) ∈ S} ⊆ [0, 1/2]^{n+1} is Bernoulli-definable in FOTPI.

• (Uniform definability over natural numbers) We call a set S ⊆ N+^n uniform-definable in FOTPI if there exists a first-order formula P(X1, . . . , Xn) which holds if and only if Xi ∼ Unif[ki] (up to relabelling) and (k1, . . . , kn) ∈ S. For S ⊆ N+^n, we call a function f : S → N+ uniform-definable in FOTPI if its graph {(k1, . . . , kn, f(k1, . . . , kn)) : (k1, . . . , kn) ∈ S} ⊆ N+^{n+1} is uniform-definable in FOTPI.

Since

qeq(X, Y, B) := ∃Z. frac(X, Y, Z, B) (9)

holds if and only if X, Y are uniform, and B ∼ Bern(|𝒳|/(|𝒳| + |𝒴|)), we know that all rational numbers in [0, 1/2] are Bernoulli-definable. We then show some Bernoulli-definable relations.

Lemma 3. The relations "≤", "<" and "=" over [0, 1/2] are Bernoulli-definable in FOTPI, i.e., there is a first-order formula ble(X, Y) which holds if and only if X ∼ Bern(θ) and Y ∼ Bern(φ) (up to relabelling), where 0 ≤ θ ≤ φ ≤ 1/2, and there are first-order formulae blt(X, Y), beq(X, Y) which hold if and only if θ < φ and θ = φ respectively.

Proof: Consider

qlt(X, Y, B) := ueq2(B) ∨ (card=2(B) ∧ ∃C, D.(qeq(X, Y, C) ∧ ueq2(D) ∧ smi(BCD, C) ∧ smi(BCD, D) ∧ ¬smi(BCD, B))).

We will show that if X, Y are uniform random variables with |𝒳| = a, |𝒴| = b, a < b, then qlt(X, Y, B) holds if and only if B ∼ Bern(θ) (up to relabelling) for some a/(a + b) < θ ≤ 1/2. For the "if" direction, if θ < 1/2 (the case θ = 1/2 is clear), take

(B, C, D) = (1, 1, 0) with prob. a/(a + b),
            (1, 0, 0) with prob. θ − a/(a + b),
            (0, 0, 0) with prob. 1/2 − θ,
            (0, 0, 1) with prob. 1/2.

For the "only if" direction, assume B, C, D ∈ {0, 1}, and write B ∼ Bern(γ). Assume the single-mass indicator in smi(BCD, C) is 1{1}(C), and the indicator in smi(BCD, D) is 1{1}(D). We must have P(C = 1) = a/(a + b) (if P(C = 1) = b/(a + b) > 1/2, then 1{1}(C) cannot be a single-mass indicator of BCD). Consider the distribution of B. Since we cannot split the masses P(C = 1) = a/(a + b) and P(D = 1) = 1/2 among different values of B, we either assign them to the same value of B or to different values of B. The former case is impossible due to ¬smi(BCD, B). Therefore the masses a/(a + b) and 1/2 are assigned to different values of B, giving a/(a + b) ≤ γ ≤ 1/2. Note that γ = a/(a + b) and γ = 1/2 are impossible due to ¬smi(BCD, B).

We also define

qle(X, Y, B) := qlt(X, Y, B) ∨ qeq(X, Y, B). (10)

Using the fact that for θ1, θ2 ≥ 0, we have θ1 ≤ θ2 ⇔ ∀a, b ∈ N+. (a/b < θ1 → a/b < θ2), we can define ble(B, C), blt(B, C) and beq(B, C) by

ble(B, C) := card≤2(B) ∧ card≤2(C) ∧ ∀X, Y.(ult(X, Y) ∧ qlt(X, Y, B) → qlt(X, Y, C)),
blt(B, C) := ¬ble(C, B),
beq(B, C) := ble(B, C) ∧ ble(C, B).

We can use this to show that any arithmetically definable number [58], [56] in [0, 1/2] (i.e., a real number θ ∈ [0, 1/2] such that the set {(a, b) ∈ N+^2 : a/b ≤ θ} is definable using a formula in first-order arithmetic) is Bernoulli-definable.

Theorem 4. Any arithmetically definable number in [0, 1/2] is Bernoulli-definable in FOTPI.

Proof: Let θ ∈ [0, 1/2] be an arithmetically definable number. By Theorem 1, we can find a first-order formula (in the theory of probabilistic independence) ψ(X, Y) which holds if and only if X, Y are uniform and |𝒳|/|𝒴| ≤ θ. We can check whether B ∼ Bern(θ) by checking

∀X, Y.(ψ(X, Y) ↔ ∀C.(qeq(X, Y, C) → ble(C, B))).

We call a function f : N+ → R≥0 arithmetically definable if the set {(x, a, b) ∈ N+^3 : a/b ≤ f(x)} is definable using a formula in first-order arithmetic. We show that any arithmetically definable probability mass function is definable in FOTPI.

Theorem 5. For any probability mass function p over N+, if the function p : N+ → [0, 1] is arithmetically definable, then it is definable in FOTPI.

Proof: Let X ι≤ Y where there are at least 3 possible values of Y given any X = x. We say that two single-mass indicators B = 1{b}(Y), C = 1{c}(Y) correspond to the same value of X if the value of X given Y = b is the same as the value of X given Y = c. This can be checked by

smis(X, Y, B, C) := X ι≤ Y ∧ smi(Y, B) ∧ smi(Y, C) ∧ (B ι= C ∨ ¬∃U.(card≤2(U) ∧ BC ι≤ XU)).

To check this, note that if B ι≠ C correspond to the same value of X, since there are at least 3 possible values of Y corresponding to that value of X, and B, C correspond to 2 of them, it is impossible to have card≤2(U) ∧ BC ι≤ XU. For the other direction, if B ι≠ C correspond to different values of X, we can take U = max{B, C}, and have BC ι≤ XU. We also define the formula for checking whether B, C correspond to different values of X:

smid(X, Y, B, C) := X ι≤ Y ∧ smi(Y, B) ∧ smi(Y, C) ∧ ¬smis(X, Y, B, C).

In order to check whether a random variable A follows p, we assign labels in {3, 4, . . .} to values of A. Assume A takes values over {3, 4, . . .}. We call a random variable L a label of A if the conditional distribution of L given A = a is uniform among a different values, and these values are different for different a. This can be checked by (up to relabelling)

label3(A, L) := A ι≤ L ∧ ∀B.(smi(L, B) → ∃U.(uge3(U) ∧ U ⊥⊥ A
    ∧ ∀C.(smis(A, L, B, C) → smi(AU, C)) (11)
    ∧ ∀D.(smid(A, L, B, D) → ¬∃V.(ueq(U, V) ∧ V ⊥⊥ A ∧ smi(AV, D))))). (12)

Assume L is a label of A. Consider B where smi(L, B) holds, and assume B = 1{l}(L). Let the value of A conditional on L = l be a. Then l is one of the a ≥ 3 values of L corresponding to A = a. The line (11) holds since we can take U ∼ Unif[a], and any single-mass indicator C of L corresponding to A = a (there are a such C's) is a single-mass indicator of AU (since U divides the mass A = a into a equal pieces). For (12), if D is a single-mass indicator of L corresponding to a value of A other than a (let it be ā), then P(D = 1) = P(A = ā)/ā ≠ P(A = a)/a, and it is impossible to have ueq(U, V) ∧ V ⊥⊥ A ∧ smi(AV, D). For the other direction, using similar arguments, we can deduce that if label3(A, L) holds, then conditional on any A = a, L is uniformly distributed in a set S_a (of size equal to the cardinality of the U in the definition of label3(A, L)), and the sizes of S_a are distinct (due to (12)), and hence we can assign the labels a = |S_a| to the values of A.

Given A with label L, we call B a divided mass of the value A = |𝒰| if B = 1{l}(L) is a single-mass indicator of L, and we have A = |𝒰| given L = l. Note that this implies P(B = 1) = P(A = |𝒰|)/|𝒰|. This can be checked by

divmass3(A, L, U, B) := ∃Ū.(label3(A, L) ∧ smi(L, B) ∧ ueq(U, Ū) ∧ uge3(U) ∧ Ū ⊥⊥ A ∧ smi(AŪ, B)). (13)

Let p be an arithmetically definable probability mass function over {3, 4, . . .} (we can use the domain {3, 4, . . .} instead of N+ by shifting). Let p̄ : {3, 4, . . .} → [0, 1/3] be defined as p̄(a) := p(a)/a (which is also arithmetically definable). By Theorem 1, we can find a first-order formula (in the theory of probabilistic independence) ψ(W, X, Y) which holds if and only if W, X, Y are uniform, |𝒲| ≥ 3 and |𝒳|/(|𝒳| + |𝒴|) ≤ p̄(|𝒲|). To show that p is definable in FOTPI, we can check whether A ∼ p (up to relabelling) by

∃L.(label3(A, L) ∧ ∀B, U.(divmass3(A, L, U, B) → ∀X, Y, C.(ult(X, Y) ∧ qeq(X, Y, C) → (ble(C, B) ↔ ψ(U, X, Y))))).

By divmass3, the B (assume B ∼ Bern(θ), θ ≤ 1/2) and U in the above definition satisfy P(A = |𝒰|) = θ|𝒰|. The second line of the definition states that for any uniform X, Y with |𝒳| < |𝒴| and C ∼ Bern(|𝒳|/(|𝒳| + |𝒴|)), we have |𝒳|/(|𝒳| + |𝒴|) ≤ θ = P(A = |𝒰|)/|𝒰| if and only if ψ(U, X, Y) ⇔ |𝒳|/(|𝒳| + |𝒴|) ≤ p̄(|𝒰|) = p(|𝒰|)/|𝒰|.

We say that X, Y have the same distribution up to relabelling, written as X r= Y, if there exists an injective function f such that f(X) has the same distribution as Y. This relation is also definable in FOTPI.

Proposition 6. The following relations over random variables are definable in FOTPI:

1) The "same distribution up to relabelling" relation X r= Y.
2) Comparison of cardinality: |𝒳| = |𝒴| and |𝒳| ≤ |𝒴|.

Proof: Using similar arguments as in Theorem 5, we can check whether A1 r= A2 by

∃L1, L2.(label3(A1, L1) ∧ label3(A2, L2) ∧ ∀B, U.((∃B1, U1.(beq(B, B1) ∧ ueq(U, U1) ∧ divmass3(A1, L1, U1, B1))) ↔ (∃B2, U2.(beq(B, B2) ∧ ueq(U, U2) ∧ divmass3(A2, L2, U2, B2))))).

Intuitively, this means that there is a labelling of A1, A2 such that for any B ∼ Bern(θ) and uniform U, we have P(A1 = |𝒰|) = θ|𝒰| if and only if P(A2 = |𝒰|) = θ|𝒰|, which clearly implies A1 has the same distribution as A2.

We can check whether V is uniform and |𝒜| + 2 ≤ |𝒱| by

cardleu2(A, V) := ∃L.(label3(A, L) ∧ ∀B, U.(divmass3(A, L, U, B) → ule(U, V))).

The reason is that the smallest possible max A among labellings of A using the set of values {3, 4, . . .} is |𝒜| + 2. We can then check for |𝒜1| ≤ |𝒜2| by

∀V.(cardleu2(A2, V) → cardleu2(A1, V)),

and we can check for |𝒜1| = |𝒜2| by

∀V.(cardleu2(A1, V) ↔ cardleu2(A2, V)).

V. REPRESENTATION OF EVENTS

In this section, we discuss a representation of events. While the event E can be represented by the indicator random variable C = 1{E}, there is an ambiguity since C can also be the representation of the complement E^c (we are not concerned with the labelling of C).

Instead, we represent an event E with P(E) < 1 as a random variable D where D ∼ Unif[k] conditional on E, where k ≥ 2 satisfies P(E)/k < P(E^c), and D = 0 if E does not occur (E can be recovered by taking the complement of the largest mass of D). If P(E) = 1, it is represented by any D ∼ Unif[k] where k ≥ 2. Note that there are only two cases where D is uniform: P(E) = 1 (which can be checked by uge2(D)) and P(E) = 0 (which can be checked by D ι= ∅). Technically, D represents E only up to a difference of a set of measure 0, though measure-0 sets do not affect the truth value of formulae concerning probabilistic independence.
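A small sketch of this representation (an illustrative example, not from the paper; the event probability 0.7 and k = 3 are arbitrary choices): build the pmf of D from P(E) and k, and recover P(E) from D by discarding the largest mass.

```python
def event_representation(p_e, k):
    """pmf of D representing an event E with P(E) = p_e < 1:
    D ~ Unif[k] conditional on E (values 1..k), and D = 0 otherwise.
    Requires k >= 2 and P(E)/k < P(E^c)."""
    assert 0 <= p_e < 1 and k >= 2 and p_e / k < 1 - p_e
    return {0: 1 - p_e, **{d: p_e / k for d in range(1, k + 1)}}

def recover_event_prob(pmf_d):
    """E is recovered as the complement of the largest mass of D."""
    largest = max(pmf_d, key=pmf_d.get)
    return sum(pr for d, pr in pmf_d.items() if d != largest)

pmf = event_representation(0.7, k=3)       # 0.7/3 < 0.3, so k = 3 is a valid choice
print(pmf)                                  # {0: 0.3..., 1: 0.233..., 2: 0.233..., 3: 0.233...}
print(round(recover_event_prob(pmf), 6))    # 0.7
```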

We can check whether C is the indicator function of the event represented by D using

ind(D, C) := (unif(D) ∧ C ι= ∅) ∨ (card=2(C) ∧ smi(D, C) ∧ ∃U, V.(uge2(U) ∧ ueq2(V) ∧ U ⊥⊥ V ⊥⊥ C ∧ D ι≤ CU ∧ |𝒟| = |𝒰| + 1 ∧ ∀F, G.((smi(DV, F) ∧ ¬smi(CV, F) ∧ smi(DV, G) ∧ smi(CV, G)) → blt(F, G)))).

Note that |𝒟| = |𝒰| + 1 can be checked using Proposition 6 and Theorem 1. To check the above formula, note that card=2(C), smi(D, C), D ι≤ CU and |𝒟| = |𝒰| + 1 ensure that C, D are of the form C ∈ {0, 1}, D = 0 if C = 0, and D|{C = 1} ∼ Unif[k] (up to relabelling). In the last two lines, note that DV divides each mass of D into two equal halves, F is a single-mass indicator of DV where D ≠ 0 (and hence P(F = 1) = P(C = 1)/(2k)), and G is a single-mass indicator of DV where D = 0 (and hence P(G = 1) = P(C = 0)/2). The condition blt(F, G) means P(C = 1)/(2k) < P(C = 0)/2, which is the condition needed for D to be the representation of the event C = 1. Note that V is needed since blt is defined only for Bernoulli random variables with parameters in [0, 1/2].

We can check whether D is the representation of some event by

isev(D) := ∃C. ind(D, C). (14)

To check whether the event represented by D1 is the complement of the event represented by D2 (up to a difference of measure0):

compl(D1, D2) := ∃C.(

ind(D1, C) ∧ ind(D2, C) ∧ ¬smi(D1D2, C))

∧ (uge2(D2) → D1ι= ∅) ∧ (uge2(D1) → D2

ι= ∅).

To check whether the event represented by D1 is the same as the event represented by D2 (up to a difference of measure 0):

eveq(D1, D2) := ∃C.(ind(D1, C) ∧ ind(D2, C)) ∧ ¬compl(D1, D2).

To check whether the event represented by D1 is a subset of the event represented by D2 (up to a difference of measure 0):

subset(D1, D2) := isev(D1) ∧ isev(D2) ∧ (D1 ι= ∅ ∨ uge2(D2) ∨ (¬unif(D1) ∧ ¬unif(D2) ∧ ∃C2.(ind(D2, C2) ∧ ∀D̄1.(eveq(D̄1, D1) → smi(D2D̄1, C2))))).

The reason is that if the nondegenerate event represented by D1 (let it be E1) is a subset of the nondegenerate event represented by D2 (let it be E2), then any representation D̄1 of E1 will be constant conditional on E2^c, and hence C2 = 1{E2} is a single-mass indicator of D2D̄1. For the other direction, if P(E1\E2) > 0, then we can have D̄1|E1 ∼ Unif[k] for k large enough so that P(E1)/k < P(E1\E2), and hence D̄1 is not constant conditional on E1\E2, and hence is not constant conditional on E2^c. This means smi(D2D̄1, C2) cannot hold.

We can also take the union of a collection of events. Let P be a first-order formula. To check whether D is the representation of the union of all events E with a representation satisfying P:

unionP(D) := ∀D2.(P(D2) ∧ isev(D2) → subset(D2, D)) ∧ ∀D̄.(isev(D̄) ∧ ∀D2.(P(D2) ∧ isev(D2) → subset(D2, D̄)) → subset(D, D̄)). (15)

Technically, since a representation only identifies the event up to a difference of measure 0, an uncountable union may not be well-defined (in the equivalence classes of events mod 0). Instead of the ordinary union of sets, the above definition actually describes the essential union of measurable sets [59, Def. 2], which is always measurable. Nevertheless, since this paper concerns discrete settings, we can regard the essential union as the ordinary union. We also define

union(D1, . . . , Dn, D) := union_{D̄: ∨_i eveq(D̄, Di)}(D). (16)

Define inter for intersection similarly.

We can also check whether the event represented by D1 is disjoint from the event represented by D2 (up to a difference of measure 0):

disjoint(D1, D2) := ∃D̄1.(compl(D̄1, D1) ∧ subset(D2, D̄1)).

To check whether the event represented by D1 is independent of the event represented by D2:

indep(D1, D2) := ∃D̄1, D̄2.(eveq(D̄1, D1) ∧ eveq(D̄2, D2) ∧ D̄1 ⊥⊥ D̄2).

To check whether P(E1) ≤ P(E2), where Ei is represented by Di:

prle(D1, D2) := ∃D̄1.(D̄1 r= D1 ∧ subset(D̄1, D2)). (17)

Also define

preq(D1, D2) := ∃D̄1.(D̄1 r= D1 ∧ eveq(D̄1, D2)). (18)

Note that this allows us to perform addition and multiplication on probabilities of events. For example, to check whether P(E1) = P(E2)P(E3) + P(E4):

∃D̄1, . . . , D̄4, D̄23.(∧_{i=1}^{4} preq(D̄i, Di) ∧ indep(D̄2, D̄3) ∧ inter(D̄2, D̄3, D̄23) ∧ disjoint(D̄23, D̄4) ∧ union(D̄23, D̄4, D̄1)). (19)

Given A with label L (12), we can check whether P(A = |𝒰|) > 0, and D is the representation of the event A = |𝒰| (where U is a uniform random variable) by

labelevne3(A, L, U, D) := ∃C, Ū.(ind(D, C) ∧ label3(A, L) ∧ smi(A, C) ∧ ueq(U, Ū) ∧ uge3(U) ∧ Ū ⊥⊥ A ∧ ∀B.(divmass3(A, L, U, B) → smi(AŪ, B) ∧ smi(CŪ, B) ∧ ∃D̄.(eveq(D, D̄) ∧ B ι≤ D̄))).

To show the validity of this formula, we let C = 1{a}(A). Note that if |𝒜| ≥ 3 (otherwise the above formula is obviously valid), smi(AŪ, B) ∧ smi(CŪ, B) implies that B = 1 (assuming B = 1{l}(L)) only if C = 1. Hence smi(CŪ, B) implies P(B = 1) = P(A = a)/|𝒰|, and the mass A = a is divided into |𝒰| equal pieces by L, and hence a = |𝒰| by the definition of label3(A, L). It is left to check that D is the representation of the event A = |𝒰| (instead of A ≠ |𝒰|). This is ensured by ∃D̄.(eveq(D, D̄) ∧ B ι≤ D̄). If D represents A ≠ |𝒰|, then D̄ is constant given A = |𝒰|, so B cannot be a function of D̄ since P(B = 1) = P(A = |𝒰|)/|𝒰|. If D represents A = |𝒰|, we can take D̄|{A = |𝒰|} ∼ Unif[k|𝒰|] (for large enough k so that D̄ satisfies the definition of a representation of an event) such that L is a function of D̄ conditional on A = |𝒰|.

Note that labelevne3 is false if P(A = |𝒰|) = 0. To check whether D is the representation of the event A = |𝒰| (which is empty if P(A = |𝒰|) = 0), we use

labelev3(A, L, U, D) := labelevne3(A, L, U, D) ∨ (D ι= ∅ ∧ ¬∃D2. labelevne3(A, L, U, D2)).

The condition that X has the same distribution as Z is written as X d= Z. We say that Y|X follows the conditional distribution of W|Z, written as Y|X ∼ W|Z, if pX(x) > 0 implies pZ(x) > 0 and pY|X(y|x) = pW|Z(y|x) for any y. Note that since FOTPI does not concern the labelling of a random variable, we have to use another random variable LX (the label) to specify the values of X, as described in (12). The statements X d= Z and Y|X ∼ W|Z are not valid statements in FOTPI without the labels.

Also, we say that Y|X follows the conditional distribution of W|Z up to relabelling, written as Y|X r∼ W|Z, if there exist relabellings X̄ ι= X, Ȳ ι= Y, Z̄ ι= Z, W̄ ι= W such that Ȳ|X̄ ∼ W̄|Z̄. Note that Y|X r∼ W|Z does not depend on the labelling of X, Y, Z, W.

Proposition 7. The following conditions are definable in FOTPI:

• The condition X d= Z, where LX, LZ are the labels of X, Z respectively (12).
• The condition Y|X ∼ W|Z, where LX, LY, LZ, LW are the labels of X, Y, Z, W respectively (12).
• The condition Y|X r∼ W|Z.

Proof: Note that X d= Z can be checked by

deq(X, LX, Z, LZ) := label3(X, LX) ∧ label3(Z, LZ) ∧ ∀U, DX, DZ.(labelev3(X, LX, U, DX) ∧ labelev3(Z, LZ, U, DZ) → P(DX) = P(DZ)),

where P(DX) = P(DZ) is checked by (18). Also, Y|X ∼ W|Z can be checked by

cdeq(X, LX, Y, LY, Z, LZ, W, LW) := label3(X, LX) ∧ label3(Y, LY) ∧ label3(Z, LZ) ∧ label3(W, LW) ∧ ∀U, V, DX, DY, DZ, DW.(DX ι≠ ∅ ∧ labelev3(X, LX, U, DX) ∧ labelev3(Z, LZ, U, DZ) ∧ labelev3(Y, LY, V, DY) ∧ labelev3(W, LW, V, DW) → DZ ι≠ ∅ ∧ P(DX)P(DZ ∩ DW) = P(DZ)P(DX ∩ DY)),

where DZ ∩ DW denotes the representation of the intersection of the events represented by DZ and DW (16), P(DX) denotes the probability of the event represented by DX, and P(DX)P(DZ ∩ DW) = P(DZ)P(DX ∩ DY) can be expressed in the same way as (19). Note that Y|X ∼ W|Z if and only if pX(x) > 0 → pZ(x) > 0 ∧ pX(x)pZ,W(x, y) = pZ(x)pX,Y(x, y) for all x, y.

For Y|X r∼ W|Z, it can be checked by

cdeqr(X, Y, Z, W) := ∃LX, LY, LZ, LW. cdeq(X, LX, Y, LY, Z, LZ, W, LW).

VI. REPRESENTATION OF RANDOM SEQUENCES

In the remainder of this paper, we use the following notations within first-order formulae:

• Random variables are denoted by uppercase letters (except E).

• N0-valued variables are denoted by lowercase (English or Greek) letters. Any N0-valued variable α is understood as a random variable obtained using the representation in Theorem 1, restricted to satisfy isnat(α) (8), and hence can be expressed in FOTPI. Addition, comparison and multiplication can be performed on N0-valued variables due to Theorem 1.

• Event variables are denoted by E (or with subscripts). Any event variable E is understood as a random variable obtained using the representation in Section V, restricted to satisfy isev(E) (14), and hence can be expressed in FOTPI. Set operations and relations such as E1 ∪ E2, E1 ∩ E2, E1 as= E2 (i.e., E1 ↔ E2 holds almost surely), E1 as⊆ E2 (i.e., E1 → E2 holds almost surely) and E1 ⊥⊥ E2 can be defined (see Section V). For the union of a collection of events E satisfying P(E) (15), we use the notation ⋃_{E: P(E)} E.

We have defined label3(A, L) (12) to check whether A can be labelled using values in {3, 4, . . .}, and L|A ∼ Unif[A]. For the purpose of simplicity, we will shift the values of A to {0, 1, . . .} (i.e., L|A ∼ Unif[A + 3]). Given A ∈ {0, 1, . . .} with label L, we can check whether E is (the representation of) the event A = a, a ∈ N0 (recall that a is understood to be obtained using the representation in Theorem 1) by

labelev0(A, L, a, E) := labelev3(A, L, a + 3, E).

For notational simplicity, we will denote the E satisfying the above formula by

{A ◦ L = a}. (20)

The notation A ◦ L intuitively means the random variable A with labels given by L (note that A itself, as for any random variable in FOTPI, does not inherently have well-defined values, since formulae in FOTPI do not concern the labelling of random variables).

We can also obtain the event where the random variable A (with label L) equals the random variable B (with label M) by

⋃_{E: ∃a. E as= {A ◦ L = a} ∩ {B ◦ M = a}} E,

which is defined using (15). The above event is denoted as

{A ◦ L = B ◦ M}. (21)

If the above event occurs with probability 1 (recall that we can check whether P(E) = 1 by uge2(E)), we simply denote this condition as

A ◦ L as= B ◦ M.

The conditional distribution relation (Proposition 7) is denoted as

Y ◦ LY | X ◦ LX ∼ W ◦ LW | Z ◦ LZ. (22)

When we check independence or conditional independence for labelled random variables, e.g.,

A ◦ L ⊥⊥ B ◦ M, (23)

we ignore the labels (the above line means A ⊥⊥ B).

In this section, we use the notation X^n = (X1, . . . , Xn) to denote a sequence of random variables. The main challenge of expressing a coding setting in FOTPI is that coding settings are often defined asymptotically, where n, the length of the random sequences, tends to infinity. However, the number of random variables in a formula in FOTPI is fixed. Therefore, we have to design a method to extract Xi, given X^n as a single random variable. We utilize the classical Gödel encoding [60], [61], [62], which we briefly recall below:

Definition 8. The Gödel beta function [61] is defined as

beta(b, c, i) := b mod (c(i + 1) + 1),

where a mod b denotes the remainder when a is divided by b. It satisfies the property that for any sequence a0, a1, . . . , an ∈ N0, there exist b, c ∈ N0 such that beta(b, c, i) = ai for i ∈ [0..n]. The Cantor pairing function (which is a bijection from N0 × N0 to N0) is defined as

pair(b, c) := (b + c)(b + c + 1)/2 + c.

For a sequence a1, a2, . . . , an ∈ N0, its Gödel encoding is defined as

enc({ai}i∈[n]) := pair(b, c),

where b, c ∈ N0 satisfy beta(b, c, 0) = n and beta(b, c, i) = ai for i ∈ [n] (if multiple (b, c) satisfy the requirements, take the smallest pair(b, c)). The decoding predicate dec(r, i, a) is defined so that dec(enc({ai}i∈[n]), i, a) is true if and only if i ∈ [n] and ai = a. It can be defined by

decn(r, i, a) := ∃b, c.(pair(b, c) = r ∧ 1 ≤ i ≤ beta(b, c, 0) ∧ beta(b, c, i) = a),

dec(r, i, a) := decn(r, i, a) ∧ ∀r′.((∀i′, a′.(decn(r, i′, a′) ↔ decn(r′, i′, a′))) → r′ ≥ r).

Note that the definition of dec enforces the minimality condition in the definition of enc (r is the smallest among all r′ which give the same decoded values for all i). Due to Theorem 1, dec can be defined in FOTPI (though enc cannot, since FOTPI does not natively support quantifying over sequences).
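A direct numerical sketch of Definition 8 (an illustration, not from the paper): the brute-force search below finds the smallest pair(b, c) for a tiny sequence and decodes it back; the simplified dec here does not re-check the minimality condition of the formula above.

```python
from itertools import count
from math import isqrt

def beta(b, c, i):            # Goedel beta function
    return b % (c * (i + 1) + 1)

def pair(b, c):               # Cantor pairing (bijection N0 x N0 -> N0)
    return (b + c) * (b + c + 1) // 2 + c

def unpair(r):                # inverse of the Cantor pairing
    w = (isqrt(8 * r + 1) - 1) // 2
    c = r - w * (w + 1) // 2
    return w - c, c

def enc(seq):
    """Smallest r = pair(b, c) with beta(b, c, 0) = len(seq) and
    beta(b, c, i) = seq[i-1] for i in [n].  Brute force, so only suitable
    for very short sequences; existence of (b, c) is the beta-function lemma."""
    n = len(seq)
    for r in count():
        b, c = unpair(r)
        if beta(b, c, 0) == n and all(beta(b, c, i) == seq[i - 1] for i in range(1, n + 1)):
            return r

def dec(r, i):
    """Return a_i if 1 <= i <= n, else None (minimality of r is not re-verified here)."""
    b, c = unpair(r)
    n = beta(b, c, 0)
    return beta(b, c, i) if 1 <= i <= n else None

r = enc([2, 1])
print(r, [dec(r, i) for i in range(1, 3)])   # round-trips back to [2, 1]
```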

Note that the definition of dec enforces the minimality condition in the definition of enc (r is the smallest among all r′ whichgives the same decoded values for all i). Due to Theorem 1, dec can be defined in FOTPI (though enc cannot since FOTPIdoes not natively support quantifying over sequences).

Assume X1, . . . , Xn ∈ N0. Let X := enc(Xn) be the Gödel encoding of Xn (which is a random integer), and L|X ∼Unif[X + 3] is the label of X . We can check whether X = enc(Xn) for some Xn by the following formula in FOTPI:

isseq(X, L, n) := label3(X, L)∧

∀l.(

{X[L] = l} 6= ∅

→(

∀i.(1 ≤ i ≤ n → ∃x. dec(l, i, x)))

∧(

∀i, x.(i > n → ¬dec(l, i, x))))

,

where {X[L] = l} is defined in (20). Intuitively, this means if P(X = l) > 0, then we can decode the i-th entry of l for1 ≤ i ≤ n, and we cannot decode its i-th entry for i > n.

We now define a formula to check whether X is the i-th component (i.e., Xi) of X̃. Let L|X ∼ Unif[X + 3] be the labelling of X. This can be checked by

entry(X̃, L̃, n, X, L, i) := isseq(X̃, L̃, n) ∧ label3(X, L) ∧ 1 ≤ i ≤ n ∧ ∀x, l.(dec(l, i, x) → {X̃ ◦ L̃ = l} as⊆ {X ◦ L = x}).

This means that if dec(l, i, x), then X̃ = l implies X = x.

We can also obtain subsequences of X̃ = enc(X^n). To check whether Ỹ with label M̃ satisfies Ỹ = enc(Xi, . . . , Xj), we use the formula

subseq(X̃, L̃, n, Ỹ, M̃, i, j) := 1 ≤ i ≤ j ≤ n ∧ isseq(X̃, L̃, n) ∧ isseq(Ỹ, M̃, j − i + 1) ∧ ∀k.(i ≤ k ≤ j → ∀X, L.(entry(X̃, L̃, n, X, L, k) ↔ entry(Ỹ, M̃, j − i + 1, X, L, k − i + 1))).

For notational simplicity, in the remainder of this paper, we use the notation

X ◦ L = (X̃ ◦ L̃)i

to denote entry(X̃, L̃, n, X, L, i), and

Ỹ ◦ M̃ = (X̃ ◦ L̃)i..j

to denote subseq(X̃, L̃, n, Ỹ, M̃, i, j). Given X1, . . . , Xn with labels L1, . . . , Ln for a fixed (non-variable) n, we write

X̃ ◦ L̃ = (X1 ◦ L1, . . . , Xn ◦ Ln) (24)

if

isseq(X̃, L̃, n) ∧ ∧_{i=1}^{n} entry(X̃, L̃, n, Xi, Li, i),

i.e., X̃ = enc(X^n) with label L̃.

To check whether X̃ = enc(X^n) (with label L̃) where X1, . . . , Xn are i.i.d. with the same distribution as X (with label L), we use

iid(X̃, L̃, n, X, L) := isseq(X̃, L̃, n) ∧ ∀i, X′, L′, Ỹ, M̃.(X′ ◦ L′ = (X̃ ◦ L̃)i ∧ Ỹ ◦ M̃ = (X̃ ◦ L̃)1..i−1 → X′ ◦ L′ d= X ◦ L ∧ X′ ⊥⊥ Ỹ),

where X′ ◦ L′ d= X ◦ L is defined in (22) (we can let the conditioned random variables be degenerate). The above formula checks that Xi d= X and that Xi is independent of X1, . . . , Xi−1.

We now characterize the entropy H(X). By the source coding theorem, H(X) is the smallest R such that for any R′ > R and ε > 0, there exist n, W, Y^n with Y^n ι≤ W ι≤ X^n, |𝒲| ≤ 2^{nR′}, and P(X^n ≠ Y^n) ≤ ε, where X1, . . . , Xn are i.i.d. with the same distribution as X. Therefore, we can check whether H(X) ≤ a/b for a, b ∈ N0, b > 0 by

hle(X, a, b) := ∀a′, b′.(a′b > ab′ → ∀E.(isev(E) ∧ E ι≠ ∅ → ∃X̃, L̃, Ỹ, M̃, L, W, n.(iid(X̃, L̃, n, X, L) ∧ isseq(Ỹ, M̃, n) ∧ Ỹ ι≤ W ι≤ X̃ ∧ |𝒲|^{b′} ≤ 2^{na′} ∧ prle({X̃ ◦ L̃ ≠ Ỹ ◦ M̃}, E)))), (25)

where |𝒲|^{b′} ≤ 2^{na′} can be defined since exponentiation is definable using a first-order formula over natural numbers (alternatively, we can construct an i.i.d. sequence W̃ by iid(W̃, L_{W̃}, b′, W, L_W), which has cardinality |𝒲|^{b′}, and use Proposition 6 to check |𝒲|^{b′} ≤ 2^{na′}), {X̃ ◦ L̃ ≠ Ỹ ◦ M̃} represents the event X^n ≠ Y^n (this notation is defined in (21)), and prle({X̃ ◦ L̃ ≠ Ỹ ◦ M̃}, E) (17) checks that P(X^n ≠ Y^n) ≤ P(E). The above formula means that for any a′, b′ such that a′/b′ > a/b, and for any E with P(E) > 0, there exist n, W, Y^n with Y^n ι≤ W ι≤ X^n, |𝒲| ≤ 2^{na′/b′}, and P(X^n ≠ Y^n) ≤ P(E), where X1, . . . , Xn are i.i.d. with the same distribution as X.

Using (25), we can check inequalities among entropies of random variables. For example, H(X) ≤ 2H(Y) can be checked by

∀a, b.(hle(Y, a, 2b) → hle(X, a, b)).
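For intuition only, the direct numerical counterpart of the condition checked by hle(X, a, b), i.e. H(X) ≤ a/b, is shown below for a concrete pmf (an arbitrary example; this simply evaluates the entropy with real arithmetic, which the FOTPI formula (25) deliberately avoids).

```python
from math import log2

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as a list of probabilities."""
    return -sum(p * log2(p) for p in pmf if p > 0)

pmf_X = [0.5, 0.25, 0.25]
a, b = 3, 2
print(entropy(pmf_X), entropy(pmf_X) <= a / b)   # 1.5 True, i.e. H(X) <= 3/2
```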

VII. SINGLE-LETTER CHARACTERIZATION OF CAPACITY REGIONS

In this section, we study a general network which encompasses the discrete memoryless networks in [63], [64], [65], the finite-state Markov channel [40], and the Markov network [43].

Definition 9 (Joint source-channel Markov network). We define a network with k terminals as follows. Consider the source distribution pW1,...,Wk, the channel pY1,...,Yk,S′|X1,...,Xk,S, the input alphabets 𝒳i (where Xi ∈ 𝒳i), the initial state distribution pS, and the decoding requirement pZ1,...,Zk|W1,...,Wk, where the random variables take values in N0. At the beginning of the communication scheme, the source Wi,1, . . . ,Wi,n is given to terminal i, where (W1,t, . . . ,Wk,t) ∼ pW1,...,Wk i.i.d. across t ∈ [n]. Let St, Xi,t, Yi,t be the channel state, the channel input given by terminal i, and the channel output observed by terminal i at time t ∈ [n], respectively. Let S1 ∼ pS be independent of {Wi,t}i,t. At time t ∈ [n], terminal i outputs Xi,t ∈ 𝒳i as a (possibly stochastic) mapping of Wi,1, . . . ,Wi,n, Xi,1, . . . , Xi,t−1 and Yi,1, . . . , Yi,t−1 for i ∈ [k], and then Y1,t, . . . , Yk,t, St+1 are generated given X1,t, . . . , Xk,t, St following pY1,...,Yk,S′|X1,...,Xk,S. At the end, terminal i outputs Zi,1, . . . , Zi,n as a (possibly stochastic) mapping of Wi,1, . . . ,Wi,n, Xi,1, . . . , Xi,n and Yi,1, . . . , Yi,n. The probability of error is defined as the total variation distance

Pe := dTV( {W1,t, . . . ,Wk,t, Z1,t, . . . , Zk,t}t∈[n], (pW1,...,Wk pZ1,...,Zk|W1,...,Wk)^n ),

i.e., this is a strong coordination problem [66] where W1,t, . . . ,Wk,t, Z1,t, . . . , Zk,t should approximately follow pW1,...,Wk pZ1,...,Zk|W1,...,Wk. We say that this network admits a communication scheme if for any ǫ > 0, there exist n and a communication scheme (encoding and decoding functions) such that Pe ≤ ǫ.
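To make the timing in Definition 9 concrete, the following sketch simulates one n-letter run of a network with k = 2 terminals: the sources are drawn i.i.d., the state evolves through the channel, each terminal's input at time t depends only on its own source block and its own past inputs and outputs, and the final outputs depend only on that terminal's own observations. The helper functions, distributions, encoders and decoders below are stand-ins chosen purely for illustration, not constructions from this paper.

import numpy as np

rng = np.random.default_rng(0)

def sample(pmf):
    """Sample a key from a dict {outcome: probability}."""
    keys = list(pmf)
    probs = np.array([pmf[k] for k in keys], dtype=float)
    return keys[rng.choice(len(keys), p=probs)]

def run_network(n, pW, pS, channel, encoders, decoders):
    """One n-letter run of a 2-terminal joint source-channel Markov network."""
    W = [sample(pW) for _ in range(n)]                 # (W_{1,t}, W_{2,t}), i.i.d.
    Wblk = [[w[i] for w in W] for i in range(2)]       # per-terminal source blocks
    s = sample(pS)                                     # initial state S_1
    X = [[], []]
    Y = [[], []]
    for t in range(n):
        # X_{i,t} may depend only on W_i^n, X_i^{t-1}, Y_i^{t-1}
        xs = tuple(encoders[i](Wblk[i], X[i], Y[i]) for i in range(2))
        y1, y2, s = channel(xs, s)                     # ~ p_{Y1,Y2,S'|X1,X2,S}
        X[0].append(xs[0]); X[1].append(xs[1])
        Y[0].append(y1);    Y[1].append(y2)
    # Z_{i,1..n} may depend only on W_i^n, X_i^n, Y_i^n
    Z = [decoders[i](Wblk[i], X[i], Y[i]) for i in range(2)]
    return W, Z

# toy instance: terminal 1 sends its binary source uncoded over a BSC(0.1)
# to terminal 2; terminal 2's source and channel input are degenerate.
pW = {(0, 0): 0.5, (1, 0): 0.5}
pS = {0: 1.0}
def channel(xs, s):
    y2 = xs[0] ^ (rng.random() < 0.1)                  # BSC(0.1), state unused
    return 0, int(y2), s
encoders = [lambda w, xp, yp: w[len(xp)], lambda w, xp, yp: 0]
decoders = [lambda w, x, y: list(w), lambda w, x, y: list(y)]
W, Z = run_network(20, pW, pS, channel, encoders, decoders)
print(sum(w[0] != z for w, z in zip(W, Z[1])), "bit errors at terminal 2")

Evaluating Pe as in Definition 9 would additionally require comparing the joint law of {(W1,t, W2,t, Z1,t, Z2,t)}t with the target product law; that step is omitted in this sketch.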

Note that the joint source-channel Markov network encompasses channel coding settings by letting M1, . . . ,Ml be independent (they are the messages), Wi = {Mj}j∈Ei where Ei ⊆ [l] is the set of messages that terminal i can access, and Zi = {Mj}j∈Di where Di ⊆ [l] is the set of messages that terminal i intends to decode. In this case, Pe is the probability that any terminal makes an error in decoding the intended messages. The sources {Wi,t} can be regarded as messages by the source-channel separation theorem. For example, to represent the broadcast channel pY2,Y3|X1, let W1 = (M1,M2), Z2 = M1, Z3 = M2, where M1, M2 are independent. Let 𝒳2 = 𝒳3 = 𝒴1 = 𝒮 = {0} (i.e., X2, X3, Y1, S are degenerate). Then the rate pair (R1, R2) is achievable for the broadcast channel if and only if the joint source-channel Markov network admits a communication scheme when H(M1) = R1, H(M2) = R2.⁵

We will show that the capacity region of the joint source-channel Markov network can be characterized in FOTPI.

Theorem 10. Fix k ≥ 1. There exists a first-order formula

Qk(W1, . . . ,Wk, X1, . . . , Xk, Y1, . . . , Yk, Z1, . . . , Zk, S, LS , S′, LS′)

such that for any joint source-channel Markov network (pW1,...,Wk, pY1,...,Yk,S′|X1,...,Xk,S, pS, pZ1,...,Zk|W1,...,Wk), and any pX1,...,Xk which assigns positive probability to each (x1, . . . , xk) ∈ 𝒳1 × · · · × 𝒳k, the network admits a communication scheme if and only if the formula Qk holds for

(X1, . . . , Xk, S, Y1, . . . , Yk, S′) ∼ pX1,...,Xk pS pY1,...,Yk,S′|X1,...,Xk,S

independent of

(W1, . . . ,Wk, Z1, . . . , Zk) ∼ pW1,...,Wk pZ1,...,Zk|W1,...,Wk,

and LS, LS′ are labels (12) of S, S′ respectively (i.e., label3(S, LS) and label3(S′, LS′) hold).⁶

Proof: The main idea of the proof is to state the multi-letter operational definition of the joint source-channel Markov network using a first-order formula. Since the operational definition only consists of distribution constraints, conditional independence constraints, and a bound on the total variation distance, all of which can be expressed in first-order formulae, the proof is a rather straightforward rewriting of these constraints in the language of first-order formulae.

Let W[k] := enc(W1, . . . ,Wk) (with label LW[k]), and define X[k], Y[k], Z[k] similarly. This can be checked by

isseq(W[k], LW[k], k) ∧ ⋀_{i=1}^{k} ∃L. Wi ◦ L as= (W[k] ◦ LW[k])_i,   (26)

and similarly for X[k], Y[k], Z[k]. Note that k is fixed and is not a variable, so the above formula is valid. We write W^n_i = (Wi,1, . . . ,Wi,n). Let W̃[k],t := enc(W1,t, . . . ,Wk,t), and let W̃[k] := enc(W̃[k],1, . . . , W̃[k],n) (with label LW̃[k]). Define X̃[k], Ỹ[k], Z̃[k], Z̄[k] similarly. Note that {Z̄i,t}i∈[k],t∈[n] (used to define Z̄[k]) are auxiliary random variables that do not appear in the operational setting (they are not the same as Zi,t); they will be used later in the bound on Pe. We check that they are nested sequences by

isseq(W̃[k], LW̃[k], n) ∧ ∀t, W, LW.( W ◦ LW = (W̃[k]◦)_t → isseq(W, LW, k) ),   (27)

and similarly for X̃[k], Ỹ[k], Z̃[k], Z̄[k], where we write W̃[k]◦ = W̃[k] ◦ LW̃[k] for brevity (when the label corresponding to the random variable is clear from the context). We write (W̃[k]◦)_{t,i} := ((W̃[k]◦)_t)_i (which corresponds to Wi,t).

First we enforce that (W1,t, . . . ,Wk,t, Z1,t, . . . , Zk,t) ∼ pW1,...,Wk pZ1,...,Zk|W1,...,Wk i.i.d. across t ∈ [n] by

∀t.( 1 ≤ t ≤ n → (W̃[k]◦)_t d= W[k]◦ ∧ (Z̃[k]◦)_t | (W̃[k]◦)_t ∼ Z[k]◦ | W[k]◦

∧ (W̃[k]◦)_t (Z̃[k]◦)_t ⊥⊥ (W̃[k]◦)_{1..t−1} (Z̃[k]◦)_{1..t−1} ),   (28)

where (W̃[k]◦)_t d= W[k]◦ means

∃W′, L′.( W′ ◦ L′ = (W̃[k] ◦ LW̃[k])_t ∧ W′ ◦ L′ d= W[k] ◦ LW[k] ),

and refer to (23) for the meaning of (W̃[k]◦)_t (Z̃[k]◦)_t ⊥⊥ (W̃[k]◦)_{1..t−1} (Z̃[k]◦)_{1..t−1}. Note that (28) checks that

(W1,t, . . . ,Wk,t) d= (W1, . . . ,Wk),

(Z1,t, . . . , Zk,t) | (W1,t, . . . ,Wk,t) ∼ (Z1, . . . , Zk) | (W1, . . . ,Wk),

and

(W1,t, . . . ,Wk,t, Z1,t, . . . , Zk,t) ⊥⊥ {(W1,t′ , . . . ,Wk,t′ , Z1,t′ , . . . , Zk,t′)}t′<t.

⁵Technically, the capacity region is often defined as the closure of the set of achievable (R1, R2). We can represent the closure operation in FOTPI by declaring the rate pair (H(M1), H(M2)) to be in the capacity region if for any M′1, M′2 such that H(M′1) < H(M1), H(M′2) < H(M2) (which can be expressed using (16)), the network admits a communication scheme for the messages M′1, M′2. For simplicity, we ignore the closure operation.

⁶We require labels only for the present state S and the next state S′, but not for Wi, Xi, Yi, Zi, since the channel pY1,...,Yk,S′|X1,...,Xk,S is essentially unchanged under relabelling of Wi, Xi, Yi, Zi, but not under relabelling of S (while keeping S′ unmodified). How the values of S correspond to the values of S′ is essential.

Let S̃ := enc(S1, . . . , Sn+1) (with label LS̃). We check that S1 ∼ pS is independent of {Wi,t}i,t by

isseq(S̃, LS̃, n+1) ∧ (S̃◦)_1 d= S◦ ∧ (S̃◦)_1 ⊥⊥ W̃[k].   (29)

Write X ◦ LX = (X̃[k]◦)_{1..t−1,i} for the X, LX satisfying

isseq(X, LX, t−1) ∧ ∀t′.( 1 ≤ t′ ≤ t−1 → (X̃[k]◦)_{t′,i} = (X◦)_{t′} ),

i.e., X, LX corresponds to the sequence Xi,1, . . . , Xi,t−1. At time t ∈ [n], terminal i outputs Xi,t as a stochastic mapping of Wi,1, . . . ,Wi,n, Xi,1, . . . , Xi,t−1 and Yi,1, . . . , Yi,t−1 (conditionally independent of {Wj,t′}j∈[k],t′∈[n], {Xj,t′}j∈[k],t′<t, {Yj,t′}j∈[k],t′<t, {St′}t′≤t given those random variables) for i ∈ [k]. This is checked by

⋀_{i=1}^{k} ∀t.( 1 ≤ t ≤ n →

(X̃[k]◦)_{t,i} ⊥⊥ (W̃[k]◦)(X̃[k]◦)_{1..t−1}(Ỹ[k]◦)_{1..t−1}(S̃◦)_{1..t} | (W̃[k]◦)_{1..n,i}(X̃[k]◦)_{1..t−1,i}(Ỹ[k]◦)_{1..t−1,i} ).   (30)

The channel generates Y1,t, . . . , Yk,t, St+1 given X1,t, . . . , Xk,t, St following pY1,...,Yk,S′|X1,...,Xk,S. This can be checked by

∀t.( 1 ≤ t ≤ n →

((Ỹ[k]◦)_t, (S̃◦)_{t+1}) | ((X̃[k]◦)_t, (S̃◦)_t) ∼ (Y[k]◦, S′◦) | (X[k]◦, S◦)

∧ (Ỹ[k]◦)_t (S̃◦)_{t+1} ⊥⊥ (W̃[k]◦)(X̃[k]◦)_{1..t−1}(Ỹ[k]◦)_{1..t−1}(S̃◦)_{1..t−1} | (X̃[k]◦)_t (S̃◦)_t ),   (31)

where the first line uses the notation (X1 ◦ L1, X2 ◦ L2) in (24). At the end, terminal i outputs Zi,1, . . . , Zi,n as a stochastic mapping of Wi,1, . . . ,Wi,n, Xi,1, . . . , Xi,n and Yi,1, . . . , Yi,n.

This can be checked by

⋀_{i=1}^{k} ( (Z̄[k]◦)_{1..n,i} ⊥⊥ (W̃[k]◦)(X̃[k]◦)(Ỹ[k]◦)(S̃◦) | (W̃[k]◦)_{1..n,i}(X̃[k]◦)_{1..n,i}(Ỹ[k]◦)_{1..n,i} ).   (32)

By the coupling definition of total variation distance, we can enforce Pe ≤ P(E) for some event E using (17) and the notation in (21) by

prle( {(Z̃[k]◦) ≠ (Z̄[k]◦)}, E ).   (33)
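The coupling characterization used here is that dTV(P, Q) equals the minimum of P(A ≠ B) over all couplings (A, B) with A ∼ P and B ∼ Q, the minimum being attained by a maximal coupling. The following sketch (an illustration for finite alphabets only, not part of the proof) checks both sides of this identity numerically:

import numpy as np

def tv(p, q):
    """Total variation distance between two pmfs on the same finite alphabet."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())

def maximal_coupling(p, q, size, rng):
    """Sample pairs (A, B) with A ~ p, B ~ q and P(A != B) = dTV(p, q)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    common = np.minimum(p, q)
    c = common.sum()                                   # = 1 - dTV(p, q)
    out = np.empty((size, 2), dtype=int)
    for k in range(size):
        if rng.random() < c:
            a = rng.choice(len(p), p=common / c)       # agree on the common part
            out[k] = (a, a)
        else:
            a = rng.choice(len(p), p=(p - common) / (1 - c))
            b = rng.choice(len(q), p=(q - common) / (1 - c))
            out[k] = (a, b)                            # residuals never agree
    return out

rng = np.random.default_rng(1)
p, q = [0.6, 0.3, 0.1], [0.4, 0.4, 0.2]
pairs = maximal_coupling(p, q, 200000, rng)
print(tv(p, q), np.mean(pairs[:, 0] != pairs[:, 1]))   # both approximately 0.2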

Finally, since the network admits a communication scheme if and only if Pe ≤ P(E) is possible for any P(E) > 0, the final formula Qk is

∀E.( isev(E) ∧ E ι≠ ∅ →

∃n, W[k], LW[k], X[k], LX[k], Y[k], LY[k], Z[k], LZ[k],

W̃[k], LW̃[k], X̃[k], LX̃[k], Ỹ[k], LỸ[k], Z̃[k], LZ̃[k], Z̄[k], LZ̄[k], S̃, LS̃.

(· · · ) ),

where the "· · ·" in the last line is the conjunction of (26)–(33).

VIII. DEFINITION OF SINGLE-LETTER CHARACTERIZATION

Theorem 10 shows that the capacity regions of a large class of multiuser settings can be expressed as first-order formulae. Arguably, this single-letter characterization is against the spirit of single-letter characterizations in network information theory, considering its complexity. Theorem 10 suggests that general first-order formulae are perhaps too powerful, and focusing on first-order "single-letter" formulae is not a meaningful restriction.

As there was no generally accepted definition of single-letter characterization, Körner [44] raised the question of finding a logical theory on single-letter characterizations. After this, to the best of the author's knowledge, the only attempt in providing a definition of single-letter characterization is [43], which studies single-letter characterizations in the form of a conjunction of linear inequalities on mutual information terms, where the alphabets of all auxiliary random variables are fixed, which might be a little restrictive (refer to Remark 16). In this section, we propose some possible definitions that are (in a sense) more general than [43], yet are more restrictive than general first-order formulae. This may allow open problems regarding single-letter characterizations of capacity regions to be stated in a rigorous manner.

We first define the probabilistic independence hierarchy in a similar manner as the arithmetical hierarchy [58] and the Lévy hierarchy [67].

Definition 11 (The probabilistic independence hierarchy). A first-order formula is in the set ∆^pi_0 = Σ^pi_0 = Π^pi_0 if and only if it is logically equivalent to a quantifier-free (i.e., all variables are free) formula. We define ∆^pi_i, Σ^pi_i, Π^pi_i recursively. A first-order formula is in the set Σ^pi_{i+1} (i ≥ 0) if and only if it is logically equivalent to a formula in the form ∃U1, . . . , Uk. P(X1, . . . , Xn, U1, . . . , Uk), where P is a formula in Π^pi_i. A first-order formula is in the set Π^pi_{i+1} (i ≥ 0) if and only if it is logically equivalent to a formula in the form ∀U1, . . . , Uk. P(X1, . . . , Xn, U1, . . . , Uk), where P is a formula in Σ^pi_i. We then define ∆^pi_i = Σ^pi_i ∩ Π^pi_i.
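As a bookkeeping illustration of Definition 11 (and of the quantifier counting used later in Proposition 15), the sketch below classifies the quantifier prefix of a formula already in prenex form, represented as a list of quantifier kinds followed by a quantifier-free matrix. It only inspects the prefix; deciding whether a formula is logically equivalent to one of a given shape is of course a different, and in general much harder, matter.

def prefix_class(blocks):
    """blocks: list of quantifier kinds of a prenex formula, e.g.
    ['E', 'A', 'E'] for  exists... forall... exists... (matrix).
    Returns the lowest Sigma_i and Pi_i levels whose prefix shape the
    formula matches (the same bookkeeping applies to any of the
    hierarchies defined here); a quantifier-free formula is level 0."""
    # collapse adjacent identical quantifiers into alternation blocks
    alt = []
    for b in blocks:
        if not alt or alt[-1] != b:
            alt.append(b)
    i = len(alt)
    if i == 0:
        return ("Sigma_0", "Pi_0")       # = Delta_0
    if alt[0] == 'E':
        return ("Sigma_%d" % i, "Pi_%d" % (i + 1))
    return ("Sigma_%d" % (i + 1), "Pi_%d" % i)

# exists U1 U2 . forall V . exists W . (matrix)  ->  ('Sigma_3', 'Pi_4')
print(prefix_class(['E', 'E', 'A', 'E']))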

As a direct corollary of [14], Σ^pi_3 and Π^pi_3 are undecidable fragments of FOTPI. Finding the lowest level in the probabilistic independence hierarchy which is undecidable is left for future studies.

Proposition 12. The problem of deciding the truth value of a Σ^pi_3 formula without free variables is undecidable. The same is also true for Π^pi_3.

Proof: Note that X ι≤ Y (1) is Π^pi_1, Z ι= XY (2) (which can be extended to Z ι= X1 · · ·Xn) is Π^pi_2, X ⊥⊥ Y | Z (3) is Σ^pi_3, and card=2(X) (5) is Π^pi_2. The undecidability of Σ^pi_3 follows from [14, Cor. 3], which states that the problem of deciding whether there exist random variables X1, . . . , Xn satisfying some conditional independence constraints, where X1 is nondegenerate binary, is undecidable. The undecidability of Π^pi_3 is due to the fact that the negation of a Σ^pi_3 formula is a Π^pi_3 formula.

In FOTPI, the atomic formulae are probabilistic independence statements. In network information theory, rate regions are more often stated using linear inequalities on entropy terms. We can also define another hierarchy with linear inequalities as atomic formulae.

Definition 13 (The linear entropy hierarchy). A first-order formula is in the set ∆^H_atom if and only if it is either in the form a^T h(X1, . . . , Xn) ≥ 0 for some a ∈ Q^{2^n−1} (where h(X1, . . . , Xn) ∈ R^{2^n−1} is the entropic vector [17] of (X1, . . . , Xn)), or in the form Y′ | X′1, . . . , X′n r∼ Y | X1, . . . , Xn (see Proposition 7).⁷ A first-order formula is in the set ∆^H_0 = Σ^H_0 = Π^H_0 if and only if it is logically equivalent to a composition of ∆^H_atom formulae using logical conjunction, disjunction and negation. We define ∆^H_i, Σ^H_i, Π^H_i in a similar manner as ∆^pi_i, Σ^pi_i, Π^pi_i.
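To make the atomic formulae of Definition 13 concrete: for a finite joint pmf, the entropic vector lists H(X_S) for every nonempty S ⊆ [n], and a ∆^H_atom inequality is a rational linear functional of this vector. A minimal sketch (illustrative only) that computes the entropic vector and evaluates such an inequality:

import numpy as np
from itertools import combinations

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropic_vector(joint):
    """Return {S: H(X_S)} for all nonempty S, with joint an n-dim pmf array."""
    n = joint.ndim
    h = {}
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            other = tuple(i for i in range(n) if i not in S)
            h[S] = entropy(joint.sum(axis=other).ravel())
    return h

def atom_holds(a, joint, tol=1e-9):
    """Evaluate a Delta^H_atom formula  sum_S a[S] * H(X_S) >= 0."""
    h = entropic_vector(joint)
    return sum(a.get(S, 0) * h[S] for S in h) >= -tol

# X independent of Y expressed as H(XY) - H(X) - H(Y) >= 0
# (cf. the proof of Proposition 15)
pX = np.array([0.5, 0.5]); pY = np.array([0.25, 0.75])
indep = np.multiply.outer(pX, pY)
a = {(0, 1): 1.0, (0,): -1.0, (1,): -1.0}
print(atom_holds(a, indep))   # True: the two variables are independent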

Note that Σ^H_1 shares some similarities with [43] (refer to Remark 16). Since entropy and the r∼ relation can be defined in FOTPI, the linear entropy hierarchy is in FOTPI as well. Similar to Proposition 12, Σ^H_2 and Π^H_2 are undecidable fragments of FOTPI.

Proposition 14. The problem of deciding the truth value of a Σ^H_2 formula without free variables is undecidable. The same is also true for Π^H_2. Also, the problem of deciding the truth value of P(X), where P is a Σ^H_1 formula and X ∼ Bern(1/2), is undecidable. The same is also true for Π^H_1.

Proof: Note that card=2(X) (5) is Π^H_1. The result follows from [14, Cor. 3], which states that the problem of deciding whether there exist X1, . . . , Xn satisfying some conditional independence constraints where X1 is nondegenerate binary (this also holds with the constraint X1 ∼ Bern(1/2)) is undecidable.

Theorem 10 states that for a general class of multiuser coding settings, the capacity region can be stated as a first-order formula, and hence is in the linear entropy hierarchy.

⁷This is needed for channel coding problems to allow changing the input distribution. For example, the capacity region of the point-to-point channel pY|X is ∃X′, Y′. Y′|X′ r∼ Y|X ∧ H(M) ≤ I(X′; Y′) (where the rate is R = H(M)). This is also useful for outer bounds involving coupling, e.g. [68].

∆^H_18 : e.g. capacity region of any setting in Theorem 10

...

∆^H_3

Σ^H_2, Π^H_2 : e.g. outer bounds in [33], [34], [35], [36]

∆^H_2

Σ^H_1, Π^H_1 : e.g. degraded broadcast channel [71], [49]; multiple access channel [72], [73], [38]

∆^H_1

∆^H_0 = Σ^H_0 = Π^H_0 : e.g. Slepian-Wolf [70]

Table I: The linear entropy hierarchy and examples of capacity regions and bounds at each level.

Proposition 15. The formula for the capacity region of the joint source-channel Markov network in Theorem 10 is in ∆^H_18.

Proof: Note that X ⊥⊥ Y can be expressed as H(XY) − H(X) − H(Y) ≥ 0, so Theorem 10 is still valid when the atomic formulae are linear inequalities on entropy terms. Technically, using linear inequalities would restrict attention to random variables with finite entropy, though it can be checked that the construction in Theorem 10 does not require any random variable with infinite entropy (as long as the input random variables have finite entropy).

By tracing the construction in the proof of Theorem 10 and counting the depth of quantifier alternation, we find that the formula is in Π^H_17 ⊆ ∆^H_18.

Using the linear entropy hierarchy, we propose some possible definitions of single-letter characterization, in decreasing order of generality.

1) Any first-order formula in ⋃_{i≥0} ∆^H_i. By Theorem 10, the capacity regions of a large class of multiuser settings can be expressed as first-order formulae.

2) A first-order formula in ∆^H_i, Σ^H_i or Π^H_i for a fixed i ≥ 2. For example, we may restrict attention to Π^H_2 to allow only existential formulae and "for all, there exists" formulae (e.g. [33], [34], [35], [36]).

3) A first-order formula in Σ^H_1, i.e., an existential formula. The majority of capacity regions and bounds in network information theory [29] are Σ^H_1 formulae, where all auxiliary random variables are existentially quantified. However, restricting to Σ^H_1 does not give any provable benefit on the ease of computation of the region, since Proposition 14 shows that deciding such formulae is undecidable even for the simple input distribution X ∼ Bern(1/2). Therefore, restricting to Σ^H_1 formulae is merely a simplicity concern rather than a computability concern.

4) A first-order formula in Σ^H_1, together with cardinality bounds for all existentially-quantified random variables. Cardinality bounds on auxiliary random variables are given for many capacity regions in network information theory [29], which might allow more efficient computation. Technically, we should restrict the cardinality bounds to be computable functions of the cardinalities of the (non-auxiliary) random variables in the coding setting. Considering that the logarithm is involved in the definition of entropy, whether the cardinality bounds can make the regions computable depends on the solution to Tarski's exponential function problem [69] (also see [28], [11]).

5) A first-order formula in ∆^H_0 = Σ^H_0 = Π^H_0. This would disallow auxiliary random variables, and hence is probably too restrictive. Regions without auxiliary random variables, such as the Slepian-Wolf region [70], are in ∆^H_0.

Perhaps, instead of setting a fixed limit on which levels in the linear entropy hierarchy count as single-letter characterizations, we can find the lowest level that can express the capacity region of a given setting. Theorem 10 implies that such a level always exists. Given the linear entropy hierarchy, we can state open problems in network information theory, e.g. for the broadcast channel, as follows.

Open problem. What is the lowest level in the linear entropy hierarchy ∆^H_i, Σ^H_i, Π^H_i such that the capacity region of the broadcast channel pY,Z|X can be expressed as a first-order formula (with free variables X, Y, Z, W1, W2, where Wi represents the rate Ri via Ri = H(Wi)) in that level?

Theorem 10 implies that the capacity region is expressible in ∆^H_18 (and hence in Σ^H_18 and Π^H_18). The remaining question is to find the lowest possible level. If the conjecture that the 3-auxiliary Marton's inner bound [74], [75], [76] is optimal is correct, then the lowest level would be Σ^H_1. Some outer bounds for the broadcast channel are the UV outer bound [77], [78] (Σ^H_1 formula, suboptimal) and the J version of the UV outer bound [36] (Π^H_2 formula, optimality unknown).

We can also raise the same question for the interference channel [38]. The Han-Kobayashi inner bound [79] (Σ^H_1 formula) was shown to be suboptimal by Nair, Xia and Yazdanpanah [80]. To the best of the author's knowledge, there is no candidate single-letter characterization of the capacity region that is conjectured to be optimal. Some outer bounds for the interference channel are given in [81] (Σ^H_1 formula), [82] (Σ^H_1 formula), [83] (Σ^H_1 formula), and [36] (Π^H_2 formula). Perhaps the reason we are unable to give a Σ^H_1 candidate for the capacity region is that the actual lowest possible level is Π^H_2 (or higher).

Remark 16. In [43], two slightly different definitions of single-letter characterizations are given (specialized to the case where there is only one rate R):

• [43, eqn (1)]: A formula which is the conjunction of inequalities in the form βR + ∑_i αi I(U_{Ai}; U_{Bi} | U_{Ci}) ≤ 0 (where U_{Ai} = {Ua}a∈Ai, Ai, Bi, Ci ⊆ [n], β, αi ∈ R are computable; note that β can be nonpositive) and polynomial constraints on the joint probability mass function.

• [43, eqn (3)]: A formula which is the conjunction of inequalities in the form R ≤ ∑_i αi I(U_{Ai}; U_{Bi} | U_{Ci}) and polynomial constraints on the joint probability mass function.

The proof of the non-existence of a single-letter characterization for the Markov channel in [43] is performed on the second definition, which is more restrictive than the first definition. While [43] shows that the optimal rate in the second definition can be computed up to arbitrary precision, whether this also holds for the first definition is not entirely clear (it may depend on the solution to Tarski's exponential function problem [69]).

The first definition [43, eqn (1)] is closer to Σ^H_1 in this paper. Still, there are some important differences. In [43], the alphabets of all random variables are fixed, and general polynomial constraints on the joint probability mass function are allowed. Neither is the case for Σ^H_1. The fixed alphabet assumption is crucial to the proof of the non-existence of a single-letter characterization in [43] (since fixing the alphabet transforms the problem into an existential problem over the real numbers, namely the entries of the joint probability mass function). On the other hand, unlike [43], Σ^H_1 allows logical negation and disjunction, making the set of rate regions expressible in Σ^H_1 closed under union. It is unclear whether the Markov network in [43], or the joint source-channel Markov network in this paper, has a capacity region expressible in Σ^H_1.

IX. EXTENSION TO CONTINUOUS RANDOM VARIABLES

In the previous sections, we assumed all random variables are discrete. In this section, we consider an extension to general random variables (in the standard probability space). It is unclear a priori whether this extension will increase or decrease the interpretability strength. For example, the true first-order theory of natural numbers is undecidable, whereas the true first-order theory of real numbers is decidable [69], [84]. Hence, it is possible that a theory in a continuous setting is not stronger than its discrete counterpart. Nevertheless, we will show that the condition that a random variable is discrete can be defined using a first-order formula, and hence the first-order theory of general random variables is at least as expressive as that of discrete random variables.

Previously, we assumed all random variables are defined on the same standard probability space ([0, 1], F, P). This is possible since the standard probability space is rich enough to allow defining any new discrete random variable in addition to an existing finite collection of discrete random variables. This is no longer possible for continuous random variables, since one random variable (e.g. the random variable defined by the identity map) may already exhaust the randomness in the space, and it is impossible to define any nondegenerate random variable independent of it. Therefore, we have to assign slightly different semantics to logical symbols, in order to allow extension of the probability space when new random variables are introduced.

We define the first-order theory of general random variables. Assume all random variables take values in [0, 1]. The predicate ψ(X1, . . . , Xn) should be interpreted as a predicate on the joint probability distribution PX1,...,Xn over [0, 1]^n, i.e., ψ(X1, . . . , Xn) is a shorthand for ψ(PX1,...,Xn). For a first-order predicate ψ(X1, . . . , Xn, Y) on PX1,...,Xn,Y, the formula ∃Y.ψ(X1, . . . , Xn, Y) (which is a predicate on PX1,...,Xn) means ∃PX1,...,Xn,Y.((PX1,...,Xn,Y)_{X1,...,Xn} = PX1,...,Xn ∧ ψ(PX1,...,Xn,Y)), that is, there exists a joint distribution PX1,...,Xn,Y over [0, 1]^{n+1} such that its (X1, . . . , Xn)-marginal (PX1,...,Xn,Y)_{X1,...,Xn} is PX1,...,Xn, and ψ(PX1,...,Xn,Y) holds. We define ∀Y.ψ(X1, . . . , Xn, Y) similarly. The logical conjunction

ψ1(X1, . . . , Xl, Y1, . . . , Ym) ∧ ψ2(X1, . . . , Xl, Z1, . . . , Zn)

is interpreted as

ψ1((PX1,...,Xl,Y1,...,Ym,Z1,...,Zn)_{X1,...,Xl,Y1,...,Ym}) ∧ ψ2((PX1,...,Xl,Y1,...,Ym,Z1,...,Zn)_{X1,...,Xl,Z1,...,Zn}),

which is a predicate on PX1,...,Xl,Y1,...,Ym,Z1,...,Zn. Logical disjunction is defined similarly. We can therefore define any first-order formula recursively (the base cases are ∃Y.ψ(Y) and ∀Y.ψ(Y) without any free variable, which are interpreted as ∃PY.ψ(PY) and ∀PY.ψ(PY) respectively).

We now check that some of the formulae in the previous sections still hold in the first-order theory of general random variables. The condition U ι= ∅ (i.e., U is almost surely a constant, or PU is a degenerate distribution) can be checked using the same formula U ι= ∅ ⇔ U ⊥⊥ U. For general random variables, X ι≤ Y means that there exists a measurable function f : [0, 1] → [0, 1] such that X = f(Y) with probability 1. We can check that (1),

X ι≤ Y ⇔ ∀U.( U ⊥⊥ Y → U ⊥⊥ X ),

is still valid for general random variables. It is straightforward to check that X ι≤ Y ⇒ ∀U.(U ⊥⊥ Y → U ⊥⊥ X). For the other direction, assume that X ι≤ Y does not hold. Then there exists a measurable A ⊆ [0, 1] such that 1A(X) is not almost surely a function of Y (i.e., P(P(X ∈ A | Y) ∈ {0, 1}) < 1). Note that 0 < P(X ∈ A) < 1. Let

U | (X = x, Y = y) ∼ Unif[0, P(X ∈ A | Y = y)]   if x ∈ A,

U | (X = x, Y = y) ∼ Unif[P(X ∈ A | Y = y), 1]   if x ∉ A.

It is straightforward to check that U ∼ Unif[0, 1] is independent of Y. Suppose, for the sake of contradiction, that U is independent of X. Then

1/2 = E[U | X ∈ A] = E[ (1/2) P(X ∈ A | Y) | X ∈ A ],

which implies P(X ∈ A | Y) = 1 given X ∈ A with probability 1. Also,

1/2 = E[U | X ∉ A] = E[ 1/2 + (1/2) P(X ∈ A | Y) | X ∉ A ],

which implies P(X ∈ A | Y) = 0 given X ∉ A with probability 1. Therefore, P(P(X ∈ A | Y) ∈ {0, 1}) = 1, contradicting the assumption that 1A(X) is not almost surely a function of Y.
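The construction of U above can also be checked by simulation. The sketch below (a numerical illustration under an assumed toy joint law of (X, Y), not part of the proof) draws U from the stated conditional distribution and confirms empirically that U is close to Unif[0, 1] and independent of Y, yet visibly dependent on X:

import numpy as np

rng = np.random.default_rng(2)
N = 200000

# toy joint law: Y ~ Bern(1/2), X = Y with prob 0.7 and X = 1 - Y otherwise;
# take A = {1}, so P(X in A | Y = y) is 0.7 if y = 1 and 0.3 if y = 0.
Y = rng.integers(0, 2, N)
flip = rng.random(N) < 0.7
X = np.where(flip, Y, 1 - Y)
pA_given_Y = np.where(Y == 1, 0.7, 0.3)

# U | (X, Y): Unif[0, P(X in A | Y)] if X in A, Unif[P(X in A | Y), 1] otherwise
V = rng.random(N)
U = np.where(X == 1, V * pA_given_Y, pA_given_Y + V * (1 - pA_given_Y))

print(U.mean(), U[Y == 0].mean(), U[Y == 1].mean())   # all close to 1/2
print(U[X == 0].mean(), U[X == 1].mean())             # visibly different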

It is straightforward to check that the formula (4) for card≤n(X) (which checks whether there exists S ⊆ [0, 1] with |S| ≤ n and P(X ∈ S) = 1) still holds, and the formula (11) for the single-mass indicator still holds. Therefore, we can check whether the distribution of X is atomless (i.e., P(X = x) = 0 for any x) by

atomless(X) := ¬∃U.smi(X,U).
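For intuition (an illustration only, unrelated to the formula above): samples from an atomless law are almost surely pairwise distinct, so for a sampled data set, exact repeats reveal atoms. A short sketch under an assumed mixed law:

import numpy as np

rng = np.random.default_rng(3)
N = 100000
# mixed law: an atom at 0.5 with mass 0.4, plus 0.6 * Unif[0, 1]
X = np.where(rng.random(N) < 0.4, 0.5, rng.random(N))

vals, counts = np.unique(X, return_counts=True)
atoms = vals[counts > 1]          # values hit more than once are (a.s.) atoms
print(atoms, counts.max() / N)    # [0.5], roughly 0.4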

Finally, we define the formula for discrete random variables. Since any probability distribution can be decomposed into a mixture of a discrete distribution and an atomless distribution, checking whether X is a discrete random variable is equivalent to checking that it does not contain an atomless component, i.e., a measurable set S ⊆ [0, 1] such that P(X ∈ S) > 0 and P(X = x) = 0 for any x ∈ S. This can be checked by the formula

discrete(X) := ¬∃V, W.( VW ι≤ X ∧ card=2(V) ∧ card=2(W) ∧ card=3(VW)

∧ ¬∃U.(smi(X, U) ∧ smi(VWU, V) ∧ smi(VWU, W)) ).   (34)

To check this, we first show that if there exist V, W satisfying the above constraints, then X has an atomless component. Since card=3(VW), we may assume (V,W) ∈ {(0, 0), (1, 0), (0, 1)}. Let S be the set of values of X corresponding to (V,W) = (0, 0). Assume on the contrary that S is not an atomless component. Then there exists x ∈ S such that P(X = x) > 0. Let U = 1{X = x}. It is straightforward to check that U satisfies the constraints in (34), leading to a contradiction.

We then show that if X has an atomless component, then there exist V, W satisfying (34). Let S ⊆ [0, 1] be such that P(X ∈ S) > 0 and P(X = x) = 0 for any x ∈ S. Divide S into three disjoint sets S0, S1, S2 with positive probabilities. Let S0,0 := S0, S0,1 := S1, S1,0 := [0, 1]\(S0,0 ∪ S0,1). Let (V,W) = (0, 0) if X ∈ S0,0, (V,W) = (0, 1) if X ∈ S0,1, and (V,W) = (1, 0) if X ∈ S1,0. Assume on the contrary that there exists U satisfying the constraints in (34). Since smi(X, U), we may assume U = 1{x0}(X) for some x0 with P(X = x0) > 0. Since smi(VWU, V), U is degenerate conditional on (V,W) = (1, 0), and hence x0 ∉ S1,0 (note that S1,0 ≠ {x0} since S2 ⊆ S1,0 is atomless). Since smi(VWU, W), U is degenerate conditional on (V,W) = (0, 1), and hence x0 ∉ S0,1. Hence x0 ∈ S0,0 ⊆ S, leading to a contradiction.

As a result, the first-order theory of general random variables is at least as expressive as the first-order theory of discrete random variables (if we impose discrete(X) on each variable X in the theory of general random variables, it reduces to the theory of discrete random variables). Hence, the first-order theory of general random variables is also algorithmically undecidable.

We remark that it is unclear whether Theorem 10 holds for channels with continuous input or output alphabets. The above argument only shows that the capacity of discrete channels can still be characterized using a first-order formula with general random variables via the construction in Theorem 10.

X. ACKNOWLEDGEMENT

The author acknowledges support from the Direct Grant for Research, The Chinese University of Hong Kong (Project ID: 4055133). The author would like to thank Chandra Nair and Raymond W. Yeung for their invaluable comments.

REFERENCES

[1] D. Geiger, A. Paz, and J. Pearl, “Axioms and algorithms for inferences involving probabilistic independence,” Information and Computation, vol. 91, no. 1, pp. 128–141, 1991.
[2] F. Matúš, “Stochastic independence, algebraic independence and abstract connectedness,” Theoretical Computer Science, vol. 134, no. 2, pp. 455–471, 1994.
[3] A. P. Dawid, “Conditional independence in statistical theory,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 41, no. 1, pp. 1–15, 1979.
[4] W. Spohn, “Stochastic independence, causal independence, and shieldability,” Journal of Philosophical Logic, vol. 9, no. 1, pp. 73–99, 1980.
[5] M. Mouchart and J.-M. Rolin, “A note on conditional independence,” Statistica, vol. 44, p. 557, 1984.
[6] J. Pearl and A. Paz, “Graphoids: a graph-based logic for reasoning about relevance relations,” Advances in Artificial Intelligence, pp. 357–363, 1987.
[7] M. Studený, “Multiinformation and the problem of characterization of conditional independence relations,” Problems of Control and Information Theory, no. 18, pp. 3–16, 1989.
[8] ——, “Conditional independence relations have no finite complete characterization,” Information Theory, Statistical Decision Functions and Random Processes, pp. 377–396, 1992.
[9] F. M. Malvestuto, “A unique formal system for binary decompositions of database relations, probability distributions, and graphs,” Information Sciences, vol. 59, no. 1-2, pp. 21–52, 1992.
[10] D. Geiger and J. Pearl, “Logical and algorithmic properties of conditional independence and graphical models,” The Annals of Statistics, pp. 2001–2021, 1993.
[11] M. A. Khamis, P. G. Kolaitis, H. Q. Ngo, and D. Suciu, “Decision problems in information theory,” arXiv preprint arXiv:2004.08783, 2020.
[12] M. Niepert, “Logical inference algorithms and matrix representations for probabilistic conditional independence,” arXiv preprint arXiv:1205.2621, 2012.
[13] M. Hannula, Å. Hirvonen, J. Kontinen, V. Kulikov, and J. Virtema, “Facets of distribution identities in probabilistic team semantics,” in European Conference on Logics in Artificial Intelligence. Springer, 2019, pp. 304–320.
[14] C. T. Li, “The undecidability of conditional affine information inequalities and conditional independence implication with a binary constraint,” arXiv preprint arXiv:2104.05634, 2021.
[15] R. W. Yeung, “A framework for linear information inequalities,” IEEE Trans. Inf. Theory, vol. 43, no. 6, pp. 1924–1934, 1997.
[16] N. Pippenger, “What are the laws of information theory,” in 1986 Special Problems on Communication and Computation Conference, 1986, pp. 3–5.
[17] Z. Zhang and R. W. Yeung, “A non-Shannon-type conditional inequality of information quantities,” IEEE Trans. Inf. Theory, vol. 43, no. 6, pp. 1982–1986, 1997.
[18] ——, “On characterization of entropy function via information inequalities,” IEEE Trans. Inf. Theory, vol. 44, no. 4, pp. 1440–1452, 1998.
[19] K. Makarychev, Y. Makarychev, A. Romashchenko, and N. Vereshchagin, “A new class of non-Shannon-type inequalities for entropies,” Communications in Information and Systems, vol. 2, no. 2, pp. 147–166, 2002.
[20] R. Dougherty, C. Freiling, and K. Zeger, “Six new non-Shannon information inequalities,” in 2006 IEEE ISIT. IEEE, 2006, pp. 233–236.
[21] F. Matúš, “Infinitely many information inequalities,” in 2007 IEEE ISIT. IEEE, 2007, pp. 41–44.
[22] W. Xu, J. Wang, and J. Sun, “A projection method for derivation of non-Shannon-type information inequalities,” in 2008 IEEE ISIT. IEEE, 2008, pp. 2116–2120.
[23] R. Dougherty, C. Freiling, and K. Zeger, “Non-Shannon information inequalities in four random variables,” arXiv preprint arXiv:1104.3602, 2011.
[24] R. W. Yeung, Information Theory and Network Coding. Springer Science & Business Media, 2008.
[25] T. Chan and A. Grant, “Dualities between entropy functions and network codes,” IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4470–4487, 2008.
[26] X. Yan, R. W. Yeung, and Z. Zhang, “An implicit characterization of the achievable rate region for acyclic multisource multisink network coding,” IEEE Trans. Inf. Theory, vol. 58, no. 9, pp. 5625–5639, 2012.
[27] R. Dougherty, “Is network coding undecidable?” in Applications of Matroid Theory and Combinatorial Optimization to Information and Coding Theory, 2009.
[28] A. Gómez, C. Mejía, and J. A. Montoya, “Network coding and the model theory of linear information inequalities,” in 2014 International Symposium on Network Coding (NetCod). IEEE, 2014, pp. 1–6.
[29] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[30] I. Csiszar and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.
[31] C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,” IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 6967–6978, Nov 2018.
[32] C. T. Li, “An automated theorem proving framework for information-theoretic results,” arXiv preprint arXiv:2101.12370, 2021.
[33] A. B. Wagner and V. Anantharam, “An improved outer bound for multiterminal source coding,” IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 1919–1937, 2008.
[34] A. A. Gohari and V. Anantharam, “Information-theoretic key agreement of multiple terminals – Part I,” IEEE Trans. Inf. Theory, vol. 56, no. 8, pp. 3973–3996, 2010.
[35] L. Yu, H. Li, and W. Li, “Distortion bounds for source broadcast problems,” IEEE Trans. Inf. Theory, vol. 64, no. 9, pp. 6034–6053, 2018.
[36] A. Gohari and C. Nair, “Outer bounds for multiuser settings: The auxiliary receiver approach,” 2020. [Online]. Available: http://chandra.ie.cuhk.edu.hk/pub/papers/NIT/Auxiliary-Receiver.pdf
[37] T. Cover, “Broadcast channels,” IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 2–14, 1972.
[38] R. Ahlswede, “The capacity region of a channel with two senders and two receivers,” The Annals of Probability, vol. 2, no. 5, pp. 805–814, 1974.
[39] E. C. Van Der Meulen, “Three-terminal communication channels,” Advances in Applied Probability, vol. 3, no. 1, pp. 120–154, 1971.
[40] A. J. Goldsmith and P. P. Varaiya, “Capacity, mutual information, and coding for finite-state Markov channels,” IEEE Transactions on Information Theory, vol. 42, no. 3, pp. 868–886, 1996.
[41] D. Elkouss and D. Pérez-García, “Memory effects can make the transmission capability of a communication channel uncomputable,” Nature Communications, vol. 9, no. 1, pp. 1–5, 2018.
[42] H. Boche, R. F. Schaefer, and H. V. Poor, “Shannon meets Turing: Non-computability and non-approximability of the finite state channel capacity,” arXiv preprint arXiv:2008.13270, 2020.
[43] M. Agarwal, “Non-existence of certain kind of finite-letter mutual information characterization for a class of time-invariant Markoff channels,” arXiv preprint arXiv:1804.05977, 2018.
[44] J. Körner, “The concept of single-letterization in information theory,” in Open Problems in Communication and Computation. Springer, 1987, pp. 35–36.
[45] S. Rini and A. Goldsmith, “A general approach to random coding for multi-terminal networks,” in 2013 Information Theory and Applications Workshop (ITA). IEEE, 2013, pp. 1–9.
[46] P. Minero, S. H. Lim, and Y.-H. Kim, “A unified approach to hybrid coding,” IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1509–1523, 2015.
[47] S.-H. Lee and S.-Y. Chung, “A unified approach for network information theory,” in 2015 IEEE ISIT. IEEE, 2015, pp. 1277–1281.
[48] ——, “A unified random coding bound,” IEEE Transactions on Information Theory, vol. 64, no. 10, pp. 6779–6802, 2018.
[49] R. G. Gallager, “Capacity and coding for degraded broadcast channels,” Problemy Peredachi Informatsii, vol. 10, no. 3, pp. 3–14, 1974.
[50] I. Córdoba-Sánchez, C. Bielza, and P. Larranaga, “Graphoids and separoids in model theory,” Technical Report TR: UPM-ETSIINF/DIA/2016-1, Universidad Politécnica de Madrid, Tech. Rep., 2016.
[51] B. Hassibi and S. Shadbakht, “Normalized entropy vectors, network information theory and convex optimization,” in 2007 IEEE Information Theory Workshop on Information Theory for Wireless Networks. IEEE, 2007, pp. 1–5.
[52] N. J. Nilsson, “Probabilistic logic,” Artificial Intelligence, vol. 28, no. 1, pp. 71–87, 1986.
[53] D. Koller and J. Y. Halpern, “Irrelevance and conditioning in first-order probabilistic logic,” in AAAI/IAAI, Vol. 1. Citeseer, 1996, pp. 569–576.
[54] F. G. Cozman, C. P. de Campos, and J. C. F. da Rocha, “Probabilistic logic with independence,” International Journal of Approximate Reasoning, vol. 49, no. 1, pp. 3–17, 2008.
[55] H. Li and E. K. Chong, “On a connection between information and group lattices,” Entropy, vol. 13, no. 3, pp. 683–708, 2011.
[56] G. S. Boolos, J. P. Burgess, and R. C. Jeffrey, Computability and Logic. Cambridge University Press, 2002.
[57] A. Tarski, Pojecie prawdy w jezykach nauk dedukcyjnych. Nakł. Tow. Naukowego Warszawskiego, 1933, no. 34.
[58] H. Rogers, Theory of Recursive Functions and Effective Computability. McGraw-Hill (New York, NY), 1967.
[59] Z. Páles, “On the essential union and intersection of families of measurable sets,” Annales Univ. Sci. Budapest., Sect. Comp., vol. 51, pp. 173–177, 2020.
[60] K. Gödel, “Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I,” Monatshefte für Mathematik und Physik, vol. 38, no. 1, pp. 173–198, 1931.
[61] ——, “On undecidable propositions of formal mathematical systems, mimeographed lecture notes by S. C. Kleene and J. B. Rosser,” Institute for Advanced Study, Princeton, NJ, pp. 39–74, 1934.
[62] P. Smith, An Introduction to Gödel’s Theorems. Cambridge University Press, 2013.
[63] M. R. Aref, “Information flow in relay networks,” Ph.D. dissertation, Stanford University, 1981.
[64] A. El Gamal, “On information flow in relay networks,” in NTC’81; National Telecommunications Conference, Volume 2, vol. 2, 1981, pp. D4–1.
[65] S. H. Lim, Y.-H. Kim, A. El Gamal, and S.-Y. Chung, “Noisy network coding,” IEEE Transactions on Information Theory, vol. 57, no. 5, pp. 3132–3152, 2011.
[66] P. Cuff, H. Permuter, and T. M. Cover, “Coordination capacity,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4181–4206, Sept 2010.
[67] A. Lévy, A Hierarchy of Formulas in Set Theory. American Mathematical Soc., 1965, no. 57.
[68] H. Sato, “An outer bound to the capacity region of broadcast channels (Corresp.),” IEEE Transactions on Information Theory, vol. 24, no. 3, pp. 374–377, 1978.
[69] A. Tarski, “A decision method for elementary algebra and geometry,” RAND Corporation, Santa Monica, CA, 1948.
[70] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inf. Theory, vol. 19, no. 4, pp. 471–480, Jul. 1973.
[71] P. Bergmans, “Random coding theorem for broadcast channels with degraded components,” IEEE Trans. Inf. Theory, vol. 19, no. 2, pp. 197–207, 1973.
[72] R. Ahlswede, “Multi-way communication channels,” in 2nd Int. Symp. Inform. Theory, Tsahkadsor, Armenian SSR, 1971, pp. 23–52.
[73] H. Liao, “Multiple access channels,” Ph.D. dissertation, University of Hawaii, Honolulu, HI, 1972.
[74] K. Marton, “A coding theorem for the discrete memoryless broadcast channel,” IEEE Trans. Inf. Theory, vol. 25, no. 3, pp. 306–311, May 1979.
[75] S. I. Gel’fand and M. S. Pinsker, “Capacity of a broadcast channel with one deterministic component,” Problemy Peredachi Informatsii, vol. 16, no. 1, pp. 24–34, 1980.
[76] Y. Liang and G. Kramer, “Rate regions for relay broadcast channels,” IEEE Trans. Inf. Theory, vol. 53, no. 10, pp. 3517–3535, Oct 2007.
[77] A. El Gamal, “The capacity of a class of broadcast channels,” IEEE Trans. Inf. Theory, vol. 25, no. 2, pp. 166–169, 1979.
[78] C. Nair and A. El Gamal, “An outer bound to the capacity region of the broadcast channel,” IEEE Trans. Inf. Theory, vol. 53, no. 1, pp. 350–355, 2006.
[79] T. S. Han and K. Kobayashi, “A new achievable rate region for the interference channel,” IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 49–60, 1981.
[80] C. Nair, L. Xia, and M. Yazdanpanah, “Sub-optimality of Han-Kobayashi achievable region for interference channels,” in 2015 IEEE International Symposium on Information Theory (ISIT). IEEE, 2015, pp. 2416–2420.
[81] H. Sato, “Two-user communication channels,” IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 295–304, 1977.
[82] A. Carleial, “Outer bounds on the capacity of interference channels (Corresp.),” IEEE Transactions on Information Theory, vol. 29, no. 4, pp. 602–606, 1983.
[83] R. H. Etkin and E. Ordentlich, “Analysis of deterministic binary interference channels via a general outer bound,” IEEE Transactions on Information Theory, vol. 57, no. 5, pp. 2597–2604, 2011.
[84] A. Seidenberg, “A new decision method for elementary algebra,” Annals of Mathematics, pp. 365–374, 1954.