18.600: Lecture 23
Conditional probability, order statistics, expectations of sums
Scott Sheffield
MIT
Outline
Conditional probability densities
Order statistics
Expectations of sums
Conditional distributions
- Let’s say $X$ and $Y$ have joint probability density function $f(x, y)$.
- We can define the conditional probability density of $X$ given that $Y = y$ by $f_{X|Y=y}(x) = \frac{f(x, y)}{f_Y(y)}$.
- This amounts to restricting $f(x, y)$ to the line corresponding to the given $y$ value (and dividing by the constant that makes the integral along that line equal to 1).
- This definition assumes that $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx < \infty$ and $f_Y(y) \neq 0$. Is that safe to assume?
- Usually...
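
As a rough illustration of the definition above, the sketch below normalizes one horizontal slice of a joint density. The density $f(x, y) = x + y$ on the unit square is an assumption chosen for this example and does not come from the lecture; the function names are likewise hypothetical.

```python
def joint_density(x, y):
    """Assumed example density f(x, y) = x + y on the unit square (zero elsewhere)."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0


def marginal_y(y, steps=2_000):
    """Midpoint-rule approximation of f_Y(y), the integral of f(x, y) over x."""
    h = 1.0 / steps
    return sum(joint_density((i + 0.5) * h, y) for i in range(steps)) * h


def conditional_density_given(y, steps=2_000):
    """Return the function x -> f(x, y) / f_Y(y); requires f_Y(y) > 0."""
    fy = marginal_y(y, steps)
    if fy <= 0.0:
        raise ValueError("f_Y(y) is zero, so the conditional density is undefined")
    return lambda x: joint_density(x, y) / fy


y0 = 0.25
f_y0 = marginal_y(y0)                    # analytically 1/2 + y0
cond = conditional_density_given(y0)

# The slice, once divided by f_Y(y0), should integrate to 1 along the line Y = y0.
steps = 2_000
h = 1.0 / steps
slice_integral = sum(cond((i + 0.5) * h) for i in range(steps)) * h
print(f_y0, slice_integral)
```

The guard on `fy` mirrors the slide's caveat: the quotient only makes sense where $f_Y(y)$ is finite and nonzero.
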
Remarks: conditioning on a probability zero event
- Our standard definition of conditional probability is $P(A|B) = P(AB)/P(B)$.
- Doesn’t make sense if $P(B) = 0$. But previous slide defines “probability conditioned on $Y = y$” and $P\{Y = y\} = 0$.
- When can we (somehow) make sense of conditioning on a probability zero event?
- Tough question in general.
- Consider conditional law of $X$ given that $Y \in (y - \epsilon, y + \epsilon)$. If this has a limit as $\epsilon \to 0$, we can call that the law conditioned on $Y = y$.
- Precisely, define $F_{X|Y=y}(a) := \lim_{\epsilon \to 0} P\{X \leq a \mid Y \in (y - \epsilon, y + \epsilon)\}$.
- Then set $f_{X|Y=y}(a) = F'_{X|Y=y}(a)$. Consistent with definition from previous slide.
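
The limiting procedure can be checked numerically on a concrete case. The sketch below assumes the density $f(x, y) = 1.2\,(x + y^2)$ on the unit square (an illustrative choice, not taken from the lecture), computes $P\{X \leq a \mid Y \in (y - \epsilon, y + \epsilon)\}$ with the ordinary $P(AB)/P(B)$ rule for shrinking $\epsilon$, and compares it with the value obtained by integrating $f(x, y)/f_Y(y)$ over $x \in [0, a]$.

```python
def joint_density(x, y):
    """Assumed example density f(x, y) = 1.2 * (x + y**2) on the unit square."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return 1.2 * (x + y * y)
    return 0.0


def box_probability(x_lo, x_hi, y_lo, y_hi, steps=300):
    """Midpoint-rule approximation of P{x_lo < X < x_hi, y_lo < Y < y_hi}."""
    hx = (x_hi - x_lo) / steps
    hy = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        x = x_lo + (i + 0.5) * hx
        for j in range(steps):
            total += joint_density(x, y_lo + (j + 0.5) * hy)
    return total * hx * hy


def window_cdf(a, y, eps):
    """P{X <= a | Y in (y - eps, y + eps)} computed as P(AB) / P(B)."""
    numerator = box_probability(0.0, a, y - eps, y + eps)
    denominator = box_probability(0.0, 1.0, y - eps, y + eps)
    return numerator / denominator


def limiting_cdf(a, y):
    """Integral of f(x, y) / f_Y(y) over x in [0, a], worked out by hand for this f."""
    return (a * a + 2.0 * a * y * y) / (1.0 + 2.0 * y * y)


a, y = 0.5, 0.5
for eps in (0.4, 0.1, 0.01):
    print(eps, window_cdf(a, y, eps))
print("limit", limiting_cdf(a, y))
```

For this particular density the windowed values approach the limit as $\epsilon$ shrinks, which is the consistency the last bullet refers to.
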
A word of caution
- Suppose $X$ and $Y$ are chosen uniformly on the semicircle $\{(x, y) : x^2 + y^2 \leq 1,\ x \geq 0\}$. What is $f_{X|Y=0}(x)$?
- Answer: $f_{X|Y=0}(x) = 1$ if $x \in [0, 1]$ (zero otherwise).
- Let $(\theta, R)$ be $(X, Y)$ in polar coordinates. What is $f_{X|\theta=0}(x)$?
- Answer: $f_{X|\theta=0}(x) = 2x$ if $x \in [0, 1]$ (zero otherwise).
- Both $\{\theta = 0\}$ and $\{Y = 0\}$ describe the same probability zero event. But our interpretation of what it means to condition on this event is different in these two cases.
- Conditioning on $(X, Y)$ belonging to a $\theta \in (-\epsilon, \epsilon)$ wedge is very different from conditioning on $(X, Y)$ belonging to a $Y \in (-\epsilon, \epsilon)$ strip.
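
A small Monte Carlo sketch can make the strip-versus-wedge difference visible. The sample size, the value of `eps`, and the seed below are arbitrary choices for illustration. Under the density 1 on $[0, 1]$ the mean of $X$ is $1/2$, while under the density $2x$ it is $2/3$, so the two conditional means should differ noticeably.

```python
import math
import random


def sample_half_disk(rng):
    """Rejection-sample a uniform point from {x^2 + y^2 <= 1, x >= 0}."""
    while True:
        x = rng.uniform(0.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return x, y


def conditional_means(num_points=200_000, eps=0.05, seed=0):
    """Average X over points in the strip |Y| < eps and in the wedge |theta| < eps."""
    rng = random.Random(seed)
    strip, wedge = [], []
    for _ in range(num_points):
        x, y = sample_half_disk(rng)
        if abs(y) < eps:
            strip.append(x)
        if abs(math.atan2(y, x)) < eps:
            wedge.append(x)
    return sum(strip) / len(strip), sum(wedge) / len(wedge)


strip_mean, wedge_mean = conditional_means()
print("strip mean of X:", strip_mean)   # expected to be near 1/2
print("wedge mean of X:", wedge_mean)   # expected to be near 2/3
```

The wedge gives more weight to points far from the origin because it is wider there, which is where the factor $x$ in $2x$ comes from.
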
Maxima: pick five job candidates at random, choose best
- Suppose I choose $n$ random variables $X_1, X_2, \ldots, X_n$ uniformly at random on $[0, 1]$, independently of each other.
- The $n$-tuple $(X_1, X_2, \ldots, X_n)$ has a constant density function on the $n$-dimensional cube $[0, 1]^n$.
- What is the probability that the largest of the $X_i$ is less than $a$?
- Answer: $a^n$ (for $a \in [0, 1]$).
- So if $X = \max\{X_1, \ldots, X_n\}$, then what is the probability density function of $X$?
- Answer: $F_X(a) = \begin{cases} 0 & a < 0 \\ a^n & a \in [0, 1] \\ 1 & a > 1 \end{cases}$. And $f_X(a) = F'_X(a) = n a^{n-1}$ for $a \in [0, 1]$ (zero otherwise).
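
The formula $P\{\max_i X_i < a\} = a^n$ is easy to spot-check by simulation. The sketch below uses $n = 5$ (matching the "five job candidates" in the slide title); the trial count, seed, and test point $a = 0.8$ are arbitrary assumptions.

```python
import random


def empirical_max_cdf(a, n=5, trials=100_000, seed=0):
    """Fraction of trials in which the largest of n independent uniforms is below a."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if max(rng.random() for _ in range(n)) < a
    )
    return hits / trials


a, n = 0.8, 5
estimate = empirical_max_cdf(a, n)
exact = a ** n                      # the slide's formula F_X(a) = a^n
density = n * a ** (n - 1)          # the slide's formula f_X(a) = n a^(n-1)
print(estimate, exact, density)
```

The estimate and the exact value should agree to within ordinary sampling noise.
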
General order statistics
- Consider i.i.d. random variables $X_1, X_2, \ldots, X_n$ with continuous probability density $f$.
- Let $Y_1 < Y_2 < Y_3 < \ldots < Y_n$ be the list obtained by sorting the $X_j$.
- In particular, $Y_1 = \min\{X_1, \ldots, X_n\}$ is the minimum and $Y_n = \max\{X_1, \ldots, X_n\}$ is the maximum.
- What is the joint probability density of the $Y_i$?
- Answer: $f(x_1, x_2, \ldots, x_n) = n! \prod_{i=1}^{n} f(x_i)$ if $x_1 < x_2 < \ldots < x_n$, zero otherwise.
- Let $\sigma : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ be the permutation such that $X_j = Y_{\sigma(j)}$.
- Are $\sigma$ and the vector $(Y_1, \ldots, Y_n)$ independent of each other?
- Yes.
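
The independence of $\sigma$ and $(Y_1, \ldots, Y_n)$ can be spot-checked, though not proved, by simulation. The sketch below assumes uniform samples with $n = 3$ and looks at a single pair of events: "$X_1$ is the smallest" (a statement about $\sigma$) and "the maximum exceeds 0.9" (a statement about the sorted vector). If the two are independent, the first should have probability about $1/3$ whether or not we condition on the second. The threshold, trial count, and seed are arbitrary.

```python
import random


def rank_independence_check(n=3, trials=60_000, threshold=0.9, seed=0):
    """Estimate P{X_1 is the minimum} overall and given that the maximum > threshold."""
    rng = random.Random(seed)
    first_is_min = 0
    cond_count = 0
    cond_first_is_min = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = sorted(xs)                      # the order statistics Y_1 < ... < Y_n
        is_min = xs[0] == ys[0]              # event about the permutation sigma
        first_is_min += is_min
        if ys[-1] > threshold:               # event about the sorted values only
            cond_count += 1
            cond_first_is_min += is_min
    return first_is_min / trials, cond_first_is_min / cond_count


unconditional, conditional = rank_independence_check()
print(unconditional, conditional)            # both expected to be near 1/3
```
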
Example
- Let $X_1, \ldots, X_n$ be i.i.d. uniform random variables on $[0, 1]$.
- Example: say $n = 10$ and condition on $X_1$ being the third largest of the $X_j$.
- Given this, what is the conditional probability density function for $X_1$?
- Write $p = X_1$. This is kind of like choosing a random $p$ and then conditioning on 7 heads and 2 tails.
- Answer is beta distribution with parameters $(a, b) = (8, 3)$.
- Up to a constant, $f(x) = x^7 (1 - x)^2$.
- General beta $(a, b)$ expectation is $a/(a + b) = 8/11$. Mode is $\frac{a - 1}{(a - 1) + (b - 1)} = 7/9$.
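
The beta answer can be checked in two ways: by simulating the conditioning directly, and by locating the peak of $x^7 (1 - x)^2$ on a grid. The sketch below does both; the trial count, seed, and grid resolution are arbitrary assumptions. For a beta $(8, 3)$ distribution the mean is $8/11 \approx 0.727$ and the peak of $x^7 (1 - x)^2$ sits at $7/9 \approx 0.778$.

```python
import random


def third_largest_samples(n=10, trials=100_000, seed=0):
    """Keep X_1 from each trial in which exactly two of the other X_j exceed it."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        larger = sum(1 for x in xs[1:] if x > xs[0])
        if larger == 2:                      # X_1 is the third largest
            kept.append(xs[0])
    return kept


def unnormalized_density(x):
    """Beta(8, 3) density up to a constant: x^7 (1 - x)^2."""
    return x ** 7 * (1.0 - x) ** 2


samples = third_largest_samples()
sample_mean = sum(samples) / len(samples)

grid = [i / 10_000 for i in range(10_001)]
mode = max(grid, key=unnormalized_density)

print("conditional mean of X_1:", sample_mean)   # expected to be near 8/11
print("grid mode:", mode)                        # expected to be near 7/9
```

Roughly one trial in ten survives the conditioning, since each rank is equally likely for $X_1$.
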
Properties of expectation
- Several properties we derived for discrete expectations continue to hold in the continuum.
- If $X$ is discrete with mass function $p(x)$ then $E[X] = \sum_x p(x)\,x$.
- Similarly, if $X$ is continuous with density function $f(x)$ then $E[X] = \int f(x)\,x\,dx$.
- If $X$ is discrete with mass function $p(x)$ then $E[g(X)] = \sum_x p(x)\,g(x)$.
- Similarly, if $X$ is continuous with density function $f(x)$ then $E[g(X)] = \int f(x)\,g(x)\,dx$.
- If $X$ and $Y$ have joint mass function $p(x, y)$ then $E[g(X, Y)] = \sum_y \sum_x g(x, y)\,p(x, y)$.
- If $X$ and $Y$ have joint probability density function $f(x, y)$ then $E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$.
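
The last two formulas translate directly into code. The sketch below evaluates $E[g(X, Y)]$ once for a joint mass function (two fair dice, with $g(x, y) = x + y$) and once for a joint density (the assumed example $f(x, y) = x + y$ on the unit square, with $g(x, y) = xy$). Both examples and the helper names are illustrative assumptions, and the continuous integral is restricted to the unit square because this density vanishes elsewhere.

```python
def expectation_discrete(pmf, g):
    """E[g(X, Y)] as the sum of g(x, y) p(x, y) over all (x, y) in the pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


def expectation_continuous(density, g, steps=400):
    """Midpoint-rule approximation of the double integral of g * f over the unit square."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        for j in range(steps):
            y = (j + 0.5) * h
            total += g(x, y) * density(x, y)
    return total * h * h


# Joint mass function of two independent fair dice.
dice_pmf = {(x, y): 1.0 / 36.0 for x in range(1, 7) for y in range(1, 7)}
dice_sum = expectation_discrete(dice_pmf, lambda x, y: x + y)

# Assumed joint density f(x, y) = x + y on the unit square, with g(x, y) = x * y.
product_mean = expectation_continuous(lambda x, y: x + y, lambda x, y: x * y)

print(dice_sum)        # analytically 7
print(product_mean)    # analytically 1/3
```
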
![Page 54: 18.600: Lecture 23 .1in Conditional probability, order](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8ab47f1a067715a77d35a/html5/thumbnails/54.jpg)
Properties of expectation
- For both discrete and continuous random variables $X$ and $Y$ we have $E[X + Y] = E[X] + E[Y]$.
- In both discrete and continuous settings, $E[aX] = aE[X]$ when $a$ is a constant. And $E[\sum a_i X_i] = \sum a_i E[X_i]$.
- But what about that delightful "area under $1 - F_X$" formula for the expectation?
- When $X$ is non-negative with probability one, do we always have $E[X] = \int_0^{\infty} P\{X > x\}\,dx$, in both discrete and continuous settings?
- Define $g(y)$ so that $1 - F_X(g(y)) = y$. (Draw a horizontal line at height $y$ and look where it hits the graph of $1 - F_X$.)
- Choose $Y$ uniformly on $[0, 1]$ and note that $g(Y)$ has the same probability distribution as $X$.
- So $E[X] = E[g(Y)] = \int_0^1 g(y)\,dy$, which is indeed the area under the graph of $1 - F_X$.
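The "area under $1 - F_X$" argument can be checked numerically. The sketch below is only an illustration under assumed examples: a fair die for the discrete case and an exponential random variable with rate 2 for the continuous case (neither is taken from the slides), with hypothetical helper names. For the exponential, $1 - F_X(x) = e^{-2x}$, so the function $g$ from the slide is $g(y) = -\ln(y)/2$. The integrals are midpoint Riemann sums, and the infinite range is truncated at 20, so the continuous results are approximate.

```python
import math
from fractions import Fraction


def midpoint_integral(fn, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    h = (hi - lo) / n
    return sum(fn(lo + (k + 0.5) * h) for k in range(n)) * h


# Discrete case (assumed example): a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

def die_tail(x):
    """P{X > x} for the die."""
    return sum(p for value, p in die.items() if value > x)

# P{X > x} is constant on each interval [k, k + 1), so the integral over
# [0, infinity) reduces to a finite sum of tail probabilities.
die_tail_area = sum(die_tail(k) for k in range(0, 6))
die_mean = sum(p * value for value, p in die.items())

# Continuous case (assumed example): exponential with rate 2, mean 1/2.
rate = 2.0

def exp_tail(x):
    """P{X > x} = 1 - F_X(x) for the exponential."""
    return math.exp(-rate * x)

def g(y):
    """Solves 1 - F_X(g(y)) = y, i.e. the inverse of the tail function."""
    return -math.log(y) / rate

# Area under 1 - F_X, truncated at x = 20 where the tail is negligible.
exp_tail_area = midpoint_integral(exp_tail, 0.0, 20.0)
# Integral of g over [0, 1]; midpoints avoid the singularity at y = 0.
exp_quantile_area = midpoint_integral(g, 0.0, 1.0)

print(die_tail_area, die_mean, exp_tail_area, exp_quantile_area)
```

Both continuous integrals measure the same region, once sliced vertically (in $x$) and once horizontally (in $y$), which is why they agree with each other and with the mean $1/2$.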