copyright © andrew w. moore slide 1 probability densities in data mining andrew w. moore professor...
TRANSCRIPT
![Page 1: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/1.jpg)
Copyright © Andrew W. Moore Slide 1
Probability Densities in Data
MiningAndrew W. Moore
ProfessorSchool of Computer ScienceCarnegie Mellon University
www.cs.cmu.edu/[email protected]
412-268-7599
![Page 2: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/2.jpg)
Copyright © Andrew W. Moore Slide 2
Contenido• Porque son importantes.• Notacion y fundamentos de PDF
continuas.• PDFs multivariadas continuas.• Combinando variables aleatorias
discretas y continuas.
![Page 3: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/3.jpg)
Copyright © Andrew W. Moore Slide 3
Porque son importantes?• Real Numbers occur in at least 50% of
database records• Can’t always quantize them• So need to understand how to describe
where they come from• A great way of saying what’s a
reasonable range of values• A great way of saying how multiple
attributes should reasonably co-occur
![Page 4: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/4.jpg)
Copyright © Andrew W. Moore Slide 4
Porque son importantes?• Can immediately get us Bayes
Classifiers that are sensible with real-valued data
• You’ll need to intimately understand PDFs in order to do kernel methods, clustering with Mixture Models, analysis of variance, time series and many other things
• Will introduce us to linear and non-linear regression
![Page 5: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/5.jpg)
Copyright © Andrew W. Moore Slide 5
A PDF of American Ages in 2000
![Page 6: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/6.jpg)
Copyright © Andrew W. Moore Slide 6
Poblacion de PR por grupo de edad
group freq midpoint freq.rela0-4 284593 2.5 0.07375165-9 301424 7.5 0.078113310-14 305025 12.5 0.079046515-19 305577 17.5 0.079189520-24 299362 22.5 0.077578925-29 277415 27.5 0.071891430-34 262959 32.5 0.068145235-39 265154 37.5 0.068714040-44 258211 42.5 0.066914745-49 239965 47.5 0.062186350-54 233597 52.5 0.060536155-59 206552 57.5 0.053527460-64 169796 62.5 0.044002265-69 141869 67.5 0.036765070-74 112416 72.5 0.029132375-79 85137 77.5 0.022063080-84 57953 82.5 0.015018485+ 51801 87.5 0.0134241
![Page 7: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/7.jpg)
Copyright © Andrew W. Moore Slide 7
0 20 40 60 80
0.02
0.04
0.06
0.08
pobpr$midpoint
pobp
r$fre
q.re
lapdf de la edad poblacional en PR en 2000
![Page 8: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/8.jpg)
Copyright © Andrew W. Moore Slide 8
A PDF of American Ages in 2000
Let X be a continuous random variable.
If p(x) is a Probability Density Function for X then…
b
ax
dxxpbXaP )(
50
30age
age)age(50Age30 dpP
= 0.36
![Page 9: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/9.jpg)
Copyright © Andrew W. Moore Slide 9
Properties of PDFs
That means…
h
hxX
hxP
xp
22)( lim
0h
b
ax
dxxpbXaP )(
)(xpxXPx
![Page 10: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/10.jpg)
Copyright © Andrew W. Moore Slide 10
)()()]2/(2/[)()22
(2/
2/
whpwphxhxdttph
xXh
xPhx
hx
Donde x-h/2<w<x+h/2). Luego,
)()2/2/(
wph
hxXhxP
Asi p(w) tiende a p(x) cuando h tiende a cero
![Page 11: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/11.jpg)
Copyright © Andrew W. Moore Slide 11
x
dttpxXP )()(
h
dttp
h
xXPhxXP
xd
xXdPhx
x
h
)()()(lim
)(
)(0
hxwxwph
whph
),(lim)(
lim0
Notar que p(w) tiende a p(x) cuando h tiende a 0
Funcion de distribucion acumulativa. Esta es una funcion No decreciente
Se ha mostrado que la derivada de la funcion de distribucion da la funcion de densidad
![Page 12: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/12.jpg)
Copyright © Andrew W. Moore Slide 12
Properties of PDFs
b
ax
dxxpbXaP )(
)(xpxXPx
Therefore…
Therefore…
1)(
x
dxxp
0)(: xpx
La dcerivada de una fucnion no dcecreciente es mayor o igual que cero.
![Page 13: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/13.jpg)
Copyright © Andrew W. Moore Slide 13
• Cual es el significado de p(x)?Si
p(5.31) = 0.06 and p(5.92) = 0.03
Entonces cuando un valor de X es muestreado de la distribucion, es dos veces mas probable que X este mas cerca a 5.31 que a 5.92.
![Page 14: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/14.jpg)
Copyright © Andrew W. Moore Slide 14
Yet another way to view a PDF
A recipe for sampling a random age.
1. Generate a random dot from the rectangle surrounding the PDF curve. Call the dot (age,d)
2. If d < p(age) stop and return age
3. Else try again: go to Step 1.
![Page 15: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/15.jpg)
Copyright © Andrew W. Moore Slide 15
Test your understanding• True or False:
1)(: xpx
0)(: xXPx
![Page 16: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/16.jpg)
Copyright © Andrew W. Moore Slide 16
ExpectationsE[X] = the expected value of random variable X
= the average value we’d see if we took a very large number of random samples of X
x
dxxpx )(
![Page 17: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/17.jpg)
Copyright © Andrew W. Moore Slide 17
ExpectationsE[X] = the expected value of random variable X
= the average value we’d see if we took a very large number of random samples of X
x
dxxpx )(
= the first moment of the shape formed by the axes and the blue curve
= the best value to choose if you must guess an unknown person’s age and you’ll be fined the square of your error
E[age]=35.897
![Page 18: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/18.jpg)
Copyright © Andrew W. Moore Slide 18
Expectation of a function=E[f(X)] = the expected value of f(x) where x is drawn from X’s distribution.
= the average value we’d see if we took a very large number of random samples of f(X)
x
dxxpxf )()(
Note that in general:
])[()]([ XEfxfE
64.1786]age[ 2 E
62.1288])age[( 2 E
![Page 19: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/19.jpg)
Copyright © Andrew W. Moore Slide 19
Variance2 = Var[X] = the expected squared difference between x and E[X]
x
dxxpx )()( 22
= amount you’d expect to lose if you must guess an unknown person’s age and you’ll be fined the square of your error, and assuming you play optimally
02.498]age[Var
![Page 20: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/20.jpg)
Copyright © Andrew W. Moore Slide 20
Standard Deviation2 = Var[X] = the expected squared difference between x and E[X]
x
dxxpx )()( 22
= amount you’d expect to lose if you must guess an unknown person’s age and you’ll be fined the square of your error, and assuming you play optimally
= Standard Deviation = “typical” deviation of X from its mean
02.498]age[Var
][Var X
32.22
222 )]([)( XEXE
![Page 21: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/21.jpg)
Copyright © Andrew W. Moore Slide 21
Estadisticas para PR• E(edad)=35.17• Var(edad)=501.16• Desv.Est(edad)=22.38
![Page 22: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/22.jpg)
Copyright © Andrew W. Moore Slide 22
Funciones de densidad mas conocidas
![Page 23: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/23.jpg)
Copyright © Andrew W. Moore Slide 23
Funciones de densidad mas conocidas
• La densidad uniforme o rectangular• La densidad triangular• La densidad exponencial• La densidad Gamma y la Chi-square• La densidad Beta• La densidad Normal o Gaussiana• Las densidades t y F.
![Page 24: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/24.jpg)
Copyright © Andrew W. Moore Slide 24
La distribucion rectangular
-w/2 0 w/2
1/w
0][ XE12
]Var[2w
X
2
w|x|if0
2
w|x|if
1
)( wxp
![Page 25: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/25.jpg)
Copyright © Andrew W. Moore Slide 25
La distribucion triangular
0
w|x|
w|x|w
xwxp
if0
if||
)( 2
6]Var[
2wX
0][ XE
w
1
w
w
![Page 26: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/26.jpg)
Copyright © Andrew W. Moore Slide 26
The Exponential distribution
otherwise
xexpx
if0
0if1
)(/
2]Var[ X
][XE
0 20 40 60 80 100
0.0
00
.02
0.0
40
.06
0.0
80
.10
Densidad exponencial,=.1
x
0.1
* e
xp
(-0
.1 *
x)
![Page 27: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/27.jpg)
Copyright © Andrew W. Moore Slide 27
La distribucion Normal
Estandar
2exp
2
1)(
2xxp
1]Var[ X
0][ XE
![Page 28: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/28.jpg)
Copyright © Andrew W. Moore Slide 28
La distribucion Normal General
2
2
2
)(exp
2
1)(
x
xp
2]Var[ X
μXE ][
=100
=15
![Page 29: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/29.jpg)
Copyright © Andrew W. Moore Slide 29
General Gaussian
2
2
2
)(exp
2
1)(
x
xp
2]Var[ X
μXE ][
=100
=15
Shorthand: We say X ~ N(,2) to mean “X is distributed as a Gaussian with parameters and 2”.
In the above figure, X ~ N(100,152)
Also known as the normal
distribution or Bell-shaped curve
![Page 30: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/30.jpg)
Copyright © Andrew W. Moore Slide 30
The Error FunctionAssume X ~ N(0,1)
Define ERF(x) = P(X<x) = Cumulative Distribution of X
x
z
dzzpxERF )()(
x
z
dzz
2exp
2
1 2
![Page 31: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/31.jpg)
Copyright © Andrew W. Moore Slide 31
Using The Error FunctionAssume X ~ N(,2)
P(X<x| ,2) = )(2x
ERF
![Page 32: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/32.jpg)
Copyright © Andrew W. Moore Slide 32
The Central Limit Theorem• If (X1,X2, … Xn) are i.i.d. continuous
random variables• Then define
• As n-->infinity, p(z)--->Gaussian with mean E[Xi] and variance Var[Xi]
Somewhat of a justification for assuming Gaussian noise is common
n
iin x
nxxxfz
121
1),...,(
![Page 33: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/33.jpg)
Copyright © Andrew W. Moore Slide 33
Estimadores de funcion de densidad
• Histograms
• K-nearest neighbors:
• Kernel density estimators
knd
kf
2)(ˆ x
nh
kxf )(ˆ h ancho de clase
dk es la distancia hasta el k-esimo vecino
![Page 34: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/34.jpg)
Copyright © Andrew W. Moore Slide 34
Estimacion de funcion de densidad-histograma
x
Den
sity
0 2 4 6 8
0.00
0.05
0.10
0.15
0.20
0.25
> x=c( 7.3, 6.8, 7.1, 2.5, 7.9, 6.5, 4.2, 0.5, 5.6, 5.9)> hist(x,freq=F,main="Estimacion de funcion de densidad-histograma")> rug(x,col=2)
![Page 35: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/35.jpg)
Copyright © Andrew W. Moore Slide 35
Estimacion de densidad por knn en 20 pts con k=1,3,5,7
0 2 4 6 8
01
23
4
x
fest
0 2 4 6 8
0.1
0.3
x
fest
0 2 4 6 8
0.05
0.20
x
fest
0 2 4 6 8
0.05
0.15
0.25
x
fest
![Page 36: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/36.jpg)
Copyright © Andrew W. Moore Slide 36
Estimación por kernels de una función de densidad univariada.
En el caso univariado, el estimador por kernels de la función de densidad f(x) se obtiene de la siguiente manera. Consideremos que x1,…xn es una variable aleatoria X con función de densidad f(x), definamos la función de distribución empirica por
el cual es un estimador de la función de distribución acumulada F(x) de X. Considerando que la función de densidad f(x) es la derivada de la función de distribución F y usando aproximación para derivada se tiene que
n
xobsxFn
#)(
![Page 37: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/37.jpg)
Copyright © Andrew W. Moore Slide 37
donde h es un valor positivo cercano a cero. Lo anterior es equivalente a la proporción de puntos en el intervalo (x-h, x+h) dividido por 2h. La ecuación anterior puede ser escrita como:
donde la función peso K está definida por 0 si |z|>1 K(z)=
1/2 si |z| 1
h
hxFhxFxf nn
2
)()()(ˆ
)(1
)(ˆ1
n
i
i
h
xxK
nhxf
![Page 38: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/38.jpg)
Copyright © Andrew W. Moore Slide 38
Muestra: 6, 8, 9 12, 20, 25,18, 31
hhhdepuntosenproporcionf 2/)15,15()15(ˆ
32/164/28/)19,11()15(ˆ depuntosenporporcionf
]02/1002/1000)[4*8/(1)15(ˆ f
![Page 39: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/39.jpg)
Copyright © Andrew W. Moore Slide 39
este es llamado el kernel uniforme y h es llamado el ancho de banda el cual es un parámetro de suavización que indica cuanto contribuye cada punto muestral al estimado en el punto x. En general, K y h deben satisfacer ciertas condiciones de regularidad, tales como:
K(z) debe ser acotado y absolutamente
integrable en (-,) Usualmente, pero no siempre, K(z)0 y
simétrico, luego cualquier función de densidad simétrica puede usarse como kernel.
1)( dzzK
0)(lim
nhn
![Page 40: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/40.jpg)
Copyright © Andrew W. Moore Slide 40
Eleccion del ancho de banda h
2.006.1 snh
Donde n es el numero de datos y s la desviacion estandar de la muestra.
2.0
13 )(79.0 nQQh
![Page 41: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/41.jpg)
Copyright © Andrew W. Moore Slide 41
EL KERNEL GAUSSIANO
En este caso el kernel representa una función peso más suave donde todos los puntos contribuyen al estimado de f(x) en x. Es decir,
)2
1exp(
2
1)( 2zzK
n
i
h
xx i
e
nhxf
1
2
)(
2
1)(ˆ
2
2
![Page 42: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/42.jpg)
Copyright © Andrew W. Moore Slide 42
EL KERNEL TRIANGULAR K(z)=1- |z| para |z|<1, 0 en otro caso.
EL KERNEL "BIWEIGHT"
15/16(1-z2)2 para |z|<1
K(z)=
0 en otro caso
![Page 43: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/43.jpg)
Copyright © Andrew W. Moore Slide 43
EL KERNEL EPANECHNIKOV para |z|< K(z)= 0 en otro caso
5
)5
1(54
3 2z
![Page 44: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/44.jpg)
Copyright © Andrew W. Moore Slide 44
Estimacion de densidad en 20 pts usando kernel gaussiano con h=.5,”opt1”,”opt2”, 4
0 2 4 6 8
0.00
0.15
0.30
x
fest
0 2 4 6 8
0.02
0.08
0.14
x
fest
0 2 4 6 8
0.05
0.15
x
fest
0 2 4 6 8
0.04
0.07
x
fest
![Page 45: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/45.jpg)
Copyright © Andrew W. Moore Slide 45
Variables aleatorias bidimensionales
p(x,y) = probability density of random
variables (X,Y) at location (x,y)
![Page 46: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/46.jpg)
Copyright © Andrew W. Moore Slide 46
Estimadores de funcion de densidad bi-dimensionales
• Histogramas
• K-nearest neighbors:
• Kernel density estimators
knA
kxf )(ˆ
nA
kxf )(ˆ A area de la clase
Ak es el area ncluyendo hasta el k-esimo vecino
![Page 47: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/47.jpg)
Copyright © Andrew W. Moore Slide 47
Estimacion de kernel bivariado
)||||
(1
)(ˆ1
2
n
i
i
hK
nhf
xtt
))]()'[((1
)(ˆ 2/11
121
ii
n
iHK
hnhf xtxtt
Sean xi=(x,y) los valores observados y t=(t1,t2) un punto del plano donde se desea estimar la densidad conjunta
2
2
2
1
0
0
h
hH
Si h1=h2=h
![Page 48: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/48.jpg)
Copyright © Andrew W. Moore Slide 48
• (a1, a2)H-12
2
2
2
2
2
1 ||||
2
1
h
a
h
aa
a
a
2
2
1
/10
0/1
h
hH
donde
![Page 49: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/49.jpg)
Copyright © Andrew W. Moore Slide 49
Estimacion de densidad-Kernel Gaussiano bivariado
n
i
h
ty
h
tx
e
hnhf
1
2
)(
2
)(
21 2
1)(ˆ
22
22
21
21
t
![Page 50: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/50.jpg)
Copyright © Andrew W. Moore Slide 50
10
20
30
40
2000
3000
4000
50000 e+00
1 e-05
2 e-05
3 e-05
densidad conjunta estimada por metodo kernel
f1= kde2d(autompg1$V1, autompg1$V5,n=100)persp(f1$x,f1$y,f1$z)
![Page 51: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/51.jpg)
Copyright © Andrew W. Moore Slide 51
mpg
wei
ght
10 20 30 40
1500
2500
3500
4500
grafica de contorno de la densidad estimada
contour(f1, levels=c(8e-6,2e-5, 2.8e-5),col=c(2,3,4), xlab="mpg",ylab="weight")
![Page 52: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/52.jpg)
Copyright © Andrew W. Moore Slide 52
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
Ryx
dydxyxpRYXP),(
),()),((
![Page 53: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/53.jpg)
Copyright © Andrew W. Moore Slide 53
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
Ryx
dydxyxpRYXP),(
),()),((
P( 20<mpg<30 and 2500<weight<3000) =
volumen under the 2-d surface within the red rectangle
![Page 54: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/54.jpg)
Copyright © Andrew W. Moore Slide 54
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
Ryx
dydxyxpRYXP),(
),()),((
P( [(mpg-25)/10]2 + [(weight-3300)/1500]2
< 1 ) =
volumen under the 2-d surface within the red oval
![Page 55: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/55.jpg)
Copyright © Andrew W. Moore Slide 55
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
Ryx
dydxyxpRYXP),(
),()),((
Take the special case of region R = “everywhere”.
Remember that with probability 1, (X,Y) will be drawn from “somewhere”.
So..
x y
dydxyxp 1),(
![Page 56: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/56.jpg)
Copyright © Andrew W. Moore Slide 56
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
Ryx
dydxyxpRYXP),(
),()),((
20h
2222lim h
hyY
hy
hxX
hxP
),( yxp
![Page 57: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/57.jpg)
Copyright © Andrew W. Moore Slide 57
In m dimensions
Let (X1,X2,…Xm) be an n-tuple of continuous random variables, and let R be some region of Rm …
)),...,,(( 21 RXXXP m
Rxxx
mm
m
dxdxdxxxxp),...,,(
1221
21
,,...,),...,,(...
![Page 58: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/58.jpg)
Copyright © Andrew W. Moore Slide 58
Independence
If X and Y are independent then knowing the value of X does not help predict the
value of Y
)()(),( :yx, iff ypxpyxpYX
mpg,weight NOT independent
![Page 59: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/59.jpg)
Copyright © Andrew W. Moore Slide 59
Independence
If X and Y are independent then knowing the value of X does not help predict the
value of Y
)()(),( :yx, iff ypxpyxpYX
the contours say that acceleration and weight
are independent
![Page 60: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/60.jpg)
Copyright © Andrew W. Moore Slide 60
Multivariate Expectation
xxxXμX dpE )(][
E[mpg,weight] =(24.5,2600)
The centroid of the cloud
![Page 61: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/61.jpg)
Copyright © Andrew W. Moore Slide 61
Multivariate Expectation> f1= kde2d(autompg1$mpg, autompg1$weight,n=100)> dx=f1$x[2]-f1$x[1]> dy=f1$y[2]-f1$y[1]> dx[1] 0.379798> dy[1] 35.62626> meanmpg=sum(f1$x*f1$z)*dx*dy[1] 22.48855> meanweight=sum(f1$y*f1$z)*dx*dy[1] 2848.638>#estimated mean> mean(autompg1$weight)[1] 2977.584> mean(autompg1$mpg)[1] 23.44592
![Page 62: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/62.jpg)
Copyright © Andrew W. Moore Slide 62
Multivariate Expectation
xxxX dpffE )()()]([
![Page 63: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/63.jpg)
Copyright © Andrew W. Moore Slide 63
Test your understanding? ][][][ does ever) (if When :Question YEXEYXE
•All the time? Siempre
•Only when X and Y are independent?
•It can fail even if X and Y are independent?
![Page 64: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/64.jpg)
Copyright © Andrew W. Moore Slide 64
Bivariate Expectation
dydxyxpxXE ),(][
dydxyxpyxfyxfE ),(),()],([
dydxyxpyYE ),(][
dydxyxpyxYXE ),()(][
][][][ YEXEYXE
![Page 65: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/65.jpg)
Copyright © Andrew W. Moore Slide 65
Bivariate Covariance)])([(],Cov[ yxxy YXEYX
])[(][],Cov[ 22xxxx XEXVarXX
])[(][],Cov[ 22yyyy YEYVarYY
![Page 66: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/66.jpg)
Copyright © Andrew W. Moore Slide 66
Bivariate Covariance)])([(],Cov[ yxxy YXEYX
])[(][],Cov[ 22xxxx XEXVarXX
])[(][],Cov[ 22yyyy YEYVarYY
then, Write
Y
X X
yxy
xyxTxx ))((E 2
2
][] [
ΣμXμXXCov
![Page 67: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/67.jpg)
Copyright © Andrew W. Moore Slide 67
Covarianza y desviacion estandar estimadas entre mpg y weight
> cov(autompg1[,c(1,5)]) mpg weightmpg 60.91814 -5517.441weight -5517.44070 721484.709> sd(autompg1$mpg)[1] 7.805007> sd(autompg1$weight)[1] 849.4026
![Page 68: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/68.jpg)
Copyright © Andrew W. Moore Slide 68
Covariance Intuition
E[mpg,weight] =(24.5,2600)
8mpg 8mpg
700weight
700weight
![Page 69: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/69.jpg)
Copyright © Andrew W. Moore Slide 69
Covariance Intuition
E[mpg,weight] =(24.5,2600)
8mpg 8mpg
700weight
700weight
PrincipalEigenvector
of
![Page 70: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/70.jpg)
Copyright © Andrew W. Moore Slide 70
Regression Line
)()/(2 x
X
xy
y xxXYE
Notice that the regression line pass trough (x,y)
![Page 71: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/71.jpg)
Copyright © Andrew W. Moore Slide 71
Regression Line>l1=lm(weight~mpg,data=autompg1)> l1
Call:lm(formula = weight ~ mpg, data = autompg1)
Coefficients:(Intercept) mpg 5101.11 -90.57
>#slope of regression line>slope= -5517.44/60.918[1] -90.571
![Page 72: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/72.jpg)
Copyright © Andrew W. Moore Slide 72
Primer Principal component> a=cov(autompg1[,c(1,5)])> eigen(a)$values[1] 721526.90386 18.72329
$vectors [,1] [,2][1,] -0.007647317 0.999970759[2,] 0.999970759 0.007647317
#slope of primer principal component
> .99997/-.00764[1] –130.8861
![Page 73: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/73.jpg)
Copyright © Andrew W. Moore Slide 73
Covariance Fun Facts
yxy
xyxTxx ))((E 2
2
][] [
ΣμXμXXCov
•True or False: If xy = 0 then X and Y are independent. False
•True or False: If X and Y are independent then xy = 0. True
•True or False: If xy = x y then X and Y are deterministically related. True
•True or False: If X and Y are deterministically related then xy = x y.
false
How could you prove or disprove these?
![Page 74: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/74.jpg)
Copyright © Andrew W. Moore Slide 74
Test your understanding? ][][][ does ever) (if When :Question YVarXVarYXVar
•All the time?
•Only when X and Y are independent? Cierto
•It can fail even if X and Y are independent?
![Page 75: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/75.jpg)
Copyright © Andrew W. Moore Slide 75
Marginal Distributions
y
dyyxpxp ),()(
![Page 76: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/76.jpg)
Copyright © Andrew W. Moore Slide 76
Conditional Distributions
yYX
yxp
when of p.d.f.
)|(
)4600weight|mpg( p
)3200weight|mpg( p
)2000weight|mpg( p
![Page 77: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/77.jpg)
Copyright © Andrew W. Moore Slide 77
Conditional Distributions
yYX
yxp
when of p.d.f.
)|(
)4600weight|mpg( p
)(
),()|(
yp
yxpyxp
Why?
![Page 78: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/78.jpg)
Copyright © Andrew W. Moore Slide 78
Independence Revisited
It’s easy to prove that these statements are equivalent…
)()(),( :yx, iff ypxpyxpYX
)()|( :yx,
)()|( :yx,
)()(),( :yx,
ypxyp
xpyxp
ypxpyxp
![Page 79: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/79.jpg)
Copyright © Andrew W. Moore Slide 79
More useful stuff
BayesRule
(These can all be proved from definitions on previous slides)
1)|(
x
dxyxp
)|(
)|,(),|(
zyp
zyxpzyxp
)(
)()|()|(
yp
xpxypyxp
![Page 80: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/80.jpg)
Copyright © Andrew W. Moore Slide 80
Mixing discrete and continuous variables
h
vAh
xXh
xPvAxp
22),( lim
0h
1),(1
An
v x
dxvAxp
BayesRule
BayesRule)(
)()|()|(
AP
xpxAPAxp
)(
)()|()|(
xp
APAxpxAP
![Page 81: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/81.jpg)
Copyright © Andrew W. Moore Slide 81
0 5 10 15
0.00
0.05
0.10
0.15
0.20
0.25
x
fest
clase 1
clase 2
Estimacion de funcion de dendidad conjunta mixta
P(educacion,salario>50k)
![Page 82: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/82.jpg)
Copyright © Andrew W. Moore Slide 82
5 10 15
0.4
0.6
0.8
1.0
1:16
b[, 1
]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
class 1
class 2
conditional density estimation de educacion por clase
Estimation of the posterior P(Class/Education)
![Page 83: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/83.jpg)
Copyright © Andrew W. Moore Slide 83
Mixing discrete and continuous variables
P(EduYears,Wealthy)
![Page 84: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/84.jpg)
Copyright © Andrew W. Moore Slide 84
Mixing discrete and continuous variables
P(EduYears,Wealthy)
P(Wealthy| EduYears)
![Page 85: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/85.jpg)
Copyright © Andrew W. Moore Slide 85
Mixing discrete and continuous variables
Renorm
aliz
ed
Axes
P(EduYears,Wealthy)
P(Wealthy| EduYears)
P(EduYears|Wealthy)
![Page 86: Copyright © Andrew W. Moore Slide 1 Probability Densities in Data Mining Andrew W. Moore Professor School of Computer Science Carnegie Mellon University](https://reader035.vdocuments.us/reader035/viewer/2022062621/551ceeee550346357a8b5291/html5/thumbnails/86.jpg)
Copyright © Andrew W. Moore Slide 86
Ejercicios
• Suppose X and Y are independent real-valued random variables distributed between 0 and 1:• What is p[min(X,Y)]? • What is E[min(X,Y)]?
• Prove that E[X] is the value u that minimizes E[(X-u)2]
• What is the value u that minimizes E[|X-u|]?