03. PDF Estimation Corr
TRANSCRIPT
-
8/10/2019 03. PDF Estimation Corr
1/43
Dipartimento di Ingegneria Biofisica ed Elettronica
Università di Genova
Prof. Sebastiano B. Serpico
3. Supervised estimation of
probability density functions
-
Supervised Classifier Design
Approach 1: the training set (training samples for the classes {ω_i}) is used for the estimation of the class-conditional pdfs {p(x|ω_i)}; decision theory then provides a decision rule, which is applied to the data set to obtain the data-set classification.
Approach 2: the training set (training samples for the classes {ω_i}) is used for the training of a non-Bayesian classifier by direct use of the training samples; the resulting decision rule is applied to the data set to obtain the data-set classification.
-
Supervised Estimation of a pdf
The use of decision theory to design classifiers requires a preliminary estimate of the class-conditional pdfs. In a supervised approach, the estimation of the pdf p(x|ω_i) can be performed on the basis of the training data of the class ω_i.
Problem definition and notation:
Consider a feature vector x with (unknown) pdf p(x) and a finite set X = {x_1, x_2, ..., x_N} of N independent samples drawn from such a pdf.
We would like to compute, on the basis of the available samples, an estimated pdf p̂(x).
In order to perform supervised classification, the estimation process has to be repeated individually for each single class: in particular, to estimate p(x|ω_i), we assume that the set X corresponds to the set of training samples of the class ω_i.
-
Approaches to pdf Estimation
Parametric estimation: a given model (e.g., Gaussian, exponential, ...) is assumed for the analytical form of p(x); the parameters of such a model are estimated.
Remarks:
a given model could be physically unrealistic;
most parametric methods assume single-mode pdfs, while many real problems involve multimodal pdfs;
complex methods (not considered here) have been developed to identify the different modes of a pdf.
Non-parametric estimation: no analytical model is assumed for the pdf; p(x) is directly estimated from the samples in X.
Remarks:
typically, the lack of predefined models allows more flexibility;
however, the computational complexity of the estimation problem is generally higher than for the parametric case.
-
Parametric Estimation
Given an analytical model of the pdf p(x) to be estimated, the parameters that characterize the model are collected into a vector θ = (θ_1, θ_2, ..., θ_r).
We highlight the dependence on the parameters by adopting the notation p(x|θ) (in particular, p(X|θ), considered as a function of θ, is called the likelihood function).
The training samples x_1, x_2, ..., x_N are collected into a single vector of observations X. The samples are considered as random vectors and a pdf p(X|θ) is associated with them.
Usually the samples are assumed to be identically distributed random vectors (because they are all drawn from the same pdf p(x)) and independent of each other (i.i.d. random vectors, that is, independent and identically distributed); then:
p(X|θ) = ∏_{k=1}^{N} p(x_k|θ)
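The factorization above can be sketched numerically (an illustrative check, assuming for concreteness a 1-D Gaussian model with unit variance): the joint likelihood of i.i.d. samples is the product of the per-sample pdfs, so the log-likelihood is the corresponding sum.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=10)  # i.i.d. samples x_1, ..., x_N

def pdf(x, theta):
    """Per-sample model p(x_k|theta): here a 1-D Gaussian N(theta, 1)."""
    return np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2 * np.pi)

theta = 1.0
joint = np.prod(pdf(x, theta))              # p(X|theta) = prod_k p(x_k|theta)
log_joint = np.sum(np.log(pdf(x, theta)))   # ln p(X|theta) = sum_k ln p(x_k|theta)
assert np.isclose(np.log(joint), log_joint)
```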
-
General Properties of the Estimations
General properties
The estimate of the parameter vector depends on the observation vector:
θ̂ = θ̂(X)
Therefore the estimate θ̂ is itself a random vector, and so is the estimation error:
ε = θ̂ − θ = ε(θ, X)
Bias
The expected value E{ε} of the estimation error ε is called bias. The estimate is said to be unbiased if, for each parameter vector θ, the estimation error has zero mean:
E{ε} = 0, or equivalently E{θ̂} = θ
For the estimate of the parameter θ_i (i = 1, 2, ..., r) to be good, we want the estimation error ε_i (the i-th component of ε) to have zero mean (i.e., the estimate to be unbiased), but it also has to have a small variance var{ε_i}.
-
Variance of the Estimation Error
Cramér-Rao Inequality:
For each unbiased estimate of the vector θ, it holds:
var{ε_i} ≥ [J^{-1}(θ)]_{ii},  i = 1, 2, ..., r
where J(θ) = E{∇_θ ln p(X|θ) · [∇_θ ln p(X|θ)]^t} is the Fisher information matrix:
[J(θ)]_{ij} = E{[∂ ln p(X|θ)/∂θ_i] · [∂ ln p(X|θ)/∂θ_j]}
The Cramér-Rao inequality provides a lower bound for the variance of the estimation error:
var{ε_i} cannot be made arbitrarily small, but is always lower bounded by [J^{-1}(θ)]_{ii}. The Fisher information matrix thus represents a measure of how good an estimate can be.
In particular, an unbiased estimate that satisfies the equality for each vector of the parameters is said to be efficient.
-
Asymptotic Properties of Estimations
Often biased and/or non-efficient estimates are used, provided they exhibit a good behavior for large values of N.
An estimate is called asymptotically unbiased if the error mean is zero for N → +∞:
lim_{N→+∞} E{ε} = 0, that is, lim_{N→+∞} E{θ̂} = θ
An estimate is called asymptotically efficient if the error variance reaches the Cramér-Rao lower bound for N → +∞:
lim_{N→+∞} var{ε_i}/[J^{-1}(θ)]_{ii} = 1,  i = 1, 2, ..., r
An estimate is called consistent if it converges to the true value in probability for N → +∞:
lim_{N→+∞} P{‖θ̂ − θ‖ < δ} = 1,  ∀δ > 0
A sufficient condition for an estimate to be consistent is that it is asymptotically unbiased and that the estimation error has infinitesimal variance for N → +∞ [Mendel, 1987].
-
ML Estimation
Definition
The Maximum Likelihood (ML) estimate of the vector θ is defined as the following vector:
θ̂ = arg max_θ p(X|θ)
Remarks
For different values of θ, different pdfs are obtained. Each of them is evaluated at the observations X. The pdf assuming the maximum value for X is identified: the ML estimate is the value of θ that produces this pdf.
Often it is advantageous not to maximize the likelihood function p(X|θ) but (equivalently) the log-likelihood function:
θ̂ = arg max_θ ln p(X|θ)
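A minimal numerical sketch of the definition (assuming, for illustration, a 1-D Gaussian model with known unit variance, for which the analytical ML estimate of the mean is the sample mean):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)  # i.i.d. samples, known variance = 1

def log_likelihood(m, x):
    """ln p(X|m) for the N(m, 1) model: -N/2 ln(2*pi) - 1/2 sum_k (x_k - m)^2."""
    return -0.5 * len(x) * np.log(2 * np.pi) - 0.5 * np.sum((x - m) ** 2)

# Brute-force maximization of the log-likelihood over a grid of candidate means
grid = np.linspace(0.0, 4.0, 4001)
ll = np.array([log_likelihood(m, x) for m in grid])
m_ml = grid[np.argmax(ll)]

# For this model the analytical ML estimate is the sample mean
assert abs(m_ml - x.mean()) < 1e-3
```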
-
ML Estimation: Example
ML estimation of the mean m of a one-dimensional Gaussian with known variance (equal to one) starting from a single observed sample x_0. The likelihood p(x_0|m), as a function of m, is a Gaussian centered at x_0, so its maximum is attained for:
m̂ = x_0
[Figure: the Gaussian likelihood as a function of m, plotted over the interval [−6, 6], with its peak at m = x_0.]
-
Properties of the ML Estimation
Under mild assumptions about the function p(X|θ), it can be proven that, if an efficient estimate exists and the ML estimate is unbiased, then the efficient estimate is the ML estimate.
Even when an efficient estimate does not exist, the ML estimate exhibits good asymptotic properties. In particular, the ML estimate is:
Asymptotically unbiased
Asymptotically efficient
Consistent
Asymptotically Gaussian: for large N, θ̂ is approximately distributed as N(θ, J^{-1}(θ)).
These properties explain the wide diffusion of ML estimators in classification methods.
-
Properties of the Parametric Gaussian Estimation
For a Gaussian model, the ML estimates of the parameters are the sample mean and the sample covariance with denominator N:
m̂ = (1/N) Σ_{k=1}^{N} x_k,  Σ̂ = (1/N) Σ_{k=1}^{N} (x_k − m̂)(x_k − m̂)^t
The estimates of m and Σ, as ML estimates, are asymptotically unbiased, asymptotically efficient and consistent. Moreover, the following additional properties hold:
The estimate of m is unbiased, while the estimate of Σ is biased:
E{m̂} = m,  E{Σ̂} = ((N − 1)/N) Σ
Therefore, usually, the estimate of Σ is modified as follows:
Σ̂ = (1/(N − 1)) Σ_{k=1}^{N} (x_k − m̂)(x_k − m̂)^t
The two estimates coincide for N → +∞ (consistent with the fact that ML estimates are asymptotically unbiased).
The introduced estimates of the mean and the covariance matrix are generally called the sample mean and the sample covariance.
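A short sketch of the two covariance estimates (the N vs. N − 1 denominators), using numpy; the sample data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
X = rng.normal(size=(N, 3))  # N i.i.d. samples, n = 3 features

m_hat = X.mean(axis=0)            # sample mean
D = X - m_hat
Sigma_ml = (D.T @ D) / N          # ML estimate, denominator N (biased)
Sigma_unb = (D.T @ D) / (N - 1)   # corrected estimate, denominator N - 1

# The two estimates differ exactly by the factor (N - 1)/N
assert np.allclose(Sigma_ml, Sigma_unb * (N - 1) / N)
# np.cov uses the unbiased (N - 1) form by default
assert np.allclose(Sigma_unb, np.cov(X, rowvar=False))
```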
-
Iterative Expressions of the Gaussian Parametric Estimation
The estimates of m and Σ can also be expressed in an iterative form, using each single sample in sequence, instead of the whole training set at once.
Iterative computation of the sample mean:
m̂^(1) = x_1
m̂^(k+1) = m̂^(k) + (1/(k+1)) (x_{k+1} − m̂^(k)),  k = 1, 2, ..., N − 1
where m̂^(k) = (1/k) Σ_{h=1}^{k} x_h and m̂ = m̂^(N).
Iterative computation of the sample covariance:
S^(1) = x_1 x_1^t
S^(k+1) = S^(k) + (1/(k+1)) (x_{k+1} x_{k+1}^t − S^(k)),  k = 1, 2, ..., N − 1
where S^(k) = (1/k) Σ_{h=1}^{k} x_h x_h^t, and
Σ̂^(k) = S^(k) − m̂^(k) [m̂^(k)]^t,  Σ̂ = Σ̂^(N)
The iterative estimates of Σ refer to the expression with denominator N.
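The recursions above can be sketched as follows (a minimal numpy check against the batch formulas; the sample data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))  # training samples x_1 ... x_N, n = 2

# Iterative (one sample at a time) computation of sample mean and covariance
m = X[0].copy()                  # m^(1) = x_1
S = np.outer(X[0], X[0])         # S^(1) = x_1 x_1^t
for k, x in enumerate(X[1:], start=1):
    m = m + (x - m) / (k + 1)                # m^(k+1)
    S = S + (np.outer(x, x) - S) / (k + 1)   # S^(k+1)
Sigma = S - np.outer(m, m)       # Sigma-hat^(N) = S^(N) - m m^t

# Batch formulas (denominator N, as in the iterative version)
m_batch = X.mean(axis=0)
D = X - m_batch
Sigma_batch = (D.T @ D) / len(X)

assert np.allclose(m, m_batch)
assert np.allclose(Sigma, Sigma_batch)
```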
-
Example (1/2)
Given n = 3 features and two classes ω_1 and ω_2, characterized by the following training sets:
ω_1: (0, 0, 0), (1, 1, 0), (1, 0, 0), (1, 0, 1)
ω_2: (0, 0, 1), (0, 1, 1), (1, 1, 1), (0, 1, 0)
In this case it makes no sense to normalize the features: the features are binary and the samples are represented by all possible combinations of the three binary features.
We assume that the class-conditional pdfs are Gaussian and we apply ML estimation. It is necessary to estimate the mean vector and the covariance matrix for each class. Let's see the computation for the class ω_1. The estimated mean is:
m̂_1 = (1/4) [(0, 0, 0)^t + (1, 1, 0)^t + (1, 0, 0)^t + (1, 0, 1)^t] = (3/4, 1/4, 1/4)^t
-
Example (2/2)
Estimation of the covariance matrix for ω_1. Let's use N = 4 as the denominator, thus obtaining a biased estimate. The use of N − 1 = 3 would give an unbiased estimate. However, for large N (say, N > 30), 1/(N − 1) ≈ 1/N.
The deviations x_k − m̂_1 are (−3/4, −1/4, −1/4)^t, (1/4, 3/4, −1/4)^t, (1/4, −1/4, −1/4)^t, (1/4, −1/4, 3/4)^t, so:
Σ̂_1 = (1/4) Σ_{k=1}^{4} (x_k − m̂_1)(x_k − m̂_1)^t
    = (1/64) ([9 3 3; 3 1 1; 3 1 1] + [1 3 −1; 3 9 −3; −1 −3 1] + [1 −1 −1; −1 1 1; −1 1 1] + [1 −1 3; −1 1 −3; 3 −3 9])
    = (1/64) [12 4 4; 4 12 −4; 4 −4 12] = (1/16) [3 1 1; 1 3 −1; 1 −1 3]
In this case (not in general!) the same computation for ω_2 gives Σ̂_2 = Σ̂_1.
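The example can be checked numerically (a sketch; `ml_gaussian` is an illustrative helper name):

```python
import numpy as np

# Training samples of the two classes from the example (n = 3 binary features)
X1 = np.array([[0, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 1]], dtype=float)
X2 = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 1], [0, 1, 0]], dtype=float)

def ml_gaussian(X):
    """ML estimates: sample mean and covariance with denominator N (biased)."""
    m = X.mean(axis=0)
    D = X - m
    return m, (D.T @ D) / len(X)

m1, S1 = ml_gaussian(X1)
m2, S2 = ml_gaussian(X2)

assert np.allclose(m1, [0.75, 0.25, 0.25])
assert np.allclose(S1 * 16, [[3, 1, 1], [1, 3, -1], [1, -1, 3]])
# In this particular case (not in general!) the two covariance estimates coincide
assert np.allclose(S1, S2)
```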
-
Non-Gaussian Parametric Estimation
When a Gaussian model appears inaccurate for the considered problem, other parametric models can be adopted.
In the case n > 1, an extension of the Gaussian model is given by the elliptically contoured pdfs:
p(x) = |Σ|^{-1/2} f[(x − m)^t Σ^{-1} (x − m)]
where m = E{x}, Σ = Cov{x} and f is an appropriate non-negative function. The level curves of such pdfs are hyperellipses, as in the Gaussian case.
In the case n = 1, very general models are Pearson's pdfs, which, as some parameters vary, include uniform pdfs, Gaussian pdfs and also impulsive models with vertical asymptotes.
-
Non-Parametric Estimation: Problem Definition (1/2)
In a non-parametric context the estimation of the unknown pdf p(x) is not restricted to satisfy any predefined model and is directly built up from the training samples x_1, x_2, ..., x_N (assumed i.i.d.).
Let x* be a generic sample and R a predefined region of the feature space that includes x*. Assuming that the true pdf p(x) is a continuous function and that R is small enough that this function does not vary significantly over R, we have:
P_R = P{x ∈ R} = ∫_R p(x) dx ≈ p(x*) V
where V is the n-dimensional volume (measure) of R.
If K is the number of training samples belonging to R (over a total of N training samples), a consistent estimate of the probability P_R is given by the relative frequency:
P̂_R = K/N
Law of large numbers:
lim_{N→+∞} P{|P̂_R − P_R| < δ} = 1,  ∀δ > 0
-
Non-Parametric Estimation: Problem Definition (2/2)
Pdf estimation
From the estimate of the probability P_R that a sample belongs to R, we can derive an estimate of the pdf at the point x*:
p̂(x*) = P̂_R/V = K/(NV)
Remarks
R has to be large enough to contain a number of training samples that justifies the application of the law of large numbers;
R has to be small enough to justify the hypothesis that p(x) does not vary significantly over R.
So, to obtain an accurate estimate, a compromise between these two needs is necessary, to guarantee a good estimation of the pdf.
However, it is not possible to obtain a good compromise if the total number N of samples in the training set is small.
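The basic K/(NV) estimate can be sketched in one dimension (an illustrative example; the window half-width is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=5000)  # training samples from a standard Gaussian

def pdf_estimate(x_star, samples, half_width=0.25):
    """Estimate p(x*) as K/(N*V) with R = [x* - h, x* + h], V = 2h (n = 1)."""
    V = 2 * half_width
    K = np.sum(np.abs(samples - x_star) < half_width)
    return K / (len(samples) * V)

# At the origin the true pdf is 1/sqrt(2*pi) ≈ 0.3989
p0 = pdf_estimate(0.0, x)
assert abs(p0 - 0.3989) < 0.05
```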
-
Two Non-Parametric Approaches
By exchanging the roles of the quantities K and V, the above reasoning leads to two possible approaches to non-parametric estimation:
the k-nearest-neighbor approach: for a fixed K and a given point x of the feature space, the region R is grown until it contains the K training samples nearest to x; the hypervolume V is computed, and the estimate of the pdf is deduced;
the Parzen-window approach: for a fixed region R centered at x, whose hypervolume is equal to V, K is computed by counting the training samples in R, then the estimate is derived.
It is possible to prove that both approaches lead to consistent estimates. However, it is not possible to draw general conclusions about their behavior in a real context, characterized by a finite number of training samples.
-
K-Nearest-Neighbor Estimation
Hypotheses
The number K of training samples to be captured is preset.
A reference cell (e.g., a sphere) centered at x* is considered.
Methodology
The k-nearest-neighbor (k-nn) estimator extends the cell until it contains exactly K training samples: V_K(x*) is the volume of the resulting cell. The pdf at the point x* is estimated as follows:
p̂(x*) = K/(N V_K(x*))
It can be proved that, selecting K as a function of N (K = K_N), the necessary and sufficient condition for the k-nn estimate to be consistent at all points where p(x) is continuous is K_N → +∞ for N → +∞, but of order lower than 1 (i.e., K_N/N → 0; e.g., K_N = N^{1/2}).
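A minimal 1-D sketch of the k-nn estimator (illustrative data; in one dimension the "hypervolume" is the interval length V_K = 2r):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=2000)  # 1-D training samples

def knn_pdf(x_star, samples, K):
    """k-nn estimate p(x*) = K/(N * V_K), with V_K = 2r in one dimension,
    r = distance from x* to the K-th nearest sample."""
    d = np.sort(np.abs(samples - x_star))
    r = d[K - 1]
    return K / (len(samples) * 2 * r)

K = int(np.sqrt(len(x)))  # K_N = sqrt(N), consistent with the consistency condition
p0 = knn_pdf(0.0, x, K)
print(p0)  # roughly near the true value 1/sqrt(2*pi) ≈ 0.399
```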
-
Remarks on the k-nn Method
Typically the cell used with k-nn is a hypersphere; the k-nn estimate is then based on the following steps:
Identify the K training samples closest to the considered point x* (with respect to the Euclidean metric);
Identify the radius r of the smallest hypersphere that, centered at x*, includes the above K samples (r coincides with the distance from x* to the farthest of the K samples);
Compute the volume of the n-dimensional hypersphere of radius r and then the value of the estimate p̂(x*).
Disadvantages
The pdf estimated by the k-nn method is not a true pdf, since its integral diverges because of the singularities due to the term V_K(x*) in the denominator (e.g., V_1(x_k) = 0 for k = 1, 2, ..., N).
The k-nn estimation is computationally heavy, even if ad hoc techniques have been proposed to decrease the computational burden (e.g., KD-trees).
-
Parzen-Window Estimation: Introduction
Hypotheses and notation
Suppose R is an n-dimensional hypercube with side h (and then volume V = h^n), centered at the point x*.
Introduce the following rectangular function:
φ(x) = 1 if x is inside the hypercube with side 1 and center 0; φ(x) = 0 elsewhere.
Introduction to the method
The training sample x_k belongs to the hypercube R with center x* and side h if φ[(x_k − x*)/h] = 1, otherwise φ[(x_k − x*)/h] = 0.
Then the number of training samples that fall into R is:
K = Σ_{k=1}^{N} φ[(x_k − x*)/h]
Consequently, the estimate can be computed as:
p̂(x*) = K/(NV) = (1/N) Σ_{k=1}^{N} (1/h^n) φ[(x_k − x*)/h]
-
Parzen-Window Estimation
The estimate just illustrated, based on counting the number of training samples included in a prefixed volume, can be interpreted as the superposition of rectangular contributions, each associated with a single sample.
To obtain more regular estimates (the rectangular function is discontinuous), the previous expression is generalized: the pdf estimate is expressed as the sum of N contributions, each associated with a single sample, where the single contribution is expressed by a function φ(), in general not rectangular, taking real values that vary continuously. The following estimate is obtained:
p̂(x) = (1/N) Σ_{k=1}^{N} (1/h^n) φ[(x − x_k)/h]
The function φ() is called the Parzen window or kernel, and the parameter h is the width of the window (or of the kernel).
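A minimal 1-D sketch of the estimator with a Gaussian kernel (the bandwidth follows the h_N = N^{-1/(2n)} rule discussed later; the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=1000)  # 1-D training samples

def parzen_pdf(x_star, samples, h):
    """Parzen-window estimate with Gaussian kernel (n = 1):
    p(x*) = (1/N) * sum_k (1/h) * phi((x* - x_k)/h)."""
    u = (x_star - samples) / h
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return np.mean(phi / h)

h = len(x) ** -0.25  # h_N = N^(-1/(2n)) with n = 1
p0 = parzen_pdf(0.0, x, h)
print(p0)  # close to the true value 1/sqrt(2*pi) ≈ 0.399
```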
-
Features of the Kernel Function
In order for the Parzen-window estimate to be meaningful, it is necessary to impose restrictions on the kernel φ():
A necessary and sufficient condition for the Parzen-window estimate to be a pdf is that the kernel function itself be a pdf (i.e., a non-negative and normalized function):
φ(x) ≥ 0 ∀x ∈ ℝ^n,  ∫_{ℝ^n} φ(x) dx = 1
Moreover, some further conditions are usually imposed with the aim of obtaining a good estimate:
φ() takes its global maximum in 0;
φ() is continuous (this is necessary to guarantee that the estimate does not vary suddenly or have discontinuities);
φ(x) is infinitesimal for ‖x‖ → ∞ (so the effect of a sample vanishes at large distances from the sample itself):
lim_{‖x‖→+∞} φ(x) = 0
-
Examples of Kernel Functions for n = 1
Rectangular kernel: φ(x) = 1 for |x| ≤ 1/2, 0 elsewhere (here φ() does not satisfy the continuity condition).
Triangular kernel: φ(x) = 1 − |x| for |x| ≤ 1, 0 elsewhere.
Gaussian kernel: φ(x) = (2π)^{-1/2} exp(−x²/2)
Exponential kernel: φ(x) = (1/2) exp(−|x|)
Cauchy kernel: φ(x) = 1/[π(1 + x²)]
Kernel with sinc²() behavior: φ(x) = (1/(2π)) [sin(x/2)/(x/2)]²
[Figure: plots of the six kernels as functions of x.]
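All six kernels should be non-negative and integrate to one; a quick numerical check (the grid extent and tolerance are arbitrary choices, and the heavy-tailed kernels converge slowly):

```python
import numpy as np

# The six 1-D kernels listed above
kernels = {
    "rectangular": lambda x: np.where(np.abs(x) <= 0.5, 1.0, 0.0),
    "triangular":  lambda x: np.maximum(1.0 - np.abs(x), 0.0),
    "gaussian":    lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi),
    "exponential": lambda x: 0.5 * np.exp(-np.abs(x)),
    "cauchy":      lambda x: 1.0 / (np.pi * (1.0 + x**2)),
    # sin(x/2)/(x/2) = np.sinc(x/(2*pi)), since np.sinc(t) = sin(pi t)/(pi t)
    "sinc2":       lambda x: (1 / (2 * np.pi)) * np.sinc(x / (2 * np.pi)) ** 2,
}

x = np.linspace(-200, 200, 400_001)
dx = x[1] - x[0]
for name, phi in kernels.items():
    values = phi(x)
    integral = float(np.sum(values) * dx)  # rectangle-rule integration
    assert values.min() >= 0
    assert abs(integral - 1.0) < 1e-2
```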
-
Remark on the Parzen-Window Estimation
Multidimensional case
Often, in multidimensional feature spaces (n > 1), the choice of the kernel function is led back to the one-dimensional case, by adopting:
φ(x) = φ_1(‖x‖)
where φ_1() is a one-dimensional kernel function (e.g., one of those listed in the previous slide), possibly up to a normalization constant so that φ integrates to one over ℝ^n. In other words, the (multidimensional) kernel φ() has spherical symmetry and its behavior, moving outward from the centre, is derived from φ_1().
Properties of the Parzen-window estimate
It can be proved that, in general, the Parzen-window estimate is biased.
However, choosing the width h of the kernel as a function of the number N of training samples (i.e., h = h_N) and imposing that {h_N} be an infinitesimal sequence of order smaller than 1/n (so that N h_N^n → +∞), the Parzen-window estimate becomes asymptotically unbiased and consistent (e.g., h_N = N^{-1/(2n)}).
-
Parzen-Window Estimation with a Finite Number of Samples
The asymptotic properties of the Parzen-window estimation are derived by letting the number of training samples approach infinity, which, obviously, is not realistic.
With a finite training set, choosing h → 0, the estimate turns into a sequence of Dirac pulses centered on the single samples and thus exhibits an excessive variability. Instead, if h is too large, an excessive smoothing is generated.
Therefore, the application of the method requires a high number of training samples, an adequate choice of the kernel function, and a compromise choice for the value of h.
Automatic algorithms exist (not described in this course) for the automatic optimization [Scott et al., 1987] [Sain et al., 1974] or even for the adaptive optimization [Mucciardi, Gose, 1970] of the kernel width.
-
Remarks on the Parzen-Window Estimation
Computational complexity
Like the k-nn estimation, the Parzen-window estimation is computationally heavy. However, approaches exist to reduce its complexity (not presented in this course).
Probabilistic Neural Networks
The Parzen-window estimation with multidimensional Gaussian kernels with spherical symmetry can be implemented by means of a neural architecture called Probabilistic Neural Network (PNN) [Specht, 1990].
-
Example (1)
Parzen-window estimates of a one-dimensional Gaussian pdf using a Gaussian kernel N(0, 1). Since n = 1, we have considered:
h_N = h_1/√N
with constant h_1 > 0.
-
Example (2)
Parzen-window estimates of a bimodal one-dimensional pdf (one uniform mode and one triangular mode) using Gaussian kernels N(0, 1). The same expression of h_N as for the 1-D Gaussian pdf has been adopted.
-
Example (3)
Parzen-window estimates of a two-dimensional Gaussian pdf using Gaussian kernels N(0, I), with zero mean and identity covariance matrix. Since n = 2, we have considered:
h_N = h_1/N^{1/4}
with constant h_1 > 0.
-
Minimization of the Quadratic Error
In particular, the estimate p̂(x) = Σ_{i=1}^{m} c_i φ_i(x) that presents the minimum mean quadratic error with respect to the true pdf in the space of the m basis functions is searched for. Therefore, the minimization of the following functional is considered:
‖p − p̂‖² = ‖p − Σ_{i=1}^{m} c_i φ_i‖² = ‖p‖² − 2 Σ_{i=1}^{m} c_i ⟨p, φ_i⟩ + Σ_{i=1}^{m} Σ_{j=1}^{m} c_i c_j ⟨φ_i, φ_j⟩ = ‖p‖² − 2 Σ_{i=1}^{m} c_i ⟨p, φ_i⟩ + Σ_{i=1}^{m} c_i²
where the last equality holds because the basis is orthonormal (⟨φ_i, φ_j⟩ = δ_ij). The functional to minimize is a quadratic form in the coefficients c_1, c_2, ..., c_m. By imposing a simple stationarity condition (null gradient) we obtain:
∂‖p − p̂‖²/∂c_i = −2⟨p, φ_i⟩ + 2c_i = 0  ⇒  c_i = ⟨p, φ_i⟩,  i = 1, 2, ..., m
-
Computation of the Optimal Coefficients
Estimation of the coefficients of the expansion
Taking into account that p(x) is a pdf defined over A, we can estimate the scalar product ⟨p, φ_i⟩ (i = 1, 2, ..., m) by using the set of training samples:
c_i = ⟨p, φ_i⟩ = ∫_A φ_i(x) p(x) dx = E{φ_i(x)}  ⇒  ĉ_i = (1/N) Σ_{k=1}^{N} φ_i(x_k)
Approximation error
Increasing the number m of basis functions, we can obtain estimates with smaller and smaller approximation errors: for m → +∞, we expect an infinitesimal error.
In fact, the existence of complete orthonormal bases can be demonstrated, that is, sequences of orthonormal functions {φ_i(): i = 1, 2, ...} such that any function f with finite energy can be expanded as:
f = Σ_{i=1}^{∞} ⟨f, φ_i⟩ φ_i
where the series converges in quadratic mean (with respect to the introduced norm).
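The estimator ĉ_i = (1/N) Σ_k φ_i(x_k) can be sketched on a hypothetical pdf whose coefficients are known in closed form (here p(x) = (1 + x)/2 on A = [−1, 1], with the first two normalized Legendre basis functions; the setup is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
# Sample from p(x) = (1 + x)/2 on [-1, 1] by inverse transform:
# F(x) = (x + 1)^2 / 4  =>  x = 2*sqrt(u) - 1
u = rng.uniform(size=100_000)
x = 2 * np.sqrt(u) - 1

# First two normalized Legendre basis functions on [-1, 1]
phi1 = lambda t: np.full_like(t, np.sqrt(0.5))
phi2 = lambda t: np.sqrt(1.5) * t

# c_i = <p, phi_i> = E{phi_i(x)}, estimated by the sample average
c1 = phi1(x).mean()   # true value: 1/sqrt(2) ≈ 0.7071
c2 = phi2(x).mean()   # true value: sqrt(3/2)/3 ≈ 0.4082
assert abs(c1 - 0.7071) < 0.001
assert abs(c2 - 0.4082) < 0.01
```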
-
Choice of the Basis Functions (1)
In general, since the true pdf is unknown, it is not possible to identify a priori the orthonormal basis that provides a given approximation error with the minimum number of coefficients. A choice can be made on the basis of operational issues, such as implementation simplicity or computation time.
Examples of complete orthonormal bases in the case n = 1
The trigonometric functions form a complete orthonormal basis over [0, 2π] (Fourier series expansion):
φ_i(x) = (2π)^{-1/2} for i = 1
φ_i(x) = π^{-1/2} cos(rx) for i = 2r (r = 1, 2, ...)
φ_i(x) = π^{-1/2} sin(rx) for i = 2r + 1 (r = 1, 2, ...)
Complete orthonormal bases can be generated (over various domains) by means of systems of orthogonal polynomials (Legendre, Hermite, Laguerre, Chebyshev polynomials).
-
Choice of the Basis Functions (2)
Legendre polynomials
They are a sequence of recursively defined polynomials:
P_0(x) = 1,  P_1(x) = x,  P_2(x) = (3x² − 1)/2,  P_3(x) = (5x³ − 3x)/2, ...
P_{i+1}(x) = [(2i + 1) x P_i(x) − i P_{i−1}(x)]/(i + 1)
They are orthogonal over [−1, 1] and need to be normalized:
∫_{−1}^{1} P_i(x) P_j(x) dx = (2/(2i + 1)) δ_ij  ⇒  φ_{i+1}(x) = √((2i + 1)/2) P_i(x),  i = 0, 1, 2, ...
In the case n > 1, a complete orthonormal basis can be obtained by multiplying one-dimensional basis functions: given a one-dimensional basis {φ_i}, a bi-dimensional basis can be defined as follows:
φ_1(x_1, x_2) = φ_1(x_1) φ_1(x_2),  φ_2(x_1, x_2) = φ_2(x_1) φ_1(x_2)
φ_3(x_1, x_2) = φ_1(x_1) φ_2(x_2),  φ_4(x_1, x_2) = φ_2(x_1) φ_2(x_2)
φ_5(x_1, x_2) = φ_3(x_1) φ_1(x_2), ...
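The recursion and the orthogonality relation can be checked numerically (a sketch; the grid resolution is arbitrary):

```python
import numpy as np

def legendre(i, x):
    """Legendre polynomials via the recursion
    P_{i+1}(x) = ((2i+1) x P_i(x) - i P_{i-1}(x)) / (i+1)."""
    p_prev, p = np.ones_like(x), x
    if i == 0:
        return p_prev
    for k in range(1, i):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

x = np.linspace(-1, 1, 200_001)
dx = x[1] - x[0]
# Check orthogonality and the norm 2/(2i+1) numerically
for i in range(4):
    for j in range(4):
        integral = float(np.sum(legendre(i, x) * legendre(j, x)) * dx)
        expected = 2 / (2 * i + 1) if i == j else 0.0
        assert abs(integral - expected) < 1e-3
```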
-
Accuracy of the Functional Approximation
The quality of the approximation depends on different elements:
orthogonality of the basis functions over the region of the feature space in which the samples take values;
number m of adopted basis functions.
Number of basis functions
The number m necessary to reach a certain approximation error depends on the chosen type of basis functions (e.g., a sinusoidal p(x) will require, in general, fewer functions from a trigonometric basis than from a polynomial one).
In the absence of a priori information on p(x), for a given basis, m is typically derived by inserting the estimated pdf into the adopted classifier, evaluating the performance on the test set, and increasing m until the desired accuracy is reached.
-
Example (1)
ML classification with estimates based on Legendre polynomials.
Two classes are described by the following training samples in a bi-dimensional feature space:
ω_1: (−3/5, 0), (0, −3/5), (−3/5, −3/5)
ω_2: (1, 1), (4/5, 2/5), (3/5, 4/5)
Adopt a Legendre polynomial basis of order 4 (m = 4). Note that the features are normalized into [−1, 1], corresponding to the orthogonality interval of the Legendre polynomials (if this is not so, it is sufficient to normalize them on such an interval). The basis functions are:
φ_1(x_1, x_2) = √(1/2) P_0(x_1) · √(1/2) P_0(x_2) = 1/2
φ_2(x_1, x_2) = √(3/2) P_1(x_1) · √(1/2) P_0(x_2) = (√3/2) x_1
φ_3(x_1, x_2) = √(1/2) P_0(x_1) · √(3/2) P_1(x_2) = (√3/2) x_2
φ_4(x_1, x_2) = √(3/2) P_1(x_1) · √(3/2) P_1(x_2) = (3/2) x_1 x_2
[Figure: the six training samples in the (x_1, x_2) plane, within the square [−1, 1] × [−1, 1].]
-
Example (2)
Computation of the coefficients (ĉ_i = (1/3) Σ_{k=1}^{3} φ_i(x_k) for each class):
For class ω_1:
c_1 = (1/3)[φ_1(−3/5, 0) + φ_1(0, −3/5) + φ_1(−3/5, −3/5)] = 1/2
c_2 = (1/3)(√3/2)(−3/5 + 0 − 3/5) = −√3/5
c_3 = (1/3)(√3/2)(0 − 3/5 − 3/5) = −√3/5
c_4 = (1/3)(3/2)(0 + 0 + 9/25) = 9/50
p̂(x|ω_1) = 1/4 − (3/10) x_1 − (3/10) x_2 + (27/100) x_1 x_2,  x_1, x_2 ∈ [−1, 1]
For class ω_2:
c_1 = (1/3)[φ_1(1, 1) + φ_1(4/5, 2/5) + φ_1(3/5, 4/5)] = 1/2
c_2 = (1/3)(√3/2)(1 + 4/5 + 3/5) = 2√3/5
c_3 = (1/3)(√3/2)(1 + 2/5 + 4/5) = 11√3/30
c_4 = (1/3)(3/2)(1 + 8/25 + 12/25) = 9/10
p̂(x|ω_2) = 1/4 + (3/5) x_1 + (11/20) x_2 + (27/20) x_1 x_2,  x_1, x_2 ∈ [−1, 1]
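The coefficient computation can be verified in exact rational arithmetic (a sketch; `rational_coeffs` is an illustrative helper that folds the √3 factors of φ_2, φ_3 back into the polynomial coefficients of p̂):

```python
from fractions import Fraction as F

# Training samples of the two classes from the example
X1 = [(-F(3, 5), F(0)), (F(0), -F(3, 5)), (-F(3, 5), -F(3, 5))]
X2 = [(F(1), F(1)), (F(4, 5), F(2, 5)), (F(3, 5), F(4, 5))]

def rational_coeffs(X):
    """Coefficients of p-hat(x) = a0 + a1*x1 + a2*x2 + a3*x1*x2.
    Since c2 = (sqrt(3)/2)*mean(x1), c3 = (sqrt(3)/2)*mean(x2), c4 = (3/2)*mean(x1*x2),
    the polynomial coefficients are 1/4, (3/4)*mean(x1), (3/4)*mean(x2), (9/4)*mean(x1*x2)."""
    N = len(X)
    m1 = sum(x1 for x1, x2 in X) / N
    m2 = sum(x2 for x1, x2 in X) / N
    m12 = sum(x1 * x2 for x1, x2 in X) / N
    return F(1, 4), F(3, 4) * m1, F(3, 4) * m2, F(9, 4) * m12

assert rational_coeffs(X1) == (F(1, 4), F(-3, 10), F(-3, 10), F(27, 100))
assert rational_coeffs(X2) == (F(1, 4), F(3, 5), F(11, 20), F(27, 20))
```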
-
Example (3)
Computation of the ML discriminant curve
The discriminant curve is obtained by equating the two estimated pdfs, p̂(x|ω_1) = p̂(x|ω_2):
1/4 − (3/10) x_1 − (3/10) x_2 + (27/100) x_1 x_2 = 1/4 + (3/5) x_1 + (11/20) x_2 + (27/20) x_1 x_2
⇒ (9/10) x_1 + (17/20) x_2 + (27/25) x_1 x_2 = 0
⇒ 90 x_1 + 85 x_2 + 108 x_1 x_2 = 0
⇒ x_2 = −90 x_1/(108 x_1 + 85)
The discriminant curve is an equilateral hyperbola (i.e., it has orthogonal asymptotes).
If we used three basis functions, we would obtain a linear discriminant function.
[Figure: the discriminant hyperbola and the training samples in the (x_1, x_2) plane, within the square [−1, 1] × [−1, 1].]