03. PDF Estimation Corr
TRANSCRIPT
-
8/10/2019 03. PDF Estimation Corr
1/43
Dipartimento di Ingegneria Biofisica ed Elettronica
Università di Genova
Prof. Sebastiano B. Serpico
3. Supervised estimation of
probability density functions
-
Supervised Classifier Design
Approach 1: the training set (training samples for the classes {ω_i}) is used for the estimation of the class-conditional pdfs {p(x|ω_i)}; decision theory then provides a decision rule, which is applied to the data set to obtain the data-set classification.
Approach 2: the training set (training samples for the classes {ω_i}) is used for the training of a non-Bayesian classifier by direct use of the training samples; the resulting decision rule is applied to the data set to obtain the data-set classification.
-
Supervised Estimation of a pdf
The use of decision theory to design classifiers requires a preliminary estimate of the class-conditional pdfs. In a supervised approach, the estimation of the pdf p(x|ω_i) can be performed on the basis of the training data of the class ω_i.
Problem definition and notation:
Consider a feature vector x with (unknown) pdf p(x) and a finite set X = {x_1, x_2, ..., x_N} of N independent samples drawn from such a pdf.
We would like to compute, on the basis of the available samples, an estimated pdf p̂(x).
In order to perform supervised classification, the estimation process has to be repeated individually for each single class: in particular, to estimate p(x|ω_i), we assume that the set X corresponds to the set of training samples of the class ω_i.
-
Approaches to pdf Estimation
Parametric estimation: a given model (e.g., Gaussian, exponential, ...) is assumed for the analytical form of p(x); the parameters of such a model are estimated.
Remarks:
a given model could be physically unrealistic;
most parametric methods assume single-mode pdfs, while many real problems involve multimodal pdfs;
complex methods (not considered here) have been developed to identify the different modes of a pdf.
Non-parametric estimation: no analytical model is assumed for the pdf; p(x) is directly estimated from the samples in X.
Remarks:
typically, the lack of predefined models allows more flexibility;
however, the computational complexity of the estimation problem is generally higher than for the parametric case.
-
Parametric Estimation
Given an analytical model of the pdf p(x) to be estimated, the parameters that characterize the model are collected into a vector θ = (θ_1, θ_2, ..., θ_r).
We highlight the dependence on the parameters by adopting the notation p(x|θ) (in particular, p(X|θ), considered as a function of θ, is called the likelihood function).
The training samples x_1, x_2, ..., x_N are collected into a single vector of observations X. The samples are considered as random vectors and a pdf p(X|θ) is associated with them.
Usually the samples are assumed to be identically distributed random vectors (because they are all drawn from the same pdf p(x)) and independent of each other (i.i.d. random vectors, that is, independent and identically distributed); then:
p(X|θ) = ∏_{k=1}^{N} p(x_k|θ)
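The factorization above can be sketched numerically (an illustrative check, assuming for concreteness a 1-D Gaussian model with unit variance): the joint likelihood of i.i.d. samples is the product of the per-sample pdfs, so the log-likelihood is the corresponding sum.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=10)  # i.i.d. samples x_1, ..., x_N

def pdf(x, theta):
    """Per-sample model p(x_k|theta): here a 1-D Gaussian N(theta, 1)."""
    return np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2 * np.pi)

theta = 1.0
joint = np.prod(pdf(x, theta))              # p(X|theta) = prod_k p(x_k|theta)
log_joint = np.sum(np.log(pdf(x, theta)))   # ln p(X|theta) = sum_k ln p(x_k|theta)
assert np.isclose(np.log(joint), log_joint)
```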
-
General Properties of the Estimations
General properties
The estimate of the parameter vector depends on the observation vector:
θ̂ = θ̂(X)
Therefore the estimate θ̂ is itself a random vector, and so is the estimation error:
ε = θ̂ − θ = ε(θ, X)
Bias
The expected value E{ε} of the estimation error ε is called bias. The estimate is said to be unbiased if, for each parameter vector θ, the estimation error has zero mean:
E{ε} = 0, or equivalently E{θ̂} = θ
For the estimate of the parameter θ_i (i = 1, 2, ..., r) to be good, we want the estimation error ε_i (the i-th component of ε) to have zero mean (i.e., the estimate to be unbiased), but it also has to have a small variance var{ε_i}.
-
Variance of the Estimation Error
Cramér-Rao Inequality:
For each unbiased estimate of the vector θ, it holds:
var{ε_i} ≥ [J^{-1}(θ)]_{ii},  i = 1, 2, ..., r
where J(θ) = E{∇_θ ln p(X|θ) · [∇_θ ln p(X|θ)]^t} is the Fisher information matrix:
[J(θ)]_{ij} = E{[∂ ln p(X|θ)/∂θ_i] · [∂ ln p(X|θ)/∂θ_j]}
The Cramér-Rao inequality provides a lower bound for the variance of the estimation error:
var{ε_i} cannot be made arbitrarily small, but is always lower bounded by [J^{-1}(θ)]_{ii}. The Fisher information matrix thus represents a measure of how good an estimate can be.
In particular, an unbiased estimate that satisfies the equality for each vector of the parameters is said to be efficient.
-
Asymptotic Properties of Estimations
Often biased and/or non-efficient estimates are used, provided they exhibit a good behavior for large values of N.
An estimate is called asymptotically unbiased if the error mean is zero for N → +∞:
lim_{N→+∞} E{ε} = 0, that is, lim_{N→+∞} E{θ̂} = θ
An estimate is called asymptotically efficient if the error variance reaches the Cramér-Rao lower bound for N → +∞:
lim_{N→+∞} var{ε_i}/[J^{-1}(θ)]_{ii} = 1,  i = 1, 2, ..., r
An estimate is called consistent if it converges to the true value in probability for N → +∞:
lim_{N→+∞} P{‖θ̂ − θ‖ < δ} = 1,  ∀δ > 0
A sufficient condition for an estimate to be consistent is that it is asymptotically unbiased and that the estimation error has infinitesimal variance for N → +∞ [Mendel, 1987].
-
ML Estimation
Definition
The Maximum Likelihood (ML) estimate of the vector θ is defined as the following vector:
θ̂ = arg max_θ p(X|θ)
Remarks
For different values of θ, different pdfs are obtained. Each of them is evaluated at the observations X. The pdf assuming the maximum value for X is identified: the ML estimate is the value of θ that produces this pdf.
Often it is advantageous not to maximize the likelihood function p(X|θ) but (equivalently) the log-likelihood function:
θ̂ = arg max_θ ln p(X|θ)
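A minimal numerical sketch of the definition (assuming, for illustration, a 1-D Gaussian model with known unit variance, for which the analytical ML estimate of the mean is the sample mean):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)  # i.i.d. samples, known variance = 1

def log_likelihood(m, x):
    """ln p(X|m) for the N(m, 1) model: -N/2 ln(2*pi) - 1/2 sum_k (x_k - m)^2."""
    return -0.5 * len(x) * np.log(2 * np.pi) - 0.5 * np.sum((x - m) ** 2)

# Brute-force maximization of the log-likelihood over a grid of candidate means
grid = np.linspace(0.0, 4.0, 4001)
ll = np.array([log_likelihood(m, x) for m in grid])
m_ml = grid[np.argmax(ll)]

# For this model the analytical ML estimate is the sample mean
assert abs(m_ml - x.mean()) < 1e-3
```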
-
ML Estimation: Example
ML estimation of the mean m of a one-dimensional Gaussian with known variance (equal to one) starting from a single observed sample x_0. The likelihood p(x_0|m), as a function of m, is a Gaussian centered at x_0, so its maximum is attained for:
m̂ = x_0
[Figure: the Gaussian likelihood as a function of m, plotted over the interval [−6, 6], with its peak at m = x_0.]
-
Properties of the ML Estimation
Under mild assumptions about the function p(X|θ), it can be proven that, if an efficient estimate exists and the ML estimate is unbiased, then the efficient estimate is the ML estimate.
Even when an efficient estimate does not exist, the ML estimate exhibits good asymptotic properties. In particular, the ML estimate is:
Asymptotically unbiased
Asymptotically efficient
Consistent
Asymptotically Gaussian: for large N, θ̂ is approximately distributed as N(θ, J^{-1}(θ)).
These properties explain the wide diffusion of ML estimators in classification methods.
-
Properties of the Parametric Gaussian Estimation
For a Gaussian model, the ML estimates of the parameters are the sample mean and the sample covariance with denominator N:
m̂ = (1/N) Σ_{k=1}^{N} x_k,  Σ̂ = (1/N) Σ_{k=1}^{N} (x_k − m̂)(x_k − m̂)^t
The estimates of m and Σ, as ML estimates, are asymptotically unbiased, asymptotically efficient and consistent. Moreover, the following additional properties hold:
The estimate of m is unbiased, while the estimate of Σ is biased:
E{m̂} = m,  E{Σ̂} = ((N − 1)/N) Σ
Therefore, usually, the estimate of Σ is modified as follows:
Σ̂ = (1/(N − 1)) Σ_{k=1}^{N} (x_k − m̂)(x_k − m̂)^t
The two estimates coincide for N → +∞ (consistent with the fact that ML estimates are asymptotically unbiased).
The introduced estimates of the mean and the covariance matrix are generally called the sample mean and the sample covariance.
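A short sketch of the two covariance estimates (the N vs. N − 1 denominators), using numpy; the sample data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
X = rng.normal(size=(N, 3))  # N i.i.d. samples, n = 3 features

m_hat = X.mean(axis=0)            # sample mean
D = X - m_hat
Sigma_ml = (D.T @ D) / N          # ML estimate, denominator N (biased)
Sigma_unb = (D.T @ D) / (N - 1)   # corrected estimate, denominator N - 1

# The two estimates differ exactly by the factor (N - 1)/N
assert np.allclose(Sigma_ml, Sigma_unb * (N - 1) / N)
# np.cov uses the unbiased (N - 1) form by default
assert np.allclose(Sigma_unb, np.cov(X, rowvar=False))
```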
-
Iterative Expressions of the Gaussian Parametric Estimation
The estimates of m and Σ can also be expressed in an iterative form, using each single sample in sequence, instead of the whole training set at once.
Iterative computation of the sample mean:
m̂^(1) = x_1
m̂^(k+1) = m̂^(k) + (1/(k+1)) (x_{k+1} − m̂^(k)),  k = 1, 2, ..., N − 1
where m̂^(k) = (1/k) Σ_{h=1}^{k} x_h and m̂ = m̂^(N).
Iterative computation of the sample covariance:
S^(1) = x_1 x_1^t
S^(k+1) = S^(k) + (1/(k+1)) (x_{k+1} x_{k+1}^t − S^(k)),  k = 1, 2, ..., N − 1
where S^(k) = (1/k) Σ_{h=1}^{k} x_h x_h^t, and
Σ̂^(k) = S^(k) − m̂^(k) [m̂^(k)]^t,  Σ̂ = Σ̂^(N)
The iterative estimates of Σ refer to the expression with denominator N.
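The recursions above can be sketched as follows (a minimal numpy check against the batch formulas; the sample data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))  # training samples x_1 ... x_N, n = 2

# Iterative (one sample at a time) computation of sample mean and covariance
m = X[0].copy()                  # m^(1) = x_1
S = np.outer(X[0], X[0])         # S^(1) = x_1 x_1^t
for k, x in enumerate(X[1:], start=1):
    m = m + (x - m) / (k + 1)                # m^(k+1)
    S = S + (np.outer(x, x) - S) / (k + 1)   # S^(k+1)
Sigma = S - np.outer(m, m)       # Sigma-hat^(N) = S^(N) - m m^t

# Batch formulas (denominator N, as in the iterative version)
m_batch = X.mean(axis=0)
D = X - m_batch
Sigma_batch = (D.T @ D) / len(X)

assert np.allclose(m, m_batch)
assert np.allclose(Sigma, Sigma_batch)
```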
-
Example (1/2)
Given n = 3 features and two classes ω_1 and ω_2, characterized by the following training sets:
ω_1: (0, 0, 0), (1, 1, 0), (1, 0, 0), (1, 0, 1)
ω_2: (0, 0, 1), (0, 1, 1), (1, 1, 1), (0, 1, 0)
In this case it makes no sense to normalize the features: the features are binary and the samples are represented by all possible combinations of the three binary features.
We assume that the class-conditional pdfs are Gaussian and we apply ML estimation. It is necessary to estimate the mean vector and the covariance matrix for each class. Let's see the computation for the class ω_1. The estimated mean is:
m̂_1 = (1/4) [(0, 0, 0)^t + (1, 1, 0)^t + (1, 0, 0)^t + (1, 0, 1)^t] = (3/4, 1/4, 1/4)^t
-
Example (2/2)
Estimation of the covariance matrix for ω_1. Let's use N = 4 as the denominator, thus obtaining a biased estimate. The use of N − 1 = 3 would give an unbiased estimate. However, for large N (say, N > 30), 1/(N − 1) ≈ 1/N.
The deviations x_k − m̂_1 are (−3/4, −1/4, −1/4)^t, (1/4, 3/4, −1/4)^t, (1/4, −1/4, −1/4)^t, (1/4, −1/4, 3/4)^t, so:
Σ̂_1 = (1/4) Σ_{k=1}^{4} (x_k − m̂_1)(x_k − m̂_1)^t
    = (1/64) ([9 3 3; 3 1 1; 3 1 1] + [1 3 −1; 3 9 −3; −1 −3 1] + [1 −1 −1; −1 1 1; −1 1 1] + [1 −1 3; −1 1 −3; 3 −3 9])
    = (1/64) [12 4 4; 4 12 −4; 4 −4 12] = (1/16) [3 1 1; 1 3 −1; 1 −1 3]
In this case (not in general!) the same computation for ω_2 gives Σ̂_2 = Σ̂_1.
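The example can be checked numerically (a sketch; `ml_gaussian` is an illustrative helper name):

```python
import numpy as np

# Training samples of the two classes from the example (n = 3 binary features)
X1 = np.array([[0, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 1]], dtype=float)
X2 = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 1], [0, 1, 0]], dtype=float)

def ml_gaussian(X):
    """ML estimates: sample mean and covariance with denominator N (biased)."""
    m = X.mean(axis=0)
    D = X - m
    return m, (D.T @ D) / len(X)

m1, S1 = ml_gaussian(X1)
m2, S2 = ml_gaussian(X2)

assert np.allclose(m1, [0.75, 0.25, 0.25])
assert np.allclose(S1 * 16, [[3, 1, 1], [1, 3, -1], [1, -1, 3]])
# In this particular case (not in general!) the two covariance estimates coincide
assert np.allclose(S1, S2)
```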
-
Non-Gaussian Parametric Estimation
When a Gaussian model appears inaccurate for the considered problem, other parametric models can be adopted.
In the case n > 1, an extension of the Gaussian model is given by the elliptically contoured pdfs:
p(x) = |Σ|^{-1/2} f[(x − m)^t Σ^{-1} (x − m)]
where m = E{x}, Σ = Cov{x} and f is an appropriate non-negative function. The level curves of such pdfs are hyperellipses, as in the Gaussian case.
In the case n = 1, very general models are Pearson's pdfs, which, as some parameters vary, include uniform pdfs, Gaussian pdfs and also impulsive models with vertical asymptotes.
-
Non-Parametric Estimation: Problem Definition (1/2)
In a non-parametric context the estimation of the unknown pdf p(x) is not restricted to satisfy any predefined model and is directly built up from the training samples x_1, x_2, ..., x_N (assumed i.i.d.).
Let x* be a generic sample and R a predefined region of the feature space that includes x*. Assuming that the true pdf p(x) is a continuous function and that R is small enough that this function does not vary significantly over R, we have:
P_R = P{x ∈ R} = ∫_R p(x) dx ≈ p(x*) V
where V is the n-dimensional volume (measure) of R.
If K is the number of training samples belonging to R (over a total of N training samples), a consistent estimate of the probability P_R is given by the relative frequency:
P̂_R = K/N
Law of large numbers:
lim_{N→+∞} P{|P̂_R − P_R| < δ} = 1,  ∀δ > 0
-
Non-Parametric Estimation: Problem Definition (2/2)
Pdf estimation
From the estimate of the probability P_R that a sample belongs to R, we can derive an estimate of the pdf at the point x*:
p̂(x*) = P̂_R/V = K/(NV)
Remarks
R has to be large enough to contain a number of training samples that justifies the application of the law of large numbers;
R has to be small enough to justify the hypothesis that p(x) does not vary significantly over R.
So, to obtain an accurate estimate, a compromise between these two needs is necessary, to guarantee a good estimation of the pdf.
However, it is not possible to obtain a good compromise if the total number N of samples in the training set is small.
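The basic K/(NV) estimate can be sketched in one dimension (an illustrative example; the window half-width is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=5000)  # training samples from a standard Gaussian

def pdf_estimate(x_star, samples, half_width=0.25):
    """Estimate p(x*) as K/(N*V) with R = [x* - h, x* + h], V = 2h (n = 1)."""
    V = 2 * half_width
    K = np.sum(np.abs(samples - x_star) < half_width)
    return K / (len(samples) * V)

# At the origin the true pdf is 1/sqrt(2*pi) ≈ 0.3989
p0 = pdf_estimate(0.0, x)
assert abs(p0 - 0.3989) < 0.05
```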
-
Two Non-Parametric Approaches
By exchanging the roles of the quantities K and V, the above reasoning leads to two possible approaches to non-parametric estimation:
the k-nearest-neighbor approach: for a fixed K and a given point x of the feature space, the region R is grown until it contains the K training samples nearest to x; the hypervolume V is computed, and the estimate of the pdf is deduced;
the Parzen-window approach: for a fixed region R centered at x, whose hypervolume is equal to V, K is computed by counting the training samples in R, then the estimate is derived.
It is possible to prove that both approaches lead to consistent estimates. However, it is not possible to draw general conclusions about their behavior in a real context, characterized by a finite number of training samples.
-
K-Nearest-Neighbor Estimation
Hypotheses
The number K of training samples to be captured is preset.
A reference cell (e.g., a sphere) centered at x* is considered.
Methodology
The k-nearest-neighbor (k-nn) estimator extends the cell until it contains exactly K training samples: V_K(x*) is the volume of the resulting cell. The pdf at the point x* is estimated as follows:
p̂(x*) = K/(N V_K(x*))
It can be proved that, selecting K as a function of N (K = K_N), the necessary and sufficient condition for the k-nn estimate to be consistent at all points where p(x) is continuous is K_N → +∞ for N → +∞, but of order lower than 1 (i.e., K_N/N → 0; e.g., K_N = N^{1/2}).
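A minimal 1-D sketch of the k-nn estimator (illustrative data; in one dimension the "hypervolume" is the interval length V_K = 2r):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=2000)  # 1-D training samples

def knn_pdf(x_star, samples, K):
    """k-nn estimate p(x*) = K/(N * V_K), with V_K = 2r in one dimension,
    r = distance from x* to the K-th nearest sample."""
    d = np.sort(np.abs(samples - x_star))
    r = d[K - 1]
    return K / (len(samples) * 2 * r)

K = int(np.sqrt(len(x)))  # K_N = sqrt(N), consistent with the consistency condition
p0 = knn_pdf(0.0, x, K)
print(p0)  # roughly near the true value 1/sqrt(2*pi) ≈ 0.399
```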
-
Remarks on the k-nn Method
Typically the cell used with k-nn is a hypersphere; the k-nn estimate is then based on the following steps:
Identify the K training samples closest to the considered point x* (with respect to the Euclidean metric);
Identify the radius r of the smallest hypersphere that, centered at x*, includes the above K samples (r coincides with the distance from x* to the farthest of the K samples);
Compute the volume of the n-dimensional hypersphere of radius r and then the value of the estimate p̂(x*).
Disadvantages
The pdf estimated by the k-nn method is not a true pdf, since its integral diverges because of the singularities due to the term V_K(x*) in the denominator (e.g., V_1(x_k) = 0 for k = 1, 2, ..., N).
The k-nn estimation is computationally heavy, even if ad hoc techniques have been proposed to decrease the computational burden (e.g., KD-trees).
-
Parzen-Window Estimation: Introduction
Hypotheses and notation
Suppose R is an n-dimensional hypercube with side h (and then volume V = h^n), centered at the point x*.
Introduce the following rectangular function:
φ(x) = 1 if x is inside the hypercube with side 1 and center 0; φ(x) = 0 elsewhere.
Introduction to the method
The training sample x_k belongs to the hypercube R with center x* and side h if φ[(x_k − x*)/h] = 1, otherwise φ[(x_k − x*)/h] = 0.
Then the number of training samples that fall into R is:
K = Σ_{k=1}^{N} φ[(x_k − x*)/h]
Consequently, the estimate can be computed as:
p̂(x*) = K/(NV) = (1/N) Σ_{k=1}^{N} (1/h^n) φ[(x_k − x*)/h]
-
Parzen-Window Estimation
The estimate just illustrated, based on counting the number of training samples included in a prefixed volume, can be interpreted as the superposition of rectangular contributions, each associated with a single sample.
To obtain more regular estimates (the rectangular function is discontinuous), the previous expression is generalized: the pdf estimate is expressed as the sum of N contributions, each associated with a single sample, where the single contribution is expressed by a function φ(), in general not rectangular, taking real values that vary continuously. The following estimate is obtained:
p̂(x) = (1/N) Σ_{k=1}^{N} (1/h^n) φ[(x − x_k)/h]
The function φ() is called the Parzen window or kernel, and the parameter h is the width of the window (or of the kernel).
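A minimal 1-D sketch of the estimator with a Gaussian kernel (the bandwidth follows the h_N = N^{-1/(2n)} rule discussed later; the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=1000)  # 1-D training samples

def parzen_pdf(x_star, samples, h):
    """Parzen-window estimate with Gaussian kernel (n = 1):
    p(x*) = (1/N) * sum_k (1/h) * phi((x* - x_k)/h)."""
    u = (x_star - samples) / h
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return np.mean(phi / h)

h = len(x) ** -0.25  # h_N = N^(-1/(2n)) with n = 1
p0 = parzen_pdf(0.0, x, h)
print(p0)  # close to the true value 1/sqrt(2*pi) ≈ 0.399
```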
-
Features of the Kernel Function
In order for the Parzen-window estimate to be meaningful, it is necessary to impose restrictions on the kernel φ():
A necessary and sufficient condition for the Parzen-window estimate to be a pdf is that the kernel function itself be a pdf (i.e., a non-negative and normalized function):
φ(x) ≥ 0 ∀x ∈ ℝ^n,  ∫_{ℝ^n} φ(x) dx = 1
Moreover, some further conditions are usually imposed with the aim of obtaining a good estimate:
φ() takes its global maximum in 0;
φ() is continuous (this is necessary to guarantee that the estimate does not vary suddenly or have discontinuities);
φ(x) is infinitesimal for ‖x‖ → ∞ (so the effect of a sample vanishes at large distances from the sample itself):
lim_{‖x‖→+∞} φ(x) = 0
-
Examples of Kernel Functions for n = 1
Rectangular kernel: φ(x) = 1 for |x| ≤ 1/2, 0 elsewhere (here φ() does not satisfy the continuity condition).
Triangular kernel: φ(x) = 1 − |x| for |x| ≤ 1, 0 elsewhere.
Gaussian kernel: φ(x) = (2π)^{-1/2} exp(−x²/2)
Exponential kernel: φ(x) = (1/2) exp(−|x|)
Cauchy kernel: φ(x) = 1/[π(1 + x²)]
Kernel with sinc²() behavior: φ(x) = (1/(2π)) [sin(x/2)/(x/2)]²
[Figure: plots of the six kernels as functions of x.]
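All six kernels should be non-negative and integrate to one; a quick numerical check (the grid extent and tolerance are arbitrary choices, and the heavy-tailed kernels converge slowly):

```python
import numpy as np

# The six 1-D kernels listed above
kernels = {
    "rectangular": lambda x: np.where(np.abs(x) <= 0.5, 1.0, 0.0),
    "triangular":  lambda x: np.maximum(1.0 - np.abs(x), 0.0),
    "gaussian":    lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi),
    "exponential": lambda x: 0.5 * np.exp(-np.abs(x)),
    "cauchy":      lambda x: 1.0 / (np.pi * (1.0 + x**2)),
    # sin(x/2)/(x/2) = np.sinc(x/(2*pi)), since np.sinc(t) = sin(pi t)/(pi t)
    "sinc2":       lambda x: (1 / (2 * np.pi)) * np.sinc(x / (2 * np.pi)) ** 2,
}

x = np.linspace(-200, 200, 400_001)
dx = x[1] - x[0]
for name, phi in kernels.items():
    values = phi(x)
    integral = float(np.sum(values) * dx)  # rectangle-rule integration
    assert values.min() >= 0
    assert abs(integral - 1.0) < 1e-2
```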
-
Remark on the Parzen-Window Estimation
Multidimensional case
Often, in multidimensional feature spaces (n > 1), the choice of the kernel function is led back to the one-dimensional case, by adopting:
φ(x) = φ_1(‖x‖)
where φ_1() is a one-dimensional kernel function (e.g., one of those listed in the previous slide), possibly up to a normalization constant so that φ integrates to one over ℝ^n. In other words, the (multidimensional) kernel φ() has spherical symmetry and its behavior, moving outward from the centre, is derived from φ_1().
Properties of the Parzen-window estimate
It can be proved that, in general, the Parzen-window estimate is biased.
However, choosing the width h of the kernel as a function of the number N of training samples (i.e., h = h_N) and imposing that {h_N} be an infinitesimal sequence of order smaller than 1/n (so that N h_N^n → +∞), the Parzen-window estimate becomes asymptotically unbiased and consistent (e.g., h_N = N^{-1/(2n)}).
-
Parzen-Window Estimation with a Finite Number of Samples
The asymptotic properties of the Parzen-window estimation are derived by letting the number of training samples approach infinity, which, obviously, is not realistic.
With a finite training set, choosing h → 0, the estimate turns into a sequence of Dirac pulses centered on the single samples and thus exhibits an excessive variability. Instead, if h is too large, an excessive smoothing is generated.
Therefore, the application of the method requires a high number of training samples, an adequate choice of the kernel function, and a compromise choice for the value of h.
Automatic algorithms exist (not described in this course) for the automatic optimization [Scott et al., 1987] [Sain et al., 1974] or even for the adaptive optimization [Mucciardi, Gose, 1970] of the kernel width.
-
Remarks on the Parzen-Window Estimation
Computational complexity
Like the k-nn estimation, the Parzen-window estimation is computationally heavy. However, approaches exist to reduce its complexity (not presented in this course).
Probabilistic Neural Networks
The Parzen-window estimation with multidimensional Gaussian kernels with spherical symmetry can be implemented by means of a neural architecture called Probabilistic Neural Network (PNN) [Specht, 1990].
-
Example (1)
Parzen-window estimates of a one-dimensional Gaussian pdf using a Gaussian kernel N(0, 1). Since n = 1, we have considered:
h_N = h_1/√N
with constant h_1 > 0.
-
Example (2)
Parzen-window estimates of a bimodal one-dimensional pdf (one uniform mode and one triangular mode) using Gaussian kernels N(0, 1). The same expression of h_N as for the 1-D Gaussian pdf has been adopted.
-
Example (3)
Parzen-window estimates of a two-dimensional Gaussian pdf using Gaussian kernels N(0, I), with zero mean and identity covariance matrix. Since n = 2, we have considered:
h_N = h_1/N^{1/4}
with constant h_1 > 0.
-
Minimization of the Quadratic Error
In particular, the estimate p̂(x) = Σ_{i=1}^{m} c_i φ_i(x) that presents the minimum mean quadratic error with respect to the true pdf in the space of the m basis functions is searched for. Therefore, the minimization of the following functional is considered:
‖p − p̂‖² = ‖p − Σ_{i=1}^{m} c_i φ_i‖² = ‖p‖² − 2 Σ_{i=1}^{m} c_i ⟨p, φ_i⟩ + Σ_{i=1}^{m} Σ_{j=1}^{m} c_i c_j ⟨φ_i, φ_j⟩ = ‖p‖² − 2 Σ_{i=1}^{m} c_i ⟨p, φ_i⟩ + Σ_{i=1}^{m} c_i²
where the last equality holds because the basis is orthonormal (⟨φ_i, φ_j⟩ = δ_ij). The functional to minimize is a quadratic form in the coefficients c_1, c_2, ..., c_m. By imposing a simple stationarity condition (null gradient) we obtain:
∂‖p − p̂‖²/∂c_i = −2⟨p, φ_i⟩ + 2c_i = 0  ⇒  c_i = ⟨p, φ_i⟩,  i = 1, 2, ..., m
-
Computation of the Optimal Coefficients
Estimation of the coefficients of the expansion
Taking into account that p(x) is a pdf defined over A, we can estimate the scalar product ⟨p, φ_i⟩ (i = 1, 2, ..., m) by using the set of training samples:
c_i = ⟨p, φ_i⟩ = ∫_A φ_i(x) p(x) dx = E{φ_i(x)}  ⇒  ĉ_i = (1/N) Σ_{k=1}^{N} φ_i(x_k)
Approximation error
Increasing the number m of basis functions, we can obtain estimates with smaller and smaller approximation errors: for m → +∞, we expect an infinitesimal error.
In fact, the existence of complete orthonormal bases can be demonstrated, that is, sequences of orthonormal functions {φ_i(): i = 1, 2, ...} such that any function f with finite energy can be expanded as:
f = Σ_{i=1}^{∞} ⟨f, φ_i⟩ φ_i
where the series converges in quadratic mean (with respect to the introduced norm).
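The estimator ĉ_i = (1/N) Σ_k φ_i(x_k) can be sketched on a hypothetical pdf whose coefficients are known in closed form (here p(x) = (1 + x)/2 on A = [−1, 1], with the first two normalized Legendre basis functions; the setup is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
# Sample from p(x) = (1 + x)/2 on [-1, 1] by inverse transform:
# F(x) = (x + 1)^2 / 4  =>  x = 2*sqrt(u) - 1
u = rng.uniform(size=100_000)
x = 2 * np.sqrt(u) - 1

# First two normalized Legendre basis functions on [-1, 1]
phi1 = lambda t: np.full_like(t, np.sqrt(0.5))
phi2 = lambda t: np.sqrt(1.5) * t

# c_i = <p, phi_i> = E{phi_i(x)}, estimated by the sample average
c1 = phi1(x).mean()   # true value: 1/sqrt(2) ≈ 0.7071
c2 = phi2(x).mean()   # true value: sqrt(3/2)/3 ≈ 0.4082
assert abs(c1 - 0.7071) < 0.001
assert abs(c2 - 0.4082) < 0.01
```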
-
Choice of the Basis Functions (1)
In general, since the true pdf is unknown, it is not possible to identify a priori the orthonormal basis that provides a given approximation error with the minimum number of coefficients. A choice can be made on the basis of operational issues, such as implementation simplicity or computation time.
Examples of complete orthonormal bases in the case n = 1
The trigonometric functions form a complete orthonormal basis over [0, 2π] (Fourier series expansion):
φ_i(x) = (2π)^{-1/2} for i = 1
φ_i(x) = π^{-1/2} cos(rx) for i = 2r (r = 1, 2, ...)
φ_i(x) = π^{-1/2} sin(rx) for i = 2r + 1 (r = 1, 2, ...)
Complete orthonormal bases can be generated (over various domains) by means of systems of orthogonal polynomials (Legendre, Hermite, Laguerre, Chebyshev polynomials).
-
Choice of the Basis Functions (2)
Legendre polynomials
They are a sequence of recursively defined polynomials:
P_0(x) = 1,  P_1(x) = x,  P_2(x) = (3x² − 1)/2,  P_3(x) = (5x³ − 3x)/2, ...
P_{i+1}(x) = [(2i + 1) x P_i(x) − i P_{i−1}(x)]/(i + 1)
They are orthogonal over [−1, 1] and need to be normalized:
∫_{−1}^{1} P_i(x) P_j(x) dx = (2/(2i + 1)) δ_ij  ⇒  φ_{i+1}(x) = √((2i + 1)/2) P_i(x),  i = 0, 1, 2, ...
In the case n > 1, a complete orthonormal basis can be obtained by multiplying one-dimensional basis functions: given a one-dimensional basis {φ_i}, a bi-dimensional basis can be defined as follows:
φ_1(x_1, x_2) = φ_1(x_1) φ_1(x_2),  φ_2(x_1, x_2) = φ_2(x_1) φ_1(x_2)
φ_3(x_1, x_2) = φ_1(x_1) φ_2(x_2),  φ_4(x_1, x_2) = φ_2(x_1) φ_2(x_2)
φ_5(x_1, x_2) = φ_3(x_1) φ_1(x_2), ...
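The recursion and the orthogonality relation can be checked numerically (a sketch; the grid resolution is arbitrary):

```python
import numpy as np

def legendre(i, x):
    """Legendre polynomials via the recursion
    P_{i+1}(x) = ((2i+1) x P_i(x) - i P_{i-1}(x)) / (i+1)."""
    p_prev, p = np.ones_like(x), x
    if i == 0:
        return p_prev
    for k in range(1, i):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

x = np.linspace(-1, 1, 200_001)
dx = x[1] - x[0]
# Check orthogonality and the norm 2/(2i+1) numerically
for i in range(4):
    for j in range(4):
        integral = float(np.sum(legendre(i, x) * legendre(j, x)) * dx)
        expected = 2 / (2 * i + 1) if i == j else 0.0
        assert abs(integral - expected) < 1e-3
```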
-
Accuracy of the Functional Approximation
The quality of the approximation depends on different elements:
orthogonality of the basis functions over the region of the feature space in which the samples take values;
number m of adopted basis functions.
Number of basis functions
The number m necessary to reach a certain approximation error depends on the chosen type of basis functions (e.g., a sinusoidal p(x) will require, in general, fewer functions from a trigonometric basis than from a polynomial one).
In the absence of a priori information on p(x), for a given basis, m is typically derived by inserting the estimated pdf into the adopted classifier, evaluating the performance on the test set, and increasing m until the desired accuracy is reached.
-
Example (1)
ML classification with estimates based on Legendre polynomials.
Two classes are described by the following training samples in a bi-dimensional feature space:
ω_1: (−3/5, 0), (0, −3/5), (−3/5, −3/5)
ω_2: (1, 1), (4/5, 2/5), (3/5, 4/5)
Adopt a Legendre polynomial basis of order 4 (m = 4). Note that the features are normalized into [−1, 1], corresponding to the orthogonality interval of the Legendre polynomials (if this is not so, it is sufficient to normalize them on such an interval). The basis functions are:
φ_1(x_1, x_2) = √(1/2) P_0(x_1) · √(1/2) P_0(x_2) = 1/2
φ_2(x_1, x_2) = √(3/2) P_1(x_1) · √(1/2) P_0(x_2) = (√3/2) x_1
φ_3(x_1, x_2) = √(1/2) P_0(x_1) · √(3/2) P_1(x_2) = (√3/2) x_2
φ_4(x_1, x_2) = √(3/2) P_1(x_1) · √(3/2) P_1(x_2) = (3/2) x_1 x_2
[Figure: the six training samples in the (x_1, x_2) plane, within the square [−1, 1] × [−1, 1].]
-
Example (2)
Computation of the coefficients (ĉ_i = (1/3) Σ_{k=1}^{3} φ_i(x_k) for each class):
For class ω_1:
c_1 = (1/3)[φ_1(−3/5, 0) + φ_1(0, −3/5) + φ_1(−3/5, −3/5)] = 1/2
c_2 = (1/3)(√3/2)(−3/5 + 0 − 3/5) = −√3/5
c_3 = (1/3)(√3/2)(0 − 3/5 − 3/5) = −√3/5
c_4 = (1/3)(3/2)(0 + 0 + 9/25) = 9/50
p̂(x|ω_1) = 1/4 − (3/10) x_1 − (3/10) x_2 + (27/100) x_1 x_2,  x_1, x_2 ∈ [−1, 1]
For class ω_2:
c_1 = (1/3)[φ_1(1, 1) + φ_1(4/5, 2/5) + φ_1(3/5, 4/5)] = 1/2
c_2 = (1/3)(√3/2)(1 + 4/5 + 3/5) = 2√3/5
c_3 = (1/3)(√3/2)(1 + 2/5 + 4/5) = 11√3/30
c_4 = (1/3)(3/2)(1 + 8/25 + 12/25) = 9/10
p̂(x|ω_2) = 1/4 + (3/5) x_1 + (11/20) x_2 + (27/20) x_1 x_2,  x_1, x_2 ∈ [−1, 1]
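The coefficient computation can be verified in exact rational arithmetic (a sketch; `rational_coeffs` is an illustrative helper that folds the √3 factors of φ_2, φ_3 back into the polynomial coefficients of p̂):

```python
from fractions import Fraction as F

# Training samples of the two classes from the example
X1 = [(-F(3, 5), F(0)), (F(0), -F(3, 5)), (-F(3, 5), -F(3, 5))]
X2 = [(F(1), F(1)), (F(4, 5), F(2, 5)), (F(3, 5), F(4, 5))]

def rational_coeffs(X):
    """Coefficients of p-hat(x) = a0 + a1*x1 + a2*x2 + a3*x1*x2.
    Since c2 = (sqrt(3)/2)*mean(x1), c3 = (sqrt(3)/2)*mean(x2), c4 = (3/2)*mean(x1*x2),
    the polynomial coefficients are 1/4, (3/4)*mean(x1), (3/4)*mean(x2), (9/4)*mean(x1*x2)."""
    N = len(X)
    m1 = sum(x1 for x1, x2 in X) / N
    m2 = sum(x2 for x1, x2 in X) / N
    m12 = sum(x1 * x2 for x1, x2 in X) / N
    return F(1, 4), F(3, 4) * m1, F(3, 4) * m2, F(9, 4) * m12

assert rational_coeffs(X1) == (F(1, 4), F(-3, 10), F(-3, 10), F(27, 100))
assert rational_coeffs(X2) == (F(1, 4), F(3, 5), F(11, 20), F(27, 20))
```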
-
Example (3)
Computation of the ML discriminant curve
The discriminant curve is obtained by equating the two estimated pdfs, p̂(x|ω_1) = p̂(x|ω_2):
1/4 − (3/10) x_1 − (3/10) x_2 + (27/100) x_1 x_2 = 1/4 + (3/5) x_1 + (11/20) x_2 + (27/20) x_1 x_2
⇒ (9/10) x_1 + (17/20) x_2 + (27/25) x_1 x_2 = 0
⇒ 90 x_1 + 85 x_2 + 108 x_1 x_2 = 0
⇒ x_2 = −90 x_1/(108 x_1 + 85)
The discriminant curve is an equilateral hyperbola (i.e., it has orthogonal asymptotes).
If we used three basis functions, we would obtain a linear discriminant function.
[Figure: the discriminant hyperbola and the training samples in the (x_1, x_2) plane, within the square [−1, 1] × [−1, 1].]