
Evonet Summer School - Parma © 2003 Ernesto Costa

How To Conduct Good Experiments?

Ernesto Costa

DEI/CISUC

[email protected]

http://www.dei.uc.pt/~ernesto

Summary

What is the goal of this talk?

Background

- Probabilities

- Random Variables and Probability Distributions

- Inferential Statistics

Applying the Theory

Conclusions

What is the goal of this talk?

I don’t know! I have been asked to give a talk on that subject…

I do know!

- EC is very much an experimental discipline

- Most of our work consists of comparing things

  - Algorithms

  - Parameter settings

What is a fair comparison?

What is the goal of this talk?

Looking at EC papers

- One problem

- One run

- Several runs: 10, 20, 30?

- Use average values

- Use the average of the bests

- Use the mean

- Use the mean and the standard deviation

- Use confidence levels / intervals

What is the goal of this talk?

What is a good experiment?

- Identify the independent and dependent variables

  - Mutation rate → fitness

  - Different crossover operators → fitness

  - Evolution and learning → number of survivors

- Identify the conditions of the experiment

  - Initial conditions

  - Number of runs

  - Parameter settings

- Identify the kind of statistics you will need

  - Descriptive

  - Inferential

  - Non-parametric

Background

Probabilities

Experiment: a procedure whose outcome cannot be predicted ahead of time.

- Tossing a coin, rolling a die

Sample space: the set of possible outcomes of an experiment.

- {Heads, Tails}

- {1, 2, 3, 4, 5, 6}

Event: a subset of the sample space.

- {Heads}

- {1, 3, 5}

Background

Probabilities

Probability of an event

- Measures the likelihood that the event will occur

- Tossing a (fair) coin: P(outcome = heads) = 1/2

Axioms

- $P(E) \geq 0$

- $P(S) = 1$

- For mutually exclusive events $E_1, E_2, \ldots$: $P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$

Background

Probabilities

Working methodology: Experiment → Sample Space → Event → Probability Assignment

Example: when rolling two dice, what is the probability that the sum of the two outcomes equals 7?

Answer: 1/6

[Figure: "Two Dice Experiment", a bar chart of the number of outcomes (vertical axis) for each sum (horizontal axis).]
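The working methodology for the two-dice example can be checked by brute-force enumeration. This is a minimal sketch, assuming two fair six-sided dice; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event: the two outcomes sum to 7.
event = [roll for roll in sample_space if sum(roll) == 7]

# With equally likely outcomes, P(event) = |event| / |sample space|.
p_sum_is_7 = Fraction(len(event), len(sample_space))
print(p_sum_is_7)  # 1/6
```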

Probabilities

Definition: Let E and F be two events, with p(F) > 0. The conditional probability of E given F, p(E | F), is defined as:

$p(E \mid F) = \frac{p(E \cap F)}{p(F)}$

Example: A family has two children. Knowing that at least one of them is a boy, what is the probability that both are boys?

Answer: 1/3
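The two-children example can be verified by listing the sample space. The sketch below assumes that boys and girls are equally likely and that the two births are independent, which the slide leaves implicit.

```python
from fractions import Fraction
from itertools import product

# Assumption: each child is a boy ("B") or a girl ("G") with equal
# probability, independently of the other child.
families = list(product("BG", repeat=2))  # BB, BG, GB, GG

# Event F: at least one child is a boy.
at_least_one_boy = [f for f in families if "B" in f]
# Event E and F: both children are boys.
two_boys = [f for f in at_least_one_boy if f == ("B", "B")]

# p(E | F) = p(E and F) / p(F); the common factor 1/4 cancels.
p_two_boys_given_boy = Fraction(len(two_boys), len(at_least_one_boy))
print(p_two_boys_given_boy)  # 1/3
```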

Probabilities

Bayes' theorem:

$p(A_1 \mid B) = \frac{p(B \mid A_1)\,p(A_1)}{p(B)} = \frac{p(B \mid A_1)\,p(A_1)}{p(B \mid A_1)\,p(A_1) + p(B \mid A_2)\,p(A_2)}$

Example: A building has two lifts. The first is used by 45% of the residents and the second by 55%. The first lift has problems 5% of the time, while the second has problems 8% of the time. Knowing that a lift had a problem, what is the probability that it was lift number 1?

Answer: 33.8%
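The lift example is a direct application of Bayes' theorem with two hypotheses. A short sketch using the numbers from the example (the dictionary names are illustrative):

```python
# Priors: share of residents using each lift (from the example).
p_lift = {1: 0.45, 2: 0.55}
# Likelihoods: probability that each lift has a problem.
p_problem_given_lift = {1: 0.05, 2: 0.08}

# Total probability of a problem, p(B).
p_problem = sum(p_lift[k] * p_problem_given_lift[k] for k in p_lift)

# Bayes' theorem: p(A1 | B) = p(B | A1) p(A1) / p(B).
p_lift1_given_problem = p_lift[1] * p_problem_given_lift[1] / p_problem
print(f"{p_lift1_given_problem:.1%}")  # 33.8%
```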

Random Variables and Probability Distributions

Random Variables

Definition: A random variable, X, is a function from the sample space of an experiment to the set of real numbers.

[Figure: X maps each outcome s of the sample space S to a real number X(s), here one of 0, 1, 2, 3.]

A random variable is a function ... and it is not random!

Random Variables and Probability Distributions

Working methodology

- Before: Experiment → Sample Space → Event → Probability Assignment

- Now: Experiment → Sample Space → Random Variable → Probability Distribution

Example

- Experiment: toss a coin three times

- Sample space: 8 possible outcomes

- Random variable: X = number of heads, e.g. X(HHT) = 2

- Probability distribution: $f(x_i) = p(X = x_i)$

Random Variables and Probability Distributions

Example: Suppose you toss a coin three times. Let X(t) denote the number of heads that appear when t is the outcome. Then X(t) takes the following values:

X(HHH) = 3

X(HHT) = X(HTH) = X(THH) = 2

X(TTH) = X(THT) = X(HTT) = 1

X(TTT) = 0

[Figure: "Probability Distribution", a bar chart of $f(x_i)$ against X = 0, 1, 2, 3.]
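The probability distribution of X in the coin example can be obtained by counting outcomes. This sketch assumes a fair coin, so that all 8 outcomes are equally likely; the function name `X` mirrors the slide, the other names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: the 8 outcomes of three tosses, e.g. "HHT".
outcomes = ["".join(t) for t in product("HT", repeat=3)]

def X(t):
    """Random variable: number of heads in outcome t."""
    return t.count("H")

# f(x_i) = p(X = x_i): count the outcomes mapped to each value of X.
counts = Counter(X(t) for t in outcomes)
f = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in f.items():
    print(x, p)
```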

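Random Variables and Probability Distributions

Types of Random Variables

Discrete: probability mass function

- $P(X = x) = p(x) \geq 0$

- $\sum_x p(x) = 1$

Continuous: probability density function (pdf)

- $f(x) \geq 0, \ \forall x$

- $\int_{-\infty}^{\infty} f(x)\,dx = 1$

- $P(a \leq X \leq b) = \int_a^b f(x)\,dx$

[Figure: a density curve f(x) against x, with the area between x1 and x2 marked.]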

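Random Variables and Probability Distributions

Measures of Random Variables

Location

- Mean (discrete): $\mu = E(X) = \sum_x x\,p(x)$

- Mean (continuous): $\mu = E(X) = \int x\,f(x)\,dx$

Dispersion

- Variance (discrete): $\sigma^2 = V(X) = \sum_x (x - \mu)^2\,p(x)$

- Variance (continuous): $\sigma^2 = V(X) = \int (x - \mu)^2\,f(x)\,dx$

- Standard deviation: $\sigma = \sqrt{V(X)}$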
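As an illustration of the discrete formulas, the sketch below computes the mean, variance and standard deviation of the number of heads in three tosses of a fair coin. The distribution is entered by hand and the names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# p(x) for X = number of heads in three fair coin tosses.
p = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Mean: E(X) = sum over x of x * p(x).
mean = sum(x * px for x, px in p.items())

# Variance: V(X) = sum over x of (x - mean)^2 * p(x).
variance = sum((x - mean) ** 2 * px for x, px in p.items())

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(mean, variance, round(std_dev, 3))  # 3/2 3/4 0.866
```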

Random Variables and Probability Distributions

Independence of Random Variables

Two random variables, X and Y, over the same sample space S are said to be independent iff, for all real numbers $r_1$ and $r_2$:

$p(X(s) = r_1 \wedge Y(s) = r_2) = p(X(s) = r_1) \cdot p(Y(s) = r_2)$

For independent X and Y:

- Product theorem: $E(X \cdot Y) = E(X) \cdot E(Y)$

- Sum theorem: $V(X + Y) = V(X) + V(Y)$

Random Variables and Probability Distributions

Discrete Probability Distributions

Binomial Distribution

- Domain: {0, 1, 2, ..., n}

- Probability mass function: $p(X = x_i) = p_i = C_i^n\,p^i q^{n-i}$, with $q = 1 - p$

- Mean: $E(X) = \sum_{i=0}^{n} i\,C_i^n\,p^i q^{n-i} = np$

- Variance: $V(X) = E(X^2) - E(X)^2 = npq$

[Figure: two bar charts of the binomial distribution (probability against values x), one for p = 0.3 and one for p = 0.5.]
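The closed forms np and npq can be checked numerically against the probability mass function. A minimal sketch; the values n = 12 and p = 0.3 are hypothetical choices, not taken from the slides' data.

```python
from math import comb, isclose

def binomial_pmf(i, n, p):
    """p(X = i) = C(n, i) * p^i * q^(n - i), with q = 1 - p."""
    q = 1 - p
    return comb(n, i) * p**i * q ** (n - i)

n, p = 12, 0.3  # hypothetical parameters
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]

# E(X) = sum of i * p_i and V(X) = E(X^2) - E(X)^2.
mean = sum(i * p_i for i, p_i in enumerate(pmf))
variance = sum(i**2 * p_i for i, p_i in enumerate(pmf)) - mean**2

print(round(mean, 6), round(variance, 6))
print(isclose(mean, n * p), isclose(variance, n * p * (1 - p)))
```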

Random Variables and Probability Distributions

Discrete Probability Distributions

Poisson Distribution

- Approximates the binomial distribution, with $\lambda = np$

- Domain: {0, 1, 2, 3, ...}

- Probability mass function: $p(X = i) = p_i = \frac{\lambda^i e^{-\lambda}}{i!}$

- Mean: $\lambda$

- Variance: $\lambda$

[Figure: two bar charts of the Poisson distribution (probability against values), one for $\lambda = 6$ and one for $\lambda = 8.4$.]
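The statement that the Poisson distribution approximates the binomial with λ = np can be illustrated numerically. In this sketch n = 1000 and p = 0.006 are hypothetical values chosen so that λ = 6, as in the first chart.

```python
from math import comb, exp, factorial

def binomial_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    return lam**i * exp(-lam) / factorial(i)

# Hypothetical binomial with many trials and a small success probability.
n, p = 1000, 0.006
lam = n * p  # lambda = np = 6

# Largest pointwise gap between the two mass functions for i = 0..30.
max_diff = max(
    abs(binomial_pmf(i, n, p) - poisson_pmf(i, lam)) for i in range(31)
)
print(f"lambda = {lam}, largest difference = {max_diff:.5f}")
```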

Random Variables and Probability Distributions

Continuous Probability Distributions

Normal (Gaussian) Distribution

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

Standard Normal Distribution: $N(0, 1)$

[Figure: density curves of $N(0, 1)$ and $N(3, 2)$.]

Random Variables and Probability Distributions

Continuous Probability Distributions

Converting a normal distribution to a standard normal distribution

- X is a random variable with mean $\mu$ and standard deviation $\sigma$

- Using a translation (and a change of scale), define a new random variable: $Z = \frac{X - \mu}{\sigma}$
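The conversion to Z means that any probability about X can be read from the standard normal curve. A small sketch using Python's `statistics.NormalDist`; the values μ = 3, σ = 2 and x = 5 are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 3.0, 2.0  # hypothetical mean and standard deviation
X = NormalDist(mu, sigma)
Z = NormalDist()  # standard normal N(0, 1)

x = 5.0
z = (x - mu) / sigma  # standardized value

# P(X <= x) equals P(Z <= z).
p_from_x = X.cdf(x)
p_from_z = Z.cdf(z)
print(z, round(p_from_x, 4), round(p_from_z, 4))
```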

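Random Variables and Probability Distributions

Continuous Probability Distributions

Student's t-Distribution

- Approximates the standard normal distribution $N(0, 1)$, more closely as the degrees of freedom grow

- Degrees of freedom (df), $\nu$

- Mean 0, variance $\frac{\nu}{\nu - 2}$ (for $\nu > 2$)

$f(x) = \frac{1}{\sqrt{\nu}\,B\left(\frac{1}{2}, \frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$

[Figure: t density curves for $\nu = 1$, $\nu = 5$ and $\nu = 10$, together with $N(0, 1)$.]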

Background

Statistics

Goal: to apply probability theory to data analysis

How?

- Model the data (population) by means of a probability distribution

- Use a sample of the data instead of the whole population

- Estimate the population parameters ($\mu$, $\sigma$, $p$) using the corresponding sample statistics ($\bar{x}$, $s$, $\hat{p}$)

[Figure: a population, described by the parameters $\mu$, $\sigma$, $p$, and a sample drawn from it, described by the statistics $\bar{x}$, $s$, $\hat{p}$.]

Background

Statistics

Unbiased estimator: a statistic whose mean value is equal to the population parameter being estimated

- Point estimators

- Interval estimators

Background

Sampling distribution of the sample mean and the Central Limit Theorem

Consider a population with mean $\mu$ and standard deviation $\sigma$. Let $\bar{X}$ denote the mean of the observations in random samples of size n. Then:

- $\mu_{\bar{x}} = E(\bar{X}) = \mu$

- $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

- When the population distribution is normal, the sampling distribution of $\bar{X}$ is also normal for any sample size n

- (Central Limit Theorem) When n is sufficiently large (n > 30), the sampling distribution of $\bar{X}$ is well approximated by a normal curve, even if the population distribution is not itself normal
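The two formulas for the sampling distribution can be illustrated by simulation. This sketch assumes a uniform population on [0, 1), which is clearly not normal, with μ = 0.5 and σ = sqrt(1/12); the sample size, the number of repetitions and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # arbitrary seed, only for reproducibility

n = 36              # sample size (> 30)
repetitions = 5000  # number of random samples drawn

# Uniform population on [0, 1): mu = 0.5, sigma = sqrt(1/12).
mu, sigma = 0.5, sqrt(1 / 12)

# Mean of each random sample of size n.
sample_means = [
    sum(random.random() for _ in range(n)) / n for _ in range(repetitions)
]

print("mean of sample means:", round(mean(sample_means), 4), "expected", mu)
print("sd of sample means:  ", round(stdev(sample_means), 4),
      "expected", round(sigma / sqrt(n), 4))
```

A histogram of `sample_means` would look close to a bell curve even though the population itself is flat.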

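Background

Sampling distribution of the sample mean

Unbiased estimators

- Mean: $\mu_{\bar{x}} = E(\bar{X}) = \mu$

- Standard deviation: $\hat{\sigma} = s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$

(n-1) are the degrees of freedom (df). Strictly speaking, it is $s^2$ that is an unbiased estimator of $\sigma^2$.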
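The estimators above translate directly into code. In the sketch below the list `best_fitness` holds hypothetical results of 8 runs; Python's `statistics.stdev` uses the same (n-1) denominator and serves as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical best-fitness values from 8 independent runs.
best_fitness = [12.1, 9.8, 11.4, 10.3, 13.0, 10.9, 11.7, 9.5]
n = len(best_fitness)

# Sample mean.
x_bar = sum(best_fitness) / n

# Sample standard deviation with (n - 1) degrees of freedom.
s = sqrt(sum((x - x_bar) ** 2 for x in best_fitness) / (n - 1))

print(round(x_bar, 4), round(s, 4))
print(round(mean(best_fitness), 4), round(stdev(best_fitness), 4))
```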

Background

Sampling distribution of the sample mean and the Central Limit Theorem

Consequence: for a large sample, or for a population whose distribution is normal,

$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$

has (approximately) a standard normal (Z) distribution.

Background

Confidence Intervals – one sample

Estimate the mean $\mu$ when:

- The population standard deviation, $\sigma$, is known

- The sample mean, $\bar{x}$, from a random sample is known

- The sample size is large (n > 30)

The one-sample Z confidence interval is

$\bar{x} \pm Z_{\text{critical value}} \cdot \frac{\sigma}{\sqrt{n}}$

Example: for a 95% confidence interval, Z = 1.96.

Background

Confidence Intervals – one sample

Example: we want a confidence level of 90%

- Look at $N(0, 1)$

- For a confidence level of 90%, we have to isolate an area of 5% in each tail (to the left and to the right) of the bell-shaped normal distribution

- Looking in a table for the value of Z we obtain Z ≈ 1.65

- The confidence interval will be given by $\bar{x} \pm Z_{0.1/2} \cdot \frac{\sigma}{\sqrt{n}}$
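Instead of a table, the critical value can be obtained from the inverse of the standard normal cumulative distribution. The function below is an illustrative sketch (its name and the example numbers are hypothetical); it assumes a known σ and a large sample.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, level):
    """One-sample Z interval: x_bar +/- z * sigma / sqrt(n)."""
    alpha = 1 - level
    # Critical value leaving an area of alpha / 2 in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Critical values for the two confidence levels used in the slides.
z_95 = NormalDist().inv_cdf(0.975)
z_90 = NormalDist().inv_cdf(0.95)
print(round(z_95, 2), round(z_90, 3))  # 1.96 1.645

# Hypothetical data: x_bar = 50, sigma = 10, n = 100.
low, high = z_confidence_interval(50.0, 10.0, 100, 0.95)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```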

Background

Confidence Intervals – one sample

What does a confidence interval of 95% mean?

- That there is a probability of 95% that the true (population) mean lies in the interval? NO!

- It means that 95% of all possible samples result in an interval that includes the true mean!
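This interpretation can be made concrete by drawing many samples and counting how often the computed interval contains the true mean. The sketch assumes a normal population with hypothetical parameters (μ = 100, σ = 15), samples of size 40 and an arbitrary seed.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # arbitrary seed, only for reproducibility

mu, sigma, n = 100.0, 15.0, 40  # hypothetical population and sample size
z = NormalDist().inv_cdf(0.975)  # critical value for 95%
half_width = z * sigma / sqrt(n)

trials = 4000
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Does this sample's interval include the true mean?
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

coverage = covered / trials
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should be close to 0.95: the true mean is fixed, and it is the interval that varies from sample to sample.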

Background

Confidence Intervals – one sample

Estimate the mean $\mu$ when:

- The population standard deviation, $\sigma$, is NOT known

- The sample mean, $\bar{x}$, from a random sample is known

- The sample size is large (n > 30) OR the population distribution is normal

The one-sample t confidence interval is

$\bar{x} \pm t_{\text{critical value}} \cdot \frac{s}{\sqrt{n}}$

where the t critical value is based on (n-1) degrees of freedom (df).

Example: for a 95% confidence interval and 19 df, t = 2.09.

The Student t distribution can be used for small samples, assuming that the population distribution is approximately normal.

Background

Hypothesis Testing – one sample

- A hypothesis is a claim about the value of one or more population characteristics.

- A test procedure is a method for using sample data to decide between two competing claims about population characteristics (e.g. $\mu = 100$ or $\mu \neq 100$).

- Method by contradiction: we assume a particular hypothesis. Using the sample data, we try to find out whether there is convincing evidence to reject this hypothesis in favor of a competing one.

Background

Hypothesis Testing – one sample

- The null hypothesis, H0, is a claim about a population characteristic that is initially assumed to be true.

- Ha is the alternative hypothesis, or competing claim.

- Testing H0 versus Ha can lead to the conclusion that H0 must be rejected, or that we fail to reject H0. In the latter case we cannot say that H0 is accepted!

Background

Hypothesis Testing – one sample

Errors

- Type I error: rejecting H0 when H0 is true

  - The probability of a type I error, $\alpha$, is called the level of significance of the test.

- Type II error: failing to reject H0 when H0 is false

  - The probability of a type II error is denoted by $\beta$.

- There is a tradeoff between $\alpha$ and $\beta$: making the probability of a type I error very small increases the probability of a type II error.

Background

Hypothesis Testing – one sample

- Test statistic (Z, t): a function of the sample data on which the decision to reject or fail to reject H0 is based.

- p-value (observed significance level): the probability, assuming that H0 is true, of obtaining a test statistic at least as inconsistent with H0 as the one actually observed.

- Decision about H0: compare the p-value with the chosen $\alpha$.

  - Reject H0 if p-value ≤ $\alpha$

Background

Hypothesis Testing – one sample

Hypothesis Testing – principles

- Identify the population parameter of interest (mean, ...)

- State H0 and Ha

- Define the significance level $\alpha$

- Check that the assumptions for the test are reasonable (large sample, ...)

- Calculate the test statistic (Z, ...)

- Calculate the associated p-value

- State the conclusion (reject H0 if p-value ≤ $\alpha$, ...)

Background

Hypothesis Testing – one sample

Example

- Population parameter: the mean, $\mu$

- H0: $\mu = 100$, Ha: $\mu \neq 100$

- Significance level $\alpha = 0.01$

- n = 40 is large

- From the sample: $\bar{x} = 105.3$, $\sigma = 8.4$

- Test statistic: $z = \frac{105.3 - 100}{8.4 / \sqrt{40}} = 3.99$

- From the z-curve we know that the p-value ≈ 0

- Therefore the null hypothesis, H0, is rejected at a significance level of 0.01.
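The steps of this example can be reproduced in a few lines. The sketch uses the numbers from the example and a two-sided p-value computed from the standard normal distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example.
x_bar, mu_0, sigma, n = 105.3, 100.0, 8.4, 40
alpha = 0.01

# Test statistic: z = (x_bar - mu_0) / (sigma / sqrt(n)).
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided p-value, because Ha is "mu != 100".
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value <= alpha
print(f"z = {z:.2f}, p-value = {p_value:.5f}, reject H0: {reject_h0}")
```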

Background

Comparing Two Populations based on independent samples

Use the sampling distribution of the difference of the sample means: $\bar{x}_1 - \bar{x}_2$

Properties

- The mean of the difference is equal to the difference of the means: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$

- The variance of the difference is equal to the sum of the individual variances. Thus, the standard deviation is: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

- The sampling distribution of the difference of the sample means can be considered approximately normal (when each n is large, or when each sample comes from an (approximately) normal population)

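Background

Assumptions

- The two samples are independent random samples

- Sample sizes are both large (n > 30) OR the population distributions are (approximately) normal

Formulas

- Confidence interval for $\mu_1 - \mu_2$, the mean of $\bar{x}_1 - \bar{x}_2$:

  $(\bar{x}_1 - \bar{x}_2) \pm t_{\text{critical value}} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

- Degrees of freedom:

  $df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}}$, where $V_1 = \frac{s_1^2}{n_1}$ and $V_2 = \frac{s_2^2}{n_2}$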

Background

Hypothesis Test

Same procedure, only the formulas are different!

Z test

- Large samples OR

- Population distributions are (at least approximately) normal

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

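Background

Hypothesis Test

t test

- Large samples OR

- Population distributions are normal AND the random samples are independent

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}}$, where $V_1 = \frac{s_1^2}{n_1}$ and $V_2 = \frac{s_2^2}{n_2}$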

Applying the Theory

The Busy Beaver Problem

Two algorithms

- A standard GA

- A standard GA + local learning (Baldwin Effect)

Goal: good quality machines

Who is better? Comparing the means!

- H0: $\mu_1 = \mu_2$ (no improvement!), Ha: $\mu_1 \neq \mu_2$

- Significance level $\alpha = 0.01$

- Assuming that the population distributions are normal

- Number of (independent) runs = 30 for each case

- Use the t test

Applying the Theory

The Busy Beaver Problem

From the samples (# good machines)

- $\bar{x}_{sga} = 0.1$

- $\bar{x}_{be} = 0.23$

- $s_{sga}^2 = 0.093$

- $s_{be}^2 = 0.185$

From the formulas

- df = 53

- t = 1.35

- p-value ≈ 2 × 0.1 = 0.2

Conclusion: with $\alpha = 0.01$ and p-value = 0.2, the null hypothesis H0 cannot be rejected

Applying the Theory

Function Optimization

Two different GAs applied to function optimization

- A standard GA using two-point crossover

- A modified GA using transformation

Goal: find the minimum

[Figure: surface plot of the Schwefel function, with both variables ranging from -500 to 500. Minimum = 0.]

Applying the Theory

Function Optimization

Who is better? Two-point crossover or transformation?

Comparing the means of the best fitness!

- H0: $\mu_1 = \mu_2$ (no improvement!), Ha: $\mu_1 \neq \mu_2$

- Significance level $\alpha = 0.05$

- Assuming that the population distributions are normal

- Number of (independent) runs = 30 for each case

- Use the t test

Applying the Theory

Function Optimization

From the samples (fitness of the best individuals)

- $\bar{x}_{sga} = 5.4838$

- $\bar{x}_{tr} = 0.0768$

- $s_{sga}^2 = 149.788$

- $s_{tr}^2 = 0.02958$

From the formulas

- df = 29

- t = 2.42

- p-value ≈ 2 × 0.012 = 0.024

Conclusion: with $\alpha = 0.05$ and p-value = 0.024, the null hypothesis H0 is rejected.
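The t statistic and the degrees of freedom of both case studies can be recomputed from the reported summary statistics. The function below is an illustrative sketch of the two-sample formulas with H0: μ1 - μ2 = 0; its name and signature are hypothetical, and the inputs are the rounded values shown on the slides.

```python
from math import sqrt

def two_sample_t_and_df(mean1, var1, n1, mean2, var2, n2):
    """t statistic and degrees of freedom under H0: mu1 - mu2 = 0."""
    v1, v2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Busy Beaver: GA with local learning versus standard GA, 30 runs each.
t_bb, df_bb = two_sample_t_and_df(0.23, 0.185, 30, 0.1, 0.093, 30)

# Function optimization: standard GA versus transformation, 30 runs each.
t_fo, df_fo = two_sample_t_and_df(5.4838, 149.788, 30, 0.0768, 0.02958, 30)

print(f"Busy Beaver:           t = {t_bb:.2f}, df = {df_bb:.1f}")
print(f"Function optimization: t = {t_fo:.2f}, df = {df_fo:.1f}")
```

With these inputs the sketch reproduces t = 1.35 and t = 2.42, and df of about 29 for the function optimization case. For the Busy Beaver case it gives df of about 52.3 rather than the reported 53, a gap that may simply come from rounding in the published sample statistics. The p-value still has to be read from a t table or computed with a statistics package.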

Conclusions

This is a very simple presentation

- Assuming normal distributions

  - There are many others

  - In many situations we cannot assume a normal distribution

Many things were left unmentioned

- More than two populations

- Analysis of Variance (ANOVA)

- Regression and correlation

- Non-parametric methods

Want to know more?

Paul Cohen, Empirical Methods for Artificial Intelligence, MIT Press, Cambridge, MA, 1995.

James Kennedy and Russell Eberhart, Swarm Intelligence (Appendix A), Morgan Kaufmann, 2001.

Roxy Peck, Chris Olsen and Jay Devore, Introduction to Statistics and Data Analysis, Duxbury, 2001.

Mark Wineberg and Steffen Christensen, Using Appropriate Statistics, GECCO'2003 Tutorial.