Basic Experimentation

Notes developed by Ken Lulay, Mechanical Engineering, University of Portland, July 2008

2

Objectives

You will be able to:

Understand basic experiment “vocabulary”

Design and analyze a single variable experiment (2 level factor)

Design and analyze a multi-variable experiment (2 level factors)

3

Basic Experimentation - Overview

Experiment Basics

Single variable experiments – design and analysis

Multi-variable experiments – design and analysis

4

Overview of Experiment Basics

Differences between testing and experimenting

Experimental variables

Errors: systematic and random

5

Experimenting and testing…

…both require obtaining data (taking measurements), but…

…what are they and how are they different?

6

Testing

Testing may involve investigating only one set of conditions.

Usually evaluating performance

Example: determine strength of a material

May be a standardized test (ASTM, ISO)

Often has pass/fail criteria: does it meet specifications or not?

We will not be discussing “testing”

7

Experimentation

Performed to increase knowledge of how things perform under differing conditions

Vary the input to determine the response

Requires more than one set of conditions (design points)

Evaluate “better/worse” (not pass/fail)

8

Experimentation & Testing

A BIG difference:

Tests are often routine

Same tests done daily!

Analogous to daily commuting to work

Experiments are “unique”

Usually done only once!

Therefore, require more careful planning!

Analogous to a vacation trip

9

Variables

Variables are physical quantities that may or may not affect the results of an experiment or test.

Several types of variables are associated with any test and experiment:

Controlled or Extraneous

Controlled variables are held constant or intentionally manipulated (changed) during an experiment.

Extraneous variables are not controlled. They are generally assumed to have no effect on the response (ex: ambient room temperature).

10

Variables

Dependent or Independent

The magnitudes (values) of dependent variables depend upon other variables, whereas the magnitudes of independent variables do not.

Ex: in an experiment to determine the effect of temperature change on the toughness of AISI 1045 steel, temperature would be an independent variable and toughness would be a dependent variable.

Continuous or Discrete (a.k.a. Categorical)

Discrete variables cannot take on a continuous range of values. Ex: Red/Green; Company A/Company B.

Continuous variables can take on a continuous range of values. Ex: temperature, toughness, force.

11

Terminology

Factor - an independent variable in an experiment. Factor levels are intentionally varied in an experiment to see what the effect is on the response.

Factor Level - the target value of the factor. Example: pressure may be set to two levels, 0.5 atm and 1.0 atm.

Response - the thing to be measured. Example: if you want to determine the yield strength at different temperatures, the yield strength is the response.

12

Variables and Levels

Proper selection of appropriate variables and their levels is not trivial but is critical

Selecting proper factors and levels is worth the effort. Don’t rush this step.

Differences in factor levels:

Factor levels must be “well separated”

Far enough apart to be “different” (both producible and measurable)

Not so far apart as to be “unreasonable” (non-linear responses can be an issue – may miss the optimum)

13

Purpose of experiments?

The sole purpose of our experiments will be to answer the following questions:

Does changing one or more factors have a statistically significant effect on the response(s)? And if so, which factors appear to have the most significant effect?

14

Practice

A materials engineer wants to study the effect of molybdenum content in a particular high alloy steel on the yield strength at various temperatures.

For this experiment:

Define the factors and their levels

Define the response

Identify “all” variables and classify them: controlled/extraneous, discrete/continuous, dependent/independent

15

Practice (“answers”)

Factors (controlled):

Molybdenum content; levels: 5.1% and 5.2%?

Test temperature; levels: -50F, 1000F?

Also control: test bar geometry, chemistry (other than Mo), strain rate, measurement methods and systems, test methods and systems,…

Extraneous: humidity, …

Dependent: yield strength (response)

16

Errors

Errors (measurement variation) are due to a number of factors:

Measurement error: error = measured value - true value

Changes in test specimen. Ex: one specimen has slightly larger diameter

Changes in environment. Ex: ambient temperature increase

Et cetera

17

Errors

The “true” value is the value one would obtain with a perfect measurement.

The true value is never known in an experiment

Therefore, error can never be known exactly; it can only be estimated using statistical analysis.

Errors are inherent in measuring devices and are also caused by uncontrollable variations within the experiment.

18

Systematic and Random Errors

In any experiment, two types of error can exist:

Systematic Random

19

Systematic Errors

Caused by underlying factors which affect the results in a “consistent/reproducible” and sometimes “knowable” way

Sometimes referred to as “bias”

Not random

DANGER: can lead to false conclusions! (Discussed now; an example follows later.)

Can be managed (effects reduced) by properly designed experiments (randomizing the test conditions).

20

Causes of Systematic Errors

Unknown changes during the experiment: temperature, procedures, equipment, etc.

Different batches of material or samples

Et cetera

21

Random Errors

Show no reproducible pattern – they are random.

Sometimes referred to as “noise.”

Typically have a normal (bell-shaped) distribution

Averaging several readings can reduce random errors.
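The claim that averaging reduces random error can be illustrated with a small simulation. This is only a sketch: the "true" value of 80, the noise standard deviation of 1, and the choice of 16 readings per average are hypothetical numbers chosen for illustration, not values from these notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 80.0  # hypothetical "true" value (never known in a real experiment)
NOISE_SD = 1.0     # hypothetical standard deviation of the random error


def reading():
    """One simulated measurement: true value plus normally distributed noise."""
    return random.gauss(TRUE_VALUE, NOISE_SD)


def averaged_reading(n):
    """Average of n independent simulated measurements."""
    return statistics.mean(reading() for _ in range(n))


single = [reading() for _ in range(2000)]
averaged = [averaged_reading(16) for _ in range(2000)]

spread_single = statistics.stdev(single)      # scatter of individual readings
spread_averaged = statistics.stdev(averaged)  # scatter of 16-reading averages

print(f"single readings: {spread_single:.3f}")
print(f"16-reading averages: {spread_averaged:.3f}")
```

The scatter of the averages should come out noticeably smaller than the scatter of single readings. Note that averaging does nothing for a systematic error, which shifts every reading the same way.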

22

Practice

Consider the previous example (experiment to determine effect of varying molybdenum content and temperature on yield strength)

Make a list of possible systematic errors and random errors for the design on next slide…

23

Practice

Run   Moly   Temp.
1     2%     50F
2     2%     50F
3     2%     50F
4     10%    50F
5     10%    50F
6     10%    50F
7     2%     150F
8     2%     150F
9     2%     150F
10    10%    150F
11    10%    150F
12    10%    150F

•The Experiment:

–Two batches of steel: 2wt%Mo & 10wt%Mo

–Test bars are machined by outside company

–Two test temperatures: 50F, 150F

PRACTICE: Make a list of possible systematic errors and random errors

24


Possible systematic errors:

Batches of steel (chemistry variation of other elements). How could this effect be mitigated?

Machining of specimens (did moly content affect machining quality? Were specimens machined in batches with different diameters?)

Temperature drift during testing (maybe from 52F towards 48F, and from 152F towards 148F?)

Variation between beginning and end of test (measurement systems, operator, test equipment, test procedures…)

How could systematic errors have been reduced?

25


Possible random errors:

Measurement errors

Diameter of bars (maybe random)

Load cell variation

Others?

26

Review of Terminology

Do Exercise 1 (definitions).

27

Single Variable Experiments to follow

28

Overview of Single Variable Experiments

Basic Design of Experiments (DOE)

Example of “how not to”

Statistics and t-testing

Hypothesis testing

Confidence Intervals

29

Design of Experiments (DOE)

By careful design, errors can be mitigated:

Systematic errors are mitigated by randomizing the test conditions (randomized run order)

Random errors are mitigated by increasing the number of data points

Design is a compromise of competing criteria:

Cost, time, availability of equipment, etc.

Control over variables

Importance of results and conclusion

CAREFUL PLANNING is REQUIRED!

Let’s look at a basic example…

30

Example: Single Variable Experiment

Wacky Engineer, a new employee at ASKO, believes that the color of paint applied to a tensile bar can affect the strength.

Let’s take a look at this experiment…

31

Single Variable Experiment

Determine if paint color affects strength of tensile bars

Factor 1: paint color

Levels: Red, Green

Other controlled variables: test specimen geometry and material (constant)

Response: yield strength of bar

Results: Red = 81.9ksi, Green = 80.2ksi

Did color of paint have an effect?

Not a well thought out experiment

We need more and better data…

32


New Experiment with more data: paint five bars red and five green

Red paint is available, green paint is on backorder.

Your boss really wants data soon! Test facility is available, so show progress:

Paint and test red bars! Green paint arrives, complete the testing!

33


The results:

R: 80.3, 81.2, 82.1, 83.1, 82.2; Ave=81.9

G: 78.2, 82.1, 80.8, 81.6, 81.1; Ave=80.2

The red bars were stronger on average.

Same operator did all testing. Red bars were the first tensile bars he’s ever tested.

Did color of paint have an effect?

This is another poorly thought out experiment. What are some problems with this experiment?

34

Another, Better Example

Re-do the prior experiment, but randomize

Randomize by using the following run order: R, G, G, R, G, G, R, R, G, R

Why randomize?

Why would the following run order not be “OK”?

R, G, R, G, R, G, R, G, R, G
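A randomized run order like the first one above can be produced by shuffling the list of planned runs. The following is a minimal sketch; the function name `randomized_run_order` and the seed value are illustrative choices, not part of the notes.

```python
import random


def randomized_run_order(levels, replicates, seed=None):
    """Return every level repeated `replicates` times, in random order."""
    runs = [level for level in levels for _ in range(replicates)]
    random.Random(seed).shuffle(runs)  # seed only makes the example repeatable
    return runs


# Five red bars and five green bars, as in the paint experiment.
order = randomized_run_order(["R", "G"], replicates=5, seed=2008)
print(", ".join(order))
```

A strictly alternating order is not random: anything that happens to vary with the same period (for example, two operators taking turns) would line up with paint color and could be mistaken for a color effect.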

35

Better Example

The randomized run order results:

R: 80.3, 81.2, 82.1, 83.1, 82.2; Ave=81.9

G: 78.2, 82.1, 79.8, 79.6, 81.1; Ave=80.2

R & G averages are different, but did color of paint really have an effect?

Averages are only part of the answer

“Statistically significant” difference depends upon both the averages and the variation.

36

Plot the Data

[Plot of the Red and Green data; axis ticks at 79, 81, 83]

• Looks like Red paint increased the strength!

• Will your boss believe this?

• How certain are you that the effect is real?

• How likely is this to be a “fluke”?

37

Need some statistical stuff…

38

Probability Distribution

Assume distribution is “normal”!!!

Measurements are a sample of the total

We can never be 100% certain about experimental results (variation, error).

Can only estimate “likelihood” or “probability”

[Figure: probability density f(x), with limits a and b marked on the x axis]

$$\Pr(a \le x \le b) = \int_a^b f(x)\,dx$$

39

t-test

Comparing the averages is NOT sufficient!

The best way to answer “are they different” is with the t-test.

The t-test incorporates both the deviation of the data as well as the means.

40

t-test – what does it do?

Consider two sets of sampled data. Are their true means likely different?

What about these two sets?

Both sets have the same averages

t-test will help us decide

41

Basic Statistics

μ = true mean

X = estimated mean based on finite sample size

σ = true standard deviation

S = estimated standard deviation based on the finite sample size

n = number of samples

x_i is the value of the i-th sample

$$X = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{(Equation 1)}$$

$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - X\right)^2} \qquad \text{(Equation 2)}$$
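As a quick check of Equations 1 and 2, the sketch below evaluates them for the Green sample listed on the randomized “Better Example” slide. The variable names are illustrative; the comparison against Python's `statistics.stdev` is there only to confirm that the n - 1 form is the one being computed.

```python
import statistics

# Green bars from the randomized experiment (ksi).
green = [78.2, 82.1, 79.8, 79.6, 81.1]

n = len(green)
x_bar = sum(green) / n                                        # Equation 1
s = (sum((x - x_bar) ** 2 for x in green) / (n - 1)) ** 0.5   # Equation 2

print(f"X = {x_bar:.2f}, S = {s:.3f}, S^2 = {s**2:.2f}")

# statistics.stdev uses the same (n - 1) denominator.
same_as_library = abs(s - statistics.stdev(green)) < 1e-9
```

The variance comes out at about 2.23, which agrees with the value of SG² used when the deviations are pooled.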

42

Basic Statistics, Continued

Note: X is an estimate of the actual mean (μ). It becomes closer to μ with increasing sample size, n. X itself is a random estimate of the true mean, μ.

For normally distributed data:

68.3% of all data will be within ±1σ of the mean

95.4% of all data will be within ±2σ of the mean

99.7% of all data will be within ±3σ of the mean
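These percentages follow from integrating the normal density between the limits. A minimal sketch using the error function from Python's standard library (the helper name `fraction_within` is illustrative):

```python
import math


def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


coverage = {k: fraction_within(k) for k in (1, 2, 3)}
for k, frac in coverage.items():
    print(f"within +/- {k} sigma: {100 * frac:.1f}%")
```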

43

Hypothesis Testing

We want to determine if color of paint had an effect on strength (Red vs. Green, prior example).

Hypothesize there is no effect due to paint color (this is the so-called “null hypothesis,” H0). In other words, we claim that:

μRed = μGreen

We have sample means (XR=81.9, XG=80.2) which are estimates for the true means (μRed, μGreen), but we can never know the true means exactly.

44

Statistics

Assume the deviations are the same (σR = σG)

“Pool” the deviations:

$$S_p^2 = \frac{(n_R - 1)S_R^2 + (n_G - 1)S_G^2}{(n_R - 1) + (n_G - 1)} \qquad \text{(Equation 3)}$$

For our Paint Color experiment:

nR = nG = 5, SR^2 = 1.33; SG^2 = 2.23

Sp^2 = {(5-1)*1.33 + (5-1)*2.23} / {(5-1) + (5-1)}

Sp^2 = 1.78
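The pooled variance in Equation 3 is a weighted average of the two sample variances. A short sketch that reproduces the value above from the stated summary numbers (the function name `pooled_variance` is illustrative):

```python
def pooled_variance(n_a, var_a, n_b, var_b):
    """Equation 3: sample variances weighted by their degrees of freedom."""
    return ((n_a - 1) * var_a + (n_b - 1) * var_b) / ((n_a - 1) + (n_b - 1))


# Paint experiment: nR = nG = 5, SR^2 = 1.33, SG^2 = 2.23
sp2 = pooled_variance(5, 1.33, 5, 2.23)
print(f"Sp^2 = {sp2:.2f}")
```

With equal sample sizes the result is simply the average of the two variances.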

45

t-test

We now define t0, which is from the t-distribution (don’t worry about what that means):

$$t_0 = \frac{\left|X_R - X_G\right|}{\sqrt{S_p^2\left(\dfrac{1}{n_R} + \dfrac{1}{n_G}\right)}} \qquad \text{(Equation 4)}$$

For our example:

t0 = ABS{81.9 – 80.2} / {1.78 (1/5 + 1/5)}^(1/2)

t0 = 2.06
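Equation 4 can be evaluated directly from the summary statistics. The sketch below uses the rounded values printed on the slide; with those inputs the result is about 2.01 rather than 2.06. The small gap presumably comes from the slide using unrounded sample means, which is an assumption here, not something the notes state. Either value leads to the same conclusion in what follows.

```python
import math


def t_statistic(mean_a, mean_b, sp2, n_a, n_b):
    """Equation 4: |difference in means| divided by its pooled standard error."""
    return abs(mean_a - mean_b) / math.sqrt(sp2 * (1 / n_a + 1 / n_b))


# Rounded summary values from the paint example.
t0_rounded = t_statistic(81.9, 80.2, 1.78, 5, 5)
print(f"t0 = {t0_rounded:.2f}")
```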

46

t-test

So what is this “t0” number?

$$t_0 = \frac{\left|X_R - X_G\right|}{\sqrt{S_p^2\left(\dfrac{1}{n_R} + \dfrac{1}{n_G}\right)}}$$

Notice the “effect” (difference between the two sample means) is in the numerator; the variation (“noise”, also called “error” or “variance”) is in the denominator.

The larger t0 is, the greater the probability that the effect (difference) is real. How large is large?

47

t-test

To determine the t-distribution value we need to know the degrees of freedom and select a confidence level

Determine the degrees of freedom in our experiment:

DOF = (nR - 1) + (nG - 1) = (5 - 1) + (5 - 1) = 8

We need to compare the calculated t0 with tabulated values from the t-distribution with the corresponding degrees of freedom (8) at some level of confidence

Confidence level is our choice, typically 95% or 99%.

48

t-distribution Table

We select 95% confidence as our criterion

For 95% confidence interval, α = 0.05

There are 8 degrees of freedom in this experiment

From t-distribution: t_{α/2, DOF} = t_{0.05/2, 8} = 2.31

t-distribution values are obtained from tables in most statistics/experimentation books.

Note: t_{α/2} means we are using a 2-sided (2-tailed) test, which is appropriate for the hypothesis μR = μG. If we were to ask the question “is μR > μG?”, then we would use the single-sided t-table value (t_{α, DOF}).

49

t-test

In our paint example t0 < t_{α/2, DOF} (2.06 < 2.31)

t0 is too small to reject the null hypothesis at 95% confidence.

Therefore, we fail to reject (loosely, “accept”) the null hypothesis (μRed = μGreen).

This does not mean we are 95% confident that the bars painted red were equal to the green. It means we cannot say with confidence that they are different. Next slide…
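The decision rule itself is a single comparison. The sketch below spells it out, mainly to make the wording of the outcome explicit; the function name `reject_null` is illustrative, and the two numbers are the ones quoted above.

```python
def reject_null(t0, t_critical):
    """Two-sided test: reject H0 only if t0 exceeds the tabulated critical value."""
    return t0 > t_critical


T_CRIT_95_DOF8 = 2.31  # t-table value quoted above (95%, two-sided, 8 DOF)
T0_PAINT = 2.06        # t0 reported for the paint example

decision = reject_null(T0_PAINT, T_CRIT_95_DOF8)
print("reject H0" if decision else "fail to reject H0")
```

The function returns only "reject" or "fail to reject"; it never returns "the means are equal", which mirrors the point made above.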

50

95% Confident?

“Cannot say they are different” is not equal to “saying they are the same”

A “well mixed” box of 100 apples: any of the apples can be either red or green.

Hypothesis: number of red = number of green

New box, pull 30 out: 1 is red, 29 are green – would you reject the null hypothesis?

Null hypothesis: 50 are red, 50 are green

Pull 30 apples out: 15 are red, 15 are green – would you reject the null hypothesis?

Pull 99 apples: 49 are red, 50 are green…?

51

Exercise – t-test

Task: conduct an experiment to determine if there is a difference in the fatigue life between two brands of paperclips.

Fatigue life is defined in this experiment as number of times the clip can be bent back and forth 90 degrees. One 90 degree bend is one fatigue cycle.

If there are 10 or more people in the class, split the class in half (effectively conducting two identical experiments with about 5 data points per paperclip brand in each experiment).

Each person in the class should break one of each brand and record their results.

Use the worksheets in back of this book.

52

Pairing – a special condition t-test

Determining relative magnitudes of the “effect” and “noise” is foundational for statistical analysis of experimental data.

“Noise” comes from many sources: differing batches of specimens, differing test or measurement apparatus, operator differences, etc.

If we can “filter out” noise, we would be able to perform a more effective analysis.

53


If there is a single source of noise that we can identify and control, we may be able to “pair” the data.

For example, if we want to test the wear life of a new alloy, we may conduct an experiment to compare the life of the new alloy with a traditional alloy.

Put several of each part on buckets in the field.

Due to differing loading conditions, one would expect there to be large variability in wear from one bucket to another. Therefore, the bucket variation will introduce a large amount of noise.

54


Pairing: put one sample of each alloy on each bucket (alternating location (left-right) of the two alloys from bucket to bucket). Determine the difference in life on each bucket between the two alloys.

Since each alloy on a given bucket presumably will experience similar loading, pairing will effectively “filter out” the noise contributed by bucket-to-bucket variation.

The null hypothesis now becomes μD = 0, where μD is the difference in means between the two groups (alloys, in this example).

55


For paired t-test:

$$t_0 = \frac{\left|X_D\right|}{\sqrt{S_D^2 / n_D}}$$

where

$$S_D^2 = \frac{1}{n_D - 1}\sum_{i=1}^{n_D}\left(d_i - X_D\right)^2 \qquad \text{(Equation 5)}$$

XD is the average of the differences, nD is the number of pairs of data, and di is an individual difference.

56

Exercise - Pairing

What about our paperclip experiment?

A potentially large source of error was operator-to-operator variability.

Each operator tested one of each paperclip; therefore, we can analyze using pairing! How lucky!

Using the worksheet in the back of this booklet, re-analyze the paperclip fatigue data using pairing (Exercise 3).

A bit more on pairing…

Situation: Your company produces optical supplies. The quality of optical mirrors is not satisfactory. You believe that the problem has to do with grinding speed.

Given:

*Your company has 12 grinding machines to produce optical mirrors.

*The machines are numbered 1-12, but are randomly placed throughout the shop.

*You are allowed to use a total of 24 mirrors in your experiment.

Task: Design an experiment (i.e. fill in the table below) to determine which cutting speed is better, Fast or Slow. At a maximum, you will have 24 runs. You are not trying to evaluate grinding machine performance, so you may use any number of grinding machines (1, 2, ..., 12).

57

Run  Grinder  Speed     Run  Grinder  Speed
1    2        Slow      13   4        Slow
2    3        Slow      14   7        Fast
3    11       Fast      15   5        Slow
4    6        Slow      16   6        Fast
5    1        Slow      17   11       Slow
6    5        Fast      18   4        Fast
7    10       Slow      19   9        Fast
8    8        Fast      20   10       Fast
9    2        Fast      21   8        Slow
10   12       Slow      22   3        Fast
11   12       Fast      23   1        Fast
12   9        Slow      24   7        Slow

58

Grinder    Response for Slow   Response for Fast   Difference
1          1.22                1.96                0.74
2          1.63                1.80                0.17
3          2.42                3.01                0.59
4          3.12                3.05                -0.07
5          0.76                1.23                0.47
6          4.23                4.89                0.66
7          1.58                1.30                -0.28
8          2.81                3.17                0.36
9          2.19                2.94                0.75
10         3.75                3.90                0.15
11         1.66                2.28                0.62
12         3.80                4.40                0.60

AVERAGE    XL=2.431            XH=2.828            XD=0.3967
Deviation  sL=1.118            sH=1.171            sD=0.335
Samples    nL=12               nH=12               nD=12
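The effect of pairing can be seen by analyzing this table both ways. The sketch below applies the paired form (Equation 5) to the differences and, for contrast, the pooled two-sample form (Equations 3 and 4) to the same data. The two critical values are not given in these notes; they are the usual two-sided 95% table entries for 11 and 22 degrees of freedom and should be checked against whichever t-table is being used.

```python
import math
import statistics

# Mirror quality per grinder (grinders 1..12), from the table above.
slow = [1.22, 1.63, 2.42, 3.12, 0.76, 4.23, 1.58, 2.81, 2.19, 3.75, 1.66, 3.80]
fast = [1.96, 1.80, 3.01, 3.05, 1.23, 4.89, 1.30, 3.17, 2.94, 3.90, 2.28, 4.40]

# Paired analysis (Equation 5): work with the per-grinder differences.
diffs = [f - s for f, s in zip(fast, slow)]
n_d = len(diffs)
x_d = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_paired = abs(x_d) / math.sqrt(s_d ** 2 / n_d)

# Unpaired analysis (Equations 3 and 4): ignores which grinder made which mirror.
n = len(slow)
sp2 = ((n - 1) * statistics.variance(slow) + (n - 1) * statistics.variance(fast)) / (2 * (n - 1))
t_unpaired = abs(statistics.mean(fast) - statistics.mean(slow)) / math.sqrt(sp2 * (1 / n + 1 / n))

# Assumed two-sided 95% critical values from a standard t-table.
T_CRIT_PAIRED = 2.201    # DOF = n_d - 1 = 11
T_CRIT_UNPAIRED = 2.074  # DOF = 12 + 12 - 2 = 22

print(f"paired:   t0 = {t_paired:.2f} (critical {T_CRIT_PAIRED})")
print(f"unpaired: t0 = {t_unpaired:.2f} (critical {T_CRIT_UNPAIRED})")
```

Under these assumptions the paired t0 is well above its critical value while the unpaired t0 is well below, because grinder-to-grinder variation dominates the pooled standard deviation but cancels out of the differences.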

59

60

[Plot: Mirror Quality (axis 0 to 5) for Slow Speed and Fast Speed]

Figure 1 - data from Fast and Slow grinding speeds.

61

[Plot: Mirror Quality Difference (Fast - Slow) for Each Machine (axis -0.4 to 0.8)]

Figure 2 - difference between Fast and Slow data on each machine.

62

[Plot: Mirror Quality (axis 0 to 5) for Slow Speed and Fast Speed, with the new data point marked in each group]

Figure 3 - data from Fast and Slow grinding speeds, with one additional data point for each.

Add 2 new data points created on one machine…do they appear to be “reasonable?”

63

[Plot: difference between Fast and Slow on each machine (axis -0.5 to 4), with the difference between the new data points marked]

Figure 4 - difference between Fast and Slow data on each machine with additional point.

But they are produced on the same machine, so does the difference appear to be “reasonable?”

64

Pairing – conclusion

Pairing is not always an option (need to be able to identify a single source of noise and then introduce one of each group to the precise same noise).

If it is an option – do it!

There is no cost other than planning for it.

It will increase the power of the conclusion.

It is not uncommon to fail to reject the null hypothesis using the t-test alone, but to reject it using pairing analysis because of its increased power.

65

Can the t-test mislead us?

Yes! We can only make statements about probability!

Also, for the t-test to be valid, the data must have normal distribution.

There are two types of errors that can be made with hypothesis tests (next slide, please)…

66

Hypothesis Errors

Type I

The probability of erroneously rejecting the null hypothesis

Also known as the level of significance (equals α)

Type II

The probability of erroneously accepting the null hypothesis

The power of the experiment increases with increased number of data points (less likely to make a type II error)

These are independent, not complementary (think about the previous “apple” example.)

67

Confidence Intervals

Rather than asking the question “are they different” we may want to ask the question “how different are they”

Confidence Intervals help us answer that question

68

Confidence Intervals

What is the likely range of differences between the means of two groups (A-B)?

The interval or range is:

$$CI = (X_A - X_B) \pm t_{\alpha/2,\,DOF}\sqrt{S_p^2\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)} \qquad \text{(Eq’n 6)}$$

where t_{α/2, DOF} is based on the level of confidence, and DOF = nA + nB - 2

69

Confidence Intervals

For our example, for 95% confidence intervals, we have:

DOF = 5 + 5 – 2 = 8 (nG = nR = 5)

t_{0.05/2, 8} = 2.31 (from t-distribution tables)

XR=81.9, XG=80.2, Sp^2 = 1.78 (from previous)

$$CI = (X_R - X_G) \pm t_{\alpha/2,\,DOF}\sqrt{S_p^2\left(\dfrac{1}{n_R} + \dfrac{1}{n_G}\right)}$$

CI = (81.9-80.2) +/- (2.31){1.78(1/5+1/5)}^(1/2) = 1.7 +/- 1.95

70

Results

• The 95% confidence interval is:

1.7 - 1.95 < μR - μG < 1.7 + 1.95

Which is: -0.25 < μR – μG < 3.65

• We are 95% confident that the true difference in means of these two groups lies somewhere within this interval (-0.25 to 3.65).

• Since the interval contains zero, we failed to reject the null hypothesis.

• Would the range increase or decrease for higher levels of confidence?
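The interval above can be reproduced from the summary statistics with a few lines. This is a sketch of Eq’n 6 using the values already stated for the paint example; the function name `confidence_interval` is illustrative.

```python
import math


def confidence_interval(mean_a, mean_b, sp2, n_a, n_b, t_crit):
    """Eq'n 6: (difference in sample means) +/- t_crit * pooled standard error."""
    diff = mean_a - mean_b
    half_width = t_crit * math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    return diff - half_width, diff + half_width


# XR = 81.9, XG = 80.2, Sp^2 = 1.78, nR = nG = 5, t(0.05/2, 8) = 2.31
low, high = confidence_interval(81.9, 80.2, 1.78, 5, 5, 2.31)
contains_zero = low < 0 < high

print(f"{low:.2f} < muR - muG < {high:.2f}")
print("interval contains zero:", contains_zero)
```

Passing a different tabulated t value for `t_crit` shows directly how the width of the interval depends on the confidence level chosen.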

71

So what?

The confidence interval was determined to be: -0.25 < μR – μG < 3.65 (95% level)

What if we consider a difference of 2ksi or greater to have engineering significance, what next?

What if we consider a difference of 5ksi or greater to have engineering significance, what next?

72

Exercise – confidence interval

Using the worksheet in the back (Exercise 4) and the data already obtained, determine the confidence interval for difference in fatigue life (number of bends until fracture) of two paperclip brands (A and B).

73

Summary for Single Variable Experiment

Randomize run order (remember, systematic errors = bad)

t-test is used to evaluate “is there an effect?”

Pairing can be more powerful; use it if possible

Confidence Interval determines likely range of the difference (μR – μG)

74

What’s next?

We’ve looked at a single variable experiment.

What about more complicated experiments, with two or more variables…

75

Multi-Variable Experiments to follow

76

Overview of Multi-Variable Experiments

“One variable at a time” approach

Interactions (what are they?)

Terminology

Balanced design (what’s this?)

Factorial Experiments

Practice (optimize a “manufacturing” process)

77

Purpose of experiments?

Remember, the purpose of experiments is to answer the question “does changing one or more factors have an effect on the response.”

Our job is to answer that question using the limited resources available as well as possible.

78

Multi-Variables Using “One Variable at a Time” Approach

“One variable at a time”

Very basic experiment

Seems intuitive and simple

Reality:

Difficult to draw meaningful conclusions

Poor use of resources

Avoid these types of experiments!

Example to follow illustrates why

79

One Variable at a Time

Example: Determine optimal conditions for the following machining process:

Factors:

Tool condition (levels: dull or sharp)

Cutting depth (levels: 0.005” or 0.010”)

Cutting speed (levels: 500rpm and 1000rpm)

Response: surface finish

80

One Variable at a Time, Design

Test conditions:

Run 1, “baseline” or “control”: sharp, 0.005”, 500rpm

Run 2, vary the tool condition: dull, 0.005”, 500rpm

Run 3, vary the depth: sharp, 0.010”, 500rpm

Run 4, vary the speed: sharp, 0.005”, 1000rpm

81

One Variable at a Time, Results

Run  Tool   Depth  Speed  Results (surface finish)
1    Sharp  Low    Slow   140rms
2    Dull   Low    Slow   190rms
3    Sharp  Deep   Slow   120rms
4    Sharp  Low    Fast   90rms

82

One Variable at a Time, Conclusion?

Run 4 produced the best result, but…

How much random error was present?

Did systematic error influence the results?

Best set of variables may be: Sharp tool? Deep cut? Fast?

Run       Result (rms)
1 (base)  140
2 (dull)  190
3 (deep)  120
4 (fast)  90

83

One Variable at a Time, Conclusion

We have only one data point for conditions of “dull”, “deep”, “fast”, but three data points for “sharp”, “low”, “slow”

There is no way to estimate errors (most conditions were tested only once)

Without estimating the errors, it is difficult to draw valid conclusions.

Did not test the “best conditions” together

This is “okay” if there are no interactions

Could conduct another experiment to validate – but wouldn’t it have been better to do a complete job the first time?

84

Interactions?

We’ve identified that there are problems with the “one variable at a time” approach.

Before considering better alternatives, we need to understand “interactions.”

85

What are Interactions?

An interaction is when changing one factor influences how a different factor will affect the response. Clear?

Example: Conduct an experiment to determine which is more effective at keeping your shirt dry in the rain: an umbrella or a raincoat.

Experiment: 2 factors

Factor 1, “weather”: rain with wind, rain with no wind

Factor 2, “tool”: umbrella, raincoat

Response: “wetness”

86

Interaction Example

No wind: raincoat and umbrella were effective.

Wind: the umbrella was not effective but coat was.

There is an “interaction” between “weather” and “tool”

Run Weather Tool Wetness

1 Wind Umbr 80%

2 Wind Coat 20%

3 No wind Umbr 30%

4 No wind Coat 20%

87

What would no interaction “look like?”

Contrast the previous with the following “no interaction” results…

88


No interaction (“tool” had an effect: you’ll get wet if you use an umbrella)

Run  Weather  Tool  Wetness
1    Wind     Umbr  90%
2    Wind     Coat  30%
3    No wind  Umbr  80%
4    No wind  Coat  20%

No interaction (“weather” had an effect: you’ll get wet if it’s windy)

Run  Weather  Tool  Wetness
1    Wind     Umbr  90%
2    Wind     Coat  80%
3    No wind  Umbr  30%
4    No wind  Coat  20%
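One way to put a number on “does the effect of one factor depend on the other” is to compare the umbrella-versus-coat difference in wind with the same difference in no wind. The sketch below does that for the three rain scenarios in these notes; the function name and the scenario labels are illustrative, and this simple difference of differences is a sketch rather than the analysis method developed later.

```python
def interaction_contrast(wind_umbr, wind_coat, calm_umbr, calm_coat):
    """(Umbrella - Coat) in wind minus (Umbrella - Coat) with no wind."""
    return (wind_umbr - wind_coat) - (calm_umbr - calm_coat)


# Wetness (%) in the order: Wind/Umbr, Wind/Coat, No wind/Umbr, No wind/Coat
scenarios = {
    "interaction example": (80, 20, 30, 20),
    "tool effect only": (90, 30, 80, 20),
    "weather effect only": (90, 80, 30, 20),
}

contrasts = {name: interaction_contrast(*wetness) for name, wetness in scenarios.items()}
for name, value in contrasts.items():
    print(f"{name}: {value}")
```

A contrast of zero corresponds to parallel lines on an interaction plot; a non-zero contrast corresponds to lines that are not parallel.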

89

Interactions are easier to see by plotting results. Our three different scenarios:

[Three interaction plots: response (wetness) versus weather (No Wind, Wind), with one line for Coat and one line for Umbrella]

Plot 1: Interaction (responses not parallel)

Plot 2: No interaction (responses parallel)

Plot 3: No interaction (responses parallel)

90

Alternative Interaction Plot

Run  Weather  Tool  Interaction  Wetness
1    Wind     Umbr  +            90%
2    Wind     Coat  -            80%
3    No wind  Umbr  -            30%
4    No wind  Coat  +            20%

[Plots: response (wetness) versus weather (No Wind, Wind) for Coat and Umbrella; response (wetness) versus interaction level (-, +)]

Plot all 4 interaction values and fit a trend line. If the trend line is flat, there is no interaction. See “exercise” in back for more complete discussion.

91

Exercise - interactions

Determine if interactions exist in the results shown in the exercise in the back of the booklet (Exercise 5).

92

End of Story for “One Variable at a Time”

The “machining” example above did not evaluate interactions…

…We cannot determine what the best set of conditions is.

And not only are we not confident in the results, we have no idea of how “not confident” we are!

93

What’s next?

We’re almost ready for some really fun stuff…

…but first, some terminology…

…and then, explain what “balanced” designs mean.

Then we can have fun with designing experiments “the right way!”

94

Terminology

Repetition - measuring the same response more than once (or taking another data point) without resetting up the experimental conditions. Decreases measurement errors to a limited degree.

Replication - requires completely redoing the experimental conditions. In other words, setting up the conditions as identically as possible to produce another measurement. Very important to estimate the experimental error. It shows the effects of set-up, and other unknown extraneous variables.

Replication is NOT the same as repetition, although they sound similar.

95

Terminology, cont.

Run - a set of experimental test conditions. All factors are set to specific levels. If I want to measure the boiling point at three pressure levels, I need at least three runs - one with the pressure at each of the 3 levels.

Treatment (design point) - a set of experimental conditions. One treatment is conducted each run, but treatments may be replicated in an experiment (may occur more than once).

96

Repetition or Replication?

Consider the experiment shown below.

How would this experiment be conducted differently if it were to have 2 replicates compared to 2 repetitions?

Run  Tool Sharpness
1    Sharp
2    Sharp
3    Dull
4    Dull

97

Designed Experiments (Design of Experiments, DOE’s)

Statistically based methodology of conducting and analyzing experiments

Interactions can be evaluated

Systematic error can be mitigated by randomization

Random error (noise) is mitigated by "balanced" designs since each variable is tested at different levels multiple times.

Let’s explain “balanced” design…

98

Balanced Design - What it really means

Each factor is tested an equal number of times at each level

For each factor setting, all of the other factors are set to each of their levels an equal number of times.

The variation of all the other factors does not bias the results.

Balanced designs do not necessarily test all possible conditions.

Need an example to understand “balanced”…

99

Prior Machining Example

The “One Variable at a Time” example was not a balanced design.

One level of each variable was tested 3 times, the other level was tested only once.

Run  Tool   Depth  Speed
1    Sharp  Low    Slow
2    Dull   Low    Slow
3    Sharp  Deep   Slow
4    Sharp  Low    Fast

100

Example of Balanced Designs

Consider the “machining experiment”:

3 factors, 2 levels each

Tool: sharp, dull

Depth: deep, low

Speed: fast, slow

To run every possible combination would require 2^f runs, where f is the number of factors (f = 3, 2^3 = 8).

…but we don’t need all 8 conditions for a balanced design…

Balanced means…well, we need an example…

101

Example of Balanced Designs

This is a balanced design:

Run  Tool   Depth  Speed
1    Dull   Low    Slow
2    Dull   Deep   Fast
3    Sharp  Deep   Slow
4    Sharp  Low    Fast

For each level of one factor, the other factors are tested an equal number of times at each level.

Ex: For dull tool, depth is low once and deep once, speed is slow once and fast once, et cetera.

102

Contrast with not balanced


Run  Tool   Depth  Speed
1    Dull   Deep   Slow
2    Dull   Deep   Fast
3    Sharp  Low    Slow
4    Sharp  Low    Fast

NOT BALANCED! All levels are tested the same number of times as in the previous example (twice), but…if tool is dull, then depth is always deep

103


Balanced:

Run  Tool   Depth  Speed
1    Dull   Low    Slow
2    Dull   Deep   Fast
3    Sharp  Deep   Slow
4    Sharp  Low    Fast

Not balanced:

Run  Tool   Depth  Speed
1    Dull   Deep   Slow
2    Dull   Deep   Fast
3    Sharp  Low    Slow
4    Sharp  Low    Fast
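The definition of a balanced design can be turned into a mechanical check: for every pair of factors, every combination of their levels should appear the same number of times. The sketch below applies that check to the two designs above and to the earlier “one variable at a time” design. The function name `is_balanced` is illustrative, and the check assumes a design with at least two factors.

```python
from collections import Counter
from itertools import combinations, product


def is_balanced(design):
    """True if, for every pair of factors, each level combination occurs equally often.

    `design` is a list of runs; each run is a tuple with one level per factor.
    Assumes at least two factors.
    """
    columns = list(zip(*design))
    for a, b in combinations(range(len(columns)), 2):
        levels_a, levels_b = set(columns[a]), set(columns[b])
        counts = Counter(zip(columns[a], columns[b]))
        expected = len(design) / (len(levels_a) * len(levels_b))
        if any(counts[pair] != expected for pair in product(levels_a, levels_b)):
            return False
    return True


balanced = [("Dull", "Low", "Slow"), ("Dull", "Deep", "Fast"),
            ("Sharp", "Deep", "Slow"), ("Sharp", "Low", "Fast")]
not_balanced = [("Dull", "Deep", "Slow"), ("Dull", "Deep", "Fast"),
                ("Sharp", "Low", "Slow"), ("Sharp", "Low", "Fast")]
one_at_a_time = [("Sharp", "Low", "Slow"), ("Dull", "Low", "Slow"),
                 ("Sharp", "Deep", "Slow"), ("Sharp", "Low", "Fast")]

for name, design in [("balanced", balanced), ("not balanced", not_balanced),
                     ("one variable at a time", one_at_a_time)]:
    print(name, "->", is_balanced(design))
```

Equal counts for every pair of levels also imply that each factor is tested equally often at each of its own levels, so the first condition of the definition does not need a separate check.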

104

Exercise – balanced experiments

Complete Exercise 6 in the back of this booklet:

Create a balanced experiment with two factors at two levels each.

Assume 8 runs (2^2 = 4 conditions)

Do not randomize (for this practice)

Factor A: levels: + and –

Factor B: levels: + and –

Notice “+” and “-” are often used in DOE’s to signify a “high” and “low” level. These are called “coded” levels

105

Review

We’ve studied Single Variable experiments (t-test, pairing, Confidence Intervals)

We have an understanding of interactions

We have an understanding of “balanced” design

We are ready to study experiments with multiple factors (factorial experiments)

106

Factorial Experiments – “The Right Way”

We will consider only full factorial experiments (experiments where all possible combinations are tested).

We will limit our discussion to experiments with two levels per factor.

Non-linear results will not be detected

The total number of possible combinations for experiments with multiple factors, all with two levels, is 2^f, where f is the total number of factors (test variables).

107

2-Level Factorial Design Matrix

Design Point  Factor 1  Factor 2  Factor 3  Factor 4
1             +         +         +         +
2             -         +         +         +
3             +         -         +         +
4             -         -         +         +
5             +         +         -         +
6             -         +         -         +
7             +         -         -         +
8             -         -         -         +
9             +         +         +         -
10            -         +         +         -
11            +         -         +         -
12            -         -         +         -
13            +         +         -         -
14            -         +         -         -
15            +         -         -         -
16            -         -         -         -

(Labels on the original slide mark the 2^1, 2^2, 2^3 and 2^4 designs within this matrix.)
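A matrix in this order can be generated rather than typed. The sketch below lists every combination of coded levels with Factor 1 alternating fastest and “+” listed first, which matches the order of the design points above. The function name `full_factorial` and the use of +1/-1 for the coded levels are illustrative choices.

```python
from itertools import product


def full_factorial(n_factors):
    """All 2**n_factors combinations of coded levels; Factor 1 varies fastest."""
    # product() varies its last position fastest, so reverse each combination.
    return [tuple(reversed(combo)) for combo in product((+1, -1), repeat=n_factors)]


design = full_factorial(4)
for point, levels in enumerate(design, start=1):
    print(point, " ".join("+" if level > 0 else "-" for level in levels))
```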

108

Effects?

Remember, experiments answer the question “is there an effect caused by changing factor levels?”

t-test answers this by comparing the difference (effect) to the error (noise):

$$t_0 = \frac{\text{“effect”}}{\text{“noise”}}$$

We can do something similar with multi-variable experiments. Example follows…

109

Example, 2 Factors

We will use an example to develop our understanding of design and analysis

Design an experiment with:
Factor 1: Paint color; Levels: Red, Green
Factor 2: Operator; Levels: Chris, Terry
Response: Yield strength

Use:
Full factorial (all combinations tested)
3 replicates (each condition tested 3 times)

110

Design

To estimate error we need at least 2 replicates (each condition is tested twice)

More replicates = better estimate

We decide to have 3 replicates (each condition (design point) tested 3 times)

We need a balanced design

111

Design Matrix

Design Point   Factor 1 (coded)   Factor 2 (coded)   Factor 1 (non-coded)   Factor 2 (non-coded)
1              +                  +                  Red                    Chris
2              -                  +                  Green                  Chris
3              +                  -                  Red                    Terry
4              -                  -                  Green                  Terry

Factor 1, color: (+) = Red; (-) = Green
Factor 2, operator: (+) = Chris; (-) = Terry

112

Interactions

With this DOE we will be able to analyze the effects of interactions.

Interactions are treated as independent factors in the analysis

Two-way interactions (review): the effect of one factor depends upon the level of another.

Ex: with an umbrella you will stay dry only if there is no wind, but with a raincoat you will stay dry regardless of wind.

113

Design Matrix

Design Point   1   2   1X2
1              +   +   +
2              -   +   -
3              +   -   -
4              -   -   +

The level of the interaction between Factors 1 and 2 (1X2) is the “product” of the coded levels of Factors 1 and 2, i.e. (+)*(+) = (+); (+)*(-) = (-); (-)*(-) = (+).
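The "product" rule is easy to see when the coded levels are written as +1 and -1. This short sketch (variable names are illustrative) builds the 1X2 column from the two factor columns of the design matrix.

```python
# Coded levels for Factors 1 and 2 at design points 1..4 (from the design matrix).
factor_1 = [+1, -1, +1, -1]
factor_2 = [+1, +1, -1, -1]

# The interaction column is the element-wise product of the two factor columns.
interaction = [a * b for a, b in zip(factor_1, factor_2)]
print(interaction)  # [1, -1, -1, 1]
```

The result matches the 1X2 column: (+), (-), (-), (+).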

114

The randomized run sheet is on next slide

Includes 3 replicates (each design point, or set of conditions, is tested 3 times)

115

Randomized Run Order

Run   Design Point     Run   Design Point
1     4                7     4
2     2                8     2
3     3                9     3
4     1                10    2
5     1                11    4
6     3                12    1

The design point defines the test conditions for the run (see previous slides)
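A run sheet of this kind can be produced by listing each design point once per replicate and shuffling the list. This is a minimal sketch; the function name and the seed value are hypothetical, and the order it produces will differ from the run order shown on the slide.

```python
import random

def randomized_run_order(n_design_points, n_replicates, seed=None):
    """Return a shuffled list in which each design point appears n_replicates times."""
    rng = random.Random(seed)  # a fixed seed makes the run sheet reproducible
    runs = [dp for dp in range(1, n_design_points + 1) for _ in range(n_replicates)]
    rng.shuffle(runs)
    return runs

run_order = randomized_run_order(4, 3, seed=2008)
for run, design_point in enumerate(run_order, start=1):
    print(run, design_point)
```

Shuffling the complete list keeps the design balanced, because every design point still appears exactly three times.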

116

DOE Results

The experiment was conducted following the prescribed randomized run order.

The next slide shows the re-organized results and calculates the means

We will plot the results

Then we will step through the analysis…

117

Results

Columns: main factors (1, 2) and interaction (1X2); response for the 3 replicates (xi1, xi2, xi3); average of the 3 replicates (Xi).

Dsgn Pt   1   2   1X2   xi1   xi2   xi3   Xi (ave)
1         +   +   +     82    84    83    83.0
2         -   +   -     81    85    84    83.3
3         +   -   -     89    90    88    89.0
4         -   -   +     88    91    92    90.3

118

We are concerned with the averages, not the individual data points (they vary due to noise)

Let’s plot the data…graphs are a good way to visualize results…

119

Plot Results of Factor 1

[Figure: average response (vertical axis, 80 to 90) plotted against the level of Factor 1 (Color), “-” and “+”.]

Plot the response values (averages) against the factor level (“-”, “+”).

The graph shows that the average response when Factor 1 was “-” compared to “+” is not much different. Changing Factor 1 had little effect.

Dsgn Pt   1   Xi (ave)
1         +   83.0
2         -   83.3
3         +   89.0
4         -   90.3

120

Results for Factor 2

The slope of the trend line between the (-) and (+) levels shows that Factor 2 had a large effect on the response.

[Figure: average response (vertical axis, 80 to 90) plotted against the level of Factor 2 (Operator), “-” and “+”.]

Dsgn Pt   2   Xi (ave)
1         +   83.0
2         +   83.3
3         -   89.0
4         -   90.3

121

Results for Interaction (1X2)

[Figure: average response (vertical axis, 80 to 90) plotted against the level of Factor 1X2 (interaction between Factors 1 and 2), “-” and “+”.]

Again, a nearly level trend line indicates little effect due to this factor (1X2). In other words, there is little interaction between Factors 1 and 2.

Dsgn Pt   1X2   Xi (ave)
1         +     83.0
2         -     83.3
3         -     89.0
4         +     90.3

122

More Rigor (Statistics!)

The graphs are useful in terms of giving us a qualitative sense of effects.

But as we’ve seen, “averages” are not sufficient.

We need a method to quantify our confidence in the effect.

Where are we going? t-test is where!

123

Remember the “t-test”?

The t-test is used to answer the critical question: is there a statistically significant effect or is the change caused by random noise?

In order to answer that question in any experiment, we must compare the “effect” with the “noise.”

We must determine both “effect” and “noise”

We’ll start with “noise” (error).

124

Nomenclature

Let $x_{ij}$ be the response of the jth replicate of treatment “i”

Let $X_i$ be the average of all responses within treatment “i”

Let $X_T$ be the average of all responses, total

Let k be the total number of test conditions (design points)

“i” goes from 1 to k.

Let $n_i$ be the number of replicates for treatment “i”.

Let N be the total number of tests

125

Sum of Squares (SS)

SStotal = SSwithin + SSbetween

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-X_T)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-X_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_i-X_T)^2 \qquad \text{(Equation 7)}$$

SSwithin is due to random noise.

SSbetween is variation attributed to changing the factor levels

If SSbetween is large compared to SSwithin then the treatment had an effect

“Sum of Squares” is a measure of variance
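The identity SStotal = SSwithin + SSbetween can be checked numerically with the replicate data from the Results slide. This is a sketch for illustration only; the variable names are not from the original notes.

```python
# Responses for the 3 replicates at each design point (from the Results slide).
data = {
    1: [82, 84, 83],
    2: [81, 85, 84],
    3: [89, 90, 88],
    4: [88, 91, 92],
}

all_values = [x for reps in data.values() for x in reps]
grand_mean = sum(all_values) / len(all_values)                  # X_T
means = {i: sum(reps) / len(reps) for i, reps in data.items()}  # X_i

# The three terms of Equation 7.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
ss_within = sum((x - means[i]) ** 2 for i, reps in data.items() for x in reps)
ss_between = sum(len(reps) * (means[i] - grand_mean) ** 2 for i, reps in data.items())

print(round(ss_total, 1), round(ss_within, 1), round(ss_between, 1))
print(abs(ss_total - (ss_within + ss_between)) < 1e-9)  # True
```

Here SSbetween is several times larger than SSwithin, which hints that at least one factor had an effect; the slides that follow quantify this.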

126

Example showing how to determine SSwithin for design point 1 (SSwithin-1):

$$SS_{within,\,i\text{th}} = \sum_{j=1}^{n_i}(x_{ij}-X_i)^2$$

SSwithin-1 = $(82-83)^2 + (84-83)^2 + (83-83)^2 = 2.0$

Calculate sum of squares within each treatment (design point) and include in the table.

This is the first step to determine the “noise.”

Dsgn Pt   1   2   1X2   xi1   xi2   xi3   Xi (ave)   SS
1         +   +   +     82    84    83    83.0       2.0
2         -   +   -     81    85    84    83.3       8.7
3         +   -   -     89    90    88    89.0       2.0
4         -   -   +     88    91    92    90.3       8.7

127

Sum of Squares

For the ith design point:

$$SS_{within,\,i\text{th}} = \sum_{j=1}^{n_i}(x_{ij}-X_i)^2$$

“i” goes from 1 to 4 (design points) and $n_i$ = 3 (number of replicates for the ith design point)

Over all design points, with k = 4 (design pts):

$$SS_{within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-X_i)^2$$

Determine sum of squares for each design point (following example on previous slide)

Dsgn Pt   mean   1   2   1X2   Xi (ave)   SS
1         +      +   +   +     83.0       2.0
2         +      -   +   -     83.3       8.7
3         +      +   -   -     89.0       2.0
4         +      -   -   +     90.3       8.7
                               Total:     21.3

128

Experimental “noise”

SSwithin (calculated above) is related to the experimental “noise”, but it is not what we use in the t-test.

What do we use? Next slide please…

129

“Noise” (error) for the experiment

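The mean square error is given as:

$$\text{mse}^2 = \frac{SS_{within}}{N - 2^f}$$

N = total number of data points (12), f = number of factors (2); SSwithin = 21.3:

$$\text{mse}^2 = \frac{21.3}{12 - 2^2} = 2.7$$

The “standard error” is:

$$\text{standard error} = \sqrt{\frac{4\,\text{mse}^2}{N}}$$

For our example:

$$\text{standard error} = \sqrt{\frac{4 \times 2.7}{12}} = 0.9$$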
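The noise calculation can be sketched end to end as follows, starting from the raw replicate data. The helper name `ss_within_point` and the variable names are illustrative; the formulas are the ones given on this slide.

```python
import math

# Responses for the 3 replicates at each design point (from the Results slide).
data = {
    1: [82, 84, 83],
    2: [81, 85, 84],
    3: [89, 90, 88],
    4: [88, 91, 92],
}
f = 2                                            # number of factors
N = sum(len(reps) for reps in data.values())     # total number of data points

def ss_within_point(reps):
    """Sum of squared deviations of the replicates from their own average."""
    mean = sum(reps) / len(reps)
    return sum((x - mean) ** 2 for x in reps)

ss_by_point = {i: ss_within_point(reps) for i, reps in data.items()}
ss_within = sum(ss_by_point.values())

mse_sq = ss_within / (N - 2 ** f)        # mean square error, mse^2
std_error = math.sqrt(4 * mse_sq / N)    # the "noise" used in the t-test

print({i: round(v, 1) for i, v in ss_by_point.items()})  # {1: 2.0, 2: 8.7, 3: 2.0, 4: 8.7}
print(round(ss_within, 1), round(mse_sq, 1), round(std_error, 1))  # 21.3 2.7 0.9
```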

130

“Noise” (error) for the experiment

The standard error (just calculated) is the same for all factors in the experiment: it is the “experimental noise.”

Remember, t0 is the ratio between “effect” and “noise”

t0 = effect / standard error

131

Effect

We’ve determined the standard error

But what was the effect of various factor levels?

Determine the average response for each factor at each level:

Determine the average response when factor 1 was (+) and also when it was (-), then do this for factor 2, etc.

This procedure requires a balanced design

132

Determine the Effect, Step 1

Factor 1 was (+) for design points 1 and 3: 83.0 + 89.0 = 172.0
Factor 1 was (-) for design points 2 and 4: 83.3 + 90.3 = 173.6

Notice the slight rounding difference between the table (173.7) and the hand calculation (173.6).

Dsgn Pt   1       2       1X2     Xi (ave)   SS
1         +       +       +       83.0       2.0
2         -       +       -       83.3       8.7
3         +       -       -       89.0       2.0
4         -       -       +       90.3       8.7
                                  Total:     21.3
sum (+)   172.0   166.3   173.3
sum (-)   173.7   179.3   172.3

133

Determining effect: Step 2 and Step 3

The “effect” is the difference in averaged responses for (+) and (-) levels. For Factor 1:

Effect = ABS{sum(+) – sum(-)} / n+ = ABS{172.0 – 173.6} / 2 = 0.8
• n+ = number of (+) data points (2 in this example)
• Remember, average does not tell the whole story!
• t-test to the rescue!

Dsgn Pt      1       2       1X2     Xi (ave)   SS
1            +       +       +       83.0       2.0
2            -       +       -       83.3       8.7
3            +       -       -       89.0       2.0
4            -       -       +       90.3       8.7
                                     Total:     21.3
sum (+)      172.0   166.3   173.3
sum (-)      173.7   179.3   172.3
difference   -1.7    -13.0   1.0
Effect       0.8     6.5     0.5
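The effect rows of the table can be reproduced from the replicate data and the coded columns. This is a sketch with illustrative names (`columns`, `effect`); it follows the Effect = ABS{sum(+) – sum(-)}/n+ rule stated above.

```python
# Responses for the 3 replicates at each design point (from the Results slide).
data = {
    1: [82, 84, 83],
    2: [81, 85, 84],
    3: [89, 90, 88],
    4: [88, 91, 92],
}
means = {i: sum(reps) / len(reps) for i, reps in data.items()}  # Xi (ave)

# Coded level of each factor (and of the interaction) at each design point.
columns = {
    "1":   {1: +1, 2: -1, 3: +1, 4: -1},
    "2":   {1: +1, 2: +1, 3: -1, 4: -1},
    "1X2": {1: +1, 2: -1, 3: -1, 4: +1},
}

def effect(levels, means):
    """ABS{sum(+) - sum(-)} / n+, using the design-point averages."""
    plus = [means[i] for i, sign in levels.items() if sign > 0]
    minus = [means[i] for i, sign in levels.items() if sign < 0]
    return abs(sum(plus) - sum(minus)) / len(plus)

effects = {name: effect(levels, means) for name, levels in columns.items()}
print({name: round(e, 1) for name, e in effects.items()})  # {'1': 0.8, '2': 6.5, '1X2': 0.5}
```

Because the code works with unrounded averages, it avoids the small rounding differences seen in the hand calculation.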

134

t-test Review

t0 is the ratio of the “effect” to the “noise”. The larger t0 is, the greater the probability that the factor had a real effect.

In the previous table we calculated the effect of all three factors (1, 2, 1X2). The “effect” is the difference in averaged responses for (+) and (-) levels.

The noise has already been determined for our example (“standard error”).

135

t-test

t0 is equal to Effect / Standard error:

$$t_0 = \frac{\left|\text{sum}(+) - \text{sum}(-)\right| / n_+}{\sqrt{4\,\text{mse}^2 / N}}$$

Calculate t0 for each factor (including interactions)

136

Example calculations…

$$t_0 = \frac{\left|\text{sum}(+) - \text{sum}(-)\right| / n_+}{\sqrt{4\,\text{mse}^2 / N}}$$

For this experiment we’ve calculated the standard error, $\sqrt{4\,\text{mse}^2/N}$, to be 0.9.

For Factor 1, the effect is ABS{sum(+) - sum(-)} / n+ = ABS{172.0 - 173.6} / 2 = 0.8

Also for Factor 1, t0 = 0.8/0.9 = 0.9

Determine t0 for all factors and interactions
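The t0 values for all three columns can be computed in one pass. The sketch below recomputes the effect and the standard error from the raw data so that no intermediate rounding creeps in; variable names are illustrative.

```python
import math

# Responses for the 3 replicates at each design point (from the Results slide).
data = {
    1: [82, 84, 83],
    2: [81, 85, 84],
    3: [89, 90, 88],
    4: [88, 91, 92],
}
# Coded levels at design points 1..4 for each factor and the interaction.
signs = {
    "1":   [+1, -1, +1, -1],
    "2":   [+1, +1, -1, -1],
    "1X2": [+1, -1, -1, +1],
}
f = 2  # number of factors

points = sorted(data)
means = [sum(data[i]) / len(data[i]) for i in points]
N = sum(len(data[i]) for i in points)
ss_within = sum((x - m) ** 2 for i, m in zip(points, means) for x in data[i])
std_error = math.sqrt(4 * (ss_within / (N - 2 ** f)) / N)

t0 = {}
for name, column in signs.items():
    n_plus = column.count(+1)
    difference = sum(s * m for s, m in zip(column, means))  # sum(+) - sum(-)
    t0[name] = (abs(difference) / n_plus) / std_error

print({name: round(value, 1) for name, value in t0.items()})  # {'1': 0.9, '2': 6.9, '1X2': 0.5}
```

Rounding matters here: dividing the rounded effect 6.5 by the rounded standard error 0.9 would give 7.2, whereas the unrounded calculation gives 6.9 for Factor 2.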

137

t-distribution

We also need to determine the value from the t-distribution table:

Degrees of freedom = N – $2^f$

N = total number of observations (12)

f = number of factors (2)

DOF = 12 – $2^2$ = 8

For 95% confidence, from a t-distribution table: $t_{\alpha/2,\,DOF} = t_{0.05/2,\,8} = 2.31$

This is the same for all factors. Enter in the table…

138

Results

$$t_0 = \frac{\text{Effect}}{\text{Std Error}}$$

Dsgn Pt      1       2       1X2     Xi (ave)   SS
1            +       +       +       83.0       2.0
2            -       +       -       83.3       8.7
3            +       -       -       89.0       2.0
4            -       -       +       90.3       8.7
                                     Total:     21.3
sum (+)      172.0   166.3   173.3
sum (-)      173.7   179.3   172.3
difference   -1.7    -13.0   1.0
Effect       0.8     6.5     0.5
Std Error    0.9     0.9     0.9
t0           0.9     6.9     0.5
t0.05/2, 8   2.31    2.31    2.31
Effect?      no      yes     no

139

Experiment Conclusion

The key points from the table:

                                 Factor 1   Factor 2   Factor 1X2
t0                               0.9        6.9        0.5
t0.05/2, 8                       2.31       2.31       2.31
Effect? (is t0 > t0.05/2, 8?)    no         yes        no

Only Factor 2 had a statistically significant effect.
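The decision rule in the table is a simple comparison of each t0 with the tabulated critical value. In this sketch the critical value 2.31 is hard-coded from the t-distribution table, as on the slide, rather than computed; the names `T_CRITICAL` and `significant` are illustrative.

```python
# Critical value t(0.05/2, 8) read from a t-distribution table (95% confidence, 8 DOF).
T_CRITICAL = 2.31

# t0 values taken from the results table.
t0 = {"Factor 1": 0.9, "Factor 2": 6.9, "Factor 1X2": 0.5}

# A factor has a statistically significant effect when t0 exceeds the critical value.
significant = {name: value > T_CRITICAL for name, value in t0.items()}

for name, flag in significant.items():
    print(name, "yes" if flag else "no")
```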

140

Factorial Experiment, Conclusion

The above example shows the basics of analyzing a factorial experiment.

DOE software will perform the analysis for you.

Usually, an F-test is performed rather than a t-test, but the concept is the same (they are equivalent: for a two-level factor, $F = t_0^2$).

We will do an experiment for practice. First, let’s talk about other aspects of an experiment…

141

Planning an Experiment

Okay, we now have some idea about DOE

“Design” is only a small part of the picture

To conduct an experiment properly, much more is required.

This typically includes most of the following…

142

Experiment Process

Define the problem (write down a problem statement), define the objective (purpose).
Determine available resources
Determine factors, levels, and response(s)
Create the design (number of runs, run order, etc.)
Obtain resources: $$$, measurement and test equipment, test specimens (have spares), personnel, etc.
Create a plan: determine schedule for personnel, equipment, etc.
Create a run-sheet.
Save all used specimens, identify them. You may need to take another closer look at them later.

143

Practice Experiment

Problem: we like “good” popcorn – and we currently can’t make good popcorn.

Create an experiment to help solve this problem.

As a class, complete the next slide

144

Practice

As a class:
Write a problem statement
Write a clear objective for the experiment
Determine Factors – only two – think carefully about what you select.
Determine factor levels (do not worry that some combination of factor levels will produce bad popcorn – this is to be expected)
Determine response (may be more than 1).

Next slide please…

145

Practice

Factors you may have considered:
Time
Power setting
Placement of bag within the microwave
Orientation of the bag
Brands of popcorn
Different microwaves

Are “time” and “power setting” independent? Could they be combined and called “energy input”?

Resources are limited – we want the most useful information possible.

146

Exercise – full factorial experiment

Break into smaller groups (about 5 per group) and design and conduct the experiment and analyze the results. Use the worksheet in the back of this booklet (Exercise 7).

After completion, discuss as a class.

147

One last thing

“Outliers” may be an issue in an experiment. Unfortunately, if only 1 or 2 data points are observed for a given set of experimental conditions, it is not possible to determine if an outlier exists. Even more detrimental, with few data points a single outlier can dramatically change the sample mean!

What to do? Always do a “reality check” – do the results seem reasonable? If not, it may be due to an outlier – OR it may not be an error in any form (your judgment may be off – no shame in being surprised by results).

Be careful about dismissing what you think is an outlier – it may not be!

148

Limitations to what we’ve done

We considered only experiments with:
Assumed normal distributions
All factors at 2 levels each
These factors can be discrete or continuous
Response must be continuous, not discrete
At least 2 replicates
We did not look at “censored” data (such as fatigue data that is terminated after so many cycles even if there was no failure)
All Design Points were replicated an equal number of times
Full factorial (all possible combinations were tested)

Next slide please…

149

Advanced Stuff – But Not Here, Not Now

There are more advanced concepts (and surprisingly, these are not necessarily much more complicated to design or analyze.)

150

Life beyond this course

Experiments do not actually require having a second replicate to estimate errors (talk to a statistician)

Fractionated experiments – experiments in which not all possible combinations are tested.

These are very beneficial if there is a large number of factors ($2^f$ gets big fast!). The “cost” is lost knowledge regarding interactions.

Experiments can model non-linearity if more than 2 levels per factor are included

151

CONCLUSIONS

Experiments require planning!
Randomize to mitigate systematic errors
No pain, no gain
Select factors and their levels carefully
May want to “try out” levels (pre-experiment) before beginning a DOE
t-test helps answer “is there an effect”
Pairing is a good thing – if possible
Full factorial designs are effective and efficient for multi-variable experiments

152

Happy Experimenting!
