Analysis of variance of multiple factors (1)


Upload: lblackdragonl

Post on 15-Nov-2015



Analysis of variance for multiple factors.

Analysis of variance of 2 fixed factors.

The general factorial design with two factors.

The simplest factorial designs involve only two factors, with $a$ levels of factor A and $b$ levels of factor B, and $n_{ij} \geq 2$ replicates (experimental runs) for each combination of levels of the two factors. The design matrix is as follows:

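| Factor A \ Factor B | 1 | 2 | … | b |
|---|---|---|---|---|
| 1 | $y_{111}, y_{112}, \dots, y_{11n}$ | $y_{121}, y_{122}, \dots, y_{12n}$ | … | $y_{1b1}, y_{1b2}, \dots, y_{1bn}$ |
| 2 | $y_{211}, y_{212}, \dots, y_{21n}$ | $y_{221}, y_{222}, \dots, y_{22n}$ | … | $y_{2b1}, y_{2b2}, \dots, y_{2bn}$ |
| ⋮ | ⋮ | ⋮ | | ⋮ |
| a | $y_{a11}, y_{a12}, \dots, y_{a1n}$ | $y_{a21}, y_{a22}, \dots, y_{a2n}$ | … | $y_{ab1}, y_{ab2}, \dots, y_{abn}$ |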

In this case the observations are regarded as values of a variable $Y$ that can be expressed by the model:

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad i = 1, 2, \dots, a; \quad j = 1, 2, \dots, b; \quad k = 1, 2, \dots, n$$

$\mu$: the overall mean.

$\alpha_i$: the effect of the $i$th level of factor A with respect to the overall mean.

$\beta_j$: the effect of the $j$th level of factor B with respect to the overall mean.

$(\alpha\beta)_{ij}$: the effect of the interaction between the $i$th level of factor A and the $j$th level of factor B with respect to the overall mean.

$\varepsilon_{ijk}$: the random error or noise component inherent to the process, present in each observation.

Estimation of the model parameters

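We can use the model to predict any observation:

$$\hat{y}_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$$

The prediction has no noise component, because the natural variation of the process is expected to follow a random pattern and therefore cannot be predicted. Comparing the actual value with the value predicted by the model, the noise component is

$$\varepsilon_{ijk} = y_{ijk} - \hat{y}_{ijk} = y_{ijk} - \left(\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}\right)$$

The sum of squared errors is then

$$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \varepsilon_{ijk}^2$$

Using the method of least squares to find the parameter values that minimize this sum of squared errors (with the effects constrained to sum to zero), the estimates of the model parameters are:

$$\hat{\mu} = \bar{y}_{...}, \qquad \hat{\alpha}_i = \bar{y}_{i..} - \bar{y}_{...}, \qquad \hat{\beta}_j = \bar{y}_{.j.} - \bar{y}_{...}, \qquad \widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...}$$

With these estimates, the value of any observation is estimated as:

$$\hat{y}_{ijk} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij.}$$

Whence it follows that

$$\hat{\varepsilon}_{ijk} = y_{ijk} - \bar{y}_{ij.}$$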

The natural variability of the process occurs internally within each combination of levels of the 2 factors.
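The estimation step can be sketched in a few lines of Python. This is a minimal illustration, not part of the original handout: it assumes a balanced design, the usual least-squares estimates (overall mean, row and column mean deviations, and cell mean minus row mean minus column mean plus overall mean), and it uses the yield data of exercise 1 with temperature as factor A and pressure as factor B. The names `mean` and `estimate_effects` are hypothetical helpers.

```python
# Yield data from exercise 1: rows = temperature (150, 160, 170),
# columns = pressure (200, 215, 230), n = 2 replicates per cell.
y = [
    [[90.4, 90.2], [90.7, 90.6], [90.2, 90.4]],
    [[90.1, 90.3], [90.5, 90.6], [89.9, 90.1]],
    [[90.5, 90.7], [90.8, 90.9], [90.4, 90.1]],
]


def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def estimate_effects(y):
    """Least-squares estimates for a balanced two-factor layout y[i][j][k]."""
    a, b = len(y), len(y[0])
    cell = [[mean(y[i][j]) for j in range(b)] for i in range(a)]   # cell means
    row = [mean(cell[i]) for i in range(a)]                        # factor A means
    col = [mean(cell[i][j] for i in range(a)) for j in range(b)]   # factor B means
    mu = mean(row)                                                 # overall mean (balanced data)
    alpha = [row[i] - mu for i in range(a)]
    beta = [col[j] - mu for j in range(b)]
    inter = [[cell[i][j] - row[i] - col[j] + mu for j in range(b)]
             for i in range(a)]
    # Residuals: each observation minus the mean of its own cell.
    resid = [[[obs - cell[i][j] for obs in y[i][j]] for j in range(b)]
             for i in range(a)]
    return mu, alpha, beta, inter, resid


mu, alpha, beta, inter, resid = estimate_effects(y)
print("overall mean:", round(mu, 4))
print("temperature effects:", [round(v, 4) for v in alpha])
print("pressure effects:", [round(v, 4) for v in beta])
```

Because the fitted value of every observation is its cell mean, the residuals within each cell sum to zero, which is a quick sanity check on the computation.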

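The hypotheses tested with the two-factor analysis of variance are:

On the equality of the means or the effects of the levels of factor A:

$$H_0: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0 \qquad \text{vs.} \qquad H_1: \text{at least one } \alpha_i \neq 0$$

On the equality of the means or the effects of the levels of factor B:

$$H_0: \beta_1 = \beta_2 = \dots = \beta_b = 0 \qquad \text{vs.} \qquad H_1: \text{at least one } \beta_j \neq 0$$

Whether the levels of factor A and factor B interact:

$$H_0: (\alpha\beta)_{ij} = 0 \text{ for all } i, j \qquad \text{vs.} \qquad H_1: \text{at least one } (\alpha\beta)_{ij} \neq 0$$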

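These hypothesis tests can be performed with an analysis of variance, since the total sum of squares can again be decomposed as:

$$SS_T = SS_A + SS_B + SS_{AB} + SS_E$$

The sums of squares are obtained from the expressions below, where a dot in a subscript denotes the total over that index:

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{...}^2}{abn}$$

$$SS_A = \frac{1}{bn} \sum_{i=1}^{a} y_{i..}^2 - \frac{y_{...}^2}{abn}, \qquad SS_B = \frac{1}{an} \sum_{j=1}^{b} y_{.j.}^2 - \frac{y_{...}^2}{abn}$$

$$SS_{AB} = \frac{1}{n} \sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij.}^2 - \frac{y_{...}^2}{abn} - SS_A - SS_B$$

$$SS_E = SS_T - SS_A - SS_B - SS_{AB}$$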

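And the ANOVA table is constructed as:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio |
|---|---|---|---|---|
| Factor A | $SS_A$ | $a-1$ | $MS_A = SS_A/(a-1)$ | $F_0 = MS_A/MS_E$ |
| Factor B | $SS_B$ | $b-1$ | $MS_B = SS_B/(b-1)$ | $F_0 = MS_B/MS_E$ |
| Interaction | $SS_{AB}$ | $(a-1)(b-1)$ | $MS_{AB} = SS_{AB}/((a-1)(b-1))$ | $F_0 = MS_{AB}/MS_E$ |
| Error | $SS_E$ | $ab(n-1)$ | $MS_E = SS_E/(ab(n-1))$ | |
| Total | $SS_T$ | $abn-1$ | | |

The null hypotheses for factor A, factor B and the interaction are rejected if the corresponding $F_0$ is greater than $F_{\alpha,\, a-1,\, ab(n-1)}$, $F_{\alpha,\, b-1,\, ab(n-1)}$ and $F_{\alpha,\, (a-1)(b-1),\, ab(n-1)}$, respectively.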
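As an illustration of how the table is filled in, the following Python sketch computes the sums of squares, degrees of freedom, mean squares and F-ratios for a balanced two-factor layout. It assumes the standard totals-based formulas (each sum of squared totals divided by the number of observations behind a total, minus the correction factor, which is the squared grand total over $abn$) and uses the yield data of exercise 1. The function name `two_way_anova` is hypothetical, and critical values of the F distribution are not computed here; they would come from a table or a statistics package.

```python
# Yield data from exercise 1: rows = temperature (factor A),
# columns = pressure (factor B), n = 2 replicates per cell.
y = [
    [[90.4, 90.2], [90.7, 90.6], [90.2, 90.4]],
    [[90.1, 90.3], [90.5, 90.6], [89.9, 90.1]],
    [[90.5, 90.7], [90.8, 90.9], [90.4, 90.1]],
]


def two_way_anova(y):
    """Return {source: (SS, df, MS, F)} for a balanced layout y[i][j][k], n >= 2."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    total_obs = a * b * n
    grand = sum(obs for row in y for cell in row for obs in cell)
    cf = grand ** 2 / total_obs                      # correction factor
    ss_total = sum(obs ** 2 for row in y for cell in row for obs in cell) - cf
    row_tot = [sum(sum(cell) for cell in row) for row in y]             # totals of A
    col_tot = [sum(sum(y[i][j]) for i in range(a)) for j in range(b)]   # totals of B
    cell_tot = [sum(cell) for row in y for cell in row]                 # cell totals
    ss_a = sum(t ** 2 for t in row_tot) / (b * n) - cf
    ss_b = sum(t ** 2 for t in col_tot) / (a * n) - cf
    ss_cells = sum(t ** 2 for t in cell_tot) / n - cf
    ss_ab = ss_cells - ss_a - ss_b
    ss_e = ss_total - ss_cells
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1)}
    df_e = a * b * (n - 1)
    ms_e = ss_e / df_e
    table = {}
    for source, ss in (("A", ss_a), ("B", ss_b), ("AB", ss_ab)):
        ms = ss / df[source]
        table[source] = (ss, df[source], ms, ms / ms_e)
    table["Error"] = (ss_e, df_e, ms_e, None)
    table["Total"] = (ss_total, total_obs - 1, None, None)
    return table


table = two_way_anova(y)
for source, (ss, dof, ms, f) in table.items():
    print(source, round(ss, 4), dof,
          "" if ms is None else round(ms, 4),
          "" if f is None else round(f, 2))
```

Each F-ratio would then be compared with the corresponding critical value at the chosen significance level.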

Again, the ANOVA is valid only if the assumptions discussed for the one-factor analysis are met.

Multiple comparisons of means. If $H_0$ for the interaction is rejected in the analysis of variance, multiple comparisons should be performed for the interaction, considering all combinations of levels of the two factors analyzed. If the interaction is not significant, multiple comparisons should be performed separately for each individual factor that is significant.

Coefficient of determination of the ANOVA model: It measures the percentage of the variability in $Y$ that the ANOVA model explains through the changes in factor levels, $R^2 = (SS_T - SS_E)/SS_T$.
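A short Python sketch of this measure follows. It assumes the usual definition, one minus the error sum of squares over the total sum of squares, with the error measured around the cell means, and it reuses the yield data of exercise 1. The function name `r_squared` is a hypothetical helper.

```python
# Yield data from exercise 1: y[i][j] is the list of replicates in one cell.
y = [
    [[90.4, 90.2], [90.7, 90.6], [90.2, 90.4]],
    [[90.1, 90.3], [90.5, 90.6], [89.9, 90.1]],
    [[90.5, 90.7], [90.8, 90.9], [90.4, 90.1]],
]


def r_squared(y):
    """Share of the total variability explained by the two-factor model."""
    obs = [v for row in y for cell in row for v in cell]
    grand_mean = sum(obs) / len(obs)
    # Total variability around the overall mean.
    ss_total = sum((v - grand_mean) ** 2 for v in obs)
    # Unexplained variability: deviations from each cell mean.
    ss_error = sum((v - sum(cell) / len(cell)) ** 2
                   for row in y for cell in row for v in cell)
    return 1 - ss_error / ss_total


r2 = r_squared(y)
print(round(100 * r2, 1), "% of the variability is explained by the model")
```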

Response surface. It is the graphical representation of the multiple linear or nonlinear regression model involving the two factors analyzed.

Basic assumptions in the analysis of variance and checking.

Analysis of variance of three fixed factors.

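In this case, the model to be used is:

$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}$$

with $i = 1, \dots, a$; $j = 1, \dots, b$; $k = 1, \dots, c$; $l = 1, \dots, n$.

And the analysis of variance table is:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio |
|---|---|---|---|---|
| Factor A | $SS_A$ | $a-1$ | $MS_A$ | $MS_A/MS_E$ |
| Factor B | $SS_B$ | $b-1$ | $MS_B$ | $MS_B/MS_E$ |
| Factor C | $SS_C$ | $c-1$ | $MS_C$ | $MS_C/MS_E$ |
| Interaction AB | $SS_{AB}$ | $(a-1)(b-1)$ | $MS_{AB}$ | $MS_{AB}/MS_E$ |
| Interaction AC | $SS_{AC}$ | $(a-1)(c-1)$ | $MS_{AC}$ | $MS_{AC}/MS_E$ |
| Interaction BC | $SS_{BC}$ | $(b-1)(c-1)$ | $MS_{BC}$ | $MS_{BC}/MS_E$ |
| Interaction ABC | $SS_{ABC}$ | $(a-1)(b-1)(c-1)$ | $MS_{ABC}$ | $MS_{ABC}/MS_E$ |
| Error | $SS_E$ | $abc(n-1)$ | $MS_E$ | |
| Total | $SS_T$ | $abcn-1$ | | |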

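For a balanced experiment, the sums of squares of the main effects are based on the totals of factor A ($y_{i...}$), B ($y_{.j..}$) and C ($y_{..k.}$), as follows:

$$SS_A = \frac{1}{bcn} \sum_{i=1}^{a} y_{i...}^2 - \frac{y_{....}^2}{abcn}, \qquad SS_B = \frac{1}{acn} \sum_{j=1}^{b} y_{.j..}^2 - \frac{y_{....}^2}{abcn}, \qquad SS_C = \frac{1}{abn} \sum_{k=1}^{c} y_{..k.}^2 - \frac{y_{....}^2}{abcn}$$

To compute the two-factor interaction sums of squares, the totals of the A x B, A x C and B x C cells are needed. The sums of squares are:

$$SS_{AB} = \frac{1}{cn} \sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij..}^2 - \frac{y_{....}^2}{abcn} - SS_A - SS_B$$

$$SS_{AC} = \frac{1}{bn} \sum_{i=1}^{a} \sum_{k=1}^{c} y_{i.k.}^2 - \frac{y_{....}^2}{abcn} - SS_A - SS_C$$

$$SS_{BC} = \frac{1}{an} \sum_{j=1}^{b} \sum_{k=1}^{c} y_{.jk.}^2 - \frac{y_{....}^2}{abcn} - SS_B - SS_C$$

The sum of squares of the three-factor interaction is calculated from the three-way cell totals $\{y_{ijk.}\}$ as:

$$SS_{ABC} = \frac{1}{n} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} y_{ijk.}^2 - \frac{y_{....}^2}{abcn} - SS_A - SS_B - SS_C - SS_{AB} - SS_{AC} - SS_{BC}$$

The three-factor interaction can be separated from the error only if there are at least two replicates per combination.

The error sum of squares is found by subtracting the sums of squares of every main effect and interaction from the total sum of squares:

$$SS_E = SS_T - SS_A - SS_B - SS_C - SS_{AB} - SS_{AC} - SS_{BC} - SS_{ABC}, \qquad SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} y_{ijkl}^2 - \frac{y_{....}^2}{abcn}$$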
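The totals-based computation for three factors can be sketched as follows. This is a minimal illustration under stated assumptions: the design is balanced, every sum of squares is built from a sum of squared totals divided by the number of observations behind each total minus the correction factor, and the data are the paper-strength values of exercise 9. The names `three_way_ss` and `ss_from_totals` are hypothetical.

```python
# Paper-strength data from exercise 9:
# (concentration %, pressure psi, cooking time h) -> two replicates.
data = {
    (2, 400, 3): [196.6, 196.0], (2, 400, 4): [198.4, 198.6],
    (2, 500, 3): [197.7, 196.0], (2, 500, 4): [199.6, 200.4],
    (2, 600, 3): [199.8, 199.4], (2, 600, 4): [200.6, 200.9],
    (4, 400, 3): [198.5, 197.2], (4, 400, 4): [197.5, 198.1],
    (4, 500, 3): [196.0, 196.9], (4, 500, 4): [198.7, 198.0],
    (4, 600, 3): [198.4, 197.6], (4, 600, 4): [199.6, 199.0],
    (8, 400, 3): [197.5, 196.6], (8, 400, 4): [197.6, 198.4],
    (8, 500, 3): [195.6, 196.2], (8, 500, 4): [197.0, 197.8],
    (8, 600, 3): [197.0, 198.1], (8, 600, 4): [198.5, 199.8],
}


def three_way_ss(data):
    """Sums of squares for a balanced three-factor design with replicates."""
    n = len(next(iter(data.values())))
    total_obs = n * len(data)
    grand = sum(sum(reps) for reps in data.values())
    cf = grand ** 2 / total_obs                      # correction factor

    def ss_from_totals(positions):
        # Add up the observations sharing the same levels at `positions`.
        totals = {}
        for key, reps in data.items():
            sub = tuple(key[p] for p in positions)
            totals[sub] = totals.get(sub, 0.0) + sum(reps)
        per_total = total_obs // len(totals)         # observations behind each total
        return sum(t ** 2 for t in totals.values()) / per_total - cf

    ss = {}
    ss["A"] = ss_from_totals((0,))
    ss["B"] = ss_from_totals((1,))
    ss["C"] = ss_from_totals((2,))
    ss["AB"] = ss_from_totals((0, 1)) - ss["A"] - ss["B"]
    ss["AC"] = ss_from_totals((0, 2)) - ss["A"] - ss["C"]
    ss["BC"] = ss_from_totals((1, 2)) - ss["B"] - ss["C"]
    lower = ("A", "B", "C", "AB", "AC", "BC")
    ss["ABC"] = ss_from_totals((0, 1, 2)) - sum(ss[k] for k in lower)
    ss["Total"] = sum(x ** 2 for reps in data.values() for x in reps) - cf
    ss["Error"] = ss["Total"] - sum(ss[k] for k in lower + ("ABC",))
    return ss


ss = three_way_ss(data)
for source, value in ss.items():
    print(source, round(value, 4))
```

The error sum of squares obtained by subtraction should agree with the one computed directly from the deviations of each observation from its cell mean, which gives an independent check of the arithmetic.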

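The parameters of the ANOVA model are estimated as:

$$\hat{\mu} = \bar{y}_{....}, \qquad \hat{\alpha}_i = \bar{y}_{i...} - \bar{y}_{....}, \qquad \hat{\beta}_j = \bar{y}_{.j..} - \bar{y}_{....}, \qquad \hat{\gamma}_k = \bar{y}_{..k.} - \bar{y}_{....}$$

$$\widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij..} - \bar{y}_{i...} - \bar{y}_{.j..} + \bar{y}_{....}$$

with analogous expressions for $\widehat{(\alpha\gamma)}_{ik}$ and $\widehat{(\beta\gamma)}_{jk}$, and

$$\widehat{(\alpha\beta\gamma)}_{ijk} = \bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} - \bar{y}_{.jk.} + \bar{y}_{i...} + \bar{y}_{.j..} + \bar{y}_{..k.} - \bar{y}_{....}$$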

Three-factor Anova with one of them as a blocking factor.

In this case, the order in which the combinations of the other two factors are run must be completely randomized within each block.
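One way to produce such a run order is sketched below in Python. It is only an illustration: the function name `randomize_within_blocks`, the seed and the example levels (the temperatures, pressures and days of exercise 14) are assumptions chosen for the demonstration, and any statistical package can produce an equivalent plan.

```python
import random
from itertools import product


def randomize_within_blocks(levels_a, levels_b, blocks, seed=None):
    """Return {block: run order}, shuffling all A x B combinations inside each block."""
    rng = random.Random(seed)            # a fixed seed makes the plan reproducible
    plan = {}
    for block in blocks:
        runs = list(product(levels_a, levels_b))   # every combination appears once
        rng.shuffle(runs)                          # independent shuffle per block
        plan[block] = runs
    return plan


plan = randomize_within_blocks(
    levels_a=[150, 160, 170],        # temperature levels
    levels_b=[250, 260, 270],        # pressure levels
    blocks=["Day 1", "Day 2"],
    seed=1,
)
for block, runs in plan.items():
    print(block, runs[:5])           # first five runs of each block
```

Each block contains all combinations exactly once, and only the order differs from block to block.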

Taking the third factor as the blocking factor, the model for this design is

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \delta_k + \varepsilon_{ijk}$$

where $\delta_k$ is the effect of the $k$-th block and the random component absorbs the other interactions. In this case the interactions with the blocking factor cannot be separated from the error.

If there is only one replicate per combination and the number of levels of factor C (the blocks) is n, the analysis of variance table is:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio |
|---|---|---|---|---|
| Factor A | SSA | a-1 | MSA = SSA / (a-1) | MSA / MSE |
| Factor B | SSB | b-1 | MSB = SSB / (b-1) | MSB / MSE |
| Block C | SSC | n-1 | MSC = SSC / (n-1) | MSC / MSE |
| Interaction AB | SSAB | (a-1)(b-1) | MSAB = SSAB / ((a-1)(b-1)) | MSAB / MSE |
| Error | SSE | (ab-1)(n-1) | MSE = SSE / ((ab-1)(n-1)) | |
| Total | SSTotal | abn-1 | | |

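The sums of squares of the main effects are based on the totals of factor A ($y_{i..}$), factor B ($y_{.j.}$) and the blocks C ($y_{..k}$), as follows:

$$SS_A = \frac{1}{bn} \sum_{i=1}^{a} y_{i..}^2 - \frac{y_{...}^2}{abn}, \qquad SS_B = \frac{1}{an} \sum_{j=1}^{b} y_{.j.}^2 - \frac{y_{...}^2}{abn}, \qquad SS_C = \frac{1}{ab} \sum_{k=1}^{n} y_{..k}^2 - \frac{y_{...}^2}{abn}$$

To calculate the sum of squares of the interaction, the cell totals $y_{ij.}$ of an A x B table are needed:

$$SS_{AB} = \frac{1}{n} \sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij.}^2 - \frac{y_{...}^2}{abn} - SS_A - SS_B$$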
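The blocked analysis can be sketched in Python as follows. This is a minimal illustration, assuming one replicate per combination in each block, the totals-based sums of squares for A, B, the blocks and the AB interaction, and an error term obtained by subtraction with (ab-1)(n-1) degrees of freedom. It uses the data of exercise 14 (temperature, pressure, days as blocks); the function name `blocked_anova` is hypothetical.

```python
# Data from exercise 14: y[i][j][k] with temperature i (150, 160, 170),
# pressure j (250, 260, 270) and day k (block: day 1, day 2).
y = [
    [[86.3, 86.1], [84.0, 85.2], [85.8, 87.3]],
    [[88.5, 89.4], [87.3, 89.9], [89.0, 90.3]],
    [[89.1, 91.7], [90.2, 93.2], [91.3, 93.7]],
]


def blocked_anova(y):
    """Return {source: (SS, df)} for a two-factor design replicated in n blocks."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    total_obs = a * b * n
    grand = sum(v for row in y for cell in row for v in cell)
    cf = grand ** 2 / total_obs                      # correction factor
    tot_a = [sum(sum(cell) for cell in row) for row in y]
    tot_b = [sum(sum(y[i][j]) for i in range(a)) for j in range(b)]
    tot_c = [sum(y[i][j][k] for i in range(a) for j in range(b)) for k in range(n)]
    tot_ab = [sum(cell) for row in y for cell in row]
    ss_a = sum(t ** 2 for t in tot_a) / (b * n) - cf
    ss_b = sum(t ** 2 for t in tot_b) / (a * n) - cf
    ss_c = sum(t ** 2 for t in tot_c) / (a * b) - cf          # blocks
    ss_ab = sum(t ** 2 for t in tot_ab) / n - cf - ss_a - ss_b
    ss_t = sum(v ** 2 for row in y for cell in row for v in cell) - cf
    ss_e = ss_t - ss_a - ss_b - ss_c - ss_ab                  # error by subtraction
    return {
        "A": (ss_a, a - 1),
        "B": (ss_b, b - 1),
        "Block": (ss_c, n - 1),
        "AB": (ss_ab, (a - 1) * (b - 1)),
        "Error": (ss_e, (a * b - 1) * (n - 1)),
        "Total": (ss_t, total_obs - 1),
    }


table = blocked_anova(y)
ms_e = table["Error"][0] / table["Error"][1]
for source, (ss, dof) in table.items():
    line = [source, round(ss, 3), dof]
    if source not in ("Error", "Total"):
        line.append(round((ss / dof) / ms_e, 2))     # F-ratio against MSE
    print(*line)
```

The block-by-factor interactions are not estimated separately here; they are pooled into the error term, as the model above assumes.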

1. The yield of a chemical process is studied. The two most important variables are thought to be pressure and temperature. Three levels of each factor were selected and the experiment was carried out with 2 replicates. Use $\alpha$ = 0.04.

| Temperature (°C) | Pressure 200 psi | Pressure 215 psi | Pressure 230 psi |
|---|---|---|---|
| 150 | 90.4 | 90.7 | 90.2 |
| | 90.2 | 90.6 | 90.4 |
| 160 | 90.1 | 90.5 | 89.9 |
| | 90.3 | 90.6 | 90.1 |
| 170 | 90.5 | 90.8 | 90.4 |
| | 90.7 | 90.9 | 90.1 |

a. Perform the ANOVA; state the hypotheses only for the interaction, but conclude for the 3 hypotheses.
b. Determine the effect of using the combination (temperature, pressure) = (160, 200).
c. Construct an interaction graph and conclude.
d. Using Tukey's method, determine the optimum levels at which the process must be operated.
e. Determine and interpret a confidence interval when the combination (temperature, pressure) = (160, 200) is used.
f. Determine whether the assumption of normality of the errors is met; perform the Anderson-Darling test.
g. Fit a regression model that relates the process yield to pressure and temperature, with and without interaction.

2. An experiment was performed to determine whether the firing temperature or the position in the oven affects the thickness (m) of a carbon anode. A small thickness is desired.

| Position | Temperature 800 °C | Temperature 825 °C | Temperature 850 °C |
|---|---|---|---|
| 1 | 570 | 1063 | 565 |
| | 565 | 1080 | 540 |
| | 583 | 1043 | 590 |
| 2 | 528 | 988 | 526 |
| | 547 | 1026 | 538 |
| | 521 | 1004 | 532 |

a. Perform the ANOVA; state the hypotheses only for the interaction, but conclude for the 3 hypotheses.
b. Determine the effect of using the combination (position, temperature) = (1, 825).
c. Construct an interaction graph and conclude.
d. Determine and interpret a confidence interval when the combination (position, temperature) = (2, 800) is used.
e. Determine graphically whether the errors are independent. Consider that the runs were labeled in standard order. Use a plot of $e_t$ vs. $t$.
f. Obtain scatter diagrams showing the type of relationship between the dependent variable and each of the independent variables, and use that information to construct a regression model.

3. A mechanical engineer studies the thrust force developed by a drill. Drill speed and the feed rate of the material are suspected to be the most important factors. He selected four feed rates and used a high and a low drill speed, chosen to represent the extreme operating conditions. Use $\alpha$ = 0.03.

| A: Drill speed | B: Feed rate 0.015 | 0.030 | 0.045 | 0.060 |
|---|---|---|---|---|
| 125 | 2.70 | 2.45 | 2.60 | 2.75 |
| | 2.78 | 2.49 | 2.72 | 2.86 |
| 200 | 2.83 | 2.85 | 2.86 | 2.94 |
| | 2.86 | 2.80 | 2.87 | 2.88 |

a. Is there a significant interaction effect? Perform the appropriate analysis of variance.
b. Construct an interaction graph and conclude.
c. Using Tukey's method, determine the optimum levels at which the process must be operated.
d. Determine if there are outliers.
e. Perform the appropriate test to determine whether the variances are constant across the levels of feed rate.
f. Fit an appropriate regression model relating the thrust force to the two speeds. Compare it with the model without the constant term.

4. An experiment was performed to study the influence of the operating temperature and of three types of glass cover plates on the light output of an oscilloscope tube. High light output is desired.

| Type of plate | Temperature 100 | Temperature 125 | Temperature 150 |
|---|---|---|---|
| 1 | 580 | 1090 | 1392 |
| | 568 | 1087 | 1380 |
| | 570 | 1085 | 1386 |
| 2 | 950 | 1070 | 1328 |
| | 930 | 1035 | 1312 |
| | 979 | 1000 | 1299 |
| 3 | 846 | 845 | 867 |
| | 875 | 853 | 904 |
| | 899 | 866 | 889 |

a. Perform the appropriate analysis of variance.
b. Fit an appropriate regression model that relates the light output to plate type and temperature.
c. Obtain an interaction graph, with the plate type on the horizontal axis.
d. Using Tukey's method, determine the optimum levels at which the process must be operated.
e. Determine if there are atypical or extreme observations.
f. Obtain a normal probability plot of the errors and conclude.

5. An article in the Journal of Testing and Evaluation (vol. 16, no. 2, pp. 508-515) describes an experiment that investigated the effects of cyclic loading frequency and environmental conditions on fatigue crack growth at a constant stress of 22 MPa for a particular material. The data from the experiment are presented below; the dependent variable is the fatigue crack growth. Use $\alpha$ = 0.05. Small cracks are desired.

| Frequency of cyclic loading | Environment: Air | Environment: H2O | Environment: H2O seawater |
|---|---|---|---|
| 10 | 3.29 | 2.06 | 1.90 |
| | 3.47 | 2.05 | 1.93 |
| | 3.48 | 2.23 | 1.75 |
| | 3.12 | 2.03 | 2.06 |
| 7.5 | 2.65 | 3.20 | 3.10 |
| | 2.68 | 3.18 | 3.24 |
| | 2.76 | 3.96 | 3.98 |
| | 2.38 | 3.64 | 3.24 |
| 5 | 2.24 | 4.51 | 3.96 |
| | 2.71 | 4.12 | 3.01 |
| | 2.81 | 4.60 | 3.36 |
| | 2.08 | 4.35 | 3.45 |

a. State the hypotheses only for the interaction, perform the corresponding analysis of variance and conclude for all the components.
b. Obtain an interaction graph and conclude.
c. Using multiple comparisons of means with Tukey's method, determine the optimum levels at which the process should be operated.
d. Obtain a confidence interval for the average crack growth when the combination (frequency, environment) = (10, H2O) is used.
e. Determine if there are outliers.
f. Perform the appropriate hypothesis test to determine whether the variances are constant across the frequency levels.
g. Fit a multiple regression model relating the crack growth to the frequency of the cyclic loading, the environmental conditions and their interaction; note that dummy variables are required. Evaluate the adequacy of the model using the standard error of estimate and the adjusted coefficient of determination.

6. Tire wear life was determined by measuring the surface wear (in thousandths) for each of 5 different brands of compact cars (factor A) in combination with each of 4 different brands of radial tire (factor B), with one replicate for each combination. The results were SSA = 40.6, SSB = 64.1 and SSE = 59.2.

a. Which factor should be considered as a blocking factor?
b. State the hypotheses, perform the ANOVA and conclude.

7. Johnson and Leone (Statistics and Experimental Design in Engineering and the Physical Sciences, John Wiley) describe an experiment conducted to investigate the twist of copper plates. The two factors studied were the temperature and the copper content of the plates. The response variable was a measure of the amount of twist of the plates. The data were as follows:

| Temperature (°C) | Copper content 40% | 60% | 80% | 100% |
|---|---|---|---|---|
| 50 | 17, 16 | 19, 21 | 24, 22 | 28, 27 |
| 75 | 10, 9 | 18, 17 | 14, 12 | 27, 31 |
| 100 | 14, 12 | 18, 21 | 25, 23 | 30, 28 |
| 125 | 21, 17 | 23, 21 | 23, 22 | 30, 31 |

a. Analyze the data, assuming that the replicates were made in different shifts (blocks).
b. Construct an interaction graph of temperature vs. copper content and conclude.
c. Use Tukey's method to determine which factor levels should be used.
d. Is the assumption of normality of the errors met?
e. Check the homoscedasticity assumption for the temperature levels.
f. Determine whether the experiment was properly randomized.
g. Construct two regression models, one linear and one quadratic; compare them and conclude.

8. An engineer suspects that the surface finish of a metal part is affected by the feed rate and the depth of cut. He selects three feed rates and four depths of cut, then runs a factorial experiment and obtains the following data:

| Feed rate (in/min) | Depth of cut 0.15 in | 0.18 in | 0.20 in | 0.25 in |
|---|---|---|---|---|
| 0.20 | 74 | 79 | 82 | 99 |
| | 64 | 68 | 88 | 104 |
| | 60 | 73 | 92 | 96 |
| 0.25 | 92 | 98 | 99 | 104 |
| | 86 | 104 | 108 | 110 |
| | 88 | 88 | 95 | 99 |
| 0.30 | 99 | 104 | 108 | 114 |
| | 98 | 99 | 110 | 111 |
| | 102 | 95 | 99 | 107 |

a. Perform the corresponding complete analysis of variance (state the hypotheses only for the interaction).
b. Obtain an interaction graph of feed rate vs. depth of cut and conclude.
c. Obtain and interpret point estimates of the average surface finish at each feed rate.
d. Determine whether there are outliers, using the errors.
e. Obtain a confidence interval for the difference in average finish between the feed rates of 0.20 and 0.25.
f. Generate the corresponding plots and determine whether the homoscedasticity assumption is met for the feed rate factor and for the depth of cut factor.
g. Using a normal probability plot, determine whether the normality assumption is met.
h. Determine graphically whether the independence assumption is met.

9. An engineer is investigating the effects on paper strength of the pulp fiber concentration (%), the tank pressure and the cooking time of the pulp. Three levels of fiber concentration, three pressure levels and two cooking times were selected. A general factorial experiment was performed, with the following results. Higher strength is desired.

| Run number | A: Concentration (%) | B: Pressure (psi) | C: Cooking time (hours) | Strength, replicate 1 | Strength, replicate 2 |
|---|---|---|---|---|---|
| 1 | 2 | 400 | 3 | 196.6 | 196.0 |
| 2 | 2 | 400 | 4 | 198.4 | 198.6 |
| 3 | 2 | 500 | 3 | 197.7 | 196.0 |
| 4 | 2 | 500 | 4 | 199.6 | 200.4 |
| 5 | 2 | 600 | 3 | 199.8 | 199.4 |
| 6 | 2 | 600 | 4 | 200.6 | 200.9 |
| 7 | 4 | 400 | 3 | 198.5 | 197.2 |
| 8 | 4 | 400 | 4 | 197.5 | 198.1 |
| 9 | 4 | 500 | 3 | 196.0 | 196.9 |
| 10 | 4 | 500 | 4 | 198.7 | 198.0 |
| 11 | 4 | 600 | 3 | 198.4 | 197.6 |
| 12 | 4 | 600 | 4 | 199.6 | 199.0 |
| 13 | 8 | 400 | 3 | 197.5 | 196.6 |
| 14 | 8 | 400 | 4 | 197.6 | 198.4 |
| 15 | 8 | 500 | 3 | 195.6 | 196.2 |
| 16 | 8 | 500 | 4 | 197.0 | 197.8 |
| 17 | 8 | 600 | 3 | 197.0 | 198.1 |
| 18 | 8 | 600 | 4 | 198.5 | 199.8 |

a. In Minitab, randomize the experiment; report the combinations of factor levels and the order in which they must be run, for the first 5 runs.
b. Perform the corresponding analysis of variance and conclude. State the hypotheses for the most significant factor and the most significant interaction.
c. Construct a graph of the most significant interaction and conclude (note: it involves only 2 factors).
d. Determine the optimal levels for the 3 factors. Use Tukey's method.
e. Determine and interpret the effect of using the optimal combination.
f. Obtain and interpret a confidence interval for the optimal combination.
g. Build a regression model for the effects that were significant and plot the actual and predicted values on the same graph. What is your conclusion?
h. Determine if there are outliers.
i. Perform the Anderson-Darling hypothesis test.

10. Answer the following questions.

a. What is a complete factorial experiment?
b. If the assumptions of normality and constant variance are not met, what can be done to avoid problems with the analysis and the results obtained?
c. How many effects can be studied with a 4x3x2 factorial?
d. Mention at least three advantages of factorial experimentation over the strategy of changing one factor at a time.
e. What is the practical implication of using three test levels instead of two for a given factor?
f. Why does it make no sense to use the regression model when the factors are qualitative? If they were quantitative, what is gained with the regression model relative to the effects model?
g. Of the three model assumptions, which one can affect the analysis the most if it is not met?
h. What are the model assumptions in a factorial design, and with which residual plots can each of them be checked?
i. In the previous question, how would a very distant or aberrant point look in the plots?
j. What does it mean for the statistical model to be a random-effects model?
k. How do the hypotheses of interest change in a random-factor design with respect to a fixed-factor design?

11. An engineer wants to investigate the effect of the type of suspension (A), the mesh opening (B) and the cycling temperature (C) on the % sedimentation of a mechanical suspension. A 3x2x2 factorial experiment was run with 4 replicates, using factor A as the blocking factor. The test levels were A, suspension type: A1, A2 and A3; B, mesh opening: 50 and 60; C, temperature: 0 and 30 degrees Celsius. The results were:

| | A1, B1 | A1, B2 | A2, B1 | A2, B2 | A3, B1 | A3, B2 |
|---|---|---|---|---|---|---|
| C1 | 60 | 67 | 62 | 71 | 71 | 75 |
| | 70 | 73 | 68 | 75 | 73 | 75 |
| | 75 | 68 | 67 | 80 | 75 | 76 |
| | 70 | 68 | 65 | 80 | 76 | 77 |
| C2 | 55 | 52 | 44 | 60 | 52 | 56 |
| | 53 | 57 | 45 | 65 | 51 | 55 |
| | 53 | 54 | 48 | 67 | 48 | 57 |
| | 54 | 54 | 45 | 67 | 50 | 59 |

a. Perform the corresponding analysis of variance; state the hypotheses only for the interaction and for the most significant factor.
b. Construct a graph illustrating the most significant interaction and conclude. A low percentage of sedimentation is desired.
c. Use Tukey's method to determine which levels should be used for suspension type 3.
d. Determine if there are outliers.
e. Is the assumption of normality of the errors met?
f. Is the assumption of independence of the errors met? Check it for suspension type 2, numbering the errors vertically.
g. Build a regression model for the effects or terms that were significant and plot the actual and predicted values on the same graph. What is your conclusion?

12. Shrimp spawn in the sea and the eggs hatch into larvae while being carried to the coast. After the larval stage the shrimp enter the estuaries, where they grow rapidly and become pre-adults that migrate back to the sea, where they reach maturity. During their migration and life cycle the shrimp face wide variations in temperature, density and salinity, among other factors, so it is very important to know how these factors affect their growth. A designed experiment was performed with these three factors: temperature at 2 levels (15 °C, 25 °C), salinity at 3 levels (10%, 25% and 40%) and density of shrimp in the tank at 2 levels (80 shrimp/40 liters, 120 shrimp/40 liters). For each combination of levels, three tanks under the same conditions were used to obtain the average weight gain of the shrimp in 4 weeks.

| T | D | S | Weight gained (mg), tank 1 | tank 2 | tank 3 |
|---|---|---|---|---|---|
| 20 °C | 80 | 10% | 86 | 52 | 73 |
| 20 °C | 80 | 25% | 544 | 371 | 482 |
| 20 °C | 80 | 40% | 390 | 290 | 397 |
| 20 °C | 120 | 10% | 53 | 73 | 86 |
| 20 °C | 120 | 25% | 393 | 398 | 208 |
| 20 °C | 120 | 40% | 249 | 265 | 243 |
| 25 °C | 80 | 10% | 439 | 436 | 349 |
| 25 °C | 80 | 25% | 249 | 245 | 330 |
| 25 °C | 80 | 40% | 247 | 277 | 205 |
| 25 °C | 120 | 10% | 324 | 305 | 364 |
| 25 °C | 120 | 25% | 352 | 267 | 316 |
| 25 °C | 120 | 40% | 188 | 223 | 281 |

a. Using Minitab, randomize the experiment; report the combinations of factor levels and the order in which they must be run. Report only the first 5 runs.
b. State the hypotheses only for the interaction of the 3 factors, perform the corresponding analysis of variance and conclude for all components of the ANOVA model.
c. Construct a graph showing how the interaction of temperature and salinity affects the average weight gained by the shrimp.
d. Use Tukey's method to determine which factor levels should be used. Note that the three-factor interaction is significant, so the multiple comparisons should be made on the level combinations of the three factors.
e. Build a regression model with the terms of the ANOVA model that were significant and plot the actual and predicted values on the same graph. What do you conclude?

13. The quality control department of a textile plant studies the effect of various factors on the dyeing of cotton and synthetic fabric used to make men's shirts. Three operators, three cycle times and two temperatures were selected, and three small samples of fabric were dyed under each set of conditions. The finished fabric was compared with a standard and assigned a numerical score. The data are shown below. Consider the ability of the operators as a blocking factor. Use $\alpha$ = 0.08.

| Cycle time | Temp. 300, operator 1 | Temp. 300, operator 2 | Temp. 300, operator 3 | Temp. 350, operator 1 | Temp. 350, operator 2 | Temp. 350, operator 3 |
|---|---|---|---|---|---|---|
| 40 | 23 | 27 | 31 | 24 | 38 | 34 |
| | 24 | 28 | 32 | 23 | 36 | 36 |
| | 25 | 26 | 29 | 28 | 35 | 39 |
| 50 | 36 | 34 | 33 | 37 | 34 | 34 |
| | 35 | 38 | 34 | 39 | 38 | 36 |
| | 36 | 39 | 35 | 35 | 36 | 31 |
| 60 | 28 | 35 | 26 | 26 | 36 | 28 |
| | 24 | 35 | 27 | 29 | 37 | 26 |

a. Perform the appropriate analysis of variance; state the hypotheses only for the interaction and for the most significant factor.
b. Construct a graph illustrating the average dyeing score for the most significant factor and conclude. A high average score is desired.
c. Construct a graph illustrating the interaction and conclude.
d. Determine which levels should be used. Use Tukey's method.
e. Is the assumption of normality of the errors met?
f. Determine whether the experiment was properly randomized for operator 1.
g. Build a regression model with the significant terms. Note that the operator requires 2 dummy variables.

14. The yield of a chemical process is studied. The two factors of interest are temperature and pressure. Three levels of each factor are selected; however, only nine runs can be made in one day. The experimenter runs one complete replicate on each day. The data are shown in the following table. Analyze the data, considering the days as blocks. Use $\alpha$ = …

| Temperature | Day 1, pressure 250 | Day 1, pressure 260 | Day 1, pressure 270 | Day 2, pressure 250 | Day 2, pressure 260 | Day 2, pressure 270 |
|---|---|---|---|---|---|---|
| 150 | 86.3 | 84.0 | 85.8 | 86.1 | 85.2 | 87.3 |
| 160 | 88.5 | 87.3 | 89.0 | 89.4 | 89.9 | 90.3 |
| 170 | 89.1 | 90.2 | 91.3 | 91.7 | 93.2 | 93.7 |

a. Use Minitab to set up the experimental runs for day 1; report only the combinations of the first 5 runs and their standard order.
b. State the hypotheses for the nuisance (blocking) factor and for the interaction, perform the corresponding analysis of variance and conclude.
c. Use Tukey's method to determine which factor levels should be used.
d. Is the assumption of normality of the errors met?
e. Determine whether the experiment was properly randomized on day 2.
f. Determine whether the constant-variance assumption is met for the most significant factor.
g. Build and compare the following regression models, using S and adjusted $R^2$ for the comparison. Which model do you consider best? Describe its performance.
i. With the simple terms of temperature and pressure, without considering the day. Why should the day not be used as a predictor variable?
ii. Additionally considering the interaction of temperature and pressure.
iii. In addition to the previous case, considering the quadratic terms of temperature and pressure.

15. A soft-drink company is interested in obtaining more uniform fill heights in its bottles. The filling machine is in statistical control and on average fills the bottles to the target height, but there is variation around this target; the process engineer would like to better understand the sources of variability and ultimately reduce it. Three variables can be controlled during the filling process: the percent carbonation (A), run at 3 levels (10, 12 and 14%); the operating pressure in the filler (B), run at 2 levels (25 and 30 psi); and the bottles produced per minute, or line speed (C), run at 2 levels (200 and 220 bpm). The engineer ran a completely randomized experiment with two replicates. The response variable is the average deviation from the target fill height observed in a production run at each combination of factor levels.

Positive deviations are fill heights above the target, whereas negative deviations are fill heights below the target.

| Run order in time | Standard order | A: Percent carbonation | B: Pressure (psi) | C: Line speed | Deviation |
|---|---|---|---|---|---|
| 1 | 8 | 12 | 30 | 220 | 6 |
| 2 | 20 | 12 | 30 | 220 | 5 |
| 3 | 17 | 12 | 25 | 200 | 1 |
| 4 | 6 | 12 | 25 | 220 | 2 |
| 5 | 3 | 10 | 30 | 200 | -1 |
| 6 | 16 | 10 | 30 | 220 | 1 |
| 7 | 12 | 14 | 30 | 220 | 10 |
| 8 | 1 | 10 | 25 | 200 | -3 |
| 9 | 4 | 10 | 30 | 220 | 1 |
| 10 | 2 | 10 | 25 | 220 | -1 |
| 11 | 24 | 14 | 30 | 220 | 11 |
| 12 | 7 | 12 | 30 | 200 | 2 |
| 13 | 15 | 10 | 30 | 200 | 0 |
| 14 | 14 | 10 | 25 | 220 | 0 |
| 15 | 9 | 14 | 25 | 200 | 5 |
| 16 | 22 | 14 | 25 | 220 | 6 |
| 17 | 21 | 14 | 25 | 200 | 4 |
| 18 | 10 | 14 | 25 | 220 | 7 |
| 19 | 18 | 12 | 25 | 220 | 1 |
| 20 | 13 | 10 | 25 | 200 | -1 |
| 21 | 11 | 14 | 30 | 200 | 7 |
| 22 | 5 | 12 | 25 | 200 | 0 |
| 23 | 19 | 12 | 30 | 200 | 3 |
| 24 | 23 | 14 | 30 | 200 | 9 |

Use $\alpha$ = … throughout, except in part d, where $\alpha$ = … is used.

a. Perform the corresponding analysis of variance. State the hypotheses only for the most significant factor and the most significant interaction.
b. Construct and interpret a graph illustrating the average deviation from the target fill height for the most significant factor.
c. Construct a graph illustrating the most significant interaction and conclude.
d. Use Tukey's method to determine the levels at which the process should be operated.
e. Obtain and interpret an interval for the optimal combination.
f. Build a regression model from the effects that were significant. Would you recommend the model with the constant term equal to zero? Plot the actual and predicted values on the same graph. What do you conclude?
g. Determine whether the assumption of normality of the errors is met; perform the Anderson-Darling test.
h. Determine whether the assumption of independence of the errors with respect to time is met.
i. Determine whether the homoscedasticity assumption is met for the most significant term in the ANOVA.

16. The article "Towards Improving the Properties of Plaster Moulds and Castings" (J. Engr. Manuf., 1991, pp. 265-269) describes a study of how the amount of carbon fiber and sand additions affect the hardness of castings. The following data were obtained:

| Sand addition (%) | Carbon fiber addition 0% | 25% | 50% |
|---|---|---|---|
| 0 | 61 | 69 | 67 |
| | 63 | 69 | 69 |
| 15 | 67 | 69 | 69 |
| | 69 | 74 | 74 |
| 30 | 65 | 74 | 74 |
| | 74 | 72 | 75 |

a. State the corresponding hypotheses, obtain the ANOVA table and conclude.
b. At what levels should the process be operated so that the castings are as hard as possible? Use Duncan's method.
c. Determine graphically whether the errors are independent of each other.
