computer-based biometrics manual 2006 - statistics and biometry


Student Name:_______________________

Student No:_______________________

COMPUTER-BASED BIOMETRICS MANUAL (Using GenStat for Windows)

For

BIOMETRY 222

EXPERIMENTAL DESIGN & MULTIPLE REGRESSION

2006

School of Statistics and Actuarial Science University of KwaZulu-Natal

Pietermaritzburg Campus Private Bag X01

Scottsville 3209, South Africa

Compiled by: Peter M. Njuho, PhD ♦ Senior Lecturer ♦ E-mail: [email protected]


Computer-Based Biometrics Manual - 2006 PM Njuho


COMPUTER-BASED BIOMETRICS MANUAL (Using GenStat for Windows)

Goal: To develop an understanding of statistical analysis and the ability to interpret the results obtained using GenStat.

Objectives:

• To develop a clear understanding of the statistical concepts used in the design of experiments.
• To learn how to fit multiple regressions and interpret the parameter estimates.
• To develop ability in the use of GenStat.
• To understand how results obtained using calculators relate to those obtained using GenStat.
• To develop skills and ability to interpret statistical results.

Introduction

This computer laboratory manual assumes knowledge of Biometry 210, “Introduction to Biometry”, and some basics of Windows-based GenStat. It has been developed to supplement the course material of Biometry 222, “Experimental Design & Multiple Regression.” The manual is divided into tutorials, each of which opens with background information. Exercises that test understanding of the concepts are given together with computer-oriented exercises, and part of the structured questions tests your ability to interpret the results of the analyses. The manual tries to guide you to the crucial GenStat directives; ask for help where these directives fail to work or are not clear. It is advisable to work independently and later compare results with a colleague. Note down your answers in the blank spaces provided. The data sets referred to in the exercises are stored on the agriculture computer laboratory server, Pietermaritzburg Campus, in the directory F:\Users\Biometry\Biom222\....


TUTORIAL ONE

Topic: Concept of Experimental Unit & Experimental Design

Background: Experimental design is a planned allocation of treatments to experimental units in such a way that bias is minimized. The factor of interest under investigation is called the treatment, whereas an experimental unit is the smallest unit to which a treatment is applied. Independent application of a treatment to more than one experimental unit is referred to as replication. Replication is necessary for estimating experimental error.

Remark: The exercises in this tutorial do not require the use of a computer. They are meant to assess your understanding of the basic concepts of experimental design.

Exercise 1.1 A researcher conducted an experiment to compare two room temperatures for doing a particular type of work. There were 6 rooms available for experimentation. Three randomly selected rooms were set at 60 degrees and the other three were set at 72 degrees. Five workers were put in each room, and various measurements of work performance were made on each worker.

a) What are the treatments in this experiment?
b) What are the experimental units?
c) How many replications are there for each treatment?
d) Provide an analysis of variance table outlining only the sources of variability and the degrees of freedom.

Source of variation        Degrees of freedom
Total

Exercise 1.2 After the experiment was designed as in Exercise 1.1, the researcher decided to add another factor called task. There were 5 tasks in all. Within each room, one task was randomly assigned to each worker so that all tasks were done in each room.


a) What are the treatments?
b) Explain why this is not a completely randomized design.
c) How would the experiment have to be changed so that it would be a completely randomized design? Is such an experiment practical?
d) How would you design the experiment if there were only one room available for experimentation?


Exercise 1.3 A researcher was interested in the ability of two chemicals to retard the spoilage of grain. A bin of grain was treated with chemical C1 and another, identical bin was treated with chemical C2. After an appropriate period of time, the researcher took 10 samples from each bin and measured the spoilage in each sample.

a) What are the treatments?
b) What are the experimental units?
c) How many replications are there in this experiment?
d) Provide an analysis of variance table outlining only the sources of variability and the degrees of freedom.

Source of variation        Degrees of freedom
Total
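The distinction between experimental units and the observations taken on them can be checked with simple degrees-of-freedom bookkeeping. The Python sketch below assumes a completely randomized design with t treatments, r experimental units per treatment and s observations (subsamples) per unit, all equal; the function name `crd_subsampling_df` is hypothetical. It shows that the experimental error line has t(r - 1) degrees of freedom, which is zero when each treatment is applied to only one experimental unit, however many observations are taken on that unit.

```python
def crd_subsampling_df(t, r, s):
    """Degrees of freedom for a CRD with t treatments, r experimental units
    per treatment and s subsamples (observations) per experimental unit."""
    if min(t, r, s) < 1:
        raise ValueError("t, r and s must all be at least 1")
    df = {
        "Treatments": t - 1,
        "Experimental error": t * (r - 1),   # units within treatments
        "Sampling error": t * r * (s - 1),   # subsamples within units
        "Total": t * r * s - 1,
    }
    # The component lines always add up to the total.
    assert df["Treatments"] + df["Experimental error"] + df["Sampling error"] == df["Total"]
    return df


# Hypothetical sizes for illustration: 3 treatments, 4 units each, 2 subsamples per unit.
for source, d in crd_subsampling_df(3, 4, 2).items():
    print(f"{source:<20}{d:>4}")
```

Counting the observations rather than the units would inflate the error degrees of freedom; the sketch keeps the two sources separate.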


TUTORIAL TWO

Topic: Concept of Completely Randomized Design

Background: A completely randomized design (CRD) is one in which the experimental units are assumed to be homogeneous. The randomization process ensures that each treatment has an equal chance of being allocated to any of the experimental units. A randomization scheme can be established using either a table of random numbers or computer-generated random numbers; refer to the discussion of randomization in class. Replication occurs when a treatment is allocated independently to more than one experimental unit. Treatments may be unequally replicated, depending on the level of precision required for each treatment: more replications imply higher precision.

Remark: The exercises in this tutorial do not require the use of a computer. The questions are meant to assess your understanding of the concepts of the completely randomized design applied under different scenarios. Use the following extract of random numbers whenever a randomization scheme is required.

11649 96283 01898 61414 49174 12074 98551 97366 39941 21225 90474 41469 16812 28599 64109 09497 25254 16210 89717 28785 02760 19155

Exercise 2.1 An experiment was conducted in which fifty students were shown a film on nutrition and another fifty were not. Each group was given a test on nutrition. The test was given at the same time to all the students, after the first group had viewed the film. The purpose was to determine whether the film increases knowledge of nutrition.

a) What are the potential sources of bias in this experiment?


b) Show how you would assign the students to the two groups so that the potential sources of bias do not in fact bias the results of the experiment.

Exercise 2.2 Fifteen consumers are to be selected at random from a certain population to evaluate one of three formulations of a food product, P1, P2 and P3. Each consumer will evaluate only one of the formulations, and the researcher conducting the study can deal with only one consumer at a time.

a) Come up with a completely randomized design for collecting the data. Make sure that your description clearly specifies who is to evaluate each product and when each evaluation is to take place.


b) Provide an analysis of variance table outlining only the sources of variability and the degrees of freedom.

Source of variation        Degrees of freedom
Total

Exercise 2.3 Twenty laboratory mice are to be used in a nutrition experiment. The factors involved in the experiment are protein (2 levels, P1 and P2) and fat (2 levels, F1 and F2). The diets consist of all possible combinations of the protein and fat levels.

a) Come up with a plan in which the mice are randomly assigned to the diets.


b) Provide an analysis of variance table outlining only the sources of variability and the degrees of freedom.

Source of variation        Degrees of freedom
Total

c) The researcher decides to add to the experiment a control diet consisting of ‘mouse chow’. How would the diets be assigned to the mice so that the design is completely randomized?


d) The researcher would also like to add an exercise factor, with levels E1 and E2. (Apparently, the level of exercise can be controlled by the type of equipment placed in the cages.) Come up with a plan in which the diet and exercise combinations are randomly assigned to the mice.
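One common way to turn a table of random numbers into a completely randomized layout is to rank them: number the experimental units 1 to n (for Exercise 2.2, the 15 successive evaluation times), attach one random number to each unit in order, rank the units by their random numbers, and give the r1 lowest-ranked units treatment 1, the next r2 units treatment 2, and so on. The Python sketch below implements that ranking method for illustration only; `crd_layout` is a hypothetical helper, not a GenStat command, and it assumes the random numbers used are distinct.

```python
def crd_layout(treatments, reps, random_numbers):
    """Allocate treatments to units 1..n for a completely randomized design.

    treatments     -- treatment labels, e.g. ["P1", "P2", "P3"]
    reps           -- replications per treatment (may be unequal)
    random_numbers -- one random number per unit, read in order from a table
    Returns a dict mapping unit number (1-based) to its treatment.
    """
    n = sum(reps)
    if len(treatments) != len(reps):
        raise ValueError("need one replication count per treatment")
    if len(random_numbers) < n:
        raise ValueError(f"need at least {n} random numbers")
    if len(set(random_numbers[:n])) != n:
        raise ValueError("random numbers must be distinct so that ranks are unique")

    # Rank the units by their random number (smallest number first).
    units_by_rank = sorted(range(1, n + 1), key=lambda u: random_numbers[u - 1])

    # Ranks 1..n receive labels in standard order: reps[0] copies of the
    # first treatment, then reps[1] copies of the second, and so on.
    labels = [trt for trt, r in zip(treatments, reps) for _ in range(r)]

    return {unit: label for unit, label in zip(units_by_rank, labels)}


# Numbers copied from the Tutorial Two extract (leading zeros dropped, since
# Python integer literals cannot start with 0).
table = [11649, 96283, 1898, 61414, 49174, 12074, 98551, 97366,
         39941, 21225, 90474, 41469, 16812, 28599, 64109, 9497]
layout = crd_layout(["P1", "P2", "P3"], [5, 5, 5], table)
for unit in sorted(layout):
    print(unit, layout[unit])
```

Unequal replication is handled by passing different counts in `reps`; for the diets of Exercise 2.3 the labels would simply be the protein and fat combinations.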


TUTORIAL THREE

Topic: Additional Exercises on Completely Randomized Design

Background: The case of unequal replication is introduced through Exercise 3.1. The within-treatment variability is pooled to form the overall experimental unit variability, which is needed to estimate the experimental error. Once an overall ANOVA has been performed, further analysis is needed to investigate which treatments differ significantly. This is achieved by performing t-tests or by partitioning the treatment degrees of freedom into single degrees of freedom. Each single degree of freedom is associated with a contrast. A contrast is a logical question constructed using certain treatments. Often, orthogonal contrasts are preferred; the term orthogonal means that the contrasts carry non-overlapping information. The concept of orthogonality is introduced in Exercise 3.2.

Remark: The exercises will take you from data entry to the actual analysis using GenStat statistical directives. You can enter the data into an Excel spreadsheet and then copy and paste them via the clipboard into a GenStat spreadsheet. Alternatively, you can open a new GenStat spreadsheet, define the number of rows and columns, and then enter the data.

Exercise 3.1 Consider data from an experiment set up to compare the effects of four levels of thinning on the height growth of Eucalyptus trees. Ten plots per thinning treatment were used. Initially there were 12 trees per plot; after 5 years of growth, 30 randomly selected plots were thinned to 4, 6 and 9 trees per plot, and no thinning took place in the remaining 10 plots (12 trees per plot). After 10 more years, the heights were measured. Some plots were lost through illegal felling of the trees, so the data are unbalanced.

Number of trees per plot (Treatments) 4 6 9 12 42.1 44.8 39.4 34.0 34.0 38.5 37.6 34.4 39.9 38.7 38.1 34.2 43.7 40.8 40.8 35.7 43.5 41.2 38.8 35.3 42.5 44.9 36.5 44.3 38.5 49.4 1.0 37.2

34.6 35.1

Here you are interested in comparing the four treatments, in this case the four levels of thinning (4, 6, 9 and 12 trees per plot). Since the experiment was carried out as a completely randomized design and the response variable is height, a one-way analysis of variance is appropriate. The null hypothesis is that the mean heights for the four levels are all the same, and you want to investigate whether or not this hypothesis can be rejected.

a) Enter your data into a spreadsheet. The data set requires 2 columns and 30 rows: column 1 holds the treatment (call it Thin) and column 2 the response variable, Height. Convert column 1 into a factor after entering the data.


How to get started: once you have logged on to GenStat, click Spread, then New, then Create. Enter 30 as the number of rows and 2 as the number of columns, then click OK. Move the cursor onto C1, click the right mouse button and choose Rename; this allows you to type Thin. Do the same for C2 and type the name Height. You can now start entering the data. The format will be:

Thin    Height
4       42.1
4       34.0
.       .
.       .
.       .
0       34.6
12      35.1

If you chose to enter the data in an Excel spreadsheet, copy them into a GenStat spreadsheet as follows. Highlight the data together with the column names and click Copy. Move to the GenStat window and click Spread, then New, then From Clipboard. This opens the New Spread from Clipboard window. Select the necessary boxes and make sure that the box Column names are in the first row is ticked.

b) Provide an ANOVA outline, giving only the sources of variation and the degrees of freedom.

Source of variation        Degrees of freedom
Total

c) Carry out the analysis as a one-way ANOVA with no blocking. To conduct the analysis, click Stats and then Analysis of Variance. Select General and, in the Design box, select one-way ANOVA (no blocking). Click Height to move it into the Y-Variate box. The treatment structure in this case is Thin; click to move it into the Treatment Structure box. Remember to have converted Thin into a factor. Leave the Block box blank. Record the information from the analysis below.


Source of variation    D.F.    SS    MS    V.R.    F pr
Total

d) Record in the table below the mean, number of replications and standard error associated with each treatment.

Treatment                  4      6      9      12
Treatment mean
Number of replications
Standard error

e) Outline the conclusions you draw from your ANOVA. Remember to state the hypotheses being tested and the level of significance you prefer to use.


f) Perform multiple comparison tests using the least significant difference (LSD) approach to determine which treatments differ significantly at the 5 % significance level. You will need the differences between the treatment means and the standard error of the difference to do this.

g) Check whether the assumptions required for ANOVA, namely normality, independence and constant variance, are satisfied. You will get this information by clicking Further Output after executing the ANOVA and ticking the appropriate boxes.
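The quantities requested in parts (c), (d) and (f) can be cross-checked against the GenStat output with a short calculation. This is a minimal Python sketch, assuming the responses are supplied as a dictionary of lists (one list per treatment, unequal lengths allowed); the names `one_way_anova` and `lsd` are hypothetical. It uses SE(mean) = sqrt(MSE / r_i) and LSD = t x sqrt(MSE(1/r_i + 1/r_j)), with the critical t value on the error degrees of freedom supplied by you from a t table, because the Python standard library has no t distribution.

```python
import math


def one_way_anova(groups):
    """One-way ANOVA for a CRD.

    groups maps each treatment label to a list of responses; replication
    may differ between treatments (unbalanced case).
    """
    all_obs = [y for ys in groups.values() for y in ys]
    n, t = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    means = {k: sum(ys) / len(ys) for k, ys in groups.items()}

    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    ss_trt = sum(len(ys) * (means[k] - grand_mean) ** 2 for k, ys in groups.items())
    ss_err = ss_total - ss_trt            # pooled within-treatment variability

    df_trt, df_err = t - 1, n - t
    ms_trt, ms_err = ss_trt / df_trt, ss_err / df_err
    return {
        "df": (df_trt, df_err, n - 1),    # treatments, residual, total
        "ss": (ss_trt, ss_err, ss_total),
        "ms": (ms_trt, ms_err),
        "F": ms_trt / ms_err,             # the variance ratio (V.R.)
        "means": means,
        # Standard error of each treatment mean: sqrt(MSE / r_i).
        "se_mean": {k: math.sqrt(ms_err / len(ys)) for k, ys in groups.items()},
    }


def lsd(ms_err, r_i, r_j, t_crit):
    """Least significant difference between treatments i and j.

    t_crit is the two-sided critical t value on the residual degrees of
    freedom, e.g. read from a t table at the 5 % level.
    """
    sed = math.sqrt(ms_err * (1 / r_i + 1 / r_j))   # standard error of the difference
    return t_crit * sed
```

For example, call `one_way_anova` with a dictionary keyed "4", "6", "9" and "12" holding the Exercise 3.1 heights; two treatment means differ significantly when the absolute difference between them exceeds `lsd(...)` for that pair. If SciPy is installed, `scipy.stats.t.ppf(0.975, df_error)` returns the two-sided 5 % critical value.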

Exercise 3.2 A nursery experiment was conducted to study the growth performance of Albizia zygia seedlings under different fertilizer treatments. Four treatments were included in the design, which was completely randomized with 10 replicates. Data on plant height (in cm) were recorded after a fixed period of time. The treatments were:

A: one dose of cowdung; B: two doses of cowdung; C: poultry manure; D: control


Fertilizer   Plant height (cm)
A            32.8  33.4  31.2  31.6  29.4  30.3  34.1  31.9  30.5  30.4
B            34.4  28.2  35.1  30.4  27.8  29.2  32.6  31.0  30.0  39.8
C            29.8  30.6  29.1  24.3  31.2  31.6  28.3  28.9  29.2  26.4
D            28.1  30.3  24.2  27.8  25.6  28.1  32.7  26.8  26.9  26.4

a) Enter the data in a spreadsheet and save it as cowdung.gsh. Again you require 2 columns, this time with 40 rows. Name the first column Fertilizer and the second PlantHt. Once you have finished entering the data, convert the treatment column into a factor and save the file.

b) Present an outline of the ANOVA, identifying only the sources of variation and the degrees of freedom.

Source of variation        D. F.
Total

c) Analyse the data as a simple CRD. On the Stats menu, select Analysis of Variance, then General. While in General, scroll down to find Completely Randomized Design. The Y-variate in this case is PlantHt and the treatment is Fertilizer. Record your results below:

Source of variation    D.F.    SS    MS    V.R.    F pr
Total

Remark: Construction of linear contrasts. The four treatments have a structure that allows the construction of orthogonal contrasts. A contrast is a linear function of treatment means whose coefficients sum to zero. For instance, the linear contrast comparing treatments 1 and 2 against treatments 3 and 4 has coefficients 1, 1, -1, -1; adding these coefficients gives zero. For equally replicated treatments, two contrasts are said to be orthogonal if the sum of the cross-products of their coefficients equals zero. Suppose we have another linear contrast, comparing treatment 1 against treatment 2, with coefficients 1, -1, 0, 0. To show that the two linear contrasts are orthogonal, all we need to show is that (1)(1) + (1)(-1) + (-1)(0) + (-1)(0) = 0. In general, with t treatments we can construct t - 1 mutually orthogonal contrasts. Here we have 4 treatments, so we can construct 3 orthogonal contrasts. The three logical pre-planned comparisons in this case are:

1. One dose of cowdung versus two doses of cowdung. The coefficients are: 1, -1, 0, 0.
2. Cowdung manure versus poultry manure. The coefficients are: 0.5, 0.5, -1, 0.
3. Applying versus not applying manure. The coefficients are: 1, 1, 1, -3.
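The arithmetic behind these contrasts can be checked in a few lines. This Python sketch assumes equally replicated treatments, as in Exercise 3.2, and uses the plant heights from the table in this exercise. It verifies that each set of coefficients sums to zero, checks pairwise orthogonality through the sum of cross-products, and computes each single-degree-of-freedom sum of squares as SS = L² / Σ(c_i²/r_i), where L = Σ c_i ȳ_i. The helper names are hypothetical. For mutually orthogonal contrasts with equal replication, the three contrast sums of squares add up to the treatment sum of squares, and rescaling a contrast (1, 1, 1, -3 instead of 1/3, 1/3, 1/3, -1) leaves its sum of squares unchanged.

```python
# Plant heights (cm) from the Exercise 3.2 table; treatment order A, B, C, D.
heights = {
    "A": [32.8, 33.4, 31.2, 31.6, 29.4, 30.3, 34.1, 31.9, 30.5, 30.4],
    "B": [34.4, 28.2, 35.1, 30.4, 27.8, 29.2, 32.6, 31.0, 30.0, 39.8],
    "C": [29.8, 30.6, 29.1, 24.3, 31.2, 31.6, 28.3, 28.9, 29.2, 26.4],
    "D": [28.1, 30.3, 24.2, 27.8, 25.6, 28.1, 32.7, 26.8, 26.9, 26.4],
}

# Coefficients of the three pre-planned comparisons, in the order A, B, C, D.
contrasts = {
    "A vs B": [1, -1, 0, 0],
    "(A+B)/2 - C": [0.5, 0.5, -1, 0],
    "(A+B+C)/3 - D": [1, 1, 1, -3],
}


def is_contrast(c, tol=1e-9):
    """The coefficients of a contrast sum to zero."""
    return abs(sum(c)) < tol


def are_orthogonal(c1, c2, tol=1e-9):
    """With equal replication: orthogonal when the cross-products sum to zero."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < tol


def contrast_ss(c, means, reps):
    """Single-d.f. sum of squares L**2 / sum(c_i**2 / r_i), L = sum(c_i * mean_i)."""
    L = sum(ci * m for ci, m in zip(c, means))
    return L ** 2 / sum(ci ** 2 / r for ci, r in zip(c, reps))


means = [sum(ys) / len(ys) for ys in heights.values()]
reps = [len(ys) for ys in heights.values()]
grand = sum(sum(ys) for ys in heights.values()) / sum(reps)
ss_treatments = sum(r * (m - grand) ** 2 for r, m in zip(reps, means))

names = list(contrasts)
for i, a in enumerate(names):
    print(f"{a}: coefficients sum to zero = {is_contrast(contrasts[a])}")
    for b in names[i + 1:]:
        print(f"  orthogonal to {b} = {are_orthogonal(contrasts[a], contrasts[b])}")

for name, c in contrasts.items():
    print(f"SS[{name}] = {contrast_ss(c, means, reps):.3f}")
print(f"Treatment SS = {ss_treatments:.3f}")
```

Dividing each contrast sum of squares by the residual mean square from the ANOVA gives the variance ratio reported by GenStat for that contrast.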

Any other contrast will not be independent of these three. The construction of the linear contrasts depends on the research questions of interest. More linear contrasts could be constructed to answer other questions, but bear in mind that such questions are no longer independent.

d) Demonstrate how each set of coefficients was obtained and verify that the three comparisons are pairwise orthogonal.

Remark: Testing the contrasts. To test the three comparisons, click Contrasts while in the Completely Randomized Design dialog. Tick the Comparisons box. Click the effect to be analysed, which in this case is Fertilizer, and enter 3 as the number of comparisons to be made. Click OK. This takes you to a matrix in spreadsheet form with three rows and 4 columns. The four columns correspond to the treatments and the rows correspond to the questions (contrasts). Enter the coefficients of each comparison as they are. You can name the rows Contrast1, Contrast2 and Contrast3, or use names that describe the comparisons, for instance A vs B, (A+B)/2 - C and (A+B+C)/3 - D for rows 1, 2 and 3, respectively.

e) List the following information associated with these contrasts from your output.

Contrasts      D.F.    Sum of Squares    Mean Square    V.R.    F pr
Contrast 1
Contrast 2
Contrast 3


f) Use the information in part (e) to answer the following questions using a significance level of 5 %.

i) Is there a difference between using one or two doses of cowdung? That is, A versus B, with coefficients 1, -1, 0, 0.

ii) Do the cowdung treatments give different means compared with poultry manure? That is, (A+B)/2 - C, which gives coefficients 0.5, 0.5, -1, 0.

iii) Is there any difference between applying and not applying fertilizer? That is, (A+B+C)/3 - D, which, multiplied by 3, gives coefficients 1, 1, 1, -3.


g) Provide overall conclusions indicating the treatment you would recommend and why.


TUTORIAL FOUR

Topic: Randomized Complete Block Design

Background: The randomized complete block design (RCBD) is the most widely used design in agricultural experiments, owing to its ability to control inherent variability that runs in one direction. When the experimental units are not homogeneous and the pattern of variability can be characterized, blocking should be applied. The experimental units are grouped into homogeneous groups, referred to as blocks, in such a way that variability within blocks is minimized and variability between blocks is maximized. All the treatments are randomized within each block, using an independent randomization for each block. Blocks should be oriented perpendicular to the variability gradient. Blocks need not be contiguous, and it is possible to have more than one replication within a block. The blocks are assumed to have been drawn from a large population of possible blocks, and for that reason they are considered to be random effects. Interest therefore lies mostly in quantifying the amount of variability accounted for by the blocks rather than in whether a particular block differs significantly from another. The following should be noted with an RCBD:

a) Blocks should be laid perpendicular to the gradient.
b) Blocks need not be contiguous.
c) It is possible to replicate within a block.
d) A block should represent a known source of variation that needs to be controlled in the experiment.
e) All the treatments should be randomized within each block, using an independent randomization in each block.

Even when no obvious natural blocks exist, it is still sensible to define blocks representing major patterns of variation. For instance, in on-farm experiments one may use farmers’ knowledge of the crops grown in the previous season and of fertility patterns within the farming area. Missing data can also occur in an RCBD. An advantage of the design is that the analysis can still be performed if a complete block (replication) is lost.

Remark: Exercise 4.1 tests your understanding of the construction of an RCBD through the randomization process, whereas Exercises 4.2 and 4.3 establish the link between a paired t-test and the RCBD. Each pair acts as a block. However, randomization within the block is not possible, since the structure is one of “before” and “after” treatment application. Nevertheless, the principle of blocking remains that of removing known variability from the experimental error. The purpose of blocking is to reduce the experimental error in order to make the overall test more sensitive to small differences between treatment means.
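Independent randomization within blocks can be sketched with a ranking scheme: use a fresh group of random numbers for each block, attach one number to each treatment, and place the treatments in the block's plots in increasing order of their numbers. The Python helper below is a hypothetical illustration, not the only valid scheme; it assumes one distinct random number per treatment per block, read in sequence from the table, and the demo uses the first 18 numbers of the extract given with Exercise 4.1.

```python
def rcbd_layout(treatments, n_blocks, random_numbers):
    """Randomize all treatments independently within each block of an RCBD.

    A fresh group of len(treatments) random numbers is used for each block:
    treatment k receives the k-th number of its block's group, and treatments
    are placed in plots 1, 2, ... in increasing order of those numbers.
    Returns one list per block giving the treatment in each plot.
    """
    t = len(treatments)
    if len(random_numbers) < t * n_blocks:
        raise ValueError(f"need at least {t * n_blocks} random numbers")
    layout = []
    for b in range(n_blocks):
        group = random_numbers[b * t:(b + 1) * t]     # independent numbers per block
        if len(set(group)) != t:
            raise ValueError(f"tied random numbers in block {b + 1}")
        order = sorted(range(t), key=lambda k: group[k])
        layout.append([treatments[k] for k in order])
    return layout


# First 18 numbers of the Exercise 4.1 extract (leading zeros dropped).
numbers = [74220, 17612, 65522, 3786, 85967, 73152,
           14511, 7483, 51453, 11649, 96283, 1898,
           61414, 49174, 12074, 98551, 97366, 39941]
for b, block in enumerate(rcbd_layout(["A", "B", "C", "D", "E", "F"], 3, numbers), start=1):
    print(f"Block {b}: " + " ".join(block))
```

Because each block draws its own group of numbers, the arrangement in one block tells you nothing about the arrangement in another, which is what independent randomization requires.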


Exercise 4.1 An animal scientist has 6 treatments (A, B, C, D, E and F), laid out in a randomized complete block design (RCBD) using 3 blocks. Consider the following extract from a random number table.

74220 17612 65522 03786 85967 73152 14511 07483 51453 11649 96283 01898 61414 49174 12074 98551 97366 39941 21225 90474 41469 16812 28599 64109 09497 25254 16210 89717 28785 02760 19155

a) Determine the random layout of the field experiment for the scientist. (Show a random layout of the treatments, explaining each step you take.)


b) Give an analysis of variance (ANOVA) table showing only the sources of variation and degrees of freedom.

Source of variation        D. F.
Total

c) Give a mathematical model and state the assumptions associated with it.
d) State the null and alternative hypotheses.

Exercise 4.2 The cooling constants of freshly killed mice and of the same mice reheated to body temperature were determined. Nineteen mice were used in this paired experiment. The data are stored in F:\Users\Biometry\Biom222\micepair.gsh.


a) Analyse the data using a paired t-test (i.e. test the hypothesis of no difference between the population means). On the Stats menu, click Statistical Tests and then select One or Two Sample Tests. In the Test box, select paired t-test. This is a two-sided test of the hypothesis that the mean difference equals zero. List your output in the space below.

b) What are your conclusions?

Exercise 4.3 The same data used in Exercise 4.2 have been re-entered in an RCBD data entry format. The data are stored in F:\Users\Biometry\Biom222\micepair.gsh. On the Stats menu, click Analysis of Variance. Select General and, in the Design box, select one-way ANOVA (in Randomized Blocks). Double-click the response variable in the available data list to move it into the Y-Variate box. Double-click the treatment to move it into the Treatment box and do the same for the block. Remember that both the treatments and the blocks must be factors.

a) Analyse the same data as a randomized complete block design. (Note: there are 2 treatments and 19 blocks.) List the following information from your output.


Source of variation    D.F.    SS    MS    V.R.    F pr
Total

b) What are your conclusions?
c) Verify that the calculated F equals the square of the calculated t (i.e. F = t²).

Exercise 4.4 Seven litters, each of five rats, were used in a randomized complete block design, with litters taken as blocks. The researcher is interested in studying the effects of five different diets on the weight gain of rats. The data are stored in F:\Users\Biometry\Biom222\litter.gsh.

a) Using the extract of random numbers given in Exercise 4.1, demonstrate how you could allocate the five diets within the seven litters, treating each litter as a block.


b) Carry out the analysis of variance to see whether there are any differences between the diets. Click Stats, then Analysis of Variance, and select one-way ANOVA with blocking. Use a 5 % significance level. Summarize the output in the space below.

c) List the treatment means and their standard errors.

Treatment No.    Treatment mean    Standard error

d) Perform an LSD test to determine which treatment means differ at the 5 % significance level.


e) Comment on the validity of the ANOVA assumptions. (You need to obtain the appropriate residual plots by clicking the Further Output option immediately after executing the ANOVA.) Base your comments on the histogram, the half-normal plot and the residual plot obtained from Further Output. Cut and paste these plots in the space provided below.

f) What proportion of the total variation is accounted for by the blocks?
g) Compute the relative efficiency of the RCBD compared with a CRD and explain the gain or loss.
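The RCBD quantities asked for in Exercises 4.3 and 4.4 can be reproduced with a short Python sketch, assuming a complete table with one observation per treatment in each block (no missing plots). `rcbd_anova`, `relative_efficiency` and `paired_t` are hypothetical helper names. The relative efficiency uses one common textbook estimate, RE = [(b - 1)MSB + b(t - 1)MSE] / [(bt - 1)MSE], for b blocks and t treatments; some texts also apply a correction for small error degrees of freedom, which this sketch omits. With only two treatments, the treatment F from `rcbd_anova` equals the square of the paired t statistic, which is the relationship Exercise 4.3 asks you to verify.

```python
import math


def rcbd_anova(data):
    """Two-way additive ANOVA for an RCBD with one observation per plot.

    data[i][j] is the response of treatment i in block j (no missing values).
    """
    t, b = len(data), len(data[0])
    grand = sum(map(sum, data)) / (t * b)
    trt_means = [sum(row) / b for row in data]
    blk_means = [sum(data[i][j] for i in range(t)) / t for j in range(b)]

    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_err = ss_total - ss_trt - ss_blk

    df_trt, df_blk, df_err = t - 1, b - 1, (t - 1) * (b - 1)
    ms = {"blocks": ss_blk / df_blk, "treatments": ss_trt / df_trt,
          "error": ss_err / df_err}
    return {
        "ss": {"blocks": ss_blk, "treatments": ss_trt, "error": ss_err,
               "total": ss_total},
        "df": {"blocks": df_blk, "treatments": df_trt, "error": df_err,
               "total": t * b - 1},
        "ms": ms,
        "F_treatments": ms["treatments"] / ms["error"],
        "block_share": ss_blk / ss_total,   # proportion of total SS due to blocks
    }


def relative_efficiency(ms_blk, ms_err, t, b):
    """Estimated efficiency of the RCBD relative to a CRD (no d.f. correction)."""
    return ((b - 1) * ms_blk + b * (t - 1) * ms_err) / ((b * t - 1) * ms_err)


def paired_t(x, y):
    """Paired t statistic computed from the differences x - y."""
    d = [a - c for a, c in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)
```

For Exercise 4.4, with `res = rcbd_anova(data)` (diet i, litter j), `relative_efficiency(res["ms"]["blocks"], res["ms"]["error"], 5, 7)` gives the estimate for five diets in seven litters, and `res["block_share"]` answers part (f). For Exercise 4.3, compare `rcbd_anova([fresh, reheated])["F_treatments"]` with `paired_t(fresh, reheated) ** 2`, where `fresh` and `reheated` are hypothetical names for the two columns of cooling constants.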


Exercise 4.5

An experiment was carried out to compare the effects of various fungicide treatments on the growth and yield of oilseed rape. Four plots for each of the five treatments were laid out in a randomized complete block design. The treatments (labeled A, B, C, D and E) were:

A: untreated control
B: standard fungicide applied at time 1
C: new fungicide applied at full rate at time 1
D: new fungicide applied at full rate at time 2
E: new fungicide applied at half rate at times 1 and 2

The data are stored in F:\Users\Biometry\Biom222\oilseed.gsh.

a) Carry out the analysis of variance to see whether there are any differences between the fungicide treatments. Click Stats, then Analysis of Variance, and select one-way ANOVA with blocking. Use a 5 % significance level.

b) Fit contrasts to assess the overall difference between the control and the new fungicide, the overall difference between the standard and the new fungicide, and the difference between application times 1 and 2 for the new fungicide. First provide the coefficients associated with these contrasts. You will enter these coefficients into the matrix that appears after you click Contrasts and then select Comparisons.


c) What are your overall conclusions?


TUTORIAL FIVE

Topic: Latin Square Design

Background: In certain situations the variability associated with experimental units runs in two directions. Latin square designs can control such inherent variation. Latin square designs are square: the number of treatments, the number of rows and the number of columns are equal. The treatments are applied in such a way that each treatment appears once in each row and once in each column. Rows, columns and treatments are assumed to be orthogonal, with additive effects. This implies that two-way and three-way interactions between these factors do not exist; any such interactions form part of the error component. Basic plans for these designs are available in most statistics textbooks, such as Cochran and Cox (1957). Using a Latin square design involves selecting a basic plan according to the number of treatments under consideration. The rows of the selected plan are then randomized, followed by randomization of the columns; the randomization of the rows is independent of that of the columns. These designs are commonly used in animal and factory experiments. The designs are normally denoted 3x3 Latin squares, ..., 8x8 Latin squares, etc. In practice, the designs are applicable only to experiments in which the number of treatments is not less than four and not more than eight. For small experiments with fewer than four treatments, the error degrees of freedom are too few, leading to an insensitive design. Similarly, for large experiments with more than eight treatments, it becomes difficult to maintain homogeneity. Usually, the design works well in experiments where treatments are between five and twelve. Where there are fewer than four treatments, multiple Latin squares can be used to increase the error degrees of freedom.

Remarks: Exercise 5.1 tests your ability to randomize a Latin square design. Exercises 5.2 and 5.3 use the same information: Exercise 5.2 assesses your understanding of the computational procedure for obtaining the ANOVA table by direct computation, whereas Exercise 5.3 shows how to obtain the same results using the computer.

Exercise 5.1 A researcher was interested in estimating the effects of five types of feed (F1, F2, F3, F4 and F5) on milk production. She selected five animals of relatively different weights (W1, W2, W3, W4 and W5). Five feeding periods (I, II, III, IV and V) were used. Taking the animals as columns and the periods as rows, the following Latin square plan was selected.

F1 F2 F3 F4 F5
F2 F3 F4 F5 F1
F3 F4 F5 F1 F2
F4 F5 F1 F2 F3
F5 F1 F2 F3 F4

a) Using this plan, demonstrate how the randomization could be done. Explain each step you take.

Use the following extract of random numbers to set up your randomization scheme and be sure to indicate your final plan.

16812 28599 64109 09497 25254 16210 89717
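The randomization steps (rows first, then columns, each with its own random permutation) can be sketched in Python. This is a minimal sketch, not a prescribed procedure: it assumes Python's `random` module with an arbitrary seed in place of the printed random-number table, and the function names are hypothetical.

```python
import random


def cyclic_plan(treatments):
    """Basic Latin square plan: row i is the treatment list shifted left by i places."""
    t = len(treatments)
    return [[treatments[(i + j) % t] for j in range(t)] for i in range(t)]


def randomize_latin_square(plan, rng):
    """Randomize the order of the rows, then (independently) the order of the columns."""
    t = len(plan)
    row_order = rng.sample(range(t), t)  # random permutation of row positions
    col_order = rng.sample(range(t), t)  # independent permutation of column positions
    shuffled_rows = [plan[i] for i in row_order]
    return [[row[j] for j in col_order] for row in shuffled_rows]


def is_latin_square(square):
    """True if every treatment occurs exactly once in each row and each column."""
    symbols = set(square[0])
    t = len(symbols)
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(square) == t and rows_ok and cols_ok


plan = cyclic_plan(["F1", "F2", "F3", "F4", "F5"])
final_plan = randomize_latin_square(plan, random.Random(2006))  # arbitrary seed
for period, row in zip(["I", "II", "III", "IV", "V"], final_plan):
    print(period, " ".join(row))
```

Permuting whole rows and whole columns of a Latin square always gives another Latin square, so the randomized plan still has each feed once per period and once per animal. With the printed table, the two permutations would instead come from ranking the random numbers read off the table, one set for the rows and a separate set for the columns.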


b) Give the analysis of variance (ANOVA) table, showing only the sources of variation and degrees of freedom.

Source of variation    D.F.
Total
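For a single t x t Latin square, rows, columns and treatments each have t - 1 degrees of freedom, the total has t² - 1, and the error takes the remainder, (t - 1)(t - 2). The short Python sketch below (function name hypothetical) tabulates these and shows why squares smaller than 4 x 4 leave very few error degrees of freedom.

```python
def latin_square_df(t):
    """ANOVA degrees of freedom for a single t x t Latin square."""
    df = {
        "Rows": t - 1,
        "Columns": t - 1,
        "Treatments": t - 1,
        "Error": (t - 1) * (t - 2),  # what is left after rows, columns, treatments
        "Total": t * t - 1,
    }
    # The components must add up to the total degrees of freedom.
    assert df["Rows"] + df["Columns"] + df["Treatments"] + df["Error"] == df["Total"]
    return df


for t in range(3, 9):
    print(f"{t}x{t} Latin square: error d.f. = {latin_square_df(t)['Error']}")
```

A 3x3 square leaves only 2 error degrees of freedom and a 4x4 square 6, which is why the background recommends at least four treatments or the use of multiple squares.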


Exercise 5.2

In an experiment to assess the durability of four different types of carpet, four machines were available to simulate the wear arising from daily use. As it was thought that laboratory conditions might differ between the days on which the experiment was run, a Latin square was used. The measurement made was the percentage wear of the carpet; the measurements are given below. The types of carpet are denoted by the letters A to D. The days are the rows, the machines are the columns and the types of carpet are the treatments.

D 38 A 18 C 38 B 39

A 19 D 22 B 26 C 35

B 41 C 54 A 11 D 36

C 61 B 36 D 22 A 16

a) Give the analysis of variance (ANOVA) table, showing only the sources of variation and degrees of freedom.

Source of variation    D.F.
Total

b) Give a mathematical model and state the assumptions associated with it.

c) State the null and alternative hypotheses.


d) Complete the following table.

Treatment No.      A      B      C      D
Treatment total
Treatment mean


e) Compute the sums of squares (SS) for the following components: total, treatments, rows, columns and error.

f) Compute the mean squares (MS) corresponding to the sums of squares in part (e).
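The correction factor method used for parts (e) and (f) can be checked with the Python sketch below, which applies it to the carpet data: CF = G²/t², the SS for rows, columns and treatments is Σ(total²)/t - CF, and the error SS is obtained by subtraction. The function name is hypothetical; use it to check your calculator work, not to replace it.

```python
# Carpet wear data from Exercise 5.2: each row is a day, each
# (treatment, wear) pair sits in the column of one machine.
layout = [
    [("D", 38), ("A", 18), ("C", 38), ("B", 39)],
    [("A", 19), ("D", 22), ("B", 26), ("C", 35)],
    [("B", 41), ("C", 54), ("A", 11), ("D", 36)],
    [("C", 61), ("B", 36), ("D", 22), ("A", 16)],
]


def latin_square_anova(layout):
    """Correction-factor ANOVA for a t x t Latin square given as (treatment, y) pairs."""
    t = len(layout)
    values = [y for row in layout for _, y in row]
    cf = sum(values) ** 2 / t ** 2                      # correction factor G^2 / t^2
    row_totals = [sum(y for _, y in row) for row in layout]
    col_totals = [sum(layout[i][j][1] for i in range(t)) for j in range(t)]
    trt_totals = {}
    for row in layout:
        for trt, y in row:
            trt_totals[trt] = trt_totals.get(trt, 0) + y
    ss = {
        "Total": sum(y * y for y in values) - cf,
        "Rows": sum(x * x for x in row_totals) / t - cf,
        "Columns": sum(x * x for x in col_totals) / t - cf,
        "Treatments": sum(x * x for x in trt_totals.values()) / t - cf,
    }
    ss["Error"] = ss["Total"] - ss["Rows"] - ss["Columns"] - ss["Treatments"]
    df = {"Rows": t - 1, "Columns": t - 1, "Treatments": t - 1,
          "Error": (t - 1) * (t - 2), "Total": t * t - 1}
    ms = {k: ss[k] / df[k] for k in ("Rows", "Columns", "Treatments", "Error")}
    return ss, df, ms, trt_totals


ss, df, ms, trt_totals = latin_square_anova(layout)
for source in ("Rows", "Columns", "Treatments", "Error"):
    print(f"{source:<11}{df[source]:>3}{ss[source]:>10.2f}{ms[source]:>10.2f}")
print(f"{'Total':<11}{df['Total']:>3}{ss['Total']:>10.2f}")
print("Treatment V.R. =", round(ms["Treatments"] / ms["Error"], 2))
```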


g) Present the analysis of variance (ANOVA) table.

Source of variation    D.F.    SS    MS    V.R.    F pr
Total

h) State the null hypothesis for testing the equality of the treatment means, and test the hypothesis at the 5 % level of significance.

i) Compute the standard error of the difference between two treatment means.


j) Compute the least significant difference (LSD) at the 5 % level and determine which treatments are significantly different.
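Because each treatment appears once in every row, each treatment mean in a t x t Latin square is based on r = t plots, so the standard error of a difference between two treatment means is s.e.d. = sqrt(2 × MSE / r) and LSD = t(α/2, error d.f.) × s.e.d. The Python sketch below (hypothetical names) takes the tabulated t value as an input rather than computing it; 2.447 is the two-sided 5 % point of Student's t with 6 d.f., the error d.f. of a 4 x 4 square. The means and error mean square are those obtained from the carpet data by the correction factor method; substitute your own values to check them.

```python
import math
from itertools import combinations


def sed(mse, r):
    """Standard error of the difference between two means, each based on r plots."""
    return math.sqrt(2 * mse / r)


def lsd_table(means, mse, r, t_crit):
    """LSD = t_crit * s.e.d.; flag every pair of means whose difference exceeds it."""
    lsd = t_crit * sed(mse, r)
    pairs = {}
    for a, b in combinations(sorted(means), 2):
        diff = means[a] - means[b]
        pairs[(a, b)] = (diff, abs(diff) > lsd)
    return lsd, pairs


# Carpet treatment means, error MS = 149 / 6 (6 d.f.) and r = 4 plots per treatment.
means = {"A": 16.0, "B": 35.5, "C": 47.0, "D": 29.5}
lsd, pairs = lsd_table(means, mse=149 / 6, r=4, t_crit=2.447)
print(f"s.e.d. = {sed(149 / 6, 4):.3f}, LSD(5 %) = {lsd:.3f}")
for (a, b), (diff, significant) in pairs.items():
    print(f"{a} - {b}: {diff:+6.2f}  {'significant' if significant else 'not significant'}")
```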

k) What are your overall conclusions based on the results obtained in part (j)?


l) Construct individual 95 % confidence intervals for the differences between treatment means and use these intervals to perform tests of significance.

m) What are your overall conclusions based on the results obtained in part (l)?


Exercise 5.3

This exercise uses the data presented in Exercise 5.2, but now the actual identities of the four treatments are known. The data have been entered into a spreadsheet and stored in F:\Users\Biometry\Biom222\carpet.gsh. Open the data file. Consider the following information on the carpet types:

A – Local material.
B – Imported material.
C – Local plus imported material (60 %).
D – Local plus imported material (40 %).

a) Carry out the analysis of variance to determine whether the four treatments differ significantly at the 5 % level of significance. The analysis can be conducted in two ways: using either the standard Latin square design or General Analysis of Variance.

How to use the standard Latin square: On the Stats menu, click on Analysis of variance. Select General, then scroll down to find Latin Square. Remember that rows, columns and treatments must be factors; if they are not, go back to the spreadsheet and convert them. Click to move the factors and the variate to the appropriate dialogue boxes.

How to use General Analysis of Variance: On the Stats menu, click on Analysis of variance. Select General, then General Analysis of Variance. Click to move the variate to the Y-variate dialogue box. Click to move the treatment into the treatment structure dialogue box. In the block structure dialogue box, type Row*Column. This will generate the row, column and row by column interaction components.

Present the following information from your output in the table below.

Source of variation    D.F.    SS    MS    V.R.    F pr
Total

b) Construct three meaningful questions (linear contrasts) and test them at the 5 % significance level. The contrasts should be based on the structure of the treatments. For instance, the researcher might be interested in testing whether the imported material is any better than the local material.
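A linear contrast with coefficients c_i (summing to zero) applied to treatment totals T_i, each based on r plots, has a single-degree-of-freedom sum of squares (Σ c_i T_i)² / (r Σ c_i²). The Python sketch below uses one possible set of three contrasts for the carpet types (imported vs local, mixtures vs pure materials, 60 % vs 40 % mixture); these are illustrative choices, not the required answer, and the helper names are hypothetical.

```python
# Treatment totals (each over r = 4 plots) from the carpet data of Exercise 5.2.
totals = {"A": 64, "B": 142, "C": 188, "D": 118}
r = 4

# One possible set of contrasts, coefficients given per carpet type.
contrasts = {
    "Imported vs local (B vs A)": {"A": -1, "B": 1, "C": 0, "D": 0},
    "Mixtures vs pure (C, D vs A, B)": {"A": -1, "B": -1, "C": 1, "D": 1},
    "60 % vs 40 % mixture (C vs D)": {"A": 0, "B": 0, "C": 1, "D": -1},
}


def contrast_ss(coefs, totals, r):
    """Single-d.f. sum of squares: (sum c_i T_i)^2 / (r * sum c_i^2)."""
    if sum(coefs.values()) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(coefs[k] * totals[k] for k in totals)
    return estimate ** 2 / (r * sum(c * c for c in coefs.values()))


def orthogonal(c1, c2):
    """With equal replication, contrasts are orthogonal if sum c1_i * c2_i = 0."""
    return sum(c1[k] * c2[k] for k in c1) == 0


names = list(contrasts)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} / {b}: orthogonal = {orthogonal(contrasts[a], contrasts[b])}")
for name, coefs in contrasts.items():
    print(f"{name}: SS = {contrast_ss(coefs, totals, r):.2f}")
```

A complete set of t - 1 mutually orthogonal contrasts partitions the treatment sum of squares exactly, and each contrast is tested against the error mean square with 1 and 6 degrees of freedom.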


Once you have constructed your three linear contrasts, repeat the analysis, but this time click on Contrasts, indicate that there are three, select the treatment as the effect and then click on the Comparison box. This takes you to a small spreadsheet where you enter the coefficients corresponding to the three questions.

List the following information associated with these contrasts from your output.

Contrasts     D.F.    Sum of Squares    Mean Square    V.R.    F pr
Contrast 1
Contrast 2
Contrast 3

c) Comment on the ANOVA assumptions using the plots obtained through the Further Output option. Click on the residual plots to get the plots. Each plot checks one or two of the assumptions made about the residuals (normality, independence and constant variance).

d) Suppose these assumptions are violated. Transform the data using the logarithm function and repeat the analysis, using the transformed data as the response variable. To transform the data, select Transformation on the Data menu. Scroll down to select the appropriate transformation function, in this case the Log function. Click on the variable to be transformed and provide a new name for the transformed data. Choose the option that displays the transformed data in the original spreadsheet containing the data. Present your output, including the contrasts, in the table below.


Source of variation    D.F.    SS    MS    V.R.    F pr
Total

e) What are your final conclusions?

Exercise 5.4

An ornamental horticulturist conducted a fertilizer experiment in a greenhouse in which 5 fertilizer treatments (A, B, C, D and E) were tested by arranging the plants in a Latin square design. The rows and columns in the table are thus rows and columns in the greenhouse. The data below show the yields from the experiment.

A 22  B 23  C 19  D 12  E 14
B 20  C 13  D 16  E 19  A 18
C 14  D 10  E 12  A 26  B 23
D 19  E 18  A 20  B 18  C 14
E 15  A 24  B 20  C 17  D 10


a) Write down the mathematical model for this design, assuming rows and columns to be random effects.

b) List the parameters to be estimated from the model stated in part (a).

c) List the necessary assumptions for the model stated in part (a).

d) State the null hypotheses to test the:

i) Fertilizer effects (consider the effects fixed),

ii) Row effects (consider the effects random),


iii) Column effects (consider the effects random).

e) Enter the data into a spreadsheet and save it as hort.gsh. (You will require 4 columns and 25 rows, where fertilizer, row and column are converted to factors and yield is a variate.)

f) Conduct the analysis of variance and present your output in the table below.

Source of variation    D.F.    SS    MS    V.R.    F pr
Total

g) Using a 1 % level of significance, determine whether the mean yields are equal for the 5 fertilizers.


h) Suppose the answer in part (g) is that they are different. Use a least significant difference (LSD) test at the 1 % level of significance to determine which means differ.

i) What proportion of the total variation is accounted for by columns?

j) Comment on the validity of some of the assumptions stated in part (c), using the plots obtained through the Further Output option.


k) Compute the relative efficiency of the Latin square design over an RCBD (with rows as blocks) and interpret the value you obtain.
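The sketch below computes the Latin square ANOVA for the fertilizer data and then two quantities asked for in parts (i) and (k). It assumes the usual estimate of relative efficiency over an RCBD with rows as blocks, RE = [E_c + (t - 1)E_e] / (t E_e), where E_c is the column mean square and E_e the Latin square error mean square; some texts also apply a correction for small error degrees of freedom, which is omitted here. The proportion due to columns is taken simply as SS(columns)/SS(total), one of several possible measures. Names are hypothetical.

```python
# Yields from Exercise 5.4: each row is a greenhouse row, each
# (fertilizer, yield) pair sits in one greenhouse column.
layout = [
    [("A", 22), ("B", 23), ("C", 19), ("D", 12), ("E", 14)],
    [("B", 20), ("C", 13), ("D", 16), ("E", 19), ("A", 18)],
    [("C", 14), ("D", 10), ("E", 12), ("A", 26), ("B", 23)],
    [("D", 19), ("E", 18), ("A", 20), ("B", 18), ("C", 14)],
    [("E", 15), ("A", 24), ("B", 20), ("C", 17), ("D", 10)],
]

t = len(layout)
values = [y for row in layout for _, y in row]
cf = sum(values) ** 2 / t ** 2          # correction factor


def ss_from_totals(totals):
    """SS for a classification whose totals are each based on t plots."""
    return sum(x * x for x in totals) / t - cf


row_totals = [sum(y for _, y in row) for row in layout]
col_totals = [sum(layout[i][j][1] for i in range(t)) for j in range(t)]
trt_totals = {}
for row in layout:
    for fert, y in row:
        trt_totals[fert] = trt_totals.get(fert, 0) + y

ss_total = sum(y * y for y in values) - cf
ss_rows = ss_from_totals(row_totals)
ss_cols = ss_from_totals(col_totals)
ss_trt = ss_from_totals(trt_totals.values())
ss_error = ss_total - ss_rows - ss_cols - ss_trt

ms_cols = ss_cols / (t - 1)                  # E_c
ms_error = ss_error / ((t - 1) * (t - 2))    # E_e

# Relative efficiency over an RCBD using rows as blocks: column variation
# would then be left in the RCBD error.
re_rows_as_blocks = (ms_cols + (t - 1) * ms_error) / (t * ms_error)

print(f"Proportion of total SS due to columns: {ss_cols / ss_total:.3f}")
print(f"Relative efficiency (rows as blocks): {re_rows_as_blocks:.3f}")
```

A value of RE above 1 indicates that the column grouping improved precision relative to blocking on rows alone; a value below 1 suggests that it did not.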

l) What are your overall conclusions in terms of the best fertilizer treatment and the effectiveness of the design?


TUTORIAL SIX

Topic: Split-Plot Design

Background

A split-plot design involves a two-factor (or higher-order) treatment structure, an incomplete block design structure and at least two different sizes of experimental unit. The simplest split-plot design involves two factors: the levels of one factor are randomly assigned to the larger units (whole-plots), and the levels of the other factor are randomly assigned to the smaller units (sub-plots) within each whole-plot. The factor to be measured with higher precision is therefore applied to the smaller experimental units, and the factor requiring less precision to the larger units. The interaction is also measured with higher precision, because it is estimated at the sub-plot level.

The whole-plot treatments can be arranged in any design structure, depending on the nature of the experimental units. For instance, if a CRD is used, the whole-plot treatments are randomly assigned to the whole-plots. Each whole-plot is then subdivided into smaller units called sub-plots. The whole-plot acts as a block with respect to the sub-plot treatments, and all the randomization procedures for the RCBD apply; that is, an independent randomization is used for each whole-plot. Similarly, if the experimental units are first grouped into blocks such that variability within blocks is minimized and variability between blocks is maximized, then the whole-plot design structure is an RCBD. The whole-plot factor levels are randomized within each block, using a new randomization for each block. Again, the whole-plots act as blocks for the sub-plot factor levels.

The choice of a split-plot design depends on how practical it is to apply the treatments, for example applying fertilizer to whole-plots and varieties to sub-plots. Because there are two sizes of experimental unit, there are two experimental errors, referred to here as error (a) and error (b). In the layout, the whole-plot treatments are first randomly assigned to the whole-plots, and the sub-plot treatments are then randomly assigned within each whole-plot.

Note that, compared with an RCBD, the experimental error is split into error (a) and error (b) when a split-plot design is used. There must be a reason for using a split-plot design: it should not be used when there is no reason why one factor should be assessed with higher precision than the other. Another reason for using a split-plot design is that one factor requires a larger experimental plot than the other for management reasons; for instance, irrigation or the use of a tractor may require larger plots.

Remark: Exercise 6.1 tests your ability to randomize the factor levels using a random number table in a split-plot layout, with the whole-plot factor arranged in an RCBD.

Exercise 6.1 An experiment was conducted to determine the yield performance of three oat varieties under several levels of nitrogen. The varieties (V1 – Marvellous, V2 – Victory and V3 – Golden rain) were the whole-plot treatments and the nitrogen levels (N0 – 0 cwt, N1 – 0.2 cwt, N2 – 0.4 cwt and N3 – 0.6 cwt) were the sub-plot treatments. The whole-plot treatments were replicated 3 times (Blocks I to III). The total number of experimental units required is (3 variety levels) x (4 nitrogen levels) x (3 replications) = 36.

Consider the following extract from a random number table.

74220 17612 65522 03786 85967 73152 14511 07483 51453 11649 96283 01898 61414 49174 12074 98551 97366 39941 21225 90474 41469 16812 28599 64109 09497 25254 16210 89717 28785 02760 19155


a) Determine the random layout of the field experiment. (Show a random layout of the treatments in the plots provided below and explain each step you take.) First randomize the varieties within each block, then randomize the nitrogen levels within each whole-plot; the whole-plot acts as a block for the sub-plot treatments. Use the notations V1–V3 and N0–N3 to show the layout.
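The two-stage randomization (varieties within each block, then nitrogen levels within each whole-plot, with a fresh permutation every time) can be sketched in Python. A pseudo-random generator with an arbitrary seed stands in for the random number table, and the function name is hypothetical.

```python
import random


def split_plot_layout(blocks, whole_plot_levels, sub_plot_levels, rng):
    """Randomize whole-plot levels within each block, then sub-plot levels
    within each whole-plot, using a new permutation every time."""
    layout = {}
    for block in blocks:
        whole_plots = rng.sample(whole_plot_levels, len(whole_plot_levels))
        layout[block] = [
            (wp, rng.sample(sub_plot_levels, len(sub_plot_levels)))
            for wp in whole_plots
        ]
    return layout


varieties = ["V1", "V2", "V3"]
nitrogen = ["N0", "N1", "N2", "N3"]
layout = split_plot_layout(["I", "II", "III"], varieties, nitrogen, random.Random(6))
for block, whole_plots in layout.items():
    print("Block", block)
    for variety, subplots in whole_plots:
        print("   ", variety, ":", " ".join(subplots))
```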

Block I

Block II

Block III

Outline the steps you have taken in the randomization process in the space provided below.


b) Give the analysis of variance (ANOVA) table, showing only the sources of variation and degrees of freedom.

Source of variation    D.F.
Error (a)
Error (b)
Total                  35
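For a split-plot with the whole-plot factor A (a levels) in an RCBD with r blocks and b sub-plot levels, the degrees of freedom split into a whole-plot stratum (blocks, A, error (a)) and a sub-plot stratum (B, A x B, error (b)). The Python sketch below (hypothetical function name) computes them and checks that they add up to rab - 1.

```python
def split_plot_df(r, a, b):
    """d.f. for a split-plot: whole-plot factor A (a levels) in an RCBD with r blocks,
    sub-plot factor B (b levels) randomized within each whole-plot."""
    df = {
        "Blocks": r - 1,
        "A (whole-plot)": a - 1,
        "Error (a)": (r - 1) * (a - 1),      # blocks x A
        "B (sub-plot)": b - 1,
        "A x B": (a - 1) * (b - 1),
        "Error (b)": a * (r - 1) * (b - 1),  # blocks x B plus blocks x A x B
        "Total": r * a * b - 1,
    }
    parts = sum(v for k, v in df.items() if k != "Total")
    assert parts == df["Total"], "components must add up to the total"
    return df


# Exercise 6.1: r = 3 blocks, a = 3 varieties, b = 4 nitrogen levels.
for source, d in split_plot_df(3, 3, 4).items():
    print(f"{source:<16}{d:>3}")
```

The same function covers Exercise 6.3 with r = 4 carcasses, a = 2 chemicals and b = 3 temperatures.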


Exercise 6.2 Refer to the data stored in F:\Users\Biometry\Biom222\golden.gsh, which relate to Exercise 6.1, and answer the following questions.

a) Carry out the analysis to test the effects of variety, nitrogen and the variety by nitrogen interaction at the 5 % significance level. Once you have opened the data, display them in a spreadsheet and proceed with the analysis as follows. On the Stats menu, select Analysis of Variance, click on General and then Split Plot Design. Double click to move the variate and factors to the appropriate dialogue boxes in order to obtain the initial ANOVA. Remember to click on Further Output to get the residual plots required to check the assumptions. To get estimates of the various variance components, click on Options and ensure that the stratum variances option is selected. Present the ANOVA output in the table below.

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Total

b) Attempt to obtain the same results as in (a) by using the General Analysis of Variance directives. On the Stats menu, click on Analysis of variance. Select General, then General Analysis of Variance. Click to move the variate to the Y-variate dialogue box. Click to move the treatments into the treatment structure dialogue box as Variety*Nitrogen. This command will generate Variety + Nitrogen + Variety.Nitrogen. In the block structure dialogue box, type Block/Variety. This will automatically generate Block + Block.Variety. Note that Block.Variety is our Error (a). We do not need to specify Error (b), since it is generated automatically.


c) List the estimated variance components in the table below.

Name of variance component    Estimate of variance component    Percentage
Block
Block*Variety
Residual
Total

d) Where is most of the variability? Interpret what this implies with respect to the overall design.

e) Were the blocks effective in controlling variability? Why or why not?
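For balanced data, the variance components can be estimated by equating the stratum mean squares to their expectations: E[MS(blocks)] = σ²_e + bσ²_w + abσ²_block, E[MS(error (a))] = σ²_e + bσ²_w and E[MS(error (b))] = σ²_e, where a and b are the numbers of whole-plot and sub-plot levels. The Python sketch below (hypothetical function name) solves these equations and expresses each component as a percentage of their sum. The mean squares in the example are made up for illustration; substitute those from your own output.

```python
def split_plot_variance_components(ms_blocks, ms_error_a, ms_error_b, a, b):
    """Method-of-moments (ANOVA) variance components for a balanced split-plot RCBD.

    Solves E[MS blocks]    = s2_e + b*s2_w + a*b*s2_block,
           E[MS error (a)] = s2_e + b*s2_w,
           E[MS error (b)] = s2_e,
    where a = whole-plot levels and b = sub-plot levels.
    A negative estimate suggests the component is negligible.
    """
    comps = {
        "Block": (ms_blocks - ms_error_a) / (a * b),
        "Block*Variety": (ms_error_a - ms_error_b) / b,
        "Residual": ms_error_b,
    }
    total = sum(comps.values())
    return {name: (value, 100 * value / total) for name, value in comps.items()}


# Made-up mean squares for illustration only (a = 3 varieties, b = 4 nitrogen levels).
result = split_plot_variance_components(ms_blocks=120.0, ms_error_a=40.0,
                                        ms_error_b=10.0, a=3, b=4)
for name, (estimate, percent) in result.items():
    print(f"{name:<14}{estimate:>8.3f}{percent:>8.1f} %")
```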


Exercise 6.3 In an experiment to study the effect of two meat-tenderizing chemicals, the two (back) legs were taken from each of four beef carcasses; one leg was treated with chemical 1 and the other with chemical 2. Three sections were then cut from each leg and allocated at random to three cooking temperatures, all 24 sections (4 carcasses × 2 legs × 3 sections) being cooked in separate ovens. The table below shows the force required to break a strip of meat taken from each of the cooked sections.

                      Leg 1                      Leg 2
Carcass  Section  Chemical  Temp  Force     Chemical  Temp  Force
1        1        1         2     5.5       2         3     6.3
1        2        1         3     6.5       2         1     3.5
1        3        1         1     4.3       2         2     4.8
2        1        2         1     3.2       1         3     6.2
2        2        2         3     6.0       1         2     5.0
2        3        2         2     4.7       1         1     4.0
3        1        2         1     2.6       1         2     4.6
3        2        2         2     4.3       1         1     3.8
3        3        2         3     5.6       1         3     5.8
4        1        1         3     5.7       2         2     4.1
4        2        1         1     3.7       2         3     5.9
4        3        1         2     4.9       2         1     2.9

Consider chemical as the whole-plot treatment and temperature as the sub-plot treatment. The whole-plot treatment was laid out in a randomised complete block design, with the carcasses as blocks.

a) Provide a schematic sketch of how the treatments are applied.


b) Is it possible to separate the effect of legs from that of the chemical? Why or why not?

c) Give an outline of the ANOVA, identifying the sources of variation and degrees of freedom.

Source of variation    Degrees of freedom
Error (a)
Error (b)
Total

d) Enter the data into a spreadsheet and save it as carcass.gsh. Note: you require 4 columns (Carcass, Chemical, Temp and Force) and 24 rows.

e) Carry out a detailed analysis and summarise your output in the following table.

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Total
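The split-plot analysis can also be computed from totals, which is a useful check on the GenStat output. The Python sketch below (hypothetical names, balanced data assumed) rearranges the Exercise 6.3 data by carcass, chemical and temperature, obtains each sum of squares from group totals, takes error (a) as the block x chemical interaction and obtains error (b) by subtraction.

```python
# Force data from Exercise 6.3 rearranged as
# force[carcass][chemical] = [temp 1, temp 2, temp 3].
force = {
    1: {1: [4.3, 5.5, 6.5], 2: [3.5, 4.8, 6.3]},
    2: {1: [4.0, 5.0, 6.2], 2: [3.2, 4.7, 6.0]},
    3: {1: [3.8, 4.6, 5.8], 2: [2.6, 4.3, 5.6]},
    4: {1: [3.7, 4.9, 5.7], 2: [2.9, 4.1, 5.9]},
}


def split_plot_anova(y):
    """SS and d.f. for a balanced split-plot: blocks x whole-plot (A) x sub-plot (B)."""
    blocks = sorted(y)
    a_levels = sorted(y[blocks[0]])
    r, a, b = len(blocks), len(a_levels), len(y[blocks[0]][a_levels[0]])
    obs = [(k, i, j, y[k][i][j]) for k in blocks for i in a_levels for j in range(b)]
    values = [v for _, _, _, v in obs]
    cf = sum(values) ** 2 / len(values)

    def between(key):
        """SS between the groups defined by key(block, a, b), minus the CF."""
        totals = {}
        for k, i, j, v in obs:
            totals[key(k, i, j)] = totals.get(key(k, i, j), 0.0) + v
        per_group = len(values) / len(totals)
        return sum(x * x for x in totals.values()) / per_group - cf

    ss = {"Blocks": between(lambda k, i, j: k),
          "A": between(lambda k, i, j: i)}
    ss["Error (a)"] = between(lambda k, i, j: (k, i)) - ss["Blocks"] - ss["A"]
    ss["B"] = between(lambda k, i, j: j)
    ss["A x B"] = between(lambda k, i, j: (i, j)) - ss["A"] - ss["B"]
    ss["Total"] = sum(v * v for v in values) - cf
    ss["Error (b)"] = ss["Total"] - sum(v for name, v in ss.items() if name != "Total")
    df = {"Blocks": r - 1, "A": a - 1, "Error (a)": (r - 1) * (a - 1),
          "B": b - 1, "A x B": (a - 1) * (b - 1),
          "Error (b)": a * (r - 1) * (b - 1), "Total": r * a * b - 1}
    return ss, df


ss, df = split_plot_anova(force)
ms = {name: ss[name] / df[name] for name in ss if name != "Total"}
# Chemicals (A) are tested against error (a); temperature (B) and A x B against error (b).
vr = {"A": ms["A"] / ms["Error (a)"],
      "B": ms["B"] / ms["Error (b)"],
      "A x B": ms["A x B"] / ms["Error (b)"]}
for name in ["Blocks", "A", "Error (a)", "B", "A x B", "Error (b)", "Total"]:
    line = f"{name:<10}{df[name]:>3}{ss[name]:>9.4f}"
    if name in vr:
        line += f"   V.R. = {vr[name]:.2f}"
    print(line)
```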


f) Obtain the two-way table of the chemical and temperature means and the associated standard errors.

g) Illustrate how the standard errors given in part (f) are computed.

h) Present your overall conclusions, which could be used to write the final report.
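For a balanced split-plot in an RCBD, the commonly used standard errors of differences depend on which means are compared, because whole-plot and sub-plot comparisons involve different error terms. The Python sketch below (hypothetical function name) implements these formulas; the mean squares in the example are made up, so substitute those from your own ANOVA.

```python
import math


def split_plot_seds(ms_error_a, ms_error_b, r, a, b):
    """Standard errors of differences between means in a balanced split-plot RCBD.

    r = number of blocks, a = whole-plot levels, b = sub-plot levels.
    """
    return {
        # Two whole-plot (A) means, each averaged over r*b plots.
        "two A means": math.sqrt(2 * ms_error_a / (r * b)),
        # Two sub-plot (B) means, each averaged over r*a plots.
        "two B means": math.sqrt(2 * ms_error_b / (r * a)),
        # Two B means at the same level of A (a comparison within whole-plots).
        "two B means, same A": math.sqrt(2 * ms_error_b / r),
        # Two A means at the same or different levels of B: combines both errors.
        "two A means, same B": math.sqrt(2 * ((b - 1) * ms_error_b + ms_error_a) / (r * b)),
    }


# Made-up error mean squares; Exercise 6.3 has r = 4, a = 2 chemicals, b = 3 temperatures.
for name, value in split_plot_seds(ms_error_a=0.30, ms_error_b=0.10, r=4, a=2, b=3).items():
    print(f"s.e.d. for {name}: {value:.4f}")
```

The last formula mixes the two error terms, so a test based on it needs a weighted t value rather than the t value for either error on its own.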


TUTORIAL SEVEN

Topic: Split-Split-Plot Design

Background

A split-split-plot design arises when more than two sizes of experimental unit are used. Three sizes of experimental unit require three stages of randomisation, which correspond to three sources of experimental error, say errors (a), (b) and (c). The decision on which treatment goes to which size of plot is determined by the required precision as well as by plot management. Note that the whole-plot treatments can be arranged in any design structure, such as a CRD, RCBD or Latin square. The split-split-plot layout can be extended to further levels of splitting; each additional level adds one more size of experimental unit and one more error term. It should also be noted that some of the standard errors from such an analysis may be incorrect, especially for unbalanced data or data with missing values, in which case the test statistics for comparing means and the confidence intervals are inappropriate. It is therefore recommended that the REML procedure be applied, because it provides correct standard errors and hence appropriate test statistics and confidence intervals.

Exercise 7.1 Consider an experiment on the grain yields of three rice varieties grown under three management practices and five nitrogen levels. The experiment was carried out in a split-split-plot layout with nitrogen as the whole-plot factor, management practice as the sub-plot factor and variety as the sub-sub-plot factor; the whole-plot treatments were arranged in an RCBD with three replications. The data are stored in F:\Users\Biometry\Biom222\spltspltplot.gsh.

a) Provide a schematic sketch of how the treatments are applied.


b) Write down the mathematical model for this design and state the assumptions.

c) Attempt to present a sketch of the format of the ANOVA table, giving only the sources of variation and degrees of freedom.

Source of variation    Degrees of freedom
Error (a)
Error (b)
Error (c)
Total

d) Carry out the analysis using the standard split-split-plot design available from the analysis of variance option.
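The degrees of freedom for a split-split-plot with the whole-plot factor in an RCBD follow the same pattern as for the split-plot, with a third stratum for the sub-sub-plot factor. In the Python sketch below (hypothetical function name), a, b and c are the numbers of whole-plot, sub-plot and sub-sub-plot levels and r is the number of replications.

```python
def split_split_plot_df(r, a, b, c):
    """Degrees of freedom for a split-split-plot with the whole-plot factor in an RCBD."""
    df = {
        "Replications": r - 1,
        "A": a - 1,
        "Error (a)": (r - 1) * (a - 1),
        "B": b - 1,
        "A x B": (a - 1) * (b - 1),
        "Error (b)": a * (r - 1) * (b - 1),
        "C": c - 1,
        "A x C": (a - 1) * (c - 1),
        "B x C": (b - 1) * (c - 1),
        "A x B x C": (a - 1) * (b - 1) * (c - 1),
        "Error (c)": a * b * (r - 1) * (c - 1),
        "Total": r * a * b * c - 1,
    }
    # Every stratum must be accounted for exactly once.
    assert sum(v for k, v in df.items() if k != "Total") == df["Total"]
    return df


# Exercise 7.1: r = 3, a = 5 nitrogen levels, b = 3 management practices, c = 3 varieties.
for source, d in split_split_plot_df(3, 5, 3, 3).items():
    print(f"{source:<14}{d:>4}")
```

For Exercise 7.3 the same function applies with r = 4, a = 2 hybrids, b = 2 row spacings and c = 3 plant densities.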


Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Error (c)
Total

e) Conduct the tests of hypotheses at the 5 % significance level and draw the necessary conclusions.


f) Present the various two-way tables of treatment means and the associated standard errors.

g) Indicate the proportions of the total variation accounted for by each of the random components.


Exercise 7.2

Use the General Analysis of Variance option to obtain results similar to those obtained using the standard split-split-plot design in Exercise 7.1 part (d).

Note

Block/Nitrogen/Management/Variety generates: Block + Block.Nitrogen + Block.Nitrogen.Management + Block.Nitrogen.Management.Variety

Variety*Management*Nitrogen generates: Variety + Management + Variety.Management + Nitrogen + Variety.Nitrogen + Management.Nitrogen + Variety.Management.Nitrogen

Recall that order matters: in the block structure the factors are listed from the largest experimental unit (whole-plot) to the smallest (sub-sub-plot).
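The two operators follow simple rules: A/B/C (nesting) generates A + A.B + A.B.C, while A*B*C (crossing) generates every main effect and interaction in the order A + B + A.B + C + A.C + B.C + A.B.C. The Python sketch below (hypothetical helper names; it only mimics the expansion and is not part of GenStat) makes the rules explicit.

```python
def nested_terms(factors):
    """Expand A/B/C into A + A.B + A.B.C (each factor nested in the ones before it)."""
    return [".".join(factors[: i + 1]) for i in range(len(factors))]


def crossed_terms(factors):
    """Expand A*B*C into A + B + A.B + C + A.C + B.C + A.B.C, in that order."""
    terms = []
    for f in factors:
        # Keep the existing terms, add the new main effect, then its
        # interactions with every existing term.
        terms = terms + [(f,)] + [t + (f,) for t in terms]
    return [".".join(t) for t in terms]


print(" + ".join(nested_terms(["Block", "Nitrogen", "Management", "Variety"])))
print(" + ".join(crossed_terms(["Variety", "Management", "Nitrogen"])))
```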

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Error (c)
Total


Exercise 7.3 A study was conducted to determine the influence of plant density and hybrid on maize yield. The experiment was a 2x2x3 factorial replicated 4 times in a randomized complete block design arranged in a split-split-plot layout. In this experiment, factor A is the 2 maize hybrids (P3730 and B70xLH55), assigned to the main plots; factor B is the 2 row spacings (12 and 25 inches), assigned to the subplots; and factor C is the 3 target plant densities (12 000, 16 000 and 20 000 plants per acre), assigned to the sub-subplots. The data are presented below:

Grain yield (bushels per acre)

                                                      Replications
Hybrid     Row spacing (in)   Plant density ('000)   I     II    III   IV
P3730      12                 12                      140   138   130   142
P3730      12                 16                      145   146   150   147
P3730      12                 20                      150   149   146   150
P3730      25                 12                      136   132   134   138
P3730      25                 16                      140   134   136   140
P3730      25                 20                      145   138   138   142
B70xLH55   12                 12                      142   132   128   140
B70xLH55   12                 16                      146   136   140   141
B70xLH55   12                 20                      148   140   142   140
B70xLH55   25                 12                      132   130   136   134
B70xLH55   25                 16                      138   132   130   132
B70xLH55   25                 20                      140   134   130   136

a) Enter the data into a spreadsheet and save it as maizeyld.gsh. Hint: label the columns Rep, Hybrid, RowSp, PlantD and Yield.

b) Attempt to present a sketch of the format of the ANOVA table, giving only the sources of variation and degrees of freedom.

Source of variation    Degrees of freedom
Error (a)
Error (b)
Error (c)
Total


c) Carry out the analysis using the standard split-split-plot design available from the analysis of variance option.

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Error (c)
Total

d) Test the various hypotheses for the fixed effects and state your conclusions.
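The sums of squares for a balanced split-split-plot can be checked from group totals. The Python sketch below (hypothetical names) enters the Exercise 7.3 yields in long format, computes each main-effect and interaction SS from marginal totals, takes error (a) as Rep x A, error (b) as the Rep x B and Rep x A x B terms combined, and obtains error (c) by subtraction. Hybrid (A) is tested against error (a); row spacing (B) and A x B against error (b); and density (C) and all interactions involving C against error (c).

```python
# Exercise 7.3: (hybrid, row spacing, plant density '000) -> yields for reps I-IV.
yields = {
    ("P3730", 12, 12): [140, 138, 130, 142],
    ("P3730", 12, 16): [145, 146, 150, 147],
    ("P3730", 12, 20): [150, 149, 146, 150],
    ("P3730", 25, 12): [136, 132, 134, 138],
    ("P3730", 25, 16): [140, 134, 136, 140],
    ("P3730", 25, 20): [145, 138, 138, 142],
    ("B70xLH55", 12, 12): [142, 132, 128, 140],
    ("B70xLH55", 12, 16): [146, 136, 140, 141],
    ("B70xLH55", 12, 20): [148, 140, 142, 140],
    ("B70xLH55", 25, 12): [132, 130, 136, 134],
    ("B70xLH55", 25, 16): [138, 132, 130, 132],
    ("B70xLH55", 25, 20): [140, 134, 130, 136],
}
reps = ["I", "II", "III", "IV"]
# Long format: R = rep, A = hybrid, B = row spacing, C = plant density.
records = [
    {"R": rep, "A": hybrid, "B": spacing, "C": density, "y": value}
    for (hybrid, spacing, density), values in yields.items()
    for rep, value in zip(reps, values)
]
N = len(records)
CF = sum(rec["y"] for rec in records) ** 2 / N


def ss(*factors):
    """Between-group SS for the groups defined by `factors` (balanced data assumed)."""
    totals = {}
    for rec in records:
        key = tuple(rec[f] for f in factors)
        totals[key] = totals.get(key, 0) + rec["y"]
    per_group = N / len(totals)
    return sum(x * x for x in totals.values()) / per_group - CF


table = {}
table["Replications"] = ss("R")
table["A (hybrid)"] = ss("A")
table["Error (a)"] = ss("R", "A") - table["Replications"] - table["A (hybrid)"]
table["B (row spacing)"] = ss("B")
table["A x B"] = ss("A", "B") - table["A (hybrid)"] - table["B (row spacing)"]
# Error (b) pools Rep x B and Rep x A x B.
table["Error (b)"] = (ss("R", "A", "B") - ss("R", "A")
                      - table["B (row spacing)"] - table["A x B"])
table["C (density)"] = ss("C")
table["A x C"] = ss("A", "C") - table["A (hybrid)"] - table["C (density)"]
table["B x C"] = ss("B", "C") - table["B (row spacing)"] - table["C (density)"]
table["A x B x C"] = (ss("A", "B", "C") - table["A (hybrid)"] - table["B (row spacing)"]
                      - table["C (density)"] - table["A x B"] - table["A x C"]
                      - table["B x C"])
total_ss = sum(rec["y"] ** 2 for rec in records) - CF
table["Error (c)"] = total_ss - sum(table.values())  # remainder
table["Total"] = total_ss

for source, value in table.items():
    print(f"{source:<16}{value:>10.3f}")
```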


e) Present the various two-way tables of treatment means and the associated standard errors.

f) Indicate the proportions of the total variation accounted for by each of the random components.


TUTORIAL EIGHT

Topic: Repeated Measures Experiment

Background

Repeated measures designs are often called split-plot in time. In split-plot experiments the errors are assumed to be independent, whereas in repeated measures experiments they are thought to be correlated, because time periods cannot be randomized. Repeated measures designs involve one or more steps at which the experimenter cannot randomly assign the levels of one or more treatments to a given size of experimental unit. The 'size' of an experimental unit is sometimes determined by a time interval, when a given unit is observed at different points in time (e.g. growing plants or growing animals). Consider a situation where a treatment is applied to an animal (the experimental unit) and observations are taken over time, say after one, two, three, ... weeks. It is impossible to randomize time, which plays the role of the sub-plot factor. This differs from the usual split-plot, where the sub-plot treatments can be randomized. Taking multiple measurements on the same experimental unit is what makes an experiment a repeated measures experiment. Such experiments are quite common in animal and tree studies. Many experiments with these characteristics have been analyzed as if they were separate experiments at each time point. This approach is inadequate because it ignores the time structure completely. In these experiments, subjects (which may differ from treatment to treatment) are nested within treatments but crossed with time (considering time as a factor). Treatments are applied to different subjects (one size of experimental unit), and observations are taken on each subject at different time intervals. The fact that responses are correlated through time, and that time cannot be randomized, makes the design different from a split-plot design.

Exercise 8.1 An experiment was carried out at Ukulinga farm to study the effect of different sources of fertilizer on the growth of Swiss chard. Seven treatments were applied in an RCBD with 4 replications. The harvests were done at 8 time intervals. Treatments: 1 – Control; 2 – Chemical fertilizer at 50%; 3 – Chemical fertilizer at 100%; 4 – Compost at 50%; 5 – Compost at 100%; 6 – Biodigester liquid at 50%; 7 – Biodigester at 100%.

a) Refer to the data F:\Users\Biometry\Biom222\Swissch.gsh. Carry out the analysis as a split-plot design.


b) Construct three meaningful contrasts and test them.

c) What are your conclusions?

Exercise 8.2 a) Refer to the data F:\Users\Biometry\Biom222\Swisschrpt.gsh. Carry out the analysis as a repeated measures analysis.


b) Construct three meaningful contrasts and test them.

c) What are your conclusions?

Exercise 8.3 An experiment involving two drugs plus a control was conducted to study each drug's effect on the heart rate of humans. After the drug was administered, the heart rate was measured every five minutes, for a total of 4 times. At the start of the study, 8 female subjects were randomly assigned to each drug. The heart rate data are stored in F:\Users\Biometry\Biom222\htrate.gsh.

a) Write down the mathematical model, and define each component in the model.


b) Identify the whole-plot and the sub-plot experimental units.

c) Give an ANOVA outline identifying the sources of variation and degrees of freedom only.

Source of variation    D.F.
Error (a)
Error (b)
Total
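Treating the experiment as a split-plot in time gives the outline asked for in part (c). The Python sketch below assumes that the subjects form a completely randomized whole-plot stratum with 8 subjects in each of the three groups (two drugs and the control), each measured at 4 times; the function name is hypothetical. This outline ignores the correlation between repeated measurements on the same subject, which is the issue raised in part (d).

```python
def split_plot_in_time_df(g, n, t):
    """d.f. for g treatment groups of n subjects each (completely randomized),
    with every subject measured at t times, analysed as a split-plot in time."""
    df = {
        "Treatments": g - 1,
        "Error (a)": g * (n - 1),            # subjects within treatments
        "Time": t - 1,
        "Treatments x Time": (g - 1) * (t - 1),
        "Error (b)": g * (n - 1) * (t - 1),  # time x subjects within treatments
        "Total": g * n * t - 1,
    }
    assert sum(v for k, v in df.items() if k != "Total") == df["Total"]
    return df


# Assumed for Exercise 8.3: 3 groups, 8 subjects per group, 4 measurement times.
for source, d in split_plot_in_time_df(3, 8, 4).items():
    print(f"{source:<18}{d:>3}")
```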


d) Carry out the analysis as a "Split-plot design". What are the problems associated with this approach?

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Total

e) What are your conclusions from this analysis after testing all the necessary hypotheses?


f) Repeat parts (d) and (e) using the repeated measures analysis. Refer to the data in Biom222\htraterpt.gsh. What do you observe?

TUTORIAL NINE

Topic: Factorial Experiment

Background

Factorial experiments are experiments involving more than one treatment factor. Such arrangements are chosen in order to investigate interaction effects between the factors. A factorial experiment should not be confused with an experimental design: the term refers to the arrangement of the treatments, and a factorial treatment structure can fit into any design structure. Factorial arrangements allow several different questions to be studied simultaneously, with little additional labour. They not only increase precision but also increase the scope and usefulness of the results, by providing information on the interactions between the factors under test.

A factor is a treatment variable that is varied at the will, and under the control, of the experimenter. A factor can be quantitative or qualitative. The particular value that a factor takes is called a level of that factor. Each experimental unit receives one level of each factor. A treatment combination is a combination of factor levels.

An interaction comparison investigates whether the effect of one factor is the same at different levels of the other factor. If the interaction is significant, then the effect of one factor clearly differs across the levels of the other factor. In this situation, interpreting the main effect of a factor averaged over a factor with which it interacts may be misleading. If the interaction is not significant, it is concluded that the factors act independently, and the main effects can then be interpreted directly.

If the 2-factor and higher-order interactions appear negligible, the results of the experiment should be interpreted in terms of the main-effect means only, ignoring the means for combinations of levels. If the 3-factor and higher-order interactions appear negligible, the means for combinations of levels of pairs of factors should provide the basis for interpretation. If a 2-factor interaction is clearly important, the interpretation of the effects of those two factors should normally be based on the means for the combinations of their levels. If one or both of the main-effect mean squares are large compared with the interaction mean square, then comparisons of main-effect means will be meaningful, and the interpretation should be in terms of the main effect, or main effects, modified by the interaction. If, on the other hand, the main-effect mean squares are of the same order of magnitude as the interaction mean square, then the main-effect means will probably add little to an interpretation in terms of the means for the 2-factor combinations of levels.

Exercise 9.1 Consider a 2 x 2 factorial experiment to investigate the effect of feed supplements under two light exposure times. The weight gains of 4 littermates from each of 3 litters were recorded. The design was an RCBD with the 3 litters as blocks.
The data recorded after 6 weeks are given below.

Factors                  Levels
Feed supplement (F)      f1 – none, f2 – supplement
Light (L)                t1 – 3 hrs light exposure, t2 – 14 hrs light exposure

Treatment   Litter 1   Litter 2   Litter 3
f1t1              10         16         13
f1t2              12         15         15
f2t1              13         16         16
f2t2              18         22         20

Compute all the relevant sums of squares and present them in an ANOVA table.

a) Test (i) the main effects and (ii) the interaction effect at the 5% level of significance.
b) Present your results in a table and in graphical form.
c) Give your conclusions.
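
As a check on the hand calculations for Exercise 9.1, the sums of squares can be computed directly from block, factor and treatment totals. The Python sketch below is a minimal illustration, not GenStat's own algorithm, and the function name factorial_rcbd_anova is hypothetical. It assumes the weight gains are read treatment by treatment, so that f1t1 has the values 10, 16 and 13 for litters 1, 2 and 3; if the original table actually runs litter by litter, rearrange the lists before use.

```python
# Two-factor factorial in an RCBD: sums of squares from totals (Exercise 9.1).


def factorial_rcbd_anova(data, names=("A", "B"), block_name="Blocks"):
    """data[i][j] lists the observations for level i of factor A and level j of
    factor B, one per block, with the blocks in the same order in every cell.
    Returns rows of (source, df, SS, MS, F)."""
    a, b, r = len(data), len(data[0]), len(data[0][0])
    values = [y for row in data for cell in row for y in cell]
    cf = sum(values) ** 2 / len(values)  # correction factor G^2 / N

    def between(totals, plots_per_total):
        # Sum of squared totals divided by the plots in each total, minus CF
        return sum(t * t for t in totals) / plots_per_total - cf

    block_totals = [sum(data[i][j][k] for i in range(a) for j in range(b)) for k in range(r)]
    a_totals = [sum(sum(cell) for cell in data[i]) for i in range(a)]
    b_totals = [sum(sum(data[i][j]) for i in range(a)) for j in range(b)]
    cell_totals = [sum(cell) for row in data for cell in row]

    ss_total = sum(y * y for y in values) - cf
    ss_blocks = between(block_totals, a * b)
    ss_a = between(a_totals, b * r)
    ss_b = between(b_totals, a * r)
    ss_ab = between(cell_totals, r) - ss_a - ss_b  # treatment SS minus both main effects
    ss_res = ss_total - ss_blocks - ss_a - ss_b - ss_ab

    df_res = (a * b - 1) * (r - 1)
    ms_res = ss_res / df_res
    effects = [
        (block_name, r - 1, ss_blocks),
        (names[0], a - 1, ss_a),
        (names[1], b - 1, ss_b),
        (f"{names[0]}.{names[1]}", (a - 1) * (b - 1), ss_ab),
    ]
    table = [(src, df, ss, ss / df, ss / df / ms_res) for src, df, ss in effects]
    table.append(("Residual", df_res, ss_res, ms_res, None))
    table.append(("Total", a * b * r - 1, ss_total, None, None))
    return table


# Assumed reading of the data: each treatment's gains for litters 1, 2, 3 in turn.
weight_gain = [
    [[10, 16, 13], [12, 15, 15]],  # f1 (none):       t1, t2
    [[13, 16, 16], [18, 22, 20]],  # f2 (supplement): t1, t2
]

for source, df, ss, ms, f in factorial_rcbd_anova(weight_gain, ("F", "L"), "Litter"):
    ms_txt = "" if ms is None else f"{ms:.3f}"
    f_txt = "" if f is None else f"{f:.2f}"
    print(f"{source:<9}{df:>3}{ss:10.3f}{ms_txt:>10}{f_txt:>8}")

# Cell means (rows f1, f2; columns t1, t2) for a table or an interaction plot
for i, row in enumerate(weight_gain, start=1):
    print(f"f{i}", [sum(cell) / len(cell) for cell in row])
```

The F ratios still have to be compared with tabulated 5% points of F on (1, 6) degrees of freedom for the treatment effects. Plotting the printed cell means against light level, one line per feed level, is one way to present part (b): lines far from parallel suggest an F x L interaction.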

Exercise 9.2 Linthurst conducted a greenhouse experiment on the growth of Spartina alterniflora, an ecologically important salt marsh plant species, to evaluate the effects of salinity, nitrogen and aeration. The variable reported here is biomass, the dried weight of all the aerial plant material.

Factor      Levels
Salinity    15, 30, 45 (parts per thousand)
Nitrogen    0, 168 (kg/ha)
Aeration    1 = none, 2 = saturated

Biomass for treatments 1 to 12 (columns) in blocks 1 to 4 (rows):

Block      1     2     3     4     5     6     7     8     9    10    11    12
  1     11.8  18.8  21.3  83.3   8.8  26.2  20.4  50.2   2.2   8.8   1.4  25.8
  2      8.1  15.8  22.3  25.3   8.1  19.5   8.5  47.7   3.3   7.6  15.3  22.6
  3     22.6  37.1  19.8  55.1   2.1  17.8   8.2  16.4  11.1   6.0  10.2  17.9
  4      4.1  22.1  49.0  47.6  10.0  20.3   4.8  25.8   2.7   7.4   0.0  14.0

These data are contained in the file 'Biometry\Biom222\spartina.gsh'.

a) Write down a mathematical model for this experiment taking note of the factorial structure.

b) State the number of parameters to be estimated.

c) Present an analysis of variance for the above data and test the hypothesis that there are no differences between the 12 treatment means. Ignore the fact that this is a "structured" treatment set.

d) Reanalyse the data using a 3 x 2 x 2 factorial treatment structure.

e) A valid analysis of variance requires the assumption of homogeneity of variance. Do you think this assumption has been violated in this experiment? A box plot may help you reach a conclusion.
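
The Python sketch below shows the arithmetic behind part (c), treating the 12 treatments as unstructured in an RCBD with 4 blocks, and adds a rough numerical companion to the box plot suggested in part (e): the ratio of the largest to the smallest treatment variance (Hartley's Fmax). The function name rcbd_anova is hypothetical, and the treatment numbering simply follows the columns of the data table. Which salinity, nitrogen and aeration levels each of treatments 1 to 12 represents is not stated here, so the factorial analysis of part (d) is left to GenStat.

```python
# RCBD analysis of the 12 Spartina treatments (Exercise 9.2, parts c and e).
from statistics import variance

# Rows are blocks 1-4; columns are treatments 1-12, as in the data table.
biomass = [
    [11.8, 18.8, 21.3, 83.3, 8.8, 26.2, 20.4, 50.2, 2.2, 8.8, 1.4, 25.8],
    [8.1, 15.8, 22.3, 25.3, 8.1, 19.5, 8.5, 47.7, 3.3, 7.6, 15.3, 22.6],
    [22.6, 37.1, 19.8, 55.1, 2.1, 17.8, 8.2, 16.4, 11.1, 6.0, 10.2, 17.9],
    [4.1, 22.1, 49.0, 47.6, 10.0, 20.3, 4.8, 25.8, 2.7, 7.4, 0.0, 14.0],
]


def rcbd_anova(y):
    """y[block][treatment] -> {source: (df, SS)} for an RCBD with one value per plot."""
    r, t = len(y), len(y[0])
    values = [v for row in y for v in row]
    cf = sum(values) ** 2 / (r * t)  # correction factor
    ss_total = sum(v * v for v in values) - cf
    ss_blocks = sum(sum(row) ** 2 for row in y) / t - cf
    ss_treat = sum(sum(row[j] for row in y) ** 2 for j in range(t)) / r - cf
    return {
        "Blocks": (r - 1, ss_blocks),
        "Treatments": (t - 1, ss_treat),
        "Residual": ((r - 1) * (t - 1), ss_total - ss_blocks - ss_treat),
        "Total": (r * t - 1, ss_total),
    }


table = rcbd_anova(biomass)
df_res, ss_res = table["Residual"]
ms_res = ss_res / df_res
for source, (df, ss) in table.items():
    line = f"{source:<11}{df:>3}{ss:11.2f}"
    if source in ("Blocks", "Treatments"):
        line += f"   F = {ss / df / ms_res:.2f}"  # compare with F(df, df_res) tables
    print(line)

# Part (e): spread of each treatment across the 4 blocks. Hartley's Fmax
# (largest / smallest variance) is only a rough check alongside the box plot.
treatment_vars = [variance([row[j] for row in biomass]) for j in range(len(biomass[0]))]
print(f"Fmax = {max(treatment_vars) / min(treatment_vars):.1f}")
```

With only 4 values per treatment the individual variances are imprecise, so a large Fmax is a warning sign to weigh together with the box plot rather than a formal verdict.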

Exercise 9.3 (A case of an unequally replicated control) Consider an experiment to test the effect of sulphur application on potato scab. There are 7 treatments laid out in a randomised complete block design with 4 blocks of 8 plots. The first treatment is an untreated control, which appears twice in each block. The other 6 treatments are the combinations of 2 factors: 3 amounts of sulphur dressing (300, 600 and 1200 kg per ha), each applied in either autumn or spring. Notice therefore that there are 8 replicates of the control treatment and 4 of each sulphur dressing. The variable being analysed is the ‘scab index’.

                       300 kg      600 kg     1200 kg
Block     Control   Aut  Spri   Aut  Spri   Aut  Spri
  1      12    30     9    30    16    18    10    17
  2      10    18     9     7    10    24     4     7
  3      24    32    16    21    18    12     4    16
  4      29    26     4     9    18    19     5    17

a) Show how the 7 treatments are generated.

b) Enter the data into a spreadsheet. Name the columns Block, Treat, Applic, Sulp and SIndex.

Save the data as ScabIndex.gsh.

c) Perform the analysis of variance using GenStat's General Analysis of Variance. Use Treat/(Applic*Sulp) as your treatment structure.

d) What conclusions can you draw?
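
The Treat/(Applic*Sulp) structure splits the 6 treatment degrees of freedom into control versus sulphur (1 df), then application time (1 df), amount (2 df) and their interaction (2 df) among the sulphur-treated plots. Because every block contains the same set of 8 plots, the partition can be computed from simple group totals. The Python sketch below is an illustration, not GenStat's algorithm: keeping the control plots as a separate group in every grouping is this sketch's way of mimicking the nesting, the helper name grouped_ss is hypothetical, and the source labels are intended to follow the way GenStat expands a nested formula (Treat, Treat.Applic, Treat.Sulp, Treat.Applic.Sulp).

```python
# Partition of the treatment SS for Treat/(Applic*Sulp) in Exercise 9.3.
from collections import defaultdict

# Plot labels in each block: (application time, amount in kg/ha); the control fills two plots.
columns = [("Control", None), ("Control", None), ("Aut", 300), ("Spri", 300),
           ("Aut", 600), ("Spri", 600), ("Aut", 1200), ("Spri", 1200)]
scab = [
    [12, 30, 9, 30, 16, 18, 10, 17],  # block 1
    [10, 18, 9, 7, 10, 24, 4, 7],     # block 2
    [24, 32, 16, 21, 18, 12, 4, 16],  # block 3
    [29, 26, 4, 9, 18, 19, 5, 17],    # block 4
]

values = [y for row in scab for y in row]
cf = sum(values) ** 2 / len(values)  # correction factor G^2 / N
ss_total = sum(y * y for y in values) - cf
ss_blocks = sum(sum(row) ** 2 / len(row) for row in scab) - cf


def grouped_ss(key):
    """Between-group SS, sum(T^2 / n) - CF, for groups defined by key(applic, amount)."""
    groups = defaultdict(list)
    for row in scab:
        for (applic, amount), y in zip(columns, row):
            groups[key(applic, amount)].append(y)
    return sum(sum(g) ** 2 / len(g) for g in groups.values()) - cf


ss_treat = grouped_ss(lambda a, s: a == "Control")  # control vs sulphur, 1 df
# The control stays a group of its own below, so subtracting ss_treat leaves
# only the variation among the 24 sulphur-treated plots.
ss_applic = grouped_ss(lambda a, s: a) - ss_treat                  # 1 df
ss_sulp = grouped_ss(lambda a, s: (a == "Control", s)) - ss_treat  # 2 df
ss_7trt = grouped_ss(lambda a, s: (a, s))                          # all 7 treatments, 6 df
ss_inter = ss_7trt - ss_treat - ss_applic - ss_sulp                # 2 df
ss_res = ss_total - ss_blocks - ss_7trt

rows = [("Block", 3, ss_blocks), ("Treat", 1, ss_treat), ("Treat.Applic", 1, ss_applic),
        ("Treat.Sulp", 2, ss_sulp), ("Treat.Applic.Sulp", 2, ss_inter)]
df_res = len(values) - 1 - sum(df for _, df, _ in rows)
ms_res = ss_res / df_res
for source, df, ss in rows:
    print(f"{source:<18}{df:>3}{ss:10.2f}{ss / df:9.2f}{ss / df / ms_res:8.2f}")
print(f"{'Residual':<18}{df_res:>3}{ss_res:10.2f}{ms_res:9.2f}")
print(f"{'Total':<18}{len(values) - 1:>3}{ss_total:10.2f}")
```

The sums of squares from this sketch can be set against the GenStat output from part (c) as a check that the Treat, Applic and Sulp columns were coded as intended.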

TUTORIAL TEN

Topic: Simple and Multiple Linear Regression

Background

Regression analysis is used to investigate the association or relationship between variables. One variable, the dependent (response) variable Y, is assumed to depend on one or more independent variables. There are several reasons for carrying out a regression analysis, among them to:
• See if Xi affects Y. The aim is to investigate whether Y changes when the level of X is changed, thus establishing a functional relationship between the two variables. In this case, X is assumed to be a continuous variable. A scatter plot will show whether a relationship exists between the two variables.
• See how Xi affects Y. Here the interest is in how much the value of Y changes per unit change in X.
• Predict Y given Xi. The objective in this case is to provide a mathematical function that can be used to predict values of Y for given values of X.

When the interest is in the functional form of a relationship, we are dealing with regression. For instance, the yield of a crop may be related to plant density by a certain curve. We could carry out an experiment to determine the equation of this curve in order to find the optimum plant density, that is, the density giving the greatest yield per unit area. Simple linear regression is used to test whether a relationship is linear and, if so, to determine the equation of the best-fitting straight line. Multiple linear regression extends simple linear regression to more than one independent variable.

There are three types of mathematical models, namely functional, control and predictive. In practice an experimenter can hardly ever know the true functional relationship between a response and the predictor variables, and a functional model, even when known, may fail to control a response variable. A control model is one in which the experimenter is able to control some of the response variables. Multiple regression techniques make their greatest contribution in deriving predictive models, and the approach is useful for screening variables.

Exercise 10.1 Refer to the file F:\users\biometry\biom222\combin.xls, which contains data from an experiment conducted over several seasons with reps nested in seasons. There was a complete crop failure in 1993, so delete the data for 1993 before carrying out the analysis.

a) Generate a new column called Year while still in Excel.

b) Provide an ANOVA outline for a combined analysis ignoring the year and with reps nested in seasons.

c) Conduct an analysis of variance in line with the outline given in part (b). What are your findings?

d) Provide an ANOVA outline for a combined analysis ignoring the season and with reps nested in years.

e) Conduct an analysis of variance in line with the outline given in part (d). What are your findings?

f) Provide an ANOVA outline for a combined analysis with years crossed with seasons and reps nested in seasons.

g) Conduct an analysis of variance in line with the outline given in part (f). What are your findings?
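
Returning to the objectives listed in the Background to this tutorial: the slope of a fitted straight line answers "how does X affect Y?" (the estimated change in Y per unit change in X), and its t statistic answers "does X affect Y?". The pure-Python sketch below fits y = b0 + b1*x by least squares with the usual calculator formulas (Sxx, Sxy, Syy), so the results can be compared with GenStat's regression output. The function name is hypothetical.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x using the corrected sums Sxx, Sxy, Syy.

    Returns b0, b1, the t statistic for H0: b1 = 0 (on n - 2 df) and R^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx                   # estimated change in Y per unit change in X
    b0 = y_bar - b1 * x_bar          # the fitted line passes through (x_bar, y_bar)
    ss_reg = b1 * sxy                # regression sum of squares (1 df)
    s2 = (syy - ss_reg) / (n - 2)    # residual mean square
    t_b1 = b1 / math.sqrt(s2 / sxx)  # SE(b1) = sqrt(s2 / Sxx)
    return b0, b1, t_b1, ss_reg / syy
```

A value of t_b1 larger in absolute value than the two-sided 5% point of t on n - 2 degrees of freedom indicates that Y changes with X; in simple linear regression t_b1 squared equals the F ratio in the regression analysis of variance.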

Exercise 10.2 Refer to the file F:…\biom222\mregpol.xls.

a) Obtain a correlation matrix for X1, X2, X3 and Y, in that order. What do you infer?

b) Write down the multiple linear regression equation.

c) Fit a multiple linear regression and comment on (i) the significance of the overall model and (ii) the significance of the individual regression parameters. Use a 5% significance level.

d) Generate new variables by transforming the independent variables as follows: X1sq = X1 × X1; X2sq = X2 × X2; X3sq = X3 × X3; X12 = X1 × X2; X13 = X1 × X3; and X23 = X2 × X3. Display the new variables together with the original variables.

e) Fit a multiple linear regression and comment on (i) the significance of the overall model and (ii) the significance of the individual regression parameters.

f) What would be your overall model based on your findings? Explain.
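
A minimal NumPy sketch of the calculations behind parts (a), (c), (d) and (e): a correlation matrix, an ordinary least-squares fit with the overall F statistic and a t statistic for each coefficient, and the squared and cross-product terms. The function names fit_mlr and add_quadratic_terms are hypothetical, and reading mregpol.xls into arrays is not shown; the F and t values would still be compared with tabulated 5% points.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit of y on the columns of X, with an intercept.

    Returns the estimates (intercept first), their t statistics, the overall
    F statistic and its (regression, residual) degrees of freedom."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares estimates
    fitted = A @ beta
    df_res = n - k - 1
    s2 = np.sum((y - fitted) ** 2) / df_res        # residual mean square
    se = np.sqrt(np.diag(s2 * np.linalg.inv(A.T @ A)))
    ss_reg = np.sum((fitted - y.mean()) ** 2)      # regression SS on k df
    return beta, beta / se, (ss_reg / k) / s2, (k, df_res)


def add_quadratic_terms(x1, x2, x3):
    """Part (d): X1sq, X2sq, X3sq, X12, X13, X23 as columns, in that order."""
    return np.column_stack([x1 * x1, x2 * x2, x3 * x3, x1 * x2, x1 * x3, x2 * x3])


# Usage, assuming x1, x2, x3 and y are NumPy arrays read from mregpol.xls:
# corr = np.corrcoef(np.column_stack([x1, x2, x3, y]), rowvar=False)   # part (a)
# beta, t, F, df = fit_mlr(np.column_stack([x1, x2, x3]), y)            # part (c)
# X_full = np.column_stack([x1, x2, x3, add_quadratic_terms(x1, x2, x3)])
# beta, t, F, df = fit_mlr(X_full, y)                                   # part (e)
```

Squares and products of the same variables are often strongly correlated with the original columns, which is worth keeping in mind when the individual t statistics in part (e) look weak despite a significant overall F.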

Exercise 10.3 The data file F:…\biom222\TOBACCO.gsh contains the following information. An agronomist is investigating the influence of six nutrients (N, Cl, K, P, Ca and Mg) on certain qualities of smoking tobacco, namely rate of burn, sugar content and nicotine (%). Note the order of the variables.

Obtain a correlation matrix.

a) Which regressor variables are highly correlated?

b) Using multiple linear regression, establish which of the six predictor variables contribute to predicting nicotine content.

c) A tobacco sample not included in the original data set is analysed for the six nutrients, giving [1.97, 2.92, 2.06, 0.44, 3.26, 0.64] respectively. Obtain the predicted nicotine content for this sample and set 95% confidence limits on this estimate.
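
For part (c), the predicted value at a new point x0 is x0'b, with standard error sqrt(s² x0'(X'X)⁻¹x0), where s² is the residual mean square; adding 1 inside the bracket gives the wider limits appropriate to a single new sample. The NumPy sketch below applies these standard formulas and is not GenStat's own prediction output. The name predict_with_limits is hypothetical, and the t critical value (two-sided 5%, on n - 7 degrees of freedom for an intercept plus six nutrients) is passed in from tables rather than computed.

```python
import numpy as np


def predict_with_limits(X, y, x_new, t_crit):
    """Fitted value at x_new from the least-squares fit of y on X (with intercept).

    Returns the prediction, limits for the mean response at x_new, and wider
    limits for a single new observation. t_crit is the two-sided 5% point of
    t on n - p - 1 degrees of freedom, taken from tables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    s2 = np.sum((y - A @ beta) ** 2) / (n - p - 1)   # residual mean square
    a0 = np.concatenate([[1.0], np.asarray(x_new, dtype=float)])
    y_hat = float(a0 @ beta)
    h = float(a0 @ np.linalg.solve(A.T @ A, a0))      # x0'(X'X)^-1 x0
    half_mean = t_crit * np.sqrt(s2 * h)              # for the mean response
    half_new = t_crit * np.sqrt(s2 * (1.0 + h))       # for one new sample
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_new, y_hat + half_new))


# Usage, assuming X holds N, Cl, K, P, Ca, Mg (in that order) and y the nicotine %:
# x_new = [1.97, 2.92, 2.06, 0.44, 3.26, 0.64]
# y_hat, ci_mean, pi_new = predict_with_limits(X, y, x_new, t_crit)
```

Which of the two intervals the exercise intends is a judgement for the reader; the first describes the average nicotine content of samples with these nutrient levels, the second a single such sample.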

Exercise 10.4 Refer to the data stored in F:\users\biometry\biom\tuky.gsh. The columns are arranged in the following order: REP, TRT, CKGLOSS, CKGTIME, FAT, HEX, MOIST, NONHEM, PH. The first two columns are factors and should not be used as regressor variables. Use CKGLOSS as the dependent variable.

Fit a multiple regression to these data and provide an appropriate model for predicting CKGLOSS. Apply the three selection procedures, namely FORWARD, BACKWARD and STEPWISE.
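
GenStat carries out the selection procedures itself, but the idea behind forward selection can be sketched in a few lines: start from the intercept-only model and repeatedly add the variable with the largest partial F (F-to-enter), stopping when no candidate exceeds a threshold. Backward elimination works the other way, starting from the full model and dropping the variable with the smallest F-to-remove, and stepwise selection alternates the two. The NumPy sketch below shows forward selection only; the function names and the default threshold of 4.0 are assumptions and need not match GenStat's settings.

```python
import numpy as np


def rss(A, y):
    """Residual sum of squares from the least-squares fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_selection(X, y, names, f_to_enter=4.0):
    """Forward selection by partial F: add the candidate with the largest
    F-to-enter at each step; stop when no candidate reaches f_to_enter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    current = np.ones((n, 1))              # intercept-only starting model
    remaining = list(range(X.shape[1]))
    chosen = []
    while remaining:
        rss_now = rss(current, y)
        scores = []
        for j in remaining:
            trial = np.column_stack([current, X[:, j]])
            rss_new = rss(trial, y)
            df_res = n - trial.shape[1]
            scores.append(((rss_now - rss_new) / (rss_new / df_res), j))  # partial F
        best_f, best_j = max(scores)
        if best_f < f_to_enter:
            break
        chosen.append(names[best_j])
        remaining.remove(best_j)
        current = np.column_stack([current, X[:, best_j]])
    return chosen


# Usage, assuming X holds CKGTIME, FAT, HEX, MOIST, NONHEM and PH and y holds CKGLOSS:
# forward_selection(X, y, ["CKGTIME", "FAT", "HEX", "MOIST", "NONHEM", "PH"])
```

The three procedures do not always agree, especially when the regressors are correlated, so comparing the models they select is part of the exercise.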