Central Locations and Variability RAISHEINE JOYCE DALMACIO MS BIO I

Upload: lorena-martin

Post on 02-Jan-2016

Page 1: Central Locations and Variability RAISHEINE JOYCE DALMACIO MS BIO I

Central Locations and Variability

RAISHEINE JOYCE DALMACIO

MS BIO I

Concept and Importance: Central Tendency

• A single value that represents a whole population or sample with respect to a particular characteristic.

• The value or characteristic that falls in or near the middle.

• Originated from the concept of the “average man”.

• Several ways of measuring the central representative value(s) have been developed.

Station          LHC     LSC      DC     DCA   Algae Abiotic     OTA     OTD     OTL

Mainland  1     9.42    0.00    8.02   37.66    0.60    1.78   41.18    1.34    0.00
Mainland  2    13.92    0.00    0.00   53.06   17.80   12.74    0.40    2.08    0.00
Mainland  3    14.90    0.00    0.00   46.74    9.20   20.44    0.06    8.66    0.00
Mainland  4     7.18    0.20   14.42   57.52    1.76    9.04    3.04    6.48    0.36
Mainland  5    10.96    0.40    0.60   47.80   11.00   26.20    0.00    3.00    0.04
Mainland  6    31.30    0.16    1.04   38.06    9.66   16.72    0.00    2.90    0.16
Mainland  7    10.52    0.00    1.40   78.38    1.88    5.80    0.00    2.00    0.02
Mainland  8     9.24    0.00    0.00   65.66    7.38   11.48    0.26    5.98    0.00
Mainland  9    34.70    0.00   23.22   39.58    0.20    2.20    0.00    0.10    0.00
Mainland 10    34.14    0.12    0.48   32.64    4.18   24.78    0.00    2.96    0.70
Mainland 11    41.42    3.00    6.50   37.16    1.56    9.32    0.64    0.40    0.00
Mainland 12    26.28    0.88   28.44   33.64    2.60    3.00    1.72    1.46    1.98

Island    1    44.70    8.12    5.92   35.04    0.00    5.36    0.40    0.00    0.46
Island    2    30.00   12.18    0.00   50.10    2.72    1.54    0.26    2.66    0.54
Island    3    30.76    6.64   35.38   16.42    2.48    6.92    0.60    0.80    0.00
Island    4    17.02    1.14   24.34   34.36    1.82   12.78    2.96    4.36    1.22
Island    5    25.92    0.40   40.54   18.68    0.00   12.46    0.00    2.00    0.00
Island    6    30.28    0.72   64.04    0.00    0.00    0.94    0.00    4.02    0.00
Island    7    28.20    1.72   33.18   14.14    0.00   14.49    8.45    0.00    0.00
Island    8    26.06    1.22   30.58    9.84    0.00   26.50    5.80    0.00    0.00
Island    9    22.92    1.46   46.18   10.90    0.00   16.91    0.00    1.63    0.00
Island   10    34.28    1.04   23.20   11.88    0.00   21.46    6.66    0.58    0.90
Island   11    46.08    1.72   24.56    8.48    0.00   18.64    0.00    1.04    0.00
Island   12    50.62    1.06   19.46    1.52    0.00   24.36    0.44    0.18    2.36

Algae through OTL are grouped under “Other”; OTA, OTD, and OTL are grouped under “Other Biota”. Values are percent cover, and each row sums to 100.


Measures of Central Tendency

Mean: Arithmetic, Geometric, and Harmonic

Arithmetic mean

- used if the data are in a numerical (arithmetic) series

Used when:

• the data are not skewed (no extreme outliers)

• the individual data points are not dependent on each other

Geometric mean

- used if the data are in a geometric series

Used when:

• the data are inter-related, for example when discussing returns on investment or interest rates

• there are no zero values

Harmonic mean

- the reciprocal of the arithmetic mean of the reciprocals of the data

Used when:

• a large population has most of its values distributed uniformly but a few outliers with significantly higher values

• there are no zero values
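The three means can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet procedure behind the slides: the function names are assumptions, and the data are the 12 mainland LHC values from the station table. Returning `None` when a value is zero mirrors the "no zero values" condition above.

```python
import math

# Percent live hard coral (LHC) at the 12 mainland stations.
lhc_mainland = [9.42, 13.92, 14.90, 7.18, 10.96, 31.30,
                10.52, 9.24, 34.70, 34.14, 41.42, 26.28]


def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)


def geometric_mean(values):
    """nth root of the product, computed through logarithms.

    Undefined when any value is zero or negative, so None is returned.
    """
    if any(v <= 0 for v in values):
        return None
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(v <= 0 for v in values):
        return None
    return len(values) / sum(1 / v for v in values)


print(round(arithmetic_mean(lhc_mainland), 2))
print(round(geometric_mean(lhc_mainland), 2))
print(round(harmonic_mean(lhc_mainland), 2))
print(geometric_mean([0.00, 0.20, 0.40]))  # a zero makes the mean undefined
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the three printed values should come out in decreasing order.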

Means of the 12 stations in each group. #NUM! is the spreadsheet error returned where the geometric or harmonic mean is undefined because the column contains zero values.

Mainland             LHC     LSC      DC     DCA   Algae Abiotic     OTA     OTD     OTL
Mean (arithmetic)  20.33    0.40    7.01   47.33    5.65   11.96    3.94    3.11    0.27
Mean (geometric)   17.05   #NUM!   #NUM!   45.62    3.12    8.69   #NUM!    1.94   #NUM!
Mean (harmonic)    14.44   #NUM!   #NUM!   44.13    1.26    5.73   #NUM!    0.73   #NUM!

Island               LHC     LSC      DC     DCA   Algae Abiotic     OTA     OTD     OTL
Mean (arithmetic)  32.24    3.12   28.95   17.61    0.58   13.53    2.13    1.44    0.46
Mean (geometric)   30.85    1.81   #NUM!   #NUM!   #NUM!    9.50   #NUM!   #NUM!   #NUM!
Mean (harmonic)    29.51    1.24   #NUM!   #NUM!   #NUM!    4.78   #NUM!   #NUM!   #NUM!

Median and Mode

• After sorting and arranging a data set as an array, the value that falls exactly in the middle is called the median. With an even number of observations, the median is the mean of the two middle values.

• Mode is defined as the value that appears most frequently in a given set of data.
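As a small illustration, Python's standard `statistics` module provides both measures. The variable names are assumptions; the data are the mainland LHC and LSC columns of the station table.

```python
import statistics

# Mainland live hard coral (LHC) and live soft coral (LSC) percentages.
lhc_mainland = [9.42, 13.92, 14.90, 7.18, 10.96, 31.30,
                10.52, 9.24, 34.70, 34.14, 41.42, 26.28]
lsc_mainland = [0.00, 0.00, 0.00, 0.20, 0.40, 0.16,
                0.00, 0.00, 0.00, 0.12, 3.00, 0.88]

# With 12 values, the median is the mean of the 6th and 7th sorted values.
median_lhc = statistics.median(lhc_mainland)

# multimode returns every value tied for the highest frequency.
modes_lsc = statistics.multimode(lsc_mainland)

print(median_lhc)
print(modes_lsc)
```

`multimode` is used rather than `mode` because continuous measurements such as percent cover often have no repeated value at all, in which case every observation is returned as a mode.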

Medians of the 12 stations in each group:

Median           LHC     LSC      DC     DCA   Algae Abiotic     OTA     OTD     OTL
Mainland       14.41    0.06    1.22   43.16    3.39   10.40    0.16    2.49    0.01
Island         30.14    1.34   27.57   13.01    0.00   13.63    0.42    0.92    0.00

Conclusion

Based on the results for the arithmetic, geometric, and harmonic means and the median, the best measure of central tendency to use here is the arithmetic mean. The multiple zero values leave the geometric and harmonic means undefined for most categories, and with only 12 values in each data set the median is a rather imprecise measure.

Comparing the arithmetic means of the mainland and island stations:

Mean             LHC     LSC      DC     DCA   Algae Abiotic     OTA     OTD     OTL
Mainland       20.33    0.40    7.01   47.33    5.65   11.96    3.94    3.11    0.27
Island         32.24    3.12   28.95   17.61    0.58   13.53    2.13    1.44    0.46


Measures of Variability

• Variability describes how the observations are scattered about, or clustered around, the central location.

• It is the basis for comparison, without which the definition of statistics is incomplete.

• It is measured using various parameters.

Range: the difference between the largest and the smallest observations in a set of data.

Interquartile range: the difference between the third and the first quartiles.

Mean deviation: measures dispersion more comprehensively by considering the deviations of all observations from the central location.

Variance and standard deviation: the variance is the average of the squared deviations from the mean. Its positive square root, the standard deviation (SD), is used to present the variation around a particular mean.
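The measures above can be sketched with Python's standard library, using the mainland LHC column as input. Two choices here are assumptions rather than something the slides state: quartiles use the module's default "exclusive" interpolation, and variance and SD use the sample (n − 1) form.

```python
import statistics

lhc_mainland = [9.42, 13.92, 14.90, 7.18, 10.96, 31.30,
                10.52, 9.24, 34.70, 34.14, 41.42, 26.28]

mean = statistics.fmean(lhc_mainland)

# Range: largest minus smallest observation.
data_range = max(lhc_mainland) - min(lhc_mainland)

# Interquartile range: third quartile minus first quartile.
q1, q2, q3 = statistics.quantiles(lhc_mainland, n=4)
iqr = q3 - q1

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in lhc_mainland) / len(lhc_mainland)

# Sample variance (divides by n - 1) and its positive square root.
variance = statistics.variance(lhc_mainland)
sd = statistics.stdev(lhc_mainland)

for name, value in [("range", data_range), ("IQR", iqr),
                    ("mean deviation", mean_deviation),
                    ("variance", variance), ("SD", sd)]:
    print(f"{name}: {value:.2f}")
```

Different quartile conventions give different interquartile ranges on small samples, so the method should be reported alongside the value.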

Measures of variability for the 12 stations in each group:

Mainland             LHC     LSC      DC     DCA   Algae Abiotic     OTA     OTD     OTL
Range              34.24    3.00   28.44   45.74   17.60   24.42   41.18    8.56    1.98
Mean deviation     11.03    0.52    7.68   10.97    4.46    6.85    6.21    1.96    0.37
Interquartile range 23.74    0.35   12.70   19.12    7.93   15.81    1.45    3.87    0.31
Variance          151.57    0.74   97.86  198.50   29.15   72.29  138.38    6.82    0.33
Standard deviation 12.31    0.86    9.89   14.09    5.40    8.50   11.76    2.61    0.58

Island               LHC     LSC      DC     DCA   Algae Abiotic     OTA     OTD     OTL
Range              33.60   11.78   64.04   50.10    2.72   25.56    8.45    4.36    2.36
Mean deviation      7.79    2.93   12.70   11.29    0.88    6.86    2.56    1.25    0.53
Interquartile range 16.14    4.37   18.86   21.62    1.37   15.01    5.09    2.45    0.81
Variance          101.16   14.13  296.05  222.29    1.16   72.84    9.50    2.38    0.53
Standard deviation 10.06    3.76   17.21   14.91    1.08    8.53    3.08    1.54    0.73

Comparison of Standard Deviations

Station          LHC     LSC      DC     DCA   Algae Abiotic     OTA     OTD     OTL
Mainland       12.31    0.86    9.89   14.09    5.40    8.50   11.76    2.61    0.58
Island         10.06    3.76   17.21   14.91    1.08    8.53    3.08    1.54    0.73

Basics of Hypothesis Formulation and Testing

Null hypothesis (H0): assumes that there is no difference between the new idea and the old one.

Alternative hypothesis (HA): assumes that the new idea is better or true and goes against the traditional belief.

In a statistical procedure, the null hypothesis is tested, not the alternative hypothesis. (diagram)

Significance level: the probability (P) that an observed result occurred by chance or random error alone.

Confidence level, limits, and interval: When concluding that any hypothesis is true or false, there is a certain level of confidence. In most biological research, a confidence level of 95% is considered sufficient. Any mean has two confidence limits for a given level of confidence: the lower limit (LL) and the upper limit (UL). The range between the two limits is called the confidence interval (CI).
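As a sketch of how the two limits are obtained, the snippet below computes a 95% confidence interval for the mean mainland LHC as mean ± t × SD / √n. The formula is the usual t-based interval and is an assumption here, since the slides do not show one. The critical value 2.20 is the t-table entry for df = n − 1 = 11 at p = 0.05.

```python
import math
import statistics

lhc_mainland = [9.42, 13.92, 14.90, 7.18, 10.96, 31.30,
                10.52, 9.24, 34.70, 34.14, 41.42, 26.28]

n = len(lhc_mainland)
mean = statistics.fmean(lhc_mainland)
sd = statistics.stdev(lhc_mainland)   # sample SD (n - 1)

t_crit = 2.20                         # t-table: df = 11, p = 0.05

# Half-width of the interval: critical value times the standard error.
half_width = t_crit * sd / math.sqrt(n)
lower_limit = mean - half_width
upper_limit = mean + half_width

print(f"95% CI for the mean: {lower_limit:.2f} to {upper_limit:.2f}")
```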

Statistical and biological significance

Errors in hypothesis testing

Selection of statistical tools

Test of goodness-of-fit

• Test for normal distribution (normality test): The normality test is the gateway test that determines whether we should choose a parametric or a nonparametric test.

• If the collected data are normally distributed, parametric tests are used for hypothesis testing. If they are not, we must either normalize them using data transformation methods or use nonparametric tests instead.

• Normally, the chi-square (χ²) test or the Kolmogorov–Smirnov (K–S) test is used to determine whether the data set is normally distributed.
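To make the K–S idea concrete, the sketch below computes only the K–S statistic D, the largest gap between the empirical distribution of the sample and a normal distribution fitted to it. The function names are assumptions, and the step of comparing D with a tabulated critical value is left out. Because the mean and SD are estimated from the same sample, standard K–S critical values are only approximate (the Lilliefors variant of the test addresses this).

```python
import math
import statistics


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def ks_statistic(values):
    """Largest distance between the empirical CDF and the fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


lhc_mainland = [9.42, 13.92, 14.90, 7.18, 10.96, 31.30,
                10.52, 9.24, 34.70, 34.14, 41.42, 26.28]
d_stat = ks_statistic(lhc_mainland)
print(f"K-S statistic D = {d_stat:.3f}")
```

A small D means the sample follows the fitted normal curve closely; a D above the critical value for the chosen significance level suggests the data are not normally distributed.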

Application

We are given the percentages of live hard coral (LHC) found at 12 stations each for the mainland and the island. We are to compare the amount of live hard coral in the two categories and draw a conclusion.

First we establish our null and alternative hypotheses.

H0: “There is no significant difference between the amount of live hard corals in the mainland and island.”

HA: “There is a significant difference between the amount of live hard corals in the mainland and island.”

Here is the raw data for the experiment:

Station   Mainland LHC   Island LHC
 1             9.42         44.70
 2            13.92         30.00
 3            14.90         30.76
 4             7.18         17.02
 5            10.96         25.92
 6            31.30         30.28
 7            10.52         28.20
 8             9.24         26.06
 9            34.70         22.92
10            34.14         34.28
11            41.42         46.08
12            26.28         50.62

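Application (continued)

Since we have two independent samples of equal size ($n_M = n_I = 12$), we will use the two-sample t-test with pooled variance to test our hypotheses.

We first solve for the sample means $\bar{x}_M$ (mainland) and $\bar{x}_I$ (island):

$\bar{x}_M = 20.33, \quad \bar{x}_I = 32.24$

Then for the sample variances $s_M^2$ and $s_I^2$:

$s_M^2 = 151.57, \quad s_I^2 = 101.16$

Then we solve for the sample pooled variance:

$s_p^2 = \frac{(n_M - 1)\,s_M^2 + (n_I - 1)\,s_I^2}{n_M + n_I - 2} = \frac{11(151.57) + 11(101.16)}{22} \approx 126.37$

And the sample pooled standard deviation:

$s_p = \sqrt{s_p^2} \approx 11.24$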

Application (continued)

Now we solve for our t-value, where $\bar{x}_M$ and $\bar{x}_I$ are the mainland and island sample means and $s_p$ is the sample pooled standard deviation. Since $n_M = n_I = n = 12$:

$t = \frac{\bar{x}_I - \bar{x}_M}{s_p \sqrt{2/n}} = \frac{32.2367 - 20.3317}{11.2413 \sqrt{2/12}} \approx 2.594$

Then we check the t-table for the value associated with $\alpha = 0.05$ and $df = n_M + n_I - 2 = 22$ to get $t_{0.05,\,22} = 2.07$.

Since our computed t-value (2.594102) exceeds 2.07, we reject our null hypothesis and conclude that there is a significant difference between the means of our samples.

Therefore, at significance level α = 0.05, the island has significantly more live hard corals than the mainland.
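The calculation can be checked with a short script. This is a sketch of the pooled two-sample t-test under the equal-variance assumption; the function name `pooled_t` is hypothetical, and the critical value 2.07 is the t-table entry for 22 degrees of freedom at p = 0.05.

```python
import math
import statistics

mainland = [9.42, 13.92, 14.90, 7.18, 10.96, 31.30,
            10.52, 9.24, 34.70, 34.14, 41.42, 26.28]
island = [44.70, 30.00, 30.76, 17.02, 25.92, 30.28,
          28.20, 26.06, 22.92, 34.28, 46.08, 50.62]


def pooled_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)   # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(sample_b) - statistics.fmean(sample_a)) / std_error
    return t, df


t_value, df = pooled_t(mainland, island)
t_crit = 2.07                               # t-table: df = 22, p = 0.05
reject_h0 = abs(t_value) > t_crit

print(f"t = {t_value:.6f}, df = {df}, reject H0: {reject_h0}")
```

The comparison uses the absolute value of t because the alternative hypothesis is two-sided; the sign of t then indicates which group has the larger mean.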

Degrees of    Probability, p
Freedom        0.1    0.05    0.01   0.001

  1           6.31   12.71   63.66  636.62
  2           2.92    4.30    9.93   31.60
  3           2.35    3.18    5.84   12.92
  4           2.13    2.78    4.60    8.61
  5           2.02    2.57    4.03    6.87
  6           1.94    2.45    3.71    5.96
  7           1.89    2.37    3.50    5.41
  8           1.86    2.31    3.36    5.04
  9           1.83    2.26    3.25    4.78
 10           1.81    2.23    3.17    4.59
 11           1.80    2.20    3.11    4.44
 12           1.78    2.18    3.06    4.32
 13           1.77    2.16    3.01    4.22
 14           1.76    2.14    2.98    4.14
 15           1.75    2.13    2.95    4.07
 16           1.75    2.12    2.92    4.02
 17           1.74    2.11    2.90    3.97
 18           1.73    2.10    2.88    3.92
 19           1.73    2.09    2.86    3.88
 20           1.72    2.09    2.85    3.85
 21           1.72    2.08    2.83    3.82
 22           1.72    2.07    2.82    3.79
 23           1.71    2.07    2.81    3.77
 24           1.71    2.06    2.80    3.75
 25           1.71    2.06    2.79    3.73
 26           1.71    2.06    2.78    3.71
 27           1.70    2.05    2.77    3.69
 28           1.70    2.05    2.76    3.67
 29           1.70    2.05    2.76    3.66
 30           1.70    2.04    2.75    3.65
 40           1.68    2.02    2.70    3.55
 60           1.67    2.00    2.66    3.46
120           1.66    1.98    2.62    3.37
  ∞           1.65    1.96    2.58    3.29

Experimental Designs and Analysis of Variance

ANOVA

When a hypothesis is tested by comparing variances after partitioning, the method is called analysis of variance (ANOVA).

More specifically, the effect of a factor is considered significant if the variance among treatments is significantly higher than the variance among the replicates within treatments.


Completely Randomized Design (CRD)

Used to study the effects of one factor, i.e. treatment or fixed factor, keeping others constant; therefore, it is often called a single-factor experiment.

All of the experimental units should be uniform, and the types of selected factors (treatments) are randomly assigned to the experimental units.

Randomization can be done by using a lottery system, random numbers or a random-number table, or any other method.

Before randomizing, we need to determine the required total number of experimental units (n).

If there are “t” different treatments of a single factor and the treatments are replicated “r” times, then:

Total experimental units (n) = t × r
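A minimal sketch of the randomization step, assuming a software random-number generator in place of a lottery or random-number table. The function name `crd_layout` and the treatment labels are hypothetical.

```python
import random


def crd_layout(treatments, replications, seed=None):
    """Randomly assign each treatment to `replications` experimental units.

    Returns a list of length t * r; position i holds the treatment
    assigned to experimental unit i + 1.
    """
    rng = random.Random(seed)
    # Build the full list of t * r treatment labels, then shuffle it.
    units = [trt for trt in treatments for _ in range(replications)]
    rng.shuffle(units)
    return units


treatments = ["T1", "T2", "T3", "T4"]
layout = crd_layout(treatments, replications=3, seed=1)

for unit, trt in enumerate(layout, start=1):
    print(f"unit {unit:2d}: {trt}")
```

Shuffling the complete list of labels guarantees that every treatment receives exactly r units, whereas drawing a treatment independently for each unit would not.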


Treatment combinations or experimental design for CRD.


The following equation represents the mathematical model for CRD:
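As a sketch in standard textbook notation (the symbols are assumptions and may differ from those on the original slide), the linear model usually written for a CRD with t treatments and r replications is:

```latex
Y_{ij} = \mu + \tau_i + \varepsilon_{ij},
\qquad i = 1, \dots, t, \quad j = 1, \dots, r
```

Here $Y_{ij}$ is the observation from the jth replicate of the ith treatment, $\mu$ is the overall mean, $\tau_i$ is the effect of the ith treatment, and $\varepsilon_{ij}$ is the random error of that experimental unit.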

ANOVA table for CRD.

1. Group the data by treatments and calculate the treatment totals (T), grand total (G), grand mean, and coefficient of variation (CV).

2. Using the number of treatments (t) and the number of replications (r), determine the df for each source of variation.

3. Construct an outline/table of ANOVA as shown.

4. Using X_i to represent the measurement of the ith plot, T_i as the total of the ith treatment, and n as the total number of experimental plots (i.e. n = rt), calculate the correction factor (CF) and the various sums of squares (SS).

5. Calculate the mean square (MS) for each source of variation by dividing each SS by its corresponding df.

6. Calculate the F-value (named for R.A. Fisher) for testing the significance of the treatment difference, i.e. the treatment mean square divided by the error mean square (F = MST/MSE).

7. Enter all of the computed values in the ANOVA table.

8. Obtain the tabular F-values using:

f1 = treatment df = (t − 1)

f2 = error df = t(r − 1)

and compare them with the computed F-value to reach a conclusion.
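The steps can be sketched as follows. The formulas CF = G²/n, total SS = ΣX² − CF, and treatment SS = ΣT²/r − CF are the usual textbook forms and are assumed here, since the slide's own equations are not shown. The data are a made-up example, not taken from the study, and `crd_anova` is a hypothetical name.

```python
def crd_anova(groups):
    """One-way ANOVA for a CRD with equal replication.

    `groups` is a list of t lists, each holding r measurements.
    """
    t = len(groups)
    r = len(groups[0])
    n = t * r

    grand_total = sum(sum(g) for g in groups)           # G
    cf = grand_total ** 2 / n                           # correction factor

    total_ss = sum(x * x for g in groups for x in g) - cf
    treatment_ss = sum(sum(g) ** 2 for g in groups) / r - cf
    error_ss = total_ss - treatment_ss

    df_treatment = t - 1
    df_error = t * (r - 1)

    mst = treatment_ss / df_treatment                   # treatment mean square
    mse = error_ss / df_error                           # error mean square

    return {
        "CF": cf, "total_SS": total_ss,
        "treatment_SS": treatment_ss, "error_SS": error_ss,
        "df_treatment": df_treatment, "df_error": df_error,
        "MST": mst, "MSE": mse, "F": mst / mse,
    }


# Hypothetical data: 3 treatments, 3 replicates each.
example = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = crd_anova(example)
for key, value in result.items():
    print(f"{key}: {value}")
```

The computed F is then compared with the tabular F for (f1, f2) = (t − 1, t(r − 1)) degrees of freedom, as in step 8.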

Basis of Conclusions to be Made.

Parametric test: One-way ANOVA

Nonparametric test: Kruskal-Wallis test (H-test)

Nonparametric counterparts of ANOVA are similar to the parametric tests, but they use ranks rather than the original data for analysis. Therefore, they are also called “ANOVA by ranks.”

When the samples are not from normally distributed data or the variances are heterogeneous, ranks are assigned to the observations for analysis.

Like the parametric ANOVA, the Kruskal-Wallis test only determines whether a factor has an effect; it does not compare the individual group means.

Nonparametric methods have also been developed for multiple comparisons.
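A minimal sketch of the rank-based calculation, using the common form H = 12 / (N(N + 1)) × Σ(R_i² / n_i) − 3(N + 1), where R_i is the rank sum of group i. Tied values receive the average of their ranks, but no tie correction is applied to H. The function names and the data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)

    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical data: three groups of three observations.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h_stat:.2f}")
```

H is compared with a chi-square critical value with (number of groups − 1) degrees of freedom; for very small groups, exact tables are preferred.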

Randomized Complete Block Design

The randomized complete block design (RCBD) is probably the most widely used design because, in reality, it is difficult to find entirely identical or uniform experimental units in the field of aquaculture, especially in outdoor ponds.

Some ponds are closer to, or separated by, canals, roads, shade, etc. Even when using cages, some are closer to dikes, whereas others can be far away. Similarly, a few rows of indoor tanks can be in a darker area, whereas others can be in brighter areas.


These factors can have large effects on response variables, but these effects can neither be avoided nor even minimized to negligible levels. In such cases, the only option is to separate their effects while designing the experiment by blocking.

The experimental units that are thought to be uniform are grouped into one block. Blocking reduces the random error by separating the variation among blocks from the experimental error, thereby maximizing the chance of detecting significant treatment effects.

However, care should be taken while designing the experiment. All of the treatments have to be included in each block.
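The requirement that every block contain all treatments can be sketched as a separate shuffle within each block. The function name `rcbd_layout`, the treatment labels, and the number of blocks are hypothetical.

```python
import random


def rcbd_layout(treatments, n_blocks, seed=None):
    """Randomize the order of all treatments independently within each block.

    Returns a list of blocks; each block is a list holding every
    treatment exactly once, in random order.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)   # every treatment appears in every block
        rng.shuffle(block)         # randomize positions within this block only
        layout.append(block)
    return layout


treatments = ["T1", "T2", "T3", "T4"]
blocks = rcbd_layout(treatments, n_blocks=3, seed=7)

for number, block in enumerate(blocks, start=1):
    print(f"block {number}: {block}")
```

Unlike the completely randomized design, randomization here is restricted: treatments are never shuffled across blocks, so differences between blocks (shade, distance to a dike, and so on) affect all treatments equally.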
