Download - Fundamental of Statistic Kel. 1

Fundamentals of Statistics

MEMBERS OF GROUP 1: • DETYA INDRIAWAN • DIAH AULIA I • KARINA PRAVITASARI • MAS’ATUL FARHAH • TIARA ARISENDA K

Biology Education (Regular Program), Class of 2013

Statistics?

A collection of quantitative data from a sample or population.

The science that deals with the collection, tabulation, analysis, interpretation, and presentation of quantitative data.

Types of statistics

Deductive or descriptive statistics

◦ describe and analyze a complete data set

Inductive statistics

◦ deal with a limited amount of data (a sample).

◦ Conclusions are stated in terms of probability rather than certainty.

Population

A population is any entire collection of people, animals, plants or things from which we may collect data.

It is the entire group we are interested in, which we wish to describe or draw conclusions about.

For each population there are many possible samples.

Sample

A sample is a group of units selected from a larger group (the population).

By studying the sample it is hoped to draw valid conclusions about the population.

The sample should be representative of the general population.

◦ The best way is by random sampling.

Parameter

A parameter is a value, usually unknown (and therefore needing to be estimated), used to represent a certain population characteristic.

◦ For example, the population mean is a parameter that is often used to indicate the average value of a quantity.

Inferential Statistics

Statistical inference makes use of information from a sample to draw conclusions (inferences) about the population from which the sample was taken.

Types of data

Variables data

◦ quality characteristics that are measurable values.

◦ measurable and normally continuous;

◦ may take on any value, e.g. weight in kg

Attribute data

◦ quality characteristics that are observed to be either present or absent, conforming or nonconforming.

◦ countable and normally discrete; integer values, e.g. 0, 1, 5, 25, …, but never 4.65

Describing the Data

Graphical:

◦ Plot or picture of a frequency distribution.

Analytical:

◦ Summarize data by computing a measure of central tendency and dispersion.

Sampling Methods

Sampling methods are methods for selecting a sample from the population:

◦ Simple random sampling - every member of the population has an equal chance of being selected for the sample.

◦ Systematic sampling - selecting every n-th member of the population arranged in a list.

◦ Stratified sampling - dividing the population into subgroups and then randomly selecting from each subgroup.

◦ Cluster sampling - groups are selected rather than individuals.

◦ Incidental or convenience sampling - taking an intact group (e.g. your own fourth-grade class of pupils).
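The first three sampling methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered members, the two strata, the sample sizes, and the fixed seed are all hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Every member has an equal chance of being selected."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Select every step-th member of the population arranged in a list."""
    return population[start::step]

def stratified_sample(strata, k_per_stratum, seed=0):
    """Randomly select k members from each subgroup (stratum)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: members numbered 1 to 100.
population = list(range(1, 101))

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, step=10)

# Hypothetical strata: the lower and upper halves of the list.
strata = {"low": population[:50], "high": population[50:]}
stratified = stratified_sample(strata, 5)
```

Cluster and convenience sampling are left out because they depend on how the groups are formed in practice rather than on a selection rule.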

Frequency Distribution

Consider the following set of data, which are the high temperatures recorded for 30 consecutive days.

We wish to summarize this data by creating a frequency distribution of the temperatures.

Data Set - High Temperatures for 30 Days

50 45 49 50 43

49 50 49 45 49

47 47 44 51 51

44 47 46 50 44

51 49 43 43 49

45 46 45 51 46

Frequency Distribution for High Temperatures

Temperature Tally Frequency

51 //// 4

50 //// 4

49 ////// 6

48 0

47 /// 3

46 /// 3

45 //// 4

44 /// 3

43 /// 3

N = 30

Example of Frequency Distribution
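The tallying above can be reproduced with a short Python sketch. The function name `frequency_table` is an illustrative choice; the data are the 30 temperatures listed in the data set.

```python
from collections import Counter

# High temperatures for 30 days, copied from the data set above.
temps = [50, 45, 49, 50, 43,
         49, 50, 49, 45, 49,
         47, 47, 44, 51, 51,
         44, 47, 46, 50, 44,
         51, 49, 43, 43, 49,
         45, 46, 45, 51, 46]

def frequency_table(data):
    """Return (value, frequency) pairs from the highest value down to the lowest.

    Values that never occur (such as 48 here) are kept with frequency 0.
    """
    counts = Counter(data)
    return [(v, counts[v]) for v in range(max(data), min(data) - 1, -1)]

table = frequency_table(temps)
for value, freq in table:
    # One slash per observation, mimicking the tally column.
    print(f"{value:>3}  {'/' * freq:<6}  {freq}")
print("N =", sum(freq for _, freq in table))
```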

Cumulative Frequency Distribution

A cumulative frequency distribution can be created by adding an additional column called "Cumulative Frequency."

The cum. frequency for a given value can be obtained by adding the frequency for the value to the cumulative frequency for the value below the given value.

For example: The cum. frequency for 45 is 10 which is the cum. frequency for 44 (6) plus the frequency for 45 (4).

Finally, notice that the cum. frequency for the highest value should be the same as the total of the frequency column.

Cumulative Frequency Distribution for High Temperatures

Temperature   Tally    Frequency   Cumulative Frequency
51            ////     4           30
50            ////     4           26
49            //////   6           22
48                     0           16
47            ///      3           16
46            ///      3           13
45            ////     4           10
44            ///      3           6
43            ///      3           3

N = 30
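The cumulative column is a running total of the frequency column, starting from the lowest temperature. A minimal sketch, assuming the frequencies are listed from 43 up to 51:

```python
from itertools import accumulate

# Temperatures from lowest to highest and their frequencies (from the table).
values = [43, 44, 45, 46, 47, 48, 49, 50, 51]
freqs = [3, 3, 4, 3, 3, 0, 6, 4, 4]

# Running total: each entry adds the current frequency to the total below it.
cumulative = list(accumulate(freqs))

for value, freq, cum in zip(values, freqs, cumulative):
    print(f"{value:>3}  {freq:>2}  {cum:>2}")
```

The last cumulative entry equals the total of the frequency column, which is a quick check that no observation was lost.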

Grouped frequency distribution

In some cases it is necessary to group the values of the data to summarize the data properly.

E.g., we wish to create a freq. distribution for the IQ scores of 30 pupils. The IQ scores range from 73 to 139.

To include these scores in a freq. distribution we would need 67 different score values (139 down to 73). This would not summarize the data very much.

To solve this problem we would group scores together and create a grouped freq. distribution.

If the data have more than 20 score values, we should create a grouped freq. distribution by grouping score values together into class intervals.

Grouped frequency

Look at the following data of high temperatures for 50 days.

The highest temperature is 59 and the lowest temperature is 39.

We would have 21 temperature values.

This is greater than 20 values so we should create a grouped frequency distribution.

Data Set - High Temperatures for 50 Days

57 39 52 52 43

50 53 42 58 55

58 50 53 50 49

45 49 51 44 54

49 57 55 59 45

50 45 51 54 58

53 49 52 51 41

52 40 44 49 45

43 47 47 43 51

55 55 46 54 41

Grouped Frequency Distribution for High Temperatures

Class Interval   Tally         Interval Midpoint   Frequency
57-59            //////        58                  6
54-56            ///////       55                  7
51-53            ///////////   52                  11
48-50            /////////     49                  9
45-47            ///////       46                  7
42-44            //////        43                  6
39-41            ////          40                  4

N = 50
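The grouping into class intervals can be sketched as follows. The class width of 3 and the choice to start the lowest class at the minimum value (39) are read off the table; the function name `grouped_frequency` is an illustrative assumption.

```python
from collections import Counter

# High temperatures for 50 days, copied from the data set above.
temps50 = [57, 39, 52, 52, 43, 50, 53, 42, 58, 55,
           58, 50, 53, 50, 49, 45, 49, 51, 44, 54,
           49, 57, 55, 59, 45, 50, 45, 51, 54, 58,
           53, 49, 52, 51, 41, 52, 40, 44, 49, 45,
           43, 47, 47, 43, 51, 55, 55, 46, 54, 41]

def grouped_frequency(data, width):
    """Return (lower, upper, midpoint, frequency) rows, highest class first."""
    low = min(data)
    # Class index 0 is the lowest interval, 1 the next one up, and so on.
    counts = Counter((x - low) // width for x in data)
    n_classes = (max(data) - low) // width + 1
    rows = []
    for k in reversed(range(n_classes)):
        lower = low + k * width
        upper = lower + width - 1
        rows.append((lower, upper, (lower + upper) / 2, counts[k]))
    return rows

rows = grouped_frequency(temps50, width=3)
for lower, upper, midpoint, freq in rows:
    print(f"{lower}-{upper}  midpoint {midpoint:g}  frequency {freq}")
```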

Histograms

◦ Constructing a Histogram for Discrete Data

◦ First, determine the frequency and relative frequency of each x value.

◦ Then mark the possible x values on a horizontal scale.
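The first step, finding the frequency and relative frequency of each x value, might look like the sketch below. It reuses the 30-day temperature data as an assumed example and draws a rough text histogram instead of a real plot.

```python
from collections import Counter

# Assumed example data: the 30-day high temperatures.
temps = [50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
         47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
         51, 49, 43, 43, 49, 45, 46, 45, 51, 46]

counts = Counter(temps)
n = len(temps)

# Relative frequency = frequency / total number of observations.
relative = {value: counts[value] / n for value in sorted(counts)}

# Mark every possible x value on the scale, including those with frequency 0.
for value in range(min(temps), max(temps) + 1):
    freq = counts[value]
    print(f"{value} | {'#' * freq:<6} {freq / n:.3f}")
```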

Descriptive statistics

Measures of Central Tendency

◦ Describe the center position of the data

◦ Mean, Median, Mode

Measures of Dispersion

◦ Describe the spread of the data

◦ Range, Variance, Standard deviation

Measures of central tendency: Mean

Arithmetic mean:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

where $x_i$ is one observation, $\sum$ means “add up what follows” and $N$ is the number of observations.

So, for example, if the data are 0, 2, 5, 9, 12, the mean is (0+2+5+9+12)/5 = 28/5 = 5.6

Median - mode

Median = the observation in the ‘middle’ of sorted data

Mode = the most frequently occurring value

Median and mode

100 91 85 84 75 72 72 69 65

Mean = 79.22

Median = 75

Mode = 72
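These three measures can be checked with Python's standard `statistics` module, using the nine scores from this slide:

```python
import statistics

scores = [100, 91, 85, 84, 75, 72, 72, 69, 65]

mean = statistics.mean(scores)      # sum of the scores divided by 9
median = statistics.median(scores)  # middle (5th) value of the sorted scores
mode = statistics.mode(scores)      # most frequently occurring score

print(f"Mean = {mean:.2f}, Median = {median}, Mode = {mode}")
```

Because 100 pulls the mean upward, the mean is larger than the median here, which is one reason to report more than one measure of central tendency.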

Measures of dispersion: range

The range is calculated by taking the maximum value and subtracting the minimum value.

2 4 6 8 10 12 14

Range = 14 - 2 = 12

Measures of dispersion: variance

Calculate the deviation from the mean for every observation.

Square each deviation

Add them up and divide by the number of observations

$$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}$$

where $\mu$ is the mean of the $n$ observations.

Measures of dispersion: standard deviation

The standard deviation is the square root of the variance.

The variance is in “square units” so the standard deviation is in the same units as x.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}}$$
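The three steps for the variance, followed by the square root for the standard deviation, can be written out directly. As an assumption for illustration, the sketch reuses the values 2, 4, ..., 14 from the range example; the function names are hypothetical.

```python
import math
import statistics

def population_variance(xs):
    """Average of the squared deviations from the mean (divide by n)."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = [(x - mean) ** 2 for x in xs]
    return sum(squared_deviations) / n

def population_sd(xs):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(population_variance(xs))

data = [2, 4, 6, 8, 10, 12, 14]
variance = population_variance(data)
sd = population_sd(data)

# Cross-check against the standard library's population variance.
assert math.isclose(variance, statistics.pvariance(data))
print(variance, sd)
```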

Standard Deviation for a Sample

General formula/ungrouped data:

$$s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$$

For computation purposes:

$$s = \sqrt{\frac{n\sum_{i=1}^{n}X_i^2 - \left(\sum_{i=1}^{n}X_i\right)^2}{n(n-1)}}$$
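Both forms give the same result, which a short sketch can confirm. The data 0, 2, 5, 9, 12 are borrowed from the mean example as an assumption; the function names are illustrative.

```python
import math
import statistics

def sample_sd_definition(xs):
    """General formula: squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def sample_sd_computational(xs):
    """Computational formula: needs only the sum and the sum of squares."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return math.sqrt((n * total_sq - total ** 2) / (n * (n - 1)))

data = [0, 2, 5, 9, 12]
s_def = sample_sd_definition(data)
s_comp = sample_sd_computational(data)
print(s_def, s_comp)
```

The computational form avoids computing the mean first, which was convenient for hand calculation; with floating-point numbers it can lose precision when the values are large relative to their spread.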

Standard Deviation for a Sample

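Grouped data:

$$s = \sqrt{\frac{n\sum_{i=1}^{h} f_i X_i^2 - \left(\sum_{i=1}^{h} f_i X_i\right)^2}{n(n-1)}}$$

where $f_i$ is the frequency and $X_i$ the midpoint of class $i$, $h$ is the number of classes, and $n = \sum_{i=1}^{h} f_i$.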
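As a sketch of the grouped-data formula, the midpoints and frequencies below are taken from the grouped frequency distribution for the 50-day temperatures. Treating every observation as if it sat at its class midpoint is the approximation built into the formula; the function name is hypothetical.

```python
import math

# Interval midpoints and frequencies from the grouped temperature table.
midpoints = [58, 55, 52, 49, 46, 43, 40]
freqs = [6, 7, 11, 9, 7, 6, 4]

def grouped_sample_sd(midpoints, freqs):
    """Sample standard deviation from class midpoints and class frequencies."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

s_grouped = grouped_sample_sd(midpoints, freqs)
print(round(s_grouped, 2))
```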

Standard deviation and curve shape

If σ is small, there is a high probability for getting a value close to the mean.

If σ is large, there is a correspondingly higher probability for getting values further away from the mean.

The Normal Curve

The normal curve, also called the normal frequency distribution or Gaussian distribution, is a hypothetical distribution that is widely used in statistical analysis.

The characteristics of the normal curve make it useful in education and in the physical and social sciences.

Characteristics of the Normal Curve

The normal curve is a symmetrical distribution of data with an equal number of data above and below the midpoint of the abscissa.

Since the distribution of data is symmetrical the mean, median, and mode are all at the same point on the abscissa.

In other words, mean = median = mode.

If we divide the distribution up into standard deviation units, a known proportion of data lies within each portion of the curve.

◦ 34.13% of the data lie between the mean (μ) and 1σ above the mean.

◦ 34.13% lie between the mean and 1σ below the mean.

◦ Approximately two-thirds (68.26%) lie within 1σ of the mean.

◦ 13.59% of the data lie between one and two standard deviations from the mean on each side.

◦ Finally, almost all of the data (99.74%) are within 3σ of the mean.
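These proportions can be checked against the standard normal distribution with `statistics.NormalDist` from the Python standard library. The helper name `percent_between` is illustrative. Exact values can differ in the last digit from figures obtained by adding rounded table entries.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def percent_between(a, b):
    """Percentage of a normal population lying between a and b standard deviations."""
    return 100 * (z.cdf(b) - z.cdf(a))

mean_to_plus_1 = percent_between(0, 1)
plus_1_to_plus_2 = percent_between(1, 2)
within_1 = percent_between(-1, 1)
within_3 = percent_between(-3, 3)

print(f"{mean_to_plus_1:.2f}% {plus_1_to_plus_2:.2f}% {within_1:.2f}% {within_3:.2f}%")
```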

Standardized normal value, Z

When a score is expressed in standard deviation units, it is referred to as a Z-score.

A score that is one standard deviation above the mean has a Z-score of 1.

A score that is one standard deviation below the mean has a Z-score of -1.

A score that is at the mean would have a Z-score of 0.

The normal curve with Z-scores along the abscissa looks exactly like the normal curve with standard deviation units along the abscissa.

Z-value

Deviation IQ Scores, sometimes called Wechsler IQ scores, are a standard score with a mean of 100 and a standard deviation of 15.

What percentage of the general population have deviation IQs lower than 85?

$Z = (X - \mu)/\sigma = (85 - 100)/15 = -1$, so an IQ of 85 is equivalent to a z-value of -1.

So 50 % - 34.13 % = 15.87% of the population has IQ scores lower than 85.
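The same calculation as a short sketch: convert the score to a Z-value, then ask the normal distribution for the proportion below it. The function name `z_score` is an illustrative choice.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Express a score in standard deviation units."""
    return (x - mean) / sd

iq_mean, iq_sd = 100, 15  # deviation IQ scale, as stated above

z = z_score(85, iq_mean, iq_sd)

# Proportion of the population scoring below 85 on a normal curve.
percent_below = 100 * NormalDist(iq_mean, iq_sd).cdf(85)

print(f"z = {z}, {percent_below:.2f}% score lower than 85")
```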

Frequency Polygon

A frequency polygon is what you may think of as a curve.

A frequency polygon can be created with interval or ratio data.

Let's create a frequency polygon with the data we used earlier to create a histogram.

To create a frequency polygon

◦ Arrange the values along the abscissa (horizontal axis).

◦ Arrange the lowest data on the left and the highest on the right.

◦ Add one value below the lowest data and one above the highest data.

◦ Create an ordinate (vertical axis).

◦ Arrange the frequency values along the ordinate.

◦ Provide a label for the ordinate (Frequency).

◦ Create the body of the frequency polygon by placing a dot for each value.

◦ Connect each of the dots to the next dot with a straight line.

◦ Provide a title for the frequency polygon.
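The steps above amount to building a list of (value, frequency) points, padded with a zero-frequency point at each end, and joining them in order. The slides do not show which data set was plotted, so the sketch below assumes the 30-day temperature frequencies; the function name is hypothetical and no plotting library is used.

```python
# Assumed example: values and frequencies from the 30-day temperature table.
values = [43, 44, 45, 46, 47, 48, 49, 50, 51]
freqs = [3, 3, 4, 3, 3, 0, 6, 4, 4]

def polygon_points(values, freqs):
    """Return the dots of a frequency polygon, lowest value first.

    One extra value is added below the lowest and above the highest data
    value, each with frequency 0, so the polygon closes on the abscissa.
    """
    step = values[1] - values[0]
    xs = [values[0] - step] + list(values) + [values[-1] + step]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

points = polygon_points(values, freqs)

# Each consecutive pair of dots is one straight line segment of the polygon.
segments = list(zip(points, points[1:]))
print(points)
```

The points can be passed to any plotting tool, with the values on the horizontal axis and the frequencies on the vertical axis.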

