Fundamentals of Statistics
MEMBERS OF GROUP 1: DETYA INDRIAWAN, DIAH AULIA I, KARINA PRAVITASARI, MAS’ATUL FARHAH, TIARA ARISENDA K
Pendidikan Biologi Reguler 2013 (Biology Education, Regular Class of 2013)
Statistics? A collection of quantitative data from a sample or population.
The science that deals with the collection, tabulation, analysis, interpretation, and presentation of quantitative data.
Types of statistics
Deductive or descriptive statistics
◦ describe and analyze a complete data set
Inductive statistics
◦ deal with a limited amount of data (a sample).
◦ Conclusions about the population are therefore stated in terms of probability.
Population A population is any entire collection of people, animals, plants or things from which we may collect data.
It is the entire group we are interested in, which we wish to describe or draw conclusions about.
For each population there are many possible samples.
Sample A sample is a group of units selected from a larger group (population).
By studying the sample it is hoped to draw valid conclusions about the population.
The sample should be representative of the general population.
◦ The best way is by random sampling.
Parameter A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic.
◦ For example, the population mean is a parameter that is often used to indicate the average value of a quantity.
Inferential Statistics Statistical Inference makes use of information from a sample to draw conclusions (inferences) about the population from which the sample was taken.
Types of data Variables data
◦ quality characteristics that are measurable values.
◦ measurable and normally continuous;
◦ may take on any value - e.g. weight in kg
Attribute data ◦ quality characteristics that are observed to be either present or absent, conforming or nonconforming.
◦ countable and normally discrete; integer values - e.g. 0, 1, 5, 25, …, but never 4.65
Describing the Data Graphical:
◦ Plot or picture of a frequency distribution.
Analytical:
◦ Summarize data by computing a measure of central tendency and dispersion.
Sampling Methods
Sampling methods are methods for selecting a sample from the population:
◦ Simple random sampling - each member of the population has an equal chance of being selected for the sample.
◦ Systematic sampling - selecting every n-th member of the population arranged in a list.
◦ Stratified sampling - dividing the population into subgroups and then randomly selecting from each subgroup.
◦ Cluster sampling - groups are selected rather than individuals.
◦ Incidental or convenience sampling - taking an intact group (e.g. your own fourth grade class of pupils).
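The first four sampling methods can be sketched in Python as follows. This is only an illustration under stated assumptions: the population of 100 numbered units, the two strata, the ten "class" clusters, and all function names are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Take every step-th member of the ordered list, beginning at index start."""
    return population[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """Randomly select k_per_stratum members from each subgroup."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Select whole groups at random and keep every member of each chosen group."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population of 100 numbered units
strata = {"low": population[:50], "high": population[50:]}  # hypothetical subgroups
clusters = {f"class_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, step=10))
print(stratified_sample(strata, 3, rng))
print(cluster_sample(clusters, 2, rng))
```

Convenience sampling is omitted because it involves no selection procedure: the intact group at hand is simply used as the sample.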
Frequency Distribution
Consider the following set of data, which are the high temperatures recorded for 30 consecutive days.
We wish to summarize this data by creating a frequency distribution of the temperatures.
Data Set - High Temperatures for 30 Days
50 45 49 50 43
49 50 49 45 49
47 47 44 51 51
44 47 46 50 44
51 49 43 43 49
45 46 45 51 46
Frequency Distribution for High Temperatures
Temperature   Tally    Frequency
51            ////     4
50            ////     4
49            //////   6
48                     0
47            ///      3
46            ///      3
45            ////     4
44            ///      3
43            ///      3
N = 30
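The tally above can be reproduced with a few lines of Python. This is a minimal sketch using the 30 temperatures listed in the data set; the variable names are illustrative only.

```python
from collections import Counter

temperatures = [
    50, 45, 49, 50, 43,
    49, 50, 49, 45, 49,
    47, 47, 44, 51, 51,
    44, 47, 46, 50, 44,
    51, 49, 43, 43, 49,
    45, 46, 45, 51, 46,
]

counts = Counter(temperatures)
# List every value from highest to lowest, including values that never occur (48).
frequency_table = [
    (t, counts[t])
    for t in range(max(temperatures), min(temperatures) - 1, -1)
]

for value, freq in frequency_table:
    print(f"{value:>3}  {'/' * freq:<6}  {freq}")
print("N =", sum(freq for _, freq in frequency_table))
```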
Cumulative Frequency Distribution
A cumulative frequency distribution can be created by adding an additional column called "Cumulative Frequency."
The cumulative frequency for a given value is obtained by adding the frequency for that value to the cumulative frequency of the next lower value.
For example: The cumulative frequency for 45 is 10, which is the cumulative frequency for 44 (6) plus the frequency for 45 (4).
Finally, notice that the cumulative frequency for the highest value should be the same as the total of the frequency column.
Cumulative Frequency Distribution for High Temperatures
Temperature   Tally    Frequency   Cumulative Frequency
51            ////     4           30
50            ////     4           26
49            //////   6           22
48                     0           16
47            ///      3           16
46            ///      3           13
45            ////     4           10
44            ///      3           6
43            ///      3           3
N = 30
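The cumulative column is a running total of the frequency column, starting from the lowest temperature. A minimal sketch, assuming the frequencies from the table and using illustrative names:

```python
from itertools import accumulate

# (temperature, frequency) pairs from the table, lowest value first.
frequencies = [(43, 3), (44, 3), (45, 4), (46, 3), (47, 3),
               (48, 0), (49, 6), (50, 4), (51, 4)]

# Running total of the frequencies, in the same order.
running_totals = list(accumulate(freq for _, freq in frequencies))
cumulative_table = [
    (value, freq, cum)
    for (value, freq), cum in zip(frequencies, running_totals)
]

# Print highest value first, as in the table.
for value, freq, cum in reversed(cumulative_table):
    print(f"{value:>3}  {freq:>2}  {cum:>2}")
```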
Grouped frequency distribution
In some cases it is necessary to group the values of the data to summarize the data properly.
E.g., we wish to create a frequency distribution for the IQ scores of 30 pupils. The IQ scores range from 73 to 139.
To include these scores in a frequency distribution we would need 67 different score values (139 down to 73). This would not summarize the data very much.
To solve this problem we would group scores together and create a grouped frequency distribution.
If the data have more than 20 score values, we should create a grouped frequency distribution by grouping score values together into class intervals.
Grouped frequency Look at the following data of high temperatures for 50 days.
The highest temperature is 59 and the lowest temperature is 39.
We would have 21 temperature values.
This is greater than 20 values so we should create a grouped frequency distribution.
Data Set - High Temperatures for 50 Days
57 39 52 52 43
50 53 42 58 55
58 50 53 50 49
45 49 51 44 54
49 57 55 59 45
50 45 51 54 58
53 49 52 51 41
52 40 44 49 45
43 47 47 43 51
55 55 46 54 41
Grouped Frequency Distribution for High Temperatures
Class Interval   Tally         Interval Midpoint   Frequency
57-59            //////        58                  6
54-56            ///////       55                  7
51-53            ///////////   52                  11
48-50            /////////     49                  9
45-47            ///////       46                  7
42-44            //////        43                  6
39-41            ////          40                  4
N = 50
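The grouped table can be rebuilt from the raw data as sketched below. The class width of 3 and the starting value of 39 are read off the table; the variable names are illustrative only.

```python
from collections import Counter

temps_50 = [
    57, 39, 52, 52, 43,
    50, 53, 42, 58, 55,
    58, 50, 53, 50, 49,
    45, 49, 51, 44, 54,
    49, 57, 55, 59, 45,
    50, 45, 51, 54, 58,
    53, 49, 52, 51, 41,
    52, 40, 44, 49, 45,
    43, 47, 47, 43, 51,
    55, 55, 46, 54, 41,
]

low, width = 39, 3  # class intervals 39-41, 42-44, ..., 57-59

# Map each temperature to the index of its class interval and count.
interval_counts = Counter((t - low) // width for t in temps_50)
top = (max(temps_50) - low) // width

grouped_table = []
for k in range(top, -1, -1):  # highest interval first, as in the table
    start = low + k * width
    end = start + width - 1
    midpoint = (start + end) // 2  # exact because the width is odd
    grouped_table.append((f"{start}-{end}", midpoint, interval_counts[k]))

for label, midpoint, freq in grouped_table:
    print(f"{label}  {midpoint}  {freq}")
```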
Histograms
◦ Constructing a Histogram for Discrete Data
◦ First, determine the frequency and relative frequency of each x value.
◦ Then mark the possible x values on a horizontal scale.
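The two steps above can be sketched as a simple text histogram. This is only an illustration: the choice of the 30-day temperature data as input is an assumption, and the bars are drawn with "#" characters rather than with a plotting library.

```python
from collections import Counter

data = [
    50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
    47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
    51, 49, 43, 43, 49, 45, 46, 45, 51, 46,
]

counts = Counter(data)
n = len(data)

# Step 1: frequency and relative frequency of each possible x value.
relative = {x: counts[x] / n for x in range(min(data), max(data) + 1)}

# Step 2: one row per x value; the bar length is the frequency.
for x, rel in relative.items():
    print(f"{x} | {'#' * counts[x]:<6} {rel:.3f}")
```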
Descriptive statistics
Measures of Central Tendency
◦ Describes the center position of the data
◦ Mean, Median, Mode
Measures of Dispersion
◦ Describes the spread of the data
◦ Range, Variance, Standard deviation
Measures of central tendency: Mean
Arithmetic mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
where $x_i$ is one observation, $\sum$ means “add up what follows” and $N$ is the number of observations
So, for example, if the data are: 0, 2, 5, 9, 12, the mean is (0+2+5+9+12)/5 = 28/5 = 5.6
Median - mode
Median = the observation in the ‘middle’ of sorted data
Mode = the most frequently occurring value
Median and mode
100 91 85 84 75 72 72 69 65
Mean = 79.22
Median = 75 (the middle value of the nine sorted observations)
Mode = 72 (the only value that occurs twice)
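The three measures of central tendency for the nine scores above can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

scores = [100, 91, 85, 84, 75, 72, 72, 69, 65]

mean = statistics.mean(scores)      # sum of the scores divided by their count
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequently occurring value

print(round(mean, 2), median, mode)

# The earlier worked example: (0 + 2 + 5 + 9 + 12) / 5
small_mean = statistics.mean([0, 2, 5, 9, 12])
print(small_mean)
```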
Measures of dispersion: range
The range is calculated by taking the maximum value and subtracting the minimum value.
2 4 6 8 10 12 14
Range = 14 - 2 = 12
Measures of dispersion: variance
Calculate the deviation from the mean for every observation.
Square each deviation
Add them up and divide by the number of observations
$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Measures of dispersion: standard deviation
The standard deviation is the square root of the variance.
The variance is in “square units” so the standard deviation is in the same units as x.
$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
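The range, variance, and standard deviation can be computed step by step as described. The sketch below applies the steps to the data from the range example (2, 4, ..., 14) and divides by the number of observations, as the variance slide states; the variable names are illustrative.

```python
import math

data = [2, 4, 6, 8, 10, 12, 14]
n = len(data)
mean = sum(data) / n

data_range = max(data) - min(data)       # maximum minus minimum

deviations = [x - mean for x in data]    # deviation of every observation from the mean
squared = [d ** 2 for d in deviations]   # square each deviation
variance = sum(squared) / n              # add them up and divide by n
std_dev = math.sqrt(variance)            # square root of the variance

print(data_range, variance, std_dev)
```

Because the variance is in squared units, taking the square root returns the result to the same units as the data.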
Standard Deviation for a Sample
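General formula/ungrouped data:
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
For computation purposes:
$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}}$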
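Both forms of the sample standard deviation give the same result, which the sketch below checks numerically. The nine scores are reused from the median and mode example purely for illustration; the function names are hypothetical.

```python
import math
import statistics

def sample_sd_definition(xs):
    """General formula: squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def sample_sd_computational(xs):
    """Computational formula: needs only the sum of X and the sum of X squared."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return math.sqrt((n * total_sq - total ** 2) / (n * (n - 1)))

scores = [100, 91, 85, 84, 75, 72, 72, 69, 65]
print(sample_sd_definition(scores))
print(sample_sd_computational(scores))
print(statistics.stdev(scores))  # standard-library sample standard deviation
```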
Standard Deviation for a Sample
Grouped data:
$s = \sqrt{\frac{n \sum_{i=1}^{h} f_i X_i^2 - \left(\sum_{i=1}^{h} f_i X_i\right)^2}{n(n - 1)}}$
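As an illustration, the grouped formula can be applied to the grouped temperature table from earlier. The sketch assumes that h is the number of class intervals, X_i is the midpoint of interval i, f_i is its frequency, and n is the sum of the frequencies; the slides do not spell these out, and the function name is hypothetical.

```python
import math

def grouped_sample_sd(midpoints, freqs):
    """Sample standard deviation from class midpoints and their frequencies."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

# Midpoints and frequencies of the class intervals 57-59 down to 39-41.
midpoints = [58, 55, 52, 49, 46, 43, 40]
freqs = [6, 7, 11, 9, 7, 6, 4]

s_grouped = grouped_sample_sd(midpoints, freqs)
print(s_grouped)
```

The result treats every observation as if it sat at the midpoint of its interval, so it only approximates the standard deviation of the raw data.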
Standard deviation and curve shape
If σ is small, there is a high probability for getting a value close to the mean.
If σ is large, there is a correspondingly higher probability for getting values further away from the mean.
The Normal Curve The normal curve or the normal frequency distribution or Gaussian distribution is a hypothetical distribution that is widely used in statistical analysis.
The characteristics of the normal curve make it useful in education and in the physical and social sciences.
Characteristics of the Normal Curve
The normal curve is a symmetrical distribution of data with an equal number of data above and below the midpoint of the abscissa.
Since the distribution of data is symmetrical the mean, median, and mode are all at the same point on the abscissa.
In other words, mean = median = mode.
If we divide the distribution up into standard deviation units, a known proportion of data lies within each portion of the curve.
◦ 34.13% of the data lie between the mean (μ) and 1σ above the mean.
◦ 34.13% lie between the mean and 1σ below the mean.
◦ Approximately two-thirds (68.26%) lie within 1σ of the mean.
◦ 13.59% of the data lie between one and two standard deviations from the mean, on each side.
◦ Finally, almost all of the data (99.73%) are within 3σ of the mean.
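These proportions can be checked against the standard normal distribution with Python's `statistics.NormalDist`. This is a sketch with a hypothetical helper name; computed values may differ in the last digit from figures obtained by adding rounded percentages.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def percent_between(a, b):
    """Percentage of a normal distribution lying between a and b standard deviations."""
    return 100 * (std_normal.cdf(b) - std_normal.cdf(a))

print(f"mean to +1 sd:  {percent_between(0, 1):.2f}%")
print(f"+1 sd to +2 sd: {percent_between(1, 2):.2f}%")
print(f"within 1 sd:    {percent_between(-1, 1):.2f}%")
print(f"within 3 sd:    {percent_between(-3, 3):.2f}%")
```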
Standardized normal value, Z
When a score is expressed in standard deviation units, it is referred to as a Z-score.
A score that is one standard deviation above the mean has a Z-score of 1.
A score that is one standard deviation below the mean has a Z-score of -1.
A score that is at the mean would have a Z-score of 0.
The normal curve with Z-scores along the abscissa looks exactly like the normal curve with standard deviation units along the abscissa.
Z-value
Deviation IQ Scores, sometimes called Wechsler IQ scores, are a standard score with a mean of 100 and a standard deviation of 15.
What percentage of the general population have deviation IQs lower than 85?
z = (85 - 100) / 15 = -1, so an IQ of 85 is equivalent to a z-value of -1.
So 50 % - 34.13 % = 15.87% of the population has IQ scores lower than 85.
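The same calculation can be sketched in Python, assuming deviation IQ scores follow a normal distribution with mean 100 and standard deviation 15 as stated above. The helper name `z_score` is illustrative.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Express a score in standard deviation units."""
    return (x - mean) / sd

iq = NormalDist(mu=100, sigma=15)

z = z_score(85, 100, 15)
percent_below = 100 * iq.cdf(85)  # share of the population scoring below 85

print(z, round(percent_below, 2))
```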
Frequency Polygon A frequency polygon is what you may think of as a curve.
A frequency polygon can be created with interval or ratio data.
Let's create a frequency polygon with the data we used earlier to create a histogram.
To create a frequency polygon
◦ Arrange the values along the abscissa (horizontal axis).
◦ Arrange the lowest data on the left and the highest on the right.
◦ Add one value below the lowest data and one above the highest data.
◦ Create an ordinate (vertical axis).
◦ Arrange the frequency values along the ordinate.
◦ Provide a label for the ordinate (Frequency).
◦ Create the body of the frequency polygon by placing a dot for each value.
◦ Connect each of the dots to the next dot with a straight line.
◦ Provide a title for the frequency polygon.
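The dots of a frequency polygon can be computed as (value, frequency) pairs before any drawing is done. The sketch below assumes the 30-day temperature data as input and uses a hypothetical function name; joining consecutive points with straight lines in any plotting tool gives the polygon.

```python
from collections import Counter

def polygon_points(data):
    """(value, frequency) points, padded with one zero-frequency value at each end."""
    counts = Counter(data)
    lowest, highest = min(data), max(data)
    # One extra value below the lowest and one above the highest, both with frequency 0.
    return [(x, counts[x]) for x in range(lowest - 1, highest + 2)]

temperatures = [
    50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
    47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
    51, 49, 43, 43, 49, 45, 46, 45, 51, 46,
]

points = polygon_points(temperatures)
for value, freq in points:
    print(value, freq)
```

The padded end points bring the line down to zero on both sides, which closes the polygon on the horizontal axis.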