
Uploaded by: edwina-cunningham

Posted on 02-Jan-2016


Confidential 1

Confidential 2

Warm Up

1. Describe what information you can get about a data set by looking at a box-and-whisker plot.

Find the median, the lower quartile (LQ), and the upper quartile (UQ) for the given data.

2. 593, 588, 540, 434, 420, 398, 390, 375

Confidential 3

Warm Up

3. 40, 45, 50, 60, 65, 70, 75, 80, 85

Using the data from question 3:

4. Find the interquartile range.

5. What are the limits for outliers? Are there any outliers?

Confidential 4

Let's recap what we have learned in the last lesson.

A box-and-whisker plot is used to display a set of data.

To create this plot, we first find the median, the first (lower) quartile, and the third (upper) quartile.

Plot the given data set on a number line.

Mark the lowest and highest data points with black dots, draw a box between the quartiles with a line through the median, and connect the box to the extreme points with lines (the whiskers).

Confidential 5

Review

To find the median, first write all the numbers in ascending order.

If the number of data points is odd, the middle number is the median.

If the number of data points is even, the median is the average of the two middle numbers.

The Lower Quartile (LQ) of a data set is the median of the lower half of the ordered data.

The Upper Quartile (UQ) of a data set is the median of the upper half of the ordered data.
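The median and quartile rules above can be checked with a short Python sketch. It is a minimal illustration, not a standard library routine: the function names `median` and `quartiles` are hypothetical, and for an odd number of values it assumes the common classroom convention of leaving the median out of both halves. Other textbooks and software use slightly different quartile conventions, so their results can differ.

```python
def median(values):
    """Middle value for an odd count, mean of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (LQ, median, UQ) as the medians of the lower half, whole set, and upper half.

    Assumption: for an odd count, the middle value is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2 :]  # values above the median position
    return median(lower_half), median(ordered), median(upper_half)


print(quartiles([593, 588, 540, 434, 420, 398, 390, 375]))  # (394.0, 427.0, 564.0)
print(quartiles([40, 45, 50, 60, 65, 70, 75, 80, 85]))      # (47.5, 65, 77.5)
```

Python's built-in `statistics.quantiles` uses an interpolation method by default, so it will not always agree with the median-of-halves rule used here.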

Confidential 6

Review

A circle graph is an efficient way to present certain types of data. It shows data as percentages or fractions of a whole, so the parts should add up to 100% (or 1).

The Interquartile Range (IQR) is the difference between the Upper Quartile and the Lower Quartile (IQR = UQ - LQ).

Fences are the limits within which we accept values as typical; any data points outside these fences are treated as 'outliers'.
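A minimal sketch of the IQR and fences in Python. The slide does not say how far the fences lie from the box; this sketch assumes the widely used rule of 1.5 × IQR below the lower quartile and above the upper quartile (the parameter `k`). The helper names `fences` and `find_outliers` are hypothetical.

```python
def fences(lq, uq, k=1.5):
    """Lower and upper fences: LQ - k * IQR and UQ + k * IQR (k = 1.5 is an assumed convention)."""
    iqr = uq - lq
    return lq - k * iqr, uq + k * iqr


def find_outliers(values, lq, uq, k=1.5):
    """Return the data points that lie outside the fences."""
    low, high = fences(lq, uq, k)
    return [v for v in values if v < low or v > high]


data = [40, 45, 50, 60, 65, 70, 75, 80, 85]
lq, uq = 47.5, 77.5  # quartiles of this data by the median-of-halves method
print(uq - lq)                      # IQR: 30.0
print(fences(lq, uq))               # (2.5, 122.5)
print(find_outliers(data, lq, uq))  # []
```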

Confidential 7

Let's get started

Basic Concepts in Statistics

• In a statistical study, the objects whose characteristics are studied are called Individuals or Units.

• In any statistical study, the collection of all individuals under consideration is called the Population or Universe.

Example: In a study of the financial condition of the families of a particular tribe, the set of all families belonging to the tribe is the population, and the families are the individuals.

• A Parameter is a summary description of a particular aspect of the entire population. Example: the mean age of the citizens of a country.

Confidential 8

Basic Concepts in Statistics

• The study of characteristics of individuals of a population by using statistical devices and techniques is called Statistical Investigation or Statistical Enquiry. The person who conducts the statistical investigation is called Investigator.

• For a statistical study, the investigator collects the information about the individuals of the population. The persons who supply information are Informants.

• The investigator may collect the information directly from the informants or through agents. The agents who collect the information and hand it over to the investigator are called Enumerators.

Confidential 9

Statistical Survey

• Statistical Survey is the process of collecting the information from the informants.

• For a statistical investigation of a population, an investigator may collect information from each and every individual in the population, or only from selected representative individuals. The group of representative individuals from whom information is collected is called a Sample. Thus, a Sample is a representative portion of the population.

• The number of individuals in a sample is called Sample Size.

Confidential 10

• The purpose of working with a sample is that it lets us study a large population and draw important inferences about it without the trouble of collecting data from every member of the entire population.

Example: In a study of the health of all infants born in the UK in the 1990s, all babies born on 10th October in any of those years form a sample.

Confidential 11

Sample Survey and Census Enumeration

• A statistical survey that uses a sample is called a Sample Survey.

• A survey that covers the whole population is called a Census Enumeration.

• The census method is costly and takes much time and labor because it includes all the individuals, but its results are accurate and free of sampling error.

Confidential 12

• A sample survey is scientific in nature. A well-planned sample survey can give results as valid as those of the census method, while being cheaper and taking less time and labor.

• The census method is preferred when the population is very small, and a sample survey is preferred when the population is very large.

• Example: The population census of China, held every 10 years, is a census enumeration. A study of the average height of grade 8 girls in different schools of the city is a sample survey.

Confidential 13

Sampling

One important point in working with samples is the selection of a truly representative sample. The process of selecting individual items for observation so that they accurately represent the larger population is called Sampling. Since the validity of an investigation's results depends mainly on the selection of the sample, the sample should be obtained with the utmost care.

Sampling frame: It is the list of units comprising a population from which a sample is to be selected. If the sample is to be representative of the population, the sampling frame should include all members of the population.

Confidential 14

Example: Telephone book

Statistic: It is a summary description of a particular aspect of a sample.

Example: Mean age of the people in a sample.

Sampling Error: Whatever sampling method is adopted, the selected sample is likely to differ slightly in structure from the population. This difference leads to an error when the sample is used to estimate population parameters. This error is called sampling error.
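To make the difference between a parameter, a statistic, and the sampling error concrete, here is a small Python simulation. The population of ages is made-up illustrative data generated with a fixed seed, and the sample size of 50 is arbitrary; the point is only that the sample mean usually differs a little from the population mean.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (illustrative data, fixed seed)
rng = random.Random(42)
population = [rng.randint(18, 80) for _ in range(1000)]

parameter = statistics.mean(population)  # summary of the whole population
sample = rng.sample(population, 50)      # a simple random sample of size 50
statistic = statistics.mean(sample)      # the same summary, computed on the sample
sampling_error = statistic - parameter   # how far the estimate is from the truth

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
print(f"sampling error:              {sampling_error:+.2f}")
```

Changing the seed or the sample size gives a different sampling error each time; larger samples tend to give smaller errors.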

Confidential 15

Random Sampling

Sampling in which a sample is selected from a population based on the principle of randomization, or chance, is called Probability Sampling or Random Sampling.

Sampling that relies on volunteers, easily available units, or units that just happen to be present when the research is done is called Non-Probability Sampling or Non-Random Sampling. Non-probability samples are useful for quick and cheap studies, case studies, qualitative research, pilot studies, and developing hypotheses for future research.

Confidential 16

There are several different ways in which a probability sample can be selected.

The most common probability sampling methods are,

1. Simple Random Sampling

2. Systematic Sampling

3. Stratified Sampling

4. Cluster Sampling

Confidential 17

Simple Random Sampling

In simple random sampling, each member of the population has an equal chance of being included in the sample. Also, each combination of members of the population has an equal chance of composing the sample. These two properties define simple random sampling. To select a simple random sample, you need a list of all the units in the survey population.

Confidential 18

Generally, a simple random sample is obtained either by the Lottery Method or by using a Table of Random Numbers.

Lottery Method:

Consider a population of 1000 units from which a sample of size 100 is required. First, number the 1000 units from 1 to 1000 and write each number on an identical chit. Put the 1000 chits in a box, shake the box, and then, without looking at the numbers, draw 100 chits from the box. The 100 units whose numbers were drawn form the sample.

Example: A lottery draw is a good example of simple random sampling.
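The lottery method can be simulated in Python as a minimal sketch: the shuffled list plays the role of the shaken box of chits, and taking the first n entries is the same as drawing n chits without looking. The function name `lottery_sample` and the `seed` parameter are illustrative assumptions.

```python
import random


def lottery_sample(population_size, sample_size, seed=None):
    """Simulate the lottery method: one chit per unit, shake the box, draw without looking."""
    rng = random.Random(seed)
    chits = list(range(1, population_size + 1))  # chits numbered 1 to N
    rng.shuffle(chits)                           # "shake the box"
    return sorted(chits[:sample_size])           # draw n chits; no chit can be drawn twice


sample = lottery_sample(1000, 100, seed=7)
print(len(sample), sample[:10])
```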

Confidential 19

Using table of random numbers for Simple Random Sampling

A Table of Random Numbers is a tabular arrangement of randomly selected digits. The digit in each position of the table was originally chosen from the digits 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 by a random process in which each digit is equally likely to be chosen.

To select a sample by this method, first number the units. Then read the required number of random numbers from the table in an orderly way, dropping unwanted numbers (for example, numbers that are out of range or repeated). The units with these numbers form the sample.

Confidential 20

Using table of random numbers for Simple Random Sampling - Examples

First, let us assume a random number table with only 10 five-digit numbers, as shown below.

Table 1: Random Digits

12429 63527 74608 01549 00793

28354 61218 95782 63940 58128

Table 2: Frequency of Occurrence of Each Digit in Table 1

Digit : 1 2 3 4 5 6 7 8 9 0

Frequency : 5 7 4 5 5 4 4 6 5 5
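As a quick check of Table 2, the digit counts can be recomputed from Table 1 in Python. With only 50 digits the counts are not exactly equal (each would be 5 if perfectly even), which is what we would expect from a chance process.

```python
from collections import Counter

TABLE_1 = "12429 63527 74608 01549 00793 28354 61218 95782 63940 58128"

digits = [ch for ch in TABLE_1 if ch.isdigit()]  # drop the spaces between groups
freq = Counter(digits)

for d in "1234567890":
    print(d, freq[d])
print("total digits:", len(digits))  # 50
```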

Let us now learn to use the tables to solve an example

Confidential 21

Example 1: Obtain a random sample of 4 units from a population of 8, using the random digits in Table 1.

Let us simply read random digits, ignoring those that are out of range (0 or 9) or that repeat, until we have four of them. Going from left to right across the top row of Table 1, we get:

1 2 4 [2] [9] 6 3 5 ...

(Numbers within square brackets are either repeats of previously appearing numbers or out of range.) Taking the first four usable numbers, we get:

Random sample: 1, 2, 4, 6

Probability of each unit being selected = (sample size n ÷ population size N) × 100% = (4 ÷ 8) × 100% = 50%
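The reading rule in Example 1 can be written as a short Python function. This sketch assumes a population of at most 9 units, so each single digit is one candidate unit number; larger populations would need the digits read in groups (for example, two at a time for up to 99 units). The function name is hypothetical.

```python
TABLE_1 = "12429 63527 74608 01549 00793 28354 61218 95782 63940 58128"


def sample_from_digit_table(table, population_size, sample_size):
    """Read single digits left to right, skipping out-of-range or repeated ones.

    Assumes population_size <= 9, so each digit is one candidate unit number.
    """
    chosen = []
    for ch in table:
        if not ch.isdigit():
            continue  # skip the spaces between five-digit groups
        unit = int(ch)
        if 1 <= unit <= population_size and unit not in chosen:
            chosen.append(unit)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("the table ran out of usable digits")


print(sample_from_digit_table(TABLE_1, population_size=8, sample_size=4))  # [1, 2, 4, 6]
```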

Confidential 22

Using table of random numbers for Simple Random Sampling - Examples

Example 2:

To draw a simple random sample from a telephone book, each entry would need to be numbered sequentially. If there were 10,000 entries in the telephone book and the sample size were 2,000, then 2,000 different numbers between 1 and 10,000 would need to be randomly generated by a computer.

Each number has the same chance of being generated by the computer.

The 2,000 telephone entries corresponding to the 2,000 computer-generated random numbers would make up the sample.
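The computer step in Example 2 can be sketched with Python's `random.sample`, which draws distinct values so that no entry is picked twice. The list of entries is a hypothetical stand-in for a real telephone book, and the seed is fixed only so the run is repeatable.

```python
import random

# Hypothetical stand-in for a telephone book whose entries are numbered 1 to 10,000
entries = [f"entry {i}" for i in range(1, 10_001)]

rng = random.Random(2016)  # fixed seed only so the run is repeatable
picked_numbers = rng.sample(range(1, len(entries) + 1), 2000)  # 2,000 distinct numbers
sample = [entries[number - 1] for number in picked_numbers]    # entry numbers start at 1

print(len(sample), sample[:3])
```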

Confidential 23

Systematic Sampling

Systematic sampling means that there is a gap, or interval, between each selected unit in the sample. Therefore, it is sometimes also called interval sampling.

Example: Selection of a systematic sample of size 100 from a population of 400 units.

In order to select a systematic sample, you need to follow these steps.

1. Number the units on your frame from 1 to N (where N is the total population size). Here N = 400.

2. Determine the sampling interval (K) by dividing the number of units in the population by the desired sample size (n). Here n = 100.

Sampling Interval, K = N ÷ n = 400 ÷ 100 = 4.

Since K = 4, you will need to select one unit out of every four units to end up with a total of 100 units in your sample.

Confidential 24

Systematic Sampling

3. Select a number between 1 and K at random. This number is called the random start (a), and it is the first number in the sample.

Let a = 3. Then the systematic sample is formed by selecting the units numbered a, a + K, a + 2K, ..., a + 99K,

i.e., the units numbered 3, 7, 11, 15, ..., 399. In the same way, there are only four possible samples that can be selected, corresponding to the four possible random starts. They are as follows.

• 1, 5, 9, 13, ..., 393, 397

• 2, 6, 10, 14, ..., 394, 398

• 3, 7, 11, 15, ..., 395, 399

• 4, 8, 12, 16, ..., 396, 400
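The steps above can be sketched in Python as follows. This is a minimal illustration that assumes N is an exact multiple of n (as with 400 and 100); the function name `systematic_sample` is hypothetical.

```python
def systematic_sample(population_size, sample_size, random_start):
    """Return unit numbers a, a + K, a + 2K, ... with K = N / n."""
    if population_size % sample_size != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    k = population_size // sample_size  # sampling interval K
    if not 1 <= random_start <= k:
        raise ValueError("the random start must be between 1 and K")
    return [random_start + i * k for i in range(sample_size)]


# The four possible samples when N = 400 and n = 100 (so K = 4)
for a in range(1, 5):
    units = systematic_sample(400, 100, a)
    print(a, units[:4], "...", units[-2:])
```

Looping over the random starts 1 to 4 lists the four possible samples given above.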

Confidential 25

Systematic Sampling - example

Example: A college with an enrolment of 10,000 students wants to conduct a survey on student housing. Obtain a systematic sample of 500 students.

First, determine the sampling interval (K):

Sampling interval, K = Total population ÷ sample size

K = 10,000 ÷ 500 = 20

Confidential 26

To begin systematic sampling,

1. Let us assign sequential numbers to all the students.

2. Choose a starting point by selecting a random number between 1 and 20. For example, let the random start be 9. Then the 9th student on the list is the first member of the sample, followed by every 20th student thereafter.

3. One of the systematic samples of students would be those with student numbers 9, 29, 49, 69, ..., 9,929, 9,949, 9,969, and 9,989.
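The college example can be reproduced in a few lines of Python, assuming the students are numbered 1 to 10,000 as in step 1. Here the random start is drawn with `random.randint(1, K)`, which includes both ends, so it can be any number from 1 to 20.

```python
import random

N, n = 10_000, 500
K = N // n                             # sampling interval: 10,000 / 500 = 20
start = random.randint(1, K)           # random start; randint includes both 1 and K
sample = list(range(start, N + 1, K))  # start, start + K, start + 2K, ...
print(K, start, len(sample))           # K is 20 and the sample always has 500 students

example = list(range(9, N + 1, K))     # the worked example's random start of 9
print(example[:4], example[-4:])       # [9, 29, 49, 69] [9929, 9949, 9969, 9989]
```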

Confidential 27

Stratified Sampling

• In this method of sampling, the population is split into homogeneous groups called Strata. Then an appropriate number of units is randomly selected from each stratum to form the sample.

• The sampling method can vary from one stratum to another. When simple random sampling is used to select the sample within each stratum, the sample design is called stratified simple random sampling.

• This method of sampling is adopted when the population can be split into groups of units that are homogeneous with regard to some characteristic.

Confidential 28

The most important merit of this method is that the sample includes units from every stratum, so all the categories are represented.

Example: To obtain a sample from the population of students in a college, the students of each class may be treated as a stratum, and some students may be randomly selected from each stratum (class) to form the sample.
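A minimal Python sketch of stratified simple random sampling. The slides only say that an "appropriate number" of units is taken from each stratum; this sketch assumes proportional allocation (each stratum's share of the sample matches its share of the population, rounded), and the class names, class sizes, and student IDs are hypothetical.

```python
import random


def stratified_sample(strata, total_sample_size, seed=None):
    """Simple random sample inside each stratum, sized in proportion to the stratum.

    Proportional allocation is an assumption; because of rounding, the total can
    differ slightly from total_sample_size for some stratum sizes.
    """
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        share = round(total_sample_size * len(units) / population_size)
        share = min(len(units), max(1, share))  # keep every stratum represented
        sample[name] = rng.sample(units, share)
    return sample


# Hypothetical college: student IDs grouped by class (each class is a stratum)
strata = {
    "Year 1": [f"Y1-{i}" for i in range(1, 401)],
    "Year 2": [f"Y2-{i}" for i in range(1, 301)],
    "Year 3": [f"Y3-{i}" for i in range(1, 201)],
    "Year 4": [f"Y4-{i}" for i in range(1, 101)],
}

result = stratified_sample(strata, total_sample_size=50, seed=1)
for name, chosen in result.items():
    print(name, len(chosen))  # 20, 15, 10 and 5 students respectively
```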

Confidential 29

Cluster Sampling

• Cluster sampling divides the population into groups, or clusters. A number of clusters are selected randomly to represent the total population, and then all units within the selected clusters are included in the sample. No units from non-selected clusters are included in the sample.

• This method is adopted when it is too expensive to spread a sample across the population as a whole. Travel costs can become expensive if interviewers have to survey people from one end of the country to the other. To reduce costs, statisticians may choose a cluster sampling technique. Another reason is that sometimes a list of all units in the population is not available, while a list of all clusters is either available or easy to create.

Example: For a study of the standard of living of bank employees in a city, the bank offices in the city may be treated as clusters of employees. Some offices are selected at random, and all the employees in the selected offices are included in the sample.
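Cluster sampling can be sketched in Python by choosing whole clusters at random and then keeping every unit inside them. The office names and employees below are hypothetical, and `cluster_sample` is an illustrative helper, not a library function.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Pick whole clusters at random, then include every unit in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # names of the selected clusters
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical bank offices (clusters) and their employees (units)
offices = {
    "Office A": ["A1", "A2", "A3"],
    "Office B": ["B1", "B2", "B3", "B4", "B5"],
    "Office C": ["C1", "C2"],
    "Office D": ["D1", "D2", "D3", "D4"],
}

sample = cluster_sample(offices, num_clusters=2, seed=3)
print(sorted(sample))
print("sample size:", sum(len(staff) for staff in sample.values()))
```

Because the offices differ in size, the total sample size depends on which offices happen to be drawn, which is one of the drawbacks of cluster sampling.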

Confidential 30

Drawbacks of Cluster Sampling

Disadvantages of Cluster Sampling:

1. Loss of efficiency when compared with simple random sampling. Surveying a large number of small clusters is better than surveying a small number of large clusters. This is because neighbouring units tend to be more alike, resulting in a sample that does not represent the whole spectrum of opinions or situations present in the overall population.

2. One does not have full control over the final sample size. In the given example, the bank offices in the city do not all have the same number of employees, and all the employees in each selected office must be interviewed, so the final sample size may be smaller or larger than expected.

Confidential 31

Your Turn

Explain the following and give examples.

1. Population

2. Sample

3. Sample Size

4. Parameter

Confidential 32

Your Turn

5. Distinguish between Census Enumeration and Sample Survey.

6. What do you mean by Random Sampling? What are its relative merits?

7. Do any of the following use simple random sampling? Briefly explain how each example uses the sampling method.

a) Census

b) Bingo game

Confidential 33

Your Turn

8. Imagine that a local clothing manufacturer has 2,700 employees. The personnel manager decides to ask the employees for suggestions on how to improve their workplace. It would take too long to survey everyone, so the manager chooses to systematically sample 300 of the employees.

a) What would be the sampling interval?

b) If 6 were your first randomly drawn number, what would be the first 8 numbers of your sample?

Confidential 34

Your Turn

9. Explain systematic sampling with an example.

10. What are the disadvantages of non-probability samples?

Confidential 35

Refreshment time

Confidential 37

1. New Horizon Academy has been given a sizeable grant: enough to build either a new playground or a swimming pool. But, as there is only enough money to build one facility, the principal wants to ask her students which one they feel is more needed. The table below shows the number of students by sex, per grade, from Kindergarten to Grade 7.

Confidential 38

a) What is the total population of the academy?

b) The principal wants to sample 50% of the students. How many students would this be?

Confidential 39

2. Which sampling method can be adopted in the following case and what are its benefits?

Suppose a farmer wishes to work out the average milk yield of each cow type in his herd which consists of Ayrshire, Friesian, Galloway and Jersey cows.

Confidential 40

3. When is cluster sampling preferred?


Confidential 41

Let's summarize what we have learnt today

1) Sampling is the art of learning about a very large group of people by getting information from a small set of people.

2) Population is the entire set of individuals, events, or units with specified characteristics.

3) Parameter is a summary description of a particular aspect of the entire population.

4) Sample is the subset of the population from which data is collected and used as a basis for making statements about the entire population.

Confidential 42

Let's summarize what we have learnt today

5) Statistic is a summary description of a particular aspect of a sample. Statistics are used to describe samples and to estimate population parameters.

6) Census is a survey that includes the entire population; it is very expensive and time-consuming.

7) Sampling frame is a list of units comprising a population from which a sample is to be selected.

Confidential 43

8) In Probability or Random sampling, every unit of the population of interest must be identified, and all units must have a known, non-zero chance of being selected into the sample.

9) Non-probability samples focus on volunteers, easily available units, or those that just happen to be present when the research is done.

10) In Simple Random Sampling, each individual is chosen entirely by chance, and each member of the population has an equal chance of being included in the sample.

Confidential 44

Let's summarize what we have learnt today

11) In Systematic Sampling, there is a gap, or interval, between each selected unit in the sample.

12) A Stratified Sample is obtained by taking samples from each stratum or sub-group of a population.

13) Cluster sampling is a sampling technique in which the entire population is divided into groups, or clusters, and a random sample of these clusters is selected.

Confidential 45

Confidential 46

3. When is cluster sampling preferred?

Answer: Cluster sampling is typically used when the researcher cannot get a complete list of the members of the population they wish to study but can get a complete list of groups, or 'clusters', of the population. It is also used when a simple random sample would produce a list of subjects so widely scattered that surveying them would be far too expensive.