8/8/2019 Sampling Methods and Survey Types
http://slidepdf.com/reader/full/sampling-methods-and-survey-types 1/14
Sampling Methods and Survey Types:
One of the world's best-known polling organisations, Gallup, say that one of the most
frequently asked questions they get from Americans is why they've never been
interviewed for a survey.
In an adult population of almost two hundred million, Americans express scepticism
about the scientific reliability of sampling. In particular, they do not believe that a
survey of 1500 - 2000 people can represent the views of all citizens.
Gallup's sampling principle is that selecting a sample of a small proportion of the whole
population can represent the opinions of all the people, provided that the sample is
properly selected.
• So how do Gallup select a sample?
Firstly, they have to locate a place where all or most Americans can be found. This
isn't in the shopping mall, but at home. From the 1930s to mid 1980s, poll
respondents were interviewed face-to-face in their homes. But by the 1990s, with approximately 95% of all U.S. homes having a telephone, the vast majority of
surveys use this medium. Of course, this has the benefit of being a substantially less
expensive method.
• Identifying and describing the population.
Gallup is often asked to carry out polls on behalf of an organisation with the aim of
learning more about the population's attitudes and beliefs. Let's imagine that an
American national newspaper wants a poll done about U.S. golf fans; the target
population may be all Americans aged at least 18 who say that they're fans of golf.
But if the poll were conducted on behalf of the U.S. PGA (Professional Golfers'
Association), the target audience might be more specific; for instance, all people
over the age of 16 who watch at least 5 hours of golf (during the major tournaments) each week. Two surveys about the same sport, including many of the
same target respondents, but with very different sample populations.
• Choosing a method to sample the target population randomly.
The polling organisations have lists of all household telephone numbers in
continental USA. A computerised system uses random digit dialling (RDD) to create
a new list of all possible American telephone numbers, then selects a subset of
numbers from that new list for the polling organisation to call. This is important
because approximately 30% of American residential numbers are unlisted, according
to recent estimates. The exclusion of these "hidden" numbers would introduce bias
into the sample.
• Sample Accuracy.
With a sample size of 1000 adults, using the random selection process outlined
above, Gallup can be statistically certain that 95 times out of one hundred,
continued polling would produce the same result within a margin of error of +/- 3%.
If the sample size was doubled to 2000 adults, Gallup would incur roughly twice the
cost in conducting the survey, but the margin of error would decrease only to +/-
2%.
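The figures quoted above are consistent with the usual formula for the 95% margin of error of a proportion. The sketch below is illustrative only: it assumes a simple random sample, the worst-case proportion p = 0.5 and a z-value of 1.96, none of which is stated in the text, and the function name is made up for this example.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample of size n."""
    # p = 0.5 is an assumed worst case; z = 1.96 is the usual 95% value
    return z * math.sqrt(p * (1 - p) / n)

for n in (1000, 2000):
    # Express the result in percentage points
    print(n, round(100 * margin_of_error(n), 1))
```

Under these assumptions the margin is roughly 3 percentage points for 1000 respondents and a little over 2 for 2000. Doubling the sample divides the margin by the square root of 2, not by 2.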
• Interviewing the selected sample.
What if the people randomly selected to survey are not in?
What if some of the target population are busy on other phone calls when the
pollsters call? In these cases the target respondent's phone number is stored and
recalled later at regular times throughout the survey period.
Excluding people who don't answer the phone the first time Gallup calls them, would
introduce bias amongst the survey sample: for instance, young single adults, who
are frequently out or using the phone, are less likely to be included in the sample
population than more sedentary people who are less frequent phone users.
In a household with more than one adult in residence, Gallup randomly select an
adult, either by asking for the person with the latest birthday or by asking the person who answers the phone to list all the adults who live there. The pollster then
selects one of these adults at random.
• Asking the "right" questions.
Gallup assess that the greatest source of bias or error in survey data is probably the
wording of the questions themselves.
For example, you may have thought that conducting a pre-election poll of voting
intentions would be a simple process. But the question "Who will you vote for in the
next election?" can be equally as open to bias as any other survey. Does the polling
organisation list the vice-presidential candidates along with the names of the
presidential candidates? Should the party represented by the candidate be listed or
should there be no indication of party affiliation?
In these cases, Gallup tries to mimic the format and content of the ballot paper and
reads the names of the presidential and vice-presidential candidates and gives the
name of the party represented by them.
Questions to do with policy issues can also be very tricky: are things like food
stamps or housing grants to be called "welfare" or "programs for the poor"? If
members of the armed services are going abroad should this be termed "sending"
troops or "contributing" to a UN force? These are emotive topics and the wording of
the question can "slant" the answer received from poll respondents.
• The oldest one in the book.
One of the oldest question wordings concerns presidential job approval. Since the
1930s and Roosevelt's presidency, Gallup has used the following question: "Do you
approve or disapprove of the job .... is doing as president?"
This means that there is a reliable trend line provided by the continuity of the
question asked. If, for example, George W. Bush has a job approval rating of 48%
after one year of his presidency, what can be learned from such a rating? What the
trend line allows is for analysts to look into history and compare this figure with
ratings recorded earlier in the presidential term. Additionally, an analysis can be
made of this figure compared to ratings recorded during previous presidents' terms.
In this case the question may be asked: did previous presidents with this approval rating at this stage in their term tend to get re-elected or not?
Sampling: Further examples
1. Surveys usually involve considerable expenditure of time, effort and cost.
It is vital to clarify at the outset what you want to find out in the survey, before
starting to use precious resources.
The Trendy Tea and Coffee Company (TTCC) are set to launch a new premium brand of tea and want to get the packaging right. Four different designs are created, from a
traditional dark green colour to a flashy black, silver and yellow look. TTCC employ
a market research organisation who survey 1000 people to find out which design
they prefer.
On the basis of the reported survey findings, TTCC launch the new tea in the flashy design, and sales of the new product nosedive after the initial period. It becomes
clear upon review that no research was carried out on the drinking habits of those
people surveyed. If this work had been done, it would have shown that the regular
tea drinkers in the sample population all preferred the dark green packaging.
2. A Goods-In Inspector at a large drinks manufacturer in South-West
England has to deal with a consignment of 1000 cases of grape juice. In the past,
the drinks company has been affected by minor contamination in its fermenting
process that has led to the loss of some batches of its best-selling line: "UK - the
British Sherry for British Tastes".
The inspector has neither the time nor the staff to open all the cases to check for
possible sources of the contamination, but she wants to have an idea of what the
whole consignment is like. She decides to open twenty cases of the grape juice - one
case in every fifty delivered. She could just open every fiftieth case in turn, but this
seems to be too standard an approach. She wants to introduce a more random
method.
So instead, the inspector imagines that the cases are numbered one to one
thousand and then uses her computer to generate at random, twenty 4-figure
numbers, ignoring all those that exceed one thousand. This gives the inspector her
sample population. As a result, there is no bias in her choice of cases to inspect.
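The inspector's random selection can be sketched in a few lines. This is only an illustration: instead of generating 4-figure numbers and discarding those above one thousand, it draws twenty distinct case numbers directly, which gives every case the same chance of selection. The function name and the seed are hypothetical.

```python
import random

TOTAL_CASES = 1000   # cases in the consignment, imagined as numbered 1..1000
SAMPLE_SIZE = 20     # one case in every fifty

def draw_cases(total=TOTAL_CASES, k=SAMPLE_SIZE, seed=None):
    """Pick k distinct case numbers between 1 and total, each equally likely."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no case is chosen twice
    return sorted(rng.sample(range(1, total + 1), k))

cases_to_open = draw_cases(seed=42)  # the seed is only there to make the run repeatable
print(cases_to_open)
```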
3. The sampling method outlined above will be very labour intensive to carry
out. The inspector may have to open case 972, followed by case 23, then case 427. She realises this will be very tedious work and tries to think of a different solution -
one that combines random and multi-stage sampling methods:
She decides to split the consignment into batches of twenty-five, giving forty
batches in total. From each of these she chooses one case by selecting a random
number from one to twenty-five.
This multi-stage sampling approach saves the inspector time, cost and effort.
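The batch approach described above can be sketched as follows. It is a minimal illustration that assumes the cases are numbered 1 to 1000 in delivery order, so that batch 1 holds cases 1 to 25, batch 2 holds cases 26 to 50, and so on. The function name is hypothetical.

```python
import random

def one_case_per_batch(total=1000, batch_size=25, seed=None):
    """Choose one randomly placed case from each consecutive batch."""
    rng = random.Random(seed)
    chosen = []
    for first_in_batch in range(1, total + 1, batch_size):
        # Random position 0..batch_size-1 within this batch
        chosen.append(first_in_batch + rng.randrange(batch_size))
    return chosen

picks = one_case_per_batch(seed=1)
print(len(picks), picks[:5])
```

Because the picks come out in batch order, the inspector can work through the consignment from one end to the other. Note that forty batches give forty cases to open, not the twenty of the earlier method.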
Correlation between variables
Let's start by looking at how a scatter diagram can illustrate these relationships:
• Scattergrams
The scattergram or XY chart can be a useful way of representing the relationship
between two variables. The usual conventions of dependent and independent
variable position on the axes are followed. Points on the diagram are not connected
as they are on a line graph. The relationship between the two variables displayed on
the chart may be positive, negative or non-existent.
In Chart 1 there is a very strong negative correlation shown between disposable
income levels and the number of discount stores in existence. You may feel that this
makes sense as a hypothesis, in that as income levels fall, more discount retail
enterprises emerge.
Chart 1: Scattergram (XY Chart) showing a negative association
(Data for display purposes only)
In Chart 2, disposable income is plotted against number of overseas holidays taken.
It shows that in this case there is a strong positive correlation between income and
"luxury" items, such as foreign holidays. You may not agree that a foreign holiday is
a luxury, but may feel that, in general, the higher the income level the greater the
number of overseas holidays taken will be.
Chart 2: Scattergram showing a positive association
(Data for display purposes only)
The charts shown here are meant to illustrate the concept of a scattergram. In
practice, of course, the points on a scattergram are likely to lie around the chart,
although a strong association between the two variables is likely to allow us to draw
a straight line through the points shown. Such a straight line is known as the "line of
best fit". This is a straight line that seems to fit the points on the diagram best.
The line of best fit is usually drawn by eye. But there are more sophisticated ways of
making the line more accurate. One useful check is that, for a set of points on
a scattergram, the calculated (least-squares) line of best fit will always pass through the point (x-bar, y-bar),
where x-bar is the mean of the x (horizontal) values and y-bar is the mean of the y (vertical)
values.
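The property just described can be checked numerically. The sketch below assumes the line of best fit is the ordinary least-squares line, which the text does not name explicitly. The data are made up for display purposes only.

```python
from statistics import mean

def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Choosing the intercept this way forces the line through (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: disposable income (thousands) and overseas holidays taken
income = [10, 15, 20, 25, 30, 35]
holidays = [0, 1, 1, 2, 3, 3]

slope, intercept = line_of_best_fit(income, holidays)
x_bar, y_bar = mean(income), mean(holidays)
on_line = abs((slope * x_bar + intercept) - y_bar) < 1e-9
print(on_line)
```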
Chart 3 illustrates a lack of a statistical relationship. There is little or no
correlation between disposable income and amount of rainfall, unless of course we're
looking at the long term effect on the global climate of taking all these extra foreign
holidays and driving all these new cars that our higher incomes can afford!
Chart 3: Scattergram showing little or no association
(Data for display purposes only)
But we can go further than just representing the correlation between two separate
variables; we can formally measure the strength of the association between them.
• The Correlation Coefficient
As indicated, the idea behind the correlation coefficient is that we can give a number value to the strength of relationship between one variable and another. There are
two main measures commonly used: Spearman's Rank Correlation Coefficient
and Pearson's Product-moment Correlation Coefficient. The former of these
two is the less complicated to calculate. Because it works on ranks rather than raw values, it allows us to assess the aesthetic or
qualitative characteristics of data, provided they can be put in rank order. The latter allows us to measure the strength of
the linear association between two variables by working out the dispersion of the
scattergram points.
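As a rough illustration of the two measures, the sketch below computes both from first principles. It assumes the textbook formulas, and the Spearman version uses the simple rank-difference formula, which is only valid when there are no tied values. The data are invented.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # rises steadily, but not in a straight line

print(round(pearson(x, y), 3), spearman(x, y))
```

Here the ranks agree perfectly, so Spearman's coefficient is exactly 1, while Pearson's is slightly below 1 because the points do not lie on a straight line.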
There is an illustration of correlation coefficient measures in the 'Crunching' section
on TimeWeb.
Normal Distribution Curve illustration
The chart below illustrates a normally distributed population. You will notice that the
curve conforms to the characteristics outlined in the explanation section: the most
frequent value is at the centre; there is symmetry about the central value; there is
diminishing frequency as you move away from the centre.
A line is drawn from each of the two points of inflexion (one on either side of the mean)
to the X-axis. The distance from that point to the mean point on the X-axis is equal to
the standard deviation.
Four separate areas are now identifiable from the chart:
Area A shows the area between the mean and one standard deviation above the mean.
Area B shows the area between the mean and one standard deviation below the mean.
Area C indicates the area to the right of one standard deviation above the mean.
Area D indicates the area to the left of one standard deviation below the mean.
Because the normal curve is symmetrical, Area A equals Area B. Areas C and D are also
equal. The total of A, B, C and D equals the total area under the curve, or the entire
population.
Mathematical calculations show that in any normal distribution, approximately 68% of
all observations fall within one standard deviation (SD) of the mean (Areas A plus B).
So, about 34% of observations lie between the mean and one standard deviation above
the mean (Area A) and 34% lie between the mean and one standard deviation below
the mean (Area B). By subtraction, we can tell that in a normal distribution 32% of the
observations fall outside one standard deviation, 16% on either side (16% in Area C
and 16% in Area D).
Let's now put this into the language of probability: In any normal distribution, there is a
.68 probability that a particular value will fall within one standard deviation of the
mean; there is approximately a .34 probability that a value will lie between the mean
and one SD above the mean (Area A) and a .34 probability that a value will lie between
the mean and one SD below the mean (Area B).
Also, there is a .16 probability that a particular value will lie above one SD from the
mean (Area C) and a .16 probability that the value will lie below one SD from the mean
(Area D).
Using this knowledge, we can re-draw our normal curve chart, now putting in six
separate areas:
The vertical lines from the curve to the X-axis represent the mean (at the centre) and
distances of one and two SDs on either side of the mean.
Areas A and B have the same characteristics as in the first chart; each being equal and
each containing approximately 34% of all the values in the normal distribution.
Areas C and D are also equal and are defined by the vertical lines indicating one and
two SDs from the mean (on either side). Each of these areas contains approximately
13.5% of all the values in the normal distribution.
Areas E and F at the extreme ends of the curve are defined by the vertical line
indicating two SDs from the mean and the tail ends of the distribution. Each of these
areas contains approximately 2.5% of all the values. In other words, in a normal distribution, about 5% of a
population will be beyond two SDs: 2.5% above the mean and 2.5% below.
Let's restate this information in the language of probability:
1. In any normal distribution, there is a .34 probability that any particular
value will fall between the mean and one SD above the mean (Area A) and the same
probability of the value falling between the mean and one SD below the mean (Area
B).
2. There is a .135 probability of any value falling between one and two SDs
above the mean (Area C) and the same probability of the value falling between one
and two SDs below the mean (Area D).
3. There is a .475 probability that any value will fall between the mean and two SDs above
the mean (Areas A and C together) and the same probability of the value falling between
the mean and two SDs below the mean (Areas B and D together).
4. The mathematics of normal curves shows that the area contained by the
vertical lines representing three SDs from the mean contains 99.7% of the area
under the curve and 99.7% of all the values in the data set. There is, therefore, a
probability of .997 that in any normal distribution any particular value will fall within
three SDs from the mean.
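The areas quoted above can be reproduced from the standard normal curve. The sketch below is illustrative and uses only the error function from Python's standard library. The helper names are made up for this example.

```python
import math

def normal_cdf(z):
    """Probability that a standard normal value is less than or equal to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(low, high):
    """Area under the standard normal curve between low and high (in SDs from the mean)."""
    return normal_cdf(high) - normal_cdf(low)

within_1 = area_between(-1, 1)     # Areas A plus B
one_to_two = area_between(1, 2)    # Area C (Area D is the mirror image)
within_2 = area_between(-2, 2)
within_3 = area_between(-3, 3)
beyond_two = 1 - normal_cdf(2)     # one tail beyond two SDs

for name, value in [("within 1 SD", within_1), ("between 1 and 2 SDs", one_to_two),
                    ("within 2 SDs", within_2), ("within 3 SDs", within_3),
                    ("beyond 2 SDs (one tail)", beyond_two)]:
    print(name, round(value, 4))
```

The exact tail beyond two SDs comes out slightly under the rounded 2.5% used in the text. The 95% figure corresponds more precisely to about 1.96 SDs.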
Why not try the 'what samples tell us' worksheet to check that you understand this?
Random Sampling:
Random sampling is usually the preferred method of sampling, because of the lack of
built-in bias that is involved.
This method requires that a list of every member of the
population is available. There are times when this will be impossible, for instance when
an entire national or regional population is involved, or for example if you are studying
the whole population of small businesses in the UK. In these cases, the simple random
sampling method outlined below will not be appropriate.
In a simple random sample, with a list of the entire population being studied, the
sampler gives a number to every item on the list and selects the sample by using a
random number generator or a table of random numbers.
Here's how it works.
Imagine you want to study all the cars being stored in a warehousing complex, but you
don't have the time or other resources to deal with them all. You might decide to work
with a sample of 30 cars out of a total warehouse population of 1000.
So, you begin by assigning a number to every member of the total population. As the
largest number you need (1000) has four digits, every car in the warehouse is given a
four digit number, beginning with 0001, 0002, 0003 and so on, up to 1000.
You look at your list of random numbers, which looks like the following:
A TABLE OF RANDOM NUMBERS
00 10097 32533 76520 13586 34673 54876 80959 09117 39292 74945
01 37542 04805 64894 74296 24805 24037 20636 10402 00822 91665
02 08422 68953 19645 09303 23209 02560 15953 34764 35080 33606
03 99019 02529 09376 70715 38311 31165 88676 74397 04436 27659
04 12807 99970 80157 36147 64032 36653 98951 16877 12171 76833
05 66065 74717 34072 76850 36697 36170 65813 39885 11199 29170
06 31060 10805 45571 82406 35303 42614 86799 07439 23403 09732
07 85269 77602 02051 65692 68665 74818 73053 85247 18623 88579
08 63573 32135 05325 47048 90553 57548 28468 28709 83491 25624
09 73796 45753 03529 64778 35808 34282 60935 20344 35273 88435
10 98520 17767 14905 68607 22109 40558 60970 93433 50500 73998
11 11805 05431 39808 27732 50725 68248 29405 24201 52775 67851
12 83452 99634 06288 98083 13746 70078 18475 40610 68711 77817
13 88685 40200 86507 58401 36766 67951 90364 76493 29609 11062
14 99594 67348 87517 64969 91826 08928 93785 61368 23478 34113
15 65481 17674 17468 50950 58047 76974 73039 57186 40218 16544
16 80124 35635 17727 08015 45318 22374 21115 78253 14385 53763
17 74350 99817 77402 77214 43236 00210 45521 64237 96286 02655
18 69916 26803 66252 29148 36936 87203 76621 13990 94400 56418
19 09893 20505 14225 68514 46427 56788 96297 78822 54382 14598
20 91499 14523 68479 27686 46162 83554 94750 89923 37089 20048
21 80336 94598 26940 36858 70297 34135 53140 33340 42050 82341
22 44104 81949 85157 47954 32979 26575 57600 40881 22222 06413
23 12550 73742 11100 02040 12860 74697 96644 89439 28707 25815
24 63606 49329 16505 34484 40219 52563 43651 77082 07207 31790
25 61196 90446 26457 47774 51924 33729 65394 59593 42582 60527
26 15474 45266 95270 79953 59367 83848 82396 10118 33211 59466
27 94557 28573 67897 54387 54622 44431 91190 42592 92927 45973
28 42481 16213 97344 08721 16868 48767 03071 12059 25701 46670
29 23523 78317 73208 89837 68935 91416 26252 29663 05522 82562
30 04493 52494 75246 33824 45862 51025 61962 79335 65337 12472
31 00549 97654 64051 88159 96119 63896 54692 82391 23287 29529
32 35963 15307 26898 09354 33351 35462 77974 50024 90103 39333
33 59808 08391 45427 26842 83609 49700 13021 24892 78565 20106
34 46058 85236 01390 92286 77281 44077 93910 83647 70617 42941
35 32179 00597 87379 25241 05567 07007 86743 17157 85394 11838
36 69234 61406 20117 45204 15956 60000 18743 92423 97118 96338
37 19565 41430 01758 75379 40419 21585 66674 36806 84962 85207
38 45155 14938 19476 07246 43667 94543 59047 90033 20826 69541
39 94864 31994 36168 10851 34888 81553 01540 35456 05014 51176
40 98086 24826 45240 28404 44999 08896 39094 73407 35441 31880
41 33185 16232 41941 50949 89435 48581 88695 41994 37548 73043
42 80951 00406 96382 70774 20151 23387 25016 25298 94624 61171
43 79752 49140 71961 28296 69861 02591 74852 20539 00387 59579
44 18633 32537 98145 06571 31010 24674 05455 61427 77938 91936
45 74029 43902 77557 32270 97790 17119 52527 58021 80814 51748
46 54178 45611 80993 37143 05335 12969 56127 19255 36040 90324
47 11664 49883 52079 84827 59381 71539 09973 33440 88461 23356
48 48324 77928 31249 64710 02295 36870 32307 57546 15020 09994
49 69074 94138 87637 91976 35584 04401 10518 21615 01848 76938
You begin the selection by pointing (with your eyes closed) to an area in the table. Imagine you point to line 10 (the lines are numbered down the left-hand side of the
table). The first possible four digit number between 0001 and 1000 is 0177. Notice that
as the table contains five digit numbers, it's acceptable to start by taking the fifth digit
of the first number in line 10.
The second four digit number is 0568.
The third number is 0722.
The fourth is 0940.
The fifth is 0970.
The sixth is 0500.
You would continue down the table, gathering four digit numbers until you had collected thirty numbers between 0001 and 1000. Each of these would represent one car in the
warehouse, chosen at random to form a sample of thirty cars.
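The reading of line 10 can be reproduced with a short scan. This is one interpretation of the procedure, not a definitive rule: it assumes you move along the digits one at a time until a four digit group between 0001 and 1000 appears, accept it, and then carry on from the digit after that group. Repeated numbers are skipped so that no car is picked twice. The function name is hypothetical.

```python
# Line 10 of the table of random numbers
ROW_10 = "98520 17767 14905 68607 22109 40558 60970 93433 50500 73998"

def pick_numbers(digits, low=1, high=1000, width=4):
    """Scan a string of digits left to right for width-digit values in [low, high]."""
    picks = []
    i = 0
    while i + width <= len(digits):
        value = int(digits[i:i + width])
        if low <= value <= high:
            if value not in picks:      # skip repeats so no car is chosen twice
                picks.append(value)
            i += width                  # carry on after the accepted group
        else:
            i += 1                      # slide along by one digit
    return picks

digits = ROW_10.replace(" ", "")
print(pick_numbers(digits))   # [177, 568, 722, 940, 970, 500]
```

These are the same six numbers listed above (0177, 0568, 0722, 0940, 0970 and 0500). Another common convention is to read fixed, non-overlapping groups of four digits and discard any group that is out of range.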
There is less bias in this selection method because every member of the population has
an equal chance of being selected, and represented in the sample. You have made no
attempt to organise the population into sections, so the selection process is free from
your direction.
Probability
Jacques (Jacob) Bernoulli was the first to suggest what is known as the 'law of large numbers',
which is based on his work on probability. Imagine that you have a container that holds
thousands of pebbles; you don't know how many there are, nor do you know that of
the 5000 pebbles, 3000 are white and 2000 black. The ratio of white to black
pebbles is therefore 3:2.
Bernoulli asked how many pebbles you would have to draw from the container before you could
make an estimate of the actual ratio of white to black pebbles. Of course you would
begin to get a fairly clear idea pretty soon, as you picked out a pebble, noted its colour
and then replaced it in the container. But the key to the theorem is whether or not
you can repeat the experiment over and over until it's ten, or one hundred, times more
probable that your estimate is close to the true 3:2 ratio than that it is not.
Bernoulli states that this is the case; the more experiments are carried out, the more
likely it is that the estimated ratio will get close to the true ratio.
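Bernoulli's pebble experiment is easy to simulate. The sketch below is illustrative only: it assumes each pebble is replaced before the next draw, as in the description above, and uses a fixed seed so that the run can be repeated. The function name is made up.

```python
import random

def estimate_white_share(draws, white=3000, black=2000, seed=0):
    """Draw pebbles with replacement and return the observed share of white ones."""
    rng = random.Random(seed)
    p_white = white / (white + black)   # true share, 0.6 for a 3:2 ratio
    # With replacement, every draw is white with the same probability
    white_seen = sum(rng.random() < p_white for _ in range(draws))
    return white_seen / draws

for n in (10, 100, 1000, 100000):
    print(n, estimate_white_share(n))
```

With only a handful of draws the estimate can be well off. With many thousands it settles close to 0.6, which is the 3:2 ratio expressed as a share of white pebbles.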
Time series
To identify trends in time series data, other than drawing a trend curve onto a graph
freehand, there are two common measures used:
• using moving averages.
• using regression analysis to find the line of 'best fit'.
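As an illustration of the first measure, the sketch below computes a simple moving average. The window length of three periods and the sales figures are assumptions made purely for the example.

```python
def moving_average(values, window=3):
    """Simple moving average: one value for each full window of consecutive observations."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical quarterly sales figures
sales = [12, 15, 11, 18, 21, 17, 24]
trend = moving_average(sales, window=3)
print([round(v, 1) for v in trend])
```

The averaged series is shorter than the original by one less than the window length, and it smooths out short-term ups and downs so that the underlying trend is easier to see.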
ILLUSTRATION
Sampling and Statistics
Contents:
• Degrees of Freedom example
• Example of a collection of sample means (s-means)
Degrees of Freedom example
There is an explanation available of degrees of freedom if you are not sure.
If a random sample of 16 light bulbs produced in a larger batch is selected and the
mean of the sample is 1450 hours and the estimated SD is 80 hours, estimate the
population mean at the 95% confidence level.
SE of the sample means = $80 / \sqrt{16}$
= 20 hours
Number of degrees of freedom = 16 - 1
= 15
The t statistic (read from the t distribution tables) at a 95% level and with 15 degrees
of freedom = 2.13
So the margin of error = $2.13 \times 20 \approx 43$ hours
So, we can be 95% confident that $\mu$ (the population mean) lies within the range
1450 +/- 43 = 1407 to 1493 hours.
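The light bulb calculation can be retraced step by step. In this sketch the t value is typed in by hand, as if read from a printed table (2.131 is the usual tabulated figure for 15 degrees of freedom at the 95% level). The variable names are illustrative.

```python
import math

n = 16                 # bulbs in the sample
sample_mean = 1450.0   # hours
sample_sd = 80.0       # estimated standard deviation, hours
t_value = 2.131        # from t tables: 95% level, n - 1 = 15 degrees of freedom

standard_error = sample_sd / math.sqrt(n)
margin = t_value * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(standard_error, round(margin), round(lower), round(upper))
```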
Example of a collection of sample means (s-means)
Assume that we can properly identify a sample from a large population that we are
interested in studying, by using random, quota or stratified sampling techniques
outlined earlier.
We are interested in collecting a representative sample of a large population: for
instance, numbers of people in the workforce who are aged under eighteen. Let's say
we want to find out how many hours per week this group works on average.
Imagine that we sample a group of thirty people under the age of eighteen who are in
some form of paid work. We have a group of numbers that represent the number of
hours worked in a week by each of the thirty people in our sample. We can then
calculate the mean of this sample, either by adding up all the values and dividing by
the total number in the sample, or by entering the values into an Excel worksheet and
getting the calculation done that way.
Now, suppose that in our desire to produce as representative a sample as possible
within the time and cost constraints of our project, we continue to draw samples of thirty
people under the age of eighteen. We use the same random process as with our first
sample and make sure that we do not include in the samples anyone who was part of
the earlier samples.
What we have produced is a collection of sample means, one for each of the samples
we have drawn from the population. These are quite likely to be fairly close to each
other in value, but there will be some differences. In other words, the collection of
means from the various samples taken will have a frequency distribution (with a mean
value, a median, a variation and a standard deviation).
Let's suppose that the following table represents the ten samples and their means:
Sample 1: S-Mean = 6.25 hours
Sample 2: S-Mean = 6.50 hours
Sample 3: S-Mean = 6.00 hours
Sample 4: S-Mean = 7.75 hours
Sample 5: S-Mean = 4.50 hours
Sample 6: S-Mean = 8.00 hours
Sample 7: S-Mean = 3.50 hours
Sample 8: S-Mean = 9.25 hours
Sample 9: S-Mean = 4.75 hours
Sample 10: S-Mean = 6.50 hours
Remember that each of these S-Means is the average for a sample of thirty
under-eighteens who carry out some form of paid work. The list of numbers will have a mean
value and a standard deviation. Can you place these into an Excel worksheet to
calculate these values?
The mean value in this case is 6.3 hours and the standard deviation is 1.74 hours.
Check you could get the same result by using a spreadsheet package.
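If you prefer to check the figures in code, Python's standard library gives the same result as a spreadsheet. The sketch assumes the standard deviation quoted is the sample form (dividing by n - 1), which is what `statistics.stdev` calculates.

```python
from statistics import mean, stdev

# The ten S-Means from the table, in hours
s_means = [6.25, 6.50, 6.00, 7.75, 4.50, 8.00, 3.50, 9.25, 4.75, 6.50]

grand_mean = mean(s_means)
sd = stdev(s_means)   # sample standard deviation (n - 1 in the denominator)

print(round(grand_mean, 2), round(sd, 2))
```

Dividing by n instead of n - 1 (the population form, `statistics.pstdev`) would give a slightly smaller figure of about 1.65 hours.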
Of course, if we kept collecting samples like the ten in this example, eventually we
would have sampled the entire population (as long as we made sure that no under-eighteen
was in more than one sample). The average of all of our samples would then
be the average for the whole population, because taken together our samples would make up
the whole population.
In practice, we don't have the time or the money to conduct such a huge sampling task and in most cases, we don't have to.
There is a worksheet available on 'what samples tell us'
• Standard Errors and Increasing Sample Size
As we have seen, we can take a single sample of more than 30 items and make
conclusions about the large population from which it is drawn.
As we find out more about the standard error, we can notice other interesting details
that should aid our understanding of statistics in practice.
Firstly, the size of the confidence interval depends on the size of the standard error.
So, if we can minimise the standard error, we can reduce the range of values in each
confidence level - thus producing more precise conclusions.
This is because we calculate the standard error from the sample by taking the
standard deviation of the sample and dividing it by the square root of the number of
observations in the sample. Increasing the number in the sample may only have a
small effect on reducing the size of the standard error.
You may ask yourself how much you would have to increase the sample size by in
order to have any significant impact. The answer is that increasing the sample size
will indeed narrow the range of results, but that the sample size has to be increased
so dramatically that the cost and time taken would make it unworkable.
This can be illustrated by the following example: If we were studying a sample of
100 students and their exam performance and if the standard deviation of the list of
results was, say, 14, then we could calculate the standard error by dividing the
standard deviation by the square root of the number in the sample. So, 14 divided
by the square root of 100, or 14 divided by 10 = 1.4.
This means that in estimating the confidence intervals for the entire population of
students, we use the figure of 1.4 marks as the basis of the intervals to calculate the
ranges for .68, .95 and .997 probability.
We might think that we ought to try to reduce the range in order to get a more
precise result. We could increase the sample size, in order to increase the size of its
square root and therefore reduce the size of the standard error.
But, because we are dealing with the square root of the number in the sample, we
find that to have any significant impact on the standard error, we would have to increase the sample size considerably.
So in the example given above, we were studying a sample of 100 students, and
found the result for a standard error of 1.4 by dividing the standard deviation of the
sample (14) by the square root of 100 (10). If we wanted to reduce the standard
error by one half, we would have to divide 14 by 20. In order to do this we would
have to sample 400 students, as the square root of 400 is 20.
Do you see the relationship here between sample size and size of the standard
error? In order to halve the standard error, we have to increase the sample size to
four times its original size.
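The relationship between sample size and standard error can be tabulated directly. The sketch below uses the standard deviation of 14 from the exam example. The sample size of 1600 is added purely for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by the square root of n."""
    return sd / math.sqrt(n)

for n in (100, 400, 1600):
    print(n, standard_error(14, n))
```

Each time the sample size is multiplied by four, the standard error is only halved: 1.4 for 100 students, 0.7 for 400 and 0.35 for 1600.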
What we should learn from this is that in many cases it is not worth the effort of
increasing the sample size in order to achieve more precise results. If you bear in
mind that the really time consuming part of the analysis is the selection of the
sample information, then you can see that it is usually more efficient to keep the
sample relatively small (as long as it is over 30 items) and to focus our efforts on
gathering the best sample we can. This means, of course, ensuring that our sample
is as free as possible from bias.
• Review of confidence interval analysis of a population from a single
sample.
It may be wise here to review the steps we should take in making generalisations
within confidence levels about an entire population from a single sample:
1. Firstly, we select a sampling strategy, which usually means a
random sample, and select our sample, making sure that we have at least
30 observations within it.
2. Then we collect the information from the sample and process it,
(using Excel or similar spreadsheet package), in order to find out the mean
and the standard error of the sample.
3. Finally, we make conclusions at the different confidence intervals:
68% for a range within plus or minus 1 standard error of the mean of the
sample; 95% for a range within plus or minus 2 standard errors of the mean
of the sample; and 99.7% for a range within plus or minus 3 standard errors
of the mean of the sample.
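The three steps can be put together in a short sketch. Everything about the data is hypothetical: the marks are generated at random simply to have a sample of more than 30 observations to work on, and the function name is made up.

```python
import math
import random
from statistics import mean, stdev

def confidence_ranges(sample):
    """Return ranges of 1, 2 and 3 standard errors either side of the sample mean."""
    if len(sample) < 30:
        raise ValueError("the steps above assume at least 30 observations")
    centre = mean(sample)
    se = stdev(sample) / math.sqrt(len(sample))
    return {f"+/- {k} SE": (centre - k * se, centre + k * se) for k in (1, 2, 3)}

# Hypothetical exam marks for 100 students (invented data, fixed seed)
rng = random.Random(0)
marks = [rng.gauss(60, 14) for _ in range(100)]

ranges = confidence_ranges(marks)
for label, (low, high) in ranges.items():
    print(label, round(low, 1), round(high, 1))
```

Each range is centred on the sample mean, and each wider range contains the narrower ones.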