Cairo University
Faculty of Economics and Political Science
Department of Statistics
A Mathematical Programming Approach to
Stratified Random Sampling
Prepared by
Dina Mohsen Mohamed Sabry
Supervised by
Prof. Ramadan Hamed Mohamed, Professor of Statistics, Department of Statistics
Prof. Reda Ibrahim Mazloum, Professor of Statistics, Department of Statistics
Dr. Mahmoud Mostafa Rashwan, Assistant Professor of Statistics, Department of Statistics
A Thesis Submitted to the Department of Statistics, Faculty of Economics and
Political Science in Partial Fulfilment of the Requirements for the M.Sc. Degree in Statistics
2012
A Mathematical Programming Approach to Stratified Random Sampling
Abstract
When applying stratified sampling, the problem of allocating the sample to different strata
arises. Many classical methods are available to allocate the sample to the different strata.
Nevertheless, mathematical programming methods have many advantages and can handle the
allocation problem while overcoming the limitations of the classical methods. Thus, there
have been many attempts by researchers to apply mathematical programming in the field of
sampling. Most of these attempts concentrate on minimizing the variances of the overall
estimators when optimally allocating the sample to the different strata. However, none of the
models focuses on minimizing the variances of the estimators within the strata and this is
what this study aims to deal with. In many practical situations, the purpose of the study could
be to estimate overall estimators in addition to separate estimators within each stratum.
Hence, the present study targets minimizing the coefficients of variation of the overall
estimators in addition to the coefficients of variation of the estimators within the strata when
optimally allocating the sample. This creates a multiple objective problem that needs to be
dealt with using the appropriate approach. As a result, this study adopts a goal programming
approach that tries to tackle this problem in multivariate surveys by maximizing the precision
of the overall estimators in addition to the precision of the estimators within each stratum
under a fixed cost. Integer programming is used to guarantee integer values for the optimal
allocation. The proposed approach is compared with three of the classical methods of
allocation in addition to five mathematical programming models suggested in the literature
using a simulation study. Based on the criteria used for comparison, it is shown that the
suggested models have the highest efficiency in obtaining the estimators within the strata in
certain cases.
Keywords: Multivariate Stratified Sampling; Optimum Allocation; Goal Programming.
Name: Dina Mohsen Mohamed Sabry Youssef
Nationality: Egyptian
Date and Place of Birth: 9/12/1985, Giza – Egypt
Degree: Master of Science
Specialization: Statistics
Supervisor:
Prof. Ramadan Hamed Mohamed, Professor of Statistics, Department of Statistics
Prof. Reda Ibrahim Mazloum, Professor of Statistics, Department of Statistics
Dr. Mahmoud Mostafa Rashwan, Assistant Professor of Statistics, Department of Statistics
Title of the Thesis:
A Mathematical Programming Approach to Stratified Random Sampling
Summary of the Thesis:
The main objective of this study is to introduce goal programming models that try to
tackle the problem of sample allocation in stratified random sampling by taking into account
the precision of the overall estimators in addition to the precision of the estimators within the
strata under a fixed budget. Hence, the present thesis focuses on the formulation of the
proposed models. Moreover, the proposed models are compared with other models presented
in the literature through a simulation study. The performance of the models is evaluated using
three criteria that measure the efficiency of the models in obtaining the overall estimators in
addition to the estimators within the strata.
The present thesis is divided into five chapters which are organized in the following manner:
Chapter 1: Introduces the main objectives of this study in addition to outlining the contents
of the thesis.
Chapter 2: Presents a review of stratified random sampling and of some of the
classical methods of sample allocation. The notation used throughout
the thesis is also introduced in this chapter.
Chapter 3: Presents a review on various mathematical programming approaches suggested in
the literature that deal with the problem of sample allocation in stratified random sampling.
Chapter 4: Introduces the proposed goal programming approach discussing the criteria that
are to be used for comparison in addition to the simulation study conducted and the
conclusions reached from the simulation.
Chapter 5: Discusses the main concluding remarks reached and presents some points for
future work.
Acknowledgments
I would like to express my most profound gratefulness and appreciation to Prof. Ramadan
Hamed for his patience, guidance and continuous help during the preparation time of this
thesis.
Also, my deepest gratitude goes to Prof. Reda Mazloum for her support, care and
co-operation in providing me with her knowledge and expertise whenever needed.
I would also like to genuinely and sincerely thank Dr. Mahmoud Rashwan who never
hesitated in helping and assisting me. Dr. Mahmoud was very supportive, encouraging and
always provided me with positive energy that motivated me during the tough times of my
research.
A warm and heartfelt indebtedness and thankfulness goes to my family especially my parents
who were always there for me and for their unconditional love and support throughout my
whole life.
Last but not least, I would like to dedicate a very special thanks to my professors, colleagues
and friends at the faculty of Economics and Political Science for their continuous support.
Table of Contents
Chapter 1: Introduction
1.1 Research Objective
1.2 Thesis Outline
Chapter 2: Review on Stratified Random Sampling
2.1 Stratified Random Sampling
2.2 Types of Sample Allocation
2.3 Sample Allocation with More than One Variable
Chapter 3: Review on Mathematical Programming Approaches to Sample Allocation in Stratified Random Sampling
3.1 Univariate Case
3.2 Multivariate Case (correlation is not taken into account)
3.2.1 Cost As An Objective
3.2.2 Precision As An Objective
3.3 Multivariate Case (correlation is taken into account)
3.4 Precision of Stratum Estimators
Chapter 4: The Suggested Mathematical Programming Approach
4.1 The Suggested Mathematical Programming Approach
4.1.1 The Suggested Objectives
4.1.2 The Proposed Models
4.1.3 The Criteria for Comparison
4.2 Simulation Study
4.2.1 The Design of the Simulation Study
4.2.2 Data generation
4.2.3 Software Packages
4.3 Simulation Results
4.3.1 Mean of Relative Efficiencies (MRE)
4.3.2 Total Sample Size
4.3.3 Mean of Coefficients of Variation (MCV)
4.3.4 Relative Mean Index (RMI)
4.3.5 The Effect of Varying the Budget on the Models’ Performance
Chapter 5: Conclusions and Further Research
References
List of Tables
Table 4.1: Summary of the Models under Comparison with the Proposed Approach
Table 4.2: Simulation Design
Table 4.3: Combination 1: 2x2 (2 strata and 2 variables)
Table 4.4: Combination 2: 3x2 (3 strata and 2 variables)
Table 4.5: Combination 3: 4x2 (4 strata and 2 variables)
Table 4.6: Combination 4: 2x3 (2 strata and 3 variables)
Table 4.7: Combination 5: 3x3 (3 strata and 3 variables)
Table 4.8: Combination 6: 4x3 (4 strata and 3 variables)
Table 4.9: Mean of Relative Efficiencies (MRE)
Table 4.10: Total Sample Size “$n$”
Table 4.11: Mean of Coefficients of Variation in the 2 Strata Case
Table 4.12: Mean of Coefficients of Variation in the 3 Strata Case
Table 4.13: Mean of Coefficients of Variation in the 4 Strata Case
Table 4.14: Relative Mean Index (RMI)
Table 4.15: Mean of Relative Efficiencies (MRE) Under Different Budgets
Table 4.16: Total Sample Size “$n$” Under Different Budgets
Glossary of Notation
$N_h$  Total number of units in stratum $h$
$N$  Total population size
$n_h$  Number of units in the sample drawn from stratum $h$
$n$  Total sample size
$y_{hi}$  Value obtained for the $i$th unit in the $h$th stratum
$W_h$  Stratum weight
$\bar{Y}_h$  True population mean in stratum $h$
$\bar{y}_h$  Sample mean in stratum $h$
$S_h^2$  True population variance in stratum $h$
$s_h^2$  Sample variance in stratum $h$
$\bar{Y}$  Overall population mean
$\bar{y}_{st}$  Overall sample mean
$V(\bar{y}_h)$  Variance of the sample mean in stratum $h$
$V(\bar{y}_{st})$  Variance of the overall sample mean
$n_{hj}$  Sample size in the $h$th stratum for the $j$th variable
$y_{hij}$  Value obtained for the $i$th unit in the $h$th stratum for the $j$th variable
$\bar{Y}_{hj}$  True population mean of the $j$th variable in stratum $h$
$\bar{y}_{hj}$  Sample mean of the $j$th variable in stratum $h$
$S_{hj}^2$  True population variance of the $j$th variable in stratum $h$
$s_{hj}^2$  Sample variance of the $j$th variable in stratum $h$
$V(\bar{y}_{hj})$  Variance of the sample mean of the $j$th variable in stratum $h$
( ( )) Variance of the overall sample mean of the th variable
$C$  Total budget
$c_0$  Fixed cost
$c_h$  Cost per sampling unit in the $h$th stratum
Weights representing the importance of the th variable
( ( )) Individual desired variance of the overall sample mean of the th variable
( ( )) Compromise variance of the overall sample mean of the th variable under optimum
compromise strata sample sizes
( ) Compromise variance of the sample mean of the th variable in the th stratum under
optimum compromise strata sample sizes
Positive deviation: The amount of deviation for a given goal by which it exceeds the
aspired level (target)
Negative deviation: The amount of deviation for a given goal by which it is less than
the aspired level (target)
The lower bound on the sample size that is to be drawn from the th stratum
Chapter 1
Introduction
“Sampling is the process by which inference is made to the whole by examining
only a part”. Sample surveys are conducted on different cultural and scientific aspects
[18]. The use of sample surveys arose from the need to reduce the time and effort
consumed by complete enumeration. Moreover, although the
cost per observation in sample surveys is higher than in complete enumeration, the
overall cost of a sample survey is much lower. Furthermore, sometimes
obtaining data by complete enumeration is not possible as in destructive tests such as
testing the life of electric bulbs and haematological testing [18].
In addition, more comprehensive (and frequent) data can be obtained using
sampling surveys as it is possible to make use of the highly trained and competent
personnel or the specialized equipment that are limited in availability. Hence, sample
surveys offer more scope and flexibility regarding the types of information that can be
collected which are impractical to obtain using complete enumeration [5].
Furthermore, sample surveys can produce more accurate results than
complete enumeration, because the volume of work in surveys that rely on
sampling is much smaller. It is therefore possible to employ staff of higher quality and
to supervise the processing of the results more carefully [5].
Nevertheless, there are situations where complete enumeration appears to be
essential; for example, when basic information is needed for every unit such as
counting the population for census purposes and a voter’s list [18]. In addition,
sampling may not be useful in case the population is small or the variance in the
variable being measured is high [1].
In practice, post-enumeration sample surveys are usually conducted in order to
evaluate and supplement censuses by assessing the coverage and the errors that will
inevitably take place. Hence, it can be observed that sample surveys are often used in
conjunction with censuses and as a result sampling and complete enumeration are
“complementary and, in general, not competitive” [18].
Many sampling designs are available when conducting surveys. One of the most
frequently used designs is stratified sampling. In this design the population is divided
into separate sub-populations called strata. The main problem that faces researchers
when applying this design is to determine the sample size that is to be selected from
each stratum. This is known as the sample allocation problem.
This allocation problem was dealt with by many classical methods such as: equal
share allocation, proportional allocation and optimum allocation. In the optimum
allocation method, the allocation to the different strata is determined by minimizing
the variance of the overall estimator for a given total cost or minimizing the cost for a
given level of precision (measured by the variance of the overall estimator). However,
classical methods sometimes suffer from limitations such as: the inability to optimize
several objectives simultaneously, producing non-integer values for the sample sizes
and in some cases, producing a sample size larger than the corresponding stratum
size. Nonetheless, mathematical programming has many tools that can overcome
these limitations faced by classical methods. Thus, many researchers tried to tackle
this problem using mathematical programming approaches.
Most of the mathematical programming models available in the literature deal with
the allocation problem in the multivariate case. In these models, the allocation is
considered to be optimum if it minimizes the variances of the overall estimators
subject to a fixed cost or if it minimizes the total cost subject to a given level of
precision. However, none of the models concentrates on the minimization of the
variances of the estimators within the strata. In many surveys, the
objective of the study is to obtain overall estimators in addition to separate estimators
within the strata. Hence, the precision of both overall estimators and estimators within
the strata should be taken into account when finding the optimal allocation.
In the following section, the main research objectives are introduced and section
1.2 will outline the main contents of the thesis.
1.1 Research Objective:
This study targets developing a goal programming approach that tackles the
allocation problem in multivariate surveys by maximizing the precision of the
overall estimators in addition to the precision of the estimators within each
stratum under a fixed cost. Integer programming is applied to guarantee integer
values for the sample sizes. The performance of the proposed approach is
compared with three of the classical methods of allocation in addition to five
mathematical programming models available in the literature using a simulation
study.
1.2 Thesis Outline:
Chapter 1: Presents an introduction to the thesis.
Chapter 2: Presents a review on stratified random sampling, stating the main reasons
for using stratified random sampling in addition to the properties of the estimators and
the main notations that are to be used throughout the study. Moreover, some of the
different classical methods of sample allocation are demonstrated in this chapter.
Chapter 3: Reviews the previous research that applies mathematical
programming to the allocation problem in stratified random sampling. The
previous literature is divided into models for the univariate case, the
multivariate case without taking the correlation between the variables into account
and the multivariate case with the correlation taken into consideration. Finally,
the chapter ends with a brief review of some of the attempts that take the precision
of the estimators within the strata into account.
Chapter 4: Introduces the suggested goal programming approach discussing the
suggested objectives, the different proposed models and the criteria used for
comparison. Moreover, this chapter demonstrates the design of the simulation study,
the procedures used for data generation and the different software packages used in
conducting the simulation. Finally, the chapter will end with an analysis of the main
results obtained from the simulation.
Chapter 5: Summarizes the main conclusions reached based on the performed
simulation study. In addition, the chapter will show some recommended points for
further research.
Chapter 2
Review on Stratified Random Sampling
The present chapter will first consider a review on stratified random sampling
indicating the reasons that may lead to the stratification of a population into distinct
sub-divisions (strata) and the notations that are to be used throughout this study.
Furthermore, the general properties of the estimators used will be dealt with in this
chapter. Finally, this chapter will consider the different types of allocating the total
sample to the different sub-populations and it will illustrate an allocation method
used in case of having more than one important variable.
2.1 Stratified Random Sampling:
There are different sampling designs available when conducting surveys. The
simplest design that is considered to be the basic sampling technique is simple
random sampling. In this sampling design each unit in the population has the same
chance of selection. Simple random sampling forms the basis of most of the other
designs [5], [18].
Another sampling technique, and the most frequently used one, is stratified
sampling, where the population is divided into suitable sub-populations that are
internally homogeneous but heterogeneous with respect to each other. There are many
reasons for dividing the population into distinct sub-populations [2], [5], [16], [18]:
1- When the variability in the population is very large, the use of stratified
sampling appears to be advantageous. Moreover, if it is required to give a
larger weight to some units that are uncommonly occurring in the population
(such as respondents with very high income) then, stratified sampling is of
significance in this case.
2- Stratified sampling can produce estimates for each stratum of the population
separately, such as estimates for each geographical sub-population.
3- When using stratified sampling there is the benefit of utilizing the flexibility
of using different sampling techniques in the different strata. For example,
simple random sampling or systematic random sampling could be applied in
the different strata.
4- Stratified sampling produces more precise estimates than those produced by
simple random sampling of the same size (especially when the measurements
within the strata are homogeneous).
5- The cost per observation may be reduced when using stratified sampling (the
cost per observation includes the cost of the interviewer, time and travel).
6- Administrative convenience may command the use of stratified sampling. For
instance, the agency conducting the survey may have field offices, each of
which can supervise the survey for a part of the population.
In stratified sampling, the population consists of $N$ units, and it is divided into $L$
non-overlapping sub-populations (called strata) of sizes $N_1, N_2, \dots, N_L$ units.
The values of $N_h$ ($h = 1, 2, \dots, L$) are known in advance and when the strata have
been determined, a sample is drawn from each stratum independently and the sample
sizes are denoted by $n_1, n_2, \dots, n_L$ respectively.
Throughout this study, it is going to be taken for granted that the strata have
already been determined, the technique used in the different strata is simple random
sampling, and that sampling is done without replacement. Furthermore, this study will
only be concerned with the estimation of the mean.
Notation and Properties of the Estimators:
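Throughout this study, the notation of Cochran (1977) [5] will be adopted, where
the subscript $h$ denotes the stratum and $i$ denotes the unit within the stratum:
$N_h$: total number of units in stratum $h$,
$N = \sum_{h=1}^{L} N_h$: total population size,
$n_h$: number of units in the sample drawn from stratum $h$,
$n = \sum_{h=1}^{L} n_h$: total sample size,
$y_{hi}$: value obtained for the $i$th unit in the $h$th stratum,
$W_h = N_h / N$: stratum weight,
$\bar{Y}_h = \frac{1}{N_h} \sum_{i=1}^{N_h} y_{hi}$: true population mean in stratum $h$,
$\bar{y}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi}$: sample mean in stratum $h$,
$S_h^2 = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (y_{hi} - \bar{Y}_h)^2$: true population variance in stratum $h$.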
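In stratified sampling, the population mean is denoted by $\bar{Y}$ and has the following
formula:
$$\bar{Y} = \frac{\sum_{h=1}^{L} \sum_{i=1}^{N_h} y_{hi}}{N} = \frac{\sum_{h=1}^{L} N_h \bar{Y}_h}{N} = \sum_{h=1}^{L} W_h \bar{Y}_h \tag{2.1}$$
An unbiased estimator for the population mean is $\bar{y}_{st}$ ($st$ stands for stratified),
where,
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h \tag{2.2}$$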
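Since, as previously mentioned, sampling is done independently in the different
strata:
$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \, V(\bar{y}_h) \tag{2.3}$$
And provided that simple random sampling is applied in the different strata (which
is the case in our study):
$$V(\bar{y}_h) = \frac{S_h^2}{n_h} \left( \frac{N_h - n_h}{N_h} \right) = S_h^2 \left( \frac{1}{n_h} - \frac{1}{N_h} \right) \tag{2.4}$$
As a result, the variance of the estimator in stratified random sampling has the
following formula:
$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 S_h^2 \left( \frac{1}{n_h} - \frac{1}{N_h} \right) \tag{2.5}$$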
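As a minimal sketch of how the stratified estimator and its variance in (2.5) can be evaluated numerically, the following Python fragment may help. The function names and the two-stratum population are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
def stratified_mean(W_h, y_bar_h):
    """Stratified estimate of the mean: weighted sum of stratum sample means."""
    return sum(w * m for w, m in zip(W_h, y_bar_h))


def stratified_variance(N_h, S2_h, n_h):
    """Variance of the stratified mean under simple random sampling
    without replacement within each stratum, as in equation (2.5)."""
    N = sum(N_h)
    return sum(
        (Nh / N) ** 2 * S2 * (1.0 / nh - 1.0 / Nh)
        for Nh, S2, nh in zip(N_h, S2_h, n_h)
    )


# Hypothetical population with two strata (illustrative values only).
N_h = [600, 400]          # stratum sizes
S2_h = [4.0, 25.0]        # stratum variances
n_h = [30, 20]            # stratum sample sizes
W_h = [Nh / sum(N_h) for Nh in N_h]

y_bar_st = stratified_mean(W_h, [10.0, 20.0])   # about 14.0
var_st = stratified_variance(N_h, S2_h, n_h)    # about 0.2356
```

Setting each sample size equal to its stratum size makes every finite population correction vanish, so the variance drops to zero, which is a quick sanity check on the formula.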
2.2 Types of Sample Allocation:
In stratified random sampling, the problem of finding the values of the sample
sizes in the respective strata (i.e. allocating the sample) arises. There are several
methods of allocation such as: optimum allocation, Neyman allocation, equal share
allocation, proportional allocation and predetermined allocation. In this section the
different types of allocation are briefly discussed.
1- Optimum Allocation:
The allocation of the sample to the different strata is determined by either
minimizing the variance of the estimator $V(\bar{y}_{st})$ for a given total cost “$C$” or
minimizing the cost for a given level of precision (i.e. $V(\bar{y}_{st}) = V$). The
simplest form of the cost function is:
$$C = c_0 + \sum_{h=1}^{L} c_h n_h \tag{2.6}$$
where $c_h$ is the cost per sampling unit in the $h$th stratum, $C$ is the total budget
available and $c_0$ is the overhead (fixed) cost. There are other forms for the cost
function; however, only the linear form will be considered in this study.
The optimum allocation formula (in terms of the total sample size $n$) has
the following form:
$$n_h = n \, \frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{L} \left( W_h S_h / \sqrt{c_h} \right)} \tag{2.7}$$
Hence, we can conclude from this formula that the sample size in a certain
stratum increases as the size of the stratum $N_h$ increases, as the variability
within the stratum $S_h$ increases and as the cost per unit in the stratum $c_h$
decreases.
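The previous formula is in terms of the sample size $n$, which may not be
known in advance. Thus, if the cost is fixed then the optimum values of $n_h$ can
be substituted in the cost function giving the following form:
$$n = \frac{(C - c_0) \sum_{h=1}^{L} \left( W_h S_h / \sqrt{c_h} \right)}{\sum_{h=1}^{L} \left( W_h S_h \sqrt{c_h} \right)} \tag{2.8}$$
On the other hand, if the variance of the estimator is fixed
(say $V(\bar{y}_{st}) = V$) then the optimum values of $n_h$ can be substituted in $V(\bar{y}_{st})$
giving,
$$n = \frac{\left( \sum_{h=1}^{L} W_h S_h \sqrt{c_h} \right) \sum_{h=1}^{L} \left( W_h S_h / \sqrt{c_h} \right)}{V + (1/N) \sum_{h=1}^{L} W_h S_h^2} \tag{2.9}$$
It should be noted that the values of $S_h$ are unknown. Hence, they are either
obtained from previous studies or estimated from a pilot investigation.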
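The fixed-cost case of optimum allocation, that is, equations (2.7) and (2.8) combined with the linear cost function, can be sketched in Python as follows. The function name and all numerical inputs are hypothetical; the stratum standard deviations are assumed to be known, for instance from a pilot study.

```python
import math


def optimum_allocation_fixed_cost(N_h, S_h, c_h, C, c0):
    """Return (n, n_h): the total sample size from (2.8) and the stratum
    sample sizes from (2.7), for a linear cost function and a fixed budget C."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    per_root_cost = [w * s / math.sqrt(c) for w, s, c in zip(W, S_h, c_h)]
    times_root_cost = [w * s * math.sqrt(c) for w, s, c in zip(W, S_h, c_h)]
    n = (C - c0) * sum(per_root_cost) / sum(times_root_cost)   # (2.8)
    n_h = [n * t / sum(per_root_cost) for t in per_root_cost]  # (2.7)
    return n, n_h


# Hypothetical inputs (illustrative values only).
N_h = [600, 400]      # stratum sizes
S_h = [2.0, 5.0]      # stratum standard deviations
c_h = [1.0, 4.0]      # cost per sampled unit in each stratum
C, c0 = 1100.0, 100.0 # total budget and fixed cost

n, n_h = optimum_allocation_fixed_cost(N_h, S_h, c_h, C, c0)
# n is about 423.08 and n_h is about [230.77, 192.31]
total_cost = c0 + sum(c * x for c, x in zip(c_h, n_h))
```

The resulting sample sizes exhaust the budget exactly but are not integers, which is one of the limitations of the classical formulas discussed later in this chapter.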
2- Neyman Allocation:
If the cost per unit is assumed to be equal for all the strata
(i.e. $c_h = c$ for all $h$) then the cost function is reduced to:
$$C = c_0 + c\,n \tag{2.10}$$
Hence, for a given total cost, the total sample size is of the following form:
$$n = \frac{C - c_0}{c} \tag{2.11}$$
And the optimum allocation formula becomes [from equation (2.7)]:
$$n_h = n \, \frac{W_h S_h}{\sum_{h=1}^{L} W_h S_h} \tag{2.12}$$
This type of allocation is known as “Neyman allocation” [5].
3- Equal Share Allocation:
This type of allocation divides the total sample into equal shares for the
different strata in the population,
$$n_h = \frac{n}{L} \tag{2.13}$$
Given that the total cost is fixed and takes the linear form (2.6), the total
sample size takes the following form:
$$n = \frac{L\,(C - c_0)}{\sum_{h=1}^{L} c_h} \tag{2.14}$$
4- Proportional Allocation:
Here, the total sample is allocated to the different strata in proportion to the
total number of units in the sub-populations (i.e. $n_h$ is proportional to $N_h$),
$$n_h = n \, \frac{N_h}{N} = n\,W_h \tag{2.15}$$
In this type of allocation we select the same proportion of units from each
stratum.
For a given total cost, the linear cost function (2.6) gives the total sample
size in proportional allocation as follows [18]:
$$n = \frac{C - c_0}{\sum_{h=1}^{L} W_h c_h} \tag{2.16}$$
where $W_h = N_h / N$.
If on the other hand, the cost per observation is equal for all the strata,
yielding the cost function (2.10) then the sample size will be given by the
formula (2.11).
5- Predetermined Allocation:
Predetermined allocation divides the total sample size (which could be
determined in a subjective way) among the different strata according to the
researcher’s judgement.
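The Neyman, equal share and proportional rules described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the numerical inputs are hypothetical, and the equal share and proportional functions take the linear cost function with a fixed budget as given.

```python
def neyman_allocation(n, N_h, S_h):
    """Neyman allocation (2.12). N_h * S_h is used in place of W_h * S_h
    because the population size N cancels in the ratio."""
    total = sum(Nh * Sh for Nh, Sh in zip(N_h, S_h))
    return [n * Nh * Sh / total for Nh, Sh in zip(N_h, S_h)]


def equal_share_allocation(C, c0, c_h):
    """Equal share allocation: n from (2.14), then n_h = n / L as in (2.13)."""
    L = len(c_h)
    n = L * (C - c0) / sum(c_h)
    return [n / L] * L


def proportional_allocation(C, c0, c_h, N_h):
    """Proportional allocation: n from (2.16), then n_h = n * W_h as in (2.15)."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    n = (C - c0) / sum(w * c for w, c in zip(W, c_h))
    return [n * w for w in W]


# Hypothetical inputs (illustrative values only).
N_h = [600, 400]
S_h = [2.0, 5.0]
c_h = [1.0, 4.0]
C, c0 = 1100.0, 100.0

neyman = neyman_allocation(100, N_h, S_h)            # [37.5, 62.5]
equal = equal_share_allocation(C, c0, c_h)           # [200.0, 200.0]
prop = proportional_allocation(C, c0, c_h, N_h)      # about [272.73, 181.82]
```

With these inputs the equal share and proportional allocations both spend the whole budget, while the Neyman rule simply splits a given total sample size of 100.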
2.3 Sample Allocation with More than One Variable:
In all the previously presented types of allocation, it was assumed that there is only
one important variable that we base the allocation upon. However, this is usually not
the case since sample surveys usually include more than one important variable, and
an optimum allocation for one variable will not necessarily be optimum for another
[5]. Several researchers, such as Chatterjee and Yates, have suggested solutions to this
problem (see [5]). However, in this study only one method is presented, namely
“Cochran’s average”.
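Cochran’s Average (i.e. compromised optimal allocation):
A few of the most important variables are to be chosen to optimally
allocate the sample. Let the subscript $j$ denote the variable where
($j = 1, 2, \dots, p$). As mentioned earlier, equation (2.7) gives the optimum allocation
in terms of the total sample size $n$, and equation (2.8) gives the total sample
size in case of a fixed total cost. Substituting (2.8) in (2.7) we get
$$n_h = \frac{(C - c_0) \left( W_h S_h / \sqrt{c_h} \right)}{\sum_{h=1}^{L} \left( W_h S_h \sqrt{c_h} \right)} \tag{2.17}$$
which represents the optimum allocation under a fixed total cost.
By applying this formula for each variable separately, we get the optimum
individual strata sample sizes,
$$n_{hj} = \frac{(C - c_0) \left( W_h S_{hj} / \sqrt{c_h} \right)}{\sum_{h=1}^{L} \left( W_h S_{hj} \sqrt{c_h} \right)} \tag{2.18}$$
where,
$y_{hij}$: value obtained for the $i$th unit in the $h$th stratum for the $j$th variable,
$\bar{Y}_{hj} = \frac{1}{N_h} \sum_{i=1}^{N_h} y_{hij}$: true population mean of the $j$th variable in stratum $h$,
$\bar{y}_{hj} = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hij}$: sample mean of the $j$th variable in stratum $h$,
$S_{hj}^2 = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (y_{hij} - \bar{Y}_{hj})^2$: true population variance of the $j$th variable in stratum $h$.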
The individual strata sample sizes given by (2.18) are to be averaged over
all the variables giving an optimum compromise allocation that takes all the
variables into account, i.e.:
$$n_h = \frac{1}{p} \sum_{j=1}^{p} n_{hj} \tag{2.19}$$
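The averaging step can be sketched in Python as follows: the fixed-cost optimum allocation in (2.18) is computed for each variable separately and the results are then averaged stratum by stratum. The function names, the choice of two variables and all numerical inputs are hypothetical and serve only as an illustration.

```python
import math


def optimum_for_variable(N_h, S_hj, c_h, C, c0):
    """Fixed-cost optimum allocation (2.18) for a single variable, given the
    stratum standard deviations S_hj of that variable."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    denom = sum(w * s * math.sqrt(c) for w, s, c in zip(W, S_hj, c_h))
    return [
        (C - c0) * (w * s / math.sqrt(c)) / denom
        for w, s, c in zip(W, S_hj, c_h)
    ]


def cochran_average(N_h, S_by_variable, c_h, C, c0):
    """Compromise allocation: average the individual allocations over variables."""
    per_variable = [
        optimum_for_variable(N_h, S_hj, c_h, C, c0) for S_hj in S_by_variable
    ]
    p = len(per_variable)
    return [sum(sizes) / p for sizes in zip(*per_variable)]


# Hypothetical inputs: two strata, two variables (illustrative values only).
N_h = [600, 400]
c_h = [1.0, 4.0]
C, c0 = 1100.0, 100.0
S_by_variable = [[2.0, 5.0], [3.0, 3.0]]  # one list of stratum S values per variable

compromise = cochran_average(N_h, S_by_variable, c_h, C, c0)
# compromise is about [329.67, 167.58]
compromise_cost = c0 + sum(c * x for c, x in zip(c_h, compromise))
```

Because the cost function is linear and each individual allocation spends exactly the budget, the averaged allocation also spends exactly the budget before any rounding.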
In all the previous methods of allocation, there is no guarantee that the resulting
optimum allocation will be integer. This requires rounding of the values of the sample
sizes in the different strata which could provide a total cost that exceeds the total
budget specified (in case of a fixed total cost), hence providing infeasible solutions.
Moreover, in the previous allocation methods the problem of oversampling can occur
(oversampling happens when the sample size in one or more strata is larger than the
stratum size [6]). As noted by [5], the optimum allocation formula can produce values of
$n_h$ in some strata that are larger than the corresponding number of units $N_h$ in the stratum,
and this problem has happened in practice on several occasions. This problem arises
only when the overall sampling fraction (i.e. $n/N$) is large and the variability in some
strata is greater than in the others [5].
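Both difficulties can be illustrated with a short Python sketch. The helper below is hypothetical: it checks a candidate allocation against the budget under the linear cost function and lists the strata in which the sample size exceeds the stratum size. All numbers are invented for illustration.

```python
def diagnose(n_h, N_h, c_h, C, c0):
    """Return (over_budget, oversampled): whether the linear cost exceeds the
    budget C, and the zero-based indices of strata where n_h > N_h."""
    cost = c0 + sum(c * n for c, n in zip(c_h, n_h))
    oversampled = [h for h, (n, N) in enumerate(zip(n_h, N_h)) if n > N]
    return cost > C, oversampled


# Rounding: a non-integer allocation that spends the budget exactly
# (about 428.57 and 142.86 units) becomes infeasible once rounded.
continuous = [3000 / 7, 1000 / 7]
rounded = [round(x) for x in continuous]          # [429, 143]
rounding_case = diagnose(rounded, [600, 400], [1.0, 4.0], 1100.0, 100.0)

# Oversampling: a small but highly variable stratum under Neyman allocation,
# with a total sample size of 100.
N_h = [50, 950]
S_h = [40.0, 1.0]
total = sum(N * S for N, S in zip(N_h, S_h))
neyman = [100 * N * S / total for N, S in zip(N_h, S_h)]   # about [67.8, 32.2]
oversampling_case = diagnose(neyman, N_h, [1.0, 1.0], 1000.0, 0.0)
```

In the first case the rounded allocation costs 1101 against a budget of 1100; in the second, the first stratum is asked for about 68 units although it contains only 50.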
Therefore, other alternatives to the classical methods have been applied that are
thought to overcome the previous problems. Hence, there have been many attempts by
researchers to apply mathematical programming in the field of sampling and this is
what the next chapter will discuss.
Chapter 3
Review on Mathematical Programming Approaches to Sample Allocation in Stratified Random Sampling
From the previous chapter, it can be seen that classical methods of sample
allocation offer only one objective subject to one constraint when optimally allocating
the sample. This can therefore be viewed as a limitation. Also, as stated before,
classical methods suffer from the problem of producing non-integer sample sizes for
the different strata. This could lead to infeasible solutions [i.e. having a total cost that
exceeds the total specified budget (in case of fixing the cost)] due to rounding.
Moreover, the problem of oversampling can be faced when using the classical
methods of allocation. Hence, the use of mathematical programming appears to be
advantageous as it can overcome these limitations.
Mathematical programming has several advantages over classical methods. First, it
offers the ability to optimize several objectives simultaneously and it has the benefit
of assigning priorities to different objectives. Also, several constraints could be
suggested. Second, mathematical programming can guarantee that the optimal
allocation has integer values by the use of integer programming. Third, it can ensure
that oversampling does not occur. Accordingly, this chapter reviews
the different mathematical programming approaches to sample allocation suggested in
the literature.
As previously mentioned, mathematical programming tools offer researchers the
advantage of optimizing more than one objective at the same time, and this is one of
the main benefits that has been utilised by many authors in the field of sampling. In
the coming sections, some of the mathematical programming models that were
suggested in the literature to determine the optimal sampling scheme are presented.
In most cases, we may want to estimate parameters for more than one variable;
therefore those variables should all be taken into consideration as the key variables
when determining the optimal strata sample sizes. Hence, the review will begin with
the models that were developed in the univariate case, the multivariate case without
taking the correlation between the variables into account, and then the multivariate
case where the correlation was taken into consideration. The present chapter will
finally end with a brief review on some attempts that take the precision of the
estimators within the strata into account.
Thus, the classification of the literature will be as follows:
All these cases will be presented in the following sections.
3.1 Univariate Case:
This section presents different mathematical programming models dealing with the
allocation problem when only one variable is of interest.
Arthanari and Dodge [2] presented a review on the use of mathematical
programming for optimal allocation of sample sizes in stratified random sampling.
They formulated the problem of obtaining statistical information on population
characteristics based on sample data as an optimization problem.
In the univariate case, the authors considered the problem of having $L$ strata where
it was assumed that the samples were drawn independently from different strata. The
problem of choosing optimal $n_h$’s is known as the “optimal allocation problem”. In
such a problem the $n_h$’s are the decision variables and the objective can be the
minimization of the variance of the estimator of the variable under study (in this case
the estimator is $\bar{y}_{st}$) with the restriction on the fixed total sample size $n$. Hence, the
problem is formulated as:
$$\text{Minimize } V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 S_h^2 \left( \frac{1}{n_h} - \frac{1}{N_h} \right), \tag{3.1}$$
$$\text{subject to } \sum_{h=1}^{L} n_h = n, \tag{3.2}$$
$$1 \le n_h \le N_h, \quad n_h \text{ integer}, \quad h = 1, 2, \dots, L, \tag{3.3}$$
where $S_h^2$ is the true population variance in stratum $h$ and, as mentioned before, its value is
either known from prior studies of the same kind or estimated from pilot
investigations.
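For a very small problem, the integer program (3.1)-(3.3) can be solved by plain enumeration, which makes the role of the integrality and bound constraints concrete. The sketch below is not the solution method of Arthanari and Dodge; it is a brute-force illustration that assumes the bounds $1 \le n_h \le N_h$, and the function names and data are hypothetical.

```python
from itertools import product


def variance(n_h, N_h, S2_h):
    """Objective (3.1): variance of the stratified mean for allocation n_h."""
    N = sum(N_h)
    return sum(
        (Nh / N) ** 2 * S2 * (1.0 / nh - 1.0 / Nh)
        for nh, Nh, S2 in zip(n_h, N_h, S2_h)
    )


def integer_allocation(n, N_h, S2_h):
    """Enumerate all integer allocations with sum n and 1 <= n_h <= N_h,
    and return the one with the smallest variance."""
    best, best_var = None, float("inf")
    head_ranges = [range(1, Nh + 1) for Nh in N_h[:-1]]
    for head in product(*head_ranges):
        last = n - sum(head)              # enforces constraint (3.2)
        if not 1 <= last <= N_h[-1]:      # enforces the assumed bounds in (3.3)
            continue
        candidate = head + (last,)
        v = variance(candidate, N_h, S2_h)
        if v < best_var:
            best, best_var = candidate, v
    return best, best_var


# Hypothetical inputs (illustrative values only).
N_h = [60, 40]
S2_h = [4.0, 25.0]
best, best_var = integer_allocation(20, N_h, S2_h)   # best == (8, 12)
```

Here the continuous Neyman solution is (7.5, 12.5), and the enumeration shows that (8, 12) has a slightly smaller variance than (7, 13), so the integer optimum is identified directly instead of being left to a rounding rule. Enumeration grows quickly with the number of strata, so it is only practical for toy cases.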
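[Figure: classification of the mathematical programming models reviewed in this chapter. Mathematical Programming Models: Univariate; Multivariate (no correlation): Cost as an Objective, Precision as an Objective (Approach A, Approach B, Approach C); Multivariate (with correlation).]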