Clustering the Countries of the World on the Basis of Some Selected Demographic and Socio-Economic Characteristics
Chapter 1
Prelude
1.1 Introduction
Most people think demography is just math in disguise, a sort of dry social accounting. Once
exposed to the subject, many change their minds. They come to appreciate the profound
impact demographic forces have on societies. This has never been truer than during the past
half century, a period in which the United States and other societies have experienced
unprecedented social and demographic change. Since these demographic forces have not
been stilled, they will continue to cause social change and to shape social programs for the
balance of our lives and beyond.
People also find demography fascinating because it deals with so many personally relevant
topics. Nearly all the major events of people's lives have demographic implications: birth,
schooling, marriage, occupational choices, childbearing, retirement and death.
Demography, or population studies, is a discipline, an “interdiscipline,” and a subdiscipline.
It is clearly a discipline because it is a field with its own body of interrelated concepts,
techniques, journals, departments, and professional associations. Demography is also an
interdisciplinary field because it draws its subject matter and methods from many disciplines,
including sociology, economics, biology, geography, history, and the health sciences. Finally,
demography is also considered a subdiscipline within some of these same major disciplines.
In most universities, demography courses are taught within the sociology curriculum, perhaps
because population phenomena have so long been linked to social processes. Demography is
defined as the study of human populations: their size, composition and distribution as well as
the causes and consequences of changes in these characteristics. Populations are never static.
They grow or decline through the interplay of three demographic processes: birth, death, and
migration. If some groups within a population grow or decline faster than others, the
composition of the whole is altered.
1.2 World’s Demographic and Socio-economic View
We entered the 20th century with a population of 1.6 billion people. We entered the 21st
century with 6.1 billion people. In 2007, world population was 6.6 billion. The increase in
the size of the human population in the last half century is unprecedented. And nearly all of
the growth is occurring in the less developed countries. Currently, 80 million people are being
added every year in less developed countries, compared with about 1.6 million in more
developed countries. While the less developed countries will keep growing, the more
developed countries may grow slowly or not at all.
Population change is linked to economic development, education, the environment, the status
of women, epidemics and other health threats, and access to family planning information and
services. All of these factors interact with every facet of our lives, regardless of where we
live. It is remarkable that, despite many new developments over the past 50 years, one fact
looks very much the same: populations are growing most rapidly where such growth can be
afforded the least.
In 2008, the world reaches an invisible but momentous milestone: for the first time in history,
more than half its population, 3.3 billion people, will be living in urban areas. By 2030, this is
expected to swell to almost 5 billion. Many of the new urbanites will be poor. Their future,
the future of cities in developing countries, and the future of humanity itself all depend very
much on decisions made now in preparation for this growth.
While the world's urban population grew very rapidly (from 220 million to 2.8 billion) over
the 20th century, the next few decades will see an unprecedented scale of urban growth in the
developing world. This will be particularly notable in Africa and Asia, where the urban
population will double between 2000 and 2030. By then, the towns and cities of the developing
world will make up 80 percent of urban humanity.
Urbanization, the increase in the urban share of total population, is inevitable, but it can also
be positive. The current concentration of poverty, slum growth and social disruption in cities
does paint a threatening picture; yet no country in the industrial age has ever achieved
significant economic growth without urbanization. Cities concentrate poverty, but they also
represent the best hope of escaping it.
Cities also embody the environmental damage done by modern civilization; yet experts and
policymakers increasingly recognize the potential value of cities to long-term sustainability.
If cities create environmental problems, they also contain the solutions. The potential benefits
of urbanization far outweigh the disadvantages. The challenge is in learning how to exploit
the possibilities. In 1994, the Programme of Action of the International Conference on
Population and Development called on governments to “respond to the need of all citizens,
including urban squatters, for personal safety, basic infrastructure and services, to eliminate
health and social problems….” More recently, the United Nations Millennium Declaration
drew attention to the growing significance of urban poverty, specifying, in Target 11, the
modest ambition of achieving by 2020 “a significant improvement in the lives of at least 100
million slum dwellers”.
1.3 Literature Review
Clusters have become the focal point of many new policy initiatives in the last few years, in
Europe as elsewhere around the globe. The challenge set out by the Lisbon European Council
in 2000 to make Europe “the world's most competitive and dynamic knowledge-based
economy” in particular has sparked interest in new approaches to economic policy for
competitiveness. Mobilizing the potential of clusters is seen as critical to reach this ambitious
goal (See Christian Ketels, European Clusters, Structural Change in Europe 3 – Innovative
City and Business Regions, Hagbarth Publications, 2004).
Michael Porter defines clusters as geographically proximate groups of interconnected
companies and associated institutions in a particular field, linked by commonalities and
complementarities. Clusters are important, because they allow companies to be more
productive and innovative than they could be in isolation. And clusters are important because
they reduce the barriers to entry for new business creation relative to other locations (See
Michael Porter, Clusters and Competition, Harvard Business School Press, 2008).
Cluster analysis is an important and effective statistical tool for finding
homogeneous groups. Some works on cluster analysis are listed here:
Stan Salvador and Philip Chan, Determining the Number of Clusters/Segments in Hierarchical
Clustering/Segmentation Algorithms, Proc. 16th IEEE International Conference on Tools with
AI, pp. 576-584, 2004.
Can, F., Ozkarahan, E.A. (1990) “Concepts and effectiveness of the cover coefficient based
clustering methodology for text databases.” ACM Transactions on Database Systems, 15 (4),
483-517.
Information Theory, Inference, and Learning Algorithms by David J.C. MacKay includes chapters
on k-means clustering, soft k-means clustering and derivations including the E-M algorithm
and the variational view of the E-M algorithm.
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate
observations, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and
Probability, Berkeley, University of California Press, 1:281-297.
Andreas F. Grein, S. Prakash Sethi, Lawrence G. Tatum, A Dynamic Analysis of Country
Clusters, the Role of Corruption, and Implications for Global Firms, The Patterns of
Corruption in the 21st Century, 6-7 September, 2008, Athens, Greece.
1.4 Relevance of this Study
In this study, we want to know how the countries of the world group together
with respect to selected demographic characteristics: which countries are similar and
which are dissimilar on those characteristics. For this reason we carry out
a cluster analysis, in which the countries are clustered, or grouped, on the basis of some
chosen demographic and socio-economic characteristics.
1.5 Objective of the study
The main objective of the study is given below:
To find the homogeneous groups of the countries of the world on the basis of some selected
demographic and socio-economic characteristics.
1.6 Organization of the study report
This study report is organized in five chapters. Brief descriptions of the chapters are given
below:
1. In the first chapter there is an introduction, a view of the world's demography, a literature
review, the relevance of the study, and the objectives of the study.
2. In the second chapter the data and the relevant variables are described.
3. In the third chapter the methodology has been described.
4. The fourth chapter contains the analysis.
5. And the findings are stated in the fifth chapter.
Chapter 2
Data and variables
2.1 Introduction
This chapter provides a brief description of the data, the data source and the variables. The
software package used for this study is also discussed.
2.2 Source of the data
This study utilizes the data extracted from the Population Reference Bureau website and the
2012 World Population Data Sheet.
The Data Sheet lists all geopolitical entities with populations of 150,000 or more and all
members of the UN. These include sovereign states, dependencies, overseas departments, and
some territories whose status or boundaries may be undetermined or in dispute. More
developed regions, following the UN classification, comprise all of Europe and North
America, plus Australia, Japan, and New Zealand. All other regions and countries are
classified as less developed.
World and Regional Totals: Regional population totals are independently rounded and include
small countries or areas not shown. Regional and world rates and percentages are weighted
averages of countries for which data are available; regional averages are shown when data or
estimates are available for at least three-quarters of the region's population.
World Population Data Sheets from different years should not be used as a time series.
Fluctuations in values from year to year often reflect revisions based on new data or estimates
rather than actual changes in levels. Additional information on likely trends and consistent
time series can be obtained from PRB, and are also available from UN and U.S. Census
Bureau publications and websites.
The rates and figures are primarily compiled from the following sources: official country
statistical yearbooks, bulletins, and websites; the United Nations Demographic Yearbook,
2009-2010 and Population and Vital Statistics Report of the UN Statistics Division; World
Population Prospects: The 2010 Revision of the UN Population Division; and the
International Data Base of the International Programs Center, U.S. Census Bureau. Other
sources include recent demographic surveys such as the Demographic and Health Surveys,
Reproductive Health Surveys, special studies, and direct communication with demographers
and statistical bureaus in the United States and abroad. Specific data sources may be obtained
by contacting the authors of the 2012 World Population Data Sheet. For countries with
complete registration of births and deaths, rates are those most recently reported. For more
developed countries, nearly all vital rates refer to 2011 or 2010.
2.3 Background characteristics
Before performing any statistical analysis it is important to know the characteristics or nature
of the data. Therefore it is necessary to study the characteristics of data at the outset of the
analysis. In this chapter we introduce the background characteristics of the variables which
are considered throughout the study. The selection of variables is based on the availability of
information in the Population Reference Bureau's 2012 World Population Data
Sheet. The main concern of this study is to determine the homogeneous clusters of the 208
countries on the basis of some demographic characteristics.
2.4 Variables:
Table 2.1: List of variables (demographic and socio-economic)
1. Birth rate
2. Death rate
3. Rate of natural increase
4. Infant mortality rate
5. Total fertility rate
6. Percent of population of age <15
7. Percent of population of age 65+
8. Life expectancy (total)
9. Life expectancy (male)
10. Life expectancy (female)
11. Inflation rate
12. GDP rate
13. Population density
14. Urban population
15. Literacy rate
16. Deaths due to NCDs
1. Birth and death rate:
The annual number of births and deaths per 1,000 total population. These rates are often
referred to as “crude rates” since they do not take a population's age structure into account.
Thus, crude death rates in more developed countries with a relatively large proportion of
high-mortality older population are often higher than those in less developed countries with
lower life expectancy.
2. Rate of natural increase:
The birth rate minus the death rate, implying the annual rate of population growth without
regard to migration. It is expressed as a percentage.
3. Infant mortality rate:
The annual number of deaths of infants under age 1 per 1,000 live births. Rates shown with
decimals indicate national statistics reported as completely registered, while those without are
estimates from the sources cited above. Rates shown in italics are based upon fewer than 50
annual infant deaths and, as a result, are subject to considerable yearly variability.
4. Total fertility rate:
The average number of children a woman would have, assuming that current age-specific
birth rates remain constant throughout her childbearing years (usually considered to be ages 15
to 49).
5. Population under age 15/age 65+:
This is the percentage of the total population in these ages, which are often considered the
“dependent ages”.
6. Life expectancy at birth:
The average number of years a newborn infant can expect to live under current mortality
levels.
7. Percent urban:
Percentage of total population living in areas termed “urban” by that country. Typically, the
population living in towns of 2,000 or more or in national and provincial capitals is classified
“urban”.
8. Inflation rate:
Inflation rate is the annualized percentage change in a general price index (normally the
consumer price index) over time.
9. GDP rate:
GDP growth on an annual basis adjusted for inflation is expressed as a percent. The growth
rates are year-over-year, and not compounded.
10. Literacy rate:
Literacy rates are based on the most common definition - the ability to read and write at a
specified age.
11. Population Density:
Population Density is population per unit of land area; for example, people per square mile or
people per square kilometre of arable land. The data is from the PRB 2011 World Population
Data Sheet.
12. Deaths due to NCDs:
The estimated percentage of all deaths that occurred in 2008 that resulted from NCDs. Data
are from WHO's Non-communicable Diseases Country Profiles 2011.
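As a small worked illustration of the rate of natural increase defined above: birth and death rates are expressed per 1,000 population while the rate of natural increase is a percentage, so the difference is divided by 10. The function name and the sample rates below are hypothetical and chosen only for illustration; they are not taken from the Data Sheet.

```python
def rate_of_natural_increase(birth_rate, death_rate):
    """Return the rate of natural increase as a percentage.

    birth_rate and death_rate are crude rates per 1,000 population,
    so their difference is divided by 10 to convert it to percent.
    Migration is ignored, as in the definition above.
    """
    return (birth_rate - death_rate) / 10.0


# Hypothetical rates, not taken from the Data Sheet.
example_rni = rate_of_natural_increase(birth_rate=20.0, death_rate=8.0)
print(example_rni)  # 1.2 (percent per year)
```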
2.5 Statistical package
Because the study involves a large data set covering 195 countries of the world on 12 demographic
characteristics, suitable software support is needed to perform the analysis. The entire
analysis was done on a personal computer. The well-known statistical programming language R
and the package SPSS were used to analyze the data. MS Office Word was used for
report writing.
Chapter 3
Methodology
3.1 Introduction
This study seeks homogeneous clusters among 195 countries of the world
according to some demographic, economic, environmental, educational and health-related
characteristics.
First the data are prepared for cluster analysis, where the cases are the 195 countries of the world with
some selected characteristics.
The complete linkage method of agglomerative hierarchical cluster analysis is then
used. Finally, a dendrogram with the results of the analysis is drawn.
3.2 Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects
in the same group (called cluster) are more similar (in some sense or another) to each other
than to those in other groups (clusters). It is a main task of exploratory data mining, and a
common technique for statistical data analysis used in many fields, including machine
learning, pattern recognition, image analysis, information retrieval, and bioinformatics.
3.3 Uses
Biology, computational biology and bioinformatics
In the field of plant and animal ecology, cluster analysis is used to describe and to make
spatial and temporal comparisons of communities (assemblages) of organisms in
heterogeneous environments; it is also used in plant systematics to generate
artificial phylogenies or clusters of organisms (individuals) at the species, genus or higher
level that share a number of attributes.
In transcriptomics, clustering is used to build groups of genes with related expression patterns
(also known as co-expressed genes). Often such groups contain functionally related proteins,
such as enzymes for a specific pathway, or genes that are co-regulated. High throughput
experiments using expressed sequence tags (ESTs) or DNA microarrays can be a powerful
tool for genome annotation, a general aspect of genomics.
In sequence analysis, clustering is used to group homologous sequences into gene families.
This is a very important concept in bioinformatics, and evolutionary biology in general. See
evolution by gene duplication.
In high-throughput genotyping platforms clustering algorithms are used to automatically
assign genotypes.
In the study of human genetics, the similarity of genetic data is used in clustering to infer
population structures.
Market research
Cluster analysis is widely used in market research when working with multivariate data
from surveys and test panels. Market researchers use cluster analysis to partition the
general population of consumers into market segments and to better understand the
relationships between different groups of consumers/potential customers, and for use
in market segmentation, product positioning, new product development and the selection of test
markets.
World Wide Web
In the study of social networks, clustering may be used to recognize communities within
large groups of people.
In the intelligent grouping of files and websites, clustering may be used to
create a more relevant set of search results compared to normal search engines like Google.
There are currently a number of web-based clustering tools such as Clusty.
Flickr's map of photos and other map sites use clustering to reduce the number of markers on
a map. This both makes the map faster and reduces the amount of visual clutter.
Computer science
Clustering is useful in software evolution as it helps to reduce legacy properties in code by
reforming functionality that has become dispersed. It is a form of restructuring and hence a
kind of direct preventive maintenance.
In Markov chain Monte Carlo methods, clustering is often utilized to locate and characterize
extrema in the target distribution.
Social science
Cluster analysis can be used to identify areas where there are greater incidences of particular
types of crime. By identifying these distinct areas or "hot spots" where a similar crime has
happened over a period of time, it is possible to manage law enforcement resources more
effectively.
Cluster analysis is also used, for example, to identify groups of schools or students with similar
properties.
3.4 Clustering Algorithms
Different algorithms are available for clustering. The essential criterion of all the algorithms is
that they attempt to maximize the differences between clusters relative to the variation within the
clusters.
The most commonly used clustering algorithms can be classified into two general categories:
1. Hierarchical Cluster Analysis
2. Non-Hierarchical Cluster Analysis
1. Hierarchical Clustering method
Hierarchical clustering techniques proceed by either a series of successive mergers or a series
of successive divisions. Agglomerative hierarchical methods start with the individual objects.
There are basically two types of hierarchical clustering method:
a. Agglomerative Hierarchical Clustering Method.
b. Divisive Hierarchical Clustering Method.
a. Agglomerative Hierarchical Clustering Method:
This method starts with the individual objects. Thus, there are initially as many clusters as
objects. The most similar objects are first grouped, and these initial groups are merged
according to their similarities. Eventually, as the similarity decreases, all subgroups are fused
into a single cluster.
b. Divisive Hierarchical Clustering Method:
Divisive hierarchical methods work in the opposite direction. An initial single group of
objects is divided into two subgroups such that the objects in one subgroup are "far from" the
objects in the other. These subgroups are then further divided into dissimilar subgroups; the
process continues until there are as many subgroups as objects.
3.5 Dendrograms
The results of both Agglomerative and Divisive methods may be displayed in the form of a
two dimensional diagram known as a dendrogram. It illustrates the mergers or divisions that
have been made at successive levels.
Steps of Agglomerative Hierarchical Clustering Algorithm
The following are the steps in the agglomerative hierarchical clustering algorithm for
grouping N objects (items or variables):
1. Start with N clusters, each containing a single entity, and an N x N symmetric matrix of
distances (or similarities) $D = \{d_{ik}\}$.
2. Search the distance matrix for the nearest (most similar) pair of clusters. Let the distance
between the “most similar” clusters U and V be $d_{UV}$.
3. Merge clusters U and V. Label the newly formed cluster (UV). Update the entries in the
distance matrix by
(a) deleting the rows and columns corresponding to clusters U and V and
(b) adding a row and column giving the distances between cluster (UV) and the remaining
clusters.
4. Repeat Steps 2 and 3 a total of N - 1 times. (All objects will be in a single cluster after the
algorithm terminates.) Record the identity of clusters that are merged and the levels
(distances or similarities) at which the mergers take place.
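The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the routine used by SPSS or R: the function name `agglomerate`, the dictionary-based distance table and the small 5 x 5 distance matrix are all hypothetical. Passing `max` as the `linkage` argument gives the complete linkage update and `min` gives the single linkage update.

```python
def agglomerate(dist, linkage=max):
    """Run agglomerative clustering on a symmetric distance matrix.

    dist    -- N x N list of lists of pairwise distances between objects
    linkage -- max gives complete linkage, min gives single linkage
    Returns a list of (members_u, members_v, level) tuples, one per merger.
    """
    n = len(dist)
    # Step 1: start with N singleton clusters, keyed by a cluster id.
    members = {i: [i] for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n

    # Step 4: repeat steps 2 and 3 a total of N - 1 times.
    while len(members) > 1:
        # Step 2: find the nearest (most similar) pair of clusters.
        (u, v), level = min(d.items(), key=lambda item: item[1])
        merges.append((members[u], members[v], level))

        # Step 3: compute distances from the new cluster (UV) to the rest.
        new_entries = {}
        for w in members:
            if w in (u, v):
                continue
            d_uw = d[(min(u, w), max(u, w))]
            d_vw = d[(min(v, w), max(v, w))]
            new_entries[(w, next_id)] = linkage(d_uw, d_vw)
        # (a) delete the entries involving U or V ...
        d = {key: val for key, val in d.items() if u not in key and v not in key}
        # (b) ... and add the entries for the new cluster (UV).
        d.update(new_entries)
        members[next_id] = members.pop(u) + members.pop(v)
        next_id += 1
    return merges


# Hypothetical distances between five objects (indices 0 to 4).
example_dist = [
    [0, 9, 3, 6, 11],
    [9, 0, 7, 5, 10],
    [3, 7, 0, 9, 2],
    [6, 5, 9, 0, 8],
    [11, 10, 2, 8, 0],
]
for left, right, level in agglomerate(example_dist, linkage=max):
    print(left, right, level)
```

With complete linkage the mergers for this hypothetical matrix occur at levels 2, 5, 9 and 11; with single linkage (`linkage=min`) they occur at levels 2, 3, 5 and 6.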
3.6 Linkage Method:
Linkage methods are suitable for clustering items. Three types of linkage method are:
1. Single Linkage
2. Complete Linkage
3. Average Linkage
Single linkage:
Initially, we must find the smallest distance in $D = \{d_{ik}\}$ and merge the corresponding
objects, say U and V, to get the cluster (UV). For Step 3 of the general algorithm, the
distances between (UV) and any other cluster W are computed by
$$d_{(UV)W} = \min\{d_{UW},\, d_{VW}\}$$
Here the quantities $d_{UW}$ and $d_{VW}$ are the distances between the nearest neighbours of clusters
U and W and clusters V and W, respectively.
Complete linkage:
The general agglomerative algorithm again starts by finding the minimum entry in $D = \{d_{ik}\}$
and merging the corresponding objects, such as U and V, to get the cluster (UV). For Step
3 of the general algorithm, the distances between (UV) and any other cluster W are computed
by
$$d_{(UV)W} = \max\{d_{UW},\, d_{VW}\}$$
Here $d_{UW}$ and $d_{VW}$ are the distances between the most distant members of clusters U and W
and clusters V and W, respectively.
Average Linkage:
Here we begin by searching the distance matrix $D = \{d_{ik}\}$ to find the nearest (most similar)
objects, for example U and V. These objects are merged to form the cluster (UV). For Step 3
of the general agglomerative algorithm, the distances between (UV) and any other cluster W
are determined by
$$d_{(UV)W} = \frac{\sum_{i}\sum_{k} d_{ik}}{N_{(UV)}\,N_{W}}$$
where $d_{ik}$ is the distance between object i in the cluster (UV) and object k in the cluster W,
and $N_{(UV)}$ and $N_{W}$ are the numbers of items in clusters (UV) and W, respectively.
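To make the difference between the three linkage rules concrete, the short sketch below computes all three distances between two clusters directly from object-to-object distances. The function name `linkage_distances` and the distance matrix are hypothetical and serve only as an illustration.

```python
def linkage_distances(d, cluster_uv, cluster_w):
    """Return (single, complete, average) linkage distances between two clusters.

    d          -- symmetric matrix (list of lists) of object-to-object distances
    cluster_uv -- list of object indices in cluster (UV)
    cluster_w  -- list of object indices in cluster W
    """
    # All distances d_ik with object i in (UV) and object k in W.
    pairs = [d[i][k] for i in cluster_uv for k in cluster_w]
    single = min(pairs)      # nearest neighbours
    complete = max(pairs)    # most distant members
    average = sum(pairs) / (len(cluster_uv) * len(cluster_w))
    return single, complete, average


# Hypothetical distances between five objects (indices 0 to 4).
example_dist = [
    [0, 9, 3, 6, 11],
    [9, 0, 7, 5, 10],
    [3, 7, 0, 9, 2],
    [6, 5, 9, 0, 8],
    [11, 10, 2, 8, 0],
]
print(linkage_distances(example_dist, [2, 4], [0, 1]))  # (3, 11, 7.75)
```

Here the four between-cluster distances are 3, 7, 11 and 10, so single linkage reports 3, complete linkage reports 11, and average linkage reports 31 / 4 = 7.75.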
3.7 Similarity measures
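The data for cluster analysis usually consist of the values of p variables $x_1, x_2, \ldots, x_p$
for n objects. For hierarchical algorithms, the variable values are then used to produce an array of
distances between the individuals. In this study the squared Euclidean distance measure is
used.
Squared Euclidean Distance
The Euclidean distance between two p-dimensional observations $\mathbf{x}' = [x_1, x_2, \ldots, x_p]$
and $\mathbf{y}' = [y_1, y_2, \ldots, y_p]$ is:
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2} = \sqrt{(\mathbf{x} - \mathbf{y})'(\mathbf{x} - \mathbf{y})}$$
The squared Euclidean distance is this quantity without the square root:
$$d^2(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})'(\mathbf{x} - \mathbf{y})$$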
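The distance measure above translates directly into code. The following is a minimal sketch using only the Python standard library; the function names and the two example observations are hypothetical.

```python
import math


def squared_euclidean(x, y):
    """Return the squared Euclidean distance (x - y)'(x - y)."""
    if len(x) != len(y):
        raise ValueError("observations must have the same number of variables")
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x, y):
    """Return the Euclidean distance, the square root of the squared distance."""
    return math.sqrt(squared_euclidean(x, y))


# Two hypothetical observations on p = 3 variables.
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
print(squared_euclidean(x, y))  # 25.0
print(euclidean(x, y))          # 5.0
```

Because the square root is monotone, both measures rank pairs of observations in the same order; they differ only in the scale of the merge levels reported.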
Chapter 4
Analysis of Data
4.1 Introduction
This study attempts to identify relatively homogeneous groups of cases based on some
demographic and socio-economic characteristics separately.
4.2 Cluster analysis
This study performed the complete linkage method of agglomerative hierarchical cluster
analysis in SPSS, selecting all the variables except country in the variable box and labeling the cases
by country. A dendrogram was requested in the output. All
variables were converted to z-scores to yield equal metrics and equal weighting, the squared
Euclidean distance was selected as the distance measure, and the furthest neighbor
was selected as the cluster method.
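The z-score conversion mentioned above puts every variable on a common scale before distances are computed, so that a variable measured in large units does not dominate the clustering. A minimal sketch follows; it assumes the sample standard deviation (n - 1 denominator), and the function name and example values are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize one variable to mean 0 and standard deviation 1.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical values of one variable for three cases.
print(z_scores([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]
```

Each variable would be standardized separately in this way before the squared Euclidean distances between cases are computed.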
4.3 Agglomerative Schedule
At stage 1, the procedure clusters the two cases that have the smallest squared
Euclidean distance between them. SPSS then recomputes the distance measures between
all single cases and clusters. Next, the two cases or clusters with the smallest distance are combined, yielding
either two clusters of two cases or one cluster of three. This process continues until all cases are clustered into a
single group. The agglomeration schedule of clusters using demographic characteristics is
given below:
Table 4.1: Agglomeration schedule
Columns: Stage; Cluster Combined (Cluster 1, Cluster 2); Coefficients; Stage Cluster First Appears (Cluster 1, Cluster 2); Next Stage
1 160 169 .026 0 0 45
2 78 120 .041 0 0 68
3 137 191 .043 0 0 40
4 8 126 .059 0 0 93
5 30 103 .069 0 0 24
6 17 170 .072 0 0 72
7 115 134 .081 0 0 25
8 64 83 .096 0 0 57
9 125 130 .108 0 0 51
10 105 119 .108 0 0 90
11 46 58 .120 0 0 17
12 66 140 .134 0 0 57
13 9 156 .135 0 0 59
14 59 186 .138 0 0 51
15 23 192 .144 0 0 65
16 161 179 .145 0 0 48
17 16 46 .146 0 11 59
18 51 136 .149 0 0 42
19 75 151 .150 0 0 27
20 54 148 .151 0 0 46
21 112 157 .154 0 0 120
22 43 187 .155 0 0 58
23 45 139 .156 0 0 85
24 30 101 .159 5 0 61
25 50 115 .161 0 7 79
26 31 49 .178 0 0 122
27 75 96 .186 19 0 76
28 18 99 .191 0 0 96
29 2 163 .194 0 0 105
30 162 174 .196 0 0 62
31 15 144 .198 0 0 53
32 95 132 .206 0 0 123
33 21 155 .209 0 0 85
34 100 149 .216 0 0 84
35 36 166 .223 0 0 70
36 13 19 .224 0 0 103
37 10 60 .225 0 0 94
38 42 55 .230 0 0 139
39 121 129 .232 0 0 74
40 108 137 .233 0 3 79
41 14 48 .234 0 0 118
42 51 127 .239 18 0 81
43 61 72 .240 0 0 78
44 76 81 .242 0 0 93
45 147 160 .263 0 1 112
46 54 176 .266 20 0 134
47 5 11 .271 0 0 105
48 67 161 .280 0 16 94
49 91 154 .292 0 0 129
50 135 165 .294 0 0 108
51 59 125 .299 14 9 117
52 102 143 .313 0 0 76
53 15 184 .324 31 0 155
54 142 185 .331 0 0 158
55 37 62 .340 0 0 69
56 110 158 .345 0 0 106
57 64 66 .345 8 12 137
58 43 141 .349 22 0 119
59 9 16 .352 13 17 99
60 133 178 .360 0 0 100
61 30 111 .361 24 0 112
62 114 162 .362 0 30 97
63 122 181 .374 0 0 86
64 33 38 .379 0 0 106
65 23 84 .380 15 0 111
66 34 40 .383 0 0 143
67 26 183 .386 0 0 140
68 78 97 .390 2 0 75
69 37 150 .399 55 0 107
70 36 52 .403 35 0 95
71 80 106 .404 0 0 120
72 17 73 .407 6 0 81
73 32 70 .423 0 0 133
74 107 121 .429 0 39 101
75 78 180 .436 68 0 111
76 75 102 .436 27 52 121
77 29 53 .437 0 0 102
78 28 61 .454 0 43 91
79 50 108 .459 25 40 122
80 93 109 .471 0 0 124
81 17 51 .475 72 42 138
82 118 189 .475 0 0 128
83 94 138 .480 0 0 135
84 3 100 .487 0 34 124
85 21 45 .496 33 23 139
86 77 122 .521 0 63 153
87 56 113 .536 0 0 131
88 86 190 .536 0 0 109
89 44 171 .547 0 0 146
90 7 105 .566 0 10 118
91 28 89 .572 78 0 123
92 20 124 .574 0 0 127
93 8 76 .584 4 44 161
94 10 67 .590 37 48 125
95 36 164 .599 70 0 136
96 18 39 .606 28 0 154
97 35 114 .613 0 62 145
98 57 87 .618 0 0 156
99 9 168 .641 59 0 117
100 90 133 .701 0 60 156
101 69 107 .718 0 74 147
102 29 153 .725 77 0 133
103 13 79 .727 36 0 150
104 123 182 .736 0 0 153
105 2 5 .794 29 47 143
106 33 110 .796 64 56 149
107 37 88 .809 69 0 160
108 65 135 .826 0 50 134
109 86 146 .836 88 0 152
110 6 152 .859 0 0 148
111 23 78 .872 65 75 125
112 30 147 .880 61 45 130
113 98 167 .914 0 0 162
114 68 177 .915 0 0 152
115 12 24 .927 0 0 158
116 27 173 .958 0 0 140
117 9 59 .972 99 51 157
118 7 14 1.000 90 41 145
119 43 188 1.003 58 0 157
120 80 112 1.033 71 21 176
121 25 75 1.056 0 76 174
122 31 50 1.061 26 79 151
123 28 95 1.065 91 32 166
124 3 93 1.065 84 80 151
125 10 23 1.089 94 111 136
126 175 193 1.101 0 0 160
127 20 116 1.113 92 0 142
128 71 118 1.131 0 82 135
129 91 104 1.152 49 0 177
130 30 74 1.204 112 0 146
131 56 145 1.224 87 0 144
132 22 159 1.294 0 0 182
133 29 32 1.312 102 73 172
134 54 65 1.356 46 108 164
135 71 94 1.370 128 83 142
136 10 36 1.428 125 95 148
137 64 85 1.434 57 0 180
138 17 131 1.442 81 0 159
139 21 42 1.473 85 38 141
140 26 27 1.652 67 116 168
141 21 63 1.662 139 0 169
142 20 71 1.718 127 135 173
143 2 34 1.719 105 66 167
144 47 56 1.840 0 131 170
145 7 35 1.840 118 97 178
146 30 44 1.905 130 89 161
147 4 69 1.905 0 101 172
148 6 10 1.938 110 136 167
149 33 194 1.967 106 0 165
150 13 92 2.153 103 0 171
151 3 31 2.250 124 122 171
152 68 86 2.278 114 109 163
153 77 123 2.385 86 104 166
154 18 41 2.404 96 0 170
155 15 117 2.415 53 0 174
156 57 90 2.444 98 100 181
157 9 43 2.478 117 119 169
158 12 142 2.603 115 54 183
159 17 172 2.606 138 0 163
160 37 175 2.720 107 126 164
161 8 30 2.915 93 146 177
162 98 195 3.098 113 0 182
163 17 68 3.113 159 152 184
164 37 54 3.166 160 134 176
165 1 33 3.407 0 149 179
166 28 77 3.503 123 153 173
167 2 6 3.831 143 148 175
168 26 128 4.042 140 0 186
169 9 21 4.103 157 141 180
170 18 47 4.265 154 144 186
171 3 13 4.286 151 150 181
172 4 29 4.543 147 133 179
173 20 28 5.117 142 166 184
174 15 25 5.170 155 121 187
175 2 82 5.568 167 0 183
176 37 80 5.603 164 120 188
177 8 91 5.898 161 129 178
178 7 8 7.024 145 177 189
179 1 4 7.094 165 172 190
180 9 64 7.658 169 137 187
181 3 57 7.874 171 156 185
182 22 98 8.447 132 162 191
183 2 12 10.013 175 158 185
184 17 20 10.527 163 173 188
185 2 3 13.059 183 181 189
186 18 26 13.161 170 168 190
187 9 15 13.619 180 174 192
188 17 37 17.958 184 176 193
189 2 7 18.684 185 178 192
190 1 18 18.887 179 186 191
191 1 22 30.058 190 182 193
192 2 9 32.627 189 187 194
193 1 17 66.346 191 188 194
194 1 2 124.896 193 192 0
To clarify the agglomeration schedule above, stages 1, 100 and 190 are explained.
At stage 1
Case 160 is clustered with case 169; the squared Euclidean distance between these two cases
is 0.026. Neither case has been clustered previously (the two zeros under cluster 1 and
cluster 2), and the next stage at which the cluster containing case 160 combines with another
case is stage 45 (at stage 45, case 160 joins case 147).
At stage 100
Case 90 joins case 133. Case 90 has not been clustered previously (the zero under
cluster 1). Case 133 was previously joined with case 178 at stage 60. This forms a cluster
of 3 cases (90, 133, 178). The squared Euclidean distance between cases 90 and 133 is 0.701.
The next stage at which the cluster containing case 90 joins another cluster is stage 156.
At stage 190
The clusters containing cases 1 and 18 are joined. Case 1 was last joined with
case 4 at stage 179, and case 18 was last joined with case 26 at stage 186. The merged
cluster therefore contains cases 1, 4, 18 and 26 together with every case that those clusters
absorbed at earlier stages. The coefficient at this stage, the squared Euclidean distance at
which the two clusters are joined, is 18.887. The next stage at which the cluster containing
case 1 joins another cluster is stage 191.
The other stages can be explained in the same way. The process is the same for the clusters
based on socio-economic characteristics.
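The way the schedule columns are filled in can be illustrated on a small example. The following is a minimal pure-Python sketch, not the software routine used in this study: it applies complete linkage with squared Euclidean distance to five made-up points (`toy_points` is hypothetical) and builds rows in the same layout as the schedule above: stage, cluster 1, cluster 2, coefficient, the stage at which each cluster first appears, and the next stage. It assumes that a merged cluster keeps the lower case number as its label.

```python
from itertools import combinations

def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomeration_schedule(points):
    """Complete-linkage clustering returning schedule rows.

    Each row is [stage, cluster1, cluster2, coefficient,
                 first_appears_1, first_appears_2, next_stage].
    Cases are numbered from 1; a cluster is labelled by its lowest case number.
    """
    n = len(points)
    members = {i: [i] for i in range(1, n + 1)}   # cluster label -> case numbers
    formed_at = {i: 0 for i in range(1, n + 1)}   # cluster label -> stage of last merge (0 = single case)
    rows = []
    for stage in range(1, n):
        best = None
        for a, b in combinations(sorted(members), 2):
            # Complete linkage: cluster distance = largest pairwise distance.
            d = max(sq_euclidean(points[i - 1], points[j - 1])
                    for i in members[a] for j in members[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        rows.append([stage, a, b, d, formed_at[a], formed_at[b], 0])
        # Record this stage as the "next stage" of the rows that built a and b.
        for earlier in (formed_at[a], formed_at[b]):
            if earlier:
                rows[earlier - 1][6] = stage
        members[a] += members.pop(b)   # merged cluster keeps the lower label
        formed_at[a] = stage
        del formed_at[b]
    return rows

# Hypothetical toy data: five cases measured on two variables.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 7), (12, 0)]
schedule = agglomeration_schedule(toy_points)
for row in schedule:
    print(row)
```

In the third row of this toy schedule both "first appears" entries are non-zero, which is the same situation as stage 190 above: two clusters formed at earlier stages are joined, and the last entry gives the stage at which the merged cluster is used next. A zero in the last column marks the final stage.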
4.4 Clusters
The memberships of the 10 clusters obtained from the demographic characteristics are shown
below:
Table 4.2: 10 cluster members (Demographic)
Clusters Country names
1 Afghanistan, Chad, Congo, Dem. Rep. of, Mali, Somalia
2 Angola, Cameroon, Central African Republic, Cote d'Ivoire, Equatorial
Guinea, Guinea, Guinea-Bissau, Lesotho, Malawi, Mozambique, Niger,
Nigeria, Sierra Leone, Zambia
3 Benin, Burkina Faso, Burundi, Congo, Rep. of, Djibouti, Ethiopia,
Gambia, Liberia, Mauritania, Swaziland, Uganda, Zimbabwe
4 Australia, Austria, Belgium, Canada, Cuba, Cyprus, Czech Republic,
Denmark, Finland, France, Germany, Greece, Hong Kong, SAR, Iceland,
Ireland, Italy, Japan, South Korea, Liechtenstein, Luxembourg, Macao,
SAR, Malta, Netherlands, New Zealand, Norway, Portugal, Puerto Rico,
San Marino, Singapore, Slovenia, Spain, Sweden, Switzerland, Taiwan,
United Kingdom, United States
5 Albania, Belarus, Bosnia-Herzegovina, Brazil, Bulgaria, Croatia,
Dominica, Estonia, Georgia, Hungary, Jamaica, Latvia, Lithuania,
Macedonia, Moldova, Montenegro, Palau, Poland, Romania, Russia,
Serbia, Slovakia, St. Lucia, Trinidad and Tobago, Turkey, Ukraine
6 Antigua and Barbuda, Argentina, Armenia, Azerbaijan, Bahamas, Bahrain,
Barbados, Belize, Brunei, Cape Verde, Chile, China, Colombia, Costa
Rica, Ecuador, El Salvador, Fiji, French Polynesia, Grenada, Israel,
Kazakhstan, Kuwait, Lebanon, Libya, Malaysia, Maldives, Mauritius,
Mexico, Nicaragua, Oman, Panama, Peru, Qatar, Saudi Arabia, Seychelles,
Sri Lanka, St. Kitts-Nevis, St. Vincent & the Grenadines, Suriname, Syria,
Thailand, Tunisia, United Arab Emirates, Uruguay, Venezuela, Vietnam
7 Algeria, Dominican Republic, Egypt, Guatemala, Honduras, Indonesia,
Iraq, Jordan, Korea, North, Kosovo, Kyrgyzstan, Marshall Islands,
Morocco, Namibia, Paraguay, Philippines, Samoa, Solomon Islands,
Tonga, Tuvalu, Vanuatu
8 Bangladesh, Bhutan, Guyana, India, Iran, Micronesia, Mongolia,
Myanmar, Nepal, Tajikistan, Turkmenistan, Uzbekistan
9 Botswana, Rwanda, Senegal, South Africa, Tanzania
10 Bolivia, Cambodia, Comoros, Eritrea, Gabon, Ghana, Haiti, Kenya,
Kiribati, Laos, Madagascar, Pakistan, Papua New Guinea, Sao Tome and
Principe, Sudan, Timor-Leste, Togo, Yemen
The memberships of the 10 clusters obtained from the socio-economic characteristics are
shown below:
Table 4.3: 10 cluster members (Socio-economic)
Clusters Country names
1 Macao.
2 Libya.
3 Belarus.
4 Hong Kong, Singapore.
5 Bahrain, Belgium, Greece, Japan, Malta, Puerto Rico, San Marino.
6 Argentina, Armenia, Australia, Austria, Bahamas, Bolivia, Brazil,
Brunei, Bulgaria, Canada, Cape Verde, Chile, Colombia, Costa Rica,
Cuba, Cyprus, Czech Republic, Denmark, Dominica, Dominican
Republic, Ecuador, Estonia, France, Germany, Hungary, Iceland, Iran,
Ireland, Israel, Italy, Jordan, Korea North, Korea South, Kosovo, Kuwait,
Latvia, Lebanon, Lithuania, Macedonia, Malaysia, Mexico, Mongolia,
Montenegro, Netherlands, New Zealand, Oman, Palau, Panama, Paraguay,
Peru, Philippines, Poland, Qatar, Russia, Saudi Arabia, Seychelles, Spain,
Suriname, Sweden, Switzerland, Taiwan, Turkey, Turkmenistan, Ukraine,
United Arab Emirates, United Kingdom, United States, Uruguay,
Venezuela.
7 Ethiopia, Sudan, Yemen.
8 Burkina Faso, Chad, Finland, Liechtenstein, Luxembourg, Mali, Niger,
Norway.
9 Afghanistan, Bangladesh, Benin, Bhutan, Cambodia, Central African
Republic, Congo, Dem. Rep. of, Cote d'Ivoire, Eritrea, Gambia,
Guatemala, Guinea, Guinea-Bissau, Haiti, India, Laos, Madagascar,
Mauritania, Mozambique, Nepal, Pakistan, Papua New Guinea, Senegal,
Sierra Leone, Somalia, Tanzania, Timor-Leste, Togo, Uganda.
10 Albania, Algeria, Angola, Antigua and Barbuda, Azerbaijan, Barbados,
Belize, Bosnia-Herzegovina, Botswana, Burundi, Cameroon, China,
Comoros, Congo, Rep. of, Croatia, Djibouti, Egypt, El Salvador,
Equatorial Guinea, Fiji, French Polynesia, Gabon, Georgia, Ghana,
Grenada, Guyana, Honduras, Indonesia, Iraq, Jamaica, Kazakhstan,
Kenya, Kiribati, Kyrgyzstan, Lesotho, Liberia, Malawi, Maldives,
Marshall Islands, Mauritius, Micronesia, Moldova, Morocco, Myanmar,
Namibia, Nicaragua, Nigeria, Portugal, Romania, Rwanda, Samoa, Sao
Tome and Principe, Serbia, Slovakia, Slovenia, Solomon Islands, South
Africa, Sri Lanka, St. Kitts-Nevis, St. Lucia, St. Vincent & the Grenadines,
Swaziland, Syria, Tajikistan, Thailand, Tonga, Trinidad and Tobago,
Tunisia, Tuvalu, Uzbekistan, Vanuatu, Vietnam, Zambia, Zimbabwe.
4.5 Comparison of clusters
Countries in the same cluster have approximately similar values of the characteristics.
This can be verified by inspecting the actual values of the variables used in the
analysis. Comparing the mean values of the variables across clusters shows how dissimilar
the clusters are from one another. The means also indicate which clusters could be merged
into one, namely those that are least dissimilar.
Mean values of the demographic characteristics of the members of the 10 clusters are shown
below:
Table 4.4: Mean values of the 10 cluster members (Demographic)
Clusters | Birth rate | Death rate | Infant mortality rate | Rate of natural increase | Total fertility rate | Ages <15 | Ages 65+ | Life expectancy (both) | Life expectancy (male) | Life expectancy (female)
1 44.6 16 118 2.88 6.24 46 2.4 49.2 47.8 50.2
2 39.5 14.36 91.93 2.49 5.35 43.43 3.14 51.43 50.43 52.71
3 37.42 11.42 71.08 2.63 5.01 42.42 2.92 55.42 54.17 56.42
4 11.14 7.81 3.74 0.34 1.58 16.39 15.42 80.42 77.86 83.06
5 11.62 10.35 12.03 0.16 1.54 18 12.77 73.23 69.54 77
6 18.41 5.59 13.57 1.28 2.22 26.02 6.28 74.48 71.91 77.28
7 25.76 5.90 25.33 1.93 3.25 34.29 4.67 69.86 67.67 72.24
8 22.5 6.25 46.67 1.65 2.58 30.5 4.25 67.67 65.5 69.58
9 31.8 11.2 47.2 1.92 4.04 39.2 3.2 55.4 55 56.2
10 32.06 8.11 53.61 2.42 4.19 39.28 3.44 62.83 61.33 64.56
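The suggestion that the least dissimilar clusters could be merged can be sketched with the mean values in Table 4.4. The example below is an illustration only and is not part of the original analysis: it measures dissimilarity as the plain Euclidean distance between the mean vectors in their raw units, which is an assumption (the variables are not standardized here, so variables with large ranges such as infant mortality dominate). The names `means` and `closest_pair` are introduced for this sketch.

```python
from itertools import combinations
from math import dist

# Cluster means copied from Table 4.4, in the table's column order:
# birth rate, death rate, infant mortality rate, rate of natural increase,
# total fertility rate, ages <15, ages 65+, and three life-expectancy columns.
means = {
    1:  [44.6, 16, 118, 2.88, 6.24, 46, 2.4, 49.2, 47.8, 50.2],
    2:  [39.5, 14.36, 91.93, 2.49, 5.35, 43.43, 3.14, 51.43, 50.43, 52.71],
    3:  [37.42, 11.42, 71.08, 2.63, 5.01, 42.42, 2.92, 55.42, 54.17, 56.42],
    4:  [11.14, 7.81, 3.74, 0.34, 1.58, 16.39, 15.42, 80.42, 77.86, 83.06],
    5:  [11.62, 10.35, 12.03, 0.16, 1.54, 18, 12.77, 73.23, 69.54, 77],
    6:  [18.41, 5.59, 13.57, 1.28, 2.22, 26.02, 6.28, 74.48, 71.91, 77.28],
    7:  [25.76, 5.90, 25.33, 1.93, 3.25, 34.29, 4.67, 69.86, 67.67, 72.24],
    8:  [22.5, 6.25, 46.67, 1.65, 2.58, 30.5, 4.25, 67.67, 65.5, 69.58],
    9:  [31.8, 11.2, 47.2, 1.92, 4.04, 39.2, 3.2, 55.4, 55, 56.2],
    10: [32.06, 8.11, 53.61, 2.42, 4.19, 39.28, 3.44, 62.83, 61.33, 64.56],
}

def closest_pair(cluster_means):
    """Return (distance, (a, b)) for the two clusters with the nearest mean vectors."""
    return min((dist(cluster_means[a], cluster_means[b]), (a, b))
               for a, b in combinations(sorted(cluster_means), 2))

distance, pair = closest_pair(means)
print("Least dissimilar clusters:", pair, "distance:", round(distance, 2))
```

Under these assumptions the nearest pair of mean vectors is clusters 5 and 6, so those two would be the first candidates for merging. A different distance measure or standardized variables could give a different pair.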
Mean values of the socio-economic characteristics of the members of the 10 clusters are shown
below:
Table 4.5: Mean values of the 10 cluster members (Socio-economic)
Clusters | Inflation rate | GDP rate | Population density | Urban population percentage | Literacy rate | Death due to NCDs
1 5.8 1 21423 100 91
2 15.9 -59.7 4 78 89 78
3 53.3 5.3 46 75 99 87
4 5.25 4.95 7026.5 100 92.5 79
5 2.11 -1.44 707.86 91.57 95.71 86.83
6 5.40 4.35 109.20 74.10 94.35 80.54
7 23.5 -2.5 47.33 29 55.33 41
8 2.36 2.04 68.88 43.38 19.25 48.71
9 9.54 5.04 124.90 33.39 54.93 36.76
10 6.23 3.68 147.27 43.39 86.39 62.18
4.6 Dendrogram
The branching structure of the dendrogram makes it possible to trace backward or forward to
any individual case or cluster at any level. In addition, it gives an idea of how great the
distance is between the cases or groups that are clustered at a particular step.
A complete linkage algorithm applied to the squared Euclidean distances between the 195
countries of the world produces the dendrograms for both the demographic and the
socio-economic perspectives.
From the dendrograms it can be seen that the clusters are formed on the basis of distance.
Countries in the same cluster are homogeneous, and countries in different clusters are
heterogeneous.
Dendrogram using complete linkage method (For demographic characteristics):
Figure 4.1: Dendrogram (Demographic)
Dendrogram using complete linkage method (For socio-economic characteristics):
Figure 4.2: Dendrogram (Socio-economic)
Chapter 5
Conclusion
5.1 Conclusion and Findings
This study set out to cluster the countries of the world on the basis of selected
demographic and socio-economic characteristics. The study found that:
On the basis of the demographic characteristics, 10 clusters were obtained: 5 countries in
the first cluster, 14 in the second, 12 in the third, 36 in the fourth, 26 in the fifth, 46 in
the sixth, 21 in the seventh, 12 in the eighth, 5 in the ninth and 18 in the tenth.
Most of the developed countries are in the fourth cluster, and most of the developing
countries are in the fifth, sixth and eighth clusters. Most of the underdeveloped countries
are in the second, third, ninth and tenth clusters. Bangladesh is in the eighth cluster.
On the basis of the socio-economic characteristics, 10 clusters were again obtained: 1 country
each in the first, second and third clusters, 2 in the fourth, 7 in the fifth, 69 in the
sixth, 3 in the seventh, 8 in the eighth, 29 in the ninth and 74 in the tenth.
In this case, most of the developed countries are in the sixth cluster, and most of the
developing countries are in the ninth cluster. Most of the underdeveloped countries are in
the tenth cluster. Bangladesh is in the ninth cluster.
An example shows how homogeneous the countries within one cluster are. Cluster 4 of the
demographic clustering is chosen from the ten clusters to check whether the countries in
the same cluster are homogeneous.
Cluster 4 (from demographic characteristics)
Table 5.1: The countries of cluster 4 with their demographic characteristics
Country | Birth rate | Death rate | Infant mortality rate | Rate of natural increase | Total fertility rate | Ages <15 | Ages 65+ | Life expectancy (both) | Life expectancy (male) | Life expectancy (female)
Australia 14 7 3.9 0.7 1.9 19 14 82 79 84
Austria 9 9 3.7 0 1.4 15 18 80 77 83
Belgium 12 10 3.4 0.2 1.8 17 17 80 77 82
Canada 11 7 5.1 0.4 1.7 16 14 81 78 83
Cuba 11 8 4.8 0.4 1.7 17 13 78 76 80
Cyprus 12 6 7 0.6 1.4 17 12 78 76 81
Czech Republic 10 10 2.7 0.1 1.4 14 15 78 74 81
Denmark 11 9 3.4 0.2 1.8 18 17 79 77 81
Finland 11 9 2.6 0.2 1.8 16 18 80 77 83
France 13 9 3.6 0.4 2 19 17 82 78 85
Germany 8 10 3.5 -0.2 1.4 13 21 80 77 83
Greece 10 10 3.2 0.1 1.5 14 19 80 78 82
Hong Kong 14 6 2 0.7 1.2 12 14 83 80 86
Iceland 14 6 2.2 0.9 2 21 12 81 80 84
Ireland 16 6 3.6 1 2.1 21 12 79 77 82
Italy 9 10 3.7 -0.1 1.4 14 21 81 79 84
Japan 9 10 2.6 -0.1 1.4 13 24 83 80 86
Korea, South 10 5 3.2 0.4 1.2 16 11 81 77 84
Liechtenstein 10 6 3.3 0.5 1.5 16 14 80 79 82
Luxembourg 11 7 3 0.4 1.5 18 14 80 78 83
Macao 11 3 3 0.6 1.2 12 7 82 79 85
Malta 10 7 6.7 0.2 1.4 15 16 79 78 82
Netherlands 11 8 3.8 0.3 1.7 17 16 81 79 83
New Zealand 14 7 5.1 0.8 2.1 20 14 81 79 83
Norway 12 8 2.8 0.4 1.9 19 15 81 79 83
Portugal 9 10 3 -0.1 1.3 15 19 79 76 82
Puerto Rico 11 8 8.8 0.4 1.6 20 15 79 75 83
San Marino 10 7 2 0.4 1.2 15 16 83 81 86
Singapore 10 4 2 0.5 1.2 17 9 81 79 84
Slovenia 11 9 2.5 0.2 1.5 14 17 80 76 83
Spain 10 8 3.5 0.2 1.4 15 17 82 79 85
Sweden 12 10 2.5 0.3 1.9 17 19 82 80 84
Switzerland 10 8 3.8 0.2 1.5 15 17 82 80 84
Taiwan 9 7 4.1 0.1 1.1 15 11 79 76 82
United Kingdom 13 9 4.5 0.4 2 18 17 80 78 82
United States 13 8 6.1 0.5 1.9 20 13 78 75 80
From the table above it is clear that the countries within this cluster are very homogeneous.
Comparison with the cluster means in Table 4.4 shows that countries from different clusters
are not homogeneous.
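The homogeneity claim can be checked numerically for a single variable. The short sketch below is an illustration, not part of the original analysis: it takes the birth-rate column of Table 5.1, computes its mean and range, and compares them with the cluster means reported in Table 4.4. The variable names are introduced for this example only.

```python
from statistics import mean

# Birth rates of the 36 cluster-4 countries, in the row order of Table 5.1
# (Australia first, United States last).
birth_rates = [
    14, 9, 12, 11, 11, 12, 10, 11, 11, 13, 8, 10,
    14, 14, 16, 9, 9, 10, 10, 11, 11, 10, 11, 14,
    12, 9, 11, 10, 10, 11, 10, 12, 10, 9, 13, 13,
]

cluster4_mean = mean(birth_rates)
lowest, highest = min(birth_rates), max(birth_rates)

# Mean birth rates of the other demographic clusters, copied from Table 4.4.
other_cluster_means = {1: 44.6, 2: 39.5, 3: 37.42, 5: 11.62, 6: 18.41,
                       7: 25.76, 8: 22.5, 9: 31.8, 10: 32.06}

print("Cluster 4 mean birth rate:", round(cluster4_mean, 2))
print("Range within cluster 4:", lowest, "to", highest)
# Clusters whose mean lies above every birth rate observed in cluster 4.
above_range = sorted(c for c, m in other_cluster_means.items() if m > highest)
print("Clusters with a mean above the cluster-4 range:", above_range)
```

The mean computed from Table 5.1 rounds to 11.14, the value reported for cluster 4 in Table 4.4, and every birth rate in the cluster lies between 8 and 16. The mean birth rates of all other clusters except cluster 5 lie above this range, which is consistent with the clusters being internally similar but different from one another on this variable.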
5.2 Limitations
Data on all variables were not available for all countries, so this study could analyze
only 195 countries.
The time available for this study was limited, so further elements, such as additional
variables, remain to be added.
5.3 Further scope
Future work may analyze the data using other clustering methods and compare them to find
the best result.
Future work may use more variables covering more aspects for more countries.