clustering the countries of world on basis of some selected demographic and socio-economic...

Chapter 1

Prelude

1.1 Introduction

Most people think demography is just math in disguise, a sort of dry social accounting. Once

exposed to the subject, many change their minds. They come to appreciate the profound

impact demographic forces have on societies. This has never been truer than during the past

half century, a period in which the United States and other societies have experienced

unprecedented social and demographic change. Since these demographic forces have not

been stilled, they will continue to cause social change and to shape social programs for the

balance of our lives and beyond.

People also find demography fascinating because it deals with so many personally relevant

topics. Nearly all the major events of people's lives have demographic implications: birth,

schooling, marriage, occupational choices, childbearing, retirement and death.

Demography, or population studies, is a discipline, an “interdiscipline,” and a subdiscipline.

It is clearly a discipline because it is a field with its own body of interrelated concepts,

techniques, journals, departments, and professional associations. Demography is also an

interdisciplinary field because it draws its subject matter and methods from many disciplines,

including sociology, economics, biology, geography, history, and the health sciences. Finally,

demography is also considered a subdiscipline within some of these same major disciplines.

In most universities, demography courses are taught within the sociology curriculum, perhaps

because population phenomena have so long been linked to social process. Demography is

defined as the study of human populations: their size, composition and distribution as well as

the causes and consequences of changes in these characteristics. Populations are never static.

They grow or decline through the interplay of three demographic processes: birth, death, and

migration. If some groups within a population grow or decline faster than others, the

composition of the whole is altered.

1.2 World’s Demographic and Socio-economic View

We entered the 20th century with a population of 1.6 billion people. We entered the 21st

century with 6.1 billion people, and in 2007 the world population was 6.6 billion. The increase in

the size of the human population in the last half century is unprecedented, and nearly all of

the growth is occurring in the less developed countries. Currently, 80 million people are being

added every year in less developed countries, compared with about 1.6 million in more

developed countries. While the less developed countries will keep growing, the more

developed countries may grow slowly or not at all.

Population change is linked to economic development, education, the environment, the status

of women, epidemics, and other health threats, and access to family planning information and

services. All of these factors interact with every facet of our lives, regardless of where we

live. It is remarkable that, despite many new developments over the past 50 years, one fact

looks very much the same: populations are growing most rapidly where such growth can be

afforded the least.

In 2008, the world reaches an invisible but momentous milestone: for the first time in history,

more than half its population, 3.3 billion people, will be living in urban areas. By 2030, this is

expected to swell to almost 5 billion. Many of the new urbanites will be poor. Their future,

the future of cities in developing countries, the future of humanity itself, all depend very

much on decisions made now in preparation for this growth.

While the world's urban population grew very rapidly (from 220 million to 2.8 billion) over

the 20th century, the next few decades will see an unprecedented scale of urban growth in the

developing world. This will be particularly notable in Africa and Asia where the urban

population will double between 2000 and 2030. By then, the towns and cities of the developing

world will make up 80 percent of urban humanity.

Urbanization, the increase in the urban share of total population, is inevitable, but it can also

be positive. The current concentration of poverty, slum growth and social disruption in cities

does paint a threatening picture. Yet no country in the industrial age has ever achieved

significant economic growth without urbanization. Cities concentrate poverty, but they also

represent the best hope of escaping it.

Cities also embody the environmental damage done by modern civilization; yet experts and

policymakers increasingly recognize the potential value of cities to long-term sustainability.

If cities create environmental problems, they also contain the solutions. The potential benefits

of urbanization far outweigh the disadvantages. The challenge is in learning how to exploit

the possibilities. In 1994, the Programme of Action of the International Conference on

Population and Development called on governments to “respond to the need of all citizens,

including urban squatters, for personal safety, basic infrastructure and services, to eliminate

health and social problems….” More recently, the United Nations Millennium Declaration

drew attention to the growing significance of urban poverty, specifying, in Target 11, the

modest ambition of achieving by 2020 “a significant improvement in the lives of at least 100

million slum dwellers”.

1.3 Literature Review

Clusters have become the focal point of many new policy initiatives in the last few years, in

Europe as elsewhere around the globe. The challenge set out by the Lisbon European Council

in 2000 to make Europe “the world's most competitive and dynamic knowledge-based

economy” in particular has sparked interest in new approaches to economic policy for

competitiveness. Mobilizing the potential of clusters is seen as critical to reaching this ambitious

goal (See Christian Ketels, European Clusters, Structural Change in Europe 3 – Innovative

City and Business Regions, Hagbarth Publications, 2004).

Michael Porter defines clusters as geographically proximate groups of interconnected

companies and associated institutions in a particular field, linked by commonalities and

complementarities. Clusters are important because they allow companies to be more

productive and innovative than they could be in isolation. And clusters are important because

they reduce the barriers to entry for new business creation relative to other locations (See

Michael Porter, Clusters and Competition, Harvard Business School Press, 2008).

Cluster analysis is an important and effective statistical tool used to find

homogeneous groups. Some works on cluster analysis are listed here:

Stan Salvador and Philip Chan, Determining the Number of Clusters/Segments in Hierarchical

Clustering/Segmentation Algorithms, Proc. 16th IEEE International Conference on Tools with

AI, pp. 576-584, 2004.

Can, F., Ozkarahan, E.A. (1990). “Concepts and effectiveness of the cover-coefficient-based

clustering methodology for text databases.” ACM Transactions on Database Systems, 15(4),

483-517.

Information Theory, Inference, and Learning Algorithms by David J.C. MacKay includes chapters

on k-means clustering, soft k-means clustering and derivations including the E-M algorithm

and the variational view of the E-M algorithm.

MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate

observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and

Probability, Berkeley, University of California Press, 1:281-297.

Andreas F. Grein, S. Prakash Sethi, Lawrence G. Tatum, A Dynamic Analysis of Country

Clusters, the Role of Corruption, and Implications for Global Firms, The Patterns of

Corruption in the 21st Century, 6-7 September, 2008, Athens, Greece.

1.4 Relevance of this Study

In this study, we want to know how the countries of the world group together with respect to

selected demographic characteristics: which countries are similar and which are dissimilar

on those characteristics. For this reason we carry out a cluster analysis, in which the countries

are clustered, or grouped, on the basis of some chosen demographic and socio-economic

characteristics.

1.5 Objective of the study

The main objective of the study is given below:

To find the homogeneous groups of the countries of the world on the basis of some selected

demographic and socio-economic characteristics.

1.6 Organization of the study report

This study report is organized in five chapters. Brief descriptions of the chapters are given

below:

1. In the first chapter there is an introduction, a view of the world's demography, literature

review, relevance of the study, and the objectives of the study.

2. In the second chapter the data and the relevant variables are described.

3. In the third chapter the methodology has been described.

4. The fourth chapter contains the analysis.

5. And the findings are stated in the fifth chapter.

Chapter 2

Data and variables

2.1 Introduction

This chapter provides a brief description of the data, the data source and the variables. The

software packages used for this study are also discussed.

2.2 Source of the data

This study utilizes the data extracted from the Population Reference Bureau website and the

2012 World Population Data Sheet.

The Data Sheet lists all geopolitical entities with populations of 150,000 or more and all

members of the UN. These include sovereign states, dependencies, overseas departments, and

some territories whose status or boundaries may be undetermined or in dispute. More

developed regions, following the UN classification, comprise all of Europe and North

America, plus Australia, Japan, and New Zealand. All other regions and countries are

classified as less developed.

World and Regional Totals: Regional population totals are independently rounded and include

small countries or areas not shown. Regional and world rates and percentages are weighted

averages of countries for which data are available; regional averages are shown when data or

estimates are available for at least three-quarters of the region's population.

World Population Data Sheets from different years should not be used as a time series.

Fluctuations in values from year to year often reflect revisions based on new data or estimates

rather than actual changes in levels. Additional information on likely trends and consistent

time series can be obtained from PRB, and are also available from UN and U.S. Census

Bureau publications and websites.

The rates and figures are primarily compiled from the following sources: official country

statistical yearbooks, bulletins, and websites; the United Nations Demographic Yearbook,

2009-2010 and Population and Vital Statistics Report of the UN Statistics Division; World

Population Prospects: The 2010 Revision of the UN Population Division; and the

International Data Base of the International Programs Center, U.S. Census Bureau. Other

sources include recent demographic surveys such as the Demographic and Health Surveys,

Reproductive Health Surveys, special studies, and direct communication with demographers

and statistical bureaus in the United States and abroad. Specific data sources may be obtained

by contacting the authors of the 2012 World Population Data Sheet. For countries with

complete registration of births and deaths, rates are those most recently reported. For more

developed countries, nearly all vital rates refer to 2011 or 2010.

2.3 Background characteristics

Before performing any statistical analysis it is important to know the characteristics or nature

of the data. Therefore it is necessary to study the characteristics of data at the outset of the

analysis. In this chapter we introduce the background characteristics of the variables which

are considered throughout the study. The selection of variables is based on the availability of

information in the Population Reference Bureau's 2012 World Population Data

Sheet. The main concern of this study is to determine the homogeneous clusters of the 208

countries on the basis of some demographic characteristics.

2.4 Variables:

Table 2.1: List of variables

Demographic                              Socio-Economic
1. Birth rate                            11. Inflation rate
2. Death rate                            12. GDP rate
3. Rate of natural increase              13. Population density
4. Infant mortality rate                 14. Urban population
5. Total fertility rate                  15. Literacy rate
6. Percent of population of age <15      16. Deaths due to NCDs
7. Percent of population of age 65+
8. Life expectancy (total)
9. Life expectancy (male)
10. Life expectancy (female)

1. Birth and death rate:

The annual number of births and deaths per 1,000 total population. These rates are often

referred to as “crude rates” since they do not take a population's age structure into account.

Thus, crude death rates in more developed countries with a relatively large proportion of

high-mortality older population are often higher than those in less developed countries with

lower life expectancy.

2. Rate of natural increase:

The birth rate minus the death rate, implying the annual rate of population growth without

regard to migration, expressed as a percentage.

3. Infant mortality rate:

The annual number of deaths of infants under age 1 per 1,000 live births. Rates shown with

decimals indicate national statistics reported as completely registered, while those without are

estimates from the sources cited above. Rates shown in italics are based upon fewer than 50

annual infant deaths and, as a result, are subject to considerable yearly variability.

4. Total fertility rate:

The average number of children a woman would have, assuming that current age-specific

birth rates remain constant throughout her childbearing years (usually considered to be ages 15

to 49).

5. Population under age 15/age 65+:

This is the percentage of the total population in these ages, which are often considered the

“dependent ages”.

6. Life expectancy at birth:

The average number of years a newborn infant can expect to live under current mortality

levels.

7. Percent urban:

Percentage of total population living in areas termed “urban” by that country. Typically, the

population living in towns of 2,000 or more or in national and provincial capitals is classified

“urban”.

8. Inflation rate:

Inflation rate is the annualized percentage change in a general price index (normally the

consumer price index) over time.

9. GDP rate:

GDP growth on an annual basis adjusted for inflation is expressed as a percent. The growth

rates are year-over-year, and not compounded.

10. Literacy rate:

Literacy rates are based on the most common definition - the ability to read and write at a

specified age.

11. Population Density:

Population Density is population per unit of land area; for example, people per square mile or

people per square kilometre of arable land. The data is from the PRB 2011 World Population

Data Sheet.

12. Deaths due to NCDs:

The estimated percentage of all deaths that occurred in 2008 that resulted from NCDs. Data

are from WHO's Non-communicable Diseases Country Profiles 2011.
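Several of the rates defined above are simple arithmetic on underlying rates or index values. The following is a minimal Python sketch of three of them, under stated assumptions: the function names and all input figures are hypothetical illustrations and are not taken from the Data Sheet, and the total fertility rate is computed from seven five-year age-specific birth rates (ages 15 to 49) expressed per 1,000 women, which is a common convention but an assumption here.

```python
def rate_of_natural_increase(birth_rate, death_rate):
    """Crude birth rate minus crude death rate (both per 1,000), as a percentage."""
    return (birth_rate - death_rate) / 10.0


def total_fertility_rate(age_specific_rates, group_width=5):
    """Children per woman implied by age-specific birth rates per 1,000 women.

    Assumes one rate per age group of `group_width` years (15-19, ..., 45-49).
    """
    return group_width * sum(age_specific_rates) / 1000.0


def inflation_rate(price_index_now, price_index_year_ago):
    """Annual percentage change in a price index such as the CPI."""
    return (price_index_now - price_index_year_ago) / price_index_year_ago * 100.0


# Hypothetical inputs, chosen only to illustrate the arithmetic.
rni = rate_of_natural_increase(birth_rate=20, death_rate=8)
tfr = total_fertility_rate([50, 150, 140, 100, 50, 20, 5])
inflation = inflation_rate(price_index_now=206.0, price_index_year_ago=200.0)
print(rni, tfr, inflation)
```

With these made-up inputs, a birth rate of 20 and a death rate of 8 per 1,000 correspond to a natural increase of 1.2 percent, the age-specific rates sum to a total fertility rate of about 2.6 children per woman, and the price index change corresponds to inflation of about 3 percent.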

2.5 Statistical package

Since the study involves a large data set covering 195 countries of the world on 12 demographic

characteristics, suitable software support is needed for performing the analysis. The entire

analysis was carried out on a personal computer. The statistical programming language R

and the SPSS package were used to analyze the data. MS Office Word was used for

report writing.

Chapter 3

Methodology

3.1 Introduction

This study seeks to find homogeneous clusters among 195 countries of the world

according to some demographic, economic, environmental, educational and health-related

characteristics.

First, the data are prepared for cluster analysis, where the cases are 195 countries of the world with

some selected characteristics.

The complete linkage method of agglomerative hierarchical cluster analysis is then

used. Finally, a dendrogram showing the results of the analysis is drawn.

3.2 Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects

in the same group (called cluster) are more similar (in some sense or another) to each other

than to those in other groups (clusters). It is a main task of exploratory data mining, and a

common technique for statistical data analysis used in many fields, including machine

learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

3.3 Uses

Biology, computational biology and bioinformatics

In the field of plant and animal ecology, cluster analysis is used to describe and to make

spatial and temporal comparisons of communities (assemblages) of organisms in

heterogeneous environments; it is also used in plant systematics to generate

artificial phylogenies or clusters of organisms (individuals) at the species, genus or higher

level that share a number of attributes.

In transcriptomics, clustering is used to build groups of genes with related expression patterns

(also known as co-expressed genes). Often such groups contain functionally related proteins,

such as enzymes for a specific pathway, or genes that are co-regulated. High throughput

experiments using expressed sequence tags (ESTs) or DNA microarrays can be a powerful

tool for genome annotation, a general aspect of genomics.

In sequence analysis, clustering is used to group homologous sequences into gene families.

This is a very important concept in bioinformatics, and evolutionary biology in general. See

evolution by gene duplication.

In high-throughput genotyping platforms clustering algorithms are used to automatically

assign genotypes.

In the study of human genetics, the similarity of genetic data is used in clustering to infer

population structures.

Market research

Cluster analysis is widely used in market research when working with multivariate data

from surveys and test panels. Market researchers use cluster analysis to partition the

general population of consumers into market segments and to better understand the

relationships between different groups of consumers/potential customers, and for use

in market segmentation, product positioning, new product development and selecting test

markets.

World Wide Web

In the study of social networks, clustering may be used to recognize communities within

large groups of people.

In the process of intelligent grouping of the files and websites, clustering may be used to

create a more relevant set of search results compared to normal search engines like Google.

There are currently a number of web-based clustering tools such as Clusty.

Flickr's map of photos and other map sites use clustering to reduce the number of markers on

a map. This both makes the map faster and reduces the amount of visual clutter.

Computer science

Clustering is useful in software evolution as it helps to reduce legacy properties in code by

reforming functionality that has become dispersed. It is a form of restructuring and hence is a

way of direct preventative maintenance.

In Markov chain Monte Carlo methods, clustering is often utilized to locate and characterize

extrema in the target distribution.

Social science

Cluster analysis can be used to identify areas where there are greater incidences of particular

types of crime. By identifying these distinct areas or "hot spots" where a similar crime has

happened over a period of time, it is possible to manage law enforcement resources more

effectively.

Cluster analysis is also used, for example, to identify groups of schools or students with similar

properties.

3.4 Clustering Algorithms

Different algorithms are available for clustering. The essential criterion of all the algorithms is

that they attempt to maximize the differences between clusters relative to the variation within the

clusters.

The most commonly used clustering algorithms can be classified into two general categories:

1. Hierarchical Cluster Analysis

2. Non-Hierarchical Cluster Analysis

1. Hierarchical Clustering method

Hierarchical clustering techniques proceed by either a series of successive mergers or a series

of successive divisions.

There are basically two types of hierarchical clustering methods:

a. Agglomerative Hierarchical Clustering Method.

b. Divisive Hierarchical Clustering Method.

a. Agglomerative Hierarchical Clustering Method:

This method starts with the individual objects. Thus, there are initially as many clusters as

objects. The most similar objects are first grouped, and these initial groups are merged

according to their similarities. Eventually, as the similarity decreases, all subgroups are fused

into a single cluster.

b. Divisive Hierarchical Clustering Method:

Divisive hierarchical methods work in the opposite direction. An initial single group of

objects is divided into two subgroups such that the objects in one subgroup are "far from" the

objects in the other. These subgroups are then further divided into dissimilar subgroups; the

process continues until there are as many subgroups as objects.

3.5 Dendrograms

The results of both agglomerative and divisive methods may be displayed in the form of a

two-dimensional diagram known as a dendrogram. It illustrates the mergers or divisions that

have been made at successive levels.

Steps of Agglomerative Hierarchical Clustering Algorithm

The following are the steps in the agglomerative hierarchical clustering algorithm for

grouping N objects (items or variables):

1. Start with N clusters, each containing a single entity and an N x N symmetric matrix of

distances (or similarities) $D = \{d_{ik}\}$.

2. Search the distance matrix for the nearest (most similar) pair of clusters. Let the distance

between the “most similar” clusters U and V be $d_{UV}$.

3. Merge clusters U and V. Label the newly formed cluster (UV). Update the entries in the

distance matrix by

(a) deleting the rows and columns corresponding to clusters U and V and

(b) adding a row and column giving the distances between cluster (UV) and the remaining

clusters.

4. Repeat Steps 2 and 3 a total of N - 1 times. (All objects will be in a single cluster after the

algorithm terminates.) Record the identity of clusters that are merged and the levels

(distances or similarities) at which the mergers take place.
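Steps 1 to 4 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by any statistical package: the function name `agglomerate`, its `linkage` parameter and the 5 x 5 distance matrix are all hypothetical, and Step 3(b) is implemented with an update rule that combines the distances from U and from V to each remaining cluster (`max` for complete linkage, `min` for single linkage).

```python
from itertools import combinations


def agglomerate(dist, linkage=max):
    """Agglomerative clustering on a symmetric distance matrix.

    dist    : list of lists, dist[i][k] is the distance between objects i and k
    linkage : combines d_UW and d_VW into d_(UV)W
              (max = complete linkage, min = single linkage)
    Returns the merge history as (cluster_U, cluster_V, level) tuples,
    where each cluster is a tuple of object indices.
    """
    # Step 1: N singleton clusters and their pairwise distances.
    n = len(dist)
    clusters = [(i,) for i in range(n)]
    d = {frozenset((clusters[i], clusters[k])): dist[i][k]
         for i, k in combinations(range(n), 2)}
    history = []
    # Step 4: repeat Steps 2 and 3 until one cluster remains (N - 1 merges).
    while len(clusters) > 1:
        # Step 2: find the nearest pair of clusters.
        u, v = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        level = d[frozenset((u, v))]
        # Step 3: merge U and V, drop their entries, add distances for (UV).
        uv = u + v
        clusters = [c for c in clusters if c not in (u, v)]
        for w in clusters:
            d[frozenset((uv, w))] = linkage(d[frozenset((u, w))],
                                            d[frozenset((v, w))])
        clusters.append(uv)
        history.append((u, v, level))
    return history


# Hypothetical distances between five objects (indices 0 to 4).
D = [
    [0, 9, 3, 6, 11],
    [9, 0, 7, 5, 10],
    [3, 7, 0, 9, 2],
    [6, 5, 9, 0, 8],
    [11, 10, 2, 8, 0],
]
complete = agglomerate(D)               # complete linkage
single = agglomerate(D, linkage=min)    # single linkage
for u, v, level in complete:
    print(u, v, level)
```

The recorded history (which clusters merged, and at what level) is exactly the information a dendrogram displays.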

3.6 Linkage Method:

Linkage methods are suitable for clustering items. Three types of linkage methods are:

1. Single Linkage

2. Complete Linkage

3. Average Linkage

Single linkage:

Initially, we must find the smallest distance in $D = \{d_{ik}\}$ and merge the corresponding

objects, say, U and V, to get the cluster (UV). For Step 3 of the general algorithm, the

distances between (UV) and any other cluster W are computed by

$$d_{(UV)W} = \min\{d_{UW}, d_{VW}\}$$

Here the quantities $d_{UW}$ and $d_{VW}$ are the distances between the nearest neighbours of clusters

U and W and clusters V and W, respectively.

Complete linkage:

The general agglomerative algorithm again starts by finding the minimum entry in $D = \{d_{ik}\}$

and merging the corresponding objects, such as U and V, to get the cluster (UV). For Step

3 of the general algorithm, the distances between (UV) and any other cluster W are computed

by

$$d_{(UV)W} = \max\{d_{UW}, d_{VW}\}$$

Here $d_{UW}$ and $d_{VW}$ are the distances between the most distant members of clusters U and W

and clusters V and W, respectively.

Average Linkage:

Here we begin by searching the distance matrix $D = \{d_{ik}\}$ to find the nearest (most similar)

objects, for example, U and V. These objects are merged to form the cluster (UV). For Step 3

of the general agglomerative algorithm, the distances between (UV) and any other cluster W

are determined by

$$d_{(UV)W} = \frac{\sum_{i} \sum_{k} d_{ik}}{N_{(UV)} N_W}$$

where $d_{ik}$ is the distance between object i in the cluster (UV) and object k in the cluster W,

and $N_{(UV)}$ and $N_W$ are the numbers of items in clusters (UV) and W, respectively.
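The three linkage rules differ only in how the object-to-object distances between two clusters are summarized. The sketch below, in Python, computes each rule directly from the pairwise distances (nearest pair, most distant pair, and the mean over all pairs). The function name `cluster_distance` and the small distance matrix are hypothetical and serve only as an illustration.

```python
def cluster_distance(dist, cluster_a, cluster_b, method="complete"):
    """Distance between two clusters of object indices under a linkage rule."""
    pairs = [dist[i][k] for i in cluster_a for k in cluster_b]
    if method == "single":      # nearest neighbours
        return min(pairs)
    if method == "complete":    # most distant members
        return max(pairs)
    if method == "average":     # sum of d_ik divided by N_(UV) * N_W
        return sum(pairs) / (len(cluster_a) * len(cluster_b))
    raise ValueError("unknown linkage method: " + method)


# Hypothetical distances between five objects (indices 0 to 4).
D = [
    [0, 9, 3, 6, 11],
    [9, 0, 7, 5, 10],
    [3, 7, 0, 9, 2],
    [6, 5, 9, 0, 8],
    [11, 10, 2, 8, 0],
]
uv, w = (2, 4), (1, 3)   # cluster (UV) and another cluster W
for method in ("single", "complete", "average"):
    print(method, cluster_distance(D, uv, w, method))
```

For these two clusters the object-to-object distances are 7, 9, 10 and 8, so single linkage gives 7, complete linkage gives 10, and average linkage gives 8.5. Complete linkage therefore tends to keep clusters compact, because two clusters are only merged when even their most distant members are close.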

3.7 Similarity measures

The data for cluster analysis usually consist of the values of p variables $x_1, x_2, \ldots, x_p$

for n objects. For hierarchical algorithms, the variable values are then used to produce an array of

distances between the individuals. In this study the squared Euclidean distance measure is

used.

Squared Euclidean Distance

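The Euclidean distance between two p-dimensional observations $\mathbf{x}' = [x_1, x_2, \ldots, x_p]$

and $\mathbf{y}' = [y_1, y_2, \ldots, y_p]$ is:

$$d(\mathbf{x}, \mathbf{y}) = \left[(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2\right]^{1/2} = \sqrt{(\mathbf{x} - \mathbf{y})'(\mathbf{x} - \mathbf{y})}$$

The squared Euclidean distance used in this study is this quantity without the square root, $d^2(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})'(\mathbf{x} - \mathbf{y})$.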
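As a small illustration, the squared Euclidean distance and the resulting distance matrix can be computed as follows. This is a Python sketch with hypothetical function names and made-up observations; it is not the SPSS or R code used in the study.

```python
def squared_euclidean(x, y):
    """Sum of squared coordinate differences between two observations."""
    if len(x) != len(y):
        raise ValueError("observations must have the same number of variables")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def distance_matrix(rows):
    """Symmetric matrix of squared Euclidean distances between all rows."""
    return [[squared_euclidean(a, b) for b in rows] for a in rows]


# Three made-up observations on p = 3 variables.
observations = [
    [1, 2, 3],
    [4, 6, 3],
    [1, 2, 5],
]
D = distance_matrix(observations)
for row in D:
    print(row)
```

For the first two observations the coordinate differences are 3, 4 and 0, so the squared Euclidean distance is 25 and the ordinary Euclidean distance is 5. Squaring leaves the ordering of distances unchanged but places more weight on pairs that are far apart.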

Chapter 4

Analysis of Data

4.1 Introduction

This study attempts to identify relatively homogeneous groups of cases based on some

demographic and socio-economic characteristics separately.

4.2 Cluster analysis

This study performed agglomerative hierarchical cluster analysis with the complete linkage

method, entering all the variables except country in the variable box and labeling the cases

by country. A dendrogram was requested in the output. All variables were converted

to z-scores to yield equal metrics and equal weighting; the squared

Euclidean distance was selected as the measure of distance between cases, and furthest neighbor

(complete linkage) as the cluster method.
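The z-score step matters because the variables are measured on very different scales, and an unstandardized distance would be dominated by the variables with the largest numeric ranges. A minimal Python sketch of column-wise standardization follows. The function name `zscore_columns` and the data are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption about how the statistical package standardizes.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]   # sample SD, divisor n - 1 (assumed)
    return [[(value - m) / s for value, m, s in zip(row, means, sds)]
            for row in rows]


# Made-up values for three cases on two variables with different scales.
raw = [
    [10.0, 70.0],
    [20.0, 80.0],
    [30.0, 60.0],
]
z = zscore_columns(raw)
for row in z:
    print(row)
```

After standardization each variable contributes on a comparable scale to the squared Euclidean distance, which is what gives the variables equal weighting in the cluster analysis.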

4.3 Agglomerative Schedule

At stage 1, the cluster analysis procedure combines the two cases that have the smallest squared

Euclidean distance between them. SPSS then recomputes the distance measures between

all single cases and clusters. Next, the two cases (or clusters) with the smallest distance are combined, yielding

either two clusters of two cases or one cluster of three. This process continues until all cases are clustered into a

single group. The agglomerative schedule of clusters using demographic characteristics is

given below:

Table 4.1: Agglomerative schedule

Columns, left to right: Stage; Cluster Combined (Cluster 1, Cluster 2); Coefficients; Stage Cluster First

Appears (Cluster 1, Cluster 2); Next Stage.

1 160 169 .026 0 0 45

2 78 120 .041 0 0 68

3 137 191 .043 0 0 40

4 8 126 .059 0 0 93

5 30 103 .069 0 0 24

6 17 170 .072 0 0 72

7 115 134 .081 0 0 25

8 64 83 .096 0 0 57

9 125 130 .108 0 0 51

10 105 119 .108 0 0 90

11 46 58 .120 0 0 17

12 66 140 .134 0 0 57

13 9 156 .135 0 0 59

14 59 186 .138 0 0 51

15 23 192 .144 0 0 65

16 161 179 .145 0 0 48

17 16 46 .146 0 11 59

18 51 136 .149 0 0 42

19 75 151 .150 0 0 27

20 54 148 .151 0 0 46

21 112 157 .154 0 0 120

22 43 187 .155 0 0 58

23 45 139 .156 0 0 85

24 30 101 .159 5 0 61

25 50 115 .161 0 7 79

26 31 49 .178 0 0 122

27 75 96 .186 19 0 76

28 18 99 .191 0 0 96

29 2 163 .194 0 0 105

30 162 174 .196 0 0 62

31 15 144 .198 0 0 53

32 95 132 .206 0 0 123

33 21 155 .209 0 0 85

34 100 149 .216 0 0 84

35 36 166 .223 0 0 70

36 13 19 .224 0 0 103

37 10 60 .225 0 0 94

38 42 55 .230 0 0 139

39 121 129 .232 0 0 74

40 108 137 .233 0 3 79

41 14 48 .234 0 0 118

42 51 127 .239 18 0 81

43 61 72 .240 0 0 78

44 76 81 .242 0 0 93

45 147 160 .263 0 1 112

46 54 176 .266 20 0 134

47 5 11 .271 0 0 105

48 67 161 .280 0 16 94

49 91 154 .292 0 0 129

50 135 165 .294 0 0 108

51 59 125 .299 14 9 117

52 102 143 .313 0 0 76

53 15 184 .324 31 0 155

54 142 185 .331 0 0 158

55 37 62 .340 0 0 69

56 110 158 .345 0 0 106

57 64 66 .345 8 12 137

58 43 141 .349 22 0 119

59 9 16 .352 13 17 99

60 133 178 .360 0 0 100

61 30 111 .361 24 0 112

62 114 162 .362 0 30 97

63 122 181 .374 0 0 86

64 33 38 .379 0 0 106

65 23 84 .380 15 0 111

66 34 40 .383 0 0 143

67 26 183 .386 0 0 140

68 78 97 .390 2 0 75

69 37 150 .399 55 0 107

70 36 52 .403 35 0 95

71 80 106 .404 0 0 120

72 17 73 .407 6 0 81

73 32 70 .423 0 0 133

74 107 121 .429 0 39 101

75 78 180 .436 68 0 111

76 75 102 .436 27 52 121

77 29 53 .437 0 0 102

78 28 61 .454 0 43 91

79 50 108 .459 25 40 122

80 93 109 .471 0 0 124

81 17 51 .475 72 42 138

82 118 189 .475 0 0 128

83 94 138 .480 0 0 135

84 3 100 .487 0 34 124

85 21 45 .496 33 23 139

86 77 122 .521 0 63 153

87 56 113 .536 0 0 131

88 86 190 .536 0 0 109

89 44 171 .547 0 0 146

90 7 105 .566 0 10 118

91 28 89 .572 78 0 123

92 20 124 .574 0 0 127

93 8 76 .584 4 44 161

94 10 67 .590 37 48 125

95 36 164 .599 70 0 136

96 18 39 .606 28 0 154

97 35 114 .613 0 62 145

98 57 87 .618 0 0 156

99 9 168 .641 59 0 117

100 90 133 .701 0 60 156

101 69 107 .718 0 74 147

102 29 153 .725 77 0 133

103 13 79 .727 36 0 150

104 123 182 .736 0 0 153

105 2 5 .794 29 47 143

106 33 110 .796 64 56 149

107 37 88 .809 69 0 160

108 65 135 .826 0 50 134

109 86 146 .836 88 0 152

110 6 152 .859 0 0 148

111 23 78 .872 65 75 125

112 30 147 .880 61 45 130

113 98 167 .914 0 0 162

114 68 177 .915 0 0 152

115 12 24 .927 0 0 158

116 27 173 .958 0 0 140

117 9 59 .972 99 51 157

118 7 14 1.000 90 41 145

119 43 188 1.003 58 0 157

120 80 112 1.033 71 21 176

121 25 75 1.056 0 76 174

122 31 50 1.061 26 79 151

123 28 95 1.065 91 32 166

124 3 93 1.065 84 80 151

125 10 23 1.089 94 111 136

126 175 193 1.101 0 0 160

127 20 116 1.113 92 0 142

128 71 118 1.131 0 82 135

129 91 104 1.152 49 0 177

130 30 74 1.204 112 0 146

131 56 145 1.224 87 0 144

132 22 159 1.294 0 0 182

133 29 32 1.312 102 73 172

134 54 65 1.356 46 108 164

135 71 94 1.370 128 83 142

136 10 36 1.428 125 95 148

137 64 85 1.434 57 0 180

138 17 131 1.442 81 0 159

139 21 42 1.473 85 38 141

140 26 27 1.652 67 116 168

141 21 63 1.662 139 0 169

142 20 71 1.718 127 135 173

143 2 34 1.719 105 66 167

144 47 56 1.840 0 131 170

145 7 35 1.840 118 97 178

146 30 44 1.905 130 89 161

147 4 69 1.905 0 101 172

148 6 10 1.938 110 136 167

149 33 194 1.967 106 0 165

150 13 92 2.153 103 0 171

151 3 31 2.250 124 122 171

152 68 86 2.278 114 109 163

153 77 123 2.385 86 104 166

154 18 41 2.404 96 0 170

155 15 117 2.415 53 0 174

156 57 90 2.444 98 100 181

157 9 43 2.478 117 119 169

158 12 142 2.603 115 54 183

159 17 172 2.606 138 0 163

160 37 175 2.720 107 126 164

161 8 30 2.915 93 146 177

162 98 195 3.098 113 0 182

163 17 68 3.113 159 152 184

164 37 54 3.166 160 134 176

165 1 33 3.407 0 149 179

166 28 77 3.503 123 153 173

167 2 6 3.831 143 148 175

168 26 128 4.042 140 0 186

169 9 21 4.103 157 141 180

170 18 47 4.265 154 144 186

171 3 13 4.286 151 150 181

172 4 29 4.543 147 133 179

173 20 28 5.117 142 166 184

174 15 25 5.170 155 121 187

175 2 82 5.568 167 0 183

176 37 80 5.603 164 120 188

177 8 91 5.898 161 129 178

178 7 8 7.024 145 177 189

179 1 4 7.094 165 172 190

180 9 64 7.658 169 137 187

181 3 57 7.874 171 156 185

182 22 98 8.447 132 162 191

183 2 12 10.013 175 158 185

184 17 20 10.527 163 173 188

185 2 3 13.059 183 181 189

186 18 26 13.161 170 168 190

187 9 15 13.619 180 174 192

188 17 37 17.958 184 176 193

189 2 7 18.684 185 178 192

190 1 18 18.887 179 186 191

191 1 22 30.058 190 182 193

192 2 9 32.627 189 187 194

193 1 17 66.346 191 188 194

194 1 2 124.896 193 192 0

To clarify the agglomeration schedule above, stages 1, 100 and 190 are explained.

At stage 1

Case 160 is clustered with case 169, and the squared Euclidean distance between these two cases is 0.026. Neither case has been clustered previously (shown by the two zeros under cluster 1 and cluster 2). The next stage at which the cluster containing case 160 combines with another case is stage 45, where case 160 joins case 147.

At stage 100

Case 90 joins case 133. Case 90 has not been clustered previously (the zero under cluster 1), whereas case 133 was joined with case 178 at stage 60. This forms a cluster of 3 cases (90, 133, 178). The squared Euclidean distance between cases 90 and 133 is 0.701. The next stage at which the cluster containing case 90 is joined is stage 156.

At stage 190

The clusters containing cases 1 and 18 are joined. Case 1 was previously joined with case 4 at stage 179, and case 18 was previously joined with case 26 at stage 186. The merged cluster therefore contains cases 1, 4, 18 and 26, together with every case that had been merged into those two clusters at earlier stages. The coefficient (squared Euclidean distance) at which the two clusters are joined is 18.887. The next stage at which the cluster containing case 1 is joined is stage 191.

The other stages can be explained in the same way. The process is the same for the clusters based on socio-economic characteristics.
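The reading of a schedule row described above can be sketched in a few lines of Python. This is only an illustration: the four rows are copied from the agglomeration schedule, the column order (cluster 1, cluster 2, coefficient, stage where each cluster first appears, next stage) is assumed from the explanation in the text, and the helper names `origin` and `describe` are hypothetical.

```python
# Rows copied from the agglomeration schedule above.
# Assumed column order: stage -> (cluster 1, cluster 2, coefficient,
# stage where cluster 1 first appears, stage where cluster 2 first appears,
# next stage). A first-appearance value of 0 means a single, unmerged case.
SCHEDULE = {
    100: (90, 133, 0.701, 0, 60, 156),
    179: (1, 4, 7.094, 165, 172, 190),
    186: (18, 26, 13.161, 170, 168, 190),
    190: (1, 18, 18.887, 179, 186, 191),
}


def origin(case, first_stage):
    """Describe where one side of a merge comes from."""
    if first_stage == 0:
        return f"case {case} (not clustered before)"
    return f"the cluster containing case {case} (formed at stage {first_stage})"


def describe(stage):
    """Turn one schedule row into a sentence (hypothetical helper)."""
    c1, c2, coef, first1, first2, next_stage = SCHEDULE[stage]
    ending = f"next joined at stage {next_stage}" if next_stage else "final merge"
    return (
        f"Stage {stage}: {origin(c1, first1)} joins {origin(c2, first2)} "
        f"at coefficient {coef}; {ending}."
    )


for stage in (100, 190):
    print(describe(stage))
```

Following the "formed at stage" references backward (for example from stage 190 to stages 179 and 186) recovers every case that belongs to a merged cluster.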

4.4 Clusters

The memberships of the 10 clusters obtained from the demographic characteristics are shown below:

Table 4.2: Members of the 10 clusters (Demographic)

Cluster  Country names
1        Afghanistan, Chad, Congo (Dem. Rep. of), Mali, Somalia
2        Angola, Cameroon, Central African Republic, Cote d'Ivoire, Equatorial
         Guinea, Guinea, Guinea-Bissau, Lesotho, Malawi, Mozambique, Niger,
         Nigeria, Sierra Leone, Zambia
3        Benin, Burkina Faso, Burundi, Congo (Rep. of), Djibouti, Ethiopia,
         Gambia, Liberia, Mauritania, Swaziland, Uganda, Zimbabwe
4        Australia, Austria, Belgium, Canada, Cuba, Cyprus, Czech Republic,
         Denmark, Finland, France, Germany, Greece, Hong Kong SAR, Iceland,
         Ireland, Italy, Japan, South Korea, Liechtenstein, Luxembourg,
         Macao SAR, Malta, Netherlands, New Zealand, Norway, Portugal,
         Puerto Rico, San Marino, Singapore, Slovenia, Spain, Sweden,
         Switzerland, Taiwan, United Kingdom, United States
5        Albania, Belarus, Bosnia-Herzegovina, Brazil, Bulgaria, Croatia,
         Dominica, Estonia, Georgia, Hungary, Jamaica, Latvia, Lithuania,
         Macedonia, Moldova, Montenegro, Palau, Poland, Romania, Russia,
         Serbia, Slovakia, St. Lucia, Trinidad and Tobago, Turkey, Ukraine
6        Antigua and Barbuda, Argentina, Armenia, Azerbaijan, Bahamas, Bahrain,
         Barbados, Belize, Brunei, Cape Verde, Chile, China, Colombia,
         Costa Rica, Ecuador, El Salvador, Fiji, French Polynesia, Grenada,
         Israel, Kazakhstan, Kuwait, Lebanon, Libya, Malaysia, Maldives,
         Mauritius, Mexico, Nicaragua, Oman, Panama, Peru, Qatar, Saudi Arabia,
         Seychelles, Sri Lanka, St. Kitts-Nevis, St. Vincent & the Grenadines,
         Suriname, Syria, Thailand, Tunisia, United Arab Emirates, Uruguay,
         Venezuela, Vietnam
7        Algeria, Dominican Republic, Egypt, Guatemala, Honduras, Indonesia,
         Iraq, Jordan, Korea (North), Kosovo, Kyrgyzstan, Marshall Islands,
         Morocco, Namibia, Paraguay, Philippines, Samoa, Solomon Islands,
         Tonga, Tuvalu, Vanuatu
8        Bangladesh, Bhutan, Guyana, India, Iran, Micronesia, Mongolia,
         Myanmar, Nepal, Tajikistan, Turkmenistan, Uzbekistan
9        Botswana, Rwanda, Senegal, South Africa, Tanzania
10       Bolivia, Cambodia, Comoros, Eritrea, Gabon, Ghana, Haiti, Kenya,
         Kiribati, Laos, Madagascar, Pakistan, Papua New Guinea,
         Sao Tome and Principe, Sudan, Timor-Leste, Togo, Yemen

The memberships of the 10 clusters obtained from the socio-economic characteristics are shown below:

Table 4.3: Members of the 10 clusters (Socio-economic)

Cluster  Country names
1        Macao
2        Libya
3        Belarus
4        Hong Kong, Singapore
5        Bahrain, Belgium, Greece, Japan, Malta, Puerto Rico, San Marino
6        Argentina, Armenia, Australia, Austria, Bahamas, Bolivia, Brazil,
         Brunei, Bulgaria, Canada, Cape Verde, Chile, Colombia, Costa Rica,
         Cuba, Cyprus, Czech Republic, Denmark, Dominica, Dominican Republic,
         Ecuador, Estonia, France, Germany, Hungary, Iceland, Iran, Ireland,
         Israel, Italy, Jordan, Korea (North), Korea (South), Kosovo, Kuwait,
         Latvia, Lebanon, Lithuania, Macedonia, Malaysia, Mexico, Mongolia,
         Montenegro, Netherlands, New Zealand, Oman, Palau, Panama, Paraguay,
         Peru, Philippines, Poland, Qatar, Russia, Saudi Arabia, Seychelles,
         Spain, Suriname, Sweden, Switzerland, Taiwan, Turkey, Turkmenistan,
         Ukraine, United Arab Emirates, United Kingdom, United States,
         Uruguay, Venezuela
7        Ethiopia, Sudan, Yemen
8        Burkina Faso, Chad, Finland, Liechtenstein, Luxembourg, Mali, Niger,
         Norway
9        Afghanistan, Bangladesh, Benin, Bhutan, Cambodia, Central African
         Republic, Congo (Dem. Rep. of), Cote d'Ivoire, Eritrea, Gambia,
         Guatemala, Guinea, Guinea-Bissau, Haiti, India, Laos, Madagascar,
         Mauritania, Mozambique, Nepal, Pakistan, Papua New Guinea, Senegal,
         Sierra Leone, Somalia, Tanzania, Timor-Leste, Togo, Uganda
10       Albania, Algeria, Angola, Antigua and Barbuda, Azerbaijan, Barbados,
         Belize, Bosnia-Herzegovina, Botswana, Burundi, Cameroon, China,
         Comoros, Congo (Rep. of), Croatia, Djibouti, Egypt, El Salvador,
         Equatorial Guinea, Fiji, French Polynesia, Gabon, Georgia, Ghana,
         Grenada, Guyana, Honduras, Indonesia, Iraq, Jamaica, Kazakhstan,
         Kenya, Kiribati, Kyrgyzstan, Lesotho, Liberia, Malawi, Maldives,
         Marshall Islands, Mauritius, Micronesia, Moldova, Morocco, Myanmar,
         Namibia, Nicaragua, Nigeria, Portugal, Romania, Rwanda, Samoa,
         Sao Tome and Principe, Serbia, Slovakia, Slovenia, Solomon Islands,
         South Africa, Sri Lanka, St. Kitts-Nevis, St. Lucia,
         St. Vincent & the Grenadines, Swaziland, Syria, Tajikistan, Thailand,
         Tonga, Trinidad and Tobago, Tunisia, Tuvalu, Uzbekistan, Vanuatu,
         Vietnam, Zambia, Zimbabwe

4.5 Comparison of clusters

Countries in the same cluster have approximately similar values of the characteristics. This can be verified by inspecting the actual values of the variables used in the analysis. If the mean values of the variables are compared across clusters, the dissimilarity between clusters becomes visible. These means also indicate which clusters are least dissimilar and could therefore be merged into a single cluster.

The mean values of the demographic characteristics for the members of the 10 clusters are shown below:

Table 4.4: Mean values of the 10 cluster members (Demographic)

Cluster  Birth  Death  Infant     Rate of   Total      Ages   Ages   Life        Life        Life
         rate   rate   mortality  natural   fertility  <15    65+    expectancy  expectancy  expectancy
                       rate       increase  rate                     (both)      (male)      (female)
1        44.6   16     118        2.88      6.24       46     2.4    49.2        47.8        50.2
2        39.5   14.36  91.93      2.49      5.35       43.43  3.14   51.43       50.43       52.71
3        37.42  11.42  71.08      2.63      5.01       42.42  2.92   55.42       54.17       56.42
4        11.14  7.81   3.74       0.34      1.58       16.39  15.42  80.42       77.86       83.06
5        11.62  10.35  12.03      0.16      1.54       18     12.77  73.23       69.54       77
6        18.41  5.59   13.57      1.28      2.22       26.02  6.28   74.48       71.91       77.28
7        25.76  5.90   25.33      1.93      3.25       34.29  4.67   69.86       67.67       72.24
8        22.5   6.25   46.67      1.65      2.58       30.5   4.25   67.67       65.5        69.58
9        31.8   11.2   47.2       1.92      4.04       39.2   3.2    55.4        55          56.2
10       32.06  8.11   53.61      2.42      4.19       39.28  3.44   62.83       61.33       64.56
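The remark that clusters with little dissimilarity could be merged can be illustrated with a minimal Python sketch. It computes squared Euclidean distances between the rows of cluster means in Table 4.4 and reports, for each cluster, the cluster with the closest mean vector. The function names are hypothetical, and using the raw, unstandardized means is an assumption made here for simplicity, so variables measured on large scales (such as infant mortality rate) dominate the result.

```python
# Cluster means copied from Table 4.4 (ten demographic variables per cluster).
MEANS = {
    1: [44.6, 16, 118, 2.88, 6.24, 46, 2.4, 49.2, 47.8, 50.2],
    2: [39.5, 14.36, 91.93, 2.49, 5.35, 43.43, 3.14, 51.43, 50.43, 52.71],
    3: [37.42, 11.42, 71.08, 2.63, 5.01, 42.42, 2.92, 55.42, 54.17, 56.42],
    4: [11.14, 7.81, 3.74, 0.34, 1.58, 16.39, 15.42, 80.42, 77.86, 83.06],
    5: [11.62, 10.35, 12.03, 0.16, 1.54, 18, 12.77, 73.23, 69.54, 77],
    6: [18.41, 5.59, 13.57, 1.28, 2.22, 26.02, 6.28, 74.48, 71.91, 77.28],
    7: [25.76, 5.90, 25.33, 1.93, 3.25, 34.29, 4.67, 69.86, 67.67, 72.24],
    8: [22.5, 6.25, 46.67, 1.65, 2.58, 30.5, 4.25, 67.67, 65.5, 69.58],
    9: [31.8, 11.2, 47.2, 1.92, 4.04, 39.2, 3.2, 55.4, 55, 56.2],
    10: [32.06, 8.11, 53.61, 2.42, 4.19, 39.28, 3.44, 62.83, 61.33, 64.56],
}


def sq_euclidean(a, b):
    """Squared Euclidean distance between two mean vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(cluster):
    """Return the cluster whose mean vector is closest to the given one."""
    others = (c for c in MEANS if c != cluster)
    return min(others, key=lambda c: sq_euclidean(MEANS[cluster], MEANS[c]))


for c in MEANS:
    print(f"cluster {c}: nearest cluster by raw means is {nearest(c)}")
```

On these raw means, cluster 4 is closest to cluster 5, which matches the visual impression from the table that the two low-fertility, high-life-expectancy clusters resemble each other.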

The mean values of the socio-economic characteristics for the members of the 10 clusters are shown below:

Table 4.5: Mean values of the 10 cluster members (Socio-economic)

Cluster  Inflation  GDP    Population  Urban       Literacy  Death
         rate       rate   density     population  rate      due to
                                       percentage            NCDs
1        5.8        1      21423       100         91
2        15.9       -59.7  4           78          89        78
3        53.3       5.3    46          75          99        87
4        5.25       4.95   7026.5      100         92.5      79
5        2.11       -1.44  707.86      91.57       95.71     86.83
6        5.40       4.35   109.20      74.10       94.35     80.54
7        23.5       -2.5   47.33       29          55.33     41
8        2.36       2.04   68.88       43.38       19.25     48.71
9        9.54       5.04   124.90      33.39       54.93     36.76
10       6.23       3.68   147.27      43.39       86.39     62.18

4.6 Dendrogram

The branching type of dendrogram makes it possible to trace backward or forward to any individual case or cluster at any level. In addition, it gives an idea of how great the distance is between the cases or groups that are clustered at a particular step.

A complete linkage algorithm applied to the squared Euclidean distances between the 195 countries of the world produces the dendrograms for both the demographic and the socio-economic perspectives.

The dendrograms show that the clusters are formed on the basis of distance. Countries in the same cluster are homogeneous, and countries in different clusters are heterogeneous.
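As a minimal sketch of how complete linkage on squared Euclidean distances builds the merge sequence that a dendrogram draws, the Python code below clusters five made-up two-dimensional points. The data and function names are hypothetical; this is neither the country data nor the software used in this study.

```python
from itertools import combinations


def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def complete_linkage(points):
    """Return the merge schedule as (cluster_a, cluster_b, coefficient) tuples."""
    clusters = [[i] for i in range(len(points))]  # start with singletons
    schedule = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest distance
            # between any member of one cluster and any member of the other.
            d = max(
                sq_euclidean(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        schedule.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return schedule


# Hypothetical toy data, not taken from the study.
points = [(0, 0), (0, 1), (4, 4), (4, 6), (10, 0)]
schedule = complete_linkage(points)
for stage, (left, right, coef) in enumerate(schedule, start=1):
    print(f"stage {stage}: {left} + {right} at {coef}")
```

For these toy points the coefficients are 1, 4, 52 and 101. They never decrease from one stage to the next, which is the same pattern as the coefficients column of the agglomeration schedule, and the height of each join in a dendrogram corresponds to such a coefficient.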

Dendrogram using the complete linkage method (demographic characteristics):

Figure 4.1: Dendrogram (Demographic)

Dendrogram using the complete linkage method (socio-economic characteristics):

Figure 4.2: Dendrogram (Socio-economic)

Chapter 5

Conclusion

5.1 Conclusion and Findings

This study attempted to cluster the countries of the world on the basis of selected demographic and socio-economic characteristics. The findings are as follows.

The study obtained 10 clusters on the basis of demographic characteristics. There are 5 countries in the first cluster, 14 in the second, 12 in the third, 36 in the fourth, 26 in the fifth, 46 in the sixth, 21 in the seventh, 12 in the eighth, 5 in the ninth and 18 in the tenth.

Most of the developed countries are in the fourth cluster, and most of the developing countries are in the fifth, sixth and eighth clusters. Most of the underdeveloped countries are in the second, third, ninth and tenth clusters. Bangladesh is in the eighth cluster.

The study also obtained 10 clusters on the basis of socio-economic characteristics. There is 1 country in each of the first, second and third clusters, 2 countries in the fourth, 7 in the fifth, 69 in the sixth, 3 in the seventh, 8 in the eighth, 29 in the ninth and 74 in the tenth.

In this case, most of the developed countries are in the sixth cluster, most of the developing countries are in the ninth cluster, and most of the underdeveloped countries are in the tenth cluster. Bangladesh is in the ninth cluster.

An example illustrates how homogeneous the countries within one cluster are. Cluster 4 (from the demographic characteristics) is chosen from the ten clusters to check whether the countries in the same cluster are homogeneous.

Cluster 4 (from demographic characteristics)

Table 5.1: The countries of cluster 4 with their demographic characteristics

Country         Birth  Death  Infant     Rate of   Total      Ages   Ages   Life        Life        Life
                rate   rate   mortality  natural   fertility  <15    65+    expectancy  expectancy  expectancy
                              rate       increase  rate                     (both)      (male)      (female)
Australia       14     7      3.9        0.7       1.9        19     14     82          79          84
Austria         9      9      3.7        0         1.4        15     18     80          77          83
Belgium         12     10     3.4        0.2       1.8        17     17     80          77          82
Canada          11     7      5.1        0.4       1.7        16     14     81          78          83
Cuba            11     8      4.8        0.4       1.7        17     13     78          76          80
Cyprus          12     6      7          0.6       1.4        17     12     78          76          81
Czech Republic  10     10     2.7        0.1       1.4        14     15     78          74          81
Denmark         11     9      3.4        0.2       1.8        18     17     79          77          81
Finland         11     9      2.6        0.2       1.8        16     18     80          77          83
France          13     9      3.6        0.4       2          19     17     82          78          85
Germany         8      10     3.5        -0.2      1.4        13     21     80          77          83
Greece          10     10     3.2        0.1       1.5        14     19     80          78          82
Hong Kong       14     6      2          0.7       1.2        12     14     83          80          86
Iceland         14     6      2.2        0.9       2          21     12     81          80          84
Ireland         16     6      3.6        1         2.1        21     12     79          77          82
Italy           9      10     3.7        -0.1      1.4        14     21     81          79          84
Japan           9      10     2.6        -0.1      1.4        13     24     83          80          86
Korea, South    10     5      3.2        0.4       1.2        16     11     81          77          84
Liechtenstein   10     6      3.3        0.5       1.5        16     14     80          79          82
Luxembourg      11     7      3          0.4       1.5        18     14     80          78          83
Macao           11     3      3          0.6       1.2        12     7      82          79          85
Malta           10     7      6.7        0.2       1.4        15     16     79          78          82
Netherlands     11     8      3.8        0.3       1.7        17     16     81          79          83
New Zealand     14     7      5.1        0.8       2.1        20     14     81          79          83
Norway          12     8      2.8        0.4       1.9        19     15     81          79          83
Portugal        9      10     3          -0.1      1.3        15     19     79          76          82
Puerto Rico     11     8      8.8        0.4       1.6        20     15     79          75          83
San Marino      10     7      2          0.4       1.2        15     16     83          81          86
Singapore       10     4      2          0.5       1.2        17     9      81          79          84
Slovenia        11     9      2.5        0.2       1.5        14     17     80          76          83
Spain           10     8      3.5        0.2       1.4        15     17     82          79          85
Sweden          12     10     2.5        0.3       1.9        17     19     82          80          84
Switzerland     10     8      3.8        0.2       1.5        15     17     82          80          84
Taiwan          9      7      4.1        0.1       1.1        15     11     79          76          82
United Kingdom  13     9      4.5        0.4       2          18     17     80          78          82
United States   13     8      6.1        0.5       1.9        20     13     78          75          80

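The table above shows that the countries within cluster 4 have very similar values on every characteristic, that is, countries in the same cluster are homogeneous. By contrast, the cluster means in Table 4.4 differ markedly from one cluster to another, so countries from different clusters are not homogeneous.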
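The homogeneity of cluster 4 can also be checked numerically. The short Python sketch below, offered only as an illustration, takes the birth rate column of Table 5.1 and computes its mean, range and standard deviation; the variable names are hypothetical, and the population form of the standard deviation is an assumption chosen for simplicity.

```python
# Birth rates of the 36 cluster 4 countries, copied from Table 5.1
# in table order (Australia ... United States).
birth_rates = [14, 9, 12, 11, 11, 12, 10, 11, 11, 13, 8, 10,
               14, 14, 16, 9, 9, 10, 10, 11, 11, 10, 11, 14,
               12, 9, 11, 10, 10, 11, 10, 12, 10, 9, 13, 13]

n = len(birth_rates)
mean = sum(birth_rates) / n

# Population variance and standard deviation as a simple spread measure.
variance = sum((x - mean) ** 2 for x in birth_rates) / n
std_dev = variance ** 0.5

print(f"n = {n}, mean = {mean:.2f}, "
      f"min = {min(birth_rates)}, max = {max(birth_rates)}")
print(f"standard deviation = {std_dev:.2f}")
```

The mean rounds to 11.14, which agrees with the birth rate reported for cluster 4 in Table 4.4, and all 36 values lie between 8 and 16, well below the mean birth rates of clusters 1 to 3 in that table.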

5.2 Limitations

Data on all variables are not available for all countries, so this study could analyze only 195 countries.

The time available for this study was limited, so further necessary elements, such as additional variables, still need to be added.

5.3 Further scopes

Future work may analyze the data using other clustering methods and compare them to find the best result.

Future work may also use more variables, covering more aspects, for more countries.
