

Method Bias in Comparative Research: Problems of Construct Validity and Suggestions for Multivariate Modeling as Exemplified by the Measurement of Ethnic Diversity

(forthcoming in Journal of Mathematical Sociology)

Robert Neumann Technische Universität Dresden Chair in Macrosociology, 01062 Dresden, Germany, [email protected]

Peter Graeff Goethe University Frankfurt/ Main Social Science Department, 60054 Frankfurt am Main Germany, [email protected]

Keywords: Multitrait-Multimethod, Multivariate Modeling, Validity, Ethnic Diversity, Index Operationalization

ABSTRACT

This study investigates indices of ethnic diversity for method effects due to differences in operationalization. It adapts the methodology of Multitrait-Multimethod analysis to the field of socio-economic macro research. While approaches for checking construct validity or method bias are common in psychology and educational research, they are rarely applied in other social science disciplines such as economics and political science. We find that measures of polarization show considerable method effects, which call their use in multivariate modelling into question. We conclude that improved overall measures, which are based on a theoretical augmentation of a construct’s measurement instrument, do not necessarily lead to improved results, either at the measurement level or in hypothesis testing.


1 INTRODUCTION

Recently, construct validity and method effects have become an issue in macro level research, as there is a trend among researchers to improve the indicators that reflect their theoretical constructs at the measurement level (Langbein and Knack 2008; Thomas 2010; Kaufmann, Kraay & Mastruzzi 2010; Neumann and Graeff in press). Over the last decades, sociological, political and economic indices were created to measure certain phenomena such as corruption or governance performance at the country level. Sometimes the introduction of new measures that supplement or augment existing instruments is motivated by a lack of explanatory power. If a measure did not perform well in multivariate models, scholars created different indices to find empirical support for a theoretical argument. Hence, they doubted the capability of a measure to convey the adequate meaning of a theoretical proposition. This implies, in a more technical sense, that the measure lacks construct validity. Originally introduced in psychology (Campbell and Fiske 1959; Cronbach and Meehl 1955), the term construct validity refers to the degree to which one can infer from operationalizations the theoretical constructs that should be measured. These measurement characteristics receive less attention in macro data analysis, but we will address some methodological pitfalls that we perceive when the social construct of ethnic diversity is operationalized. Most complex constructs in social science cannot be measured directly; rather, they are conceived as latent constructs that are constituted by indicators at the measurement level. Ethnic diversity must be considered such a construct: it refers to observable indicators (such as language or origin of birth) which are measured and amount to the (unobservable) latent construct.
Over the last decade, competing theories and measurement approaches for the construct of ethnic diversity have come under scrutiny as evidence from cross-country analyses increasingly yielded contradictory results on the effects of ethnicity at the macro level. This is especially the case in the field of conflict research (Kalyvas 2008), a field that, amongst others, we will adopt to illustrate the methodological problems that may occur at the macro level1. The relationship between ethnic diversity and the likelihood of conflict merits its validity from typologies of conflict which classify political struggles for power and resources among groups along ethnic cleavages as one of the most prominent forms of conflict (Horowitz 1985, Sambanis 2001). But the failure to detect any robust relationships between ethnic diversity and the incidence of conflicts in cross-country regressions has questioned the role of distinct ethnicities as one of the main causes for violent conflicts (Fearon and Laitin 2003, Collier and Hoeffler 2004). In order to provide support for the negative role of ethnic diversity within a country, authors started to refine its operationalization. Ethnic diversity remains a general concept in social sciences; ambiguous empirical results may indeed cast doubt on both the validity of the theoretical concept and its operationalization. Hence, testing for effects of operationalization (“method” effects) should yield insights on both methodological and theoretical aspects of macro research.

1 To be clear about the objectives of the article, we emphasize that the article does not focus on the relationship of ethnic diversity and conflict. Conflict research represents the origin of two of the instruments we will disentangle and there is a lively discussion about the validity of different measurement instruments of ethnic diversity (see Esteban and Schneider 2008 and the subsequent articles in this issue). Still, in our paper the examples from conflict research serve mainly illustrative purposes.


The present article addresses potential methodological problems by performing a Multitrait-Multimethod (MTMM) analysis for three distinct indices of ethnic diversity, namely the index of Ethno-linguistic Fractionalization, the Ethnic Groups in Power Index (Cederman and Girardin 2007) and the index of Ethnic Polarization (Montalvo and Reynal-Querol 2005). The indices differ in the degree to which they incorporate features of ethnic diversity and also in how they mingle conditions and results of diversity. MTMM analysis has its origins in psychometrics as a tool to determine the construct validity of measurement instruments (Campbell and Fiske 1959), with the goal of distinguishing the variance attributable to a set of measured constructs from the variance that is related to the specific and distinct methods in use. The on-going methodological disentanglements of ethnicity often follow the intentions of previously outlined schools of thought. We try to avoid the pitfall of what Merton once called conceptual imprisonment (Merton 1949), in which the framework and the concepts of analysis lead “…to an unfortunate mixing of definition and explanation” (Braithwaite 1985, p.3). Instead, we attempt to shed light on the quantitative characteristics of the three ethnic diversity measures and to compare simultaneously the proportion of method-specific variance that is “mapped” onto macro level variables through the instruments themselves. Such variance would represent measurement error (Podsakoff et al. 2003) and would imply serious problems concerning the construct validity of an instrument. The article is structured as follows: Section 2 outlines the methodological pitfalls of interest for the present study, Section 3 introduces the trait of ethnic diversity and Section 4 introduces the three methods of measuring ethnicity at the country level. Section 5 presents the Multitrait-Multimethod approach and Section 6 describes the data. Section 7 presents the results and Section 8 concludes.

2 CONSTRUCT VALIDITY AND METHOD EFFECTS

Classical Test Theory posits that a measurement instrument shows construct validity if it is possible to derive hypotheses from the measured construct and these hypotheses are subsequently confirmed by other tests. Construct validity will therefore be achieved if a theory is confirmed by different method approaches (Campbell and Fiske 1959). While attempts to study construct validity originally concentrated on questions of whether an instrument grasps the meaning of a construct, a different line of research has focussed on the reasons why construct validity may emerge. The methodological discussions in the fields of educational and psychological research (Podsakoff et al. 2003; Sartori and Pasini 2007) are centred on the premise that the achievement of construct validity may be seriously hampered by method effects that constitute measurement error. Method effects, or method bias, consist of a random and a systematic component and constitute a portion of the variance that is attributable to the measures in use instead of the construct that is supposed to be measured (e.g. Cote and Buckley 1987; Bagozzi and Yi 1991). FIGURE 1 represents a simplified categorization of an instrument’s measurement properties, distinguishing the degree of method-specific variance from the degree of construct validity.

-Insert FIGURE 1 here-


We argue that the case of low method specificity and high construct validity represents a first-best characteristic of an index measuring a certain phenomenon. If results are not driven by constraints of index construction and if the relevant features of interest are still actually measured, the measurement can be used intuitively for describing and comparing the construct (such a construct seems also to be the best for statistical applications, see Raykov and Widaman 1995). If high construct validity goes hand in hand with high method specificity, measurement errors on the construct level will give rise to serious problems (e.g. Podsakoff et al. 2003). While this property may be perceived as favourable for individual-level data due to a potentially high accuracy or a supposedly high inclusiveness of certain aspects of a trait, a macro data index with this property will reflect not only the construct of interest but also the way it is measured. Hence, it will be impossible to draw clear interpretations, and ambiguous statistical results in relation to the construct might occur2. Low degrees in both properties would resemble a case in which a general measure attempts to account for constructs of undetermined content. If results are found, it is hardly possible to trace their origins. It is not known whether an intended meaning is actually captured by the data, nor is it evident how strongly a specific measure influences a certain outcome. Finally, low construct validity might also occur together with high method specificity. Under such circumstances, one only knows that a result has shown up because a certain method has been applied. It is not perceptible, however, which construct has been observed by the measurement. While these characteristics are well recognized in individual-level analyses across disciplines3, measurement error due to method bias in macro-level research is systematically overlooked.
What distinguishes method bias in macro research from method bias in micro-level studies4? In the present study, the methods of interest are not different ways of collecting data, as it has been the case within classical MTMM analyses for individual-level data. Rather, we will explore aspects of operationalization that have been in the focus of researchers who attempt to model socio-demographic data in accordance with theoretical suggestions. This attempt often ignores the problem that one will not be able to gain clear-cut results when a construct of interest is operationalized with high method specificity in terms of large method specific variance. For macro data, these measurement problems become evident when contradictory empirical results concerning a construct trigger debates among scholars. As a consequence, ill-defined or vaguely operationalized constructs of interest are criticized as ill-suited for multivariate modeling and alternative operationalizations are

2 If such indices are used as independent variables in multivariate designs, it may no longer be clear which content is actually represented by the indicators. Under such conditions, results derived from the indices are more blurred the larger the method variance turns out to be.

3 Podsakoff et al. (2003, p. 882) list four groups of potential sources of method bias for micro-level studies: common rater effects, item characteristic effects, item context effects and measurement context effects. Examples include framing effects of questions (e.g. Gamliel and Peer 2006) or the social desirability of certain answers (e.g. Ganster et al. 1983).

4 Potential method bias in macro research might occur particularly at two stages of index building: when data are operationalized and when they are aggregated. The first step usually constitutes a pre-condition for the second step (Adcock & Collier 2001), and both steps may or may not rely on assembled micro-level data which are prone to the common method biases listed by Podsakoff et al. (2003).


suggested. Usually, the authors refrain from providing a test of the construct validity of their new measure. They are also seldom aware of the method effects of their new measure. In the next section, we will tackle these problems for three measures of ethnic diversity and start with a summary of the main issues in the on-going debate on ethnic diversity.

3 ETHNIC DIVERSITY – CONCEPTS AND APPLICATIONS

Issues of diversity are at the centre of attention in a variety of studies within contemporary social science research dealing with macro-level indicators. Ethnic diversity draws attention when researchers focus on a country’s socio-demographic composition and its influence on the likelihood of domestic conflicts, levels of generalized trust, economic growth rates, quality of governance and development of democracy. Cleavages between ethnic groups based on the psychological processes of categorization, identification and comparison enhance in-group favouritism and discrimination against distinct out-groups (Tajfel and Turner 1979, Tajfel 1982). This “…sense of belonging…” (Wimmer 2008, p. 973) is conceived in many sociological, economic and political approaches as a source driving the perception of social differences and the awareness of being different (such as in the prominent theories by Weber and Marx). According to these theoretical propositions, these micro-level processes translate into institutional boundaries that hamper growth (Easterly and Levine 1997), attenuate democratic development (Lipset et al. 1993; Przeworski et al. 2000), increase corruption (Mauro 1995, You and Khagram 2005) or generally weaken the institutional quality of a country (Djankov et al. 2003, LaPorta et al. 1998). Ethnic diversity was part of a broader concept of inequality that was suspected to hamper growth (see Weede 1996, 69), but more recent publications report mixed results. The hypotheses about the negative influence of ethnic fractionalization on country levels of generalized trust received empirical support in some studies (Mishler and Rose 2001; Putnam 2007) but were contested in others (Bjørnskov 2008; Hooghe et al. 2009).
Likewise, the negative effect of ethnic diversity on forms of collective action and on the provision of public goods reported in empirical studies (Alesina, Baqir, and Easterly 1999; Costa and Kahn 2003; Alesina and La Ferrara 2005) appears to have quite diverse preconditions itself, which makes it more difficult to draw clear conclusions about the underlying process (Habyarimana et al. 2007). The ambiguity as to whether ethnic diversity becomes salient at the aggregate level is best reflected within the literature on the causes of civil war and conflict (Weede 2005). Given the ambiguous results on the role of diversity, one can identify two separate developments within this stream of comparative research on ethnicity and conflict. The first is a process of theoretical differentiation that has developed into at least four broad categories5. Second, what unites all schools of quantitative analysis is the criticism that the instruments for analysing ethnicity are not able to map ethnicity adequately onto the aggregate level. Especially the index of Ethno-Linguistic Fractionalization (ELF) has been criticized

5 As summarized by Wimmer, Cederman and Min (2009), the “greed-and-opportunity perspective”, “the minority-mobilization school”, “the diversity breeds conflict perspective” and Wimmer’s own “institutionalist configurational perspective” all compete for the explanatory recipe of how ethnic diversity and “ethnic boundary making” (Wimmer 2008) may translate into conflict or war. For useful overviews on the topic of war and civil conflict see Collier and Hoeffler (2007) and Blattman and Miguel (2010).


to be inappropriate in measuring ethnic structure and ethnic identities in their multidimensional fashion (Fearon and Posner 2001, Chandra and Wilkinson 2008). As a consequence, the “…contemporary civil-war literature has systematically overlooked what a long tradition of qualitative scholarship has established, namely, that ethnic and national identities relate their ethnic and national identities from the relationship to the state.” (Cederman and Girardin 2007, p.182) This implies that on the measurement level, empirical testing might be improved if the operationalization of a construct of interest is refined. As for ethnic fractionalization, however, a criterion must exist that assigns people to different ethnic groups in order to grasp the core meaning of fragmentation (usually this criterion refers to a commonly shared group feature like language). The more complex a construct of interest is, the more features constitute its core meaning. If, for example, ethnic fragmentation is scrutinized in reference to “different ethnopolitical dynamics” (Wimmer et al. 2009, p. 320), it may no longer be sufficient to focus on language differences between groups. The term “ethnic diversity” then no longer covers only the differences between groups but also elements of political power. For matters of operationalization this implies that indicators measuring ethno-linguistic distinctions alone might not work appropriately and might turn out insignificant when they are to represent the influence of “ethnic fragmentation with different ethno-political dynamics” in multivariate testing. “Refinement” of operationalization would mean here that the content of a construct is augmented by additional material. We will present ways of proceeding in the next section. The different approaches aim at improvement in measurement in order to reconcile the theoretical importance of ethnicity with its empirical impact in multivariate models.

4 MEASURING ETHNIC DIVERSITY

Providing empirical answers to questions about the impact of ethnicity on subsequent social incidents such as conflicts or democratic destabilization presupposes a sound definition of the term’s meaning. Trying to advance a definition of what ethnicity “really is” (Wimmer 2008, p. 92) is a difficult matter from an empirical perspective when one refers to a limited space of indicators (such as linguistic cleavages). To operationalize ethnic diversity, scholars rely on data on the number of ethnic groups in a country and the relative size of these groups. Several instruments have been developed over the last 50 years which try to map the socio-demographic features of ethnicity and diversity. One of the indices most often used is the index of Ethno-Linguistic Fractionalization, or ELF. The ELF index was originally based on the work of Russian ethnologists who attempted to map ethnicity in the Atlas Narodov Mira (1969). The index reflects the probability that two randomly chosen citizens of a country belong to different ethnic groups, where group membership is assigned by language. It applies the Herfindahl formula of concentration to calculate the index value for a country using the size proportion s of an ethnic group in a country i as follows:

(1) $ELF_i = 1 - \sum s_i^2$
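
As a brief illustration of the Herfindahl-type calculation described above, the following Python sketch computes the fractionalization value from a list of group shares. The function name `elf_index` and the example shares are hypothetical and serve only to show the arithmetic; the sketch assumes that the shares of all groups in a country sum to one.

```python
def elf_index(shares):
    """Fractionalization: one minus the sum of squared group shares.

    `shares` is assumed to hold the population proportion of every
    group in one country (an assumption of this sketch).
    """
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("group shares are assumed to sum to 1")
    return 1.0 - sum(s * s for s in shares)


# Invented example countries, not real data.
two_equal_groups = elf_index([0.5, 0.5])
four_equal_groups = elf_index([0.25, 0.25, 0.25, 0.25])
one_dominant_group = elf_index([0.9, 0.05, 0.05])
```

In these invented cases the value rises from 0.5 for two equally large groups to 0.75 for four, which matches the reading of the index as the probability that two randomly chosen citizens belong to different groups.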

The index has been criticized in a variety of ways over the last decade. To summarize the main arguments against the application of the ELF, it is pointed out that the index ignores the multilevel perspective of mapping ethnicity onto certain contexts by failing to differentiate between ethnic structure and ethnic


practice (Chandra and Wilkinson 2008), while these features of ethnic identity seem to overlap. It is also maintained that multiple dimensions of ethnicity due to differences in race, language or religion which constitute ethnic identities require a decision about the categories that should be included in the index and, foremost, about the extent to which certain categories are disaggregated (Posner and Laitin 2001; Fearon and Laitin 2003). The latter point implies that there is no unanimous understanding of the term ‘ethnicity’, which has already led to discussion at the definition level. As the ELF is invariant across time and country, it lacks crucial features for analyses on forms of migration and cultural differentiation when dynamic changes are considered (Posner and Laitin 2001). The main point of criticism is, however, that this index ignores the way ethnic cleavages become politically salient (Cederman and Girardin 2007). Over the years, some of these critical points have been acknowledged but the same mathematical formula for calculating the ELF still applies (Alesina et al. 2003; Fearon and Laitin 2003). The new indices derived from cross-country evaluations deviated from the categorizations of the Atlas Narodov Mira by differentiating between groups, respecting different aspects of structure and assessing ethnic practices. Others have attempted to acknowledge the shortcomings of the ELF by compiling new instruments to map diversity in a different manner. The aspect of political salience of ethnicity has inspired the introduction of an index of Ethnic Power Relations denoted EPR (Cederman and Girardin 2007). Building on ideas from Posner (2004), who tried to identify politically relevant ethnic groups in Africa to explain differences in economic growth rates, the institutional approach of this index aims at mapping ethnicity by control of and access to governmental power.
Drawing from classical findings that highlight the role of ethnic division as a source for different paths of institutionalization, from the choice of the electoral system (Lijphart 1992; Boix 1999; but see Brambor, Clark, and Golder 2006) to extent of redistributive politics (Lipset and Marks 2000; Alesina and Glaeser 2004) the EPR index attempts to reflect disproportional representation of ethnic groups in government as the foundation of its operationalization. This approach distinguishes itself from the randomness that is inherent in the interpretation of the ELF index. Ethnic cleavages become more salient in countries in which access to governmental power is granted to ethnic groups that reflect a smaller part of the population – disproportionally small in comparison to their political relevance (Wimmer et al. 2009). The operationalization of the EPR is taken from Cederman and Girardin (2007, p. 176-177). Ethnopolitical balance r(i) within a country i between the ethnic group in power s0 and the rest of a country’s ethnic groups {s1, s2, …, sn} is linked by the formula

(2) $r(i) = \frac{s_i}{s_i + s_0}$.

In order to assign higher scores of fractionalization to countries in which a minor ethnic group governs a large portion of the population (which in turn remains largely excluded from power), a logistic contest function is applied to map the ethnopolitical balance r(i). The EPR index6 that is shown in equation (3) reflects the degree of power exclusion that is based on “…ethnonational principles of political legitimacy.” (Wimmer et al. 2009: 321)

6 The present study will apply the same threshold values of k=5 and r=0.5 in accordance with Cederman and Girardin (2007).


(3) $EPR = 1 - \prod_{i=1}^{n-1} \frac{\{r(i)/r\}^{-k}}{1 + \{r(i)/r\}^{-k}}$
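
The two steps behind the EPR index (the ethnopolitical balance of each excluded group, followed by the logistic contest function) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `epr_index`, its argument names and the example shares are hypothetical, the defaults r = 0.5 and k = 5 follow the threshold values named in footnote 6, and every excluded group is assumed to have a strictly positive share.

```python
def epr_index(share_in_power, excluded_shares, r=0.5, k=5):
    """One minus the product of logistic contest terms over excluded groups.

    share_in_power  : share s0 of the ethnic group in power
    excluded_shares : shares of the remaining groups (assumed > 0)
    r, k            : threshold and slope of the contest function
    """
    product = 1.0
    for s_i in excluded_shares:
        if s_i <= 0:
            raise ValueError("excluded shares are assumed to be positive")
        balance = s_i / (s_i + share_in_power)   # ethnopolitical balance r(i)
        term = (balance / r) ** (-k)             # {r(i)/r}^(-k)
        product *= term / (1.0 + term)
    return 1.0 - product


# Invented configurations, not real data.
minority_rule = epr_index(0.1, [0.9])   # small group governs a large one
majority_rule = epr_index(0.9, [0.1])   # large group governs a small one
balanced = epr_index(0.4, [0.4])        # balance equals the threshold r
```

With these invented shares the value is roughly 0.95 under minority rule, close to zero under majority rule, and exactly 0.5 when the balance r(i) equals the threshold r, which illustrates how the contest function assigns higher scores where a small group in power faces a large excluded population.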

As the present analysis relies on the Ethnic Power Relations dataset compiled by Wimmer et al. (2009), a distinction between groups in and out of power can be made. Another approach that tries to overcome the weakness of the ELF as a measure of diversity at the country level is represented by polarization indices (for an overview see Esteban and Schneider 2008). The ELF index yields higher scores the more ethnic groups exist in a country. In contrast, measures of polarization (Esteban and Ray 1994) augment the measurement by emphasizing an economic approach to the propensity for conflict. Collective action problems of organizing are considered from this perspective (Reynal-Querol 2004; Montalvo and Reynal-Querol 2005). The RQ measure of polarization is therefore derived from the relative size s of a group in a country i in the following way:

(4) $RQ_i = 4 \sum_{i=1}^{n} s_i^2 (1 - s_i)$
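
To make the contrast between polarization and fractionalization concrete, the following Python sketch evaluates the RQ formula for equally sized groups and sets it beside the Herfindahl-type fractionalization value for the same shares. The function names and the example shares are hypothetical; the sketch assumes that the shares sum to one.

```python
def rq_index(shares):
    """Polarization: four times the sum of s^2 * (1 - s) over all groups."""
    return 4.0 * sum(s * s * (1.0 - s) for s in shares)


def fractionalization(shares):
    """One minus the sum of squared shares, used here only for comparison."""
    return 1.0 - sum(s * s for s in shares)


# Invented countries with n equally large groups, not real data.
comparison = {}
for n in (2, 3, 10):
    shares = [1.0 / n] * n
    comparison[n] = (rq_index(shares), fractionalization(shares))
```

For two equal groups the polarization value is 1 and the fractionalization value is 0.5; for ten equal groups polarization falls to about 0.36 while fractionalization rises to about 0.9. The two formulas therefore order the same invented countries in opposite ways once more than two groups are present.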

The index aims to reflect the relative size differences between minority and majority groups and yields the highest value on a scale from 0 to 1 when a country is divided into two equally large ethnic groups. The reasoning behind this operationalization lies in the ubiquitous quest to explain the propensity of ethnic conflicts between groups, where “…researchers should consider the measure of ethnic polarization – the concept used in most theoretical arguments....” (Montalvo and Reynal-Querol 2005:798). Some have criticized the compilation of the index for its sometimes inadequate definition of groups, a weakness we avoid by calculating all three measures from the same data source, the aforementioned Ethnic Power Relations data set. Having learned about the indices7, one may argue that these indices are not measures of the same concept. Chandra and Wilkinson (2008) distinguish between measures of “ethnic structure” (such as ELF) and “ethnic practice” (such as EPR), suggesting that the measures are derived from different concepts. From that point of view, one might perceive the EPR not as an “improved” version of ELF but as a measure one step closer to the operationalization in terms of theoretical priority (Gerring 2005). In other words, what matters is not the number, relative size or other characteristics of ethnic structure but the way that certain ethnic groups have entered the sphere of governmental power. The ethnic practice scholars are simply identifying those mechanisms either because they think ethnic structure is too blunt a concept, or because they think the structure is entirely irrelevant. Aspects of diversity merely describe the structure but not ethnic practice. Even if such an argument tries to justify the computation of such an index on theoretical grounds, it is apparent that the index is a mingled version of two theoretical ideas that needs to be backed up empirically before putting some supposedly working features together.
In a measurement sense, the EPR is a

7 We restrict our analyses to three measures of ethnic diversity. Numerous other instruments exist that may grasp the meaning of ethnic diversity. For our analysis, some indices like the index of ethnic dominance (Collier and Hoeffler 2004) appear ill suited due to its dichotomous scale, others simply lack the appropriate amount of country specific data.


conditional index of ELF which produces outcome scores of structure if political criteria are met. The idea of EPR, however, points to a measure of ethnic differentiation which will match the content of the ELF or, to speak more technically, that also belongs to the latent construct of ethnic diversity. Latent variables are not directly observable but are inferred from other variables (Loehlin 2004). They constitute the “core” of a construct in a technical sense as they are (more or less) related to the variables from which they are derived. Similarly, polarization measures may be perceived as entirely different from the ELF index, as the former focus on a bipolar distribution of some population characteristic, whereas the ELF relies on fragmentation that rather constitutes multipolarity (Esteban and Schneider 2008). In our study, however, we will not focus on the “core meaning” of ethnicity at the macro level. All theoretical discussions appear driven by an underlying interest, mainly to link measures of diversity empirically to other social phenomena. We are more interested in the indices themselves, trying to find out in which areas the proclaimed “core meaning” is influenced by method effects that may blur empirical inferences and theoretical conclusions. Therefore we claim that all three indices do represent a common feature, because despite their distinct theoretical underpinnings they all relate to two dimensions of the social structure in a given country: the relative size and number of ethnic groups in a country.

5 MODEL

The approach we employ to disentangle the measurement characteristics of the three methods presented is the Multitrait-Multimethod (MTMM) approach initially proposed by Campbell and Fiske (1959). MTMM analysis is a Structural Equations Model that is analyzed via Confirmatory Factor Analysis (CFA) and rests on a systematic combination of a group of trait and method factors (Marsh 1989). These trait and method factors are unobserved latent variables that are not measured directly, but their interrelation is rather reflected in the covariation of observed variables that constitute the sample MTMM covariance matrix. As a general rule to analyze MTMM models via CFA, there should be at least three traits (T) and three methods (M) that are reflected in at least T*M observed measurements, where each observed variable loads on one trait and one method (Marsh and Bailey 1991). For the analysis, the predefined methods and traits are specified as latent factors ξi, contained in the matrix Φ , the specific trait-method combinations are the observed variables xi, which are arranged in a (T*M) sample covariance matrix S8. Hence, the CFA model will be specified as required for the MTMM analysis and will be estimated by reproducing the sample covariance matrix S through the different elements in the parameter matrices that represent the covariance equation:

(5) $S = \Lambda_x \Phi \Lambda_x' + \Theta_\delta$.

The factor loadings λi contained in the matrix Λx are the coefficients that will be estimated during the CFA. Along with the error terms in the matrix Θδ the aspects of validity are interpreted by the squared magnitude of the estimated standardized path coefficients along with the correlations among the latent trait factors (Bollen 1989).
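
The covariance equation can be made tangible with a small numerical sketch. The Python code below builds a loading matrix, a factor covariance matrix and a diagonal error matrix for an illustrative design with three traits and three methods (the minimum named in the general rule above), and then computes the covariance matrix these parameters imply. All loadings and correlations are invented for illustration and are not estimates from this study; plain Python lists are used so that no external library is assumed.

```python
T, M = 3, 3                          # illustrative design, not the study's
TRAIT_LOADING = 0.7                  # hypothetical standardized trait loading
METHOD_LOADINGS = [0.2, 0.3, 0.6]    # hypothetical loading per method factor
TRAIT_CORR = 0.3                     # hypothetical correlation among traits
K = T + M                            # number of latent factors


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Loading matrix: one row per trait-method combination; the first T columns
# are trait factors, the last M columns are method factors.
lam, theta = [], []
for t in range(T):
    for m in range(M):
        row = [0.0] * K
        row[t] = TRAIT_LOADING
        row[T + m] = METHOD_LOADINGS[m]
        lam.append(row)
        # Error variance chosen so that each implied variance equals 1.
        theta.append(1.0 - TRAIT_LOADING ** 2 - METHOD_LOADINGS[m] ** 2)

# Factor covariance matrix: correlated traits, uncorrelated methods,
# and no trait-method covariances.
phi = [[1.0 if i == j else (TRAIT_CORR if i < T and j < T else 0.0)
        for j in range(K)] for i in range(K)]

# Implied covariance matrix: loadings * phi * loadings' plus error variances.
sigma = matmul(matmul(lam, phi), transpose(lam))
for i in range(T * M):
    sigma[i][i] += theta[i]

# Share of variance due to trait, method and error for each observed variable.
variance_shares = [(lam[v][v // M] ** 2, lam[v][T + v % M] ** 2, theta[v])
                   for v in range(T * M)]
```

With these invented values, two variables that measure different traits with the third method covary at 0.507, which exceeds the 0.49 found between two different methods measuring the same trait. A pattern of this kind is what squared method loadings of the size assumed here (0.36 of the variance) would produce, and it is the sort of signal that the estimated loadings are meant to reveal.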

8 All symbols are denoted in accordance to the program package LISREL 8.51 (Joereskog and Soerbom 1993).


-Insert FIGURE 2 here-

In the present study, we will use the specification of T=4 and M=3. The trait factor ξ1 denoted ‘Ethnicity’ will enter all distinct MTMM models to enable the comparability of all models that are aligned to this specification. To avoid structural feedbacks when observed variables are influenced by several latent variables, the present approach first estimates the Correlated-Traits-Uncorrelated-Methods (CTUM) model to derive the unique validity variance attributable to each method for the observed variables (Widaman 1985). The path model with correlated traits and uncorrelated methods (the CTUM model9) is shown in Figure 2. Secondly, we assume τ-equivalence among the trait-method combinations for the three methods reflecting ethnic diversity and how they reflect the trait factor ethnicity (ξ1). This implies a priori equal reliability among the instruments but does not hinder the disentanglement of the variance and error components for each ethnicity measure. By assuming τ-equivalence, we are able to avoid an a priori fixation of a reference method (as in CTC(M-1) models, see Eid 2000). Instead, we maintain equal treatment among the three indices of ethnicity and, in particular, ensure an easy interpretation of the estimation results without sacrificing any of the objectives of the study. As each observed variable is influenced by one trait and one method (and an error term), the squared magnitudes of the loadings will reflect the proportion of variance that can be attributed to the respective trait and method. The interpretation of the estimation results will be two-fold: On the one hand, evaluating the factor loadings from the latent trait factors ξ1 – ξ4 to the observed variables identifies aspects of construct validity traditionally denoted as convergent and discriminant validity. Convergent validity describes the validity of an instrument towards a theoretical trait intended to be measured.
Discriminant validity implies that instruments for the measurement of different traits should be weakly related in comparison to instruments of the same trait (Loehlin 2004). Simply speaking, an instrument shows discriminant validity when it does not measure what it is not supposed to measure [10]. On the other hand, possible method bias is detected by looking at the factor loadings of the three method factors ξ5-ξ7 on the observed variables. Higher values of the estimated coefficients imply that a large part of the variance within the specific trait-method combination can be attributed to the method in use. Of course, both parts of the evaluative process are related to each other. By analyzing the magnitudes of the factor loadings and the trait correlations, and then separating the variance that can be attributed to a particular method, the following can be identified: the variance attributable to a certain trait and to an error term, and the potential method bias that may drive quantitative outcomes and distort the validity of empirical interpretations (Podsakoff et al. 2003). Thus, MTMM analysis is a tool to address a key problem that is often overlooked in empirical studies: contradictory results from cross-country analyses that involve ethnic diversity as an explanatory variable may stem from the fact that the different instruments themselves contribute to the measured variance.

9 In the vast literature on MTMM analyses, it is a common strategy to allow for uncorrelated latent factors as well as correlated error terms, depending on theoretical and modeling purposes (see Widaman 1985).

10 As Neumann and Graeff (in press) have pointed out, there might be a trade-off between both types of validity due to the dimensionality of the instruments when aggregate measures are compiled. Discriminant validity tends to be sacrificed to achieve a higher level of convergence towards latent phenomena such as corruption. In the present case, all three indices differ only in their mathematical operationalization, as they are based on the same data and are therefore not aggregate measures from different data sources.

6 DATA

In older studies, the Atlas Narodov Mira (1964) was used for compiling indices. It has been replaced both by different data sources (such as the Encyclopedia Britannica, the CIA World Factbook, etc.) and by more nuanced assessments of ethnicity that reflect the aspects of political salience, ethnic practice and ethnic structure without disaggregating a country’s population to the last group or clan (see Fearon 2003 for an overview). As mentioned, the present analysis focuses on the quantitative operationalization of ethnic diversity and forgoes the debate on the appropriate data source by deriving the three indices from a single data set, namely the aforementioned Ethnic Power Relations dataset (see Min et al. 2009). With this, we are able to measure the ethnic group composition for 121 countries, to conduct the MTMM analysis on one sample, and to apply EPR’s configuration of the ethnic groups concerning their political relevance and access to political power [11]. Table 1 shows the interrelations among the three measures, indicating that their operationalizations result in quite distinct measures of ethnic diversity. Following the ideas of Campbell and Fiske, one would assume that convergent validity is small if different measures of the same underlying trait do not correlate very highly (Loehlin 2004:98). Therefore, the three indices of the analysis (ethnic fractionalization, ethno-political fractionalization, polarization) may not measure the same construct. But the assumption that all three measures attempt to map a common latent feature, namely the degree of ethnic diversity of the social stratification, based on the number of groups and the relative group sizes of a country’s population as a contextual variable, is not unjustified.
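To illustrate how indices built from the same group shares can diverge, the following minimal sketch computes a fractionalization and a polarization score for two invented countries. It assumes the commonly used definitions, which this section does not restate: Herfindahl-based fractionalization, 1 - Σ p_i², and the polarization index of Montalvo and Reynal-Querol (2005), 1 - Σ ((0.5 - p_i)/0.5)² p_i. The function names and the group shares are hypothetical, and the EPR-based index is left out because it depends on the dataset's coding of politically relevant groups.

```python
def fractionalization(shares):
    """Herfindahl-based fractionalization: 1 - sum of squared group shares."""
    return 1.0 - sum(p * p for p in shares)


def polarization_rq(shares):
    """Polarization in the assumed RQ form: 1 - sum(((0.5 - p) / 0.5)^2 * p)."""
    return 1.0 - sum(((0.5 - p) / 0.5) ** 2 * p for p in shares)


# Hypothetical group shares (not taken from the EPR data); each list sums to 1.
two_equal_groups = [0.5, 0.5]
ten_equal_groups = [0.1] * 10

for name, shares in [("two groups", two_equal_groups), ("ten groups", ten_equal_groups)]:
    print(name, round(fractionalization(shares), 3), round(polarization_rq(shares), 3))
```

In this toy case the two indices rank the countries in opposite order: fractionalization rises with the number of equally sized groups, whereas polarization peaks at two equally sized groups. This is one way to see why the correlations in Table 1 can be moderate even though all indices use the same data.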

- Insert TABLE 1 here -

These measures of ethnic diversity constitute the part of the variance of the trait-method combinations in the sample MTMM covariance matrices that shall reflect the latent method factors ξ5-ξ7. We denote these method factors according to the acronyms of the indices: ELF, EPR and RQ for ξ5, ξ6 and ξ7, respectively. For the other variables included in the sample MTMM covariance matrices, we structure the observed variables into four different categories, namely 1. Public Goods, 2. Politics, 3. Geography and 4. Socio-economic Development. We further divide areas 1 and 4 into groups of three variables each, concerning ‘social system’ and ‘infrastructure’ for the former and ‘norms and values’ and ‘wealth’ for the latter. The justification for using the observed variables of these four categories, which shall reflect the three remaining trait factors ξ2-ξ4, stems from their prior role as potential sources of variation in cross-country regressions involving ethnicity. Though the choice of some observed variables may appear arbitrary, we refer to the prior cross-country analyses that used most of these variables of the four categories (e.g. Alesina et al. 1999; Alesina et al. 2003; Fearon and Laitin 2003).

Summarizing the presentation of the data, one has to conclude that the trait factors ξ2-ξ4 play a rather marginal part in our analysis. Indeed, we concentrate our analysis foremost on the method factor loadings within the MTMM model. For example, we did not include conflict variables such as the number of wars and conflicts due to their theoretical multidimensionality (Sambanis 2004), which often causes serious problems within Structural Equation Models (e.g. Bollen and Lennox 1991, Edwards and Bagozzi 2000, MacKenzie et al. 2005). All other trait factors remain rather unspecified, as we tried to avoid the aforementioned commingling of definition and explanation and the use of potential formative indicators. Technically, the trait factors are reflections of the three distinct socio-economic variables that constitute parts of the MTMM covariance matrix.

11 Descriptive statistics are shown in the Appendix.

7 RESULTS

All models were estimated with LISREL 8.53 using the Maximum Likelihood estimation procedure. We assessed how well the models reproduce the sample MTMM covariance matrix (model fit) using the Normed Fit Index (NFI), the Comparative Fit Index (CFI), the Chi-Square test statistic and the Root Mean Square Error of Approximation (RMSEA) [12]. The interpretation of the results can be based solely on the distinct trait and method factor loadings for two reasons. First, the CTUM model implies the absence of inter-method correlations; hence there are no structural feedbacks via the method factors. Second, due to the model specification of τ-equivalence, the inter-trait correlations in the matrix appear to be estimated at values of 1 [13]. This also limits the structural feedbacks among the trait factors (see Bollen 1989) and simplifies the interpretation.

The average estimation results for the factor loadings within the four analytical contexts are shown in Tables 2a and 2b. It turns out that all three diversity indices in Table 2a show high convergent validity towards the trait factor Ethnicity. First, ELF and EPR appear better suited as measures of ethnic diversity, following the line of argumentation around Figure 1, because they not only reveal higher factor loadings on the trait ‘Ethnicity’ but, in particular, post lower method variances (27.04 % for the ELF, 37.45 % for the EPR) than the polarization measure RQ (53.58 %).

- Insert TABLE 2a here -

- Insert TABLE 2b here -

Second, the method variance of the RQ is very salient across all four distinct contexts displayed in Table 2b. The polarization index always exceeds the two other ethnicity measures in the average magnitude of its method factor loadings and, more importantly, its method variance surpasses the magnitude of the respective trait variance in many instances. While there is no general rule that prescribes it, it seems noteworthy that statistically significant regression coefficients in studies that used the RQ as an explanatory variable should be interpreted with caution, as there is a considerable probability that the results are driven by the applied method rather than by the underlying theoretical construct. Third, while the EPR posts slightly lower method-specific variance than the ELF, this relationship turns in favour of the ELF in the analytical context of ‘Politics’. Though the difference is marginal and shall not cast doubt on the favourable statistical properties of the EPR, this finding confirms our earlier statement: if one augments an index’s content, the index depends more on the applied method when it is used for analyses of causal inference. Its method variance increases further if political dynamics are considered (such as within the context of ‘Politics’ for the EPR).

Even if the MTMM results do not provide knowledge about the relationship between ethnic influences and diverse socio-economic variables, they allow an evaluation of the applicability of ethnic indices within certain analytical contexts. The ELF index shows high degrees of convergence towards all the traits studied across all four contexts while posting reasonable portions of method variance. For the EPR and its conceptual focus on the political salience of ethnicity, the convergent validity towards the context variables within ‘Politics’ does not translate into equally high degrees of convergence towards the other variables tied to political procedures, such as the provision of public goods. None of the three ethnicity measures shows high convergent validity towards measures of geography, which reinforces the idea that geography represents the “most” exogenous context within our analysis.

To summarize the findings of our MTMM analyses, we argue that models using polarization measures must be specified deliberately if the large proportion of method variance is not to cause problems. In our study, it turned out that the ELF reveals lower method variance in the area of politics than the EPR and the RQ. While the EPR index indicates slightly lower method variance than the ELF in areas such as social system, infrastructure and geography, the RQ comes up with the highest method variance in all areas. Despite the recent criticism of the operationalization of the ELF, we challenge the criticism of this index as a measure of a country’s ethnic stratification, at least when it is applied in macro level studies. In contrast, if the RQ is used as an indicator variable in a multivariate model, collinearity is more likely to appear. Even if it is debatable what the reasons for collinearity in fact are (Leamer 1983), test statistics of models are affected in an unpredictable manner. So, one has to cast doubt on the test results within the respective areas that are prone to method effects. While the reasoning for these statements is drawn from the comparative analyses above, we still put forward a more nuanced line of argumentation that we think is reflected within the numbers and between the lines of our methodological examination.

12 Due to the a priori restriction of τ-equivalence towards the trait factor ‘Ethnicity’, this theoretically valid restriction implies an invariance condition that in turn constrains the values of the Chi-Square statistic to remain constant across models as well. Even if small Chi-Square values are difficult to judge, inflated Chi-Square values would have indicated that the equivalence restriction is unjustified (see Bollen 1989, p. 263-267 for details).

13 All estimation procedures converge to a solution. When we do not assume τ-equivalence, the estimations do not converge and no identified solutions are obtained.
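The method variances quoted for Table 2a follow from squaring the standardized loadings. The short sketch below recomputes them from the loadings reported in Table 2a. The dictionary layout and function names are illustrative assumptions, and the calculation presumes standardized loadings with uncorrelated trait and method factors, as in the CTUM specification.

```python
# Standardized loadings on the trait 'Ethnicity' as reported in Table 2a.
TABLE_2A = {
    "ELF": {"method": 0.520, "trait": 0.854},
    "EPR": {"method": 0.612, "trait": 0.791},
    "RQ": {"method": 0.732, "trait": 0.682},
}


def variance_share(loading):
    """Share of variance attributed to a factor: the squared standardized loading."""
    return loading ** 2


def summarize(table):
    """Return the method share, the trait share and their ratio for each index."""
    summary = {}
    for index, loadings in table.items():
        method = variance_share(loadings["method"])
        trait = variance_share(loadings["trait"])
        summary[index] = {"method": method, "trait": trait, "ratio": method / trait}
    return summary


if __name__ == "__main__":
    for index, row in summarize(TABLE_2A).items():
        print(f"{index}: method {row['method']:.2%}, trait {row['trait']:.2%}, "
              f"method/trait {row['ratio']:.2f}")
```

Under these assumptions, the RQ is the only index whose method share on ‘Ethnicity’ exceeds its trait share, which matches the pattern described in the text.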

8 DISCUSSION

An outright critic of the present study would ask why one should even think about things like construct validity and method bias in macro level modeling. The answer is two-fold, as we perceive both a scientific and a practical reasoning behind the research efforts shown in the present study. First, in macro level research that adheres to empirical testing, scholars usually have to apply some strategy in order to check the propositions of theories (Morton 1999, Geddes 2003). One may tailor a theoretical proposition into distinct sets of testable issues, thus modifying the explanandum with respect to the adequacy and availability of proper data [14]. Other researchers might try to improve the indicators that reflect the theoretical proposition at the measurement level, thus adjusting the explanans. In the philosophy of science, augmenting the content of an explanatory construct seems to be in line with the ideas of Critical Rationalism and empirical proceedings, at least at first sight. Expanding the meaning of a construct implies that additional features are considered which reduce the generality of the construct, which in turn makes a hypothesis more amenable to falsification (Albert 1964: 25-26; Popper 1963; Lakatos 1976). The augmentation of a macro-construct does not fit these implications, however. In fact, the original meaning of the construct may be expanded by another dimension of content which is not originally linked to the first one. For instance, an index of corruption may count as a proxy for the quality of an administration in one case, but as an indicator of the spread of corrupt practices in the whole society in other cases. By doing so, the content of the explanans is changed, and the generality of the construct is not reduced but inflated. The second line of reasoning deals with some practical issues. Some readers may perceive a trade-off between our conclusions and the subsequent applicability of our findings.
While the purpose of integrating insights from psychological and educational methodological research into macro level modeling may appear reasonable to some, others would assert that this undermines “…the primary purpose of data – which is to test theories and to assess their empirical relevance.” (Kaufmann, Kraay and Mastruzzi 2010:57). Methodological aspects of index building and the subsequent conclusions that could be drawn from our elaborations (and the outlines surrounding Figure 1) may pose high hurdles and could sacrifice the possibility of producing empirically relevant results, relevant in the sense that studies are accepted for publication in refereed journals or that policy reports provide useful implications for policy makers and governments. The disadvantages will still apply, however, if amplified indicators prone to method bias are used for modeling different phenomena. As can be seen in Tables 2b and 4, typical variables in macro research share quite different proportions of method and trait variance with different indices. In that vein, the method variance of each index results in a unique pattern of links to indicators in the fields of politics, geography, socio-economic development or public goods. Therefore, tests of theories that assess their empirical relevance by means of instruments showing method bias remain singular and constricted rather than generalizable and utilizable. In defence of the researchers who came up with distinct and augmented measures of ethnic fractionalization, one might argue that they only changed the operationalization against the (new) relevant content domain of the construct. This would imply that they took an interest in content validity (Bollen 1989, p. 186) and not in construct validity per se. Refining indicators in a way that increases the method variance might be a second-best strategy, even if it preserves the theoretical issues and does not tailor the hypotheses derived from theory.

14 For example, this strategy would imply that one decides to analyze not all violent conflicts of the past 50 years but, e.g., only secessionist conflicts. This strategy is reflected in the development of the aforementioned schools of thought on causes of war and conflict (e.g. Sambanis 2004).

REFERENCES

Adcock R., Collier D., 2001. Measurement validity: A shared standard for qualitative and quantitative research. American Political Science Review 95 (3), 529-546.

Albert H., 1964. Problems in Building Theories: Development, Structure and Application of Theories in Social Sciences. In: Albert, H. (Ed.), Theory and Reality. (translated from German) Tübingen: J.C.B. Mohr (Siebeck), pp. 3-72.

Alesina A., Baqir R., Easterly, W., 1999. Public Goods and Ethnic Division. Quarterly Journal of Economics 114 (4), 1243-1284.

Alesina A., Devleeschauwer A., Easterly W., Kurlat W., Wacziarg, R., 2003. Fractionalization. Journal of Economic Growth 8 (2), 155-194.

Alesina A., Glaeser, E.L., 2004. Fighting Poverty in the US and Europe- A World of Differences. Oxford University Press.

Alesina A., La Ferrara, E., 2005. Ethnic Diversity and Economic Performance. Journal of Economic Literature XLIII, 762-800.

Atlas Narodov Mira. 1964. Moscow: Miklukho-Maklai. Ethnological Institute at the Department of Geodesy and Cartography of the State Geological Committee of the Soviet Union.

Bagozzi R.P., Yi Y., 1991. Multitrait–Multimethod matrices in consumer research. Journal of Consumer Research 17, 426–439.

Bjørnskov C., 2008. Social Trust and Fractionalization: A Possible Reinterpretation. European Sociological Review 24 (3), 271-283.

Blattman C., Miguel, M., 2010. Civil War. Journal of Economic Literature 48 (1), 3–57.

Bollen K.A., 1989. Structural Equations with Latent Variables. Wiley Series in Probability and Mathematical Statistics. New York: Wiley.

Bollen K.A., Lennox R., 1991. Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin 110, 305–314.

Braithwaite J., 1985. White Collar Crime. Annual Review of Sociology 11, 1-25.

Brambor T., Clark W.R., Golder M., 2006. Understanding Interaction Models: Improving Empirical Analyses. Political Analysis 14 (1), 63-82.

Campbell D.T., Fiske D.W., 1959. Convergent and Discriminant Validation by Multitrait-Multimethod Matrix. Psychological Bulletin 56, 81-105.

Cederman L.-E., Girardin L., 2007. Beyond Fractionalization: Mapping Ethnicity onto Nationalist Insurgencies. American Political Science Review 101 (1), 173-185.

Central Intelligence Agency, 1996. CIA World Factbook, published online.

Chandra K., Wilkinson S., 2008. Measuring the Effect of Ethnicity. Comparative Political Studies 41 (4/5), 515-563.

Collier P., Hoeffler A., 2004. Greed and Grievance in Civil War. Oxford Economic Papers 56, 563–95.

Collier P., Hoeffler A., 2007. Civil War. In: Hartley K., Sandler T. (eds), Handbook of Defense Economics. Elsevier North Holland.

Costa D.L., Kahn M.E., 2003. Civic Engagement and Community Heterogeneity: An Economist’s Perspective. Perspectives on Politics 1 (1), 103-111.


Cote J. A., Buckley, R., 1987. Estimating trait, method, and error variance: Generalizing across 70 construct validation studies. Journal of Marketing Research 24, 315–318.

Cronbach L.J., Meehl P.E., 1955. Construct validity in psychological tests. Psychological Bulletin 52, 281-302.

Djankov S., La Porta R., López-de-Silanes F., Shleifer A., 2002. The Regulation of Entry. Quarterly Journal of Economics 117, 1-37.

Easterly W., Levine R., 1997. Africa’s Growth Tragedy: Policies and Ethnic Divisions. The Quarterly Journal of Economics 112 (4), 1203–1250.

Edwards J.R., Bagozzi, R., 2000. On the Nature and Direction of Relationships Between Constructs and Measures. Psychological Methods 5 (2), 155-174.

Eid M., 2000. A Multitrait-Multimethod Model with Minimal Assumptions. Psychometrika 65 (2), 241-261.

Esteban, Joan, Debraj Ray, 1994. On the Measurement of Polarization. Econometrica 62 (4), 819–851.

Esteban, Joan, Gerald Schneider, 2008. Polarization and Conflict: Theoretical and Empirical Issues. Journal of Peace Research 45 (2), 131-141.

Fearon J.D., Posner D., 2001. The Implications of Constructivism for Constructing Ethnic Fractionalization Indices. APSA-CP 13 (1), 13-17.

Fearon J.D., 2003. Ethnic and Cultural Diversity by Country. Journal of Economic Growth 8(2), 195-222.

Fearon J.D., Laitin D.D., 2003. Ethnicity, Insurgency, and Civil War. American Political Science Review 97, 1–16.

Gamliel E., Peer E., 2006. Positive versus Negative Framing Affects Justice. Social Justice Research 19 (3), 307-322.

Ganster D.C., Hennessey H.W., Luthans F., 1983. Social desirability response effects: Three alternative models. Academy of Management Journal 26, 321–331.

Geddes B., 2003. Paradigms and Sand Castles. Theory Building and Research Design in Comparative Politics. Ann Arbor: University of Michigan Press.

Habyarimana J., Humphreys M., Posner D., Weinstein J.M., 2007. Why Does Ethnic Diversity Undermine Public Good Provision? American Political Science Review 101 (4), 709-725.

Hooghe M., Reeskens T., Stolle D., Trappers A., 2009. Ethnic Diversity and Generalized Trust in Europe: A Cross-National Multilevel Study. Comparative Political Studies 42 (2), 198-223.

Horowitz D., 1985. Ethnic Groups in Conflict. Berkeley: University of California Press.

Jöreskog K.G., Sörbom D., 1993. LISREL8: Structural Equation Modeling with the SIMPLIS Command Language. Hillsdale: Lawrence Erlbaum Associates Publishers.

Kalyvas S.N., 2008. Ethnic Defection in Civil Wars. Comparative Political Studies 41 (8), 1043-1068.

Kaufmann D., Kraay A., Mastruzzi M., 2007. Governance Indicators: Aggregate and Individual Governance Indicators for 1996-2005. The World Bank.

Kaufmann D., Kraay A., Mastruzzi M., 2010. The Worldwide Governance Indicators Project: Answering the Critics. European Journal of Development Research 22 (1), 55-58.

Lakatos I., 1976. Proofs and Refutations. Cambridge: Cambridge University Press.

Langbein L., Knack, S., 2008. The Worldwide Governance Indicators and Tautology: Causally Related Separable Concepts, Indicators of a Common Cause, or Both? Policy Research Working Paper 4669.


La Porta R., López-de-Silanes F., Shleifer A., Vishny R., 1999. The Quality of Government. Journal of Law, Economics and Organization 15 (1), 222-279.

Leamer E., 1983. Let's Take the Con out of Econometrics. American Economic Review 73, 31-43.

Lijphart A., 1992. Democratization and Constitutional Choices in Czecho-Slovakia, Hungary, and Poland, 1989–1991. Journal of Theoretical Politics 4(2), 207–223.

Lipset S.M.,1993. The Social Requisites of Democracy Revisited. American Sociological Review 59(1), 1-22.

Lipset S.M., Marks G., 2000. It Didn’t Happen Here: Why Socialism Failed in the United States. New York: W.W. Norton and Co.

MacKenzie S.B., Podsakoff P.M., Jarvis C.J., 2005. The Problem of Measurement Model Misspecification in Behavioral and Organizational Research and Some Recommended Solutions. Journal of Applied Psychology 90 (4): 710–730.

Marsh H.W., 1989. Confirmatory Factor Analyses of Multitrait-Multimethod Data. Many Problems and Few Solutions. Applied Psychological Measurement 13: 335-361.

Marsh H.W., Bailey M., 1991. Confirmatory Factor Analyses of Multitrait-Multimethod Data: A Comparison of Alternative Models. Applied Psychological Measurement 15: 47-70.

Merton R.K., 1949. Social Theory and Social Structure: Toward the Codification of Theory and Research. Glencoe, Illinois: Free Press.

Min B., Cederman L.-E., Wimmer A., 2009. Ethnic Exclusion, Economic Growth, and Civil War: A New Dataset on Ethnic Power Relations in the World, 1946–2005. University of California, Los Angeles. Unpublished manuscript.

Montalvo J.G., Reynal-Querol M., 2005. Ethnic Polarization, Potential Conflict and Civil Wars. American Economic Review 95: 796–816.

Morton R.B., 1999. Methods and Models. New York: CUP.

Neumann R., Graeff P., in press. A Multitrait-Multimethod approach to pinpoint the validity of aggregated governance indicators. Quality & Quantity.

Podsakoff P.M., MacKenzie S.B., Lee J.Y., Podsakoff N.P., 2003. Common Method Bias in Behavioral Research: A Critical Review of the Literature and Recommended Remedies. Journal of Applied Psychology 88 (5): 879–903.

Popper K., 1963. Conjectures and Refutations. London: Routledge and Kegan Paul.

Putnam R.D., 2007. E pluribus unum. Diversity and community in the twenty-first century. Scandinavian Political Studies 30: 137-174.

Raykov T., Widaman K.F., 1995. Issues in applied structural equation modeling research. Structural Equation Modeling: A Multidisciplinary Journal 2(4): 289-318.

Reynal-Querol M., 2002. Ethnicity, Political Systems, and Civil Wars. Journal of Conflict Resolution 46(1): 29–54.

Sambanis N., 2001. Do Ethnic and Nonethnic Civil Wars Have the Same Causes? Journal of Conflict Resolution 45: 259–82.

Sambanis N., 2004. What is Civil War-Conceptual and Empirical Complexities of an Operational Definition. Journal of Conflict Resolution 48(6): 814–858.

Shadish W.R., Cook D.T., Campbell D.T., 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.


Tajfel H., 1982. Social Psychology of Intergroup Relations. Annual Review of Psychology 33: 1-39.

Thomas M., 2010. What Do The Worldwide Governance Indicators Measure? European Journal of Development Research 22: 31-54.

Vanhannen T., 2007. Measures of Democracy 1810-2006 [computer file]. FSD1289, version 3.0 (2007-11-15). University of Tampere: Department of Political Science and International Relations.

Worldbank., 2008. World Development Indicators 2008. Washington: The Worldbank.

Weede E., 1996. Economic development, social order, and world politics. Boulder: Lynne Rienner.

Weede E., 2005. Balance of Power, Globalization and the Capitalistic Peace. Berlin: Liberal Verlag.

Widaman K.F., 1985. Hierarchically Nested Covariance Structure Models for Multitrait-Multimethod Data. Applied Psychological Measurement 10: 1-22.

Wimmer A., 2008. The Making and Unmaking Of Ethnic Boundaries: A Multi-Level Process Theory. American Journal of Sociology 113: 970–1022.

Wimmer A., Cederman L.-E., Min B., 2009. Ethnic Politics and Armed Conflict: A Configurational Analysis of a New Global Data Set. American Sociological Review 74: 316-337.

You J.-S., Khagram S., 2005. A comparative study on inequality and corruption. American Sociological Review 70: 136-157.

TABLES

TABLE 1: Correlation Coefficients among the three Diversity Measures (n=121) [A].

         ELF        EPR        RQ
ELF      1*
EPR      0.5696*    1
RQ       0.4075*    0.2792*    1

A: * denotes significance at the 1% level (2-tailed).

TABLE 2a: Method and Trait Factor Loadings concerning the Trait of ‘Ethnicity’.

             Method loading                      Trait loading
             ELF        EPR        RQ
             λ15        λ56        λ97           λ11        λ51        λ91
Ethnicity    0.520***   0.612***   0.732***      0.854***   0.791***   0.682***

TABLE 2b: Results and Goodness of Fit Indices of the MTMM analyses [C].

                             Method loadings          Trait loadings
                             ELF     EPR     RQ       T2      T3      T4       NFI     CFI
                             λi5     λi6     λi7      λi2     λi3     λi4
1. Public Goods
   Social System             0.434   0.409   0.657    0.712   0.529   0.612    0.947   0.994
   Infrastructure            0.317   0.262   0.541    0.520   0.340   0.504    0.963   0.992
2. Politics                  0.486   0.530   0.699    0.798   0.687   0.651    0.978   0.998
3. Geography                 0.315   0.268   0.533    0.465   0.307   0.458    0.949   0.994
4. Socioecon. Development
   Values & Norms            0.414   0.375   0.613    0.662   0.468   0.563    0.963   0.996
   Wealth                    0.399   0.350   0.614    0.655   0.453   0.572    0.957   0.995

C: Coefficients represent the average magnitude of the standardized factor loadings. RMSEA=0, χ2=28.928, df=38 for all models, see fn 5. Full results displayed in the Appendix.
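The reported RMSEA of 0 is consistent with the reported Chi-Square and degrees of freedom. The sketch below uses the common point-estimate formula, sqrt(max(χ2 - df, 0) / (df (N - 1))). That LISREL applies exactly this variant (with N - 1 rather than N) is an assumption here, as is the use of N = 121 from the sample description. The function name is hypothetical.

```python
import math


def rmsea(chi_square, df, n_obs):
    """Point estimate of the RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n_obs <= 1:
        raise ValueError("df must be positive and n_obs must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n_obs - 1)))


# Values from the note to Table 2b; n = 121 countries is taken from the data section.
reported = rmsea(chi_square=28.928, df=38, n_obs=121)
print(reported)
```

Because the Chi-Square value lies below the degrees of freedom, the numerator is truncated at zero, so the choice between N and N - 1 does not affect the result in this case.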


APPENDIX

TABLE 3: Presentation of the Traits used in the MTMM AnalysesC.

Category/ Trait

factors Description Source

1. Public Goods School enrolment

Total enrolment, primary (% net); Total enrolment is the number of pupils of the school-age group for primary education, enrolled either in primary or secondary education, expressed as a percentage of the total population in that age group. Years: 1980-2005, mean score.

WORLDBANK (2008)

Literacy Average of adult literacy rate for the years 1980-2005 Adult literacy rate is the percentage of people ages 15 and above who can, with understanding, read and write a short, simple statement on their everyday life. Scale: 0 to 100.

WORLDBANK (2008)

Tuberculosis Incidence of Tuberculosis (per 100.000 people), Incidence of tuberculosis is the estimated number of new pulmonary, smear positive and extra-pulmonary tuberculosis cases. Years: 1980-2005, mean score.

WORLDBANK (2008)

Roads Roads, paved (% of total roads). Years: 1980-2005, mean score. WORLDBANK (2008) Water withdrawal Annual freshwater withdrawals, total (% of internal resources) Annual

freshwater withdrawals refer to total water withdrawals, not counting evaporation losses from storage basins. Withdrawals also include water from desalination plants in countries where they are a significant source. Years: 1980-2005, mean score.

WORLDBANK (2008)

Electricity production from oil

Electricity production from oil sources (% of total); Sources of electricity refer to the inputs used to generate electricity. Oil refers to crude oil and petroleum products. Years: 1980-2005, mean score.

WORLDBANK (2008)

2. Politics Political Rights

Political rights enable people to participate freely in the political process, including the right to vote freely for distinct alternatives in legitimate elections, compete for public office, join political parties and organizations, and elect representatives who have a decisive impact on public policies and are accountable to the electorate. The more specific list of rights considered vary over the years. Countries are graded between 1 (most free) and 7 (least free). Years: 1980-2005, mean score.

Freedom House www.freedomhouse.org

21

Civil Liberties Civil liberties allow for the freedoms of expression and belief, associational and organizational rights, rule of law, and personal autonomy without interference from the state. The more specific list of rights considered vary over the years. Countries are graded between 1 (most free) and 7 (least free). Years: 1980-2005, mean score.

Freedom House www.freedomhouse.org

Participation The percentage of the total population who actually voted in the election. Years: 1980-2005, mean score.

Vanhannen (2007)

3. Geography Latitude

The absolute value of the latitude of the country, scaled to take values between 0 and 1.

CIA 1996

Arable land Arable land (% of land area) Arable land includes land defined by the FAO as land under temporary crops (double-cropped areas are counted once), temporary meadows for mowing or for pasture, land under market or kitchen gardens, and land temporarily fallow. Land abandoned as a result of shifting cultivation is excluded. Years: 1980-2005, mean score.

WORLDBANK (2008)

Rural Population Rural population (% of total population); Rural population is calculated as the difference between the total population and the urban population. Years: 1980-2005, mean score.

WORLDBANK (2008)

4. Socio-econ. Dev. Suicide

Suicide rate per 100,000 people. WORLD HEALTH ORGANIZATION (2009)

Religious Fractionalization

Reflects probability that two randomly selected people from a given country will not belong to the same religious group. The higher the number, the more fractionalized society. The score covers data over 294 different religions in 215 countries and dependencies and its operationalization is similar to the ELF.

Alesina et al. (2003).

Corruption The average of the index of corruption from the International Country Risk Guide between 2003 and 2007. The scale of the index is from 0 to 6, where higher numbers mean lower corruption.

ICRG International Country Risk Guide, Political Risk Services: www.prsgroup.com

Income Inequality GINI index; the Gini index measures the extent to which the distribution of income (or, in some cases, consumption expenditure) among individuals or households within an economy deviates from a perfectly equal distribution. The Gini index measures the area between the Lorenz curve and a hypothetical line of absolute equality, expressed as a percentage of the maximum area under the line. Thus a Gini index of 0 represents perfect equality, while an index of 100 implies perfect inequality. Years: 1980-2005, mean score.

WORLDBANK (2008)

Innovation Patent applications, per residents. Patent applications are applications filed with a national patent office for exclusive rights for an invention, that is, a product or process that provides a new way of doing something or offers a new technical solution to a problem. Years: 1980-2005, mean score.

WORLDBANK (2008)

Income Logarithm of GDP per capita, expressed in current US dollars. Years: 1980-2005, mean score.

WORLDBANK (2008)

C: Sources and data details are given in the Appendix.
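Two of the index definitions in the table above are given verbally: fractionalization as the probability that two randomly drawn people belong to different groups, and the Gini index as the area between the Lorenz curve and the line of equality. The following is a minimal sketch of both, not the procedure used by Alesina et al. (2003) or the World Bank. It assumes the common Herfindahl-based form (one minus the sum of squared group shares) for fractionalization and a discrete sample formula for the Gini index; the function names and the toy inputs are hypothetical.

```python
def fractionalization(shares):
    """Probability that two random draws fall into different groups.

    Assumes the Herfindahl-based form 1 - sum(s_i^2); shares are
    normalized so that they sum to one.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("group shares must sum to a positive number")
    return 1.0 - sum((s / total) ** 2 for s in shares)


def gini(incomes):
    """Gini index in percent (0 = perfect equality).

    Uses the discrete formula on sorted incomes:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive income")
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 100.0 * (2.0 * weighted / (n * total) - (n + 1.0) / n)


# Toy inputs (illustrative only, not data from the paper).
two_equal_groups = fractionalization([0.5, 0.5])
one_group = fractionalization([1.0])
equal_incomes = gini([1, 1, 1, 1])
one_earner = gini([0, 0, 0, 1])
```

With a finite sample of n units, the discrete Gini formula reaches at most 100 * (n - 1) / n, so the value of 100 in the verbal definition is only approached as n grows.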


Table 4: Estimation Results of the MTMM Analyses (note D).

                          Method loading                   Trait loading
                          ELF        EPR        RQ         T2         T3         T4
                          λi5        λi6        λi7        λi2        λi3        λi4

1. Public Goods
  Social System
    λ2j                   0.430***   0.397***   0.661***   0.706***   0.513***   0.616***
    λ3j                   0.423***   0.391**    0.656***   0.695***   0.506***   0.612***
    λ4j                   0.448***   0.440***   0.654***   0.735***   0.569***   0.609***
    Mean                  0.434      0.409      0.657      0.712      0.529      0.612
  Infrastructure
    λ2j                   0.389***   0.326**    0.635***   0.639***   0.422***   0.592***
    λ3j                   0.333**    0.312**    0.526***   0.546***   0.404***   0.490***
    λ4j                   0.228*     0.149**    0.461***   0.374***   0.193***   0.429***
    Mean                  0.317      0.262      0.541      0.520      0.340      0.504

2. Politics
    λ2j                   0.476***   0.515***   0.685***   0.786***   0.663***   0.643***
    λ3j                   0.479***   0.512***   0.690***   0.782***   0.667***   0.638***
    λ4j                   0.504***   0.564***   0.722***   0.827***   0.730***   0.673***
    Mean                  0.486      0.530      0.699      0.798      0.687      0.651

3. Geography
    λ2j                   0.179**    0.105      0.376***   0.293**    0.136      0.350**
    λ3j                   0.338**    0.294**    0.572***   0.555***   0.380***   0.533***
    λ4j                   0.428**    0.405**    0.650***   0.546***   0.404***   0.490***
    Mean                  0.315      0.268      0.533      0.465      0.307      0.458

4. Socioecon. Development
  Values & Norms
    λ2j                   0.480***   0.420***   0.695***   0.733***   0.490***   0.623***
    λ3j                   0.304**    0.259**    0.458***   0.499**    0.335**    0.427***
    λ4j                   0.459***   0.447***   0.686***   0.754***   0.579***   0.639***
    Mean                  0.414      0.375      0.613      0.662      0.468      0.563
  Wealth
    λ2j                   0.302*     0.232**    0.492***   0.496      0.300      0.459
    λ3j                   0.447***   0.379***   0.668***   0.733      0.490      0.623
    λ4j                   0.449***   0.439***   0.681***   0.737      0.568      0.634
    Mean                  0.399      0.350      0.614      0.655      0.453      0.572

D: *, ** and *** denote significance at the 10, 5 and 1 % level (two-tailed).
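The "Mean" rows in Table 4 can be checked against the individual loadings. The sketch below assumes that each reported mean is the unweighted arithmetic mean of the three loadings in its column, which the table does not state explicitly. The values are copied from the "Social System" block; the variable and function names are hypothetical.

```python
# Loadings for "1. Public Goods / Social System", copied from Table 4.
COLUMNS = ["ELF", "EPR", "RQ", "T2", "T3", "T4"]
SOCIAL_SYSTEM = [
    [0.430, 0.397, 0.661, 0.706, 0.513, 0.616],  # row λ2j
    [0.423, 0.391, 0.656, 0.695, 0.506, 0.612],  # row λ3j
    [0.448, 0.440, 0.654, 0.735, 0.569, 0.609],  # row λ4j
]
REPORTED_MEANS = [0.434, 0.409, 0.657, 0.712, 0.529, 0.612]


def column_means(rows):
    """Unweighted arithmetic mean of each column."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


means = column_means(SOCIAL_SYSTEM)

# Largest absolute gap between recomputed and reported means;
# anything at or below rounding error (about 0.0005) counts as agreement.
max_gap = max(abs(m - r) for m, r in zip(means, REPORTED_MEANS))

for name, m, r in zip(COLUMNS, means, REPORTED_MEANS):
    print(f"{name}: recomputed {m:.4f}, reported {r:.3f}")
```

Under this assumption, the recomputed values agree with the reported means up to rounding in the third decimal.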


Table 5: Sample MTMM Covariance Matrices for the six Analytical Contexts (T1 stands for ‘Ethnicity’, abbreviated due to space constraints). Lower triangles are shown.

1.a Public Goods- Social System

       ELFT1   ELFT2   ELFT3   ELFT4   EPRT1   EPRT2   EPRT3   EPRT4   RQT1    RQT2    RQT3    RQT4
ELFT1  0.047
ELFT2  0.142   0.587
ELFT3  0.130   0.392   0.512
ELFT4  0.149   0.447   0.412   0.603
EPRT1  0.030   0.090   0.083   0.095   0.038
EPRT2  0.075   0.224   0.207   0.235   0.093   0.610
EPRT3  0.069   0.206   0.190   0.217   0.086   0.213   0.534
EPRT4  0.086   0.258   0.238   0.271   0.108   0.266   0.245   0.648
RQT1   0.028   0.085   0.078   0.089   0.010   0.024   0.023   0.028   0.047
RQT2   0.083   0.248   0.229   0.261   0.029   0.071   0.066   0.082   0.137   0.518
RQT3   0.079   0.238   0.220   0.250   0.028   0.069   0.063   0.079   0.132   0.386   0.485
RQT4   0.083   0.251   0.231   0.263   0.029   0.072   0.066   0.083   0.139   0.405   0.390   0.542

1.b Public Goods- Infrastructure

       ELFT1   ELFT2   ELFT3   ELFT4   EPRT1   EPRT2   EPRT3   EPRT4   RQT1    RQT2    RQT3    RQT4
ELFT1  0.047
ELFT2  0.104   0.380
ELFT3  0.077   0.171   0.277
ELFT4  0.051   0.112   0.083   0.243
EPRT1  0.030   0.067   0.049   0.032   0.038
EPRT2  0.049   0.110   0.081   0.053   0.062   0.406
EPRT3  0.043   0.095   0.071   0.046   0.054   0.089   0.338
EPRT4  0.019   0.042   0.031   0.020   0.024   0.039   0.034   0.295
RQT1   0.028   0.063   0.046   0.030   0.010   0.016   0.014   0.006   0.047
RQT2   0.066   0.146   0.108   0.071   0.023   0.038   0.033   0.014   0.110   0.362
RQT3   0.043   0.095   0.071   0.046   0.015   0.025   0.022   0.009   0.072   0.167   0.240
RQT4   0.037   0.083   0.062   0.040   0.013   0.021   0.019   0.008   0.062   0.145   0.095   0.243

2. Politics

       ELFT1   ELFT2   ELFT3   ELFT4   EPRT1   EPRT2   EPRT3   EPRT4   RQT1    RQT2    RQT3    RQT4
ELFT1  0.047
ELFT2  0.328   2.646
ELFT3  0.331   2.308   2.670
ELFT4  1.952   13.606  13.730  85.404
EPRT1  0.030   0.210   0.212   1.248   0.038
EPRT2  0.197   1.371   1.384   8.159   0.247   2.396
EPRT3  0.193   1.346   1.358   8.007   0.242   1.583   2.338
EPRT4  0.925   6.445   6.504   38.342  1.159   7.578   7.438   42.983
RQT1   0.028   0.197   0.199   1.173   0.010   0.065   0.063   0.303   0.047
RQT2   0.183   1.276   1.288   7.592   0.064   0.418   0.410   1.964   0.305   2.330
RQT3   0.187   1.303   1.315   7.752   0.065   0.427   0.419   2.005   0.311   2.015   2.383
RQT4   1.345   9.370   9.456   55.748  0.469   3.069   3.011   14.420  2.239   14.489  14.794  109.914

3. Geography

       ELFT1   ELFT2   ELFT3   ELFT4   EPRT1   EPRT2   EPRT3   EPRT4   RQT1    RQT2    RQT3    RQT4
ELFT1  0.047
ELFT2  0.018   0.051
ELFT3  0.070   0.028   0.222
ELFT4  0.122   0.048   0.183   0.442
EPRT1  0.030   0.012   0.045   0.078   0.038
EPRT2  0.007   0.003   0.010   0.018   0.008   0.075
EPRT3  0.036   0.014   0.054   0.094   0.045   0.010   0.271
EPRT4  0.068   0.027   0.102   0.177   0.086   0.019   0.103   0.489
RQT1   0.028   0.011   0.042   0.074   0.010   0.002   0.012   0.022   0.047
RQT2   0.013   0.005   0.019   0.034   0.005   0.001   0.005   0.010   0.021   0.045
RQT3   0.044   0.017   0.066   0.114   0.015   0.003   0.018   0.035   0.073   0.033   0.208
RQT4   0.072   0.028   0.107   0.186   0.025   0.006   0.030   0.057   0.119   0.055   0.186   0.407

4.a Socio-Economic Development- Norms & Values

       ELFT1   ELFT2   ELFT3   ELFT4   EPRT1   EPRT2   EPRT3   EPRT4   RQT1    RQT2    RQT3    RQT4
ELFT1  0.047
ELFT2  1.030   25.732
ELFT3  0.035   0.775   0.069
ELFT4  0.306   6.704   0.230   2.453
EPRT1  0.030   0.658   0.023   0.196   0.038
EPRT2  0.372   8.133   0.280   2.419   0.466   13.431
EPRT3  0.020   0.436   0.015   0.130   0.025   0.308   0.107
EPRT4  0.160   3.506   0.121   1.043   0.201   2.481   0.133   2.175
RQT1   0.028   0.619   0.021   0.184   0.010   0.122   0.007   0.053   0.047
RQT2   0.521   11.401  0.392   3.391   0.182   2.247   0.120   0.968   0.868   18.204
RQT3   0.019   0.425   0.015   0.127   0.007   0.084   0.004   0.036   0.032   0.596   0.066
RQT4   0.179   3.923   0.135   1.167   0.063   0.773   0.041   0.333   0.299   5.499   0.205   2.225

4.b Socio-Economic Development- Wealth

       ELFT1   ELFT2   ELFT3   ELFT4   EPRT1   EPRT2   EPRT3   EPRT4   RQT1    RQT2    RQT3    RQT4
ELFT1  0.047
ELFT2  0.032   0.058
ELFT3  0.360   0.248   3.543
ELFT4  0.221   0.152   1.688   1.328
EPRT1  0.030   0.021   0.230   0.141   0.038
EPRT2  0.018   0.012   0.137   0.085   0.023   0.109
EPRT3  0.143   0.099   1.095   0.674   0.180   0.108   2.492
EPRT4  0.114   0.078   0.869   0.534   0.143   0.085   0.680   1.141
RQT1   0.028   0.019   0.216   0.133   0.010   0.006   0.047   0.037   0.047
RQT2   0.018   0.012   0.135   0.083   0.006   0.004   0.030   0.023   0.030   0.047
RQT3   0.200   0.138   1.526   0.938   0.070   0.042   0.333   0.264   0.333   0.209   2.950
RQT4   0.136   0.094   1.042   0.641   0.048   0.029   0.227   0.180   0.227   0.142   1.605   1.314
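Readers who wish to re-use the matrices in Table 5 need the full symmetric form rather than the printed lower triangle. The following sketch shows one way to rebuild it and to convert covariances into correlations. It assumes the values are listed row by row, starting with the first diagonal element; the function names are hypothetical, and only the 4 x 4 ELF block of matrix 1.a is used as input.

```python
import math


def lower_triangle_to_symmetric(values, size):
    """Fill a size x size symmetric matrix from row-wise lower-triangle values."""
    expected = size * (size + 1) // 2
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    matrix = [[0.0] * size for _ in range(size)]
    stream = iter(values)
    for i in range(size):
        for j in range(i + 1):
            value = next(stream)
            matrix[i][j] = value
            matrix[j][i] = value  # mirror into the upper triangle
    return matrix


def covariance_to_correlation(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


# First four rows of matrix 1.a (ELF measured with T1 to T4), from Table 5.
elf_block = [
    0.047,
    0.142, 0.587,
    0.130, 0.392, 0.512,
    0.149, 0.447, 0.412, 0.603,
]
cov = lower_triangle_to_symmetric(elf_block, 4)
corr = covariance_to_correlation(cov)
```

The same two functions apply to a complete 12 x 12 matrix when all 78 lower-triangle values are passed with `size=12`.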


FIGURES

                              Construct Validity
                              High                                     Low
Method Specificity   High     Mixed results, unclear interpretation    Method driven results
                     Low      1st best                                 No inference possible

FIGURE 1: Relationship of an Index’ Measurement Properties.

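FIGURE 2: MTMM Path Diagram for four correlated Trait Factors and three uncorrelated Method Factors.

[Path diagram. Indicators: X1 to X12. Trait factors ξ1 to ξ4, with the loadings shown in the diagram: λ11, λ51, λ91 (on ξ1); λ22, λ62, λ10,2 (on ξ2); λ33, λ73, λ11,3 (on ξ3); λ44, λ84 (on ξ4). Method factors ξ5 to ξ7, with the loadings shown in the diagram: λ15, λ25, λ35, λ45 (on ξ5); λ56, λ66, λ76, λ86 (on ξ6); λ97, λ10,7, λ11,7 (on ξ7).]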
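The loading structure in Figure 2 can be written as a 12 x 7 pattern matrix in which each indicator loads on exactly one trait factor and one method factor. The sketch below assumes the ordering suggested by the loading labels in the figure: X1 to X4 belong to the first method factor (ξ5), X5 to X8 to the second (ξ6), and X9 to X12 to the third (ξ7), with traits cycling within each method. The function name is hypothetical, and a 1 marks a freely estimated loading while a 0 marks a loading fixed to zero.

```python
N_TRAITS = 4   # trait factors ξ1 to ξ4
N_METHODS = 3  # method factors ξ5 to ξ7


def loading_pattern(n_traits=N_TRAITS, n_methods=N_METHODS):
    """Return the indicator-by-factor pattern of free (1) and fixed (0) loadings."""
    n_indicators = n_traits * n_methods
    n_factors = n_traits + n_methods
    pattern = [[0] * n_factors for _ in range(n_indicators)]
    for k in range(n_indicators):
        trait_col = k % n_traits               # X1, X5, X9 -> ξ1, and so on
        method_col = n_traits + k // n_traits  # X1..X4 -> ξ5, X5..X8 -> ξ6, X9..X12 -> ξ7
        pattern[k][trait_col] = 1
        pattern[k][method_col] = 1
    return pattern


pattern = loading_pattern()

# Print one row per indicator, e.g. "X1  1 0 0 0 1 0 0".
for k, row in enumerate(pattern, start=1):
    print(f"X{k:<3}" + " ".join(str(v) for v in row))
```

The pattern only encodes which loadings are free. The correlations among trait factors and the zero correlations among method factors named in the caption would have to be specified separately in whatever estimation software is used.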
