
The Effect of Air Pollution on China’s Internal Migration

Wenbo Li1

University of Notre Dame

March 7, 2019

Abstract

Do people in China move from more polluted cities to less polluted cities? To answer this

question, I merge the air pollution data of Chinese cities from 2003 to 2016 with the migration data from a

nationally representative sample. I estimate a fixed effect model to study the effect of air pollution in the

origin on out-migration and a conditional logit model to study the effect of air pollution on location

choice. In both models, I employ air pollution from distant sources as an instrument for local air pollution

to address the potential concern that air pollution is endogenous to local economic activities. I find that a

one-standard-deviation increase in the average Air Quality Index (AQI) increased the probability of

having a migrant by a sizable 29 percentage points. A one-year increase in the household head’s

education increased the marginal effect of the average AQI by 1 percentage point. Moreover, I find that

people were less likely to choose a city with more air pollution.

1 I thank my advisor, Abigail Wozniak, for her advice and support.


I. Introduction

Air pollution in China is severe and harms public health. Among the 20 cities with the

worst air quality in the world in 2016, four are located in China (World Health Organization, 2016).

China’s increased industrial activities and rising number of automobiles are the two major

emission sources to blame. A coal-rich country, China primarily relies on coal for its electricity

generation and winter heating, which further aggravates the problem with its air quality. Air

pollution is now recognized as an increasing concern that affects the cardiopulmonary health of

people living in China (Brandt and Rawski 2008). The 2010 Global Burden of Disease Study

suggests that air pollution is the 4th leading health risk factor for Chinese people (Yang et al.

2013).

In addition to studying the impact of air pollution on health outcomes, the past literature

has also focused on the impact of air pollution on workers’ productivity (Zivin et al. 2012; Li et

al. 2015), academic outcomes (Currie et al. 2009; Stafford 2015), and housing prices (Zheng et

al. 2010; Zheng and Kahn 2013), but air pollution can also create distortions to other economic

activities, such as migration. There are two reasons why ignoring the effect on migration may

understate the true impact of air pollution. First, if people do move from more polluted cities to

less polluted cities, migration will have reduced the impact of air pollution shocks on people’s

health outcomes. This can imply that a country such as China should relax its institutional

restrictions on migration to facilitate migration’s function of coping with air pollution. This also

implies that, if residents already have migration as an alternative way to cope with air pollution,

they may press local authorities less hard to control air pollution. Second, the forced


displacement of people as a result of air pollution may create a distortion to the population

distribution across regions in China. This distortion may also vary by education, income, etc.

In this paper, I address the following two questions. First, did people move from more

polluted cities to less polluted cities? In particular, with city-level air pollution data from 2003 to

2016 and migration survey data of a nationally representative sample, I empirically uncover the

effect of air pollution in the origin on out-migration and the effect of air pollution on location

choice. Second, did certain types of people respond more to air pollution by migrating? In

particular, I study the heterogeneous effects across education and income groups, and for

families with or without children.

To study these questions, I calculate the annual average of the Air Pollution Index (API)

or the Air Quality Index (AQI) in each city and use it as the measure for air pollution levels. I

first estimate a fixed-effect model to examine the effect of the average AQI in the origin on the

probability that a household had a migrant. I also estimate a conditional logit model to study the

effect of air pollution on location choice. In both cases, I use instrumental variables (IV)

strategies to address the potential concern that air pollution is endogenous to local economic

activities, using air pollution from distant sources as the instrument for local air pollution. The

advantage of the choice model is that it allows me to explicitly describe an individual’s utility; it

also captures the relative characteristics of a place, and thus allows for a role for both the origin

and the destination. To incorporate an IV framework in the conditional logit model, I estimate

the conditional logit model through Generalized Method of Moments (GMM).

There are reasons why air pollution in China may or may not have an effect on

migration. On the one hand, migration provides an air quality arbitrage opportunity and allows

migrants to accumulate additional health capital. On the other hand, it is possible that air


pollution does not affect people’s decision making in a developing country such as China

because the income level there is not sufficiently high. I find that air pollution did cause

out-migration. In particular, a one-standard-deviation increase in the average AQI increased the

probability of having a migrant by a sizable 29 percentage points. I also find that more educated

households were more responsive to air pollution. In particular, a one-year increase in the

household head’s education increased the marginal effect of the average AQI by 1 percentage

point. Moreover, I find that people were less likely to choose a city with more air pollution.

My paper contributes to the existing literature in two ways. First, my paper explicitly

identifies whether air pollution caused out-migration. In the existing literature, Qin and Zhu

(2015) have investigated the short-term impact of air pollution in China on people’s interest in

emigration. They find that searches for “emigration” on a Chinese online search engine

grow by approximately 2-5 percent the next day when today’s AQI rises by 100 points. Since

migration is ultimately a long-term decision, however, a rise in a person’s short-term sentiment

found in Qin and Zhu (2015) regarding migration might not translate into a migration episode.

Another set of previous literature has studied the long-term migration responses to air pollution

in the context of the United States. Banzhaf and Walsh (2008) find that the introduction of a

polluting facility causes individuals to leave the neighborhood and that the exit of a polluting

facility causes them to enter. Sullivan (2017) finds that neighborhoods with improved air quality

see a significant decrease in low-income residents. To the best of my knowledge, only two

papers, Chen, Oliva, and Zhang (2017) and Li et al. (2017), have studied the long-term effect of

air pollution on internal migration in China. Chen, Oliva, and Zhang (2017) use thermal

inversion as an instrument for air pollution, and find that air pollution reduced the population in a

given county and reduced in-migration of floating migrants. However, since they do not directly


observe out-migration, they cannot distinguish between whether air pollution caused out-

migration or whether it “only” determined destination once an individual had already decided to

migrate. Since I explicitly observe whether a household sent out a migrant in the survey data, I

am able to make this distinction. Another key difference is that Chen, Oliva, and Zhang (2017)

use remotely sensed data, while I adopt ground-based monitor readings. Although remotely

sensed data have been shown to measure air quality sufficiently well (Kumar et al., 2011),

ground-based monitoring stations provide accurate city-level air pollution readings for the

same period that my survey data cover.

Second, my survey data allow me to study the heterogeneous effects of

air pollution on migration across income groups and for families with or without children. The

previous literature, however, has only examined the heterogeneous effects of air pollution across

skill groups (Chen and Rosenthal 2008; Chen, Oliva, and Zhang, 2017).

II. Data

A. MEP/Air Quality

My air pollution data are published by the Ministry of Environmental Protection of the

People’s Republic of China (MEP), but come from two sources. The first source reports

historical air pollution data with API being the measure for air pollution and covers 2000-2013.

To reduce the negative health impacts of air pollution and encourage pollution monitoring, the

MEP enacted the Ambient Air Quality Standard (GB3095-1996) in 1996 and started disclosing

air pollution information. According to the Standard, around 86 selected cities, including all

provincial municipalities and provincial capitals, were required to report the daily API, which is


a normalized index transformed from three pollutant concentrations, 𝑃𝑀10, sulfur dioxide (𝑆𝑂2),

and nitrogen dioxide (𝑁𝑂2). Some of the 𝑃𝑀10 particulates are emitted directly from a source,

such as construction sites, unpaved roads, fields, smokestacks, or fires. Most particulates form in

the atmosphere as a result of complex reactions of chemicals such as 𝑆𝑂2 and nitrogen oxides

(𝑁𝑂𝑥), which are pollutants emitted from power plants, industries, and automobiles.

The API ranges from 0 to 500, with a larger number indicating worse air quality. It is

classified into six levels of air quality: excellent for API≤50, good for 51≤API≤100, lightly

polluted for 101≤API≤150, moderately polluted for 151≤API≤200, heavily polluted for

201≤API≤300, and severely polluted for 301≤API≤500. Figure Ia reports the fraction of

city-days in each category for all cities monitored from June 4th, 2000 to December 29th, 2013.

The most represented categories are good, excellent, and lightly polluted. In total, 2 percent of

city-days are moderately polluted, heavily polluted, or severely polluted. In addition to reporting

the daily API, the selected cities were also required to report the primary pollutant (𝑃𝑀10, 𝑆𝑂2,

or 𝑁𝑂2), potential health effects, and a cautionary statement for specific sensitive groups of

people. Among all three pollutants, 𝑃𝑀10 is the main pollutant for 91 percent of city-days.
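The six-level classification above can be expressed as a small lookup. The following is a minimal sketch for illustration only; the function names `classify_api` and `category_shares` are hypothetical and are not part of the MEP data or of the analysis in this paper.

```python
from bisect import bisect_left
from collections import Counter

# Upper bound of each API category, taken from the classification in the text.
_API_UPPER_BOUNDS = [50, 100, 150, 200, 300, 500]
_API_LABELS = [
    "excellent",
    "good",
    "lightly polluted",
    "moderately polluted",
    "heavily polluted",
    "severely polluted",
]


def classify_api(api: int) -> str:
    """Return the air quality category for an integer API reading in [0, 500]."""
    if not 0 <= api <= 500:
        raise ValueError(f"API must lie in [0, 500], got {api}")
    # bisect_left finds the first category whose upper bound is >= api.
    return _API_LABELS[bisect_left(_API_UPPER_BOUNDS, api)]


def category_shares(readings):
    """Return the fraction of city-day readings that fall in each category."""
    counts = Counter(classify_api(r) for r in readings)
    total = sum(counts.values())
    return {label: counts[label] / total for label in _API_LABELS}
```

Applied to a list of daily readings pooled across cities, `category_shares` would give the kind of city-day distribution that Figure Ia reports.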

All the real-time API readings on the monitoring stations were submitted to the MEP’s

Data Center and were disclosed to the public on the MEP’s official website. In addition to

accessing this air pollution information on the MEP’s official website, the public could also

obtain it through other websites, smart phone apps, and social networks. The information was

also available through more traditional media sources such as newspapers, TV, and radio.

These disclosed API readings constitute the first source of my air pollution data. The data

were collected by gracecode.com, a third party that web-scraped the MEP’s Data Center and


compiled the historical pollution information.2 The data contain the daily API and the main

pollutant for all Chinese cities required to report the air pollution information from June 4th, 2000

to December 29th, 2013. These data are used in the analysis of the effect of air pollution on

location choice.

The second source of my air pollution data uses AQI as the measure of air pollution and

covers a more recent time period. On February 29th, 2012, the Ambient Air Quality Standard

(GB3095-1996) of 1996 was replaced with the Ambient Air Quality Standard (GB3095-2012).

The new Standard introduced the AQI as a new measure of air quality. Although the new index

is classified similarly to the old index, the AQI is calculated using a different formula and

covers a broader set of pollutants, including 𝑆𝑂2, 𝑁𝑂2, 𝑃𝑀10, 𝑃𝑀2.5, 𝑂3, and 𝐶𝑂.

Therefore, I do not make direct comparisons between days measured by the API and those

measured by the AQI. Figure Ib reports the distribution of city-days in each category for all cities

monitored from May 13th to June 12th of both 2014 and 2016. The AQI as well as the

concentration of the full set of pollutants were published by the China National Environmental

Monitoring Center, a subsidiary of the MEP, and web-scraped by beijingair.sinaapp.com, another

third party that compiled the historical air pollution data. I use these data for my analysis on out-

migration.

Several papers have examined the reliability of the API data published by the MEP.

Andrews (2008) is the first to question the accuracy of these API data. Noting that a

national air pollution standard3 is used to evaluate local governments, he argues that the reported

improvements in air quality for 2006-2007 over 2002 levels can be attributed to 1) a shift in

2 The MEP does not provide compiled historical air pollution data. The information released online was retracted after a period of disclosure.

3 The national standard for a “blue-sky day” is an API below 100. The number of “blue-sky days” factors into the evaluation of local cadres, so the potential manipulation around the cutoff of 100 forms the basis of the controversy.


reported 𝑃𝑀10 levels from just above to just below the national standard, and 2) a shift of

monitoring stations in 2006 to less polluted areas. Ghanem and Zhang (2013) and Chen et al.

(2013) similarly document the manipulation of reported air pollution data by the local

governments by noticing a significant discontinuity at the threshold of the national standard.

Nevertheless, the API data published by the MEP convey useful information and should in

general be reliable. Comparing the published API with visibility reported by the

China Meteorological Administration and with Aerosol Optical Depth (AOD) from NASA satellites,

Chen et al. (2013) find that the API is significantly correlated with both

alternative measures of air pollution. Furthermore, since the transition to the AQI is relatively

new, to the best of my knowledge, no study has questioned the reliability of the AQI.

B. CLDS

I use the information on migration from the China Labor-force Dynamics Survey (CLDS). The

CLDS is a rotating panel that started in 2012 and consists of a nationally representative sample

of individuals, households, and counties from 29 provinces and provincial-level municipalities.

The individuals, households, and counties were interviewed every other year from June to

August of the designated survey year.

For my analysis on out-migration, I use the panel component of the 2014 and 2016

CLDS, i.e. 7,744 households that were interviewed in both survey years, because these are the

two years when the sample contains information on out-migration. In particular, I derive the

migration status of each family member from the survey question: “Is this person currently living

at home?” I define people not living at home as migrants if they lived away due to one of the

following reasons: 1) going away long-term for work, 2) going away for school, 3) going away


for long-term visits to families or friends, 4) enlisting in the military, and 5) going abroad. A few

types of people away from home, however, are excluded from this definition of migrants; they

are: middle school and elementary school students living in boarding schools, people on

short-term trips for business or for pleasure, and people on short-term trips visiting families or friends.

Among the people living away from home, only those living in a district/county different from

that of the home being surveyed are counted as migrants. The basic information regarding the

family members away from home, i.e. the migrants, was collected from the family member

living at home at the time of the survey. For the people away from home, their mean age was 28;

43 percent of them were never married; only 8 percent of them were the household heads of their

households. These summary statistics suggest that the migrants defined by this survey question

mostly consist of the children of the households. This fact bears importance in my analysis of

out-migration. For instance, as reported in Table I, the households with migrants on average had

more children and less income compared to households without migrants; on average, the

household heads in the households with migrants were also older, less educated, and more likely

to have rural Hukou.

To study the effect of air pollution on location choice, I use the full sample of 23,594

individuals from the 2014 CLDS. In particular, the individual-level data of the 2014 CLDS

contain the complete migration history of each individual in the sample. The migration history

includes the destination city, the year, and the primary reason for each migration episode. I use

this piece of information to retrospectively construct a panel of individual location choice in each

year from 2004 to 2014. For an individual who moved more than once in a given year, I take

his/her location after the last move as his/her location in that year. Most people (around

82 percent) never migrated until being interviewed in 2014; about 13 percent of individuals


migrated once; about 5 percent of individuals migrated more than once. There are 219 cities

represented by the location choices of this retrospective panel. I allow every individual to choose

each year from these 219 cities. Although there are a total of 287 cities in China, under the

Independence of Irrelevant Alternatives (IIA) assumption, excluding the other 68 cities in the

conditional logit model does not affect the consistency of my estimates, because excluding these

cities does not affect the relative probability between a chosen city and an unchosen city within

this 219-city set, given that these 219 cities were chosen randomly by the sampling design of the

CLDS.
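The retrospective panel described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used for the paper: the function name `build_location_panel`, the `initial_city` argument (the location before any recorded move), and the representation of each migration episode as a `(year, destination_city)` pair listed in the order reported are all hypothetical.

```python
def build_location_panel(initial_city, episodes, first_year=2004, last_year=2014):
    """Return {year: city} for every year in [first_year, last_year].

    initial_city: where the individual lived before any recorded move
                  (assumed to be known for this sketch).
    episodes:     list of (year, destination_city) tuples in reported order.
    """
    # sorted() is stable, so moves within the same year keep their reported
    # order and the last move of a year determines that year's location.
    ordered = sorted(episodes, key=lambda episode: episode[0])
    panel = {}
    current = initial_city
    idx = 0
    for year in range(first_year, last_year + 1):
        # Apply every move made up to and including this year.
        while idx < len(ordered) and ordered[idx][0] <= year:
            current = ordered[idx][1]
            idx += 1
        panel[year] = current
    return panel
```

An individual with no recorded episodes simply stays in `initial_city` for all eleven years, which corresponds to the roughly 82 percent of the sample who never migrated.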

C. Merging MEP/CLDS

The first part of my analysis focuses on the effect of air pollution in the origin on out-

migration. For this part of the analysis, I merge the AQI data from the MEP in 2014 and 2016

with the panel component of the household-level data from the 2014 and 2016 CLDS. As

mentioned above, the surveys of the 2014 and 2016 CLDS were conducted from June to August

of 2014 and 2016, and given that the AQI data are available only since May 13th, 2014, I

calculate the average AQI of each monitored city in the month following May 13th of 2014 and

2016 (i.e. 05/13/2014-06/12/2014 and 05/13/2016-06/12/2016) and let this average be my city-

level air pollution measure for that year. In doing so, I focus on the effect of air pollution in May

and June on the out-migration status at the time of the survey in the same year. The advantage of

limiting my analysis to this short window is that it allows for a consistent measure of air

pollution and that it allows me to exploit the panel component of the CLDS. The disadvantage of

adopting this short window is that air pollution in China displays seasonal patterns, so air

pollution in May and June may not be representative of air pollution in other months. This


disadvantage affects the interpretation of my results, but since I am only using the within-

household variation in out-migration, it does not affect the consistency of my coefficient

estimates. Figure II shows the average daily AQI in the month following May 13th, 2014 across

four major cities.
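The window average described above can be sketched as follows. This is a minimal illustration, assuming that the daily data arrive as `(city, date, aqi)` records with missing readings stored as `None`; the function name `window_average_aqi` and the record layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def window_average_aqi(records, year, start=(5, 13), end=(6, 12)):
    """Average daily AQI per city between May 13th and June 12th of `year`.

    records: iterable of (city, day, aqi), where `day` is a datetime.date
             and `aqi` is a number or None for a missing reading.
    """
    lo, hi = date(year, *start), date(year, *end)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for city, day, aqi in records:
        # Keep only non-missing readings inside the window (inclusive).
        if aqi is not None and lo <= day <= hi:
            sums[city] += aqi
            counts[city] += 1
    return {city: sums[city] / counts[city] for city in sums}
```

Calling the function once with `year=2014` and once with `year=2016` would give the two city-level measures that are merged with the household panel.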

In the household-level CLDS sample, I consider the city in which a household was

located at the time of the survey as the origin for the potential migration of a member in the

household.4 The AQI data cover all 287 cities in 2016, but only 190 cities in 2014. This set of

190 cities monitored in 2014 does not perfectly overlap with the cities represented by the CLDS

household sample. To improve my ability to use the CLDS sample, I interpolate the average AQI

at un-monitored cities using the average AQI at monitored cities via ordinary kriging. Figure III

shows the prediction map of the interpolation,5 where the bubbles indicate the locations of the

monitored cities, and the size of the bubbles represents the severity of the air pollution at these

locations; the purple crosses illustrate the locations of 284 out of 287 cities in China; the

background color represents the AQI level at each point in China. The prediction map suggests

that the air pollution is more severe in northern and northwest China and less severe in northeast

China and southern China. This prediction is consistent with the geographical patterns of air

pollution in China because 1) the steel industry is concentrated in northern China around Beijing, and 2)

northwestern provinces such as Inner Mongolia and Xinjiang experience sandstorms each year in

the spring, which significantly increase particulate matter concentrations in these regions.

4 To protect the privacy of the interviewed households, the CLDS sample specifies the cities in which the households resided, but not the county.

5 The interpolation assigns a value to the un-monitored cities based on a weighted average of the average API of all monitored cities. With ordinary kriging, the weight placed on each monitored city is based not only on the distance between the monitored city and the interpolated city, but is also adjusted to de-cluster groups of monitored cities. See Cressie (2015) and Isaaks and Srivastava (1989) for a detailed description of interpolation using ordinary kriging.
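To make the interpolation step concrete, the sketch below implements ordinary kriging in plain Python. It is an illustration under loud assumptions rather than the procedure used for Figure III: the exponential variogram, its `sill` and `rng` parameters, the use of Euclidean distance between coordinates, and all function names are hypothetical choices. In practice the variogram would be fitted to the monitored cities.

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def variogram(h, sill=1.0, rng=5.0):
    """Assumed exponential semivariogram with no nugget."""
    return sill * (1.0 - math.exp(-h / rng))


def ordinary_kriging(points, values, target, sill=1.0, rng=5.0):
    """Predict the value at `target` from monitored `points` and `values`.

    Returns (prediction, weights). The weights sum to one by construction.
    """
    n = len(points)
    # Kriging system: variogram between monitored points, bordered by the
    # unbiasedness constraint (row and column of ones, zero in the corner).
    a = [
        [variogram(math.dist(points[i], points[j]), sill, rng) for j in range(n)]
        + [1.0]
        for i in range(n)
    ]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target), sill, rng) for p in points] + [1.0]
    solution = _solve(a, b)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    prediction = sum(w * z for w, z in zip(weights, values))
    return prediction, weights
```

Because the weights come from the full variogram matrix rather than from distance alone, two monitored cities that sit close together share weight instead of each counting fully, which is the de-clustering property mentioned in the footnote.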


The second part of my analysis focuses on the effect of air pollution on location choice.

For this part of the analysis, I calculate the average API of each potential destination city for

each year from 2003 to 2013. Figure IV shows the number of cities monitored for the API by the

MEP each day from June 4th, 2000 to December 29th, 2013. Since some individuals in the

retrospective panel chose un-monitored cities, I perform the same interpolation procedure as

mentioned above to impute the average annual API at un-monitored cities. To partially

demonstrate the validity of using the average API at monitored cities to impute the average API

at un-monitored cities, Table IIa and Table IIb compare the distribution of the AQI from

December, 2013 to February, 2015 in cities previously monitored for the API and that in cities

previously un-monitored for the API. Since the reporting of the AQI succeeded the reporting of

the API and the AQI was reported for a larger set of cities, comparing the AQI in later years

gives some sense of whether these two sets of cities are comparable. The lack of difference

between these two sets of cities lends credibility to using the API in the monitored cities to

interpolate the API in the un-monitored cities.

D. Weather

For my instrument, I obtain data on the prevailing wind directions at all 1,156 monitoring

stations across China from 1981 to 2010 from the China Meteorological Data Service Center. To

calculate the prevailing wind direction of each city, I assign the city in which each household

resided or each individual chose to its closest monitoring station. The wind could take 16

different directions, with each direction representing a span of 22.5°.
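The assignment described above can be sketched as follows. This is a minimal illustration with several assumptions that are not stated in the text: the 16 sectors are taken to be centred on the compass points N, NNE, and so on, proximity is measured by great-circle (haversine) distance, and the function names and the `{station_id: (lat, lon)}` layout are hypothetical.

```python
import math

SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def sector_index(bearing_deg):
    """Map a compass bearing in degrees to one of 16 sectors of 22.5 degrees."""
    # Shift by half a sector so that sector 0 (N) covers [348.75, 11.25).
    return int(((bearing_deg % 360.0) + 11.25) // 22.5) % 16


def prevailing_sector(frequency_by_sector):
    """Return the index of the most frequent of the 16 wind sectors."""
    return max(range(16), key=lambda k: frequency_by_sector[k])


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_station(city, stations):
    """Return the id of the station closest to `city`, given as (lat, lon)."""
    return min(stations, key=lambda s: haversine_km(*city, *stations[s]))
```

A city's prevailing wind direction would then be `prevailing_sector` applied to the 1981 to 2010 frequency counts of the station returned by `nearest_station`.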

E. City-Level Characteristics


To construct the second instrument, I obtain data on the dust (or soot) and 𝑆𝑂2 emission

levels for 290 Chinese cities in each year from 2003 to 2014 and in 2016 from the China City

Statistical Yearbook. I also obtain data on the number of unemployed, the per capita GRP, and

the gross industrial output value for these cities in 2014 and 2016 from the same Yearbook.

III. Methods

A. Out-migration and Identification Strategy

In the first part of my analysis, I estimate a linear probability model to study the effect of

air pollution in the origin on out-migration using the panel component of the 2014 and 2016

CLDS households. The estimated equation is as follows:

$$
\begin{aligned}
\mathit{SentMigrant}_{ict} ={} & \mathit{AverageAQI}_{ct}\,\beta_1 + \mathit{HeadEdu}_{ict}\,\beta_2 + \mathit{HeadEdu}_{ict} \times \mathit{AverageAQI}_{ct}\,\beta_3 + \mathit{Income}_{ict}\,\beta_4 \\
& + \mathit{Income}_{ict} \times \mathit{AverageAQI}_{ct}\,\beta_5 + \mathit{HaveChild}_{ict}\,\beta_6 + \mathit{HaveChild}_{ict} \times \mathit{AverageAQI}_{ct}\,\beta_7 \\
& + X_{ct}\,\beta_8 + \mu_i + \delta_t + \epsilon_{ict}
\end{aligned}
\tag{1}
$$

where $\mathit{SentMigrant}_{ict}$ is an indicator for having a migrant in household $i$ in year $t$; $\mathit{AverageAQI}_{ct}$ is the average AQI in year $t$ of city $c$, in which household $i$ was located; $\mathit{HeadEdu}_{ict}$ is the years of education of the head of household $i$ in year $t$; $\mathit{Income}_{ict}$ is the family income of household $i$ in year $t$; $\mathit{HaveChild}_{ict}$ is a dummy for having a child in household $i$ in year $t$; $X_{ct}$ is a vector of city-level controls including the unemployment rate, per capita GRP, and gross industrial output value; $\mu_i$ is the household fixed effect; $\delta_t$ is the year fixed effect with 2014 being the base year; and $\epsilon_{ict}$ is the error term. The CLDS surveys were conducted such that no household followed up in 2016 had moved between 2014 and 2016, so the household fixed effect also controls for the location-specific time-invariant characteristics.

I include the interaction terms between the average AQI and household head’s years of

education, family income, and the dummy for having a child. It is possible that more educated

people are more responsive to air pollution: skilled workers may view pollution purely as a

disamenity, whereas low-skilled workers may value the job opportunities created by polluting

factories, so it is less clear whether they view pollution as a disamenity. It is possible that

richer households are more responsive to air pollution, since, assuming that air quality is a

normal good, I expect richer households to consume more clean air by migrating. If only richer

people migrate in response to air pollution, migration can become a channel that exacerbates

quality-of-life inequality in China. It is also possible that households

with children are more responsive to air pollution, since children are more vulnerable to air

pollution, and parents may want to invest in their children’s health capital. Thus, parents may

place additional weight on air pollution when they decide whether or not to leave and where to

go.
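As a reading aid, differentiating Equation (1) with respect to the average AQI makes the role of each interaction coefficient explicit. This follows directly from the linear specification and is not an additional result:

$$
\frac{\partial\, \Pr(\mathit{SentMigrant}_{ict} = 1)}{\partial\, \mathit{AverageAQI}_{ct}} = \beta_1 + \beta_3\,\mathit{HeadEdu}_{ict} + \beta_5\,\mathit{Income}_{ict} + \beta_7\,\mathit{HaveChild}_{ict}
$$

Under this specification, $\beta_3$ is the change in the marginal effect of the average AQI associated with one more year of the household head's education, $\beta_5$ is the corresponding change per unit of family income, and $\beta_7$ is the difference in the marginal effect between households with and without a child.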

I aim to address a potential source of endogeneity: air pollution is endogenous to local

economic activities. The error term contains many unobserved determinants of out-migration

that capture how attractive a city was, and these can be correlated with air pollution levels.

For instance, if stronger local economic activity both raises pollution and discourages

out-migration, the OLS coefficient estimate for the average AQI in Equation (1) will be

downward biased. To address this concern, I instrument

the average AQI with air pollution from distant sources. The idea of using air pollution from

distant sources as an instrument for local air pollution first appears in Bayer et al. (2009). Bayer et

al. use a detailed source-receptor matrix developed for the United States Environmental

Protection Agency that relates emissions from nearly 6,000 sources to 𝑃𝑀10 in each county in

the U.S. to calculate the marginal willingness to pay for clean air in the U.S. Using this matrix,

they are able to calculate how much the pollution sources more than 80km away from a county

contributed to the 𝑃𝑀10 levels in that county. A similar IV strategy based on air pollution from

distant sources later appears in Zheng et al. (2015), who use it to study the effect of air pollution

on China’s housing prices. Since no source-receptor matrix like the one used in Bayer et al. (2009)

exists for China, I adopt the formulation of the instrument from Zheng et al. (2015):

$$
\mathit{NEIGHBOR}_{it} = \sum_{j} w_{ij} \cdot \text{smoke emission}_{jt} \cdot e^{-d_{ij}}, \qquad d_{ij} > 120\,\text{km}
\tag{2}
$$

where $w_{ij}$ is a dummy variable that takes the value of 1 if source city $j$ is located in the prevailing wind direction of receiving city $i$; $\text{smoke emission}_{jt}$ is city $j$'s emission level in year $t$; $d_{ij}$ is the distance between city $i$ and city $j$; and $e^{-d_{ij}}$ is a continuous, exponentially decreasing function of that distance, so the weight declines as the distance between city $j$ and city $i$ increases.

In constructing this instrument, I carry out the following procedure. First, I take the most

frequent wind direction of a city from 1981 to 2010 as the prevailing wind direction of that

city. Prevailing wind is recurring, and the most frequent wind direction from 1981 to 2010

should represent the prevailing wind direction in 2014 and 2016. Second, following Bayer et al.

(2009), I sum the amount of dust (or soot) and the amount of 𝑆𝑂2 (both measured in tons) and

use the total as the emission level of a source city, since both dust (or soot) and 𝑆𝑂2 factor into the

calculation of the AQI (and API). Since I observe the concentration of each pollutant along with

the AQI in 2014 and 2016, in Section V, I will instrument particulate matter concentration with


dust (or soot) from distant sources and 𝑆𝑂2 concentration with 𝑆𝑂2 from distant sources

separately as a robustness check. Third, I measure the distance 𝑑𝑖𝑗 by the degrees of longitude

and latitude.
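The construction of the instrument in Equation (2) can be sketched as follows. This is an illustration under assumptions that the text leaves open, not the code behind the estimates. In particular, it assumes that the distance entering the decay term is the Euclidean distance in degrees of latitude and longitude, that the 120km exclusion is checked by converting degrees at roughly 111 km per degree, and that a source city counts as lying in the prevailing wind direction when the bearing from the receiving city to the source falls in the receiving city's prevailing 22.5-degree sector. All function and parameter names are hypothetical.

```python
import math

KM_PER_DEGREE = 111.0  # rough conversion, an assumption of this sketch


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def sector_index(bearing):
    """Map a bearing to one of 16 sectors of 22.5 degrees, with 0 centred on north."""
    return int(((bearing % 360.0) + 11.25) // 22.5) % 16


def neighbor_pollution(receiver, sources, wind_sector, exclusion_km=120.0):
    """Distance-weighted emissions reaching `receiver` from upwind distant cities.

    receiver:    (lat, lon) of the receiving city i.
    sources:     list of (lat, lon, emission_tons) for source cities j in year t,
                 where emission_tons is dust (or soot) plus SO2.
    wind_sector: prevailing wind sector (0..15) of the receiving city.
    """
    total = 0.0
    for lat, lon, emission in sources:
        d = math.hypot(lat - receiver[0], lon - receiver[1])  # distance in degrees
        if d * KM_PER_DEGREE <= exclusion_km:
            continue  # too close: excluded from "distant sources"
        # w_ij = 1 only if the source lies in the prevailing wind sector.
        if sector_index(bearing_deg(receiver[0], receiver[1], lat, lon)) == wind_sector:
            total += emission * math.exp(-d)
    return total
```

Raising `exclusion_km` drops more nearby sources from the sum, which is the trade-off between instrument strength and exogeneity that motivates the choice of exclusion distance.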

The remaining step is to choose the exclusion distance within which the emissions do not

count toward air pollution from distant sources. An ideal distance would allow the instrument to

be correlated with local AQI but uncorrelated with local economic activities. Increasing this

distance would weaken both correlations, and decreasing this distance would strengthen both. To

choose a good exclusion distance, I summarize the correlation between air quality measures and

observable local economic activities variables in Table III. The air quality measures are in the

top row of Table III and include the AQI, pollution from sources > 50km, pollution from sources

> 80km, and pollution from sources > 120km. The local economic activities variables are in the

left-most column. An asterisk (*) indicates a coefficient that is statistically different from 0 at the 20 percent

level when the air quality measure is regressed on the city characteristic in a sample of cities. The

first observation is that, from Column (1), the AQI is highly correlated with observable local

economic activities variables. This is evidence that the OLS estimates of Equation (1) could be

biased. The second observation is that air pollution from distant sources, despite being highly

correlated with the AQI, is less correlated with local economic activities variables than the AQI

itself. I choose 120km as the exclusion distance to be consistent with the exclusion distance used

by Zheng et al. (2015).

Table IV reports the first stage. The first stage estimated on a sample of cities is strong,

with an F-Statistic of 36. The average AQI and air pollution from distant sources are both

normalized to z-scores with mean 0 and standard deviation 1. As expected, the average AQI was

increasing in air pollution from distant sources. On average, a one-standard-deviation increase in

air pollution from distant sources was associated with a 0.4-standard-deviation increase in the

average AQI.

B. Location Choice

For the second part of my analysis, I estimate a conditional logit model (McFadden,

1974) to study the effect of air pollution on location choice. This part of the analysis allows me

to exploit the long retrospective time span of the API data. The advantage of the choice model is

that it allows me to explicitly describe an individual’s utility; it also captures the relative

characteristics of a place, and thus allows for a role for both the origin and the destination. With

this model, the identification comes from the revealed preference of the individuals over

locations with different levels of air pollution. The model assumes that the error term follows an i.i.d.

type-1 extreme value distribution. Because the error terms are assumed to be independent across

alternatives, the model also implies IIA. In particular, the error terms for close-by locations are assumed to be

uncorrelated with one another. I estimate the following equation:

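$$U_{ijt} = \mathit{LocationAPI}_{ijt-1}\alpha + \mathit{Current}_{ijt-1}\beta + \mathit{Current}_{ijt-1} \times \mathit{LocationAPI}_{ijt-1}\gamma + \mathit{HaveBeen}_{ijt}\phi + \mathit{HaveBeen}_{ijt} \times \mathit{LocationAPI}_{ijt-1}\theta + \nu_{ijt} \tag{3}$$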

where $U_{ijt}$ is individual $i$’s utility of choosing city $j$ in year $t$; $\mathit{LocationAPI}_{ijt-1}$ is the average API of city $j$ in year $t-1$; $\mathit{Current}_{ijt-1}$ is a dummy for whether individual $i$ was located in city $j$ in year $t-1$; $\mathit{HaveBeen}_{ijt}$ is a dummy for whether individual $i$ had been to city $j$ by year $t$; and $\nu_{ijt}$ is the error term. I lag the average API by one year to avoid reverse causality. $\alpha$ is the effect of the average API in a city in a given year on the probability that the city was chosen in the following year. $\beta$ captures the propensity of an individual to stay where he/she was the year before. $\alpha + \gamma$ is the effect of the average API in the origin. $\phi$ captures the propensity of an individual to return to a city he/she had been to before. $\theta$ is the additional effect of the average API in a city he/she had been to on the probability that the city was chosen again. I expect that $\alpha < 0$, $\beta > 0$, and $\phi > 0$; since an individual might have better information regarding the air quality in places he/she had been to, I also expect $\gamma < 0$ and $\theta < 0$.
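As a minimal sketch of how the utility in Equation (3) maps into choice probabilities, the Python snippet below evaluates the deterministic part of the utility for a small choice set and converts it into conditional logit probabilities. The function name, the three-city choice set, and the coefficient values are hypothetical; the values are chosen only to match the expected signs and are not estimates from this paper.

```python
import numpy as np

def choice_probabilities(api, current, have_been, coefs):
    """Conditional logit probabilities for one individual-year.

    api, current, have_been: one entry per candidate city.
    coefs: dict with keys alpha, beta, gamma, phi, theta as in Equation (3).
    """
    api = np.asarray(api, dtype=float)
    current = np.asarray(current, dtype=float)
    have_been = np.asarray(have_been, dtype=float)
    utility = (coefs["alpha"] * api
               + coefs["beta"] * current
               + coefs["gamma"] * current * api
               + coefs["phi"] * have_been
               + coefs["theta"] * have_been * api)
    # Subtract the maximum before exponentiating for numerical stability.
    exp_utility = np.exp(utility - utility.max())
    return exp_utility / exp_utility.sum()

# Hypothetical coefficients with the expected signs (not estimates).
coefs = {"alpha": -0.05, "beta": 1.0, "gamma": -0.01, "phi": 0.5, "theta": -0.01}

# Three hypothetical cities that differ only in their lagged average API.
api = [60.0, 80.0, 100.0]
current = [0, 0, 0]
have_been = [0, 0, 0]
probs = choice_probabilities(api, current, have_been, coefs)
```

With all else equal and a negative coefficient on the lagged API, the cleaner city receives the larger choice probability, which is the revealed-preference logic behind the identification.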

Since the average API is endogenous to local economic activities, to instrument for the

average API, I estimate Equation (3) via GMM instead of maximum likelihood estimation

(MLE). Following Train (2009, p. 326), I derive the following moment condition:

$$\sum_{i,t} \sum_{j} \left( Y_{ijt} - \mathbb{P}(Y_{ijt} = 1 \mid X_{ijt}) \right) Z_{ijt} = 0$$

where $Y_{ijt}$ is 1 if individual $i$ was located in city $j$ in year $t$; $\mathbb{P}(Y_{ijt} = 1 \mid X_{ijt})$ is the conditional probability that individual $i$ was located in city $j$ in year $t$; $X_{ijt}$ are the regressors in Equation (3), including $\mathit{LocationAPI}_{ijt-1}$, $\mathit{Current}_{ijt-1}$, $\mathit{Current}_{ijt-1} \times \mathit{LocationAPI}_{ijt-1}$, $\mathit{HaveBeen}_{ijt}$, and $\mathit{HaveBeen}_{ijt} \times \mathit{LocationAPI}_{ijt-1}$; and $Z_{ijt}$ is the instrument. The moment condition has an intuitive construct: the observed mean of the instrument (i.e. $\sum_{i,t} \sum_{j} Y_{ijt} Z_{ijt}$) is equal to the mean predicted by the model (i.e. $\sum_{i,t} \sum_{j} \mathbb{P}(Y_{ijt} = 1 \mid X_{ijt}) Z_{ijt}$). Under the assumption that the error term has an i.i.d. type-1 extreme value distribution, $\mathbb{P}(Y_{ijt} = 1 \mid X_{ijt})$ has a closed-form solution (McFadden, 1974):

$$\mathbb{P}(Y_{ijt} = 1 \mid X_{ijt}) = \frac{e^{X_{ijt}}}{\sum_{k} e^{X_{ikt}}}$$

As in the analysis of out-migration, I instrument for $\mathit{LocationAPI}_{ijt-1}$ with air pollution from distant sources, while $\mathit{Current}_{ijt-1}$ and $\mathit{HaveBeen}_{ijt}$ serve as their own instruments. Thus, the model is exactly identified. In carrying out the GMM estimation, I use the coefficient estimates from estimating Equation (3) via MLE without the instrument as the initial values of the coefficients.
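The moment condition can be illustrated with a short Python sketch that computes one individual-year's contribution, the sum over cities of $(Y - \mathbb{P})Z$, for a given coefficient vector. The helper names, the three-city choice set, the two-regressor specification, and all numbers are hypothetical and serve only to show the mechanics; an exactly identified GMM estimator would search for the coefficients at which the contributions, summed over all individuals and years, equal zero.

```python
import numpy as np

def logit_probabilities(X, coefs):
    """Closed-form conditional logit probabilities over the rows (cities) of X."""
    index = X @ coefs
    exp_index = np.exp(index - index.max())  # stabilized softmax
    return exp_index / exp_index.sum()

def moment_contribution(y, X, Z, coefs):
    """Sum over cities of (Y - P) * Z for one individual-year."""
    p = logit_probabilities(X, coefs)
    return (y - p) @ Z

# Hypothetical choice set with regressors [LocationAPI, Current].
X = np.array([[60.0, 1.0],
              [80.0, 0.0],
              [100.0, 0.0]])
# Hypothetical instruments: distant-source pollution stands in for
# LocationAPI, while Current serves as its own instrument.
Z = np.array([[0.5, 1.0],
              [1.0, 0.0],
              [1.5, 0.0]])
y = np.array([1.0, 0.0, 0.0])  # the individual chose the first city

# With all coefficients at zero every city is equally likely, so the
# contribution is the chosen city's Z minus the average Z.
g_zero = moment_contribution(y, X, Z, np.zeros(2))
```

Replacing `Z` with `X` turns these moments into the first-order conditions of the conditional logit MLE, which is one reason the MLE estimates are a natural starting point for the GMM search.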

IV. Results

A. Out-Migration

Table IV reports the results for the effect of the average AQI in the origin on the probability that a household had a migrant. The OLS result in Column (1) suggests that a higher average AQI in the origin is associated with a lower probability of having a migrant. The negative sign of this coefficient estimate is expected: because the average AQI is endogenous to local economic activities, the OLS estimate is biased downward. Column (2) shows the result after I add household and year fixed effects. The fixed effects absorb all household-specific and time-specific characteristics, but they move the coefficient estimate only slightly toward zero (from -0.027 to -0.024). Column (3) reports the IV result, which suggests that a one-standard deviation increase in the average AQI increased the probability of having a migrant by 29 percentage points. Given that the mean probability of having a migrant is 32 percent, the 29-percentage-point increase suggested by the IV result is a sizable effect.
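The sign reversal between the OLS and IV columns can be reproduced in a stylized simulation. The Python sketch below is an illustration under stated assumptions, not the estimation code behind Table IV: the data are synthetic, the confounder `economy` is treated as unobserved, the outcome is continuous rather than binary, and the true effect (0.3) and first-stage slope (0.4) are arbitrary values that I picked for the example.

```python
import numpy as np

def ols_slope(y, x):
    """Slope from a regression of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def iv_slope(y, x, z):
    """Two-stage least squares with one endogenous regressor and one instrument."""
    Z = np.column_stack([np.ones_like(z), z])
    first_stage, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ first_stage          # fitted values from the first stage
    return ols_slope(y, x_hat), first_stage[1]

rng = np.random.default_rng(0)
n = 50_000
economy = rng.normal(size=n)   # local economic activity (confounder)
distant = rng.normal(size=n)   # pollution from distant sources (instrument)

# Local activity raises pollution but lowers out-migration.
aqi = 0.4 * distant + 0.8 * economy + 0.5 * rng.normal(size=n)
migration = 0.3 * aqi - 0.6 * economy + 0.5 * rng.normal(size=n)

ols = ols_slope(migration, aqi)
iv, first_stage = iv_slope(migration, aqi, distant)
```

In this setup the OLS slope is negative even though the true effect is positive, while the IV slope recovers a value close to 0.3 because the instrument is unrelated to the confounder.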

Table V reports the heterogeneous effects of the average AQI in the origin on the

probability that a household had a migrant across education and income groups, and for families

with or without children. The OLS results suggest that households with a less educated

household head were more likely to send out a migrant, and that households with children were


more likely to send out a migrant. These results may merely reflect that the out-migrants

recorded in the CLDS data are mostly the children of the household, so rural households with

less education and households with children were more likely to send their children away. After I

add household and year fixed effects, these associations go away, and I find an additional

association that households with children were less responsive to air pollution. Nevertheless, this

may simply be caused by the endogeneity of the average AQI. Indeed, this association goes away

after I instrument for the average AQI. The IV result also suggests that a one-year increase in the household head’s education increased the marginal effect of the average AQI by 1 percentage point, so more educated households were more responsive to air pollution.
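To make the heterogeneity concrete, the following Python sketch evaluates the implied effect of a one-standard deviation increase in the average AQI at different education levels, using the IV coefficients in Column (3) of Table V. The function name and the two example education levels are mine, and the calculation ignores the statistically insignificant interactions with income and having a child.

```python
# IV coefficients reported in Column (3) of Table V.
AQI_COEF = 0.238                # Average AQI
AQI_X_EDUCATION_COEF = 0.007    # Average AQI x Years of Education

def aqi_effect(years_of_education):
    """Implied change in the probability of having a migrant for a
    one-standard deviation increase in the average AQI."""
    return AQI_COEF + AQI_X_EDUCATION_COEF * years_of_education

effect_6_years = aqi_effect(6)     # 0.238 + 0.042 = 0.280
effect_12_years = aqi_effect(12)   # 0.238 + 0.084 = 0.322
```

Under these coefficients, six additional years of education raise the implied effect by about 4 percentage points.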

To explore whether the positive effect of the average AQI on the probability of having a

migrant masks any non-linearity, I add the quadratic term of the average AQI as well as the

interaction terms of the quadratic term and household head’s years of education, family income,

and having a child. Table VI shows the result including these terms. The quadratic terms are only

statistically significant for the interaction terms with family income and having a child in the

fixed effects regression, but not in the IV regression. Thus, I find no evidence of non-linear effects.

B. Location Choice

Table VII reports the coefficient estimates from estimating the conditional logit model in Equation (3). Although these are not the marginal effects, the signs of the coefficient estimates inform us of the signs of the marginal effects. Without instrumenting for $\mathit{LocationAPI}_{ijt-1}$, I find that the coefficient estimate for $\mathit{LocationAPI}_{ijt-1}$ is negative and statistically significant, indicating that people were less likely to choose a city with more air pollution. The coefficient estimates for $\mathit{HaveBeen}_{ijt}$ and $\mathit{HaveBeen}_{ijt} \times \mathit{LocationAPI}_{ijt-1}$ have the expected signs, indicating that an individual was more likely to choose a city if he/she had been to that city before, and that an individual was more responsive to air pollution if he/she had been to that city before, presumably due to better information regarding the air pollution in that city; however, the coefficient estimate for $\mathit{HaveBeen}_{ijt} \times \mathit{LocationAPI}_{ijt-1}$ is not statistically significant. The coefficient estimates for $\mathit{Current}_{ijt-1}$ and $\mathit{Current}_{ijt-1} \times \mathit{LocationAPI}_{ijt-1}$ have unexpected signs, although the former is not statistically significant; the positive sign of the latter may be due to the endogeneity of $\mathit{LocationAPI}_{ijt-1}$.

Column (2) reports the results after I instrument for $\mathit{LocationAPI}_{ijt-1}$ with air pollution from distant sources. The coefficient estimate for $\mathit{LocationAPI}_{ijt-1}$ is more negative, suggesting that an upward bias does exist if I do not instrument for $\mathit{LocationAPI}_{ijt-1}$. None of the coefficient estimates for the other regressors is statistically significant. Thus, people were less likely to choose a city with more air pollution, but I cannot conclude whether or not people were more responsive to air pollution in a city if they were in that city the year before or if they had been to that city before.

If air pollution did not cause out-migration but “only” determined destination once an

individual had already decided to migrate, I should only observe an effect (𝛼 < 0) in the

conditional logit, and I should not observe a positive effect in the out-migration equation. In fact,

my results do suggest a negative effect (𝛼 < 0) in the conditional logit and a positive effect on

out-migration. I take these results as evidence that air pollution did cause out-migration.

V. Robustness


A. Comparison Between Interpolated Values and True Values

In this section, I present an additional piece of evidence that supports the interpolation

method I adopt, in addition to Table III. In particular, I test the reliability of the interpolation by

comparing the interpolated values and the true values of the average API for 10 randomly

selected cities monitored by the MEP in 2013, the last year in the analysis of location choice.

Since there were 68 monitored cities in 2013, the interpolation is performed using the average

API of the other 58 monitored cities. Table VIII shows the 10 cities, their locations, and the

interpolated values and the true values of the average API. The interpolated values are

reasonably close to the true values, with a correlation of 0.8861. Furthermore, since 190 cities

were monitored for the AQI in 2014, the interpolation on these cities for the analysis of out-

migration should be even more reliable.
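The reported correlation can be checked directly from the ten city pairs listed in Table VIII. The Python sketch below does so with NumPy; the variable names are mine, and the mean absolute error is an additional summary that I compute for illustration rather than a statistic reported in the paper.

```python
import numpy as np

# Interpolated and true average API in 2013 for the 10 cities in Table VIII,
# in the order listed there (Weinan, Weifang, Yueyang, Mianyang, Karamay,
# Yuxi, Jining, Qiqihar, Rizhao, Zaozhuang).
interpolated_api = np.array([72.49, 82.95, 77.62, 69.65, 66.70,
                             57.85, 89.80, 54.16, 81.13, 88.23])
true_api = np.array([67.64, 100.43, 64.98, 63.01, 54.17,
                     56.19, 95.25, 53.10, 76.31, 101.85])

# Pearson correlation between interpolated and true values.
correlation = np.corrcoef(interpolated_api, true_api)[0, 1]

# Average size of the interpolation error, in API points.
mean_abs_error = np.mean(np.abs(interpolated_api - true_api))
```

The correlation comes out at approximately 0.886, in line with Table VIII, and the mean absolute error of roughly 8 API points is well below the standard deviation of 15.17 across the 68 monitored cities.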

B. Other Exclusion Distances and Pollutants

In Table IX, I present the main out-migration results for varying exclusion distances that

I use to calculate the instrument and with different pollutants as the regressors of interest. As

mentioned above, the exclusion distance is the distance within which an emitter city is not

counted toward the air pollution from distant sources of a receiver city. For all columns, I

employ having a migrant in the household as the dependent variable and include household and

year fixed effects. The main coefficient estimates of all specifications are positive and

statistically significant. The magnitude of the coefficient estimates is generally increasing in the

exclusion distance, consistent with the fact that larger exclusion distances imply that the

instrument is less correlated with local economic activities. At the same time, larger exclusion


distances do not translate into less correlation between the instrument and the pollutant

concentration, as shown in the first stage and its F-statistics. This further lends credibility to my

choice of 120km as the exclusion distance.
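The role of the exclusion distance can be sketched in Python as follows. This is a simplified illustration under my own assumptions: the city coordinates are taken from Table VIII, the emission values are hypothetical, and emitters are combined with an unweighted sum, whereas the instrument used in the paper may weight emitters differently (for example by distance). The function names are also hypothetical.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def distant_pollution(receiver, cities, exclusion_km):
    """Unweighted sum of emissions from cities farther away than
    exclusion_km from the receiver (a simplifying assumption)."""
    lon_r, lat_r, _ = cities[receiver]
    total = 0.0
    for name, (lon, lat, emissions) in cities.items():
        if name == receiver:
            continue
        if haversine_km(lon_r, lat_r, lon, lat) > exclusion_km:
            total += emissions
    return total

# (longitude, latitude, hypothetical emissions); coordinates from Table VIII.
cities = {
    "Jining": (116.59, 35.41, 10.0),
    "Zaozhuang": (117.32, 34.81, 20.0),
    "Weifang": (119.16, 36.71, 30.0),
    "Rizhao": (119.53, 35.42, 40.0),
}

instrument_80 = distant_pollution("Jining", cities, 80)    # all three emitters
instrument_120 = distant_pollution("Jining", cities, 120)  # drops Zaozhuang
```

Zaozhuang lies roughly 94 km from Jining, so it counts toward the instrument at the 80km exclusion distance but not at 120km; a larger exclusion distance therefore discards more of the nearby emitters whose output is most likely to be tied to the receiver's own economy.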

VI. Conclusion

In this paper, I have studied whether people in China moved from more polluted cities to

less polluted cities. By using the API and the AQI data published by the MEP and the 2014 and

2016 CLDS sample, I have offered different strategies to study the effect of air pollution in the

origin and the destination: by estimating a fixed-effect model and by estimating a model of

location choice. In both models, I have implemented IV strategies to address the potential

concern that air pollution is endogenous to local economic activities.

I have found that a one-standard deviation increase in the average AQI increased the

probability of having a migrant by 29 percentage points. I have also found that more educated

households were more responsive to air pollution. In particular, a one-year increase in the household head’s education increased the marginal effect of the average AQI by 1 percentage point. I have not found any non-linear effects. Moreover, I have found that people were less

likely to choose a city with more air pollution. From these results, I conclude that air pollution

did cause out-migration.

Many people living in heavily polluted Chinese cities have chosen to adopt particulate-

filtering facemasks and air filter products to cope with air pollution (Zhang and Mu 2016; Sun et

al. 2017). However, there are other ways in which people might have responded to air pollution.

One of them can be migration. In the economics literature, migration has been seen as a


technology that mitigates the impacts of negative income shocks (Blanchard and Katz 1992).

Based on the evidence I have provided, migration could have been chosen as a way to mitigate

the health impacts of air pollution. It remains to be shown that migration indeed improved

people’s health outcomes.

Furthermore, even though people moved in response to air pollution, businesses might

not have moved accordingly. For example, there might be only limited demand for doctors in the destination, so a doctor who migrated might have to take on another profession. This

mismatch may result in a change in return to education in some areas. Whether this change

happened merits future research.


References

Andrews, Steven Q. "Inconsistencies in air quality metrics: ‘Blue Sky’ days and PM10 concentrations in Beijing." Environmental Research Letters 3.3 (2008): 034009.

Banzhaf, Spencer H., and Randall P. Walsh. "Do people vote with their feet? An empirical test of

Tiebout's mechanism." The American Economic Review 98.3 (2008): 843-863.

Blanchard, Olivier Jean, and Lawrence F. Katz. "Regional evolutions." Brookings papers on

economic activity 1992.1 (1992): 1-75.

Brandt, Loren, and Thomas G. Rawski, eds. China's great economic transformation. Cambridge

University Press, 2008.

Chay, Kenneth Y., and Michael Greenstone. "Does air quality matter? Evidence from the

housing market." Journal of political Economy 113.2 (2005): 376-424.

Chen, Shuai, Paulina Oliva, and Peng Zhang. "The Effect of Pollution on Migration: Evidence

from China." No. w24036. National Bureau of Economic Research, 2017.

Chen, Yong, and Stuart S. Rosenthal. "Local amenities and life-cycle migration: Do people move

for jobs or fun?." Journal of Urban Economics 64.3 (2008): 519-537.

Chen, Yuyu, et al. Gaming in Air Pollution Data?: Lessons from China. No. w18729. National

Bureau of Economic Research, 2013.

Cressie, Noel. Statistics for spatial data. John Wiley & Sons, 2015.

Currie, Janet, et al. "Does pollution increase school absences?." The Review of Economics and

Statistics 91.4 (2009): 682-694.

Dasgupta, Susmita, et al. "Confronting the environmental Kuznets curve." The Journal of

Economic Perspectives 16.1 (2002): 147-168.

Gauderman, W. James, et al. "The effect of air pollution on lung development from 10 to 18

years of age." New England Journal of Medicine 351.11 (2004): 1057-1067.

Ghanem, Dalia, and Junjie Zhang. "Effortless Perfection: Do Chinese Cities Manipulate ‘Blue Skies’?" (2013).

Grainger, Corbett A. "The distributional effects of pollution regulations: Do renters fully pay for

cleaner air?." Journal of Public Economics 96.9 (2012): 840-852.

Isaaks, Edward H., and Mohan R. Srivastava. Applied geostatistics. 1989.

Jin, Y-Q., and F. Yan. "Monitoring sandstorms and desertification in northern China using SSM/I data and Getis statistics." International Journal of Remote Sensing 25.11 (2004): 2053-2060.

Li, Ding, Yan Zhang, and Shuang Ma. "Would Smog Lead to Outflow of Labor Force?

Empirical Evidence from China." Emerging Markets Finance and Trade just-accepted (2017).

Li, Teng, Haoming Liu, and Alberto Salvo. Severe air pollution and labor productivity. No.

8916. IZA Discussion Papers, 2015.

Kumar, Naresh, et al. "Satellite remote sensing for developing time and space resolved estimates of ambient particulate in Cleveland, OH." Aerosol Science and Technology 45.9 (2011): 1090-1108.

McFadden, Daniel. 1974. “Analysis of Qualitative Choice Behavior.” Frontiers in Econometrics,

ed. Paul Zarembka. New York: Academic Press.

National Health and Family Planning Commission, Department of Services and Management for

Migrant Population, Report on China’s Migrant Population Development 2016, China

Population Publishing House (2016)

Qin, Yu, and Hongjia Zhu. Run Away? Air Pollution and Emigration Interests in China.

Working Paper, 2015.

Smith, V. Kerry, and Ju-Chin Huang. "Can markets value air quality? A meta-analysis of

hedonic property value models." Journal of political economy 103.1 (1995): 209-227.

Stafford, Tess M. "Indoor air quality and academic performance." Journal of Environmental

Economics and Management 70 (2015): 34-50.

Stern, David I. "Progress on the environmental Kuznets curve?." Environment and development

economics 3.02 (1998): 173-196.

Simpson, James J., et al. "Airborne Asian dust: case study of long-range transport and

implications for the detection of volcanic ash." Weather and Forecasting 18.2 (2003): 121-141.

Sullivan, Daniel M. "Residential Sorting and the Incidence of Local Public Goods: Theory and

Evidence from Air Pollution." Resources for the Future Working Paper (2016).

Sun, Cong, Matthew E. Kahn, and Siqi Zheng. "Self-protection investment exacerbates air

pollution exposure inequality in urban China." Ecological Economics 131 (2017): 468-474.

Train, Kenneth E. Discrete choice methods with simulation. Cambridge university press, 2009.

Wang, Feng, and Xuejin Zuo. "Inside China's cities: Institutional barriers and opportunities for

urban migrants." The American Economic Review 89.2 (1999): 276-280.


World Bank national accounts data, and OECD National Accounts data files

World Health Organization, Ambient Air Pollution Database, May 2016

Yang, Gonghuan, et al. "Rapid health transition in China, 1990–2010: findings from the Global

Burden of Disease Study 2010." The lancet 381.9882 (2013): 1987-2015.

Zhang, Junjie, and Quan Mu. "Air pollution and defensive expenditures: Evidence from

particulate-filtering facemasks." (2016).

Zhang, Xin, Xiaobo Zhang, and Xi Chen. "Happiness in the air: How does dirty sky affect

subjective well-being?." (2015).

Zheng, Siqi, and Matthew E. Kahn. "Understanding China's urban pollution dynamics." Journal

of Economic Literature 51.3 (2013): 731-772.

Zheng, Siqi, Matthew E. Kahn, and Hongyu Liu. "Towards a system of open cities in China:

Home prices, FDI flows and air quality in 35 major cities." Regional Science and Urban

Economics 40.1 (2010): 1-10.

Zheng, Siqi, et al. "Real estate valuation and cross-boundary air pollution externalities: evidence

from Chinese cities." The Journal of Real Estate Finance and Economics 48.3 (2014): 398-414.

Zivin, Joshua Graff, and Matthew Neidell. "The impact of pollution on worker

productivity." The American economic review 102.7 (2012): 3652-3673.

Figure Ia: Distribution of Days by Pollution Level (API) from 2000-2013

Figure Ib: Distribution of Days by Pollution Level (AQI) in 2014 and 2016

Figure II: Average Daily AQI of Four Major Cities in May and June of 2014

Figure III: Prediction Map of Interpolating the AQI in 2014 Using Ordinary Kriging

Figure IV: Number of Cities Monitored Daily for API by the MEP Over Time

Table I: Summary Statistics by Whether a Household Sent Out a Migrant

| Variable | (1) 2014 CLDS Households | (2) 2014 Households with Migrants | (3) 2014 Households without Migrants | (2)-(3) 2014 Difference | (4) 2016 CLDS Households | (5) 2016 Households with Migrants | (6) 2016 Households without Migrants | (5)-(6) 2016 Difference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of Children in Household | 0.74 | 0.93 | 0.69 | -0.24*** | 0.77 | 0.91 | 0.68 | -0.23*** |
| | (0.93) | (1.07) | (0.88) | [0.03] | (0.96) | (1.06) | (0.88) | [0.02] |
| Household Head Age | 53.69 | 54.73 | 53.41 | -1.33*** | 55.28 | 55.33 | 55.25 | -0.08 |
| | (13.31) | (11.64) | (13.71) | [0.37] | (13.21) | (11.52) | (14.09) | [0.31] |
| Household Head Years of Education | 8.09 | 7.30 | 8.30 | 1.01*** | 8.20 | 7.54 | 8.57 | 1.04*** |
| | (4.07) | (3.63) | (4.16) | [0.11] | (3.97) | (3.65) | (4.10) | [0.09] |
| Household Head Hukou (1=Rural) | 1.51 | 1.30 | 1.57 | 0.26*** | 1.64 | 1.25 | 1.87 | 0.62*** |
| | (2.83) | (3.47) | (2.63) | [0.08] | (3.27) | (0.67) | (4.06) | [0.08] |
| Total Family Income (¥,000s) | 51.07 | 42.62 | 53.32 | 10.70*** | 56.79 | 47.17 | 62.33 | 15.16*** |
| | (86.94) | (79.35) | (88.72) | [2.42] | (95.28) | (79.17) | (103.03) | [2.24] |
| Having a Migrant in Household (1=Having a Migrant) | 0.21 | 1.00 | 0.00 | - | 0.37 | 1.00 | 0.00 | - |
| | (0.41) | (0.00) | (0.00) | - | (0.48) | (0.00) | (0.00) | - |
| N | 7744 | 1625 | 6119 | | 7744 | 2827 | 4917 | |

Notes: Standard deviations are in the parentheses; standard errors are in the square brackets. ***, **, and * indicate statistically significant coefficients at the one, five, and ten percent levels, respectively.

Table IIa: Comparing the Means of the AQI of Major Cities and of Non-Major Cities

| Variable | By provincial capital: Provincial-level municipalities and provincial capitals | By provincial capital: Non-provincial capital cities | By cities previously monitored for the API: Cities previously monitored for the API | By cities previously monitored for the API: Previously un-monitored cities |
| --- | --- | --- | --- | --- |
| AQI | 108.2 | 106.7 | 105.8 | 109.0 |
| | (57.85) | (54.34) | (54.45) | (55.72) |
| N | 10,999 | 56,938 | 43,507 | 24,430 |

Note: The Ministry of Environmental Protection (MEP) monitored the API for around 86 cities from 06/04/2000 to 12/29/2013 and the AQI for around 330 cities from 12/31/2013 onward.

Table IIb: Comparing the Distributions (Percent) of the Pollution Levels of Major Cities and of Non-Major Cities

| Pollution level | By provincial capital: Provincial-level municipalities and provincial capitals | By provincial capital: Non-provincial capital cities | By cities previously monitored for the API: Cities previously monitored for the API | By cities previously monitored for the API: Previously un-monitored cities |
| --- | --- | --- | --- | --- |
| Excellent or good | 58.50% | 59.01% | 60.19% | 56.68% |
| Lightly polluted | 25.82% | 25.81% | 25.31% | 26.70% |
| Moderately polluted | 8.36% | 8.50% | 7.90% | 9.51% |
| Heavily polluted | 5.89% | 5.66% | 5.50% | 6.05% |
| Severely polluted | 1.43% | 1.02% | 1.10% | 1.06% |
| Total | 100% | 100% | 100% | 100% |

Note: The MEP classifies a day in a city with the AQI below 50 as “excellent”, a day in a city with the AQI between 51 and 100 as “good”, a day in a city with the AQI between 101 and 150 as “lightly polluted”, a day in a city with the AQI between 151 and 200 as “moderately polluted”, a day in a city with the AQI between 201 and 300 as “heavily polluted”, and a day in a city with the AQI above 300 as “severely polluted”.

Table III: Correlation Between Air Quality Measures and Local Characteristics

| | AQI | Pollution from Sources >50km | Pollution from Sources >80km | Pollution from Sources >120km |
| --- | --- | --- | --- | --- |
| AQI | 1* | 0.3635* | 0.3675* | 0.4740* |
| Per Capita GRP | -0.0281* | 0.0580 | 0.0611 | 0.0681 |
| Gross Industrial Output | 0.1441* | 0.0401 | 0.0493* | 0.0822* |
| Unemployment Rate | 0.1000* | -0.0359 | -0.0314 | 0.0447 |

Notes: Each cell contains the correlation between the corresponding city characteristic (listed in the left-hand column) and the measure of air quality (listed in the top row) in the city. The air quality measures are the AQI, air pollution from sources more than 50 km away from the receiving city, air pollution from sources more than 80 km away from the receiving city, and air pollution from sources more than 120 km away from the receiving city. * indicates a coefficient that is statistically significantly different from 0 at the 20 percent level when the air quality measure is regressed on the city characteristic on a sample of cities.

Table IV: The Effect of Air Pollution in the Origin on Out-migration

Dependent Variable: Having a Migrant in the Household

| | (1) OLS | (2) FE | (3) IV |
| --- | --- | --- | --- |
| Average AQI | -0.027*** | -0.024*** | 0.286*** |
| | (0.004) | (0.009) | (0.088) |
| Household and Year Fixed Effects | N | Y | Y |
| First Stage: | | | 0.372*** |
| | | | (0.059) |
| F Statistic of First Stage: | | | 36.22 |
| R2 | 0.05 | 0.09 | - |
| Mean of Dep. Var. | 0.315 | 0.315 | 0.315 |
| F | 191.819 | 135.461 | 88.844 |
| N | 14289 | 14289 | 14092 |

Notes: The IV regression uses air pollution from distant sources as the instrument for average AQI. The average AQI and air pollution from distant sources are normalized to z-scores. All regressions control for per capita GRP, gross industrial output, and unemployment rate. All regressions apply sampling weights. Standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.

Table V: The Heterogeneous Effects of Air Pollution in the Origin on Out-migration

| | (1) OLS | (2) FE | (3) IV |
| --- | --- | --- | --- |
| Average AQI | -0.041*** | -0.010 | 0.238*** |
| | (0.010) | (0.014) | (0.078) |
| Household Head's Years of Education | -0.007*** | -0.001 | -0.001 |
| | (0.001) | (0.002) | (0.002) |
| Average AQI × Years of Education | 0.001 | -0.000 | 0.007** |
| | (0.001) | (0.001) | (0.003) |
| Family Income (¥,000s) | -0.000 | -0.000 | -0.000 |
| | (0.000) | (0.000) | (0.000) |
| Average AQI × Family Income | 0.000 | 0.000 | -0.000 |
| | (0.000) | (0.000) | (0.000) |
| Having a Child | 0.058*** | -0.025 | -0.033 |
| | (0.008) | (0.020) | (0.021) |
| Average AQI × Having a Child | 0.008 | -0.032*** | -0.026 |
| | (0.008) | (0.011) | (0.022) |
| R2 | 0.06 | 0.09 | - |
| Mean of Dep. Var. | 0.316 | 0.316 | 0.316 |
| F | 87.622 | 61.751 | 40.392 |
| N | 14142 | 14142 | 13802 |

Notes: The IV regression uses air pollution from distant sources as the instrument for average AQI. The average AQI and air pollution from distant sources are normalized to z-scores. All regressions apply sampling weights. Standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.

Table VI: The Non-linear Effects of Air Pollution in the Origin on Out-migration

| | (1) OLS | (2) FE | (3) IV |
| --- | --- | --- | --- |
| Average AQI | -0.037*** | -0.036* | 0.077 |
| | (0.011) | (0.021) | (0.156) |
| Average AQI Squared | -0.007 | 0.010 | -0.055 |
| | (0.010) | (0.014) | (0.116) |
| Household Head's Years of Education | -0.006*** | -0.001 | -0.008 |
| | (0.001) | (0.002) | (0.012) |
| Average AQI × Years of Education | 0.002* | -0.001 | -0.006 |
| | (0.001) | (0.002) | (0.013) |
| Average AQI Squared × Years of Education | -0.001 | 0.001 | 0.008 |
| | (0.001) | (0.001) | (0.014) |
| Family Income (¥,000s) | -0.000 | -0.000* | 0.000 |
| | (0.000) | (0.000) | (0.000) |
| Average AQI × Family Income | 0.000 | 0.000 | 0.000 |
| | (0.000) | (0.000) | (0.000) |
| Average AQI Squared × Family Income | -0.000 | 0.000* | -0.000 |
| | (0.000) | (0.000) | (0.001) |
| Having a Child | 0.057*** | -0.045** | -0.134 |
| | (0.011) | (0.022) | (0.112) |
| Average AQI × Having a Child | 0.005 | -0.045*** | -0.145 |
| | (0.009) | (0.015) | (0.126) |
| Average AQI Squared × Having a Child | 0.002 | 0.018* | 0.105 |
| | (0.008) | (0.011) | (0.111) |
| R2 | 0.06 | 0.09 | 0.07 |
| Mean of Dep. Var. | 0.316 | 0.316 | 0.316 |
| F | 63.972 | 47.368 | 34.237 |
| N | 14142 | 14142 | 13802 |

Notes: The IV regression uses air pollution from distant sources as the instrument for average AQI and air pollution from distant sources squared as the instrument for average AQI squared, etc. The average AQI and air pollution from distant sources are normalized to z-scores. All regressions apply sampling weights. Standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.

Table VII: Coefficient Estimates of Conditional Logit Model

Dependent Variable: City Being Chosen

| | (1) | (2) |
| --- | --- | --- |
| Instrument for API: | No | Yes |
| Location API | -0.052*** | -0.086*** |
| | (0.025) | (0.029) |
| Current | -1.013 | 0.896 |
| | (0.926) | (3.763) |
| Current × Location API | 0.103*** | 0.069 |
| | (0.017) | (0.065) |
| Have Been | 5.866*** | 1.035 |
| | (1.845) | (4.099) |
| Have Been × Location API | -0.008 | 0.073 |
| | (0.030) | (0.070) |
| N | 56,836,194 | 56,836,194 |

Note: The sample includes all 23,594 individuals in the 2014 CLDS. All individuals are allowed to choose from 219 Chinese cities, an exhaustive set of cities represented by these individuals’ locations from 2004 to 2014. Standard errors are in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.

Table VIII: Comparison Between the Interpolated API of 10 Randomly Selected Cities with Their True Values in 2013

| City Name | Province | Longitude | Latitude | Interpolated API | True API |
| --- | --- | --- | --- | --- | --- |
| Weinan City | Shaanxi | 109.50 | 34.50 | 72.49 | 67.64 |
| Weifang City | Shandong | 119.16 | 36.71 | 82.95 | 100.43 |
| Yueyang City | Hunan | 113.13 | 29.36 | 77.62 | 64.98 |
| Mianyang City | Sichuan | 104.68 | 31.47 | 69.65 | 63.01 |
| Karamay City | Xinjiang | 84.89 | 45.58 | 66.70 | 54.17 |
| Yuxi City | Yunnan | 102.53 | 24.35 | 57.85 | 56.19 |
| Jining City | Shandong | 116.59 | 35.41 | 89.80 | 95.25 |
| Qiqihar City | Heilongjiang | 123.92 | 47.35 | 54.16 | 53.10 |
| Rizhao City | Shandong | 119.53 | 35.42 | 81.13 | 76.31 |
| Zaozhuang City | Shandong | 117.32 | 34.81 | 88.23 | 101.85 |

Correlation between the interpolated API and the true API: 0.8861

Standard Deviation of the API of the 68 monitored cities in 2013: 15.17

Note: The table compares the predicted API and the true API of 10 randomly selected cities among all 68 monitored cities in 2013. The interpolation is performed over the other 58 cities using ordinary kriging.

Table IX: Robustness Check with Other Exclusion Distance and Pollutants

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pollutant: | AQI | AQI | AQI | PM10 | PM10 | PM10 | PM2.5 | PM2.5 | PM2.5 | SO2 | SO2 | SO2 |
| Instrument: | Sum of Dust and SO2 | Sum of Dust and SO2 | Sum of Dust and SO2 | Dust | Dust | Dust | Dust | Dust | Dust | SO2 | SO2 | SO2 |
| Exclusion Distance: | >50km | >80km | >120km | >50km | >80km | >120km | >50km | >80km | >120km | >50km | >80km | >120km |
| Average AQI | 0.160*** | 0.166*** | 0.234*** | | | | | | | | | |
| | (0.049) | (0.052) | (0.059) | | | | | | | | | |
| Average PM10 | | | | 0.087*** | 0.082*** | 0.114*** | | | | | | |
| | | | | (0.026) | (0.027) | (0.030) | | | | | | |
| Average PM2.5 | | | | | | | 0.122*** | 0.112*** | 0.154*** | | | |
| | | | | | | | (0.038) | (0.037) | (0.041) | | | |
| Average SO2 | | | | | | | | | | 0.119*** | 0.115*** | 0.154*** |
| | | | | | | | | | | (0.037) | (0.035) | (0.036) |
| First Stage: | 0.271*** | 0.271*** | 0.372*** | 0.241*** | 0.240*** | 0.287*** | 0.259*** | 0.257*** | 0.301*** | 0.299*** | 0.285*** | 0.378*** |
| | (0.061) | (0.061) | (0.059) | (0.061) | (0.061) | (0.060) | (0.059) | (0.059) | (0.058) | (0.060) | (0.060) | (0.059) |
| First Stage F Stat: | 25.1 | 25.09 | 36.22 | 24.46 | 24.34 | 28.38 | 32.18 | 32.02 | 36.49 | 30.29 | 28.91 | 39.73 |
| Mean of Dep. Var. | 0.314 | 0.314 | 0.314 | 0.314 | 0.314 | 0.314 | 0.314 | 0.314 | 0.314 | 0.314 | 0.314 | 0.314 |
| F | 354.26 | 351.733 | 327.331 | 376.843 | 377.306 | 371.027 | 365.294 | 367.818 | 356.165 | 380.905 | 381.634 | 378.565 |
| N | 15356 | 15356 | 15356 | 15356 | 15356 | 15356 | 15356 | 15356 | 15356 | 15356 | 15356 | 15356 |

Notes: PM10 is particulate matter with diameter less than 10 micrometers; PM2.5 is particulate matter with diameter less than 2.5 micrometers. The concentrations of PM10, PM2.5, and SO2 are originally in micrograms per cubic meter, and are normalized to z-scores. The average AQI and air pollution from distant sources are also normalized to z-scores. Dust and SO2 emissions are from the China City Statistical Yearbook. The exclusion distance is the distance within which an emitter city is not counted toward the air pollution of a receiver city. All regressions apply sampling weights. Standard errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01.
