The Effect of Air Pollution on China’s Internal Migration
Wenbo Li1
University of Notre Dame
March 7, 2019
Abstract
Do people in China move from more polluted cities to less polluted cities? To answer this
question, I merge the air pollution data of Chinese cities from 2003-2016 with the migration data from a
nationally representative sample. I estimate a fixed effect model to study the effect of air pollution in the
origin on out-migration and a conditional logit model to study the effect of air pollution on location
choice. In both models, I employ air pollution from distant sources as an instrument for local air pollution
to address the potential concern that air pollution is endogenous to local economic activities. I find that a
one-standard deviation increase in the average Air Quality Index (AQI) increased the probability of
having a migrant by a sizable 29 percentage points. A one-year increase in the household head’s
education increased the marginal effect of the average AQI by 1 percentage point. Moreover, I find that
people were less likely to choose a city with more air pollution.
1 I thank my advisor, Abigail Wozniak, for her advice and support.
I. Introduction
Air pollution in China is substantial and has health impacts. Among the 20 cities with the
worst air quality in the world in 2016, 4 are located in China (World Health Organization, 2016).
China’s increased industrial activities and rising number of automobiles are the two major
emission sources to blame. A coal-rich country, China primarily relies on coal for its electricity
generation and winter heating, which further aggravate the problem with its air quality. Air
pollution is now recognized as an increasing concern that affects the cardiopulmonary health of
people living in China (Brandt and Rawski 2008). The 2010 Global Burden of Disease Study
suggests that air pollution is the 4th leading health risk factor for Chinese people (Yang et al.
2013).
In addition to studying the impact of air pollution on health outcomes, the past literature
has also focused on the impact of air pollution on workers’ productivity (Zivin et al. 2012; Li et
al. 2015), academic outcomes (Currie et al. 2009; Stafford 2015), and housing prices (Zheng et
al. 2010; Zheng and Kahn 2013), but air pollution can also create distortions to other economic
activities, such as migration. There are two reasons why ignoring the effect on migration may
understate the true impact of air pollution. First, if people do move from more polluted cities to
less polluted cities, migration will have reduced the impact of air pollution shocks on people’s
health outcomes. This can imply that a country such as China should relax its institutional
restrictions on migration to facilitate migration’s function of coping with air pollution. This also
implies that, if residents already have migration as an alternative way to cope with air pollution,
they may press local authorities less hard to control air pollution. Second, the forced
displacement of people as a result of air pollution may create a distortion to the population
distribution across regions in China. This distortion may also vary by education, income, etc.
In this paper, I address the following two questions. First, did people move from more
polluted cities to less polluted cities? In particular, with city-level air pollution data from 2003 to
2016 and migration survey data of a nationally representative sample, I empirically uncover the
effect of air pollution in the origin on out-migration and the effect of air pollution on location
choice. Second, did certain types of people respond more to air pollution by migrating? In
particular, I study the heterogeneous effects across education and income groups, and for
families with or without children.
To study these questions, I calculate the annual average of the Air Pollution Index (API)
or the Air Quality Index (AQI) in each city and use it as the measure for air pollution levels. I
first estimate a fixed-effect model to examine the effect of the average AQI in the origin on the
probability that a household had a migrant. I also estimate a conditional logit model to study the
effect of air pollution on location choice. In both cases, I use instrumental variables (IV)
strategies to address the potential concern that air pollution is endogenous to local economic
activities, using air pollution from distant sources as the instrument for local air pollution. The
advantage of the choice model is that it allows me to explicitly describe an individual’s utility; it
also captures the relative characteristics of a place, and thus allows for a role for both the origin
and the destination. To incorporate an IV framework in the conditional logit model, I estimate
the conditional logit model through Generalized Method of Moments (GMM).
There are reasons why air pollution in China may or may not have an effect on
migration. On the one hand, migration provides an air quality arbitrage opportunity and allows
migrants to accumulate additional health capital. On the other hand, it is possible that air
pollution does not affect people’s decision making in a developing country such as China
because the income level there is not sufficiently high. I find that air pollution did cause out-
migration. In particular, a one-standard deviation increase in the average AQI increased the
probability of having a migrant by a sizable 29 percentage points. I also find that more educated
households were more responsive to air pollution. In particular, a one-year increase in the
household head’s education increased the marginal effect of the average AQI by 1 percentage
point. Moreover, I find that people were less likely to choose a city with more air pollution.
My paper contributes to the existing literature in two ways. First, my paper explicitly
identifies whether air pollution caused out-migration. In the existing literature, Qin and Zhu
(2015) have investigated the short-term impact of air pollution in China on people’s interest in
emigration. They find that searches on “emigration” for a Chinese online search engine will
grow by approximately 2-5 percent the next day if today’s AQI is increased by 100 points. Since
migration is ultimately a long-term decision, however, a rise in a person’s short-term sentiment
found in Qin and Zhu (2015) regarding migration might not translate into a migration episode.
Another set of previous literature has studied the long-term migration responses to air pollution
in the context of the United States. Banzhaf and Walsh (2008) find that the introduction of a
polluting facility causes individuals to leave the neighborhood and that the exit of a polluting
facility causes them to enter. Sullivan (2017) finds that neighborhoods with improved air quality
see a significant decrease in low-income residents. To the best of my knowledge, only two
papers, Chen, Oliva, and Zhang (2017) and Li et al. (2017) have studied the long-term effect of
air pollution on the internal migration in China. Chen, Oliva, and Zhang (2017) use thermal
inversion as an instrument for air pollution, and find that air pollution reduced the population in a
given county and reduced in-migration of floating migrants. However, since they do not directly
observe out-migration, they cannot distinguish between whether air pollution caused out-
migration or whether it “only” determined destination once an individual had already decided to
migrate. Since I explicitly observe whether a household sent out a migrant in the survey data, I
am able to make this distinction. Another key difference between my paper and Chen, Oliva, and
Zhang (2017) is that they use remotely sensed data, while I adopt
ground-based monitor readings. Although remotely sensed data have been shown to measure air
quality sufficiently well (Kumar et al., 2011), ground-based monitoring stations provide accurate
city-level air pollution readings for the same period that my survey data cover.
Second, the survey data my paper adopts allow me to study the heterogeneous effects of
air pollution on migration across income groups and for families with or without children. The
previous literature, however, has only examined the heterogeneous effects of air pollution across
skill groups (Chen and Rosenthal 2008; Chen, Oliva, and Zhang, 2017).
II. Data
A. MEP/Air Quality
My air pollution data are published by the Ministry of Environmental Protection of the
People’s Republic of China (MEP), but come from two sources. The first source reports
historical air pollution data with API being the measure for air pollution and covers 2000-2013.
To reduce the negative health impacts of air pollution and encourage pollution monitoring, the
MEP enacted the Ambient Air Quality Standard (GB3095-1996) in 1996 and started disclosing
air pollution information. According to the Standard, around 86 selected cities, including all
provincial municipalities and provincial capitals, were required to report the daily API, which is
a normalized index transformed from three pollutant concentrations, 𝑃𝑀10, sulfur dioxide (𝑆𝑂2),
and nitrogen dioxide (𝑁𝑂2). Some of the 𝑃𝑀10 particulates are emitted directly from a source,
such as construction sites, unpaved roads, fields, smokestacks, or fires. Most particulates form in
the atmosphere as a result of complex reactions of chemicals such as 𝑆𝑂2 and nitrogen oxides
(𝑁𝑂𝑥), which are pollutants emitted from power plants, industries, and automobiles.
The API ranges from 0 to 500, with a larger number indicating worse air quality. It is
classified into six levels of air quality: excellent for API≤50, good for 51≤API≤ 100, lightly
polluted for 101≤API≤150, moderately polluted for 151≤API≤200, heavily polluted for
201≤API≤300, and severely polluted for 301≤API≤500. Figure Ia reports the fraction of
cities/days in each category for all cities monitored from June 4th, 2000 to December 29th, 2013.
The most represented categories are good, excellent, and lightly polluted. In total, 2 percent of
city/days are moderately polluted, heavily polluted, or severely polluted. In addition to reporting
the daily API, the selected cities were also required to report the primary pollutant (𝑃𝑀10, 𝑆𝑂2,
or 𝑁𝑂2), potential health effects, and a cautionary statement for specific sensitive groups of
people. Among all three pollutants, 𝑃𝑀10 is the main pollutant for 91 percent of cities/days.
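The six API categories listed above can be written as a small lookup. This is a minimal sketch for illustration only: the function name `classify_api` and the assumption that readings are integers in the 0-500 range are mine, not part of the MEP data.

```python
from bisect import bisect_left

# Upper bounds of the six API categories described in the text.
_API_UPPER_BOUNDS = [50, 100, 150, 200, 300, 500]
_API_LABELS = [
    "excellent",
    "good",
    "lightly polluted",
    "moderately polluted",
    "heavily polluted",
    "severely polluted",
]


def classify_api(api: int) -> str:
    """Return the air quality category of an integer API reading (0-500)."""
    if not 0 <= api <= 500:
        raise ValueError(f"API must lie in [0, 500], got {api}")
    # bisect_left finds the first category whose upper bound is >= api.
    return _API_LABELS[bisect_left(_API_UPPER_BOUNDS, api)]
```

Applying such a function to every city/day and tabulating the labels would give the category shares of the kind reported in Figure Ia.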
All the real-time API readings on the monitoring stations were submitted to the MEP’s
Data Center and were disclosed to the public on the MEP’s official website. In addition to
accessing this air pollution information on the MEP’s official website, the public could also
obtain it through other websites, smart phone apps, and social networks. The information was
also available through more traditional media sources such as newspaper, TV, and radio.
These disclosed API readings constitute the first source of my air pollution data. The data
were collected by gracecode.com, a third party that web-scraped the MEP’s Data Center and
compiled the historical pollution information.2 The data contain the daily API and the main
pollutant for all Chinese cities required to report the air pollution information from June 4th, 2000
to December 29th, 2013. These data are used in the analysis of the effect of air pollution on
location choice.
The second source of my air pollution data uses AQI as the measure of air pollution and
covers a more recent time period. On February 29th, 2012, the Ambient Air Quality Standard
(GB3095-1996) of 1996 was replaced with the Ambient Air Quality Standard (GB3095-2012).
The new Standard introduced the AQI as a new measure of air quality. Although the new index
is classified similarly to the old index, the AQI is calculated with a different formula and
covers a more thorough set of pollutants, including 𝑆𝑂2, 𝑁𝑂2, 𝑃𝑀10, 𝑃𝑀2.5, 𝑂3, and 𝐶𝑂.
Therefore, I do not make direct comparisons between days measured by the API and those
measured by the AQI. Figure Ib reports the distribution of city/days in each category for all cities
monitored from May 13th to June 12th of both 2014 and 2016. The AQI as well as the
concentration of the full set of pollutants were published by the China National Environmental
Monitoring Center, a subsidiary of the MEP, and web-scraped by beijingair.sinaapp.com, another
third party that compiled the historical air pollution data. I use these data for my analysis on out-
migration.
Several papers have examined the reliability of the API data published by the MEP.
Andrews (2008) is the first to question the accuracy of these API data. Heeding the existence of a
national air pollution standard3 used to evaluate local governments, he argues that the reported
improvements in air quality for 2006-2007 over 2002 levels can be attributed to 1) a shift in
reported 𝑃𝑀10 levels from just above to just below the national standard, and 2) a shift of
monitoring stations in 2006 to less polluted areas. Ghanem and Zhang (2013) and Chen et al.
(2013) similarly document the manipulation of reported air pollution data by local
governments, noting a significant discontinuity at the threshold of the national standard.
Nevertheless, the API data published by the MEP convey useful information and should in
general be reliable: comparing the published API with visibility reported by the
China Meteorological Administration and Aerosol Optical Depth (AOD) from NASA satellites,
Chen et al. (2013) find that the API is significantly correlated with both
alternative measures of air pollution. Furthermore, since the transition to the AQI is relatively
new, to the best of my knowledge, no study has questioned the reliability of the AQI.
2 The MEP does not provide compiled historical air pollution data. The information released online was retracted
after a period of disclosure.
3 The national standard for a “blue-sky day” is an API of less than 100. The number of “blue-sky days” factors into
local cadres’ evaluations, and thus the potential manipulation around the cutoff of 100 forms the basis of the
controversy.
B. CLDS
I use the information on migration from the China Labor-dynamics Survey (CLDS). The
CLDS is a rotating panel starting from 2012, and consists of a nationally representative sample
of individuals, households, and counties from 29 provinces and provincial-level municipalities.
The individuals, households, and counties were interviewed every other year from June to
August of the designated survey year.
For my analysis on out-migration, I use the panel component of the 2014 and 2016
CLDS, i.e. 7,744 households that were interviewed in both survey years, because these are the
two years when the sample contains information on out-migration. In particular, I derive the
migration status of each family member from the survey question: “Is this person currently living
at home?” I define people not living at home as migrants if they lived away due to one of the
following reasons: 1) going away long-term for work, 2) going away for school, 3) going away
for long-term visits to families or friends, 4) enlisting in the military, and 5) going abroad. A few
types of people away from home, however, are excluded from this definition of migrants; they
are: middle school and elementary school students living in boarding schools, people on short-
term trips for business or for pleasure, and people on short-term trips visiting families or friends.
Among the people living away from home, only those living in a district/county different from
that of the home being surveyed are counted as migrants. The basic information regarding the
family members away from home, i.e. the migrants, was collected from the family member
living at home at the time of the survey. For the people away from home, their mean age was 28;
43 percent of them were never married; only 8 percent of them were the household heads of their
households. These summary statistics suggest that the migrants defined by this survey question
mostly consist of the children of the households. This fact bears importance in my analysis of
out-migration. For instance, as reported in Table I, the households with migrants on average had
more children and less income compared to households without migrants; on average, the
household heads in the households with migrants were also older, less educated, and more likely
to have rural Hukou.
To study the effect of air pollution on location choice, I use the full sample of 23,594
individuals from the 2014 CLDS. In particular, the individual-level data of the 2014 CLDS
contain the complete migration history of each individual in the sample. The migration history
includes the destination city, the year, and the primary reason for each migration episode. I use
this piece of information to retrospectively construct a panel of individual location choice in each
year from 2004 to 2014. For an individual who moved more than once in a given year, I take
his/her final location after the last move as his/her location in that year. Most people (around
82 percent) had never migrated by the time they were interviewed in 2014; about 13 percent of individuals
migrated once; about 5 percent of individuals migrated more than once. There are 219 cities
represented by the location choices of this retrospective panel. I allow every individual to choose
each year from these 219 cities. Although there are a total of 287 cities in China, due to the
Independence of Irrelevant Alternatives assumption, excluding the other 68 cities in the
conditional logit model does not affect the consistency of my estimates, because excluding these
cities does not affect the relative probability between a chosen city and an unchosen city within
this 219-city set, given that these 219 cities were chosen randomly by the sampling design of the
CLDS.
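The construction of the retrospective panel described above can be sketched as follows. This is an illustrative sketch rather than the actual data-processing code: the function name, the `(year, destination_city)` layout of the migration history, and the `initial_city` argument are hypothetical and do not correspond to CLDS variable names. Episodes are assumed to be listed in the order in which the moves occurred.

```python
def build_location_panel(initial_city, episodes, first_year=2004, last_year=2014):
    """Return {year: city} for one individual.

    `episodes` is a list of (year, destination_city) pairs in chronological
    order; `initial_city` is where the person lived before any recorded move.
    """
    # Later moves within the same year overwrite earlier ones, so only the
    # final location after the last move of a year is kept.
    last_move = {}
    for year, city in episodes:
        last_move[year] = city

    # Apply moves that happened before the panel window starts.
    current = initial_city
    for year in sorted(y for y in last_move if y < first_year):
        current = last_move[year]

    # Carry the location forward until the next move.
    panel = {}
    for year in range(first_year, last_year + 1):
        current = last_move.get(year, current)
        panel[year] = current
    return panel
```

For example, a person who starts in city "A", moves to "B" in 2002, and then moves to "C" and on to "D" within 2006 is assigned "B" for 2004-2005 and "D" from 2006 onward.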
C. Merging MEP/CLDS
The first part of my analysis focuses on the effect of air pollution in the origin on out-
migration. For this part of the analysis, I merge the AQI data from the MEP in 2014 and 2016
with the panel component of the household-level data from the 2014 and 2016 CLDS. As
mentioned above, the surveys of the 2014 and 2016 CLDS were conducted from June to August
of 2014 and 2016, and given that the AQI data are available only since May 13th, 2014, I
calculate the average AQI of each monitored city in the month following May 13th of 2014 and
2016 (i.e. 05/13/2014-06/12/2014 and 05/13/2016-06/12/2016) and let this average be my city-
level air pollution measure for that year. In doing so, I focus on the effect of air pollution in May
and June on the out-migration status at the time of the survey in the same year. The advantage of
limiting my analysis to this short window is that it allows for a consistent measure of air
pollution and that it allows me to exploit the panel component of the CLDS. The disadvantage of
adopting this short window is that air pollution in China displays seasonal patterns, so air
pollution in May and June may not be representative of air pollution in other months. This
disadvantage affects the interpretation of my results, but since I am only using the within-
household variation in out-migration, it does not affect the consistency of my coefficient
estimates. Figure II shows the average daily AQI in the month following May 13th, 2014 across
four major cities.
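The city-level pollution measure described above, the average AQI over the month following May 13th, can be sketched as below. The `(city, day, aqi)` tuple layout and the function name are assumptions made for illustration; the scraped data may be organized differently.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def window_average_aqi(readings, year):
    """Average daily AQI per city over 13 May to 12 June of `year`.

    `readings` is an iterable of (city, day, aqi) tuples (hypothetical layout),
    where `day` is a datetime.date.
    """
    start, end = date(year, 5, 13), date(year, 6, 12)
    by_city = defaultdict(list)
    for city, day, aqi in readings:
        if start <= day <= end:  # both endpoints are included
            by_city[city].append(aqi)
    # Cities with no reading inside the window are simply absent.
    return {city: mean(values) for city, values in by_city.items()}
```

Cities missing from the returned mapping correspond to the un-monitored cities whose values are later interpolated.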
In the household-level CLDS sample, I consider the city in which a household was
located at the time of the survey as the origin for the potential migration of a member in the
household.4 The AQI data cover all 287 cities in 2016, but only 190 cities in 2014. This set of
190 cities monitored in 2014 does not perfectly overlap with the cities represented by the CLDS
household sample. To improve my ability to use the CLDS sample, I interpolate the average AQI
at un-monitored cities using the average AQI at monitored cities via ordinary kriging. Figure III
shows the prediction map of the interpolation,5 where the bubbles indicate the locations of the
monitored cities, and the size of the bubbles represents the severity of the air pollution at these
locations; the purple crosses illustrate the locations of 284 out of 287 cities in China; the
background color represents the AQI level at each point in China. The prediction map suggests
that the air pollution is more severe in northern and northwest China and less severe in northeast
China and southern China. This prediction is consistent with the geographical patterns of air
pollution in China because 1) the steel industry congregates in northern China around Beijing, and 2)
northwestern provinces such as Inner Mongolia and Xinjiang experience sandstorms each year in
the spring, which significantly increase particulate matter concentrations in these regions.
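For reference, the ordinary kriging interpolation used above takes the following textbook form. The notation is an assumption for illustration: $Z(s_i)$ denotes the average AQI at monitored city $s_i$, $s_0$ an un-monitored city, $\gamma(\cdot)$ a semivariogram fitted to the monitored cities, and $m$ a Lagrange multiplier. The paper does not state which variogram model is used.

```latex
\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i \, Z(s_i),
\qquad \sum_{i=1}^{n} \lambda_i = 1,

\text{where the weights solve, for } i = 1, \dots, n: \quad
\sum_{j=1}^{n} \lambda_j \, \gamma(s_i - s_j) + m = \gamma(s_i - s_0).
```

The right-hand side makes the weights depend on the distance to the interpolated city, while the left-hand side, through the semivariances among the monitored cities themselves, down-weights monitored cities that are clustered together.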
4 To protect the privacy of the interviewed households, the CLDS sample specifies the cities in which the
households resided, but not the counties.
5 The interpolation assigns a value to each un-monitored city based on a weighted average of the average API of all
monitored cities. With ordinary kriging, the weight placed on each monitored city not only is based on the distance
between the monitored city and the interpolated city, but also de-clusters groups of monitored cities. See Cressie
(2015) and Isaaks and Srivastava (1989) for a detailed description of interpolation using ordinary kriging.
The second part of my analysis focuses on the effect of air pollution on location choice.
For this part of the analysis, I calculate the average API of each potential destination city for
each year from 2003 to 2013. Figure IV shows the number of cities monitored for the API by the
MEP each day from June 4th, 2000 to December 29th, 2013. Since some individuals in the
retrospective panel chose un-monitored cities, I perform the same interpolation procedure as
mentioned above to impute the average annual API at un-monitored cities. To partially
demonstrate the validity of using the average API at monitored cities to impute the average API
at un-monitored cities, Table IIa and Table IIb compare the distribution of the AQI from
December, 2013 to February, 2015 in cities previously monitored for the API and that in cities
previously un-monitored for the API. Since the reporting of the AQI succeeded the reporting of
the API and the AQI was reported for a larger set of cities, comparing the AQI in later years
gives some sense of whether these two sets of cities are comparable. The lack of difference
between these two sets of cities lends credit to using the API in the monitored cities to
interpolate the API in the un-monitored cities.
D. Weather
For my instrument, I obtain data on the prevailing wind directions at all 1,156 monitoring
stations across China from 1981 to 2010 from the China Meteorological Data Service Center. To
calculate the prevailing wind direction of each city, I assign the city in which each household
resided or each individual chose to its closest monitoring station. The wind could take 16
different directions, with each direction representing a span of 22.5°.
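The 16-direction scheme can be illustrated with a short sketch. It assumes, for illustration only, that bearings are measured in degrees clockwise from north and that each 22.5° sector is centered on its compass point; the Data Service Center's own coding may differ. The function names are hypothetical.

```python
from collections import Counter

COMPASS_16 = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]
SECTOR_WIDTH = 360 / len(COMPASS_16)  # 22.5 degrees per sector


def wind_sector(bearing_deg: float) -> str:
    """Map a bearing in degrees to one of the 16 sectors."""
    # Shift by half a sector so that "N" covers 348.75 to 11.25 degrees.
    shifted = (bearing_deg % 360 + SECTOR_WIDTH / 2) % 360
    return COMPASS_16[int(shifted // SECTOR_WIDTH)]


def prevailing_direction(sectors):
    """Return the most frequent sector in a sequence of observations."""
    return Counter(sectors).most_common(1)[0][0]
```

Taking the most frequent sector over a long record is one way to summarize a station's prevailing wind direction.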
E. City-Level Characteristics
To construct the second instrument, I obtain data on the dust (or soot) and 𝑆𝑂2 emission
levels for 290 Chinese cities in each year from 2003-2014 and 2016 from the China City
Statistical Yearbook. I also obtain data on the number of unemployed, the per capita GRP, and
the gross industrial output value for these cities in 2014 and 2016 from the same Yearbook.
III. Methods
A. Out-migration and Identification Strategy
In the first part of my analysis, I estimate a linear probability model to study the effect of
air pollution in the origin on out-migration using the panel component of the 2014 and 2016
CLDS households. The estimated equation is as follows:
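$$
\begin{aligned}
\mathit{SentMigrant}_{ict} ={}& \mathit{AverageAQI}_{ct}\,\beta_1 + \mathit{HeadEdu}_{ict}\,\beta_2 + \mathit{HeadEdu}_{ict} \times \mathit{AverageAQI}_{ct}\,\beta_3 + \mathit{Income}_{ict}\,\beta_4 \\
&+ \mathit{Income}_{ict} \times \mathit{AverageAQI}_{ct}\,\beta_5 + \mathit{HaveChild}_{ict}\,\beta_6 + \mathit{HaveChild}_{ict} \times \mathit{AverageAQI}_{ct}\,\beta_7 \\
&+ X_{ct}\,\beta_8 + \mu_i + \delta_t + \epsilon_{ict}
\end{aligned}
\tag{1}
$$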
where $\mathit{SentMigrant}_{ict}$ is an indicator for having a migrant in household $i$ in city $c$ in year $t$;
$\mathit{AverageAQI}_{ct}$ is the average AQI in year $t$ of city $c$, in which household $i$ was located;
$\mathit{HeadEdu}_{ict}$ is the years of education of the household head of household $i$ in year $t$; $\mathit{Income}_{ict}$
is the family income of household $i$ in year $t$; $\mathit{HaveChild}_{ict}$ is a dummy for having a child in
household $i$ in year $t$; $X_{ct}$ is a vector of city-level controls including the unemployment rate, per capita
GRP, and gross industrial output value; $\mu_i$ is the household fixed effect; $\delta_t$ is the year fixed
effect with 2014 being the base year; $\epsilon_{ict}$ is the error term. The CLDS surveys were conducted
such that no household followed up in 2016 had moved between 2014 and 2016, so the household
fixed effect also controls for location-specific time-invariant characteristics.
I include the interaction terms between the average AQI and household head’s years of
education, family income, and the dummy for having a child. It is possible that more educated
people are more responsive to air pollution. This is because, on the one hand, skilled workers
may view pollution as a disamenity only; on the other hand, low-skilled workers may value the job
opportunities created by polluting factories, so it is less clear whether they view pollution as
a disamenity. It is possible that richer households are more responsive to air pollution, since,
assuming that air quality is a normal good, I expect that richer households consume more good-
quality air by migrating. If only richer people migrate in response to air pollution, migration can
be a channel through which quality-of-life inequality in China is exacerbated. It is also possible that households
with children are more responsive to air pollution, since children are more vulnerable to air
pollution, and parents may want to invest in their children’s health capital. Thus, parents may
place additional weight on air pollution when they decide whether or not to leave and where to
go.
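With these interaction terms, the marginal effect of the average AQI is no longer a single coefficient. Differentiating Equation (1) with respect to the average AQI gives:

```latex
\frac{\partial\, \mathbb{E}[\mathit{SentMigrant}_{ict}]}{\partial\, \mathit{AverageAQI}_{ct}}
  = \beta_1 + \beta_3\, \mathit{HeadEdu}_{ict} + \beta_5\, \mathit{Income}_{ict} + \beta_7\, \mathit{HaveChild}_{ict}
```

Thus $\beta_3$, $\beta_5$, and $\beta_7$ measure how the response to air pollution changes with the household head's education, family income, and the presence of a child, respectively, while $\beta_1$ is the effect for a household in which all three variables equal zero.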
I aim to address a potential source of endogeneity: air pollution is endogenous to local
economic activities. The error term contains many unobserved determinants of out-migration that capture
how attractive a city was, and these can be correlated with air
pollution levels. This potential source of endogeneity can cause the OLS coefficient estimate for
the average AQI in Equation (1) to be downward biased. To address this concern, I instrument
the average AQI with air pollution from distant sources. The idea of using air pollution from
distant sources as an instrument for local air pollution is first seen in Bayer et al. (2009). Bayer et
al. use a detailed source-receptor matrix developed for the United States Environmental
Protection Agency that relates emissions from nearly 6,000 sources to 𝑃𝑀10 in each county in
the U.S. to calculate the marginal willingness to pay for clean air in the U.S. Using this matrix,
they are able to calculate how much the pollution sources more than 80km away from a county
contributed to the 𝑃𝑀10 levels in that county. A similar IV strategy based on air pollution from
distant sources is later seen in Zheng et al. (2015), who use it to study the effect of air pollution
on China’s housing prices. Since the same source-receptor matrix used in Bayer et al. (2009)
does not exist in China, I adopt the formulation of the instrument from Zheng et al. (2015):
$$
\mathit{NEIGHBOR}_{it} = \sum_{j} w_{ij} \cdot \mathit{smoke\ emission}_{jt} \cdot e^{-d_{ij}}, \qquad d_{ij} > 120\,\text{km}
\tag{2}
$$
where $w_{ij}$ is a dummy variable that takes the value of 1 if source city $j$ is located in the
prevailing wind direction of receiving city $i$; $\mathit{smoke\ emission}_{jt}$ is city $j$'s emission level in year
$t$; $d_{ij}$ is the distance between city $i$ and city $j$; and $e^{-d_{ij}}$ is the value of a continuous and
exponentially decreasing function, so the weight declines as the distance between city $j$ and
city $i$ increases.
In constructing this instrument, I carry out the following procedure. First, I take the most
frequent wind direction of a city from 1981 to 2010 as the prevailing wind direction of that
city. Prevailing wind is recurring, and the most frequent wind direction from 1981 to 2010
should represent the prevailing wind direction in 2014 and 2016. Second, following Bayer et al.
(2009), I sum the amount of dust (or soot) and the amount of $SO_2$ (both measured in tons) and
let it be the emission level of a source city, since both dust (or soot) and $SO_2$ factor into the
calculation of the AQI (and API). Since I observe the concentration of each pollutant along with
the AQI in 2014 and 2016, in Section V, I will instrument particulate matter concentration with
dust (or soot) from distant sources and $SO_2$ concentration with $SO_2$ from distant sources
separately as a robustness check. Third, I measure the distance $d_{ij}$ by the degrees of longitude
and latitude.
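The computation in Equation (2) for a single receiving city and year can be sketched as follows. This is a sketch under stated assumptions, not the code used in the paper: the `Source` record and its field names are hypothetical, the wind dummy is taken as given, and the distance is carried in two units (degrees for the decay term, kilometers for the 120km cutoff), which is one possible reading of the procedure above.

```python
import math
from typing import NamedTuple


class Source(NamedTuple):
    """One potential source city j, as seen from receiving city i."""
    upwind: bool      # w_ij: j lies in the prevailing wind direction of i
    emission: float   # dust (or soot) plus SO2 emissions of j in year t, in tons
    dist_deg: float   # d_ij in degrees of longitude/latitude (decay term)
    dist_km: float    # the same distance in km (assumed, for the cutoff only)


def neighbor_pollution(sources, cutoff_km=120.0):
    """Distance-weighted emissions from upwind sources beyond the cutoff."""
    return sum(
        s.emission * math.exp(-s.dist_deg)
        for s in sources
        if s.upwind and s.dist_km > cutoff_km
    )
```

Sources within the exclusion distance, or outside the prevailing wind direction, contribute nothing, so the instrument varies only with emissions that originate far from the receiving city.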
The remaining step is to choose the exclusion distance within which the emissions do not
count toward air pollution from distant sources. An ideal distance would allow the instrument to
be correlated with local AQI but uncorrelated with local economic activities. Increasing this
distance would weaken both correlations, and decreasing this distance would strengthen both. To
choose a good exclusion distance, I summarize the correlation between air quality measures and
observable local economic activities variables in Table III. The air quality measures are in the
top row of Table III and include the AQI, pollution from sources > 50km, pollution from sources
> 80km, and pollution from sources > 120km. The local economic activities variables are in the
left-most column. * indicates a coefficient statistically significantly different from 0 at the 20 percent
level when the air quality measure is regressed on the city characteristic in a sample of cities. The
first observation is that, from Column (1), the AQI is highly correlated with observable local
economic activities variables. This is evidence that the OLS estimates of Equation (1) could be
biased. The second observation is that air pollution from distant sources, despite being highly
correlated with the AQI, is less correlated with local economic activities variables than the AQI
itself. I choose 120km as the exclusion distance to be consistent with the exclusion distance used
by Zheng et al. (2015).
Table IV reports the first stage. The first stage estimated on a sample of cities is strong,
with an F-Statistic of 36. The average AQI and air pollution from distant sources are both
normalized to z-scores with mean 0 and standard deviation 1. As expected, the average AQI was
increasing in air pollution from distant sources. On average, a one-standard deviation increase in
air pollution from distant sources was associated with a 0.4-standard deviation increase in the
average AQI.
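The normalization to z-scores mentioned above can be sketched as below. Whether the population or the sample standard deviation is used is not stated in the text; the population version is assumed here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]
```

Because both variables are standardized, the first-stage coefficient reads directly in standard deviation units, as in the 0.4 figure above.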
B. Location Choice
For the second part of my analysis, I estimate a conditional logit model (McFadden,
1974) to study the effect of air pollution on location choice. This part of the analysis allows me
to exploit the long retrospective time span of the API data. The advantage of the choice model is
that it allows me to explicitly describe an individual’s utility; it also captures the relative
characteristics of a place, and thus allows for a role for both the origin and the destination. With
this model, the identification comes from the revealed preference of the individuals over
locations with different levels of air pollution. The model assumes that the error term has i.i.d.
type-1 extreme value distribution. Because the error terms are assumed to be independent, the
model also assumes IIA. In particular, the error terms for close-by locations are assumed to be
uncorrelated with one another. I estimate the following equation:
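$$
\begin{aligned}
U_{ijt} ={}& \mathit{LocationAPI}_{ij,t-1}\,\alpha + \mathit{Current}_{ij,t-1}\,\beta + \mathit{Current}_{ij,t-1} \times \mathit{LocationAPI}_{ij,t-1}\,\gamma \\
&+ \mathit{HaveBeen}_{ijt}\,\phi + \mathit{HaveBeen}_{ijt} \times \mathit{LocationAPI}_{ij,t-1}\,\theta + \nu_{ijt}
\end{aligned}
\tag{3}
$$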
where 𝑈𝑖𝑗𝑡 is individual 𝑖’s utility of choosing city 𝑗 in year 𝑡; 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1 is the average
API of city 𝑗 in year 𝑡 − 1; 𝐶𝑢𝑟𝑟𝑒𝑛𝑡𝑖𝑗𝑡−1 is a dummy for whether individual 𝑖 was located in city
𝑗 in year 𝑡 − 1; 𝐻𝑎𝑣𝑒𝐵𝑒𝑒𝑛𝑖𝑗𝑡 is a dummy for whether individual 𝑖 had been to city 𝑗 by year 𝑡;
𝜈𝑖𝑗𝑡 is the error term. I lag the average API by one year to avoid reverse causality. 𝛼 captures
the effect of the average API in a city in a given year on the probability that the city was chosen
in the following year. 𝛽 captures how much more likely an individual was to stay in the city where
he/she was located the year before. 𝛼 + 𝛾 captures the effect of the average API in the origin. 𝜙
captures how much more likely an individual was to return to a city he/she had been to before. 𝜃
captures the additional effect of the average API in a city he/she had been to on the probability
that the city was chosen again. I expect that 𝛼 < 0, 𝛽 > 0, and 𝜙 > 0. Since an individual might
have better information regarding the air quality in places he/she had been to, I also expect
𝛾 < 0 and 𝜃 < 0.
Since the average API is endogenous to local economic activities, I instrument for it and
estimate Equation (3) via GMM instead of maximum likelihood estimation (MLE). Following Train
(2009, p. 326), I derive the following moment condition:
$$
\sum_{j} \sum_{i,t} \left( Y_{ijt} - \mathbb{P}(Y_{ijt} = 1 \mid X_{ijt}) \right) Z_{ijt} = 0
$$
where 𝑌𝑖𝑗𝑡 is 1 if individual 𝑖 was located in city 𝑗 in year 𝑡; ℙ(𝑌𝑖𝑗𝑡 = 1|𝑋𝑖𝑗𝑡) is the conditional
probability that individual 𝑖 was located in city 𝑗 in year 𝑡; 𝑋𝑖𝑗𝑡 are the regressors in Equation (3)
including 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1, 𝐶𝑢𝑟𝑟𝑒𝑛𝑡𝑖𝑗𝑡−1, 𝐶𝑢𝑟𝑟𝑒𝑛𝑡𝑖𝑗𝑡−1 × 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1, 𝐻𝑎𝑣𝑒𝐵𝑒𝑒𝑛𝑖𝑗𝑡,
and 𝐻𝑎𝑣𝑒𝐵𝑒𝑒𝑛𝑖𝑗𝑡 × 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1; and 𝑍𝑖𝑗𝑡 is the instrument. The moment condition has an
intuitive interpretation: the observed mean of the instrument (i.e., $\sum_{j} \sum_{i,t} Y_{ijt} Z_{ijt}$)
is equal to the mean predicted by the model (i.e.,
$\sum_{j} \sum_{i,t} \mathbb{P}(Y_{ijt} = 1 \mid X_{ijt}) Z_{ijt}$). Under the assumption that the error
term follows an i.i.d. type-1 extreme value distribution, ℙ(𝑌𝑖𝑗𝑡 = 1|𝑋𝑖𝑗𝑡) has a closed-form solution
(McFadden, 1974):
$$
\mathbb{P}(Y_{ijt} = 1 \mid X_{ijt}) = \frac{e^{X_{ijt}}}{\sum_{k} e^{X_{ikt}}}
$$
Here the exponent is shorthand for the deterministic part of utility in Equation (3), that is, the
regressors weighted by their coefficients.
As in the analysis of out-migration, I instrument for 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1 with air pollution from
distant sources, while 𝐶𝑢𝑟𝑟𝑒𝑛𝑡𝑖𝑗𝑡−1 and 𝐻𝑎𝑣𝑒𝐵𝑒𝑒𝑛𝑖𝑗𝑡 serve as their own instruments. Thus, the
model is exactly identified. In carrying out the GMM estimation, I assign the coefficient
estimates from estimating Equation (3) via MLE without the instrument as the initial values of
the coefficients.
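The estimation steps above can be sketched in a few lines. This is a minimal illustration on simulated data, not the paper's code: the function names, the simulated variables, the two-regressor specification, and the use of Newton iterations to solve the moment condition are all assumptions made here. Because the system is exactly identified, solving the moments for zero is equivalent to GMM, and using the regressors as their own instruments reproduces the MLE first-order conditions, which provide the starting values.

```python
import numpy as np

def logit_probs(X, coef):
    """P[n, j] = exp(X[n, j] @ coef) / sum_k exp(X[n, k] @ coef)."""
    v = X @ coef                              # (N, J) deterministic utility
    v = v - v.max(axis=1, keepdims=True)      # avoid overflow in exp
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)

def moments(coef, X, Z, Y):
    """Sum over choice situations n and cities j of (Y - P) * Z."""
    return np.einsum("nj,njk->k", Y - logit_probs(X, coef), Z)

def solve_moments(X, Z, Y, start, max_iter=50, tol=1e-10):
    """Newton iterations on the exactly identified system moments(coef) = 0."""
    coef = np.array(start, dtype=float)
    for _ in range(max_iter):
        P = logit_probs(X, coef)
        g = np.einsum("nj,njk->k", Y - P, Z)
        x_bar = np.einsum("nj,njk->nk", P, X)          # probability-weighted mean of X
        dP = P[:, :, None] * (X - x_bar[:, None, :])   # derivative of P[n, j] w.r.t. coef
        jac = -np.einsum("njk,njl->kl", Z, dP)         # Jacobian of the moments
        step = np.linalg.solve(jac, -g)
        coef += step
        if np.abs(step).max() < tol:
            break
    return coef

# Simulated data (all values hypothetical): N choice situations, J cities.
rng = np.random.default_rng(0)
N, J = 3000, 4
distant = rng.normal(size=(N, J))                          # stand-in instrument
pollution = 0.6 * distant + 0.8 * rng.normal(size=(N, J))  # stand-in LocationAPI
current = np.eye(J)[rng.integers(0, J, size=N)]            # stand-in Current dummy
X = np.stack([pollution, current], axis=2)                 # regressors, shape (N, J, 2)
Z = np.stack([distant, current], axis=2)                   # instruments, shape (N, J, 2)
true_coef = np.array([-0.8, 2.0])
utility = X @ true_coef + rng.gumbel(size=(N, J))          # type-1 extreme value errors
Y = np.eye(J)[utility.argmax(axis=1)]                      # one-hot chosen city

mle = solve_moments(X, X, Y, start=np.zeros(2))  # regressors as own instruments
gmm = solve_moments(X, Z, Y, start=mle)          # instrumented, started from the MLE values
```

In this simulation the stand-in pollution variable is exogenous by construction, so both estimates target the same coefficient; the sketch shows only the mechanics of the moment condition, not the bias correction.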
IV. Results
A. Out-Migration
Table IV reports the results for the effect of the average AQI in the origin on the
probability that a household had a migrant. The OLS result in Column (1) suggests that a higher
average AQI in the origin is associated with a lower probability of having a migrant. The negative
sign of this coefficient estimate is consistent with endogeneity biasing the coefficient estimate
downward. Column (2) shows the result after I add household and year fixed effects. The fixed
effects absorb all time-invariant household characteristics and all year-specific shocks, but they
correct the bias only slightly: the coefficient estimate moves from -0.027 to -0.024 and remains
negative. Column (3) reports the IV result, and
suggests that a one-standard deviation increase in the average AQI increased the probability of
having a migrant by 29 percentage points. Given that the mean probability of having a migrant is
32 percent, the 29-percentage-point increase as suggested by the IV result is a sizable effect.
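To put the magnitude in perspective, the arithmetic can be worked through with the figures in Table IV. The reduced-form number below is not reported in the paper; it is only what the standard just-identified IV relationship (IV estimate = reduced form / first stage) would imply, assuming the same sample and controls in both stages.

```python
# Figures taken from Table IV, Column (3).
iv_effect = 0.286      # effect of a one-SD increase in the average AQI
mean_outcome = 0.315   # mean probability of having a migrant
first_stage = 0.372    # SD change in AQI per SD change in distant-source pollution

# The IV effect as a share of the baseline probability (about 0.91).
relative_effect = iv_effect / mean_outcome

# Implied (not reported) effect of a one-SD increase in distant-source pollution (about 0.106).
implied_reduced_form = iv_effect * first_stage
```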
Table V reports the heterogeneous effects of the average AQI in the origin on the
probability that a household had a migrant across education and income groups, and for families
with or without children. The OLS results suggest that households with a less educated
household head were more likely to send out a migrant, and that households with children were
more likely to send out a migrant. These results may merely reflect that the out-migrants
recorded in the CLDS data are mostly the children of the household, so rural households with
less education and households with children were more likely to send their children away. After I
add household and year fixed effects, these associations go away, and I find an additional
association that households with children were less responsive to air pollution. Nevertheless, this
may simply be caused by the endogeneity of the average AQI. Indeed, this association goes away
after I instrument for the average AQI. The IV result also suggests that a one-year increase in the
household head’s education increased the marginal effect of the average AQI by roughly 1
percentage point (0.007 in Table V), so more educated households were more responsive to air
pollution.
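The heterogeneous effect can be read off the IV column of Table V as a function of household characteristics. The sketch below assumes that years of education enter the regression in raw (not demeaned) units, which the text does not state; the function name is introduced here for illustration.

```python
# IV coefficients from Table V, Column (3).
B_AQI = 0.238
B_AQI_X_EDUC = 0.007
B_AQI_X_INCOME = -0.000   # rounds to zero in the table
B_AQI_X_CHILD = -0.026    # not statistically significant

def marginal_effect_of_aqi(years_of_education, family_income=0.0, has_child=False):
    """Effect of a one-SD increase in average AQI on the probability of having a migrant."""
    return (B_AQI
            + B_AQI_X_EDUC * years_of_education
            + B_AQI_X_INCOME * family_income
            + B_AQI_X_CHILD * float(has_child))

effect_6_years = marginal_effect_of_aqi(6)    # 0.238 + 0.007 * 6
effect_12_years = marginal_effect_of_aqi(12)  # 0.238 + 0.007 * 12
```

Under these assumptions, six additional years of schooling raise the estimated effect by about 4 percentage points.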
To explore whether the positive effect of the average AQI on the probability of having a
migrant masks any non-linearity, I add the quadratic term of the average AQI as well as the
interaction terms of the quadratic term and household head’s years of education, family income,
and having a child. Table VI shows the results including these terms. The quadratic terms are
statistically significant only for the interactions with family income and having a child in the
fixed effects regression, and none of them is significant in the IV regression. Thus, I find no
evidence of non-linear effects.
B. Location Choice
Table VII reports the coefficient estimates from estimating the conditional logit model in
Equation (3). Although these are not the marginal effects, the signs of the coefficient estimates
inform us of the signs of the marginal effects. Without instrumenting for 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1, I
find that the coefficient estimate for 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1 is negative and statistically significant,
indicating that people were less likely to choose a city with more air pollution. The coefficient
estimates for 𝐻𝑎𝑣𝑒𝐵𝑒𝑒𝑛𝑖𝑗𝑡 and 𝐻𝑎𝑣𝑒𝐵𝑒𝑒𝑛𝑖𝑗𝑡 × 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1 have the expected signs,
indicating that an individual was more likely to choose a city if he/she had been to that city
before, and that an individual was more responsive to air pollution in a city he/she had been to
before, presumably due to better information regarding the air pollution in that city. However, the
coefficient estimate for 𝐻𝑎𝑣𝑒𝐵𝑒𝑒𝑛𝑖𝑗𝑡 × 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1 is not statistically significant. The
coefficient estimates for 𝐶𝑢𝑟𝑟𝑒𝑛𝑡𝑖𝑗𝑡−1 and 𝐶𝑢𝑟𝑟𝑒𝑛𝑡𝑖𝑗𝑡−1 × 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1 have
unexpected signs, although the former coefficient estimate is not statistically significant; the
positive sign of the latter coefficient estimate may be due to the endogeneity of
𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1.
Column (2) reports the results after I instrument for 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1 with air pollution
from distant sources. The coefficient estimate for 𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1 is more negative (-0.086
versus -0.052), suggesting that the estimate is biased upward if I do not instrument for
𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝐴𝑃𝐼𝑖𝑗𝑡−1. None of the coefficient estimates for the other regressors is statistically
significant. Thus, people were less likely to choose a city with more air pollution, but I cannot
conclude whether or not people were more responsive to air pollution in a city if they were in
that city the year before or if they had been to that city before.
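Although the coefficients in Table VII are not marginal effects, in a conditional logit the exponentiated coefficient gives the relative odds of choosing one of two otherwise identical cities. The sketch below applies this to cities the individual neither currently lives in nor has visited, so that only 𝛼 matters. It takes the table's units for the API at face value, since the text does not say whether the API is rescaled; the function name is introduced here.

```python
import math

coef_no_iv = -0.052   # Table VII, Column (1)
coef_iv = -0.086      # Table VII, Column (2)

def relative_odds(coef, api_difference=1.0):
    """Odds of choosing the more polluted of two otherwise identical cities."""
    return math.exp(coef * api_difference)

odds_no_iv = relative_odds(coef_no_iv)   # about 0.949
odds_iv = relative_odds(coef_iv)         # about 0.918
```

Read this way, a one-unit higher API lowers the odds of a city being chosen by about 5 percent without the instrument and by about 8 percent with it.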
If air pollution did not cause out-migration but “only” determined destination once an
individual had already decided to migrate, I should only observe an effect (𝛼 < 0) in the
conditional logit, and I should not observe a positive effect in the out-migration equation. In fact,
my results do suggest a negative effect (𝛼 < 0) in the conditional logit and a positive effect on
out-migration. I take these results as evidence that air pollution did cause out-migration.
V. Robustness
A. Comparison Between Interpolated Values and True Values
In this section, I present an additional piece of evidence that supports the interpolation
method I adopt, in addition to Table III. In particular, I test the reliability of the interpolation by
comparing the interpolated values and the true values of the average API for 10 randomly
selected cities monitored by the MEP in 2013, the last year in the analysis of location choice.
Since there were 68 monitored cities in 2013, the interpolation is performed using the average
API of the other 58 monitored cities. Table VIII shows the 10 cities, their locations, and the
interpolated values and the true values of the average API. The interpolated values are
reasonably close to the true values, with a correlation of 0.8861. Furthermore, since 190 cities
were monitored for the AQI in 2014, the interpolation on these cities for the analysis of out-
migration should be even more reliable.
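The hold-out comparison can be checked directly from the values in Table VIII. The sketch below recomputes the correlation, which should come out close to the reported 0.8861, and additionally expresses the root-mean-square error in units of the cross-city standard deviation of the API (15.17); the latter statistic is an addition made here and is not reported in the paper.

```python
import numpy as np

# Interpolated and true average API of the 10 hold-out cities in Table VIII.
interpolated = np.array([72.49, 82.95, 77.62, 69.65, 66.70,
                         57.85, 89.80, 54.16, 81.13, 88.23])
true_api = np.array([67.64, 100.43, 64.98, 63.01, 54.17,
                     56.19, 95.25, 53.10, 76.31, 101.85])

# Pearson correlation between interpolated and true values.
correlation = np.corrcoef(interpolated, true_api)[0, 1]

# Root-mean-square interpolation error, in API points and in cross-city SD units.
rmse = np.sqrt(np.mean((interpolated - true_api) ** 2))
rmse_in_sd = rmse / 15.17
```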
B. Other Exclusion Distances and Pollutants
In Table IX, I present the main out-migration results for varying exclusion distances that
I use to calculate the instrument and with different pollutants as the regressors of interest. As
mentioned above, the exclusion distance is the distance within which an emitter city is not
counted toward the air pollution from distant sources of a receiver city. For all columns, I
employ having a migrant in the household as the dependent variable and include household and
year fixed effects. The main coefficient estimates of all specifications are positive and
statistically significant. The magnitude of the coefficient estimates is generally increasing in the
exclusion distance, consistent with the fact that larger exclusion distances imply that the
instrument is less correlated with local economic activities. At the same time, larger exclusion
distances do not translate into less correlation between the instrument and the pollutant
concentration, as shown in the first stage and its F-statistics. This further lends credibility to my
choice of 120km as the exclusion distance.
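The role of the exclusion distance can be illustrated with a simplified version of the instrument. The sketch assumes an unweighted sum of emissions from all cities farther away than the exclusion distance; the paper's actual construction may weight emitters differently. The coordinates come from Table VIII, while the emission figures and function names are made up for illustration.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (longitude, latitude) points."""
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def distant_pollution(receiver, cities, emissions, exclusion_km):
    """Sum emissions of all other cities farther than exclusion_km from the receiver."""
    lon_r, lat_r = cities[receiver]
    total = 0.0
    for name, (lon, lat) in cities.items():
        if name != receiver and haversine_km(lon_r, lat_r, lon, lat) > exclusion_km:
            total += emissions[name]
    return total

# Coordinates from Table VIII; the emission figures are hypothetical.
cities = {"Jining": (116.59, 35.41), "Zaozhuang": (117.32, 34.81),
          "Weifang": (119.16, 36.71), "Rizhao": (119.53, 35.42)}
emissions = {"Jining": 12.0, "Zaozhuang": 8.0, "Weifang": 10.0, "Rizhao": 5.0}

at_50km = distant_pollution("Jining", cities, emissions, exclusion_km=50)
at_120km = distant_pollution("Jining", cities, emissions, exclusion_km=120)
```

Zaozhuang lies roughly 94 km from Jining, so it contributes to Jining's instrument at the 50km exclusion distance but drops out at 120km, which is how a larger exclusion distance removes nearby, economically linked emitters.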
VI. Conclusion
In this paper, I have studied whether people in China moved from more polluted cities to
less polluted cities. By using the API and the AQI data published by the MEP and the 2014 and
2016 CLDS sample, I have offered different strategies to study the effect of air pollution in the
origin and the destination: by estimating a fixed-effect model and by estimating a model of
location choice. In both models, I have implemented IV strategies to address the potential
concern that air pollution is endogenous to local economic activities.
I have found that a one-standard deviation increase in the average AQI increased the
probability of having a migrant by 29 percentage points. I have also found that more educated
households were more responsive to air pollution. In particular, a one-year increase in the
household head’s education increased the marginal effect of the average AQI by 1 percentage
point. I have not found any non-linear effects. Moreover, I have found that people were less
likely to choose a city with more air pollution. From these results, I conclude that air pollution
did cause out-migration.
Many people living in heavily polluted Chinese cities have chosen to adopt particulate-
filtering facemasks and air filter products to cope with air pollution (Zhang and Mu 2016; Sun et
al. 2017). However, there are other ways in which people might have responded to air pollution.
One of them can be migration. In the economics literature, migration has been seen as a
technology that mitigates the impacts of negative income shocks (Blanchard and Katz 1992).
Based on the evidence I have provided, migration could have been chosen as a way to mitigate
the health impacts of air pollution. It remains to be shown that migration indeed improved
people’s health outcomes.
Furthermore, even though people moved in response to air pollution, businesses might
not have moved accordingly. For example, if demand for doctors in the destination is limited, a
doctor who migrated might have to take on another profession. This mismatch may result in a
change in the return to education in some areas. Whether this change happened merits future
research.
References
Andrews, Steven Q. "Inconsistencies in air quality metrics: 'Blue Sky' days and PM10
concentrations in Beijing." Environmental Research Letters 3.3 (2008): 034009.
Banzhaf, Spencer H., and Randall P. Walsh. "Do people vote with their feet? An empirical test of
Tiebout's mechanism." The American Economic Review 98.3 (2008): 843-863.
Blanchard, Olivier Jean, and Lawrence F. Katz. "Regional evolutions." Brookings Papers on
Economic Activity 1992.1 (1992): 1-75.
Brandt, Loren, and Thomas G. Rawski, eds. China's great economic transformation. Cambridge
University Press, 2008.
Chay, Kenneth Y., and Michael Greenstone. "Does air quality matter? Evidence from the
housing market." Journal of Political Economy 113.2 (2005): 376-424.
Chen, Shuai, Paulina Oliva, and Peng Zhang. "The Effect of Pollution on Migration: Evidence
from China." No. w24036. National Bureau of Economic Research, 2017.
Chen, Yong, and Stuart S. Rosenthal. "Local amenities and life-cycle migration: Do people move
for jobs or fun?." Journal of Urban Economics 64.3 (2008): 519-537.
Chen, Yuyu, et al. Gaming in Air Pollution Data?: Lessons from China. No. w18729. National
Bureau of Economic Research, 2013.
Cressie, Noel. Statistics for spatial data. John Wiley & Sons, 2015.
Currie, Janet, et al. "Does pollution increase school absences?." The Review of Economics and
Statistics 91.4 (2009): 682-694.
Dasgupta, Susmita, et al. "Confronting the environmental Kuznets curve." The Journal of
Economic Perspectives 16.1 (2002): 147-168.
Gauderman, W. James, et al. "The effect of air pollution on lung development from 10 to 18
years of age." New England Journal of Medicine 351.11 (2004): 1057-1067.
Ghanem, Dalia, and Junjie Zhang. "Effortless Perfection: Do Chinese Cities Manipulate 'Blue
Skies'?" (2013).
Grainger, Corbett A. "The distributional effects of pollution regulations: Do renters fully pay for
cleaner air?." Journal of Public Economics 96.9 (2012): 840-852.
Isaaks, Edward H., and Mohan R. Srivastava. Applied geostatistics. 1989.
Jin, Y-Q., and F. Yan. "Monitoring sandstorms and desertification in northern China using
SSM/I data and Getis statistics." International Journal of Remote Sensing 25.11 (2004): 2053-
2060.
Li, Ding, Yan Zhang, and Shuang Ma. "Would Smog Lead to Outflow of Labor Force?
Empirical Evidence from China." Emerging Markets Finance and Trade just-accepted (2017).
Li, Teng, Haoming Liu, and Alberto Salvo. Severe air pollution and labor productivity. No.
8916. IZA Discussion Papers, 2015.
Kumar, Naresh, et al. "Satellite remote sensing for developing time and space resolved estimates
of ambient particulate in Cleveland, OH." Aerosol Science and Technology 45.9 (2011): 1090-
1108.
McFadden, Daniel. 1974. “Analysis of Qualitative Choice Behavior.” Frontiers in Econometrics,
ed. Paul Zarembka. New York: Academic Press.
National Health and Family Planning Commission, Department of Services and Management for
Migrant Population, Report on China’s Migrant Population Development 2016, China
Population Publishing House (2016)
Qin, Yu, and Hongjia Zhu. Run Away? Air Pollution and Emigration Interests in China.
Working Paper, 2015.
Smith, V. Kerry, and Ju-Chin Huang. "Can markets value air quality? A meta-analysis of
hedonic property value models." Journal of Political Economy 103.1 (1995): 209-227.
Stafford, Tess M. "Indoor air quality and academic performance." Journal of Environmental
Economics and Management 70 (2015): 34-50.
Stern, David I. "Progress on the environmental Kuznets curve?." Environment and Development
Economics 3.02 (1998): 173-196.
Simpson, James J., et al. "Airborne Asian dust: case study of long-range transport and
implications for the detection of volcanic ash." Weather and Forecasting 18.2 (2003): 121-141.
Sullivan, Daniel M. "Residential Sorting and the Incidence of Local Public Goods: Theory and
Evidence from Air Pollution." Resources for the Future Working Paper (2016).
Sun, Cong, Matthew E. Kahn, and Siqi Zheng. "Self-protection investment exacerbates air
pollution exposure inequality in urban China." Ecological Economics 131 (2017): 468-474.
Train, Kenneth E. Discrete choice methods with simulation. Cambridge University Press, 2009.
Wang, Feng, and Xuejin Zuo. "Inside China's cities: Institutional barriers and opportunities for
urban migrants." The American Economic Review 89.2 (1999): 276-280.
World Bank national accounts data, and OECD National Accounts data files
World Health Organization, Ambient Air Pollution Database, May 2016
Yang, Gonghuan, et al. "Rapid health transition in China, 1990–2010: findings from the Global
Burden of Disease Study 2010." The Lancet 381.9882 (2013): 1987-2015.
Zhang, Junjie, and Quan Mu. "Air pollution and defensive expenditures: Evidence from
particulate-filtering facemasks." (2016).
Zhang, Xin, Xiaobo Zhang, and Xi Chen. "Happiness in the air: How does dirty sky affect
subjective well-being?." (2015).
Zheng, Siqi, and Matthew E. Kahn. "Understanding China's urban pollution dynamics." Journal
of Economic Literature 51.3 (2013): 731-772.
Zheng, Siqi, Matthew E. Kahn, and Hongyu Liu. "Towards a system of open cities in China:
Home prices, FDI flows and air quality in 35 major cities." Regional Science and Urban
Economics 40.1 (2010): 1-10.
Zheng, Siqi, et al. "Real estate valuation and cross-boundary air pollution externalities: evidence
from Chinese cities." The Journal of Real Estate Finance and Economics 48.3 (2014): 398-414.
Zivin, Joshua Graff, and Matthew Neidell. "The impact of pollution on worker
productivity." The American Economic Review 102.7 (2012): 3652-3673.
Figure Ia: Distribution of Days by Pollution Level (API) from 2000-2013
Figure Ib: Distribution of Days by Pollution Level (AQI) in 2014 and 2016
Figure II: Average Daily AQI of Four Major Cities in May and June of 2014
Figure III: Prediction Map of Interpolating the AQI in 2014 Using Ordinary Kriging
Figure IV: Number of Cities Monitored Daily for API by the MEP Over Time
Table I: Summary Statistics by Whether a Household Sent Out a Migrant
                                    2014        2014        2014        2014        2016        2016        2016        2016
                                    CLDS        Households  Households  Difference  CLDS        Households  Households  Difference
                                    Households  with        without                 Households  with        without
                                                Migrants    Migrants                            Migrants    Migrants
                                    (1)         (2)         (3)         (3)-(2)     (4)         (5)         (6)         (6)-(5)
Number of Children in Household     0.74        0.93        0.69        -0.24***    0.77        0.91        0.68        -0.23***
                                    (0.93)      (1.07)      (0.88)      [0.03]      (0.96)      (1.06)      (0.88)      [0.02]
Household Head Age                  53.69       54.73       53.41       -1.33***    55.28       55.33       55.25       -0.08
                                    (13.31)     (11.64)     (13.71)     [0.37]      (13.21)     (11.52)     (14.09)     [0.31]
Household Head Years of Education   8.09        7.30        8.30        1.01***     8.20        7.54        8.57        1.04***
                                    (4.07)      (3.63)      (4.16)      [0.11]      (3.97)      (3.65)      (4.10)      [0.09]
Household Head Hukou (1=Rural)      1.51        1.30        1.57        0.26***     1.64        1.25        1.87        0.62***
                                    (2.83)      (3.47)      (2.63)      [0.08]      (3.27)      (0.67)      (4.06)      [0.08]
Total Family Income (¥,000s)        51.07       42.62       53.32       10.70***    56.79       47.17       62.33       15.16***
                                    (86.94)     (79.35)     (88.72)     [2.42]      (95.28)     (79.17)     (103.03)    [2.24]
Having a Migrant in Household       0.21        1.00        0.00        -           0.37        1.00        0.00        -
(1=Having a Migrant)                (0.41)      (0.00)      (0.00)      -           (0.48)      (0.00)      (0.00)      -
N                                   7744        1625        6119                    7744        2827        4917
Notes: Standard deviations are in parentheses; standard errors are in square brackets. The difference columns report
households without migrants minus households with migrants. ***, **, and * indicate statistically significant
differences at the one, five, and ten percent levels, respectively.
Table IIa: Comparing the Means of the AQI of Major Cities and of Non-Major Cities
            By provincial capital                       By cities previously monitored for the API
Variable    Provincial-level        Non-provincial      Cities previously       Previously
            municipalities and      capital cities      monitored for           un-monitored
            provincial capitals                         the API                 cities
AQI         108.2                   106.7               105.8                   109.0
            (57.85)                 (54.34)             (54.45)                 (55.72)
N           10,999                  56,938              43,507                  24,430
Note: The Ministry of Environmental Protection (MEP) monitored the API for around 86 cities from
06/04/2000 to 12/29/2013 and the AQI for around 330 cities from 12/31/2013 onward.
Table IIb: Comparing the Distributions (Percent) of the Pollution Levels of Major Cities
and of Non-Major Cities
                      By provincial capital                       By cities previously monitored for the API
Pollution level       Provincial-level        Non-provincial      Cities previously       Previously
                      municipalities and      capital cities      monitored for           un-monitored
                      provincial capitals                         the API                 cities
Excellent or good     58.50%                  59.01%              60.19%                  56.68%
Lightly polluted      25.82%                  25.81%              25.31%                  26.70%
Moderately polluted   8.36%                   8.50%               7.90%                   9.51%
Heavily polluted      5.89%                   5.66%               5.50%                   6.05%
Severely polluted     1.43%                   1.02%               1.10%                   1.06%
Total                 100%                    100%                100%                    100%
Note: The MEP classifies a day in a city with the AQI at or below 50 as “excellent”, a day in a city with the
AQI between 51 and 100 as “good”, a day in a city with the AQI between 101 and 150 as “lightly polluted”, a
day in a city with the AQI between 151 and 200 as “moderately polluted”, a day in a city with the AQI
between 201 and 300 as “heavily polluted”, and a day in a city with the AQI above 300 as “severely
polluted”.
Table III: Correlation Between Air Quality Measures and Local Characteristics
                          AQI         Pollution from   Pollution from   Pollution from
                                      Sources >50km    Sources >80km    Sources >120km
AQI                       1*          0.3635*          0.3675*          0.4740*
Per Capita GRP            -0.0281*    0.0580           0.0611           0.0681
Gross Industrial Output   0.1441*     0.0401           0.0493*          0.0822*
Unemployment Rate         0.1000*     -0.0359          -0.0314          0.0447
Notes: Each cell contains the correlation between the corresponding city characteristic (listed in the left-hand column) and the measure of air quality (listed in the
top row) in the city. The air quality measures are the AQI, air pollution from sources more than 50 km away from the receiving city, air pollution from sources more
than 80 km away from the receiving city, and air pollution from sources more than 120 km away from the receiving city. * indicates a coefficient statistically
significantly different from 0 at the 20 percent level when regressing the air quality measure on the city characteristic on a sample of cities.
Table IV: The Effect of Air Pollution in the Origin on Out-migration
Dependent Variable: Having a Migrant in the Household
                                   (1)         (2)         (3)
                                   OLS         FE          IV
Average AQI                        -0.027***   -0.024***   0.286***
                                   (0.004)     (0.009)     (0.088)
Household and Year Fixed Effects   N           Y           Y
First Stage:                                               0.372***
                                                           (0.059)
F Statistic of First Stage:                                36.22
R2                                 0.05        0.09        -
Mean of Dep. Var.                  0.315       0.315       0.315
F                                  191.819     135.461     88.844
N                                  14289       14289       14092
Notes: The IV regression uses air pollution from distant sources as the instrument for average AQI. The
average AQI and air pollution from distant sources are normalized to z-scores. All regressions control for
per capita GRP, gross industrial output, and unemployment rate. All regressions apply sampling weights.
Standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.
Table V: The Heterogeneous Effects of Air Pollution in the Origin on Out-migration
Dependent Variable: Having a Migrant in the Household
                                      (1)         (2)         (3)
                                      OLS         FE          IV
Average AQI                           -0.041***   -0.010      0.238***
                                      (0.010)     (0.014)     (0.078)
Household Head's Years of Education   -0.007***   -0.001      -0.001
                                      (0.001)     (0.002)     (0.002)
Average AQI × Years of Education      0.001       -0.000      0.007**
                                      (0.001)     (0.001)     (0.003)
Family Income (¥,000s)                -0.000      -0.000      -0.000
                                      (0.000)     (0.000)     (0.000)
Average AQI × Family Income           0.000       0.000       -0.000
                                      (0.000)     (0.000)     (0.000)
Having a Child                        0.058***    -0.025      -0.033
                                      (0.008)     (0.020)     (0.021)
Average AQI × Having a Child          0.008       -0.032***   -0.026
                                      (0.008)     (0.011)     (0.022)
R2                                    0.06        0.09        -
Mean of Dep. Var.                     0.316       0.316       0.316
F                                     87.622      61.751      40.392
N                                     14142       14142       13802
Notes: The IV regression uses air pollution from distant sources as the instrument for average AQI. The
average AQI and air pollution from distant sources are normalized to z-scores. All regressions apply
sampling weights. Standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.
Table VI: The Non-linear Effects of Air Pollution in the Origin on Out-migration
Dependent Variable: Having a Migrant in the Household
                                           (1)         (2)         (3)
                                           OLS         FE          IV
Average AQI                                -0.037***   -0.036*     0.077
                                           (0.011)     (0.021)     (0.156)
Average AQI Squared                        -0.007      0.010       -0.055
                                           (0.010)     (0.014)     (0.116)
Household Head's Years of Education        -0.006***   -0.001      -0.008
                                           (0.001)     (0.002)     (0.012)
Average AQI × Years of Education           0.002*      -0.001      -0.006
                                           (0.001)     (0.002)     (0.013)
Average AQI Squared × Years of Education   -0.001      0.001       0.008
                                           (0.001)     (0.001)     (0.014)
Family Income (¥,000s)                     -0.000      -0.000*     0.000
                                           (0.000)     (0.000)     (0.000)
Average AQI × Family Income                0.000       0.000       0.000
                                           (0.000)     (0.000)     (0.000)
Average AQI Squared × Family Income        -0.000      0.000*      -0.000
                                           (0.000)     (0.000)     (0.001)
Having a Child                             0.057***    -0.045**    -0.134
                                           (0.011)     (0.022)     (0.112)
Average AQI × Having a Child               0.005       -0.045***   -0.145
                                           (0.009)     (0.015)     (0.126)
Average AQI Squared × Having a Child       0.002       0.018*      0.105
                                           (0.008)     (0.011)     (0.111)
R2                                         0.06        0.09        0.07
Mean of Dep. Var.                          0.316       0.316       0.316
F                                          63.972      47.368      34.237
N                                          14142       14142       13802
Notes: The IV regression uses air pollution from distant sources as the instrument for average AQI and air
pollution from distant sources squared as the instrument for average AQI squared, etc. The average AQI
and air pollution from distant sources are normalized to z-scores. All regressions apply sampling weights.
Standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.
Table VII: Coefficient Estimates of Conditional Logit Model
Dependent Variable: City Being Chosen
                           (1)          (2)
Instrument for API:        No           Yes
Location API               -0.052***    -0.086***
                           (0.025)      (0.029)
Current                    -1.013       0.896
                           (0.926)      (3.763)
Current × Location API     0.103***     0.069
                           (0.017)      (0.065)
Have Been                  5.866***     1.035
                           (1.845)      (4.099)
Have Been × Location API   -0.008       0.073
                           (0.030)      (0.070)
N                          56,836,194   56,836,194
Note: The sample includes all 23,594 individuals in the 2014 CLDS. All
individuals are allowed to choose from 219 Chinese cities, an exhaustive set
of cities represented by these individuals’ locations from 2004 to 2014.
Standard errors are in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.
Table VIII: Comparison Between the Interpolated API of 10 Randomly Selected Cities with
Their True Values in 2013
City Name        Province       Longitude   Latitude   Interpolated API   True API
Weinan City      Shaanxi        109.50      34.50      72.49              67.64
Weifang City     Shandong       119.16      36.71      82.95              100.43
Yueyang City     Hunan          113.13      29.36      77.62              64.98
Mianyang City    Sichuan        104.68      31.47      69.65              63.01
Karamay City     Xinjiang       84.89       45.58      66.70              54.17
Yuxi City        Yunnan         102.53      24.35      57.85              56.19
Jining City      Shandong       116.59      35.41      89.80              95.25
Qiqihar City     Heilongjiang   123.92      47.35      54.16              53.10
Rizhao City      Shandong       119.53      35.42      81.13              76.31
Zaozhuang City   Shandong       117.32      34.81      88.23              101.85
Correlation between the interpolated API and the true API: 0.8861
Standard Deviation of the API of the 68 monitored cities in 2013: 15.17
Note: The table compares the predicted API and the true API of 10 randomly selected cities among all 68 monitored
cities in 2013. The interpolation is performed over the other 58 cities using ordinary kriging.
Table IX: Robustness Check with Other Exclusion Distances and Pollutants
Dependent Variable: Having a Migrant in the Household
                      (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)       (10)      (11)      (12)
Pollutant:            AQI                           PM10                          PM2.5                         SO2
Instrument:           Sum of Dust and SO2           Dust                          Dust                          SO2
Exclusion Distance:   >50km     >80km     >120km    >50km     >80km     >120km    >50km     >80km     >120km    >50km     >80km     >120km
Average AQI           0.160***  0.166***  0.234***
                      (0.049)   (0.052)   (0.059)
Average PM10                                        0.087***  0.082***  0.114***
                                                    (0.026)   (0.027)   (0.030)
Average PM2.5                                                                     0.122***  0.112***  0.154***
                                                                                  (0.038)   (0.037)   (0.041)
Average SO2                                                                                                     0.119***  0.115***  0.154***
                                                                                                                (0.037)   (0.035)   (0.036)
First Stage:          0.271***  0.271***  0.372***  0.241***  0.240***  0.287***  0.259***  0.257***  0.301***  0.299***  0.285***  0.378***
                      (0.061)   (0.061)   (0.059)   (0.061)   (0.061)   (0.060)   (0.059)   (0.059)   (0.058)   (0.060)   (0.060)   (0.059)
First Stage F Stat:   25.1      25.09     36.22     24.46     24.34     28.38     32.18     32.02     36.49     30.29     28.91     39.73
Mean of Dep. Var.     0.314     0.314     0.314     0.314     0.314     0.314     0.314     0.314     0.314     0.314     0.314     0.314
F                     354.26    351.733   327.331   376.843   377.306   371.027   365.294   367.818   356.165   380.905   381.634   378.565
N                     15356     15356     15356     15356     15356     15356     15356     15356     15356     15356     15356     15356
Notes: PM10 is particulate matter with diameter less than 10 micrometers; PM2.5 is particulate matter with diameter less than 2.5 micrometers. The concentrations of PM10,
PM2.5, and SO2 are originally in micrograms per cubic meter, and are normalized to z-scores. The average AQI and air pollution from distant sources are also normalized to
z-scores. Dust and SO2 emissions are from the China City Statistical Yearbook. The exclusion distance is the distance within which an emitter city is not counted toward the
air pollution from distant sources of a receiver city. All regressions apply sampling weights. Standard errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01.