

Remote Sensing of Environment 130 (2013) 219–232

Contents lists available at SciVerse ScienceDirect

Remote Sensing of Environment

journal homepage: www.elsevier.com/locate/rse

Generation of fine-scale population layers using multi-resolution satellite imagery and geospatial data☆

Derek Azar a, Ryan Engstrom a,b, Jordan Graesser a, Joshua Comenetz a,⁎

a U.S. Census Bureau, 4600 Silver Hill Road, Washington, DC 20233, United States
b Department of Geography, The George Washington University, 1922 F Street NW (Old Main), Washington, DC 20052, United States

☆ This paper is released to inform interested parties of research and to encourage discussion. Any views expressed on statistical, methodological, or technical issues are those of the author(s) and not necessarily those of the U.S. Census Bureau.
⁎ Corresponding author. Tel.: +1 301 763 1408; fax: +1 301 763 2516.
E-mail address: [email protected] (J. Comenetz).

0034-4257/$ – see front matter. Published by Elsevier Inc.
http://dx.doi.org/10.1016/j.rse.2012.11.022

article info

Article history:
Received 2 March 2012
Received in revised form 19 November 2012
Accepted 23 November 2012
Available online 29 December 2012

Keywords:
Population distribution
Built-up area
Pakistan
Dasymetric
CART

abstract

A gridded population dataset was produced for Pakistan by developing an algorithm that distributed population either on the basis of per-pixel built-up area fraction or the per-pixel value of a weighted population likelihood layer. Per-pixel built-up area fraction was calculated using a classification and regression trees (CART) methodology integrating high- and medium-resolution satellite imagery. The likelihood layer was produced by weighting different geospatial layers according to their effect on the likelihood of population being found in the particular pixel. The geospatial layers integrated into the likelihood layer were: 1) proximity to remotely sensed built-up pixels, 2) density of settlement points in a fixed kernel, 3) slope, 4) elevation, and 5) heterogeneity of landcover types found within a search radius. The method for weighting these layers varied according to settlement patterns found in the provinces of Pakistan. Differences in zonal population estimates generated from the 100-meter gridded population layer resulting from this study, Oak Ridge National Laboratory's LandScan (2002), and CIESIN's Gridded Population of the World and Global Rural Urban Mapping Project (GPW and GRUMP) are examined. Population estimates for small areas produced using this paper's method were found to differ from census counts to a lesser degree than those produced using LandScan, GPW, or GRUMP. The root mean square error (RMSE) for small area population estimates for this method, LandScan, GPW, and GRUMP were 31,089, 48,001, 100,260, and 72,071, respectively.

Published by Elsevier Inc.

1. Introduction

Readily available and accurate data on spatial population distribution have multiple uses for humanitarian relief, disaster response planning, and development assistance. To assist in these areas, the U.S. Census Bureau, using remotely sensed imagery and population census data, created a highly detailed population distribution dataset for Pakistan. This work is part of the Census Bureau's ongoing Populations at Risk initiative, which follows a 2007 National Research Council (NRC) report that called for the Census Bureau to produce high-resolution, subnational estimates for those populations that are at risk of exposure to natural disasters and complex humanitarian emergencies.

Population data are a fundamental component of both policymaking and disaster response. In order to be useful, however, baseline data must be timely, detailed, and spatially enabled (Noji, 2005). These criteria can be met by reliable and recent population censuses and surveys. Many countries, however, including some prone to natural disasters and humanitarian emergencies, lack recent census and survey data. Additionally, the lack of a link between population data and digital geographic boundaries can further reduce the availability of information on populations at risk. In this case, data tables may exist for small areas at lower levels of administrative geography. However, without a link between digital maps and demographic data, users are limited to what can be gleaned from place names found within the census or survey. Researchers have recognized for over a decade that the technical difficulty linking human geography and data produced from remotely sensed sources is a barrier to the increased use of satellite data (Rindfuss & Stern, 1998). The absence of population data, either because the data are not collected or lack useful accompanying geographical data, is a formidable obstacle to policymaking and humanitarian response in parts of the developing world (NRC, 2007).

Satellite imagery can be collected over large areas at relatively low cost compared to the price of conducting a nationally representative population census or household survey. The idea of exploiting the synoptic coverage of satellite imagery to compensate for the uneven temporal and spatial resolution of census data is not new in the geographic literature. Areas of human activity are readily apparent to the naked eye in remotely sensed imagery. Surface characteristics of built-up areas make them recognizable by a human interpreter. For example, built-up areas tend to be bright in all visible bands and tend to be highly textured relative to most natural surfaces. Human observers may use these textural cues in landscapes where building and natural materials are intermingled and when local materials are used for construction. Object-based classification of urban extents and settlement detection has focused on high-resolution imagery (Blaschke, 2010). There are examples of object-based classification used with medium-resolution satellite imagery to detect landscape features (e.g., forest patches, burned areas), but urban object-based remote sensing tends to focus on individual structures or linear features, rather than urban as a landcover.

Per-pixel classification of optical satellite imagery generally relies on the surface characteristics of texture and brightness in some way to classify an image. This methodology tends to have difficulty distinguishing between bright anthropogenic and natural surfaces, though texture-based indices can help ameliorate these difficulties and are used in this study (Guindon et al., 2004). However, human image interpreters can reduce error by using contextual clues that are not available in automated approaches. Consequently, the challenge for population geographers and remote sensing specialists is to 1) develop automated methods that can compensate for the missing experience and contextual awareness of the human interpreter when classifying built-up areas and 2) determine how best to relate built-up areas to population distribution.

1.1. Review of population distribution methodologies

Research into the use of satellite and related geospatial data to infer population distribution began during the 1960s. The longest thread of study in this area has focused on landcover categories in well-established urban areas in the United States and other developed countries (Lo, 1989, 2003; Tobler, 1969). Estimates of impervious area based on satellite data may also be used to indicate the extent of human settlement, without specific reference to per-pixel population density (Small, 2005; Small & Nicholls, 2003) or as an indication of human encroachment on natural systems (Goetz & Jantz, 2006). For studies where population density is modeled, the goal is to relate observable characteristics in satellite imagery to population distribution as recorded in censuses in an attempt to produce a higher-resolution population estimate. Wu et al. (2008) provide a useful typology for organizing population disaggregation research. In general, previous research has attempted to either 1) disaggregate census data collected at an administrative level, by tracts, or other census geography to some smaller unit, or 2) infer population counts using a per-pixel, continuous approach based on a set of related variables. Both approaches are driven by a mathematical relationship established between population count or density and a set of spatial variables. This relationship may be simple, especially in the case of dasymetric mapping, or more complex, such as a statistical estimate produced using multiple variable regressions (Hay et al., 2005; Liu et al., 2006).

Dasymetric mapping employs a series of spatial partitions to introduce a higher resolution level into a dataset than the level at which the data were originally captured. Geospatial layers with resolutions that exceed a population dataset's are used to parse that dataset into areas smaller than the base mapping unit (Eicher & Brewer, 2001). The goal is to use the higher resolution data to produce weights to disaggregate the population data, thus creating a higher resolution population dataset. Landcover data derived either from remote sensing or collected from other sources are an often-used layer for disaggregation (Holt et al., 2004; Liu & Clarke, 2002; Mennis, 2003). Nighttime light emission can also be used in combination with landcover data (Briggs et al., 2007), but glow filtering into neighboring pixels has been shown to substantially reduce accuracy with this approach (Small & Nicholls, 2003). Dasymetric mapping approaches also make use of remotely sensed variables similar to those used when attempting to conduct continuous, per-pixel analysis (Wu & Murray, 2007; Wu et al., 2008). The variables tend to be used differently, however, with zonal averages common in dasymetric mapping and per-pixel values used as input in continuous mapping approaches. Oak Ridge National Laboratory's LandScan global gridded population layer, for which the Census Bureau has provided the demographic base, is also produced using a weighted, dasymetric approach (Dobson et al., 2000).
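The weighting idea described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `disaggregate`, the equal-share fallback for zones with no ancillary signal, and the sample values are assumptions made here, not details taken from the cited studies.

```python
def disaggregate(zone_population, weights):
    """Split one zone's census count across its pixels in proportion
    to non-negative ancillary weights (for example, landcover scores)."""
    total = sum(weights)
    if total == 0:
        # Assumed fallback: with no ancillary signal, share the count equally.
        return [zone_population / len(weights)] * len(weights)
    return [zone_population * w / total for w in weights]


# Hypothetical zone of 1,000 people covering four pixels.
pixels = disaggregate(1000, [0.0, 1.0, 3.0, 1.0])
print(pixels)  # [0.0, 200.0, 600.0, 200.0]
```

Because each pixel receives a share of the zone total, the pixel values always sum back to the census count for the zone, which is the defining property of a dasymetric disaggregation.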

Continuous approaches attempt to associate directly spectral or spatial characteristics of pixels or groups of pixels with population density (Harvey, 2002; Li & Weng, 2005; Liu & Clarke, 2002; Liu et al., 2006) or with derived proxies for population density (Chen, 2002). Continuous approaches include those where a model is developed that allows for a per-pixel population estimate based on continuously variable input surfaces. Such models allow for the production of population estimates for extremely small areas, on the order of a neighborhood block. The success of per-pixel models has primarily been driven by ancillary data, such as zoning and cadastral data that are rarely available in data-scarce countries, and they have never been implemented at the national level. Martin et al. (2002) noted that both continuous and dasymetric approaches to population mapping using census data and ancillary geospatial layers produce results that may be even more useful than the census data as collected.

Research on population estimation using remote sensing sited in data-rich countries allows for the inclusion of high-resolution, reliable census data. Lu et al. (2006) achieved a high degree of accuracy relating population density and built-up area fraction using block-level census data for the United States and an ETM+ derived built-up area layer, masked using zoning information.

The Gridded Population of the World (GPW) and Global Rural–Urban Mapping Project (GRUMP) use related methodologies based on administrative areas, with the GRUMP approach explicitly accounting for urban population centers. The most recent version (version 3) of GPW disaggregated the 375,000 administrative polygons into an unsmoothed, gridded population product. Grid cells that overlap more than one administrative unit are allocated using proportional areal weighting. GRUMP is methodologically similar but incorporates urban footprints and associated population data as additional units for population disaggregation (Balk et al., 2006). This contrasts with the LandScan method where the total population of administrative areas may be altered. The United Nations Environment Programme produces a similar product, the Global Resource Information Database (GRID) population layer. The layer is designed to be integrated with other biophysical layers to allow for a quantification of exposure and risk. It distributes population using administrative boundaries and a model of accessibility to settlements based on the transportation network (Nelson, 2004).
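As a rough illustration of proportional areal weighting for a grid cell that straddles several administrative units, the sketch below assumes population is spread uniformly within each unit, so each unit contributes its density times the area of overlap. The function name `cell_population` and the numbers are hypothetical and are not drawn from the GPW documentation.

```python
def cell_population(overlaps):
    """Estimate a grid cell's population by areal weighting.

    overlaps: iterable of (unit_population, unit_area_km2, overlap_area_km2),
    one tuple per administrative unit that intersects the cell.
    """
    # Each unit contributes its uniform density times the overlapping area.
    return sum(pop * overlap / area for pop, area, overlap in overlaps)


# Hypothetical 1 km2 cell: a quarter lies in a dense unit, the rest in a sparse one.
print(cell_population([(10000, 50.0, 0.25), (4000, 100.0, 0.75)]))  # 80.0
```

Under this uniform-density assumption the totals of the administrative units are preserved when all cells are summed, which matches the contrast drawn above with methods that may alter unit totals.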

Hay et al. (2005) questioned the ability of various methodologies to produce meaningful information below the level of input census data and stated that ancillary data are insignificant compared to the influence of input administrative data on accuracy. The verification method in this paper uses census counts from low-level administrative units in response specifically to Hay et al. (2005) in order to demonstrate that methods that include ancillary data can add meaningful information below the input administrative level. Methods with more ancillary data tend to provide smoother surfaces and thus are cartographically appealing. However, cartographic appeal does not necessarily translate into meaningful information at a finer scale. Higher-resolution gridded population layers only add meaningful information when they demonstrably correlate with population distribution at the finer scale.

Tatem et al. (2007) and Linard et al. (2012) demonstrate that a dasymetric mapping approach using landcover data can produce more detailed population maps even in data-scarce regions such as parts of Africa. Their work provides an important example of population mapping outside North America and Europe. While the research is useful because of the lack of data generally available in Africa, artifacts due to the use of landcover data are evident. For example, the gradation of population density moving from a settlement core, to periphery, and then to hinterland, may be non-existent, with a blob-like appearance. Linard et al. (2012) noted that care must be taken when working with global datasets at urban scales and that the application of satellite data can allow for more accurate mapping in rural areas. The AfriPOP dataset methodology was developed from a public health perspective. For these applications, the determination of a denominator for prevalence rates is a key concern. The AfriPOP dataset is well suited to public health applications with its 100-meter resolution. However, AfriPOP lacks variability within its dasymetric mapping zones, which are based on landcover and settlement extents. The methodology tested here introduces variability at finer scales by considering the intensity of settlement, as indicated by percent built-up.

¹ Includes the part of Kashmir controlled by Pakistan.
² Referred to at the time of the census as North-west Frontier Province or NWFP.

Mubareka et al. (2008) provide a notable example of high-resolution population mapping in a data-scarce country by using a probability approach that combines several geospatial layers to model population density. This approach provides the basis for this study. The metric to determine the utility of this approach is set by Hay et al. (2005). The research question posed here is: can a methodology using ancillary data produce meaningful information below the resolution of the input administrative geography as measured by lower administrative level census population counts?
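One way to make this research question operational is to compare gridded estimates, summed over small validation areas, against the census counts for those areas using the root mean square error reported in the abstract. The sketch below is illustrative only; the sample figures are invented and do not come from the study.

```python
import math


def rmse(estimates, census_counts):
    """Root mean square error between estimated and observed populations."""
    if len(estimates) != len(census_counts):
        raise ValueError("inputs must have the same length")
    squared = [(e - c) ** 2 for e, c in zip(estimates, census_counts)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical validation settlements: zonal sums from a grid vs. census counts.
estimated = [1200, 950, 4100]
observed = [1000, 1000, 4000]
print(round(rmse(estimated, observed), 1))  # 132.3
```

A lower RMSE for one gridded product than for another, computed over the same validation areas, indicates that it tracks the finer-scale census counts more closely.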

1.2. Research on international population data and remotely sensed imagery at the U.S. Census Bureau

The Census Bureau developed a subnational, satellite imagery-derived population estimate methodology using dasymetric mapping techniques that produced population estimates at a 100-meter resolution for Haiti (Azar et al., 2010). Hereafter the 100-meter gridded population product will be referred to as the “100-meter grid.” The methodology uses remotely sensed estimates of per-pixel built-up area and a distribution algorithm with relatively few parameters, which was developed to ascertain the strength of the hypothesis that satellite-derived built-up area could be used to estimate population. The parameters of the model specified that population is distributed either: 1) directly proportional to built-up area in urban and more densely populated administrative units or 2) evenly in rural areas. In rural areas, built-up areas were used where detected, and elevation and slope thresholds were created to delimit areas that received no population in the distribution. Using a sample population dataset composed of over 100 settlements for validation, the new subnational population estimates were a significant improvement over estimates available from earlier global gridded population products.
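The two distribution rules can be sketched as follows. This is one possible reading of the description above, not the published algorithm: the function name `distribute`, the threshold values, and the choice to use built-up weights in rural units whenever any built-up area is detected are all assumptions made for illustration.

```python
def distribute(population, built_up, slope_deg, elevation_m, urban,
               max_slope_deg=25.0, max_elevation_m=2500.0):
    """Distribute one administrative unit's population across its pixels.

    built_up: per-pixel built-up fraction; slope_deg and elevation_m: per-pixel
    terrain values. The default thresholds are hypothetical.
    """
    if urban or sum(built_up) > 0:
        # Rule 1 (and rural units with detected settlement): proportional to built-up area.
        weights = list(built_up)
    else:
        # Rule 2: even spread over pixels that pass the terrain thresholds.
        weights = [1.0 if s <= max_slope_deg and e <= max_elevation_m else 0.0
                   for s, e in zip(slope_deg, elevation_m)]
    total = sum(weights)
    if total == 0:
        # No eligible pixel: this sketch leaves the unit unallocated.
        return [0.0] * len(weights)
    return [population * w / total for w in weights]


# Urban unit: population follows built-up fraction.
print(distribute(1000, [0.25, 0.5, 0.0, 0.25], [2, 3, 1, 4], [10, 12, 9, 15], urban=True))
# Rural unit with no detected built-up area: steep and high pixels get nothing.
print(distribute(300, [0, 0, 0, 0], [5, 40, 10, 3], [100, 200, 3000, 150], urban=False))
```

In the rural example only the first and last pixels pass both thresholds, so each receives half of the unit's population.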

For countries that are on average less densely populated than Haiti or where disasters or complex emergencies occur away from urban centers, the algorithm developed for Haiti could produce a population distribution pattern with little detail in rural areas. This pattern was the result of limited detection of built-up surfaces in rural areas and a simple slope threshold that excluded population from steep slopes. Ideally, the algorithm developed must cover the range of environments found in data-scarce countries.

Like Haiti, Pakistan experiences significant humanitarian emergencies, but the two countries differ in many ways. Haiti was chosen for study because of its vulnerability to hazards, small size, and relatively homogeneous cultural and physical environments. Pakistan was chosen precisely because it is large and culturally and physically heterogeneous. Pakistan (796,000 km²) covers slightly less than twice the area of California, more than 28 times the size of Haiti (Central Intelligence Agency, 2010). Haiti has a homogeneous climate and only two main types of terrain, inland hills and coastal flatlands. Environmental conditions are more varied in Pakistan, from the monsoon-dominated Indus delta to the arctic-like high-elevation alpine northern areas. Whereas Haiti has a relatively homogeneous population in terms of language, ethnicity, and religion, Pakistan is home to many ethnic groups and also has regions at different levels of development. The Pakistan project represented a next step in determining whether remotely sensed and geospatial data in combination with census data can produce meaningful population estimates at a resolution much finer than the input census data. The ability of the relationship between remotely sensed built-up area and census-measured population density to estimate current population based on updated imagery (more recent than the census date) was also examined.

In this research, we use a combination of two approaches: 1) the detection of built-up or impervious surfaces for distributing population on a per-pixel basis and 2) the use of built-up surfaces in a dasymetric mapping approach as part of an overall population distribution model. Azar et al. (2010) demonstrated that a dasymetric approach using built-up area greatly improved the accuracy of small area estimates, especially in dense cities. This study further validates that finding and explores whether the increased complexity of the distribution model, which weights ancillary geospatial data, yields an improved population distribution compared to our past work in Haiti. In Azar et al. (2010), small settlements in rural areas were not well-defined in the final population distribution. An improvement would be characterized by the new methodology producing more accurate population estimates of these small settlements.
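To give a sense of what weighting ancillary geospatial layers can look like, the sketch below combines per-pixel layer scores into a single likelihood value. The scheme used in this study is not specified at this point in the text, so a normalized weighted sum is swapped in purely for illustration; the layer names, the rescaling of each layer to the range 0 to 1, and the weights are all assumptions.

```python
def likelihood(layers, weights):
    """Combine per-pixel layer scores into one population likelihood score.

    layers: dict mapping layer name to a list of per-pixel scores in [0, 1].
    weights: dict mapping the same layer names to non-negative weights.
    """
    names = list(weights)
    n_pixels = len(layers[names[0]])
    total_weight = sum(weights.values())
    # Normalized weighted sum, so the result also stays within [0, 1].
    return [sum(weights[k] * layers[k][i] for k in names) / total_weight
            for i in range(n_pixels)]


# Hypothetical three-pixel example with two of the layer types named in the abstract.
layers = {
    "proximity_to_built_up": [1.0, 0.5, 0.0],
    "slope_suitability": [1.0, 1.0, 0.0],
}
weights = {"proximity_to_built_up": 3.0, "slope_suitability": 1.0}
print(likelihood(layers, weights))  # [1.0, 0.625, 0.0]
```

The resulting scores could then serve as the per-pixel weights in a dasymetric disaggregation of each administrative unit's census count.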

2. Data

2.1. Population data

The study area covered the entire country of Pakistan.¹ Pakistan is divided into 8 first-level administrative units. Balochistan, Khyber Pakhtunkhwa,² Punjab, and Sindh are provinces. There are also four territories. These are Azad-Kashmir, Federally Administered Tribal Areas (FATA), Gilgit-Baltistan, and Islamabad Capital Territory. For the purpose of the population distribution algorithm, the Islamabad Capital Territory was processed with Punjab. The provinces and territories were used to group data for image processing on the assumption that these boundaries represent a rough first cut of the cultural boundaries present in Pakistan. Both provinces and territories are further divided into 129 districts and 393 tehsils (third-level administrative units).

The most recent (1998) census in Pakistan provided the population base (Population Census Organization, 1998). The tehsils constituted the base unit for the population distribution algorithm. Tehsils are analogous to townships in the United States. The average area of the tehsils was 2223 km², with a minimum of 28 km² and a maximum of 24,992 km². The total population count also was available for each tehsil. The tehsils ranged from a maximum population of 3,778,172 to a minimum of 3247 and had a mean population of 346,554 with a standard deviation of 428,838. Fourth-level administrative units and below represented settlement points, which were used for both calibration and validation in the population distribution algorithm. Population counts for areas at the fourth level and below were available, but geographic information for these areas was not. The spatial extent of these areas was therefore inferred using imagery. A detailed explanation of the determination of spatial extent for settlement points is provided in Section 3.3.2.

We made minor edits to the names of administrative units found in the population data tables from the 1998 census of Pakistan and the geographic administrative boundaries obtained from the Department of State's Office of the Geographer, to ensure that the two datasets could be spatially joined. Only name edits necessary to ensure a successful join were made. Census data were then joined to the polygons and verified by checking that the numbers aggregated to the province and country totals reported in the census.
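The join-and-verify step described above can be illustrated with plain Python dictionaries. This is a simplified sketch under stated assumptions: the unit names, province labels, and counts are invented, and the record layout is hypothetical rather than the structure of the actual census tables or boundary files.

```python
def join_census_to_polygons(census, polygons):
    """Attach a population to each polygon record by exact name match."""
    missing = [p["name"] for p in polygons if p["name"] not in census]
    if missing:
        # Names that fail to match are the ones that would need a manual edit.
        raise KeyError(f"no census record for: {missing}")
    return [dict(p, population=census[p["name"]]) for p in polygons]


def province_sums(joined):
    """Aggregate joined populations to the province level."""
    sums = {}
    for rec in joined:
        sums[rec["province"]] = sums.get(rec["province"], 0) + rec["population"]
    return sums


# Hypothetical census table, polygon attributes, and published province totals.
census = {"Tehsil A": 120000, "Tehsil B": 80000, "Tehsil C": 50000}
polygons = [
    {"name": "Tehsil A", "province": "Province 1"},
    {"name": "Tehsil B", "province": "Province 1"},
    {"name": "Tehsil C", "province": "Province 2"},
]
published_totals = {"Province 1": 200000, "Province 2": 50000}

joined = join_census_to_polygons(census, polygons)
assert province_sums(joined) == published_totals  # verification step
```

A failed assertion here would signal either an unmatched or duplicated unit or a transcription error in the census table.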

The LandScan 2002 (Oak Ridge National Laboratory, 2002), GPW (2000 v3), and GRUMP (2000 v1) (Center for International Earth Science Information Network, 2000) datasets were also used to provide points of comparison to the 100-meter grid. These datasets had the closest available dates to the census. In addition, as will be shown, the LandScan, GPW, and GRUMP datasets showed an undercount relative to the population recorded in the census for a group of validation settlements. The date offset was viewed as being relatively insignificant because the later date would usually produce an overcount due to population growth.

2.2. Geospatial data

Combinations of medium- and high-resolution imagery data were used. A mosaic of Landsat 7 Enhanced Thematic Mapper Plus (ETM+) imagery was assembled with a target date as close to the 1998 date of the census as possible. All or parts of 72 Landsat ETM+ images were used to cover the entire study area. The individual scenes were converted to top of atmosphere reflectance and then mosaicked. The images were not color balanced because the distribution of training areas throughout the final mosaic was able to compensate for radiometric differences between scenes (Yang et al., 2003). An equal-area conic projection was used with standard parallels of 26.5°N and 32.5°N and a central meridian of 69°E.
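The conversion to top of atmosphere reflectance is not detailed in the text. The sketch below uses the standard planetary reflectance formula for Landsat data, ρ = π L d² / (ESUN cos θ), where L is at-sensor spectral radiance, d is the Earth-Sun distance in astronomical units, ESUN is the mean exoatmospheric solar irradiance for the band, and θ is the solar zenith angle. The function name and the sample values are illustrative assumptions, and the earlier step of converting digital numbers to radiance is omitted.

```python
import math


def toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_dist_au):
    """Convert at-sensor spectral radiance to top of atmosphere reflectance.

    radiance and esun must share the same spectral radiance units
    (for example, W / (m^2 sr um) and W / (m^2 um)).
    """
    # Solar zenith angle is the complement of the sun elevation angle.
    zenith = math.radians(90.0 - sun_elevation_deg)
    return math.pi * radiance * earth_sun_dist_au ** 2 / (esun * math.cos(zenith))


# Hypothetical pixel: the numbers are placeholders, not scene metadata.
rho = toa_reflectance(radiance=60.0, esun=1500.0,
                      sun_elevation_deg=55.0, earth_sun_dist_au=1.01)
print(round(rho, 3))
```

Normalizing each scene for sun angle and Earth-Sun distance in this way puts scenes acquired on different dates on a more comparable radiometric footing before mosaicking.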

A large number of high-resolution QuickBird (140) and IKONOS (11) scenes were obtained. The scenes were selected to cover the diversity of landcover found in each medium-resolution training region across Pakistan. The cost of imagery, a challenge for many academic researchers, was not a problem because of a government license for access to high-resolution imagery including QuickBird and IKONOS. The files were orthorectified using a 30-meter digital elevation model (DEM) and projected to match the medium-resolution data. Orthorectification increased co-registration between the high- and medium-resolution data. If it did not produce sufficient co-registration, ground control points were obtained to warp the high-resolution imagery to co-register with Landsat data to within half a pixel. In general, an entire high-resolution scene was not used. Instead, the image was subset into a chip designed to cover the diversity of landcover and settlement densities reflected in each high-resolution scene, similar to Azar et al. (2010). Briefly, a chip simply refers to a subset of a high-resolution image that captures the landscape variation found in the original image and does not include cloud or haze, while focusing on the primary settlement within the imagery.

In addition to the imagery data, we used an ecoregion digital boundary file, a DEM, and a landcover map. The ecoregion map defined large land or water areas that contain geographically distinct assemblages of natural communities that reflect biological areas found in Pakistan (Olson et al., 2001). The four groups of ecoregions we used in Pakistan are: 1) xeric woodlands, 2) northwest thorn scrub forests, 3) arid (deserts and semi-desert dry lands), and 4) Himalayan highlands. Some ecoregions, which are not shown, with small extents but similar characteristics, were collapsed into these four broad categories. Azad-Kashmir formed a single mapping unit because of its small size. Images were processed in groups based on these ecoregions and first-level administrative boundaries. A 30-meter Shuttle Radar Topography-derived U.S. government DEM with interpolation of missing values based on surrounding pixels was also obtained. The DEM covered the entire study area and was re-projected to match the imagery. A 20-meter U.S. Government landcover dataset with 17 classes was also used (both this and the DEM were unclassified but government-use-only datasets). This dataset was based on medium-resolution remotely sensed data and verified in situ.

Table 1
Mapping areas used for image processing based on Pakistan administrative and ecological …

Mapping area   First-level administrative unit   Primary ecoregion(s)
1a             Azad Kashmir                      None dominant
2              Balochistan                       Balochistan xeric wo…
3              Balochistan                       Northwestern thorn …
4              Balochistan                       South Iran Nubo-Sin…; Registan-North Paki…
5              FATA                              Balochistan xeric wo…
6              FATA                              Balochistan xeric wo…
7b             Gilgit-Baltistan                  Himalayan
8              NWFP                              Balochistan xeric wo…
9              NWFP                              Northwestern thorn …
10             NWFP                              Various Himalayan a…
11             Punjab                            Balochistan xeric wo…
12c            Punjab                            Northwestern thorn …
13             Punjab                            Thar desert
14             Sindh                             Northwestern thorn …
15             Sindh                             Thar desert

Source: Olson et al., 2001.
a Because of the small size and low availability of high-resolution imagery in Azad Kashm…
b Due to the low availability of high-resolution imagery in Gilgit-Baltistan, the province w…
c The Islamabad Capital Territory, relatively small in area compared to other first-level a…

3. Methods

The overall methodology is divided into three steps: 1) high spatialresolution image classification, 2) medium spatial resolution imageclassification, and 3) population distribution algorithm developmentand application. Satellite data were processed in 15 different mappingareas derived using ecoregion and first-level administrative boundariesand then aggregated to the entire country (Table 1). These 15 areaswere only used for satellite imagery processing. We used first-level ad-ministrative units to process the population distribution algorithm,which were processed individually because of the large variation inlandcover found in Pakistan (e.g., Indus river valley, arid, andmountain-ous regions). The Landsat-derived percent built-up area raster, an inter-mediate product yielded by the first two steps, is the primary input forthe final population distribution phase. Although the final goal of thismethodology was the production of a gridded population raster, the in-termediate products also constitute useful geospatial products.

Both the high- and medium-resolution imagery were classifiedusing classification and regression trees (CART). CART is similar toother supervised classification techniques where limited user-inputis used to classify large areas of imagery. CART was chosen becauseit has been previously demonstrated to produce high-confidence re-sults when classifying built-up areas for large regions, including theentire United States (Homer et al., 2004) and because it was success-fully used it to classify the entire country of Haiti (Azar et al., 2010).

geography.

) Other ecoregions

odlands Sulaiman range alpine meadowsscrub forest East Afghan montane conifer forestsdian desert and semi-desert,stan sandy desert

Kuh Rud and eastern Iran montanewoodlands

odlands Sulaiman range alpine meadowsodlands Sulaiman range alpine meadows

odlandsscrub forest Balochistan xeric woodlandslpine forests and shrublands Rock and iceodlands Western Himalayan subalpine conifer

forests, Himalayan subtropical pine forestsscrub forest

scrub forest Balochistan xeric woodlands

ir, it was not broken up into smaller regions.as processed whole. Different types of Himalayan ecoregions were found throughout.

dministrative units, was included with Punjab.


D. Azar et al. / Remote Sensing of Environment 130 (2013) 219–232

3.1. Built-up area fraction

3.1.1. High-resolution classification

Decision trees (DTs) using the Quinlan (1993) C5.0 algorithm were used to classify the high-resolution images into two classes, as a built-up pixel or not built-up pixel. DTs are nonparametric (making no assumptions concerning the distribution of class signature) classifiers (Huang et al., 2002), in which recursive binary splits partition variables into pure terminal nodes that contain a homogeneous subset of the original dataset (Quinlan, 1993). These terminal nodes represent a 'pure' pixel, or feature, which can be separated from all other features (Hansen et al., 2000). DTs search for independent variables used to split pixels into two groups that explain the largest proportion of deviation of the dependent variable (Varlyguin et al., 2001).

To train the DT, a sample of small areas was selected that represented the range of surfaces found in the high-resolution image scene. Within the training areas, built-up areas were digitized as 1, and areas that were not built up were classified as 0. Built-up areas could also include highly modified natural materials such as packed soils along paths or denuded bare soil in central courtyards in rural areas. Within urban areas, a more restrictive definition of a built-up area was adhered to during digitization. Only impervious or nearly impervious surfaces, such as asphalt, concrete, brick, tile, and wood, were considered built up. This approach towards digitization reflected the different development levels found in urban and rural areas in Pakistan.

In total, 22 variables were used in the DT model, including the visible and near-infrared (NIR) bands, spectral vegetation indices, band ratios, and gray level co-occurrence matrix (GLCM) second-order texture statistics (Table 2). Texture-based variables were included based on Azar et al. (2010), which indicates that they improve classification accuracy by DTs in high-resolution imagery. The window size for the GLCM was set at 5×5 pixels to roughly match the size of the buildings and clusters of buildings that we were attempting to detect. All second-order texture statistics were calculated using ENVI software and reintegrated with the spectral variables for further processing using ERDAS Imagine software.
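As an illustration of the spectral variables listed in Table 2, the following minimal sketch computes a few of them for a single pixel. The paper does not list the exact formulas used, so the commonly published forms of NDVI, GNDVI, TNDVI, and SAVI are assumed here; the function name, the 0 to 1 reflectance scaling, and the soil factor of 0.5 are hypothetical choices, and the GLCM texture statistics are not covered.

```python
import math

def spectral_indices(blue, green, red, nir, soil_factor=0.5):
    """Compute a few Table 2 style variables for one pixel.

    Inputs are band reflectances (assumed scaling: 0 to 1).
    Formulas are the commonly published forms, not taken from the paper.
    """
    def ratio(numerator, denominator):
        # Guard against division by zero on dark or masked pixels.
        return numerator / denominator if denominator else 0.0

    ndvi = ratio(nir - red, nir + red)
    return {
        "NDVI": ndvi,
        "GNDVI": ratio(nir - green, nir + green),
        # Clamp so the square root stays defined when NDVI < -0.5.
        "TNDVI": math.sqrt(max(ndvi + 0.5, 0.0)),
        "SAVI": (1 + soil_factor) * ratio(nir - red, nir + red + soil_factor),
        "red/green": ratio(red, green),
        "red/blue": ratio(red, blue),
        "green/blue": ratio(green, blue),
    }

# Hypothetical reflectances for a vegetated pixel and a paved pixel.
vegetated = spectral_indices(blue=0.04, green=0.08, red=0.10, nir=0.50)
paved = spectral_indices(blue=0.20, green=0.22, red=0.25, nir=0.25)
```

In this toy case the vegetated pixel has a clearly positive NDVI while the paved pixel sits at zero, which is the kind of contrast a decision tree split can exploit.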

The DT model was then applied to the remainder of the image, labeling each pixel as either built up or not. Each DT output yielded error statistics, based on pixels from the heads-up digitization withheld from the classification model. While the statistics were considered, each DT model was judged primarily on the visual inspection of the classification of the range of materials found in the image. Fig. 1 illustrates an example of a QuickBird image with the results of the DT shown as red pixels superimposed on the image. In some cases where DT training data could not be refined to produce clearly discernible built and not-built classes, post-classification edits were made manually and/or using a smoothing filter to remove salt-and-pepper effects. This was especially important in areas with textured, highly reflective and compacted, or desert surfaces. Inclusion of highly textured natural surfaces led to errors of commission. Exclusion of these surfaces led to errors of omission. Built-up surfaces in sparsely populated desert areas were particularly difficult to distinguish from highly reflective natural surfaces using the DT approach due to spectral similarity and were the cause of most commission errors. In these sparsely populated areas, errors of omission were favored, and the areas were manually digitized. In addition, in some cases large buildings in otherwise sparsely populated areas were difficult to classify. The low-texture areas at the center of these buildings were confused with surrounding, similarly low-texture natural surfaces in the area. In these rare cases, the outline of the building was heads-up digitized into the final, classified image.

Table 2. Spectral and texture variables used for classification of high-resolution satellite images with decision trees.

| No. | Band | Notes |
|---|---|---|
| 1 | Blue | |
| 2 | Green | |
| 3 | Red | |
| 4 | Near-infrared | |
| 5 | NDVI | Normalized Difference Vegetation Index |
| 6 | GNDVI | Green Normalized Difference Vegetation Index |
| 7 | TNDVI | Transformed Normalized Difference Vegetation Index |
| 8 | SAVI | Soil-Adjusted Vegetation Index |
| 9 | MSAVI | Modified Soil-Adjusted Vegetation Index |
| 10 | RDVI | Renormalized Difference Vegetation Index |
| 11 | EVI | Enhanced Vegetation Index |
| 12 | SVI | Spectral Vegetation Index |
| 13 | Red/green | |
| 14 | Red/blue | |
| 15 | Green/blue | |
| 16 | Mean | GLCM-derived |
| 17 | Variance | GLCM-derived |
| 18 | Homogeneity | GLCM-derived |
| 19 | Contrast | GLCM-derived |
| 20 | Dissimilarity | GLCM-derived |
| 21 | Entropy | GLCM-derived |
| 22 | Second moment | GLCM-derived |

3.1.2. Medium-resolution regression tree classification

Because the high-resolution imagery only covered a fraction of Pakistan, the results needed to be extrapolated to produce a national estimate of percent built-up area. Supervised classification using Cubist⁵ regression trees (RT) software was the method chosen. RTs are similar to DTs, with the major exception that the output is a continuous estimate rather than a categorical one (Bittencourt & Clarke, 2003; Breiman et al., 1984). RTs produce a series of linear regression models that return a numerical estimate of percent built-up area for each pixel, rather than a discrete result. The binary estimates produced in the DT step provided the percent built-up area estimates used for training the RT. To produce the training mosaic, the results from the DTs were aggregated from native QuickBird or IKONOS resolution to 28.5 meters. Ten percent of the data points provided by the DT training mosaic pixels were withheld from the classification for validation purposes. For many of the regions, the original training mosaic was smoothed using a 3×3 low-pass mean filter. The filter smoothed the image by softening sharp edges between high and low built-up area values in the training data.
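The aggregation and smoothing of the training mosaic described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the fine-resolution binary mask divides evenly into coarse cells (the real ratio between QuickBird or IKONOS pixels and 28.5-meter cells is not an integer), and the function names and the edge handling of the filter are hypothetical.

```python
def aggregate_to_percent(mask, block):
    """Aggregate a binary built-up mask (rows of 0/1) to percent built-up.

    `block` is the assumed number of fine pixels per coarse pixel side.
    """
    rows, cols = len(mask) // block, len(mask[0]) // block
    coarse = []
    for r in range(rows):
        coarse_row = []
        for c in range(cols):
            cells = [mask[r * block + i][c * block + j]
                     for i in range(block) for j in range(block)]
            coarse_row.append(100.0 * sum(cells) / len(cells))
        coarse.append(coarse_row)
    return coarse

def mean_filter_3x3(grid):
    """3x3 low-pass mean filter; edge cells average only existing neighbors."""
    rows, cols = len(grid), len(grid[0])
    smoothed = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(window) / len(window))
        smoothed.append(out_row)
    return smoothed

# Toy 4x4 decision-tree output aggregated in 2x2 blocks.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
percent = aggregate_to_percent(mask, block=2)
smoothed = mean_filter_3x3(percent)
```

On this toy grid the sharp contrast between the 75%, 0%, and 100% cells is averaged away, which mirrors the stated purpose of the low-pass filter.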

A set of independent variables was created for the RTs (Table 3). As with the DTs, most of the variables were derived from the satellite imagery. In addition to the satellite-derived layers used for the DTs, landcover and DEM-derived variables including elevation and slope were used. Also, note that GLCM was not used in the RTs for medium-resolution data. Visual inspection of the results indicated that for most of the provinces, pixels with a very low built-up fraction were unreliably classified. The pattern of these pixels appeared as a salt-and-pepper effect rather than an actual indicator of human settlement or activity. Thus, below a certain threshold in each province, pixels with the lowest (but still non-zero) built-up area fraction were considered noise and were reclassified as zero. The percent built-up threshold value was slightly larger than 1% in all of the first-level units except in Sindh and Gilgit-Baltistan (Table 4, Column 1). For these two provinces, the complete results of the RTs were used.
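The noise suppression step can be sketched in a few lines. The lower bounds below are read from Table 4, column 1; the function name is hypothetical, and treating a lower bound of 1 as equivalent to keeping the complete RT results assumes whole-percent outputs, which the paper does not state.

```python
# Lower bounds of retained pixel values (percent built-up), from Table 4, column 1.
MIN_PERCENT_BUILT = {
    "Azad Kashmir": 5,
    "Balochistan": 3,
    "FATA": 3,
    "Gilgit-Baltistan": 1,
    "NWFP": 4,
    "Punjab": 6,
    "Sindh": 1,
}

def suppress_noise(percent_built, unit):
    """Reclassify pixels below the unit's lower bound as zero built-up."""
    cutoff = MIN_PERCENT_BUILT[unit]
    return [[v if v >= cutoff else 0.0 for v in row] for row in percent_built]

punjab_row = suppress_noise([[2.0, 5.0, 6.0, 40.0]], "Punjab")
sindh_row = suppress_noise([[1.0, 2.0, 40.0]], "Sindh")
```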

3.2. Likelihood raster layer

The per-pixel distribution of population was performed using two layers: 1) the percent built-up area layer derived from the previous steps and 2) a likelihood layer, which is described below. The two-data-layer approach was used because in densely settled areas a strong, linear relationship between built-up areas and population was evident, while in less densely built-up areas this relationship was more tenuous.

Fig. 1. Decision tree results (QuickBird image with superimposed results) for Mingora, NWFP.

The built-up area estimate does not capture every piece of the human footprint, but Mubareka et al. (2008) found that the creation of a probability raster layer that combines multiple sources of geospatial data could improve population distributions in less dense areas. In that study, the researchers had a large set of small-area population data available. These data were used to calibrate their probability layer using a stochastic algorithm. Data sufficient to duplicate their methodology were not available for Pakistan. Therefore, the layer used in this study, though inspired by a probability layer, was generated iteratively rather than stochastically and is referred to as a likelihood layer.

Following Mubareka et al. (2008), we chose five input geospatial variables to create a population likelihood raster: 1) proximity to built-up area detected in Landsat ETM+ imagery, 2) density of populated points found in the National Geospatial-Intelligence Agency (NGA) GEOnet Names Server (National Geospatial-Intelligence Agency, 2010), 3) elevation, 4) slope, and 5) local heterogeneity of landcover classes. Each variable was assigned a weight indicating its relative importance. Variable weights are shown in the column headings in Table 5. The value of each pixel for a particular variable was obtained and scored according to the values shown by row in Table 5. That score was then multiplied by the variable weight. The sum of these products over all variables yielded individual pixel values in the likelihood layer. The scores and weighting were determined iteratively with the goal of reducing error as measured by calibration settlement points, discussed in Section 3.3.2. Lastly, the results were smoothed and normalized by dividing the values by the maximum score. Thus, final values in the likelihood raster ranged from 0 to 100 percent, the same range as the built-up area layer.
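The weighted sum and normalization can be illustrated with a small sketch. The weights are the column-heading weights in Table 5; the per-pixel scores in the example are hypothetical (the elevation and slope scores in particular vary by province), the names are illustrative, and the smoothing step mentioned in the text is omitted.

```python
# Layer weights from the column headings of Table 5.
WEIGHTS = {
    "distance": 0.7,
    "village_density": 0.07,
    "elevation": 0.08,
    "slope": 0.1,
    "heterogeneity": 0.05,
}

def raw_likelihood(scores):
    """Weighted sum of the per-layer scores for one pixel."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def normalize(values):
    """Rescale to 0-100 by dividing by the maximum score."""
    top = max(values)
    return [100.0 * (v / top) if top else 0.0 for v in values]

# Hypothetical pixel close to detected built-up area ...
near = {"distance": 200, "village_density": 100, "elevation": 90,
        "slope": 90, "heterogeneity": 40}
# ... and a hypothetical remote pixel.
far = {"distance": 5, "village_density": 5, "elevation": 10,
       "slope": 10, "heterogeneity": 10}

raw = [raw_likelihood(near), raw_likelihood(far)]
likelihood = normalize(raw)
```

Because the distance layer carries a weight of 0.7, the remote pixel ends up with under 4 percent of the likelihood of the near pixel in this example.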

Distance to built-up area is the most heavily weighted likelihood factor. This reflects the assumption that people are most likely to be found near where a settlement has been detected, especially in rural areas. In the less densely settled administrative tehsils, it is difficult to detect widely scattered buildings using moderate-resolution imagery.

The village density likelihood layer was created by converting the populated-place point file from NGA's GEOnet database into a density raster. A region with a 250-meter radius was created around each point, and the density of settlements within a 1 km moving window was then assigned to each pixel in the likelihood grid. The 250-meter radius used was the same as that of the smallest settlements in Mubareka et al. (2008), because the larger settlements were detected within the built-up area layer. Since the likelihood layer is used to assign population only to pixels where no built-up surfaces were detected, any areas already detected in the built-up area estimate would already have population assigned to them.

Table 3. Satellite and geospatial variables used for classification of Landsat imagery with regression trees.

| Number | Group | Band | Notes |
|---|---|---|---|
| 1 | Visible | Blue | |
| 2 | | Green | |
| 3 | | Red | |
| 4 | SWIR | Near | |
| 5 | | Mid | |
| 6 | | Far | |
| 7 | Vegetation indices | NDVI | Normalized Difference Vegetation Index |
| 8 | | GNDVI | Green Normalized Difference Vegetation Index |
| 9 | | TNDVI | Transformed Normalized Difference Vegetation Index |
| 10 | | SAVI | Soil-Adjusted Vegetation Index |
| 11 | | NDBI | Normalized Difference Built-up Index |
| 12 | | NDBaI | Normalized Difference Bareness Index |
| 13 | | NDWI | Normalized Difference Water Index |
| 14 | | SLAI | Satellite-Derived Leaf Area Index |
| 15 | | MNDWI | Modified Normalized Difference Water Index |
| 16 | Tasseled cap | Brightness | |
| 17 | | Greenness | |
| 18 | | Wetness | |
| 19 | Additional veg. index | IBI | Index of Biological Integrity |
| 20 | Terrain | DEM | SRTM 30-meter |
| 21 | | Slope | |
| 22 | Categorical | Landcover | |

Calculations of the likelihood scores for DEM-derived elevation and slope were identical. The amount of built-up area detected within a series of predefined categories was used to calculate the score for that category within each province, as shown in Table 5. Thus, if 90% of built-up area is found in the lowest elevation or slope category for a province, that category is scored with a 90 in the likelihood layer. The overall effect was that in a province where more built-up areas were detected on steeper slopes or at higher elevations, the likelihood of population being present in high or steep areas, not detected in the built-up area layer, was increased. This was important in the more mountainous areas, because it increased the likelihood score for distributing population to higher slopes and elevations.
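The elevation and slope scoring rule (a category's score equals the share of a province's detected built-up area that falls in it) can be sketched as below. The built-up areas per category are invented for illustration, and the function name is hypothetical.

```python
def category_scores(built_area_by_category):
    """Score each category by its percentage share of detected built-up area."""
    total = sum(built_area_by_category.values())
    if total == 0:
        # No built-up area detected: every category scores zero.
        return {category: 0.0 for category in built_area_by_category}
    return {category: 100.0 * area / total
            for category, area in built_area_by_category.items()}

# Hypothetical built-up area (square km) per elevation band in one province.
elevation_scores = category_scores({
    "0-400 m": 90.0,
    "401-800 m": 8.0,
    "801-1200 m": 2.0,
})
```

With 90% of the built-up area in the lowest band, that band scores 90, matching the example given in the text.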

The likelihood scores for the heterogeneity of landcover layer were determined by calculating the number of landcover classes from a 20-meter landcover map found within a moving window 600 meters on a side (Mubareka et al., 2008). This factor captures the assumption that a high number of landcover classes within a relatively small area indicates anthropogenic disturbance. Areas with a larger number of landcover classes were scored higher, thus slightly increasing their likelihood of containing population. This was the most lightly weighted factor in the likelihood layer.
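A minimal sketch of the heterogeneity factor follows: count the distinct landcover classes in a square window around a pixel, then map the count to the scores in Table 5. At 20-meter resolution a 600-meter window would be about 30 pixels on a side; the toy example uses a 3×3 window. The function names are hypothetical, and scoring counts above 8 like the "7 or 8" row is an assumption, since Table 5 does not list them.

```python
def class_count(landcover, row, col, half_width):
    """Number of distinct landcover classes in a square window around a pixel."""
    rows, cols = len(landcover), len(landcover[0])
    seen = set()
    for r in range(max(0, row - half_width), min(rows, row + half_width + 1)):
        for c in range(max(0, col - half_width), min(cols, col + half_width + 1)):
            seen.add(landcover[r][c])
    return len(seen)

def heterogeneity_score(n_classes):
    """Map a class count to the Table 5 score (counts above 8 assumed to score 40)."""
    if n_classes >= 7:
        return 40
    if n_classes >= 5:
        return 30
    if n_classes >= 3:
        return 20
    return 10

# Toy landcover grid with integer class codes.
landcover = [
    [1, 1, 2],
    [1, 3, 2],
    [4, 4, 5],
]
center_score = heterogeneity_score(class_count(landcover, 1, 1, half_width=1))
corner_score = heterogeneity_score(class_count(landcover, 0, 0, half_width=1))
```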

The next step was to determine how to distribute population using the two layers (built-up area and likelihood layer) within the 393 tehsils. The distribution of population within an administrative unit was carried out using four groups based on the percentage of built-up area detected within the unit. In the tehsils that were dominated by built-up areas (Group 1), population was distributed only to built-up pixels (Table 6). The majority of tehsils in Pakistan (Group 2) were predominately rural but had one or more dense settlements. These tehsils had population distributed to both built-up areas and to the likelihood layer. The proportion of population assigned to the built-up areas and likelihood layers was determined by iterative runs for each province, described in Section 3.3.2, and reported in Table 4 (columns 5 and 6).

The final two groups of tehsils were handled similarly. Administrative units with negligible built-up area (Group 3) had all of their population distributed to the likelihood raster with the caveat that any built-up pixels were transferred to the likelihood layer and received the highest possible weighting, thus receiving the most per-pixel population. The few units where no built-up pixels were detected (Group 4) had all of their population distributed to the likelihood raster. Population was not distributed to a pixel where the likelihood value fell below a value that was iteratively set for each province. The process to determine the value for each province is described in Section 3.3.2.

Table 4. List of thresholds used in population distribution algorithm.

| First-level unit | 1: Pixel % built-up area | 2: Threshold of ADM3 built-up area for inclusion in Group 1 (%) | 3: Threshold of ADM3 built-up area for inclusion in Group 3 (%) | 4: Probability range that received population (%)ᵃ | 5: Proportion of tehsil population to built-up areaᵃ | 6: Proportion of tehsil population to probabilityᵃ |
|---|---|---|---|---|---|---|
| Azad Kashmir | 5 to 100 | >=1.0 | <=0.01 | 25 to 100 | 20 | 80 |
| Balochistan | 3 to 100 | >=1.0 | <=0.01 | 20 to 100 | 30 | 70 |
| FATA | 3 to 100 | >=1.0 | <=0.01 | 20 to 100 | 20 | 80 |
| Gilgit-Baltistan | 1 to 100 | >=1.0 | <=0.01 | 25 to 100 | 20 | 80 |
| NWFP | 4 to 100 | >=2.0 | <=0.01 | 40 to 100 | 50 | 50 |
| Punjabᵇ | 6 to 100 | >=5.0 | <=0.01 | 40 to 100 | 30 | 70 |
| Sindh | 1 to 100 | >=1.5 | <=0.01 | 45 to 100 | 40 | 60 |

ᵃ The selected thresholds are the result of an iterative population disaggregation procedure. The resulting combination with the lowest statistical error was deemed the most accurate. The numbers at the top of the columns are for easier reference in the text and have no other meaning.
ᵇ Includes Islamabad.

Table 5. Per-pixel scores for ancillary geospatial layers (rows) with overall layer weight (column headings) used in population distribution algorithm.

| Distance from built-up area (weight = 0.7): range | Score | Village point density (%) (weight = 0.07): range | Score | Elevation (m) (weight = 0.08): range | Score | Slope (%) (weight = 0.1): range | Score | Landcover heterogeneity (# of classes) (weight = 0.05): range | Score |
|---|---|---|---|---|---|---|---|---|---|
| 0–0.5 km | 200 | >60 | 100 | 0–400 | Variedᵃ | 0–10 | Variedᵃ | 7 or 8 | 40 |
| 0.5 km–1 km | 150 | 51–60 | 95 | 401–800 | Variedᵃ | 11–20 | Variedᵃ | 5 or 6 | 30 |
| 1.1–2 km | 125 | 41–50 | 90 | 801–1200 | Variedᵃ | 21–30 | Variedᵃ | 3 or 4 | 20 |
| 2.1–3 km | 75 | 31–40 | 85 | 1201–1600 | Variedᵃ | >30 | Variedᵃ | 0–2 | 10 |
| 3.1–5 km | 20 | 21–30 | 80 | 1600–1800 | Variedᵃ | | | | |
| 5.1–10 km | 15 | 11–20 | 40 | >1800 | Variedᵃ | | | | |
| >10 km | 5 | 0–10 | 5 | | | | | | |

ᵃ The amount of built-up area detected within a series of predefined categories was used to calculate the score for that category within each province.

3.3. Population distribution

3.3.1. Population distribution algorithm

The formula for population distribution shown below was applied for each pixel in the study area based on the built-up areas and likelihood layers, and with the tehsils sorted into groups based on density.

$$
P_{i,j} = \left[ \left( \frac{(B_i)(A_i)}{\Sigma BUILT_j} \right) (N_{BUILT}) \right] + \left[ \left( \frac{W_i}{\Sigma LIK_j} \right) (N_{LIK}) \right] \tag{1}
$$

where

$P_{i,j}$: Population for pixel $i$ in administrative unit (tehsil) $j$
$B_i$: Pixel $i$ built-up percent
$W_i$: Pixel $i$ likelihood weight
$A_i$: Pixel $i$ area
$N_{BUILT}$: Population pool assigned to built-up areas
$N_{LIK}$: Population pool assigned to likelihood areas
$\Sigma BUILT_j$: Total area of all built-up pixels above threshold for administrative unit (tehsil) $j$
$\Sigma LIK_j$: Total likelihood score for all pixels above threshold in administrative unit (tehsil) $j$

The fraction of built-up area that a pixel represents within its administrative unit was determined. This fractional number represented the proportion of the built-up population pool assigned to that same pixel. The per-pixel likelihood weight was treated similarly to the built-up fraction, with per-pixel population based on the fraction a pixel was calculated to have out of the total likelihood weight for an administrative unit. As with the built-up area layer, pixels below a certain normalized likelihood score received no population. This threshold was determined iteratively, varied by province, and is discussed in Section 3.3.2 (see Table 4, column 4 for thresholds).
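As a hedged illustration of Eq. (1), the sketch below distributes two population pools across the pixels of a single tehsil. It assumes that pixels below the built-up or likelihood thresholds have already been set to zero, and that the total built-up area is the sum of built-up percent times pixel area over the unit; the function name and the tuple layout are hypothetical.

```python
def distribute_population(pixels, n_built, n_lik):
    """Apply Eq. (1) to one tehsil.

    pixels: list of (built_percent, area, likelihood_weight) tuples,
            already thresholded (assumption).
    n_built, n_lik: population pools for built-up and likelihood pixels.
    """
    sum_built = sum(b * a for b, a, _ in pixels)
    sum_lik = sum(w for _, _, w in pixels)
    population = []
    for b, a, w in pixels:
        built_part = (b * a / sum_built) * n_built if sum_built else 0.0
        lik_part = (w / sum_lik) * n_lik if sum_lik else 0.0
        population.append(built_part + lik_part)
    return population

# Toy Group 2 tehsil: two built-up pixels and two likelihood-only pixels.
pixels = [
    (50.0, 1.0, 0.0),
    (25.0, 1.0, 0.0),
    (0.0, 1.0, 40.0),
    (0.0, 1.0, 60.0),
]
# 400 people split 75/25 between the pools (illustrative shares only).
population = distribute_population(pixels, n_built=300.0, n_lik=100.0)
```

For a Group 1 tehsil the same function would be called with the entire population in `n_built` and zero in `n_lik`; for Groups 3 and 4 the reverse applies.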

3.3.2. Iterative calibration of built-up area or likelihood population pools and determination of minimum likelihood threshold

As previously mentioned, an iterative population disaggregation algorithm was applied to tehsils grouped by each first-level administrative unit, resulting in 77 different combinations for each unit, for a total of 539 for the country.³ The number 539 is the result of 7 different first-level administrative units used for processing, each run 77 times (recall that Islamabad was processed with Punjab because of its small size). Each resulting population grid was checked against a set of 160 settlement points. These points were collected by examining the 1998 Pakistan census and then identifying place names that could be located using available Google Earth imagery as of 2010 and the NGA GNS (National Geospatial-Intelligence Agency, 2010). The settlement was examined in Google Earth and a rough central point was chosen (Google Earth, 2010). A radial buffer was then determined that contained the approximate urban extent of continuous built-up area in Google Earth imagery. The exact urban extent was not digitized. A single measurement for the radius of urban extent was used for each settlement. Only settlements that could be reasonably isolated in the imagery, with a unique population recorded in the census, were used. The population grid was aggregated within the buffered settlement areas, and the estimated population was compared to the population recorded in the census. A final population raster was selected for each province based on the lowest RMSE, mean absolute percentage error, and bias within the city verification points.⁴
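The calibration search can be sketched as a small grid enumeration plus error statistics. The grid below follows the footnoted description of the runs (built-up shares of 20 to 80 percent in steps of 10, likelihood cutoffs of 10 to 60 in steps of 5). The function names are hypothetical, bias is assumed to mean the average of estimate minus census, and the selection rule only breaks exact RMSE ties, which simplifies the "did not substantially differ" criterion.

```python
import math
from itertools import product

def parameter_grid():
    """All (built share %, likelihood share %, likelihood cutoff) combinations."""
    built_shares = range(20, 81, 10)   # 20, 30, ..., 80
    cutoffs = range(10, 61, 5)         # 10, 15, ..., 60
    return [(b, 100 - b, c) for b, c in product(built_shares, cutoffs)]

def error_stats(estimated, census):
    """RMSE, mean absolute percentage error, and bias for settlement buffers."""
    diffs = [e - c for e, c in zip(estimated, census)]
    n = len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mape = 100.0 * sum(abs(d) / c for d, c in zip(diffs, census)) / n
    bias = sum(diffs) / n
    return rmse, mape, bias

def select_best(runs):
    """Pick the run with lowest RMSE, then lowest MAPE, then smallest |bias|.

    runs: dict mapping a parameter tuple to its (rmse, mape, bias).
    """
    return min(runs, key=lambda p: (runs[p][0], runs[p][1], abs(runs[p][2])))

grid = parameter_grid()
stats = error_stats(estimated=[110.0, 90.0], census=[100.0, 100.0])
# Invented statistics for three runs, for illustration only.
best = select_best({
    (20, 80, 10): (100.0, 12.0, -3.0),
    (30, 70, 10): (90.0, 15.0, 5.0),
    (40, 60, 10): (90.0, 11.0, -8.0),
})
```

Seven shares times eleven cutoffs gives the 77 runs per first-level unit, and 7 units times 77 runs gives the 539 total cited in the text.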

³ The 77 runs resulted from different combinations of the proportion of population distributed to built-up and likelihood areas. In each run, the population allotted to built-up areas and likelihood areas was between a minimum of 20 percent and a maximum of 80 percent of the total, at 10 percent increments, with the total of the two pools always equal to 100 percent of the tehsil population. The minimum cutoff threshold for the likelihood raster in each unit was also varied in increments of 5 with a minimum pixel score of 10 and a maximum pixel score of 60.

⁴ The raster with the lowest RMSE was used. If two rasters had RMSEs that did not substantially differ, the raster with the lowest mean absolute percentage error and then lowest bias was selected.

3.4. Population distribution error check

Lastly, for tehsils where the population distribution was split between built-up areas and the likelihood layer, a final processing step was necessary. Each unit was checked to ensure that the maximum population generated by the likelihood layer did not exceed the minimum value generated by distribution to built-up areas. Such an excess would have created an edge effect, with an abrupt change to higher population densities where density should have declined moving away from the core of settlement. In any unit where the maximum population of the likelihood layer violated this test, population was drawn evenly from the likelihood pool and reassigned to the built-up pool until the minimum value in built-up areas slightly exceeded the maximum value in likelihood layer areas. If the number of people subtracted from a likelihood pixel exceeded the value of that pixel, the difference was tracked and additional people were subtracted from remaining pixels until the population removed from the likelihood pixels was equal to the number added to built-up pixels. For example, if the lowest population pixel value in the built-up area of a unit was 8 and the highest value for likelihood pixels in the same unit was 10, population would be removed from all likelihood pixels. When the built-up minimum exceeded the likelihood maximum, for instance reaching 9.5 (built-up minimum) compared to 9.0 (likelihood maximum), the loop terminated.
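The reassignment loop of Section 3.4 can be sketched as follows, under several assumptions that the text does not settle: population is drawn in fixed steps of 0.5 persons per likelihood pixel, any shortfall from emptied pixels is taken from the pixels that still hold population, and the moved population is spread across built-up pixels in proportion to their current values. All names are hypothetical.

```python
def draw_evenly(lik, amount):
    """Remove `amount` people from `lik` in place, split evenly over pixels
    that still hold population. Returns the amount actually removed."""
    removed = 0.0
    remaining = amount
    while remaining > 1e-9:
        active = [i for i, v in enumerate(lik) if v > 0]
        if not active:
            break
        share = remaining / len(active)
        for i in active:
            take = min(share, lik[i])   # never take more than the pixel holds
            lik[i] -= take
            removed += take
        remaining = amount - removed    # shortfall carried to the next pass
    return removed

def rebalance(built, lik, step=0.5, max_iter=100000):
    """Shift population from likelihood pixels to built-up pixels until the
    built-up minimum exceeds the likelihood maximum."""
    built, lik = list(built), list(lik)
    if not built or not lik or sum(built) <= 0:
        return built, lik
    for _ in range(max_iter):
        if min(built) > max(lik):
            break
        moved = draw_evenly(lik, step * len(lik))
        if moved == 0:
            break
        total_built = sum(built)
        # Assumption: moved people are added proportionally to built-up pixels.
        built = [b + moved * b / total_built for b in built]
    return built, lik

# Built-up minimum of 8 versus likelihood maximum of 10, as in the text's example.
new_built, new_lik = rebalance(built=[8.0, 20.0], lik=[10.0, 4.0])
```

The total population of the unit is unchanged by the loop; only the split between the two pools moves.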

4. Results

Overall, 15 different areas, based on first-level administrative geography and ecoregion boundaries, were used when classifying the medium-resolution imagery for the entire country of Pakistan. The pattern of urbanization in Pakistan is evident in the classification results as urban areas in the Indus Valley are easily recognizable (Fig. 2). The southern and eastern portions of the country are more densely settled than the less urban north and west, though there are large cities in these areas. While the built-up area layer alone provides a broad look at the pattern of settlement, it is not adequate for local relief planning before being linked to population data.

4.1. Medium-resolution regression tree built-up area estimates

The parameters for RT classification were varied slightly for each region to produce the highest confidence results. The r-value as reported in the Cubist software was used as the indicator for medium-resolution classification success. This value was based on the 10% sample set aside for within-model validation. The r-value was above 0.85 for all first-level units except for Azad Kashmir (0.75) and Gilgit-Baltistan (0.64).

Table 6. Groups used to determine population distribution method based on amount of built-up area detected in tehsils.

| Group | Area type | To built-up (%) | To probability (%) |
|---|---|---|---|
| Group 1 | Dense urban | 100 | 0 |
| Group 2 | Rural with towns | See Table 4ᵃ | See Table 4ᵃ |
| Group 3ᵇ | Negligible built-up area | 0 | 100 |
| Group 4 | No built-up area | 0 | 100 |

ᵃ The percentage of population in a tehsil disaggregated to the built-up area and probability layers varied by province (see columns 5 and 6 in Table 4).
ᵇ Any built-up pixels in a Group 3 administrative unit received the maximum amount of population with the probability raster.

Fig. 2. Built-up area classification results from the northern portion of Pakistan with insets showing Swat Valley (upper boxes) and Northern Indus Valley (lower boxes). Built-up area is shown in binary, as built-up (black) or not built-up (white). The pattern of settlement follows the sinuous valley floor, top right inset. The bottom right inset displays a clear urban hierarchy in the Northern Indus Valley.

Low values in these first-level units were due to the small amount of built-up area and developed infrastructure. Some areas were detectable using high-resolution imagery but when aggregated to 30 m, the percentages of built-up areas with which to establish the relationship were low. For example, the highest pixel value for the aggregated training areas in Gilgit-Baltistan was 24%.

While Cubist provides endogenous verification within each model run based on 10% of the sample not used as training, we also conducted post-classification verification using high-resolution imagery independent of the training regions. In order to do this, an additional five QuickBird images were downloaded and classified in Punjab (2), NWFP (2), and Sindh (1). Of these five scenes, two were largely urban scenes (Southern Lahore and Northern Havelian). The remaining three verification sites (Malakwal, Northern Hyderabad Outskirts, and Northern Charsadda Outskirts) were from more rural areas, with sparser settlements, on the outskirts of larger urban areas. The two urban scenes were chosen for verification to establish that results in this type of environment were consistent with our previous work in Haiti (Azar et al., 2010). The sparser settlements in the verification dataset were unlike any area in Haiti. They were therefore chosen to provide a test of the high-resolution classification in what was viewed as the most challenging environments for the RT.

These additional images were classified using the same methodology as the high-resolution training images. The DT-based classification results were resampled directly to 28.5 meters and smoothed using a 3×3 focal mean moving window. The results of Landsat-based Cubist classification were then subset to the extent of each validation image. Both the QuickBird validation images and the Landsat-based Cubist results were resampled to 100 meters to reduce registration errors. A per-pixel comparison was then performed, using a spatially stratified random sample. The sample was stratified to favor built-up areas, reducing the number of zero-to-zero comparisons. Because up to eight years separated the date of the Landsat imagery and the QuickBird validation imagery, areas of change due to urban growth were excluded from the analysis. The resulting scatterplots displaying the per-pixel comparison of the Landsat-based Cubist results and QuickBird-based validation images are shown in Fig. 3.

Table 7. Statistics generated from validation sites used to assess the accuracy of Landsat-based built-up surface classification by regression trees.

| Site | N | r² | RMSE | MAE | Bias | Density characterization |
|---|---|---|---|---|---|---|
| Southern Lahore | 801 | 0.77 | 17.10 | 13.32 | −11.17 | Urban |
| Northern Havelian | 512 | 0.73 | 12.33 | 10.63 | −2.31 | Largely urban |
| Malakwal | 457 | 0.73 | 10.11 | 5.96 | −5.48 | Scattered settlements |
| Northern Hyderabad Outskirts | 619 | 0.72 | 7.29 | 4.30 | −3.03 | Scattered settlements |
| Northern Charsadda Outskirts | 993 | 0.67 | 10.63 | 5.96 | −4.40 | Scattered settlements |

N = Number of pixels sampled in each validation site.

The results from verification, summarized in Table 7, were found to have similar r² values, RMSE, mean absolute error (MAE), and bias compared with the results from the previous study in Haiti (Azar et al., 2010). The correlation between model output and validation was again dependent on the density of the validation area. The most urban section, in Southern Lahore, has the highest 'goodness of fit' (r² = 0.77) between Cubist results and the validation dataset. The greater error statistics, both RMSE and MAE, for Southern Lahore are due to the overall higher per-pixel built-up fractions found in that image.
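The four statistics reported in Table 7 can be computed per validation site with a short routine such as the one below. It is a sketch under stated assumptions: r² is taken as the squared Pearson correlation between the Landsat-based and QuickBird-based percent built-up values, and bias as the mean of Landsat minus QuickBird, so that negative values indicate under-prediction. The function name and sample values are hypothetical.

```python
import math

def validation_stats(landsat, quickbird):
    """r-squared, RMSE, MAE, and bias between paired percent built-up values."""
    n = len(landsat)
    diffs = [p - q for p, q in zip(landsat, quickbird)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    bias = sum(diffs) / n

    mean_p = sum(landsat) / n
    mean_q = sum(quickbird) / n
    cov = sum((p - mean_p) * (q - mean_q) for p, q in zip(landsat, quickbird))
    var_p = sum((p - mean_p) ** 2 for p in landsat)
    var_q = sum((q - mean_q) ** 2 for q in quickbird)
    # r-squared is undefined when either series is constant; report 0.0 then.
    r2 = cov * cov / (var_p * var_q) if var_p and var_q else 0.0
    return {"r2": r2, "RMSE": rmse, "MAE": mae, "bias": bias}

# Invented sample in which Landsat is uniformly 10 points low.
site = validation_stats(landsat=[0.0, 20.0, 40.0, 60.0],
                        quickbird=[10.0, 30.0, 50.0, 70.0])
```

The invented sample shows how a perfect correlation can coexist with a consistent negative bias, which is why the table reports both kinds of statistic.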

Results in the three less urban to rural areas indicate a slight decrease in correlation between the medium-resolution model output and the high-resolution validation. The lower error statistics in these areas were due to the overall lower per-pixel built-up fraction in the more rural validation sites. The bias towards under-prediction demonstrated in all of the validation areas was also consistent with the results of Azar et al. (2010). Under-prediction reflects the loss of information that occurs when moving from high-resolution to medium-resolution imagery. Ideally, the entire study area (Pakistan) would be classified using high-resolution imagery. For now, the scale of that undertaking remains cost- and time-prohibitive.

Fig. 3. Correlation scatterplots between regression tree results and decision tree validation. Landsat is the regression-tree derived percent built-up area value for a pixel. The QuickBird value is derived from a decision tree in place of in situ verification. Both refer to the percent of the area of the pixel that is built-up. [Five panels, each plotting % built-up (Landsat) against % built-up (QuickBird) on axes from 0 to 100: Southern Lahore (r² = 0.77), Northern Havelian (r² = 0.73), Malakwal (r² = 0.73), Northern Hyderabad Outskirts (r² = 0.72), and Northern Charsadda Outskirts (r² = 0.67).]

4.2. Population distribution algorithm

After validating the results of medium-resolution imagery classification, the population distribution algorithm was used to distribute the population data. A portion of the final distributed population raster is shown in Fig. 4. Because Pakistan is such a large country, it could not be mapped on a single page with adequate detail. Therefore, only the northern part of the country is shown, with inset boxes showing the Swat Valley and the northern Indus floodplain. Population values are symbolized categorically, although the actual values in the raster stretch continuously from 0 to the maximum value of 65,535 persons per square km.

Once the distribution was computed, we sought to evaluate it and compare it to existing datasets using a second set of settlement points that could be linked to the 1998 census data tables. Following the methodology used for calibration points, 56 settlement locations and census populations were collected from NWFP (20 points), Punjab (15 points), and Sindh (21 points) provinces. These locations were selected with an emphasis on 1) finding points whose area of settlement could be isolated from surrounding settlements and 2) ensuring an evenly spread sample of settlements smaller than the largest cities. Cities with large populations were purposely not included in the dataset because their extent may coincide with an administrative unit, in which case their utility as evaluators would be compromised. This is because as the urban feature approaches the size of the tehsil, the error of the gridded, redistributed population estimate dataset approaches zero. Overall, focusing on smaller settlements that occupy only a small fraction of their encompassing administrative units provides the most robust assessment of accuracy.

Each settlement point was buffered and a population was generated from four gridded population datasets: the 100-meter product described here, LandScan (Oak Ridge National Laboratory, 2002), GPW, and GRUMP (Center for International Earth Science Information Network, 2011). For both LandScan and GPW, we used the available data year closest to the census date. GPW estimates were only generated for 18 points due to this dataset's low (quarter degree, or about 16 miles) resolution in Pakistan. The remaining 38 settlements were smaller than one GPW grid cell. On average, the population of the settlements was 51,589 people, with a minimum size of 3019, a maximum of 358,376, and a standard deviation of 66,231. Although the greatest strength of our method is for neighborhood-level analysis in urban areas, it was necessary to aggregate up to the level of an individual settlement in order to be able to conduct a comparison between datasets.
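The buffer-and-sum step can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it assumes a circular buffer, a raster whose cells hold persons per cell, and a rule that a cell is counted when its center falls inside the buffer. The function name, the grid origin convention, and all values are hypothetical.

```python
import numpy as np

def buffered_population(grid, cell_size_m, center_xy_m, radius_m):
    """Sum the cells whose centers lie within radius_m of center_xy_m.

    grid holds persons per cell. Row 0 is the northern edge and the
    lower-left corner of the grid is taken as (0, 0), in meters.
    """
    rows, cols = grid.shape
    # Cell-center coordinates in meters.
    x = (np.arange(cols) + 0.5) * cell_size_m
    y = (rows - np.arange(rows) - 0.5) * cell_size_m
    xx, yy = np.meshgrid(x, y)
    dist_sq = (xx - center_xy_m[0]) ** 2 + (yy - center_xy_m[1]) ** 2
    inside = dist_sq <= radius_m ** 2
    return float(grid[inside].sum())

# Hypothetical 5 x 5 raster of 100-meter cells (illustration only).
grid = np.arange(25, dtype=float).reshape(5, 5)
estimate = buffered_population(grid, cell_size_m=100.0,
                               center_xy_m=(250.0, 250.0), radius_m=120.0)
print(estimate)
```

With a coarse raster, a buffer of this size can fall entirely inside a single cell, which is why no meaningful zonal estimate can be formed for settlements smaller than one grid cell.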

Fig. 4. Population distribution results for the northern portion of Pakistan with insets for the Swat Valley and northern Indus plain. Population values are symbolized categorically although the actual values in the raster stretch continuously from 0 to the maximum value of 65,535 persons per square km. Legend classes, in people per hectare (0.01 km²): 0; 0.01–9.99; 10–29.99; 30–99.99; 100–299.99; 300–599.99; 600 or greater. Map labels: A.K., Gilgit-Baltistan, FATA, Khyber-Pakhtunkhwa, Punjab. Scale bars: 0 to 10 miles.

Census population counts have their own margin of error. Accuracy, in this analysis, reflects similarity to the census count rather than an absolute ground truth. Fig. 5 compares population estimates generated using the 100-meter grid, LandScan, and GRUMP to the count recorded for a named place in the census of Pakistan for each validation settlement point. For both the 100-meter grid and LandScan the r² is 0.92 (Fig. 5), indicating a similar linear relationship between the population counts estimated when using both datasets. The r² for GRUMP is 0.77. This value indicates that GRUMP captures variation in the population, but its estimates were not close to the one-to-one line, as each data point indicated a substantial underprediction relative to census population data. The trend lines for all of the datasets indicate that the 100-meter grid product is closest to the one-to-one line. The variability of error for estimates produced by the datasets also differs. The RMSE and MAE for the 100-meter grid were lower than LandScan's and substantially lower than those for GPW or GRUMP (Table 8). By a large margin, predicting the arithmetic mean of population for each settlement would produce a more useful result than relying on the GPW dataset. The lower difference between RMSE and MAE of the 100-meter grid relative to LandScan's or GRUMP's difference can be interpreted as a decrease in the variability of small-area estimation error (Wilmott & Matsuura, 2005). This is an indication of improved estimation accuracy with the 100-meter grid relative to LandScan or GRUMP. It also indicates that the most accurate small area estimates produced using the 100-meter grid can be found at the neighborhood level in large cities, owing to the strong relationship between built-up areas and population in those areas. It was not possible to collect point verification in these areas because of the structure of the census of Pakistan.
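The comparisons in this paragraph can be checked with simple arithmetic on the published figures. The sketch below uses the Table 8 values and the settlement standard deviation reported in the text. It relies on the fact that predicting the mean for every settlement yields an RMSE equal to the (population) standard deviation of the counts. One caveat is an assumption on our part: the reported standard deviation covers all 56 settlements, whereas GPW was evaluated on 18, so the GPW comparison is only approximate.

```python
# RMSE and MAE copied from Table 8 (persons).
table8 = {
    "100-meter grid": {"rmse": 31_089, "mae": 17_372},
    "LandScan": {"rmse": 48_001, "mae": 23_212},
    "GPW": {"rmse": 100_260, "mae": 72_752},
    "GRUMP": {"rmse": 72_071, "mae": 47_029},
}

# Standard deviation of the settlement populations reported in the text.
# A constant "predict the mean" estimator has an RMSE equal to this value.
settlement_sd = 66_231

# A smaller RMSE - MAE gap suggests less variable error magnitudes.
gaps = {name: s["rmse"] - s["mae"] for name, s in table8.items()}
gpw_worse_than_mean = table8["GPW"]["rmse"] > settlement_sd

for name, gap in gaps.items():
    print(f"{name:15s} RMSE - MAE = {gap:>7,}")
print("GPW RMSE exceeds RMSE of mean predictor:", gpw_worse_than_mean)
```

The gap for the 100-meter grid (13,717) is roughly half of LandScan's (24,789) and GRUMP's (25,042), which is the pattern the text interprets as reduced variability in small-area estimation error.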

5. Discussion

The Swat Valley and the northern Indus plain in Punjab are two areas that have recently been in the news for different reasons: the Swat Valley because it was the scene of a complex humanitarian emergency in 2009, where two million people were rapidly displaced from their homes due to civil unrest (Rummery & Khan, 2010), and the northern Indus plain because of extensive flooding in 2010 (United Nations Office for the Coordination of Humanitarian Affairs, 2010). They feature two very different settlement patterns, with the Swat Valley covered primarily by small, isolated villages and the northern Indus plain in Punjab containing a highly organized urban pattern with a full range of settlement types from small villages to large cities. Together these two areas are displayed in the inset maps of Fig. 4. The textured variability shown in the insets indicates the ability of the product developed here to estimate the population of areas that have been or could be affected by hazards at and below the neighborhood scale.

The accuracy of census data and subsequent ability to tie those data to a specific location are sources of uncertainty in the distribution algorithm and validation attempt. Since the base of the population distribution is the total population at the third-level administrative unit as reported in the census, any errors in those counts are propagated to the final raster. When we compared census counts to our estimates of built-up area at the tehsil level, the population counts for the majority of the country appeared to be reliable.
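The way census errors carry through to the raster can be illustrated with a generic proportional allocation. This is a minimal sketch of a dasymetric-style redistribution, assuming each pixel receives a share of the unit total proportional to a non-negative weight such as built-up fraction. It is not the full algorithm described in this paper, and the function name and values are hypothetical.

```python
import numpy as np

def distribute(unit_total, weights):
    """Allocate unit_total across pixels in proportion to non-negative weights."""
    weights = np.asarray(weights, dtype=float)
    total_weight = weights.sum()
    if total_weight <= 0:
        raise ValueError("at least one pixel needs a positive weight")
    return unit_total * weights / total_weight

# Hypothetical built-up fractions for the pixels of one administrative unit.
built_up = [0.0, 0.1, 0.4, 0.5]
census = distribute(10_000, built_up)      # allocation from the reported total
inflated = distribute(11_000, built_up)    # same unit with a 10% overcount

print(census)     # per-pixel populations summing to the unit total
print(inflated)   # every pixel inherits the same 10% relative error
```

Under this kind of allocation the pixel values always sum to the census total, so a relative error in the unit count appears unchanged in every pixel of that unit, whatever the weights are.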

The method for collecting calibration and validation settlement points, buffering those points, and associating them with a place name and census population count taken from the census tables represents the best possible approach given available data. It is nearly impossible to know which people, as indicated by the areal footprint of their homes, were counted in each administrative jurisdiction below the mapped census units. For many named areas in Pakistan, a census table entry exists for both the urban and rural populations of that locale. For example, a fifth-level administrative area may be the seat of government for a fourth-level unit. The central, denser core settlement would constitute the urban population while surrounding areas (until the beginning of the next fourth-level unit) would constitute the rural population. Because neither the fourth- nor fifth-level boundaries were available, it was difficult to determine which hamlets on the periphery of the core areas may or may not have been enumerated in a particular census table entry. Given the level of uncertainty, a consistent approach was sought. The buffers were drawn to include the farthest continuous areas of settlement connected to the center of the settlement except when connected only by a large radiating road. Even if perfectly digitized administrative boundaries at lower levels were available, people are not always enumerated at a precise address. Residents of small, clan-based hamlets may have their enumeration tied to a close-by market town or be counted within a larger unit. For all of these reasons, there was considerable uncertainty in attempting to create new georeferenced census data working below the third administrative level.

Fig. 5. Comparison between 100-meter grid, LandScan, and GRUMP derived population estimates for each settlement.

This dataset represents a marked improvement in accuracy relative to GPW, which uses only administrative units to map population data. The dataset also improves upon GRUMP, which has a uniform population distribution in primarily rural areas. Agriculturally productive, rural areas in Pakistan have substantial population clusters that are present in the 100-meter grid dataset, and these population clusters are not visible as concentrations of population in GRUMP. This contributes to the higher error in GRUMP relative to the 100-meter grid or LandScan. The 100-meter grid methodology, which combines a dasymetric approach with a per-pixel prediction based primarily on built-up area, is able to produce meaningful information well below the resolution of the input administrative unit. The method also matches or exceeds the small area estimate accuracy of LandScan for rural areas and areas with predominantly small towns. In addition, it likely produced substantially improved estimates in urban areas, though direct testing is difficult due to the limitations of the census boundaries and the coarse resolution of LandScan. Testing the accuracy of any gridded population dataset in smaller rural settlements and for smaller parts of large urban settlements is difficult because digital boundary files for these are seldom available. In addition, the connection between the area named in a census table and the actual spatial extent of that area may be tenuous. The 100-meter resolution allows for a marked increase in the perception of local density, permitting the kind of neighborhood scale analysis that is critical for humanitarian crises that affect urban areas. For this reason, during the summer 2010 flooding in Pakistan, the Humanitarian Information Unit at the U.S. Department of State estimated the population affected by flooding using this 100-meter grid dataset.

Table 8. Difference statistics comparing small area population estimates generated for raster distributed population products.

Statistic   100-meter grid   LandScan   GPW       GRUMP
RMSE (a)    31,089           48,001     100,260   72,071
MAE (a)     17,372           23,212     72,752    47,029

(a) This table displays error statistics comparing population estimates generated using zonal statistics with the four rasters listed and comparing the results to recorded census counts.

The methodology in this paper is complex, but can be implemented relatively quickly. It is not intended to produce a high-temporal resolution product, continually updated during the course of an emergency. Rather, it seeks to provide a high-resolution baseline for use during emergencies when field reports describe relatively small named places and how the people who live in those places are impacted. The spatial scale of the 100-meter gridded dataset offers a higher-resolution baseline that can reduce guesswork about populations affected by disasters, for example when field workers report refugee arrivals or departures for small settlements that do not show up in coarser datasets. Overall, the processing of Pakistan took two full-time staff and a team of a few interns roughly six months to complete. This time requirement is longer than the critical window of most emergencies. Population displacements and movements occur frequently during rapid-onset emergencies. A larger number of staff could reduce the processing time substantially. However, because census data provide the baseline input for this methodology, it would not be possible to rapidly update the gridded population product during an emergency unless a large-scale survey takes place.

6. Conclusion

This study applied advanced population mapping methods to a large country in the developing world. This research represents a unique combination of method, resolution, input data, and scope. Over 200 high- and medium-resolution images were used to distribute census populations at a fine scale. The methods combined elements of dasymetric mapping with semi-automated image classification techniques to produce per-pixel population estimates highly sensitive to the variability in settlement density. While studies in the developed world have mapped at similarly high resolutions, these are in data-rich environments with less demand for externally provided humanitarian relief and where the population distribution is typically already well known. Other population distribution methodologies either have the same resolution but are based on strictly dasymetric mapping approaches that capture less variation within mapping zones (Linard et al., 2010a, 2010b) or have a global scale but lower resolution (LandScan, GPW). All methodologies, including the one presented here, are limited by the quality of available input data.

Future work on this methodology will focus on improvements to calibration and validation. Any improvements would likely necessitate the availability of population data below the administrative level used for distribution. Such data could be obtained from in situ data collection or one of the few developing countries that has village-level boundary and demographic data, such as Kenya.

In advancing our method and comparing it to other available products and methodologies, we seek to add to the richness of data availability and methodological inquiry in the field of population mapping. Due to the ease of obtaining geographic population data for the United States, basic information on population characteristics and distribution can be taken for granted by policymakers and analysts. Each gridded population dataset has a scale and type of analysis for which it is more appropriate. The products discussed in this paper are freely available to the public for unrestricted use. They are designed to fill a data gap in the developing world at a scale appropriate for neighborhood through national scale analysis and to be useful for planning and response before, during, and after natural disasters and complex humanitarian crises.

References

Azar, D., Graesser, J., Engstrom, R., Comenetz, J., Leddy, R. M., Schechtman, N. G., et al. (2010). Spatial refinement of census population distribution using remotely sensed estimates of impervious surfaces in Haiti. International Journal of Remote Sensing, 31, 5635–5655.

Balk, D. L., Deichmann, U., Yetman, G., Pozzi, F., Hay, S. I., & Nelson, A. (2006). Determining global population distribution: methods, applications, and data. Advances in Parasitology, 62, 119–156.

Bittencourt, H. R., & Clarke, R. T. (2003). Use of Classification and Regression Trees (CART). IGARSS 2003. Toulouse: IEEE.

Blaschke, T. (2010). Object based analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 65, 2–16.

Breiman, L., Friedman, J. H., Olshen, A., & Stone, C. J. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth.

Briggs, D. J., Gulliver, J., Fecht, D., & Vienneau, D. M. (2007). Dasymetric modelling of small-area population distribution using landcover and light emissions data. Remote Sensing of Environment, 108, 451–466.

Center for International Earth Science Information Network (2000). Gridded Population of the World, Version 3 Data Collection. Palisades, NY: Columbia University.

Center for International Earth Science Information Network (CIESIN)/Columbia University, International Food Policy Research Institute (IFPRI), World Bank, and Centro Internacional de Agricultura Tropical (CIAT) (2011). Global Rural-Urban Mapping Project, Version 1 (GRUMPv1): Population Count Grid. Palisades, New York: NASA Socioeconomic Data and Applications Center (SEDAC). Retrieved from http://sedac.ciesin.columbia.edu/data/set/grump-v1-population-count

Central Intelligence Agency (2010). The World Factbook. Retrieved from https://www.cia.gov/library/publications/the-world-factbook

Chen, K. (2002). An approach to linking remotely sensed data and areal census data. International Journal of Remote Sensing, 23, 37–48.

Dobson, J. E., Bright, E. A., Coleman, P. R., Durfee, R. C., & Worley, B. A. (2000). LandScan: A global population database for estimating populations at risk. Photogrammetric Engineering and Remote Sensing, 66, 849–857.

Eicher, C. L., & Brewer, C. A. (2001). Dasymetric mapping and areal interpolation: Implementation and evaluation. Cartography and Geographic Information Science, 14, 125–138.

Goetz, S. J., & Jantz, P. (2006). Satellite map shows Chesapeake Bay urban development. EOS, Transactions, 87, 149–156.

Google Earth (Version 5.1.3533) (2010). Available from http://www.google.com/earth/index.html

Guindon, B., Zhang, Y., & Dillabaugh, C. (2004). Landsat urban mapping based on a combined spectral-spatial methodology. Remote Sensing of Environment, 92, 218–232.

Hansen, M. C., Defries, R. S., Townshend, J. R. G., & Sohlberg, R. (2000). Global land cover classification at 1 km spatial resolution using a classification tree approach. International Journal of Remote Sensing, 21, 1331–1364.

Harvey, J. T. (2002). Estimating census district populations from satellite imagery: Some approaches and limitations. International Journal of Remote Sensing, 25, 2071–2095.

Hay, S. I., Noor, A. M., Nelson, A., & Tatem, A. J. (2005). The accuracy of human population maps for public health application. Tropical Medicine & International Health, 10, 1073–1086.

Holt, J. B., Lo, C. P., & Hodler, T. W. (2004). Dasymetric estimation of population density and areal interpolation of census data. Cartography and Geographic Information Science, 31, 103–122.

Homer, C., Huang, C., Yang, L., Wylie, B., & Coan, M. (2004). Development of a 2001 national landcover database for the United States. Photogrammetric Engineering and Remote Sensing, 70, 829–840.

Huang, C., Yang, L., Wylie, B., & Coan, M. (2002). Synergistic use of FIA plot data and Landsat 7 ETM+ images for large area forest mapping. In R. McRoberts, G. Reams, P. Van Deusen, & J. Moser (Eds.), Proceedings of the Third Annual Forest Inventory and Analysis Symposium (pp. 20–55). St. Paul, MN: US Department of Agriculture, Forest Service, North Central Research Station.

Li, G., & Weng, Q. (2005). Using Landsat ETM+ imagery to measure population density in Indianapolis, Indiana, USA. Photogrammetric Engineering and Remote Sensing, 71, 947–958.

Linard, C., Alegana, V. A., Noor, A. M., Snow, R. W., & Tatem, A. J. (2010a). A high resolution spatial population database of Somalia for disease risk mapping. International Journal of Health Geographics, 9.

Linard, C., Gilbert, M., Snow, R. W., Noor, A. M., & Tatem, A. J. (2012). Population distribution, settlement patterns and accessibility across Africa in 2010. PLoS One, 7(2), e31743. http://dx.doi.org/10.1371/journal.pone.0031743.

Linard, C., Gilbert, M., & Tatem, A. J. (2010b). Assessing the use of global landcover data for guiding large area population distribution modeling. GeoJournal, 76.

Liu, X., & Clarke, K. C. (2002). Estimation of residential population using high resolution satellite imagery. In D. Maktav, C. Jeurgens, & F. Sunar-Erbek (Eds.), Proceedings of the 3rd Symposium in Remote Sensing of Urban Areas (pp. 153–160). Istanbul: Istanbul Technical University Press.

Liu, X., Clarke, K., & Herold, M. (2006). Population density and image texture: A comparison study. Photogrammetric Engineering and Remote Sensing, 72, 187–196.

Lo, C. P. (1989). A raster approach to population estimation using high-altitude aerial and space photographs. Remote Sensing of Environment, 27, 59–71.

Lo, C. P. (2003). Zone-based estimation of population and housing units from satellite-generated land use/land cover maps. In V. Mesev (Ed.), Remotely Sensed Cities (pp. 157–180). New York, New York: Taylor and Sons.

Lu, D., Weng, Q., & Li, G. (2006). Residential population estimation using a remote sensing derived impervious surface approach. International Journal of Remote Sensing, 27, 3553–3570.

Martin, D., Tate, N., & Langford, M. (2002). Refining population surface models: Experiments with Northern Ireland census data. Transactions in GIS, 4, 343–360.

Mennis, J. (2003). Generating surface models of population using dasymetric mapping. The Professional Geographer, 55, 31–42.

Mubareka, S., Erlich, D., Bonn, F., & Kayitakire, F. (2008). Settlement location and population density estimation in rugged terrain using information derived from Landsat ETM and SRTM data. International Journal of Remote Sensing, 29, 2339–2357.

National Geospatial-Intelligence Agency (2010). GEOnet Names Server (GNS). Retrieved from http://earth-info.nga.mil/gns/html/

National Research Council (2007). Tools and Methods for Estimating Population at Risk from Natural Disasters and Complex Humanitarian Crises. Washington, DC: National Academy of Science.

Nelson, A. (2004). African Population Database Documentation. Retrieved from http://na.unep.net/siouxfalls/globalpop/africa/Africa_index.html

Noji, E. K. (2005). Estimation of population size in emergencies. Bulletin of the World Health Organization, 83, 164.

Oak Ridge National Laboratory (2002). LandScan 2002. (Contract No. DE-AC05-00OR22725 with the United States Department of Energy).

Olson, D. M., Dinerstein, E., Wikramanayake, E. D., Burgess, N. D., Powell, G. V. N., Underwood, E. C., et al. (2001). Terrestrial ecoregions of the world: A new map of life on earth. BioScience, 51, 933–938.

Population Census Organization (1998). Population and Housing Census of Pakistan 1998. Islamabad, Pakistan: Government of Pakistan.

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.

Rindfuss, R. R., & Stern, P. C. (1998). Linking remote sensing and social science: The needs and challenges. In D. Liverman, E. F. Moran, R. R. Rindfuss, & P. C. Stern (Eds.), People and Pixels: Linking Remote Sensing and Social Science. Washington, DC: National Academy Press.

Rummery, A., & Khan, Q. A. (2010, August 3). Civilians still need help in Swat, a year after conflict engulfed the area. Retrieved from http://www.unhcr.org/4bdaeb646.html

Small, C. (2005). A global analysis of urban reflectance. International Journal of Remote Sensing, 26, 661–681.

Small, C., & Nicholls, R. J. (2003). A global analysis of human settlement in coastal zones. Journal of Coastal Research, 19, 584–599.

Tatem, A. J., Noor, A. M., von Hagen, C., Di Gregorio, A., & Hay, S. I. (2007). High resolution population maps for low income nations: Combining land cover and census in East Africa. PLoS One, 2, e1298.

Tobler, W. R. (1969). Satellite confirmation of settlement size coefficients. Area, 1, 30–34.

United Nations Office for the Coordination of Humanitarian Affairs (UNOCHA) (2010). Pakistan Floods Relief and Early Recovery Response Plan. Revised November 2010. Retrieved from http://reliefweb.int/node/394785

Varlyguin, D. L., Wright, R. K., Goetz, S. J., & Price, S. D. (2001). Advances in landcover classification for applications research: A case study from the Mid-Atlantic RESAC. College Park, MD: Department of Geography, University of Maryland and Mid-Atlantic Regional Earth Science Applications Center.

Wilmott, C. J., & Matsuura, K. (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research, 30, 79–82.

Wu, C., & Murray, A. T. (2007). Population estimation using Landsat Enhanced Thematic Mapper imagery. Geographical Analysis, 39, 24–43.

Wu, S., Qui, X., & Wang, L. (2008). Using semi-variance image texture statistics to model population densities. Cartography and Geographic Information Science, 33, 127–140.

Yang, L., Huang, C., Homer, C. G., Wylie, B., & Coan, M. J. (2003). An approach for mapping large-area impervious surfaces: Synergistic use of Landsat 7 ETM+ and high spatial resolution imagery. Canadian Journal of Remote Sensing, 29, 230–240.