New England Statistics Symposium 2016

Post on 11-Apr-2017


Bayesian hierarchical models for estimating the health effects of air pollution sources

Roger D. Peng, PhD

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

@rdpeng, simplystatistics.org

(joint work with Jenna Krall and Amber Hackstadt)

New England Statistics Symposium April 2016

Not So Standard Deviations (with Hilary Parker of Stitch Fix)

Subscribe in iTunes: https://goo.gl/ZhWYbd

https://soundcloud.com/nssd-podcast

Particulate Matter and Health

• PM has been linked with health outcomes: hospitalization, mortality, decreased lung function, cardiac events

• WHO estimates ~800,000 premature deaths per year

• Evidence of both short-term (acute) and long-term (chronic) effects of exposure to ambient PM

• Much recent work has examined ambient PM mass indicators (PM10, PM2.5, PM10-2.5)

What’s Next?

• There is strong evidence that ambient PM is associated with mortality and morbidity

• What should we do about it? How can we intervene to improve health?

• Target sources of PM that are most harmful to human health

• How do we identify sources of ambient PM?


Pollution Source Apportionment

[Figure: pie charts relating particulate matter and its chemical constituents (elemental carbon, organic carbon, sulfate, nitrate, nickel, vanadium) to sources (power plant, car, oil heating) and their source-specific concentrations. The slide appears in three builds; the last marks the sources with "?".]

Pollution Source Apportionment Methods

Y = FΛ ∘ ε

• Y: observed chemical constituents (n x p)

• F: source concentrations (n x k)

• Λ: source profiles (k x p)

• ε: error (n x p)
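To make the dimensions concrete, the sketch below simulates data with the structure of the receptor model on this slide. It is an illustration only: the sizes n, p, and k, the gamma and Dirichlet draws for F and Λ, and the lognormal multiplicative error (reading the operator between FΛ and ε as an elementwise product) are assumptions made here, not details taken from the talk.

```python
import numpy as np

def simulate_receptor_model(n=200, p=6, k=3, sigma=0.1, seed=0):
    """Simulate Y = (F @ Lam) * eps with an elementwise multiplicative error."""
    rng = np.random.default_rng(seed)
    # F: nonnegative source concentrations, one row per day (n x k).
    F = rng.gamma(shape=2.0, scale=1.0, size=(n, k))
    # Lam: source profiles (k x p); each row is a composition summing to 1.
    Lam = rng.dirichlet(np.ones(p), size=k)
    # eps: multiplicative error (n x p), lognormal with median 1 (assumption).
    eps = rng.lognormal(mean=0.0, sigma=sigma, size=(n, p))
    Y = (F @ Lam) * eps
    return Y, F, Lam, eps

Y, F, Lam, eps = simulate_receptor_model()
print(Y.shape, F.shape, Lam.shape, eps.shape)
```

Source apportionment runs in the opposite direction: only Y is observed, and both F and Λ have to be estimated, which is why extra information about the profiles is needed to pin the problem down.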

Problems

• Current source apportionment methods are applied on an ad hoc, highly tweaked basis and are difficult to scale to a region or nation

• Source apportionment models are typically informed by the investigator’s local knowledge

• No reproducible way to combine information across locations to gain power when estimating health effects (multi-site studies)

Incorporating New Data Sources on Pollution Sources

Component | Data Source
Particulate Matter | EPA Air Quality System (AQS)
PM Chemical Constituents | EPA Chemical Speciation Network
PM Source Profiles | EPA SPECIATE Database
PM Source Emissions | EPA National Emissions Inventory

Model Specification

[Diagram labels: Chemical Speciation Network, SPECIATE, NEI]

[Figure: time series by date, 2002-2005, with panels for sulfate, nitrate, elemental carbon, and organic carbon (OC_K14)]

[Figure: two plots labeled by chemical constituent: aluminum, ammonium ion, antimony, arsenic, barium, bromine, cadmium, calcium, cerium, cesium, chlorine, chromium, cobalt, copper, elemental carbon, europium, gallium, gold, hafnium, indium, iridium, iron, lanthanum, lead, magnesium, manganese, mercury, molybdenum, nickel, niobium, nitrate, OC, phosphorus, potassium, rubidium, samarium, scandium, selenium, silicon, silver, sodium ion, strontium, sulfate, sulfur, tantalum, terbium, tin, titanium, tungsten, vanadium, yttrium, zinc, zirconium]

Informative Prior Distribution

Annual Source Emissions

[Figure: panels for Railroad Equipment/Diesel (1999-2008) and Agricultural Crop/Livestock Dust (1999-2008)]

Model Fitting

• We use Markov chain Monte Carlo to simulate from the posterior distribution of the unknown parameters

• Adaptive MCMC approach of Haario et al. (2001)

• Use data from SPECIATE and NEI to calibrate the prior distributions

• Constraints placed on profile matrix based on what is known about composition of specific sources
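As a rough illustration of the adaptive MCMC idea cited above, here is a minimal adaptive Metropolis sketch in the spirit of Haario et al. (2001): after an initial period, the Gaussian proposal covariance is set from the covariance of the chain so far. The target (a correlated bivariate normal), the initial covariance, the adaptation start `t0`, and the function names are all assumptions for this example; this is not the sampler used for the source apportionment model.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=5000, t0=500, eps=1e-6, seed=0):
    """Random-walk Metropolis whose proposal covariance adapts to the chain history."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    s_d = 2.4 ** 2 / d                 # scaling factor suggested by Haario et al.
    cov0 = 0.1 * np.eye(d)             # fixed initial proposal covariance (assumption)
    chain = np.empty((n_iter, d))
    lp = log_post(x)
    n_accept = 0
    for t in range(n_iter):
        if t < t0:
            cov = cov0
        else:
            # Empirical covariance of all past states, plus a small ridge.
            # Recomputed from scratch for clarity; a recursive update is cheaper.
            emp = np.atleast_2d(np.cov(chain[:t], rowvar=False))
            cov = s_d * (emp + eps * np.eye(d))
        prop = rng.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            x, lp = prop, lp_prop
            n_accept += 1
        chain[t] = x
    return chain, n_accept / n_iter

# Toy target: zero-mean bivariate normal with correlation 0.9.
prec = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
log_post = lambda x: -0.5 * x @ prec @ x

chain, acc_rate = adaptive_metropolis(log_post, np.zeros(2))
print("acceptance rate:", round(acc_rate, 2))
print("posterior mean estimate:", chain[1000:].mean(axis=0))
```

The appeal of this kind of adaptation is that the proposal learns the scale and correlation of the posterior without hand tuning.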

Estimating Health Effects

• For individual cities, estimated source time series can be plugged into regression models with health outcomes

• For multi-site studies, source determination cannot be a manual process (not reproducible)

• Need automatic method to combine information across a region

• Current approaches assume pollution sources are the same everywhere
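The single-city plug-in step can be sketched as a log-linear Poisson regression of daily counts on an estimated source series. Everything below is hypothetical: the simulated source series, the intercept and effect size, and the helper `poisson_irls` are made up for illustration. Real time-series analyses also adjust for confounders such as weather and time trends, which are omitted here.

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit log E[y] = X @ beta by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # assumes column 0 of X is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu         # working response
        XtW = X.T * mu                  # working weights equal mu for the log link
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    # Standard errors from the inverse Fisher information at the final fit.
    cov = np.linalg.inv((X.T * np.exp(X @ beta)) @ X)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n_days = 1000
source = rng.gamma(shape=2.0, scale=1.0, size=n_days)  # stand-in for an estimated source series
true_beta = np.array([np.log(20.0), 0.05])             # made-up intercept and log relative rate
X = np.column_stack([np.ones(n_days), source])
y = rng.poisson(np.exp(X @ true_beta))                 # simulated daily counts

beta, se = poisson_irls(X, y)
print("estimated log relative rate per unit of source:", beta[1], "+/-", se[1])
```

Note that treating the estimated source series as known, as this sketch does, ignores the uncertainty from the source apportionment step.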

US EPA Chemical Speciation Network

• 85 monitors, 24 constituents

Medicare cohort (1999-2010)

• CVD hospitalizations for 63 counties

SHARE

• A method for estimating health effects of sources SHared Across a REgion

• Sources are estimated at individual locations and health effects estimated

• Sources are matched across locations via population value decomposition (Crainiceanu et al. 2011)

• Health effects combined for common sources via hierarchical modeling
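To illustrate only the last step, combining location-specific health effects for a source that several locations share, the sketch below pools estimates under a two-level normal model using the DerSimonian-Laird moment estimator. This is a simple stand-in for the general idea of hierarchical pooling, not the SHARE procedure itself; the function name and the numbers are hypothetical.

```python
import numpy as np

def pool_random_effects(beta_hat, se):
    """Pool site-specific estimates assuming beta_hat[c] ~ N(mu, tau2 + se[c]**2)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    w = 1.0 / se ** 2
    mu_fixed = np.sum(w * beta_hat) / np.sum(w)
    # Moment estimate of the between-site variance, truncated at zero.
    Q = np.sum(w * (beta_hat - mu_fixed) ** 2)
    df = beta_hat.size - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - df) / c)
    # Re-weight by total (within-site plus between-site) variance.
    w_star = 1.0 / (se ** 2 + tau2)
    mu = np.sum(w_star * beta_hat) / np.sum(w_star)
    se_mu = np.sqrt(1.0 / np.sum(w_star))
    return mu, se_mu, tau2

# Made-up log relative rates for one matched source at four locations.
beta_hat = [0.010, 0.004, 0.012, 0.007]
se = [0.004, 0.005, 0.006, 0.003]
mu, se_mu, tau2 = pool_random_effects(beta_hat, se)
print("pooled effect:", mu, "standard error:", se_mu, "between-site variance:", tau2)
```

Locations with more precise estimates get more weight, and the between-site variance controls how strongly the estimates are pulled toward a common value.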

SHARE for Source ID

PM2.5 and CVD Hospitalizations

Summary

• Bayesian source apportionment model can integrate information from 3 national databases

• Data on sources and profiles can be used to constrain the problem and to construct informative prior distributions

• SHARE method can be used to automatically combine health effects of estimated sources across a region
