
Selamet G. Ercelebi¹

Hüseyin Toros²

¹Faculty of Mines, Istanbul Technical University, Istanbul, Turkey.

²Department of Meteorology, Faculty of Aeronautics and Astronautics, Istanbul Technical University, Maslak, Istanbul, Turkey.

Research Article

Extreme Value Analysis of Istanbul Air Pollution Data

With respect to air quality standards, extreme events are usually of the most interest. Air quality standards require that the observed extreme concentration in a given time interval must not exceed a certain value. In this paper, it is shown that the measured maximum concentration in a time interval can be represented by one of the three types of asymptotic distribution of the largest values in extreme value statistics. By using this statistical tool, it is possible to analyze the data and to predict future extreme concentrations with a given probability. The theoretical background of extreme value statistics, the procedure to estimate the parameters of the largest extreme value distributions, and the procedure to forecast future extreme events are briefly explained. The theory is applied to data obtained from two permanent stations in Istanbul. Hourly SO2 and NO2 concentrations are analyzed and the largest SO2 and NO2 concentrations for the following 12 months are forecast. It has been found that Gumbel's Type I and Type II extreme value distributions represent the air quality data obtained from the two stations very well. For the Alibeykoy station, the expected maximum SO2 concentration is found to be 593.7 µg/m3 and the expected maximum NO2 concentration 393.4 µg/m3. There, the air quality exceeds the limit of the EN standards for the hourly SO2 concentration twice a year, i.e., with a return period of 5.77 months, and for the hourly NO2 concentration five times a year, i.e., with a return period of 2.6 months. Similarly, for the Umraniye station, the expected maximum concentration is 514.5 µg/m3 SO2 with a return period of 1.78 months and 437.6 µg/m3 NO2 with a return period of 5.6 months. The prediction suggests that preventive measures should be carried out in the future in order to meet stringent air quality standards.

Keywords: Extreme value; Air pollution; Pollution control; Asymptotic distributions; Forecasting

Received: February 22, 2008; revised: September 5, 2008; accepted: November 30, 2008

DOI: 10.1002/clen.200800041

1 Introduction

Extreme air pollution events are of particular interest in the field of pollution control. The occurrence of extreme concentrations of harmful atmospheric substances represents a risk to people's health and may lead to changes in the world ecosystem. Therefore, the prediction of these extreme concentrations and the assessment of their contribution to air pollution are subjects of environmental concern. The largest sources of pollution in most urban areas are emissions from motor vehicles and household heating and, to a lesser extent, industrial emissions. There are many sources of pollution, both indoor and outdoor, which can affect human health. Traffic-generated pollutants include carbon monoxide, nitrogen oxides, volatile organic compounds, and particulates. Heating systems and other home appliances using coal, fuel oil, gas or wood produce several combustion products, of which the most dangerous are sulfur dioxide, nitrogen dioxide, and carbon monoxide. Istanbul is one of the largest cities in the world. Its air quality continued to improve after 2000 as a result of the use of natural gas and higher quality coal for heating. However, if no other actions are taken, the improvements in air quality may be lost once again due to increases in population and associated traffic density.

Air pollution concentrations are random variables because they depend on a variety of meteorological conditions and emission variables. Many studies in the air pollution literature have applied statistical distribution models to air quality data. The distribution of all peaks within an air quality dataset may fit a normal or lognormal distribution. On the other hand, if one only considers the highest peak (extreme value) in a given time interval, then a distribution of highest peaks (extremes) may be obtained for a number of such intervals. This distribution of highest peaks is known as an extreme value distribution. When treating this type of dataset, available as successive observations, one may be interested in certain random variables, which include the following:
– The highest concentration in a finite sample,
– The number of exceedances of a given concentration level in a time period, and
– The number of observations between exceedances (or return period), etc.

The proper tool to analyze this problem is provided by extreme value statistics theory. The results of extreme value statistics models can provide useful information about the occurrence of limit exceedances of air pollution with respect to the standards and can be used to make future predictions about these limit exceedances.

Correspondence: Dr. H. Toros, Department of Meteorology, Faculty of Aeronautics and Astronautics, Istanbul Technical University, 34469 Maslak, Istanbul, Turkey. E-mail: [email protected]

© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.clean-journal.com

Clean 2009, 37 (2), 122–131

In this study, it is shown that extreme value statistics theory provides a useful tool to analyze air quality data and forecast future extreme events. For this purpose, the theory is applied to model air quality data, namely SO2 and NO2 concentrations measured at two monitoring stations in Istanbul, and extreme concentrations for the future are also forecast. The procedure gives the expected and the most probable largest air pollution concentrations and also provides an interval in which the maximum concentrations will be observed with a given probability.

Statistical distribution models have been widely applied in the field of air pollution, and the application of extreme value theory in the field has been increasing recently. One of the first applications of the theory to air pollution was by Roberts [1, 2], who provides a comprehensive overview of the theory. Roberts also presents an application to SO2 and NO2 data in the Long Beach area, CA, USA. When the extreme concentration of pollutants is independently and identically distributed, the parent distribution of the pollutant belongs to the exponential type. A type I asymptotic distribution was successfully applied to estimate the monthly and annual maximum concentrations.

A procedure for using air quality data to estimate the mean value of the maximum concentration in a year-long sequence of lognormally distributed air pollutant concentrations has been described by Larsen [3].

Horowitz and Barakat [4] illustrate two methods for using air quality data to estimate the distributional properties of maxima of non-stationary sequences of concentrations. One method involves identifying a non-stationary stochastic process that explains the data and computing the probability distributions of the maxima of the sequences generated by this stochastic process. The other involves applying the Larsen procedure to a suitably selected subsequence of the data. The authors showed that the O3 extreme data could be fitted well by a type I asymptotic distribution of the extreme value.

Extreme value theory is discussed for the analysis of air pollution data by Surman et al. [5]. The method of application of the theory is described by means of an example analysis of an ozone dataset, and the theory is applied to several datasets collected in Brisbane, Queensland, Australia. The theory is used to predict the number of violations of WHO and US standards expected in the year following data collection.

Sharma et al. [6] used extreme value theory to predict the expected number of violations of the National Air Quality Standards of India for hourly and eight-hourly average carbon monoxide concentrations from an urban road intersection. The parameters of the extreme value distribution (first type) were estimated by four methods, i.e., method of moments, maximum likelihood, least squares fit, and Gumbel's fitting method. The authors conclude that a comparison of the predicted violations of standards and the exceedance of the maximum pollution concentration with the observed data suggests that the theory performs satisfactorily for extreme air pollution events.

Ercelebi and Kirmanli [7] predicted methane concentration in an underground mine environment by means of extreme value statistics techniques. They fitted the first type asymptotic distribution to data for one mine, and the third type asymptotic distribution to data for another mine. They also predicted future maximum concentrations with a given probability.

Lu and Fang [8] applied three theoretical distributions, i.e., lognormal, Weibull and type V Pearson, to fit the parent distribution of PM10 at five air monitoring stations in Taiwan from 1995 to 1999. However, these distributions cause large errors in predicting air pollutant concentration in the high-concentration region. Two prediction methods, i.e., method I: type I asymptotic distribution of extreme value and method II: type I two-parameter exponential distribution, were used to fit the distributions of the monthly maximum and of the high PM10 concentrations over a specific percentile, respectively. Moreover, these two methods were used to estimate the return period and exceedances of a critical PM10 concentration, such as Taiwan EPA's standard. The authors conclude that the fitted type I two-parameter exponential distribution matches the high PM10 concentration levels better than the parent lognormal distribution.

Kan and Chen [9] determined that the lognormal, Pearson V and extreme value distributions are the best statistical distributions for the daily average concentration data of PM10, SO2, and NO2 in Shanghai, respectively. They concluded that the results can be further applied to local air pollution prediction and control.

Hurairah et al. [10] applied a new extreme value model to air pollution data in Malaysia, and more specifically to CO data. They fitted the model by using the maximum likelihood method. The authors stated that the new extreme value model is easy to use and can achieve higher accuracy compared with other models.

Sfetsos et al. [11] modeled daily PM10 concentration values from an industrial area in West Macedonia by using extreme value theory. Achcar et al. [12] considered the problem of estimating the number of times an air quality standard is exceeded in a given period of time. A non-homogeneous Poisson model was proposed to analyze this issue. The theoretical development was applied to data provided by the pollution monitoring network of Mexico City.

2 Materials and Methods

The first step in the evaluation of an air quality standard is to select the statistical distribution that best fits the data. Air quality standards are typically expressed as a limit on one-hour average concentrations that must not be exceeded more than a specified number of times during a one-year period. Extreme value statistics theory provides a very useful tool for the evaluation of these types of standards and gives air quality decision makers or managers an effective means to manage future air pollution problems.

Suppose $m$ is the number of air quality concentrations measured in a time period (day, month, year), from which the largest value, $X_m$, is selected. In other words, a series of measurements is taken during a time period and the maximum value in this period is called $X_m$. If $x$ represents the air quality concentrations measured in a time interval, and $F(x)$ is the unknown probability distribution function of these concentrations, then the exact probability function, $G(X_m)$, of the largest concentration, $X_m$, in a time period can be expressed as Eq. (1):

$$G(X_m) = \Pr\{X_1 \le x,\; X_2 \le x,\; \ldots,\; X_m \le x\} = [F(x)]^m \tag{1}$$

As $m$ becomes large, $[F(x)]^m$ approaches one of the three types of asymptotic distributions. In other words, the observed asymptotic distribution, $I(X_m)$, can be used to approximate $G(X_m)$. The random variable $Z_n = \max(X_m^1, X_m^2, \ldots, X_m^n)$, which represents the largest concentration in $n$ time periods, has the probability function $H(Z_n)$ given by Eq. (2):

$$H(Z_n) = [G(Z_n)]^n \cong [I(Z_n)]^n \tag{2}$$

The analytical form of $I(Z_n)$ can be found by using historical data. As $n$ increases indefinitely, $I(Z_n)$ approaches one of three types of extreme value distributions. In the first type, $X$ is an unbounded variable, whereas the second and third types deal with lower and upper limits, respectively. With the help of these distributions, the values of the air quality concentrations for the future can be predicted. Table 1 details the analytical expressions and Fig. 1 shows the probability distribution functions for the largest values plotted on extreme probability paper. The probabilities, $I(x)$, and return period, $T(x)$, are plotted on the horizontal scale. The reduced variable $y$, where $y = -\ln[-\ln I(x)]$, is shown on the bottom of the horizontal scale. The random extreme variable $X$ is plotted on the vertical scale.
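As an illustration, the three forms listed in Table 1 and the reduced variable can be written as short Python functions. This is only a sketch: the function and argument names (`alpha`, `nu`, `eps`, `omega`, `k`) are labels chosen here for the scale parameter, characteristic largest value, lower bound, upper bound and shape parameter, and the handling of values outside the support is an assumption.

```python
import math

def cdf_type1(x, alpha, nu):
    """Type I (exponential): I(x) = exp(-exp(-alpha * (x - nu)))."""
    return math.exp(-math.exp(-alpha * (x - nu)))

def cdf_type2(x, nu, eps, k):
    """Type II: I(x) = exp(-[(nu - eps) / (x - eps)]**k), x above the lower bound eps."""
    if x <= eps:
        return 0.0  # assumption: no probability mass at or below the lower bound
    return math.exp(-(((nu - eps) / (x - eps)) ** k))

def cdf_type3(x, nu, omega, k):
    """Type III: I(x) = exp(-[(omega - x) / (omega - nu)]**k), x up to the upper bound omega."""
    if x >= omega:
        return 1.0  # assumption: all probability mass lies at or below the upper bound
    return math.exp(-(((omega - x) / (omega - nu)) ** k))

def reduced_variate(prob):
    """Reduced variable y = -ln(-ln(I)) used on extreme probability paper."""
    return -math.log(-math.log(prob))
```

At the characteristic largest value (`x == nu`) all three functions return exp(-1), which corresponds to a reduced variable of zero.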

The minimum number of past observations required to estimate the extreme distributions can be found as follows. The probability that the $m$th largest value of $N$ initial observations will be exceeded at least once among $n$ future observations is given by Gumbel [13] as Eq. (3):

$$P_m(X \ge 1) = 1 - \frac{N!\,(n+N-m)!}{(n+N)!\,(N-m)!} \tag{3}$$

For $m = 1$ this reduces to Eq. (4):

$$P_1(X \ge 1) = \frac{n}{N+n} \tag{4}$$

If the numbers of past and future observations are equal, such that $N = n$, then $P_1(X \ge 1) = 0.5$, which means that there is a 0.5 probability that a new sample of $n$ observations will contain a value larger than the maximum of the $N$ past observations. If one selects $P = 0.05$, then $N = 19n$. This means that if one is interested in the extremes that would be observed monthly, at least 19 months of data must be analyzed for the probability of observing a value in a future month greater than the largest observed value to be no more than 0.05.
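The exceedance probability of Eq. (3) and the resulting minimum record length can be checked with a few lines of Python. The sketch below uses exact rational arithmetic to avoid rounding issues at the threshold. The function names are illustrative, and the factorial ratio is rewritten as a ratio of binomial coefficients, which is algebraically equivalent.

```python
from fractions import Fraction
from math import comb

def prob_exceed(m, N, n):
    """Eq. (3): probability that the m-th largest of N past observations
    is exceeded at least once among n future observations."""
    # N!(n+N-m)! / ((n+N)!(N-m)!) equals C(n+N-m, n) / C(n+N, n)
    return 1 - Fraction(comb(n + N - m, n), comb(n + N, n))

def min_past_observations(p, n=1):
    """Smallest N for which the largest past value (m = 1) is exceeded in
    n future observations with probability at most p."""
    N = 1
    while prob_exceed(1, N, n) > p:
        N += 1
    return N

# One future month and a 0.05 exceedance probability need 19 past months.
months_needed = min_past_observations(Fraction(1, 20), n=1)
```

With `n = 1` the loop stops at `N = 19`, matching the 19 months quoted in the text.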

2.1 Plotting Extreme Value Data

The measured largest concentrations can be represented by one of the three types of asymptotic distributions mentioned earlier. The procedure for finding a representative distribution is as follows:
– Arrange the $N$ individual largest measurements in a time period in order of increasing magnitude;
– Calculate the plotting positions, $R_i = m_i/(N + 1)$, where $m_i$ is the rank of the $i$th measurement, $i = 1, 2, \ldots, N$. Several of the measured values can be repeated. A value that is repeated $j$ times is assigned $j$ successive ranks from $r$ to $r + j - 1$ when calculating the plotting position;
– Plot the $N$ points, with the value of measurement $i$ as ordinate and $R_i$ as abscissa, on extreme probability paper;
– Determine the best-fit distribution via visual inspection, and test the quality of fit via, e.g., a chi-square test; and
– Estimate the parameters of the fitted distribution according to the methods given below.
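The first three steps above can be sketched in Python as follows. The function name and the sample values are hypothetical. Sorting and then ranking consecutively gives repeated values successive ranks, as the procedure requires, and the reduced variable is included because it is the linear axis of extreme probability paper.

```python
import math

def plotting_positions(maxima):
    """Return (value, R_i, y_i) for each maximum in increasing order,
    with R_i = m_i / (N + 1) and y_i = -ln(-ln(R_i))."""
    ordered = sorted(maxima)
    N = len(ordered)
    rows = []
    for rank, value in enumerate(ordered, start=1):
        R = rank / (N + 1)
        y = -math.log(-math.log(R))
        rows.append((value, R, y))
    return rows

# Hypothetical monthly maxima in µg/m3; the repeated value 120 gets ranks 1 and 2.
rows = plotting_positions([300.0, 120.0, 120.0, 450.0])
```

Plotting the values against `y_i` on ordinary linear axes is equivalent to using extreme probability paper, so an approximately straight line suggests a type I fit.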

Figure 1. Three types of extreme value distribution functions.

2.2 Parameter Estimation

Several methods of estimating the parameters of the first asymptotic distribution are available in the literature [15, 16] and are briefly explained here.

Good estimators of the scale parameter $\alpha$ and the characteristic largest value $\nu$ are given by the expressions in Eq. (5):

$$\hat{\alpha} = \frac{S_n}{S_x} \quad\text{and}\quad \hat{\nu} = \bar{X} - \frac{\bar{Y}_n}{\hat{\alpha}} \tag{5}$$

where $\bar{X}$ and $S_x$ are the mean and standard deviation of the observed values, respectively, and $\bar{Y}_n$ and $S_n$ are the mean and standard deviation of the reduced extremes provided by Gumbel for a given sample size $n$, i.e.:

$$\bar{Y}_n = \frac{1}{n}\sum_{m=1}^{n} y_m, \qquad \overline{Y_n^2} = \frac{1}{n}\sum_{m=1}^{n} y_m^2, \qquad S_n^2 = \overline{Y_n^2} - \bar{Y}_n^2 \tag{6}$$

The values $y_m$, with $m$ counted from below, are obtained from the plotting position as given in Eq. (7):

$$y_m = -\ln\left(-\ln\frac{m}{n+1}\right), \qquad m = 1, 2, \ldots, n \tag{7}$$

Methods for estimating the parameters of the second and the third asymptotic distributions can be found elsewhere [6, 13–15]. The parameters are estimated by employing different methods, e.g., method of moments, method of least squares, method of maximum likelihood, and Gumbel's fitting method.
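A minimal Python sketch of the type I estimators in Eqs. (5) to (7) is given below. The function names are illustrative. The text does not state whether $S_x$ uses the population or the sample form, so the population form (division by n), which mirrors Eq. (6), is assumed here.

```python
import math

def reduced_moments(n):
    """Mean and standard deviation of the reduced extremes, Eqs. (6) and (7)."""
    ys = [-math.log(-math.log(m / (n + 1))) for m in range(1, n + 1)]
    mean_y = sum(ys) / n
    mean_y2 = sum(y * y for y in ys) / n
    return mean_y, math.sqrt(mean_y2 - mean_y ** 2)

def fit_type1(maxima):
    """Estimate (alpha, nu) of the type I distribution from a list of maxima, Eq. (5)."""
    n = len(maxima)
    y_bar, s_n = reduced_moments(n)
    x_bar = sum(maxima) / n
    # Assumption: population standard deviation, consistent with Eq. (6).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in maxima) / n)
    alpha = s_n / s_x
    nu = x_bar - y_bar / alpha
    return alpha, nu
```

As the sample size grows, the two reduced moments approach Euler's constant and π/√6 respectively, which is why tabulated values for finite n are slightly smaller than those limits.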

2.3 Return Period

A major topic of interest in environmental studies is the return period. The return period is the mean number of intervals required for a largest concentration value equaling or exceeding $x$ to be observed. The probability that the largest value in an interval equals or exceeds a specific value $x$ is $1 - F(x)$. Its reciprocal is known as the return period, Eq. (8):

$$T(x) = \frac{1}{1 - F(x)} \tag{8}$$
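Eq. (8) translates directly into code. The sketch below also converts the exceedance probability into an expected number of exceeding intervals per year, assuming monthly intervals (12 per year). The function names are illustrative. The probabilities 0.621, 0.439 and 0.82 are the non-exceedance values quoted in Sect. 3.2.

```python
def return_period(cdf_value):
    """Eq. (8): mean number of intervals between exceedances of x,
    given the non-exceedance probability F(x)."""
    if not 0.0 <= cdf_value < 1.0:
        raise ValueError("F(x) must lie in [0, 1)")
    return 1.0 / (1.0 - cdf_value)

def exceedances_per_year(cdf_value, periods_per_year=12):
    """Expected number of intervals per year whose maximum exceeds x."""
    return periods_per_year * (1.0 - cdf_value)

t_no2_alibeykoy = return_period(0.621)   # roughly 2.6 months
t_so2_umraniye = return_period(0.439)    # roughly 1.78 months
t_no2_umraniye = return_period(0.82)     # roughly 5.6 months
```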

2.4 Forecasting

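The next step is to forecast the future peak concentrations. For type I, the distribution of the largest concentration over the next $n$ months follows from Eq. (2) and is given by Eq. (9):

$$H(Z_n) \cong \exp\left[-n\,e^{-\alpha(Z_n-\nu)}\right] \tag{9}$$

Let:

$$y = \alpha(Z_n - \nu) \quad\text{or}\quad Z_n = \frac{y}{\alpha} + \nu \tag{10}$$

and in addition: $H(y) = \exp\left(-n\,e^{-y}\right)$.

The moment generating function of $y$ is given by Eq. (11):

$$G_y(t) = \int_{-\infty}^{\infty} e^{yt}\,h(y)\,dy \tag{11}$$

where $h(y)$ is the density function of $y$, Eq. (12):

$$h(y) = n\,e^{-n e^{-y}}\,e^{-y} \tag{12}$$

or:

$$G_y(t) = \int_{-\infty}^{\infty} e^{yt}\,n\,e^{-n e^{-y}}\,e^{-y}\,dy = n^t\,\Gamma(1-t) \tag{13}$$

Therefore:

$$G_y'(t) = \Gamma(1-t)\,n^t \ln(n) + n^t\,\Gamma'(1-t) \tag{14}$$

$$E(y) = G_y'(t)\big|_{t=0} = \ln(n)\,\Gamma(1) + E_C \tag{15}$$

where the prime denotes differentiation with respect to $t$, and $E_C$ is Euler's constant, equal to 0.577215. Eq. (16) is obtained by combining Eqs. (10) and (15):

$$\bar{Z}_n = E(Z_n) = \nu + \frac{\ln(n)}{\alpha} + \frac{0.577215}{\alpha} \tag{16}$$

The most probable largest concentration (mode), $\tilde{Z}_n$, of the next $n$ time periods is found by maximizing the density of $Z_n$, Eq. (17):

$$h(Z_n) = n\alpha\,e^{-n e^{-\alpha(Z_n-\nu)}}\,e^{-\alpha(Z_n-\nu)} = n\alpha\,\exp\left[-n\,e^{-\alpha(Z_n-\nu)} - \alpha(Z_n-\nu)\right] \tag{17}$$

Differentiating gives Eq. (18):

$$\frac{dh(Z_n)}{dZ_n} = n\alpha\left[-\alpha + n\alpha\,e^{-\alpha(Z_n-\nu)}\right]\exp\left[-n\,e^{-\alpha(Z_n-\nu)} - \alpha(Z_n-\nu)\right] \tag{18}$$

For the mode, $dh(Z_n)/dZ_n = 0$, therefore:

$$\tilde{Z}_n = \nu + \frac{\ln(n)}{\alpha} \tag{19}$$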

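In addition to these point estimates, one may consider an interval in which the largest concentration of the next $n$ time periods will lie with a given probability $p$. Solving this problem gives Eqs. (20) and (21):

$$\frac{1+p}{2} = \exp\left[-n\,e^{-\alpha(Z_1-\nu)}\right] \tag{20}$$

$$\frac{1-p}{2} = \exp\left[-n\,e^{-\alpha(Z_2-\nu)}\right] \tag{21}$$

Therefore:

$$Z_1 = \nu + \frac{-\ln\left[\dfrac{-\ln\left(\frac{1+p}{2}\right)}{n}\right]}{\alpha} \tag{22}$$

and:

$$Z_2 = \nu + \frac{-\ln\left[\dfrac{-\ln\left(\frac{1-p}{2}\right)}{n}\right]}{\alpha} \tag{23}$$

where $Z_1$ is the upper bound and $Z_2$ is the lower bound.

Similarly, for type II, with lower bound $\varepsilon$ and shape parameter $k$, Eqs. (24–27) are obtained:

$$\bar{Z}_n = \varepsilon + n^{1/k}(\nu-\varepsilon)\,\Gamma(1 - 1/k) \tag{24}$$

$$\tilde{Z}_n = \varepsilon + (\nu-\varepsilon)\left(\frac{nk}{k+1}\right)^{1/k} \tag{25}$$

$$Z_1 = \varepsilon + (\nu-\varepsilon)\left[-\frac{1}{n}\ln\left(\frac{1+p}{2}\right)\right]^{-1/k} \tag{26}$$

$$Z_2 = \varepsilon + (\nu-\varepsilon)\left[-\frac{1}{n}\ln\left(\frac{1-p}{2}\right)\right]^{-1/k} \tag{27}$$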

In addition, for type III, with upper bound $\omega$ and shape parameter $k$, Eqs. (28–31) are obtained:

$$\bar{Z}_n = E(Z_n) = \int_{-\infty}^{\omega} Z_n\,h(Z_n)\,dZ_n = \omega - (\omega-\nu)\,\Gamma\left(1+\frac{1}{k}\right)\left(\frac{1}{n}\right)^{1/k} \tag{28}$$

$$\tilde{Z}_n = \omega - (\omega-\nu)\left(\frac{1 - 1/k}{n}\right)^{1/k} \tag{29}$$

$$Z_1 = \omega - (\omega-\nu)\left[\frac{-\ln\left(\frac{1+p}{2}\right)}{n}\right]^{1/k} \tag{30}$$

$$Z_2 = \omega - (\omega-\nu)\left[\frac{-\ln\left(\frac{1-p}{2}\right)}{n}\right]^{1/k} \tag{31}$$
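For the type I case, the point and interval forecasts of Eqs. (16), (19), (22) and (23) can be sketched in Python as below. The function name and the default `p = 0.95` are assumptions made for illustration. The example call uses the rounded Alibeykoy SO2 parameters from Eqs. (34) and (35), for which Eq. (19) gives a mode of about 446.3 µg/m3 over 12 months. No attempt is made here to reproduce the other figures reported in Sect. 3.3.

```python
import math

EULER_CONSTANT = 0.577215  # value quoted in the text

def forecast_type1(alpha, nu, n, p=0.95):
    """Mean, mode and probability-p interval of the largest value over the
    next n periods for a fitted type I distribution."""
    mean = nu + (math.log(n) + EULER_CONSTANT) / alpha           # Eq. (16)
    mode = nu + math.log(n) / alpha                              # Eq. (19)
    upper = nu - math.log(-math.log((1 + p) / 2) / n) / alpha    # Eq. (22)
    lower = nu - math.log(-math.log((1 - p) / 2) / n) / alpha    # Eq. (23)
    return mean, mode, lower, upper

# Rounded parameters from Eqs. (34) and (35), 12-month horizon.
mean, mode, lower, upper = forecast_type1(0.0085679, 156.27891, 12)
```

Because the type I density is skewed to the right, the mean lies above the mode and the interval is wider above the mode than below it.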

3 Application of Extreme Value Statistics to Istanbul Air Quality Data

3.1 Data Description

Raw hourly average SO2 and NO2 concentration data were collected for Istanbul. The city municipality is divided into two zones, i.e., the Anatolian and European sites, where permanent air pollution monitoring stations are established. One permanent station was chosen from each zone for this study: the Alibeykoy monitoring station for the European site and the Umraniye station for the Anatolian site (Fig. 2). Alibeykoy station is located in an industrial area, and Umraniye station is situated in a residential area.

A principal assumption behind the theory of extreme value statistics is that of independence of the measurements of the initial random variable, e.g., hourly average SO2 concentration measurements. A series of successive hourly averages of SO2 will exhibit a correlation. Because of this correlation, a high value will usually be followed by another value that is higher than average. However, the degree of correlation between two measurements should decrease as the period between measurements increases. Because of this, the series of SO2 measurements shows a seasonal trend (Fig. 3). However, no yearly trend is evident. The NO2 series shows no seasonal trend (Fig. 4). All measurements are in micrograms per cubic meter (µg/m3).

Figure 2. Geographic location of monitoring stations.

Figure 3. Alibeykoy SO2 monthly maxima.

Figure 4. Alibeykoy NO2 monthly maxima.

Table 1. Asymptotic distributions of largest extremes.

| Type | Largest values | Conditions a) |
|---|---|---|
| I (Exponential) | $I(x) = \exp\left[-e^{-\alpha(x-\nu)}\right]$ | $\alpha > 0$, $-\infty < x < \infty$ |
| II (Cauchy) | $I(x) = \exp\left\{-\left[(\nu-\varepsilon)/(x-\varepsilon)\right]^k\right\}$ | $\nu > \varepsilon$, $x \ge \varepsilon$, $k > 0$ |
| III (Limited) | $I(x) = \exp\left\{-\left[(\omega-x)/(\omega-\nu)\right]^k\right\}$ | $\omega > \nu$, $\omega \ge x$, $k > 0$ |

a) $\alpha$ = scale parameter, $\nu$ = characteristic largest value, $\varepsilon$ = lower bound on $x$, $k$ = shape parameter, and $\omega$ = upper bound on $x$.

The extreme value series of monthly maxima ($X$) was formed as follows. There were 24 hourly concentration measurements each day and, assuming that a month is, on average, 30 days, there was an average of $n = 30 \times 24 = 720$ observations of $X$ per month for the model development. The observations of each month $N$ are arranged in decreasing order, so that $X_{N,(1)}$ is the maximum of month $N$:

$$\begin{array}{ll}
X_{1,(1)} > X_{1,(2)} > \cdots > X_{1,(720)} & N = 1 \\
X_{2,(1)} > X_{2,(2)} > \cdots > X_{2,(720)} & N = 2 \\
\qquad\vdots & \\
X_{M,(1)} > X_{M,(2)} > \cdots > X_{M,(720)} & N = M
\end{array} \tag{32}$$

where $M$ is the total number of months.

The collected data represent seven years, from January 1998 until March 2005, for Alibeykoy and six years, from January 1999 until March 2005, for Umraniye. The last three months of data (January–March 2005) are used to test the future prediction scheme.
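The construction of the monthly maxima series can be sketched as a simple block-maxima computation. This is an illustration only: it uses the fixed 720-hour block of the text rather than calendar months, drops any incomplete final block, and does not treat missing measurements. The function name and the synthetic input are hypothetical.

```python
def monthly_maxima(hourly, block=720):
    """Split an hourly series into consecutive blocks of `block` values
    and return the maximum of each complete block."""
    last_start = len(hourly) - block
    return [max(hourly[i:i + block]) for i in range(0, last_start + 1, block)]

# Synthetic series of 1500 hourly values: two complete 720-hour blocks.
series = list(range(1500))
maxima = monthly_maxima(series)
```

With real data one would more likely group by calendar month, so the block lengths would vary between 672 and 744 hours.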

3.2 Analysis

The monthly maxima of 1 h SO2 concentrations for Alibeykoy are plotted in Fig. 5 on an extreme probability graph. It can be seen that the monthly maxima fit the asymptotic theory of the first type of the largest values quite well. Plotting the theoretical curve for the first type is simple. Using the reduced form suggested by Gumbel, Eq. (33):

$$Y = \alpha(x - \nu) \tag{33}$$

the estimates of $\alpha$ and $\nu$, obtained by the method explained in Sect. 2.2, are given by Eqs. (34) and (35):

$$\hat{\alpha} = \frac{S_n}{S_x} = \frac{1.19382}{139.336} = 0.0085679 \tag{34}$$

$$\hat{\nu} = \bar{X} - \frac{\bar{Y}_n}{\hat{\alpha}} = 221.275 - \frac{0.55688}{0.0085679} = 156.27891 \tag{35}$$

Taking the two extreme values of $x$, say 28 (smallest observed monthly maximum value) and 705 (largest observed monthly maximum value), one obtains:

$Y_{28} = 0.0085679\,(28 - 156.27891) = -1.10$ and

$Y_{705} = 0.0085679\,(705 - 156.27891) = 4.70$

The straight line connecting these two points, with coordinates (–1.10, 28) and (4.70, 705), respectively, on the extreme probability paper is then the best fit to the observed values of the SO2 series. It should be noted that $Y$ is the reduced variable. The analytical form of the theoretical line for the monthly maximum is given by Eq. (36):

$$I(x) = \exp\left[-e^{-0.0085679\,(x - 156.27891)}\right] \tag{36}$$

Once again, the actual observations for the SO2 series appear to follow this distribution with small random deviations.

Figure 5. Largest 1 h SO2 concentration for Alibeykoy.

Figure 6. Largest 1 h NO2 concentration for Alibeykoy.

Figure 7. Largest 1 h SO2 concentration for Umraniye.

The EN Air Quality standard hourly limit value for the 1 h average SO2 concentration is 350 µg/m3, not to be exceeded more than 24 times in a calendar year. According to the fitted first type asymptotic distribution, the probability that the largest 1 h SO2 concentration in a month would exceed 350 µg/m3 is 1 – 0.826 = 0.174. This corresponds to a return period of 5.77 months, or twice a year, for the occurrence of a 1 h SO2 concentration exceeding the limit value, Eq. (37):

$$T(x) = \frac{1}{1 - I(x)} = \frac{1}{1 - 0.826} = 5.77 \tag{37}$$
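The numbers in Eq. (37) can be reproduced from the fitted distribution of Eq. (36). The sketch below evaluates the type I function at the 350 µg/m3 limit using the rounded parameters of Eqs. (34) and (35). The variable and function names are illustrative.

```python
import math

ALPHA = 0.0085679   # scale parameter, Eq. (34)
NU = 156.27891      # characteristic largest value, Eq. (35)

def gumbel_cdf(x, alpha=ALPHA, nu=NU):
    """Fitted type I distribution of the monthly maximum, Eq. (36)."""
    return math.exp(-math.exp(-alpha * (x - nu)))

p_not_exceeded = gumbel_cdf(350.0)                  # about 0.827
return_period_months = 1.0 / (1.0 - p_not_exceeded)  # about 5.77 months
exceedances_per_year = 12.0 / return_period_months   # about 2 per year
```

Carrying the unrounded probability through the calculation gives the 5.77 months quoted in the text, whereas the rounded value 0.826 alone would give about 5.75.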

Figure 6 shows the monthly maxima of 1 h NO2 concentrations for Alibeykoy, and Figs. 7 and 8 show the monthly maxima of 1 h SO2 and NO2 concentrations for Umraniye, respectively, plotted on extreme probability graphs. The EN Air Quality standard hourly limit value for the 1 h average NO2 concentration is 200 µg/m3, not to be exceeded more than 18 times in a calendar year. According to the fitted first type asymptotic distribution, the probability that the largest 1 h NO2 concentration would exceed 200 µg/m3 is 1 – 0.621 = 0.379. This corresponds to a return period of 2.6 months, or five times a year, for the occurrence of a 1 h NO2 concentration exceeding the limit value.

The Umraniye 1 h SO2 and NO2 concentration data are better fitted by the type II distribution (Figs. 7 and 8). The probability that the largest 1 h SO2 concentration would exceed the limit value of 350 µg/m3 is 1 – 0.439 = 0.561. This corresponds to a return period of 1.78 months. The probability that the largest 1 h NO2 concentration would exceed the limit value is 1 – 0.82 = 0.18. This corresponds to a return period of 5.6 months. Finally, the best-fit asymptotic distributions for the Istanbul air quality data and the related parameters are given in Tab. 2.

3.3 Forecasting

Using the equations in Sect. 2.4, the largest SO2 and NO2 concentrations for the 12 months following the monitoring period are forecast. The results are shown in Figs. 9 – 12.

Figure 9 shows that, over 12 months, the interval in which the maximum SO2 concentrations will lie contains the mean ($\bar{Z}_n$) and mode ($\tilde{Z}_n$) values. Under present conditions, the expected maximum SO2 concentration for the following 12 months would be 593.7 µg/m3 and the most probable concentration would be 446.3 µg/m3 for Alibeykoy.


Figure 8. Largest 1 h NO2 concentration for Umraniye.

Figure 9. Forecast for Alibeykoy SO2.



Similarly, under present conditions, the expected maximum NO2 concentration in Alibeykoy would be 393.4 µg/m3 with a 95% probability over the next 12 months. The most probable concentration is 345.2 µg/m3. Figure 10 shows that, over the 12 months, the interval in which the maximum NO2 concentrations will lie contains the mean ($\bar{Z}_n$) and mode ($\tilde{Z}_n$) values.
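The Alibeykoy NO2 forecast can be illustrated with a short sketch. It does not reproduce the equations of Sect. 2.4. Instead it assumes the standard result that the largest of n independent type I (Gumbel) maxima is again type I, with the location shifted by ln(n)/a, so that the mode is m + ln(n)/a and the mean is the mode plus γ/a (γ being the Euler–Mascheroni constant). The function name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def type1_max_forecast(a, m, n):
    """Mode and mean of the largest of n independent type I (Gumbel) maxima.

    Assumes the CDF exp(-exp(-a * (x - m))) for a single monthly maximum.
    """
    mode = m + math.log(n) / a      # most probable largest value
    mean = mode + EULER_GAMMA / a   # expected largest value
    return mode, mean

# Alibeykoy NO2 parameters from Tab. 2, forecast horizon of 12 months.
no2_mode, no2_mean = type1_max_forecast(a=0.0119936, m=138.0612, n=12)
print(f"most probable maximum: {no2_mode:.1f} ug/m3")
print(f"expected maximum:      {no2_mean:.1f} ug/m3")
```

Under these assumptions the sketch agrees, to rounding, with the most probable and expected NO2 values quoted above for Alibeykoy.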

Figure 11 shows the interval in which the maximum SO2 concentrations will lie over the following 12 months. The expected maximum SO2 concentration would be 514.5 µg/m3 for Umraniye with a 95% probability over the next 12 months. The most probable concentration is 361.0 µg/m3. Figure 12 shows the interval in which the maximum NO2 concentrations will lie over the 12 months. The expected maximum NO2 concentration would be 437.6 µg/m3 for Umraniye with a 95% probability over the next 12 months. The most probable concentration is 345.2 µg/m3.

4 Conclusions

In this study, it has been shown that extreme value statistics can be used to analyze air quality data. Three types of asymptotic distributions were shown to be the most appropriate to represent the maximum concentrations of air quality measured in a given time interval. The basic principles of extreme value statistics theory were reviewed. Following this, the theory was applied to the analysis of air pollution data collected from two measuring stations in Istanbul. The return periods and exceedances of critical SO2 and NO2 concentrations, e.g., EN standards, were estimated by type I and type II distributions. A forecasting procedure was then successfully applied to estimate the future peak concentrations. Although only SO2 and NO2 data from two stations were analyzed, the method can be extended to other stations as well as other pollutants.

References

[1] E. M. Roberts, Review of statistics of extreme values with applications to air quality data, part I, Review, J. Air Pollut. Control Assoc. 1979a, 29, 632–637.

[2] E. M. Roberts, Review of statistics of extreme values with applications to air quality data, part II, Applications, J. Air Pollut. Control Assoc. 1979b, 29, 733–740.

[3] R. I. Larsen, An air quality data analysis system for interrelating effects, standards and needed source reductions, J. Air Pollut. Control Assoc. 1973, 23, 933–940.

[4] J. Horowitz, S. Barakat, Statistical analysis of the maximum concentration of an air pollutant: effects of autocorrelation and non-stationary, Atmos. Environ. 1979, 13, 811–818.

[5] P. G. Surman, J. Bodero, R. W. Simpson, The prediction of the numbers of violations of standards and the frequency of air pollution episodes using extreme value theory, Atmos. Environ. 1987, 21, 1843–1848.

[6] P. Sharma, M. Khare, S. P. Chakrabarti, Application of Extreme value theory for predicting violations of air quality standards for an urban road intersection, Transport. Res., Part D 1999, 4, 201–216.

[7] S. G. Ercelebi, C. Kirmanli, Statistical prediction of high methane concentration in underground coal mines, Inst. Mining Metall. Eng. Trans. 1999, 108, A132–135.

[8] H. C. Lu, G. C. Fang, Predicting the exceedences of a critical PM10 concentration, a case study in Taiwan, Atmos. Environ. 2003, 37, 3491–3499.


Figure 10. Forecast for Alibeykoy NO2.

Figure 11. Forecast for Umraniye SO2.

Figure 12. Forecast for Umraniye NO2.

Table 2. Best fit asymptotic distributions.

| Data | Best fit asymptotic distribution | Parameters estimated |
| --- | --- | --- |
| Alibeykoy-SO2 | I | a = 0.0085679, m = 156.27891 |
| Alibeykoy-NO2 | I | a = 0.0119936, m = 138.0612 |
| Umraniye-SO2 | II | e = -151.579, m = 89.54, k = 0.2658 |
| Umraniye-NO2 | II | e = -20.193, m = 107.3679, k = 0.3712 |



[9] H. D. Kan, B. H. Chen, Statistical distributions of ambient air pollutants in Shanghai, Chin. Biomed. Environ. Sci. 2004, 17 (3), 366–372.

[10] A. Hurairah, N. A. Ibrahim, I. B. Daud, K. Haron, An application of a new extreme value distribution to air pollution data, Manage. Environ. Qual.: Int. J. 2005, 16 (1), 17–25.

[11] A. Sfetsos, S. Zoras, J. G. Bartzis, A. G. Triantafyllou, Extreme value modeling of daily PM10 concentrations in an industrial area, Fresenius Environ. Bull. 2006, 15 (8), 841–845.

[12] J. A. Achcar, A. A. Fernandez Bremauntz, E. R. Rodrigues y Guadalupe Tzintzun, Estimating the number of ozone peaks in Mexico City using a non-homogeneous Poisson model, Environmetrics 2007, 19 (5), 469–485.

[13] E. J. Gumbel, Statistical Theory of Extreme Values and Some Practical Examples, Applied Mathematics Series 33, National Bureau of Standards, Gaithersburg, MD 1954.

[14] E. J. Gumbel, Statistics of Extremes, Columbia University Press, New York 1958.

[15] T. M. Yegulalp, J. T. Kuo, Statistical prediction of the occurrence of maximum magnitude earthquakes, Bull. Seismol. Soc. Am. 1974, 64, 393–414.

[16] T. M. Yegulalp, Forecasting for largest earthquakes, Manage. Sci. 1974, 21, 418–421.
