distribution analysis - finding the best distribution that explains … group... · 2016. 3....

Distribution Analysis: Finding the best distribution that explains your data. Sébastien Casault, ENMAX Energy Corporation. 8 October, 2015.

  • Distribution Analysis, Real data, Summary, References

    Distribution Analysis: Finding the best distribution that explains your data

    Sébastien Casault

    ENMAX Energy Corporation

    8 October, 2015

    Sébastien Casault Distribution Analysis

  • Introduction

    We often fit observations to a model (e.g., a lognormal distribution). How can we ensure that the model is appropriate? Is there a model that would provide more accurate predictions?

    Goodness of fit

    "Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question." (Wikipedia)

  • Kolmogorov-Smirnov

    The K-S statistic, $D_n$, is defined as:

    $$D_n = \sup_x \left| F_n(x) - F(x) \right|$$

    where $F$ is the hypothesized cumulative distribution function and $F_n$ is the empirical (sample) cumulative distribution function built from $n$ observations.
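The definition can be made concrete with a short Python sketch. It is an illustration only, not the SAS procedure used later in the talk: the function names and the sample values are hypothetical, and the hypothesized distribution is assumed to be continuous and fully specified. Because the empirical CDF is a step function, the supremum only needs to be checked just before and at each sorted observation.

```python
import math

def ks_statistic(sample, cdf):
    """Return D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # F_n jumps from i/n to (i+1)/n at the i-th sorted value (0-based),
        # so the largest gap occurs just before or at one of these jumps.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d

def exponential_cdf(rate):
    """CDF of an exponential distribution with the given rate (illustrative)."""
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0

# Hypothetical observations, not data from the talk.
observations = [0.2, 0.5, 0.9, 1.4, 2.3]
d_n = ks_statistic(observations, exponential_cdf(1.0))
print(f"D_n = {d_n:.4f}")
```

A value of $D_n$ close to 0 means the empirical and hypothesized CDFs nearly coincide; values close to 1 mean they disagree strongly somewhere on the axis.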

  • Anderson-Darling

    There are many goodness-of-fit tests; most are variations of the K-S test.

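    For example, the A-D statistic, $A^2$, is defined as:

    $$A^2 = n \int_{-\infty}^{\infty} \frac{\left( F_n(x) - F(x) \right)^2}{F(x) \left( 1 - F(x) \right)} \, dF(x)$$

    It is a weighted integral of the squared difference between the hypothesized distribution and the sample one. The weight $1 / [F(x)(1 - F(x))]$ grows as $F(x)$ approaches 0 or 1, which places more weight on observations in the tails.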
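In practice the A-D statistic (often written $A^2$) is usually evaluated through an equivalent finite sum over the sorted observations rather than the integral itself. The Python sketch below uses that standard computational form, $A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right) \right]$. The function name and the sample are hypothetical, and the hypothesized CDF is assumed to be continuous, fully specified, and strictly between 0 and 1 at every observation.

```python
import math

def anderson_darling(sample, cdf):
    """Return the A-D statistic A^2 for a fully specified continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf(xs[i - 1])   # F at the i-th smallest observation
        f_high = cdf(xs[n - i])  # F at the i-th largest observation
        # The logarithms blow up as F approaches 0 or 1, which is how the
        # tail weighting of the integral shows up in the finite sum.
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n

# Hypothetical observations compared with a uniform(0, 1) distribution.
sample = [0.1, 0.3, 0.5, 0.7, 0.9]
a_squared = anderson_darling(sample, lambda u: u)
print(f"A^2 = {a_squared:.4f}")
```

An observation at which the hypothesized CDF is exactly 0 or 1 would make the logarithm undefined; this sketch does not guard against that case.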

  • P value statistics

    The P value is the answer to this question:

    If the two samples were randomly sampled from identical populations, what is the probability that the two cumulative frequency distributions would be as far apart as observed? More precisely, what is the chance that the value of the test statistic would be as large or larger than observed?

    If the P value is small, conclude that the two groups were sampled from populations with different distributions. The populations may differ in median, variability or the shape of the distribution.
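The question "how often would the statistic be as large or larger than observed?" can be answered by simulation. The Python sketch below adapts it to the one-sample case, where a sample is compared with a hypothesized distribution rather than with a second sample. It repeatedly draws samples of the same size from the hypothesized distribution and counts how often their K-S statistic reaches the observed one. All names, the exponential model, and the data are hypothetical, and the approach assumes the hypothesized distribution is fully specified in advance; if its parameters were estimated from the same data, each simulated sample would need to be refitted as well.

```python
import math
import random

def ks_statistic(sample, cdf):
    """Return sup_x |F_n(x) - F(x)| for a continuous hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

def monte_carlo_p_value(sample, cdf, draw, n_sims=2000, seed=0):
    """Estimate P(D >= observed D) when the data really follow `cdf`."""
    rng = random.Random(seed)
    observed = ks_statistic(sample, cdf)
    n = len(sample)
    exceed = 0
    for _ in range(n_sims):
        simulated = [draw(rng) for _ in range(n)]
        if ks_statistic(simulated, cdf) >= observed:
            exceed += 1
    # The add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sims + 1)

def exp_cdf(x):
    """Hypothesized model: exponential distribution with rate 1."""
    return 1.0 - math.exp(-x)

def exp_draw(rng):
    """Draw one value from the hypothesized exponential distribution."""
    return rng.expovariate(1.0)

# Hypothetical observations clustered near 1, unlike an exponential sample.
clustered = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.98, 1.02, 1.01]
p_value = monte_carlo_p_value(clustered, exp_cdf, exp_draw)
print(f"estimated P value = {p_value:.4f}")
```

A small estimate here plays the same role as the small P value described above: the sample is unlikely to have come from the hypothesized distribution.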

  • Electricity prices in Alberta

    Electricity is one of the most volatile commodities traded in wholesale markets.

  • Sources of volatility

    An increasing portion of the supply portfolio is stochastic:

      • Alberta has an installed wind capacity of 8.3%
      • Coal-fired power plants undergo forced outages

  • Case study - Sundance 2

    TransCanada’s Sundance A and B Power Purchase Arrangements entitle TransCanada to more than 900 megawatts (MW) of capacity from the Sundance Power Plant. TransCanada sells this electricity under long-term contracts and into the spot market. The Sundance Power Plant has a total of six generating units and is owned and operated by TransAlta.

    Sundance A & B Power Purchase Agreement

    Power Purchase Arrangement Highlights

    Sundance A PPA: 100 per cent of the output from Units 1 & 2 = 560 MW. Term expires in 2017.
    Sundance B PPA: 50 per cent of the output from Units 3 & 4 = 353 MW. Term expires in 2020.
    Location: The plant is located 70 kilometres (about 45 miles) west of Edmonton, Alberta on the south shore of Lake Wabamun.
    In-Service Date: Unit 1 - 1970; Unit 2 - 1973; Unit 3 - 1976; Unit 4 - 1977.
    Capacity: 2,029 MW.
    Fuel: Coal from TransAlta’s Highvale mine.
    Environmental Features: Meets ISO 14001 standards; regulated by Alberta Environment and the Alberta Electric Utilities Board.
    Owner: TransAlta Utilities Corporation.
    Operator: TransAlta Utilities Corporation.

  • Outage statistics

    The Sundance 2 unit has undergone several forced outages in 2015, often coinciding with wholesale market price spikes.

  • Distribution fitting

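    /* Fit four candidate distributions to the variable ON and overlay them on its histogram */
    PROC UNIVARIATE DATA = WORK.ON ;
        VAR ON ;
        HISTOGRAM ON / NORMAL LOGNORMAL EXP WEIBULL ;
        /* Compare the empirical CDF with the fitted Weibull CDF */
        CDFPLOT ON / WEIBULL ;
    RUN ;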
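For readers without SAS, the same idea (fit several candidate distributions, then compare how far each fitted CDF sits from the empirical one) can be sketched in plain Python. This is an analogue under stated assumptions, not a reproduction of what PROC UNIVARIATE computes: every function name is hypothetical, the durations are made-up values rather than Sundance 2 data, parameters are estimated by maximum likelihood with a zero threshold, and the Weibull shape is found by a simple bisection that assumes the data are positive and not all equal.

```python
import math

def ks_distance(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and a fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

def fit_normal(sample):
    mu = sum(sample) / len(sample)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / len(sample))
    return lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def fit_lognormal(sample):
    # A lognormal fit is a normal fit on the logarithms of the data.
    log_cdf = fit_normal([math.log(x) for x in sample])
    return lambda x: log_cdf(math.log(x))

def fit_exponential(sample):
    rate = len(sample) / sum(sample)  # maximum likelihood: 1 / sample mean
    return lambda x: 1.0 - math.exp(-rate * x)

def weibull_mle(sample, lo=0.01, hi=50.0, iterations=100):
    """Return (shape, scale) by bisection on the likelihood equation for the shape."""
    m = max(sample)
    ys = [x / m for x in sample]  # rescale so powers cannot overflow
    mean_log = sum(math.log(y) for y in ys) / len(ys)

    def score(k):
        w = [y ** k for y in ys]
        return sum(wi * math.log(y) for wi, y in zip(w, ys)) / sum(w) - 1.0 / k - mean_log

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = m * (sum(y ** shape for y in ys) / len(ys)) ** (1.0 / shape)
    return shape, scale

def weibull_cdf(shape, scale):
    return lambda x: 1.0 - math.exp(-((x / scale) ** shape))

def fit_weibull(sample):
    return weibull_cdf(*weibull_mle(sample))

# Hypothetical durations in hours, not data from the talk.
durations = [12.0, 30.5, 45.0, 61.2, 80.0, 95.5, 130.0, 170.3, 220.0, 310.8]
fits = {"normal": fit_normal, "lognormal": fit_lognormal,
        "exponential": fit_exponential, "Weibull": fit_weibull}
for name, fit in fits.items():
    print(f"{name:12s} D = {ks_distance(durations, fit(durations)):.3f}")
```

A smaller distance indicates a closer fit, but because each distribution's parameters are estimated from the same data, these distances are best read as a ranking aid rather than as formal test results.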

  • Distribution statistics and goodness of fit

    Both fits provide an accurate description of the observed data. There may be a more theoretical reason to choose the Weibull distribution.

  • Summary

      • Finding the right predictive model is important
      • There are several tests that can quantify how well a certain model fits empirical data
      • Using these tests, we can obtain GOF statistics
      • Build more reliable models using the right fit

  • Bibliography

    1 Base SAS(R) 9.2 Procedures Guide: Statistical Procedures. UNIVARIATE Procedure, Goodness-of-Fit Tests.

    2 Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009). Power-Law Distributions in Empirical Data. SIAM Review, 51, 661-703.

    3 Hagiwara, Y. (1974). Probability of earthquake occurrence as obtained from a Weibull distribution analysis of crustal strain. Tectonophysics, 23, 313-318.


    Distribution Analysis: Introduction, Statistical tests, Goodness of fit

    Real data: Context, Coal plant outages, Distribution fitting

    Summary

    References