Univariate Random Variables
Geog 210C
Introduction to Spatial Data Analysis
Chris Funk
Lecture 3
Monte Carlo Simulation-I
General approach based on repeated random sampling of a distribution
Originally devised by physicists at Los Alamos National Laboratory in the 1940s
Enrico Fermi, Stanislaw Ulam, and John von Neumann
Needed to simulate neutron path penetration depth in order to design appropriate radiation shielding
Widely used in many disciplines
Monte Carlo Simulation-Recipe
Define a space of potential inputs
Typically a theoretical distribution
Draw samples from the potential inputs
Run simulation, algorithm, estimation procedure, etc.
Analyze the large set of potential solutions
Bootstrapping
Term inspired by Rudolf Erich Raspe's The Surprising Adventures of Baron Munchausen
The Baron, trapped at the bottom of the ocean, pulls himself up by the bootstraps
Used by Bradley Efron in 1979 to refer to a technique for
Estimating the uncertainty of statistical parameters
Based on resampling (with replacement) from the observed set of data (as opposed to a theoretical distribution)
C. Funk Geog 210C Spring 2011
Bootstrapping versus Monte Carlo Simulation
Bootstrapping draws from the observational data set
Monte Carlo Simulation draws from the cumulative distribution function (CDF)
Empirical CDF or
Theoretical CDF
Random Variables: Some Definitions
Random variable (RV):
a variable, say X, with a series of possible outcomes (realizations), i.e., x-values
e.g., total number of members in the households of a city
Probability distribution:
a table, graph, or mathematical function that links potential outcomes of a RV with probabilities of their occurrence
e.g., probability that a household selected at random has x members
Discrete and continuous RVs:
discrete: RVs taking particular values (finite or countably infinite), e.g., counting variables such as population, number of accidents on a road, number of floods or earthquakes in a region
continuous: RVs taking any value within an interval (uncountably many values), e.g., height, temperature, speed, distance
Bootstrapping Example-WRSI in Southern Africa
Context: in October of 2002, El Nino conditions quickly evolved
Statistical Question: What was the likely impact of El Nino on Southern African crop production?
Science Question: How do El Nino teleconnections interact with rainfall and crop water requirements in Southern Africa?
Method:
Use a long time-series of rainfall to drive a gridded water requirement satisfaction index (WRSI) model
Use bootstrapping to assess which areas had below normal WRSI values during El Nino events
Crop Phenology
Crop Water Requirements
Technique (run at each pixel)
1. Translate data into anomalies: WRSI' = WRSI - WRSIavg, where WRSIavg is the long term average WRSI
2. Assume 'n' El Nino years
3. Calculate 1,000 bootstrap samples of 'n'-year average WRSI anomalies
4. Sort all 1,000 average anomalies and identify the 5th percentile value
5. If the average El Nino anomaly is less than this value, it is significant at the 95% level
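The five steps can be sketched for a single pixel as follows. This is a minimal illustration, not the code used in the lecture: the WRSI series and the El Nino year indices are made-up values, the function name `bootstrap_el_nino_test` is hypothetical, and the resampling draws years with replacement as in the bootstrap definition given earlier.

```python
import random
import statistics


def bootstrap_el_nino_test(wrsi, el_nino_idx, n_boot=1000, pct=5.0, seed=0):
    """Bootstrap test at one pixel: is the mean El Nino WRSI anomaly unusually low?"""
    rng = random.Random(seed)
    wrsi_avg = statistics.fmean(wrsi)
    # Step 1: anomalies, WRSI' = WRSI - WRSIavg
    anomalies = [w - wrsi_avg for w in wrsi]
    # Step 2: n El Nino years
    n = len(el_nino_idx)
    # Step 3: n_boot averages of n anomalies, resampled with replacement
    boot_means = [
        statistics.fmean(rng.choices(anomalies, k=n)) for _ in range(n_boot)
    ]
    # Step 4: sort the averages and read off the 5th percentile value
    boot_means.sort()
    threshold = boot_means[int(n_boot * pct / 100.0)]
    # Step 5: compare the observed El Nino average anomaly with that value
    observed = statistics.fmean([anomalies[i] for i in el_nino_idx])
    return observed, threshold, observed < threshold


# Hypothetical 20-year WRSI series for one pixel (illustrative values only)
wrsi = [95, 92, 60, 97, 88, 94, 58, 96, 91, 93,
        99, 62, 90, 95, 89, 97, 92, 94, 96, 90]
el_nino_idx = [2, 6, 11]  # hypothetical positions of the El Nino years

observed, threshold, significant = bootstrap_el_nino_test(wrsi, el_nino_idx)
print(observed, threshold, significant)
```

Running the same function at every pixel of the grid would give a map of where El Nino anomalies are significantly below normal.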
Weak and moderate-to-strong El Nino WRSI anomalies
Weak El Nino anomalies Weak-to-Strong Anomalies
Bootstrapping used to assess significance at the 95% level
El Nino Rainfall Anomalies
The interocular acuity test?
Forecast Interpretation using Monte Carlo Simulation
Fit distribution
Draw sample from distribution contingent on probabilistic forecast
Refit distribution to resampled data
Estimate values of interest:
Change in mean
Probability of exceedance
10th and 90th percentile rainfall
FIT Process
Figure panels: Historical data (probability density against quantile, 0 to 2000); Probability Map; Map of Rainfall
Fit Process
Guatemala climatology: mean 955 mm, std. dev. 257 mm
Climatological tercile probabilities: 33% / 34% / 33%
Forecast tercile probabilities: 25% / 35% / 40%
60 samples, allocated by forecast probability: 15 / 21 / 24
Figure panels: climatological density and FIT (interpreted forecast) density, each plotted against quantile (0 to 2000)
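A rough sketch of the fit, resample, refit idea using the Guatemala numbers. Several things here are assumptions rather than facts from the slides: the climatology is taken to be normal, the three forecast probabilities are read as below/near/above-normal terciles in that order, and the function name `fit_forecast` is hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev


def fit_forecast(mean, std, tercile_probs, n_samples=60, seed=0):
    """Resample a fitted climatology according to forecast tercile
    probabilities, then refit a distribution to the resampled values."""
    rng = random.Random(seed)
    clim = NormalDist(mean, std)      # assumed form of the climatological fit
    # Number of samples per tercile, e.g. 25/35/40% of 60 -> 15/21/24
    counts = [round(p * n_samples) for p in tercile_probs]
    edges = [0.0, 0.33, 0.67, 1.0]    # 33% / 34% / 33% classes, as on the slide
    samples = []
    for k, count in enumerate(counts):
        for _ in range(count):
            # Uniform probability inside tercile k, mapped through the inverse CDF
            p = rng.uniform(edges[k], edges[k + 1])
            p = min(max(p, 1e-9), 1.0 - 1e-9)   # inv_cdf requires 0 < p < 1
            samples.append(clim.inv_cdf(p))
    # Refit to the resampled data
    return counts, NormalDist(fmean(samples), stdev(samples))


counts, forecast = fit_forecast(955.0, 257.0, [0.25, 0.35, 0.40])
print(counts)                                     # [15, 21, 24]
print(forecast.mean - 955.0)                      # change in mean
print(1.0 - forecast.cdf(955.0))                  # probability of exceeding the climatological mean
print(forecast.inv_cdf(0.10), forecast.inv_cdf(0.90))  # 10th and 90th percentile rainfall
```

With only 60 samples the refitted parameters vary from seed to seed; a larger `n_samples` gives a more stable estimate of the shift.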
Original and Forecasted Distributions for Catacamas
Climatic distribution: mean of 621, std. dev. of 118
Forecast of 20/35/45 for low/mid/high
FIT distribution: mean of 650, std. dev. of 105
Figure: Effects of Climate Forecast on Rainfall Distribution (x-axis: Rainfall, mm, 200 to 1000; y-axis: Probability, x 10^-3; legend: Original Mean, Forecasted mean)
Fit Example (March 2011)
March-May 2011 Seasonal Performance Forecasts
Greater Horn of Africa Consensus Climate Outlook for March to May 2011
Source: ICPAC
Rainfall outlooks for March-May 2011
Source: KMA
MAM 2011 Seasonal forecast: Probability of ML category of precipitation
Source: ECMWF
Source: Ethiopian NMA
Probability and Cumulative Mass Functions
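Probability distribution function (PDF):
tabulation of occurrence probabilities for outcomes of a discrete RV:
$f_X(x_i) = \mathrm{Prob}\{X = x_i\}$, with $0 \le f_X(x_i) \le 1$ and $\sum_{i} f_X(x_i) = 1$
e.g., PDF of household members
Cumulative distribution function (CDF):
cumulative form of PDF:
$F_X(x) = \mathrm{Prob}\{X \le x\} = \sum_{x_i \le x} \mathrm{Prob}\{X = x_i\}$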
Expected Value of a Discrete RV
Expected value:
mean E{X} of RV X using N outcomes {x1, . . . , xN}:
$E\{X\} = \mu = \sum_{i=1}^{N} x_i\, \mathrm{Prob}\{X = x_i\}$
probability weighted sum of N outcomes
NOTE: the expected value could be an impossible outcome, e.g., the mean of an integer-valued discrete RV need not be an integer
Example: household data:
expected value (mean):
note that the mean value 3.75 is not an integer
Expectation of a Linear Combination of a RV
expectation E{Y} of a RV Y defined as function y = h(x) of RV X:
$E\{Y\} = E\{h(X)\} = \sum_{i=1}^{N} h(x_i)\, \mathrm{Prob}\{X = x_i\}$
Special cases:
expectation of a constant RV X = c:
$E\{c\} = c$
expectation of a RV Y = cX, i.e., product of a RV X with a constant:
$E\{cX\} = c\, E\{X\}$
expectation of a RV Y = a + bX, i.e., a linear combination of a RV X:
$E\{a + bX\} = a + b\, E\{X\}$
expectation of a linear combination of a RV X = linear combination of its expectation E{X}
=> expectation is a linear operation (same as summation)
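The linearity result follows in three lines from the definition of the expectation as a probability weighted sum. The derivation below writes the outcome probabilities as $\mathrm{Prob}\{X = x_i\}$ and uses only the fact that they sum to one:

```latex
\begin{aligned}
E\{a + bX\} &= \sum_{i=1}^{N} (a + b\,x_i)\, \mathrm{Prob}\{X = x_i\} \\
            &= a \sum_{i=1}^{N} \mathrm{Prob}\{X = x_i\}
             + b \sum_{i=1}^{N} x_i\, \mathrm{Prob}\{X = x_i\} \\
            &= a + b\, E\{X\}
\end{aligned}
```

Setting b = 0 gives the constant case, and setting a = 0 gives the product with a constant.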
Variance of a Discrete RV
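expected value of squared deviations from mean μ = E{X}, or, expectation of a function h(x) = (x − μ)^2 of a RV X:
$V\{X\} = E\{(X - \mu)^2\} = \sum_{i=1}^{N} (x_i - \mu)^2\, \mathrm{Prob}\{X = x_i\}$
computational formula: $V\{X\} = E\{(X - \mu)^2\} = E\{X^2\} - \mu^2$
Household data example: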
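The household table from the slide did not survive in this transcript, so the sketch below uses a made-up distribution of household sizes (it is not the lecture's data, and its mean is not the 3.75 quoted above). It computes the mean and variance as probability weighted sums and checks the computational formula.

```python
def expected_value(outcomes, probs):
    """E{X}: probability weighted sum of the outcomes."""
    return sum(x * p for x, p in zip(outcomes, probs))


def variance(outcomes, probs):
    """V{X}: probability weighted sum of squared deviations from the mean."""
    mu = expected_value(outcomes, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(outcomes, probs))


# Hypothetical household-size distribution (illustrative values only)
sizes = [1, 2, 3, 4, 5, 6]
probs = [0.10, 0.20, 0.25, 0.25, 0.15, 0.05]

mu = expected_value(sizes, probs)
var = variance(sizes, probs)
# Computational formula: V{X} = E{X^2} - mu^2
var_short = expected_value([x ** 2 for x in sizes], probs) - mu ** 2

print(mu, var, var_short)
```

As in the lecture example, the mean of this integer-valued RV is not itself an integer.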
Binomial Distribution (1)
General remarks:
situations where, on each of N > 1 experiments (trials), one or the other of two mutually exclusive and collectively exhaustive (MECE) events will occur: (i) success (coded as 1), or (ii) failure (coded as 0)
RV of interest X = number of event occurrences (sum of zeros and ones) over N trials; X can take N + 1 integer outcomes: from 0 (event absent in all trials) to N (event present in all trials)
binomial distribution used to calculate the probabilities that X attains any of the N + 1 possible outcomes
Conditions of applicability:
1. probability of event occurrence does not change from trial to trial, e.g., in a coin tossing experiment, if the coin is "fair", the probability of either heads or tails is 0.5, and does not change from one coin flip to another (or from one coin to another)
2. outcomes of each of N trials are mutually independent, e.g., if N > 1 coins are flipped simultaneously, outcomes of one coin do not affect outcomes of other coins
Binomial Distribution (2)
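Binomial PDF:
$\mathrm{Prob}\{X = x\} = \binom{N}{x}\, \pi^{x} (1 - \pi)^{N - x}, \quad x = 0, 1, \ldots, N$
(1) combinatorial part: $\binom{N}{x} = \frac{N!}{x!\,(N - x)!}$ (N choose x)
= number of distinct ways of getting x success outcomes (x ones) from a collection of N trials
(2) probability part: $\pi^{x} (1 - \pi)^{N - x}$
= probability of any one particular sequence of x successes and N − x failures, where π is the probability of success in a single trial
Fitting a Binomial PDF to data:
estimate parameters N and π from sample data
find probability for any particular number of successes x, for given π and for required N
special case: N = 1 corresponds to the Bernoulli distribution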
Binomial Distribution (3)
Binomial PDF:
$\mathrm{Prob}\{X = x\} = \binom{N}{x}\, \pi^{x} (1 - \pi)^{N - x}$
Example:
years in which a lake has frozen in a 200 year record:
whether the lake freezes in a particular year (success) depends only on the conditions of that year, not on those of previous years
under the assumption of no climate change, the probability π that the lake will freeze in any year is constant over the record: π = 10/200 = 0.05
Requisites:
probability that lake freezes exactly once (x = 1) in N = 10 years:
$\mathrm{Prob}\{X = 1\} = \binom{10}{1}\, 0.05^{1}\, 0.95^{9} \approx 0.315$
probability that lake freezes at least once (x ≥ 1) in N = 10 years:
$\mathrm{Prob}\{X \ge 1\} = \sum_{x=1}^{10} \mathrm{Prob}\{X = x\} = 1 - \mathrm{Prob}\{X = 0\} = 1 - 0.95^{10} \approx 0.401$
the lake cannot freeze exactly once and exactly twice in the same 10 years: the events are mutually exclusive, so their probabilities add
for a Binomial RV X: E{X} = Nπ, and V{X} = Nπ(1 − π)
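The two lake probabilities can be checked numerically. The sketch below uses only the values stated on the slide (π = 10/200, N = 10); the helper name `binomial_pmf` is just an illustrative choice.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi ** x * (1.0 - pi) ** (n - x)


pi = 10 / 200   # probability that the lake freezes in any one year
n = 10          # number of years considered

p_exactly_once = binomial_pmf(1, n, pi)
# Complement of "never freezes" equals the sum over x = 1, ..., 10
p_at_least_once = 1.0 - binomial_pmf(0, n, pi)

mean = n * pi                   # E{X} = N * pi
var = n * pi * (1.0 - pi)       # V{X} = N * pi * (1 - pi)

print(round(p_exactly_once, 4), round(p_at_least_once, 4), mean, var)
```

Summing `binomial_pmf(x, n, pi)` over x = 1 to 10 gives the same at-least-once probability, which is the mutually exclusive events argument in numerical form.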
Geometric Distribution
General remarks:
situations where, on an experiment (trial), one or the other of two dichotomous events will occur: (i) success (coded as 1), or (ii) failure (coded as 0)
RV of interest X = number of trials that will be required to observe the next success
Geometric distribution used to calculate the probabilities that X attains any positive integer value
Conditions of applicability:
1. outcomes are independent from one trial to another, and probability of event occurrence does not change from trial to trial (same conditions as for Binomial)
2. trials must occur in a sequence
Geometric PDF:
$\mathrm{Prob}\{X = x\} = \pi\,(1 - \pi)^{x - 1}, \quad x = 1, 2, \ldots$, where π is the probability of success in a single trial
Examples:
model for trials that occur consecutively in time: waiting time distribution, e.g., lengths of weather regimes such as "spells" of dry periods
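As an illustration of the waiting time interpretation, the sketch below evaluates the standard geometric probabilities and compares them with simulated waiting times. The daily probability `pi = 0.3` that a dry spell ends is a made-up value, and both function names are hypothetical.

```python
import random
from statistics import fmean


def geometric_pmf(x, pi):
    """Probability that the first success occurs on trial x (x = 1, 2, ...)."""
    return pi * (1.0 - pi) ** (x - 1)


def simulate_waiting_time(pi, rng):
    """Count sequential independent trials until the first success."""
    trials = 1
    while rng.random() >= pi:   # failure with probability 1 - pi
        trials += 1
    return trials


pi = 0.3   # hypothetical daily probability that a dry spell ends
rng = random.Random(0)
waits = [simulate_waiting_time(pi, rng) for _ in range(20000)]

print(geometric_pmf(3, pi))   # spell ends on day 3: two failures, then a success
print(fmean(waits), 1.0 / pi)  # simulated and theoretical mean waiting time
```

The simulated mean should sit close to 1/π, the expected number of trials until the first success.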
Poisson Distribution (1)*
General remarks:
used for modeling number of discrete events occurring in a sequence, e.g., number of hurricanes over a time period, or number of gasoline stations along a portion of a highway
individual events being counted are independent, i.e., the probability of event occurrence in an interval depends only on the size of that interval, not on where the particular interval is located or on how often events have been observed in other intervals
Poisson events occur randomly, but at a constant rate; a sequence of Poisson events is said to stem from a Poisson process
Poisson PDF:
$\mathrm{Prob}\{X = x\} = \frac{\mu^{x} e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots$, where μ is the average number of events per interval (the intensity)
Fitting a Poisson PDF to data:
estimate intensity from sample data by taking their average
equate that intensity to theoretical population parameter μ
*How many fish are in the sea?
Poisson Distribution (2)
Data set:
# of tornados per year reported in NY state from 1959-1988; rate of tornado occurrence: μ = E{X} = 138/30 = 4.6 tornados/year
Sample and Poisson-derived probabilities:
stem plot: sample frequencies of # of tornados per year for x = {0, 1, . . . , 12}
Poisson probabilities with mean 4.6 evaluated at x = {0, 1, . . . , 12}, and superimposed on stem plot
ignore thick lines connecting bullets of the Poisson PDF, since it is discrete
Why bother about fitting theoretical distributions to data:
smooth sampling variations due to limited number of data
condense information with parametric distributions
extrapolate probabilities of events not seen in the sample data
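The Poisson curve on the slide can be reproduced from the one fitted parameter. The sketch below evaluates the standard Poisson probabilities at x = 0, ..., 12 for μ = 138/30; the yearly sample frequencies themselves are not in this transcript, so only the theoretical side is shown.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """Probability of observing exactly x events when the mean count is mu."""
    return mu ** x * exp(-mu) / factorial(x)


mu = 138 / 30   # 138 tornados in 30 years = 4.6 tornados/year
probs = [poisson_pmf(x, mu) for x in range(13)]

for x, p in enumerate(probs):
    print(x, round(p, 4))

# Extrapolation beyond the plotted range: probability of more than 12 tornados in a year
p_more_than_12 = 1.0 - sum(probs)
print(p_more_than_12)
```

The last line illustrates the extrapolation point: the fitted distribution assigns a small but nonzero probability to counts above the plotted range.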
Uniform Distribution
General remarks:
situations where any particular event out of a total of K events has equal probability of occurrence as any other event
uniformly distributed events are said to denote complete independence (or lack of knowledge)
Uniform PDF:
$\mathrm{Prob}\{X = x_k\} = \frac{1}{K}, \quad k = 1, \ldots, K$
Continuous RVs: Some Definitions (1)
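Probability density function (PDF):
tabulation of probabilities of occurrence of (class) outcomes of a continuous RV:
$f_X(x) \approx \frac{\mathrm{Prob}\{x < X \le x + \varepsilon\}}{\varepsilon}$
Cumulative distribution function (CDF):
$F_X(x) = \mathrm{Prob}\{X \le x\} = \int_{-\infty}^{x} f_X(u)\, du$
ε denotes a very small positive number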
Continuous RVs: Some Definitions (2)
Quantile function or inverse CDF:
$x_p = F_X^{-1}(p)$, i.e., the value $x_p$ such that $F_X(x_p) = p$, for p in [0, 1]
Monte Carlo simulation:
procedure of sampling from a CDF:
1. draw (simulate) a random number $p_i$ in [0, 1]
2. retrieve a simulated quantile as: $x_i = F_X^{-1}(p_i)$
3. repeat steps 1-2 S times to get a set of S simulated values
simulated values are distributed according to CDF $F_X(x)$
used for uncertainty propagation in model predictions
Continuous RVs: Some Definitions (3)
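Link between CDF and PDF:
the PDF is the derivative of the CDF:
$f_X(x) = \frac{dF_X(x)}{dx}$
if the CDF has discontinuities, then the PDF is not defined at those points
⇒ the CDF is always defined even if the PDF is not
Expected value of a continuous RV:
mean E{X} = μ of RV X:
$E\{X\} = \mu = \int_{-\infty}^{+\infty} x\, f_X(x)\, dx$
probability weighted sum of infinitely many outcomes
Variance of a continuous RV:
expected value of squared deviations (X − μ)^2 from mean μ = E{X}:
$V\{X\} = E\{(X - \mu)^2\} = \int_{-\infty}^{+\infty} (x - \mu)^2\, f_X(x)\, dx$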
Uniform Random Variable
All outcomes within an Interval are equiprobable
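Uniform PDF: $f_X(x) = \frac{1}{b - a}$ for $a \le x \le b$, and 0 otherwise
Uniform CDF: $F_X(x) = \frac{x - a}{b - a}$ for $a \le x \le b$ (0 below a, 1 above b)
where [a, b] denotes the interval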
Exponential Random Variable
Continuous equivalent of geometric distribution; used for modeling random (waiting) times, e.g., radioactive decay
Exponential PDF: $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
Exponential CDF: $F_X(x) = 1 - e^{-\lambda x}$ for $x \ge 0$
λ > 0 is interpreted as the rate of events over the unit interval
Monte Carlo Drawing from Exponential RVs
Exponential CDF:
$F_X(x) = 1 - e^{-\lambda x}$
Monte Carlo simulation:
generate uniform random numbers in [0, 1] (in Matlab use rand):
p = [0.2140 0.6435 0.3200 0.9601 0.7266 0.4120 0.7446 0.2679 0.4399 0.9334]
use quantile function, i.e., solve FX(x) = p with respect to x:
$x = -\frac{\ln(1 - p)}{\lambda}$
to compute simulated values (the values listed correspond to λ = 0.5):
x = [0.4816 2.0628 0.7713 6.4428 2.5936 1.0621 2.7298 0.6237 1.1593 5.4181]
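The slide's numbers can be reproduced with a few lines of Python. The rate λ = 0.5 is not stated on the slide; it is inferred here because it maps the listed p values onto the listed x values. The function name `exponential_quantile` is an illustrative choice.

```python
from math import log


def exponential_quantile(p, lam):
    """Inverse of F(x) = 1 - exp(-lam * x): the x at which the CDF equals p."""
    return -log(1.0 - p) / lam


lam = 0.5   # inferred from the slide's numbers, not stated on the slide
p = [0.2140, 0.6435, 0.3200, 0.9601, 0.7266,
     0.4120, 0.7446, 0.2679, 0.4399, 0.9334]

x = [exponential_quantile(pi, lam) for pi in p]
print([round(xi, 4) for xi in x])
```

To generate fresh samples rather than reuse the slide's p values, replace the list `p` with draws from `random.random()`, which plays the role of Matlab's `rand`.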
Standard Normal (Gaussian) Random Variable
Most famous continuous distribution, with characteristic bell shape
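PDF: $f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
CDF: $F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$
no analytical equation for the CDF; approximated using numerical methods
for a standard Gaussian RV X: E{X} = 0, and V{X} = 1
symmetry: FX(xp) = p, FX(−xp) = 1 − p = FX(x1−p) => x1−p = −xp
Prob{X in [−2, +2]} = FX(2) − FX(−2) = 0.977 − 0.023 = 0.954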
Normal (Gaussian) Random Variable
Most famous continuous distribution, with characteristic bell shape
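PDF: $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
CDF: $F_X(x) = \int_{-\infty}^{x} f_X(u)\, du$
with mean μ and standard deviation σ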
Link Between Std Normal and Normal RVs
Gaussian and Std Gaussian PDF:
From a Gaussian to a Std Gaussian RV:
normalize the x-data, i.e., get their z-scores:
$z = \frac{x - \mu}{\sigma}$
From a std Gaussian to a Gaussian RV:
multiply the z data by the target std deviation σ and add the target mean μ:
$x = \mu + \sigma z$
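A short sketch of both directions, using Python's standard library `NormalDist` for the standard Gaussian CDF and its inverse. The target mean and standard deviation (621 and 118) are borrowed from the Catacamas climatology earlier in the lecture purely as example values; the helper names `to_z` and `from_z` are hypothetical.

```python
import random
from statistics import NormalDist, fmean

std_normal = NormalDist(0.0, 1.0)


def to_z(x, mu, sigma):
    """Gaussian -> standard Gaussian: z-score."""
    return (x - mu) / sigma


def from_z(z, mu, sigma):
    """Standard Gaussian -> Gaussian with target mean mu and std deviation sigma."""
    return mu + sigma * z


# Probability that a standard Gaussian RV falls in [-2, +2]
prob_within_2 = std_normal.cdf(2.0) - std_normal.cdf(-2.0)
print(round(prob_within_2, 3))

# Monte Carlo draws: uniform p -> standard Gaussian quantile -> rescale
mu, sigma = 621.0, 118.0   # example values taken from the Catacamas slide
rng = random.Random(0)
draws = [from_z(std_normal.inv_cdf(rng.random()), mu, sigma) for _ in range(5000)]
print(fmean(draws))
```

The draw step is the same inverse CDF recipe used for the exponential case, with the quantile function evaluated numerically because the Gaussian CDF has no analytical form.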
Some Continuous Distribution Examples
Gaussian PDFs Gaussian CDFs Lognormal PDF/CDF