Estimating the accuracy of the approximation (surrogate)


Upload: raziya

Post on 10-Jan-2016


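From the assumption that the error is due to normally distributed, uncorrelated random variables, we get an estimate of the error standard deviation (called the standard error):

$$\hat{\sigma}^2 = \frac{\mathbf{e}^T \mathbf{e}}{n_y - n_\beta}$$

where $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$ is the vector of residuals, $n_y$ is the number of data points, and $n_\beta$ is the number of fitted coefficients.

Standard measure of accuracy: the coefficient of multiple determination measures how much of the variability in the data is captured by the approximation:

$$R^2 = \frac{SS_r}{SS_y}, \qquad SS_r = \sum_{i=1}^{n_y} (\hat{y}_i - \bar{y})^2, \qquad SS_y = \sum_{i=1}^{n_y} (y_i - \bar{y})^2$$

The adjusted coefficient of multiple determination accounts for the fitting bias:

$$R_a^2 = 1 - (1 - R^2)\,\frac{n_y - 1}{n_y - n_\beta}$$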
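The three measures can be checked numerically. The following is a minimal sketch in base MATLAB (no toolbox functions), assuming a straight-line model and using the five data points of Example 3.2.1 from this presentation; the variable names (`sigma_hat`, `SSr`, `SSy`, `R2a`, `ny`, `nbeta`) are chosen here for illustration.

```matlab
% Five data points taken from Example 3.2.1 in this presentation.
x = [-2 -1 0 1 2]';
y = [-1.5 -1.5 0 1.25 1.75]';

X    = [ones(size(x)), x];      % assumed linear model y = b1 + b2*x
b    = X \ y;                   % least-squares coefficients
yhat = X * b;                   % fitted values
e    = y - yhat;                % residuals

ny    = numel(y);               % number of data points
nbeta = numel(b);               % number of fitted coefficients

sigma_hat = sqrt((e' * e) / (ny - nbeta));      % standard error
SSy = sum((y    - mean(y)).^2);                 % total variability of the data
SSr = sum((yhat - mean(y)).^2);                 % variability captured by the fit
R2  = SSr / SSy;                                % coefficient of multiple determination
R2a = 1 - (1 - R2) * (ny - 1) / (ny - nbeta);   % adjusted for fitting bias

fprintf('sigma_hat = %.4f, R2 = %.4f, R2a = %.4f\n', sigma_hat, R2, R2a);
```

For this data the script should give a standard error of about 0.4354, R^2 of about 0.9377, and adjusted R^2 of about 0.9169, which are the values listed in the Excel output for the linear fit.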

Curve fit

    noise=randn(1,30); x=1:1:30; y=x+noise

Sample values of y (partial listing): 3.908 2.825 4.379 2.942 4.5314 5.7275 8.098 25.84 27.47 27.00 30.96

    [p,s]=polyfit(x,y,1); yfit=polyval(p,x); plot(x,y,'+',x,x,'r',x,yfit,'b')

With dense data, the functional form is clear. The fit serves to filter out noise.
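One way to see the filtering effect is to compare how far the raw data and the fitted line are from the true line y = x. The sketch below reuses the setup from the slide; the two r.m.s. measures (`rms_noise`, `rms_fit`) are introduced here for illustration and are not part of the original slide.

```matlab
x     = 1:30;
noise = randn(1, 30);                 % unit-variance noise, as on the slide
y     = x + noise;

p    = polyfit(x, y, 1);              % straight-line least-squares fit
yfit = polyval(p, x);

rms_noise = sqrt(mean(noise.^2));         % r.m.s. distance of the data from the true line
rms_fit   = sqrt(mean((yfit - x).^2));    % r.m.s. distance of the fit from the true line

fprintf('r.m.s. error of data: %.4f\n', rms_noise);
fprintf('r.m.s. error of fit:  %.4f\n', rms_fit);
```

Because the true function is itself a straight line, the error of the fit is the projection of the noise onto the two fitted basis functions, so `rms_fit` cannot exceed `rms_noise`. With 30 points and only two coefficients it is typically much smaller.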

Example with y=0.1*x

    noise=randn(1,30); x=1:1:30; y=0.1*x+noise;
    xx=[ones(30,1),x']; [B,BINT,R,RINT,STATS] = regress(y',xx)

STATS = 0.3016 12.0896 0.0017 1.7498 (in order: R^2, F statistic, p-value, estimate of the error variance)

Estimating error in coefficients

- Some coefficients are more accurately estimated than others.
- The standard error in coefficient $b_i$ is

$$se(b_i) = \hat{\sigma}\,\sqrt{\left[(X^T X)^{-1}\right]_{ii}}$$

- The t-statistic is the ratio of a coefficient to its standard error; we would like its magnitude to be at least 2.
- Coefficients that are poorly estimated may be dropped to improve the accuracy of predictions.
- Dropping one coefficient changes the t-statistics of the others.
- Need to iterate in dropping and adding coefficients.
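The effect of dropping a coefficient can be illustrated on a small case. This is a sketch in base MATLAB, assuming the five data points of Example 3.2.1 from this presentation; it fits a quadratic, then drops the quadratic term and refits. The variable names (`models`, `names`, `tstat`) are chosen here for illustration.

```matlab
x = [-2 -1 0 1 2]';
y = [-1.5 -1.5 0 1.25 1.75]';

% Design matrices: full quadratic model, then the model with x.^2 dropped.
models = {[ones(5,1), x, x.^2], [ones(5,1), x]};
names  = {'quadratic', 'linear'};
tstat  = cell(1, 2);

for k = 1:2
    X  = models{k};
    b  = X \ y;                                 % least-squares coefficients
    e  = y - X*b;                               % residuals
    s2 = (e' * e) / (numel(y) - numel(b));      % estimated error variance
    se = sqrt(s2 * diag(inv(X' * X)));          % standard error of each coefficient
    tstat{k} = b ./ se;                         % t-statistics

    fprintf('%s fit:\n', names{k});
    fprintf('  b = %8.4f  se = %7.4f  t = %7.3f\n', [b, se, tstat{k}]');
end
```

In the quadratic fit the x.^2 coefficient has a t-statistic of about 0.39, well below 2, so it is a candidate for removal. After dropping it, the t-statistic of the linear coefficient rises from about 5.69 to about 6.72, which shows how removing one term changes the statistics of the others.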

Regression in Excel (add-in data analysis)

    Rand       Rand-0.5    x    y          fit        error
    0.764742    0.264742    1    1.264742   1.03539    0.03539
    0.258649   -0.24135     2    1.758649   2.031192   0.031192
    0.735026    0.235026    3    3.235026   3.026994   0.026994
    0.411036   -0.08896     4    3.911036   4.022797   0.022797
    ...
    0.674921    0.174921   24   24.17492   23.93884   -0.06116
    0.69481     0.19481    25   25.19481   24.93465   -0.06535
    0.647964    0.147964   26   26.14796   25.93045   -0.06955
    0.407839   -0.09216    27   26.90784   26.92625   -0.07375
    0.211674   -0.28833    28   27.71167   27.92205   -0.07795
    0.405013   -0.09499    29   28.90501   28.91786   -0.08214
    0.242633   -0.25737    30   29.74263   29.91366   -0.08634

Regression output

SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.999381
    R Square             0.998763
    Adjusted R Square    0.998719
    Standard Error       0.313962
    Observations         30

                    Coefficients   Standard Error   t Stat       P-value    Lower 95%   Upper 95%
    Intercept       0.039587       0.117570         0.336711     0.738845   -0.201245   0.280419
    X Variable 1    0.995802       0.006623         150.364623   2.93E-42   0.982237    1.009368

Output with y=0.1x

SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.969193
    R Square             0.939334
    Adjusted R Square    0.937168
    Standard Error       0.251021
    Observations         30

                    Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
    Intercept       -0.19083       0.094            -2.03012   0.051942   -0.3834     0.0017
    X Variable 1    0.11025        0.005295         20.82177   1.41E-18   0.0994      0.1211

Example 3.2.1

Given the data below:

- Use Microsoft Excel to fit linear and quadratic polynomials.
- Compare the standard errors and t-statistics of the coefficients.

    X    -2     -1     0    1      2
    Y    -1.5   -1.5   0    1.25   1.75

Linear fit

Quadratic fit

Graphical comparison

Cross validation

- Error estimates based on model assumptions are vulnerable.
- For polynomial response surface approximations, the assumptions are rarely satisfied.
- Cross validation divides the data into $n_g$ groups.
- Fit the approximation to $n_g - 1$ groups, and use the remaining group to estimate the error. Repeat for each group.
- When each group consists of one point, the error is called PRESS (prediction error sum of squares).
- Calculate the error at each point and then present the r.m.s. error.
- It can be shown that the leave-one-out error at point $i$ is $e_i/(1 - E_{ii})$, where $e_i$ is the ordinary residual and $E = X(X^T X)^{-1} X^T$.
- This shortcut can be used only if $X^T X$ is not ill-conditioned.
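The leave-one-out procedure can be sketched as follows in base MATLAB, assuming a straight-line fit to the five data points of Example 3.2.1 from this presentation. The second half uses the standard leave-one-out identity for linear least squares, e_i/(1 - E_ii) with E = X*inv(X'*X)*X'; the variable names (`e_loo`, `e_short`, `rms_press`) are illustrative.

```matlab
x = [-2 -1 0 1 2]';
y = [-1.5 -1.5 0 1.25 1.75]';
X = [ones(size(x)), x];        % assumed linear model
n = numel(y);

% Brute force: refit n times, each time leaving one point out,
% and record the prediction error at the omitted point.
e_loo = zeros(n, 1);
for i = 1:n
    keep    = true(n, 1);
    keep(i) = false;
    b_i      = X(keep, :) \ y(keep);
    e_loo(i) = y(i) - X(i, :) * b_i;
end

% Shortcut: one fit to all the data, then scale each residual.
b = X \ y;
e = y - X * b;                 % ordinary residuals
E = X * ((X' * X) \ X');       % projection matrix E = X*inv(X'*X)*X'
e_short = e ./ (1 - diag(E));

PRESS     = sum(e_loo.^2);     % prediction error sum of squares
rms_press = sqrt(PRESS / n);   % r.m.s. cross-validation error

fprintf('r.m.s. PRESS error = %.4f\n', rms_press);
fprintf('max difference between the two methods = %.2e\n', max(abs(e_loo - e_short)));
```

For this data the r.m.s. cross-validation error is about 0.586, noticeably larger than the standard error of about 0.435 reported by the regression, and the two ways of computing the leave-one-out errors agree to round-off.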

Questions

The pairs (0,0), (1,1), (2,1) represent strain (millistrains) and stress (ksi) measurements.

- Estimate Young's modulus using the three commonly used error norms.
- Estimate the error in Young's modulus using cross validation.
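One possible way to set up the first question numerically is sketched below. It rests on assumptions not stated on the slide: the model is stress = E*strain through the origin, the three norms are taken to be the L1, L2, and L-infinity norms of the residual, and a simple grid search over candidate moduli (`Egrid`, a name chosen here) stands in for closed-form solutions. The last loop is one reading of the cross-validation question: refit E by least squares with each point left out and look at the spread.

```matlab
strain = [0 1 2]';             % millistrain
stress = [0 1 1]';             % ksi

% Candidate moduli in ksi per millistrain (the grid is an assumption).
Egrid = 0:0.0005:2;
R = stress * ones(1, numel(Egrid)) - strain * Egrid;   % residuals, one column per candidate

[~, i1]   = min(sum(abs(R), 1));        % L1: sum of absolute errors
[~, i2]   = min(sum(R.^2, 1));          % L2: sum of squared errors
[~, iinf] = min(max(abs(R), [], 1));    % L-infinity: maximum absolute error

E_L1   = Egrid(i1);
E_L2   = Egrid(i2);
E_Linf = Egrid(iinf);
fprintf('E (L1) = %.3f, E (L2) = %.3f, E (Linf) = %.3f\n', E_L1, E_L2, E_Linf);

% Leave-one-out least-squares estimates of E.
E_loo = zeros(3, 1);
for i = 1:3
    keep    = true(3, 1);
    keep(i) = false;
    E_loo(i) = (strain(keep)' * stress(keep)) / (strain(keep)' * strain(keep));
end
fprintf('Leave-one-out estimates of E: %.3f %.3f %.3f\n', E_loo);
```

Under these assumptions the three norms give different moduli (0.5, 0.6, and about 0.667 ksi per millistrain), and the leave-one-out estimates range from 0.5 to 1.0, which gives a sense of how poorly three points determine the modulus.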

Sheet1 (linear fit)

SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.9683342568
    R Square             0.9377
    Adjusted R Square    0.9169
    Standard Error       0.4354
    Observations         5

    ANOVA
                  df   SS        MS             F               Significance F
    Regression    1    8.55625   8.55625        45.1318681319   0.0067320097
    Residual      3    0.56875   0.1895833333
    Total         4    9.125

                    Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%     Upper 95.0%
    Intercept       0              0.1947           0        1         -0.6197     0.6197      -0.6196929675   0.6196929675
    X Variable 1    0.925          0.1377           6.7180   0.0067    0.4868      1.3632      0.4868109004    1.3631890996

    RESIDUAL OUTPUT
    Observation   Predicted Y   Residuals
    1             -1.85         0.35
    2             -0.925        -0.575
    3             0             0
    4             0.925         0.325
    5             1.85          -0.1

Sheet1 (quadratic fit)

SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.9706051535
    R Square             0.942074364
    Adjusted R Square    0.884148728
    Standard Error       0.5140872633
    Observations         5

    ANOVA
                  df   SS             MS             F               Significance F
    Regression    2    8.5964285714   4.2982142857   16.2635135135   0.057925636
    Residual      2    0.5285714286   0.2642857143
    Total         4    9.125

                    Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
    Intercept       -0.1071428571   0.3582838915     -0.2990445836   0.7931182075   -1.6487150943   1.43442938     -1.6487150943   1.43442938
    X Variable 1    0.925           0.1625686668     5.6899033384    0.0295268258   0.225522995     1.624477005    0.225522995     1.624477005
    X Variable 2    0.0535714286    0.1373956004     0.3899064337    0.734211283    -0.5375945383   0.6447373954   -0.5375945383   0.6447373954

    RESIDUAL OUTPUT
    Observation   Predicted Y     Residuals
    1             -1.7428571429   0.2428571429
    2             -0.9785714286   -0.5214285714
    3             -0.1071428571   0.1071428571
    4             0.8714285714    0.3785714286
    5             1.9571428571    -0.2071428571