Empirical Models
The empirical model represents an application of mathematics which tends to describe the process / phenomenon / case study as a "black box":
Empirical models are not derived from assumptions concerning the relationship between variables, and they are not based on physical principles.
An empirical model describes the relationship between the dependent (y) and independent (x) variables without explaining the mechanism involved.
The number of dependent variables analyzed in an empirical model is usually not greater than one, so that the model can be tested against a single variable and gives more accurate feedback.
If there is more than one dependent variable, you need to study them one at a time and then analyze and combine the resulting inferences.
x → [black box] → y
y = f(x)
Empirical Models
The empirical model consists mainly in the formulation of a mathematical law that ties together the available experimental data, observations, etc., in order to give answers relevant to the process / phenomenon / mechanism examined.
An empirical model is entirely based on experimental data, observations, etc. (Data Rich and Theory Poor).
Each equation in the model includes one or more coefficients or parameters that are presumed constant.
With the help of the experimental data, we can determine the form of the model, and subsequently (or simultaneously) estimate the value of some or all of the parameters in the model.
Example: a correlation that gives the specific heat at constant pressure, cp, as a function of temperature T:
cp = a + b T
where a and b are two constants that represent the parameters of the empirical model.
Note: this is an empirical model with 1 dependent variable and 1 independent variable.
A second example, with more than 1 independent variable, is multiple regression.
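As a minimal sketch of how the parameters a and b of cp = a + b T could be estimated from data, the Python snippet below applies the closed-form least-squares formulas for a straight line. The function name fit_line and the (T, cp) values are hypothetical, made up for illustration only; they are not measurements from the source.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the model y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope
    a = y_mean - b * x_mean     # intercept
    return a, b


# Hypothetical (T, cp) pairs, invented for this illustration.
temperatures = [300.0, 350.0, 400.0, 450.0, 500.0]
cp_values = [1.005, 1.008, 1.013, 1.020, 1.029]

a, b = fit_line(temperatures, cp_values)
print(f"cp = {a:.4f} + {b:.6f} * T")
```

The same function works for any pair of variables that is expected to follow a straight line; only the data lists change.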
Empirical Models
Once the data is collected, you need to decide on the techniques you want to use in order to find an appropriate model.
Depending on the type and quantity of data, and the characteristics of the problem, you can choose between two different approaches:
1. Interpolation: finding a function that contains all the data points.
2. Regression (Smoothing) or Model fitting: finding a function that is as close as possible to containing all the data points. Such a function is also called a regression curve.
Sometimes you need to combine these methods, since the interpolation curve might be too complex and the best-fit model might not be sufficiently accurate.
Empirical Models: Interpolation
Assuming accurate data, we try to construct the curve that passes through the points representing the data.
"Interpolation guarantees the fitted curve will pass through each and every data point."
Historically, the most famous and most widely used interpolating functions are polynomials.
NOTE: no two data points may have the same abscissa but different ordinates!
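A minimal sketch of polynomial interpolation, assuming the Lagrange form of the interpolating polynomial (one common way to build it, not necessarily the one used by the author). The function name lagrange_interpolate and the data points are hypothetical. The check on distinct abscissas reflects the note above.

```python
def lagrange_interpolate(points, x):
    """Evaluate at x the polynomial of degree <= n-1 through n points."""
    xs = [p[0] for p in points]
    if len(set(xs)) != len(xs):
        raise ValueError("abscissas must be distinct")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Basis polynomial: equals 1 at xi and 0 at every other abscissa.
        weight = 1.0
        for j, (xj, _) in enumerate(points):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


# Made-up data points for illustration.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 11.0)]
for xi, yi in data:
    # The interpolating curve reproduces every data point.
    print(xi, yi, lagrange_interpolate(data, xi))
```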
SPLINE
Spline interpolation provides a different approach to the interpolation problem. Splines are piecewise polynomial functions: the degree of the interpolating polynomial is kept fixed, but the entire interpolation interval is divided into n smaller sub-intervals and a different polynomial is used on each of them (same degree but different coefficients).
In particular, the most widely used are cubic splines, which are of third degree.
Empirical Models: REGRESSION (SMOOTHING)
It assumes that the data contain errors and tries to construct a curve that deviates only slightly from the data, so as not to lose the information contained in them.
"Regression simply ensures that the "merit function", which is an arbitrary function that measures the disagreement between the data and the model, is minimized."
Empirical Models: REGRESSION (SMOOTHING)
Selecting the form of an empirical model requires judgment as well as some skill in recognizing how response patterns match possible functions.
Optimization methods can help in the selection of the model structure as well as in the estimation of the unknown coefficients.
If you can specify a quantitative criterion which defines what is "best" in representing the data, then the model representation can be improved by adjusting the form of the model to improve the value of the criterion.
The best model, presumably, exhibits the least error between actual data and the predicted response in some sense.
The development of empirical models
The development of empirical models is divided into several phases:
1. Postulate the empirical model
2. Make an experimental plan and get data
3. "Adaptation" of the model
4. Estimation of parameters
5. Evaluation of results
In most cases, the development of an empirical model goes through a cyclical process.
Note: The construction of the empirical model can be further improved using the techniques made available by "Experimental Design".
[Figure: cycle diagram with the stages: Model; Design of the experiment; Experimentation; Analysis of the results and revision of the model and of the experimental program]
Empirical Models: REGRESSION (SMOOTHING)
When the model is linear in the coefficients, the coefficients can be estimated by a procedure called linear regression.
If the coefficients appear in the function in a nonlinear fashion, the estimation of the coefficients is referred to as nonlinear regression.
Linear equation: Y = a·X + b
Non-linear equation: Y = exp(a·X + b)
Empirical Models: REGRESSION (SMOOTHING)
In fitting, the number of data sets must be equal to or greater than the number of coefficients in the model, the values of which are to be estimated.
However, if p is the number of data sets and n the number of undetermined coefficients in the model, then you should collect enough data sets such that p > n (an overdetermined, generally inconsistent set of equations) rather than p = n.
An optimization criterion can be used to obtain the best solution of the p equations, i.e., the best values of the unknown coefficients, according to some selected criterion.
Empirical Models: REGRESSION (SMOOTHING)
One such criterion is least squares, which minimizes the sum of the squares of the errors between the predicted and the experimental values of the dependent variable y over all data points x.
f = Σ (y_i - f(x_i))^2   (on the left, f is the objective function; f(x_i) is the model prediction at x_i)
A quadratic objective function is minimized with respect to the unknown coefficients.
For models in which the coefficients appear linearly, this procedure leads to a set of linear equations that can be solved uniquely.
To estimate the values of the coefficients in a nonlinear model, you have to minimize f with respect to the coefficients using a computer code. In this case the calculations are iterative, whereas linear regression requires no iteration.
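To illustrate what "iterative" means here, the following Python sketch fits the nonlinear model Y = exp(a·X + b) by least squares. It assumes the Gauss-Newton method with simple step halving, which is one possible choice and is not named in the source. The function names, the starting guess and the synthetic data are all hypothetical.

```python
import math


def sse(params, xs, ys):
    """Sum of squared errors for the model y = exp(a*x + b)."""
    a, b = params
    return sum((y - math.exp(a * x + b)) ** 2 for x, y in zip(xs, ys))


def gauss_newton_exp(xs, ys, a0, b0, iterations=50, tol=1e-12):
    """Iteratively refine (a, b), starting from the guess (a0, b0)."""
    a, b = a0, b0
    for _ in range(iterations):
        # Build the 2x2 normal equations of the linearized problem.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            pred = math.exp(a * x + b)
            resid = y - pred
            ja, jb = x * pred, pred      # d(pred)/da and d(pred)/db
            s_aa += ja * ja
            s_ab += ja * jb
            s_bb += jb * jb
            g_a += ja * resid
            g_b += jb * resid
        det = s_aa * s_bb - s_ab * s_ab
        if det == 0.0:
            break
        da = (g_a * s_bb - g_b * s_ab) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        # Halve the step until the sum of squares does not increase.
        current = sse((a, b), xs, ys)
        step = 1.0
        while step > 1e-8:
            trial = (a + step * da, b + step * db)
            if sse(trial, xs, ys) <= current:
                break
            step *= 0.5
        else:
            break                        # no improving step found
        a, b = trial
        if abs(step * da) + abs(step * db) < tol:
            break
    return a, b


# Synthetic, noise-free data generated from a = 0.5, b = 1.0.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.exp(0.5 * x + 1.0) for x in xs]
print(gauss_newton_exp(xs, ys, a0=0.4, b0=0.9))
```

The result depends on the starting guess; a linear fit of ln Y against X is a common way to obtain one.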
Examples of functions able to describe a large number of physical phenomena
Most of them can be transformed into a linear function when properly plotted.
Linear y = ax + b: Easiest, simplest, used very frequently. If b is sufficiently small, y is said to be proportional to x (y ∝ x).
Quadratic y = ax^2 + bx + c: Appropriate for fitting data with one minimum or one maximum. If a > 0, this function is concave up; if a < 0, it is concave down.
Cubic y = ax^3 + bx^2 + cx + d: Appropriate for fitting data with one minimum and one maximum.
Quartic y = ax^4 + bx^3 + cx^2 + dx + e: Convenient for fitting data with two minima and one maximum or two maxima and one minimum.
Model Fitting. Modeling using Regressions.
Exponential y = a*b^x or y = a*e^(kx) (with a > 0). If k > 0, then the function is increasing and concave up. If k < 0, then the function is decreasing and concave up. This model is appropriate if the increase is slow at first but then speeds up (or, if k < 0, if the decrease is fast at first but then slows down).
Logarithmic y = a + b ln x. If b > 0, then the function is increasing and concave down. If b < 0, then the function is decreasing and concave up. If the data indicate an increase, this model is appropriate if the increase is fast at first but then slows down.
Logistic y = c/(1 + a*e^(-bx)). Increasing for b > 0 (with a > 0 and c > 0): in this case, the increase is slow at first, then it speeds up, and then it slows down again and approaches the y-value c as x tends to ∞.
Power y = a*x^b. If a > 0, it is increasing for b > 0 and decreasing for b < 0 (for x > 0). It is called a power model since multiplying x by a factor t multiplies y by t^b, the b-th power of t. An increasing power function does not increase as rapidly as an increasing exponential function.
Model Fitting. Modeling using Regressions.
Empirical Models: REGRESSION (SMOOTHING)
Connection with linear model:
If y = a*x^b, then ln y = ln a + b ln x. Therefore, if y is a power function, ln y is a linear function of ln x.
Note: If y = a*b^x, then ln y = ln a + x ln b. Therefore, if y is an exponential function, ln y is a linear function of x.
In conclusion, the first step in deriving a model is to plot the data. If the data do not look linear, try plotting the variables as logarithms to check whether an exponential or a power model is a good fit. The idea is to get a graph that looks reasonably linear and then to fit a linear model (when possible).
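The linearization above can be sketched in Python for the power model y = a*x^b: the straight line is fitted to (ln x, ln y) and then transformed back. The function name fit_power_law and the data are hypothetical. Note that this minimizes the squared errors in ln y, which is not exactly the same as minimizing them in y.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b via a straight-line fit of ln y against ln x."""
    lx = [math.log(x) for x in xs]   # requires x > 0
    ly = [math.log(y) for y in ys]   # requires y > 0
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    ln_a = my - b * mx               # intercept of the line is ln a
    return math.exp(ln_a), b


# Synthetic data generated from y = 3 * x**1.5, for illustration only.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 1.5 for x in xs]
print(fit_power_law(xs, ys))
```

For an exponential model y = a*b^x the same idea applies with x left untransformed, since ln y is then linear in x.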
Evaluation of the goodness of fit
Determination (or Correlation) coefficient (R^2)
Residuals (d)
Root mean square error (RMSE)
Bias factor (Bf)
Accuracy factor (Af)
Equivalent line
EVALUATION OF THE GOODNESS OF FIT
RMSE = S = [ Σ (y_i - f(x_i))^2 / (n - n_param) ]^(1/2)
where:
f(x_i) denotes the value calculated from the regression model y = f(x)
y_i denotes the data points
n is the number of data points
n_param is the number of parameters in the particular model
(n - n_param) is the number of degrees of freedom
Note: an RMSE value closer to 0 indicates a fit that is more useful for prediction
MATLAB help: Curve Fitting Tool, "Evaluating the Goodness of Fit"
Root mean square error (RMSE)
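A short Python sketch of the RMSE as defined above, dividing by the degrees of freedom (n - n_param). The function name rmse and the sample values are hypothetical.

```python
import math


def rmse(observed, predicted, n_param):
    """Root mean square error with n - n_param degrees of freedom."""
    dof = len(observed) - n_param
    if dof <= 0:
        raise ValueError("need more data points than parameters")
    ss = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return math.sqrt(ss / dof)


# Made-up observed values and model predictions (2-parameter model).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(observed, predicted, n_param=2))
```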
EVALUATION OF THE GOODNESS OF FIT
R^2 = (St - Sr) / St
where
St is the Sum of Squares about the Mean: SST = St = Σ (y_i - y_m)^2
y_m = (Σ y_i)/n (Mean)
Sr is the Sum of Squared Errors: SSE = Sr = Σ (y_i - f(x_i))^2
Note: for a straight-line fit, R^2 is independent of the units of x and y and is also independent of which of the two variables is labeled as independent and which as dependent (in other words, the data (y_i; x_i), i = 1, …, n, would produce the same value of R^2).
Example: If R^2 = 0.9, then 90% of the total variation of y-values from the line y = y_m is accounted for by a linear relationship with the values of x.
Determination coefficient R2
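The definition R^2 = (St - Sr)/St translates directly into a few lines of Python. This is a minimal sketch; the function name r_squared and the sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination R^2 = (St - Sr) / St."""
    mean = sum(observed) / len(observed)
    st = sum((y - mean) ** 2 for y in observed)                  # about the mean
    sr = sum((y - f) ** 2 for y, f in zip(observed, predicted))  # squared errors
    return (st - sr) / st


# Made-up observed values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted))
```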
Evaluation of the goodness of fit
Bias Factor
Bf = 10^( (1/n) Σ_{i=1..n} Log10(predicted_i / observed_i) )
Indicates whether, on average, the experimental values lie below or above the equivalent line.
In other words, it indicates whether the model overpredicts (Bf > 1) or underpredicts (Bf < 1) the observed data and, if so, by how much.
Bf = 1: on average, the predicted values agree with the observed ones
Bf > 1: the model predicts larger values than observed
Bf < 1: the model predicts smaller values than observed
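A minimal Python sketch of the bias factor, assuming the usual definition Bf = 10^(mean of Log10(predicted/observed)) and strictly positive values. The function name bias_factor and the numbers are hypothetical.

```python
import math


def bias_factor(observed, predicted):
    """Bf = 10 ** (mean of log10(predicted / observed))."""
    n = len(observed)
    total = sum(math.log10(p / o) for o, p in zip(observed, predicted))
    return 10 ** (total / n)


# Made-up values: every prediction is twice the observation,
# so the model overpredicts and Bf comes out greater than 1.
observed = [1.0, 2.0, 4.0]
predicted = [2.0, 4.0, 8.0]
print(bias_factor(observed, predicted))
```

Because over- and underpredictions cancel in the average, Bf close to 1 does not by itself mean that individual predictions are accurate.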
[Figure: predicted vs. observed values for S. enteritidis; both axes run from -8 to 0]
Evaluation of the goodness of fit
[Figure: predicted vs. observed values for S. enteritidis; both axes run from -8 to 0]
Accuracy Factor
Provides a measure of the average difference between observed and predicted values.
Af = 10^( [ (1/n) Σ_{i=1..n} (Log10(predicted_i / observed_i))^2 ]^(1/2) )
Af = 1: the predicted values agree exactly with the observed ones
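A minimal Python sketch of the accuracy factor, assuming the root-mean-square form Af = 10^sqrt(mean of (Log10(predicted/observed))^2) and strictly positive values. Other definitions based on absolute values also exist, so treat the exact form as an assumption. The function name accuracy_factor and the numbers are hypothetical.

```python
import math


def accuracy_factor(observed, predicted):
    """Af = 10 ** sqrt(mean of log10(predicted / observed) ** 2)."""
    n = len(observed)
    mean_sq = sum(math.log10(p / o) ** 2
                  for o, p in zip(observed, predicted)) / n
    return 10 ** math.sqrt(mean_sq)


# Made-up values: one prediction is 10 times too high and one is
# 10 times too low, so the errors cancel in Bf but not in Af.
observed = [1.0, 1.0]
predicted = [10.0, 0.1]
print(accuracy_factor(observed, predicted))
```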
Empirical Models: Validation
Objective:
to ensure that the model is sufficiently accurate for its intended use.
Procedure
Comparing the obtained data to the corresponding predicted values
Types
1. Internal validation: Uses the data used to obtain the model
2. External validation: Uses new data from additional replicates (recommended). In general, data used in formulating a model should not be used to validate it, if at all possible.
Assessment
- Equivalent graph (predicted vs. observed values).
- Bias factor (Bf)
- Accuracy factor (Af)
[Figure: predicted vs. observed values for S. enteritidis; both axes run from -8 to 0]
Linear and Non-linear regression:
Objective:
To find the values of the parameters of a given equation that best describe the experimental data
Predictive Microbiology
Linear equation: Y = a·X + b
Non-linear equation: Y = exp(a·X + b)
Linear regression:
Analyzes the relationship between two known variables X and Y and tries to find the best straight line through the data.
Y = a·X + b
a = slope
b = intercept
[Figure: straight line fitted to data; axes Y vs. X]
Nonlinear regression:
Fits experimental data to any equation that defines Y as a function of X and one or more parameters. It finds the values of those parameters that generate the curve that comes closest to the data.
Y = exp(a·X + b)
a = parameter
b = parameter
[Figure: curve fitted to data; axes Y vs. X]
Predictive Microbiology
Inactivation models
N_t = N_0 e^(-kt)
[Figure: number of survivors vs. treatment time at 60ºC]
Log10 N_t = Log10 N_0 - A * t
A = 1/D_T
[Figure: survival curve, Log number of survivors vs. treatment time]
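As a sketch of how a D value could be obtained from a survival curve, the Python snippet below fits a straight line to Log10 N_t against t and takes D_T = -1/slope, which follows from Log10 N_t = Log10 N_0 - t/D_T. In terms of the first-order rate constant, A = k/ln(10) ≈ k/2.303. The function name d_value and the counts are hypothetical.

```python
import math


def d_value(times, counts):
    """Estimate D_T as -1/slope of log10(count) versus treatment time."""
    logs = [math.log10(c) for c in counts]
    n = len(times)
    mt = sum(times) / n
    ml = sum(logs) / n
    slope = (sum((t - mt) * (v - ml) for t, v in zip(times, logs))
             / sum((t - mt) ** 2 for t in times))
    return -1.0 / slope


# Made-up survivor counts: one log10 reduction per minute.
times = [0.0, 1.0, 2.0, 3.0]            # treatment time (min)
counts = [1e6, 1e5, 1e4, 1e3]           # survivors (CFU/ml)
print(d_value(times, counts))
```

This only makes sense when the survival curve is close to a straight line in the semi-logarithmic plot.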
Predictive Microbiology
Inactivation models
Log10(N_t/N_0) = -t/D_T
[Figure: survival lines, Log number of survivors vs. treatment time, starting from No; curves labeled 50ºC, 60ºC and 70ºC, with D values labeled D50ºC, D60ºC and D71ºC]
Predictive Microbiology
Inactivation models
[Figure: Log D_T (from -2 to 1) vs. temperature Tª (ºC) at 50, 60 and 70 ºC; z marked on the line]
Log10(D_1/D_2) = (T_2 - T_1)/z
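A minimal Python sketch of the z-value calculation, assuming the usual relation Log10(D_1/D_2) = (T_2 - T_1)/z, that is, z is the temperature increase that reduces D tenfold. The function names and the D values are hypothetical.

```python
import math


def z_value(t1, d1, t2, d2):
    """z from two (temperature, D value) pairs."""
    return (t2 - t1) / math.log10(d1 / d2)


def d_at_temperature(t, t_ref, d_ref, z):
    """D value at temperature t, given a reference pair and z."""
    return d_ref * 10 ** ((t_ref - t) / z)


# Made-up D values: 10 min at 60 ºC and 1 min at 70 ºC.
z = z_value(60.0, 10.0, 70.0, 1.0)
print(z, d_at_temperature(65.0, 60.0, 10.0, z))
```

With more than two temperatures, z would normally be estimated from a straight-line fit of Log10 D against temperature (z = -1/slope).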
Predictive Microbiology
Inactivation models
Deviations from linear behaviour (deviations from first-order kinetics)
[Figure: four survival curves, Log10 Nt/N0 (0 to -5) vs. treatment time (0 to 5): Linear; "Shoulder"; "Tail"; "Shoulder and Tail" or Sigmoidal]
Predictive Microbiology
Inactivation models
[Equations of survival models that deviate from first-order kinetics, one per reference:]
Cole et al. (1993)
Peleg and Penchina (2000)
Pruit and Kamau (1993)
Augustin et al. (1998)
Peleg and Cole (1998)
Predictive Microbiology
Inactivation models
t = treatment time
Peleg and Cole (1998): Log10(N/N_0) = -b·t^n
b = scale factor
n = shape factor
van Boekel (2002): Log10(N/N_0) = -(1/2.303)·(t/α)^β
α = scale factor: time to inactivate the first ln cycle
β = shape factor
Mafart et al. (2002): Log10(N/N_0) = -(t/δ)^p
δ = scale factor = time to inactivate the first log10 cycle
p = shape factor
MODEL BASED ON WEIBULL DISTRIBUTION
Primary model: model based on Weibull distribution
Log10 S(t) = -(t/δ)^p
S(t): Survival fraction
t: Treatment time
δ: Scale factor = time to inactivate the first log10 cycle
p: Shape factor
[Figure: Log10 S(t) vs. time t; the δ value is the time at which Log10 S(t) = -1]
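The Weibull-based primary model can be sketched in Python as follows. The fitting routine assumes a simple linearization, log10(-Log10 S) = p·log10(t) - p·log10(δ), followed by a straight-line fit; this is one possible approach (nonlinear regression is an alternative) and is not taken from the source. The function names and the data are hypothetical.

```python
import math


def log10_survival(t, delta, p):
    """Log10 S(t) = -(t/delta)**p."""
    return -((t / delta) ** p)


def fit_weibull(times, log10_s):
    """Estimate (delta, p); requires t > 0 and Log10 S < 0 throughout."""
    xs = [math.log10(t) for t in times]
    ys = [math.log10(-s) for s in log10_s]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    p = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    intercept = my - p * mx              # equals -p * log10(delta)
    delta = 10 ** (-intercept / p)
    return delta, p


# Synthetic, noise-free data generated with delta = 2.0 and p = 1.5.
times = [0.5, 1.0, 2.0, 4.0]
data = [log10_survival(t, 2.0, 1.5) for t in times]
print(fit_weibull(times, data))
```

With p = 1 the model reduces to the log-linear case and δ plays the role of the D value; p > 1 gives a curve with a shoulder and p < 1 a curve with a tail.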
[Figure: Weibull frequency distribution f(t); x-axis: time (0.0 to 3.5); y-axis: frequency (0.00 to 1.00)]