design of experiments in nonlinear models · luc pronzato (cnrs) design of experiments in nonlinear...
TRANSCRIPT
Design of experimentsin nonlinear models
Luc Pronzato
Université Côte d’Azur, CNRS, I3S, France
Outline I
1 DoE: objectives & examples
2 DoE based on asymptotic normality
3 Construction of (locally) optimal designs
4 Problems with nonlinear models
5 Small-sample properties
6 Nonlocal optimum design
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 2 / 74
1 DoE objectives & examples A/ Parameter estimation
1 DoE: objectives & examples
A/ Parameter estimation
Ex1: Weighing with a two-pan balance☞ Determine the weights of 8 objets, with mass mi , i = 1, . . . , 8i.i.d. errors εi ∼ N (0, σ2)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 3 / 74
1 DoE objectives & examples A/ Parameter estimation
1 DoE: objectives & examples
A/ Parameter estimation
Ex1: Weighing with a two-pan balance☞ Determine the weights of 8 objets, with mass mi , i = 1, . . . , 8i.i.d. errors εi ∼ N (0, σ2)Method a: weigh each objet successively→ y(i) = mi + εi , i = 1, . . . , 8→ estimated weights : m̂i = yi ∼ N (mi , σ
2)Repeat 8 times, average the results: ˆ̂mi ∼ N (mi , σ
2/8) (with 64 observations. . . )
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 3 / 74
1 DoE objectives & examples A/ Parameter estimation
Method b: more sophisticated. . .
y1 = m1 + m2 + m3 + m4 + m5 + m6 + m7 + m8 + ε1
y2 = m1 + m2 + m3 −m4 −m5 −m6 −m7 + m8 + ε2
y3 = m1 −m2 −m3 + m4 + m5 −m6 −m7 + m8 + ε3
y4 = m1 −m2 −m3 −m4 −m5 + m6 + m7 + m8 + ε4
y5 = −m1 + m2 −m3 + m4 −m5 + m6 −m7 + m8 + ε5
y6 = −m1 + m2 −m3 −m4 + m5 −m6 + m7 + m8 + ε6
y7 = −m1 −m2 + m3 + m4 −m5 −m6 + m7 + m8 + ε7
y8 = −m1 −m2 + m3 −m4 + m5 + m6 −m7 + m8 + ε8
→ m̂8 =1
8
8∑
i=1
yi
= m8 +ε1 + ε2 + ε3 + ε4 + ε5 + ε6 + ε7 + ε8
8
∼ N (m8, σ2/8) (idem for all mj , j ≤ 7)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 4 / 74
1 DoE objectives & examples A/ Parameter estimation
Method b: more sophisticated. . .
y1 = m1 + m2 + m3 + m4 + m5 + m6 + m7 + m8 + ε1
y2 = m1 + m2 + m3 −m4 −m5 −m6 −m7 + m8 + ε2
y3 = m1 −m2 −m3 + m4 + m5 −m6 −m7 + m8 + ε3
y4 = m1 −m2 −m3 −m4 −m5 + m6 + m7 + m8 + ε4
y5 = −m1 + m2 −m3 + m4 −m5 + m6 −m7 + m8 + ε5
y6 = −m1 + m2 −m3 −m4 + m5 −m6 + m7 + m8 + ε6
y7 = −m1 −m2 + m3 + m4 −m5 −m6 + m7 + m8 + ε7
y8 = −m1 −m2 + m3 −m4 + m5 + m6 −m7 + m8 + ε8
→ m̂8 =1
8
8∑
i=1
yi
= m8 +ε1 + ε2 + ε3 + ε4 + ε5 + ε6 + ε7 + ε8
8
∼ N (m8, σ2/8) (idem for all mj , j ≤ 7)
➠ 8 observations only, against 64 with method a!
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 4 / 74
1 DoE objectives & examples A/ Parameter estimation
Here, selection of a good design = combinatorial problem
yk =∑8
i=1 fk imi + εk = f⊤k m + εk ,
(e.g., in Method b f2 = [1 1 1 − 1 − 1 − 1 − 1 1]⊤)y = Fm + ε with
method a: Fa = I8
method b: Fb = 8× 8 Hadamard matrix, F⊤b Fb = 8 I8
( = fractional factorial design with 2 levels)
LS estimator m̂ = arg minm
n∑
k=1
[yk − f⊤k m]2
=
(n∑
k=1
fk f⊤k
)−1 n∑
k=1
yk fk = (F⊤F)−1F⊤y
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 5 / 74
1 DoE objectives & examples A/ Parameter estimation
Here, selection of a good design = combinatorial problem
yk =∑8
i=1 fk imi + εk = f⊤k m + εk ,
(e.g., in Method b f2 = [1 1 1 − 1 − 1 − 1 − 1 1]⊤)y = Fm + ε with
method a: Fa = I8
method b: Fb = 8× 8 Hadamard matrix, F⊤b Fb = 8 I8
( = fractional factorial design with 2 levels)
LS estimator m̂ = arg minm
n∑
k=1
[yk − f⊤k m]2
=
(n∑
k=1
fk f⊤k
)−1 n∑
k=1
yk fk = (F⊤F)−1F⊤y
=⇒ Choose the fk ’s such that Mn = 1n
∑nk=1 fk f⊤
k = 1nF⊤F is nonsingular
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 5 / 74
1 DoE objectives & examples A/ Parameter estimation
Here, selection of a good design = combinatorial problem
yk =∑8
i=1 fk imi + εk = f⊤k m + εk ,
(e.g., in Method b f2 = [1 1 1 − 1 − 1 − 1 − 1 1]⊤)y = Fm + ε with
method a: Fa = I8
method b: Fb = 8× 8 Hadamard matrix, F⊤b Fb = 8 I8
( = fractional factorial design with 2 levels)
LS estimator m̂ = arg minm
n∑
k=1
[yk − f⊤k m]2
=
(n∑
k=1
fk f⊤k
)−1 n∑
k=1
yk fk = (F⊤F)−1F⊤y
=⇒ Choose the fk ’s such that Mn = 1n
∑nk=1 fk f⊤
k = 1nF⊤F is nonsingular
E{m̂} = m (no bias)
E{(m̂−m)(m̂−m)⊤} = σ2
nM−1
n
=⇒ minimize a scalar function of M−1n
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 5 / 74
1 DoE objectives & examples A/ Parameter estimation
In this particular situation: combinatorial problem (since fk i ∈ {−1, 0, 1}) [Fisher1925 . . . ]
More generally, when the design variables (inputs) are real numbers, optimumdesign for parameter estimation is obtained by optimization of a scalar function ofthe (asymptotic) covariance matrix of the estimator
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 6 / 74
1 DoE objectives & examples A/ Parameter estimation
Ex2: [D’Argenio 1981]: two-compartment model in pharmacokineticsA product x is injected in blood (→ input u(t)),xC (t) (product in blood) moves to another tissue → xP(t)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 7 / 74
1 DoE objectives & examples A/ Parameter estimation
Ex2: [D’Argenio 1981]: two-compartment model in pharmacokineticsA product x is injected in blood (→ input u(t)),xC (t) (product in blood) moves to another tissue → xP(t)
→ Linear differential equations:
{dxC (t)
dt= (−KEL− KCP)xC (t) + KPC xP(t) + u(t)
dxP (t)dt
= KCPxC (t) − KPC xP(t)
we observe the concentration of x in blood: y(t) = xC (t)/V + ε(t),the errors ǫ(ti)’s are i.i.d. N (0, σ2), σ = 0.2µg/ml
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 7 / 74
1 DoE objectives & examples A/ Parameter estimation
Ex2: [D’Argenio 1981]: two-compartment model in pharmacokineticsA product x is injected in blood (→ input u(t)),xC (t) (product in blood) moves to another tissue → xP(t)
→ Linear differential equations:
{dxC (t)
dt= (−KEL− KCP)xC (t) + KPC xP(t) + u(t)
dxP (t)dt
= KCPxC (t) − KPC xP(t)
we observe the concentration of x in blood: y(t) = xC (t)/V + ε(t),the errors ǫ(ti)’s are i.i.d. N (0, σ2), σ = 0.2µg/ml
There are 4 unknown parameters θ = (KCP ,KPC ,KEL,V )The profile of the input u(t) is given (fast infusion 75 mg/min for 1 min, then1.45 mg/min)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 7 / 74
1 DoE objectives & examples A/ Parameter estimation
Ex2: [D’Argenio 1981]: two-compartment model in pharmacokineticsA product x is injected in blood (→ input u(t)),xC (t) (product in blood) moves to another tissue → xP(t)
→ Linear differential equations:
{dxC (t)
dt= (−KEL− KCP)xC (t) + KPC xP(t) + u(t)
dxP (t)dt
= KCPxC (t) − KPC xP(t)
we observe the concentration of x in blood: y(t) = xC (t)/V + ε(t),the errors ǫ(ti)’s are i.i.d. N (0, σ2), σ = 0.2µg/ml
There are 4 unknown parameters θ = (KCP ,KPC ,KEL,V )The profile of the input u(t) is given (fast infusion 75 mg/min for 1 min, then1.45 mg/min)☞ simulated experiments with «true» parameter values
θ̄ = (0.066 min−1, 0.038 min−1, 0.0242 min−1, 30 l)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 7 / 74
1 DoE objectives & examples A/ Parameter estimation
Experimental variables = sampling times ti , 1 ≤ ti ≤ 720 min• «conventional» design :
t = (5, 10, 30, 60, 120, 180, 360, 720) (in min)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 8 / 74
1 DoE objectives & examples A/ Parameter estimation
Experimental variables = sampling times ti , 1 ≤ ti ≤ 720 min• «conventional» design :
t = (5, 10, 30, 60, 120, 180, 360, 720) (in min)
• «optimal» design (for θ̄) :
t = (1, 1, 10, 10, 74, 74, 720, 720) (in min)
(assumes that independent measurements at the same time are possible)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 8 / 74
1 DoE objectives & examples A/ Parameter estimation
Experimental variables = sampling times ti , 1 ≤ ti ≤ 720 min• «conventional» design :
t = (5, 10, 30, 60, 120, 180, 360, 720) (in min)
• «optimal» design (for θ̄) :
t = (1, 1, 10, 10, 74, 74, 720, 720) (in min)
(assumes that independent measurements at the same time are possible)→ 400 simulations→ 400 sets of 8 observations each, for each design→ 400 parameter estimates (LS) for each design . . .➔ histograms of θ̂i
(and approximated marginals [Pázman & P 1996])
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 8 / 74
1 DoE objectives & examples A/ Parameter estimation
➠ «optimal» design gives more precise estimation
K̂EL [K̄EL = 0.0242]
0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.050
20
40
60
80
100
120
140
160
180
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 9 / 74
1 DoE objectives & examples A/ Parameter estimation
➠ «optimal» design gives more precise estimation
K̂CP [K̄CP = 0.066]
0 0.05 0.1 0.15 0.2 0.250
5
10
15
20
25
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 9 / 74
1 DoE objectives & examples A/ Parameter estimation
➠ «optimal» design gives more precise estimation
K̂PC [K̄PC = 0.038]
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.160
5
10
15
20
25
30
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 9 / 74
1 DoE objectives & examples A/ Parameter estimation
➠ «optimal» design gives more precise estimation
V̂ [V̄ = 30]
0 5 10 15 20 25 30 35 40 45 500
0.05
0.1
0.15
0.2
0.25
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 9 / 74
1 DoE objectives & examples B/ Model discrimination
B/ Model discrimination
Ex3: [Box & Hill 1967] Chemical reaction A→ B
2 design variables: x = (time t, temperature T )reaction of 1st, 2nd, 3rd ou 4th order?→ 4 model structures are candidate:
η(1)(x, θ1) = exp[−θ11t exp(−θ12/T )]
η(2)(x, θ2) =1
1 + θ21t exp(−θ22/T )
η(3)(x, θ3) =1
[1 + 2θ31t exp(−θ32/T )]1/2
η(4)(x, θ4) =1
[1 + 3θ41t exp(−θ42/T )]1/3
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 10 / 74
1 DoE objectives & examples B/ Model discrimination
Simulated experimentObservations with 2nd structure («true»): y(xj) = η(2)(xj , θ̄2) + εj , with➤ θ̄2 = (400, 5000)⊤ the «true» value (unknown) of parameters in model 2➤ (ǫj) i.i.d. N (0, σ2), σ = 0.05Admissible experimental domain: 0 ≤ t ≤ 150, 450 ≤ T ≤ 600
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 11 / 74
1 DoE objectives & examples B/ Model discrimination
Simulated experimentObservations with 2nd structure («true»): y(xj) = η(2)(xj , θ̄2) + εj , with➤ θ̄2 = (400, 5000)⊤ the «true» value (unknown) of parameters in model 2➤ (ǫj) i.i.d. N (0, σ2), σ = 0.05Admissible experimental domain: 0 ≤ t ≤ 150, 450 ≤ T ≤ 600
Sequential design: after the observation of y(xj), j = 1, . . . , k,
• estimate θ̂ki (LS) for i = 1, 2, 3, 4
• compute posterior probability πi(k) that model i is correct for i = 1, 2, 3, 4
Initialization: πi(0) = 1/4, i = 1, . . . , 4 and x1, . . . , x4 are given
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 11 / 74
1 DoE objectives & examples B/ Model discrimination
probabilities πi(k)
0 2 4 6 8 10 12 140
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
π2
π3
π4
π1
→ η(2)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 12 / 74
1 DoE objectives & examples B/ Model discrimination
probabilities πi(k)
0 2 4 6 8 10 12 140
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
π2
π3
π4
π1
→ η(2)
design points xk
0 20 40 60 80 100 120 140 160440
460
480
500
520
540
560
580
600
1
2 3
4
6,9,12,14 10,11,13
5,7,8
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 12 / 74
1 DoE objectives & examples B/ Model discrimination
Design for discrimination is not considered in the following
A simple sequential method for discriminating between two structures η(1)(x, θ1),η(2)(x, θ2) [Atkinson & Fedorov 1975]
After observation of y(x1), . . . , y(xk) estimate θ̂k1 and θ̂k
2 for both models
place next point xk+1 where [η(1)(x, θ̂k1 )− η(2)(x, θ̂k
2 )]2 is maximum
k → k + 1, repeat
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 13 / 74
1 DoE objectives & examples B/ Model discrimination
Design for discrimination is not considered in the following
A simple sequential method for discriminating between two structures η(1)(x, θ1),η(2)(x, θ2) [Atkinson & Fedorov 1975]
After observation of y(x1), . . . , y(xk) estimate θ̂k1 and θ̂k
2 for both models
place next point xk+1 where [η(1)(x, θ̂k1 )− η(2)(x, θ̂k
2 )]2 is maximum
k → k + 1, repeat
If more than two models: estimate θ̂ki for all of them, place next point using the
two models with best and second best fitting(see [Atkinson & Cox 1974; Hill 1978] for surveys)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 13 / 74
2 DoE based on asymptotic normality A/ Regression models
2 DoE based on asymptotic normality
A/ Regression models
yi = y(xi) = η(xi , θ) + εi
System: physical experimental device, experimental conditions xi ,i = 1, 2, . . . , n
Model(θ): mathematical equations, parameters θ = (θ1, . . . , θp)⊤
(response η(x , θ) known explicitly or result of simulation of ODEs or PDEs)
Criterion: similarity between yi = y(xi) and η(xi , θ), i = 1, 2, . . . , n, e.g.,J(θ) = 1
n
∑ni=1[yi − η(xi , θ)]
2 (LS)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 14 / 74
2 DoE based on asymptotic normality A/ Regression models
Remarks:
Model(θ) also provides derivatives∂η(x , θ)/∂θ = (∂η(x , θ)/∂θ1, . . . , ∂η(x , θ)/∂θp)⊤
— plus higher-order derivatives if necessary—via simulation of sensitivity functions, or automatic differentiation (adjointcode)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 15 / 74
2 DoE based on asymptotic normality A/ Regression models
Remarks:
Model(θ) also provides derivatives∂η(x , θ)/∂θ = (∂η(x , θ)/∂θ1, . . . , ∂η(x , θ)/∂θp)⊤
— plus higher-order derivatives if necessary—via simulation of sensitivity functions, or automatic differentiation (adjointcode)
Criterion: other criteria than LS can be used, e.g.,J(θ) = 1
n
∑ni=1 |yi − η(xi , θ)| (→ robust estimation), including
Maximum-Likelihood (ML) estimation in more general settings(J(θ) = 1
n
∑ni=1 log π(yi |θ) → max!)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 15 / 74
2 DoE based on asymptotic normality A/ Regression models
Remarks:
Model(θ) also provides derivatives∂η(x , θ)/∂θ = (∂η(x , θ)/∂θ1, . . . , ∂η(x , θ)/∂θp)⊤
— plus higher-order derivatives if necessary—via simulation of sensitivity functions, or automatic differentiation (adjointcode)
Criterion: other criteria than LS can be used, e.g.,J(θ) = 1
n
∑ni=1 |yi − η(xi , θ)| (→ robust estimation), including
Maximum-Likelihood (ML) estimation in more general settings(J(θ) = 1
n
∑ni=1 log π(yi |θ) → max!)
We always assume independent observations y(xi) (independent εi inregression) — often much more difficult otherwise!
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 15 / 74
2 DoE based on asymptotic normality A/ Regression models
Remarks:
Model(θ) also provides derivatives∂η(x , θ)/∂θ = (∂η(x , θ)/∂θ1, . . . , ∂η(x , θ)/∂θp)⊤
— plus higher-order derivatives if necessary—via simulation of sensitivity functions, or automatic differentiation (adjointcode)
Criterion: other criteria than LS can be used, e.g.,J(θ) = 1
n
∑ni=1 |yi − η(xi , θ)| (→ robust estimation), including
Maximum-Likelihood (ML) estimation in more general settings(J(θ) = 1
n
∑ni=1 log π(yi |θ) → max!)
We always assume independent observations y(xi) (independent εi inregression) — often much more difficult otherwise!
Most of the following can be found in [P & Pázman: Design of Experiments inNonlinear Models, Springer, 2013]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 15 / 74
2 DoE based on asymptotic normality B/ LS estimation
B/ LS estimation
θ̂n = arg minθ1n
∑ni=1[yi − η(xi , θ)]
2
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 16 / 74
2 DoE based on asymptotic normality B/ LS estimation
B/ LS estimation
θ̂n = arg minθ1n
∑ni=1[yi − η(xi , θ)]
2
Linear model: η(x , θ) = f⊤(x)θ → θ̂n = (F⊤F)−1F⊤y ,
with y = (y1, . . . , yn)⊤ and F⊤ = (f(x1), . . . , f(xn))⊤
=⇒ choose the xi such that Mn = 1n
F⊤F has full rank(Mn = normalized information matrix)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 16 / 74
2 DoE based on asymptotic normality B/ LS estimation
B/ LS estimation
θ̂n = arg minθ1n
∑ni=1[yi − η(xi , θ)]
2
Linear model: η(x , θ) = f⊤(x)θ → θ̂n = (F⊤F)−1F⊤y ,
with y = (y1, . . . , yn)⊤ and F⊤ = (f(x1), . . . , f(xn))⊤
=⇒ choose the xi such that Mn = 1n
F⊤F has full rank(Mn = normalized information matrix)
Since yi = f⊤(xi)θ̄ + εi for some θ̄ and E{εi} = 0 for all i , E{y} = Fθ̄and E{θ̂n} = θ̄
Also, Var(θ̂n) = E{(θ̂n − θ̄)(θ̂n − θ̄)⊤} = σ2 (F⊤F)−1 = σ2
nM−1
n when the εi arei.i.d. with finite variance σ2
=⇒ choose the xi to minimize a scalar function of M−1n
(see Example 1: weighing with a two-pan balance)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 16 / 74
2 DoE based on asymptotic normality B/ LS estimation
Nonlinear model: η(x , θ)Under «standard» assumptions (θ ∈ Θ compact, η(x , θ) continuous in θ for allx . . . ) and for a suitable sequence (xi)
θ̂n a.s.→ θ̄ as n→∞ (strong consistency)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 17 / 74
2 DoE based on asymptotic normality B/ LS estimation
Nonlinear model: η(x , θ)Under «standard» assumptions (θ ∈ Θ compact, η(x , θ) continuous in θ for allx . . . ) and for a suitable sequence (xi)
θ̂n a.s.→ θ̄ as n→∞ (strong consistency)
Moreover, under «standard» regularity assumptions (η(x , θ) twice continuouslydifferentiable in θ for all x . . . ), for i.i.d. errors εi with finite variance σ2, for asuitable sequence (xi)√
n(θ̂n − θ̄) d→ N (0, σ2 M−1) as n→∞ (asymptotic normality)
with M = limn→∞1n
∑ni=1
∂η(xi ,θ)∂θ
∣∣θ̄
∂η(xi ,θ)∂θ⊤
∣∣θ̄
(information matrix)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 17 / 74
2 DoE based on asymptotic normality B/ LS estimation
Nonlinear model: η(x , θ)Under «standard» assumptions (θ ∈ Θ compact, η(x , θ) continuous in θ for allx . . . ) and for a suitable sequence (xi)
θ̂n a.s.→ θ̄ as n→∞ (strong consistency)
Moreover, under «standard» regularity assumptions (η(x , θ) twice continuouslydifferentiable in θ for all x . . . ), for i.i.d. errors εi with finite variance σ2, for asuitable sequence (xi)√
n(θ̂n − θ̄) d→ N (0, σ2 M−1) as n→∞ (asymptotic normality)
with M = limn→∞1n
∑ni=1
∂η(xi ,θ)∂θ
∣∣θ̄
∂η(xi ,θ)∂θ⊤
∣∣θ̄
(information matrix)
=⇒ choose the xi (design) to minimize a scalar function of M−1,or maximize a function Φ(M)
= classical approach for DoE(see Example 2 with a two-compartment model)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 17 / 74
2 DoE based on asymptotic normality B/ LS estimation
Remarks:
Weighted LS: suppose heteroscedastic errors
var{εi} = E{ε2i } = E{ε2(xi)} = σ2(xi)
Weighted LS estimator θ̂nWLS minimizes JWLS(θ) = 1
n
∑ni=1 w(xi) [yi − η(xi , θ)]
2
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 18 / 74
2 DoE based on asymptotic normality B/ LS estimation
Remarks:
Weighted LS: suppose heteroscedastic errors
var{εi} = E{ε2i } = E{ε2(xi)} = σ2(xi)
Weighted LS estimator θ̂nWLS minimizes JWLS(θ) = 1
n
∑ni=1 w(xi) [yi − η(xi , θ)]
2
Strong consistency and asymptotic normality√
n(θ̂nWLS − θ̄)
d→ N (0,C) asn→∞, whereC = M−1
a MbM−1a and
Ma = limn→∞1n
∑ni=1 w(xi)
∂η(xi ,θ)∂θ
∣∣θ̄
∂η(xi ,θ)∂θ⊤
∣∣θ̄
Mb = limn→∞1n
∑ni=1 w2(xi)σ
2(xi)∂η(xi ,θ)
∂θ
∣∣θ̄
∂η(xi ,θ)∂θ⊤
∣∣θ̄
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 18 / 74
2 DoE based on asymptotic normality B/ LS estimation
Remarks:
Weighted LS: suppose heteroscedastic errors
var{εi} = E{ε2i } = E{ε2(xi)} = σ2(xi)
Weighted LS estimator θ̂nWLS minimizes JWLS(θ) = 1
n
∑ni=1 w(xi) [yi − η(xi , θ)]
2
Strong consistency and asymptotic normality√
n(θ̂nWLS − θ̄)
d→ N (0,C) asn→∞, whereC = M−1
a MbM−1a and
Ma = limn→∞1n
∑ni=1 w(xi)
∂η(xi ,θ)∂θ
∣∣θ̄
∂η(xi ,θ)∂θ⊤
∣∣θ̄
Mb = limn→∞1n
∑ni=1 w2(xi)σ
2(xi)∂η(xi ,θ)
∂θ
∣∣θ̄
∂η(xi ,θ)∂θ⊤
∣∣θ̄
➠ C �M−1
M = limn→∞1n
∑ni=1
1σ2(xi )
∂η(xi ,θ)∂θ
∣∣θ̄
∂η(xi ,θ)∂θ⊤
∣∣θ̄
➠ C = M−1 when w(x) ∝ σ−2(x)
=⇒ choose the best estimator, then the best design
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 18 / 74
2 DoE based on asymptotic normality B/ LS estimation
Remarks (continued):
One may also consider the case var{εi} = σ2(xi , θ) (errors withparameterized variance)➠ Use two-stage LS: 1/ use w(x) ≡ 1 → θ̂n
(1); 2/ use w(x) = σ−2(x , θ̂n(1))
or use iteratively-reweighted LS (i.e., go on with more stages), orpenalized LS. . .
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 19 / 74
2 DoE based on asymptotic normality B/ LS estimation
Remarks (continued):
One may also consider the case var{εi} = σ2(xi , θ) (errors withparameterized variance)➠ Use two-stage LS: 1/ use w(x) ≡ 1 → θ̂n
(1); 2/ use w(x) = σ−2(x , θ̂n(1))
or use iteratively-reweighted LS (i.e., go on with more stages), orpenalized LS. . .
Similar asymptotic results for ML estimation√
n(θ̂nML − θ̄)
d→ N (0, σ2 M−1F )
as n→∞, with MF = Fisher information matrix
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 19 / 74
2 DoE based on asymptotic normality B/ LS estimation
Remarks (continued):
One may also consider the case var{εi} = σ2(xi , θ) (errors withparameterized variance)➠ Use two-stage LS: 1/ use w(x) ≡ 1 → θ̂n
(1); 2/ use w(x) = σ−2(x , θ̂n(1))
or use iteratively-reweighted LS (i.e., go on with more stages), orpenalized LS. . .
Similar asymptotic results for ML estimation√
n(θ̂nML − θ̄)
d→ N (0, σ2 M−1F )
as n→∞, with MF = Fisher information matrix
Model(θ) = linear ODE, experimental design = system input u(t)➠ (≈ simple) analytic expression for M➠ optimal input design ⇔ optimal control problem (frequency domain →optimal combination of sinusoidal signals) [Goodwin & Payne 1977; Zarrop1979; Ljung 1987; Walter & P 1994, 1997]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 19 / 74
2 DoE based on asymptotic normality C/ Design based on the information matrix
C/ Design based on the information matrix
Maximize Φ(M), but which Φ(·)?➠ There are many possibilities!LS estimation in linear regression with i.i.d. errors N (0, σ2)
R(θ̂n, α) = {θ ∈ Rp : (θ − θ̂n)⊤Mn(θ − θ̂n) ≤ σ2
nχ2
p(1− α)}
= confidence region (ellipsoid) at level α: Prob{θ̄ ∈ R(θ̂n, α)} = α(asymptotically true in nonlinear situations — e.g., nonlinear regression)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 20 / 74
2 DoE based on asymptotic normality C/ Design based on the information matrix
C/ Design based on the information matrix
Maximize Φ(M), but which Φ(·)?➠ There are many possibilities!LS estimation in linear regression with i.i.d. errors N (0, σ2)
R(θ̂n, α) = {θ ∈ Rp : (θ − θ̂n)⊤Mn(θ − θ̂n) ≤ σ2
nχ2
p(1− α)}
= confidence region (ellipsoid) at level α: Prob{θ̄ ∈ R(θ̂n, α)} = α(asymptotically true in nonlinear situations — e.g., nonlinear regression)
➠ Most criteria can be related to geometrical properties of R(θ̂n, α)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 20 / 74
2 DoE based on asymptotic normality C/ Design based on the information matrix
C/ Design based on the information matrix
Maximize Φ(M), but which Φ(·)?➠ There are many possibilities!LS estimation in linear regression with i.i.d. errors N (0, σ2)
R(θ̂n, α) = {θ ∈ Rp : (θ − θ̂n)⊤Mn(θ − θ̂n) ≤ σ2
nχ2
p(1− α)}
= confidence region (ellipsoid) at level α: Prob{θ̄ ∈ R(θ̂n, α)} = α(asymptotically true in nonlinear situations — e.g., nonlinear regression)
➠ Most criteria can be related to geometrical properties of R(θ̂n, α)
➠ Nonlinear model =⇒ M = M(θ) depends on the θ where η(x , θ) is linearized:for the moment use a nominal value θ0
➠ locally optimum design
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 20 / 74
2 DoE based on asymptotic normality C/ Design based on the information matrix
A few choices for Φ(·)
A-optimality: maximize −trace[M−1] ⇔ maximize 1/trace[M−1]⇔ minimize the sum of lengths2 of axes of (asymptotic) confidence ellipsoidsR(θ̂n, α)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 21 / 74
2 DoE based on asymptotic normality C/ Design based on the information matrix
A few choices for Φ(·)
A-optimality: maximize −trace[M−1] ⇔ maximize 1/trace[M−1]⇔ minimize the sum of lengths2 of axes of (asymptotic) confidence ellipsoidsR(θ̂n, α)
E -optimality: maximize λmin(M)⇔ minimize the longest axis of R(θ̂n, α)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 21 / 74
2 DoE based on asymptotic normality C/ Design based on the information matrix
A few choices for Φ(·)
A-optimality: maximize −trace[M−1] ⇔ maximize 1/trace[M−1]⇔ minimize the sum of lengths2 of axes of (asymptotic) confidence ellipsoidsR(θ̂n, α)
E -optimality: maximize λmin(M)⇔ minimize the longest axis of R(θ̂n, α)
D-optimality: maximize log det M⇔ minimize volume of R(θ̂n, α) (proportional to 1/
√det M)
Very much used:
a D-optimum design is invariant by reparametrization
det M′(β(θ)) = det M(θ) det−2(
∂β
∂θ⊤
)
often leads to repeat the same experimental conditions (replications)(remember Ex2: dim(θ) = 4 → 4 different sampling times, severalobservations at each)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 21 / 74
2 DoE based on asymptotic normality C/ Design based on the information matrix
Ds -optimality: only s < p parameters of interest (and p − s «nuisance»
parameters) → θ⊤ = (θ⊤1 , θ
⊤2 ), with θ1 the vector of s parameters of interest
M(θ) =
(M11 M12
M21 M22
)
, M−1(θ) =
(A11 A12
A21 A22
)
withA11 = [M11 − M12M−1
22 M21]−1
A12 = −[M11 − M12M−122 M21]−1M12M−1
22
A21 = −M−122 M21[M11 − M12M−1
22 M21]−1
A22 = M−122 + M−1
22 M21[M11 − M12M−122 M21]−1M12M−1
22
→ maximize ΦDs[M] = det[M11 −M12M−1
22 M21]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 22 / 74
2 DoE based on asymptotic normality C/ Design based on the information matrix
Ds -optimality: only s < p parameters of interest (and p − s «nuisance»
parameters) → θ⊤ = (θ⊤1 , θ
⊤2 ), with θ1 the vector of s parameters of interest
M(θ) =
(M11 M12
M21 M22
)
, M−1(θ) =
(A11 A12
A21 A22
)
withA11 = [M11 − M12M−1
22 M21]−1
A12 = −[M11 − M12M−122 M21]−1M12M−1
22
A21 = −M−122 M21[M11 − M12M−1
22 M21]−1
A22 = M−122 + M−1
22 M21[M11 − M12M−122 M21]−1M12M−1
22
→ maximize ΦDs[M] = det[M11 −M12M−1
22 M21]
➠ Useful for model discrimination:if η(2)(x , θ2) = η(1)(x , θ1) + δ(x , θ2\1) (nested models),
estimate θ2\1 in η(2) to decide whether η(1) or η(2) is moreappropriate, see [Atkinson & Cox 1974]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 22 / 74
3 Construction of (locally) optimal designs A/ Exact design
3 Construction of (locally) optimal designs
A/ Exact design
n observations at Xn = (x1, . . . , xn) in a regression model (for simplicity)Each design point xi can be anything, e.g. a point in a subset X of Rd
☞ maximize Φ(Mn) w.r.t. Xn with Mn = M(Xn, θ0) = 1
n
∑ni=1
∂η(xi ,θ)∂θ
∣∣θ0
∂η(xi ,θ)∂θ⊤
∣∣θ0
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 23 / 74
3 Construction of (locally) optimal designs A/ Exact design
3 Construction of (locally) optimal designs
A/ Exact design
n observations at Xn = (x1, . . . , xn) in a regression model (for simplicity)Each design point xi can be anything, e.g. a point in a subset X of Rd
☞ maximize Φ(Mn) w.r.t. Xn with Mn = M(Xn, θ0) = 1
n
∑ni=1
∂η(xi ,θ)∂θ
∣∣θ0
∂η(xi ,θ)∂θ⊤
∣∣θ0
• If problem dimension n × d not too large → standard algorithm (but withconstraints, local optimas. . . )
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 23 / 74
3 Construction of (locally) optimal designs A/ Exact design
3 Construction of (locally) optimal designs
A/ Exact design
n observations at Xn = (x1, . . . , xn) in a regression model (for simplicity)Each design point xi can be anything, e.g. a point in a subset X of Rd
☞ maximize Φ(Mn) w.r.t. Xn with Mn = M(Xn, θ0) = 1
n
∑ni=1
∂η(xi ,θ)∂θ
∣∣θ0
∂η(xi ,θ)∂θ⊤
∣∣θ0
• If problem dimension n × d not too large → standard algorithm (but withconstraints, local optimas. . . )• Otherwise, use an algorithm that takes the particular form of the problem intoaccount
Exchange methods: at iteration k, exchange one support point xj with a betterone x∗ in X (design space) — better for Φ(·)
X kn = (x1, . . . , xj
lx∗
, . . . , xn)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 23 / 74
3 Construction of (locally) optimal designs A/ Exact design
[Fedorov 1972]: consider all n possible exchanges successively, each timestarting from X k
n , retain the «best» one among these n → X k+1n
X kn = ( x1
lx∗
1
, . . . , xj
lx∗
j
, . . . , xn
lx∗
n
)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 24 / 74
3 Construction of (locally) optimal designs A/ Exact design
[Fedorov 1972]: consider all n possible exchanges successively, each timestarting from X k
n , retain the «best» one among these n → X k+1n
X kn = ( x1
lx∗
1
, . . . , xj
lx∗
j
, . . . , xn
lx∗
n
)
One iteration → n optimizations of dimension d followed by ranking n
criterion values
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 24 / 74
3 Construction of (locally) optimal designs A/ Exact design
[Mitchell, 1974]: DETMAX algorithmIf one additional observation were allowed → optimal choice
X k+n = (x1, . . . , xj , . . . , xn, x
∗n+1)
Then, remove one support point to return to a n-points design:
➔ consider all n + 1 possible cancellations,retain the less penalizing in the sense of Φ(·)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 25 / 74
3 Construction of (locally) optimal designs A/ Exact design
[Mitchell, 1974]: DETMAX algorithmIf one additional observation were allowed → optimal choice
X k+n = (x1, . . . , xj , . . . , xn, x
∗n+1)
Then, remove one support point to return to a n-points design:
➔ consider all n + 1 possible cancellations,retain the less penalizing in the sense of Φ(·)
➔ globally, exchange some xj with x∗n+1
[= excursion of length 1, longer excursions are possible. . . ]One iteration → 1 optimization of dimension d followed by ranking n + 1criterion values
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 25 / 74
3 Construction of (locally) optimal designs A/ Exact design
DETMAX has simpler iterations than Fedorov, but usually requires moreiterations
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 26 / 74
3 Construction of (locally) optimal designs A/ Exact design
DETMAX has simpler iterations than Fedorov, but usually requires moreiterations
dead ends are possible:• DETMAX: the point to be removed is xn+1
• Fedorov: no possible improvement when optimizing one xi at a time
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 26 / 74
3 Construction of (locally) optimal designs A/ Exact design
DETMAX has simpler iterations than Fedorov, but usually requires moreiterations
dead ends are possible:• DETMAX: the point to be removed is xn+1
• Fedorov: no possible improvement when optimizing one xi at a time
▲ both give local optima only ▲
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 26 / 74
3 Construction of (locally) optimal designs A/ Exact design
DETMAX has simpler iterations than Fedorov, but usually requires moreiterations
dead ends are possible:• DETMAX: the point to be removed is xn+1
• Fedorov: no possible improvement when optimizing one xi at a time
▲ both give local optima only ▲
Other methods:
Branch and bound: guaranteed convergence, but complicated [Welch 1982]Rounding an optimal design measure (support points xi and associatedweights w
∗
i , i = 1, . . . , m, presented next in B/):choose n integers ri (ri = nb. of replications of observations at xi ) such that∑m
i=1 ri = n and ri /n ≈ w∗
i
(e.g., maximize mini=1,...,m ri /w∗
i = Adams apportionment, see [Pukelsheim &Reider 1992])
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 26 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
B/ Design measures: approximate design theory
[Chernoff 1953; Kiefer & Wolfowitz 1960, Fedorov 1972; Silvey 1980, Pázman1986, Pukelsheim 1993, Fedorov & Leonov 2014. . . ](nonlinear) regression, n observations at Xn = (x1, . . . , xn) with i.i.d. errors:
M(Xn, θ0) =
1
n
n∑
i=1
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0
(the additive form is essential — related to the independence of observations)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 27 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
B/ Design measures: approximate design theory
[Chernoff 1953; Kiefer & Wolfowitz 1960, Fedorov 1972; Silvey 1980, Pázman1986, Pukelsheim 1993, Fedorov & Leonov 2014. . . ](nonlinear) regression, n observations at Xn = (x1, . . . , xn) with i.i.d. errors:
M(Xn, θ0) =
1
n
n∑
i=1
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0
(the additive form is essential — related to the independence of observations)Suppose that several xi ’s coincide (replications): only m < n different xi ’s
M(Xn, θ0) =
m∑
i=1
ri
n
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0
ri
n= proportion of observations collected at xi
= «percentage of experimental effort» at xi
= weight wi of support point xi
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 27 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
M(Xn, θ0) =
m∑
i=1
wi
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0
➔ design Xn ⇔{
x1 · · · xm
w1 · · · wm
}
with∑m
i=1 wi = 1
➔ normalized discrete distribution on the xi ,
with constraints ri/n = wi
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 28 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
M(Xn, θ0) =
m∑
i=1
wi
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0
➔ design Xn ⇔{
x1 · · · xm
w1 · · · wm
}
with∑m
i=1 wi = 1
➔ normalized discrete distribution on the xi ,
with constraints ri/n = wi
➠ Release the constraints: only enforce wi ≥ 0 with∑m
i=1 wi = 1➔ ξ = discrete probability measure on X (= design space)
support points xi and associated weights wi
= «approximate design»
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 28 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
M(Xn, θ0) =
m∑
i=1
wi
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0
➔ design Xn ⇔{
x1 · · · xm
w1 · · · wm
}
with∑m
i=1 wi = 1
➔ normalized discrete distribution on the xi ,
with constraints ri/n = wi
➠ Release the constraints: only enforce wi ≥ 0 with∑m
i=1 wi = 1➔ ξ = discrete probability measure on X (= design space)
support points xi and associated weights wi
= «approximate design»
More general expression: ξ = any probability measure on X
M(ξ) = M(ξ, θ0) =
∫
X
∂η(x , θ)
∂θ
∣∣∣∣θ0
∂η(x , θ)
∂θ⊤
∣∣∣∣θ0
ξ(dx) ,
∫
X
ξ(dx) = 1
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 28 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
M(ξ) ∈ convex closure of M= set of rank 1 matrices
M(δx ) = ∂η(x ,θ)∂θ
∣∣θ0
∂η(x ,θ)∂θ⊤
∣∣θ0
M(ξ) is symmetric p × p: ∈ q-dimensional space, q = p(p+1)2
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 29 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
M(ξ) ∈ convex closure of M= set of rank 1 matrices
M(δx ) = ∂η(x ,θ)∂θ
∣∣θ0
∂η(x ,θ)∂θ⊤
∣∣θ0
M(ξ) is symmetric p × p: ∈ q-dimensional space, q = p(p+1)2
ξ = w1δx1 + w2δx2 + w3δx3
(3 points are enough for q = 2)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 29 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
Caratheodory Theorem:M(ξ) can be written as the linear combination of at most q + 1 elements of M:
M(ξ) =
m∑
i=1
wi
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0 , m ≤ p(p + 1)
2+ 1
⇒ consider discrete probability measures with p(p+1)2 + 1 support points at most
(true in particular for the optimum design!)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 30 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
Caratheodory Theorem:M(ξ) can be written as the linear combination of at most q + 1 elements of M:
M(ξ) =
m∑
i=1
wi
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0 , m ≤ p(p + 1)
2+ 1
⇒ consider discrete probability measures with p(p+1)2 + 1 support points at most
(true in particular for the optimum design!)[Even better: for many criteria Φ(·), if ξ∗ is optimal (maximizes Φ[M(ξ)]) then M(ξ∗) is
on the boundary of the convex closure of M and p(p+1)2 support points are enough]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 30 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
Caratheodory Theorem:M(ξ) can be written as the linear combination of at most q + 1 elements of M:
M(ξ) =
m∑
i=1
wi
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0 , m ≤ p(p + 1)
2+ 1
⇒ consider discrete probability measures with p(p+1)2 + 1 support points at most
(true in particular for the optimum design!)[Even better: for many criteria Φ(·), if ξ∗ is optimal (maximizes Φ[M(ξ)]) then M(ξ∗) is
on the boundary of the convex closure of M and p(p+1)2 support points are enough]
Suppose we found an optimal ξ∗ =∑m
i=1 w∗i δxi
☞ for a given n, choose the ri so that ri
n≃ w∗
i optimum→ rounding of an approximate design
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 30 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
Caratheodory Theorem:M(ξ) can be written as the linear combination of at most q + 1 elements of M:
M(ξ) =
m∑
i=1
wi
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0 , m ≤ p(p + 1)
2+ 1
⇒ consider discrete probability measures with p(p+1)2 + 1 support points at most
(true in particular for the optimum design!)[Even better: for many criteria Φ(·), if ξ∗ is optimal (maximizes Φ[M(ξ)]) then M(ξ∗) is
on the boundary of the convex closure of M and p(p+1)2 support points are enough]
Suppose we found an optimal ξ∗ =∑m
i=1 w∗i δxi
☞ for a given n, choose the ri so that ri
n≃ w∗
i optimum→ rounding of an approximate design
☞ Sometimes, ξ∗ can be implemented without any approximation: ξ = power
spectral density of an input signal→ design of optimal input for ODE model in the frequency domain
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 30 / 74
3 Construction of (locally) optimal designs B/ Design measures: approximate design theory
Caratheodory Theorem:M(ξ) can be written as the linear combination of at most q + 1 elements of M:
M(ξ) =
m∑
i=1
wi
∂η(xi , θ)
∂θ
∣∣θ0
∂η(xi , θ)
∂θ⊤
∣∣θ0 , m ≤ p(p + 1)
2+ 1
⇒ consider discrete probability measures with p(p+1)2 + 1 support points at most
(true in particular for the optimum design!)[Even better: for many criteria Φ(·), if ξ∗ is optimal (maximizes Φ[M(ξ)]) then M(ξ∗) is
on the boundary of the convex closure of M and p(p+1)2 support points are enough]
Suppose we found an optimal ξ∗ =∑m
i=1 w∗i δxi
☞ for a given n, choose the ri so that ri
n≃ w∗
i optimum→ rounding of an approximate design
☞ Sometimes, ξ∗ can be implemented without any approximation: ξ = power
spectral density of an input signal→ design of optimal input for ODE model in the frequency domain
Why design measures are interesting?How does it simplify the optimization problem?
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 30 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
C/ Optimal design measures
➠ Maximize Φ(·) concave w.r.t. M(ξ) in a convex setEx: D-optimality: ∀ M1 ≻ O, M2 � O, with M2 6∝ M2, ∀α, 0 < α < 1,log det[(1− α)M1 + αM2] > (1− α) log det M1 + α log det M2
⇒ log det[·] is (strictly) concaveconvex set + concave criterion ⇒ one unique optimum!
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 31 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
C/ Optimal design measures
➠ Maximize Φ(·) concave w.r.t. M(ξ) in a convex setEx: D-optimality: ∀ M1 ≻ O, M2 � O, with M2 6∝ M2, ∀α, 0 < α < 1,log det[(1− α)M1 + αM2] > (1− α) log det M1 + α log det M2
⇒ log det[·] is (strictly) concaveconvex set + concave criterion ⇒ one unique optimum!
ξ∗ is optimal ⇔ directional derivative≤ 0 in all directions=⇒ “Equivalence Theorem” [Kiefer &Wolfowitz 1960]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 31 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Ξ = set of probability measures on X , Φ(·) concave, φ(ξ) = Φ[M(ξ)]
Fφ(ξ; ν) = limα→0+φ[(1−α)ξ+αν]−φ(ξ)
α= directional derivative of φ(·) at ξ in direction ν
Equivalence Theorem: ξ∗ maximizes φ(ξ) ⇔ maxν∈Ξ Fφ(ξ∗; ν) ≤ 0
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 32 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Ξ = set of probability measures on X , Φ(·) concave, φ(ξ) = Φ[M(ξ)]
Fφ(ξ; ν) = limα→0+φ[(1−α)ξ+αν]−φ(ξ)
α= directional derivative of φ(·) at ξ in direction ν
Equivalence Theorem: ξ∗ maximizes φ(ξ) ⇔ maxν∈Ξ Fφ(ξ∗; ν) ≤ 0
➔ Takes a simple form when Φ(·) is differentiable
ξ∗ maximizes φ(ξ) ⇔ maxx∈X Fφ(ξ∗; δx ) ≤ 0
☞ Check optimality of ξ∗ by plotting Fφ(ξ∗; δx )
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 32 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Ξ = set of probability measures on X , Φ(·) concave, φ(ξ) = Φ[M(ξ)]
Fφ(ξ; ν) = limα→0+φ[(1−α)ξ+αν]−φ(ξ)
α= directional derivative of φ(·) at ξ in direction ν
Equivalence Theorem: ξ∗ maximizes φ(ξ) ⇔ maxν∈Ξ Fφ(ξ∗; ν) ≤ 0
➔ Takes a simple form when Φ(·) is differentiable
ξ∗ maximizes φ(ξ) ⇔ maxx∈X Fφ(ξ∗; δx ) ≤ 0
☞ Check optimality of ξ∗ by plotting Fφ(ξ∗; δx )
Ex: D-optimal design
ξ∗D maximizes log det[M(ξ)] w.r.t. ξ ∈ Ξ
⇔ maxx∈X d(ξ∗D , x) ≤ p
⇔ ξ∗D minimizes maxx∈X d(ξ, x) w.r.t. ξ ∈ Ξ
where d(ξ, x) = ∂η(x ,θ)∂θ⊤
∣∣θ0M
−1(ξ) ∂η(x ,θ)∂θ
∣∣θ0
Moreover, d(ξ∗D , xi) = p = dim(θ) for any xi = support point of ξ∗
D
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 32 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Ex: η(x , θ) = θ1 exp(−θ2x) (p = 2) i.i.d. errors, X = R+
[θ02 = 2]
→ d(ξ, x) as a function of x
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 33 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Ex: η(x , θ) = θ1 exp(−θ2x) (p = 2) i.i.d. errors, X = R+
[θ02 = 2]
→ d(ξ, x) as a function of x
ξ2 =
{0.01 0.751/2 1/2
}
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20
0.5
1
1.5
2
2.5
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 33 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Ex: η(x , θ) = θ1 exp(−θ2x) (p = 2) i.i.d. errors, X = R+
[θ02 = 2]
→ d(ξ, x) as a function of x
ξ2 =
{0.01 0.751/2 1/2
}
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20
0.5
1
1.5
2
2.5
ξ∗D =
{0 1/θ2 = 0.5
1/2 1/2
}
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20
0.5
1
1.5
2
2.5
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 33 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
KW Eq. Th. relates optimality in θ space to optimality in y space (i.i.d. errors)
n var[η(x , θ̂n)]→ σ2 ∂η(x ,θ)∂θ⊤
∣∣θ̄M−1(ξ, θ̄) ∂η(x ,θ)
∂θ
∣∣θ̄
= σ2 d(ξ, x)∣∣θ̄, n→∞
D-optimality ⇔ G-optimality
➠ ξ∗D minimizes the maximum value of prediction variance over X
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 34 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
KW Eq. Th. relates optimality in θ space to optimality in y space (i.i.d. errors)
n var[η(x , θ̂n)]→ σ2 ∂η(x ,θ)∂θ⊤
∣∣θ̄M−1(ξ, θ̄) ∂η(x ,θ)
∂θ
∣∣θ̄
= σ2 d(ξ, x)∣∣θ̄, n→∞
D-optimality ⇔ G-optimality
➠ ξ∗D minimizes the maximum value of prediction variance over X
η(x , θ̄), η(x , θ̄)± 2 st.d.
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2−0.2
0
0.2
0.4
0.6
0.8
1
1.2
1.4
→ put next observation where d(ξ, x) is large
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 34 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Remark:Eq. Th. = stationarity condition = NS condition for optimality
6= duality property!
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 35 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Remark:Eq. Th. = stationarity condition = NS condition for optimality
6= duality property!
Dual problem to D-optimum design:
Define S ={
∂η(x ,θ)∂θ
∣∣θ0 , x ∈X
}
[S ∪ −S = Elfving’s set]
E∗ = minimum-volume ellipsoid centered at 0 that contains S
Lagrangian theory ⇒ E∗ = {z ∈ Rp : z⊤M−1F (ξ∗
D)z ≤ p} where ξ∗D is D-optimum
support points of ξ∗D = contact between E∗ and S
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 35 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Remark:Eq. Th. = stationarity condition = NS condition for optimality
6= duality property!
Dual problem to D-optimum design:
Define S ={
∂η(x ,θ)∂θ
∣∣θ0 , x ∈X
}
[S ∪ −S = Elfving’s set]
E∗ = minimum-volume ellipsoid centered at 0 that contains S
Lagrangian theory ⇒ E∗ = {z ∈ Rp : z⊤M−1F (ξ∗
D)z ≤ p} where ξ∗D is D-optimum
support points of ξ∗D = contact between E∗ and S
In general, few contact points → repeat observations at the same place (see [Yang2010, Dette & Melas 2011])
There exist dual problems for other criteria Φ(·)
(= one of the main topics in [Pukelsheim 1993])
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 35 / 74
3 Construction of (locally) optimal designs C/ Optimal design measures
Ex: η(x , θ) = θ1θ1−θ2
[exp(−θ2x)− exp(−θ1x)]
θ = (1, 5), X = R+
−0.2 −0.15 −0.1 −0.05 0 0.05 0.1 0.15−0.2
−0.15
−0.1
−0.05
0
0.05
0.1
0.15
0.2
E∗ and S
⇒ D optimum design ξ∗D supported on two points
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 36 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
D/ Construction of an optimal design measure
Ξ = set of probability measures on X , Φ(·) concave and differentiable,φ(ξ) = Φ[M(ξ)]
Concavity =⇒ for any ξ ∈ Ξ, φ(ξ∗) ≤ φ(ξ) + maxx∈X Fφ(ξ; δx )
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 37 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
D/ Construction of an optimal design measure
Ξ = set of probability measures on X , Φ(·) concave and differentiable,φ(ξ) = Φ[M(ξ)]
Concavity =⇒ for any ξ ∈ Ξ, φ(ξ∗) ≤ φ(ξ) + maxx∈X Fφ(ξ; δx )
Fedorov–Wynn Algorithm: sort of steepest ascent
1 : Choose ξ1 not degenerate (det M(ξ1) > 0)
2 : Compute x∗k = arg maxX Fφ(ξk ; δx )
If Fφ(ξk ; δx+k
) < ǫ, stop: ξk is ǫ-optimal
3 : ξk+1 = (1− αk)ξk + αkδx∗
k(delta measure at x∗
k )[Vertex Direction]
k → k + 1, return to Step 2
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 37 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
D/ Construction of an optimal design measure
Ξ = set of probability measures on X , Φ(·) concave and differentiable,φ(ξ) = Φ[M(ξ)]
Concavity =⇒ for any ξ ∈ Ξ, φ(ξ∗) ≤ φ(ξ) + maxx∈X Fφ(ξ; δx )
Fedorov–Wynn Algorithm: sort of steepest ascent
1 : Choose ξ1 not degenerate (det M(ξ1) > 0)
2 : Compute x∗k = arg maxX Fφ(ξk ; δx )
If Fφ(ξk ; δx+k
) < ǫ, stop: ξk is ǫ-optimal
3 : ξk+1 = (1− αk)ξk + αkδx∗
k(delta measure at x∗
k )[Vertex Direction]
k → k + 1, return to Step 2
Step-size αk?
➠ αk = arg max φ(ξk+1) [=d(ξk ,x∗
k)−p
p[d(ξk ,x∗
k)−1]
for D-optimal design [Fedorov 1972]]
➔ monotone convergence
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 37 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
D/ Construction of an optimal design measure
Ξ = set of probability measures on X , Φ(·) concave and differentiable,φ(ξ) = Φ[M(ξ)]
Concavity =⇒ for any ξ ∈ Ξ, φ(ξ∗) ≤ φ(ξ) + maxx∈X Fφ(ξ; δx )
Fedorov–Wynn Algorithm: sort of steepest ascent
1 : Choose ξ1 not degenerate (det M(ξ1) > 0)
2 : Compute x∗k = arg maxX Fφ(ξk ; δx )
If Fφ(ξk ; δx+k
) < ǫ, stop: ξk is ǫ-optimal
3 : ξk+1 = (1− αk)ξk + αkδx∗
k(delta measure at x∗
k )[Vertex Direction]
k → k + 1, return to Step 2
Step-size αk?
➠ αk = arg max φ(ξk+1) [=d(ξk ,x∗
k)−p
p[d(ξk ,x∗
k)−1]
for D-optimal design [Fedorov 1972]]
➔ monotone convergence➠ αk > 0 , limk→∞ αk = 0 ,
∑∞i=1 αk =∞ [[Wynn 1970] for D-optimal design]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 37 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
Remarks:
Consider sequential design, one xi at a time enters M(X )M(Xk+1) = k
k+1 M(Xk)
+ 1k+1
∂η(xk+1,θ)∂θ
∣∣θ0
∂η(xk+1,θ)∂θ⊤
∣∣θ0
with xk+1 = arg maxX Fφ(ξk ; δx )⇔ Wynn algorithm with αk = 1
k+1
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 38 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
Remarks:
Consider sequential design, one xi at a time enters M(X )M(Xk+1) = k
k+1 M(Xk)
+ 1k+1
∂η(xk+1,θ)∂θ
∣∣θ0
∂η(xk+1,θ)∂θ⊤
∣∣θ0
with xk+1 = arg maxX Fφ(ξk ; δx )⇔ Wynn algorithm with αk = 1
k+1
Guaranteed convergence to the optimum
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 38 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
Remarks:
Consider sequential design, one xi at a time enters M(X )M(Xk+1) = k
k+1 M(Xk)
+ 1k+1
∂η(xk+1,θ)∂θ
∣∣θ0
∂η(xk+1,θ)∂θ⊤
∣∣θ0
with xk+1 = arg maxX Fφ(ξk ; δx )⇔ Wynn algorithm with αk = 1
k+1
Guaranteed convergence to the optimum
There exist faster methods:
remove support points from ξk (≈ allow αk to be < 0) [Atwood 1973;Böhning 1985, 1986]combine with gradient projection (or a second-order method) [Wu 1978]use a multiplicative algorithm [Titterington 1976; Torsney 1983–2009; Yu2010] [for D or A optimal design, far from the optimum]combine different methods [Yu 2011]Still an active topic. . .
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 38 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
Remarks: Usually, X = compact subset of Rd (e.g., the probability simplex formixture experiments)→ discretized into Xℓ with ℓ elements (a grid — or better, a low-discrepancysequence, see [Niederreiter 1992])➔ the algorithms above may be slow when ℓ is large
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 39 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
Remarks: Usually, X = compact subset of Rd (e.g., the probability simplex formixture experiments)→ discretized into Xℓ with ℓ elements (a grid — or better, a low-discrepancysequence, see [Niederreiter 1992])➔ the algorithms above may be slow when ℓ is large
➠ Combine continuous search for support points in X with optimization of adesign measure with few support points, say m≪ ℓ➠ Exploit guaranteed (and fast) convergence of algorithms for m small
+ use Eq. Th. to check optimality [Yang et al., 2013, P & Zhigljavsky, 2014]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 39 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
Ex: D-optimal design for
η(x , θ) = θ0 + θ1 exp(−θ2x1) +θ3
θ3 − θ4[exp(−θ4x2)− exp(−θ3x2)]
with x = (x1, x2) ∈X = [0, 2]× [0, 10] (and p = 5, θ02 = 2, θ0
3 = 0.7, θ04 = 0.2)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 40 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
Ex: D-optimal design for
η(x , θ) = θ0 + θ1 exp(−θ2x1) +θ3
θ3 − θ4[exp(−θ4x2)− exp(−θ3x2)]
with x = (x1, x2) ∈X = [0, 2]× [0, 10] (and p = 5, θ02 = 2, θ0
3 = 0.7, θ04 = 0.2)
Additive model [Schwabe 1995]: ξ∗D = tensor product of optimal designs for
β(1)0 + β
(1)1 exp(−β(1)
2 x1)and
β(2)0 + β
(2)1 [exp(−β(2)
2 x2)− exp(−β(2)1 x2)]/(β
(2)1 − β
(2)2 )
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 40 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
Ex: D-optimal design for
η(x , θ) = θ0 + θ1 exp(−θ2x1) +θ3
θ3 − θ4[exp(−θ4x2)− exp(−θ3x2)]
with x = (x1, x2) ∈X = [0, 2]× [0, 10] (and p = 5, θ02 = 2, θ0
3 = 0.7, θ04 = 0.2)
Additive model [Schwabe 1995]: ξ∗D = tensor product of optimal designs for
β(1)0 + β
(1)1 exp(−β(1)
2 x1)and
β(2)0 + β
(2)1 [exp(−β(2)
2 x2)− exp(−β(2)1 x2)]/(β
(2)1 − β
(2)2 )
Use the Equivalence Th. to construct ξ∗D (with arbitrary precision — Maple)
[weight 1/9 at (0, 0.46268527927, 2) ⊗ (0, 1.22947139883, 6.85768905493)]
➠ 7 iterations of the algorithm in [P & Zhigljavsky, 2014] yield ξ such thatmaxx∈X Fφ(ξ; δx ) < 10−5
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 40 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
What if Φ(·) not differentiable? (e.g., maximize Φ(M) = λmin(M))Φ(·) concave, X discretized into Xℓ, ℓ not too large➔ optimal design ⇐⇒ optimal vector of weights w ∈ R
ℓ
wi ≥ 0,∑ℓ
i=1 wi = 1
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 41 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
What if Φ(·) not differentiable? (e.g., maximize Φ(M) = λmin(M))Φ(·) concave, X discretized into Xℓ, ℓ not too large➔ optimal design ⇐⇒ optimal vector of weights w ∈ R
ℓ
wi ≥ 0,∑ℓ
i=1 wi = 1
subgradients (↔ directional derivatives)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 41 / 74
3 Construction of (locally) optimal designs D/ Construction of an optimal design measure
What if Φ(·) not differentiable? (e.g., maximize Φ(M) = λmin(M))Φ(·) concave, X discretized into Xℓ, ℓ not too large➔ optimal design ⇐⇒ optimal vector of weights w ∈ R
ℓ
wi ≥ 0,∑ℓ
i=1 wi = 1
subgradients (↔ directional derivatives)
general method for non-differentiable optimization (cutting plane method[Kelley 1960], level method [Nesterov 2004]), see Chap. 9 of [P & Pázman2013]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 41 / 74
4 Problems with nonlinear models
4 Problems with nonlinear models
Linear regression: easy
The expectation surface Sη = {η(θ) = (η(x1, θ), . . . , η(xn, θ))⊤ : θ ∈ R
p} is flatand linearly parameterized
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 42 / 74
4 Problems with nonlinear models
Nonlinear regression: may be a bit tricky
Sη is curved (intrinsic curvature) and nonlinearly parameterized (parametriccurvature) [Bates & Watts 1980]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 43 / 74
4 Problems with nonlinear models
Ex: η(x, θ) = θ1{x}1 + θ31(1− {x}1) + θ2{x}2 + θ2
2(1− {x}2)X = (x1, x2, x3), x1 = (0 1), x2 = (1 0), x3 = (1 1), θ ∈ [−3, 4]× [−2, 2]
−40−20
020
4060
80
−4
−2
0
2
4
6
8−6
−4
−2
0
2
4
6
η1
η2
η3
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 44 / 74
4 Problems with nonlinear models
Two important difficulties:
❶ Asymptotically (n→∞) — or if σ2 small enough — all seems fine(use linear approximations),
but the distribution of θ̂n may be far from normal for small n (or for σ2 large)➠ small-sample properties
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 45 / 74
4 Problems with nonlinear models
Two important difficulties:
❶ Asymptotically (n→∞) — or if σ2 small enough — all seems fine(use linear approximations),
but the distribution of θ̂n may be far from normal for small n (or for σ2 large)➠ small-sample properties
❷ Everything is local (depends on θ): if we linearize, where do we linearize?(choice of a nominal value θ0)
➠ nonlocal optimum design
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 45 / 74
5 Small-sample properties A/ A classification of regression models
5 Small-sample properties
A/ A classification of regression models
Suppose that
yi = y(xi) = η(xi , θ̄) + εi with E{εi} = 0 and E{ε2i } = σ2(xi) for all i
Divide yi and η(xi , θ̄) by σ(xi) ➔ one may suppose that σ2(x) = σ2 for all x
Denotey = (y1, . . . , yn)⊤ and η(θ) = (η(xi , θ), . . . , η(xn, θ))
⊤
ε = (ε1, . . . , εn)⊤ so that E{ε} = 0 and Var(ε) = σ2 In
We suppose η(x , θ) twice continuously differentiable w.r.t. θ for any x
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 46 / 74
5 Small-sample properties A/ A classification of regression models
5 Small-sample properties
A/ A classification of regression models
Suppose that
yi = y(xi) = η(xi , θ̄) + εi with E{εi} = 0 and E{ε2i } = σ2(xi) for all i
Divide yi and η(xi , θ̄) by σ(xi) ➔ one may suppose that σ2(x) = σ2 for all x
Denotey = (y1, . . . , yn)⊤ and η(θ) = (η(xi , θ), . . . , η(xn, θ))
⊤
ε = (ε1, . . . , εn)⊤ so that E{ε} = 0 and Var(ε) = σ2 In
We suppose η(x , θ) twice continuously differentiable w.r.t. θ for any x
➤ Expectation surface: Sη = {η(θ) : θ ∈ Rp}
➤ Orthogonal projector onto the tangent space to Sη at η(θ):
Pθ =1
n
∂η(θ)
∂θ⊤M−1(X , θ)
∂η(θ)
∂θ(a n × n matrix)
(both depend on X )Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 46 / 74
5 Small-sample properties A/ A classification of regression models
A classification of regression models [Pázman 1993]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 47 / 74
5 Small-sample properties A/ A classification of regression models
Intrinsically linear models
➤ The expectation surface Sη = {η(θ) : θ ∈ Rp} is flat (plane) — intrinsic
curvature ≡ 0➤ A reparameterization (continuously differentiable) exists that makes the modellinear➤ PθH�
ij(θ) = H�
ij(θ), where H�
ij(θ) = ∂2η(θ)∂θi ∂θj
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 48 / 74
5 Small-sample properties A/ A classification of regression models
Intrinsically linear models
➤ The expectation surface Sη = {η(θ) : θ ∈ Rp} is flat (plane) — intrinsic
curvature ≡ 0➤ A reparameterization (continuously differentiable) exists that makes the modellinear➤ PθH�
ij(θ) = H�
ij(θ), where H�
ij(θ) = ∂2η(θ)∂θi ∂θj
Observing at p different xi only (replications) makes the model intrinsically linear
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 48 / 74
5 Small-sample properties A/ A classification of regression models
Parametrically linear models
➤ M(X , θ) = constant➤ PθH�
ij(θ) = 0 — parametric curvature ≡ 0
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 49 / 74
5 Small-sample properties A/ A classification of regression models
Parametrically linear models
➤ M(X , θ) = constant➤ PθH�
ij(θ) = 0 — parametric curvature ≡ 0
Linear models
➤ η(x , θ) = f⊤(x)θ + c(x)➤ the model is intrinsically and parametrically linear
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 49 / 74
5 Small-sample properties A/ A classification of regression models
Parametrically linear models
➤ M(X , θ) = constant➤ PθH�
ij(θ) = 0 — parametric curvature ≡ 0
Linear models
➤ η(x , θ) = f⊤(x)θ + c(x)➤ the model is intrinsically and parametrically linear
Flat models
➤ A reparameterization exists that makes the information matrix constant➤ Riemannian curvature tensor ≡ 0 Rhijk(θ) = Thjik(θ)− Thkij(θ) ≡ 0 whereThjik(θ) = [H�
hj(θ)]⊤[In − Pθ]H�
ik(θ)If all parameters but one appear linearly, then the model is flat
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 49 / 74
5 Small-sample properties A/ A classification of regression models
A classification of regression models [Pázman 1993] (bis)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 50 / 74
5 Small-sample properties B/ Density of the LS estimator
B/ Density of the LS estimator
Suppose ε ∼ N (0, σ2In)Intrinsically linear models (in particular, repetitions at p points):
→ exact distribution θ̂n ∼ q(θ|θ̄) = np/2 det1/2 M(X ,θ)(2π)p/2 σp exp
{− 1
2σ2 ‖η(θ)− η(θ̄)‖2}
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 51 / 74
5 Small-sample properties B/ Density of the LS estimator
B/ Density of the LS estimator
Suppose ε ∼ N (0, σ2In)Intrinsically linear models (in particular, repetitions at p points):
→ exact distribution θ̂n ∼ q(θ|θ̄) = np/2 det1/2 M(X ,θ)(2π)p/2 σp exp
{− 1
2σ2 ‖η(θ)− η(θ̄)‖2}
Ex: η(x , θ) = exp(−θx), θ̄ = 2, 15 observations at the same x = 1/2 (σ2 = 1)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 51 / 74
5 Small-sample properties B/ Density of the LS estimator
B/ Density of the LS estimator
Suppose ε ∼ N (0, σ2In)Intrinsically linear models (in particular, repetitions at p points):
→ exact distribution θ̂n ∼ q(θ|θ̄) = np/2 det1/2 M(X ,θ)(2π)p/2 σp exp
{− 1
2σ2 ‖η(θ)− η(θ̄)‖2}
Ex: η(x , θ) = x θ3, θ̄ = 0, all observations at the same x 6= 0
−2 −1.5 −1 −0.5 0 0.5 1 1.5 20
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 51 / 74
5 Small-sample properties B/ Density of the LS estimator
Flat models: approximate density of θ̂n
q(θ|θ̄) = det[Q(θ,θ̄)](2π)p/2 σp np/2 det1/2 M(X ,θ)
exp{− 1
2σ2 ‖Pθ[η(θ)− η(θ̄)]‖2}
where {Q(θ, θ̄)}ij = {n M(X , θ)}ij + [η(θ)− η(θ̄)]⊤[In − Pθ]H�
ij(θ)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 52 / 74
5 Small-sample properties B/ Density of the LS estimator
Flat models: approximate density of θ̂n
q(θ|θ̄) = det[Q(θ,θ̄)](2π)p/2 σp np/2 det1/2 M(X ,θ)
exp{− 1
2σ2 ‖Pθ[η(θ)− η(θ̄)]‖2}
where {Q(θ, θ̄)}ij = {n M(X , θ)}ij + [η(θ)− η(θ̄)]⊤[In − Pθ]H�
ij(θ)
Remarks:
This approximation coincides with the saddle-point approximation ofHougaard (1985)
Other approximations (more complicated) for models with Rhijk(θ) 6≡ 0(non-flat)
An approximation of the density of the penalized LS estimatorarg minθ∈Θ
{‖y− η(θ)‖2 + 2w(θ)
}(which includes the case of Bayesian
estimation) is also available
We also know the (approximate) marginal densities of the LS estimator θ̂n
[Pázman & P 1996]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 52 / 74
5 Small-sample properties C/ Confidence regions
C/ Confidence regions
Suppose ε ∼ N (0, σ2In), define e(θ) = y− η(θ)➔ e⊤(θ̄) Pθ̄ e(θ̄)/σ2 ∼ χ2
p
➔ e⊤(θ̄) [In − Pθ̄] e(θ̄)/σ2 ∼ χ2n−p
and they are independent
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 53 / 74
5 Small-sample properties C/ Confidence regions
C/ Confidence regions
Suppose ε ∼ N (0, σ2In), define e(θ) = y− η(θ)➔ e⊤(θ̄) Pθ̄ e(θ̄)/σ2 ∼ χ2
p
➔ e⊤(θ̄) [In − Pθ̄] e(θ̄)/σ2 ∼ χ2n−p
and they are independent
➠ exact confidence regions at level α{θ ∈ R
p : e⊤(θ) Pθ e(θ)/σ2 < χ2p[1− α]
}(if σ2 known)
{
θ ∈ Rp : n−p
p
e⊤(θ) Pθ e(θ)e⊤(θ) [In−Pθ ] e(θ) < Fp,n−p[1− α]
}
(if σ2 unknown)
(but they are not of minimum volume, maybe composed of disconnectedsubsets. . . )
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 53 / 74
5 Small-sample properties C/ Confidence regions
C/ Confidence regions
Suppose ε ∼ N (0, σ2In), define e(θ) = y− η(θ)➔ e⊤(θ̄) Pθ̄ e(θ̄)/σ2 ∼ χ2
p
➔ e⊤(θ̄) [In − Pθ̄] e(θ̄)/σ2 ∼ χ2n−p
and they are independent
➠ exact confidence regions at level α{θ ∈ R
p : e⊤(θ) Pθ e(θ)/σ2 < χ2p[1− α]
}(if σ2 known)
{
θ ∈ Rp : n−p
p
e⊤(θ) Pθ e(θ)e⊤(θ) [In−Pθ ] e(θ) < Fp,n−p[1− α]
}
(if σ2 unknown)
(but they are not of minimum volume, maybe composed of disconnectedsubsets. . . )
➠ approximate confidence regions based on likelihood ratio (usually connected):{
θ ∈ Rp : ‖e(θ)‖2 − ‖e(θ̂)‖2 < σ2χ2
p[1− α]}
(if σ2 known){
θ ∈ Rp : ‖e(θ)‖2/‖e(θ̂)‖2 < 1 + p
n−pFp,n−p[1− α]
}
(if σ2 unknown)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 53 / 74
5 Small-sample properties D/ Design based on small-sample properties
D/ Design based on small-sample properties
3 main ideas (exact design only) based on:
a) (approximate) volume of (approximate) confidence regions (not necessarily ofminimum volume) [Hamilton & Watts 1985; Vila 1990; Vila & Gauchi 2007]
(ellipsoidal approximation → D-optimality)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 54 / 74
5 Small-sample properties D/ Design based on small-sample properties
D/ Design based on small-sample properties
3 main ideas (exact design only) based on:
a) (approximate) volume of (approximate) confidence regions (not necessarily ofminimum volume) [Hamilton & Watts 1985; Vila 1990; Vila & Gauchi 2007]
(ellipsoidal approximation → D-optimality)
b) (approximate or exact) density of θ̂n
e.g., minimize∫‖θ − θ̄‖2q(θ|θ̄) dθ w.r.t. X (using stochastic approximation,
[Pázman & P 1992, Gauchi & Pázman 2006])
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 54 / 74
5 Small-sample properties D/ Design based on small-sample properties
D/ Design based on small-sample properties
3 main ideas (exact design only) based on:
a) (approximate) volume of (approximate) confidence regions (not necessarily ofminimum volume) [Hamilton & Watts 1985; Vila 1990; Vila & Gauchi 2007]
(ellipsoidal approximation → D-optimality)
b) (approximate or exact) density of θ̂n
e.g., minimize∫‖θ − θ̄‖2q(θ|θ̄) dθ w.r.t. X (using stochastic approximation,
[Pázman & P 1992, Gauchi & Pázman 2006])
c) higher-order approximation of optimality criteria,using ϕ(y|X , θ̄) = N (η(θ̄), σ2In)
minimize MSE∫‖θ̂n(y)− θ̄‖2ϕ(y|X , θ̄) dy [Clarke 1980]
minimize entropy −∫
log[q(θ̂n(y)|θ̄)]ϕ(y|X , θ̄) dy [P & Pázman 1994]
(usual normal approximation for q(·|θ̄) → D-optimality)→ explicit (but rather complicated) expressions (depend on 3rd-orderderivatives of η(x , θ) w.r.t. θ)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 54 / 74
5 Small-sample properties E/ One additional difficulty
E/ One additional difficulty
Overlapping of Sη, local minimizers. . .
−2 −1.5 −1 −0.5 0 0.5 1 1.5 20
0.5
1
1.5
2
2.5
3
3.5
4
+
+
+B
C
A
▲ Important and difficult problem, often neglected!
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 55 / 74
5 Small-sample properties E/ One additional difficulty
What can we do at the design stage?➠ extensions of usual optimality criteria, e.g.
maximize φeE (X ) = minθ
‖η(θ)− η(θ0)‖2
‖θ − θ0‖2
or
maximize φeE (ξ) = minθ
∫[η(x , θ)− η(x , θ0)]2 ξ(dx)
‖θ − θ0‖2
➔ corresponds to E -optimal design if the model is linear (maximize λminM(ξ)),see Chap. 7 of [P & Pázman 2013] and [Pázman & P, 2014]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 56 / 74
5 Small-sample properties E/ One additional difficulty
What can we do at the design stage?➠ extensions of usual optimality criteria, e.g.
maximize φeE (X ) = minθ
‖η(θ)− η(θ0)‖2
‖θ − θ0‖2
or
maximize φeE (ξ) = minθ
∫[η(x , θ)− η(x , θ0)]2 ξ(dx)
‖θ − θ0‖2
➔ corresponds to E -optimal design if the model is linear (maximize λminM(ξ)),see Chap. 7 of [P & Pázman 2013] and [Pázman & P, 2014]
▲ All approaches presented so far are local(the optimal design depends on θ̄ unknown ← θ0)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 56 / 74
6 Nonlocal optimum design
6 Nonlocal optimum design
Ex: η(x , θ) = exp(−θx), yi = η(xi , θ̄) + εi , θ > 0, x ∈X = [0,∞)➔ M(ξ, θ0) =
∫
Xx2 exp(−2θ0x) ξ(dx)
➠ ξ∗D = ξ∗
A = . . . = δ1/θ0
Objective: remove the dependence in nominal value θ0
3 main classes of methods (related)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 57 / 74
6 Nonlocal optimum design
6 Nonlocal optimum design
Ex: η(x , θ) = exp(−θx), yi = η(xi , θ̄) + εi , θ > 0, x ∈X = [0,∞)➔ M(ξ, θ0) =
∫
Xx2 exp(−2θ0x) ξ(dx)
➠ ξ∗D = ξ∗
A = . . . = δ1/θ0
Objective: remove the dependence in nominal value θ0
3 main classes of methods (related)❶ Average optimum design: maximize Eθ{φ(X , θ)} (or Eθ{φ(ξ, θ)})
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 57 / 74
6 Nonlocal optimum design
6 Nonlocal optimum design
Ex: η(x , θ) = exp(−θx), yi = η(xi , θ̄) + εi , θ > 0, x ∈X = [0,∞)➔ M(ξ, θ0) =
∫
Xx2 exp(−2θ0x) ξ(dx)
➠ ξ∗D = ξ∗
A = . . . = δ1/θ0
Objective: remove the dependence in nominal value θ0
3 main classes of methods (related)❶ Average optimum design: maximize Eθ{φ(X , θ)} (or Eθ{φ(ξ, θ)})❷ Maximin optimum design: maximize minθ{φ(X , θ)} (or minθ{φ(ξ, θ)})➠ Between ❶ and ❷: regularized maximin criteria, quantiles and probability levelcriteria
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 57 / 74
6 Nonlocal optimum design
6 Nonlocal optimum design
Ex: η(x , θ) = exp(−θx), yi = η(xi , θ̄) + εi , θ > 0, x ∈X = [0,∞)➔ M(ξ, θ0) =
∫
Xx2 exp(−2θ0x) ξ(dx)
➠ ξ∗D = ξ∗
A = . . . = δ1/θ0
Objective: remove the dependence in nominal value θ0
3 main classes of methods (related)❶ Average optimum design: maximize Eθ{φ(X , θ)} (or Eθ{φ(ξ, θ)})❷ Maximin optimum design: maximize minθ{φ(X , θ)} (or minθ{φ(ξ, θ)})➠ Between ❶ and ❷: regularized maximin criteria, quantiles and probability levelcriteria❸ Sequential design
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 57 / 74
6 Nonlocal optimum design A/ Average Optimum design
A/ Average Optimum design
Nothing special: probability measure µ(dθ) on Θ ⊆ Rp
φ(·, θ0)→ φAO(·) =∫
Θ φ(·, θ)µ(dθ)
No difficulty if Θ = {θ(1), . . . θ(M)} finite and µ =∑M
i=1 αiδ(i)θ (integral → finite
sum)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 58 / 74
6 Nonlocal optimum design A/ Average Optimum design
A/ Average Optimum design
Nothing special: probability measure µ(dθ) on Θ ⊆ Rp
φ(·, θ0)→ φAO(·) =∫
Θ φ(·, θ)µ(dθ)
No difficulty if Θ = {θ(1), . . . θ(M)} finite and µ =∑M
i=1 αiδ(i)θ (integral → finite
sum)
☛ Approximate design theory:φAO(·) is concave when each φ(·, θ) is concave➠ same properties and same algorithms as in Section 3 fordesign measures
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 58 / 74
6 Nonlocal optimum design A/ Average Optimum design
A/ Average Optimum design
Nothing special: probability measure µ(dθ) on Θ ⊆ Rp
φ(·, θ0)→ φAO(·) =∫
Θ φ(·, θ)µ(dθ)
No difficulty if Θ = {θ(1), . . . θ(M)} finite and µ =∑M
i=1 αiδ(i)θ (integral → finite
sum)
☛ Approximate design theory:φAO(·) is concave when each φ(·, θ) is concave➠ same properties and same algorithms as in Section 3 fordesign measures
☛ Exact design: same algorithms as in Section 3(for continuous distributions µ use stochastic approximation to avoidevaluations of integrals [P & Walter 1985])
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 58 / 74
6 Nonlocal optimum design A/ Average Optimum design
A Bayesian interpretation:Suppose µ=prior distribution has a density π(θ)
→ entropy −∫
Θ π(θ) log[π(θ)] dθ
Posterior distribution of θ: π(θ|X , y) = ϕ(y|X ,θ)π(θ)ϕ(y|X)
Gain in information = decrease of entropyEntropy may increase, but expected gain in information I(X ) is always positive[Lindley 1956]
➠ I(X ) = Ey{∫
Θ(π(θ|X , y) log[π(θ|X , y)]− π(θ) log[π(θ)]) dθ}where the expectation Ey{·} is for the marginal ϕ(y|X )
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 59 / 74
6 Nonlocal optimum design A/ Average Optimum design
A Bayesian interpretation:Suppose µ=prior distribution has a density π(θ)
→ entropy −∫
Θ π(θ) log[π(θ)] dθ
Posterior distribution of θ: π(θ|X , y) = ϕ(y|X ,θ)π(θ)ϕ(y|X)
Gain in information = decrease of entropyEntropy may increase, but expected gain in information I(X ) is always positive[Lindley 1956]
➠ I(X ) = Ey{∫
Θ(π(θ|X , y) log[π(θ|X , y)]− π(θ) log[π(θ)]) dθ}where the expectation Ey{·} is for the marginal ϕ(y|X )
If the experiment informative enough (σ2 small, n large enough):
maximizing I(X )∼⇔ maximizing
∫log det M(X , θ)π(θ) dθ
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 59 / 74
6 Nonlocal optimum design A/ Average Optimum design
Which prior π(θ)? Expected gain in information maximum when π(·) =noninformative prior (Jeffrey) — which depends on ξ
π∗(θ) = det1/2 M(ξ,θ)∫
Θdet1/2 M(ξ,θ) dθ
➠ maximize∫
det1/2 M(ξ, θ) dθ
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 60 / 74
6 Nonlocal optimum design A/ Average Optimum design
Which prior π(θ)? Expected gain in information maximum when π(·) =noninformative prior (Jeffrey) — which depends on ξ
π∗(θ) = det1/2 M(ξ,θ)∫
Θdet1/2 M(ξ,θ) dθ
➠ maximize∫
det1/2 M(ξ, θ) dθ
πν(θ) = det1/2 M(ν,θ)∫
Θdet1/2 M(ν,θ) dθ
→ uniform distribution of responses η(·, θ) (for the
metric defined by ν) [Bornkamp 2011]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 60 / 74
6 Nonlocal optimum design A/ Average Optimum design
Which prior π(θ)? Expected gain in information maximum when π(·) =noninformative prior (Jeffrey) — which depends on ξ
π∗(θ) = det1/2 M(ξ,θ)∫
Θdet1/2 M(ξ,θ) dθ
➠ maximize∫
det1/2 M(ξ, θ) dθ
πν(θ) = det1/2 M(ν,θ)∫
Θdet1/2 M(ν,θ) dθ
→ uniform distribution of responses η(·, θ) (for the
metric defined by ν) [Bornkamp 2011]Ex: η(x , θ) = exp(−θx), α-quantiles of η(x , θ) for different π
π uniform on Θ = [1, 10]
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x
η(x
,θ)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 60 / 74
6 Nonlocal optimum design A/ Average Optimum design
Which prior π(θ)? Expected gain in information maximum when π(·) =noninformative prior (Jeffrey) — which depends on ξ
π∗(θ) = det1/2 M(ξ,θ)∫
Θdet1/2 M(ξ,θ) dθ
➠ maximize∫
det1/2 M(ξ, θ) dθ
πν(θ) = det1/2 M(ν,θ)∫
Θdet1/2 M(ν,θ) dθ
→ uniform distribution of responses η(·, θ) (for the
metric defined by ν) [Bornkamp 2011]Ex: η(x , θ) = exp(−θx), α-quantiles of η(x , θ) for different π
π uniform on Θ = [1, 10]
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x
η(x
,θ)
πν , ν uniform on X = [0, 2]
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x
η(x
,θ)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 60 / 74
6 Nonlocal optimum design B/ Maximin Optimum design
B/ Maximin Optimum design
φ(·, θ0)→ φMmO(·) = minθ∈Θ φ(·, θ)☛ Exact design:
Θ finite → same algorithms as in Section 3Θ compact subset of Rp → relaxation method to solve a sequenceof maximin problems with finite (and growing) sets Θ(k) [P & Walter 1988]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 61 / 74
6 Nonlocal optimum design B/ Maximin Optimum design
B/ Maximin Optimum design
φ(·, θ0)→ φMmO(·) = minθ∈Θ φ(·, θ)☛ Exact design:
Θ finite → same algorithms as in Section 3Θ compact subset of Rp → relaxation method to solve a sequenceof maximin problems with finite (and growing) sets Θ(k) [P & Walter 1988]
☛ Approximate design:φMmO(·) concave when each φ(·, θ) is concave▲ but φMmO(·) is non-differentiable!➠ maximize φMmO(ξ) using a specific algorithm forconcave non-differentiable maximization (cutting plane, level method. . .see Section 3)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 61 / 74
6 Nonlocal optimum design B/ Maximin Optimum design
How to check optimality of ξ∗?
φ(·, θ0) differentiable: maxx∈X Fφ(ξ∗; δx , θ0) ≤ 0 ?
→ plot Fφ(ξ∗; δx , θ0) as a function of x
φMmO(·) not differentiable: maxν∈Ξ FφMmO(ξ∗; ν) ≤ 0 cannot be exploited directly
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 62 / 74
6 Nonlocal optimum design B/ Maximin Optimum design
How to check optimality of ξ∗?
φ(·, θ0) differentiable: maxx∈X Fφ(ξ∗; δx , θ0) ≤ 0 ?
→ plot Fφ(ξ∗; δx , θ0) as a function of x
φMmO(·) not differentiable: maxν∈Ξ FφMmO(ξ∗; ν) ≤ 0 cannot be exploited directly
Equivalence Theorem:ξ∗ maximizes φMmO(ξ) ⇔ maxν∈Ξ FφMmO
(ξ∗; ν) ≤ 0⇔ maxν∈Ξ minθ∈Θ(ξ∗) Fφ(ξ∗; ν, θ) ≤ 0
with Θ(ξ) = {θ : φ(ξ, θ) = φMmO(ξ)}⇔ maxx∈X
∫
Θ(ξ∗) Fφ(ξ∗; δx , θ)µ∗(dθ) ≤ 0
for some probability measure µ∗ on Θ(ξ∗)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 62 / 74
6 Nonlocal optimum design B/ Maximin Optimum design
How to check optimality of ξ∗?
φ(·, θ0) differentiable: maxx∈X Fφ(ξ∗; δx , θ0) ≤ 0 ?
→ plot Fφ(ξ∗; δx , θ0) as a function of x
φMmO(·) not differentiable: maxν∈Ξ FφMmO(ξ∗; ν) ≤ 0 cannot be exploited directly
Equivalence Theorem:ξ∗ maximizes φMmO(ξ) ⇔ maxν∈Ξ FφMmO
(ξ∗; ν) ≤ 0⇔ maxν∈Ξ minθ∈Θ(ξ∗) Fφ(ξ∗; ν, θ) ≤ 0
with Θ(ξ) = {θ : φ(ξ, θ) = φMmO(ξ)}⇔ maxx∈X
∫
Θ(ξ∗) Fφ(ξ∗; δx , θ)µ∗(dθ) ≤ 0
for some probability measure µ∗ on Θ(ξ∗)
Once ξ∗ is determined, solve a LP problem:µ∗ on Θ(ξ∗) minimizes maxx∈X
∫
Θ(ξ∗) Fφ(ξ∗; δx , θ)µ(dθ)
➠ plot∫
Θ(ξ∗) Fφ(ξ∗; δx , θ)µ∗(dθ) (should be ≤ 0)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 62 / 74
6 Nonlocal optimum design B/ Maximin Optimum design
Ex: η(x , θ) = θ1 exp(−θ2x), p = 2, X = [0, 2], θ2 ∈ [0, θ2max ]
φ(ξ, θ) = det1/p M(ξ,θ)det1/p M(ξ∗
D,θ)
(D efficiency, ∈ [0, 1])
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 63 / 74
6 Nonlocal optimum design B/ Maximin Optimum design
Ex: η(x , θ) = θ1 exp(−θ2x), p = 2, X = [0, 2], θ2 ∈ [0, θ2max ]
φ(ξ, θ) = det1/p M(ξ,θ)det1/p M(ξ∗
D,θ)
(D efficiency, ∈ [0, 1])∫
Θ(ξ∗) Fφ(ξ∗; δx , θ)µ∗(dθ) for
θ2max = 2 (solid line, 2 support points) andθ2max = 20 (dashed line, 4 support points)
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1−0.06
−0.05
−0.04
−0.03
−0.02
−0.01
0
0.01
x
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 63 / 74
6 Nonlocal optimum design C/ Regularized Maximin Optimum design
C/ Regularized Maximin Optimum design
Suppose φ(·, θ) > 0 for all θ; µ a probability measure on Θ ⊂ Rp compact
φMmO(ξ) = minθ∈Θ φ(·, θ) ≤ φq(ξ) =[∫
Θ φ−q(ξ, θ)µ(dθ)
]− 1q (differentiable)
with φ−1(ξ) = φAO(ξ), φ0(ξ) = exp{∫
Θ log[φ(ξ, θ)]µ(dθ)}
and
φq(ξ) → φMmO(ξ) as q →∞ (and φq(·) concave for q ≥ −1)
Moreover, Θ = {θ(1), . . . , θ(M)}, µ =
∑
iδ
θ(i)
M=⇒ φMmO(ξ∗
q )φ∗
MmO
≥ M−1/q
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 64 / 74
6 Nonlocal optimum design C/ Regularized Maximin Optimum design
C/ Regularized Maximin Optimum design
Suppose φ(·, θ) > 0 for all θ; µ a probability measure on Θ ⊂ Rp compact
φMmO(ξ) = minθ∈Θ φ(·, θ) ≤ φq(ξ) =[∫
Θ φ−q(ξ, θ)µ(dθ)
]− 1q (differentiable)
with φ−1(ξ) = φAO(ξ), φ0(ξ) = exp{∫
Θ log[φ(ξ, θ)]µ(dθ)}
and
φq(ξ) → φMmO(ξ) as q →∞ (and φq(·) concave for q ≥ −1)
Moreover, Θ = {θ(1), . . . , θ(M)}, µ =
∑
iδ
θ(i)
M=⇒ φMmO(ξ∗
q )φ∗
MmO
≥ M−1/q
Ex: η = exp(−θx)
φ(ξ, θ) = M(ξ,θ)M(ξ∗,θ) (= efficiency)
Plot of φq(δx ) function of x
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
q=−0.8
q=0.4
q=40
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 64 / 74
6 Nonlocal optimum design D/ Quantiles and probability level criteria
D/ Quantiles and probability level criteria
A/ ξ∗AO good for µ(dθ) on Θ, may be bad for some θ▲ for ψ(·)ր, maximizing ψ
[∫
Θ φ(·, θ)µ(dθ)]
(AO-opt.)is different from maximizing
∫
Θ ψ[φ(·, θ)]µ(dθ)B/ ξ∗
MmO often depends on the boundary of Θ(➔ we often simply replace the dependence on θ0 by a dependence on θmax)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 65 / 74
6 Nonlocal optimum design D/ Quantiles and probability level criteria
D/ Quantiles and probability level criteria
A/ ξ∗AO good for µ(dθ) on Θ, may be bad for some θ▲ for ψ(·)ր, maximizing ψ
[∫
Θ φ(·, θ)µ(dθ)]
(AO-opt.)is different from maximizing
∫
Θ ψ[φ(·, θ)]µ(dθ)B/ ξ∗
MmO often depends on the boundary of Θ(➔ we often simply replace the dependence on θ0 by a dependence on θmax)
u given →Pu(ξ) = µ{φ(ξ, θ) ≥ u}
α ∈ (0, 1) given →Qα(ξ) = max{u : Pu(ξ) ≥ 1−α}
0 1 2 3 4 5 6 7 8 9 100
0.05
0.1
0.15
0.2
0.25
Pu(ξ)=1−α
α
u=Qα(ξ)
p.d
.f. of φ
(ξ;θ
)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 65 / 74
6 Nonlocal optimum design D/ Quantiles and probability level criteria
➤ maximizing Pu(ξ) = µ{φ(ξ, θ) ≥ u} is well adapted toφ(ξ, θ) = efficiency ∈ (0, 1)
➤ Qα(ξ)→ φMMO as α→ 0➤ for ψ(·)ր, using ψ[φ(ξ, θ)] does not change Pu(ξ) and Qα(ξ)▲ Pu(ξ) and Qα(ξ) generally not concave!
(but we can compute directional derivatives and maximize)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 66 / 74
6 Nonlocal optimum design D/ Quantiles and probability level criteria
➤ maximizing Pu(ξ) = µ{φ(ξ, θ) ≥ u} is well adapted toφ(ξ, θ) = efficiency ∈ (0, 1)
➤ Qα(ξ)→ φMMO as α→ 0➤ for ψ(·)ր, using ψ[φ(ξ, θ)] does not change Pu(ξ) and Qα(ξ)▲ Pu(ξ) and Qα(ξ) generally not concave!
(but we can compute directional derivatives and maximize)
Ex: η = exp(−θx)φ(ξ, θ) = M(ξ, θ)Plot of Qα(δx ) function of x
(with φAO(δx ) and φMmO(δx ))
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
α=0.5
α=0.1
α=0.05
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 66 / 74
6 Nonlocal optimum design D/ Quantiles and probability level criteria
Ongoing work: conditional value at risk (also called superquantile)
φα(ξ) =1
α
∫
{θ:φ(ξ,θ)≤Qα(ξ)}φ(ξ, θ)µ(dθ)
which is concave in ξ when φ(·, θ) is concave for all θ, see (Valenzuela et al.,2015; Guerra, 2016)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 67 / 74
6 Nonlocal optimum design E/ Sequential design
E/ Sequential design
θ0 → design: X 1 = arg maxX φ(X , θ0)→ observe: y1 = y1(X 1)→ estimate: θ̂1 = arg minθ J(θ; y1,X 1)→ design: X 2 = arg maxX φ({X 1,X}, θ̂1)→ observe: y2 = y2(X 2)→ estimate: θ̂2 = arg minθ J(θ; { y1, y2
︸ ︷︷ ︸
growing
}, {X 1,X 2
︸ ︷︷ ︸
growing
})
→ design: X 3 = arg maxX φ({X 1,X 2,X}, θ̂2). . . etc.
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 68 / 74
6 Nonlocal optimum design E/ Sequential design
E/ Sequential design
θ0 → design: X 1 = arg maxX φ(X , θ0)→ observe: y1 = y1(X 1)→ estimate: θ̂1 = arg minθ J(θ; y1,X 1)→ design: X 2 = arg maxX φ({X 1,X}, θ̂1)→ observe: y2 = y2(X 2)→ estimate: θ̂2 = arg minθ J(θ; { y1, y2
︸ ︷︷ ︸
growing
}, {X 1,X 2
︸ ︷︷ ︸
growing
})
→ design: X 3 = arg maxX φ({X 1,X 2,X}, θ̂2). . . etc.
☞ Replace unknown θ by best current guess θ̂k
(there exist variants with Bayesian estimation and average optimality)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 68 / 74
6 Nonlocal optimum design E/ Sequential design
E/ Sequential design
θ0 → design: X 1 = arg maxX φ(X , θ0)→ observe: y1 = y1(X 1)→ estimate: θ̂1 = arg minθ J(θ; y1,X 1)→ design: X 2 = arg maxX φ({X 1,X}, θ̂1)→ observe: y2 = y2(X 2)→ estimate: θ̂2 = arg minθ J(θ; { y1, y2
︸ ︷︷ ︸
growing
}, {X 1,X 2
︸ ︷︷ ︸
growing
})
→ design: X 3 = arg maxX φ({X 1,X 2,X}, θ̂2). . . etc.
☞ Replace unknown θ by best current guess θ̂k
(there exist variants with Bayesian estimation and average optimality)
▲ Consistency of θ̂n?Asymptotic normality (for design based on M)?
(X k depends on y1, . . . , yk−1 =⇒ independence is lost)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 68 / 74
6 Nonlocal optimum design E/ Sequential design
➠ No problem if each X i has size ≥ p = dim(θ) (batch sequential design)
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 69 / 74
6 Nonlocal optimum design E/ Sequential design
➠ No problem if each X i has size ≥ p = dim(θ) (batch sequential design)If n observation in total, two stages only: size of first batch?
→ should be proportional to√
n (but it does not say much . . . )
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 69 / 74
6 Nonlocal optimum design E/ Sequential design
➠ No problem if each X i has size ≥ p = dim(θ) (batch sequential design)If n observation in total, two stages only: size of first batch?
→ should be proportional to√
n (but it does not say much . . . )
➠ Full sequential design: X k = {xk} (batches of size 1)→ convergence properties difficult to investigate
M(Xk+1, θ̂k) =
k
k + 1M(Xk , θ̂
k) +1
k + 1
∂η(xk+1, θ)
∂θ
∣∣θ̂k
∂η(xk+1, θ)
∂θ⊤
∣∣θ̂k
with xk+1 = arg maxX Fφ(ξk ; δx |θ̂k) ⇔ Wynn’s algorithm [1970] with αk = 1k+1
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 69 / 74
6 Nonlocal optimum design E/ Sequential design
➠ No problem if each X i has size ≥ p = dim(θ) (batch sequential design)If n observation in total, two stages only: size of first batch?
→ should be proportional to√
n (but it does not say much . . . )
➠ Full sequential design: X k = {xk} (batches of size 1)→ convergence properties difficult to investigate
M(Xk+1, θ̂k) =
k
k + 1M(Xk , θ̂
k) +1
k + 1
∂η(xk+1, θ)
∂θ
∣∣θ̂k
∂η(xk+1, θ)
∂θ⊤
∣∣θ̂k
with xk+1 = arg maxX Fφ(ξk ; δx |θ̂k) ⇔ Wynn’s algorithm [1970] with αk = 1k+1
➤ some CV results for Bayesian estimation [Hu 1998]➤ no general CV results for LS and ML estimation
some results when X is finite (X = {x (1), . . . , x (ℓ)}) [P 2009, 2010]
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 69 / 74
References
References I
Atkinson, A., Cox, D., 1974. Planning experiments for discriminating between models (with discussion).Journal of Royal Statistical Society B36, 321–348.
Atkinson, A., Fedorov, V., 1975. The design of experiments for discriminating between two rival models.Biometrika 62 (1), 57–70.
Atwood, C., 1973. Sequences converging to D-optimal designs of experiments. Annals of Statistics 1 (2),342–352.
Bates, D., Watts, D., 1980. Relative curvature measures of nonlinearity. Journal of Royal Statistical SocietyB42, 1–25.
Böhning, D., 1985. Numerical estimation of a probability measure. Journal of Statistical Planning andInference 11, 57–69.
Böhning, D., 1986. A vertex-exchange-method in D-optimal design theory. Metrika 33, 337–347.
Box, G., Hill, W., 1967. Discrimination among mechanistic models. Technometrics 9 (1), 57–71.
Chernoff, H., 1953. Locally optimal designs for estimating parameters. Annals of Math. Stat. 24, 586–602.
Clarke, G., 1980. Moments of the least-squares estimators in a non-linear regression model. Journal of RoyalStatistical Society B42, 227–237.
D’Argenio, D., 1981. Optimal sampling times for pharmacokinetic experiments. Journal of Pharmacokineticsand Biopharmaceutics 9 (6), 739–756.
Dette, H., Melas, V., 2011. A note on de la Garza phenomenon for locally optimal designs. Annals of Statistics39 (2), 1266–1281.
Fedorov, V., 1972. Theory of Optimal Experiments. Academic Press, New York.
Fedorov, V., Leonov, S., 2014. Optimal Design for Nonlinear Response Models. CRC Press, Boca Raton.
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 70 / 74
References
References II
Fisher, R., 1925. Statistical Methods for Research Workers. Oliver & Boyd, Edimbourgh.
Gauchi, J.-P., Pázman, A., 2006. Designs in nonlinear regression by stochastic minimization of functionnals ofthe mean square error matrix. Journal of Statistical Planning and Inference 136, 1135–1152.
Goodwin, G., Payne, R., 1977. Dynamic System Identification: Experiment Design and Data Analysis.Academic Press, New York.
Guerra, J., 2016. Optimisation multi-objectif sous incertitude de phénomènes de thermique transitoire. Ph.D.Thesis, Université de Toulouse.
Hamilton, D., Watts, D., 1985. A quadratic design criterion for precise estimation in nonlinear regressionmodels. Technometrics 27, 241–250.
Hill, P., 1978. A review of experimental design procedures for regression model discrimination. Technometrics20, 15–21.
Hougaard, P., 1985. Saddlepoint approximations for curved exponential families. Statistics & ProbabilityLetters 3, 161–166.
Hu, I., 1998. On sequential designs in nonlinear problems. Biometrika 85 (2), 496–503.
Kelley, J., 1960. The cutting plane method for solving convex programs. SIAM Journal 8, 703–712.
Kiefer, J., Wolfowitz, J., 1960. The equivalence of two extremum problems. Canadian Journal of Mathematics12, 363–366.
Ljung, L., 1987. System Identification, Theory for the User. Prentice-Hall, Englewood Cliffs.
Mitchell, T., 1974. An algorithm for the construction of “D-optimal” experimental designs. Technometrics 16,203–210.
Nesterov, Y., 2004. Introductory Lectures to Convex Optimization: A Basic Course. Kluwer, Dordrecht.
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 71 / 74
References
References III
Niederreiter, H., 1992. Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia.
Pázman, A., 1986. Foundations of Optimum Experimental Design. Reidel (Kluwer group), Dordrecht (co-pub.VEDA, Bratislava).
Pázman, A., 1993. Nonlinear Statistical Models. Kluwer, Dordrecht.
Pázman, A., Pronzato, L., 1992. Nonlinear experimental design based on the distribution of estimators.Journal of Statistical Planning and Inference 33, 385–402.
Pázman, A., Pronzato, L., 1996. A Dirac function method for densities of nonlinear statistics and for marginaldensities in nonlinear regression. Statistics & Probability Letters 26, 159–167.
Pázman, A., Pronzato, L., 2014. Optimum design accounting for the global nonlinear behavior of the model.Annals of Statistics 42 (4), 1426–1451.
Pronzato, L., 2009. Asymptotic properties of nonlinear estimates in stochastic models with finite design space.Statistics & Probability Letters 79, 2307–2313.
Pronzato, L., 2010. One-step ahead adaptive D-optimal design on a finite design space is asymptoticallyoptimal. Metrika 71 (2), 219–238, ( DOI: 10.1007/s00184-008-0227-y).
Pronzato, L., Pázman, A., 1994. Second-order approximation of the entropy in nonlinear least-squaresestimation. Kybernetika 30 (2), 187–198, Erratum 32(1):104, 1996.
Pronzato, L., Pázman, A., 2013. Design of Experiments in Nonlinear Models. Asymptotic Normality,Optimality Criteria and Small-Sample Properties. Springer, LNS 212, New York.
Pronzato, L., Walter, E., 1985. Robust experiment design via stochastic approximation. MathematicalBiosciences 75, 103–120.
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 72 / 74
References
References IV
Pronzato, L., Walter, E., 1988. Robust experiment design via maximin optimization. Mathematical Biosciences89, 161–176.
Pronzato, L., Zhigljavsky, A., 2014. Algorithmic construction of optimal designs on compact sets for concaveand differentiable criteria. Journal of Statistical Planning and Inference 154, 141–155.
Pukelsheim, F., 1993. Optimal Experimental Design. Wiley, New York.
Pukelsheim, F., Reider, S., 1992. Efficient rounding of approximate designs. Biometrika 79 (4), 763–770.
Schwabe, R., 1995. Designing experiments for additive nonlinear models. In: Kitsos, C., Müller, W. (Eds.),MODA4 – Advances in Model-Oriented Data Analysis, Spetses (Greece), june 1995. Physica Verlag,Heidelberg, pp. 77–85.
Silvey, S., 1980. Optimal Design. Chapman & Hall, London.
Titterington, D., 1976. Algorithms for computing D-optimal designs on a finite design space. In: Proc. of the1976 Conference on Information Science and Systems. Dept. of Electronic Engineering, John HopkinsUniversity, Baltimore, pp. 213–216.
Torsney, B., 1983. A moment inequality and monotonicity of an algorithm. In: Kortanek, K., Fiacco, A. (Eds.),Proc. Int. Symp. on Semi-infinite Programming and Applications. Springer, Heidelberg, pp. 249–260.
Torsney, B., 2009. W-iterations and ripples therefrom. In: Pronzato, L., Zhigljavsky, A. (Eds.), Optimal Designand Related Areas in Optimization and Statistics. Springer, Ch. 1, pp. 1–12.
Valenzuela, P., ROjas, C., Hjalmarsson, H., 2015. Uncertainty in system identification: learning from thetheory of risk. IFAC-PapersOnLine 48 (28), 1053–1058.
Vila, J.-P., 1990. Exact experimental designs via stochastic optimization for nonlinear regression models. In:Proc. Compstat, Int. Assoc. for Statistical Computing. Physica Verlag, Heidelberg, pp. 291–296.
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 73 / 74
References
References V
Vila, J.-P., Gauchi, J.-P., 2007. Optimal designs based on exact confidence regions for parameter estimation ofa nonlinear regression model. Journal of Statistical Planning and Inference 137, 2935–2953.
Walter, E., Pronzato, L., 1994. Identification de Modèles Paramétriques à Partir de Données Expérimentales.Masson, Paris, 371 pages.
Walter, E., Pronzato, L., 1997. Identification of Parametric Models from Experimental Data. Springer,Heidelberg.
Welch, W., 1982. Branch-and-bound search for experimental designs based on D-optimality and other criteria.Technometrics 24 (1), 41–28.
Wu, C., 1978. Some algorithmic aspects of the theory of optimal designs. Annals of Statistics 6 (6),1286–1301.
Wynn, H., 1970. The sequential generation of D-optimum experimental designs. Annals of Math. Stat. 41,1655–1664.
Yang, M., 2010. On de la Garza phenomenon. Annals of Statistics 38 (4), 2499–2524.
Yang, M., Biedermann, S., Tang, E., 2013. On optimal designs for nonlinear models: a general and efficientalgorithm. Journal of the American Statistical Association 108 (504), 1411–1420.
Yu, Y., 2010. Strict monotonicity and convergence rate of Titterington’s algorithm for computing D-optimaldesigns. Comput. Statist. Data Anal. 54, 1419–1425.
Yu, Y., 2011. D-optimal designs via a cocktail algorithm. Stat. Comput. 21, 475–481.
Zarrop, M., 1979. Optimal Experiment Design for Dynamic System Identification. Springer, Heidelberg.
Luc Pronzato (CNRS) Design of experiments in nonlinear models École ETICS, Porquerolles, 5 oct. 2017 74 / 74