TRANSCRIPT
SLIM, University of British Columbia
Felix J. Herrmann
Randomized dimension reduction for full-waveform inversion
Released to public domain under Creative Commons license type BY (https://creativecommons.org/licenses/by/4.0). Copyright (c) 2010 SLIM group @ The University of British Columbia.
SLIM, University of British Columbia
Felix J. Herrmann, Peyman Moghaddam, and Xiang Li
Randomized dimension reduction for full-waveform inversion
Impediments
Full-waveform inversion (FWI) suffers from
• multimodality, i.e., a multitude of velocity models explain the data
• local minima, i.e., the requirement of an accurate initial model
• over- and underdeterminacy
Curse of dimensionality for d > 2
• requires implicit (Helmholtz) solvers to address bandwidth
• the number of RHS's makes computation of gradients prohibitively expensive
[Symes, '08]
Wish list
An inversion technology that
• is based on a time-harmonic PDE solver, which is easily parallelizable and scalable to 3D
• does not require multiple iterations with all data
• removes the linearly increasing costs of implicit solvers for increasing numbers of frequencies & RHS's
• allows for a dimensionality reduction commensurate with the model's complexity
Key technologies
Numerical linear algebra [Erlangga & Nabben, '08; Erlangga & FJH, '08-'09]
• multilevel Krylov preconditioner for Helmholtz
Simultaneous sources [Beasley, '98; Neelamani et al., '08; Krebs et al., '09; Operto et al., '09; FJH et al., '08-'10]
• supershots
Stochastic optimization & machine learning [Bertsekas, '96]
• stochastic gradient descent
Compressive sensing [Candès et al., '06; Donoho, '06]
• sparse recovery & randomized subsampling
Full-waveform inversion
Single-source / single-frequency PDE-constrained optimization problem:
$$\min_{u\in\mathcal{U},\,m\in\mathcal{M}}\ \tfrac{1}{2}\|p - Du\|_2^2 \quad\text{subject to}\quad H[m]\,u = q$$
p = single-source and single-frequency data
D = detection operator
u = solution of the Helmholtz equation
H = discretized monochromatic Helmholtz system
q = unknown seismic source
m = unknown model, e.g. c^{-2}(x)
Unconstrained problem
For each separate source q solve the unconstrained problem
$$\min_{m\in\mathcal{M}}\ \tfrac{1}{2}\|p - F[m, q]\|_2^2$$
with q a single source function and
$$F[m, q] = DH^{-1}[m]\,q.$$
• Gradient updates involve 3 PDE solves.
• The increased matrix bandwidth of the Helmholtz discretization makes scaling to 3D challenging for direct methods...
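The unconstrained formulation can be exercised end to end on a toy problem. The sketch below is illustrative only: it uses a crude 1-D discretization and assumed names (`misfit`, `gradient`), computes F[m, q] = DH⁻¹[m]q, and forms the adjoint-state gradient from one forward and one adjoint solve (a line search would account for the third solve). A finite-difference check confirms one gradient entry.

```python
import numpy as np

# Toy 1-D stand-in for F[m, q] = D H^{-1}[m] q with H[m] = L - omega^2 diag(m)
# and m = c^{-2}(x); discretization, scalings, and names are illustrative.
n, omega = 50, 0.7
e = np.ones(n)
L = np.diag(2 * e) - np.diag(e[:-1], 1) - np.diag(e[:-1], -1)  # 1-D Laplacian stencil
D = np.eye(n)[::5]                       # detection at every 5th grid point
q = np.zeros(n); q[n // 2] = 1.0         # point source

H = lambda m: L - omega**2 * np.diag(m)

def misfit(m, p):
    u = np.linalg.solve(H(m), q)                      # forward solve
    return 0.5 * np.linalg.norm(D @ u - p) ** 2

def gradient(m, p):
    u = np.linalg.solve(H(m), q)                      # forward solve
    v = np.linalg.solve(H(m).T, D.T @ (D @ u - p))    # adjoint solve
    return omega**2 * v * u                           # adjoint-state gradient

m_true = 1.0 + 0.1 * np.sin(np.linspace(0.0, np.pi, n))
p = D @ np.linalg.solve(H(m_true), q)                 # "observed" data
m0 = np.ones(n)

# finite-difference check of one gradient entry
g = gradient(m0, p)
i, eps = 10, 1e-6
ei = np.zeros(n); ei[i] = eps
fd = (misfit(m0 + ei, p) - misfit(m0 - ei, p)) / (2 * eps)
```

The two solves per gradient, independent of model size, are what makes the adjoint-state method attractive; the cost then scales with the number of sources and frequencies, which motivates the reductions that follow.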
Preconditioner
Clustering around one:
$$\sigma(H)\ \longrightarrow\ \sigma(HM^{-1})\ \longrightarrow\ \sigma(HM^{-1}Q)$$
[Erlangga & Nabben, '08] [Erlangga and FJH, '08]
The combined preconditioner M^{-1}Q
• shifts the eigenvalues to the positive half plane (solves indefiniteness)
• clusters eigenvalues near one (solves ill-conditioning)
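The first bullet can be illustrated with a simple complex-shifted preconditioner. The sketch below assumes the common shift (1, 0.5i) on a 1-D Laplacian stand-in, which is for illustration only and is not the multilevel Krylov preconditioner of these slides. Since H and M here are both functions of the same SPD matrix L, the preconditioned eigenvalues follow in closed form from the eigenvalues of L.

```python
import numpy as np

# H = L - k^2 I with L SPD is indefinite once k^2 sits inside L's spectrum.
# With the complex-shifted preconditioner M = L - (1 - 0.5i) k^2 I (an assumed
# shift choice), the spectrum of M^{-1} H moves into the closed right half
# plane and onto or inside the unit circle.
n, k2 = 200, 40.0
e = np.ones(n)
L = (np.diag(2 * e) - np.diag(e[:-1], 1) - np.diag(e[:-1], -1)) * 400.0
mu = np.linalg.eigvalsh(L)                    # eigenvalues of L: real, positive

lam_H = mu - k2                               # spectrum of H (indefinite)
lam_pre = (mu - k2) / (mu - (1 - 0.5j) * k2)  # spectrum of M^{-1} H (shared eigenvectors)
```

Writing t = μ − k² and s = 0.5 k², each preconditioned eigenvalue is t/(t + is), whose real part t²/(t² + s²) is nonnegative and whose modulus is at most one, which is exactly the "shift to the positive half plane" behavior.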
Scaling
[Figure: number of iterations and number of matrix-vector multiplies vs. frequency (0-30 Hz) for the MK and MG methods; units of memory (10^6-10^10) vs. frequency (10-30 Hz) for the direct method (LU) and the iterative method on 751×201, 1501×401, and 2001×534 grids.]
Example: Marmousi
[Figure: hard model (1500-4000 m/s) and smooth model; x-axis 0-7000 m, depth 0-2500 m.]
Simulations
[Figure: real part of u at freq = 10 Hz, computed with 9 and with 18 grid points per wavelength.]
Convergence
[Figure: number of iterations (0-50) vs. frequency (0-30 Hz) for the hard and smooth models.]
Observations
Implicit solver that
• converges for high frequencies
• scales & is embarrassingly parallelizable
• has the same order of complexity as TDFD
• allows trivial implementation of imaging conditions
But its complexity grows linearly with the number of frequencies and sources...
Full-waveform inversion
Multiexperiment PDE-constrained optimization problem:
$$\min_{U\in\mathcal{U},\,m\in\mathcal{M}}\ \tfrac{1}{2}\|P - DU\|_{2,2}^2 \quad\text{subject to}\quad H[m]\,U = Q$$
P = total multi-source and multi-frequency data volume
U = solution of the Helmholtz equation
H = discretized multi-frequency Helmholtz system
Q = unknown seismic sources
Simultaneous sources
Randomized superposition of sequential source functions creates a supershot.
[Figure: randomized amplitudes along the shot line and the resulting simultaneous source.]
[Beasley, '98; Neelamani et al., '08; Krebs et al., '09; Operto & Virieux, '09-'10; FJH et al., '08-'10]
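Because the time-harmonic wave equation is linear in the source, one solve with a supershot equals the same weighted superposition of the sequential wavefields; that is what makes supershots pay off. A minimal sketch with a stand-in system matrix (not an actual Helmholtz discretization):

```python
import numpy as np

# A supershot is a random superposition of sequential sources. One solve
# with the supershot reproduces the superposed individual wavefields.
rng = np.random.default_rng(7)
n, n_shots = 100, 20
H = np.eye(n) * 4 + rng.standard_normal((n, n)) * 0.1   # well-conditioned stand-in
Q = np.eye(n)[:, :n_shots]                              # sequential point sources
w = rng.choice([-1.0, 1.0], n_shots)                    # random amplitudes along the shot line

q_super = Q @ w                            # one supershot
u_super = np.linalg.solve(H, q_super)      # one "PDE" solve
U_all = np.linalg.solve(H, Q)              # n_shots solves, for comparison
```

The single simultaneous solve replaces twenty sequential ones; the price, as the following slides show, is crosstalk between the superposed shots.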
Simultaneous shot at 5 Hz
[Figure: sequential-source wavefield vs. simultaneous-source wavefield.]
Notice the increased wavenumber content.
Image
Increased wavenumber content leads to improved image/gradient updates...
[Figure: sequential-source image vs. simultaneous-source image.]
Supershots
$$RM = \underbrace{\begin{bmatrix} R^{\sigma_1}\otimes I\\ \vdots \\ R^{\sigma_{n_s'}}\otimes I \end{bmatrix}}_{\text{subsampler}}\;\underbrace{F_2^{*}\bigl(\operatorname{diag}(e^{i\theta})\otimes I\bigr)F_3}_{\text{random phase encoder}}, \tag{3}$$
where F_{2,3} are the 2-D and 3-D Fourier transforms, and θ ~ Uniform([0, 2π]) is a random phase rotation. Notice that F_2 and the phase rotations act along the source/receiver coordinates. Application of this CS-sampling matrix, RM, to the original source wavefields in s turns these single shots into a subset (n_s' ≪ n_s) of time-harmonic simultaneous sources that are randomly phase encoded and that have, for each simultaneous shot, a different set of angular frequencies missing, i.e., there are n_f' ≪ n_f non-zero frequencies (see Figure 2(a)). Because seismic data is bandwidth limited, we sample with a probability that is weighted by the power spectrum of the source wavelet. The advantage of this implementation is that it is matrix-free, fast, and it turns interferences into harmless noise (see Figure 2(b)).
The sparsifying transform: Aside from proper CS sampling, the recovery from simultaneous simulations depends on a sparsifying transform that compresses seismic data, is fast, and is reasonably incoherent with the CS sampling matrix. We accomplish this by defining the sparsity transform as the Kronecker product between the 2-D discrete curvelet transform (Candès et al., 2006) along the source-receiver coordinates and the discrete wavelet transform along the time coordinate, i.e., S := C ⊗ W with C, W the curvelet- and wavelet-transform matrices, respectively.
[Figure: data for the full set of sources, DQ, and for the n_s' ≪ n_s randomized simultaneous sources.]
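The phase-encoding part of RM can be sketched in a few lines: Fourier transform along the shot coordinate, a uniform random phase rotation, inverse transform, then a restriction to n_s' shots. The frequency subsampling (F₃ and the power-spectrum weighting) is omitted, so this is a partial, illustrative version of equation (3):

```python
import numpy as np

# Random phase encoding along the source coordinate, followed by a random
# restriction. Mirrors the F2^H diag(e^{i theta}) structure of the CS sampler;
# the frequency axis (F3) is left out of this sketch.
rng = np.random.default_rng(1)
n_s, n_s_sub = 64, 8                      # sequential shots -> simultaneous shots
s = rng.standard_normal(n_s)              # source amplitudes along the shot line

theta = rng.uniform(0, 2 * np.pi, n_s)    # theta ~ Uniform([0, 2*pi])
s_hat = np.fft.fft(s)                     # to the source-wavenumber domain (F2)
s_enc = np.fft.ifft(np.exp(1j * theta) * s_hat)   # random phase rotation, back (F2^H)

keep = rng.choice(n_s, n_s_sub, replace=False)    # restriction R: keep n_s' encoded shots
supershots = s_enc[keep]
```

The encoder is a product of unitary maps, so all the information loss happens in the final restriction; this is what makes the sampling matrix-free and fast, and what turns shot interference into incoherent noise.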
Dimensionality reduction
Reduced system:
$$\text{single shots:}\quad Q = D^{\top} s, \qquad HU = Q$$
$$\text{simul. shots:}\quad \underline{Q} = D^{\top} RM\, s, \qquad H\underline{U} = \underline{Q}$$
$$\underline{P} := RM\,DU = D\underline{U}$$
$$\min_{\underline{U}\in\mathcal{U},\,m\in\mathcal{M}}\ \tfrac{1}{2}\|\underline{P} - D\underline{U}\|_2^2 \quad\text{subject to}\quad H[m]\,\underline{U} = \underline{Q}$$
[Neelamani et al., '08; Krebs et al., '09; FJH et al., '08-'10]
Reduced gradient
Replace gradient updates for all sequential sources,
$$m_{k+1} := m_k - \alpha_k \sum_{i=1}^{N} \nabla f(m_k, q_i) \quad\text{with}\quad q_i := i\text{th column of } Q,$$
by
$$m_{k+1} := m_k - \alpha_k \sum_{i=1}^{N'} \nabla f(m_k, \underline{q}_i) \quad\text{with}\quad \underline{q}_i := (RM)_i\, Q.$$
• N' ≪ N is the number of multifrequency simultaneous experiments
• creates incoherent "Gaussian" simultaneous-source crosstalk
Stochastic optimization
Stochastic "batch" gradient descent:
$$m_{k+1} := m_k - \alpha_k \frac{1}{n}\sum_{i=1}^{n} \nabla f(m_k, \underline{q}_i) \quad\text{with}\quad \underline{q}_i := (RM)_i\, Q$$
• for n → ∞, the updates become deterministic
• prohibitively expensive
[Robbins and Monro, 1951] [Bertsekas, '96; Nemirovski, '09]
Sample Average Approximation (SAA)
Approximate the expectation with an ensemble average:
$$\min_m \mathbb{E}\{f(m, q)\} \;\approx\; \min_m \frac{1}{N}\sum_{i=1}^{N} f(m, \underline{q}_i) \quad\text{with}\quad \underline{q}_i := (RM)_i\, Q$$
• for N → ∞ this becomes an equality
• well studied and known as Monte Carlo sampling
• slow but embarrassingly parallel
[Bertsekas, '96; Nemirovski, '09]
Renewals
Use different supershots for each (gradient) update:
$$m_{k+1} := m_k - \alpha_k \nabla f(m_k, \underline{Q}_k) \quad\text{with}\quad \underline{Q}_k := (RM)_k\, Q$$
‣ uses a different random RM for each iteration
‣ cheap, but introduces more "noise" and does not converge...
[Krebs et al., '09]
Stochastic Approximation (SA)
Stochastic "online" gradient descent with mini batches:
$$m_{k+1} = m_k - \alpha_k \nabla f(m_k, \underline{Q}_k) \quad\text{with}\quad \underline{Q}_k := (RM)_k\, Q$$
$$\bar m_{k+1} = \frac{1}{k+1}\Bigl(\sum_{i=1}^{k} m_i + m_{k+1}\Bigr)$$
‣ averages over the past iterations and converges
‣ reasonably well understood
‣ not understood for (quasi-)Newton methods
[Bertsekas, '96] source: http://en.wikipedia.org/wiki/Stochastic_gradient_descent
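A toy version of SA with iterate averaging can be written in a few lines. The quadratic misfit and random "source" vectors below are illustrative stand-ins for the FWI objective and the randomized supershots:

```python
import numpy as np

# SA with decaying steps and a running average over past iterates
# (Polyak-Ruppert style). Each stochastic gradient uses a fresh random
# "source" q, mimicking renewal of the CS experiments.
rng = np.random.default_rng(3)
n = 10
m_true = rng.standard_normal(n)

def stoch_grad(m):
    q = rng.standard_normal(n)            # random simultaneous "source"
    return q * (q @ (m - m_true))         # gradient of 0.5 * (q . (m - m_true))^2

m = np.zeros(n)
m_avg = np.zeros(n)
for k in range(2000):
    m = m - 1.0 / (k + 10) * stoch_grad(m)
    m_avg = (k * m_avg + m) / (k + 1)     # running average of the iterates
```

Since the expected gradient here is exactly m − m_true, the raw iterates hover around the optimum under the gradient noise while the averaged iterate settles down, which is the convergence behavior the slide refers to.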
Marmousi model
[Figure: original model and initial model; offset 0-4500 m, depth 0-2500 m, velocities roughly 1.5-5.5 km/s.]
Full-waveform inversion
[Figure: model recovered with l-BFGS vs. model recovered with stochastic gradient; offset 0-4500 m, depth 0-2500 m, velocities 1.5-5.5 km/s.]
Speed up
Full scenario:
‣ 113 sequential shots with 50 frequencies
‣ 18 iterations of l-BFGS (90 = 5 × 18 Helmholtz solves)
Reduced scenario:
‣ 16 randomized simultaneous shots with 4 frequencies
‣ 40 iterations of SA (2.27 = 16 × 4 / (113 × 50) × 40 × 5 solves)
Speed up of 40×, or > 1 week vs. 8 h on 32 CPUs.
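The cost accounting above, reproduced in code (work counted in units of one Helmholtz solve per shot-frequency pair, relative to the full data volume):

```python
# Full scenario: 113 shots x 50 frequencies, 18 l-BFGS iterations,
# roughly 5 solves per iteration per shot-frequency pair.
full_shots, full_freqs = 113, 50
lbfgs_iters, solves_per_iter = 18, 5
full_cost = lbfgs_iters * solves_per_iter               # work units

# Reduced scenario: 16 simultaneous shots x 4 frequencies, 40 SA iterations,
# scaled by the subsampling ratio relative to the full data volume.
sub_shots, sub_freqs, sa_iters = 16, 4, 40
reduced_cost = sub_shots * sub_freqs / (full_shots * full_freqs) * sa_iters * solves_per_iter
speedup = full_cost / reduced_cost
```
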
Observations
SAA can be applied to supercharge FWI
But it
• requires care with line searches
• does not extend to Newton methods
• is relatively poorly understood mathematically
What does Compressive Sensing have to offer?
Compressive recovery
Consider “Newton” updates of the reduced system as sparse recovery problems
• feasible because of the reduced system size
• imposes transform-domain sparsity on the updates
• corresponds to sparse linearized inversions
Sparse linearized inversion
Invert the adjoint of the Jacobian, i.e., the linearized Born approximation
$$\delta d_k = K[m_k, Q]\,\delta m_k \quad\text{with}\quad \delta d_k = \operatorname{vec}(P - F[m_k, Q]),$$
with δm_k = S^H x obtained via sparse inversion
$$x = \arg\min_x \|x\|_1 \quad\text{subject to}\quad b = Ax,$$
where
$$A := RM\,K\,S^H = \underline{K}S^H \quad\text{and}\quad b = \underline{\delta d}_k := RM\,\delta d_k.$$
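The ℓ₁ recovery step can be sketched with plain ISTA on the penalized form; the Gaussian A below is a stand-in for RM K Sᴴ, and the solver choice is an assumption for illustration (the actual experiments would use a dedicated basis-pursuit solver):

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in for x = argmin ||x||_1 s.t. b = Ax, solved approximately with
# ISTA on the penalized form 0.5||b - Ax||^2 + lam ||x||_1.
n, m_rows, k = 256, 100, 8
A = rng.standard_normal((m_rows, n)) / np.sqrt(m_rows)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(3000):
    z = x - step * (A.T @ (A @ x - b))        # gradient step on 0.5||b - Ax||^2
    x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)   # soft threshold
```

With 100 randomized measurements of an 8-sparse vector in 256 dimensions, the ℓ₁ solution recovers the support and, up to a small shrinkage bias, the amplitudes; this is the sense in which the sparse updates come with "correct amplitudes."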
Initial model
[Figure: initial velocity model; lateral and depth in units of 15 m, velocities 2000-4500 m/s.]
Linearized sparse inversion
30 simultaneous shots, 10 random frequencies
[Figure: true reflectivity vs. sparse recovery with wavelets.]
Linearized sparse inversion
20 simultaneous shots, 10 random frequencies
[Figure: true reflectivity vs. sparse recovery with wavelets.]
Linearized sparse inversion
10 simultaneous shots, 5 random frequencies
[Figure: true reflectivity vs. sparse recovery with wavelets.]
Linearized sparse inversion
Recovery error in dB; errors for "migration" in parentheses.

n_f'/n_s'     | subsample ratio 0.015 | 0.006        | 0.002
5             | 17.44 (1.32)          | 11.66 (0.78) | 6.83 (-0.14)
1             | 17.53 (1.59)          | 11.89 (1.05) | 7.19 (0.15)
0.2           | 18.22 (1.68)          | 12.11 (1.32) | 7.46 (0.27)
Speed up (×)  | 66                    | 166          | 500
Observations
Reconstruct model updates
‣ from randomized subsamplings
‣ with correct amplitudes (like Newton updates)
Removed the “curse of dimensionality” by reducing the number of RHS’s
Recovery quality depends on degree of subsampling
What about stochastic optimization?
Reduced batch (10×)
[Figure: gradient update; lateral and depth in units of 15 m.]
Stochastic mini batch (66×)
[Figure: gradient update; lateral and depth in units of 15 m.]
Stochastic average approximation (66×)
[Figure: gradient update; lateral and depth in units of 15 m.]
Stochastic mini batch
Algorithm 1: FWI without renewal of CS experiments.
Result: estimate for the model m̃
m ← m₀ ;                                        // initial model
j ← 0 ;                                          // loop counter
Q ← (RM)Q ;                                      // draw random sim. shots
while ‖P − F[m, Q]‖²₂,₂ ≥ ε do
    j := j + 1 ;                                 // increase counter
    A ← K[m, Q]Sᴴ ;                              // calculate Jacobian
    δP = P − F[m, Q] ;                           // calculate residual
    δx̃ = argmin_δx ‖δx‖₁ s.t. ‖δP − A δx‖₂ ≤ σ ; // sparse recovery
    m ← m + Sᴴ δx̃ ;                              // compute model update
end
SA mini batch
Algorithm 1: FWI with renewal of CS experiments.
Result: estimate for the model m̃
m ← m₀ ;                                        // initial model
j ← 0 ;                                          // loop counter
Q ← (RM)ᵢQ ;                                     // draw random sim. shots
while ‖P − F[m, Q]‖²₂,₂ ≥ ε do
    j := j + 1 ;                                 // increase counter
    A ← K[m, Q]Sᴴ ;                              // calculate Jacobian
    δP = P − F[m, Q] ;                           // calculate residual
    δx̃ = argmin_δx ‖δx‖₁ s.t. ‖δP − A δx‖₂ ≤ σ ; // sparse recovery
    m ← m + Sᴴ δx̃ ;                              // compute model update
    Q ← (RM)ᵢQ ;                                 // draw a new random sim. shot
end
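The structure of the renewal loop can be mimicked on a linear toy problem: a fixed random "Jacobian" K, identity sparsity basis S, a fresh random mixing RM per outer iteration, and a few ISTA steps standing in for the sparse-recovery subproblem. Everything here is an illustrative stand-in, not the slides' actual operators or solver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear toy mimicking Algorithm 1 with renewals: data P = K m_true for a
# sparse m_true; each outer iteration draws a new random mixing RM, forms
# the subsampled residual and Jacobian, and applies a sparse update.
n_model, n_src, n_sim = 128, 64, 48
K = rng.standard_normal((n_src, n_model)) / np.sqrt(n_src)
m_true = np.zeros(n_model)
m_true[rng.choice(n_model, 8, replace=False)] = rng.standard_normal(8)
P = K @ m_true                                  # full "data"

def ista(A, b, lam=1e-3, n_iter=400):
    # approximate argmin 0.5||b - Ax||^2 + lam ||x||_1
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - b))
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return x

m = np.zeros(n_model)
for _ in range(5):                              # outer loop with renewal
    RM = rng.standard_normal((n_src, n_sim)) / np.sqrt(n_sim)  # fresh random mixing
    dP = RM.T @ (P - K @ m)                     # subsampled residual
    A = RM.T @ K                                # subsampled Jacobian
    m = m + ista(A, dP)                         # sparse model update
```

Each renewal exposes the residual to a different random mixture, so recovery errors from one draw are not reinforced in the next, which is the motivation for drawing a new RM inside the loop.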
Update with sparse recoveries
10 iterations
[Figure: recovered model with renewal (SNR = 20.8) and without renewal (SNR = 19.9); lateral and depth in units of 15 m, velocities 1500-5000 m/s.]
Updates
[Figure: 1st update and 9th update; lateral and depth in units of 15 m.]
Conclusions
Reconstruct “Newton-like” updates from randomized subsamplings
Remove the “curse of dimensionality”
Algorithms have parallel pathways
Results are encouraging but rigorous theory still lacking
Acknowledgments
This work was in part financially supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (22R81254) and the Collaborative Research and Development Grant DNOISE (334810-05).
This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BP, Petrobras, and WesternGeco.
Further reading
Compressive sensing
– Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, by Candès et al., '06.
– Compressed sensing, by D. Donoho, '06.
Simultaneous acquisition
– A new look at simultaneous sources, by Beasley et al., '98.
– Changing the mindset in seismic data acquisition, by Berkhout, '08.
Simultaneous simulations, imaging, and full-wave inversion
– Faster shot-record depth migrations using phase encoding, by Morton & Ober, '98.
– Phase encoding of shot records in prestack migration, by Romero et al., '00.
– Efficient seismic forward modeling using simultaneous random sources and sparsity, by N. Neelamani et al., '08.
– Compressive simultaneous full-waveform simulation, by FJH et al., '09.
– Fast full-wavefield seismic inversion using encoded sources, by Krebs et al., '09.
– Randomized dimensionality reduction for full-waveform inversion, by FJH & X. Li, '10.
Stochastic optimization and machine learning
– A stochastic approximation method, by Robbins and Monro, 1951.
– Neuro-dynamic programming, by Bertsekas, '96.
– Robust stochastic approximation approach to stochastic programming, by Nemirovski et al., '09.
– Stochastic approximation and recursive algorithms and applications, by Kushner and Yin.
– Stochastic approximation approach to stochastic programming, by Nemirovski.