
17th International Symposium on Applications of Laser Techniques to Fluid Mechanics Lisbon, Portugal, 07-10 July, 2014


Curvature-based spatially adaptive sampling of heteroskedastic data with application to Laser Doppler Velocimetry

Jesse S. Kadosh (1), Raf Theunissen (1,*)

1: Department of Aerospace Engineering, University of Bristol, Bristol, UK

* Corresponding author: [email protected]

Abstract

Spatially varying signals are typically sampled by collecting uniformly spaced samples irrespective of the signal content. For signals with inhomogeneous information content, this leads to unnecessarily dense sampling in regions of low interest, insufficient sample density at important features, or both. A new adaptive sampling technique is presented which directs sample collection in proportion to local information content, capturing short-period features adequately while sampling less dynamic regions sparsely. The proposed method incorporates a data-adapted sampling strategy on the basis of signal curvature, sample space-filling, variable experimental uncertainty and iterative improvement. Numerical assessment has indicated a reduction in the number of samples required to achieve a predefined overall uncertainty level while improving local accuracy for important features. The potential of the proposed method has been further demonstrated with Laser Doppler Velocimetry experiments examining the wake behind a disk and the velocity gradient near the wall in a turbulent boundary layer.

1. Introduction

The construction of a model to serve as surrogate for a physical response is the expression of an archetypal human aspiration: to reach the destination without pursuing the journey to get there. For physical models, this is to characterise the response, its values and the certainty with which they are known, without traversing the response landscape at infinitely many points. This impossible ideal can be converted into a realisable ambition if we agree to traverse part of the journey and admit the possibility of incomplete knowledge at the destination. Surrogate models are built by testing sample points of the target response to provide a heuristic over the whole domain, albeit subject to uncertainty; using all available information to ensure that new sample points are placed where they give the most new information moves towards the ideal.

In the engineering community, the Nyquist sampling criterion is well known and dictates the required distance between neighbouring sample points on the basis of the maximum occurring signal frequency. To capture the features of a sinusoid completing B cycles over a domain of length L, for example, at least 2B uniformly spaced samples are required, corresponding to a sample spacing of L/2B. This represents an ideal case with perfect, uniform placement of all samples. Where information is distributed inhomogeneously, however, such an approach may lead to inadequate sampling and an accompanying misuse of computational and acquisition effort.

Previous attempts at adaptive sampling have demonstrated potential benefits. In PIV image processing, Theunissen et al. (2007) attributed higher sampling densities to areas of higher variance in velocity and achieved higher spatial resolution. However, the adaptivity criteria were inherently defined by first-order gradients, while discretisation requires denser sampling in regions of stronger curvature due to the nonlinearity of the signal. Existing adaptive techniques have focused on the balance between space-filling and local refinement. A successful method closely related to the one presented here is that of Mackman and Allen (2010), which consults the Laplacian to guide local refinement and models the spacing between samples with an interpolant.
While such methods offer drastic improvements in the case of deterministic data, the introduction of experimental noise presents particular challenges: random fluctuations contaminate measures of spatial curvature, and regions of high and low uncertainty are treated without account of their differences. The approach presented herein alleviates the constraints on sampling imposed by spurious curvature measures by including the local measurement error as a heuristic driving the sampling distribution. Moreover, the formulation of the error objective used here admits heteroskedastic error profiles: rather than considering only a domain-wide scalar value for uncertainty, locally varying error information can be exploited to focus more samples on regions of higher uncertainty or volatility, such as regions of unsteady flow. In an iterative manner the number of attributed samples is incremented using a Radial Basis Function interpolant as an artificial target signal from which the adaptivity criteria are derived. The conduciveness of the presented


methodology is assessed on the basis of computer-simulated signals and a one-dimensional LDV experiment.

2. Methodology

2.1 Surrogate Model

Radial basis function (RBF) interpolation is presented in detail in Fasshauer (2007) and polynomial interpolation is discussed in Fox (1997). In comparison with other interpolation schemes such as splines, polynomial response models, and Kriging, RBFs offer several advantages: an RBF interpolant is conceptually simple in construction, meshless (data can be arbitrarily spaced), easily extendible to higher dimensions, and provides models of arbitrary smoothness (Wendland 2005). First developed to reconstruct complex geographical landscapes (Hardy 1971), RBFs typically outperform polynomials in terms of reconstruction accuracy (Hussain et al. 2002); and even the ability of polynomials to model simple, convex responses of low dimensionality (Hussain et al. 2002, Paiva et al. 2009), which resemble the polynomial itself, can be subsumed into the RBF approach by including polynomial terms. If the field of application is particularly constrained, it is possible to achieve very convincing predictions of the response by adjusting model parameters, as in Kriging; but the resulting model is tailored closely to a specific type of response, and the parameters so tuned are unsuitable for experiments with very different response shapes or scales. The greater the number of parameters in a surrogate model, the closer the link between the physics of the particular response being modeled and the model itself; the advantage, as in Kriging, is a model with impressive predictive ability, but at the price of limited applicability to different response cases. Therefore, where parameters are necessary, their values should either be expressed in terms of other quantities (allowing adaptivity for different experiments) or fixed for all cases. For a wide range of applications, then, the remaining parameters ideally become robust constants rather than demanding case-specific tuning. The response of an RBF-cum-polynomial interpolant s of a signal f, using N basis functions, can be formulated as

s(\mathbf{x}_i) = \sum_{j=1}^{N} \beta_j \, \phi_{i,j} + p(\mathbf{x}_i)    (1)

Here φ_{i,j} is the contribution of the jth basis function evaluated at location x_i. As an interpolant, s recovers the signal exactly at all data sites, while a polynomial p(x_i) is added to aid recovery of the signal at locations between the data sites. Available basis functions are many and varied, with popular choices including thin plate splines (φ = r² log r), cubic (φ = r³), quintic (φ = r⁵), Gaussian (φ = exp(−r²/2σ²)), multiquadric (φ = √(r² + σ²)), and others (Forrester et al. 2009, Hussain et al. 2002). In the current method, the fourth-order-continuous radial basis function defined by Wendland (2005) is used in order to allow for a continuous and smooth analytic Laplacian. Wendland's basis functions are of minimal polynomial degree (hence also minimum expense) for a given smoothness; moreover, they are compactly supported and positive definite. The latter property ensures solvability of the interpolation problem and also gives Wendland RBFs their relatively better matrix conditioning compared with other basis functions (Fornberg et al. 2002) as the number of samples or the support radius is increased. Wendland's fourth-order-continuous RBF for up to a maximum of three spatial dimensions is given by

\phi(r) = (1 - r)_+^6 \, (35 r^2 + 18 r + 3)    (2)

where (\cdot)_+ denotes the cut-off function such that

(1 - r)_+ = \begin{cases} 1 - r, & 1 - r \ge 0 \\ 0, & 1 - r < 0 \end{cases}    (3)

The contribution from each basis site to the interpolant depends on the distance scaled by a support radius R which determines the region of influence of each basis function:

r = \frac{\lVert \mathbf{x} - \mathbf{x}_i \rVert}{R}    (4)
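To make Equations (1)-(4) concrete, the following minimal sketch (Python with NumPy; not from the paper) fits and evaluates a one-dimensional interpolant built from Wendland's C4 basis function. The helper names, the example signal, and the omission of the polynomial term p(x) are illustrative choices.

```python
import numpy as np

def wendland_c4(r):
    """Wendland's fourth-order-continuous basis function, Eq. (2); support on r < 1."""
    return np.where(r < 1.0, (1.0 - r)**6 * (35.0*r**2 + 18.0*r + 3.0), 0.0)

def rbf_fit(x_sites, y, R=0.5):
    """Solve for the coefficients beta of Eq. (1) (polynomial term omitted for brevity)."""
    r = np.abs(x_sites[:, None] - x_sites[None, :]) / R   # scaled distances, Eq. (4)
    return np.linalg.solve(wendland_c4(r), y)

def rbf_eval(x, x_sites, beta, R=0.5):
    """Evaluate the interpolant s(x) of Eq. (1) at arbitrary locations x."""
    r = np.abs(x[:, None] - x_sites[None, :]) / R
    return wendland_c4(r) @ beta

# Usage: the interpolant reproduces the sampled values exactly at the data sites.
x_sites = np.linspace(0.0, 1.0, 9)
y = np.exp(-40.0 * (x_sites - 0.5)**2)      # a Gaussian bump as an example target
beta = rbf_fit(x_sites, y)
s = rbf_eval(np.linspace(0.0, 1.0, 201), x_sites, beta)
```

With the compactly supported Wendland function the interpolation matrix becomes sparse for small R; the default R = 0.5 anticipates the robust choice discussed below.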

Optimum selection of the support radius value has been a topic of ongoing discussion for many years with


authors suggesting sophisticated parameter estimation schemes to suit each individual response (Rippa 1999, Scheuerer 2011) and others proposing to use a constant value that is robust over most responses (Mackman and Allen 2010). Finding a single appropriate support radius to suit the spacing of samples, known as stationary interpolation, is simple in cases where samples are spaced with little variation from uniformity: for instance, the supremum nearest-neighbour distance can be used as R (Mackman and Allen 2010). The shortcoming of this approach is that there is but one value of R for all sample spacings; if samples are clustered densely in certain regions, as is the goal of adaptive sampling, then a single value of R based on the supremum nearest-neighbour distance will be ill-suited everywhere except in the sparsest regions. When too large a value of R is used, matrix conditioning suffers and the RBF interpolant exhibits spurious overshoots reminiscent of the Gibbs phenomenon for polynomial interpolants. Using the infimum nearest-neighbour distance instead guarantees that no such oscillations will occur, but this comes at the expense of poor accuracy in regions of low sample density. Setting a constant R = 1 has been found previously to yield good results, as each basis function can then contribute throughout the entire domain; but this also comes at the price of spurious Gibbs-like oscillations and poorer matrix conditioning. Experience has shown that R = 0.5 is a robust choice for a wide variety of responses. This removes the expense of tuning R for each case or for each distribution of samples, and also strikes a balance between good accuracy throughout the domain and freedom from spurious oscillations and matrix conditioning problems.

Using RBF interpolants with compactly supported basis functions of minimal degree for a given order of continuity offers numerical advantages. In particular, analytic derivatives of the RBF interpolant can be computed inexpensively without recalculating the interpolation coefficients, a property the proposed method exploits. The interpolants are neither restricted to equidistant sample locations nor do they require any case-specific partitioning of the domain.

2.2 Sampling Objective Function

Having defined the model s of the target signal, an objective function J is used to decide which locations are most promising for new samples. The objective function is the product of several factors, each representing a different aim in guiding the sampling scheme. A first estimate of the heuristic for measuring curvature is given by the model's Laplacian, J_L = ∇²s. A smooth signal, such as might be obtained in deterministic (computer) experiments with no real noise, may have a smooth Laplacian, but physical measurements come with stochastic errors in the form of noise. These give rise to a model Laplacian which is erratic and not always true to that of the underlying function in the signal. For this reason the new adaptive method blends the Laplacian with a measure J_Δ of local signal change relative to the wider signal environment. This factor has the useful tendency of approximating the true signal's Laplacian even in the presence of noise, though it makes no use of the RBF model's analytic derivatives. The curvature objective factor J_C is the smoothed product of J_L and J_Δ, normalised onto [0, 1]:

J_C = J_L \, J_\Delta    (5)

To encourage sampling of the domain toward a space-filling scheme, sample separation is modeled as an RBF interpolant (without a polynomial) after Mackman and Allen (2010). The spacing objective J_h is zero at existing sample locations and grows away from sample sites toward a value of unity:

J_h(\mathbf{x}_i) = 1 - \sum_{j=1}^{N} \beta_{h,j} \, \phi(\mathbf{x}_i - \mathbf{x}_j), \qquad \text{where } J_h(\mathbf{x}_i) = 0 \text{ for } i = j    (6)

To ensure exclusion of existing sample sites, the overall sampling objective also includes an exclusion factor J_ho, which represents the limit of J_h with an infinitesimal support radius, i.e. essentially a Dirac rake with zeros at the sample sites and unity elsewhere. A significant expansion beyond existing adaptive sampling approaches is the added consideration of experimental error/uncertainty through the factor J_E, which consists of an RBF interpolant to the error bars of the existing data:

J_E(\mathbf{x}_i) = \sum_{j=1}^{N} \beta_{E,j} \, \phi(\mathbf{x}_i - \mathbf{x}_j)    (7)

When samples are collected with information about their measurement uncertainty (represented as error bars), it is sensible to account for this when prescribing new sample sites. Extending sampling adaptivity in


this way can focus on regions with larger experimental error, with the aim either of gaining a more accurate model in the region of high uncertainty by obtaining more exploratory points, or of better characterizing (reducing) the uncertainty by repeated sampling. Account is taken of the extent to which previous samples change the model of the signal through the factor J_I, which quantifies the change from the model z^(t-1) of the previous iteration to the current model z^(t) at every interrogation site x_i:

J_I(\mathbf{x}_i) = \frac{\left| z^{(t)}(\mathbf{x}_i) - z^{(t-1)}(\mathbf{x}_i) \right|}{\left| z^{(t-1)}(\mathbf{x}_i) \right| + 1}    (8)

This demonstrated improvement factor is inspired by the implicit assumption that a site at which a great improvement (change) in the signal model has taken place is more likely to lie within an interval of interesting signal features than at its fringes. The objective factors are combined to yield the overall sampling objective function which forms the basis upon which to determine new sampling sites:

J = (J_h + \kappa_h)(J_C + \kappa_C)(J_E + \kappa_E)(J_I + \kappa_I) \, J_{ho}    (9)

where the κ_i are empirical offset constants which prevent the associated factors J_i from dominating the overall objective function in cases where J_i ≈ 0. Rather than selecting the highest value in J, as was done in prior adaptive methods, the new sampling location is decided based on the width and amplitude of all peaks exceeding a threshold value. The threshold is itself based on the mean value at the peaks of J and is in this sense fully adaptive to the individual signal. The entire process of evaluating the objective function and choosing a new sample site is repeated, increasing the total number of samples and updating J iteratively until 'enough' samples have been collected.

2.3 Convergence Criteria for Automatic Stopping

The method described to this point determines new sample sites of interest and updates the surrogate model as new sample data are acquired; however, like recent closely related adaptive sampling techniques (Mackman and Allen 2010) applied to contexts without truly stochastic errors (e.g., CFD data), it does not come with criteria for deciding when to stop acquiring new samples. Instead, the number of initial non-adaptive samples and the number of adaptive sampling iterations are typically chosen at the outset. It has been suggested (Mackman and Allen 2010) that sampling continue until one of two conditions is met: (a) the predetermined budget of points is consumed, or (b) the difference between the surrogate model and the known function reaches an acceptable threshold value. In the former case, while the resulting surrogate model may capture a response more accurately than a full factorial sample scheme would have done, it will only achieve this at the same expense in terms of number of samples. The second condition is limited to use in benchmark testing of the sampling method rather than with real experiments, because the true function needs to be known before a comparison with the current surrogate model is possible. The present work addresses this shortcoming, rendering the adaptive sampling method capable of complete automation.

Related methods of constructing surrogate models, such as Kriging (Forrester et al. 2009), could use the Kriging variance (mean squared error, MSE), a statistical estimate of uncertainty, as a stopping criterion. MSE provides uncertainty estimates at any point x_i, so it could be summed over the domain to produce an overall value to be monitored for convergence. The analogous uncertainty estimate for RBFs is given by the product of the native space norm with the power function (Fasshauer 2007, Jakobsson et al. 2009). The native space norm has been viewed as a measure of how 'bumpy' the response is, while the power function accounts for the effects of sample distribution and RBF parameters. Their product provides an upper bound for modeling error (Jakobsson et al. 2009). However, MSE and its RBF equivalent remain abstract estimates of uncertainty which take into account only modeling error and lack any local dependence on response values (Mackman et al. 2013) or on the heteroskedastic experimental uncertainty at different points. Moreover, expressions for MSE and the RBF power function do not lend themselves to subsuming other information, most notably omitting the experimental uncertainty associated with each data point. Instead, two factors whose physical meanings are clear have been crafted based on the aims embodied in the sampling objective itself. These furnish a convergence value which can be used as a threshold against which the improvement factor J_I is compared. To ensure that the process halts due to progressive convergence rather than due to a single iteration which happens to meet the threshold, the present method requires the threshold to


be met for a number of consecutive iterations: if the average J_I over the last N_CCI iterations falls below the threshold θ, the stopping criterion is satisfied and the algorithm terminates. The sampling objective function in this work includes two particularly salient emphases: regions of high uncertainty relative to response values, and regions where the response is relatively dynamic, e.g., as measured by the curvature factor. Recasting these elements to suit convergence assessment, we define the nondimensional error factor as

\alpha = \frac{\bar{\varepsilon}}{y_{max} - y_{min}}    (10)

where \bar{\varepsilon} is the mean of the absolute sample uncertainties and y_max and y_min are the maximum and minimum sample responses, respectively. The contrast is defined as

\beta = \frac{\left|\Delta Y_{peak}\right|_{max} - \left|\Delta Y_{peak}\right|_{min}}{\left|\Delta Y_{peak}\right|_{max} + \left|\Delta Y_{peak}\right|_{min}}    (11)

where ΔY_peak denotes the list of peak values in the graph of ΔY:

\Delta Y = s - s_{smooth}    (12)

The aim of the contrast β is to quantify how strongly the sampling algorithm will be attracted to regions of high contrast, focusing for several iterations on a highly peaked region of the response before venturing into regions with subtler features. Peli (1990) outlined the need for assessing local contrast, rather than using a global contrast value, in the study of human visual image detection; a similar rationale applies to the ability of the present algorithm to discern features in an experimental response. Circumventing wherever possible any case specificity in the present sampling method, the contrast does not assess the model response s itself but rather ΔY, in which the smoothed model response s_smooth has been subtracted as a more agile form of background removal than simply subtracting the overall response mean. However, unlike the human visual system, the present sampling method is not differentially sensitive to regions of 'light' and 'dark' (high and low response values), so the classic Michelson (1927) contrast definition, which provides a single maximum contrast value for the whole domain, is sufficient here since ΔY already accounts for local variation. Treating peaks and troughs on an equal footing as features away from the local background, the absolute value is taken. The resulting form of β spans [0, 1], with zero indicating low contrast (i.e., the smallest peak feature is comparable in magnitude to the largest peak feature) and unity indicating high contrast (i.e., the smallest peak feature magnitude is negligible compared with the largest peak feature). The nondimensional error factor and the contrast are combined in a 1:1 ratio whose value controls a sliding scale for the automatic calculation of the convergence threshold:

\theta = \theta_{tol} + \theta_{coarse}\left(1 - \frac{\alpha + \beta}{2}\right)_{+} + \theta_{fine}\left(\frac{\alpha + \beta}{2}\right)    (13)

where θ_tol, θ_coarse, and θ_fine are constants matched to the nondimensional improvement factor J_I discussed earlier with the objective function, and (·)_+ is the cut-off function defined above. Having quantified the convergence criterion threshold, the required number of consecutive converged iterations N_CCI is

N_{CCI} = \left\lceil K_{CCI} \left(1 + \frac{\alpha + \beta}{2}\right) \right\rceil    (14)

where K_CCI is a constant for all cases and ⌈·⌉ denotes the natural-number ceiling. Since N_CCI is determined adaptively for a given response rather than prescribed as a case-specific parameter, the method is completely automated. Figure 1 shows the convergence history for an example test case of two Gaussians on a flat background with a heteroskedastic error envelope (cf. Figure 2). This illustrates how the convergence criteria adapt as more is learned about the response. The first eight adaptive samples refine regions of the response which turn out to be well known already; but the ninth sample discovers a new feature, leading to a substantial improvement J_I well above the threshold θ (although the threshold also adjusts slightly upward when the new feature is found). The new feature in the surrogate model changes the contrast and nondimensional error assessments, resulting in a reduction of the required number of consecutive converged iterations from eight to seven. Whereas seven converged iterations were insufficient earlier to declare an end to the sampling (when N_CCI = 8), seven converged iterations are sufficient after the discovery of the new feature since now N_CCI = 7.
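The objective combination of Equation (9) and the stopping rule of Equations (10)-(14) can be pulled together into a compact sampling loop. The sketch below (Python, continuing the earlier listings and reusing rbf_fit, rbf_eval, spacing_objective and error_objective) is a simplified reading of the method, not the authors' code: the constants kappa, theta_tol, theta_coarse, theta_fine and K_CCI are unspecified in the paper and set here to arbitrary illustrative values; the Laplacian is approximated by finite differences instead of the analytic RBF derivatives; J_Δ is approximated as the deviation from a smoothed model; and the new site is chosen as the argmax of J rather than by the paper's peak-width weighting.

```python
def smooth(v, k=9):
    """Moving-average smoothing, used as a stand-in for s_smooth in Eq. (12)."""
    return np.convolve(v, np.ones(k) / k, mode="same")

def normalise(v):
    """Map a field onto [0, 1]."""
    rng = v.max() - v.min()
    return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

def sampling_objective(x, s, s_prev, x_sites, err_bars, kappa=0.05, R=0.5):
    """Overall objective J of Eq. (9) over a dense grid x of candidate sites."""
    J_L = normalise(np.abs(np.gradient(np.gradient(s, x), x)))  # curvature proxy
    J_D = normalise(np.abs(s - smooth(s)))        # local change vs. wider environment
    J_C = normalise(smooth(J_L * J_D))            # curvature factor, Eq. (5)
    J_h = spacing_objective(x, x_sites, R)        # space-filling factor, Eq. (6)
    J_E = normalise(error_objective(x, x_sites, err_bars, R))   # error factor, Eq. (7)
    J_I = np.abs(s - s_prev) / (np.abs(s_prev) + 1.0)           # improvement, Eq. (8)
    J_ho = np.ones_like(x)                        # exclusion factor: zero at sample sites
    J_ho[np.searchsorted(x, x_sites).clip(0, len(x) - 1)] = 0.0
    J = (J_h + kappa) * (J_C + kappa) * (J_E + kappa) * (J_I + kappa) * J_ho
    return J, float(np.mean(J_I))

def contrast(s):
    """Contrast beta of Eqs. (11)-(12) from the peak magnitudes of s - s_smooth."""
    dY = np.abs(s - smooth(s))
    peaks = dY[1:-1][(dY[1:-1] > dY[:-2]) & (dY[1:-1] > dY[2:])]
    if peaks.size < 2:
        return 0.0
    return float((peaks.max() - peaks.min()) / (peaks.max() + peaks.min()))

def converged(JI_hist, alpha, beta,
              theta_tol=1e-4, theta_coarse=1e-2, theta_fine=1e-3, K_CCI=4):
    """Stopping test assembled from Eqs. (10)-(14); all constants are illustrative."""
    h = 0.5 * (alpha + beta)
    theta = theta_tol + theta_coarse * max(1.0 - h, 0.0) + theta_fine * h  # Eq. (13)
    N_CCI = int(np.ceil(K_CCI * (1.0 + h)))                                # Eq. (14)
    return len(JI_hist) >= N_CCI and np.mean(JI_hist[-N_CCI:]) < theta

def adaptive_loop(measure, x, x_sites, y, err, max_iter=60):
    """Add one sample per iteration until converged.
    measure(x_new) -> (value, error_bar) is a hypothetical acquisition call."""
    s_prev, JI_hist = np.zeros_like(x), []
    for _ in range(max_iter):
        s = rbf_eval(x, x_sites, rbf_fit(x_sites, y))
        J, JI_mean = sampling_objective(x, s, s_prev, x_sites, err)
        JI_hist.append(JI_mean)
        alpha = np.mean(np.abs(err)) / (y.max() - y.min())                 # Eq. (10)
        if converged(JI_hist, alpha, contrast(s)):
            break
        x_new = x[np.argmax(J)]                  # simplified choice of the next site
        y_new, e_new = measure(x_new)
        x_sites = np.append(x_sites, x_new)
        y, err = np.append(y, y_new), np.append(err, e_new)
        s_prev = s
    return x_sites, y, err
```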

Figure 1: Convergence history for the double-Gaussian example case (cf. Figure 2).

3. Results

A number of simple one-dimensional test cases demonstrate the versatility and robustness of the new adaptive sampling method. This section presents a set of synthetic experiments which pose important challenges to the scheme. In all test cases, a small initial grid of uniformly spaced samples is provided and the adaptive method proceeds to add one new sample per iteration until the automatic convergence criterion is satisfied. In each test case the surrogate model obtained with adaptive sampling is assessed against that obtained from uniform (full factorial) sampling.

3.1 Assessment Measures

How well a surrogate model represents the target response can be evaluated differently depending on which aspects of the response are most important for a given application. How accurately a surrogate model reconstructs a smooth response for which the analytic form is known can be assessed simply by computing the difference between the known test response and the surrogate model at a set of test points; this is known as the residual, given by

\delta(\mathbf{x}_i) = s(\mathbf{x}_i) - f(\mathbf{x}_i)    (15)

The residual reveals model accuracy in different regions (Hussain et al. 2002, Mackman and Allen 2010). Converting this into a global measure of modeling error over the entire domain for a given test case, we can compute the root mean-square error (RMSE):

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta(\mathbf{x}_i)^2}    (16)

RMSE suffices as a metric as long as there are no peculiar features of a test case which are not accounted for by simple accuracy; hence RMSE is the standard metric for evaluating the performance of surrogate models where the only error source is deterministic (modeling) error (Forrester et al. 2009, Mackman and Allen 2010). Notwithstanding its usefulness in cases where all that matters is the absolute difference between the model and the true response values, RMSE cannot properly assess model performance in cases where experimental uncertainty plays a role. Furthermore, as it gives a global assessment of squared residuals, it ignores the fact that a model may have mediocre performance in mundane regions of a response yet good performance in capturing the most important features of the response, e.g., regions with large curvature. The importance of these considerations will be demonstrated with an LDV application later in this paper. To gain further insight than RMSE on its own allows, two other measures of performance are introduced: the quality value Q_ξ defined as

Q_\xi = \frac{1}{N} \sum_{i=1}^{N} \xi(\mathbf{x}_i)    (17)

where ξ is a quantity of interest and N is the number of test points x_i over which the quality is computed. In the present work, the quality is calculated for experimental error (uncertainty) and for curvature, based on the values of the error and Laplacian objectives J_E and J_Δ, respectively. While the quality cannot be compared between different test cases, it can be compared within the same test case for different surrogate models or methods. Within a given response case, a surrogate model with a larger quality value better captures the quantity of interest.

3.2 Performance in Synthetic Test Cases

Figure 2 depicts the case of two Gaussian functions with constant error throughout most of the domain, except for a bulge at x = 0.75 and a sharp minimum of error coincident with the Gaussian feature on the right. The adaptive sampling is concentrated on the peaks, with minimal expenditure of sampling effort in uniform regions. To facilitate unbiased comparison between results, the same budget of samples was used for both schemes. Table 1 summarises the performance measures for this test case.

Test Case Performance Measures

Signal Description                        Error Description                              Q_E Ratio               Q_C Ratio               RMSE Ratio
                                                                                         (adaptive/full fact.)   (adaptive/full fact.)   (adaptive/full fact.)
C1: Double-Gaussian, flat background      Varying, not coincident with signal features   1.34                    10.82                   0.60
C2: Double-Gaussian, slanting background  Constant                                       1.00                    2.06                    0.55
D1: Sigmoid (step)                        Constant                                       1.00                    2.31                    0.67
E1: Fragmentary sinusoid                  Constant                                       1.00                    1.79                    0.45

Table 1: Performance of adaptive sampling with various test cases, all using an initial grid of 17 full factorial points. Performance values are averaged over five different shifts of the test signal relative to the initial sample points in order to reduce the influence of particularly fortunate or unfortunate initial sample placements on performance assessment.

When reconstructing the signal from the samples, the adaptive method can yield appreciable reduction in root mean squared error (RMSE) compared to the full factorial (uniform) method in cases where signal features are localized within an otherwise mundane background. The adaptive sampling scheme also places extra samples in the region of large experimental error even though the surrogate model value in this area is nearly flat and would garner little interest based on its curvature alone. According to the ratio of error quality (adaptive/full factorial), the two sample schemes in this case capture a similar proportion of high-error features: while the adaptive sampling


routine does pursue high-error regions like the error bulge at x = 0.75, it also pursues the high-curvature region near x = 0.9, where uncertainty is very low. The quality value for curvature confirms much stronger capture of high-curvature regions by the adaptive sampling scheme than by full factorial sampling.

Figure 2: Distribution of samples for the double-Gaussian (C1) test case on a constant background, showing sample clustering in regions of large curvature or large experimental error.

This synthetic test case is devised to demonstrate the impact of the measurement error. Figure 2 presents a case in which some regions of the signal have larger measurement uncertainty than those of higher curvature, i.e., the heteroskedastic case. The resulting distribution of samples indicates less emphasis on regions of high curvature, as the focus extends to regions of relatively large error. Previous adaptive methods (such as Mackman and Allen 2010) have not included a model of the error in guiding sample allocation, and the case of heteroskedastic error illustrates the importance of including such an assessment when gathering data from physical experiments. Despite concentrating samples in the region of larger error, the adaptive method still assigns appreciable sample density to the peak features of the signal, achieving a reduction in signal reconstruction RMSE of 40% compared with the uniform sampling of a full factorial design. Generally the proposed method achieves promising results for signals with inhomogeneous information content (low entropy) and tends towards equidistant sampling distributions for periodic signals. This trend is expected, as a signal with homogeneous information content is known to be sampled optimally by a uniform scheme satisfying the Nyquist sampling criterion.

The C2 test case, comprising two Gaussians with very different amplitudes on a slanting background with constant error (Figure 3), also yields good results with the adaptive sampling method. However, although the initial sampling is sufficient to detect the presence of the smaller Gaussian feature on the right, the large curvature of the Gaussian feature on the left dominates the objective function in this case, drawing all the adaptive samples towards it and never allowing adaptive refinement of the feature on the right. This demonstrates that a minimum ratio of curvatures must be met in order for the adaptive routine to justify placing adaptive samples in a region of relatively low curvature, all other objective considerations being equal.

[Plot: response value and absolute normalised Laplacian value versus x over [0, 1.5]; curves show the true signal, true error envelope, and true Laplacian, with adaptive and full factorial sample locations marked.]


Figure 3: Distribution of samples for the double-Gaussian test case on a slanting background (C2), with sample clustering preferentially at the large-curvature peak.

The sigmoid or step test case (Figure 4) confirms good placement of samples in the vicinity of the important change in the signal, leaving sparse sample density in less interesting regions.

Figure 4: Distribution of samples for the sigmoid test case (D1) with constant error, showing the tendency to pursue the high-curvature region.

Finally, the fragmentary sinusoid test case (Figure 5) shows the same behavior, where adaptive samples chase regions of largest curvature; this is important since the true regions of large curvature are not known a priori but are successively discovered as adaptive samples are added and the surrogate model is gradually updated.

[Plots: response value and absolute normalised Laplacian value versus x over [0, 1.5] for the test cases above; curves show the true signal, true error envelope, and true Laplacian, with adaptive and full factorial sample locations marked.]


Figure 5: Distribution of samples for the fragmentary sinusoid test case (E1) with constant error, showing a focus on regions of large curvature for the adaptive sampling scheme.

4. Applications to LDV Experiments

As the proposed adaptive method suggests new sample locations iteratively, it is best suited to experiments in which data are collected serially. To demonstrate the potential of the proposed method, adaptive sampling was applied to a single-component LDV-based wake characterization. A horizontal velocity profile was measured behind a porous disk (porosity β = 0.14), three disk diameters downstream, at a diameter-based Reynolds number of Re_D = 4.1·10^4. Sampling locations were predicted by the described sampling methodology using velocity statistics (mean and variance) obtained from LDV. Figure 6 presents a comparison between the sample distributions from the full factorial (uniform) sampling approach (sampling distance of 0.1·D) and from the adaptive sampling criteria. Here the adaptive method better resolves the region of higher curvature (the centerline area) while spending minimal acquisition effort in the free-stream areas, which have very low experimental uncertainty and low curvature.

Figure 6: LDV experiment signal with associated error envelope (top); sample distributions obtained from adaptive and full factorial sampling (bottom), showing concentration of adaptive points in the region of large error and large curvature.

[Plot: normalised velocity signal with error envelope (top) and adaptive vs. full factorial sample locations (bottom) versus x over [0, 1.5].]


A separate experiment demonstrates the importance of refining specific features rather than concentrating all effort on RMSE minimization. The turbulent boundary layer on a flat plate downstream of a metal trip wire was studied with laser Doppler velocimetry (LDV) in the University of Bristol's Low Turbulence Wind Tunnel (0.8 m × 0.6 m octagonal test section, turbulence level 0.05%). A Spectra-Physics / Dantec Dynamics LDV system with a three-axis traverse and a Stabilite 2016-05S argon laser source was used to obtain 2D velocity point measurements in the turbulent boundary layer. The wind tunnel free-stream velocity was 19.6 m/s and the temperature inside the test section was 19.8 °C. Data plotted after all points were collected are shown in Figure 7, along with the experimentally observed standard deviation at each point, forming a heteroskedastic uncertainty (error) envelope.

Figure 7: Boundary layer velocity profile collected with LDV (top) and sample distributions for surrogate models based on adaptive and full factorial sample schemes (bottom). The presence of noise is evident in the signal and in its standard deviation (uncertainty).

The surrogate models for the velocity profile obtained using the adaptive and full factorial sampling schemes are shown in close-up in Figure 8 to investigate the area near the wall. It is clear that the full factorial samples cannot adequately capture the steepness of the velocity gradient in this region, whereas the denser local sampling of the adaptive distribution brings its surrogate model closer to the true steepness near the wall. From Figure 8 it is apparent that the adaptive sampling surrogate model exhibits roughly twice as many samples as the full factorial model in this important region, though both models use the same budget of samples over the entire domain.

[Plot: velocity (m/s) versus x, showing the measured signal and error envelope (top) with adaptive and full factorial sample locations (bottom).]


Figure 8: Close-up view of the true and surrogate model values for velocity near the wall.

A common quantity of interest in this context is the shear stress at the wall defined as

\tau_0 = \mu \left. \frac{\partial U}{\partial y} \right|_{wall}    (18)

where ∂U/∂y is the velocity gradient and µ is the dynamic viscosity of air. Typically there are not many data points very close to the wall due to limits in resolution, for instance, so procedures exist for estimating the wall shear stress based upon knowledge of the velocity profile of a fully developed turbulent boundary layer. The method due to Clauser (1956) is one popular such procedure, with a newer method due to Kendall and Koochesfahani (2008) offering additional flexibility. Unfortunately, in the present experiment the geometric limitations of the wind tunnel preclude full development of a turbulent boundary layer: the proximity of the test section to the trip wire upstream means the common estimation methods mentioned are not applicable here. Nevertheless, the experiment provides an illustrative example of the importance of capturing the features of greatest interest in a surrogate model, regardless of the overall RMSE figure of merit. Without recourse to methods such as Clauser's, an estimate of the wall shear stress can be obtained from Equation 18 using the steepest velocity gradient near the wall in each available surrogate model. Table 2 presents the results of this assessment taken at five sequential signal shifts relative to the initial full factorial samples.

Shift   True ∂U/∂y (s⁻¹)   Full Factorial ∂U/∂y (s⁻¹)   % Difference   Adaptive ∂U/∂y (s⁻¹)   % Difference
1       635                180                          -72            370                    -42
2       635                192                          -70            298                    -53
3       635                237                          -63            410                    -36
4       635                208                          -67            274                    -57
5       635                214                          -66            385                    -39

Table 2: Improvement in estimating the velocity gradient near the wall when using adaptive sampling with the objective function targeting regions of large curvature and large uncertainty. The true gradient (635 s⁻¹) is common to all shifts.
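For scale, Equation (18) applied to the Table 2 gradients gives wall shear stresses of the following order. The dynamic viscosity of air near 20 °C (about 1.8·10⁻⁵ Pa·s) is an assumed textbook value, not a figure from the paper:

```python
mu_air = 1.8e-5                      # Pa.s, assumed for air near 20 degC (not from the paper)
for label, dUdy in [("true", 635.0), ("adaptive", 370.0), ("full factorial", 180.0)]:
    print(f"{label:>14}: dU/dy = {dUdy:5.0f} 1/s -> tau_0 = {mu_air * dUdy:.2e} Pa")  # Eq. (18)
```

With the shift-1 gradients, the adaptive surrogate's estimate (about 6.7·10⁻³ Pa) lies considerably closer to the true value (about 1.1·10⁻² Pa) than the full factorial estimate (about 3.2·10⁻³ Pa).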

[Figure 8 plot: velocity (m/s) versus vertical position (m), comparing true, adaptive, and full factorial surrogate values near the wall.]


In this region so close to the wall, the % difference between every surrogate model and the 'true' velocity gradient is very large; but in all cases, the adaptive sampling scheme produces a surrogate model with a smaller % difference. This improved performance is due to the adaptive sampling objective function's focus on regions of larger error or larger curvature; and in this LDV experiment, the region nearest the wall matches both of these objective components. In applications like this one, global measures of accuracy such as RMSE may be misleading: a model with low RMSE may misrepresent a region of particular importance for the given application; conversely, a model with worse RMSE overall may capture the feature of interest very accurately.

5. Conclusions

A new adaptive sampling method has been presented which takes into account sample separation and uniqueness, balanced with signal curvature, error envelope, and demonstrated improvement. For signals with inhomogeneous information content, the adaptive method provides sampling schemes which capture signal features more accurately and/or with fewer samples compared with uniform sampling. Conversely, for signals with homogeneous information content, for which uniform sampling is the optimal design, the adaptive method tends towards this result. The robustness of the adaptive sampling method presented here in the presence of experimental noise and heteroskedasticity has been demonstrated with physical LDV experiments examining the wake behind a disk and the turbulent boundary layer over a flat plate.

References

Clauser, F.H. (1956) The turbulent boundary layer. Adv Appl Mech, 4: 1-51.
Fasshauer, G.E. (2007) Meshfree Approximation Methods with Matlab. World Scientific, NJ.
Fornberg, B., Driscoll, T., Wright, G., and Charles, R. (2002) Observations on the behavior of radial basis function approximations near boundaries. Comput Math Appl, 43: 473-490.
Forrester, A.I., and Keane, A.J. (2009) Recent advances in surrogate-based optimization. Prog Aero Sci, 45: 50-79.
Fox, J. (1997) Applied Regression Analysis, Linear Models, and Related Methods. Sage Publications, Beverley Hills, CA.
Hardy, R.L. (1971) Multiquadric equations of topography and other irregular surfaces. J Geophys Res, 76: 1905-1915.
Hussain, M.F., Barton, R.R., and Joshi, S.B. (2002) Metamodeling: radial basis functions, versus polynomials. Eur J Oper Res, 138: 142-154.
Jakobsson, S., Andersson, B., and Edelvik, F. (2009) Rational radial basis function interpolation with application to antenna design. J Comput Appl Math, 233: 889-904.
Kendall, A., and Koochesfahani, M. (2008) A method for estimating wall friction in turbulent wall-bounded flows. Exp Fluids, 44: 773-780.
Mackman, T.J., and Allen, C.B. (2010) Investigation of an adaptive sampling method for data interpolation using radial basis functions. Int J Numer Meth Eng, 83: 915-938.
Mackman, T.J., Allen, C.B., Ghoreyshi, M., and Badcock, K. (2013) Comparison of adaptive sampling methods for generation of surrogate aerodynamic models. AIAA J, 51: 797-808.
Michelson, A.A. (1927) Studies in Optics. Dover Publications, Mineola, NY.
Paiva, R.M., Carvalho, A.R.D., Crawford, C., and Suleman, A. (2009) A comparison of surrogate models in the framework of an MDO tool for wing design. 50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference.
Peli, E. (1990) Contrast in complex images. J Opt Soc Am A, 7: 2032-2040.
Rippa, S. (1999) An algorithm for selecting a good value for the parameter c in radial basis function interpolation. Adv Comput Math, 11: 193-210.
Scheuerer, M. (2011) An alternative procedure for selecting a good value for the parameter c in RBF-interpolation. Adv Comput Math, 34: 105-126.
Theunissen, R., Scarano, F., and Riethmuller, M.L. (2007) An adaptive sampling and windowing interrogation method in PIV. Meas Sci Technol, 18: 275-287.
Wendland, H. (2005) Scattered Data Approximation. Cambridge University Press, Cambridge, UK.