noise filtering: the ultimate solution

1
Noise filtering: the ultimate solution Summary The ulmate soluon of the problem "Which smoothing method is beer" is offered. A method of noise filtering based on confidence interval evaluaon is described. In the case of the approximaon of a funcon, measured with er- ror by a polynomial or other funcons that allows esmaon of the confidence interval, the minimal confidence interval is used as a criterion for the selecon of the proper parameters of the approximang funcon. In the case of the pol- ynomial approximaon opmized parameters include the degree of the poly- nomial, the number of points (window) used for the approximaon, and the posion of the window center with respect to the approximated point. The special consideraons on confidence interval evaluaon and quality of polyno- mial fit using noise properes of the data array are discussed. The Method pro- vides the lowest possible confidence interval for every data point. The Method is demonstrated using generated and measured data. Improve- ment of noise reducon compared to compeng methods can vary depending on the input data, but always exists. Excellent noise reducon properes are combined with conserved object shape (e.g. chromatographic peak, photo- graphic object) without arfacts. The method requires extra computaons, which can be easily paralleled. Filtering white noise Triangular peak Signal to noise rao of the peak Area Height . Discussion Confidence interval is a very natural criterion of approximaon quality and it perfectly fits the case of noise filtering. In the case of variable window width and/or degree of the polynomial addional fit quality criteria based on noise esmate have to be applied to avoid effects of accidental good fit for small ap- proximaon windows and peak suppression in wide windows. The algorithm of Confidence Filter very effecvely suppresses baseline noise and significantly improves detecon and quanficaon limits. Even non-white noise can be suppressed, such as pump pulsaons, chemical noise; in addion peak shape does not suffer. No doubt, methods that ulize addional infor- maon about signal and noise, such as reconstrucon of pump pulsaon pro- file and subtracon of this profile from the signal before noise filtering, may give beer results, but they will be much more expensive, as will require build- ing a model of the process. The math used to compute confidence interval (Formula 2) is based on as- sumpon of homoscedasc noise. The case of heterocsedasc noise is much more complicated and in some cases can be avoided by scaling of the raw data. As an example of such scaling one can imagine taking square root of the signal before smoothing and then using square for final answer – in the case of any counng detector such a procedure has good theorecal base. The Confidence algorithm can be extended to data of higher dimensions, e.g. 2-D, by approximaon of smoothed data in different dimension. The heter- oscedasc noise model in this case is a big benefit, as for the second dimen- sion, regression may gain addional accuracy by accounng for different confi- dence intervals of data points. References 1. Savitzky, A.; Golay, M.J.E. (1964). "Smoothing and Differenaon of Data by Simplified Least Squares Procedures". Analycal Chemistry 36 (8): 1627– 1639. 2. Felinger A.; Data analysis and signal processing in chromatography / Data Handling in Science and Technology – v.21 ELSEVIER, 1998. 3. Linear Regression Analysis (Wiley Series in Probability and Stascs) by George A. F. Seber and Alan J. Lee (Feb. 5, 2003). 4. McWilliam, I. G.; Bolton, H. C., Instrumental Peak Distoron. I. Relaxaon Time Effects, Anal. Chem. 1969, 41, 1755-1762. 5. Ricardo Maronna, Doug Marn and Victor Yohai, Robust Stascs - Theory and Methods, Wiley, 2006 Yuri Kalambet, Yuri Kozmin*, Sergey Maltsev Ampersand Ltd., Moscow, Russia *Institute of Bioorganic Chemistry RAS, Moscow, Russia Theory where n - number of data points used for polynomial approximaon (gap of the filter); p - power of the polynomial; X - matrix of x power values on independent axis (me); Y - vector of detector response values; - Student’s coefficient for confidence probability (1-δ) and m degrees of freedom x * - posion at which smoothed (approximated) value is esmated. Handling baseline steps Non-white noise * ) 2 / 1 ( u S t C p n Y p n S ) ˆ ( ) ˆ ( 2 β X Y β X Y * 1 * x X) X ( x * u Y X X X β 1 ) ( ˆ } ,..., , 1 { * * p x x * x m t 2 2 2 Baseline Peak Area C w C C i 2 2 ) ( 2 Baseline x Y h C C C h a) Filtering arficial chromatogram of EMG *4+ peak with white noise applied; solid line – original data (Noisy); dashed line – Confi- dence filter with noise definion width of 31; b) Approximaon window size depend- ing on window posion. Pump pulsaons are effecvely suppressed by Confidence filter using noise definion window width of 121 (dashed line). Solid line – original data. When narrow noise definion window 11 (corresponding to half cycle of pump stroke) is used (doed line), pump pul- saons are not suppressed, just smoothed. Curves are shiſted along Y axis to avoid over- lapping. a) Approximaon of CE peak (solid line) with Confidence filter (doed line), and Savitzky- Golay filter (dashed line). b) Polynomial width (dashed line) and of the distance of the point used for approximaon from the center of the polynomial (solid line). c) confidence interval of the approximaon by the Confidence filter. doed – raw data; thick line – Confidence Filter; thin line – Savitzky-Golay filter Profile of √(u*) term for the gap 31 and polynomial degrees from 0 to 5. Modificaon of confidence interval esmaon algorithm for parcular polynomial depending on S 2 of this poynomial

Upload: others

Post on 12-Sep-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Noise filtering: the ultimate solution

Noise filtering: the ultimate solution

Summary

The ultimate solution of the problem "Which smoothing method is better"

is offered. A method of noise filtering based on confidence interval evaluation

is described. In the case of the approximation of a function, measured with er-

ror by a polynomial or other functions that allows estimation of the confidence

interval, the minimal confidence interval is used as a criterion for the selection

of the proper parameters of the approximating function. In the case of the pol-

ynomial approximation optimized parameters include the degree of the poly-

nomial, the number of points (window) used for the approximation, and the

position of the window center with respect to the approximated point. The

special considerations on confidence interval evaluation and quality of polyno-

mial fit using noise properties of the data array are discussed. The Method pro-

vides the lowest possible confidence interval for every data point.

The Method is demonstrated using generated and measured data. Improve-

ment of noise reduction compared to competing methods can vary depending

on the input data, but always exists. Excellent noise reduction properties are

combined with conserved object shape (e.g. chromatographic peak, photo-

graphic object) without artifacts. The method requires extra computations,

which can be easily paralleled.

Filtering white noise Triangular peak

Signal to noise ratio of the peak

▪ Area

▪ Height .

Discussion

Confidence interval is a very natural criterion of approximation quality and

it perfectly fits the case of noise filtering. In the case of variable window width

and/or degree of the polynomial additional fit quality criteria based on noise

estimate have to be applied to avoid effects of accidental good fit for small ap-

proximation windows and peak suppression in wide windows.

The algorithm of Confidence Filter very effectively suppresses baseline noise

and significantly improves detection and quantification limits. Even non-white

noise can be suppressed, such as pump pulsations, chemical noise; in addition

peak shape does not suffer. No doubt, methods that utilize additional infor-

mation about signal and noise, such as reconstruction of pump pulsation pro-

file and subtraction of this profile from the signal before noise filtering, may

give better results, but they will be much more expensive, as will require build-

ing a model of the process.

The math used to compute confidence interval (Formula 2) is based on as-

sumption of homoscedastic noise. The case of heterocsedastic noise is much

more complicated and in some cases can be avoided by scaling of the raw data.

As an example of such scaling one can imagine taking square root of the signal

before smoothing and then using square for final answer – in the case of any

counting detector such a procedure has good theoretical base.

The Confidence algorithm can be extended to data of higher dimensions,

e.g. 2-D, by approximation of smoothed data in different dimension. The heter-

oscedastic noise model in this case is a big benefit, as for the second dimen-

sion, regression may gain additional accuracy by accounting for different confi-

dence intervals of data points.

References

1. Savitzky, A.; Golay, M.J.E. (1964). "Smoothing and Differentiation of Data by Simplified Least Squares Procedures". Analytical Chemistry 36 (8): 1627–1639.

2. Felinger A.; Data analysis and signal processing in chromatography / Data Handling in Science and Technology – v.21 ELSEVIER, 1998.

3. Linear Regression Analysis (Wiley Series in Probability and Statistics) by George A. F. Seber and Alan J. Lee (Feb. 5, 2003).

4. McWilliam, I. G.; Bolton, H. C., Instrumental Peak Distortion. I. Relaxation Time Effects, Anal. Chem. 1969, 41, 1755-1762.

5. Ricardo Maronna, Doug Martin and Victor Yohai, Robust Statistics - Theory

and Methods, Wiley, 2006

Yuri Kalambet, Yuri Kozmin*, Sergey Maltsev

Ampersand Ltd., Moscow, Russia

*Institute of Bioorganic Chemistry RAS, Moscow, Russia

Theory

where

n - number of data points used for polynomial approximation (gap of the filter);

p - power of the polynomial;

X - matrix of x power values on independent axis (time);

Y - vector of detector response values;

- Student’s coefficient for confidence probability (1-δ) and m degrees of freedom

x* - position at which smoothed (approximated) value is estimated.

Handling baseline steps Non-white noise

*

)2/1( uStC pnY

pnS

)ˆ()ˆ(2

βXYβXY*

1

* xX)X(x *u

YXXXβ 1)(ˆ },...,,1{**

pxx*x

mt

222

BaselinePeak

Area CwCCi

22

)(

2

BaselinexYh CCCh

a) Filtering artificial chromatogram of EMG *4+ peak with white noise applied; solid line – original data (Noisy); dashed line – Confi-dence filter with noise definition width of 31; b) Approximation window size depend-ing on window position.

Pump pulsations are effectively suppressed by Confidence filter using noise definition window width of 121 (dashed line). Solid line – original data. When narrow noise definition window 11 (corresponding to half cycle of pump stroke) is used (dotted line), pump pul-sations are not suppressed, just smoothed. Curves are shifted along Y axis to avoid over-lapping.

a) Approximation of CE peak (solid line) with Confidence filter (dotted line), and Savitzky-Golay filter (dashed line). b) Polynomial width (dashed line) and of the distance of the point used for approximation from the center of the polynomial (solid line). c) confidence interval of the approximation by the Confidence filter.

dotted – raw data; thick line – Confidence Filter; thin line – Savitzky-Golay filter

Profile of √(u*) term for the gap 31 and polynomial degrees from 0 to 5.

Modification of confidence interval estimation algorithm for particular polynomial depending on S2 of this poynomial