Performance Comparison of First Generation and Second Generation Wavelets in the Perspective of Genomic Sequence Analysis

Shiwani Saini 1 and Lillie Dewan 2

1, 2 Department of Electrical Engineering, National Institute of Technology, Kurukshetra, India-136119. shiwani [email protected]

January 13, 2018

International Journal of Pure and Applied Mathematics, Volume 118 No. 16 2018, 417-442. ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version). url: http://www.ijpam.eu. Special Issue.

Abstract

Research in the area of genomic signal processing has been actively pursued since the completion of the Human Genome Project, which resulted in the accumulation of genomic data in large numbers. This necessitates the use of rapid and efficient signal processing techniques for data analysis. Signal processing techniques require suitable mathematical mappings of the genomic sequences before their analysis to maximize the accuracy of the results and efficiently characterize genomic information. It is therefore important to choose the best numerical representation as well as the analysis technique for genomic data. Wavelet transform is an important technique for signal analysis due to its excellent localization properties in the time-frequency domain. First generation wavelet transforms (FGWT) are implemented using memory-intensive algorithms involving a convolution and filtering process. Second generation wavelet transforms (SGWT) are implemented with a lifting scheme that has reduced computational complexity and lesser memory requirements. However, the applications of SGWT have been limited to signal denoising and compression, and SGWT have not been used in the field of genomic signal analysis. This paper compares the performance of first generation and second generation wavelet transforms in the context of genomic signal processing by analyzing different DNA symbolic to numeric representations of a given genomic sequence, along with choosing the best numerical representation and wavelet functions. Performance measures of reconstruction error and computation time have been evaluated. Results show that SGWT outperform FGWT in terms of reconstruction errors, with the only trade off being their higher computation time.

Key Words: First generation wavelets, second generation wavelets, reconstruction errors, genomic sequences, signal processing

1 Introduction

Recent advances in genome sequencing have resulted in an extraordinary upsurge in genomic data. Moreover, DNA sequences have been evolving over the years and are likely to diverge in terms of genome organization. Digital signal processing (DSP) based methods, aided by visual analysis, help analyze these changes in a rapid and much more organized manner. DNA sequences are represented by character strings, which cannot be analyzed directly with signal processing methods. Before signal processing methods are applied, the sequences need to be mapped into suitable numerical representations. Mapping DNA sequences into discrete numerical values enables DSP based techniques to be applied to different sequence analysis problems such as gene finding, exon prediction, motif detection, etc. It is important to choose the appropriate mathematical mapping, as it determines the salient features of the sequence for its characterization and evolutionary information [1].

Several time domain and frequency domain approaches have been applied in genomic sequence analyses. Time domain representations do not necessarily provide much information, as many signals contain the most relevant information in their frequency content. Frequency domain tools are ideally suited for the quantification of many clinical and physiological phenomena. Wavelet Transforms (WT) outperform all the standard time-domain and frequency-domain methods due to their ability to localize time-frequency information of a signal, multi resolution analysis and flexible choice of basis functions.

First generation wavelets utilize the Fourier transform (FT) to construct the basis functions, so the functions are analyzed in the frequency domain. Analysis of genomic data with FGWT is implemented using a memory-intensive algorithm called the discrete wavelet transform (DWT), which involves a convolution and filtering process. Moreover, the computational complexity of the algorithm increases with the filter length. To overcome these limitations, Sweldens [2] introduced a new algorithm based on a lifting scheme to construct wavelet functions in the time domain instead of the frequency domain. These basis functions are called second-generation wavelets. Unlike FGWT, second generation wavelet coefficients can be calculated in-place without auxiliary memory. SGWT are not memory intensive since they are based on a lifting scheme, which uses fewer filter coefficients than the first generation wavelets, making them computationally less expensive. Despite their advantages, use of SGWT has so far been limited to the denoising and compression of signals and images.

FGWT find numerous applications in genomic signal processing. They have been used in locating pathogenicity islands in genome sequences of Helicobacter pylori and N. meningitidis by analyzing their G+C content [3], analyzing pattern irregularities such as exonic regions, introns and ribosomal RNA regions in DNA sequences [3], and revealing long-range correlations in DNA sequences of eukaryotic and bacterial genomes [5]. The DWT approach has been applied for detecting and characterizing repeating motifs in protein sequences [6], identifying protein coding regions [7], and studying DNA information embedded in chromosomes in human DNA [8]. Wavelet algorithms in combination with an entropy method have been developed to identify gene locations in genomic DNA sequences [9]. A similarity concept based on the decomposition of a protein sequence with first generation wavelets and its cross correlation, used to examine protein sequence similarity at different spatial resolutions, has been reported [10]. First generation wavelet analysis has been used for identifying recurrent regions of copy number variations and distinguishing cancer driver genes from passenger genes in gene expression data [11]. An algorithm combining cross-correlation and first generation discrete wavelet transform reported better prediction accuracy in identifying exonic regions [12].

SGWT find applications in signal and image denoising and compression. A smoothing method based on second-generation wavelets and bivariate shrinkage has been reported to be more effective than classical wavelet based methods [13]. The lifting scheme has been used for encephalic signal compression of normal and pathological EEG with the rbio5.5 wavelet [14], electrocardiogram (ECG) denoising, detrending and characteristic points detection [15], and denoising of ECG signals corrupted by non stationary noises: muscle artifact noise, electrode motion artifact noise, and white noise [16]. An adaptive (signal-dependent) second-generation wavelet-based processing of phonocardiograms to simultaneously preserve relevant high-frequency details and reduce noise has been reported [17]. Second generation wavelets have been used for artifact removal in electroencephalogram (EEG) by comparing relative wavelet energies before and after thresholding [18]. The lifting scheme has been utilized for processing EEG data for brain computer interface [19], which reported significant improvement in the classification results as compared to the first generation wavelet transforms. A Modified Lifting Scheme (MLS) showed better results in comparison to the first generation wavelet packet method in solving the recurring problem of electromyographic (EMG) signal storage and/or duration of transmission [20]. An algorithm based on the Haar lifting scheme for automatic fundamental heart sound detection in phonocardiograms (PCG) via joint time-frequency representation, calculated on both normal and pathological PCG recordings, showed relatively high recall and precision rates [21]. A novel second generation wavelet based lifting filter has been implemented for image noise cancellation [22]. Suitability of second generation wavelets coupled with hard or soft thresholding in the task of image sequence super resolution with simultaneous noise filtering has been illustrated in [23]. A symmetrical second-generation wavelet for image denoising has been reported to perform better in terms of the signal-to-noise ratio than the classical wavelets db2, bior3.1, bior3.3, coif1 and sym2 [24].

Due to the advantages of lower computational complexity and lesser memory requirements of SGWT over FGWT, they hold the potential for analyzing large datasets like medical images, genomic sequences and microarray data. However, application of second generation wavelets in the field of genomic sequence analysis has not been reported in the literature so far. This paper therefore evaluates and compares the performance measures of first generation and second generation WTs in the analysis of genomic sequences and highlights the potential of using second generation wavelets in genomic signal processing.

2 Wavelet Transforms

2.1 First Generation Wavelet Transforms (FGWT)

A wavelet is a waveform of finite duration with zero average value. The continuous wavelet transform (CWT) is calculated by convolving a given signal $x(t)$ with the scaled and shifted mother wavelet function $\psi(t)$, as represented by Eq. 1.

$$C_{a,b} = \int_t x(t)\,\frac{1}{\sqrt{a}}\,\psi^{*}\!\left(\frac{t-b}{a}\right)dt \tag{1}$$

where $a$ is the scaling parameter and $b$ is the translational parameter. CWT generates a lot of redundant data since it is calculated for continuous values of scales and translations. DWT, in contrast, is calculated at scales and positions that are chosen based on powers of 2. Discretization of the wavelet function removes the redundancy of CWT and is defined as in Eq. 2:

$$\psi_{m,n}(t) = \frac{1}{\sqrt{a_0^{m}}}\,\psi\!\left(\frac{t - n b_0 a_0^{m}}{a_0^{m}}\right) \tag{2}$$

where $a_0$ and $b_0$ are constants with values of $a_0 = 2$ and $b_0 = 1$ respectively. This discretization is called dyadic grid scaling and is expressed as (Eq. 3)

$$\psi_{m,n}(t) = \frac{1}{\sqrt{2^{m}}}\,\psi\!\left(\frac{t - n 2^{m}}{2^{m}}\right) = 2^{-m/2}\,\psi(2^{-m}t - n) \tag{3}$$

where $\psi_{m,n}(t)$ represents the wavelet function at scale m and location n. The dyadic scaling scheme is implemented using quadrature mirror filters and is called the fast wavelet transform [25]. In this algorithm, the original signal s is first filtered through a pair of high pass g(n) and low pass h(n) filters and then down sampled to get the decomposed signal, which is half the length of the original signal (Fig. 1.1). This filtering and down sampling of the signal corresponds to one level of decomposition, which is expressed as

$$Y_{hp}(k) = \sum_{n} S(n)\,g(2k - n) \tag{4}$$

$$Y_{lp}(k) = \sum_{n} S(n)\,h(2k - n) \tag{5}$$

where S(n) is the original signal and g(n), h(n) are the impulse responses of the high-pass (H) and low-pass (L) filters respectively. The outputs of the high-pass and low-pass filters after subsampling by 2 are $Y_{hp}(k)$ and $Y_{lp}(k)$, respectively. Low pass filter components are called approximations (cA1) and high pass filter components are called details (cD1). This process of filtering and down sampling is called sub-band coding and can be repeated for further decomposition. Repeated iterations consist of filtering and down sampling the approximation coefficients of the previous decomposition level through a set of high pass and low pass filters that decompose the signal into lower resolution components at each level. This forms the basis of multi resolution decomposition. The original signal s after two levels of decomposition can be expressed as s = cA2 + cD2 + cD1. The filtering and subsampling operations at each iteration result in half the number of samples (which corresponds to half the time resolution) and half the frequency band spanned (which corresponds to double the frequency resolution).

Fig. 1.1. Two level signal decomposition with discrete wavelet transform
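As a rough illustration of one decomposition level, the sketch below applies the filter-and-downsample sums of Eqs. 4 and 5 to a toy signal. It is not the authors' implementation: the Haar filter pair, the periodic boundary extension, the toy data and the function names `analyze` and `synthesize` are all assumptions made for the example.

```python
import math

R = 1 / math.sqrt(2)
H = [R, R]     # Haar low-pass impulse response h(n) (assumed example filter)
G = [R, -R]    # Haar high-pass impulse response g(n) (assumed example filter)


def analyze(signal, filt):
    """Filter and downsample by 2: y(k) = sum_n signal(n) * filt(2k - n).

    Samples outside the signal are taken from a periodic extension.
    """
    size = len(signal)
    out = []
    for k in range(size // 2):
        acc = 0.0
        for j, coeff in enumerate(filt):   # j = 2k - n, so n = 2k - j
            acc += signal[(2 * k - j) % size] * coeff
        out.append(acc)
    return out


def synthesize(cA, cD):
    """Invert one Haar level: rebuild each sample pair from cA and cD."""
    size = 2 * len(cA)
    out = [0.0] * size
    for k, (a, d) in enumerate(zip(cA, cD)):
        out[2 * k] = (a + d) * R
        out[(2 * k - 1) % size] = (a - d) * R
    return out


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # toy data, even length
cA1 = analyze(signal, H)    # level 1 approximation coefficients
cD1 = analyze(signal, G)    # level 1 detail coefficients
restored = synthesize(cA1, cD1)
error = max(abs(x - y) for x, y in zip(signal, restored))
print(len(cA1), len(cD1), error)
```

Each output sequence has half the length of the input, and feeding `cA1` back into `analyze` would give the second decomposition level.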

The original signal can be reconstructed using the inverse discrete wavelet transform by first upsampling the decomposed wavelet coefficients and then filtering them through a set of two complementary filters (L’ and H’). In multilevel reconstruction, coarse level approximations and details are upsampled and filtered through a set of low pass (L’) and high pass (H’) reconstruction filters, which then combine to form the approximations at the finer resolution. The reconstructed signal for two levels of decomposition is expressed as A2 + D2 + D1 = s (Fig. 1.2). The set of decomposition filters (L and H) and reconstruction filters (L’ and H’) together are called quadrature mirror filters.

Fig. 1.2. Two level signal reconstruction with discrete wavelet transform

2.2 Second Generation Wavelet Transforms (SGWT)

While implementing dyadic grid scaling in DWT, the signal in the analysis stage is first filtered and then subsampled. Similarly, in the synthesis stage filtering follows upsampling. Thus the filter performs a number of multiplications with zero, which increases the computation cost in DWT. SGWT reduce the number of redundant calculations by using a lifting scheme. The lifting wavelet scheme, implemented on an input signal $s_j$ with $2^j$ samples to be transformed into an approximation signal $s_{j-1}$ and a detail signal $d_{j-1}$, includes the following three steps:

1. Split

2. Predict

3. Update

i. The split stage separates the original signal into two disjoint sets of samples: even indexed samples $s_{2l}$ and odd indexed samples $s_{2l+1}$. This splitting of the signal into even and odd indexed samples is called the lazy wavelet transform.

ii. The predict stage (referred to as dual lifting) predicts the odd polyphase component from a certain linear combination of samples of the even polyphase component. The prediction residual is the detail coefficient, expressed as

$$d_{j-1} = \mathrm{Odd}_{j-1} - P(\mathrm{Even}_{j-1}) \tag{6}$$

iii. The update stage (referred to as primal lifting) updates the even polyphase component based on a linear combination of the difference samples obtained from the predict step and is expressed as

$$s_{j-1} = \mathrm{Even}_{j-1} + U(d_{j-1}) \tag{7}$$

where $s_{j-1}$ represents the approximation coefficient.

The forward lifting scheme implementation is shown in Fig. 1.3. While calculating the inverse transform, the order of operations and the signs are reversed. The inverse lifting transform is represented by the following equations (Fig. 1.4)

$$\mathrm{Even}_{j-1} = s_{j-1} - U(d_{j-1}) \tag{8}$$

$$\mathrm{Odd}_{j-1} = d_{j-1} + P(\mathrm{Even}_{j-1}) \tag{9}$$

$$s_j = \mathrm{Merge}(\mathrm{Even}_{j-1}, \mathrm{Odd}_{j-1}) \tag{10}$$

Fig.1.3. Forward lifting wavelet transform

Fig.1.4. Inverse lifting wavelet transform
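The split, predict and update steps and their inverse can be sketched for the simplest case as follows. This is a minimal example under stated assumptions rather than the transform used in the paper: the predict operator P(Even) = Even and the update operator U(d) = d/2 (an unnormalized Haar lifting), together with the function names `lifting_forward` and `lifting_inverse`, are chosen only for illustration.

```python
def lifting_forward(signal):
    """One lifting level with assumed Haar operators P(even)=even, U(d)=d/2."""
    even = signal[0::2]                              # split: even indexed samples
    odd = signal[1::2]                               # split: odd indexed samples
    d = [o - e for o, e in zip(odd, even)]           # predict: detail = odd - P(even)
    s = [e + di / 2 for e, di in zip(even, d)]       # update: approx = pair average
    return s, d


def lifting_inverse(s, d):
    """Undo the lifting steps in reverse order with reversed signs."""
    even = [si - di / 2 for si, di in zip(s, d)]     # undo update
    odd = [di + e for di, e in zip(d, even)]         # undo predict
    merged = [0.0] * (2 * len(s))
    merged[0::2] = even                              # merge even and odd samples
    merged[1::2] = odd
    return merged


approx, detail = lifting_forward([2, 4, 6, 8])
print(approx, detail)                    # approximations are the pair averages
print(lifting_inverse(approx, detail))   # recovers the input samples
```

With these operators each approximation coefficient is the average of a sample pair and each detail coefficient is their difference, so no convolution with a filter is needed.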

Analysis of genomic signals with WT helps elicit the useful information underlying them. WT methods construct a set of basis functions for analyzing signals. Reconstruction is the process of recovering a signal from the approximation and detail components obtained in the analysis stage.

Implementing a digital filter on a computer makes errors and constraints due to finite word length unavoidable. These are called quantization effects of digital filters. As a result, the reconstructed signal is not perfect; the resulting error is finite and is called the reconstruction error. The reconstruction error for a signal with n decomposition levels is calculated as

$$\text{Reconstruction error} = \max\left(\left|\,\text{original signal} - \left(a_n + \sum_{i=1}^{n} d_i\right)\right|\right) \tag{11}$$

where $a_n$ represents the nth level approximation coefficient and $d_i$ (i = 1, 2, ..., n) represents the detail coefficients at the given decomposition level. The value of the reconstruction error gives a measure of the accuracy of the wavelet function in the process of multilevel decomposition and reconstruction. Minimization of the reconstruction error is crucial to maximizing the accuracy of information in genomic sequences.
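To make Eq. 11 concrete, the following sketch decomposes a short signal over several levels, reconstructs the approximation and each detail component separately (by zeroing all other coefficients), sums them, and takes the maximum absolute deviation from the original. The average/difference (Haar-type) lifting step, the toy input and every function name here are assumptions for illustration only; the paper's own computations were done in Matlab with other wavelets as well.

```python
def forward_level(x):
    """One assumed Haar lifting level: pair averages and pair differences."""
    s = [(x[2 * k] + x[2 * k + 1]) / 2 for k in range(len(x) // 2)]
    d = [x[2 * k + 1] - x[2 * k] for k in range(len(x) // 2)]
    return s, d


def inverse_level(s, d):
    """Rebuild the finer level from averages s and differences d."""
    out = []
    for si, di in zip(s, d):
        even = si - di / 2
        out.extend((even, even + di))
    return out


def decompose(x, levels):
    """Return the level-n approximation and details [level 1, ..., level n]."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = forward_level(approx)
        details.append(d)
    return approx, details


def component(approx, details, keep):
    """Reconstruct one component: keep=0 for a_n, keep=i for detail d_i."""
    cur = list(approx) if keep == 0 else [0.0] * len(approx)
    for level in range(len(details), 0, -1):
        d = details[level - 1]
        if keep != level:
            d = [0.0] * len(d)          # zero every coefficient set but one
        cur = inverse_level(cur, d)
    return cur


def reconstruction_error(x, levels):
    """max(abs(original - (a_n + sum_i d_i))), as in Eq. 11."""
    approx, details = decompose(x, levels)
    total = component(approx, details, 0)
    for i in range(1, levels + 1):
        part = component(approx, details, i)
        total = [t + p for t, p in zip(total, part)]
    return max(abs(a - b) for a, b in zip(x, total))


# Toy input: 16 samples (length must be divisible by 2**levels).
toy = [0.1260, 0.1340, 0.0806, 0.1335] * 4
print(reconstruction_error(toy, 3))
```

In exact arithmetic the sum of the components equals the original signal, so any nonzero result reflects finite word length effects only.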

3 Mathematical Representation of DNA Sequences

DNA sequences are represented by character strings of the nucleotide bases A (Adenine), G (Guanine), T (Thymine) and C (Cytosine), which are considered as discrete-time data for DSP based analysis. Before analyzing this discrete time data, it needs to be mapped into numerical values with suitable representations. Numerical mappings of the DNA sequence can be representations of nucleotide, codon or amino acid symbols in real or complex numbers. Based on the values of the nucleotide representations, they can be divided into two groups:

1. Values assigned randomly to the nucleotide bases. These representations include Voss, 2 bit binary, 4 bit binary, tetrahedron, complex number, integer number, real number, quaternion and inter-nucleotide distance.

2. Values assigned based on certain biophysical or biochemical properties of DNA. These representations include EIIP (electron ion interaction potential), paired numeric, DNA walk and Z-curves.

Voss representations map the nucleotides A, C, G, and T with four binary indicator subsequences A[i], C[i], G[i] and T[i], indicating the presence of the respective nucleotides with 1 and their absence with 0 [26]. The 2-bit binary representation assigns the two-bit binary codes 00, 11, 10, 01 to the nucleotides A, C, G, and T respectively [27]. The 4-bit binary representation encodes nucleotides A, C, G, T into the 4-bit binary values 1000, 0010, 0001 and 0100 respectively [28]. The tetrahedron representation is a three dimensional representation, where the nucleotides are mapped to four equal length vectors directed towards the vertices of a regular tetrahedron and symmetrically placed in 3-dimensional space [29]. The integer representation maps the nucleotides to integer numbers [30] such that T=0, C=1, A=2, G=3. The real number representation assigns real numbers to the nucleotides, which are represented as A=1.5, T=1.5, C=0.5, G=0.5 [31]. The complex number representation assigns complex numbers to each nucleotide of the DNA sequence based on a certain nucleotide complementarity property [29, 32]. The quaternion representation assigns pure quaternions to each nucleotide [33]. EIIP (electron-ion interaction potential) values denote the average energy of the delocalized electrons in the nucleotide [34] and represent the nucleotides as A=0.1260; C=0.1340; G=0.0806; T=0.1335. The paired numeric representation assigns integer or real values to the DNA sequences based on nucleotide complementarity; for example, the complementary pair A, T is assigned a value of 0 and the pair C, G is assigned a value of 1 [2]. Another paired numeric representation maps purines (A, G) = -1 and pyrimidines (C, T) = 1. The DNA walk of a sequence at a given location is represented as the cumulative sum of the numerical representation of the sequence [32]. The Z curve provides a 3-dimensional mapping of a DNA sequence where the x-axis represents the excess of purines or pyrimidines (RY), the y-axis represents the excess of amino or keto nucleotides (MK) and the z-axis represents the excess of weak or strong hydrogen bonded nucleotides (WS) along the sequence [35-36].
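A few of the single-valued mappings described above can be written as lookup tables, as in the sketch below. The numeric values are the ones quoted in the text; the table names, the helper functions `map_sequence` and `dna_walk`, and the choice of the purine/pyrimidine mapping as the input to the walk are assumptions made for illustration.

```python
# Lookup tables using the values quoted in the text (names are hypothetical).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}
INTEGER = {"T": 0, "C": 1, "A": 2, "G": 3}
PAIRED_NUMERIC = {"A": 0, "T": 0, "C": 1, "G": 1}
PURINE_PYRIMIDINE = {"A": -1, "G": -1, "C": 1, "T": 1}


def map_sequence(sequence, table):
    """Replace each nucleotide symbol by its numeric value."""
    return [table[base] for base in sequence.upper()]


def dna_walk(values):
    """Cumulative sum of a numeric representation along the sequence."""
    walk, total = [], 0
    for value in values:
        total += value
        walk.append(total)
    return walk


sequence = "ATGC"                                   # toy sequence
print(map_sequence(sequence, EIIP))
print(map_sequence(sequence, INTEGER))
print(dna_walk(map_sequence(sequence, PURINE_PYRIMIDINE)))
```

Each call yields one numeric signal of the same length as the sequence, which is the form required by the wavelet transforms of Section 2.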

Most of the numerical mapping methods listed above are not suitable for analysis with DSP methods. Voss, Z-curve and tetrahedron representations increase the computational complexity when analysed with signal processing techniques since they convert the DNA sequences into three or four numerical sub sequences. Sequences mapped with the quaternion representation can be analysed only with the discrete quaternion Fourier transform [31]. 2-bit and 4-bit binary representations are converted into real number representations before analysis. This paper compares the analysis of a genomic sequence represented with EIIP, paired numeric, DNA walk, complex, integer and real number representations with SGWT. Though comparisons of various numerical representations with respect to pairwise DNA sequence similarity, exon prediction with the discrete Fourier transform and reconstruction errors with FGWT have been reported, such comparisons with SGWT have not been performed. This paper extends the comparison of the EIIP, paired numeric, DNA walk, complex, integer and real number representations with FGWT [37] to their comparison with SGWT. In addition, performance measures of reconstruction error and computation time for different DNA symbolic to numeric representations with FGWT and SGWT are also compared.
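The extra cost of the multi-sequence mappings can be seen in a short sketch: the Voss mapping produces four indicator sequences and the Z-curve produces three coordinate sequences for a single DNA string, so each would need several transforms. The function names `voss` and `z_curve` and the toy inputs are assumptions; the Z-curve coordinates follow the RY, MK and WS groupings described in the text.

```python
def voss(sequence):
    """Four binary indicator sequences, one per nucleotide."""
    sequence = sequence.upper()
    return {base: [1 if symbol == base else 0 for symbol in sequence]
            for base in "ACGT"}


def z_curve(sequence):
    """Cumulative (x, y, z) points: purine/pyrimidine, amino/keto, weak/strong."""
    counts = dict.fromkeys("ACGT", 0)
    points = []
    for symbol in sequence.upper():
        counts[symbol] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        x = (a + g) - (c + t)      # excess of purines over pyrimidines (RY)
        y = (a + c) - (g + t)      # excess of amino over keto (MK)
        z = (a + t) - (c + g)      # excess of weak over strong H-bonds (WS)
        points.append((x, y, z))
    return points


print(voss("ACGT"))
print(z_curve("AG"))
```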

4 Algorithm

To compare different numerical mapping methods for a genomic sequence, a reference genomic sequence of strain H37Rv of Mycobacterium tuberculosis (NCBI accession number NC_000962.3) was downloaded from the GenBank database [38]. The choice of this sequence was governed by its extremely large size (4411532 base pairs), which is an important parameter in deciding the analysis technique. The different numerical representations used for comparison are listed in Table 1.1 [37].

The following algorithm was implemented in Matlab R2014a:

1. For each numerical representation, perform multilevel wavelet decomposition (levels of decomposition n = 1, 2, 3, 4) of the given genomic sequence using the lifting wavelet transform.

2. At each decomposition level n, extract the approximation coefficients Can and the detail coefficients Cdi, where i = 1, 2, ..., n.

3. Using the inverse lifting wavelet transform, reconstruct the approximation coefficients (an) and detail coefficients (di; i = 1, ..., n) at each decomposition level n.

4. Calculate the reconstruction error at each decomposition level by using equation (11).

5. Calculate the elapsed time / computation time (time required for decomposition and reconstruction of the signal and calculation of the reconstruction error) at each decomposition level.

6. Repeat the algorithm using the discrete wavelet transform.
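The overall shape of this procedure can be sketched in Python as below. This is not the authors' Matlab code: it assumes a Haar-type lifting step, an EIIP-mapped toy sequence in place of the H37Rv genome, and hypothetical names such as `run_level`. It also reconstructs the whole signal in one pass, which for a linear transform equals the sum of the separately reconstructed components up to rounding.

```python
import time

EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def lwt_level(x):
    """One assumed Haar lifting level: split, predict, update."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for o, e in zip(odd, even)]
    s = [e + di / 2 for e, di in zip(even, d)]
    return s, d


def ilwt_level(s, d):
    """Inverse lifting level: undo update, undo predict, merge."""
    out = []
    for si, di in zip(s, d):
        even = si - di / 2
        out.extend((even, di + even))
    return out


def run_level(signal, n):
    """Decompose to level n, reconstruct, and time the whole computation."""
    start = time.perf_counter()
    approx, details = list(signal), []
    for _ in range(n):
        approx, d = lwt_level(approx)
        details.append(d)
    rebuilt = approx
    for d in reversed(details):
        rebuilt = ilwt_level(rebuilt, d)
    error = max(abs(a - b) for a, b in zip(signal, rebuilt))
    elapsed = time.perf_counter() - start
    return error, elapsed


# Hypothetical toy sequence (1024 bases); length is divisible by 2**4.
sequence = "ATGCGTACGTTAGCCA" * 64
signal = [EIIP[base] for base in sequence]
results = {n: run_level(signal, n) for n in (1, 2, 3, 4)}
for n, (error, elapsed) in results.items():
    print(f"level {n}: error={error:.3e}, time={elapsed:.6f} s")
```

Repeating the loop over several mapping tables and wavelet functions would produce one error table and one timing table per decomposition level, matching the layout of the result tables.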

5 Results and Discussion

Different mathematical representations of the given genomic sequence have been analysed at different decomposition levels (n = 1, 2, 3, 4) for various orthogonal (db1, db2, sym2, coif1) and biorthogonal (bior1.1, bior2.2, bior3.5, rbio1.1, rbio2.2, rbio3.5) wavelet functions (ψ(t)) with both FGWT and SGWT. Reconstruction error and computation time for each decomposition level have been evaluated. Comparisons of reconstruction errors at decomposition levels 1-4 using both FGWT and SGWT are presented in Tables 1.2-1.5. Computation times for each representation at decomposition levels 1-4 calculated with FGWT and SGWT are presented in Tables 1.6-1.9.

Table 1.1. Mathematical mapping of DNA sequences.

From Tables 1.2-1.5 it is observed that

• Reconstruction errors calculated with FGWT are much higher in magnitude than the reconstruction errors calculated with SGWT, with the exception of the representations DNA walk 2, Paired Numeric and EIIP, which, when analyzed with SGWT for wavelet functions bior2.2 and bior3.5, gave reconstruction errors comparable in magnitude to those calculated with DWT.

• Level 1 reconstruction errors for the complex representations complex 1, complex 2 and DNA walk 1 (complex) calculated with FGWT do not change for the respective representations, irrespective of the wavelet function used.

• All complex number representations analysed with FGWT show very high error magnitudes when compared with the errors calculated using SGWT, thereby suggesting the suitability of SGWT for their analysis.

• Integer representation, real representation and all complex representations are best analyzed with SGWT and not with FGWT because of very high error magnitudes when analyzed with FGWT.

• Comparison of error magnitudes of DNA walk representations shows that integer DNA walk is preferable over complex DNA walk because of smaller reconstruction errors.

• Out of all the representations based on random assignment of numerical values to sequences, the real number representation has the least reconstruction errors with both FGWT and SGWT based analysis.

• Comparison of error magnitudes of all the representations shows that the EIIP representation has minimum error for both FGWT and SGWT based analyses and hence can be preferred over all the representations.

• Of all the wavelet functions used for analysis, biorthogonal wavelet rbio1.1 shows minimum reconstruction errors with both FGWT and SGWT, irrespective of the representation used.

• Of all the orthogonal wavelets used for analysis, wavelet function db1 gives minimum errors for both FGWT and SGWT based analyses.

Table 1.2. Reconstruction errors for level 1 decomposition.

Table 1.3. Reconstruction errors for level 2 decomposition.

Table 1.4. Reconstruction errors for level 3 decomposition.

Table 1.5. Reconstruction errors for level 4 decomposition.

From Tables 1.6-1.9 it can be observed that

• SGWT based analysis of genomic sequences requires higher computation time than FGWT based analysis, irrespective of the decomposition levels and representations used.

• Computation time increases with the increase in decomposition level.

• Complex representations require higher computation times than the rest of the representations.

Reconstruction errors increase with the decomposition level because a larger number of decomposition levels increases the filter bank iterations and hence the number of computations in the forward and inverse transforms. Increased filter bank iterations quantize a greater number of lower scale coefficients, which leads to increased reconstruction errors.

The error computation time with SGWT is higher than that with FGWT, even though the computational complexity of SGWT is lower, primarily because SGWT performs in-place computation. The lifting algorithm in SGWT first separates the input signal into even and odd samples, which are then processed according to the specific polyphase analysis matrix. For many applications, data can be read at only one input sample per clock cycle, so sample pairs are usually processed at every other clock cycle [39], which limits the speed of a direct implementation of the lifting scheme.
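What in-place computation means here can be illustrated with a small sketch in which the lifting steps overwrite the input buffer instead of allocating output arrays. It assumes the same Haar-type predict and update operators as a simple example; the function names are hypothetical and the sketch says nothing about the timing behaviour reported in the tables.

```python
def lifting_inplace(x):
    """Overwrite x: even slots end up holding approximations, odd slots details."""
    for k in range(0, len(x) - 1, 2):
        x[k + 1] -= x[k]          # predict: detail replaces the odd sample
        x[k] += x[k + 1] / 2      # update: approximation replaces the even sample
    return x


def inverse_inplace(x):
    """Undo the two steps in reverse order, again without extra buffers."""
    for k in range(0, len(x) - 1, 2):
        x[k] -= x[k + 1] / 2      # undo update
        x[k + 1] += x[k]          # undo predict
    return x


buffer = [2, 4, 6, 8]
lifting_inplace(buffer)
print(buffer)                     # interleaved approximation/detail values
inverse_inplace(buffer)
print(buffer)                     # original samples restored
```

Each sample pair is read and written once per step, which is the sample-pair processing pattern referred to in the paragraph above.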

The most suitable wavelet functions for analysis based on measures of reconstruction errors are rbio1.1 and db1. The choice of numerical mapping puts forth the most relevant information contained in the sequence. Mappings based on biophysical and biochemical properties have lower reconstruction errors in comparison to the representations assigned randomly to the nucleotides. Magnitudes of reconstruction errors show that EIIP representations are most suitable, followed by paired numeric and DNA walk 2. While EIIP representations can be preferred when required to represent the true biological characteristics of the sequence, the DNA walk 2 representation can be preferred when it needs to be determined how the complete genome evolves along its length.

Table 1.6. Error computation time for level 1 decomposition in seconds.

Table 1.7. Error computation time for level 2 decomposition in seconds.

Table 1.8. Error computation time for level 3 decomposition in seconds.

Table 1.9. Error computation time for level 4 decomposition in seconds.

6 Conclusions

This paper presented the analysis of different mathematical representations of genomic sequences at four decomposition levels with first generation (DWT) and second generation (LWT) wavelet transforms and compared their measures of performance: reconstruction errors and computation times. It can be concluded from the results that SGWT outperforms FGWT in terms of reconstruction error, with the exception of the wavelet functions bior2.2 and bior3.5, which give slightly higher error magnitudes for the EIIP, DNA walk 2 and paired numeric representations. However, there is a tradeoff between reconstruction error and computation time. The error computation time with SGWT is higher than that with FGWT, even though the computational complexity of SGWT is lower, primarily because SGWT performs in-place computation. While FGWT assigns auxiliary memory for storing the calculated wavelet coefficients, SGWT calculates and updates the values of the wavelet coefficients in place every time instead of assigning auxiliary memory for storing them. SGWT is more efficient and less memory intensive than FGWT, with the only tradeoff being the computation time. Second generation wavelets show promising results in terms of the accuracy of the reconstructed signal, especially when analyzing extremely large sequences such as those of eubacterial genomes. The most suitable numerical representations are EIIP and DNA walk (integer), and the most suitable wavelet functions are rbio1.1, bior1.1 and db1. The results of this paper therefore suggest the suitability of second generation wavelets for the analysis of genomic sequences, and identify the most suitable numerical representations and wavelet functions for analyzing the enormous genomic data sets with the purpose of completely understanding their underlying information.

References

[1] Arniker S.B., and Kwan H.K.: Graphical Representation of DNA Sequences. Proceedings of IEEE International Conference on Electro/Information Technology (EIT), pp. 311-314, (2009).

[2] Sweldens W.: The lifting scheme: A custom-design construction of biorthogonal wavelets. Applied and Computational Harmonic Analysis, vol. 3, no. 2, pp. 186-200, (1996).

[3] Lio P., and Vannucci M.: Finding pathogenicity islands and gene transfer events in genome data. Bioinformatics, vol. 16, no. 10, pp. 932-940, (2000).

[4] Haimovich A.D., Byrne B., Ramaswamy R., and Welsh W.J.: Wavelet Analysis of DNA Walks. J. Comp. Biol., vol. 13, no. 7, pp. 1289-1298, (2006).

[5] Audit B., Vaillant C., Arneodo A., d'Aubenton Carafa Y., and Thermes C.: Long-range correlations between DNA bending sites: relation to the structure and dynamics of nucleosome. J. Mol. Biol., vol. 316, pp. 903-918, (2002).

[6] Murray K.B., Gorse D., and Thornton J.M.: Wavelet Transforms for the Characterization and Detection of Repeating Motifs. J. Mol. Biol., vol. 316, no. 2, pp. 341-363, (2002).

[7] Mena-Chalco J., Carrer H., Zana Y., and Cesar R.M., Jr.: Identification of Protein Coding Regions Using the Modified Gabor-Wavelet Transform. IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 5, no. 2, pp. 198-207, (2008).

[8] Machado J.A.T., Costa A.C., and Quelhas M.D.: Wavelet analysis of human DNA. Genomics, vol. 98, pp. 155-163, (2011).

[9] Ning J., Moore C.N., and Nelson J.C.: Preliminary Wavelet Analysis of Genomic Sequences. Proceedings of the IEEE Computer Society Conference on Bioinformatics (CSB '03), pp. 509-510, (2003).

[10] Trad C.H., Fang Q., and Cosic I.: Protein Sequence Comparison Based on the Wavelet Transform Approach. Protein Eng. Des. Sel., vol. 15, no. 3, pp. 193-203, (2002).

[11] Tran L.M., Zhang B., Zhang Z., Zhang C., Xie T., Lamb J.R., Dai H., Schadt E.E., and Zhu J.: Inferring Causal Genomic Alterations in Breast Cancer Using Gene Expression Data. BMC Syst. Biol., vol. 5, no. 1, pp. 121-134, (2011).

[12] Abbasi O., Rostami A., and Karimian G.: Identification of exonic regions in DNA sequences using cross-correlation and noise suppression by discrete wavelet transform. BMC Bioinformatics, vol. 12, no. 430, (2011), doi: 10.1186/1471-2105-12-430.

[13] Hatsuda H.: Robust Smoothing of Quantitative Genomic Data Using Second-Generation Wavelets and Bivariate Shrinkage. IEEE Trans. Biomed. Eng., vol. 59, no. 8, pp. 2099-2102, (2012), doi: 10.1109/TBME.2012.2198062.

[14] Kedir-Talha M.D., and Amer M.A.A.: The lifted wavelet transform for encephalic signal compression. Proceedings of 36th International Conference on Telecommunications and Signal Processing (TSP), Rome, pp. 541-544, (2013), doi: 10.1109/TSP.2013.6613992.

[15] Bsoul M., and Tamil L.: Using second generation wavelets for ECG characteristics points detection. Proceedings of 1st Middle East Conference on Biomedical Engineering (MECBME), Sharjah, pp. 375-378, (2011), doi: 10.1109/MECBME.2011.5752144.

[16] Ercelebi E.: Electrocardiogram signals de-noising using lifting-based discrete wavelet transform. Comput. Biol. Med., vol. 34, no. 6, pp. 479-493, (2004).

[17] Gavrovska A., Zajic G., Reljin I., Bogdanovic V., and Reljin B.: Second generation wavelets: Advantages in cardiosignal processing. Proceedings of 11th International Conference on Telecommunication in Modern Satellite, Cable and Broadcasting Services (TELSIKS), Nis, vol. 1, pp. 333-336, (2013), doi: 10.1109/TELSKS.2013.6704942.
