
Exploring Statistical Approaches to Auditory

Brainstem Response Testing

Mohammad Khan

Student ID: 25544209

A dissertation submitted in partial fulfilment of the requirements for the

degree of Bachelor of Science in Audiology

University of Southampton

Faculty of Engineering and the Environment

Institute of Sound and Vibration Research

Supervisor: Dr Steven Bell

January 2015

Word Count: 18,507

(Excluding front matter, tables, graphs, references and appendices)


Declaration

I, Mohammad Khan, declare that this thesis is my own work, except where acknowledged, and that

the research reported was conducted in accordance with the principles and regulations outlined by

the University of Southampton, Institute of Sound and Vibration Research. Ethics approval was not

required for this study as no human subjects were tested.


Acknowledgements

Thanks go to my supervisor, Dr Steven Bell, for his time, guidance and ongoing support throughout

the course of this research project. I would also like to express my gratitude to all the lecturers of

Audiology at the University of Southampton who have assisted and guided me throughout the degree.

Additionally, I would like to thank Guy Lightfoot and John Stevens for allowing me to use the original

data that they collected. Lastly, thanks go to my family and friends for their continual support and

patience throughout this project and the entire BSc degree.


Abstract

Objective: The Auditory Brainstem Response (ABR) is an important test used primarily for the detection of hearing loss in newborn infants. Currently, ABRs are interpreted purely by visual inspection in accordance with the Newborn Hearing Screening Programme (NHSP) protocol, which allows for variability in the interpretation of a waveform due to its subjective nature. Therefore, the implementation of an objective statistical measure to interpret an ABR is highly desirable.

Method: Comparisons of experts' interpretations of ABR waveforms were made to measure the level of variability present. 93 averaged ABR traces obtained from 26 babies who failed newborn screening were used. The sensitivity and specificity of two objective parameters, Fsp and Autratio (a parameter that applies the NHSP 3:1 rule), were also explored using the experts' detection as the gold standard. Additionally, data were simulated to propose critical values for Fsp and Autratio. Bootstrap analysis was applied throughout testing to indicate significance levels for the values produced by the parameters.

Results: A high level of variability was found between four experts when interpreting ABRs (Kappa <0.9). Overall, the application of the bootstrap method improved the sensitivity and specificity levels of both Autratio and Fsp. Critical values of 3.0 and 5.2 were found for Fsp and Autratio respectively for the bootstrap distribution at α=5% (p≤0.05) for the detection of a wave V.

Conclusion: The high level of variability between clinicians is of great concern and should be addressed

by the NHSP. The application of an automatic version of the 3:1 rule combined with bootstrapping is

still not a viable option due to its poor specificity levels. However, the application of bootstrapping

will allow comparisons to be made across studies. Future work should explore the critical values

proposed in this report and address the many limitations mentioned.


Contents

1.0 Introduction…………………………………………………………………………………..……………………………………………6

1.1 Auditory Evoked Potential (AEP).……………………………………………………………………….……………6

1.2 Auditory Brainstem Responses (ABR)………………………………………………………………………………6

1.3 Procedure for testing ABRs in newborn hearing threshold detection………………………………8

1.4 Interpretation of the ABR – NHSP Recommendations……………………………………………………11

1.5 Signal to Noise Ratio (SNR)…………………………………………………………………………………………… 12

1.6 Methods of ABR analysis……………………………………………………………………………………………….14

1.6.1 Subjective Measures………………………………………………………………………….……………………….14

1.6.2 Objective Measures - ± Difference……………………………………………………………………………..17

1.6.3 Objective Measures - Fixed Single Point (Fsp)…………………………………………………………….19

1.6.3.1 Variations of Fsp…………………………………………………………………………………………22

1.6.4 Objective Measures - Bootstrap Analysis……………………………………………………………………24

1.7 Study Rationale……………………………………………………………………………………………………………..26

1.8 Research Question………………………………………………………………………………………………………..28

2.0 Method……………………………………………………………………………………………………………………………………..30

2.1 Data Collected by Lightfoot & Stevens…………………………………………………………………………..30

2.2 Simulated Data………………………………………………………………………………………………………………35

2.3 Risk Assessment and Ethics……………………………………………………………………………………………37

2.4 Statistical Analysis…………………………………………………………………………………………………………37

3.0 Results……………………………………………………………………………………………………………………………………….38

3.1 Analysis of Data Collected by Lightfoot & Stevens…………………………………………………………38

3.1.1 Comparison between Experts’ Interpretations………………………………………………38

3.1.2 Sensitivity and Specificity of Objective Parameters………………………………………..39

3.1.3 Correlational Analysis…………………………………………………………………………………….41

3.2 Analysis of Simulated Data…………………………………………………………………………………………….43

4.0 Discussion………………………………………………………………………………………………………………………………….48

4.1 Comparison between Experts’ Interpretations………………………………………………………………48

4.2 Sensitivity and Specificity of Objective Parameters……………………………………………………….49

4.3 Correlational Analysis……………………………………………………………………………………………………51

4.4 Simulated Data………………………………………………………………………………………………………………52

5.0 Conclusion…………………………………………………………………………………………………………………………………55

6.0 References…………………………………………………………………………………………………………………………………57

7.0 Appendices……………………………………………………………………………………………………………………………..…62

7.1 Appendix A………………………………………………………………….……………………………………………..…62


7.2 Appendix B…………………………………………………………………………………………………………………….66

7.3 Appendix C…………………………………………………………………………………………………………………….69

7.4 Appendix D……………………………………………………………………………………………………………………72

7.5 Appendix E…………………………………………………………………………………………………………………….73


1.0 Introduction

1.1 Auditory Evoked Potential - AEP

An Auditory Evoked Potential (AEP) is electrical activity within the auditory system which is evoked

using an acoustic stimulus. The three main types of AEPs often recorded by audiologists are the

Auditory Brainstem Response (ABR), the Middle Latency Response (MLR) and the Slow Vertex Response (SVR). The different responses vary in the site of anatomical generation and in latency of onset. The ABR occurs about 0 – 10 ms after the stimulus (Coats, 1978; Hecox & Galambos, 1974; Picton, et al., 1977; Gorga, et al., 1985; Pratt & Sohmer, 1977), the MLR occurs about 10 – 50 ms and the SVR about 50 – 500 ms after the stimulus (Hall, 2006).

These activities are electrical potentials (brain waves) which are represented graphically. The on-going

electrical activity is recorded whilst an acoustic stimulus, such as a tone pip or a click, is presented to the patient's ear. This causes the activity of interest to arise from the ear, brain and nerves, which travels through various tissues and structures until it is finally detected by surface electrodes (Hall, 2006). AEPs can be categorised according to the different latencies at which the peaks arise in

the waveform. There are short, middle and long latency waveforms. Short latency waveforms are

known to be generated by the auditory nerve and the inner ear structures. As time increases,

responses from the auditory brainstem can be seen along with activity from higher auditory structures

such as the cerebral cortex (Hall, 2006).

AEPs date back to the early 20th century, with the rise of their clinical applications starting from the

1970s due to the growth in availability of powerful computers. Some clinical applications of AEPs are

neonatal screening, threshold detection and detection of neurological disorders along the auditory

pathway.

1.2 Auditory Brainstem Response - ABR

The first description of the human ABR was by Dr. Don Jewett and John Williston in 1970. Since then,

the ABR has been widely implemented in clinics around the world.

The human ABR produces seven distinguishable peaks which are universally labelled using Roman

numerals. Evidence highlights that wave I is from the distal portion of the auditory nerve; wave II from

the proximal portion of the auditory nerve; wave III from the cochlear nucleus and wave IV from the

superior olivary complex and cochlear nucleus. Wave V is thought to be generated from the inferior

colliculus and lateral lemniscus. Wave V is the element of the waveform most commonly used to


determine hearing thresholds with the ABR. Lastly, waves VI and VII are generated by the inferior

colliculus (Weinstein, 2000).

The main application of the ABR in the UK is to test the paediatric population, specifically the newborn

population as behavioural testing is often unachievable. The baby firstly undergoes an otoacoustic

emission and an automated auditory brainstem response as part of the Newborn Hearing Screening

Programme (NHSP) (NHS, 2013). The failure of these two tests would require the patient to visit their

local audiology clinic for an ABR test. This is conducted in order to gain frequency specific information

regarding the child’s hearing status (procedure outlined in section 1.3).

The ABR in newborn hearing testing is often described as objective as this method does not require

patients to produce a conscious response in order for clinicians to determine hearing threshold levels.

However, clinicians visually interpret the waveform using a 3:1 rule as recommended by the NHSP

(Sutton, et al., 2013), which is subjective and gives rise to variability (Hall, 2006) (Vidler & Parker,

2004). Therefore, an objective method to interpret the ABR is highly desirable (Elberling & Don, 1984).

It is crucial that an ABR is interpreted correctly and as accurately as possible as its main application is

to assess hearing threshold levels and give frequency-specific information to identify whether a hearing

impairment is present (Sutton, et al., 2013). This information is very important because the detection

of a hearing loss, especially in paediatrics, offers children the chance to adequately develop their

speech and language skills by using interventions such as hearing aids or cochlear implants. Many

theories proposed by linguists and psychologists, such as Noam Chomsky and B.F. Skinner, hold that it is during childhood that language is acquired (Smith, 2004; Maltby, et al., 2013). At this critical

time, deprivation of sound will result in impaired development. Accurate interpretation, and indeed accurate recording, requires a skilled clinician as many variables must be taken into account when testing. For

example, the amount of background noise present and the state of arousal of the patient can greatly

influence the quality of a recording. Despite this, the ABR is still currently deemed a very powerful

method of determining hearing threshold levels for the newborn population (Warren, 1989).

Currently, there are no objective methods that use the 3:1 criteria, where the signal of the response

must be three times that of the background noise in an averaged waveform. Limited research has

been conducted to address this issue, as many researchers have proposed various other objective

methods such as the Fsp (Elberling & Don, 1984), Fmp (a modified version of Fsp) and the ± difference

(Wong & Bickford, 1980). The NHSP recently recommended that the Fsp or Fmp method should be

used alongside visual inspection in order to support clinicians in making a decision. However,

research needs to be conducted to address the questions of how accurate the objective methods are

at correctly identifying a response. It is currently not known how these objective parameters compare


to the recommendation of a 3:1 SNR. Therefore, this report aims to synthesise and evaluate the

current literature that is focused on both objective and subjective methods of ABR analysis. In

addition, the implementation of an automatic objective version of the 3:1 rule will be investigated.

1.3 Procedure for testing ABRs in newborn hearing threshold detection

ABR testing is widely used by clinicians to obtain objective hearing thresholds. The main application

of ABR testing in the UK is testing newborn infants. However, it can also be conducted on patients of

all ages if behavioural testing is unavailable. In order to obtain threshold levels using ABR, a systematic

procedure is adhered to as recommended by the NHSP (Sutton, et al., 2013).

Firstly it is recommended that stage A checks should be conducted at the beginning of each session

(NHS, 2008). Testing should ideally be performed in a sound proofed room, as the presence of

extraneous sound may cause interference with the recordings. Furthermore, all equipment should be

positioned a suitable distance away from the patient in order to reduce the level of electrical

interference. The NHSP also recommends the use of single-use Ag/AgCl surface electrodes. Before the electrodes are attached, the skin should be gently abraded to allow for adequate impedance levels. Currently an impedance level of 5 kΩ or below is recommended (Sutton, et al., 2013).

A single-channel recording montage is recommended for AC and BC ABR (Sutton, et al., 2013; Stevens, et al., 2013b). The electrodes should be located as follows:

Positive electrode: High forehead (as near to Cz as possible and midline)
Negative electrode: Ipsilateral mastoid
Common electrode: Contralateral mastoid

Stevens et al., (2013a) report obtaining larger amplitudes of wave V when applying the negative

electrode on the nape of the neck rather than the mastoid. However they report that there is little

difference in test efficacy. When conducting BC ABR, two-channel recordings may be considered to

determine which cochlea is generating the ABR. If used, the montage should be as follows:

Positive electrode: High forehead (as near to Cz as possible and midline)
Negative electrodes: Ipsilateral mastoid and contralateral mastoid
Common electrode: Forehead

However, limitations of this montage for wave V detection are identified by Foxe & Stapells (1993), Small & Stapells (2008) and Stapells & Ruben (1989): it does not identify the cochlea generating the ABR correctly in 100% of cases and may falsely label some unilateral conductive losses as sensorineural.


Table 1a. Parameter recommendations by the NHSP (Sutton, et al., 2013) to obtain optimal recordings.

Tone pip ABR, click ABR and narrow-band chirp ABR can be used to measure hearing thresholds.

Studies by Ferm, et al., (2013) and Elberling & Don, (2010) report an advantage of using narrow-band

chirp ABR where the ABR response is usually found to be larger, which in effect reduces test time.

However, as narrow-band chirp ABR is still relatively new, there is little experience of using this method in more severe hearing impairments. Thus it is recommended to use

tone pip ABR as it provides more frequency specific responses (Stevens, et al., 2013b) (Sutton, et al.,

2013).

To present the acoustic stimulus, supra aural headphones such as the TDH39/49 models or insert

earphones (e.g. ER-3A) are suitable to use. Care should be taken when using insert earphones as they

may deliver greater amplitudes to babies due to their smaller ear canal (Sutton, et al., 2013). A B-71

bone vibrator that can present up to 60 dB nHL should be used for BC ABR and should be placed on

the mastoid as it produces a greater level of stimulus compared to being placed on the forehead in


paediatric audiology (Sutton, et al., 2013). Furthermore, Sutton, et. al., (2013) state that a sleeping

baby is the key to a successful test, as interference is kept to a minimum.

An artefact rejection level between 3 μV and 10 μV is recommended, and an initial value of no more

than 5 μV should be set up in test protocols. If the background activity remains above 5 μV for long periods, it is suggested to wait until the activity reduces. If no reduction is observed, then the rejection level can be raised to a maximum of 10 μV. Filter settings of 30 Hz and 1500 Hz are suggested by Sutton et al. (2013) as these produce the best SNR of wave V. The use of digital filters is not recommended by the NHSP as they cause changes in the waveform which may give rise to difficulties

during interpretations.

Testing is recommended by the NHSP to be started at 60/40 dB nHL (Stevens, et al., 2013b), with level changes made in 10 dB steps. However, the use of larger steps may be necessary if the clinician believes

that the baby may not be asleep for very long. In this case, steps of 20 dB should be used. Two traces

should ideally be recorded at each stimulus intensity. The clinician needs to label if the response is a

clear response (CR), result absent (RA) or inconclusive (details of which are discussed later in section

1.4). If a CR is present, then a reduction by 10 dB or 20 dB should be made. If the result at the lower

stimulus level is RA, the clinician should increase by 10 dB (Stevens, et al., 2013b). This down in 20/10

dB, up in 10 dB method should be used and two repeatable traces at each level should be obtained.

The gold standard of threshold is defined as the lowest level at which a CR is present, with a RA

recording at a level 5 or 10 dB below the threshold, and a CR at 5/10 dB above threshold (Sutton, et

al., 2013).
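As an illustration of the search strategy described above, the Python sketch below encodes a simplified version of the down in 20/10 dB, up in 10 dB procedure. It is not part of the NHSP protocol software: the function run_and_classify is a hypothetical stand-in for recording two traces at a level and labelling the result CR, RA or Inc, and only 10 dB steps and the 10 dB RA-below-CR stopping rule are modelled.

def find_threshold(run_and_classify, start_db=60, min_db=20, max_db=100):
    # Simplified sketch of the down/up threshold search (assumptions noted above).
    level = start_db
    lowest_cr = None
    while min_db <= level <= max_db:
        result = run_and_classify(level)          # hypothetical: returns 'CR', 'RA' or 'Inc'
        if result == "CR":
            lowest_cr = level
            level -= 10                           # clear response: step down
        elif result == "RA":
            if lowest_cr == level + 10:
                return lowest_cr                  # RA found 10 dB below the lowest CR
            level += 10                           # no response yet: step back up
        else:
            return None                           # inconclusive: left to clinical judgement
    return lowest_cr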

The minimum criterion for discharge is 30 dB nHL in both ears using AC tone pip ABR

at 4 kHz for newborn patients. This frequency is used as it is the most sensitive to sensorineural hearing

losses (SNHL) and is usually the easiest tone pip ABR to record (Stevens, et al., 2013b). After 4 kHz is

obtained, 1 kHz, 2 kHz and 0.5 kHz should be tested, in that order. Both ears must be tested for the

newborn baby, giving priority to starting testing on the better ear.

AC click ABR may be considered only if it is not possible to conduct tone pip ABR or if quick results

need to be found as click stimuli cover a wider range of frequencies (Hall, 2006). In addition, if AC

thresholds are raised, BC thresholds should be measured to 15 dB nHL to check for a conductive

element (Stevens, et al., 2013b).


1.4 Interpretation of the ABR – NHSP Recommendations

When interpreting ABR waveforms, clinicians are less interested in whether a response is normal and more interested in whether a response is definitely present or not. The detection of a CR

is based on the signal to noise ratio (SNR) of the averaged waveform. For each stimulus level, the result

is marked in one of three ways:

Decision criteria for the result at each stimulus level:

CR – Clear Response present
RA – Response Absent
Inc – Inconclusive

The definitions of CR, RA and Inc are reported by Sutton, et al., (2013):

Definition of a clear response
1 Does the waveform have good morphology, latency and amplitude?
2 Is the wave V peak to SN10 amplitude at least 3:1 of the average noise level?
3 Is the wave V peak to SN10 trough > 0.05 µV?
If the answer is yes to all 3 points = CLEAR RESPONSE (CR)

Definition of a response absent
1 Does not fit the criteria of a clear response?
2 Is the average difference between the 2 traces ≤ 0.025 µV?
If the answer is yes to both points = RESULT ABSENT (RA)

Definition of an inconclusive response
1 Does not fit the criteria of a clear response?
2 Does not fit the criteria of a result absent?
If the answer is yes to both points = INCONCLUSIVE (Inc)
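To make the decision criteria above concrete, the following Python sketch encodes them as a simple function. The thresholds (the 3:1 ratio, 0.05 µV and 0.025 µV) are those quoted above; the function name, argument names and the assumption that suitable amplitude estimates are already available are illustrative only.

def classify_abr(good_morphology, wave_v_to_sn10_uv, average_noise_uv, mean_trace_difference_uv):
    # Returns 'CR', 'RA' or 'Inc' for one stimulus level (sketch of the criteria above).
    clear = (good_morphology
             and wave_v_to_sn10_uv >= 3.0 * average_noise_uv   # 3:1 SNR rule
             and wave_v_to_sn10_uv > 0.05)                     # absolute amplitude check
    if clear:
        return "CR"
    if mean_trace_difference_uv <= 0.025:                      # the two traces agree closely
        return "RA"
    return "Inc"

For example, classify_abr(True, 0.30, 0.08, 0.04) returns 'CR', since 0.30 µV exceeds both three times the 0.08 µV noise estimate and the 0.05 µV floor.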

The analysis is recommended to be performed by a skilled clinician, as experience is key to the successful interpretation of an ABR and wave V is sometimes difficult to identify. Some clinicians use the highest amplitude on the wave to define ABRs by their peak, whereas others use the shoulder of the wave. The latter method is commonly used because waves IV and V are often combined, which does not result in an obvious peak for each wave. These dissimilarities in methods highlight the variations that

can occur during the interpretation of an ABR between different clinicians, which may cause different

results.

Furthermore, when interpreting an ABR waveform, an important factor to consider is the latency of the wave. Latency, the time from the stimulus to a peak (or between two peaks, for interpeak latencies), changes as a function of frequency and


intensity. It is important to consider latencies during ABR interpretation because an increase in stimulus intensity decreases the absolute latency of the ABR (Hall, 2006). Lastly, and most importantly, the

interpretation of an ABR waveform includes the identification of a wave V response. As wave V is used

to determine whether a response is present or not, it is vital that this is correctly identified. Incorrect

identification or misinterpretation can lead to problems in language development as sufficient

amplification may not be provided by a hearing aid or a hearing loss may not be detected.

Furthermore, over-amplification is another issue that may arise and can lead to cochlear damage. There are two approaches employed by clinicians when attempting to identify a wave V: they either choose the final point on the waveform before the negative slope that follows (the SN10), or select the peak. As waveform morphology may sometimes be irregular, causing issues with

interpretations and repeatability, it is recommended that parameters should be adjusted accordingly

in order to achieve clearer responses which are repeatable (Hall, 2006) (Sutton, et al., 2013).

Evidently, there are many factors to take into consideration during the interpretation of an ABR

waveform, the combination of which results in considerable variability between different clinicians,

emphasising the desirability of an objective approach.

1.5 Signal to Noise Ratio (SNR)

ABR responses detected by surface electrodes are extremely small in voltage and are therefore measured in microvolts (µV), one-millionth of a volt. Hence these signals must be amplified up to 100,000 times to be distinguishable. In addition, the signal of interest is concealed within other brain activity (the electroencephalogram, EEG) and other extraneous electrical signals from sources outside the auditory system. Electrical activity which is not of interest and is

unrelated to the auditory stimulus is known as noise. The SNR value thus defines the quality of a

recording by taking into account the level of noise that is present. The higher the SNR, the better the

quality of the recording (Hall, 2006).

There are several factors that may influence the SNR when recording an ABR. For example, if the

patient is not relaxed during the recording and their head is not supported, this may increase the

amount of myogenic noise that is present (Hall, 2006). Problems with myogenic noise are common when conducting ABR testing in infants and children, resulting in difficulties obtaining adequate SNRs. Hence, a sedative such as chloral hydrate is occasionally used to induce sleep, but only if deemed necessary, and this method is not routine in the UK (Reich & Wiatrak, 1996). Another way of improving the SNR is to ensure that all unnecessary electrical equipment is turned off. This includes


lights, fans, computers and similar equipment. Additionally, using a quiet (sound-proofed) room will ensure that extraneous background noise does not mask the evoking stimulus or generate AEPs of its own which may interfere with the resultant traces. Another factor to consider is the electrical impedance between electrodes. As mentioned previously, an impedance level of 5 kΩ or below is recommended (Sutton, et al., 2013). Higher impedance values do not affect the ABR itself, but pick up an increased amount of external electromagnetic interference and of artefacts from movement of the electrode (Bremner, et al., 2012).

Several noise removal techniques are used to improve SNRs. One of the main methods, perhaps the

most important, is signal averaging. This method works on several assumptions (Hall, 2006) (Rice

University, 2014):

1 The evoked response to each stimulus is repetitive and time locked

2 The noise is a randomly varying AC wave (uncorrelated to the stimulus)

3 The temporal position of each stimulus and response waveform are accurately known

The recorded waveform is made up of the evoked response (signal) and the noise. The noise is a random AC waveform; it is therefore equally likely to be positive as negative at a given point in time and varies from epoch to epoch. As a result, noise will tend to cancel out in the Long Term Average (LTA). Signal averaging works by presenting the auditory stimulus hundreds of times, one after another in a systematic fashion, and averaging the responses evoked by each presentation. As the evoked activity is

repetitive and time locked to the stimulus, this means that it will always produce electrical activity of

very similar voltage at a specific time after the stimulus. Noise, which is also detected by the skin

electrodes, is presumed to be random; i.e. not time locked to the auditory stimulus nor does it produce

a specific reaction to the auditory stimulus. Thus the process of averaging reduces the amount of noise

present in the recording, increasing the SNR (Rice University, 2014).
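To illustrate the effect described above, the short Python sketch below averages synthetic sweeps consisting of a fixed, time-locked deflection buried in random noise. All values (sampling rate, response shape, noise level, number of sweeps) are illustrative assumptions, not recording parameters from this study.

import numpy as np

rng = np.random.default_rng(0)
fs = 20000                               # sampling rate in Hz (assumed)
t = np.arange(0, 0.015, 1 / fs)          # one 15 ms epoch

# a time-locked "wave V"-like deflection at about 6 ms (illustrative shape and size, in uV)
response = 0.3 * np.exp(-((t - 0.006) ** 2) / (2 * 0.0005 ** 2))

n_sweeps = 2000
sweeps = response + rng.normal(0.0, 5.0, size=(n_sweeps, t.size))   # add random EEG-like noise

average = sweeps.mean(axis=0)

# residual noise in the average falls roughly as 1/sqrt(N), so the SNR grows as sqrt(N)
print("noise SD in one sweep   :", round(float((sweeps[0] - response).std()), 2))
print("noise SD in the average :", round(float((average - response).std()), 2))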

Bayesian averaging is another type of averaging that can be used to increase SNR. This method works

by weighting individual sweeps inversely with the estimated power of background noise. However,

underestimations of the signal amplitude can sometimes occur (Hall, 2006). Furthermore, sorted

averaging is another method that can be used to reduce the effect of interference. All sweeps are

sorted according to their estimated background noise and are then weighted accordingly. There is evidence that the use of sorted averaging produces significantly higher SNR levels than Bayesian and standard averaging methods (Mühler & Specht, 1999).
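As a rough illustration of the weighting idea behind Bayesian and sorted averaging, the sketch below weights each sweep inversely by a crude estimate of its noise power before averaging. The noise estimate (the per-sweep variance) and all names are simplifying assumptions, not the published algorithms.

import numpy as np

def weighted_average(sweeps):
    # sweeps: 2-D array (number of sweeps x samples per sweep)
    noise_power = sweeps.var(axis=1)              # crude per-sweep noise estimate
    weights = 1.0 / noise_power                   # quiet sweeps get larger weights
    weights /= weights.sum()
    return (weights[:, None] * sweeps).sum(axis=0)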

Artefact rejection is also used alongside signal averaging in order to reduce the levels of noise by

rejecting sweeps that consist of noise above a certain level (Hall, 2006). A rejection level between 3

μV and 10 μV is recommended (Sutton, et al., 2013), meaning that if a signal exceeds this


predetermined limit, it will not be included in the averaged sweep as it is deemed to have excess

interference (Hall, 2006).
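A minimal sketch of this rejection step, assuming the sweeps are held in a 2-D array in microvolts and using the 5 µV initial level quoted above, might look as follows.

import numpy as np

def reject_artefacts(sweeps, level_uv=5.0):
    # keep only sweeps whose peak amplitude stays within the rejection level
    keep = np.abs(sweeps).max(axis=1) <= level_uv
    return sweeps[keep]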

Another method to increase the SNR is filtering. A notch filter at 50 Hz is used to exclude the mains hum (Morelle, 2012). Additionally, using a band-pass filter allows traces to be recorded over selected frequency bands, helping increase SNR levels (Stockard, et al., 1978; Laukli & Mair, 1981; Ruth, et al., 1982).
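For illustration only, the Python/SciPy sketch below applies a 50 Hz notch and a 30–1500 Hz band-pass to a stand-in epoch. The sampling rate and filter order are assumptions, and, as noted above, the NHSP cautions that digital filtering can distort the waveform, so this is a demonstration rather than a recommended processing chain.

import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

fs = 20000                                    # sampling rate in Hz (assumed)
raw = np.random.randn(int(0.015 * fs))        # stand-in for one recorded epoch

# 50 Hz notch to suppress mains hum
b_notch, a_notch = iirnotch(w0=50.0, Q=30.0, fs=fs)
x = filtfilt(b_notch, a_notch, raw)

# 30-1500 Hz band-pass (2nd-order Butterworth), applied zero-phase
b_bp, a_bp = butter(2, [30.0, 1500.0], btype="bandpass", fs=fs)
filtered = filtfilt(b_bp, a_bp, x)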

Despite the various methods used to decrease the level of interference in a recording, clinicians are

still required to be cautious and should not depend solely on these methods. It is essential for clinicians

to apply their skills and knowledge to reduce the level of background noise sufficiently to meet the 3:1 criterion and correctly identify a wave V. Therefore, the goal still remains to put

in place an alternative method to determine the presence of a significant ABR without considerable

variability.

1.6 Methods of ABR analysis

1.6.1 Subjective Measures

The conventional method used by audiologists in the UK to analyse and interpret ABR waveforms is visual inspection, following the NHSP protocol (Sutton, et al., 2013). The

NHSP recommendations were provided in order to help implement a standardised procedure to

reduce the variability that may be present between different clinicians and departments.

Furthermore, the NHSP has provided examples of ‘gold standard’ responses, where the wave V is

labelled by the NHSP for all clinicians to view and recognise (Sutton, et al., 2013). The criteria for a

gold standard response are outlined in section 1.4. There is still no certain way to ensure that all clinicians will identify a wave V in the same place as other clinicians, thus the introduction of a gold

standard method does not fully eliminate the high level of variability.

Vidler & Parker (2004) designed an experiment to test the level of variability between professionals

when interpreting waveforms. The subjects were 16 professionals who had a mean experience of 8.41

years with ABR testing, ranging from 1.5 to 25 years. The employment of subjects with a wide range

of experience increases this study's external validity, as it provides a good representation of the population of audiologists. Fifteen subjects worked as audiological scientists, while one subject worked as a post-doctoral researcher on AEPs. This may mean that this individual subject had

different training to interpret ABRs, and may not be representative of a conventional audiologist.


The subjects used a simulator which produced 12 traces that had to be interpreted. The computer

simulator allowed the subjects to have control over obtaining the responses. For example, the

stimulus level at which to start testing, terminate averaging and to record further traces were options

available to the subjects. These options were available to the subjects as an attempt to replicate a real

clinical environment. However, it did not truly represent what a clinician may be presented with in

clinic. The subjects were not given knowledge of previous test results such as failed screening tests,

results of behavioural tests or history taken at the time of acquiring the traces. This may present as a

limitation to this study as the exclusion of this information would not occur in real world scenarios.

Audiologists may have used this information to assist their interpretation of ABRs in clinics, thus the

ecological validity of this study is threatened. However, in order to replicate real clinical situations,

Vidler & Parker included a range of thresholds for which the balance between noise and response

varied.

The results of this test indicate that considerable variability is present between subjects’

interpretations of ABRs. No consistent agreement of thresholds between the 16 subjects was found for any trace. For nine traces, differences of 35 dB or greater were found between estimated threshold levels. This wide range of estimated thresholds suggests that large differences in patient management are likely to occur.

The crucial question to address is whether this level of variability exists due to the nature of the experimental

procedure. If so, it is unlikely to be representative of real clinical practice. The simulation contained

pre-set high and low frequency filter values which presents a limitation as it was reported by some

participants that different filter settings are used in their clinic. It was also reported that SN10

information was lost as a result of the filter setting which may have resulted in an increased difficulty

during the interpretation of waveforms as many clinicians use SN10 to identify wave V. Additionally,

subjects requested features such as traces to be displayed with negative peaks and peak labelling,

however these features were not available. The authors also note that the test material may have

been unrepresentative of clinical practice as it may have been biased towards difficult ABR cases.

There was a limit on the number of traces that could be acquired at each stimulus intensity, which may have affected the judgement of threshold. There were 30 instances in total where clinicians

indicated that recording a third trace would be their next step in data acquisition.

Lastly, data were recorded from the adult population and not the paediatric population. As the main

application of the ABR in the UK is for newborn testing, the adult population does not provide an

adequate representation, reducing the study’s external validity. The combination of the limitations


mentioned above suggests that this test interface did not fully allow the replication of real-world

scenarios; a major weakness is present in terms of the ecological validity of this study.

A study was conducted by Gans et al. in 1992 (cited in Vidler and Parker, 2004) who also found

significant levels of variability between clinicians when determining threshold levels using ABR testing.

Additionally, they found that the more experienced testers were more accurate in their identifications

of a wave V. However, the significance of their results is limited as Gans et al. used a sample of nine

students which is unrepresentative of the conventional clinicians who interpret ABRs.

Kuttva et al., (2009) also aimed to investigate the level of variability between clinicians when reporting

ABR thresholds. They looked at the effect of peer review on threshold detection, using 76 babies who failed the NHSP screen and were referred to the audiology department for ABR testing. A major advantage of this

study is the use of the paediatric population, who are the main group of patients that undergo ABR

testing. Babies were split into two groups: group A consisted of 38 babies who were tested when no

formal peer review was in place. Group B consisted of another 38 babies where peer review had been

in operation for at least 6 months. It was not stated whether the same or different clinicians examined

groups A and B.

The babies were then tested by individual audiologists and then peer review was conducted by

experts. The experts were two audiologists with at least five years of experience with ABRs. A

limitation to this study is that the details of the initial audiologists were not stated. This means that

critical information, such as the years of experience with ABRs or the number of audiologists used, is

not known, which may result in a weakness in the study's external validity as it is harder to generalise

these findings to a larger population of audiologists. Another limitation to this study is that only AC

click stimulus traces were considered for the audit. This does not replicate real world scenarios where

tone pip and bone conduction ABR testing may be used which may yield different resultant waveforms

and thus different interpretations.

Kuttva et al. found differences of up to 20 dB for group A between the threshold levels reported by

the tester and the experts. A Wilcoxon signed-rank test revealed that a significant difference was present (p<0.00) between the experts and the testers in group A. Similar findings were also present for group

B, where differences of up to 35 dB were found between testers and experts. This study supports the

findings of Vidler & Parker (2004) and Gans et al. (1992), highlighting the variability that is present due

to the subjective nature of interpretation between different clinicians, and therefore emphasising the

desirability of an objective method.


1.6.2 Objective Measures

Several researchers have proposed different objective methods to analyse ABR waveforms in order to

reduce the level of subjectivity and variability. However, these methods do not take into account the

3:1 rule of analysis as recommended by the NHSP. This thesis will now focus on the ± difference

method proposed by Wong and Bickford (1980), the Fsp parameter, proposed by Elberling & Don

(1984) and variations of the Fsp.

± Difference

Wong and Bickford (1980) conducted an experiment to determine the presence of a signal in

background noise. They used the ± difference technique in order to add objectivity to the analysis.

This method provides an alternate technique to estimate the SNR in a recording and allows for instant

results to be obtained regarding the signal size. It is found by assigning the even-numbered stimulus responses to one group and the odd-numbered stimulus responses to another, and calculating their coherent averages accordingly (Wong & Bickford, 1980). By applying this method during

averaging, the variance in the background noise is estimated which will aid the removal of runs with

poor SNRs. The intervals of variance are calculated in order to find p-values to determine the

empirically acceptable conditions. This is conducted by firstly selecting areas of interest or the whole

array (excluding stimulus artefacts).

\[
\mathrm{Var}(A) = \frac{1}{180}\sum_{t}\left[A(t) - \overline{A}\right]^{2}
\qquad
\mathrm{Var}(A') = \frac{1}{m}\sum_{t}\left[A'(t) - \overline{A'}\right]^{2}
\qquad
P = \frac{\mathrm{Var}(A)}{\mathrm{Var}(A')}
\]

A = average
A(t) and A′(t) = the noise average samples stored in different arrays
P = p-value (variance ratio)
m = each epoch of 10 ms represented by 256 words
180 points corresponds to the 1.17 – 8.2 ms region of interest

Table 1.6.2a. Formulae for the calculation of the ± difference method (Wong & Bickford, 1980).

The formulae displayed above in table 1.6.2a allow for the calculation of the ± difference ratio, which is used to determine whether a significant response is present or not. If p<20, this may suggest contamination


of the recording by artefacts and that there is no significant response present. In contrast, if the p-

value is >30, then the response is significant in comparison to the SNR.
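A minimal Python sketch of this calculation is given below, assuming the single sweeps are available as a 2-D array. Even- and odd-numbered sweeps are averaged separately; their sum gives the conventional average (response plus residual noise) and their difference the ± average (residual noise only), and the variance ratio P is computed over a chosen region of interest. The array layout and function name are assumptions made for illustration.

import numpy as np

def plus_minus_ratio(sweeps, region):
    # sweeps: 2-D array (number of sweeps x samples); region: slice covering the region of interest
    even = sweeps[0::2].mean(axis=0)
    odd = sweeps[1::2].mean(axis=0)
    conventional_avg = 0.5 * (even + odd)      # response plus residual noise
    pm_difference = 0.5 * (even - odd)         # time-locked response cancels, noise remains
    return float(conventional_avg[region].var() / pm_difference[region].var())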

Wong and Bickford used two different methods in order to obtain their data. The first method was by

using computer simulated waveforms in order to compare statistical analysis with visual analysis in

the detection of a response. The second method used two participants whose data were recorded

twice when restless to imitate background noise, and once when relaxed. The use of only two subjects

may present a limitation in terms of generalizability as a greater sample size would be needed to

represent the target population. No further information was given regarding the subjects’ ages or sex;

as a result, we cannot identify if a paediatric or adult population was used to collect the data, thus

threatening the study's external validity. Furthermore, dissimilar testing conditions were used for

the two subjects. This results in a weakness in the study’s internal validity as this may introduce

confounding variables which may affect the ability to draw comparisons between the two results.

The results using the human participants revealed that as the intensity decreased from 80 dB nHL to

5 dB nHL, p-values also showed a decrease. However when additional averaging was conducted, a rise

in p-values was observed. Furthermore, a relationship between p-values and body movements was

observed; as body movement increased, p-values decreased.

Wong and Bickford also investigated the effects of noise on amplitude, peak and latency of a

waveform using the data collected from the two participants. They found that poor SNR conditions

give rise to miscalculations in amplitudes, peaks and latencies of a waveform. Thus if low p-values are

observed, this should indicate that improvements are needed with respect to the testing conditions

in order to decrease the noise and not just carry out further signal averaging.

The results of the testing using simulated data revealed that the ± difference method was in good

agreement with visual analysis when determining the presence of a signal. However, Wong and

Bickford acknowledge that visual analysis should be used along with the objective technique as p-

values alone could not be fully relied upon for asymmetrical waveforms. As a result, the exclusion of

subjective analysis has not been achieved by the use of this objective method.

From their study, Wong and Bickford concluded that the ± difference technique indicates to clinicians when further averaging is necessary, when test conditions need to be improved, and whether a significant signal is present or not. However, several

limitations are present in this study which should be addressed by, for example, using a larger number

of participants who vary in age and gender in order to find results which can be applied with greater

confidence.


1.6.3 Fixed Single Point (Fsp)

A commonly applied quality estimator for the ABR response was proposed by Elberling & Don (1984).

They recognised the difficulty of determining actual SNR levels and attempted to provide a solution by

proposing a statistical method which determines the quality of the SNR in the averaged recording.

They proposed the Fsp method which is an algorithm that measures the variance ratio to statistically

calculate post-average SNR levels. Residual noise is calculated by measuring the differences in the

noise values obtained from a single point in each sweep from a fixed analysis time window. This

technique provides an estimate of the ratio of the response and the noise that is present in recordings.

Signal averaging reduces the amount of noise present in a recording as the magnitude of noise varies

widely compared to the magnitude of the signal of interest. Thus the level of noise is reduced as signal

averaging increases, increasing SNR, which in turn yields a greater Fsp value. The purpose of this

method was to reduce the reliance on subjective measures and allow the ability to make comparisons

of data across studies.

\[
F_{sp} = \frac{\mathrm{Var}(\mathrm{ABR})}{\mathrm{Var}(\mathrm{SP})}
\]

Var(ABR) = variance of the averaged ABR
Var(SP) = estimated variance of the background noise (from the fixed single point)

Table 1.6.3a. The Fsp equation, used to estimate the SNR of the averaged recording relative to the background noise (Elberling & Don, 1984).
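A minimal Python sketch of this variance ratio is shown below, assuming the single sweeps are stored as a 2-D array. The numerator is the variance across time of the averaged waveform within the analysis window; the denominator estimates the residual noise variance in that average from one fixed sample point in every sweep. The window, the single-point index and the names are illustrative assumptions rather than the exact implementation of Elberling & Don.

import numpy as np

def fsp(sweeps, window, single_point):
    # sweeps: 2-D array (number of sweeps x samples per sweep)
    n_sweeps = sweeps.shape[0]
    average = sweeps.mean(axis=0)
    var_abr = average[window].var()                       # Var(ABR): averaged response + residual noise
    var_sp = sweeps[:, single_point].var() / n_sweeps     # Var(SP): residual noise left in the average
    return float(var_abr / var_sp)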

Elberling & Don (1984) conducted an experiment to determine the ratio between signal and noise in averaged ABR recordings consisting of both the response and background noise. They used 10 subjects in this

experiment. No information was given regarding their age or gender which may cause issues regarding

the extrapolation of the findings to the population as these subjects may not provide a true

representation of the population. Also, a greater sample size should be used in order to effectively

test the usefulness of the parameter. The authors noted that the subjects were either members of the

laboratory or paid volunteers, which may result in recruitment bias. Furthermore, the authors

mention that there was considerable variation between subjects and levels of background noise,

therefore a reasonable sample was established and it was not deemed necessary to use data from the

newborn population to represent this. It was also reported that during the process of recording ABR

responses, variations were present in background noise over time and changes were observed in the

morphology. However these deviations were considered unimportant as they had no clinical

significance.


The degrees of freedom (V1) of the physiological background noise were calculated using a no-stimulus condition. By choosing the worst-case design, the value of the degrees of freedom was calculated to be V1=5. Using this information, the authors report that the minimum SNR associated with any Fsp value

is 2.25 (p≤0.05) or 3.10 (p≤0.01). This means that when detecting thresholds using ABRs where the

quality of waveform morphology is less important, if an Fsp value of 3.1 is achieved, it indicates that a

response is positively detected (p<0.01). Furthermore, the authors suggested that as the Fsp method

produces a simple positive or negative answer as to whether a response has been detected, test times for threshold evaluation can be shortened. The criterion

value for the Fsp can also be altered to any value, making it an advantageous addition to newborn

testing.

They concluded that the use of Fsp allows for objectivity to be added when determining whether a

response is present or not, reducing the reliance on subjective analysis. Additionally, it allows for a

shortened test time for threshold detection as a response can be identified straightaway with 99%

confidence when an Fsp of 3.1 is met, therefore making it a useful addition to ABR testing. Elberling

and Don compared the Fsp method with Wong and Bickford’s method of objective analysis and found

that the ± difference technique was more prone to type II errors. They state that this was the prime

reason they aimed to use another method for the estimation of noise. The authors also added that if

the Fsp method was combined with time varying filtering, significant improvements would be

observed in the quality of ABR recordings, allowing for a much superior form of response detection.

This study presented several limitations. For example, impedance levels were not reported, which

may cause issues with reliability as it may be harder to replicate these findings. Additionally,

impedance levels have an effect on the volume of external electromagnetic interference that is

present on the ABR along with the amount of artefact from the movements of the electrodes

(Bremner, et al., 2012). Furthermore, Elberling & Don did not mention any use of a sound treated

room which may give rise to questions regarding the level of noise present and whether the

experiment setup replicates real world scenarios. Future work should address the aforementioned

limitations, and more importantly, use a larger sample size and ideally a newborn population as this

is the most common patient group used in ABR testing.

Shortly after the study by Elberling and Don, Don et al. conducted an additional study in 1984 (Don, et al., 1984). This study further applies the Fsp method to demonstrate the

application of Fsp in automatic threshold detection and to estimate the number of sweeps required

to reach detection criterion. Don et al. claim that the application of this method can theoretically

reduce the test time and help reduce the variability in test interpretation.


Six normal hearing adults were used for this study. However it was not stated whether the hearing

statuses of the participants were actually determined using a test protocol such as PTA, or if the

hearing statuses were just assumed. This lack of information may result in the experiment to be harder

to replicate and doubts about its accuracy may arise. Furthermore, similar to the limitation that is

present in the previous study, a newborn population was not used, threatening its external validity as

the main application of the ABR is newborn testing.

Of the six subjects, four were female and two were male. The age of the subjects ranged from 20 to

32 years old. The use of such a small number of participants gives rise to a weakness in this study as

the use of additional participants may have given data which may represent the population better.

Furthermore, an uneven ratio of males to females was used which may also cause problems with

generalizability as results may be more biased towards a specific gender. Additionally, the use of a

specific age range of 20 to 32 years old may also result in a weakness in the study's external validity

as this does not accurately represent the entire population which includes younger and older people.

The reason why most studies do not use a paediatric population is unstated. It may perhaps

be due to ethical reasons. However, this results in a vital gap in the literature as most studies use an

adult population to test the effectiveness of objective analytical parameters such as the Fsp.

Comparing this study (Don, et al., 1984) with Wong and Bickford's (1980) study, Don et al. found that a lower number of sweeps was needed in order to achieve 99% significance with the Fsp method compared to using the ± difference method

for the same conditions. In addition, a greater risk of Type II error rates was found using the ±

difference method compared to using Fsp, which supports the findings in the previous study by

Elberling and Don (1984). Furthermore, Don et al. state that care should be taken when determining

the position of the analysis time window. When they used a 0-10 ms time window the criterion was not reached, but it was reached by 2500 sweeps when using a 4-14 ms window.

Don et al. assumed a linear relationship between the number of sweeps averaged and the improvement in SNR in order to predict the number of sweeps necessary to reach the threshold criterion. The number of sweeps required to achieve significance can be calculated from a linear extrapolation of the data if the Fsp criterion is not achieved by 1500-2000 sweeps. If the calculated number of sweeps is

excessively high, this may indicate to the clinician to increase the intensity as there may not be a signal

present, or the SNR is very poor. They found that this was true for 90% of the cases when the maximum

number of sweeps reached 5000.

Don et al. provided evidence to strengthen the possibility of adding objectivity to the analysis of ABRs.

Thus the implementation of the Fsp may possibly allow for the automatic detection of a response


during threshold detection, reducing the need for subjective analysis. In addition, it would result in a

standardised method which will allow comparisons to be made between results at different clinics.

Furthermore, compared to Wong and Bickford’s method, the Fsp proved to be superior with respect

to accuracy and type II errors.

This study presents several limitations, the most important one being the limited sample size and no

use of the newborn population. Further research should address these issues in order to fully test the

effectiveness of the parameter, increasing the internal and external validity of the study.

Another study was later conducted by Elberling & Don (1987), who used Fsp criterion 3.1 in order to

detect the presence of a response (Elberling & Don, 1984). Psychoacoustic behavioural thresholds

were found using a modified block up-down method (Wetherill & Levitt, 1965). This allowed

comparisons to be made between ABR (Fsp) and psychoacoustic thresholds. They used 10 normal

hearing subjects to record the ABR data. A small sample size threatens the generalizability of the

findings.

The results indicate that across the 10 subjects, a slightly higher median value was found using the Fsp

method compared to using the modified up-down method. The ranges were found to be very similar

for the two test parameters. The results provide evidence that ABR thresholds detected with the Fsp criterion of 3.1 are, on average, elevated compared to the psychoacoustic thresholds determined by the

modified block up-down method. This study presents several limitations; no information was given

regarding impedance levels, the use of a soundproof room or the location of the analysis time window on the ABR.

Perhaps the use of a more conventional style of behavioural threshold detection, such as PTA, would

have yielded more accurate information and would have allowed for more accurate comparisons to

be made.

1.6.3.1 Objective Measures Continued – Variations of Fsp

Similarly, the Fmp method can also facilitate clinicians in estimating the quality of a recording by the

analysis of ‘multiple points’ – hence ‘mp’. The Fmp analysis tool produces a value based on the

statistical confidence of the repeated detection of a response. Similar to the Fsp method, it uses time-

locked points for the response size and residual noise in order to provide a confidence level. Repeated time-locked points are analysed for each sweep, where less variation in the measures produces a higher level of confidence in the response. Likewise, residual noise is

determined by calculating the differences in the noise values obtained from the multiple points in


each sweep. The amplitude of the noise is then measured; the lower the variation, the less noise in

the trace (Sauter, et al., 2012).
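As a rough illustration of the 'multiple point' idea, the sketch below extends the single-point noise estimate to several fixed, time-locked points per sweep and pools them. The choice of points and window, and the pooling by a simple mean, are assumptions for illustration rather than a published Fmp implementation.

import numpy as np

def fmp(sweeps, window, points):
    # sweeps: 2-D array (number of sweeps x samples); points: indices of the fixed sample points
    n_sweeps = sweeps.shape[0]
    average = sweeps.mean(axis=0)
    var_signal = average[window].var()                                      # response + residual noise
    var_noise = np.mean([sweeps[:, p].var() for p in points]) / n_sweeps    # pooled noise estimate
    return float(var_signal / var_noise)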

Silva (2009) conducted research into the Fmp method of analysis in AEPs. Silva used a modified non-

stationary fixed multiple point method (NS Fmp). This method allows a discrete number of noise sources to be accounted for, which may prove beneficial as, in real-world

situations, there may be different sources of noise present. Monte-Carlo simulations were conducted

along with using data from 5 normal hearing subjects. ABR measurements were conducted at 0, 20,

30, 40, 60 and 80 dB SPL, where two recordings were made at each level except 0 dB SPL. A total of

4000 sweeps were obtained at each level using rarefaction clicks. Perhaps the use of tone pips would

have allowed for better generalizability of the findings as threshold detection is recommended to be

carried out using tone pips (Sutton, et al., 2013).

The study by Silva (2009) contained two stages in order to analyse the quality of different SNR

estimators. Firstly, the Fsp and NS Fmp method were evaluated in order to compare the mean square

errors. Secondly, Silva aimed to compare the receiver operating characteristic (ROC) curves for the Fsp and Fmp using real data obtained from 5 participants. The results showed that by using the NS Fmp method, lower mean square errors were observed compared to using the Fsp parameter. It was also found that the weighted averaging Fmp has a greater ROC curve area compared to the Fsp method. In

addition, NS Fmp proved superior compared to Fsp on artefact rejection levels, as did weighted

averaging compared to conventional averaging.

Another method of objective analysis was introduced by Mocks et al. (1984) where single sweeps are

used to estimate SNR based on power estimate ratios of the signal and noise. Ozdamar & Delgado,

(1996) conducted an experiment to investigate the relationship between the Fsp method, proposed by Elberling and Don, and the method by Mocks et al. (1984). Ozdamar & Delgado developed a

computationally efficient method of calculating the SNR estimate along with the signal and noise

estimate in ongoing averaging. Both parameters, (Elberling & Don, 1984) (Mocks, et al., 1984), were

evaluated and compared using the developed SNR technique.

Ozdamar & Delgado recorded four sets of data at a given stimulus level from each subject. The storage

of single sweep responses allowed the inspection of recording characteristics and off-line execution

of conventional or other averaging techniques as well as different signal and noise power calculations.

In addition, this also allowed for the direct comparison of different signal-processing techniques to be

studied as no variations in EEG, external or internal noise were present. Only four young subjects with normal hearing were used in this study; consequently, the external validity is weakened. Recordings


were conducted without artefact rejection to obtain both noisy and clean sweeps to test and compare

the performance of the various processing and SNR estimation techniques.

Analysis of the parameters described by Elberling and Don (1984) and Mocks et al. (1984) showed that

they are in fact very similar. The Fsp can be generalised to cover multiple points in time (Fmp) and was

found to be practically equivalent to the SNR estimate of Mocks et al. (1984), differing only by unity. The authors noted that both parameters, developed for the measurement of signal power, residual noise and SNR estimates, proved very useful, not only for monitoring the averaging

process, but also in implementing various noise reduction algorithms. In addition, both methods can

be readily implemented in clinical applications for online averaging. The authors also state that a total

time saving of 65% was achieved compared to using only standard averaging with a fixed sweep count

of 2048. The methods analysed in this study are widely applicable to any averaging technique and are

especially important for hearing screening and threshold detection, where reducing test times is

favoured and where an objective means of detection is desired for several reasons. However, this

study presented several limitations, the main one being the small sample size. Further research should

address these limitations in order to strengthen the external validity which would allow for the

findings to be generalised with greater confidence.

1.6.4 Bootstrap Analysis

The bootstrap technique was first introduced by Efron (1979). It falls under the broad umbrella of

resampling methods. This technique provides a means of testing statistical significance of a particular

parameter, such as the Fsp, strengthening its accuracy for threshold detection. It allows assigning

measures of accuracy (which can be defined as confidence intervals, variance or others similar) to test

parameters where typical methods cannot be used (Tibshirani & Efron, 1993). Bootstrapping is carried out by constructing a number of resamples, drawn with replacement from the original dataset at random points in time, which in effect regenerates averages. Each bootstrap average is different, and this process is repeated a large number of times (typically a few thousand). The resulting distribution of bootstrap statistics is then used to generate p-values. The bootstrap method provides a way to control and check the stability of results. Although for the majority of problems it is difficult to know true confidence intervals, the bootstrap is more accurate than intervals obtained using simple variance and assumptions of normality (Efron, 1979). However, although it is consistent under some conditions, important assumptions are made whilst undertaking bootstrap analysis, for example the independence of samples, which would otherwise be formally stated in other approaches (Shapland & Leong, 2010) (Efron, 1979).
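As a minimal, generic illustration of the resampling idea (not the implementation used later in this project), the MATLAB sketch below bootstraps the mean of a sample to obtain a 95% confidence interval; the data and all names are hypothetical.

% Minimal bootstrap illustration: percentile confidence interval for a mean.
rng(1);                              % fix the random seed for reproducibility
x     = randn(100, 1) + 0.3;         % hypothetical sample
n     = numel(x);
nBoot = 2000;                        % typically a few thousand resamples
bootMeans = zeros(nBoot, 1);
for b = 1:nBoot
    idx = randi(n, n, 1);            % draw n indices with replacement
    bootMeans(b) = mean(x(idx));     % recompute the statistic on the resample
end
bootMeans = sort(bootMeans);
ci = bootMeans(round([0.025 0.975] * nBoot));   % 95% percentile interval
fprintf('Bootstrap 95%% CI for the mean: [%.3f, %.3f]\n', ci(1), ci(2));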


A study conducted by Lv et al. (2007) looked at the objective detection of evoked potentials using a bootstrap technique which provided p-values to indicate the statistical significance of a response. This technique had not previously been used in the detection of evoked potentials, so the study filled a vital gap in knowledge.

Monte-Carlo simulations were firstly carried out with no stimulus response in order to determine

whether the expected false positive rate (α=5%) was obtained. All four parameters (diff, power, Fsp and ± difference) were applied to 500 EEG signals, and a time window of 5-15 ms was used which ensured that wave V

was included in all recordings as this time window accounts for stimulus intensity effects on wave V

latencies (Coats, 1978) (Hecox & Galambos, 1974) (Picton, et al., 1977) (Gorga, et al., 1985) (Pratt &

Sohmer, 1977). Significance was determined by the bootstrap if p≤0.05. The use of Monte-Carlo

simulations increased this study’s internal validity as it gave a chance for the authors to check the

power of the methods before applying them to data recorded from normal hearing subjects. This gave

the opportunity to modify the research design or methods prior to the experiment.

Lv et al. then applied the bootstrap method to the data obtained from 12 normal hearing adults aged 18 to 30. The use of only 12 subjects and a limited age range may be a limitation of the study. The use of conventional audiometry confirmed the

normal hearing thresholds of the subjects, eliminating any doubt. Hearing thresholds using all four

parameters were then acquired and compared to the visual inspection of three experienced

audiologists.

Lv et al. investigated the accuracy of the proposed method to detect responses when present by

simulating a response of a known SNR. They found that when responses were added to the

simulations, the percentage of detection determined by the objective parameters increased

consistently as SNR increased. The results of the subjective interpretations between the testers when

determining hearing thresholds showed large variations. A Cohen’s Kappa statistic was used to

measure inter-observer reliability and values of 0.70, 0.63 and 0.81 were found. As they are less than

0.90, they cannot be regarded as high (Seigel, et al., 1992) (iSixSigma, 2014), thus a good agreement

between the audiologists was not found. These results further reinforce the findings from the studies

by Vidler & Parker (2004), (Gans, et al., 1992) and Kuttva et al., (2009) regarding the variability in

interpretations between clinicians and highlight the desirability for an objective method of analysis.

Furthermore, the authors found that the Fsp method produced lower mean threshold values for the 12 subjects compared to subjective analysis. However, data are available from only three audiologists, and in order to make accurate comparisons, more testers would need to be employed.


Lv et al. found a critical Fsp value of 1.81 for the bootstrap distribution at α=5%, compared to the value of 2.25 for 250 sweeps found by Elberling & Don (1984). As the number of stimuli was increased to

2000, Lv et al. found an Fsp critical value of 1.75. Thus it appears that the value determined by Elberling

& Don (1984) would be too high for α=5%, in accordance with the worst case assumptions made in

deriving it. Furthermore, the bootstrap analysis shows that the threshold values for the parameters

depend on the number of stimuli used (for a given level α), and vary quite considerably between

individuals. Thus any fixed threshold values for parameters such as Fsp or ± difference would lead to

false-positive rates that differ between subjects indicating that universally valid critical values for

parameters such as the Fsp probably cannot be justified. This study provides evidence that the application of the bootstrap method can yield statistically significant results at chosen confidence levels.

The authors suggest that this method should not supersede other objective methods used for

detecting responses. Rather, it should be applied to test the significance of values determined by other objective parameters such as the Fsp. The bootstrap would also enable ABR data to be compared

between different clinics, which is currently difficult due to the subjective nature of analysis.

1.7 Study Rationale

ABR testing is one of the most important objective methods used in paediatric audiology to determine

hearing levels in the newborn population. Misinterpretation of results can lead to significant changes

in the management of a patient which can result in dissatisfaction on the patient’s behalf. Additionally,

it can lead to problems in language development as sufficient amplification may not be provided or a

hearing loss may not be detected. Furthermore, over-amplification is another issue that may arise and can lead to cochlear damage.

An extensive review of the literature reveals consistent evidence that significant differences are present between clinicians when interpreting ABRs (Vidler & Parker, 2004) (Kuttva, et al., 2009) (Gans, et al., 1992) (Lv, et al., 2007). Inconsistencies between clinicians arise from the varying levels of background noise present during an ABR recording. In some departments, peer review is conducted in order to identify and reduce levels of uncertainty. However, as this is not carried out in all departments, and as it still does not fully eliminate subjectivity (Kuttva, et al., 2009), the ultimate goal would be to remove any doubt that is present during waveform analysis.


Currently the NHSP suggests a 3:1 rule for the detection of a CR in ABR testing where the signal of

interest must be 3 times that of the noise present (Sutton, et al., 2013). However, the statistical rationale behind this method is unclear. Several noise reduction techniques, such as filters, signal

averaging and artefact rejection are used in order to keep noise at a minimal level, increasing the

likelihood of achieving the 3:1 criterion as determined by the NHSP protocol (Sutton, et al., 2013).

Several objective methods of analysis have been developed to reduce this subjectivity.

Elberling & Don (1984) and Wong & Bickford (1980) proposed the Fsp and ± difference method

respectively in order to introduce objectivity to ABR analysis. Additionally, Lv et al. (2007) introduced

the application of the bootstrap method which allows for significance values to be associated with

values generated by objective parameters (such as the Fsp). However, these objective methods do not

use the 3:1 rule as recommended by the NHSP protocol. Thus the implementation of an objective

parameter which takes into account the 3:1 rule may prove useful as this would combine the

agreement of the subjective NHSP criterion with a method of objectivity. Combined with the newly introduced bootstrap method, this should in theory be a powerful tool for identifying a response.

The Fsp has been recommended by Sutton et al. (2013) as an acceptable parameter which can be used

in order to strengthen the decision of a clinician during ABR analysis. However, little research has been

conducted which compares the objective automated 3:1 rule to the Fsp with the addition of the

bootstrap. Thus, a comparison of both objective methods is desirable to determine which produces

the most accurate results and therefore support its implementation in clinics. Furthermore, the

comparison of the objective methods with experts’ subjective interpretations is also desirable as it will

allow for further investigation regarding the usefulness of the objective methods and the differences in results between objective and subjective methods. In addition, direct comparisons between known SNRs and significance values (p-values) generated by the bootstrap have not been investigated. Such comparisons would prove useful as they would indicate the levels of SNR at which significance is reached by the bootstrap for each objective parameter.

For the purposes of this research, ± difference and Fmp will not be investigated further due to the

limited research present and their uncommon use in clinics. The objective parameters that will be

explored in detail in this thesis are the Fsp and the automated version of the 3:1 rule (which will now

be referred to as ‘Autratio’). Furthermore, the bootstrap method will be applied to the objective

parameters in order to allow for a detailed investigation regarding its applications and relationship

with SNR. Lastly, subjective analysis using the NHSP protocol (Sutton, et al., 2013) will be explored in

order to make comparisons.


1.8 Research Questions

By conducting an extensive review of the available literature, this thesis aims to answer the following

research questions:

1. Is there a significant level of variability between experts when interpreting ABRs?

2. Are the objective parameters (Fsp and Autratio) equally sensitive at correctly identifying a

wave V response?

3. Is there a significant correlation between the parameters Fsp and Autratio when detecting

ABRs?

4. Does SNR have a significant effect on Fsp and Autratio values?

a. Does SNR have a significant effect on p-values generated by the bootstrapping of

Fsp and Autratio?

5. At a 95% confidence level (p≤0.05), is ‘3.0’ the critical value of an objective parameter

(Autratio) which uses the 3:1 ratio rule to detect a signal?*

*Value 3.0 is based on the 3:1 signal to noise ratio rule proposed by the NHSP for

subjective analysis.

Experimental Hypotheses

Hypothesis

1. There will be a significant level of variability between experts when interpreting ABRs.

2. When detecting a wave V response in ABR analysis, there will be a significant difference in

detection rates between the parameters Fsp and Autratio.

3. There will be a significant positive correlation between the parameters Fsp and Autratio.

4. As the level of SNR increases, Fsp and Autratio values will increase.

a. As SNR increases, p-values generated by the bootstrapping of Fsp and Autratio will

decrease towards 0.

5. The critical value of Autratio at (p≤0.05), for a response detection, will be 3.0 as Autratio is

calculated according to the 3:1 signal to noise ratio rule.


Null Hypothesis

1. There will not be a significant level of variability between experts when interpreting ABRs.

2. There will not be a significant difference in detection rates between the parameters Fsp and

Autratio.

3. There will not be a significant correlation between the parameters Fsp and Autratio.

4. The level of SNR will have no effect on the values generated by Fsp and Autratio.

a. The level of SNR will have no effect on the p-values generated by the bootstrapping

of Fsp and Autratio.

5. The critical value of Autratio at (p≤0.05), for a response detection, will not be 3.0.


2.0 Method

The data used in this study were primarily collected by Lightfoot & Stevens for their research article titled “Effects of Artefact Rejection and Bayesian Weighted Averaging on the Efficiency of Recording”, which was published in the Ear and Hearing journal (Lightfoot & Stevens, 2014). The data collected

by Lightfoot & Stevens will be discussed in section 2.1. This study has also generated original data,

details of which are outlined in section 2.2.

2.1 Data Collected by Lightfoot & Stevens (2014)

Experimental Variables

This study contained several independent, dependent and extraneous variables which are detailed

below.

Independent Variables: Frequency and the intensity of the signal*

*The independent variables mentioned above were manipulated by Lightfoot & Stevens for

their study.

Dependent Variables: Values generated by: Fsp, Autratio, Bootstrapping of Fsp and Autratio, and the

expert analysis of the ABRs*

*The dependent variables mentioned above are primarily for this study only.

Confounding Variables: Variance between testers and environmental noise interference.

Variance between testers was minimised as much as possible by instructing the experts to

follow the NHSP guidance for interpreting an ABR response (see section 1.3 and 1.4). This

ensured that a standardised method of analysis was performed by all the experts*

*The confounding variable mentioned above was primarily present for this study only.

Environmental noise interference was kept to a minimum by using a sound-proofed room. In addition, the position of electrode attachment was kept consistent for all participants and impedance levels were kept below 5 kΩ. Lastly, the same high- and low-pass filter settings were used throughout testing, along with approximately 3000 sweeps per waveform*

*The confounding variable mentioned above was controlled by Lightfoot and Stevens

for their study.


Participants

Lightfoot and Stevens’ data consisted of a total of 26 babies, referred in one or both ears from the

NHSP for failing a transient evoked otoacoustic emission and an automated ABR. Participants

underwent routine ABR diagnostic testing at Arrowe Park Hospital, Wirral, United Kingdom. The mean

corrected age of the 26 participants was 3.5 weeks with a range of -1 to +12 weeks. Differing intensities

and frequencies were used along with differing number of repeats, depending on the judgement of

the present clinicians (Lightfoot & Stevens, 2014). An inclusion criterion was employed whereby recordings with one or more repeats at the same frequency and intensity had to be carried out in order to be included in the analysis. This resulted in a total of 93 averaged waveforms being used (see Appendix A for full details).

Equipment and Software

The analysis of the ABR recordings required the following equipment and software:

Laptop/desktop computer

Matlab 2014a (Mathworks, 2014)

Audacity version 1.2.6 (2012)

IBM SPSS 22 (IBM, 2014)

The following equipment was required for the initial testing by Lightfoot and Stevens:

Otoscope

Abrasive gel

Alcohol rub

Single use electrodes (Ag/AgCl)

Audio recording software ClimaxDigital (2012) and Audacity version 1.2.6 (2012)

TDH-39 Supra-aural earphones

Modified Interacoustics Eclipse ABR system


Testing Conditions

At the time of testing, electrical noise was reduced by switching off all unnecessary electrical equipment and seating the participant as far as possible from any remaining equipment. The testing room

was a double walled, insulated, sound proof room which ensured optimal acoustic isolation. Artefacts

were reduced by ensuring that the participants’ state of arousal was as calm as possible. Note that

some data were deliberately recorded when participants’ state of arousal was not calm, allowing for

variations with higher noise content. These conditions were controlled by Lightfoot and Stevens.

Test Method

Initial ABR Testing by Lightfoot and Stevens

Lightfoot & Stevens (2014) recorded their data at Arrowe Park Hospital, Wirral, UK. They followed closely the procedures outlined in the NHSP guidance for ABR testing in babies (Sutton, et al., 2013) and the English NHSP guidelines for early audiological assessment (Stevens, et al., 2013b).

A total of 26 babies were tested, comprising 17 male and 8 female participants. The mean corrected age of the 26 participants was 3.5 weeks with a range of -1 to +12 weeks. Differing intensities and frequencies were used along with differing numbers of repeats, depending on the judgement of

the present clinicians (Lightfoot & Stevens, 2014). Lightfoot & Stevens note that the majority of the

babies were from the well population with 19% having risk factors. 16 babies satisfied the English

discharge criteria and so achieved thresholds of 40 dB nHL and 10 dB above.

To detect the ABR, Lightfoot and Stevens used single use Ag/AgCl electrodes which were attached on

the non-test ear mastoid (common), test ear mastoid (negative) and forehead (positive). Electrodes

were attached after applying an abrasive gel which ensured good contact between the electrodes and

the skin. All impedances were kept below 5kΩ with similar impedances across each electrode. The

majority of the participants were tested in good conditions where noise levels were kept to a minimum

along with the baby being asleep or relaxed. However, some participants did not settle and EEG was

still recorded in order to acquire data of slightly higher noise content.

Amplified EEG was recorded using Climax Digital ACAP100 (2012) and Audacity version 1.2.6 (2012).

Participants underwent standard ABR testing which involved the use of a 4 kHz, 5-cycle Blackman envelope tone burst stimulus presented at 49.1/sec via TDH-39 transducers. The incoming

EEG was filtered between 33 Hz and 1500 Hz and typically 3000 sweeps contributed to each waveform.

Recordings (a total of n=93) were terminated after 61 second intervals and then saved to be analysed


later which allowed the inclusion of noisy and quiet conditions. When re-averaging using conventional

and Bayesian averaging, a wider filter bandwidth (3.3 to 3000 Hz) was used. If the filter settings had

been the same as those used when recording the EEG this would have resulted in doubling the

effective high and low-pass filter slopes.

Analysis of ABR Data

This study analysed the initial data collected by Lightfoot & Stevens. The data were converted into

‘wav’ format using Audacity (2012). In this study, analysis of the waveforms was conducted by using

Matlab R2014a (Mathworks, 2013). A code was devised by Dr Bell (see Appendix B) which allowed the

following features to be displayed and parameters to be calculated for each participant for each ear,

intensity and frequency:

Fsp

Autratio

P-value arising from bootstrapping of Fsp

P-value arising from bootstrapping of Autratio

Graphical representation of initial ABR recording and repeated recording (see figure 2.1a)

Graphical representation of averaged ABR (see figure 2.1a)

Figure 2.1a. Graph displaying two initial ABR waveforms (left) and the corresponding averaged waveform (right) produced on Matlab, from an example subject.


For this study, the 93 waveforms collected by Lightfoot and Stevens were given to four experts for

visual analysis. The experts had an option either to label the waveform as wave V present (Y), wave V

absent (N) or request further sweeps or repeat (R). These were then compared to objective parameter

values for analysis. Figure 2.1b below is an example of the graphical representation of the waveform

given to the experts.

Figure 2.1b. Graph displaying example ABR data in the form given to the experts for analysis.

The Fsp parameter was calculated according to the formula proposed by Elberling & Don (1984)

mentioned in section 1.6.3. The parameter Autratio was calculated according to a 3:1 SNR rule based

on the recommendation of the NHSP. An analysis time window of 6-16 ms was used. The peak was taken as the maximum of the averaged waveform in the time window, and the trough as its minimum. The size of the wave is defined as the magnitude from the peak to the trough, and the noise as the magnitude of the mean absolute difference between the two repeat waveforms over the pre-determined time window. Hence, ratio equals (size of wave)/(noise). Fsp and

Autratio were then bootstrapped in order to generate p-values to allocate significance levels to the

determined values.
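For illustration, a minimal MATLAB sketch of the Autratio calculation described above is given below. It assumes two repeat averaged waveforms sampled at a known rate; the function and variable names are hypothetical and this is not the Appendix B code.

% Illustrative Autratio calculation (assumed names; not the Appendix B code).
% repA, repB : the two repeat averaged waveforms (row vectors, same length)
% fs         : sampling rate in Hz; time zero is taken as stimulus onset
function ratio = autratio_sketch(repA, repB, fs)
    t   = (0:numel(repA) - 1) / fs * 1000;     % time axis in milliseconds
    win = (t >= 6) & (t <= 16);                % 6-16 ms analysis window

    grandAvg = (repA + repB) / 2;              % average of the two repeats
    waveSize = max(grandAvg(win)) - min(grandAvg(win));   % peak-to-trough size

    noise = mean(abs(repA(win) - repB(win)));  % mean absolute difference between repeats
    ratio = waveSize / noise;                  % compared against the 3.0 criterion
end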

Bootstrapping was carried out by constructing a number of resamples, drawn with replacement from the original dataset at random points in time, which in effect regenerates averages. The resampled averages are different each time, and this process was repeated approximately 499 times. The distribution of parameter values obtained from these resamples was then used to generate p-values, with a significant response indicated if p≤0.05.
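The sketch below indicates how such a bootstrap p-value could be obtained for either parameter; the particular resampling scheme shown (redrawing sweeps with replacement and shifting each to a random point in time before re-averaging) and all names are assumptions for illustration, not the code used in this study.

% Illustrative bootstrap p-value for an objective parameter (assumed scheme).
% sweeps  : single sweeps, one per row (nSweeps x nSamples)
% statFun : function handle computing the parameter from a sweep matrix
% nBoot   : number of resamples (e.g. 499)
function p = bootstrap_pvalue_sketch(sweeps, statFun, nBoot)
    observed = statFun(sweeps);                   % parameter from time-locked data
    [nSweeps, nSamples] = size(sweeps);
    bootStats = zeros(nBoot, 1);
    for b = 1:nBoot
        surrogate = zeros(nSweeps, nSamples);
        for s = 1:nSweeps
            src   = randi(nSweeps);               % pick a sweep with replacement
            shift = randi(nSamples);              % random point in time
            surrogate(s, :) = circshift(sweeps(src, :), [0 shift]);
        end
        bootStats(b) = statFun(surrogate);        % parameter with time-locking broken
    end
    % Proportion of resampled statistics at least as large as the observed value.
    p = (sum(bootStats >= observed) + 1) / (nBoot + 1);
end

In use, statFun would form the required averages internally, for example splitting the sweeps into two interleaved sub-averages before computing an Autratio-style ratio, or computing an Fsp-style statistic directly from the sweep matrix.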


2.2 Simulated Data

Experimental Variables

Whilst conducting simulations, several variables were present in this study:

Independent Variables: SNR level before averaging

Dependent Variables: Values generated by: Fsp, Autratio and by the Bootstrapping of Fsp and Autratio

Confounding Variables: As simulations were carried out in this study, there was no presence of any

confounding variables, as all variables were tightly controlled.

Equipment and Software

The following equipment was required in order to conduct simulations:

Laptop/desktop computer

Matlab 2014a (Mathworks, 2014)

IBM SPSS 22 (IBM, 2014)

Testing Conditions

At the time of testing, all unnecessary programmes were terminated in order to reduce the load on the CPU. This also ensured that the chance of a computer error occurring was kept to a minimum.

Test Method

Generating Simulated Data

An autoregressive model of EEG noise from an example subject was generated using the Yule-Walker method. Sections of simulated EEG noise were then generated by passing white noise through the autoregressive model.

An example ABR response from a subject was embedded into the noise at different levels to generate different SNR levels. Ten repeats were conducted at each SNR level, and means were calculated and then analysed.
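A brief MATLAB sketch of this simulation approach is shown below. The AR model order, the definition of SNR as an amplitude (standard deviation) ratio before averaging, and all names are illustrative assumptions, and aryule requires the Signal Processing Toolbox; this is not the Appendix C code.

% Illustrative simulation of ABR sweeps in AR-modelled EEG noise (assumed scheme).
% eegNoise    : recorded EEG noise from an example subject (column vector)
% abrTemplate : example averaged ABR response, one sweep long (column vector)
% snr         : target SNR before averaging (e.g. 0.002 to 0.01)
% nSweeps     : number of simulated sweeps (e.g. 3000)
function sweeps = simulate_abr_sketch(eegNoise, abrTemplate, snr, nSweeps)
    order    = 10;                                 % AR model order (illustrative)
    arCoeffs = aryule(eegNoise, order);            % Yule-Walker AR fit to the EEG noise

    sweepLen = numel(abrTemplate);
    white    = randn(nSweeps * sweepLen, 1);
    noise    = filter(1, arCoeffs, white);         % white noise shaped by the AR model
    noise    = reshape(noise, sweepLen, nSweeps)'; % one simulated sweep per row

    % Scale the template so that its amplitude relative to the noise gives the
    % requested SNR before averaging (one possible definition of SNR).
    scale  = snr * std(noise(:)) / std(abrTemplate);
    sweeps = noise + repmat(scale * abrTemplate', nSweeps, 1);
end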


Figure 2.2a. ABR response from subject 19 which was embedded into the noise at different levels to generate different SNR levels. (Sound presented to the subject's right ear at a frequency of 4 kHz and a level of 50 dB).

Analysis of Simulated Data

The simulated data were analysed using Matlab R2014a (Mathworks, 2013). A code was devised by Dr

Bell, see Appendix C, which allowed the following features to be displayed and parameters to be

calculated for different SNR levels (set before averaging - which was user assigned):

Fsp

Autratio

P values arising from the bootstrapping of Fsp

P values arising from the bootstrapping of Autratio

Graphical representation of simulated ABR waveform

The data were analysed and the different objective parameters were calculated using the same approach as for the real patient data (see section 2.1 – ‘Analysis of ABR Data’).


2.3 Risk Assessment and Ethics

Prior to commencing analysis, a risk assessment form was completed and submitted to the University

of Southampton. This was approved shortly after submission. Details can be found in Appendix D.

Ethics approval from the Institute of Sound and Vibration Research was not necessary, as advised by Dr Bell. This was because the two sets of data used in this study were either simulated or originally recorded by Lightfoot & Stevens (2014) for their study. However, it should be noted that

Lightfoot and Stevens obtained NHS ethics approval from the Department of Medical Physics & Clinical

Engineering, Royal Liverpool University Hospital.

2.4 Statistical Analysis

Note: All statistical testing will be carried out using IBM SPSS 22 (IBM, 2014) software.

A Shapiro-Wilk test will first be conducted in order to check the distribution of the data. Appropriate parametric or non-parametric tests will then be carried out based on the distribution of the data. This will allow comparisons to be made and variable effects to be identified.


3.0 Results

3.1 Analysis of Data Collected by Lightfoot & Stevens

3.1.1 Comparison between Experts’ Interpretations

The experts who analysed the data were either from the University of Southampton Institute of Sound

and Vibration Research or the Southampton Auditory Implant Service. As shown in Appendix A,

analysis was conducted in two groups. Group one consisted of expert 1 only, where group two

consisted of experts 2, 3 and 4 who made their decisions together.

Firstly, comparisons between experts’ answers revealed interesting results. Table 3.1.1a below

demonstrates the percentage of agreement and disagreement between the experts when analysing

waveforms. For the detection of a present wave V, a high level of agreement of 86.96% was observed between the experts. In contrast, high levels of disagreement were found between experts when identifying a no response or when requesting further sweeps (85.19% and 72.73% respectively). It is evident that there is a higher level of reliability when identifying a present wave V (Y) compared to identifying a no response (N) or requesting further sweeps (R).

In a private meeting, expert one noted that he should have identified some waveforms as ‘R’ instead

of ‘N’, as the levels of noise were too large to meet the criterion described in the NHSP procedure. On

two occasions, experts 2, 3 and 4 identified a waveform as ‘Y’ where expert one marked ‘N’. This

provides further evidence for the variability that is present between clinicians when interpreting

waveforms. In addition, a Cohen’s Kappa test was conducted which provides a measure of inter-

observer agreement (Jean, 1996). Kappa is defined as the ‘proportion of observed agreement after

correction for chance agreement’. The calculation produces a value between 0 and 1, where 0 is poor

reliability and 1 is excellent. A kappa value of 0.90 is regarded as high, i.e. as indicating good agreement between the experts (Arnold, 1985). A value of 0.334 was found, which was highly

significant (p=0.000). This indicates that a very poor level of reliability was present between the two

groups of experts.
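For reference, Cohen's kappa is calculated from the observed proportion of agreement p_o and the proportion of agreement expected by chance p_e:

\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]

so the value of 0.334 indicates that the experts agreed only about a third of the way between chance-level and perfect agreement.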

As expert one indicated that he should have identified some waveforms as ‘R’ instead of ‘N’, it may

prove beneficial to analyse the results using only two categories; category 1 as ‘Y’ (yes) and category

2 as ‘N or R’ (not yes) (see table 3.1.1b below). For a present detection of a wave V, the same level of

86.96% was observed as this category did not change. When ‘N’ and ‘R’ were categorised as one, there

was a significant reduction in the level of disagreement between experts. Experts agreed in 78.87% of cases where a response was either not present or where further sweeps would be requested.

In addition, a Cohen’s kappa test was carried out revealing a value of 0.663 which was highly significant


(p=0.000). A kappa value of 0.663 is considerably higher than that found when using three groups,

indicating a higher level of reliability between testers when using two categories of waveform

identification. However, as the kappa value is still considerably lower than 0.90, which is regarded as high (i.e. indicating good agreement) (Arnold, 1985), significant inconsistency is still present

between both groups of experts.

                 Y (Yes)    N (No)    R (Further Sweeps)
Agreement        86.96%     14.81%    27.27%
Disagreement     13.04%     85.19%    72.73%

Table 3.1.1a. Percentage of agreement between experts using three categories of waveform identification.

                 Y (Yes)    N or R (Not Yes)
Agreement        86.96%     78.87%
Disagreement     13.04%     21.13%

Table 3.1.1b. Percentage of agreement between experts using only two categories of waveform identification.

3.1.2 Sensitivity and Specificity of Objective Parameters

Expert one is an experienced lecturer in Audiology at the Institute of Sound and Vibration Research, and

experts two, three and four are experienced clinicians from the Southampton Auditory Implant Service

who all regularly inspect ABR data. Thus for this section, this study will assume a gold standard based

on the agreement of all testers for each waveform in order to analyse the rate of detection of a wave

V using Fsp and Autratio. Table 3.1.2a below shows the number of correct and incorrect detections of

a wave V. A value of 3.1 was chosen for Fsp as the critical value/threshold in order to test the accuracy

of the proposed findings by Elberling & Don (1984) when using a worst case scenario. A critical value

of 3.0 was assigned for the parameter Autratio in order to replicate the criterion assigned by the NHSP

of obtaining a 3:1 SNR for a signal present (Sutton, et al., 2013).

If values produced by Fsp or Autratio met or exceeded their critical values when the experts agreed

on a response present, this would be deemed as a correct detection by the objective parameter. If

critical values were met or exceeded by the Fsp or Autratio, but experts requested further sweeps or

indicated a no response present, then this would be deemed as an incorrect identification by the

objective parameters.

Furthermore, bootstrapping was applied to Autratio and Fsp, where a wave V response was deemed

present only if the p-value was found to be less than or equal to 0.05 which was determined by the


bootstrap. Bootstrapping was used to determine if the parameter value derived from the coherent

average is significantly different from the distribution of the non-coherent averages at p<0.05. From

these values, the corresponding sensitivity and specificity levels were calculated using the equations shown in table 3.1.2b below.

Table 3.1.2a indicates that the parameter Autratio produced the highest sensitivity level of 96% using

a critical value of 3.0. Autratio sensitivity dropped to 92% when only examining Autratio values which

were deemed significant by the application of bootstrapping. In addition, a specificity of 35.71% was

found for Autratio, meaning that this parameter has a high chance of producing a false positive result.

Furthermore, if looking at significant Autratio values only as determined by the bootstrap, a

significantly higher specificity of 64.29% was found.

When using a critical value of 3.1 as proposed by Elberling & Don (1984), the Fsp parameter produced

the lowest sensitivity level of 70%. In contrast, a significantly higher sensitivity of 88% was achieved

when only analysing Fsp values which were deemed significant by the bootstrap. However, the

specificity dropped from 82.14% to 67.86% when only analysing Fsp values deemed significant by the

bootstrap.

Table 3.1.2a. Sensitivity and specificity levels of Fsp and Autratio when using experts’ answers as gold standard. Note: Substitute ‘x’ according to assigned parameter value


\[ \text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Number of true positives} + \text{Number of false negatives}} \times 100 \]

\[ \text{Specificity} = \frac{\text{Number of true negatives}}{\text{Number of true negatives} + \text{Number of false positives}} \times 100 \]

Table 3.1.2b. Calculations of sensitivity and specificity.
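As a worked illustration, using counts consistent with the percentages reported in this section and in section 4.2 (assuming 50 expert-agreed responses and 28 expert-agreed non-responses), an Fsp criterion of 3.1 that correctly detects 35 of the 50 responses and produces 5 false positives gives:

\[ \text{Sensitivity} = \frac{35}{35 + 15} \times 100 = 70\%, \qquad \text{Specificity} = \frac{23}{23 + 5} \times 100 \approx 82.14\% \]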

3.1.3 Correlational Analysis

A correlational test was performed on the values produced by the parameters Fsp and Autratio for all

data (n=93, see Appendix A) in order to determine if any relationship was present. Firstly, a Shapiro-Wilk test determined that the data were not normally distributed. Therefore a non-parametric Spearman’s Rho test was conducted to identify any correlation between the Fsp and Autratio data. The results of the test show that a significant correlation with a coefficient of 0.752 exists between Fsp and Autratio (p=0.000).

Additionally, Figure 3.1.3a shows the relationship between Fsp and Autratio in a graphical manner. As

the value of Autratio increases, the Fsp value increases too. Furthermore, it can be clearly seen that

Autratio values were higher on most occasions compared to the Fsp.

The same procedure was adhered to when examining Fsp and Autratio values which were deemed

significant (p≤0.05) by the application of bootstrapping (n=62, see Appendix A). Values determined

significant by the bootstrapping of Fsp and Autratio will now be referred as SigFsp and SigAutratio

respectively for this section only.

A test for normality revealed that both sets of data were not normally distributed. A Spearman’s Rho test indicated that a significant correlation (p=0.000) was present between SigAutratio and SigFsp with a

correlation coefficient of 0.533. However, this correlation was not as strong as that between Fsp and

Autratio. Additionally, figure 3.1.3b displays this information in a graphical manner.


Figure 3.1.3a. Graph displaying the relationship between Fsp and Autratio when using all data (n=93).

Figure 3.1.3b. Graph displaying the relationship between Fsp and Autratio when only using significant values (n=62), as determined by bootstrapping the objective parameters (p≤0.05).



3.2 Analysis of Simulated Data

SNR and P-value Analysis

The use of Monte Carlo simulations has also allowed for the analysis of the functioning of the

bootstrap method with respect to the two parameters Fsp and Autratio. This was done by retesting

ten times at each SNR and then averaging the actual p-values generated from the bootstrapping of

Autratio and Fsp. Table 3.2a below displays the mean p-values at each SNR.

Table of Means

SNR (BA)    Fsp P-value    Autratio P-value
0.002       0.28537        0.61763
0.004       0.12901        0.16533
0.006       0.0172*        0.02566*
0.008       0.0004*        0.00501*
0.01        0*             0.00401*

Table 3.2a. Mean p-values generated by using the bootstrap method at each SNR. *Significant (p≤0.05)

A Shapiro-Wilk test indicated that all data were normally distributed (p>0.05), thus a Pearson’s r test was carried out in order to analyse the relationship between each parameter and SNR. The test revealed a significant correlation coefficient of -0.895 (p=0.04) between SNR and the p-values generated

for Fsp by the bootstrap. Similarly, a correlation coefficient of -0.836 was found between SNR and the

p-values generated by the bootstrapping of Autratio. However, statistical significance was not

achieved (p=0.078) for this correlation. Figure 3.2b displays this relationship graphically. By analysing

table 3.2a and figure 3.2b, it is clear that the p-values generated for Autratio are on average higher

compared to p-values generated for Fsp across all SNRs, but mostly towards the lower level SNRs.

Averaged p-values for both parameters first reach statistical significance (defined as p≤0.05) at an SNR

of 0.006 after which they are then very similar to one another. This indicates that p-values produced

by the bootstrapping of both objective parameters tend closer towards 0 as SNR increases.

A repeated measures ANOVA test was conducted in order to measure the effect of SNR on p-values

generated by the bootstrapping of both objective parameters. The test indicated that SNR has a

significant effect on p-values generated by the bootstrapping of Fsp (p=0.000). In addition, statistical

significance was also achieved for the effect of SNR on p-values arising from the bootstrapping of

Autratio (p=0.000).


Figure 3.2b. Graph displaying the relationship between SNR and Fsp/Autratio p-values as determined by the bootstrap.

SNR and Parameter Value Analysis

The use of Monte-Carlo simulations has enabled the investigation of how the parameter values differ according to changes in SNR. Each SNR was retested ten times in order to generate average Fsp and Autratio values (all data can be found in Appendix E). Table 3.2c displays the means at each SNR.

Table of Means

SNR (BA)    Fsp        Autratio
0.002       1.63723    2.0826
0.004       2.23355    3.87623
0.006       3.96289    6.57598
0.008       6.27492    7.29712
0.01        7.58759    8.3473

Table 3.2c. Table displaying the mean parameter values at each SNR.

A Shapiro-Wilk test revealed that all data were normally distributed (p>0.05), thus a Pearson’s r test was carried out. The test revealed a very highly significant correlation coefficient of 0.985 (p=0.002) between Fsp

and SNR. Similarly, a highly significant correlation coefficient of 0.976 (p=0.004) was found for the

relationship between Autratio and SNR. The values produced by Autratio were found to be higher on


average compared to that produced by the Fsp parameter, which is also supported by the findings

discussed in section 3.1.3. Figure 3.2d displays this relationship in a graphical manner.

A repeated measures ANOVA test was conducted in order to investigate the effect of SNR on Autratio

and Fsp values. However, statistical significance was not achieved (p=0.316 and p=0.104 for Fsp and Autratio respectively).

Figure 3.2d. Graph displaying the relationship between SNR and Fsp/Autratio parameter values.

Parameter and P-value Analysis

By conducting simulations, the relationship between parameter values and their corresponding p-values as determined by the bootstrap can be analysed. The values from tables 3.2c and 3.2a have been imported to create table 3.2e (see below), which will be used for the upcoming calculations.

Table of Means

SNR      Fsp        Fsp P-value    Autratio    Autratio P-value
0.002    1.63723    0.28537        2.0826      0.61763
0.004    2.23355    0.12901        3.87623     0.16533
0.006    3.96289    0.0172         6.57598     0.02566
0.008    6.27492    0.0004         7.29712     0.00501
0.01     7.85759    0              8.34730     0.00401

Table 3.2e. Mean parameter values and their corresponding mean p-values at different SNR levels.


A Shapiro-Wilk test indicated that all data were normally distributed (p>0.05). A Pearson’s r test revealed a strong, though non-significant, negative correlation coefficient of -0.829 between Fsp values and p-values arising from the bootstrapping of Fsp (p=0.082). A paired t-test found a significant difference between the values for Fsp (M=4.34, SD=2.56) and the p-values arising from the bootstrapping of Fsp (M=0.086, SD=0.12); t(4)=3.573, p=0.023.

Furthermore, a stronger negative correlation coefficient of -0.900 (p=0.038) was found between

Autratio values and p-values arising from the bootstrapping of Autratio. Similarly, a repeated

measures t-test determined that a significant difference was present for values of Autratio (M=5.63,

SD=2.58) and p-values arising from the bootstrapping of Autratio (M=0.16, SD=0.26); t(4)=4.335,

p=0.012.

By looking at point ‘x’ on figures 3.2f and 3.2g, it can be deduced that an Fsp critical value of 3.0 and

an Autratio critical value of 5.2 are found for the bootstrap distribution α=5% (p≤0.05). These critical

values are the lowest values to be deemed as significant when applying the bootstrap method to the

objective parameters in order to detect a significant response present.

Figure 3.2f. Graph displaying the relationship between Fsp values and their corresponding p-values at different SNR levels.


Figure 3.2g. Graph displaying the relationship between Autratio values and their corresponding p-values at different SNR levels.


4.0 Discussion

4.1 Comparison between Experts’ Interpretations

The results of this study support the findings of Vidler & Parker (2004), Gans, et al. (1992), Kuttva et

al. (2009) and Lv et al. (2007) that a high level of variability is present during the subjective interpretation of an ABR using the 3:1 rule proposed by the NHSP.

Agreement was found between the experts when they believed a response was present in the

recording (Y) (86.96%). In contrast, a high level of disagreement was found when reporting a response as absent (N) (85.19%) and when requesting additional sweeps or a repeat (R) (72.73%). On two

occasions, experts 2, 3 and 4 labelled a waveform as signal present (Y), where expert 1 identified them

as a no response present (N). In addition to this, a Cohen’s Kappa test, which provides a measure of

inter-observer agreement (Jean, 1996), produced a value of 0.334 (p = 0.000). This relatively low Kappa

value strengthens the findings as a low value indicates a poor level of agreement between the experts.

This high level of disagreement may have arisen because experts 2, 3 and 4 were reluctant to identify a response as not present, marking (N) only 2 times and instead labelling waveforms as (R) 30 times. In contrast, expert 1 identified no response (N) 25 times and requested further sweeps (R) 14 times. This inconsistency may have been present because the experts had no control over the

acquisition of the data. Therefore they may not have wanted to rule out a response as absent (N),

making ‘R’ a popular choice.

Further analysis was conducted in order to address the matter that expert 1 introduced. Expert 1

indicated that he should have marked some waveforms as ‘R’ instead of ‘N’, which may have produced

different findings. By analysing the results again, this time with only two categories; ‘Yes’ (Y) and ‘Not

yes’ (N or R), significant reductions were found between the level of disagreement between experts.

Experts agreed 78.87% of cases for a ‘Not yes’, compared to an agreement of only 14.81% and 27.28%

with respect to ‘N’ and ‘R’. A higher Kappa value was found, 0.663 (p = 0.000), which indicates that

the level of agreement between the experts had increased compared to using three categories.

However, these findings still express the potential differences between clinicians’ interpretations of

ABR waveforms. Major differences in management options may result as a consequence of the

variability that is present between clinicians. Hence even the smallest differences in results highlight

the need for and desirability of an objective method in order to reduce the level of uncertainty and

variability that may occur during subjective analysis. The results provide evidence that the null

hypothesis; ‘there will not be a significant level of variability between experts when interpreting ABRs’

can be rejected, and the experimental hypothesis accepted.


However, several limitations were present regarding the method of obtaining the data. Firstly, a

potential limitation may have been that the experts were not used to analysing graphical ABR data in

the form given to them (see figure 2.1b). In addition, the graphs did not contain labelled axes which

may have also affected their interpretations. However, none of the experts mentioned that this issue had affected their ability to interpret the results.

Another major limitation is the method of data collection. Ideally, if all experts collected the data

themselves, this would have resulted in a greater external validity as this would have better replicated

a real world scenario. Additional information regarding the number of sweeps, arousal of patient,

conditions of testing and test parameters may have also better represented real world scenarios. As

a result of the lack of additional information, experts 2, 3 and 4 reported difficulty in ruling out a

response as absent. Further research into this field could address these limitations by allowing the

clinicians who collected the data, to analyse the data themselves. If this is not possible, full information

should be provided to the experts with respect to the number of sweeps, arousal of patient and

conditions of testing.

Further research may also address the limitation of comparing interpretations from only four experts.

In addition the experts were all based in the Southampton area where expert 1 was a lecturer of

Audiology and experts 2, 3 and 4 were experienced clinicians from the Southampton Auditory Implant

Service Centre. This may not have accurately represented a conventional audiologist who works in a

paediatric ABR clinic across the UK. This along with the use of a small sample size (n=4) threatens the

external validity of the findings. It may prove beneficial to include several clinicians from clinics around

the UK in order to examine the level of variability that is present between the experts and perhaps

even between clinics. Furthermore, the use of a greater sample size (ABR data) would allow for more

accurate predictions to be made with greater confidence in the findings.

4.2 Sensitivity and Specificity of Objective Parameters

A primary aim of this project was to investigate the accuracy of the objective methods. This was

conducted by making comparisons with the experts’ analysis. In order to do this, the agreement of a

response present (n=50) by experts 1, 2, 3 and 4 was used as a gold standard.

Firstly, sensitivity and specificity levels were calculated according to the formula in table 3.1.2b. When

using a critical value of 3.1, Fsp produced a sensitivity level of 70%, meaning that 30% of wave V signals

will be missed (not labelled as present). In addition, a specificity level of 82.14% was found which

meant that only 5 false positives were produced by the Fsp. A considerably higher sensitivity level of


88% was found when only analysing Fsp values which were deemed significant by the bootstrapping

of Fsp. This indicates that the application of the bootstrap produced a better correct identification

rate of a wave V. However, a lower specificity level of 67.86% was found, meaning that the

bootstrapping of Fsp resulted in producing greater false positive results (n=9). However, these results

imply that the advantages of applying the bootstrap method to Fsp outweigh the disadvantages, as a

greater benefit in terms of sensitivity is achieved with only a slight deterioration in the specificity level.

A very high sensitivity level was found for Autratio when using a critical value of 3.0 (96%). When

bootstrapping was applied to Autratio, a minute reduction of only 4% was found with regards to the

sensitivity level. However, a significantly higher specificity level was produced as a result of the

bootstrapping of Autratio (64.29%) compared to when analysing Autratio using a critical value of 3.0

(35.71%). Similar to Fsp, this means that the advantages of applying the bootstrap method to Autratio

outweigh the disadvantages, as greater benefits in terms of specificity are achieved with only a slight

decrease in the sensitivity level.

Although Fsp and Autratio are calculated considerably differently from one another, the application

of bootstrapping to both parameters revealed very interesting results. Both parameters produced very

similar sensitivity and specificity levels. A sensitivity of 92% was achieved by the bootstrapping of

Autratio, which is only 4% higher than that found by the bootstrapping of Fsp. In addition, a specificity

of 67.86% was found by the bootstrapping of Fsp, which is only 3.57% greater than that produced by

the bootstrapping of Autratio. These results imply that the bootstrap method is universally applicable

to various objective parameters and could be used to compare results across clinics.

These findings suggest that the application of the bootstrap method is advantageous for Fsp sensitivity

levels and Autratio specificity levels. However, although significant improvements were seen with

regards to sensitivity and specificity levels for both the Fsp and Autratio, neither parameter produced

an adequate specificity level, the highest being 82.14% achieved by the Fsp using a 3.1 critical value.

The highest sensitivity of 96% was achieved by Autratio using a critical value of 3.0, which however yielded a poor specificity of 35.71%. With the application of the bootstrap, a good balance between sensitivity and specificity was found for both the Fsp and Autratio, which is very desirable. However,

although a good balance was achieved, values of sensitivity and specificity are still too low in order for

the objective methods to supersede the subjective analysis performed by clinicians.

The results thus provide evidence that the null hypothesis; ‘there will not be a significant difference

in detection rates between the parameters Fsp and Autratio’ can be rejected.


However, the specificity values may not be fully accurate as a result of a limitation. Using the experts

as gold standard may not provide information that is fully accurate, as in reality, the objective

parameters could have correctly indicated a response as present at lower stimulation levels than the

experts. For example, if the objective parameters detected a signal as present but the experts

requested further sweeps or a repetition ‘R’, this report deemed this as an incorrect detection by the

objective parameter. It is important to note that as a worst case design has been employed when

calculating specificity levels, in reality, values could be considerably better. However, it was necessary

to label a waveform as ‘R’ due to experts not wanting to definitely rule out a response. As mentioned

previously, if the data was primarily acquired by the experts themselves, perhaps more definite

decisions would have been made regarding whether a response was present or not, allowing for a

better measurement of accuracy of the objective parameters.

Furthermore, the relatively low Kappa values and percentages of agreement and disagreement

discussed previously in section 3.1.1 provide evidence that considerable inconsistency is present between the experts when interpreting waveforms. The experts may have been incorrect when marking a signal as not present when, in reality, it may actually have been present. Thus using the experts as a gold standard may not provide the best benchmark for assessing the accuracy of the objective parameters.

Further research should address these limitations by allowing the experts to acquire the data with full

control of all parameters such as the number of sweeps etc. This would result in more accurate

calculations of sensitivity and specificity levels for both objective parameters which may support their

implementation into a clinical environment.

4.3 Correlational Analysis

The Fsp and Autratio values calculated using Matlab R2014a (Mathworks, 2014) were investigated to identify whether a correlation was present, as they are independent of each other and are calculated considerably differently. A Spearman’s Rho test was conducted which produced a highly significant

result. A correlation coefficient of 0.752 was found between Fsp and Autratio (p=0.000). In addition,

when comparing Fsp and Autratio values which were deemed significant (p≤0.05) by means of the

bootstrap method (n=62), a Spearman’s Rho test revealed a correlation coefficient of 0.533 (p=0.000).

A higher mean value of 5.19 was observed for SigFsp (Fsp values determined significant by the

bootstrap method) compared to 3.8 for Fsp. Also a higher mean of 7.57 was found for SigAutratio

(Autratio values determined significant by the bootstrap method) compared to 6.2 for Autratio. This


indicates that the values deemed significant by bootstrapping are, on average, higher than the parameter values overall.

Furthermore, the results suggest that the parameter Autratio produced higher values on average

compared to those produced by Fsp.

By examining the line of best fit in figure 3.1.3a, an Autratio value of 3.0 (the NHSP 3:1 rule) corresponds to an Fsp value close to 2.0, while an Fsp value of 3.1 (as proposed by Elberling & Don, 1984) corresponds to an Autratio value close to 5.0. This supports the earlier finding that the Fsp parameter produces lower values on average than Autratio. If these objective parameters were used in conjunction with one another in clinics, obtaining a significant Autratio value of 3.0 may not always coincide with a significant Fsp value of 3.1. This suggests that the combined use of both parameters may not be viable with their current critical values.
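
As a sketch of how one criterion can be mapped onto the other, the regression in figure 3.1.3a can be refitted and evaluated at the two critical values; the fsp and autratio vectors are assumed as before, and this is an illustration rather than the exact procedure used to read the values off the figure.

% Least-squares line relating the two parameters, evaluated at the NHSP
% Autratio criterion of 3.0 and the Elberling & Don Fsp criterion of 3.1.
coefFsp = polyfit(autratio, fsp, 1);      % fsp ~ coefFsp(1)*autratio + coefFsp(2)
fspAtRatio3 = polyval(coefFsp, 3.0)       % expected to fall close to 2.0 here
coefRatio = polyfit(fsp, autratio, 1);
ratioAtFsp31 = polyval(coefRatio, 3.1)    % expected to fall close to 5.0 here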

Furthermore, the results indicate that the null hypothesis, 'there will not be a significant correlation between the parameters Fsp and Autratio', can be rejected and the experimental hypothesis accepted.

4.4 Simulated Data

The simulations revealed strong negative correlations between SNR and the p-values of both parameters. Since a coefficient of -1 represents a very strong negative correlation, the results indicate that as SNR increases, the Fsp and Autratio p-values decrease and bootstrap significance (p≤0.05) is achieved at a greater rate.

For Autratio, at an SNR of 0.002 significance was reached only once (1/10), compared with 10/10 at an SNR of 0.01. This further indicates that the bootstrap method is sensitive to changes in SNR. A repeated-measures ANOVA showed that the effect of SNR on the p-values arising from the bootstrapping of Fsp and Autratio was statistically significant (p<0.001), strengthening the conclusion that the level of SNR has a significant effect on p-values. Thus the null hypothesis, 'the level of SNR will have no effect on the p-values generated by the bootstrapping of Autratio and Fsp', can be rejected.
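
For illustration, a repeated-measures ANOVA of this kind can be set up in Matlab roughly as follows. The sketch assumes the ten bootstrap p-values obtained at each of the five SNR levels are held in the columns of a 10-by-5 matrix pvals (an illustrative name) and uses the Statistics Toolbox functions fitrm and ranova.

% Repeated-measures ANOVA for the effect of SNR level on bootstrap p-values.
snrLevels = [0.002 0.004 0.006 0.008 0.01];
t = array2table(pvals, 'VariableNames', {'snr1','snr2','snr3','snr4','snr5'});
rm = fitrm(t, 'snr1-snr5 ~ 1', 'WithinDesign', snrLevels');
ranovatbl = ranova(rm)      % p-value for the within-subject effect of SNR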

Furthermore, inspection of figure 3.2b shows that lower p-values were produced for Fsp across all SNRs. This means that, on average, the Fsp parameter yields p-values of greater confidence (p≤0.05) than Autratio, supporting the combined use of Fsp and bootstrapping in clinics.
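
For clarity, the bootstrap p-values referred to throughout are simply the proportion of the resampled statistics that reach or exceed the observed statistic. A minimal sketch is shown below, assuming the observed value is held in Fsp and the 499 bootstrap values in BootFsp, mirroring the approach in Appendix B.

% One-sided bootstrap p-value: the fraction of the resampled (null)
% distribution lying at or above the observed statistic.
pFsp = sum(BootFsp >= Fsp) / numel(BootFsp);
isDetected = (pFsp <= 0.05);     % detection criterion used in this report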

Significant correlations were also found between SNR and the values of both parameters, indicating a clear relationship whereby parameter values increase as SNR increases. However, an ANOVA testing for the effect of SNR on the values produced by Autratio and Fsp did not reach significance. This indicates that the null hypothesis, 'the level of SNR will have no effect on the values generated by Fsp and Autratio', cannot be rejected. Furthermore, across all SNRs, Autratio produced higher values than Fsp, suggesting that the critical value (for detecting a wave V response) of Autratio may be considerably higher than that of Fsp (see figure 3.2d).

Negative correlations were found between parameter values and their corresponding p-values, indicating that as parameter values increase, p-values tend towards 0 and significance is achieved more often. This further supports the use of bootstrapping in clinics, as a stronger parameter value results in significance being reached more frequently.

By analysing figure 3.2d, specifically point 'y', it can be deduced that the 3:1 rule proposed by the NHSP, which should in theory correspond to an Autratio value of 3.0, equates to an SNR level (before averaging) of approximately 0.003. Point 'x' indicates that an SNR of 0.005 equates to the Fsp critical value of 3.1 proposed by Elberling and Don (1984). However, the lowest SNR level at which the p-values generated from the bootstrapping of Autratio and Fsp fell below 0.05 was 0.006, which is notably higher than both 0.003 and 0.005. This suggests that critical values of 3.1 for Fsp and 3.0 for Autratio may be too low. However, only five SNR levels were used in this study, which may partly explain why these values appear too low or inaccurate.

By examining point 'x' marked on figures 3.2f and 3.2g, it can be deduced that the minimum value associated with a significant response (i.e. the critical value) for Fsp is 3.0 for the bootstrap distribution at α=5% (p≤0.05). The critical value for Fsp extrapolated from the findings of this report is therefore very similar to the critical value of 3.1 suggested by Elberling and Don (1984) at p<0.01. However, the suggested Fsp critical value of 3.0 is higher than that found by Lv, et al. (2007), who suggested 1.75 based on the bootstrap distribution at α=5% (p≤0.05). The differences in Fsp critical values between this study and those of Lv, et al. (2007) and Elberling and Don (1984) are consistent with the finding of Lv, et al. that critical values for Fsp vary quite considerably between recordings. This supports the idea that universally valid threshold values for Fsp probably cannot be justified.
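
A sketch of how a critical value at the 5% level can be read off a bootstrap distribution is given below. It assumes BootFsp holds the 499 bootstrap values for a recording (as produced by the Appendix B code); the report's own critical values were extrapolated graphically from figures 3.2f and 3.2g rather than computed exactly this way.

% Critical value at alpha = 5%: the 95th percentile of the bootstrap
% distribution. Observed values above this give p <= 0.05.
sortedBoot = sort(BootFsp);
critFsp = sortedBoot(ceil(0.95*numel(sortedBoot)))   % with 499 values, element 475
% equivalently: critFsp = prctile(BootFsp, 95);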

This study also found an Autratio critical value of 5.2 for the bootstrap distribution at α=5% (p≤0.05). Evidently, this is higher than the subjective criterion of a 3:1 signal-to-noise ratio proposed by the NHSP (which should equate to an Autratio value of 3.0). Thus the null hypothesis, 'the critical value of Autratio at p≤0.05, for a response detection, will not be 3.0', has to be accepted. Although the parameter Autratio has been coded to calculate values using the subjective 3:1 criterion, there are many reasons why a critical value of 3.0 may not have been achieved. One plausible explanation is that the noise estimate made by visual inspection differs from that produced by the Autratio algorithm. For example, if visual inspection tends to overestimate noise relative to the 'true' statistical value, then for a given recording the ratio (signal divided by noise) will be lower for visual inspection than for the objective method (assuming the signal peak-to-trough difference is similar for both). So where subjective visual inspection suggests that clear responses start at ratios around 3.0, the objective method may calculate the ratio to be around 5.0 because of the difference in noise level estimates.
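
As a purely illustrative example (the amplitudes are hypothetical, not taken from the data): for a wave V peak-to-trough amplitude of 0.6 µV, a visually estimated residual noise of 0.2 µV gives a ratio of 3.0, whereas a statistical noise estimate of 0.12 µV for the same recording gives a ratio of 5.0.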

It is also important to note that the simulations were subject to several limitations. Most importantly, this study used only five different SNR levels, set before averaging. Future research should address this by using a more extensive range of SNRs in order to obtain more accurate information about the relationships and effects between variables, and to enable more accurate estimates of the critical values for the objective parameters. Significance was not achieved for various relationships and effects between variables, which may be a consequence of the limited range of SNRs. Another limitation is that each SNR was retested only ten times; a greater number of repetitions might have yielded more accurate and reliable results.

Another limitation of this study is that no artefacts were present in the simulated waveforms. In reality, when recording an ABR in clinic, several artefacts are present which may influence the interpretation of a waveform. Consequently, the ecological validity of the simulated findings is weakened.

Lastly, the simulations did not explore the false positive rates of the objective parameters. Future research should address this by generating waveforms with no actual wave V present (no-stimulation data), in order to determine which parameter has the higher false positive rate. Although this study attempted to explore false positive rates in section 3.1.2, the use of experts as the gold standard presents a major limitation, as discussed previously.
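
A sketch of the kind of no-stimulation simulation suggested here is given below. It generates noise-only epochs (white noise stands in for EEG noise, which is an assumption), computes Fsp and its bootstrap p-value as in Appendix B, and counts how often a 'response' is falsely declared; all names and run counts are illustrative.

% Estimate the false positive rate of Fsp on noise-only (no wave V) data.
fs = 5000; epoch = round(0.018*fs);          % 18 ms epochs, as in Appendix B
AS = round(0.006*fs); AE = round(0.016*fs);  % 6-16 ms analysis window
Amid = fix((AS+AE)/2);
nEpochs = 500; nRuns = 20; alpha = 0.05;
falsePos = 0;
for run = 1:nRuns
    noise = randn(nEpochs, epoch);           % noise-only epochs (assumed white)
    abr = mean(noise);
    Fsp = nEpochs*var(abr(AS:AE))/var(noise(:,Amid));
    BootFsp = zeros(1,499);
    for n = 1:499                            % bootstrap by circular shifts, as in Appendix B
        temp = noise;
        for i = 1:nEpochs
            r = fix(rand*(epoch-1))+1;
            temp(i,:) = [temp(i,r+1:epoch) temp(i,1:r)];
        end
        m = mean(temp);
        BootFsp(n) = nEpochs*var(m(AS:AE))/var(temp(:,Amid));
    end
    p = sum(BootFsp >= Fsp)/numel(BootFsp);
    falsePos = falsePos + (p <= alpha);
end
falsePositiveRate = falsePos/nRuns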


5.0 Conclusion

To conclude, this report has found results which agree with the reviewed literature regarding the high level of variability present during ABR analysis. This level of inconsistency is of great concern, as major differences may arise in the management of a patient. The NHSP should address this promptly, perhaps by providing mandatory standardised training programmes for audiologists across the UK.

Neither objective parameter detected responses at a rate of 100% when the experts were used as the gold standard. Fsp produced relatively good sensitivity and specificity with a critical value of 3.1 (Elberling & Don, 1984), whereas Autratio produced a very low specificity using the critical value of 3.0 based on the NHSP rule. This highlights that an Autratio critical value of 3.0 is unsuitable and should be revised accordingly. Furthermore, the advantages of applying the bootstrap method to Fsp and Autratio greatly outweighed the disadvantages, supporting the combined use of the bootstrap method with both parameters.

Further analysis using simulations provided evidence that the bootstrap technique is sensitive to changes in SNR. It was also found that SNR correlated significantly with both objective parameters and that, on average, Autratio produced greater values across all SNRs. Based on these findings, this report proposes an Fsp critical value of 3.0, which is very similar to previous work (Elberling & Don, 1984). An Autratio critical value of 5.2 was also found, which is considerably higher than the subjective 3:1 criterion proposed by the NHSP; the inaccurate estimation of noise by visual inspection may underlie this difference.

Correlational analysis revealed that although Fsp and Autratio are calculated in considerably different ways, they are related to one another. This was explored further through the simulations, whose results indicated that SNR was the underlying variable driving this relationship: Fsp and Autratio values both increased with SNR.

The findings of this report suggest that implementing an objective approach to supersede the subjective method of analysis is still not a viable option. The Fsp parameter did not achieve an adequate level of sensitivity and both parameters produced poor specificity. The use of bootstrapping offered more advantages than disadvantages in terms of the accuracy of the objective parameters, which strengthens the case for implementing the bootstrap in ABR clinics; it would also allow results to be compared across clinics. This report therefore suggests that the objective methods should be used alongside subjective analysis to provide additional confidence to clinicians.


Further research should test the accuracy of the critical values proposed on the basis of this report's findings. The limitations highlighted here should also be addressed by future work so that better comparisons can be made between the objective parameters. Applying the bootstrap method to other objective parameters, such as Fmp or the ± difference, may also prove beneficial, as it would allow better comparisons to be made regarding its usefulness. This would help determine whether it is actually viable for an objective method to supersede the current subjective method of analysis.


Works Cited

Arlinger, S. D., 1981. Technical aspects of stimulation, recording, and signal processing. Scandinavian

Audiology, Volume 13, pp. 41-53.

Arnold, S. A., 1985. Objective versus visual detection of the auditory brain stem response. Ear and Hearing, 6(3), pp. 144-150.

Audacity, 2012. Free Audio Editor and Recorder. [Online]

Available at: http://audacity.sourceforge.net/

[Accessed 04 December 2014].

Besouw, R. V., 2012. Physiological Measurement: Auditory evoked potentials and synchronous

averaging. [Online]

Available at: https://blackboard.soton.ac.uk/bbcswebdav/pid-1704748-dt-content-rid-

859574_1/xid-859574_1

[Accessed 22 November 2014].

Bremner, D. et al., 2012. Audiology Assessment Protocol: Version 4.1, s.l.: BC Early Hearing Program.

ClimaxDigital, 2012. ClimaxDigital ACAP100 USB 2.0 Audio Capture-transfer analogue audio sources

to digital format. [Online]

Available at: http://www.climaxdigital.co.uk/USB-20-Audio-Capture-transfer-analogue-audio-

sources-to-digital-format

[Accessed 04 December 2014].

Coats, A. C., 1978. Human auditory nerve action potentials and brain stem evoked responses:

Latency-intensity functions in detection of cochlear and retrocochlear abnormality. Archives of

Otolaryngology, 104(12), pp. 709-717.

Don, M., Elberling, C. & Waring, M., 1984. Objective detection of averaged auditory brainstem

responses. Scandinavian audiology, 13(4), pp. 219-228.

Efron, B., 1979. Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), pp. 1-26.

Elberling, C., 1979. Auditory electrophysiology: Spectral analysis of cochlear and brain stem evoked

potentials. Scandinavian Audiology, Volume 8, pp. 57-64.

Elberling, C. & Don, M., 1984. Quality estimation of averaged auditory brainstem responses.

Scandinavian audiology, 13(3), pp. 187-197.


Elberling, C. & Don, M., 1987. Threshold characteristics of the human auditory brain stem response.

The Journal of the Acoustical Society of America, 81(1), pp. 115-121.

Elberling, C. & Don, M., 2010. A direct approach for the design of chirp stimuli used for the recording

of auditory brainstem responses. J Acoust Soc Am, Volume 128, pp. 2955-2964.

Ferm, I., Lightfoot, G. & Stevens, J., 2013. Comparison of ABR response amplitude, test time, and

estimation of hearing threshold using frequency specific chirp and tone pip stimuli in newborns.

International journal of audiology, 52(6), pp. 419-423.

Foxe, J. J. & Stapells, D. R., 1993. Normal infant and adult auditory brainstem responses to bone-

conducted tones. International Journal of Audiology, 32(2), pp. 95-109.

Gans, D., Zotto, D. D. & Gans, K. D., 1992. Bias in Scoring Auditory Brainstem Responses. British

Journal of Audiology, 26(6), pp. 363-368.

Gorga, M. P. et al., 1985. Some comparisons between auditory brain stem response thresholds,

latencies, and the pure-tone audiogram. Ear and Hearing, 6(2), pp. 105-112.

Hall, J. W., 2006. New Handbook for Auditory Evoked Responses. 1st ed. Florida: Pearson.

Hecox, K. & Galambos, R., 1974. Brain stem auditory evoked responses in human infants and adults.

Archives of otolaryngology, 99(1), pp. 30-33.

IBM, 2014. SPSS Statistics. [Online]

Available at: http://www-01.ibm.com/software/analytics/spss/products/statistics/downloads.html

[Accessed 06 December 2014].

iSixSigma, 2014. Kappa. [Online]

Available at: http://www.isixsigma.com/dictionary/kappa/

[Accessed 18 November 2014].

Jean, C., 1996. Assessing agreement on classification tasks: The kappa statistic. Computational

Linguistics, 22(2), pp. 249-254.

Kavanagh, K. T., Harker, L. A. & Tyler, R. S., 1984. Auditory brainstem and middle latency responses:

I. Effects of response filtering and waveform identification; II. Threshold responses to a 500-Hz tone

pip.. Acta Otolaryngologica (Stockholm), Volume 108, pp. 1-12.

Kockanek, K. et al., 1991. Wplyw rodzaju bodzca dzwiekowego na oznaczanie progu slichowego

metoda ABR.[The effect of brief tone envelopes on ABR and behavioral thresholds]. Otolaryngologia

polska. The Polish otolaryngology, 46(3), pp. 296-301.


Kuttva, S., Radomskij, P. & Raglan, E., 2009. Effect of peer review on accuracy of reported auditory

brainstem response thresholds in newborn hearing screening programme referrals. Audiological

Medicine, 7(4), pp. 205-210.

Laukli, E. & Mair, I. W. S., 1981. Early auditory-evoked responses: Spectral content. Audiology,

Volume 20, pp. 453-464.

Lightfoot, G. & Stevens, J., 2014. Effects of Artefact Rejection and Bayesian Weighted Averaging on

the Efficiency of Recording the Newborn ABR. Ear and hearing, 35(2), pp. 213-220.

Lv, J., Bell, S. L. & Simpson , D. M., 2007. Objective detection of evoked potentials using a bootstrap

technique. Medical Engineering & Physics, 29(2), pp. 191-198.

Mathworks, 2014. Matlab: The Language of Technical Computing. [Online]

Available at: http://uk.mathworks.com/products/matlab/

[Accessed 04 December 2014].

Mocks, J., Tuan, P. D. & Gasser, T., 1984. Testing for homogeneity of noisy signals evoked by

repeated stimuli. The Annals of Statistics, 12(1), pp. 193-209.

Morelle, R., 2012. The hum that helps to fight crime. [Online]

Available at: http://www.bbc.co.uk/news/science-environment-20629671

[Accessed 21 November 2014].

Mühler, R. & Specht, H., 1999. Sorted averaging-principle and application to auditory brainstem

responses. Scandinavian audiology, 28(3), pp. 145-149.

NHS, 2008. NHS Newborn Hearing Screening Programme. [Online]

Available at: http://hearing.screening.nhs.uk/searchwebsite.php?searchstring=stage+A+check

[Accessed 13 November 2014].

NHS, 2013. NHS Newborn Hearing Screening Programme. [Online]

Available at: http://hearing.screening.nhs.uk/public

[Accessed 25 November 2014].

Özdamar, Ö. & Delgado, R. E., 1996. Measurement of signal and noise characteristics in ongoing

auditory brainstem response averaging. Annals of biomedical engineering, 24(6), pp. 702-715.

Picton, T. W., Woods, D. L., Baribeau-Braun, J. & Healey, T. M., 1977. Evoked potential audiometry. J

Otolaryngol, Volume 6, pp. 90-119.


Pratt, H. & Sohmer, H., 1977. Correlations between psychophysical magnitude estimates and

simultaneously obtained auditory nerve, brain stem and cortical responses to click stimuli in man.

Electroencephalography and clinical neurophysiology, 43(6), pp. 802-812.

Reich, D. S. & Wiatrak, B. J., 1996. Methods of sedation for auditory brainstem response testing. International Journal of Paediatric Otorhinolaryngology, Volume 38, pp. 131-141.

Rice University, 2014. PHYS 331: Junior Physics Laboratory I: Notes on Noise Reduction. [Online]

Available at: http://www.owlnet.rice.edu/~dodds/Files331/noise_notes.pdf

[Accessed 20 November 2014].

Ruth, R. A., Hildebrand, D. L. & Cantrell, R. W., 1982. A study of methods used to enhance wave I in

the auditory brain stem response. Archives of Otolaryngology - Head and Neck Surgery, Volume 90,

pp. 635-640.

Sauter, T. B., Beck, D. L. & Speidel, D. P., 2012. ABR and ASSR: Challenges and Solutions, 2012.

Hearing Review, 19(6), pp. 20-25.

Seigel, D. G., Podgor, M. J. & Remaley, N. A., 1992. Acceptable values of kappa for comparison of two groups. American Journal of Epidemiology, 135(5), pp. 571-578.

Shapland, M. R. & Leong, J. W. K., 2010. Bootstrap Modeling: Beyond the Basics. [Online]

Available at: http://www.casact.org/pubs/forum/10fforum/ShaplandLeong.pdf

[Accessed 21 November 2014].

Silva, I., 2009. Estimation of Postaverage SNR from Evoked Responses Under Nonstationary Noise. IEEE Transactions on Biomedical Engineering, 56(8), pp. 2123-2130.

Sininger, Y. S. & Don, M., 1989. Effects of click rate and electrode orientation on threshold of the

auditory brainstem response. Journal of speech and hearing research, 32(4), p. 880.

Small, S. A. & Stapells, D. R., 2008. Normal ipsilateral/contralateral asymmetries in infant multiple

auditory steady-state responses to air-and bone-conduction stimuli. Ear and hearing, 29(2), pp. 185-

198.

Stackoverflow, 2011. What does correlation coefficient actually represent. [Online]

Available at: http://stackoverflow.com/questions/7631799/what-does-correlation-coefficient-

actually-represent

[Accessed 07 12 2014].


Stapells, D. R. & Ruben, R. J., 1989. Auditory brain stem responses to bone-conducted tones in

infants. The Annals of otology, rhinology, and laryngology, 98(12 pt 1), pp. 941-949.

Stevens, J., Brennan, S., Gratton, D. & Campbell, M., 2013a. ABR in newborns: Effects of electrode

configuration, stimulus rate, and EEG rejection levels on test efficiency. International Journal of

Audiology, 52(10), pp. 706-712.

Stevens, J. et al., 2013b. Guidelines for the early audiological assessment and management of babies

referred from the newborn hearing screening programme: version 3.1, s.l.: NHS.

Stockard, J. J., Stockard, J. E. & Sharbrough, F. W., 1978. Nonpathological factors influencing

brainstem auditory evoked potentials. American Journal of EEG Technology, Volume 18, pp. 177-209.

Sutton, G. et al., 2013. Guidance for Auditory Brainstem Response testing in babies: Version 2.1,

London: NHS.

Takagi, K. N., Suzuki, T. & Kobayashi, K., 1985. Effect of tone-burst frequency on fast and slow components of auditory brain-stem response. Scandinavian Audiology, Volume 14, pp. 75-79.

Tibshirani, R. & Efron, B., 1993. An Introduction to the Bootstrap. 1st ed. s.l.:Chapman and Hall/CRC.

Vidler, M. & Parker, D., 2004. Auditory brainstem response threshold estimation: subjective

threshold estimation by experienced clinicians in a computer simulation of the clinical test.

International Journal of Audiology, 43(7), pp. 417-429.

Warren, M. P., 1989. The auditory brainstem response in pediatrics. Otolaryngologic Clinics of North

America, 22(3), pp. 473-500.

Weinstein, B., 2000. Geriatric Audiology. 1st ed. s.l.:Thieme.

Wetherill, G. B. & Levitt, H., 1965. Sequential estimation of points on a psychometric function. British

Journal of Mathematical and Statistical Psychology, 18(1), pp. 1-10.

Wong, P. K. H. & Bickford, R. G., 1980. Brain stem auditory evoked potentials: the use of noise estimate. Electroencephalogr Clin Neurophysiol, 50(1), pp. 25-34.


7.0 Appendices

7.1 Appendix A: All data used from Lightfoot and Stevens' study, the calculated parameter values for those data and the experts' interpretations of the data.

Key:
  y            agreement of 'yes'
  n            agreement of 'no'
  r            agreement of 'result inconclusive'
  'n' or 'r'   agreement of 'not yes'
  p≤0.05       significant

Columns (left to right): Baby | Ear | AC/BC | Freq (kHz) | Intensity (dB nHL) | Fsp | Ratio (objective parameters) | Sig FSP | Sig Ratio (bootstrapped significance values) | Expert analysis 1 | Expert analysis 2,3,4 (both when considering three categories: yes (y), no (n), result inconclusive (r)) | Expert analysis 1 | Expert analysis 2,3,4 (both when considering two categories only: yes (y), not yes (n or r))

1 Lt A 4 40 4.4251 14.7561 0.000 0.000 y y y y

2 Lt A 4 50 0.364 3.7872 0.599 0.058 n r n r

2 Lt A 4 60 2.221 3.5414 0.002 0.010 r r r r

2 Lt A 4 70 0.8769 4.8766 0.188 0.010 y y y y

2 Lt A 4 80 2.0503 3.0688 0.002 0.094 y r y r

2 Rt A 4 50 0.6172 3.7816 0.273 0.076 r r r r

2 Rt A 4 60 0.6723 4.288 0.237 0.054 n r n r

2 Rt A 4 70 1.5558 5.4421 0.002 0.004 y r y r

3 Lt A 4 40 2.498 5.2218 0.000 0.008 y y y y

3 Lt A 4 50 2.8156 9.5915 0.004 0.000 y y y y

3 Rt A 4 40 4.5966 4.2667 0.000 0.002 y y y y

3 Rt A 4 50 3.0648 16.7977 0.002 0.000 y y y y

4 Lt A 4 80 0.5658 1.629 0.297 0.427 n n n n

4 Rt A 4 70 0.3305 0.764 0.868 0.918 n n n n


4 Rt A 4 80 0.7165 2.7933 0.479 0.305 n r n r

5 Lt A 4 40 7.2178 8.2583 0.000 0.000 y y y y

5 Rt A 4 50 4.8535 12.8006 0.000 0.000 y r y r

6 Lt A 4 40 6.717 9.3483 0.000 0.000 y y y y

6 Rt A 4 40 6.9995 5.8313 0.000 0.000 y y y y

6 Rt A 4 50 14.3761 9.3689 0.000 0.000 y y y y

7 Lt A 1 45 3.1515 7.9225 0.000 0.000 y y y y

7 Lt A 1 55 9.2722 11.2221 0.000 0.000 y y y y

7 Lt A 4 40 16.2599 8.5218 0.000 0.000 y y y y

7 Rt A 4 40 4.2809 6.9783 0.002 0.002 y y y y

7 Rt A 4 50 1.7277 4.113 0.068 0.070 y y y y

8 Lt A 4 50 0.2731 0.5102 0.321 0.100 n r n r

8 Lt A 4 60 2.0884 4.1589 0.006 0.016 n r n r

8 Lt A 4 70 1.941 5.4221 0.008 0.000 n y n y

8 Rt A 4 50 1.1445 7.3446 0.090 0.002 y y y y

8 Rt B 4 30 4.3725 8.4473 0.000 0.000 r y r y

8 Rt A 4 60 6.3957 3.8134 0.000 0.086 y y y y

9 Lt A 4 50 0.3024 3.7901 0.553 0.010 r r r r

9 Lt A 4 60 0.9976 3.7925 0.010 0.006 y y y y

9 Rt A 4 60 1.3887 3.7559 0.014 0.012 r y r y

9 Rt A 4 70 1.9527 4.3338 0.008 0.008 r y r y

9 Rt B 4 30 2.6884 4.7687 0.000 0.044 y y y y

10 Lt A 4 40 5.7471 6.0921 0.000 0.000 y y y y

10 Rt A 4 40 6.8229 10.0162 0.000 0.000 y y y y

10 Rt A 4 50 10.6075 9.7659 0.000 0.000 y y y y

11 Lt A 4 40 2.728 12.8851 0.012 0.000 y y y y

11 Lt A 4 50 3.9165 7.7218 0.000 0.000 y y y y


11 Rt A 4 40 5.2349 7.9004 0.000 0.000 y y y y

11 Rt A 4 50 9.7042 10.5996 0.000 0.000 n r n r

12 Lt A 4 50 0.3048 1.026 0.705 0.643 r r r r

12 Lt A 4 60 0.6246 2.7254 0.311 0.078 y y y y

12 Rt A 4 60 0.9664 4.4995 0.054 0.040 n r n r

12 Rt A 4 65 1.265 3.5409 0.078 0.068 n r n r

13 Lt A 4 50 0.2688 2.9172 0.776 0.249 y y y y

13 Lt A 4 60 1.4746 3.5494 0.000 0.000 y y y y

13 Lt A 4 70 7.0925 11.2886 0.000 0.000 y y y y

13 Rt A 4 40 4.6711 7.4281 0.000 0.000 y y y y

13 Rt A 4 50 6.547 11.9604 0.000 0.000 y y y y

13 Rt A 4 60 10.117 6.7241 0.000 0.000 n r n r

14 Lt A 4 40 2.9965 4.6614 0.002 0.012 n y n y

14 Lt A 4 50 5.5169 5.0416 0.000 0.000 r y r y

14 Rt A 4 40 5.558 6.5864 0.000 0.000 y y y y

14 Rt A 4 50 5.1072 9.4276 0.000 0.000 y y y y

15 Lt A 4 40 7.8613 10.543 0.000 0.000 y y y y

15 Lt A 4 50 7.7214 6.5855 0.000 0.000 y y y y

15 Rt A 4 40 3.0514 9.0503 0.006 0.000 y y y y

15 Rt A 4 50 6.2835 8.0596 0.000 0.000 n r n r

16 Lt A 4 40 0.296 3.1622 0.796 0.164 n r n r

16 Lt A 4 50 0.3806 3.4996 0.669 0.130 n r n r

16 Rt A 4 40 3.4048 2.268 0.006 0.419 n r n r

16 Rt A 4 50 0.9739 2.6267 0.204 0.561 n r n r

16 Rt A 4 60 1.912 3.1217 0.036 0.212 n r n r

17 Lt A 4 40 4.7684 7.0083 0.000 0.000 y y y y

17 Rt A 4 50 8.5826 8.6365 0.000 0.000 y y y y

18 Lt A 4 40 2.5921 5.4859 0.000 0.008 y y y y


18 Lt A 4 50 3.3808 5.5447 0.000 0.002 r r r r

18 Rt A 4 40 1.3159 2.4657 0.022 0.363 n r n r

18 Rt A 4 50 0.4065 1.8605 0.800 0.509 n r n r

19 Lt A 4 30 7.043 8.4987 0.000 0.000 y y y y

19 Rt A 4 50 12.0276 8.1472 0.000 0.000 y y y y

20 Rt A 4 40 0.4241 3.8028 0.721 0.078 n r n r

20 Rt A 4 50 1.163 3.2583 0.028 0.084 n y n y

21 Lt A 4 40 2.8643 6.5816 0.000 0.000 y y y y

21 Lt A 4 50 3.6222 5.7806 0.000 0.000 y y y y

21 Rt A 4 40 2.3649 4.321 0.000 0.014 r y r y

21 Rt A 4 50 2.2128 6.3634 0.000 0.000 r y r y

23 Lt A 4 40 6.3384 4.696 0.000 0.008 y y y y

23 Lt A 4 50 0.6384 2.3934 0.505 0.128 n r n r

23 Rt A 4 40 1.5449 5.7374 0.024 0.026 r y r y

23 Rt A 4 50 4.2412 3.7149 0.000 0.016 y r y r

24 Lt A 4 60 2.8683 10.2549 0.000 0.000 y y y y

24 Rt A 4 65 1.2955 3.8627 0.078 0.048 n r n r

25 Lt A 4 40 1.2817 3.0344 0.126 0.008 r r r r

25 Lt A 4 50 0.7545 4.6986 0.359 0.006 y y y y

25 Rt A 4 40 4.3085 5.7578 0.000 0.008 r y r y

25 Rt A 4 50 4.5883 11.7267 0.000 0.000 y y y y

26 Lt A 4 50 7.0774 7.7962 0.000 0.000 y y y y

26 Rt A 4 40 4.4908 6.6928 0.000 0.000 y y y y

26 Rt A 4 50 8.3415 6.3562 0.000 0.000 y y y y


7.2 Appendix B: Code devised by Dr Bell for use in Matlab. The code calculates Fsp, Autratio and the p-values arising from bootstrapping, and produces graphical representations of the ABR waveforms.

%% 24 7 13. Deleted analysis using random index bootstrap
%% used random rotation from Kimberley - seems to work Ok
%% something funny is going on - why is this showing a drift???
%% INCLUDE INVERSION OF DATA - SEEMS TO BE UPSIDEDOWN
%% 12 8 14 correction added so that mean abs difference used on line 96
%% (3:1 rule) instead of abs of mean difference
% triggers at 0.02s separation
clear; close all
X = wavread('Baby 26 Rt 4k 50dB.wav');     %% ADD FILENAME 1 HERE
x = resample(X,5000,44100);                % resample to 5k = guy used filter of 1500
X1 = wavread('Baby 26 Rt 4k 50dB b.wav');  %% ADD FILENAME 2 HERE
x1 = resample(X1,5000,44100);              % resample to 5k = guy used filter of 1500

fs = 5000;
analysisstart = 6;              % start of 3:1/fsp window
analysisend = 16;               % end of 3:1/fsp window
AS = analysisstart/1000*fs;
AE = analysisend/1000*fs;
Amid = fix((AS+AE)/2);

% run recording 1
data = -x(:,1);
triggers = x(:,2);
trigdelay = 0.002;              % length to pause after a trigger
TD = trigdelay*fs;              % pause in samples
triggerthreshold = 0.2;
epoch = 0.018*fs;               % 18ms window
%rejectT = 0.1;                 % level to reject epochs. 0.2 is with no scaling.
rejectT = 0.15;                 % seems Ok about 37.5 microV
%rejectT = 5;
N = 0;
for i = 1:length(triggers)-epoch;
%for i = 1:20000
    if triggers(i) > triggerthreshold
        temp = -data(i:i+epoch-1);
        % artefact rejection
        if max(temp) < rejectT;
            if min(temp) > -rejectT;
                N = N+1;
                array(N,:) = temp;
            end
        end
        for n = i+1:i+TD
            triggers(n) = 0;    % if a trigger is detected, set next triggers to zero
        end
    end
end
array = array/4000;             % scale?
scale = linspace(0,18,epoch);   % time axis for plot
abr = -mean(array);
figure; plot(scale,abr); title('Average of 2 responses')

% run recording 2
data1 = -x1(:,1);
triggers1 = x1(:,2);
trigdelay = 0.002;              % length to pause after a trigger
TD = trigdelay*fs;              % pause in samples
triggerthreshold = 0.2;
epoch = 0.018*fs;               % 18ms window
%rejectT = 0.1;                 % level to reject epochs. 0.2 is with no scaling.
rejectT = 0.15;                 % seems Ok about 37.5 microV
%rejectT = 5;
N1 = 0;
for i = 1:length(triggers1)-epoch;
%for i = 1:20000
    if triggers1(i) > triggerthreshold
        temp1 = -data1(i:i+epoch-1);
        % artefact rejection
        if max(temp1) < rejectT;
            if min(temp1) > -rejectT;
                N1 = N1+1;
                array1(N1,:) = temp1;
            end
        end
        for n = i+1:i+TD
            triggers1(n) = 0;   % if a trigger is detected, set next triggers to zero
        end
    end
end
array1 = array1/4000;           % scale?
scale = linspace(0,18,epoch);   % time axis for plot
abr1 = -mean(array1);
hold on
plot(scale,abr1,'c'); title('ABR overlay plot')

array2 = [array' array1']';     % combine arrays of abrs
abr2 = (abr+abr1)/2;            % average abr and abr1 - gives abr2
figure; plot(scale,abr2)
top = max(abr2(AS:AE));         % peak in ABR 2
low = min(abr2(AS:AE));         % trough in ABR 2
diff = mean(abs((abr(AS:AE)-abr1(AS:AE))));  % average difference over window
ratio = (top-low)/diff

% subaverages
% abra = mean(array(1:fix(N/2),:));
% abrb = mean(array(fix(N/2)+1:fix(N),:));
% figure; plot(scale,abra); hold on; plot(scale,abrb)

% FSP calc
upper = var(abr(AS:AE));        % power from 5 to 15 ms % NHSP for click (.005*5000:.015*5000));
lower = var(array(:,Amid));     % power of SP at ms .008*5000)
Fsp = upper*N/lower

% Bootstrap Fsp
for n = 1:499
    temp = array;
    for i = 1:N
        rotate = fix(rand*(epoch-1))+1;  % avoid zero by adding 1
        temp(i,:) = [temp(i,rotate+1:epoch) temp(i,1:rotate)];
    end
    bootstrap(n,:) = mean(temp);
    BootFsp(n) = N*var(bootstrap(n,AS:AE))/var(temp(:,Amid));
end
Bootvar = sort(BootFsp);
BootFsp(475);
sort(BootFsp);
count1 = 0;
for i = 1:length(BootFsp)
    if Fsp > BootFsp(i)
        count1 = count1+1;
    end
end
SIGFSP = 1-(count1/length(BootFsp))

% Bootstrap 3:1
for n = 1:499
    temp = array;
    for i = 1:N
        rotate = fix(rand*(epoch-1))+1;  % avoid zero by adding 1
        temp(i,:) = [temp(i,rotate+1:epoch) temp(i,1:rotate)];
    end
    bootstrap(n,:) = mean(temp);
    abr3 = mean(temp);          % call 'abr3'
    temp1 = array1;             % 2nd array
    for i = 1:N1
        rotate = fix(rand*(epoch-1))+1;  % avoid zero by adding 1
        temp1(i,:) = [temp1(i,rotate+1:epoch) temp1(i,1:rotate)];
    end
    bootstrap1(n,:) = mean(temp1);
    abr4 = mean(temp1);         % call 'abr4'
    abr5 = (abr3+abr4)/2;
    top = max(abr5(AS:AE));     % peak in ABR 5
    low = min(abr5(AS:AE));     % trough in ABR 5
    % correction to mean abs, not abs mean 5/12/14
    diff = mean(abs((abr3(AS:AE)-abr4(AS:AE))));  % average difference over window
    Bootratio(n) = (top-low)/diff;
end

count2 = 0;
for i = 1:length(Bootratio)
    if ratio > Bootratio(i)
        count2 = count2+1;
    end
end
SIGRATIO = 1-(count2/length(Bootratio))
break

for n = 1:epoch
    temp2 = bootstrap(:,n);
    temp1 = sort(temp2);
    lower5(n) = temp1(25);      % lower 5%
    upper5(n) = temp1(475);     % upper 5%
    lower1(n) = temp1(5);       % lower 1%
    upper1(n) = temp1(495);     % upper 1%
    lowerp2(n) = temp1(1);      % lower 0.2%
    upperp2(n) = temp1(499);    % upper 0.2%
end
figure; plot(scale,abr2); hold on
plot(scale,lower5,'c')
plot(scale,upper5,'g')
plot(scale,lower1,'c:')
plot(scale,upper1,'g:')
plot(scale,lowerp2,'r:')
plot(scale,upperp2,'r:')
title('mean response with upper and lower 5th, 1st percentile amplitudes from bootstrap and upper and lower .2%')


7.3 Appendix C: Code devised by Dr Bell for use in Matlab to analyse the simulations. The code allows the user to input an SNR level (before averaging); Fsp, Autratio and the p-values arising from bootstrapping are then calculated, and graphical representations of the waveform are produced.

%% 24 7 13. Deleted analysis using random index bootstrap
%% used random rotation from Kimberley - seems to work Ok
%% something funny is going on - why is this showing a drift???
%% INCLUDE INVERSION OF DATA - SEEMS TO BE UPSIDEDOWN
%% 12 8 14 correction added so that mean abs difference used on line 96
%% (3:1 rule) instead of abs of mean difference
% triggers at 0.02s separation
% clear;close all
% X = wavread('Baby 19 Rt 4k 50dB.wav');    %% ADD FILENAME 1 HERE
% x = resample(X,5000,44100);               % resample to 5k = guy used filter of 1500
%
% X1 = wavread('Baby 19 Rt 4k 50dBb.wav');  %% ADD FILENAME 2 HERE
% x1 = resample(X1,5000,44100);             % resample to 5k = guy used filter of 1500
%% 28 Nov. Read in x and x1 here

fs = 5000;
analysisstart = 6;              % start of 3:1/fsp window
analysisend = 16;               % end of 3:1/fsp window
AS = analysisstart/1000*fs;
AE = analysisend/1000*fs;
Amid = fix((AS+AE)/2);

% run recording 1
data1 = -x1(:,1);
triggers1 = x1(:,2);
trigdelay = 0.002;              % length to pause after a trigger
TD = trigdelay*fs;              % pause in samples
triggerthreshold = 0.2;
epoch = 0.018*fs;               % 18ms window
%rejectT = 0.1;                 % level to reject epochs. 0.2 is with no scaling.
%rejectT = 0.15;                % seems Ok about 37.5 microV
%rejectT = 5;
rejectT = 10000;                % no rejection
N1 = 0;
for i = 1:length(triggers1)-epoch;
%for i = 1:20000
    if triggers1(i) > triggerthreshold
        temp1 = -data1(i:i+epoch-1);
        % artefact rejection
        if max(temp1) < rejectT;
            if min(temp1) > -rejectT;
                N1 = N1+1;
                array1(N1,:) = temp1;
            end
        end
        for n = i+1:i+TD
            triggers1(n) = 0;   % if a trigger is detected, set next triggers to zero
        end
    end
end
array1 = array1/4000;           % scale?
scale = linspace(0,18,epoch);   % time axis for plot
abr1 = -mean(array1);
hold on
plot(scale,abr1,'c'); title('ABR overlay plot')

array2 = [array' array1']';     % combine arrays of abrs
abr2 = (abr+abr1)/2;            % average abr and abr1 - gives abr2
figure; plot(scale,abr2)
top = max(abr2(AS:AE));         % peak in ABR 2
low = min(abr2(AS:AE));         % trough in ABR 2
diff = mean(abs((abr(AS:AE)-abr1(AS:AE))));  % average difference over window
ratio = (top-low)/diff

% subaverages
% abra = mean(array(1:fix(N/2),:));
% abrb = mean(array(fix(N/2)+1:fix(N),:));
% figure; plot(scale,abra); hold on; plot(scale,abrb)

% FSP calc
upper = var(abr(AS:AE));        % power from 5 to 15 ms % NHSP for click (.005*5000:.015*5000));
lower = var(array(:,Amid));     % power of SP at ms .008*5000)
Fsp = upper*N/lower

% Bootstrap Fsp
for n = 1:499
    temp = array;
    for i = 1:N
        rotate = fix(rand*(epoch-1))+1;  % avoid zero by adding 1
        temp(i,:) = [temp(i,rotate+1:epoch) temp(i,1:rotate)];
    end
    bootstrap(n,:) = mean(temp);
    BootFsp(n) = N*var(bootstrap(n,AS:AE))/var(temp(:,Amid));
end
Bootvar = sort(BootFsp);
BootFsp(475);
sort(BootFsp);
count1 = 0;
for i = 1:length(BootFsp)
    if Fsp > BootFsp(i)
        count1 = count1+1;
    end
end
SIGFSP = 1-(count1/length(BootFsp))

% Bootstrap 3:1
for n = 1:499
    temp = array;
    for i = 1:N
        rotate = fix(rand*(epoch-1))+1;  % avoid zero by adding 1
        temp(i,:) = [temp(i,rotate+1:epoch) temp(i,1:rotate)];
    end
    bootstrap(n,:) = mean(temp);
    abr3 = mean(temp);          % call 'abr3'
    temp1 = array1;             % 2nd array
    for i = 1:N1
        rotate = fix(rand*(epoch-1))+1;  % avoid zero by adding 1
        temp1(i,:) = [temp1(i,rotate+1:epoch) temp1(i,1:rotate)];
    end
    bootstrap1(n,:) = mean(temp1);
    abr4 = mean(temp1);         % call 'abr4'
    abr5 = (abr3+abr4)/2;
    top = max(abr5(AS:AE));     % peak in ABR 5
    low = min(abr5(AS:AE));     % trough in ABR 5
    % diff = abs(mean((abr3(AS:AE)-abr4(AS:AE))));  % average difference over window
    diff = mean(abs((abr3(AS:AE)-abr4(AS:AE))));    % average difference over window
    Bootratio(n) = (top-low)/diff;
end

count2 = 0;
for i = 1:length(Bootratio)
    if ratio > Bootratio(i)
        count2 = count2+1;
    end
end
SIGRATIO = 1-(count2/length(Bootratio))
break

for n = 1:epoch
    temp2 = bootstrap(:,n);
    temp1 = sort(temp2);
    lower5(n) = temp1(25);      % lower 5%
    upper5(n) = temp1(475);     % upper 5%
    lower1(n) = temp1(5);       % lower 1%
    upper1(n) = temp1(495);     % upper 1%
    lowerp2(n) = temp1(1);      % lower 0.2%
    upperp2(n) = temp1(499);    % upper 0.2%
end
figure; plot(scale,abr2); hold on
plot(scale,lower5,'c')
plot(scale,upper5,'g')
plot(scale,lower1,'c:')
plot(scale,upper1,'g:')
plot(scale,lowerp2,'r:')
plot(scale,upperp2,'r:')
title('mean response with upper and lower 5th, 1st percentile amplitudes from bootstrap and upper and lower .2%')


7.4 Appendix D: Risk assessment form submitted to the University of Southampton for approval.

Health & Safety Risk Assessment

Assessor: 25544209    Responsible Manager: Dr Steven Bell    Date: 15/10/2014
Faculty / Service: ISVR    Academic Unit / Team: BSc Audiology    Location: ISVR building 13

Brief description of task / activity: There will be no recruitment of participants. Existing data will be used for analysis and simulations will be run to generate additional data. The subjects will remain anonymised.

Reasonably foreseeable hazard: Looking at a computer screen for too long (Researcher)
Inherent risk: Low (of Low / Med / High)
Controls: Take regular breaks to give the eyes rest.
Residual risk: Low (of Low / Med / High)


7.5 Appendix E: All simulated data retested ten times at each SNR.

Key: p≤0.05 indicates significance

SNR (before averaging) | Fsp | Ratio | Sig FSP | Sig Ratio

0.002 1.4575 4.9221 0.1122 0.02

0.002 3.529 3.3764 0.002 0.1543

0.002 0.9477 1.0529 0.3206 0.9699

0.002 1.0092 1.7418 0.2325 0.7014

0.002 2.1339 1.9548 0.0321 0.5992

0.002 2.0216 1.5916 0.02 0.7575

0.002 0.4564 0.6174 0.4449 1

0.002 0.4119 1.6509 0.8076 0.7515

0.002 4.06 1.6676 0 0.7415

0.002 0.3451 2.2505 0.8818 0.481

SNR (before averaging) | Fsp | Ratio | Sig FSP | Sig Ratio

0.004 1.0663 4.8356 0.2224 0.012

0.004 3.4581 7.4201 0.002 0

0.004 0.7731 2.5767 0.4088 0.3607

0.004 5.3982 2.2282 0 0.487

0.004 0.8692 3.5447 0.3427 0.0882

0.004 2.0829 3.6389 0.22 0.0822

0.004 2.4769 4.0293 0.012 0.0501

0.004 2.2578 2.2076 0.02 0.493

0.004 1.8971 4.4464 0.0361 0.02

0.004 2.0559 3.8348 0.0261 0.0601

SNR (before averaging) | Fsp | Ratio | Sig FSP | Sig Ratio

0.006 3.5007 4.8255 0 0.0261

0.006 3.7126 3.0781 0.002 0.1784

0.006 2.9539 7.4786 0.008 0

0.006 4.2528 4.0728 0 0.0421

0.006 5.5796 7.0287 0 0.002

0.006 4.9602 11.1003 0 0

0.006 3.3279 6.3912 0.004 0.002

0.006 2.6251 6.128 0.14 0.006

0.006 6.2224 8.2152 0 0

0.006 2.4937 7.4414 0.018 0


SNR (before averaging) | Fsp | Ratio | Sig FSP | Sig Ratio

0.008 3.9818 7.0377 0.002 0

0.008 8.3244 6.187 0 0.004

0.008 4.955 8.3237 0 0

0.008 3.9505 5.3517 0.002 0.008

0.008 7.6731 4.2792 0 0.0361

0.008 5.6357 5.815 0 0

0.008 8.5347 9.6018 0 0

0.008 6.434 5.5265 0 0.002

0.008 5.3222 11.1609 0 0

0.008 7.9378 9.6877 0 0

SNR (before averaging) | Fsp | Ratio | Sig FSP | Sig Ratio

0.01 8.6682 8.3902 0 0

0.01 9.3832 8.9045 0 0

0.01 8.773 9.6421 0 0

0.01 4.4791 4.2182 0 0.0341

0.01 5.5322 10.0589 0 0

0.01 10.7102 11.3996 0 0

0.01 5.3229 6.5612 0 0.004

0.01 8.6994 6.9989 0 0

0.01 7.2592 8.9056 0 0

0.01 7.0485 8.3938 0 0.002