THE EFFECT OF SCALE CENTREDNESS ON
PATIENT SATISFACTION RESPONSES
by
Caterina Masino
A thesis submitted in conformity with the requirements
for the degree of Master of Arts
Graduate Department of Curriculum, Teaching and Learning
Ontario Institute for Studies in Education
University of Toronto
© Copyright by Caterina Masino 2010
THE EFFECT OF SCALE CENTREDNESS ON
PATIENT SATISFACTION RESPONSES
Master of Arts 2010
Caterina Masino
Graduate Department of Curriculum, Teaching and Learning
University of Toronto
Abstract
High satisfaction rates and the lack of response variability are problematic areas in survey
research. An important area of methodological concern for self-report surveys is the sensitivity
and reliability of the instrument. This research examines the effects of a positive (right) centred
scale on the distribution and reliability of satisfaction responses in a positive respondent
population. A total of 216 participants were randomly assigned to one of the following three
experimental Likert scale conditions: 5–point equal interval balanced scale; 5–point positive
(right) packed scale; 5–point positive (right) centred scale. The distribution of responses
occurred in the direction hypothesized. Comparable discrimination was found across the three
conditions. Although the study findings did not prove to be significant, the equal interval
balanced scale produced the lowest mean score, contrary to previous research findings.
Acknowledgements
This thesis is dedicated to all my cheerleaders.
I wish to thank Sharon McGonigle, Jennifer Wong, Emily Seto, Luis Saffie & Munira Jessa for
their unwavering faith and encouragement during my studies.
A special note of appreciation goes to:
The University Health Network Telehealth Program for their ongoing support, and in particular
to Judith Estrada for her much valued help during participant recruitment.
&
Kathleen Hartford and the Ontario Telemedicine Network for the permission to use their Patient
Satisfaction Questionnaire in this research.
Most of all, I wish to thank my family for their love, patience, and support over the years.
Table of Contents
Abstract ........................................................................................................................................... ii
Acknowledgements ........................................................................................................................ iii
Chapter 1: Introduction ................................................................................................................... 1
Background to the Study ......................................................................................................... 2
Objectives of the Study ........................................................................................................... 2
Research Questions and Hypotheses ...................................................................................... 3
Chapter 2: Literature Review .......................................................................................................... 4
Likert Scales............................................................................................................................ 4
Patient Satisfaction Research .................................................................................................. 5
Rating Scale Construction....................................................................................................... 5
Chapter 3: Methodology ............................................................................................................... 11
Introduction ........................................................................................................................... 11
Research Design.................................................................................................................... 12
Sample................................................................................................................................... 13
Survey Instrument ................................................................................................................. 14
Procedure .............................................................................................................................. 16
Data Collection ..................................................................................................................... 17
Data Analysis ........................................................................................................................ 17
Chapter 4: Results ......................................................................................................................... 18
Descriptive Analysis ............................................................................................................. 18
Participants. ....................................................................................................................... 18
Survey. .............................................................................................................................. 18
Frequency of Use of Scale Labels ........................................................................................ 19
Means and Standard Deviations............................................................................................ 21
Individual Item Analysis ....................................................................................................... 24
Subscore Category Analysis ................................................................................................. 27
Summary: Means and Standard Deviations .......................................................................... 28
Analysis of Variance (ANOVA) ........................................................................................... 29
Reliability Analysis ............................................................................................................... 34
Chapter 5: Discussion and Conclusions ........................................................................................ 36
Discussion ............................................................................................................................. 36
Conclusion ............................................................................................................................ 40
Limitations of the Study........................................................................................................ 41
Suggestions for Future Research .......................................................................................... 44
References ..................................................................................................................................... 45
List of Tables
Table 1 PSQ Survey Items ............................................................................................................ 15
Table 2 PSQ Scoring Rules for Patient Satisfaction ..................................................................... 16
Table 3 Mean and Standard Deviations by Individual Items ........................................................ 24
Table 4 Mean and Standard Deviations by PSQ Subscore ........................................................... 27
Table 5 Summary of Means and Standard Deviations.................................................................. 28
Table 6 Levene's Test of Homogeneity of Variances for Individual Items ................................. 30
Table 7 Levene's Test of Homogeneity of Variances for Subscore ............................................. 31
Table 8 ANOVA Table for Individual Items ................................................................................ 32
Table 9 ANOVA Table for Subscore Categories ......................................................................... 33
Table 10 Tukey HSD Procedure for Item 7A ............................................................................... 34
Table 11 Reliability Analysis........................................................................................................ 35
List of Figures
Figure 1. Rating scale anchor choice and scale types. ................................................................. 13
Figure 2. Equal interval balance scale. ......................................................................................... 19
Figure 3. Positive packed scale. ................................................................................................... 19
Figure 4. Positive centred scale. ................................................................................................... 20
Figure 5. Scatterplot of mean scores for individual items. ........................................................... 22
Figure 6. Boxplot of mean score data for individual items. ......................................................... 23
Figure 7. Rating scale anchor choice and scale types. ................................................................. 43
List of Appendices
Appendix A Invitation to Participate ............................................................................................ 48
Appendix B Study Participant Survey: Version 2, Format A ....................................................... 49
Appendix C Study Participant Survey: Version 3, Format B ....................................................... 53
Appendix D Study Participant Survey: Version 3, Format C ....................................................... 57
Chapter 1:
Introduction
Patient satisfaction is an important factor in evaluating health care (Sitzia & Wood,
1997). Although high levels of patient satisfaction are a desired outcome of patient care, uniformly
high levels of satisfaction can lead critics to question satisfaction results. There are many reasons
for high satisfaction which span both conceptual and methodological design realms. In the
presence of an overly positive respondent population, an important area of methodological
concern for self-report surveys is the sensitivity and reliability of the instrument. A lack of
sensitivity of the survey instrument can produce high satisfaction results artificially, so it is
necessary to determine which measures are most valid and sensitive to differences in quality.
Moreover, the lack of response variability becomes problematic for researchers, who are often
forced into comparing positive with less positive responses.
Telemedicine is an emerging type of health care delivery that allows a physician to
provide clinical care to patients through interactive videoconference (Mekhjian, Turner,
Gailiun, & McCain, 1999). Patient satisfaction is one of the most widely researched patient-
oriented outcomes for telemedicine (Whitten & Mair, 2000). One feature of health care research
and of telemedicine in particular, is the daunting array of available surveys. Despite the
variability in survey instrumentation, consistent results have emerged. Three separate systematic
reviews all report that data on patient satisfaction with telemedicine reveal no unfavorable
effects and satisfaction rates greater than 80% (Currell, Urquhart, Wainwright, & Lewis,
2000; Mair & Whitten, 2000; Williams, May, & Esmail, 2001).
Background to the Study
Overly positive satisfaction response rates are not a new phenomenon in the presence of
Likert scales, which are known to be highly susceptible to a positive response set (Ware, 1978).
In the systematic review by Williams et al. (2001), the majority of telemedicine studies used a
Likert-type format with agreement labels. The effects of rating scale
labels on the psychometric properties of survey instruments have not been systematically
researched in the area of patient satisfaction (Meric, 1994). However, some research has been
done in the realms of education and psychology that explore the effects of rating scale labels on
responses. Recently, a doctoral dissertation by Kolic (2004) empirically examined the
effects of rating scale construction using a technique called scale centredness in the education
context. Scale centredness is a function of the choice of endpoints. For example, if
a rating scale leans more to the right or positive end of the continuum, the scale is
considered positive (right) centred (Kolic, 2004). Kolic (2004) found a significant effect of scale
centredness in her research and further recommended it as a viable scale technique in the
presence of an overly positive respondent population. Since telemedicine satisfaction research
has struggled with criticisms on its positive response rates, and given the importance of this
recent finding and limited research on this technique, the objective of this study is to explore the
effects of scale centredness on the psychometric properties of a survey instrument used to assess
patient satisfaction in a telemedicine context.
Objectives of the Study
The aim of the current study is to provide empirical research for rating scale design in
patient satisfaction by examining the effect of scale centredness on rating responses in the
presence of an overly positive respondent population.
Research Questions and Hypotheses
Research Question 1:
To what extent does the centredness of a rating scale provide better discrimination
of responses in a telemedicine satisfaction survey?
Hypothesis 1:
The positive (right) centred scale will produce a lower mean than the
positive (right) packed scale, with the highest mean produced by
the equal interval balanced scale.
Hypothesis 2:
The positive (right) centred scale will produce a larger variance
than the positive (right) packed scale, with the lowest variance
produced by the equal interval balanced scale.
Research Question 2:
What are the effects of a right centred scale on the internal consistency and reliability
of satisfaction responses?
Hypothesis 3:
The positive (right) centred scale will produce a higher reliability
coefficient than the positive (right) packed scale, with the lowest
produced by the equal interval balanced scale.
Chapter 2:
Literature Review
Likert Scales
A Likert scale is a measure of attitudes in which respondents are asked to rate aspects of a
service or to state the extent to which they agree or disagree with predetermined statements. A
typical Likert scale consists of a 5–point scale, equal interval spacing, and is balanced with the
same number of points in both the negative and positive direction. Respondents rate items that
are worded in a single direction. In terms of the scoring procedure, the scale is considered
a summative scale which allows summing and averaging scores across multiple items (Uebersax,
2006). An example of a Likert scale is as follows: strongly disagree, disagree, neutral, agree, and
strongly agree. Scales that do not meet all Likert characteristics as described above may be
referred to as Likert-type scales. Although many definitions of Likert-type scales exist, the
general consensus is that these scales often use labels other than agreement labels, the most
widely used being quality and frequency label descriptions (Meric, 1994).
Likert-type scales are commonly found and can include scales with labels such as evaluation
labels or frequency labels. Categories are evenly spaced; however, the selection of labels in the
two directions may be more relaxed and need not be exact opposites per se (Uebersax, 2006).
Also, Likert-type scales may not have an exact middle as in Likert scales (e.g., Neutral or
Neither Agree nor Disagree); however, they do include a central point (e.g., Average or Good). The format of a
typical 5–point Likert-type scale with evaluation labels is as follows: Poor, Fair, Good, Very
Good, and Excellent.
Patient Satisfaction Research
Surveys are the most widely used instruments in health care research to assess patient
satisfaction. A review of patient perceptions of hospital care in the period 1980-2003 found
that the most common surveys employed an evaluation type response format (Castle, Brown,
Hepner, & Hays, 2005). Similarly, a systematic review in telemedicine satisfaction found that of
the 77% (72 out of 93) of studies that provided survey details, 89% used a Likert format in which
there was a scale of agreement or disagreement (Williams et al., 2001). Unfortunately, there is
insufficient methodological detail in the majority of the published research regarding rating scale
formats, instrument sensitivity, reliability, and validity issues. A recent literature review found
that only 22% of studies provided additional details such as the survey instrument employed
(Castle et al., 2005). Similarly, Williams et al. (2001) found that 94% of the telemedicine studies created their own
surveys and the majority (86%) did not report on validity or reliability (Williams et al., 2001).
One telemedicine study focusing on an inmate population documented the use of a 5–point
Likert scale having a reliability coefficient over 0.8 (Mekhjian et al., 1999). However, partial
or altogether missing methodological detail in the majority of studies makes it difficult for
researchers to systematically explore rating scale construction and the effect of different scale
forms on the psychometric properties of a survey instrument.
Rating Scale Construction
Even in the midst of insufficient instrument details, a well-documented issue in the
published literature is the lack of variability of responses in satisfaction research in healthcare. A
general finding of patient satisfaction surveys in conventional healthcare delivery models is
uniformly high levels of satisfaction (Avis, Bond, & Arthur, 1997; Sitzia & Wood, 1997).
Similarly, systematic reviews on telemedicine satisfaction revealed no unfavorable effects, and
had satisfaction rates greater than 80% (Currell et al., 2000; Mair & Whitten, 2000; Williams et
al., 2001). Although there are many factors thought to contribute to high rates of reported
satisfaction, the most common is response bias including social desirability bias or acquiescent
response bias (Sitzia & Wood, 1997). However, some suggest that survey design, and in
particular the wording of questions, is an important factor and may also contribute to response set
behaviours. There are many aspects of response variability that span both the methodological and
conceptual dimensions of satisfaction. Regarding methodology, a common practice in survey
design is to phrase some items positively and some negatively to minimize the 'halo' effect as it
is commonly believed that when respondents realize that statements are not always positive, they
tend to rate each statement more carefully (Demiris, 2006). In the presence of positive response
rates, there has also been debate over the use of negative (or reverse) wording of the question
stem intended to protect against positive response set behaviours.
negatively worded items has found problems with internal consistency, factor structures, and
other statistics when negatively worded stems are used either alone or together with directly
worded stems (Barnette, 2000). Barnette (2000) explored alternatives to using negatively worded
items. In this study, combinations of item stem direction and Likert response options were used
to determine effects on reliability and found the condition with the highest reliability occurred
when directly worded stems were coupled with bidirectional response options (Barnette, 2000).
Barnette (2000) suggested that using directly worded item stems with bidirectional response
labels should be selected instead of negative wording in practice. However, it is not the scope of
the present study to manipulate item wording as a way to create response variability. The current
study will focus on Likert rating scale labels.
There are a number of factors that should be considered by researchers in developing
Likert rating scales. These factors include: number of scale points; category of labels or anchors;
assignment of numerical values in conjunction with labels; degree of labeling; scale width;
semantic compatibility; equal interval properties or packedness of the scale (a function of choice
of labels between the endpoints); centredness of the scale (a function of choice of labels for
endpoints) (Kolic, 2004).
There has been much debate about the implications of scale length and equal interval spacing.
Likert scales should be carefully designed since scale length may adversely affect scale
reliability. It has been found that the optimal number of scale points is between four and seven
points (Lozano, Garcia-Cueto, & Muniz, 2008). Regarding the interval spacing between response
choices, a common belief is that Likert rating scales should be designed with equally spaced
response choices since it is thought that respondents might respond to rating scales as if the
response choices were equally spaced (Spector, 1980). But in practice, rating scales have been
designed even when response choice words did not form an equal interval scale. In order to
investigate whether unequally spaced response choices cause significant problems, Spector
(1980) conducted three experiments where he found that equally spaced response choices make
the rater's task easier, and that respondents equalized those response choices when they were
unequal (Spector, 1980).
The choice of labels is an important factor to consider in scale construction. The choice
of labels has been known to affect the interval property of the rating scale (Lam & Klockars,
1982; Lam & Stevens, 1994; Wildt & Mazis, 1978). Lam and Klockars (1982) found that equal
interval properties of a rating scale were dependent on the appropriate choice of labels to anchor
the points on the rating scale (Lam & Klockars, 1982). Furthermore, Lam and Stevens (1994)
examined the effects of rating scale design and its interaction with content polarization and item
word intensity and found that responses can be influenced not only by the design of the rating
scale but also by the interactions of the design with item content and wording (Lam & Stevens,
1994). Therefore, the impact of scale labels on the variability of subjects' responses is dependent
on the nature of the scale content and on the manner in which content is conveyed in each item.
There have been a number of studies that have explored the technique of scale
packedness. In Lam and Klockars (1982), evaluative labels were used in an education context
consisting of the following four types of rating scales: (a) endpoints labeled only; (b) labels
equally spaced; (c) right or positive packed; (d) left or negative packed
(Lam & Klockars, 1982). The authors found that left or negatively packed scales produced the
highest mean, right or positively packed scales produced the lowest mean and that the equal
interval scale and endpoints-only labeled scale produced mean ratings that were equivalent and
intermediate to the means of the positively and negatively packed scales (Lam & Klockars,
1982). They found that scales with only endpoints labeled produced results similar to scales with
equally spaced response labels. Most of the research indicates that there is little difference
between scales with all points labeled and end-point only labeled scales. In Dixon et al.
(1984), results did not show a significant difference between the end-defined and all-category
defined formats, nor did respondents indicate a format preference (Dixon, Bobo, & Stevick,
1984). In general, labeled scales tend to be endorsed more often than unlabelled scales if
descriptors allow.
Similar findings were presented in a Hancock and Klockars (1991) study using frequency
labels on the following three scales: (a) 5–point balanced equal interval; (b) 9–point balanced
equal interval; (c) 5–point right packed scale (Hancock & Klockars, 1991).
The study found that in the case of frequency labels, the best discrimination came from the
9–point balanced scale, the second best from the 5–point positive packed scale, and the poorest
from the 5–point balanced scale (Hancock & Klockars, 1991). Although the 9–point scale
provided better discrimination and a higher mean correlation than the two shorter scales, the
5–point right packed scale provided the lowest mean (Hancock & Klockars, 1991). In a
follow-up study by
Klockars and Hancock (1993), the same experiment was employed but using evaluation
labels (very poor to excellent). The positive packed scale differed significantly from
the two balanced scales, whereas the two balanced scales were indistinguishable from each other
(Klockars & Hancock, 1993).
The implication of the Klockars and Hancock (1993) study for the construction of
evaluation rating scales is that the effect of lengthening the scale or label packing the scale is
minimal (Klockars & Hancock, 1993). The difference between frequency and evaluation rating
scales was interpreted as due to the fact that evaluation labels lack the specificity of frequency
labels: they show high semantic elasticity and do not have the absolute meanings of frequency
labels (Klockars & Hancock, 1993).
There has not been much written on scale centredness. Kolic's (2004) doctoral
dissertation explored the effects of scale centredness on responses, using a 2 x 3 factorial design
that crossed two levels of scale centredness (left and right) with three levels of scale packedness
(left packed, equal interval, and right packed) (Kolic, 2004). Only 5–point rating scales using frequency labels were employed in
this study. The study found that the mean score for the left packed scale was higher than for the
equal interval scale, which in turn, was higher than that of the right packed scale (Kolic, 2004).
In addition, significant main effects were found for centredness and packedness (Kolic, 2004).
The observed power for both centredness and packedness was also high, which suggests that
study results are stable and there is a high probability of obtaining significant results at the 0.05
level in study replications (Kolic, 2004). With regards to reliability coefficients, no significant
differences were found. The highest reliability was found for left centred left
packed scales and right centred left packed scales (Kolic, 2004). The lowest reliability was found
for left centred right packed scales and right centred right packed scales (Kolic, 2004). Kolic
(2004) recommended the use of a right centred scale to obtain finer discrimination and to
ensure adequate response variability in an overly positive respondent population.
A wide variety of rating scales is available to researchers; however, the selection of format
is especially important when respondents are inclined toward a generally positive attitude
towards the object of interest being measured.
positive response set include packing a scale, and reverse wording. Scale centredness, a newer
technique, has not been explored with the use of evaluative labels. As health care research uses
evaluation labels quite frequently, this study will use a centred rating scale with agreement
response options within the context of measuring patient satisfaction with telemedicine health
service delivery.
Chapter 3:
Methodology
Introduction
The purpose of this research is to examine the effects of a positive (right) centred scale
on the distribution and reliability of satisfaction responses. As previously mentioned,
telemedicine satisfaction is an excellent context for this experiment as it is prone to an overly
positive respondent population. This study will make the assumption that telemedicine
respondents are telling the truth and are satisfied with the service.
The two research questions are:
1. To what extent does the centredness of a rating scale provide better discrimination
of responses in a telemedicine satisfaction survey?
2. What are the effects of a right centred scale on the internal consistency and reliability
of satisfaction responses?
The three hypotheses are:
1. The positive (right) centred scale will produce a lower mean than the positive (right)
packed scale, with the highest mean produced by the equal interval balanced scale.
2. The positive (right) centred scale will produce a larger variance than the positive
(right) packed scale, with the lowest variance produced by the equal interval balanced
scale.
3. The positive (right) centred scale will produce a higher reliability coefficient than the
positive (right) packed scale, with the lowest produced by the equal interval balanced
scale.
The independent variables are the following rating scales: positive (right) centred,
positive (right) packed, and equal interval balanced (see Figure 1). The dependent variables are
the distribution and variability of the rating responses, and the reliability of the survey
instrument.
Research Design
Each participant will be randomly assigned to one of three experimental Likert scale
forms shown in Figure 1. The original survey (equal interval balanced) is one of the three
experimental conditions and serves as a baseline.
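The random assignment step can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, seed, and participant IDs are assumptions, not details taken from the thesis.

```python
import random

# Illustrative sketch of random assignment to the three survey forms:
# A = equal interval balanced, B = positive (right) packed,
# C = positive (right) centred.
FORMS = ("A", "B", "C")

def assign_forms(participant_ids, seed=None):
    """Map each participant ID to a randomly chosen survey form."""
    rng = random.Random(seed)  # seeded for reproducibility
    return {pid: rng.choice(FORMS) for pid in participant_ids}

# 216 participants, as reported in the abstract; IDs are hypothetical.
assignments = assign_forms(range(1, 217), seed=42)
```

Note that `rng.choice` gives simple (unbalanced) randomization; a blocked design would instead shuffle a list containing equal counts of each form.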
[Figure showing the three rating scale forms arranged along a psychological continuum]
Figure 1. Rating scale anchor choice and scale types.
Sample
The target population is telemedicine patients. The study was conducted at a multi-site
hospital in Toronto that uses telemedicine services for clinical appointments. A convenience
sample was used consisting of patients who have been scheduled for a telemedicine appointment
with a clinician at one of the participating hospital sites during the study time frame. This study
has been approved by the University Health Network Research Ethics Board and the University
of Toronto Office of Research Ethics.
Survey Instrument
The current study will use the existing telemedicine Patient Satisfaction Questionnaire
(PSQ) developed by the Ontario Telemedicine Network (OTN). The survey consists of a total of
31 items, 19 of which use the rating scale of interest. These 19 items are constructed as
statements of opinion about patients' perceptions of telemedicine consultations and use the
equal interval balanced rating scale. Only 17 of the 19 items are used to
obtain the satisfaction score in the following six categories: expectations, access, technical,
communication, privacy, and satisfaction.
Statistics on the PSQ in the year 2008 found the overall satisfaction with telemedicine
was 97%, with mean item rating ranging from 4.28/5 to 4.62/5, mode 5, reliability measured by
Cronbach's alpha 0.822 (Keresztes, Hartford, & Wilk, 2008). On this basis, this instrument can
be utilized to examine the scale form conditions for this research.
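Cronbach's alpha, the reliability statistic cited above, can be computed directly from respondents' item scores: it compares the sum of the individual item variances with the variance of the total score. A minimal sketch in pure Python (the sample data are invented, not PSQ data):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of
    k item scores. Uses population variances."""
    k = len(scores[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(var([row[i] for row in scores]) for i in range(k))
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

# Perfectly consistent items yield an alpha of 1.0.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

Values above roughly 0.8, such as the 0.822 reported for the PSQ, are conventionally read as good internal consistency.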
Table 1 presents the abbreviated item content for the PSQ items by subscore
category and the direction of item wording (i.e., whether the item represents a
favorable (+) or unfavorable (-) opinion about telemedicine consultations). Seventeen items are
used to score the PSQ subscores. All items represent a favorable opinion except item 7Q, which
represents an unfavorable opinion statement. This item will be reverse scored.
Table 1
PSQ Survey Items
Item Abbreviated Item Content by Sub-score Group Direction of Wording
SUB-SCORE 1: EXPECTATIONS
7A “received enough notice” +
7B “scheduled quickly” +
7C “knew what to expect” +
7D “see my health care provider sooner” +
7Q “rather see in person than by telemedicine” - (reverse score)
SUB-SCORE 2: ACCESS
7M “felt as comfortable receiving care” +
7N “easier for me to see the health care provider” +
SUB-SCORE 3: TECHNICAL
7E “see the health care provider clearly” +
7F “hear the health care provider clearly” +
SUB-SCORE 4: COMMUNICATION
7G “could talk about the same information” +
7H “was enough time” +
7I “felt I was understood” +
7J “next steps in my care were explained” +
SUB-SCORE 5: PRIVACY
7K “felt comfortable in the room” +
7L “felt my privacy was respected” +
SUB-SCORE 6: SATISFACTION
7O “overall, I was satisfied” +
7P “will use telemedicine again” +
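Reverse scoring on a 5-point Likert scale maps a response x to 6 - x, so that a "Strongly Disagree" on the negatively worded item 7Q contributes the same satisfaction credit as a "Strongly Agree" elsewhere. A minimal sketch (illustrative code, not the OTN's actual scoring implementation):

```python
# Reverse scoring for negatively worded items on a 5-point Likert scale.
# Item 7Q ("rather see in person than by telemedicine") is the only
# negatively worded PSQ item, so a Strongly Disagree (1) becomes a 5
# after reversal, and a Strongly Agree (5) becomes a 1.

REVERSE_SCORED_ITEMS = {"7Q"}

def score_item(item: str, response: int) -> int:
    """Return the scored value of a 1-5 response, reversing item 7Q."""
    if not 1 <= response <= 5:
        raise ValueError(f"response out of range: {response}")
    return 6 - response if item in REVERSE_SCORED_ITEMS else response

print(score_item("7Q", 1))  # → 5 (negatively worded, reversed)
print(score_item("7A", 4))  # → 4 (positively worded, unchanged)
```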
Items are grouped and scored as shown in Table 2. The items within each subscore
category are summed to yield the subscore, which is then presented as an average.
Table 2
PSQ Scoring Rules for Patient Satisfaction
Subscore Items
1. Expectations 7A + 7B + 7C + 7D + 7Q
2. Access 7M + 7N
3. Technical 7E + 7F
4. Communication 7G + 7H + 7I + 7J
5. Privacy 7K + 7L
6. Satisfaction 7O + 7P
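The scoring rules in Table 2 can be sketched as follows; each subscore is the sum of its item values (7Q reverse scored first) presented as an average. This is an illustrative reconstruction, not the OTN's scoring code:

```python
from statistics import mean

# PSQ scoring rules from Table 2: items per subscore category.
SUBSCORES = {
    "expectations":  ["7A", "7B", "7C", "7D", "7Q"],
    "access":        ["7M", "7N"],
    "technical":     ["7E", "7F"],
    "communication": ["7G", "7H", "7I", "7J"],
    "privacy":       ["7K", "7L"],
    "satisfaction":  ["7O", "7P"],
}

def psq_subscores(responses: dict) -> dict:
    """Sum the 1-5 item responses in each category and present the
    result as an average; item 7Q is reverse scored (6 - x) first."""
    scored = {item: (6 - r if item == "7Q" else r)
              for item, r in responses.items()}
    return {name: mean(scored[i] for i in items)
            for name, items in SUBSCORES.items()}

# A respondent who strongly agrees with every positively worded item
# and strongly disagrees with 7Q scores the maximum on every subscore.
answers = {i: 5 for sub in SUBSCORES.values() for i in sub}
answers["7Q"] = 1
print(psq_subscores(answers)["expectations"])  # mean of the 5 items
```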
Procedure
The study was introduced to eligible participants by telephone. When a participant agreed
to take part, a letter requesting participation and consent (see Appendix A) was attached to the
survey form selected by random assignment and mailed to the participant. Participants were
asked to circle, for each statement, the number representing the opinion closest to their view.
The survey and item wording of the PSQ were identical across all forms except for the response
label choices on the scale of interest. Survey form A was the baseline equal interval balanced
scale with the response labels Strongly Disagree, Disagree, Neutral, Agree, and Strongly Agree
(see Appendix B). Survey form B employed the positive (right) packed scale, in which subjects
responded to the same statements on the scale of interest but using the response anchors
Strongly Disagree, Neutral, Agree, Very Much Agree, and Strongly Agree (see Appendix C).
Survey form C employed the positive (right) centred scale, in which subjects responded to the
same statements on the scale of interest but using the response anchors Disagree, Neutral,
Agree, Very Much Agree, and Strongly Agree (see Appendix D).
Data Collection
To investigate the hypotheses, participants were randomly assigned to one of the three
survey forms. An independent experimental design was used with a convenience sample of
patients at a multi-site teaching hospital in Toronto who had a telemedicine appointment
scheduled within the 2-month study time frame. A total of 216 surveys were sent by mail and
154 were returned by mail, for a study response rate of 71%. Returns by survey form were as
follows: 54 equal interval balanced, 48 positive (right) packed, and 52 positive (right) centred
survey forms.
Data Analysis
The data were analyzed using the Statistical Package for the Social Sciences (SPSS)
Version 13.0. The first analysis examined the percentage of responses given for each label
choice in the rating scale. The second analysis examined the means and standard deviations of
the 17 rating scale items, first individually and then by the six subscores. One-way analysis of
variance (ANOVA) was performed to compare the three survey conditions. In addition,
reliability as measured by Cronbach's alpha was examined to determine the internal consistency
of responses for the three survey conditions.
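The descriptive pass described above (label-choice percentages, then item means and standard deviations) can be sketched outside SPSS in a few lines; the responses below are invented for illustration, not the study's raw data:

```python
from collections import Counter
from statistics import mean, stdev

# Invented 1-5 responses to one rating item under one survey form.
responses = [5, 5, 4, 5, 3, 4, 5, 5, 4, 5]

# Analysis 1: percentage of responses given to each label choice.
counts = Counter(responses)
percentages = {label: 100 * counts[label] / len(responses)
               for label in range(1, 6)}

# Analysis 2: mean and sample standard deviation, as SPSS reports them.
item_mean = mean(responses)
item_sd = stdev(responses)

print(percentages[5])        # → 60.0 (share of "Strongly Agree")
print(round(item_mean, 2))   # → 4.5
```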
Chapter 4:
Results
Descriptive Analysis
Participants.
The sample consisted of 74 females (48%) and 80 males (52%). The majority of
participants (n = 90, 58%) were between 45–64 years of age, and the second largest group (n =
41, 27%) was 65 years of age or over. The remaining 14% (n = 22) were between 22–44 years of
age, with 1 respondent between 18–24 years of age.
In terms of highest education level attained, the largest group of participants had a
college (including technical, trade, or community college) diploma (n = 52, 34%), with the
second largest group having obtained a high school diploma only (n = 50, 33%). Twenty-three
participants (15%) had a university degree. A further 20 participants had completed only grade
school, and one participant indicated not completing grade school. There were nine missing
values for education level.
For 69 participants (45%), this was their first telemedicine appointment; the majority
(n = 85, 55%) had used telemedicine before. For most participants (n = 92, 60%), it was a
subsequent (follow-up) appointment with their physician. Twenty-one participants (14%)
reported technical problems with the session.
Survey.
There were no missing data for the 17 rating scale items in the scale of interest. The
descriptive statistics (means and standard deviations) of participants' rating responses are
presented below for each survey form: equal interval balanced, positive (right) packed, and
positive (right) centred.
Frequency of Use of Scale Labels
Figures 2, 3, and 4 display the percentage of responses given to each scale label under
each survey form condition.
[Bar chart: response percentages by scale label. Strongly Disagree 0.3%, Disagree 0.9%,
Neutral 6.3%, Agree 39.5%, Strongly Agree 52.9%.]
Figure 2. Equal interval balanced scale.
[Bar chart: response percentages by scale label. Strongly Disagree 1.0%, Neutral 3.9%,
Agree 6.1%, Very Much Agree 24.1%, Strongly Agree 64.8%.]
Figure 3. Positive packed scale.
[Bar chart: response percentages by scale label. Disagree 1.4%, Neutral 2.7%, Agree 5.3%,
Very Much Agree 28.8%, Strongly Agree 61.8%.]
Figure 4. Positive centred scale.
Over 92% of the participants selected a positive choice (“agree” to “strongly agree”) in
all three survey form conditions. Comparison of the frequency of scale label choice across the
three survey forms indicates that the majority of respondents selected the “strongly agree” label
and continued to do so in both the positive (right) packed and positive (right) centred scale
conditions. However, in the positive (right) packed and positive (right) centred scale conditions
there was substantial use of the “very much agree” label and minimal use of the “agree” label.
For example, in the equal interval balanced form the “agree” label was used 39.5% of the time,
but only 6.1% of the time in the packed scale and 5.3% in the positive (right) centred scale. In
the equal interval balanced scale, respondents may round their choice down to “agree,” whereas
in the positive (right) centred and positive packed scales they appear to round up to “very much
agree.” The presence of the intermediate choice “very much agree” in the positive (right)
centred and positive packed scales should therefore produce a higher mean score, and these two
scales may reflect a more accurate level of positivity than the equal interval balanced scale
condition.
Means and Standard Deviations
Overall satisfaction ranged from 92% to 96% across the three survey forms, with mean
item scores ranging from 3.52/5 to 4.81/5 and a mode of 5. In the equal interval balanced scale
condition, mean scores ranged from 3.70 to 4.65, with a mode of 5. In the positive (right)
packed scale condition, mean scores ranged from 3.52 to 4.81, with a mode of 5. In the positive
(right) centred scale condition, mean scores ranged from 3.88 to 4.77, with a mode of 5. The
results of the current study are slightly lower than, but comparable to, the PSQ baseline data
from 2008, which found overall satisfaction with telemedicine of 97%, with mean item ratings
ranging from 4.28/5 to 4.62/5 and a mode of 5 (Keresztes et al., 2008). The individual item
mean scores are displayed in scatterplot and boxplot form in Figures 5 and 6, which illustrate
the overlap in the distribution of mean scores across the 17 individual items. Of particular
interest, item 7Q (“rather see in person than by telemedicine”) produced the lowest mean in all
three survey forms (see Figures 5–6).
Figure 5. Scatterplot of mean scores for individual items.
Figure 6. Boxplot of mean score data for individual items.
Individual Item Analysis
The means and standard deviations for the 17 individual items are listed in Table 3.
Table 3
Mean and Standard Deviations by Individual Items
Item  Scale condition            N    Mean        Standard deviation
7a    Equal Interval Balanced    54   4.43 Low    0.838 High
      Positive (Right) Packed    48   4.77 High   0.472 Low
      Positive (Right) Centred   52   4.67 Med    0.648 Med
7b    Equal Interval Balanced    54   4.35 Low    0.828 Med
      Positive (Right) Packed    48   4.52 High   0.850 High
      Positive (Right) Centred   52   4.38 Med    0.796 Low
7c    Equal Interval Balanced    54   4.28 Med    0.878 Low
      Positive (Right) Packed    48   4.31 High   0.949 Med
      Positive (Right) Centred   52   4.02 Low    1.129 High
7d    Equal Interval Balanced    54   4.39 High   0.712 Low
      Positive (Right) Packed    48   4.27 Med    1.216 High
      Positive (Right) Centred   52   4.23 Low    1.096 Med
7e    Equal Interval Balanced    54   4.65 Med    0.482 Med
      Positive (Right) Packed    48   4.81 High   0.394 Low
      Positive (Right) Centred   52   4.62 Low    0.690 High
7f    Equal Interval Balanced    54   4.50 Low    0.505 Low
      Positive (Right) Packed    48   4.69 High   0.624 High
      Positive (Right) Centred   52   4.63 Med    0.561 Med
7g    Equal Interval Balanced    54   4.56 Med    0.572 Low
      Positive (Right) Packed    48   4.67 High   0.7532 High
      Positive (Right) Centred   52   4.54 Low    0.7531 Med
7h    Equal Interval Balanced    54   4.46 Low    0.573 Med
      Positive (Right) Packed    48   4.63 High   0.570 Low
      Positive (Right) Centred   52   4.50 Med    0.780 High
7i    Equal Interval Balanced    54   4.54 Med    0.539 Low
      Positive (Right) Packed    48   4.50 Low    0.772 High
      Positive (Right) Centred   52   4.62 High   0.599 Med
7j    Equal Interval Balanced    54   4.50 Low    0.541 Low
      Positive (Right) Packed    48   4.60 High   0.644 High
      Positive (Right) Centred   52   4.56 Med    0.608 Med
7k    Equal Interval Balanced    54   4.48 Low    0.574 Low
      Positive (Right) Packed    48   4.58 High   0.679 Med
      Positive (Right) Centred   52   4.50 Med    0.897 High
7l    Equal Interval Balanced    54   4.54 Low    0.503 Low
      Positive (Right) Packed    48   4.60 Med    0.707 High
      Positive (Right) Centred   52   4.62 High   0.661 Med
7m    Equal Interval Balanced    54   4.26 Med    0.732 Low
      Positive (Right) Packed    48   4.25 Low    1.042 High
      Positive (Right) Centred   52   4.42 High   0.936 Med
7n    Equal Interval Balanced    54   4.65 High   0.555 Low
      Positive (Right) Packed    48   4.25 Low    1.120 High
      Positive (Right) Centred   52   4.42 Med    0.997 Med
7o    Equal Interval Balanced    54   4.556 Low   0.572 Low
      Positive (Right) Packed    48   4.563 Med   0.580 Med
      Positive (Right) Centred   52   4.60 High   0.721 High
7p    Equal Interval Balanced    54   4.63 Med    0.525 Med
      Positive (Right) Packed    48   4.60 Low    0.676 High
      Positive (Right) Centred   52   4.77 High   0.425 Low
7q    Equal Interval Balanced    54   3.70 Med    0.903 Low
      Positive (Right) Packed    48   3.52 Low    1.111 High
      Positive (Right) Centred   52   3.88 High   0.943 Med
The values across the three survey forms are very similar. The mean score was lowest
in the equal interval balanced scale condition for 8 of the 17 items, with the positive centred
scale condition at the intermediate position (7/17) and the positive packed scale condition
producing the highest mean score (9/17). The standard deviation was highest in the positive
packed scale condition for 11 of the 17 items and lowest in the equal interval balanced scale
condition (12/17), with the positive centred scale condition at the intermediate position (10/17).
Subscore Category Analysis
The mean and standard deviations for the six subscore categories are reported in Table 4.
Table 4
Mean and Standard Deviations by PSQ Subscore
Subscore          Scale condition            N    Mean         Standard deviation
1. Expectations   Equal Interval Balanced    54   4.2296 Low   0.47963 Low
                  Positive (Right) Packed    48   4.2792 High  0.58599 High
                  Positive (Right) Centred   52   4.2385 Med   0.57263 Med
2. Access         Equal Interval Balanced    54   4.4537 High  0.51641 Low
                  Positive (Right) Packed    48   4.2500 Low   0.89917 High
                  Positive (Right) Centred   52   4.4231 Med   0.78830 Med
3. Technical      Equal Interval Balanced    54   4.5741 Low   0.45977 Low
                  Positive (Right) Packed    48   4.7500 High  0.47266 Med
                  Positive (Right) Centred   52   4.6250 Med   0.60127 High
4. Communication  Equal Interval Balanced    54   4.5139 Low   0.48422 Low
                  Positive (Right) Packed    48   4.5990 High  0.55000 Med
                  Positive (Right) Centred   52   4.5529 Med   0.61504 High
5. Privacy        Equal Interval Balanced    54   4.5093 Low   0.51844 Low
                  Positive (Right) Packed    48   4.5938 High  0.65766 Med
                  Positive (Right) Centred   52   4.5577 Med   0.75831 High
6. Satisfaction   Equal Interval Balanced    54   4.5926 Med   0.50539 Low
                  Positive (Right) Packed    48   4.5833 Low   0.58649 High
                  Positive (Right) Centred   52   4.6827 High  0.52421 Med
As in the individual item analysis, the values across the three survey forms are similar.
The mean score was lowest in the equal interval balanced scale condition for 4 of the 6
subscore categories, and highest in the positive packed scale condition in 4 of the 6. The
positive centred scale condition occupied the intermediate position in 5 of the 6 subscore
categories.
The standard deviation was lowest in the equal interval balanced scale condition for all 6
subscore categories. The positive packed and positive centred scales performed equally, tying
for both the highest and the intermediate positions with 3 subscore categories each.
Summary: Means and Standard Deviations
A summary of the mean score and standard deviation positions is given in Table 5.
Overall, only the equal interval balanced scale produced consistent findings across the
individual item and subscore category analyses.
Table 5
Summary of Means and Standard Deviations
Analysis                    Scale condition            Mean   Standard deviation
Individual item analysis    Equal Interval Balanced    Low    Low
                            Positive (Right) Packed    High   High
                            Positive (Right) Centred   Med    Med
Subscore category analysis  Equal Interval Balanced    Low    Low
                            Positive (Right) Packed    Med    Equal position (Med-High)
                            Positive (Right) Centred   High   Equal position (Med-High)
The equal interval balanced scale condition had the lowest mean score in both the
individual item and subscore category analyses. However, the intermediate and highest mean
score positions were not consistent: the individual item analysis found the positive packed
scale condition producing a higher mean score than the positive centred scale condition,
whereas the subscore category analysis found the opposite, with the positive centred scale
condition producing the highest mean score (see Table 5).
Regarding variance as measured by the standard deviation, the equal interval scale
condition had the lowest standard deviation. This result was consistent across both the
individual item and subscore category analyses. The individual item analysis found the
standard deviation highest for the positive packed scale condition, with the positive centred
scale condition at the intermediate position. The subscore analysis instead placed the positive
packed and positive centred conditions equally across the highest and intermediate positions.
Analysis of Variance (ANOVA)
An analysis of variance (ANOVA) was conducted to determine whether there were
significant differences across the three survey forms. One-way ANOVA assumes that the
variances of the conditions are equal. The sample sizes of the survey form conditions differ:
54 in the equal interval balanced scale condition, 48 in the positive packed scale condition, and
52 in the positive centred scale condition. A test of homogeneity of variances was therefore
conducted to assess the assumption of equal variances across the three survey form conditions
and to determine whether the data were appropriate for the ANOVA procedure.
Tables 6 and 7 list the results of the Levene test for homogeneity of variances.
Table 6
Levene’s Test of Homogeneity of Variances for Individual Items
Levene statistic df1 df2 Sig.
7A 6.358 2 151 .002 significant
7B .061 2 151 .941 n.s
7C 1.096 2 151 .337 n.s
7D 4.421 2 151 .014 significant
7E 6.285 2 151 .002 significant
7F .177 2 151 .838 n.s
7G .690 2 151 .503 n.s
7H 1.614 2 151 .203 n.s
7I 2.185 2 151 .116 n.s
7J .183 2 151 .833 n.s
7K 1.120 2 151 .329 n.s
7L .488 2 151 .615 n.s
7M 2.816 2 151 .063 n.s
7N 6.707 2 151 .002 significant
7O .042 2 151 .959 n.s
7P 5.843 2 151 .004 significant
7Q 2.711 2 151 .070 n.s
Table 7
Levene’s Test of Homogeneity of Variances for Subscore
Levene statistic df1 df2 Sig.
Expectations 1.429 2 151 .243 n.s
Access 6.767 2 151 .002 significant
Technical 2.606 2 151 .077 n.s
Communication 1.220 2 151 .298 n.s
Privacy .768 2 151 .466 n.s
Satisfaction 1.028 2 151 .360 n.s
The significance value exceeded 0.05 for the majority of the items (12/17) in the
individual item analysis and for 5 of the 6 categories in the subscore category analysis,
suggesting that the variances of the three survey conditions are equal and that the assumption
is justified. The results of the one-way ANOVA procedure are presented in Table 8 for the
individual item analysis and in Table 9 for the subscore category analysis.
The Levene statistic can also be used as a significance test of the standard deviations.
On this basis, significant differences in variance were found for 5 of the 17 items in the
individual item analysis and for 1 of the 6 categories in the subscore category analysis.
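The Levene statistic reported in Tables 6 and 7 is built from absolute deviations of each response from its group mean. A minimal pure-Python sketch of the mean-centred variant (the groups below are invented, not the study data):

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic for homogeneity of variances, using
    absolute deviations from each group's mean."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_bars = [mean(zi) for zi in z]
    z_grand = mean(x for zi in z for x in zi)
    between = sum(len(g) * (zb - z_grand) ** 2
                  for g, zb in zip(groups, z_bars))
    within = sum((x - zb) ** 2
                 for zi, zb in zip(z, z_bars) for x in zi)
    return ((n - k) / (k - 1)) * between / within

# Groups with identical spread give W = 0 (perfect homogeneity);
# W grows as the group variances diverge.
print(levene_w([1, 2, 3], [4, 5, 6]))      # → 0.0
print(levene_w([1, 2, 3], [1, 5, 9]) > 0)  # → True
```

W is compared against an F(k-1, N-k) distribution to obtain the significance values shown in the tables; that lookup step is omitted here.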
Table 8
ANOVA Table for Individual Items
(df between groups = 2, df within groups = 151, df total = 153 for all items)

Item  SS between  MS between  F      Sig.   SS within  MS within  SS total
7A    3.271       1.635       3.573  .030   69.125     .458       72.396
7B    .801        .400        .589   .556   102.602    .679       103.403
7C    2.633       1.317       1.342  .264   148.127    .981       150.760
7D    .716        .358        .343   .710   157.543    1.043      158.260
7E    1.104       .552        1.897  .154   43.935     .291       45.039
7F    .967        .484        1.526  .221   47.870     .317       48.838
7G    .480        .240        .496   .610   72.923     .483       73.403
7H    .720        .360        .854   .428   63.676     .422       64.396
7I    .351        .175        .429   .652   61.734     .409       62.084
7J    .278        .139        .391   .677   53.806     .356       54.084
7K    .293        .147        .276   .759   80.148     .531       80.442
7L    .190        .095        .242   .785   59.213     .392       59.403
7M    .976        .488        .594   .553   124.063    .822       125.039
7N    4.077       2.039       2.443  .090   126.007    .834       130.084
7O    .049        .025        .062   .940   59.665     .395       59.714
7P    .808        .404        1.346  .263   45.303     .300       46.110
7Q    3.305       1.652       1.702  .186   146.546    .971       149.851
Table 9
ANOVA Table for Subscore Categories
(df between groups = 2, df within groups = 151, df total = 153 for all subscores)

Subscore        SS between  MS between  F      Sig.   SS within  MS within  SS total
Expectations    .070        .035        .117   .890   45.055     .298       45.124
Access          1.201       .601        1.082  .342   83.827     .555       85.028
Technical       .821        .411        1.545  .217   40.141     .266       40.963
Communication   .184        .092        .302   .740   45.937     .304       46.121
Privacy         .184        .092        .217   .805   63.900     .423       64.084
Satisfaction    .309        .155        .534   .587   43.718     .290       44.028
Overall, the ANOVA found no significant differences in either the individual item or the
subscore category analysis. The subscore category analysis found no significant differences in
any of the six categories (see Table 9). The individual item analysis yielded significance levels
exceeding 0.05 for 16 of the 17 items, suggesting that there are no group differences overall
(see Table 8). The only statistically significant difference was for item 7A, F(2, 151) = 3.573,
p = .030. To determine which groups differ, the Tukey post hoc test was conducted; Table 10
lists the pairwise comparisons of group means from the Tukey post hoc procedure.
Table 10
Tukey HSD Procedure for Item 7A
                                Subset for alpha = .05
Scale condition            N    1       2
Equal Interval Balanced    54   4.43
Positive (Right) Centred   52   4.67    4.67
Positive (Right) Packed    48           4.77
Sig.                            .158    .745
The Tukey HSD procedure indicates that the equal interval balanced scale condition differs from
the positive (right) packed scale condition (see Table 10). However, the equal interval balanced
scale condition does not differ from the positive (right) centred scale condition. In addition, the
positive (right) packed and positive (right) centred scale conditions do not differ from each other.
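The F ratios in Tables 8 and 9 can be reproduced from raw responses; a pure-Python one-way ANOVA sketch with invented groups follows (the Tukey HSD step itself needs studentized range critical values and is normally delegated to a statistics package such as statsmodels' pairwise_tukeyhsd):

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # Partition total variability into between- and within-group SS.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Invented ratings under three scale conditions (not the study data).
f, df1, df2 = one_way_anova([1, 2, 3], [4, 5, 6], [4, 5, 6])
print(f, df1, df2)  # → 9.0 2 6
```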
Reliability Analysis
The reliability values are similar across all three survey forms. Overall, the reliability
analysis yielded high Cronbach's alpha coefficients for all three forms, ranging from 0.888 to
0.907 (see Table 11), consistent with the Cronbach's alpha of 0.822 reported for the 2008 PSQ
baseline data. Cronbach's alpha was 0.888 for the equal interval balanced scale, 0.898 for the
positive packed scale, and 0.907 for the positive centred scale. The lowest reliability occurred
with the equal interval balanced scale and the highest with the positive centred scale.
Table 11
Reliability Analysis
Scale                      N    M      SD     Cronbach alpha  Variance
Equal Interval Balanced    17   75.46  6.641  0.888           44.102
Positive (Right) Packed    17   76.15  8.460  0.898           71.574
Positive (Right) Centred   17   75.98  8.644  0.907           74.725
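Cronbach's alpha in Table 11 follows from the item-score matrix via alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A small sketch with invented item scores:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list of
    respondent scores per item), using sample variances."""
    k = len(items)
    item_vars = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per respondent
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

# Two perfectly correlated items give alpha = 1.0 (invented data);
# real item sets land somewhere below that ceiling.
item_a = [5, 4, 3, 5, 4]
item_b = [5, 4, 3, 5, 4]
print(cronbach_alpha([item_a, item_b]))  # → 1.0
```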
Chapter 5:
Discussion and Conclusions
Discussion
The current study examines the effects of a positive (right) centred scale on the
distribution and reliability of satisfaction responses in the health care domain. The research
questions examine:
1. the extent to which the centredness of a rating scale provides better discrimination of
responses in a telemedicine satisfaction survey; and
2. the effects of a right centred scale on the internal consistency and reliability of
satisfaction responses.
Three hypotheses were presented in the current study.
Hypothesis 1 stated that the positive (right) centred scale will produce a lower
mean than positive (right) packed scale, with the highest mean produced by the
equal interval balanced scale.
This hypothesis was not supported by the current study; no significant mean score
differences were found. The results indicate that the equal interval balanced scale condition
yielded the lowest mean score, a finding consistent across both the individual item and
subscore category analyses. However, the highest and intermediate mean score positions varied
between the two analyses. In the individual item analysis, the highest mean score was produced
by the positive packed scale, with the positive centred scale at the intermediate position. In the
subscore category analysis, the positive centred scale produced the highest mean score, with
the positive packed scale at the intermediate position.
The mean score findings are not consistent with Kolic's (2004) findings on scale
centredness using frequency labels, which found significant main effects for both centredness
and packedness, with the lowest mean score in the positive (right) centred scale (M = 2.16), the
intermediate mean score in the positive (right) packed scale (M = 2.24), and the highest mean
score in the equal interval scale (M = 2.37). However, the current study's findings are similar to
those of Klockars and Hancock (1993) on packedness using evaluation labels, which found
minimal effects and comparable discrimination across a 5–point positive packed scale, a
5–point equal interval balanced scale, and a 9–point equal interval balanced scale. Klockars and
Hancock (1993) explain that the differences between label types may be due to the high
semantic elasticity of evaluation labels, which do not carry the absolute meaning for
respondents that frequency labels do. This may be a plausible explanation for the differences
between the current study's findings using evaluation labels and Kolic's (2004) findings using
frequency labels. Lam and Kolic (2008) used an equal interval rating scale identical to the
equal interval balanced scale condition employed in the current study, so a similar effect might
be expected for the mean score results. However, this was not the case. Lam and Kolic (2008)
found that a matched condition produces a lower mean for the positive packed scale, whereas
in a mismatched condition the means of the positive packed and equal interval scales are equal.
The opposite effect was observed in the current study, in which mean scores were lowest in the
equal interval scale rather than the positive packed scale, supporting neither conclusion.
A more plausible explanation concerns respondent strategies. Although the saturation of
positive anchors in a positive packed scale would be expected to produce lower mean scores
for the equal interval scale, as reported in Lam and Kolic (2008), the differences observed in
the current study may be due to the location on the scale where the saturation took place. In the
current study, the presence of the intermediate choice “very much agree” in the positive
centred and positive packed scales produced a higher mean score. It is probable that
respondents used a rounding strategy (rounding up to “very much agree”) in the presence of an
intermediate option between “agree” and “strongly agree,” which would result in a lower mean
in the equal interval scale condition. By contrast, Lam and Kolic's (2008) matched condition
used the “somewhat agree” label, which expanded the response options to the left of “agree”
(somewhat disagree, somewhat agree, agree, strongly agree). The mean score results in the
current study are therefore consistent with a saturation of positive labels that produced a
“rounding up” of responses from “agree” to “very much agree,” in turn yielding higher mean
scores in the positive packed and positive centred scales. This is a likely explanation for the
lower mean scores found with the equal interval balanced scale.
Other plausible explanations involve response bias. Lam and Kolic (2008) suggested
that if scale labels are not compatible with item wording, the respondent will resort to
satisficing or simply ignore the labels presented in the scale. For example, acquiescence bias is
a form of satisficing defined as the tendency to give positive responses or “yea-saying”
(Streiner & Norman, 2008). Lam and Kolic (2008) attribute the absence of significant findings
on semantic compatibility and variances in their study to a possible ceiling effect. A ceiling
effect occurs when responses are not evenly distributed but show a positive skew toward the
favourable end, making it impossible to distinguish among the various levels of excellence
(Streiner & Norman, 2008). A possible method to counteract this bias is to offset the middle of
the scale and expand it in the area of interest (Streiner & Norman, 2008). Scale centredness
may be a viable strategy to counteract the ceiling effect, as it offsets the middle and expands
the area of interest. The current study found non-significant effects, although the distribution
occurred in the manner hypothesized.
Another plausible explanation concerns the cognitive burden on respondents. Spector
(1980) argued against using unequal scales because they increase the cognitive burden on
respondents. He further noted that cognitive burden particularly affects older respondents and
respondents with lower education levels, who may adopt the strategy of selecting one choice
and using it throughout their responses (Spector, 1976). The respondent population in the
current study has similar characteristics, with 85% over the age of 45. This may be a plausible
explanation for the findings of the current study.
Hypothesis 2 stated that the positive (right) centred scale will produce a larger
variance than the positive (right) packed scale, with the lowest variance produced
by the equal interval balanced scale.
This hypothesis was partially supported. The current study found that the equal interval
scale condition produced the lowest standard deviation, as predicted, and this result was
consistent across both the individual item and subscore category analyses. However, the
individual item analysis found the standard deviation highest for the positive packed scale
condition, not the positive centred condition, with the positive centred scale condition at the
intermediate position; the subscore analysis placed the positive packed and positive centred
conditions equally across the highest and intermediate positions. No significant differences
were found. The result for the positive packed and equal interval balanced scale conditions is
consistent with Lam and Kolic (2008), who found that the standard deviation in the matched
condition using evaluation labels was higher for the positive packed scale and lower for the
equal interval scale. In Kolic's (2004) study on centredness, the effects of the different centred
scales on the variability of responses were not fully investigated; however, a higher standard
deviation was reported for the negative (left) centred than for the positive (right) centred scale.
A direct comparison with the variability of responses to the positive centred scale in the
current study was therefore not possible.
Hypothesis 3 stated that the positive (right) centred scale will produce a higher
reliability coefficient than the positive (right) packed scale with the lowest
produced for the equal interval balanced scale.
This hypothesis was supported by the current study. High reliabilities were observed for all
three survey forms, consistent with the reported reliability of the baseline PSQ. Cronbach's
alpha was 0.888 for the equal interval balanced scale, 0.898 for the positive packed scale, and
0.907 for the positive centred scale. The lowest reliability occurred with the equal interval
balanced scale and the highest with the positive centred scale. However, Lam and Kolic (2008)
reported a lower alpha for the positive packed scale condition than for the equal interval scale
condition in both their matched and mismatched conditions. This finding was not consistent
with the current study, although the Cronbach's alpha values are very close across all three
surveys.
Conclusion
The current study addresses an interesting issue in rating scale design and has practical
implications for survey research, particularly satisfaction research. The current study agrees
with Kolic's (2004) recommendation that future research on scale centredness is needed.
Although there were no significant findings, the performance of the positive (right) centred
scale on the distribution of rating responses while maintaining high reliability is noteworthy.
In a positive respondent population, it is recommended that the positive (right) centred scale
be preferred over a positive packed scale, as it maintains the equal interval properties of a
scale to a larger degree, does not distort results, and enables respondents to select a more
accurate level of positivity.
Limitations of the Study
The following are the limitations of the current study:
1. Small sample size. The mean score, standard deviations and reliability alpha
coefficients were very similar and therefore proved difficult to make comparison
across the survey forms. A larger sample size should be considered in future studies.
2. The current study used a convenience sample of telemedicine patients who had a
scheduled appointment during the study period. Future research should employ a
random sampling method.
3. Apart from Kolic's (2004) doctoral dissertation, there was no empirical research in
the area of scale centredness. In addition, Kolic's (2004) study combined the selected
levels of scale packedness and scale centredness in a factorial design, producing six
frequency scales, which made comparison with the current study difficult.
4. The labels for both the positive packed and positive centred scales were not
pre-determined for the study. The use of pre-determined scale values to select labels
would provide a more precise measurement of intensity, but pre-determining scale
values for the positive packed and positive centred scale conditions was beyond the
scope of the current study. Although Kolic (2004) used pre-determined scale values
from an additional experiment, her scale labels were not identical to those of the
current study, as they excluded the labels "neutral" and "strongly disagree"; the
current study therefore could not generate a comparison using pre-determined scale
values. Kolic's study also differs from the current study in that it used frequency
labels and derived scale values from her experiment to normalize her analysis. The
current study did not normalize the scale for the above analysis and instead used the
scale values 1, 2, 3, 4, 5 for the positive packed and positive centred scale conditions.
It is plausible, however, to transform the scale while preserving the Likert values of
1–5 (the weighted values confirmed by Likert) and to insert 4.5 for the label "very
much agree," the midpoint between "agree" and "strongly agree."
Figure 7 presents a normalized scale for the current study.
[Figure omitted: rating scale anchors arranged along a psychological continuum.]
Figure 7. Rating scale anchor choice and scale types.
Normalizing the scale values was beyond the scope of this study, which aimed to mirror
practical survey design: survey designers typically do not pre-determine values when
constructing a rating scale, but select labels arbitrarily. An analysis using the normalized scale
may, however, provide significant results.
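The transformation described above can be sketched as a simple label-to-value mapping. Only the 4.5 midpoint for "very much agree" comes from the text; the full label set and the sample responses below are assumptions for illustration.

```python
# Normalization sketch: Likert's weighted values 1-5 are preserved, and the
# extra positive label "very much agree" is assigned 4.5, the midpoint between
# "agree" (4) and "strongly agree" (5).
# NOTE: the exact label set of the study's scales is assumed here.
NORMALIZED_VALUES = {
    "strongly disagree": 1.0,
    "disagree": 2.0,
    "neutral": 3.0,
    "agree": 4.0,
    "very much agree": 4.5,  # inserted midpoint
    "strongly agree": 5.0,
}

def normalized_mean(labels):
    """Mean satisfaction score for a list of response labels."""
    return sum(NORMALIZED_VALUES[lab.lower()] for lab in labels) / len(labels)

# Hypothetical responses from a positive respondent population:
responses = ["agree", "very much agree", "strongly agree", "very much agree"]
print(normalized_mean(responses))  # (4 + 4.5 + 5 + 4.5) / 4 = 4.5
```

Scoring the same responses with the raw 1-5 coding (where "very much agree" would occupy position 4 or 5) would shift the mean, which is why the choice of values matters for comparing scale forms.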
Suggestions for Future Research
Additional research is recommended to explore the effects of scale centredness on rating scale
responses. Future research should include a larger sample size and should employ qualitative
methods to further explore respondent strategies. It is important to note that much of the past
research on scale packedness and centredness has been conducted in education settings using
university student populations. The current study employed a known positive respondent
population and an existing survey instrument that produces high satisfaction rates; differences
found in the current research may therefore very likely be attributed to the respondent
population. Future research on scale centredness should include the health care sector, as it may
prove to be an optimal setting for exploring rating scale techniques with positive respondent
populations.
References
Avis, M., Bond, M., & Arthur, A. (1997). Questioning patient satisfaction: An empirical
investigation in two outpatient clinics. Social Science & Medicine, 44(1), 85-92.
Barnette, J. J. (2000). Effects of stem and Likert response option reversals on survey internal
consistency: If you feel the need, there is a better alternative to using those negatively
worded stems. Educational and Psychological Measurement, 60(3), 361-370.
Castle, N. G., Brown, J., Hepner, K. A., & Hays, R. D. (2005). Review of the literature on survey
instruments used to collect data on hospital patients' perceptions of care. Health Services
Research, 40(6 Pt 2), 1996-2017.
Currell, R., Urquhart, C., Wainwright, P., & Lewis, R. (2000). Telemedicine versus face to face
patient care: effects on professional practice and health care outcomes. Cochrane
Database of Systematic Reviews (2), CD002098.
Demiris, G. (2006). Principles of survey development for telemedicine applications. Journal of
Telemedicine and Telecare, 12(3), 111-115.
Dixon, P. N., Bobo, M., & Stevick, R. A. (1984). Response differences and preferences for all-
category-defined and end-defined Likert formats. Educational and Psychological
Measurement, 44, 61-66.
Hancock, G. R., & Klockars, A. J. (1991). The effect of scale manipulations on validity:
Targeting frequency rating scales for anticipated performance levels. Applied
Ergonomics, 22(3), 147-154.
Keresztes, C., Hartford, K., & Wilk, P. (2008, October). Measuring patient satisfaction with
telemedicine: Establishing psychometric properties. Paper presented at the Canadian
Society of Telehealth, Ottawa, Ontario, Canada.
Klockars, A. J., & Hancock, G. R. (1993). Manipulations of evaluative rating scales to increase
validity. Psychological Reports, 73, 1059-1066.
Kolic, M. C. (2004). An empirical investigation of factors affecting Likert-type rating scale
responses. Unpublished doctoral dissertation, University of Toronto, Toronto, Ontario,
Canada.
Lam, T. C. M., & Klockars, A. J. (1982). Anchor point effects on the equivalence of
questionnaire items. Journal of Educational Measurement, 19(4), 317-322.
Lam, T. C. M., & Kolic, M. C. (2008). Effects of semantic incompatibility on rating response.
Applied Psychological Measurement, 32(3), 248-260.
Lam, T. C. M., & Stevens, J. J. (1994). Effects of content polarization, item wording, and rating
scale width on rating response. Applied Measurement in Education, 7(2), 141-158.
Lozano, L. M., Garcia-Cueto, E., & Muniz, J. (2008). Effect of the number of response
categories on the reliability and validity of rating scales. Methodology, 4(2), 73-79.
Mair, F., & Whitten, P. (2000). Systematic review of studies of patient satisfaction with
telemedicine. BMJ, 320(7248), 1517-1520.
Mekhjian, H., Turner, J. W., Gailiun, M., & McCain, T. A. (1999). Patient satisfaction with
telemedicine in a prison environment. Journal of Telemedicine and Telecare, 5(1), 55-61.
Meric, H. J. (1994). The effect of scale form choice on psychometric properties of patient
satisfaction measurement. Health Marketing Quarterly, 11(3-4), 27-39.
Sitzia, J., & Wood, N. (1997). Patient satisfaction: A review of issues and concepts. Social
Science & Medicine, 45(12), 1829-1843.
Spector, P. E. (1976). Choosing response categories for summated rating scales. Journal of
Applied Psychology, 61, 374-375.
Spector, P. E. (1980). Ratings of equal and unequal response choice intervals. Journal of Social
Psychology, 112, 115-119.
Streiner, D. L., & Norman, G. R. (2008). Health measurement scales: A practical guide to their
development and use (3rd ed.). New York: Oxford University Press.
Uebersax, J. S. (2006). Likert scales: Dispelling the confusion. Retrieved September 15, 2009,
from Statistical Methods for Rater Agreement Web site: http://john-uebersax.com/stat/likert.htm
Ware, J. E., Jr. (1978). Effects of acquiescent response set on patient satisfaction ratings.
Medical Care, 16(4), 327-336.
Whitten, P. S., & Mair, F. (2000). Telemedicine and patient satisfaction: Current status and
future directions. Telemedicine Journal and e-Health, 6(4), 417-423.
Wildt, A. R., & Mazis, M. B. (1978). Determinants of scale response: Label versus position.
Journal of Marketing Research, 15(May), 261-267.
Williams, T. L., May, C. R., & Esmail, A. (2001). Limitations of patient satisfaction studies in
telehealthcare: A systematic review of the literature. Telemedicine Journal and e-Health,
7(4), 293-312.
Appendix A
Invitation to Participate
Appendix B
Study Participant Survey:
Version 2, Format A
Appendix C
Study Participant Survey:
Version 3, Format B
Appendix D
Study Participant Survey:
Version 3, Format C