THE EFFECT OF SCALE CENTREDNESS ON
PATIENT SATISFACTION RESPONSES
by
Caterina Masino
A thesis submitted in conformity with the requirements
for the degree of Master of Arts
Graduate Department of Curriculum, Teaching and Learning
Ontario Institute for Studies in Education
University of Toronto
© Copyright by Caterina Masino 2010
THE EFFECT OF SCALE CENTREDNESS ON
PATIENT SATISFACTION RESPONSES
Master of Arts 2010
Caterina Masino
Graduate Department of Curriculum, Teaching and Learning
University of Toronto
Abstract
High satisfaction rates and the lack of response variability are problematic areas in survey
research. An important area of methodological concern for self-report surveys is the sensitivity
and reliability of the instrument. This research examines the effects of a positive (right) centred
scale on the distribution and reliability of satisfaction responses in a positive respondent
population. A total of 216 participants were randomly assigned to one of the following three
experimental Likert scale conditions: 5–point equal interval balanced scale; 5–point positive
(right) packed scale; 5–point positive (right) centred scale. The distribution of responses
occurred in the direction hypothesized. Comparable discrimination was found across the three
conditions. Although the study findings did not prove to be significant, the equal interval
balanced scale produced the lowest mean score, contrary to previous research findings.
Acknowledgements
This thesis is dedicated to all my cheerleaders.
I wish to thank Sharon McGonigle, Jennifer Wong, Emily Seto, Luis Saffie & Munira Jessa for
their unwavering faith and encouragement during my studies.
A special note of appreciation goes to:
The University Health Network Telehealth Program for their ongoing support, and in particular
to Judith Estrada for her much valued help during participant recruitment.
&
Kathleen Hartford and the Ontario Telemedicine Network for the permission to use their Patient
Satisfaction Questionnaire in this research.
Most of all, I wish to thank my family for their love, patience, and support over the years.
Table of Contents
Abstract ........................................................................................................................................... ii
Acknowledgements ........................................................................................................................ iii
Chapter 1: Introduction ................................................................................................................... 1
Background to the Study ......................................................................................................... 2
Objectives of the Study ........................................................................................................... 2
Research Questions and Hypotheses ...................................................................................... 3
Chapter 2: Literature Review .......................................................................................................... 4
Likert Scales............................................................................................................................ 4
Patient Satisfaction Research .................................................................................................. 5
Rating Scale Construction....................................................................................................... 5
Chapter 3: Methodology ............................................................................................................... 11
Introduction ........................................................................................................................... 11
Research Design.................................................................................................................... 12
Sample................................................................................................................................... 13
Survey Instrument ................................................................................................................. 14
Procedure .............................................................................................................................. 16
Data Collection ..................................................................................................................... 17
Data Analysis ........................................................................................................................ 17
Chapter 4: Results ......................................................................................................................... 18
Descriptive Analysis ............................................................................................................. 18
Participants. ....................................................................................................................... 18
Survey. .............................................................................................................................. 18
Frequency of Use of Scale Labels ........................................................................................ 19
Means and Standard Deviations............................................................................................ 21
Individual Item Analysis ....................................................................................................... 24
Subscore Category Analysis ................................................................................................. 27
Summary: Means and Standard Deviations .......................................................................... 28
Analysis of Variance (ANOVA) ........................................................................................... 29
Reliability Analysis ............................................................................................................... 34
Chapter 5: Discussion and Conclusions ........................................................................................ 36
Discussion ............................................................................................................................. 36
Conclusion ............................................................................................................................ 40
Limitations of the Study........................................................................................................ 41
Suggestions for Future Research .......................................................................................... 44
References ..................................................................................................................................... 45
List of Tables
Table 1 PSQ Survey Items ............................................................................................................ 15
Table 2 PSQ Scoring Rules for Patient Satisfaction ..................................................................... 16
Table 3 Mean and Standard Deviations by Individual Items ........................................................ 24
Table 4 Mean and Standard Deviations by PSQ Subscore ........................................................... 27
Table 5 Summary of Means and Standard Deviations.................................................................. 28
Table 6 Levene's Test of Homogeneity of Variances for Individual Items ................................. 30
Table 7 Levene's Test of Homogeneity of Variances for Subscore ............................................. 31
Table 8 ANOVA Table for Individual Items ................................................................................ 32
Table 9 ANOVA Table for Subscore Categories ......................................................................... 33
Table 10 Tukey HSD Procedure for Item 7A ............................................................................... 34
Table 11 Reliability Analysis........................................................................................................ 35
List of Figures
Figure 1. Rating scale anchor choice and scale types. ................................................................. 13
Figure 2. Equal interval balance scale. ......................................................................................... 19
Figure 3. Positive packed scale. ................................................................................................... 19
Figure 4. Positive centred scale. ................................................................................................... 20
Figure 5. Scatterplot of mean scores for individual items. ........................................................... 22
Figure 6. Boxplot of mean score data for individual items. ......................................................... 23
Figure 7. Rating scale anchor choice and scale types. ................................................................. 43
List of Appendices
Appendix A Invitation to Participate ............................................................................................ 48
Appendix B Study Participant Survey: Version 2, Format A ....................................................... 49
Appendix C Study Participant Survey: Version 3, Format B ....................................................... 53
Appendix D Study Participant Survey: Version 3, Format C ....................................................... 57
Chapter 1:
Introduction
Patient satisfaction is an important factor in evaluating health care (Sitzia & Wood,
1997). Although high levels of patient satisfaction are a desired outcome of patient care, uniformly
high levels of satisfaction can lead critics to question satisfaction results. There are many reasons
for high satisfaction which span both conceptual and methodological design realms. In the
presence of an overly positive respondent population, an important area of methodological
concern for self-report surveys is the sensitivity and reliability of the instrument. A lack of
sensitivity of the survey instrument can produce high satisfaction results artificially, so it is
necessary to determine which measures are most valid and sensitive to differences in quality.
Moreover, the lack of response variability becomes problematic for researchers, who are often
forced into comparing positive with less positive responses.
Telemedicine is an emerging type of health care delivery that allows a physician to
provide clinical care to patients through interactive videoconference (Mekhjian, Turner,
Gailiun, & McCain, 1999). Patient satisfaction is one of the most widely researched patient-
oriented outcomes for telemedicine (Whitten & Mair, 2000). One feature of health care research
and of telemedicine in particular, is the daunting array of available surveys. Despite the
variability in survey instrumentation, consistent results have emerged. Three separate systematic
reviews all report that data on patient satisfaction with telemedicine reveal no unfavorable
effects and satisfaction rates greater than 80% (Currell, Urquhart, Wainwright, & Lewis,
2000; Mair & Whitten, 2000; Williams, May, & Esmail, 2001).
Background to the Study
Overly positive satisfaction response rates are not a new phenomenon in the presence of
Likert scales, which are known to be highly susceptible to a positive response set (Ware, 1978).
In the systematic review by Williams et al. (2001), the majority of telemedicine studies used a
Likert-type format with agreement labels. The effects of rating scale
labels on the psychometric properties of survey instruments have not been systematically
researched in the area of patient satisfaction (Meric, 1994). However, some research has been
done in the realms of education and psychology that explore the effects of rating scale labels on
responses. Recently, a doctoral dissertation by Kolic (2004) empirically examined the
effects of rating scale construction using a technique called scale centredness in the education
context. Scale centredness is a function of the choice of endpoints. For example, if
a rating scale leans more to the right or positive end of the continuum, the scale is
considered positive (right) centred (Kolic, 2004). Kolic (2004) found a significant effect of scale
centredness in her research and further recommended it as a viable scale technique in the
presence of an overly positive respondent population. Since telemedicine satisfaction research
has struggled with criticisms on its positive response rates, and given the importance of this
recent finding and limited research on this technique, the objective of this study is to explore the
effects of scale centredness on the psychometric properties of a survey instrument used to assess
patient satisfaction in a telemedicine context.
Objectives of the Study
The aim of the current study is to provide empirical research for rating scale design in
patient satisfaction by examining the effect of scale centredness on rating responses in the
presence of an overly positive respondent population.
Research Questions and Hypotheses
Research Question 1:
To what extent does the centredness of a rating scale provide better discrimination
of responses in a telemedicine satisfaction survey?
Hypothesis 1:
The positive (right) centred scale will produce a lower mean than the
positive (right) packed scale, with the highest mean produced by
the equal interval balanced scale.
Hypothesis 2:
The positive (right) centred scale will produce a larger variance
than the positive (right) packed scale, with the lowest variance
produced by the equal interval balanced scale.
Research Question 2:
What are the effects of a right centred scale on the internal consistency and reliability
of satisfaction responses?
Hypothesis 3:
The positive (right) centred scale will produce a higher reliability
coefficient than the positive (right) packed scale, with the lowest
produced by the equal interval balanced scale.
Chapter 2:
Literature Review
Likert Scales
A Likert scale is a measure of attitudes in which respondents are asked to rate aspects of a
service or to state the extent to which they agree or disagree with predetermined statements. A
typical Likert scale consists of a 5–point scale, equal interval spacing, and is balanced with the
same number of points in both the negative and positive direction. Respondents rate items that
are worded in a single direction. In terms of the scoring procedure, the scale is considered
a summative scale which allows summing and averaging scores across multiple items (Uebersax,
2006). An example of a Likert scale is as follows: strongly disagree, disagree, neutral, agree, and
strongly agree. Scales that do not meet all Likert characteristics as described above may be
referred to as Likert-type scales. Although many definitions of Likert-type scales exist, the
general consensus is that these scales often use labels other than agreement labels, the most
widely used being quality and frequency label descriptions (Meric, 1994).
Likert-type scales are commonly found and can include scales with labels such as evaluation
labels or frequency labels. Categories are evenly spaced; however, the selection of labels in the
two directions may be more relaxed and need not be exact opposites per se (Uebersax, 2006).
Also, Likert-type scales may not have an exact middle as in Likert scales (e.g., Neutral or
Neither Agree nor Disagree); however, they do include a central point (e.g., Average or Good). The format of a
typical 5–point Likert-type scale with evaluation labels is as follows: Poor, Fair, Good, Very
Good, and Excellent.
Patient Satisfaction Research
Surveys are the most widely used instruments in health care research to assess patient
satisfaction. A review of patient perceptions of hospital care in the period 1980-2003 found
that the most common surveys employed an evaluation type response format (Castle, Brown,
Hepner, & Hays, 2005). Similarly, a systematic review in telemedicine satisfaction found that of
the 77% (72 out of 93) of studies that provided survey details, 89% used a Likert format in which
there was a scale of agreement or disagreement (Williams et al., 2001). Unfortunately, there is
insufficient methodological detail in the majority of the published research regarding rating scale
formats, instrument sensitivity, reliability, and validity issues. A recent literature review found
that only 22% of studies provided additional details such as the survey instrument employed
(Castle et al., 2005). Similarly, Williams et al. (2001) found that 94% of the telemedicine studies created their own
surveys and the majority (86%) did not report on validity or reliability (Williams et al., 2001).
One telemedicine study focusing on an inmate population documented the use of a 5–point
Likert scale having a reliability coefficient over 0.8 (Mekhjian et al., 1999). However, partial
or altogether missing methodological detail in the majority of studies makes it difficult for
researchers to systematically explore rating scale construction and the effect of different scale
forms on the psychometric properties of a survey instrument.
Rating Scale Construction
Even in the midst of insufficient instrument details, a well-documented issue in the
published literature is the lack of variability of responses in satisfaction research in healthcare. A
general finding of patient satisfaction surveys in conventional healthcare delivery models is
uniformly high levels of satisfaction (Avis, Bond, & Arthur, 1997; Sitzia & Wood, 1997).
Similarly, systematic reviews on telemedicine satisfaction revealed no unfavorable effects, and
had satisfaction rates greater than 80% (Currell et al., 2000; Mair & Whitten, 2000; Williams et
al., 2001). Although there are many factors thought to contribute to high rates of reported
satisfaction, the most common is response bias including social desirability bias or acquiescent
response bias (Sitzia & Wood, 1997). However, some suggest that survey design, and in
particular the wording of questions, is an important factor and may also contribute to response set
behaviours. There are many aspects of response variability that span both the methodological and
conceptual dimensions of satisfaction. Regarding methodology, a common practice in survey
design is to phrase some items positively and some negatively to minimize the 'halo' effect as it
is commonly believed that when respondents realize that statements are not always positive, they
tend to rate each statement more carefully (Demiris, 2006). In the presence of positive response
rates, there has also been debate over the use of negative (or reverse) wording of the question
stem intended to protect against positive response set behaviours.
negatively worded items has found problems with internal consistency, factor structures, and
other statistics when negatively worded stems are used either alone or together with directly
worded stems (Barnette, 2000). Barnette (2000) explored alternatives to using negatively worded
items. In this study, combinations of item stem direction and Likert response options were used
to determine effects on reliability and found the condition with the highest reliability occurred
when directly worded stems were coupled with bidirectional response options (Barnette, 2000).
Barnette (2000) suggested that using directly worded item stems with bidirectional response
labels should be selected instead of negative wording in practice. However, it is not the scope of
the present study to manipulate item wording as a way to create response variability. The current
study will focus on Likert rating scale labels.
There are a number of factors that should be considered by researchers in developing
Likert rating scales. These factors include: number of scale points; category of labels or anchors;
assignment of numerical values in conjunction with labels; degree of labeling; scale width;
semantic compatibility; equal interval properties or packedness of the scale (a function of choice
of labels between the endpoints); centredness of the scale (a function of choice of labels for
endpoints) (Kolic, 2004).
There has been much debate about the implications of scale length and equal interval spacing.
Likert scales should be carefully designed since scale length may adversely affect scale
reliability. It has been found that the optimal number of scale points is between four and seven
points (Lozano, Garcia-Cueto, & Muniz, 2008). Regarding the interval spacing between response
choices, a common belief is that Likert rating scales should be designed with equally spaced
response choices since it is thought that respondents might respond to rating scales as if the
response choices were equally spaced (Spector, 1980). But in practice, rating scales have been
designed even when response choice words did not form an equal interval scale. In order to
investigate whether unequally spaced response choices cause significant problems, Spector
(1980) conducted three experiments where he found that equally spaced response choices make
the rater's task easier, and that respondents equalized those response choices when they were
unequal (Spector, 1980).
The choice of labels is an important factor to consider in scale construction. The choice
of labels has been known to affect the interval property of the rating scale (Lam & Klockars,
1982; Lam & Stevens, 1994; Wildt & Mazis, 1978). Lam and Klockars (1982) found that equal
interval properties of a rating scale were dependent on the appropriate choice of labels to anchor
the points on the rating scale (Lam & Klockars, 1982). Furthermore, Lam and Stevens (1994)
examined the effects of rating scale design and its interaction with content polarization and item
word intensity and found that responses can be influenced not only by the design of the rating
scale but also by the interactions of the design with item content and wording (Lam & Stevens,
1994). Therefore, the impact of scale labels on the variability of subjects' responses is dependent
on the nature of the scale content and on the manner in which content is conveyed in each item.
There have been a number of studies that have explored the technique of scale
packedness. In Lam and Klockars (1982), evaluative labels were used in an education context
consisting of the following four types of rating scales: (a) endpoints labeled only; (b) labels
equally spaced; (c) right or positive packed; (d) left or negative packed
(Lam & Klockars, 1982). The authors found that left or negatively packed scales produced the
highest mean, right or positively packed scales produced the lowest mean and that the equal
interval scale and endpoints-only labeled scale produced mean ratings that were equivalent and
intermediate to the means of the positively and negatively packed scales (Lam & Klockars,
1982). They found that scales with only endpoints labeled produced results similar to scales with
equally spaced response labels. Most of the research indicates that there is little difference
between scales with all points labeled and end-point only labeled scales. In Dixon et al.
(1984), results did not show a significant difference between the end-defined and all-category
defined formats, nor did respondents indicate a format preference (Dixon, Bobo, & Stevick,
1984). In general, labeled scales tend to be endorsed more often than unlabelled scales if
descriptors allow.
Similar findings were presented in a Hancock and Klockars (1991) study using frequency
labels on the following three scales: (a) 5–point balanced equal interval; (b) 9–point balanced
equal interval; (c) 5–point right packed scale (Hancock & Klockars, 1991).
The study found that in the case of frequency labels, the best discrimination came from the
9–point balanced scale, the second best from the 5–point positive packed scale, and the poorest
from the 5–point balanced scale (Hancock & Klockars, 1991). Although the 9–point scale
provided better discrimination and a higher mean correlation than the two shorter scales, the
5–point right packed scale provided the lowest mean (Hancock & Klockars, 1991). In a
follow-up study by
Klockars and Hancock (1993), the same experiment was employed but using evaluation
labels (very poor to excellent). The positive packed scale differed significantly from
the two balanced scales, whereas the two balanced scales were indistinguishable from each other
(Klockars & Hancock, 1993).
The implication of the Klockars and Hancock (1993) study for the construction of
evaluation rating scales is that the effect of lengthening the scale or label packing the scale is
minimal (Klockars & Hancock, 1993). The difference between frequency and evaluation rating
scales was interpreted as due to the fact that evaluation labels lack the specificity of frequency
labels: they show high semantic elasticity and do not have the absolute meanings of frequency
labels (Klockars & Hancock, 1993).
There has not been much written on scale centredness. Kolic's (2004) doctoral
dissertation explored the effects of scale centredness on responses, using a 2 x 3 factorial design
that crossed two levels of scale centredness (left and right) with three levels of scale packedness
(left packed, equal interval, and right packed) (Kolic, 2004). Only 5–point rating scales using frequency labels were employed in
this study. The study found that the mean score for the left packed scale was higher than for the
equal interval scale, which in turn, was higher than that of the right packed scale (Kolic, 2004).
In addition, significant main effects were found for centredness and packedness (Kolic, 2004).
The observed power for both centredness and packedness was also high, which suggests that
study results are stable and there is a high probability of obtaining significant results at the 0.05
level in study replications (Kolic, 2004). With regards to reliability coefficients, no significant
differences were found. The highest reliability was found for left centred left
packed scales and right centred left packed scales (Kolic, 2004). The lowest reliability was found
for left centred right packed scales and right centred right packed scales (Kolic, 2004). Kolic
(2004) recommended the use of a right centred scale to obtain finer discrimination and to
ensure adequate response variability in an overly positive respondent population.
A wide variety of rating scales is available to researchers; however, the selection of format
is especially important when respondents are inclined toward a generally positive attitude
towards the object of interest being measured.
positive response set include packing a scale, and reverse wording. Scale centredness, a newer
technique, has not been explored with the use of evaluative labels. As health care research uses
evaluation labels quite frequently, this study will use a centred rating scale with agreement
response options within the context of measuring patient satisfaction with telemedicine health
service delivery.
Chapter 3:
Methodology
Introduction
The purpose of this research is to examine the effects of a positive (right) centred scale
on the distribution and reliability of satisfaction responses. As previously mentioned,
telemedicine satisfaction is an excellent context for this experiment as it is prone to an overly
positive respondent population. This study will make the assumption that telemedicine
respondents are telling the truth and are satisfied with the service.
The two research questions are:
1. To what extent does the centredness of a rating scale provide better discrimination
of responses in a telemedicine satisfaction survey?
2. What are the effects of a right centred scale on the internal consistency and reliability
of satisfaction responses?
The three hypotheses are:
1. The positive (right) centred scale will produce a lower mean than the positive (right)
packed scale, with the highest mean produced by the equal interval balanced scale.
2. The positive (right) centred scale will produce a larger variance than the positive
(right) packed scale, with the lowest variance produced by the equal interval balanced
scale.
3. The positive (right) centred scale will produce a higher reliability coefficient than the
positive (right) packed scale, with the lowest produced by the equal interval balanced
scale.
The independent variables are the following rating scales: positive (right) centred,
positive (right) packed, and equal interval balanced (see Figure 1). The dependent variables are
the distribution and variability of the rating responses, and the reliability of the survey
instrument.
Research Design
Each participant will be randomly assigned to one of three experimental Likert scale
forms shown in Figure 1. The original survey (equal interval balanced) is one of the three
experimental conditions and serves as a baseline.
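The random assignment step can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, seed, and participant IDs are assumptions, not details taken from the thesis.

```python
import random

# Illustrative sketch of random assignment to the three survey forms:
# A = equal interval balanced, B = positive (right) packed,
# C = positive (right) centred.
FORMS = ("A", "B", "C")

def assign_forms(participant_ids, seed=None):
    """Map each participant ID to a randomly chosen survey form."""
    rng = random.Random(seed)  # seeded for reproducibility
    return {pid: rng.choice(FORMS) for pid in participant_ids}

# 216 participants, as reported in the abstract; IDs are hypothetical.
assignments = assign_forms(range(1, 217), seed=42)
```

Note that `rng.choice` gives simple (unbalanced) randomization; a blocked design would instead shuffle a list containing equal counts of each form.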
[Figure showing the three rating scale forms arranged along a psychological continuum]
Figure 1. Rating scale anchor choice and scale types.
Sample
The target population is telemedicine patients. The study was conducted at a multi-site
hospital in Toronto that uses telemedicine services for clinical appointments. A convenience
sample was used consisting of patients who have been scheduled for a telemedicine appointment
with a clinician at one of the participating hospital sites during the study time frame. This study
has been approved by the University Health Network Research Ethics Board and the University
of Toronto Office of Research Ethics.
Survey Instrument
The current study will use the existing telemedicine Patient Satisfaction Questionnaire
(PSQ) developed by the Ontario Telemedicine Network (OTN). The survey consists of a total of
31 items, 19 of which use the rating scale of interest. These 19 items are constructed as
statements of opinion about patients' perceptions of telemedicine consultations and use the
equal interval balanced rating scale. Only 17 of the 19 items are used to
obtain the satisfaction score in the following six categories: expectations, access, technical,
communication, privacy, and satisfaction.
Statistics on the PSQ in the year 2008 found the overall satisfaction with telemedicine
was 97%, with mean item rating ranging from 4.28/5 to 4.62/5, mode 5, reliability measured by
Cronbach's alpha 0.822 (Keresztes, Hartford, & Wilk, 2008). On this basis, this instrument can
be utilized to examine the scale form conditions for this research.
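Cronbach's alpha, the reliability statistic cited above, can be computed directly from respondents' item scores: it compares the sum of the individual item variances with the variance of the total score. A minimal sketch in pure Python (the sample data are invented, not PSQ data):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of
    k item scores. Uses population variances."""
    k = len(scores[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(var([row[i] for row in scores]) for i in range(k))
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

# Perfectly consistent items yield an alpha of 1.0.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

Values above roughly 0.8, such as the 0.822 reported for the PSQ, are conventionally read as good internal consistency.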
Table 1 presents the abbreviated item content for the PSQ items by subscore
category and the direction of item wording (i.e., whether the item represents a
favorable (+) or unfavorable (-) opinion about telemedicine consultations). Seventeen items are
used to score the PSQ subscores. All items represent a favorable opinion except item 7Q, which
represents an unfavorable opinion statement. This item will be reverse scored.
Table 1
PSQ Survey Items
Item Abbreviated Item Content by Sub-score Group Direction of Wording
SUB-SCORE 1: EXPECTATIONS
7A “received enough notice” +
7B “scheduled quickly” +
7C “knew what to expect” +
7D “see my health care provider sooner” +
7Q “rather see in person than by telemedicine” - (reverse score)
SUB-SCORE 2: ACCESS
7M “felt as comfortable receiving care” +
7N “easier for me to see the health care provider” +
SUB-SCORE 3: TECHNICAL
7E “see the health care provider clearly” +
7F “hear the health care provider clearly” +
SUB-SCORE 4: COMMUNICATION
7G “could talk about the same information” +
7H “was enough time” +
7I “felt I was understood” +
7J “next steps in my care were explained” +
SUB-SCORE 5: PRIVACY
7K “felt comfortable in the room” +
7L “felt my privacy was respected” +
SUB-SCORE 6: SATISFACTION
7O “overall, I was satisfied” +
7P “will use telemedicine again” +
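Reverse scoring on a 5-point Likert scale maps a response x to 6 - x, so that a "Strongly Disagree" on the negatively worded item 7Q contributes the same satisfaction credit as a "Strongly Agree" elsewhere. A minimal sketch (illustrative code, not the OTN's actual scoring implementation):

```python
# Reverse scoring for negatively worded items on a 5-point Likert scale.
# Item 7Q ("rather see in person than by telemedicine") is the only
# negatively worded PSQ item, so a Strongly Disagree (1) becomes a 5
# after reversal, and a Strongly Agree (5) becomes a 1.

REVERSE_SCORED_ITEMS = {"7Q"}

def score_item(item: str, response: int) -> int:
    """Return the scored value of a 1-5 response, reversing item 7Q."""
    if not 1 <= response <= 5:
        raise ValueError(f"response out of range: {response}")
    return 6 - response if item in REVERSE_SCORED_ITEMS else response

print(score_item("7Q", 1))  # → 5 (negatively worded, reversed)
print(score_item("7A", 4))  # → 4 (positively worded, unchanged)
```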
Items are grouped and scored as shown in Table 2. The items within each subscore
category are summed to yield the subscore, which is then presented as an average.
Table 2
PSQ Scoring Rules for Patient Satisfaction
Subscore Items
1. Expectations 7A + 7B + 7C + 7D + 7Q
2. Access 7M + 7N
3. Technical 7E + 7F
4. Communication 7G + 7H + 7I + 7J
5. Privacy 7K + 7L
6. Satisfaction 7O + 7P
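The scoring rules in Table 2 can be sketched as follows; each subscore is the sum of its item values (7Q reverse scored first) presented as an average. This is an illustrative reconstruction, not the OTN's scoring code:

```python
from statistics import mean

# PSQ scoring rules from Table 2: items per subscore category.
SUBSCORES = {
    "expectations":  ["7A", "7B", "7C", "7D", "7Q"],
    "access":        ["7M", "7N"],
    "technical":     ["7E", "7F"],
    "communication": ["7G", "7H", "7I", "7J"],
    "privacy":       ["7K", "7L"],
    "satisfaction":  ["7O", "7P"],
}

def psq_subscores(responses: dict) -> dict:
    """Sum the 1-5 item responses in each category and present the
    result as an average; item 7Q is reverse scored (6 - x) first."""
    scored = {item: (6 - r if item == "7Q" else r)
              for item, r in responses.items()}
    return {name: mean(scored[i] for i in items)
            for name, items in SUBSCORES.items()}

# A respondent who strongly agrees with every positively worded item
# and strongly disagrees with 7Q scores the maximum on every subscore.
answers = {i: 5 for sub in SUBSCORES.values() for i in sub}
answers["7Q"] = 1
print(psq_subscores(answers)["expectations"])  # mean of the 5 items
```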
Procedure
The study was introduced to eligible participants by telephone. When a participant agreed
to take part, a letter requesting participation and consent (see Appendix A) was attached to the
survey form selected by random assignment and mailed to the participant. Participants were
asked to circle, for each statement, the number representing the opinion closest to their view.
The survey and item wording of the PSQ were identical across all forms except for the response
label choices on the scale of interest. Survey form A was the baseline equal interval balanced
scale with the response labels Strongly Disagree, Disagree, Neutral, Agree, and Strongly Agree
(see Appendix B). Survey form B employed the positive (right) packed scale, in which subjects
responded to the same statements on the scale of interest but using the response anchors
Strongly Disagree, Neutral, Agree, Very Much Agree, and Strongly Agree (see Appendix C).
Survey form C employed the positive (right) centred scale, in which subjects responded to the
same statements on the scale of interest but using the response anchors Disagree, Neutral,
Agree, Very Much Agree, and Strongly Agree (see Appendix D).
Data Collection
To investigate the hypotheses, participants were randomly assigned to one of the three
survey forms. An independent experimental design was used with a convenience sample of
patients at a multi-site teaching hospital in Toronto who had a telemedicine appointment
scheduled within the 2-month study time frame. A total of 216 surveys were sent by mail and
154 were returned by mail, for a study response rate of 71%. Returns by survey form were as
follows: 54 equal interval balanced, 48 positive (right) packed, and 52 positive (right) centred
survey forms.
Data Analysis
The data were analyzed using the Statistical Package for the Social Sciences (SPSS)
Version 13.0. The first analysis examined the percentage of responses given for each label
choice in the rating scale. The second analysis examined the means and standard deviations of
the 17 rating scale items, first individually and then by the six subscores. One-way analysis of
variance (ANOVA) was performed to compare the three survey conditions. In addition,
reliability as measured by Cronbach's alpha was examined to determine the internal consistency
of responses for the three survey conditions.
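The descriptive pass described above (label-choice percentages, then item means and standard deviations) can be sketched outside SPSS in a few lines; the responses below are invented for illustration, not the study's raw data:

```python
from collections import Counter
from statistics import mean, stdev

# Invented 1-5 responses to one rating item under one survey form.
responses = [5, 5, 4, 5, 3, 4, 5, 5, 4, 5]

# Analysis 1: percentage of responses given to each label choice.
counts = Counter(responses)
percentages = {label: 100 * counts[label] / len(responses)
               for label in range(1, 6)}

# Analysis 2: mean and sample standard deviation, as SPSS reports them.
item_mean = mean(responses)
item_sd = stdev(responses)

print(percentages[5])        # → 60.0 (share of "Strongly Agree")
print(round(item_mean, 2))   # → 4.5
```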
Chapter 4:
Results
Descriptive Analysis
Participants.
The sample consisted of 74 females (48%) and 80 males (52%). The majority of
participants (n = 90, 58%) were between 45–64 years of age, and the second largest group (n =
41, 27%) was 65 years of age or over. The remaining 14% (n = 22) were between 22–44 years of
age, with 1 respondent between 18–24 years of age.
In terms of highest education level attained, the largest group of participants had a
college (including technical, trade, or community college) diploma (n = 52, 34%), with the
second largest group having obtained a high school diploma only (n = 50, 33%). Twenty-three
participants (15%) had a university degree. A further 20 participants had completed only grade
school, and one participant indicated not completing grade school. There were nine missing
values for education level.
For 69 participants (45%), this was their first telemedicine appointment; the majority
(n = 85, 55%) had used telemedicine before. For most participants (n = 92, 60%), it was a
subsequent (follow-up) appointment with their physician. Twenty-one participants (14%)
reported technical problems with the session.
Survey.
There were no missing data for the 17 rating scale items in the scale of interest. The
descriptive statistics (means and standard deviations) of participants' rating responses are
presented below for each survey form: equal interval balanced, positive (right) packed, and
positive (right) centred.
Frequency of Use of Scale Labels
Figures 2, 3, and 4 display the percentage of responses given to each scale label under
each survey form condition.
[Bar chart: response percentages by scale label. Strongly Disagree 0.3%, Disagree 0.9%,
Neutral 6.3%, Agree 39.5%, Strongly Agree 52.9%.]
Figure 2. Equal interval balanced scale.
[Bar chart: response percentages by scale label. Strongly Disagree 1.0%, Neutral 3.9%,
Agree 6.1%, Very Much Agree 24.1%, Strongly Agree 64.8%.]
Figure 3. Positive packed scale.
[Bar chart: response percentages by scale label. Disagree 1.4%, Neutral 2.7%, Agree 5.3%,
Very Much Agree 28.8%, Strongly Agree 61.8%.]
Figure 4. Positive centred scale.
Over 92% of the participants selected a positive choice (“agree” to “strongly agree”) in
all three survey form conditions. Comparison of the frequency of scale label choice across the
three survey forms indicates that the majority of respondents selected the “strongly agree” label
and continued to do so in both the positive (right) packed and positive (right) centred scale
conditions. However, in the positive (right) packed and positive (right) centred scale conditions
there was substantial use of the “very much agree” label and minimal use of the “agree” label.
For example, in the equal interval balanced form the “agree” label was used 39.5% of the time,
but only 6.1% of the time in the packed scale and 5.3% in the positive (right) centred scale. In
the equal interval balanced scale, respondents may round their choice down to “agree,” whereas
in the positive (right) centred and positive packed scales they appear to round up to “very much
agree.” The presence of the intermediate choice “very much agree” in the positive (right)
centred and positive packed scales should therefore produce a higher mean score, and these two
scales may reflect a more accurate level of positivity than the equal interval balanced scale
condition.
Means and Standard Deviations
Overall satisfaction ranged from 92% to 96% across the three survey forms, with mean
item scores ranging from 3.52/5 to 4.81/5 and a mode of 5. In the equal interval balanced scale
condition, mean scores ranged from 3.70 to 4.65, with a mode of 5. In the positive (right)
packed scale condition, mean scores ranged from 3.52 to 4.81, with a mode of 5. In the positive
(right) centred scale condition, mean scores ranged from 3.88 to 4.77, with a mode of 5. The
results of the current study are slightly lower than, but comparable to, the PSQ baseline data
from 2008, which found overall satisfaction with telemedicine of 97%, with mean item ratings
ranging from 4.28/5 to 4.62/5 and a mode of 5 (Keresztes et al., 2008). The individual item
mean scores are displayed in scatterplot and boxplot form in Figures 5 and 6, which illustrate
the overlap in the distribution of mean scores across the 17 individual items. Of particular
interest, item 7Q (“rather see in person than by telemedicine”) produced the lowest mean in all
three survey forms (see Figures 5–6).
Figure 5. Scatterplot of mean scores for individual items.
Figure 6. Boxplot of mean score data for individual items.
Individual Item Analysis
The means and standard deviations for the 17 individual items are listed in Table 3.
Table 3
Mean and Standard Deviations by Individual Items
Item  Scale condition            N    Mean        Standard deviation
7a    Equal Interval Balanced    54   4.43 Low    0.838 High
      Positive (Right) Packed    48   4.77 High   0.472 Low
      Positive (Right) Centred   52   4.67 Med    0.648 Med
7b    Equal Interval Balanced    54   4.35 Low    0.828 Med
      Positive (Right) Packed    48   4.52 High   0.850 High
      Positive (Right) Centred   52   4.38 Med    0.796 Low
7c    Equal Interval Balanced    54   4.28 Med    0.878 Low
      Positive (Right) Packed    48   4.31 High   0.949 Med
      Positive (Right) Centred   52   4.02 Low    1.129 High
7d    Equal Interval Balanced    54   4.39 High   0.712 Low
      Positive (Right) Packed    48   4.27 Med    1.216 High
      Positive (Right) Centred   52   4.23 Low    1.096 Med
7e    Equal Interval Balanced    54   4.65 Med    0.482 Med
      Positive (Right) Packed    48   4.81 High   0.394 Low
      Positive (Right) Centred   52   4.62 Low    0.690 High
7f    Equal Interval Balanced    54   4.50 Low    0.505 Low
      Positive (Right) Packed    48   4.69 High   0.624 High
      Positive (Right) Centred   52   4.63 Med    0.561 Med
7g    Equal Interval Balanced    54   4.56 Med    0.572 Low
      Positive (Right) Packed    48   4.67 High   0.7532 High
      Positive (Right) Centred   52   4.54 Low    0.7531 Med
7h    Equal Interval Balanced    54   4.46 Low    0.573 Med
      Positive (Right) Packed    48   4.63 High   0.570 Low
      Positive (Right) Centred   52   4.50 Med    0.780 High
7i    Equal Interval Balanced    54   4.54 Med    0.539 Low
      Positive (Right) Packed    48   4.50 Low    0.772 High
      Positive (Right) Centred   52   4.62 High   0.599 Med
7j    Equal Interval Balanced    54   4.50 Low    0.541 Low
      Positive (Right) Packed    48   4.60 High   0.644 High
      Positive (Right) Centred   52   4.56 Med    0.608 Med
7k    Equal Interval Balanced    54   4.48 Low    0.574 Low
      Positive (Right) Packed    48   4.58 High   0.679 Med
      Positive (Right) Centred   52   4.50 Med    0.897 High
7l    Equal Interval Balanced    54   4.54 Low    0.503 Low
      Positive (Right) Packed    48   4.60 Med    0.707 High
      Positive (Right) Centred   52   4.62 High   0.661 Med
7m    Equal Interval Balanced    54   4.26 Med    0.732 Low
      Positive (Right) Packed    48   4.25 Low    1.042 High
      Positive (Right) Centred   52   4.42 High   0.936 Med
7n    Equal Interval Balanced    54   4.65 High   0.555 Low
      Positive (Right) Packed    48   4.25 Low    1.120 High
      Positive (Right) Centred   52   4.42 Med    0.997 Med
7o    Equal Interval Balanced    54   4.556 Low   0.572 Low
      Positive (Right) Packed    48   4.563 Med   0.580 Med
      Positive (Right) Centred   52   4.60 High   0.721 High
7p    Equal Interval Balanced    54   4.63 Med    0.525 Med
      Positive (Right) Packed    48   4.60 Low    0.676 High
      Positive (Right) Centred   52   4.77 High   0.425 Low
7q    Equal Interval Balanced    54   3.70 Med    0.903 Low
      Positive (Right) Packed    48   3.52 Low    1.111 High
      Positive (Right) Centred   52   3.88 High   0.943 Med
The values across the three survey forms are very similar. The mean score was lowest
in the equal interval balanced scale condition for 8 of the 17 items, with the positive centred
scale condition at the intermediate position (7/17) and the positive packed scale condition
producing the highest mean score (9/17). The standard deviation was highest in the positive
packed scale condition for 11 of the 17 items and lowest in the equal interval balanced scale
condition (12/17), with the positive centred scale condition at the intermediate position (10/17).
Subscore Category Analysis
The mean and standard deviations for the six subscore categories are reported in Table 4.
Table 4
Mean and Standard Deviations by PSQ Subscore
Subscore          Scale condition            N    Mean         Standard deviation
1. Expectations   Equal Interval Balanced    54   4.2296 Low   0.47963 Low
                  Positive (Right) Packed    48   4.2792 High  0.58599 High
                  Positive (Right) Centred   52   4.2385 Med   0.57263 Med
2. Access         Equal Interval Balanced    54   4.4537 High  0.51641 Low
                  Positive (Right) Packed    48   4.2500 Low   0.89917 High
                  Positive (Right) Centred   52   4.4231 Med   0.78830 Med
3. Technical      Equal Interval Balanced    54   4.5741 Low   0.45977 Low
                  Positive (Right) Packed    48   4.7500 High  0.47266 Med
                  Positive (Right) Centred   52   4.6250 Med   0.60127 High
4. Communication  Equal Interval Balanced    54   4.5139 Low   0.48422 Low
                  Positive (Right) Packed    48   4.5990 High  0.55000 Med
                  Positive (Right) Centred   52   4.5529 Med   0.61504 High
5. Privacy        Equal Interval Balanced    54   4.5093 Low   0.51844 Low
                  Positive (Right) Packed    48   4.5938 High  0.65766 Med
                  Positive (Right) Centred   52   4.5577 Med   0.75831 High
6. Satisfaction   Equal Interval Balanced    54   4.5926 Med   0.50539 Low
                  Positive (Right) Packed    48   4.5833 Low   0.58649 High
                  Positive (Right) Centred   52   4.6827 High  0.52421 Med
As in the individual item analysis, the values across the three survey forms are similar.
The mean score was lowest in the equal interval balanced scale condition for 4 of the 6
subscore categories, and highest in the positive packed scale condition in 4 of the 6. The
positive centred scale condition occupied the intermediate position in 5 of the 6 subscore
categories.
The standard deviation was lowest in the equal interval balanced scale condition for all 6
subscore categories. The positive packed and positive centred scales performed equally, tying
for both the highest and the intermediate positions with 3 subscore categories each.
Summary: Means and Standard Deviations
A summary of the mean score and standard deviation positions is given in Table 5.
Overall, only the equal interval balanced scale produced consistent findings across the
individual item and subscore category analyses.
Table 5
Summary of Means and Standard Deviations
Analysis                    Scale condition            Mean   Standard deviation
Individual item analysis    Equal Interval Balanced    Low    Low
                            Positive (Right) Packed    High   High
                            Positive (Right) Centred   Med    Med
Subscore category analysis  Equal Interval Balanced    Low    Low
                            Positive (Right) Packed    Med    Equal position (Med-High)
                            Positive (Right) Centred   High   Equal position (Med-High)
The equal interval balanced scale condition had the lowest mean score in both the
individual item and subscore category analyses. However, the intermediate and highest mean
score positions were not consistent: the individual item analysis found the positive packed
scale condition producing a higher mean score than the positive centred scale condition,
whereas the subscore category analysis found the opposite, with the positive centred scale
condition producing the highest mean score (see Table 5).
Regarding variance as measured by the standard deviation, the equal interval scale
condition had the lowest standard deviation. This result was consistent across both the
individual item and subscore category analyses. The individual item analysis found the
standard deviation highest for the positive packed scale condition, with the positive centred
scale condition at the intermediate position. The subscore analysis instead placed the positive
packed and positive centred conditions equally across the highest and intermediate positions.
Analysis of Variance (ANOVA)
An analysis of variance (ANOVA) was conducted to determine whether there were
significant differences across the three survey forms. One-way ANOVA assumes that the
variances of the conditions are equal. The sample sizes of the survey form conditions differ:
54 in the equal interval balanced scale condition, 48 in the positive packed scale condition, and
52 in the positive centred scale condition. A test of homogeneity of variances was therefore
conducted to assess the assumption of equal variances across the three survey form conditions
and to determine whether the data were appropriate for the ANOVA procedure.
Tables 6 and 7 list the results of the Levene test for homogeneity of variances.
Table 6
Levene’s Test of Homogeneity of Variances for Individual Items
Levene statistic df1 df2 Sig.
7A 6.358 2 151 .002 significant
7B .061 2 151 .941 n.s
7C 1.096 2 151 .337 n.s
7D 4.421 2 151 .014 significant
7E 6.285 2 151 .002 significant
7F .177 2 151 .838 n.s
7G .690 2 151 .503 n.s
7H 1.614 2 151 .203 n.s
7I 2.185 2 151 .116 n.s
7J .183 2 151 .833 n.s
7K 1.120 2 151 .329 n.s
7L .488 2 151 .615 n.s
7M 2.816 2 151 .063 n.s
7N 6.707 2 151 .002 significant
7O .042 2 151 .959 n.s
7P 5.843 2 151 .004 significant
7Q 2.711 2 151 .070 n.s
Table 7
Levene’s Test of Homogeneity of Variances for Subscore
Levene statistic df1 df2 Sig.
Expectations 1.429 2 151 .243 n.s
Access 6.767 2 151 .002 significant
Technical 2.606 2 151 .077 n.s
Communication 1.220 2 151 .298 n.s
Privacy .768 2 151 .466 n.s
Satisfaction 1.028 2 151 .360 n.s
The significance value exceeded 0.05 for the majority of the items (12/17) in the
individual item analysis and for 5 of the 6 categories in the subscore category analysis,
suggesting that the variances of the three survey conditions are equal and that the assumption
is justified. The results of the one-way ANOVA procedure are presented in Table 8 for the
individual item analysis and in Table 9 for the subscore category analysis.
The Levene statistic can also be used as a significance test of the standard deviations.
On this basis, significant differences in variance were found for 5 of the 17 items in the
individual item analysis and for 1 of the 6 categories in the subscore category analysis.
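The Levene statistic reported in Tables 6 and 7 is built from absolute deviations of each response from its group mean. A minimal pure-Python sketch of the mean-centred variant (the groups below are invented, not the study data):

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic for homogeneity of variances, using
    absolute deviations from each group's mean."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_bars = [mean(zi) for zi in z]
    z_grand = mean(x for zi in z for x in zi)
    between = sum(len(g) * (zb - z_grand) ** 2
                  for g, zb in zip(groups, z_bars))
    within = sum((x - zb) ** 2
                 for zi, zb in zip(z, z_bars) for x in zi)
    return ((n - k) / (k - 1)) * between / within

# Groups with identical spread give W = 0 (perfect homogeneity);
# W grows as the group variances diverge.
print(levene_w([1, 2, 3], [4, 5, 6]))      # → 0.0
print(levene_w([1, 2, 3], [1, 5, 9]) > 0)  # → True
```

W is compared against an F(k-1, N-k) distribution to obtain the significance values shown in the tables; that lookup step is omitted here.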
Table 8
ANOVA Table for Individual Items
(df between groups = 2, df within groups = 151, df total = 153 for all items)

Item  SS between  MS between  F      Sig.   SS within  MS within  SS total
7A    3.271       1.635       3.573  .030   69.125     .458       72.396
7B    .801        .400        .589   .556   102.602    .679       103.403
7C    2.633       1.317       1.342  .264   148.127    .981       150.760
7D    .716        .358        .343   .710   157.543    1.043      158.260
7E    1.104       .552        1.897  .154   43.935     .291       45.039
7F    .967        .484        1.526  .221   47.870     .317       48.838
7G    .480        .240        .496   .610   72.923     .483       73.403
7H    .720        .360        .854   .428   63.676     .422       64.396
7I    .351        .175        .429   .652   61.734     .409       62.084
7J    .278        .139        .391   .677   53.806     .356       54.084
7K    .293        .147        .276   .759   80.148     .531       80.442
7L    .190        .095        .242   .785   59.213     .392       59.403
7M    .976        .488        .594   .553   124.063    .822       125.039
7N    4.077       2.039       2.443  .090   126.007    .834       130.084
7O    .049        .025        .062   .940   59.665     .395       59.714
7P    .808        .404        1.346  .263   45.303     .300       46.110
7Q    3.305       1.652       1.702  .186   146.546    .971       149.851
Table 9
ANOVA Table for Subscore Categories
(df between groups = 2, df within groups = 151, df total = 153 for all subscores)

Subscore        SS between  MS between  F      Sig.   SS within  MS within  SS total
Expectations    .070        .035        .117   .890   45.055     .298       45.124
Access          1.201       .601        1.082  .342   83.827     .555       85.028
Technical       .821        .411        1.545  .217   40.141     .266       40.963
Communication   .184        .092        .302   .740   45.937     .304       46.121
Privacy         .184        .092        .217   .805   63.900     .423       64.084
Satisfaction    .309        .155        .534   .587   43.718     .290       44.028
Overall, the ANOVA found no significant differences in either the individual item or the
subscore category analysis. The subscore category analysis found no significant differences in
any of the six categories (see Table 9). The individual item analysis yielded significance levels
exceeding 0.05 for 16 of the 17 items, suggesting that there are no group differences overall
(see Table 8). The only statistically significant difference was for item 7A, F(2, 151) = 3.573,
p = .030. To determine which groups differ, the Tukey post hoc test was conducted; Table 10
lists the pairwise comparisons of group means from the Tukey post hoc procedure.
Table 10
Tukey HSD Procedure for Item 7A
                                Subset for alpha = .05
Scale condition            N    1       2
Equal Interval Balanced    54   4.43
Positive (Right) Centred   52   4.67    4.67
Positive (Right) Packed    48           4.77
Sig.                            .158    .745
The Tukey HSD procedure indicates that the equal interval balanced scale condition differs from
the positive (right) packed scale condition (see Table 10). However, the equal interval balanced
scale condition does not differ from the positive (right) centred scale condition. In addition, the
positive (right) packed and positive (right) centred scale conditions do not differ from each other.
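The F ratios in Tables 8 and 9 can be reproduced from raw responses; a pure-Python one-way ANOVA sketch with invented groups follows (the Tukey HSD step itself needs studentized range critical values and is normally delegated to a statistics package such as statsmodels' pairwise_tukeyhsd):

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # Partition total variability into between- and within-group SS.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Invented ratings under three scale conditions (not the study data).
f, df1, df2 = one_way_anova([1, 2, 3], [4, 5, 6], [4, 5, 6])
print(f, df1, df2)  # → 9.0 2 6
```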
Reliability Analysis
The reliability values are similar across all three survey forms. Overall, the reliability
analysis yielded high Cronbach's alpha coefficients for all three forms, ranging from 0.888 to
0.907 (see Table 11), consistent with the Cronbach's alpha of 0.822 reported for the 2008 PSQ
baseline data. Cronbach's alpha was 0.888 for the equal interval balanced scale, 0.898 for the
positive packed scale, and 0.907 for the positive centred scale. The lowest reliability occurred
with the equal interval balanced scale and the highest with the positive centred scale.
Table 11
Reliability Analysis
Scale                      N    M      SD     Cronbach alpha  Variance
Equal Interval Balanced    17   75.46  6.641  0.888           44.102
Positive (Right) Packed    17   76.15  8.460  0.898           71.574
Positive (Right) Centred   17   75.98  8.644  0.907           74.725
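Cronbach's alpha in Table 11 follows from the item-score matrix via alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A small sketch with invented item scores:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list of
    respondent scores per item), using sample variances."""
    k = len(items)
    item_vars = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per respondent
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

# Two perfectly correlated items give alpha = 1.0 (invented data);
# real item sets land somewhere below that ceiling.
item_a = [5, 4, 3, 5, 4]
item_b = [5, 4, 3, 5, 4]
print(cronbach_alpha([item_a, item_b]))  # → 1.0
```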
Chapter 5:
Discussion and Conclusions
Discussion
The current study examines the effects of a positive (right) centred scale on the
distribution and reliability of satisfaction responses in the health care domain. The research
questions examine:
1. the extent to which the centredness of a rating scale provides better discrimination of
responses in a telemedicine satisfaction survey; and
2. the effects of a right centred scale on the internal consistency and reliability of
satisfaction responses.
Three hypotheses were presented in the current study.
Hypothesis 1 stated that the positive (right) centred scale will produce a lower
mean than positive (right) packed scale, with the highest mean produced by the
equal interval balanced scale.
This hypothesis was not supported by the current study; no significant mean score
differences were found. The results indicate that the equal interval balanced scale condition
yielded the lowest mean score, a finding consistent across both the individual item and
subscore category analyses. However, the highest and intermediate mean score positions varied
between the two analyses. In the individual item analysis, the highest mean score was produced
by the positive packed scale, with the positive centred scale at the intermediate position. In the
subscore category analysis, the positive centred scale produced the highest mean score, with
the positive packed scale at the intermediate position.
The mean score findings are not consistent with Kolic's (2004) findings on scale
centredness using frequency labels, which found significant main effects for both centredness
and packedness, with the lowest mean score in the positive (right) centred scale (M = 2.16), the
intermediate mean score in the positive (right) packed scale (M = 2.24), and the highest mean
score in the equal interval scale (M = 2.37). However, the current study's findings are similar to
those of Klockars and Hancock (1993) on packedness using evaluation labels, which found
minimal effects and comparable discrimination across a 5–point positive packed scale, a
5–point equal interval balanced scale, and a 9–point equal interval balanced scale. Klockars and
Hancock (1993) explain that the differences between label types may be due to the high
semantic elasticity of evaluation labels, which do not carry the absolute meaning for
respondents that frequency labels do. This may be a plausible explanation for the differences
between the current study's findings using evaluation labels and Kolic's (2004) findings using
frequency labels. Lam and Kolic (2008) used an equal interval rating scale identical to the
equal interval balanced scale condition employed in the current study, so a similar effect might
be expected for the mean score results. However, this was not the case. Lam and Kolic (2008)
found that a matched condition produces a lower mean for the positive packed scale, whereas
in a mismatched condition the means of the positive packed and equal interval scales are equal.
The opposite effect was observed in the current study, in which mean scores were lowest in the
equal interval scale rather than the positive packed scale, supporting neither conclusion.
A more plausible explanation concerns respondent strategies. Although the saturation of
positive anchors in a positive packed scale would be expected to produce lower mean scores
for the equal interval scale, as reported in Lam and Kolic (2008), the differences observed in
the current study may be due to the location on the scale where the saturation took place. In the
current study, the presence of the intermediate choice “very much agree” in the positive
centred and positive packed scales produced a higher mean score. It is probable that
respondents used a rounding strategy (rounding up to “very much agree”) in the presence of an
intermediate option between “agree” and “strongly agree,” which would result in a lower mean
in the equal interval scale condition. By contrast, Lam and Kolic's (2008) matched condition
used the “somewhat agree” label, which expanded the response options to the left of “agree”
(somewhat disagree, somewhat agree, agree, strongly agree). The mean score results in the
current study are therefore consistent with a saturation of positive labels that produced a
“rounding up” of responses from “agree” to “very much agree,” in turn yielding higher mean
scores in the positive packed and positive centred scales. This is a likely explanation for the
lower mean scores found with the equal interval balanced scale.
Other plausible explanations involve response bias. Lam and Kolic (2008) suggested
that if scale labels are not compatible with item wording, the respondent will resort to
satisficing or simply ignore the labels presented in the scale. For example, acquiescence bias is
a form of satisficing defined as the tendency to give positive responses or “yea-saying”
(Streiner & Norman, 2008). Lam and Kolic (2008) attribute the absence of significant findings
on semantic compatibility and variances in their study to a possible ceiling effect. A ceiling
effect occurs when responses are not evenly distributed but show a positive skew toward the
favourable end, making it impossible to distinguish among the various levels of excellence
(Streiner & Norman, 2008). A possible method to counteract this bias is to offset the middle of
the scale and expand it in the area of interest (Streiner & Norman, 2008). Scale centredness
may be a viable strategy to counteract the ceiling effect, as it offsets the middle and expands
the area of interest. The current study found non-significant effects, although the distribution
occurred in the manner hypothesized.
Another plausible explanation concerns the cognitive burden on respondents. Spector
(1980) argued against using unequal scales because they increase the cognitive burden on
respondents. He further noted that cognitive burden particularly affects older respondents and
respondents with lower education levels, who may adopt the strategy of selecting one choice
and using it throughout their responses (Spector, 1976). The respondent population in the
current study has similar characteristics, with 85% over the age of 45. This may be a plausible
explanation for the findings of the current study.
Hypothesis 2 stated that the positive (right) centred scale will produce a larger
variance than the positive (right) packed scale, with the lowest variance produced
by the equal interval balanced scale.
This hypothesis was partially supported. The current study found that the equal interval
scale condition produced the lowest standard deviation, as predicted, and this result was
consistent across both the individual item and subscore category analyses. However, the
individual item analysis found the standard deviation highest for the positive packed scale
condition, not the positive centred condition, with the positive centred scale condition at the
intermediate position; the subscore analysis placed the positive packed and positive centred
conditions equally across the highest and intermediate positions. No significant differences
were found. The result for the positive packed and equal interval balanced scale conditions is
consistent with Lam and Kolic (2008), who found that the standard deviation in the matched
condition using evaluation labels was higher for the positive packed scale and lower for the
equal interval scale. In Kolic's (2004) study on centredness, the effects of the different centred
scales on the variability of responses were not fully investigated; however, a higher standard
deviation was reported for the negative (left) centred than for the positive (right) centred scale.
A direct comparison with the variability of responses to the positive centred scale in the
current study was therefore not possible.
Hypothesis 3 stated that the positive (right) centred scale will produce a higher
reliability coefficient than the positive (right) packed scale with the lowest
produced for the equal interval balanced scale.
This hypothesis was supported by the current study. High reliabilities were observed for all
three survey forms, consistent with the reported reliability of the baseline PSQ. Cronbach's
alpha was 0.888 for the equal interval balanced scale, 0.898 for the positive packed scale, and
0.907 for the positive centred scale. The lowest reliability occurred with the equal interval
balanced scale and the highest with the positive centred scale. However, Lam and Kolic (2008)
reported a lower alpha for the positive packed scale condition than for the equal interval scale
condition in both their matched and mismatched conditions. This finding was not consistent
with the current study, although the Cronbach's alpha values are very close across all three
surveys.
Conclusion
The current study addresses an interesting issue in rating scale design and has practical
implications for survey research, particularly satisfaction research. The current study agrees
with Kolic's (2004) recommendation that future research on scale centredness is needed.
Although there were no significant findings, the performance of the positive (right) centred
scale on the distribution of rating responses while maintaining high reliability is noteworthy.
In a positive respondent population, it is recommended that the positive (right) centred scale
be preferred over a positive packed scale, as it maintains the equal interval properties of a
scale to a larger degree, does not distort results, and enables respondents to select a more
accurate level of positivity.
Limitations of the Study
The following are the limitations of the current study:
1. Small sample size. The mean score, standard deviations and reliability alpha
coefficients were very similar and therefore proved difficult to make comparison
across the survey forms. A larger sample size should be considered in future studies.
2. The current study used a convenience sample of telemedicine patients who had a
scheduled appointment during the study period. Future research should employ a
random sampling method.
3. Apart from Kolic's (2004) doctoral dissertation, there was no empirical research in
the area of scale centredness. In addition, Kolic's (2004) study combined the selected
levels of scale packedness and scale centredness in a factorial design, producing six
frequency scales, which made comparison with the current study difficult.
4. The labels for both the positive packed and positive centred scales were not
pre-determined for the study. The use of pre-determined scale values to select labels
would provide a more precise measurement of intensity, but pre-determining scale
values for the positive packed and positive centred scale conditions was beyond the
scope of the current study. Although Kolic (2004) used pre-determined scale values
from an additional experiment, her scale labels were not identical to those of the
current study, as they excluded the labels "neutral" and "strongly disagree"; the
current study therefore could not generate a comparison using pre-determined scale
values. Kolic's study also differs from the current study in that it used frequency
labels and derived scale values from her experiment to normalize her analysis. The
current study did not normalize the scale for the above analysis and instead used the
scale values 1, 2, 3, 4, 5 for the positive packed and positive centred scale conditions.
It is plausible, however, to transform the scale while preserving the Likert values of
1–5 (the weighted values confirmed by Likert) and to insert 4.5 for the label "very
much agree," the midpoint between "agree" and "strongly agree."
Figure 7 presents a normalized scale for the current study.
[Figure omitted: rating scale anchors arranged along a psychological continuum.]
Figure 7. Rating scale anchor choice and scale types.
Normalizing the scale values was beyond the scope of this study, which aimed to mirror
practical survey design: survey designers typically do not pre-determine values when
constructing a rating scale, but select labels arbitrarily. An analysis using the normalized scale
may, however, provide significant results.
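The transformation described above can be sketched as a simple label-to-value mapping. Only the 4.5 midpoint for "very much agree" comes from the text; the full label set and the sample responses below are assumptions for illustration.

```python
# Normalization sketch: Likert's weighted values 1-5 are preserved, and the
# extra positive label "very much agree" is assigned 4.5, the midpoint between
# "agree" (4) and "strongly agree" (5).
# NOTE: the exact label set of the study's scales is assumed here.
NORMALIZED_VALUES = {
    "strongly disagree": 1.0,
    "disagree": 2.0,
    "neutral": 3.0,
    "agree": 4.0,
    "very much agree": 4.5,  # inserted midpoint
    "strongly agree": 5.0,
}

def normalized_mean(labels):
    """Mean satisfaction score for a list of response labels."""
    return sum(NORMALIZED_VALUES[lab.lower()] for lab in labels) / len(labels)

# Hypothetical responses from a positive respondent population:
responses = ["agree", "very much agree", "strongly agree", "very much agree"]
print(normalized_mean(responses))  # (4 + 4.5 + 5 + 4.5) / 4 = 4.5
```

Scoring the same responses with the raw 1-5 coding (where "very much agree" would occupy position 4 or 5) would shift the mean, which is why the choice of values matters for comparing scale forms.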
Suggestions for Future Research
Additional research is recommended to explore the effects of scale centredness on rating scale
responses. Future research should include a larger sample size and should employ qualitative
methods to further explore respondent strategies. It is important to note that much of the past
research on scale packedness and centredness has been conducted in education settings using
university student populations. The current study employed a known positive respondent
population and an existing survey instrument that produces high satisfaction rates; differences
found in the current research may therefore very likely be attributed to the respondent
population. Future research on scale centredness should include the health care sector, as it may
prove to be an optimal setting for exploring rating scale techniques with positive respondent
populations.
References
Avis, M., Bond, M., & Arthur, A. (1997). Questioning patient satisfaction: An empirical
investigation in two outpatient clinics. Social Science & Medicine, 44(1), 85-92.
Barnette, J. J. (2000). Effects of stem and Likert response option reversals on survey internal
consistency: If you feel the need, there is a better alternative to using those negatively
worded stems. Educational and Psychological Measurement, 60(3), 361-370.
Castle, N. G., Brown, J., Hepner, K. A., & Hays, R. D. (2005). Review of the literature on survey
instruments used to collect data on hospital patients' perceptions of care. Health Services
Research, 40(6 Pt 2), 1996-2017.
Currell, R., Urquhart, C., Wainwright, P., & Lewis, R. (2000). Telemedicine versus face to face
patient care: effects on professional practice and health care outcomes. Cochrane
Database of Systematic Reviews (2), CD002098.
Demiris, G. (2006). Principles of survey development for telemedicine applications. Journal of
Telemedicine and Telecare, 12(3), 111-115.
Dixon, P. N., Bobo, M., & Stevick, R. A. (1984). Response differences and preferences for all-
category-defined and end-defined Likert formats. Educational and Psychological
Measurement, 44, 61-66.
Hancock, G. R., & Klockars, A. J. (1991). The effect of scale manipulations on validity:
Targeting frequency rating scales for anticipated performance levels. Applied
Ergonomics, 22(3), 147-154.
Keresztes, C., Hartford, K., & Wilk, P. (2008, October). Measuring patient satisfaction with
telemedicine: Establishing psychometric properties. Paper presented at the Canadian
Society of Telehealth, Ottawa, Ontario, Canada.
Klockars, A. J., & Hancock, G. R. (1993). Manipulations of evaluative rating scales to increase
validity. Psychological Reports, 73, 1059-1066.
Kolic, M. C. (2004). An empirical investigation of factors affecting Likert-type rating scale
responses. Unpublished doctoral dissertation, University of Toronto, Toronto, Ontario,
Canada.
Lam, T. C. M., & Klockars, A. J. (1982). Anchor point effects on the equivalence of
questionnaire items. Journal of Educational Measurement, 19(4), 317-322.
Lam, T. C. M., & Kolic, M. C. (2008). Effects of semantic incompatibility on rating response.
Applied Psychological Measurement, 32(3), 248-260.
Lam, T. C. M., & Stevens, J. J. (1994). Effects of content polarization, item wording, and rating
scale width on rating response. Applied Measurement in Education, 7(2), 141-158.
Lozano, L. M., Garcia-Cueto, E., & Muniz, J. (2008). Effect of the number of response
categories on the reliability and validity of rating scales. Methodology, 4(2), 73-79.
Mair, F., & Whitten, P. (2000). Systematic review of studies of patient satisfaction with
telemedicine. BMJ, 320(7248), 1517-1520.
Mekhjian, H., Turner, J. W., Gailiun, M., & McCain, T. A. (1999). Patient satisfaction with
telemedicine in a prison environment. Journal of Telemedicine and Telecare, 5(1), 55-61.
Meric, H. J. (1994). The effect of scale form choice on psychometric properties of patient
satisfaction measurement. Health Marketing Quarterly, 11(3-4), 27-39.
Sitzia, J., & Wood, N. (1997). Patient satisfaction: A review of issues and concepts. Social
Science & Medicine, 45(12), 1829-1843.
Spector, P. E. (1976). Choosing response categories for summated rating scales. Journal of
Applied Psychology, 61, 374-375.
Spector, P. E. (1980). Ratings of equal and unequal response choice intervals. Journal of Social
Psychology, 112, 115-119.
Streiner, D. L., & Norman, G. R. (2008). Health measurement scales: A practical guide to their
development and use (3rd ed.). New York: Oxford University Press.
Uebersax, J. S. (2006). Likert scales: Dispelling the confusion. Retrieved September 15, 2009,
from Statistical Methods for Rater Agreement Web site: http://john-uebersax.com/stat/likert.htm
Ware, J. E., Jr. (1978). Effects of acquiescent response set on patient satisfaction ratings.
Medical Care, 16(4), 327-336.
Whitten, P. S., & Mair, F. (2000). Telemedicine and patient satisfaction: Current status and
future directions. Telemedicine Journal and e-Health, 6(4), 417-423.
Wildt, A. R., & Mazis, M. B. (1978). Determinants of scale response: Label versus position.
Journal of Marketing Research, 15(May), 261-267.
Williams, T. L., May, C. R., & Esmail, A. (2001). Limitations of patient satisfaction studies in
telehealthcare: A systematic review of the literature. Telemedicine Journal and e-Health,
7(4), 293-312.
Appendix A
Invitation to Participate
Appendix B
Study Participant Survey:
Version 2, Format A
Appendix C
Study Participant Survey:
Version 3, Format B
Appendix D
Study Participant Survey:
Version 3, Format C