

  • THE EFFECT OF SCALE CENTREDNESS ON

    PATIENT SATISFACTION RESPONSES

    by

    Caterina Masino

    A thesis submitted in conformity with the requirements

    for the degree of Master of Arts

    Graduate Department of Curriculum, Teaching and Learning

    Ontario Institute for Studies in Education

    University of Toronto

    © Copyright by Caterina Masino 2010

  • ii

    THE EFFECT OF SCALE CENTREDNESS ON

    PATIENT SATISFACTION RESPONSES

    Master of Arts 2010

    Caterina Masino

    Graduate Department of Curriculum, Teaching and Learning

    University of Toronto

    Abstract

    High satisfaction rates and the lack of response variability are problematic areas in survey

    research. An important area of methodological concern for self-report survey is the sensitivity

    and reliability of the instrument. This research examines the effects of a positive (right) centred

    scale on the distribution and reliability of satisfaction responses in a positive respondent

    population. A total of 216 participants were randomly assigned to one of the following three

    experimental Likert scale conditions: 5–point equal interval balanced scale; 5–point positive

    (right) packed scale; 5–point positive (right) centred scale. The distribution of responses

    occurred in the direction hypothesized. Comparable discrimination was found across the three

    conditions. Although, the study findings did not prove to be significant, the equal interval

    balanced scale produced the lowest mean score, contrary to previous research findings.

  • iii

    Acknowledgements

    The completion of this thesis is dedicated to all my cheerleaders.

    I wish to thank Sharon McGonigle, Jennifer Wong, Emily Seto, Luis Saffie & Munira Jessa for

    their unwavering faith and encouragement during my studies.

    A special note of appreciation goes to:

    The University Health Network Telehealth Program for their ongoing support, and in particular

    to Judith Estrada for her much valued help during participant recruitment.

    &

    Kathleen Hartford and the Ontario Telemedicine Network for the permission to use their Patient

    Satisfaction Questionnaire in this research.

    Most of all, I wish to thank my family for their love, patience, and support over the years.

  • iv

    Table of Contents

    Abstract ........................................................................................................................................... ii

    Acknowledgements ........................................................................................................................ iii

    Chapter 1: Introduction ................................................................................................................... 1

    Background to the Study ......................................................................................................... 2

    Objectives of the Study ........................................................................................................... 2

    Research Questions and Hypotheses ...................................................................................... 3

    Chapter 2: Literature Review .......................................................................................................... 4

    Likert Scales............................................................................................................................ 4

    Patient Satisfaction Research .................................................................................................. 5

    Rating Scale Construction....................................................................................................... 5

    Chapter 3: Methodology ............................................................................................................... 11

    Introduction ........................................................................................................................... 11

    Research Design.................................................................................................................... 12

    Sample................................................................................................................................... 13

    Survey Instrument ................................................................................................................. 14

    Procedure .............................................................................................................................. 16

    Data Collection ..................................................................................................................... 17

    Data Analysis ........................................................................................................................ 17

    Chapter 4: Results ......................................................................................................................... 18

    Descriptive Analysis ............................................................................................................. 18

    Participants. ....................................................................................................................... 18

    Survey. .............................................................................................................................. 18

    Frequency of Use of Scale Labels ........................................................................................ 19

    Means and Standard Deviations............................................................................................ 21

    Individual Item Analysis ....................................................................................................... 24

    Subscore Category Analysis ................................................................................................. 27

    Summary: Means and Standard Deviations .......................................................................... 28

    Analysis of Variance (ANOVA) ........................................................................................... 29

    Reliability Analysis ............................................................................................................... 34

    Chapter 5: Discussion and Conclusions ........................................................................................ 36

    Discussion ............................................................................................................................. 36

    Conclusion ............................................................................................................................ 40

    Limitations of the Study........................................................................................................ 41

    Suggestions for Future Research .......................................................................................... 44

    References ..................................................................................................................................... 45

  • v

    List of Tables

    Table 1 PSQ Survey Items ............................................................................................................ 15

    Table 2 PSQ Scoring Rules for Patient Satisfaction ..................................................................... 16

    Table 3 Means and Standard Deviations by Individual Items ........................................................ 24

    Table 4 Means and Standard Deviations by PSQ Subscore ........................................................... 27

    Table 5 Summary of Means and Standard Deviations.................................................................. 28

    Table 6 Levene's Test of Homogeneity of Variances for Individual Items ................................. 30

    Table 7 Levene's Test of Homogeneity of Variances for Subscore ............................................. 31

    Table 8 ANOVA Table for Individual Items ................................................................................ 32

    Table 9 ANOVA Table for Subscore Categories ......................................................................... 33

    Table 10 Tukey HSD Procedure for Item 7A ............................................................................... 34

    Table 11 Reliability Analysis........................................................................................................ 35

    List of Figures

    Figure 1. Rating scale anchor choice and scale types. ................................................................. 13

    Figure 2. Equal interval balance scale. ......................................................................................... 19

    Figure 3. Positive packed scale. ................................................................................................... 19

    Figure 4. Positive centred scale. ................................................................................................... 20

    Figure 5. Scatterplot of mean scores for individual items. ........................................................... 22

    Figure 6. Boxplot of mean score data for individual items. ......................................................... 23

    Figure 7. Rating scale anchor choice and scale types. ................................................................. 43

    List of Appendices

    Appendix A Invitation to Participate ............................................................................................ 48

    Appendix B Study Participant Survey: Version 2, Format A ....................................................... 49

    Appendix C Study Participant Survey: Version 3, Format B ....................................................... 53

    Appendix D Study Participant Survey: Version 3, Format C ....................................................... 57

  • 1

    Chapter 1:

    Introduction

    Patient satisfaction is an important factor in evaluating health care (Sitzia & Wood, 1997). Although high levels of patient satisfaction are a desired outcome of patient care, uniformly high levels of satisfaction can lead critics to question satisfaction results. There are many reasons for high satisfaction, spanning both conceptual and methodological design realms. In the presence of an overly positive respondent population, an important area of methodological concern for self-report surveys is the sensitivity and reliability of the instrument. A lack of sensitivity in the survey instrument can produce artificially high satisfaction results, so it is necessary to determine which measures are most valid and sensitive to differences in quality. Moreover, the lack of response variability becomes problematic for researchers, who are often forced into comparing positive with less positive responses.

    Telemedicine is an emerging type of health care delivery that allows a physician to provide clinical care to patients through interactive videoconference (Mekhjian, Turner, Gailiun, & McCain, 1999). Patient satisfaction is one of the most widely researched patient-oriented outcomes for telemedicine (Whitten & Mair, 2000). One feature of health care research, and of telemedicine in particular, is the daunting array of available surveys. Despite the variability in survey instrumentation, consistent results have emerged. Three separate systematic reviews all report that data on patient satisfaction with telemedicine reveal no unfavorable effects and satisfaction rates greater than 80% (Currell, Urquhart, Wainwright, & Lewis, 2000; Mair & Whitten, 2000; Williams, May, & Esmail, 2001).

  • 2

    Background to the Study

    Overly positive satisfaction response rates are not a new phenomenon in the presence of Likert scales, which are known to be highly susceptible to a positive response set (Ware, 1978). In the systematic review by Williams et al. (2001), the majority of telemedicine studies used a Likert-type format with agreement labels. The effects of rating scale labels on the psychometric properties of survey instruments have not been systematically researched in the area of patient satisfaction (Meric, 1994). However, some research in education and psychology has explored the effects of rating scale labels on responses. Recently, a doctoral dissertation by Kolic (2004) empirically examined the effects of rating scale construction using a technique called scale centredness in an education context. Scale centredness is a function of the choice of endpoints: for example, if a rating scale leans more toward the right, or positive, end of the continuum, the scale is considered positive (right) centred (Kolic, 2004). Kolic (2004) found a significant effect of scale centredness in her research and further recommended it as a viable scale technique in the presence of an overly positive respondent population. Since telemedicine satisfaction research has struggled with criticisms of its positive response rates, and given the importance of this recent finding and the limited research on this technique, the objective of this study is to explore the effects of scale centredness on the psychometric properties of a survey instrument used to assess patient satisfaction in a telemedicine context.

    Objectives of the Study

    The aim of the current study is to provide empirical research for rating scale design in patient satisfaction by examining the effect of scale centredness on rating responses in the presence of an overly positive respondent population.

  • 3

    Research Questions and Hypotheses

    Research Question 1:

    To what extent does the centredness of a rating scale provide better discrimination of responses in a telemedicine satisfaction survey?

    Hypothesis 1:

    The positive (right) centred scale will produce a lower mean than the positive (right) packed scale, with the highest mean produced by the equal interval balanced scale.

    Hypothesis 2:

    The positive (right) centred scale will produce a larger variance

    than the positive (right) packed scale, with the lowest variance

    produced by the equal interval balanced scale.

    Research Question 2:

    What are the effects of a right centred scale on the internal consistency and reliability of satisfaction responses?

    Hypothesis 3:

    The positive (right) centred scale will produce a higher reliability coefficient than the positive (right) packed scale, with the lowest produced by the equal interval balanced scale.

  • 4

    Chapter 2:

    Literature Review

    Likert Scales

    A Likert scale measures attitudes by asking respondents to rate aspects of a service or to state the extent to which they agree or disagree with predetermined statements. A typical Likert scale consists of a 5–point scale with equal interval spacing, balanced with the same number of points in the negative and positive directions. Respondents rate items that are all worded in the same direction. In terms of the scoring procedure, the scale is considered a summative scale, which allows summing and averaging scores across multiple items (Uebersax, 2006). An example of a Likert scale is as follows: strongly disagree, disagree, neutral, agree, and strongly agree. Scales that do not meet all the Likert characteristics described above may be referred to as Likert-type scales. Although many definitions of Likert-type scales exist, the general consensus is that these scales often use labels other than agreement labels, the most widely used being quality and frequency labels (Meric, 1994). Examples of Likert-type scales are commonly found and can include scales with evaluation labels or frequency labels. Categories are evenly spaced; however, the selection of labels in the two directions may be more relaxed and not exact opposites per se (Uebersax, 2006). Also, Likert-type scales may not have an exact middle as Likert scales do (e.g., Neutral or Neither Agree nor Disagree), but they do include a central point (e.g., Average or Good). The format of a typical 5–point Likert-type scale with evaluation labels is as follows: Poor, Fair, Good, Very Good, and Excellent.
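    The summative scoring procedure described above can be sketched in a few lines of code. This is an illustrative example only; the items and responses are hypothetical, with responses coded 1-5 (strongly disagree = 1 through strongly agree = 5).

```python
# Summative Likert scoring: item codes are summed (or averaged) across items.
# Responses are hypothetical, coded 1-5 (strongly disagree = 1 ... strongly agree = 5).

def summative_score(responses):
    """Total score: the sum of item codes across the scale's items."""
    return sum(responses)

def mean_score(responses):
    """Per-item form of the summative score: the average item code."""
    return sum(responses) / len(responses)

respondent = [4, 5, 3, 4, 5]          # one hypothetical respondent, five items
print(summative_score(respondent))    # 21
print(mean_score(respondent))         # 4.2
```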

  • 5

    Patient Satisfaction Research

    Surveys are the most widely used instruments in health care research to assess patient satisfaction. A review of patient perceptions of hospital care over the period 1980-2003 found that the most common surveys employed an evaluation-type response format (Castle, Brown, Hepner, & Hays, 2005). Similarly, a systematic review of telemedicine satisfaction found that of the 77% (72 out of 93) of studies that provided survey details, 89% used a Likert format with a scale of agreement or disagreement (Williams et al., 2001). Unfortunately, there is insufficient methodological detail in the majority of the published research regarding rating scale formats, instrument sensitivity, reliability, and validity issues. A recent literature review found that only 22% of studies provided additional details such as the survey instrument employed (Castle et al., 2005). Similarly, Williams et al. (2001) found that 94% of the telemedicine studies created their own surveys and the majority (86%) did not report on validity or reliability. One telemedicine study focusing on an inmate population documented the use of a 5–point Likert scale with a reliability coefficient over 0.8 (Mekhjian et al., 1999). However, partial or altogether missing methodological detail in the majority of studies makes it more difficult for researchers to systematically explore rating scale construction and the effect of different scale forms on the psychometric properties of a survey instrument.

    Rating Scale Construction

    Even in the midst of insufficient instrument details, a well-documented issue in the published literature is the lack of variability of responses in satisfaction research in healthcare. A general finding of patient satisfaction surveys in conventional healthcare delivery models is uniformly high levels of satisfaction (Avis, Bond, & Arthur, 1997; Sitzia & Wood, 1997). Similarly, systematic reviews on telemedicine satisfaction revealed no unfavorable effects and

  • 6

    had satisfaction rates greater than 80% (Currell et al., 2000; Mair & Whitten, 2000; Williams et al., 2001). Although there are many factors thought to contribute to high rates of reported satisfaction, the most common is response bias, including social desirability bias and acquiescent response bias (Sitzia & Wood, 1997). However, some suggest that survey design, and in particular the wording of questions, is an important factor that may also contribute to response set behaviours. There are many aspects of response variability that span both the methodological and conceptual dimensions of satisfaction. Regarding methodology, a common practice in survey design is to phrase some items positively and some negatively to minimize the 'halo' effect, as it is commonly believed that when respondents realize that statements are not always positive, they tend to rate each statement more carefully (Demiris, 2006). In the presence of positive response rates, there has also been debate over negative (or reverse) wording of the question stem, intended to protect against positive response set behaviours. Research on the practice of negatively worded items has found problems with internal consistency, factor structures, and other statistics when negatively worded stems are used either alone or together with directly worded stems (Barnette, 2000). Barnette (2000) explored alternatives to using negatively worded items: combinations of item stem direction and Likert response options were used to determine effects on reliability, and the condition with the highest reliability occurred when directly worded stems were coupled with bidirectional response options. Barnette (2000) suggested that directly worded item stems with bidirectional response labels should be selected instead of negative wording in practice. However, it is not within the scope of the present study to manipulate item wording as a way to create response variability. The current study will focus on Likert rating scale labels.

  • 7

    There are a number of factors that should be considered by researchers in developing

    Likert rating scales. These factors include: number of scale points; category of labels or anchors;

    assignment of numerical values in conjunction with labels; degree of labeling; scale width;

    semantic compatibility; equal interval properties or packedness of the scale (a function of choice

    of labels between the endpoints); centredness of the scale (a function of choice of labels for

    endpoints) (Kolic, 2004).
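    The structural distinction between packedness (the choice of labels between the endpoints) and centredness (the choice of the endpoints themselves) can be illustrated with hypothetical anchor positions on a psychological continuum. The positions below are invented for illustration and are not the scales used in this study.

```python
# Hypothetical positions of five anchors on a 0-100 psychological continuum.
# Packedness varies the interior labels while keeping the endpoints;
# centredness shifts the endpoints themselves toward one end.

scales = {
    "equal_interval_balanced": [0, 25, 50, 75, 100],   # full span, evenly spaced
    "positive_packed":         [0, 50, 70, 85, 100],   # full span, anchors crowded right
    "positive_centred":        [40, 55, 70, 85, 100],  # endpoints shifted right
}

def spacing(points):
    """Gaps between adjacent anchors; equal gaps imply an equal interval scale."""
    return [b - a for a, b in zip(points, points[1:])]

for name, pts in scales.items():
    print(name, spacing(pts))
```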

    There has been much debate about the implications of scale length and equal interval spacing. Likert scales should be carefully designed, since scale length may adversely affect scale reliability. It has been found that the optimal number of scale points is between four and seven (Lozano, Garcia-Cueto, & Muniz, 2008). Regarding the interval spacing between response choices, a common belief is that Likert rating scales should be designed with equally spaced response choices, since it is thought that respondents might respond to rating scales as if the response choices were equally spaced (Spector, 1980). In practice, however, rating scales have been designed even when the response choice words did not form an equal interval scale. To investigate whether unequally spaced response choices cause significant problems, Spector (1980) conducted three experiments and found that equally spaced response choices make the rater's task easier, and that respondents equalized the response choices when they were unequal.

    The choice of labels is an important factor to consider in scale construction and has been known to affect the interval property of the rating scale (Lam & Klockars, 1982; Lam & Stevens, 1994; Wildt & Mazis, 1978). Lam and Klockars (1982) found that the equal interval properties of a rating scale were dependent on the appropriate choice of labels to anchor the points on the rating scale. Furthermore, Lam and Stevens (1994) examined the effects of rating scale design and its interaction with content polarization and item word intensity, and found that responses can be influenced not only by the design of the rating scale but also by the interactions of the design with item content and wording. Therefore, the impact of scale labels on the variability of subjects' responses is dependent on the nature of the scale content and on the manner in which content is conveyed in each item.

    A number of studies have explored the technique of scale packedness. Lam and Klockars (1982) used evaluative labels in an education context with the following four types of rating scales: (a) endpoints labeled only; (b) labels equally spaced; (c) right or positive packed; (d) left or negative packed. The authors found that left or negatively packed scales produced the highest mean, right or positively packed scales produced the lowest mean, and that the equal interval scale and the endpoints-only labeled scale produced mean ratings that were equivalent and intermediate to the means of the positively and negatively packed scales (Lam & Klockars, 1982). They found that scales with only endpoints labeled produced results similar to scales with equally spaced response labels. Most of the research indicates that there is little difference between scales with all points labeled and scales with only the endpoints labeled. In Dixon et al. (1984), results did not show a significant difference between the end-defined and all-category-defined formats, nor did respondents indicate a format preference (Dixon, Bobo, & Stevick, 1984). In general, labeled scales tend to be endorsed more often than unlabeled scales if descriptors allow.

    Similar findings were presented in a Hancock and Klockars (1991) study using frequency labels on the following three scales: (a) 5–point balanced equal interval; (b) 9–point balanced equal interval; (c) 5–point right packed (Hancock & Klockars, 1991).

  • 9

    The study found that, in the case of frequency labels, the best discrimination came from the 9–point balanced scale, the second best from the 5–point positive packed scale, and the poorest from the 5–point balanced scale (Hancock & Klockars, 1991). Although the 9–point scale provided better discrimination and a higher mean correlation than the two shorter scales, the 5–point right packed scale produced the lowest mean (Hancock & Klockars, 1991). In a follow-up study by Klockars and Hancock (1993), the same experiment was employed but with evaluation labels (very poor to excellent). The positive packed scale differed significantly from the two balanced scales, whereas the two balanced scales were indistinguishable from each other (Klockars & Hancock, 1993).

    The implication of the Klockars and Hancock (1993) study for the construction of evaluation rating scales is that the effect of lengthening the scale or label packing it is minimal. The differences between frequency and evaluation rating scales were interpreted as due to the fact that evaluation labels lack the specificity of frequency labels: they show high semantic elasticity and do not have the absolute meanings that frequency labels do (Klockars & Hancock, 1993).

    There has not been much written on scale centredness. Kolic's (2004) doctoral dissertation explored the effects of scale centredness on responses, using a 2 x 3 factorial design that crossed two levels of scale centredness (left and right) with three levels of scale packedness (left packed, equal interval, and right packed) (Kolic, 2004). Only 5–point rating scales using frequency labels were employed in this study. The study found that the mean score for the left packed scale was higher than that for the equal interval scale, which in turn was higher than that of the right packed scale (Kolic, 2004). In addition, significant main effects were found for centredness and packedness (Kolic, 2004). The observed power for both centredness and packedness was also high, which suggests that

  • 10

    study results are stable and there is a high probability of obtaining results significant at the 0.05 level in study replications (Kolic, 2004). With regard to reliability coefficients, no significant differences were found. The highest reliability was found for left centred left packed scales and right centred left packed scales, and the lowest for left centred right packed scales and right centred right packed scales (Kolic, 2004). Kolic (2004) recommended the use of a right centred scale to obtain finer discrimination and ensure adequate response variability in an overly positive respondent population.

    A wide variety of rating scales are available to researchers; however, the selection of format is especially important when respondents are inclined toward a generally positive attitude towards the object being measured. Popular techniques to minimize the effects of a positive response set include packing a scale and reverse wording. Scale centredness, a newer technique, has not been explored with the use of evaluative labels. As health care research uses evaluation labels quite frequently, this study will use a centred rating scale with agreement response options within the context of measuring patient satisfaction with telemedicine health service delivery.

  • 11

    Chapter 3:

    Methodology

    Introduction

    The purpose of this research is to examine the effects of a positive (right) centred scale on the distribution and reliability of satisfaction responses. As previously mentioned, telemedicine satisfaction is an excellent context for this experiment as it is prone to an overly positive respondent population. This study makes the assumption that telemedicine respondents are telling the truth and are satisfied with the service.

    The two research questions are:

    1. To what extent does the centredness of a rating scale provide better discrimination of responses in a telemedicine satisfaction survey?

    2. What are the effects of a right centred scale on the internal consistency and reliability of satisfaction responses?

    The three hypotheses are:

    1. The positive (right) centred scale will produce a lower mean than the positive (right) packed scale, with the highest mean produced by the equal interval balanced scale.

    2. The positive (right) centred scale will produce a larger variance than the positive (right) packed scale, with the lowest variance produced by the equal interval balanced scale.

  • 12

    3. The positive (right) centred scale will produce a higher reliability coefficient than the positive (right) packed scale, with the lowest produced by the equal interval balanced scale.

    The independent variables are the following rating scales: positive (right) centred,

    positive (right) packed, and equal interval balanced (see Figure 1). The dependent variables are

    the distribution and variability of the rating responses, and the reliability of the survey

    instrument.

    Research Design

    Each participant will be randomly assigned to one of the three experimental Likert scale forms shown in Figure 1. The original survey scale (equal interval balanced) is one of the three experimental conditions and serves to determine a baseline.
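    As a rough sketch, simple per-participant random assignment to the three conditions might look as follows. The exact randomization procedure used in the study is not described here; the participant identifiers and seed are illustrative only.

```python
# Illustrative random assignment of participants to the three scale conditions.
import random

CONDITIONS = ["equal_interval_balanced", "positive_packed", "positive_centred"]

def assign(participant_ids, seed=2010):
    """Independently draw one of the three conditions for each participant."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    return {pid: rng.choice(CONDITIONS) for pid in participant_ids}

groups = assign(range(216))   # 216 participants, as in this study
print(len(groups))            # 216
```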

  • 13

    Figure 1. Rating scale anchor choice and scale types. [Figure not reproduced: the three scale forms arrayed along a psychological continuum.]

    Sample

    The target population is telemedicine patients. The study was conducted at a multi-site

    hospital in Toronto that uses telemedicine services for clinical appointments. A convenience

    sample was used consisting of patients who have been scheduled for a telemedicine appointment

    with a clinician at one of the participating hospital sites during the study time frame. This study

    has been approved by the University Health Network Research Ethics Board and the University

    of Toronto Office of Research Ethics.


    Survey Instrument

    The current study will use the existing telemedicine Patient Satisfaction Questionnaire

    (PSQ) developed by the Ontario Telemedicine Network (OTN). The survey consists of a total of
    31 items, 19 of which form the rating scale of interest. These 19 items are constructed as
    statements of opinion about patients' perceptions of telemedicine consultations and use the
    equal interval balanced rating scale. Only 17 of the 19 items are used to obtain the
    satisfaction score in the following six categories: expectations, access, technical,
    communication, privacy, and satisfaction.

    Statistics on the PSQ for the year 2008 found that overall satisfaction with telemedicine
    was 97%, with mean item ratings ranging from 4.28/5 to 4.62/5, a mode of 5, and reliability,
    measured by Cronbach's alpha, of 0.822 (Keresztes, Hartford, & Wilk, 2008). On this basis,
    this instrument can be utilized to examine the scale form conditions for this research.

    Table 1 presents the abbreviated item content for the PSQ items by respective subscore
    category and the direction of item wording (i.e., whether the item represents a favorable
    (+) or unfavorable (-) opinion about telemedicine consultations). Seventeen items are used
    to score the PSQ subscores. All items represent a favorable opinion except item 7Q, which
    represents an unfavorable opinion statement. This item will be reverse scored.


    Table 1

    PSQ Survey Items

    Item Abbreviated Item Content by Sub-score Group Direction of Wording

    SUB-SCORE 1: EXPECTATIONS

    7A “received enough notice” +

    7B “scheduled quickly” +

    7C “knew what to expect” +

    7D “see my health care provider sooner” +

    7Q “rather see in person than by telemedicine” - (reverse score)

    SUB-SCORE 2: ACCESS

    7M “felt as comfortable receiving care” +

    7N “easier for me to see the health care provider” +

    SUB-SCORE 3: TECHNICAL

    7E “see the health care provider clearly” +

    7F “hear the health care provider clearly” +

    SUB-SCORE 4: COMMUNICATION

    7G “could talk about the same information” +

    7H “was enough time” +

    7I “felt I was understood” +

    7J “next steps in my care were explained” +

    SUB-SCORE 5: PRIVACY

    7K “felt comfortable in the room” +

    7L “felt my privacy was respected” +

    SUB-SCORE 6: SATISFACTION

    7O “overall, I was satisfied” +

    7P “will use telemedicine again” +


    Items are grouped and scored as shown in Table 2. The items within each subscore

    category are summed to yield the score and then presented as an average.

    Table 2

    PSQ Scoring Rules for Patient Satisfaction

    Subscore Items

    1. Expectations 7A + 7B + 7C + 7D + 7Q

    2. Access 7M + 7N

    3. Technical 7E + 7F

    4. Communication 7G + 7H + 7I + 7J

    5. Privacy 7K + 7L

    6. Satisfaction 7O + 7P
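The scoring rules in Tables 1 and 2, including the reverse scoring of item 7Q, can be sketched in code. The following Python sketch is illustrative only: the function names and the example respondent are hypothetical, not part of the PSQ.

```python
# Sketch of the PSQ scoring rules shown in Table 2, assuming a 5-point
# scale where item 7Q is reverse scored (1 <-> 5, 2 <-> 4, 3 unchanged).

SUBSCORES = {
    "expectations": ["7A", "7B", "7C", "7D", "7Q"],
    "access": ["7M", "7N"],
    "technical": ["7E", "7F"],
    "communication": ["7G", "7H", "7I", "7J"],
    "privacy": ["7K", "7L"],
    "satisfaction": ["7O", "7P"],
}

REVERSE_SCORED = {"7Q"}  # the one unfavorably worded item

def reverse_score(value, scale_max=5):
    """Reverse a rating on a 1..scale_max scale."""
    return scale_max + 1 - value

def psq_subscores(responses):
    """Sum the items in each subscore category, then present the average."""
    scores = {}
    for name, items in SUBSCORES.items():
        values = [reverse_score(responses[item]) if item in REVERSE_SCORED
                  else responses[item] for item in items]
        scores[name] = sum(values) / len(values)
    return scores

# Hypothetical respondent: "strongly agree" (5) everywhere, except item 7Q
# ("rather see in person") rated 2, which reverse scores to 4.
example = {item: 5 for items in SUBSCORES.values() for item in items}
example["7Q"] = 2
print(psq_subscores(example)["expectations"])  # (5 + 5 + 5 + 5 + 4) / 5 = 4.8
```

Reverse scoring keeps a high subscore meaning high satisfaction even for the unfavorably worded item.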

    Procedure

    The study was introduced to eligible participants by telephone. When participants agreed

    to partake in the study, a letter requesting participation and consent (see Appendix A) was

    attached to the survey that was selected by random assignment and mailed to the participant.

    Participants were asked to circle the number for each statement that represents the opinion that is

    closest to his or her view. The survey and item wording of the PSQ were identical across all

    forms except for the response label choices on the scale of interest. Survey form A was the

    baseline equal interval balanced scale with the following response scale: Strongly Disagree,

    Disagree, Neutral, Agree, and Strongly Agree (see Appendix B). Survey form B employed the

    positive (right) packed scale in which subjects responded to the same statements on the scale of


    interest but using the response anchors of: Strongly Disagree, Neutral, Agree, Very Much Agree,

    and Strongly Agree (see Appendix C). Survey form C employed the positive (right) centred scale

    in which subjects responded to the same statements on the scale of interest but using the response

    anchors of: Disagree, Neutral, Agree, Very Much Agree, and Strongly Agree (see Appendix D).

    Data Collection

    To investigate the hypotheses, participants were randomly assigned to one of the three
    different forms of the survey. Participants were patients at a multi-site teaching hospital
    in Toronto who had a telemedicine appointment scheduled within the 2-month study time frame;
    an independent experimental design was utilized with this convenience sample. A total of 216
    surveys were sent by mail and 154 were returned by mail, for a study response rate of 71%.
    Returns by survey form were as follows: 54 equal interval balanced, 48 positive (right)
    packed, and 52 positive (right) centred survey forms.

    Data Analysis

    The data were analyzed using the Statistical Package for the Social Sciences (SPSS)
    Version 13.0. The first analysis examines the percentage of responses given for each label
    choice in the rating scale. The second analysis examines mean scores and standard deviations
    of the 17 rating scale items individually and then again by the six subscores. A one-way
    analysis of variance (ANOVA) was performed to compare the three survey conditions. In
    addition, reliability measures and Cronbach's alpha are examined to determine the internal
    consistency of responses for the three survey conditions.
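The one-way ANOVA comparison of the three survey conditions can be illustrated with a minimal pure-Python sketch. The data below are toy values, not the study responses; SPSS computes the same F ratio from between- and within-group sums of squares.

```python
# Minimal one-way ANOVA F statistic, the comparison applied to the three
# survey form conditions. Toy data only, not the actual PSQ responses.

def anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)                      # number of conditions
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-groups sum of squares: spread of the condition means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-groups sum of squares: spread inside each condition.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Three hypothetical rating groups with shifted means.
print(anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # (3.0, 2, 6)
```

A large F relative to its degrees of freedom indicates that the condition means differ more than within-condition noise would suggest.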


    Chapter 4:

    Results

    Descriptive Analysis

    Participants.

    The sample consisted of 74 females (48%) and 80 males (52%). The majority of
    participants (n = 90, 58%) were between 45–64 years of age, with the second largest group
    (n = 41, 27%) 65 years of age and over. The remaining 14% (n = 22) were between 22–44 years
    of age, with 1 respondent between 18–24 years of age.

    In terms of highest education level attained, the largest group of participants had a
    college (including technical, trade, and community) diploma (n = 52, 34%), with the second
    largest group reporting having obtained a high school diploma only (n = 50, 33%).
    Twenty-three participants (15%) had a university degree. Also, 20 participants completed
    grade school only and one participant indicated not completing grade school. There were nine
    missing values for education level.

    For 69 participants (45%), it was their first experience with a telemedicine appointment;
    however, the majority (n = 85, 55%) had used telemedicine before. For the majority of
    participants (n = 92, 60%), it was a subsequent (follow-up) appointment with their
    physician. Twenty-one participants (14%) reported that they had technical problems with the
    session.

    Survey.

    There was no missing data for the 17 rating scale items in the scale of interest. The

    descriptive statistics (means and standard deviations) for the participants' rating
    responses for each survey form (equal interval balanced, positive (right) packed, and
    positive (right) centred) are presented.


    Frequency of Use of Scale Labels

    Figures 2, 3, and 4 display the percentage of responses given to each scale label for each

    survey form condition.

    Figure 2. Equal interval balanced scale. Response percentages: Strongly Disagree 0.3%,
    Disagree 0.9%, Neutral 6.3%, Agree 39.5%, Strongly Agree 52.9%.

    Figure 3. Positive packed scale. Response percentages: Strongly Disagree 1.0%, Neutral
    3.9%, Agree 6.1%, Very Much Agree 24.1%, Strongly Agree 64.8%.


    Figure 4. Positive centred scale. Response percentages: Disagree 1.4%, Neutral 2.7%,
    Agree 5.3%, Very Much Agree 28.8%, Strongly Agree 61.8%.

    Over 92% of the participants selected a positive choice ("agree" to "strongly agree") in
    all three survey form conditions. Comparison of the frequency of scale label choice across
    the three survey forms indicates that the majority of respondents selected the "strongly
    agree" label and continued to do so in both the positive (right) packed and positive (right)
    centred scale conditions. However, in the positive (right) packed and positive (right)
    centred scale conditions, there is substantial use of the "very much agree" label and
    minimal use of the "agree" label. For example, in the equal interval balanced form, the
    "agree" label was used by subjects 39.5% of the time, but only 6.1% of the time in the
    packed scale and 5.3% in the positive (right) centred scale. In the equal interval balanced
    scale, it could be hypothesized that respondents round down their choice and select "agree."
    In the positive (right) centred and positive packed scales, however, respondents seem to
    round up their choice to "very much agree." The presence of an intermediate choice, "very
    much agree," in the positive (right) centred and positive packed scales should produce a
    higher mean score. Therefore the positive (right) centred and positive packed scales may
    reflect a more accurate level of positivity than the equal interval balanced scale
    condition.
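This rounding-up effect can be checked by weighting each form's five scale points (coded 1 to 5) by the response percentages reported in Figures 2-4. This is an approximate back-of-envelope calculation, not an analysis from the study itself.

```python
# Approximate mean implied by each form's response percentages (Figures
# 2-4), coding each form's five labels as scale points 1 to 5.

def weighted_mean(percentages):
    """Mean rating when points 1..5 receive the given percentages."""
    total = sum(percentages)
    return sum(point * p for point, p in enumerate(percentages, 1)) / total

balanced = [0.3, 0.9, 6.3, 39.5, 52.9]  # SD, D, N, A, SA
packed = [1.0, 3.9, 6.1, 24.1, 64.8]    # SD, N, A, VMA, SA
centred = [1.4, 2.7, 5.3, 28.8, 61.8]   # D, N, A, VMA, SA

for name, dist in [("balanced", balanced), ("packed", packed),
                   ("centred", centred)]:
    print(name, round(weighted_mean(dist), 2))
# balanced 4.44, packed 4.48, centred 4.47: both positively saturated
# forms imply a slightly higher mean than the equal interval form.
```

The implied ordering (balanced lowest, packed highest) matches the rounding-up interpretation above, although the label anchors differ across forms, so the coded points are only roughly comparable.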


    Means and Standard Deviations

    Overall satisfaction results ranged from 92% to 96% across all three survey forms, with
    mean scores ranging from 3.52/5 to 4.81/5 and a mode of 5. In the equal interval balanced
    scale condition, mean scores ranged from 3.70 to 4.65, with a mode of 5. In the positive
    (right) packed scale condition, mean scores ranged from 3.52 to 4.81, with a mode of 5. In
    the positive (right) centred scale condition, mean scores ranged from 3.88 to 4.77, with a
    mode of 5. The results of the current study are slightly lower than, but comparable to, the
    PSQ baseline data from 2008, which found overall satisfaction with telemedicine of 97%, with
    mean item ratings ranging from 4.28/5 to 4.62/5 and a mode of 5 (Keresztes et al., 2008).
    The individual item mean scores are graphically displayed in scatterplot and boxplot format
    in Figures 5 and 6, which illustrate the overlap in the distribution of mean scores across
    the 17 individual items. Of particular interest, item 7Q ("rather see in person than by
    telemedicine") produces the lowest mean in all three survey forms (see Figures 5-6).


    Figure 5. Scatterplot of mean scores for individual items.


    Figure 6. Boxplot of mean score data for individual items.


    Individual Item Analysis

    The means and standard deviations for the 17 individual items are listed in Table 3.

    Table 3

    Mean and Standard Deviations by Individual Items

    Item  Scale condition           N   Mean         Standard deviation
    7A    Equal Interval Balanced   54  4.43   Low   0.838   High
          Positive (Right) Packed   48  4.77   High  0.472   Low
          Positive (Right) Centred  52  4.67   Med   0.648   Med
    7B    Equal Interval Balanced   54  4.35   Low   0.828   Med
          Positive (Right) Packed   48  4.52   High  0.850   High
          Positive (Right) Centred  52  4.38   Med   0.796   Low
    7C    Equal Interval Balanced   54  4.28   Med   0.878   Low
          Positive (Right) Packed   48  4.31   High  0.949   Med
          Positive (Right) Centred  52  4.02   Low   1.129   High
    7D    Equal Interval Balanced   54  4.39   High  0.712   Low
          Positive (Right) Packed   48  4.27   Med   1.216   High
          Positive (Right) Centred  52  4.23   Low   1.096   Med
    7E    Equal Interval Balanced   54  4.65   Med   0.482   Med
          Positive (Right) Packed   48  4.81   High  0.394   Low
          Positive (Right) Centred  52  4.62   Low   0.690   High
    7F    Equal Interval Balanced   54  4.50   Low   0.505   Low
          Positive (Right) Packed   48  4.69   High  0.624   High
          Positive (Right) Centred  52  4.63   Med   0.561   Med
    7G    Equal Interval Balanced   54  4.56   Med   0.572   Low
          Positive (Right) Packed   48  4.67   High  0.7532  High
          Positive (Right) Centred  52  4.54   Low   0.7531  Med
    7H    Equal Interval Balanced   54  4.46   Low   0.573   Med
          Positive (Right) Packed   48  4.63   High  0.570   Low
          Positive (Right) Centred  52  4.50   Med   0.780   High
    7I    Equal Interval Balanced   54  4.54   Med   0.539   Low
          Positive (Right) Packed   48  4.50   Low   0.772   High
          Positive (Right) Centred  52  4.62   High  0.599   Med
    7J    Equal Interval Balanced   54  4.50   Low   0.541   Low
          Positive (Right) Packed   48  4.60   High  0.644   High
          Positive (Right) Centred  52  4.56   Med   0.608   Med
    7K    Equal Interval Balanced   54  4.48   Low   0.574   Low
          Positive (Right) Packed   48  4.58   High  0.679   Med
          Positive (Right) Centred  52  4.50   Med   0.897   High
    7L    Equal Interval Balanced   54  4.54   Low   0.503   Low
          Positive (Right) Packed   48  4.60   Med   0.707   High
          Positive (Right) Centred  52  4.62   High  0.661   Med
    7M    Equal Interval Balanced   54  4.26   Med   0.732   Low
          Positive (Right) Packed   48  4.25   Low   1.042   High
          Positive (Right) Centred  52  4.42   High  0.936   Med
    7N    Equal Interval Balanced   54  4.65   High  0.555   Low
          Positive (Right) Packed   48  4.25   Low   1.120   High
          Positive (Right) Centred  52  4.42   Med   0.997   Med
    7O    Equal Interval Balanced   54  4.556  Low   0.572   Low
          Positive (Right) Packed   48  4.563  Med   0.580   Med
          Positive (Right) Centred  52  4.60   High  0.721   High
    7P    Equal Interval Balanced   54  4.63   Med   0.525   Med
          Positive (Right) Packed   48  4.60   Low   0.676   High
          Positive (Right) Centred  52  4.77   High  0.425   Low
    7Q    Equal Interval Balanced   54  3.70   Med   0.903   Low
          Positive (Right) Packed   48  3.52   Low   1.111   High
          Positive (Right) Centred  52  3.88   High  0.943   Med

    The values across the three survey forms are very similar. The mean score was lowest for
    the equal interval balanced scale condition in 8 out of 17 items, intermediate for the
    positive centred scale condition in 7 out of 17 items, and highest for the positive packed
    scale condition in 9 out of 17 items. The standard deviation was highest for the positive
    packed scale condition in 11 out of 17 items and lowest for the equal interval balanced
    scale condition in 12 out of 17 items, with the positive centred scale condition at the
    intermediate position in 10 out of 17 items.


    Subscore Category Analysis

    The mean and standard deviations for the six subscore categories are reported in Table 4.

    Table 4

    Mean and Standard Deviations by PSQ Subscore

    Subscore          Scale condition           N   Mean          Standard deviation
    1. Expectations   Equal Interval Balanced   54  4.2296  Low   0.47963  Low
                      Positive (Right) Packed   48  4.2792  High  0.58599  High
                      Positive (Right) Centred  52  4.2385  Med   0.57263  Med
    2. Access         Equal Interval Balanced   54  4.4537  High  0.51641  Low
                      Positive (Right) Packed   48  4.2500  Low   0.89917  High
                      Positive (Right) Centred  52  4.4231  Med   0.78830  Med
    3. Technical      Equal Interval Balanced   54  4.5741  Low   0.45977  Low
                      Positive (Right) Packed   48  4.7500  High  0.47266  Med
                      Positive (Right) Centred  52  4.6250  Med   0.60127  High
    4. Communication  Equal Interval Balanced   54  4.5139  Low   0.48422  Low
                      Positive (Right) Packed   48  4.5990  High  0.55000  Med
                      Positive (Right) Centred  52  4.5529  Med   0.61504  High
    5. Privacy        Equal Interval Balanced   54  4.5093  Low   0.51844  Low
                      Positive (Right) Packed   48  4.5938  High  0.65766  Med
                      Positive (Right) Centred  52  4.5577  Med   0.75831  High
    6. Satisfaction   Equal Interval Balanced   54  4.5926  Med   0.50539  Low
                      Positive (Right) Packed   48  4.5833  Low   0.58649  High
                      Positive (Right) Centred  52  4.6827  High  0.52421  Med


    As in the individual item analysis, the values across the three survey forms are similar.
    The mean scores were lowest for the equal interval balanced scale condition in 4 subscore
    categories (4/6), with the highest mean scores occurring in the positive packed scale
    condition (4/6). The positive centred scale condition was at the intermediate position in 5
    subscore categories (5/6).

    The standard deviation scores were the lowest for the equal interval balanced scale
    condition in all 6 subscore categories. The positive packed and positive centred scales
    performed equally, tying for both the highest and the intermediate positions with 3 subscore
    categories each.

    Summary: Means and Standard Deviations

    A summary of the mean scores and standard deviations are listed in Table 5. Overall, the

    findings were consistent for the equal interval balanced scale only across the individual item and

    subscore category analyses.

    Table 5

    Summary of Means and Standard Deviations

    Analysis                    Scale condition           Mean  Standard deviation
    Individual item analysis    Equal Interval Balanced   Low   Low
                                Positive (Right) Packed   High  High
                                Positive (Right) Centred  Med   Med
    Subscore category analysis  Equal Interval Balanced   Low   Low
                                Positive (Right) Packed   Med   Equal position (Med-High)
                                Positive (Right) Centred  High  Equal position (Med-High)


    The equal interval balanced scale condition had the lowest mean score in both the
    individual item and subscore category analyses. However, the intermediate and highest mean
    score positions were not consistent: the individual item analysis found the positive packed
    scale condition producing a higher mean score than the positive centred scale condition,
    whereas the subscore category analysis found the opposite, with the positive centred scale
    condition producing the highest mean score (see Table 5).

    Regarding the variance as measured by the standard deviation, the equal interval scale
    condition had the lowest standard deviation score. This result was consistent across both
    the individual item and subscore category analyses. The individual item analysis found that
    the standard deviation was highest for the positive packed scale condition, with the
    positive centred scale condition at the intermediate position. In contrast, the subscore
    analysis revealed equal positions for the positive packed and positive centred conditions
    across the highest and intermediate positions.

    Analysis of Variance (ANOVA)

    An analysis of variance (ANOVA) procedure was conducted to determine whether there

    are significant differences across the three survey forms. One-way ANOVA assumes that the

    variances of the conditions are equal. The sample size of each survey form condition differs and

    is as follows: 54 in the equal interval balanced scale condition, 48 in the positive packed scale

    condition, and 52 in the positive centred scale condition. A test of homogeneity of variances was

    conducted to assess the assumption of equal variances across the three survey form
    conditions and to determine whether the data are appropriate for the ANOVA procedure.

    Tables 6 and 7 list the results of the Levene test for homogeneity of variances.


    Table 6

    Levene’s Test of Homogeneity of Variances for Individual Items

    Levene statistic df1 df2 Sig.

    7A 6.358 2 151 .002 significant

    7B .061 2 151 .941 n.s

    7C 1.096 2 151 .337 n.s

    7D 4.421 2 151 .014 significant

    7E 6.285 2 151 .002 significant

    7F .177 2 151 .838 n.s

    7G .690 2 151 .503 n.s

    7H 1.614 2 151 .203 n.s

    7I 2.185 2 151 .116 n.s

    7J .183 2 151 .833 n.s

    7K 1.120 2 151 .329 n.s

    7L .488 2 151 .615 n.s

    7M 2.816 2 151 .063 n.s

    7N 6.707 2 151 .002 significant

    7O .042 2 151 .959 n.s

    7P 5.843 2 151 .004 significant

    7Q 2.711 2 151 .070 n.s


    Table 7

    Levene’s Test of Homogeneity of Variances for Subscore

    Levene statistic df1 df2 Sig.

    Expectations 1.429 2 151 .243 n.s

    Access 6.767 2 151 .002 significant

    Technical 2.606 2 151 .077 n.s

    Communication 1.220 2 151 .298 n.s

    Privacy .768 2 151 .466 n.s

    Satisfaction 1.028 2 151 .360 n.s

    The significance value exceeded 0.05 for the majority of the items (12/17) in the
    individual item analysis and for 5 out of 6 categories in the subscore category analysis,
    suggesting that the variances of the three survey conditions are equal and the assumption is
    justified. The results of the one-way ANOVA procedure are presented in Table 8 for the
    individual item analysis and in Table 9 for the subscore category analysis.

    The Levene statistic can also be used as a significance test of differences in standard
    deviations. On this reading, the individual item analysis demonstrated significance in 5 out
    of 17 items, and the subscore category analysis in 1 out of 6 subscore categories.
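The Levene test reported in Tables 6 and 7 is equivalent to running a one-way ANOVA on the absolute deviations of each observation from its group mean. A minimal sketch with hypothetical data (not the study responses):

```python
# Levene's statistic: a one-way ANOVA performed on the absolute
# deviations of each observation from its group mean. Hypothetical data.

def one_way_f(groups):
    """F statistic, df1, df2 for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k)), k - 1, n - k

def levene_statistic(groups):
    """Levene's W: ANOVA on |x - group mean| (mean-centred variant)."""
    deviations = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    return one_way_f(deviations)

# Two hypothetical conditions with visibly different spreads.
narrow = [4, 5, 4, 5, 4, 6]
wide = [1, 5, 2, 6, 1, 5]
w, df1, df2 = levene_statistic([narrow, wide])
print(round(w, 2), df1, df2)  # a large W flags unequal variances
```

SPSS reports this statistic with its significance value; a non-significant result, as for most items above, supports the equal-variance assumption.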


    Table 8

    ANOVA Table for Individual Items

    Item  Source          Sum of squares  df   Mean square  F      Sig.
    7A    Between Groups  3.271           2    1.635        3.573  .030
          Within Groups   69.125          151  .458
          Total           72.396          153
    7B    Between Groups  .801            2    .400         .589   .556
          Within Groups   102.602         151  .679
          Total           103.403         153
    7C    Between Groups  2.633           2    1.317        1.342  .264
          Within Groups   148.127         151  .981
          Total           150.760         153
    7D    Between Groups  .716            2    .358         .343   .710
          Within Groups   157.543         151  1.043
          Total           158.260         153
    7E    Between Groups  1.104           2    .552         1.897  .154
          Within Groups   43.935          151  .291
          Total           45.039          153
    7F    Between Groups  .967            2    .484         1.526  .221
          Within Groups   47.870          151  .317
          Total           48.838          153
    7G    Between Groups  .480            2    .240         .496   .610
          Within Groups   72.923          151  .483
          Total           73.403          153
    7H    Between Groups  .720            2    .360         .854   .428
          Within Groups   63.676          151  .422
          Total           64.396          153
    7I    Between Groups  .351            2    .175         .429   .652
          Within Groups   61.734          151  .409
          Total           62.084          153
    7J    Between Groups  .278            2    .139         .391   .677
          Within Groups   53.806          151  .356
          Total           54.084          153
    7K    Between Groups  .293            2    .147         .276   .759
          Within Groups   80.148          151  .531
          Total           80.442          153
    7L    Between Groups  .190            2    .095         .242   .785
          Within Groups   59.213          151  .392
          Total           59.403          153
    7M    Between Groups  .976            2    .488         .594   .553
          Within Groups   124.063         151  .822
          Total           125.039         153
    7N    Between Groups  4.077           2    2.039        2.443  .090
          Within Groups   126.007         151  .834
          Total           130.084         153
    7O    Between Groups  .049            2    .025         .062   .940
          Within Groups   59.665          151  .395
          Total           59.714          153
    7P    Between Groups  .808            2    .404         1.346  .263
          Within Groups   45.303          151  .300
          Total           46.110          153
    7Q    Between Groups  3.305           2    1.652        1.702  .186
          Within Groups   146.546         151  .971
          Total           149.851         153


    Table 9

    ANOVA Table for Subscore Categories

    Subscore       Source          Sum of squares  df   Mean square  F      Sig.
    Expectations   Between Groups  .070            2    .035         .117   .890
                   Within Groups   45.055          151  .298
                   Total           45.124          153
    Access         Between Groups  1.201           2    .601         1.082  .342
                   Within Groups   83.827          151  .555
                   Total           85.028          153
    Technical      Between Groups  .821            2    .411         1.545  .217
                   Within Groups   40.141          151  .266
                   Total           40.963          153
    Communication  Between Groups  .184            2    .092         .302   .740
                   Within Groups   45.937          151  .304
                   Total           46.121          153
    Privacy        Between Groups  .184            2    .092         .217   .805
                   Within Groups   63.900          151  .423
                   Total           64.084          153
    Satisfaction   Between Groups  .309            2    .155         .534   .587
                   Within Groups   43.718          151  .290
                   Total           44.028          153

    Overall, the ANOVA procedure did not find significance in either the individual item or
    the subscore category analysis. The subscore category analysis found no significance in any
    of the six categories (see Table 9). The individual item analysis yielded a significance
    level exceeding 0.05 in 16 out of 17 items, which suggests that there are no group
    differences overall (see Table 8). The ANOVA revealed a statistically significant difference
    for only one item (7A) from the individual item analysis: F(2, 151) = 3.573, p = .030. To
    determine which group(s) differ, the Tukey post hoc test was conducted. Table 10 lists the
    pairwise comparison of the group means for the Tukey post hoc procedure.


    Table 10

    Tukey HSD Procedure for Item 7A

    Scale condition           N   Subset 1 (alpha = .05)  Subset 2 (alpha = .05)
    Equal Interval Balanced   54  4.43
    Positive (Right) Centred  52  4.67                    4.67
    Positive (Right) Packed   48                          4.77
    Sig.                          .158                    .745

    The Tukey HSD procedure indicates that the equal interval balanced scale condition differs from

    the positive (right) packed scale condition (see Table 10). However, the equal interval balanced

    scale condition does not differ from the positive (right) centred scale condition. In addition, the

    positive (right) packed and positive (right) centred scale conditions do not differ from each other.

    Reliability Analysis

    The values are similar across all three survey forms. Overall, the reliability analysis
    yielded high Cronbach alpha coefficients for all three survey forms, ranging from 0.888 to
    0.907 (see Table 11). A high Cronbach alpha coefficient is consistent with the PSQ baseline
    data from 2008, which reported a Cronbach alpha of 0.822. Cronbach alpha was 0.888 for the
    equal interval balanced scale, 0.898 for the positive packed scale, and 0.907 for the
    positive centred scale. The lowest reliability occurred with the equal interval balanced
    scale and the highest with the positive centred scale.


    Table 11

    Reliability Analysis

    Scale                     N   M      SD     Cronbach alpha  Variance
    Equal Interval Balanced   17  75.46  6.641  0.888           44.102
    Positive (Right) Packed   17  76.15  8.460  0.898           71.574
    Positive (Right) Centred  17  75.98  8.644  0.907           74.725
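Cronbach's alpha, as reported in Table 11, is computed as k/(k-1) x (1 - sum of item variances / variance of total scores). A small Python sketch with hypothetical ratings (not the PSQ data):

```python
# Cronbach's alpha from a matrix of ratings (rows = respondents,
# columns = items). Hypothetical ratings, not the PSQ data.

def variance(xs):
    """Sample variance (n - 1 denominator), as SPSS reports."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])                   # number of items
    columns = list(zip(*rows))         # item-wise scores
    item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Two items that move together perfectly give alpha = 1.0.
print(cronbach_alpha([[1, 1], [3, 3], [5, 5]]))  # 1.0
```

Alpha rises as items covary more strongly relative to their individual variances, which is why the highly intercorrelated PSQ items yield values near 0.9.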


    Chapter 5:

    Discussion and Conclusions

    Discussion

    The current study examines the effects of a positive (right) centred scale on the
    distribution and reliability of satisfaction responses in the health care domain. The
    research questions examine:

    1. the extent to which the centredness of a rating scale provides better discrimination
    of responses in a telemedicine satisfaction survey; and

    2. the effects of a right-centred scale on the internal consistency and reliability of
    satisfaction responses.

    Three hypotheses were presented in the current study.

    Hypothesis 1 stated that the positive (right) centred scale will produce a lower
    mean than the positive (right) packed scale, with the highest mean produced by the
    equal interval balanced scale.

    This hypothesis was not supported by the current study. No significance was found for mean
    score differences. The results indicate that the equal interval balanced scale condition
    yielded the lowest mean score. This finding was consistent in both the individual item and
    subscore category analyses. However, the highest and intermediate mean score positions
    varied between the two analyses. In the individual item analysis, the highest mean score was
    produced by the positive packed scale, with the positive centred scale at the intermediate
    position. In the subscore category analysis, the positive centred scale produced the highest
    mean score, with the positive packed scale at the intermediate position.


    The mean score findings are not consistent with Kolic's (2004) findings on scale
    centredness using frequency labels, which found significant main effects for both
    centredness and packedness, producing the lowest mean score in the positive (right) centred
    scale (M = 2.16), an intermediate mean score in the positive (right) packed scale
    (M = 2.24), and the highest mean score in the equal interval scale (M = 2.37) (Kolic, 2004).
    However, the current study findings are similar to those of Klockars and Hancock (1993) on
    packedness using evaluation labels, which found minimal effects and comparable
    discrimination for a 5–point positive packed scale, a 5–point equal interval balanced scale,
    and a 9–point equal interval balanced scale. Klockars and Hancock (1993) explain that the
    differences between evaluation labels and frequency labels may be due to the high semantic
    elasticity of the labels used; for example, evaluation labels do not have as absolute a
    meaning for respondents as frequency labels do (Klockars & Hancock, 1993).

    This may be a plausible explanation for the differences between the current study
    findings, which used evaluation labels, and Kolic's (2004) study, which used frequency
    labels. Lam and Kolic (2008) used an equal interval rating scale identical to the equal
    interval balanced scale condition employed in the current study, and therefore a similar
    effect should be expected for the mean score results. However, this was not the case. Lam
    and Kolic (2008) found that a matched condition produces a lower mean for the positive
    packed scale, whereas in a mismatched condition the means of the positive packed and equal
    interval scales are equal (Lam & Kolic, 2008). The opposite effect was observed in the
    current study, in which mean scores were lowest in the equal interval scale rather than the
    positive packed scale, which supports neither conclusion.

    A more plausible explanation lies in respondent strategies. Although saturating a scale with

    positive anchors, as in a positive packed scale, would be expected to produce lower mean scores

    for the equal interval scale, as attested in Lam and Kolic (2008), the differences

    observed in the current study may be due to the location on the scale where the saturation took

    place. For example, in the current study, the presence of an intermediate choice "very much

    agree" in the positive centred and positive packed scales produces a higher mean score.

    Theoretically, it is probable that respondents used a rounding strategy (rounding up to "very much

    agree") in the presence of an intermediate option between "agree" and "strongly agree", which

    may result in a lower mean in the equal interval scale condition. However, Lam and Kolic's

    (2008) matched condition used the "somewhat agree" label, which expanded response options to the

    left of "agree" (somewhat disagree, somewhat agree, agree, strongly agree). The mean

    score results in the current study are therefore consistent with a saturation of positive labels that

    produced a "rounding up" of responses from "agree" to "very much agree", in turn yielding

    higher mean scores in the positive packed and positive centred scales. This is a likely explanation

    for the lower mean scores found with the equal interval balanced scale.
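    The rounding-up account above can be illustrated with a small sketch. The label orderings and the "1 = most positive" coding below are assumptions made for illustration only; they are not the study's actual forms or data.

```python
# Hypothetical illustration of the rounding-up strategy. The label orderings
# and the 1 = most-positive coding are assumptions, not the study's forms.
equal_interval = ["strongly agree", "agree", "neutral", "disagree", "strongly disagree"]
positive_packed = ["strongly agree", "very much agree", "agree", "neutral", "strongly disagree"]

def code(scale, label):
    """Numeric code = 1-based position of the label on the scale."""
    return scale.index(label) + 1

# A positive respondent who would mark "agree" on the equal interval form:
print(code(equal_interval, "agree"))             # 2
# On the packed form, "agree" sits one step further from the positive end:
print(code(positive_packed, "agree"))            # 3
# Rounding up to the intermediate "very much agree" yields the lower code:
print(code(positive_packed, "very much agree"))  # 2
```

    The sketch shows why the location of the saturation matters: the same latent level of agreement maps to different numeric codes depending on where the extra positive anchor is inserted and on whether the respondent rounds up to it.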

    Other plausible explanations involve response bias. Lam and Kolic (2008)

    suggested that if scale labels are not compatible with item wording, the respondent will resort to

    satisficing or will simply ignore the labels presented in the scale. For example,

    acquiescence bias is a form of satisficing, defined as the tendency to give positive responses

    or "yea-saying" (Streiner & Norman, 2008). Lam and Kolic (2008) attribute the absence of

    significant findings on semantic compatibility and variances in their study to a possible ceiling

    effect. A ceiling effect occurs when responses are not evenly distributed but show a positive

    skew toward the favourable end, making it impossible to distinguish among the various

    levels of excellence (Streiner & Norman, 2008). A possible method to counteract this bias is to

    offset the middle of the scale and expand in the area of interest (Streiner & Norman, 2008). Scale

    centredness may be a viable strategy to counteract a ceiling effect, as it offsets the middle and

    expands the area of interest. The current study found non-significant effects, although the

    distribution occurred in the manner hypothesized.

    Another plausible explanation concerns the cognitive burden placed on respondents. Spector (1980)

    argued against using unequal scales because they increase the cognitive burden on respondents.

    He further notes that cognitive burden particularly affects older respondents and

    respondents with lower education levels, who may adopt the strategy of selecting one choice and

    using it throughout their responses (Spector, 1976). The respondent population in the current study

    has similar characteristics, with 85% of respondents over the age of 45. This may be a plausible

    explanation for the current study's findings.

    Hypothesis 2 stated that the positive (right) centred scale will produce a larger

    variance than the positive (right) packed scale, with the lowest variance produced

    by the equal interval balanced scale.

    The hypothesis is supported. The current study found that the equal interval scale condition

    produced the lowest standard deviation. This result was consistent across both the individual

    item and the subscore category analyses. The individual item analysis found that the standard

    deviation was highest for the positive centred scale condition, with the positive packed scale

    condition in the intermediate position. The subscore analysis, in contrast, found the positive

    packed and positive centred conditions tied across the highest and intermediate positions. No

    significant differences were found. The result for the positive packed and equal interval balanced

    scale conditions is consistent with Lam and Kolic (2008), who found that the standard deviation

    in the matched condition using evaluation labels was higher for the positive packed scale and

    lower for the equal interval scale. In Kolic's (2004) study on centredness, the effects of the

    different centred scales on response variability were not fully investigated; however, a

    higher standard deviation was reported for the negative (left) centred scale than for the positive

    (right) centred scale. A comparison with the variability of responses to the positive centred

    scale in the current study was therefore not possible.

    Hypothesis 3 stated that the positive (right) centred scale will produce a higher

    reliability coefficient than the positive (right) packed scale with the lowest

    produced for the equal interval balanced scale.

    This hypothesis was supported by the current study. High reliabilities were observed for all three

    survey forms, consistent with the reported reliability of the baseline PSQ. Reliability

    measured with Cronbach's alpha was 0.888 for the equal interval balanced scale, 0.898 for the

    positive packed scale, and 0.907 for the positive centred scale. The lowest reliability occurred

    with the equal interval balanced scale and the highest with the positive centred scale. However,

    Lam and Kolic (2008) reported a lower alpha for the positive packed scale condition than for

    the equal interval scale condition in both the matched and mismatched conditions. That finding

    is not consistent with the current study, although the Cronbach's alpha values across all three

    surveys are very close.
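    The Cronbach's alpha values reported above follow the standard internal-consistency formula. The sketch below shows the computation; the example response matrix is invented for illustration and is not the study's data.

```python
# A minimal sketch of the Cronbach's alpha computation; the example matrix
# is invented, not study data.
def cronbach_alpha(items):
    """items: list of per-item response lists of equal length (one entry per
    respondent). Returns k/(k-1) * (1 - sum(item variances) / variance of totals)."""
    def var(xs):  # sample variance (ddof = 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]  # each respondent's total score
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Two perfectly consistent items give alpha = 1.0:
print(cronbach_alpha([[1, 2, 3], [2, 3, 4]]))  # 1.0
```
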

    Conclusion

    The current study addresses an interesting issue in rating scale design and has practical

    implications for survey research, particularly satisfaction research. It agrees with

    Kolic's (2004) recommendation that future research on scale centredness is needed.

    Although there were no significant findings, the performance of the positive (right) centred scale

    on the distribution of rating responses while maintaining high reliability is noteworthy. In a

    positive respondent population, it is recommended that the positive (right) centred scale

    be preferred over a positive packed scale, as it maintains to a larger degree the equal interval

    properties of a scale, does not distort results, and enables respondents to select a more

    accurate level of positiveness.

    Limitations of the Study

    The following are the limitations of the current study:

    1. Small sample size. The mean scores, standard deviations and reliability alpha

    coefficients were very similar, which made comparison across the survey forms

    difficult. A larger sample size should be considered in future studies.

    2. The current study used a convenience sample of telemedicine patients who had a

    scheduled appointment during the study period. Future research should employ a

    random sampling method.

    3. Apart from Kolic's (2004) doctoral dissertation, there was no empirical

    research in the area of scale centredness. In addition, Kolic's (2004) study combined the

    selected levels of scale packedness and scale centredness in a factorial design

    producing six frequency scales, which made comparison with the current study

    difficult.

    4. The labels for the positive packed and positive centred scales were not

    selected from pre-determined scale values. Using pre-determined scale values to select labels

    would provide a more precise measurement of intensity, but pre-determining scale

    values for the positive packed and positive centred scale conditions was beyond the

    scope of the current study. Although Kolic (2004) used pre-determined scale values

    from an additional experiment, her scale labels were not identical to those of the current study,

    as they excluded the labels "neutral" and "strongly disagree." The current

    study therefore could not generate a comparison using pre-determined scale values. Kolic's

    study also differs from the current study in that it used frequency labels and derived scale

    values from her experiment to normalize her analysis. The current study did

    not normalize the scale and used the scale values of 1, 2, 3, 4, 5

    for the positive packed and positive centred scale conditions. However, it is plausible

    to transform the scale while preserving the Likert values of 1–5, which are the actual

    weighted values confirmed by Likert, and insert 4.5 for the label "very much agree",

    the midpoint between "agree" and "strongly agree."


    Figure 7 illustrates a normalized scale for the current study.

    Figure 7. Rating scale anchor choice and scale types (arranged along the psychological continuum).

    Normalizing the scale values was beyond the scope of this study, which aimed to mirror

    practical survey design: survey designers typically do not pre-determine values when

    constructing a rating scale, but select them arbitrarily. An analysis using the normalized

    scale values may yield significant results.
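    The normalization described above can be sketched in a few lines. The label-to-value mapping and the "5 = strongly agree" coding direction below are assumptions for illustration; only the 4.5 value for "very much agree" comes from the text.

```python
# Sketch of the normalization described above: keep the Likert values 1-5 for
# the original anchors and insert 4.5 for "very much agree". The coding
# direction (5 = strongly agree) is an assumption for illustration.
normalized_values = {
    "strongly disagree": 1.0,
    "disagree": 2.0,
    "neutral": 3.0,
    "agree": 4.0,
    "very much agree": 4.5,  # midpoint between "agree" (4) and "strongly agree" (5)
    "strongly agree": 5.0,
}

# Mean satisfaction for a hypothetical run of positive responses:
responses = ["agree", "very much agree", "strongly agree"]
mean = sum(normalized_values[r] for r in responses) / len(responses)
print(round(mean, 2))  # 4.5
```

    Scoring "very much agree" at 4.5 rather than at an integer position keeps the packed and centred forms on the same 1–5 metric as the equal interval form, which is what would make the mean scores directly comparable.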


    Suggestions for Future Research

    Additional research is recommended to explore the effects of scale

    centredness on rating scale responses. Future research should include a larger sample size and

    should employ qualitative methods to further explore respondent strategies. It is important to note

    that much of the past research on scale packedness and centredness has been conducted in

    education settings using university student populations. The current study employed a known

    positive respondent population and an existing survey instrument producing high satisfaction

    rates. Differences found in the current research may very likely be attributed to the respondent

    population. Future research on scale centredness should include the health care sector, as it may

    prove to be an optimal setting for exploring rating scale techniques for positive respondent

    populations.


    References

    Avis, M., Bond, M., & Arthur, A. (1997). Questioning patient satisfaction: An empirical

    investigation in two outpatient clinics. Social Science & Medicine, 44(1), 85-92.

    Barnette, J. J. (2000). Effects of stem and Likert response option reversals on survey internal

    consistency: If you feel the need, there is a better alternative to using those negatively

    worded stems. Educational and Psychological Measurement, 60(3), 361-370.

    Castle, N. G., Brown, J., Hepner, K. A., & Hays, R. D. (2005). Review of the literature on survey

    instruments used to collect data on hospital patients' perceptions of care. Health Services

    Research, 40(6 Pt 2), 1996-2017.

    Currell, R., Urquhart, C., Wainwright, P., & Lewis, R. (2000). Telemedicine versus face to face

    patient care: effects on professional practice and health care outcomes. Cochrane

    Database of Systematic Reviews (2), CD002098.

    Demiris, G. (2006). Principles of survey development for telemedicine applications. Journal of

    Telemedicine and Telecare, 12(3), 111-115.

    Dixon, P. N., Bobo, M., & Stevick, R. A. (1984). Response differences and preferences for all-

    category-defined and end-defined Likert formats. Educational and Psychological

    Measurement, 44, 61-66.

    Hancock, G. R., & Klockars, A. J. (1991). The effect of scale manipulations on validity:

    Targetting frequency rating scales for anticipated performance levels. Applied

    Ergonomics, 22(3), 147-154.

    Keresztes, C., Hartford, K., & Wilk, P. (2008, October). Measuring patient satisfaction with

    telemedicine: Establishing psychometric properties. Paper presented at the Canadian

    Society of Telehealth, Ottawa, Ontario, Canada.


    Klockars, A. J., & Hancock, G. R. (1993). Manipulations of evaluative rating scales to increase

    validity. Psychological Reports, 73, 1059-1066.

    Kolic, M. C. (2004). An empirical investigation of factors affecting Likert-type rating scale

    responses. Unpublished doctoral dissertation, University of Toronto, Toronto, Ontario,

    Canada.

    Lam, T. C. M., & Klockars, A. J. (1982). Anchor point effects on the equivalence of

    questionnaire items. Journal of Educational Measurement, 19(4), 317-322.

    Lam, T. C. M., & Kolic, M. C. (2008). Effects of semantic incompatibility on rating response.

    Applied Psychological Measurement, 32(3), 248-260.

    Lam, T. C. M., & Stevens, J. J. (1994). Effects of content polarization, item wording, and rating

    scale width on rating response. Applied Measurement in Education, 7(2), 141-158.

    Lozano, L. M., Garcia-Cueto, E., & Muniz, J. (2008). Effect of the number of response

    categories on the reliability and validity of rating scales. Methodology, 4(2), 73-79.

    Mair, F., & Whitten, P. (2000). Systematic review of studies of patient satisfaction with

    telemedicine. BMJ, 320(7248), 1517-1520.

    Mekhjian, H., Turner, J. W., Gailiun, M., & McCain, T. A. (1999). Patient satisfaction with

    telemedicine in a prison environment. Journal of Telemedicine and Telecare, 5(1), 55-61.

    Meric, H. J. (1994). The effect of scale form choice on psychometric properties of patient

    satisfaction measurement. Health Marketing Quarterly, 11(3-4), 27-39.

    Sitzia, J., & Wood, N. (1997). Patient satisfaction: A review of issues and concepts. Social

    Science & Medicine, 45(12), 1829-1843.


    Spector, P. E. (1976). Choosing response categories for summated rating scales. Journal of

    Applied Psychology, 61, 374-375.

    Spector, P. E. (1980). Ratings of equal and unequal response choice intervals. Journal of Social

    Psychology, 112, 115-119.

    Streiner, D. L., & Norman, G. R. (2008). Health measurement scales: A practical guide to their

    development and use (3rd ed.). New York: Oxford University Press.

    Uebersax, J. S. (2006). Likert scales: Dispelling the confusion. Retrieved September 15, 2009,

    from Statistical Methods for Rater Agreement Web site:

    http://john-uebersax.com/stat/likert.htm

    Ware, J. E., Jr. (1978). Effects of acquiescent response set on patient satisfaction ratings.

    Medical Care, 16(4), 327-336.

    Whitten, P. S., & Mair, F. (2000). Telemedicine and patient satisfaction: Current status and

    future directions. Telemedicine Journal and e-Health, 6(4), 417-423.

    Wildt, A. R., & Mazis, M. B. (1978). Determinants of scale response: Label versus position.

    Journal of Marketing Research, 15(May), 261-267.

    Williams, T. L., May, C. R., & Esmail, A. (2001). Limitations of patient satisfaction studies in

    telehealthcare: A systematic review of the literature. Telemedicine Journal and e-Health,

    7(4), 293-312.



    Appendix A

    Invitation to Participate


    Appendix B

    Study Participant Survey:

    Version 2, Format A


    Appendix C

    Study Participant Survey:

    Version 3, Format B


    Appendix D

    Study Participant Survey:

    Version 3, Format C
