

Interexaminer Reliability of Observations in Physical Examinations of the Neck

EIRA VIIKARI-JUNTURA

The purpose of this study was to collect data on interexaminer reliability of a set of tests representative of the clinical examination of a patient with neck and radicular pain. A conventional neurological examination, palpations, and tests for the provocation or relief of radicular symptoms were performed on 52 patients by two independent raters. Good reliability was obtained in the atrophy inspection of the small muscles of the hand, in the sensitivity tests for touch and pain, and in the neck compression and axial manual traction tests. Fair reliability was obtained in muscle strength testing and in the estimation of the range of motion, and poor reliability was obtained for many palpations. Poor standardization of examination procedures and changes in the patients' attention were considered the main factors affecting reliability. Better operational definitions and procedures, such as the standardization of palpation pressure and traction force, are suggested for future studies. Key Words: Neck, Physical examination, Physical therapy, Tests and measurements.

The physical examination and the client's medical history are important components of the clinical examination. The physical signs provide the objective material on which the clinical diagnosis, decisions for further examinations, and therapy are based. In epidemiological studies of "normal" populations, clinical findings may help to classify the subjects into different categories. For all of these purposes, good reliability of the findings is necessary. In physical examination procedures, intraexaminer reliability usually is better than interexaminer reliability (eg, in the detection of physical signs of airways obstruction,1 the estimation of the size of the liver by percussion,2 and the evaluation of passive intervertebral motion of the lumbar spine3).

Most studies on interexaminer reliability of tests for spinal diseases have dealt with the low back. Good reliability has been achieved in measuring the range of motion of the lumbar spine.4-6 Tests, however, in which an anatomical abnormality or a physical sign is recorded only as either absent or present have been less reliable (eg, the straight-leg-raising test,5,6 the femoral nerve stretch test,4 tests for pain in the lumbar spine during forward flexion,6 and evaluations of lumbar lordosis by inspection4). Waddell et al reported that the anatomical localization of tenderness in the lumbar spine by palpation and many other commonly mentioned signs were so unreliable that they were discarded from further studies.5 Possibly, only studies with "good" results have been published, and the interexaminer reliability concerning many physical examination procedures may be even worse than can be gauged from the literature.

The purpose of this study was to collect data on interexaminer reliability of a set of tests representative of the clinical examination of a patient with neck and radicular pain. We focused on conventional neurological examination, palpation, and tests for provocation or relief of radicular symptoms. We wanted to determine whether certain examination procedures, because of satisfactory reliability, could be recommended for clinical and scientific practice and for further use in our validity studies.

METHOD

Subjects

Sixty-nine consecutive patients of the Neurosurgery Department of Helsinki University Central Hospital who were referred for cervical myelography from March 7, 1982, through March 10, 1983, and from February 14, 1984, through January 7, 1985, were selected for the study. For practical reasons, 17 patients could not be examined by the two raters; thus, the final study group consisted of 52 patients. The patients had been referred for neurosurgical evaluation, and the decision to perform a myelographic examination already had been made. Twenty-nine of the patients were men aged 13 to 66 years (X̄ = 48 years, s = 10.9 years), and 23 were women aged 37 to 80 years (X̄ = 53 years, s = 9.5 years). The final diagnoses of the patients when leaving the hospital are presented in Table 1. Patients with cervical spondylosis (n = 32), the most common diagnosis, usually had neck pain and radicular symptoms or motor weakness; in addition, eight of these patients had spinal cord compression symptoms. The miscellaneous disorders of five patients were Brown-Sequard syndrome after a Cloward operation, residual state after spondylitis, cerebropathy, acute torticollis, and residual state after a shoulder joint injury.

Procedure

One day before the myelography, the patients were examined by two raters, a physician who specialized in physical medicine and rehabilitation (Rater A) and a physical therapist who specialized in neurology and orthopedics (Rater B). Twenty patients were examined first by Rater A, and 25 patients were examined first by Rater B. Five other patients were examined on the same day, but the order of the examinations was not known; the remaining two patients were examined on successive days. The time interval between the successive physical examinations of the same patient by Raters A and B ranged from 15 minutes to 3 hours, although it usually was less than 1 hour (except for the two patients examined on successive days). The prestructured physical examination was performed by both raters, whereas the patients were interviewed only by Rater A after the physical examination.

Ms. Viikari-Juntura is a physician who has specialized in physical medicine and rehabilitation. This study was conducted at the Physiotherapy and Neurosurgery Departments of the Helsinki University Central Hospital and at the Institute of Occupational Health, Helsinki, Finland. Address all correspondence to Institute of Occupational Health, Topeliuksenkatu 41 a A, SF-00250 Helsinki, Finland.

This article was submitted March 13, 1986; was with the author for revision 12 weeks; and was accepted January 21, 1987. Potential Conflict of Interest; 4.

1526 PHYSICAL THERAPY

TABLE 1
Final Diagnoses of Patients in Reliability Study (N = 52)

Diagnosis                        Number of Patients
Cervical spondylosis             32
Neurological tumor                6
Disseminated sclerosis            2
Peripheral upper limb paresis     2
Brachial plexus involvement       2
Miscellaneous                     5
Unknown                           3
TOTAL                            52

Before commencing the study, the two raters practiced the physical examination on five healthy individuals and five patients with cervical spondylosis. During this practice period, particular attention was paid to the identical performance of the different tests and the interpretation of the results. In addition, the principles of the physical examination were reviewed three months after the study had begun.

Neither examiner knew the hypothetical diagnosis of the patients. For reasons of safety, the neurosurgeon in charge verified that the patients had no known primary or secondary cancer of the cervical spine region nor cervical spine malformation. Also, patients with a diagnosis of rheumatoid arthritis were excluded. The informed consent of each patient was obtained before the examination.

The physical examination of each patient consisted of three series of tests.

Conventional neurological examination. Muscle atrophy was inspected in the upper limbs, shoulders, and scapular regions. Attention was paid to apparent differences between the two sides and to significant absolute atrophy. The findings were classified into one of three categories: 1) no atrophy (normal configuration of the muscle belly), 2) moderate atrophy (poor configuration of the muscle belly), and 3) marked atrophy (muscle belly not visualized or a hollow instead of the muscle belly). The side of the positive finding and the location according to muscles or muscle groups (eg, forearm) were recorded.

Muscle strength was tested in four muscles representing the myotomes C5-C8. Both sides were tested simultaneously to better detect differences between the two sides. Muscle strength was graded as "normal," "reduced," and "markedly reduced," based primarily on the differences between the two sides. If the muscle strength apparently was reduced bilaterally, a rough comparison based on the patient's age and general condition was made to the examiner's resisting strength. Anterior, middle, and posterior parts of the deltoid muscle were tested by resisting flexion, abduction, and extension of the humerus, respectively (with the patient in a sitting position, the trunk was stabilized by the examiner in flexion and extension). Biceps brachii muscle strength was assessed by resisting elbow flexion when the forearm was supinated.7 Triceps brachii muscle strength was tested by resisting elbow extension from 90 degrees of elbow flexion. The dorsal interosseus muscles were tested by resisting the separation of the second through fifth fingers.

Fig. 1. Indicator areas for different cervical dermatomes used during sensitivity testing.

Sensitivity to light touch and to pain were tested using indicator areas for different cervical dermatomes (Fig. 1). Sensitivity to light touch was tested by having the examiners touch the different areas with their fingers; sensitivity to pain was tested with injection needles. Both sides were tested simultaneously, and the results compared. If the examiners detected a bilateral change in sensitivity, a comparison was made to an adjacent dermatome with normal sensitivity. The findings were graded as 1) normal sensitivity; 2) hypesthesia or anesthesia; and 3) hyperesthesia, dysesthesia, or paresthesia.

Sensitivity to vibration was tested with vibrating tuning forks (256 Hz) placed on the ulnar styloid processes and lateral malleoli. The examiners focused on the differences between the two sides as reported by the patient. If a bilateral change was noted, the upper extremities were compared with the lower extremities or vice versa. The findings were graded as normal sensitivity and reduced or absent sensitivity.

Palpation of neck and shoulder region. Before the examination, the patient was instructed to report any sensation of pain or tenderness during the physical examination. With the patient seated and the hands relaxed on the thighs, the muscle insertion sites and trigger points8 shown in Figure 2 were palpated. The spinous processes of the cervical vertebrae were palpated with the patient in a supine position. The findings were graded as 1) no tenderness (no report from the patient), 2) moderate tenderness (report of pain from the patient), and 3) marked tenderness (report of pain and jump sign). The muscle tone of the trapezius and scapular muscles was tested manually and graded as 1) normal (normal consistency), 2) moderately increased (increased consistency), and 3) greatly increased (greatly increased consistency).

Volume 67 / Number 10, October 1987 1527

Fig. 2. Palpation of the neck and shoulder region. The circles indicate the palpated areas. The following areas were used in the analysis of the findings: 1) upper spinous processes, 2) lower spinous processes, 3) right side of neck, 4) left side of neck, 5) right suprascapular area, 6) left suprascapular area, 7) right scapular area, and 8) left scapular area.

Clinical tests. Several tests were performed during which radicular pain, numbness, or paresthesia either was provoked (aggravated) or relieved. These tests were divided into tests of single movements of the cervical spine and specific tests.

Single movements of the cervical spine that were tested were 1) rotations, 2) lateral flexions, and 3) flexion and extension. Rotations were tested by having the patients rotate their head as far as possible to each side while the examiner observed from behind. The range of motion (ROM) was estimated and classified as normal (>80°), limited (60°-80°), and markedly limited (<60°). Lateral flexions were tested in a similar manner, and the ROM was classified as normal (>30°), limited (20°-30°), and markedly limited (<20°). Flexion and extension were observed from the side, and the ROM was classified as normal (>45°), limited (30°-45°), and markedly limited (<30°). In all movements, the examiner tested for the presence of radicular pain, paresthesia, or numbness in the upper extremities. At the end of the active ROM, the examiner gently tested the movement manually to determine whether further movement provoked or aggravated pain or paresthesia. Lhermitte's sign (electric shock-like phenomenon in the legs) also was recorded when present in flexion.
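The three-level ROM grading described above can be written as a small lookup. The following is only an illustrative sketch: the cutoff values are those stated in the text, whereas the function name, the dictionary keys, and the use of a numeric angle are assumptions made for the example (in the study the ROM was estimated by eye, not measured).

```python
# Cutoffs in degrees as stated in the text: (lower bound of "limited", upper bound of "limited").
# Key names are hypothetical labels chosen for this sketch.
ROM_CUTOFFS = {
    "rotation": (60, 80),
    "lateral_flexion": (20, 30),
    "flexion_extension": (30, 45),
}


def classify_rom(movement: str, degrees: float) -> str:
    """Return 'normal', 'limited', or 'markedly limited' for an estimated ROM."""
    lower, upper = ROM_CUTOFFS[movement]
    if degrees > upper:
        return "normal"
    if degrees >= lower:
        return "limited"
    return "markedly limited"


if __name__ == "__main__":
    for movement, angle in [("rotation", 85), ("rotation", 70), ("lateral_flexion", 15)]:
        print(movement, angle, classify_rom(movement, angle))
```

Values falling exactly on a cutoff (eg, 80° of rotation) are graded as limited, which follows from the ranges given in the text.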

Specific tests that were performed were tests of 1) neck compression, 2) brachial plexus tension, 3) shoulder abduction relief, and 4) axial manual traction. The neck compression test9 was performed with the patient both in a sitting and in a supine position. In this test, the examiner flexed laterally and slightly rotated and then compressed the patient's head. A compression of about 7 kg was applied in this study. The appearance or aggravation of radicular pain, paresthesia, or numbness in the shoulder-upper arm or forearm-hand region (ipsilateral or contralateral) was recorded.

The brachial plexus tension test10 was performed with the patient in the supine position. For this test, the examiner first abducted the humerus to the limit of pain-free motion, then rotated the forearm and humerus outward, and finally flexed the elbow. If no limitation of movement occurred, the humerus was abducted to 90 degrees. The appearance or aggravation of pain or paresthesia in the shoulder or arm was recorded. This test was included in the study from patient 17 onward.

TABLE 2
Interexaminer Reliability of Inspection of Muscle Atrophy and Muscle Strength Testing (N = 52)

                                 Abnormal or Positive Findings (%)
Examination Procedure or Test    Rater A    Rater B    Kappa     ps
Muscle atrophy
  Right side                     35         25         .50a      .64
  Left side                      27         31         .35a      .53
  Right deltoid                   6          8         . . .b    .57
  Left deltoid                    8         12         .33       .40
  Right triceps brachii          15          2         . . .b    .22
  Left triceps brachii           17          4         .32       .36
  Right hypothenar               17          8         .57       .61
  Left hypothenar                12         12         .81       .83
Muscle strength
  Right deltoid                  12         17         .46       .53
  Left deltoid                    6         13         . . .b    .60
  Right biceps brachii            8         21         .47       .53
  Left biceps brachii             4         13         . . .b    .44
  Right triceps brachii          35         27         .64       .75
  Left triceps brachii            8         17         .40       .44
  Right dorsal interossei        21         27         .42       .56
  Left dorsal interossei         15         13         .45       .53

The shoulder abduction relief test11 was performed in the presence of radicular pain, paresthesia, or numbness with the patient in a sitting position. In this test, the patient lifted his hand above his head; a positive result was the decrease or disappearance of the radicular symptom. This test also was included in the study from patient 17 onward.

The axial manual traction test was performed in the presence of radicular symptoms with the patient in a supine position. An axial traction force corresponding to 10 to 15 kg was applied. A positive finding was the decrease or disappearance of radicular symptoms.

Data Analysis

In this article, interexaminer reliability is defined as agreement between successive examinations of the same patient made by two raters. Kappa (κ) and weighted Kappa (κw) reliability coefficients were used to express interrater agreement.12 The empirical value of Kappa, however, depends on the prevalence of the positive findings; it is attenuated most severely toward low values when the prevalence is either particularly low or particularly high.13 Consequently, the Kappa value was given only when the prevalence of the positive findings (the mean of the prevalences obtained by Raters A and B) was between 10% and 90%, so that the value was not much distorted. In this study, values of κ < .40 were considered to be poor, .40 ≤ κ ≤ .75 were fair to good, and κ > .75 were excellent.14
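As an illustration of the Kappa statistic and the prevalence rule described above, the following sketch computes unweighted Cohen's Kappa from a 2 x 2 table of two raters' dichotomized findings. The function names and the patient counts are hypothetical and are not data from this study; treating the 10% to 90% limits and the .40 and .75 cutoffs as inclusive is likewise an assumption.

```python
def cohen_kappa(both_pos: int, only_a: int, only_b: int, both_neg: int) -> float:
    """Unweighted Cohen's Kappa for two raters and a positive/negative finding."""
    n = both_pos + only_a + only_b + both_neg
    if n == 0:
        raise ValueError("the table contains no patients")
    p_observed = (both_pos + both_neg) / n
    p_a = (both_pos + only_a) / n  # prevalence of positive findings, Rater A
    p_b = (both_pos + only_b) / n  # prevalence of positive findings, Rater B
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if p_expected == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


def mean_prevalence(both_pos: int, only_a: int, only_b: int, both_neg: int) -> float:
    """Mean of the two raters' prevalences of positive findings."""
    n = both_pos + only_a + only_b + both_neg
    return (2 * both_pos + only_a + only_b) / (2 * n)


def is_reportable(both_pos: int, only_a: int, only_b: int, both_neg: int) -> bool:
    """Prevalence rule from the text (limits assumed inclusive)."""
    return 0.10 <= mean_prevalence(both_pos, only_a, only_b, both_neg) <= 0.90


def interpret_kappa(kappa: float) -> str:
    """Interpretation bands from the text (boundaries assumed inclusive)."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "excellent"


if __name__ == "__main__":
    # Hypothetical counts: 5 patients positive for both raters, 1 for Rater A only,
    # 1 for Rater B only, and 45 negative for both (52 patients in total).
    table = (5, 1, 1, 45)
    kappa = cohen_kappa(*table)
    print(round(kappa, 2), interpret_kappa(kappa), is_reportable(*table))
```

With these hypothetical counts, the observed agreement is 50/52 and the chance-expected agreement is 2152/2704, which gives a Kappa of about .81.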

The proportion of specific agreement (ps),12 which is the proportion of patients about whom raters agree on the presence of positive findings, also was used to express interrater agreement. Values of ps < .50 were considered to be poor, .50 ≤ ps ≤ .80 were fair to good, and ps > .80 were excellent.
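The proportion of specific agreement can be sketched in the same way. The example below assumes the standard formulation for positive findings, 2a / (2a + b + c), where a is the number of patients positive for both raters and b and c are the numbers positive for one rater only; the text itself gives only the verbal definition, so the formula, the function names, and the counts are assumptions for illustration.

```python
def specific_agreement(both_pos: int, only_a: int, only_b: int) -> float:
    """Proportion of specific agreement on positive findings: 2a / (2a + b + c)."""
    denominator = 2 * both_pos + only_a + only_b
    if denominator == 0:
        raise ValueError("undefined when neither rater records a positive finding")
    return 2 * both_pos / denominator


def interpret_ps(ps: float) -> str:
    """Interpretation bands from the text (boundaries assumed inclusive)."""
    if ps < 0.50:
        return "poor"
    if ps <= 0.80:
        return "fair to good"
    return "excellent"


if __name__ == "__main__":
    # Hypothetical counts: 5 patients positive for both raters, 1 for each rater alone.
    ps = specific_agreement(5, 1, 1)
    print(round(ps, 2), interpret_ps(ps))
```

Patients who are negative for both raters do not enter this index, which is why it is less sensitive than Kappa to a low prevalence of positive findings.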

To study the reliability of palpation, the neck and shoulder region was divided into eight areas (Fig. 2). The findings of an area were considered to be consistent if the presence or absence of tenderness tallied.

RESULTS

The percentages for prevalence of the positive findings in the different examinations performed by the two raters and the values for reliability of the findings are presented in Tables 2 to 6. Because markedly positive findings of the tests in Tables 2, 4, and 6 were few, the two categories of positive findings (positive or abnormal and markedly positive or abnormal) have been combined in the analysis.
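Combining the two positive categories amounts to dichotomizing each three-level grade before the raters are compared. A minimal sketch, assuming grades are coded 1 (normal or negative), 2 (positive), and 3 (markedly positive) as in the grading schemes of the Method section; the function names and the example ratings are hypothetical.

```python
from collections import Counter


def is_positive(grade: int) -> bool:
    """Collapse grades 2 (positive) and 3 (markedly positive) into one category."""
    return grade >= 2


def cross_tabulate(ratings_a: list[int], ratings_b: list[int]) -> dict[str, int]:
    """Count the four agreement cells for paired ratings of the same patients."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must have rated the same patients")
    cells = Counter((is_positive(a), is_positive(b)) for a, b in zip(ratings_a, ratings_b))
    return {
        "both_pos": cells[(True, True)],
        "only_a": cells[(True, False)],
        "only_b": cells[(False, True)],
        "both_neg": cells[(False, False)],
    }


if __name__ == "__main__":
    # Hypothetical grades given by Rater A and Rater B to five patients.
    print(cross_tabulate([1, 2, 3, 1, 2], [1, 3, 1, 1, 2]))
```

The resulting four counts are the inputs needed for an agreement coefficient such as Kappa.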

Conventional Neurological Examination

In the inspection of muscle atrophy (Tab. 2), the reliability values of the different muscles or muscle groups varied from poor to excellent, those of the hypothenar muscles being the best and those of the triceps brachii muscles being the worst. Most tests of muscle strength had fairly reliable results.

Good reliability was obtained in testing the patients' sensitivity to touch and pain (Tab. 3). The Kappa coefficients of indicator areas C5-C8 ranged from .41 to .74 for sensitivity to touch and from .29 to .68 for sensitivity to pain.

Palpation of Neck and Shoulder Region

In five neck and scapular areas, the reliability of palpation was fair and in three areas was poor (Tab. 4). The Kappa coefficient of muscle tone was .40.

Clinical Tests

The assessment of the ROM in the single movements of the cervical spine was fairly reliable (Tab. 5). The prevalence of radicular pain in different movements ranged from 0% to 10% (X̄ = 2.9%); thus, the value of Kappa was not calculated.

Table 2 footnotes:
a The prerequisite for consistency was that a positive finding for at least one muscle or muscle group tallied.
b Kappa was omitted because the prevalence was less than 10%.


TABLE 3
Interexaminer Reliability of Sensitivity Testing (N = 52)

                             Rater A                          Rater B
                             Hypesthesia or   Hyperesthesia   Hypesthesia or   Hyperesthesia
Test                         Anesthesia (%)   (%)             Anesthesia (%)   (%)             Kappa    ps
Sensitivity to light touch
  Right side                 31               4               29               6               .62a     .76
  Left side                  17               2               11               6               .64a     .71
Sensitivity to pain
  Right side                 42               6               39               6b              .54a     .77
  Left side                  31               2               23               2               .41a     .57
Sensitivity to vibration
  Right upper extremity      13                               15                               .45      .53
  Left upper extremity        4                               10                               c        .29
  Right lower extremity      21                               21                               .54      .64
  Left lower extremity       21                               17                               .51      .60

TABLE 4
Interexaminer Reliability of Tenderness to Palpation in Neck and Shoulder Region (N = 51)

                                 Positive Findings (%)
Neck-Shoulder Areaa              Rater A    Rater B    Kappa    ps
Upper spinous processes (1)      23         12         .47      .56
Lower spinous processes (2)      27         19         .52      .67
Right side of neck (3)           12          8         .24      .33
Left side of neck (4)            10          0         b        .00
Right suprascapular area (5)     20         18         .42      .53
Left suprascapular area (6)      12         12         .44      .50
Right scapular area (7)          14         14         .34      .43
Left scapular area (8)           12          8         .56      .60

The Lhermitte's sign was present in flexion in one patient with disseminated sclerosis.

The reliability of most items of the neck compression test performed with the patient positioned sitting was good (Tab. 6). The same test performed with the patient positioned supine was less reliable. This test could not be performed on five patients who experienced considerable pain when positioned supine. The brachial plexus tension test had poor reliability, the shoulder abduction relief test had fair reliability, and the axial manual traction test had good reliability.

DISCUSSION

Reliability

Some items of the conventional neurological examination, such as the inspection of atrophy of the hypothenar muscles and the testing of the patients' sensitivity to light touch or pain in some indicator areas, had good reliability. Inspection of the atrophy of some other muscles (eg, triceps brachii) was far less reliable. Similarly, the reliability of findings in sensitivity testing varied largely in different indicator areas with no consistent pattern. We are aware of only two reports on the reliability of the conventional neurological examination,5,6 neither of which contains data on the inspection of muscle atrophy or sensitivity testing. Waddell et al have reported good interexaminer reliability (Kappa = .62) in assessing "root compression signs," which are summarized data from conventional neurological examinations.5 In their study, patients with low back complaints were examined by two orthopedists.

The results of the muscle strength tests were only fairly reliable. In a study in which 10 physical therapists examined the muscle strength of two patients with poliomyelitis, 2 examiners agreed completely in the grading of a muscle or muscle group on an average of 47.8% of instances (grading into six categories); they agreed within one grade on an average of 91.2% of instances.15 Because they tested many muscles of only 2 patients and we tested four muscles of 52 patients, our results are not comparable. Their results, however, show that muscle testing may be highly reliable in controlled conditions with selected patients.

Of the four clinical tests for the provocation or relief of radicular pain, the neck compression test performed with the patient in a sitting position and the axial manual traction test had good reliability. The neck compression test performed with the patient positioned supine gave less reliable results. This finding may have resulted because the patients usually were most relaxed and had less difficulty when sitting up straight or because the examiner may have had more difficulty reproducing the proper test position with the patient positioned supine. We were surprised that a simple test such as the shoulder abduction relief test was not more than fairly reliable. The brachial plexus tension test had poor reliability, possibly because it is rather difficult to perform.

Table 3 footnotes:
a The prerequisite for consistency was that a positive finding for at least one indicator area tallied.
b One patient had diminished sensitivity in one indicator area and hyperesthesia in another area on the same side.
c Kappa was omitted because the prevalence was less than 10%.

Table 4 footnotes:
a The numbers in parentheses refer to those in Figure 2.
b Kappa was omitted because the prevalence was less than 10%.

TABLE 5
Interexaminer Reliability of Estimated Range of Motion of Cervical Spine (N = 52)

                            Positive Findings (%)
                            Rater A                       Rater B
Type of Movement            Limited   Markedly Limited    Limited   Markedly Limited    Kappa
Rotation to right           33        31                  41        15                  .56
Rotation to left            27        29                  36         4                  .40
Lateral flexion to right    27        13                  39        21                  .51
Lateral flexion to left     35         8                  48        12                  .41
Forward flexion             19         0                  15         6                  .43
Extension                    6         6                  21         6                  .56

TABLE 6
Interexaminer Reliability of Neck Compression, Brachial Plexus Tension, Shoulder Abduction Relief, and Axial Manual Traction Tests

                                      Rater A               Rater B
                                      Positive              Positive
Test                                  Findings (%)   n      Findings (%)   n      Kappa    ps
Neck Compression
 Rotation and lateral flexion to the right (sitting): Pain or paresthesia in
    Right shoulder or upper arm       17             52     27             51     .61      .70
    Left shoulder or upper arm         4             52     12             51     a        .50
    Right forearm or hand             13             52     16             51     .77      .80
    Left forearm or hand               4             52      8             51     a        .67
 Rotation and lateral flexion to the left (sitting): Pain or paresthesia in
    Right shoulder or upper arm        6             52     12             51     a        .67
    Left shoulder or upper arm         8             52     25             51     .40      .47
    Right forearm or hand              4             52     10             51     .54      .57
    Left forearm or hand              10             52     14             51     .62      .67
 Rotation and lateral flexion to the right (supine): Pain or paresthesia in
    Right shoulder or upper arm       10             51     15             46     .28      .36
    Left shoulder or upper arm         4             51      7             46     a        .50
    Right forearm or hand             10             51     11             46     .63      .67
    Left forearm or hand               4             51      4             46     a        .67
 Rotation and lateral flexion to the left (supine): Pain or paresthesia in
    Right shoulder or upper arm        6             51     11             46     a        .29
    Left shoulder or upper arm         6             51     22             46     .40      .46
    Right forearm or hand              6             51      4             46     a        .50
    Left forearm or hand               4             51      7             46     a        .40
Brachial Plexus Tension
 Pain or paresthesia in
    Right shoulder or arm             22             36     14             36     .35      .46
    Left shoulder or arm              11             36      8             36     a        .57
Shoulder Abduction Relief
 Relief of radicular symptom on
    Right side                        37             16b    53             17c    .21      .57
    Left side                         27             15d    50             14e    .40      .67
Axial Manual Traction
    Relief of radicular pain          28             29f    43             30g    .50      .71

a Kappa was omitted because the prevalence was less than 10%.
b Test performed on 16 of 36 examined patients.
c Test performed on 17 of 36 examined patients.
d Test performed on 15 of 36 examined patients.
e Test performed on 14 of 36 examined patients.
f Test performed on 29 of 52 examined patients.
g Test performed on 30 of 52 examined patients.

Factors Affecting Reliability

Poor standardization is a common feature of many tests used in physical examinations. Reliability certainly was affected in this study because the pressure in palpations and force in manual compression and traction were not standardized. The manual assessment of muscle tone and strength and the inspection of muscle atrophy without model pictures were highly subjective. In the neck compression test, the position of the cervical spine and head with combined lateral flexion and rotation is not easy to reproduce.

The results of sensitivity testing, palpations, and the tests where radicular pain was provoked or relieved rely on the patient's subjective report. Thus, changes in the patient's attention affect reliability. Also, radicular pain itself, and muscle strength measured while pain is present, may change within short time periods. Not much, however, can be done to eliminate these factors.

Some systematic errors between the two examiners occurred. Rater B produced radicular pain or paresthesia of the shoulder or upper arm more often in the neck compression test than Rater A, whereas Rater A reported more atrophies of the triceps brachii muscle than did Rater B.

We studied whether a recently performed physical examination had any effect on the findings of a second examination. Radicular pain caused by the single movements of the neck and by the neck compression test was analyzed. Comparison of the two groups of patients examined in a different order by Rater A and Rater B revealed that both raters produced radicular pain more often as the second examiner than as the first examiner. The mean number of positive tests per patient for Rater A was 1.6 as the first examiner and 2.3 as the second examiner; the mean number of positive tests for Rater B was 3.0 as the first examiner and 3.2 as the second examiner. This finding suggests that the patients experienced more pain after the first examination. These differences, however, were not statistically significant.
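The size of the examination-order effect can be read directly from the reported means. The following is a minimal sketch of that arithmetic only; the names `MEAN_POSITIVE_TESTS` and `order_effect` are hypothetical, and the calculation says nothing about statistical significance, which requires the patient-level data.

```python
# Mean number of positive tests per patient, as reported in the text.
MEAN_POSITIVE_TESTS = {
    "Rater A": {"first": 1.6, "second": 2.3},
    "Rater B": {"first": 3.0, "second": 3.2},
}


def order_effect(means):
    """Return the second-minus-first difference per rater, to one decimal."""
    return {
        rater: round(values["second"] - values["first"], 1)
        for rater, values in means.items()
    }


if __name__ == "__main__":
    for rater, diff in order_effect(MEAN_POSITIVE_TESTS).items():
        print(f"{rater}: {diff:+.1f} positive tests per patient as second examiner")
```

The increase is 0.7 positive tests per patient for Rater A and 0.2 for Rater B, so the direction is the same for both raters although the magnitude differs.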

Although this study was conducted over a three-year period, the disparities in the findings between the examiners did not increase over time. The differences between Raters A and B may have been accentuated because the examinations were performed with a low frequency (one examination every two weeks).

CONCLUSION

Good interexaminer reliability was obtained for some items in the inspection of muscle atrophy, in sensitivity testing, and in the tests where radicular pain was provoked or relieved. Many tests for tenderness to palpation revealed poor reliability. Interexaminer reliability likely could improve with better operational definitions and testing procedures (eg, the use of model pictures of different stages of atrophy in different muscles for the inspection of muscle atrophy and practice with a dynamometer to perform palpations and axial manual traction with a standardized force). Palpation pressure and traction force also could be measured. Such measures can hardly be applied in clinical practice but should be considered when physical examinations are used for research purposes.

Acknowledgments. I am indebted to Sisko-Tuulikki Kuuttinen for her support and willingness to perform the physical examinations; Tuula Nurminen, MSc, and Markku Nurminen, DHSc, for their assistance with the statistical analysis; and Merja Tolvanen for her assistance in data processing.

REFERENCES

1. Godfrey S, Edwards RHT, Campbell EJM, et al: Repeatability of physical signs in airways obstruction. Thorax 24:4-9, 1969

2. Castell DO, O'Brien KD, Muench H, et al: Estimation of liver size by percussion in normal individuals. Ann Intern Med 70:1183-1189, 1969

3. Gonnella C, Paris SV, Kutner M: Reliability in evaluating passive intervertebral motion. Phys Ther 62:436-444, 1982

4. Nelson MA, Allen P, Clamp SE, et al: Reliability and reproducibility of clinical findings in low-back pain. Spine 4:97-101, 1979

5. Waddell G, Main CJ, Morris EW, et al: Normality and reliability in the clinical assessment of backache. Br Med J [Clin Res] 284:1519-1523, 1982

6. Alaranta H: Factors Defining Impairment, Disability and Handicap in a Population of Patients Examined One Year Following Surgery for Lumbar Disc Herniation (translated from Finnish). Turku, Finland, Social Insurance Institution, 1985

7. Kendall HO, Kendall FP, Wadsworth GE: Muscles, Testing and Function, ed 2. Baltimore, MD, Williams & Wilkins, 1971

8. Travell JG, Simons DG: Myofascial Pain and Dysfunction: The Trigger Point Manual. Baltimore, MD, Williams & Wilkins, 1983

9. Spurling RG, Scoville WB: Lateral rupture of the cervical intervertebral discs: A common cause of shoulder and arm pain. Surg Gynecol Obstet 78:350-358, 1944

10. Wells P: Cervical dysfunction and shoulder problems. Physiotherapy 68:66-73, 1982

11. Davidson RI, Dunn EJ, Metzmaker JN: The shoulder abduction test in the diagnosis of radicular pain in cervical extradural compressive monoradiculopathies. Spine 6:441-446, 1981

12. Fleiss JL: Statistical Methods for Rates and Proportions, ed 2. New York, NY, John Wiley & Sons Inc, 1981, pp 100-111, 212-426

13. Walter SD: Measuring the reliability of clinical data: The case for using three observers. Rev Epidemiol Sante Publique 32:206-211, 1984

14. Landis JR, Koch GG: The measurement of observers' agreement for categorical data. Biometrics 33:159-174, 1977

15. Iddings DM, Smith LK, Spencer WA: Muscle testing: Part 2. Reliability in clinical use. Phys Ther Rev 41:249-256, 1961

1532 PHYSICAL THERAPY