
Classifying Patients’ Need for and Use of Pain Relief Medication

Chan Kim, Elaine Lee, Vishash Verma

Abstract

We analyzed data collected on postoperative patients who used patient-controlled analgesics, with two goals. The first was to see whether the patients fell into distinct clusters based on their pattern of pain across time; the second was to determine whether the patients who used hydromorphone differed from those who used morphine. We clustered the patients using hierarchical clustering, curve clustering, multidimensional scaling, and hidden Markov models. The analyses suggest that the patients fall into three or four clusters based on their pattern of pain across time, and that the hydromorphone and morphine patients do differ.

Introduction

The data in this study were provided by Dr. Alon Ben-Ari, an anesthesiologist at the University of Pittsburgh Medical Center. He had 32 patients who underwent a knee operation and then used a patient-controlled analgesic (PCA) while recovering in their hospital rooms. A PCA is an electronic device loaded with an analgesic, or painkiller; a patient who is in pain can press a button attached to the PCA to receive an injection of the analgesic. Four of Dr. Ben-Ari's patients used PCAs containing morphine; the other 28 received hydromorphone. The PCA will not administer the drug more than once every eight minutes, so not every button press results in an injection. A button press that results in an injection will be called "successful."

Data on the gender, age, and body mass index (BMI) of the patients was also provided.

The analysis of the data had two goals. The first goal was to determine whether the patients could be grouped in such a way that patients in the same group had similar patterns of pain across time, i.e., intense pain at similar times, no pain at similar times, etc. The second goal was to find whether the hydromorphone patients differed from the morphine patients in their PCA usage. The analysis suggests that there are three or four different clusters of patients and that the hydromorphone patients do differ in their PCA usage from the morphine patients.

Below is a detailed description of the data, followed by discussions of the methods and concepts used and the results obtained. Clustering was attempted in various ways, and a hidden Markov model was fit to the data.

Description of the Raw Data

The PCA records in a log file the time of each button press and whether it resulted in an injection. Below is part of one patient's log file.

    "07/14/2009 13:41:00",,,,,"PCA DENIED"
    "07/14/2009 13:41:00",,,,,"PCA END: 0.2 mg"
    "07/14/2009 13:41:00",,,,,"PCA DENIED"
    "07/14/2009 13:42:00",,,,,"PCA DENIED"
    "07/14/2009 13:42:00",,,,,"PCA DENIED"
    "07/15/2009 01:16:00",,,,,"PCA DENIED"
    "07/15/2009 01:16:00",,,,,"PCA DENIED"
    "07/15/2009 01:16:00",,,,,"PCA END: 0.2 mg"

"PCA DENIED" indicates that the analgesic was not administered; "PCA END: 0.2 mg" indicates that 0.2 mg of the analgesic was administered. The raw data are the patients' log files. Some of the log files, like the one above, show button presses that were unsuccessful even though they occurred more than eight minutes after the last successful press and so should have succeeded. It remains uncertain why this happened. This did not present any problems for the analysis; what is of interest is the pattern of pain, or demand for the analgesic, which is reflected in the pattern of button presses, successful and unsuccessful.

Exploratory Data Analysis

There were 32 patients, and we have demographic information on their operation date, age, BMI, ethnicity, and kind of painkiller used (either hydromorphone or morphine). All demographic data are missing for patients 19 and 20. There are also 4 missing entries in our demographic data: one patient is missing both age and BMI, and two patients are missing BMI values. Of the 30 patients with demographic data, 16 were male and 14 were female.

Figure 1: Histograms of patients' age and body mass index

Figure 1 shows the histograms of the patients' age and BMI. The first histogram shows that most of the patients are older adults, with a mean age of roughly 62 years. The minimum and maximum ages are 21 and 82, respectively, and the patients aged 21 and 31 appear to be outliers. The standard deviation of the patients' ages is about 12.5 years. The histogram of BMI is bimodal, with one peak between 25 and 30 and another between 35 and 40, and the BMIs range from 21.50 to 48. The mean BMI is 35.17, with a standard deviation of about 7.6.

Figure 2: Length of hospitalization and total button presses.

Above is a scatterplot showing length of hospital stay and total number of button presses for each patient. We observe that the distribution of the length of hospitalization is somewhat skewed to the left, with a possible outlier having length greater than 1400 minutes. According to the y-axis, the largest number of button presses was near 250, with most patients pressing the button between 0 and 50 times.

Part 1: Analysis Without Regard to Varying Dimensionality

The PCA log file for each patient was converted to a vector in which each entry corresponds to a time interval since usage started and contains the number of times the patient pressed the button during that interval.

Not every patient in the study was hospitalized for the same length of time, so not every patient used the PCA for the same amount of time either. Traditional multivariate analysis requires vectors of the same length. To address this issue, each vector was extended by adding extra entries to the end so that it became the same length as the longest vector in the set. These entries were populated in two different ways in an attempt to verify that the results were independent of the padding scheme: with 0's for one set of analyses and with -1's for another. The resulting data sets are referred to as "0-padded" and "-1-padded," respectively.

Another way to address the unequal lengths of the vectors was to create new vectors reflecting the demand in each portion of the stay. Vectors with 10 entries were created, where the ith entry (1 <= i <= 10) contains the total demand for the ith leg of the patient's stay. This set of transformed data will be referred to as "10-Binned". We believe 10 entries give a suitable summary of each patient's stay while preserving any distinctive behavior at each stage of the stay.
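The two ways of equalizing vector lengths described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the toy press counts, and the choice of near-equal consecutive legs are assumptions, and 4 bins are used instead of 10 only to keep the example short.

```python
def pad(counts, length, fill):
    """Extend a per-minute count vector to `length` entries using `fill`."""
    return list(counts) + [fill] * (length - len(counts))


def bin_demand(counts, n_bins=10):
    """Sum per-minute counts over n_bins consecutive, near-equal legs of the stay."""
    n = len(counts)
    # Leg i covers minutes [i*n // n_bins, (i+1)*n // n_bins).
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    return [sum(counts[edges[i]:edges[i + 1]]) for i in range(n_bins)]


# Hypothetical per-minute press counts for two patients with different stays.
patients = {
    "A": [1, 0, 0, 2, 0, 0, 0, 1, 0, 0, 3, 0],
    "B": [0, 2, 0, 0, 1],
}
longest = max(len(v) for v in patients.values())

zero_padded = {k: pad(v, longest, 0) for k, v in patients.items()}
neg_padded = {k: pad(v, longest, -1) for k, v in patients.items()}
binned = {k: bin_demand(v, n_bins=4) for k, v in patients.items()}
```

Padding keeps the minute-by-minute resolution but ties the result to the fill value, whereas binning discards resolution but preserves each patient's total demand.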

Clustering

To gain insight into the shape of the data set containing demographic labels and drug demand, K-means clustering using Euclidean distance was performed. K-means is a clustering technique that operates on observations represented as vectors. The user specifies how many clusters k-means should return. K-means begins by selecting n observations to serve as cluster centers, where n is the number of clusters to return. Each observation is then assigned to the cluster whose center is closest to it. Once this has been done for all observations, each cluster center is recalculated as the mean of the observations assigned to it. K-means repeats this procedure of assigning observations to the nearest center and recalculating the centers until the assignments no longer change. Because it relies on distance from a cluster center to make its assignments, k-means is best at finding clusters that are roughly spherical. The accuracy of k-means with respect to a set of demographic labels therefore indicates how spherical the groups defined by those labels are; the higher the accuracy, the more spherical the groups.

K-means clustering was performed on the 10-Binned data set to determine whether it is possible to distinguish patients by gender, ethnicity, or drug used. Demographic data were missing for 2 patients, so only 30 patients' data were used. The results are shown in the contingency tables below:

Opioid (H: hydromorphone, M: morphine)

         Cluster 1   Cluster 2
    H        3          23
    M        4           0

Ethnicity (B: black, W: white)

         Cluster 1   Cluster 2
    B        0           4
    W        7          19

Gender (F: female, M: male)

         Cluster 1   Cluster 2
    F        6          10
    M        1          13


The partition distinguished between the two drugs effectively. All of the patients using morphine were placed in one cluster, while all but 3 of the patients using hydromorphone were placed in the other. For ethnicity and gender, the accuracy is 76.7% in both cases, but that figure assumes that both ethnicities and both genders are in one cluster.
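The assign-and-recompute loop of k-means described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the routine used in the study: the initial centers are drawn at random from the observations, and the six two-dimensional points are made up.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centers) after alternating assignment and mean updates."""
    rng = random.Random(seed)
    # Assumption: start from k observations chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every observation to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical data: two compact, well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(points, k=2)
```

On data like this the two groups are recovered whatever the starting centers; on less cleanly separated data the result can depend on the initialization, which is one reason to rerun with several seeds.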

Hierarchical (agglomerative) clustering was performed on the 10-Binned, -1-padded, and 0-padded vectors. Hierarchical clustering starts with each observation, represented by its vector, in its own cluster. At each iteration it merges the two clusters with the shortest distance between them, until all observations belong to one cluster. A dendrogram is a graphical representation of this sequence of merges. Euclidean distance was used to determine the distance between two clusters. Hierarchical clustering was performed to see which observations are similar to or different from each other. The most informative results are shown below:

Figure 3: Dendrograms depicting hierarchical clustering results

The results of hierarchical clustering are in Figure 3. The 10-Binned data (left), using complete linkage, fall primarily into one large group with three smaller groups. The group on the left contains patient 24, a morphine patient. Both of the patients in the next cluster to the right were morphine patients. The partition seems to suggest that the morphine patients were somehow different from the hydromorphone patients. The -1-Padded data (center), using complete linkage, fall into three groups. Patient 24 is again in its own group. Finally, the 0-padded data (right) were not partitioned well by hierarchical clustering at all. The bias introduced by the padding contributed to the irregularity seen in the dendrogram: read in order from the bottom-most leaf to the top, the patient numbers simply correspond to demand in increasing order, so this clustering is uninformative.

Below is a confusion matrix for the 10-Binned and -1-Padded partitions. The (i, j)-cell gives the number of patients who fall into cluster i of the 10-Binned partition and cluster j of the -1-Padded partition. Cluster 3 of the 10-Binned partition contains 23 patients, and cluster 2 of the -1-Padded partition contains 22 patients; they share 17 patients. There is substantial agreement between the partitions, but they still are quite different, not least because they contain a different number of cells or clusters. Interestingly, in both partitions, patient 24 is in its own cell.

                          -1-Padded cluster
    10-Binned cluster     1     2     3
            1             1     0     0
            2             0     0     2
            3             0    17     6
            4             0     5     1
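The kind of comparison made above (cluster the same patients under two representations, then cross-tabulate the cluster labels) can be sketched as follows. It is a naive illustration under assumptions: the toy vectors, the function names, and the choice to cut the merge sequence at a fixed number of clusters are not taken from the study.

```python
from collections import Counter
from math import dist


def complete_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two closest clusters until
    k remain, where the distance between clusters is the largest pairwise
    Euclidean distance between their members (complete linkage)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    labels = [0] * len(points)
    for number, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = number
    return labels


def cross_tab(labels_a, labels_b):
    """Count items in cluster i of partition A and cluster j of partition B."""
    return Counter(zip(labels_a, labels_b))


# Hypothetical: the same five patients under two different representations.
view_a = [[0, 0], [0, 1], [5, 5], [5, 6], [20, 20]]
view_b = [[0], [1], [2], [10], [50]]

labels_a = complete_linkage(view_a, k=3)
labels_b = complete_linkage(view_b, k=2)
table = cross_tab(labels_a, labels_b)
```

In this toy case the last observation sits alone in both partitions, which mirrors how a single unusual patient can stay isolated regardless of the representation.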

Multidimensional Scaling

Multidimensional scaling (MDS) was performed on the -1-padded, 0-padded, and 10-Binned vectors. MDS techniques project data into a space with dimension smaller than that of the original data in such a way that the distance between the projections of any two data points is close to the distance between the original points. More technically, basic MDS projects the data into a lower-dimensional space so as to minimize the sum, over all pairs of points, of the squared difference between the distance between the points and the distance between their projections. If the data are projected into two dimensions, then it is possible to cluster them by eye. For this reason, we used MDS to represent the -1-padded, 0-padded, and 10-Binned vectors in two dimensions as below.


Figure 4: Multidimensional scaling for different vector types

The numbers in the plots are the numbers assigned to the patients. The top-left plot indicates that patient 24 may be different from the other patients in some way. The top-right plot suggests that the morphine patients (patients 23, 24, 25, and 34) may be different in some way from the hydromorphone patients, whose projected data points mostly lie very close to each other. The third plot suggests the same.
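For reference, the MDS criterion stated in words above can be written compactly. The notation is ours and not from the original analysis: x_1, ..., x_n are the patients' original vectors and z_1, ..., z_n are their two-dimensional projections.

```latex
\min_{z_1,\dots,z_n \in \mathbb{R}^2} \;
\sum_{1 \le i < j \le n}
\bigl( \lVert x_i - x_j \rVert - \lVert z_i - z_j \rVert \bigr)^2
```

Each term compares one original pairwise distance with the corresponding distance in the plot, so a small value of the sum means that the two-dimensional picture can be trusted when judging which patients are near each other.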


Part 2: Analysis With Regard to Varying Dimensionality

While the above analysis has provided insights into natural groupings in the data, the temporal association has largely been ignored. By incorporating the sequential aspect of the data, a new set of analysis techniques becomes available that has the potential to better characterize patients.

Clustering of Nonparametric Density and CDF Estimates

Using the Flexmix package in R

In order to observe each patient's use of the PCA with respect to time, we computed non-parametric density estimates of each patient's button presses over the course of the stay. Below are the density estimate plots for each patient, titled with the patient numbers. We first built a vector for each patient whose length is the length of the hospitalization in minutes and whose entries are the number of button presses at each minute. We then produced the density curves from these vectors, with time of hospitalization in minutes on the x-axis and density on the y-axis. We used default bandwidth values, so the bandwidth differs from patient to patient. We immediately observe some similar patterns in the shapes of the density curves. There are several bimodal curves, such as those of patients 2, 4, 8, 10, 11, 13, 22, and 23. Some of the other curves show trimodal distributions (patients 5, 12, and 16) or skewed distributions with tails (patients 15, 24, and 28). We also observe that patients 1 and 20 differ from the other patients in that their density curves are not continuous; each has a long pause in button presses during the hospitalization, and many presses occur during a short time interval, producing peaks higher than those of the other patients.


Figure 5: Non-parametric density estimate curves of each patient's button presses
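A density curve of this kind can be sketched with a Gaussian kernel density estimate over the times at which presses occurred. The source says only that default bandwidths were used, so the Gaussian kernel, the rule-of-thumb bandwidth, the function names, and the press times below are all assumptions made for illustration.

```python
import math
import statistics


def rule_of_thumb_bandwidth(times):
    """Silverman-style rule of thumb (an assumed default, not confirmed by the
    source): 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    sd = statistics.stdev(times)
    q1, _, q3 = statistics.quantiles(times, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:  # fall back to the standard deviation if the IQR is zero
        spread = sd
    return 0.9 * spread * len(times) ** -0.2


def kde(times, grid, bandwidth):
    """Gaussian kernel density estimate of `times`, evaluated on `grid`."""
    norm = 1.0 / (len(times) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - t) / bandwidth) ** 2) for t in times)
        for g in grid
    ]


# Hypothetical press times in minutes since PCA use started (one entry per press).
press_times = [5, 6, 8, 40, 42, 300, 305, 310, 312, 600]

bandwidth = rule_of_thumb_bandwidth(press_times)
grid = list(range(-500, 1501))  # one evaluation point per minute
density = kde(press_times, grid, bandwidth)
```

Because the bandwidth is derived from each patient's own press times, two patients with similar habits but different lengths of stay can end up with curves of different smoothness, which is worth keeping in mind when comparing the plots by eye.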

Having observed patterns among the density estimates of the patients, we decided to use trajectory clustering of the curves. Trajectory clustering is attractive because it allows us to cluster data of different sizes. In our case, therefore, we do not have to reconcile the different lengths of hospitalization by padding vectors or rescaling time.

An R package called flexmix was used for trajectory clustering. However, instead of clustering the curves in a non-parametric manner, flexmix finds clusters based on a regression equation by implementing a general framework for finite mixtures of regression models. Parameters are estimated in the flexmix function by the Expectation-Maximization (EM) algorithm. Since it is difficult to develop a regression model for densities of button presses, a cumulative representation of each patient's button use was developed. Instead of having the number of button presses at a given time on the y-axis, a counting process was used so that the y-axis shows the cumulative number of presses up to a given time. The y-axis was then rescaled to range from 0 to 1, giving the cumulative distribution function (CDF) of each patient's presses.
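The cumulative representation described above amounts to a running total divided by the final total. A minimal sketch, with a hypothetical function name and made-up press counts:

```python
from itertools import accumulate


def empirical_cdf(counts):
    """Cumulative share of all button presses made by the end of each minute."""
    if not counts:
        raise ValueError("empty count vector")
    running = list(accumulate(counts))  # counting process: presses so far
    total = running[-1]
    if total == 0:
        raise ValueError("patient has no button presses")
    return [r / total for r in running]


counts = [1, 0, 0, 2, 0, 1]  # hypothetical presses per minute
cdf = empirical_cdf(counts)
```

Every such curve starts near 0, never decreases, and ends at exactly 1, which is part of why the curves of different patients look alike once transformed.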

Another advantage of using flexmix is that we could add other variables to the regression model for clustering. Each patient's age, body mass index, and sex were included in the regression model alongside time, with the CDF of the button presses as the response. However, we are missing demographic information for two patients, 19 and 20, so these patients were excluded from the flexmix analysis. Two different regression models were tried. The first model regresses the button press CDF on time, BMI, age, and sex. The clusters returned from this model are straight lines with different slopes and coefficients for different clusters. The second model regresses the button press CDF on all the variables of the first model, with the addition of time^2. This model adds some curvature to the cluster trajectories and can therefore capture the shapes of the CDF curves better.

Figure 6: Trajectory clustering using flexmix model: press~time+bmi+age+sex

The plots in Figure 6 show the result of clustering using a linear model. Four trajectories were returned, the clusters holding 20, 3, 3, and 1 patients, respectively. We observe that straight lines do not capture the variability of the CDF curves.

Figure 7: Trajectory clustering using flexmix model: press~time+time^2+bmi+age+sex

The plots in Figure 7 show the result of clustering using a quadratic model. Four trajectories were again returned, the clusters holding 1, 3, 21, and 2 patients, respectively. The distribution of the number of patients is similar for both models. The first plot shows that the squared term of time allowed the cluster to adjust for the flat part of the patient's CDF. The other clusters show more or less straight lines with some curvature as time increases. Therefore, again, the regression model failed to capture the variation in the patients' CDF curves.

Using the CCToolbox package in Matlab

Instead of using the CDF curves of the patients, we used the Curve Clustering Toolbox (CCToolbox) package in Matlab to cluster the PDF curves of the patients. The Curve Clustering Toolbox implements a family of probabilistic model-based curve-aligned clustering algorithms. It offers a number of different mixture models; for our procedure, we used the spline regression mixture model for continuous curve alignment. CCToolbox also uses the EM algorithm. For our spline mixture model, we set the knots value to 49, which is a suggested default value, and 3 clusters were returned as a result.


Figure 8: Density curves in each cluster, with the mean curve of the densities in each cluster shown as a bold red line

Of the 32 curves, CCToolbox placed 12 density curves in one cluster, 8 curves in another cluster, and the remaining 12 curves in a third cluster. The plots above show the original density curves in each cluster and the mean of those curves as a bold red line. The first plot, labeled cluster 1, includes patients 1 and 20, who have higher peaks than the other patients. To show the distributions of the other patients more clearly, another plot was made with a reduced y-axis range.

We observe from the plots of CCToolbox clusters 2 and 3 that most of the patients with bimodal distributions are classified together. The bold average curves show that cluster 2 has a bimodal distribution with a slightly higher peak on the left, and that cluster 3 has a bimodal distribution with a slightly higher peak on the right. On the other hand, the first cluster is composed of a wide variety of curves, including our outlier patients 1 and 20. The mean curve of the densities in the first cluster shows several peaks, but overall it increases until about 400 minutes and then starts to decrease.

CCToolbox was unsuccessful in separating out the density curves with a single mode, such as those of patients 15, 24, and 28. These patients were classified into one of the 3 clusters above, as opposed to having a separate cluster. It is possible that, since there are not many patients with a unimodal distribution, their PDF curves were treated as being similar to the bimodal distributions during the clustering process.

With the flexmix package, the PDF curves had to be transformed to CDF curves in order to perform the classification. Since the CDF curves look quite similar across patients, and since it is difficult to develop a regression model that captures the curvature and variability of the CDF curves, flexmix seemed inappropriate for clustering the curves. On the other hand, using the spline model in CCToolbox, we were able to treat the PDF curves with no transformation, and it returned 3 clusters for the density curves of the patients. From the average density line of each cluster, we could describe patient behavior by 3 patterns: gradually demanding more opioids until about the 400th minute and then gradually demanding less, as suggested by cluster 1; and having two periods of high demand for opioids during the hospitalization, with the higher peak either earlier (cluster 2) or later (cluster 3).

Hidden Markov Models

Given the temporal nature of the data, it is natural to ask whether they reflect some latent phenomenon. It was hypothesized that the temporal demand data would provide information on when a patient is in pain and when a patient is comfortable. Hidden Markov Models (HMMs) were selected to model this. HMMs are like Markov models in that there is a sequence of states governed by state transition probabilities. However, in an HMM the states are not observable. Instead, a sequence of observations is visible, and the observation sequence depends on the sequence of states. Here, the observations are the demand, and the hidden (unobservable) states are the pain statuses (pain or comfortable).

One of the assumptions of an HMM is that the current observable state does not influence future hidden states. This assumption does not necessarily hold here, because demand sometimes results in drug administration, which definitely influences future demand and pain status. To address this issue, additional observable states and hidden states were created, all of which indicate whether drugs were administered in the previous time period (in this case, the previous minute). The set of observable states is: (demand, no drug in previous state), (no demand, no drug), (demand, drug), (no demand, drug). The set of hidden states is: (pain, no drug in previous state), (comfortable, no drug), (pain, drug), (comfortable, drug).

The Baum-Welch algorithm was used to estimate the transition probabilities matrix for the hidden states and the emission probabilities matrix. The emission probabilities matrix gives the probabilities of being in a particular observable state conditioned on a given hidden state. To initialize the algorithm, the following matrices were provided:

Transition Probabilities Matrix
p = pain, c = comfortable; n = no drug, d = drug

            p,n     p,d     c,n     c,d
    p,n     0.7     0.1     0.1     0.1
    p,d     0.7     0.2     0.05    0.05
    c,n     0.5     0.3     0.1     0.1
    c,d     0.2     0.2     0.4     0.2

Emission Probabilities Matrix
(0,0) = no demand, no drug; (0,1) = no demand, drug; (1,0) = demand, no drug; (1,1) = demand, drug

            0,0     0,1     1,0     1,1
    p,n     0.1     0       0.9     0
    p,d     0       0.7     0       0.3
    c,n     0.85    0       0.15    0
    c,d     0       0.6     0       0.4

The initial transition probabilities matrix was constructed arbitrarily, on the expectation that it would converge to a proper estimate during the algorithm's run. The emission probabilities matrix was constructed logically, such that 0 probability was assigned to combinations in which the drug/no drug aspect of the hidden state and that of the observable state are not the same. It also reflects the intuition that patients who are comfortable are probably not demanding drugs and that those who are in pain will probably continue to demand drugs.
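To make the role of the two initial matrices concrete, the sketch below evaluates the likelihood of a short observation sequence under them using the scaled forward algorithm, the pass that Baum-Welch repeats internally at every iteration. It does not perform the Baum-Welch re-estimation itself. The uniform starting distribution and the two-minute sequence are assumptions, since the source does not state an initial distribution.

```python
import math

# Hidden states: (pain status, drug in previous minute), in matrix row order.
HIDDEN = ["p,n", "p,d", "c,n", "c,d"]
# Observable symbols: (demand, drug in previous minute), in emission column order.
OBSERVED = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Initial matrices as given in the text.
TRANSITION = [
    [0.7, 0.1, 0.1, 0.1],
    [0.7, 0.2, 0.05, 0.05],
    [0.5, 0.3, 0.1, 0.1],
    [0.2, 0.2, 0.4, 0.2],
]
EMISSION = [
    [0.1, 0.0, 0.9, 0.0],
    [0.0, 0.7, 0.0, 0.3],
    [0.85, 0.0, 0.15, 0.0],
    [0.0, 0.6, 0.0, 0.4],
]
START = [0.25] * 4  # assumption: uniform over hidden states


def log_likelihood(symbols, start, transition, emission):
    """Log-probability of an observation sequence (scaled forward algorithm)."""
    n = len(start)
    alpha = [start[s] * emission[s][symbols[0]] for s in range(n)]
    log_lik = 0.0
    for t, symbol in enumerate(symbols):
        if t > 0:
            # Propagate one step through the hidden chain, then weight by emission.
            alpha = [
                sum(alpha[r] * transition[r][s] for r in range(n)) * emission[s][symbol]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


# Hypothetical two minutes: a press with no prior dose, then a press after a dose.
minutes = [(1, 0), (1, 1)]
symbols = [OBSERVED.index(m) for m in minutes]
ll = log_likelihood(symbols, START, TRANSITION, EMISSION)
```

The rescaling step matters in practice: a stay of more than a thousand minutes would otherwise drive the unscaled forward probabilities below floating-point range.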

The Baum-Welch algorithm was run on each patient's sequence of drug demand with respect to time (in minutes), taking into account when the drug was administered. An HMM was calculated for each patient, except for patients 1 and 23, for whom the algorithm failed with a nonconvergence error. From each patient's transition probabilities matrix and emission probabilities matrix, a vector of length 32 was created, containing the entries of both 4x4 matrices. Each patient is now represented by this vector. There are 30 vectors in total.

Multidimensional scaling was performed to reduce the vectors to two dimensions in order to get an idea of the group structure. All 30 vectors are plotted below.


Figure 9: Multidimensional scaling on vectors representing HMM matrices

Using demographic values (age, gender) and drug type, we colored the observations to determine whether the group structure in the plot corresponded to them. As stated earlier, patients 19 and 20 do not have this information, so they are omitted from the plots. It appears that the group structure probably does not correlate with age, gender, or drug type.

Figure 10: Multidimensional scaling plots with demographics and drug type indicated.

K-means clustering was performed on the vectors to determine whether they can be used to cluster the patients' drug demand density curves. To determine the best number of clusters n, values of n from 2 through 6 were tried. The total within-cluster sum of squares for each partition was used to determine that n = 4 would be the best value for creating meaningful clusters.


Figure 11: K-means within sum-of-squares (left) with the 4 clusters indicated (right)
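The quantity plotted on the left, the total within-cluster sum of squares, can be computed for any partition as sketched below. The function name and the four toy points are hypothetical; in practice the labels would come from a k-means run for each candidate number of clusters.

```python
def total_within_ss(points, labels):
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    total = 0.0
    for members in groups.values():
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum((x - c) ** 2 for p in members for x, c in zip(p, center))
    return total


# Hypothetical points forming two tight pairs.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
one_cluster = total_within_ss(points, [0, 0, 0, 0])
two_clusters = total_within_ss(points, [0, 0, 1, 1])
```

The total always falls as clusters are added, so the usual heuristic is to look for the number of clusters beyond which the decrease levels off rather than for the smallest value.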

The 4 clusters of density curves are given below. The curves colored red correspond to patients who had morphine administered to them. Again, it is unknown what drug was administered to patients 19 and 20, so their curves are colored blue. The black curves correspond to patients who received hydromorphone.

Cluster 1:

These densities are distinctive in that they possess at least one very sharp peak at or before 500 minutes.

Cluster 2:


Drug demand is high at all times of the stay.

Cluster 3:

All of these densities have peaks early during the course of the stay.

Cluster 4:


These densities tend to have multiple local maxima.

Judging by the density curves, the patients within each cluster did not have any immediately apparent commonalities.

Fitting a single overall HMM to all of the observations gives the following matrices.

Transition matrix:

               State 1       State 2        State 3        State 4
    State 1    0.52369519    1.165936e-09   0.4643603612   1.194444e-02
    State 2    0.02775888    8.896852e-01   0.0082478338   7.430811e-02
    State 3    0.65124380    3.192702e-10   0.3487536097   2.587727e-06
    State 4    0.04603114    9.539056e-01   0.0000632308   4.654105e-156

Emission probabilities matrix:

               0,0           1,0            1,1            0,1
    State 1    0.999236763   0.0005757242   1.875125e-04   0
    State 2    0.877804127   0.1221958730   1.272086e-17   0
    State 3    0.949809599   0.0001705347   5.001987e-02   0
    State 4    0.002868201   0.0818963137   9.152355e-01   0

It is unknown which hidden state each State corresponds to, but the emission probabilities matrix shows some indication of the impossibilities that we had hoped it would detect (e.g., a value close to 0 when the drug-administered-in-previous-minute information differs between an observable state and a hidden state).

Conclusion


We used a variety of methods to attempt to characterize patients by their PCA usage. We found that these different methods gave different partitions of the patients, though all divided the patients into three or four groups. Many of the methods separated the hydromorphone patients from the morphine patients.

An idea to pursue in the future is to predict the total analgesic use of a new patient by finding which of the original 32 patients the new patient is most similar to. This would give the medical staff an idea of what to expect from a new patient based on the data.


References

Leisch, Friedrich. "A General Framework for Finite Mixture Models and Latent Class Regression in R." Journal of Statistical Software 11, no. 8 (2004): 18.

Cadez, Igor, Scott Gaffney, and Padhraic Smyth. A General Probabilistic Framework for Clustering Individuals. University of California, Irvine. (2000)
