Statistical methods for knowledge discovery in adverse drug reaction surveillance

G. Niklas Norén

Stockholm University

© G. Niklas Norén, Stockholm 2007

Cover photography by G. Niklas Norén

ISBN 91-7155-411-4 pp. 1–41

Typeset by LaTeX

Printed in Sweden by Universitetsservice AB, Stockholm 2007

Distributor: Department of Mathematics, Stockholm University

Abstract

Collections of individual case safety reports are the main resource for early discovery of unknown adverse reactions to drugs once they have been introduced to the general public. The data sets involved are complex and based on voluntary submission of reports, but contain pieces of very important information. The aim of this thesis is to propose computationally feasible statistical methods for large-scale knowledge discovery in these data sets. The main contributions are a duplicate detection method that can reliably identify pairs of unexpectedly similar reports and a new measure for highlighting suspected drug–drug interaction.

Specifically, we extend the hit-miss model for database record matching with a hit-miss mixture model for scoring numerical record fields and a new method to compensate for strong record field correlations. The extended hit-miss model is implemented for the WHO database and demonstrated to be useful in real world duplicate detection, despite the noisy and incomplete information on individual case safety reports. The Information Component measure of disproportionality has been in routine use since 1998 to screen the WHO database for excessive adverse drug reaction reporting rates. Here, it is further refined. We introduce improved credibility intervals for rare events, post-stratification adjustment for suspected confounders and an extension to higher order associations that allows for simple but robust screening for potential risk factors. A new approach to identifying reporting patterns indicative of drug–drug interaction is also proposed. Finally, we describe how imprecision estimates specific to each prediction of a Bayes classifier may be obtained with the Bayesian bootstrap. Such case-based imprecision estimates allow for better prediction when different types of errors have different associated loss, with a possible application in combining quantitative and clinical filters to highlight drug–ADR pairs for clinical review.

List of Papers

This thesis is based on the following original publications, which are referred to in the text by their Roman numerals.

I Norén, G. N., Orre, R., Bate, A., Edwards, I. R. (2007). Duplicate detection in adverse drug reaction surveillance. Data Mining and Knowledge Discovery. Published on-line.

II Norén, G. N., Bate, A., Orre, R., Edwards, I. R. (2006). Extending the methods used to screen the WHO drug safety database towards analysis of complex associations and improved accuracy for rare events. Statistics in Medicine, 25(21):3740–3757.

III Hopstadius, J., Norén, G. N., Bate, A., Edwards, I. R. (2007). Adjustment for potential confounders in adverse drug reaction surveillance. Submitted for publication.

IV Norén, G. N., Sundberg, R., Bate, A., Edwards, I. R. (2007). A statistical methodology for drug–drug interaction surveillance. Submitted for publication.

V Norén, G. N., Orre, R. (2005). Case based imprecision estimates for Bayes classifiers with the Bayesian bootstrap. Machine Learning, 58(1):79–94.

Reprints of I, II and V were made with kind permission from the publishers.

Contents

Part I: Thesis summary

1 Introduction
  1.1 Aim
2 Adverse drug reaction surveillance
  2.1 Individual case safety reports
  2.2 The WHO database
  2.3 Adverse drug reaction signal detection
3 Knowledge discovery in adverse drug reaction surveillance
  3.1 Context
  3.2 Process
  3.3 Disproportionality
  3.4 Shrinkage
  3.5 Pattern discovery and detection
  3.6 Facilitating interpretation
  3.7 Future directions
4 Overview of the papers
  4.1 Paper I
  4.2 Paper II
  4.3 Paper III
  4.4 Paper IV
  4.5 Paper V
Acknowledgements
Bibliography

Part I:

Thesis summary

1. Introduction

It is in the nature of pharmaceutical development that the full safety profile of a new medicinal product will not be known at the time it is introduced to the general public. Because randomised clinical trials are limited in both the types and numbers of patients exposed, continued safety monitoring of drugs is in the interest of patients, regulatory authorities and pharmaceutical companies (Finney 1966, Evans 2000). Individual case safety reports are submitted by health professionals based on suspected adverse drug reaction (ADR) incidents (Edwards and Aronson 2000) observed in real world clinical practice. They remain one of the best resources for early post-marketing discovery of potential public health or patient safety issues. They are rich sources of information, but anecdotal in nature. The reliance on voluntary submission, the variation in quality of information and the large number of new reports submitted to national and international organisations every year provide a range of interesting statistical challenges.

1.1 Aim

The overall aim of this thesis is to propose improved statistical methods for knowledge discovery in collections of individual case safety reports. I proposes a new method for automated duplicate detection based on the hit-miss model introduced for statistical record linkage (matching records across data sets) by Copas and Hilton (1990). An extended hit-miss model that handles numerical record fields and compensates for correlations between record fields is implemented for the WHO database and demonstrated to be useful in real world duplicate detection. II proposes improved credibility intervals, a post-stratification approach to adjustment for confounding variables and an extension to higher order associations for the Information Component (IC) measure of disproportionality used to screen the WHO database for excessive ADR relative reporting rates. III demonstrates that the post-stratification adjustment of the observed-to-expected ratio for suspected confounders adopted for the IC in II may lead to spurious underestimation in the presence of any very small strata in a stratified data set. A comparison to a literature reference indicates that while routine adjustment for some potential confounders in first pass screening of collections of individual case safety reports does improve performance, the magnitude of this improvement is modest compared to the improvement from a triage (prioritisation) criterion requiring reports from at least two countries before a drug–ADR pair is highlighted for clinical review. This suggests that confounding may have less impact on the analysis of individual case safety reports than previously believed. IV introduces a new measure of drug–drug interaction for collections of individual case safety reports. Unlike methods proposed previously for this purpose, it defines interaction as departure from a baseline model with independent attributable risk. V introduces a Bayesian bootstrap method for estimating the uncertainty in Bayes classification associated with each individual prediction. We demonstrate how this information can be used to improve performance, when different types of errors have different associated loss, with a possible application in selecting drug–ADR pairs for detailed clinical review.

2. Adverse drug reaction surveillance

The analysis of individual case safety reports is the cornerstone of early post-marketing ADR detection (Rawlins 1988). Whereas large, formal drug safety studies are useful to test specific hypotheses related to drug safety, they are not suitable for continuous monitoring with the aim of detecting previously unsuspected ADRs, as early as possible. In the context of this PhD thesis, ADR surveillance refers exclusively to drug safety monitoring based on individual case safety reports. It thus excludes other post-marketing efforts such as the intensive monitoring programs of New Zealand and the United Kingdom, as well as safety monitoring based on health registries and hospital-based safety monitoring. For comprehensive overviews of post-marketing ADR surveillance, see Lindquist (2003) and Bate (2003).

2.1 Individual case safety reports

Individual case safety reports communicate genuine clinical concerns from observant health professionals (Edwards 1999). As they are based on actual patients in real world clinical practice, their collection and analysis increase the chance to discover ADRs that are due to drug–drug interaction, or that affect patients with certain medical predispositions or patients in subgroups that tend to be excluded from pre-marketing clinical trials, such as children or pregnant women. In addition, the large numbers of patients exposed and the unlimited follow-up time available considerably increase the chance to detect ADRs that are rare or that occur only after extended periods of use.

An example of an authentic individual case safety report is provided in Figure 2.1. Much of the information on these reports can be originally provided as free text, some of which is later encoded as structured information upon database entry. This is usually done by trained personnel at pharmaceutical companies or at national authorities. The encoding of observed ADR incidents in terms of standardised terminology is a critical part of the preprocessing. One potential pitfall is the risk of misinterpretation when the ADR encoding is performed by someone who has never actually met the patient. Variation in coding across regions and time periods may lead to systematic differences that can affect subsequent data analysis. A general problem is that several ADR terms are often applicable to a given incident. Thus, exploratory analysis focusing on single ADR terms may fail to include all relevant reports, a phenomenon which has been referred to as ‘signal fragmentation’ (Purcell 2003). While in the follow-up of specific issues, this can be remedied by specifying groups of relevant ADR terms for the issue of interest, it is not obvious how such strategies can be easily automated for routine exploratory analysis.

Figure 2.1: Sample individual case safety report. Reprinted with kind permission of the Adverse Drug Reactions Unit at the Therapeutic Goods Administration of Australia.

Individual case safety reports refer to suspected ADR incidents and some adverse events observed in association with drug prescription will in reality be coincidental, due to concomitant medication or natural progression of the underlying disease. At the same time, not all ADR incidents that actually occur are identified as such and eventually reported to the national drug safety centres. The degree of under-reporting is unknown but can be expected to vary with the severity of the suspected ADR, across geographical regions and time periods. There may also be variation in the propensity to report suspected ADRs during the life-span of a drug and in response to any attention to suspected drug safety issues in the public or scientific media. The categories of health professionals who are allowed to submit reports also differ over time and between regions. Some countries allow only medical doctors to submit reports, whereas others accept reports from medical nurses and pharmacists as well. In addition, some countries encourage direct consumer reporting. Unsurprisingly, the propensity to report suspected ADRs of different types varies considerably between different categories of reporters (Savage 1985).

An important characteristic of individual case safety report submission is that separate reports sometimes have a common origin and therefore cannot be considered as independent pieces of information (Finney 1973). This may distort automated knowledge discovery and mislead clinical review. The most obvious cause of non-independent reports is report duplication, where a single suspected ADR incident results in several reports. This phenomenon is discussed at some length in I. More subtle examples include groups of reports provided by the same health professional, such as those from the Norwegian dentist discovered in I, reports from the same clinical study (sometimes mislabelled as spontaneous reports) or separate reports for the same patient at different points in time. If single individuals are responsible for encoding large numbers of reports, this may also induce superficial similarity between reports. A potential example of this is the group of over 600 very similar reports originally collected by a single law firm, discovered in IV. Violated independence assumptions differ from other data quality issues in that they do not relate to the quality of single reports, but to the quality of collections of reports. Even upon the confirmation that a pair of reports are indeed duplicates it is not obvious how to proceed: should the suspected duplicates be flagged or should one of them perhaps be removed from the data set (if so, which one)?

Drugs (Anatomical Therapeutic Class)                  Reports  ADRs (System Organ Class)                        Reports
Selective serotonin reuptake inhibitors (SSRI)        193,939  Body as a whole - general disorders            1,218,425
Antiinfl. prep. non-steroids for topical use (NSAID)  180,770  Skin and appendages disorders                  1,070,189
Platelet aggregation inhibitors excl. heparin         179,226  Gastro-intestinal system disorders               902,238
ACE inhibitors, plain                                 171,706  Central & peripheral nervous system disorders    853,883
Benzodiazepine derivatives                            157,898  Psychiatric disorders                            677,227

Table 2.1: The most commonly reported groups of drugs and ADRs in the WHO database (note that each report may list more than one drug and more than one ADR).

2.2 The WHO database

The Uppsala Monitoring Centre maintains and analyses the world’s largest collection of individual case safety reports. As of December 2006, the WHO database contained over 3.8 million reports, with a current yearly growth of over 200,000 reports (see Figure 2.2). The database is held on behalf of the countries participating in the WHO Programme for International Drug Monitoring, whose number has continued to grow from the founding 10 countries in 1968 to over 80 member countries at the end of 2006. The international coverage allows rare but important public health or patient safety issues to be detected earlier after drug launch than if based on isolated analysis of national data sets (Olsson 1998). Variation between countries in the range of available drug substances, populations at risk, reporting culture and regulation may influence relative reporting rates and make knowledge discovery more complicated in international data sets. At the same time, this diversity is an invaluable asset in detecting public health or patient safety issues related to for example ethnic or dietary ADR risk factors. Thus, even though most reports in the WHO database come from the USA and other industrialised nations, the worldwide coverage of the WHO programme is perhaps its greatest strength.

As is clear from Figure 2.2, the vast majority of reports in the WHO database are so-called spontaneous reports that refer to observations in regular clinical practice. However, a small minority are from intensive monitoring programs or clinical studies. Such atypical reports should in principle be labelled as such, but occasional mislabellings do occur. Thus, they cannot reliably be excluded from the analysis.

Figure 2.2: Characteristics of the WHO database. Panels: a. Database growth (number of reports by year, 1970–2005); b. Biggest contributors (United States 47%, United Kingdom 12%, Germany 6%, Canada 5%, France 5%, Australia 5%, other countries 20%); c. Types of reports (spontaneous reports 95%, other 5%); d. Number of drugs per report; e. Number of ADRs per report; f. Patient age distribution (number of reports by patient age in years).

Table 2.1 indicates what groups of drugs and ADRs have been reported most often during the entire life span of the WHO database. From Figure 2.2, it is clear that most reports list only one suspected drug and between one and four ADRs, but there are reports that deviate from this general pattern, and list very large numbers of drugs and ADRs. The most striking aspect of the empirical age distribution in Figure 2.2 is perhaps the large number of reports for children less than two years of age. A large proportion of these relate to suspected adverse reactions to vaccines. Another interesting phenomenon is the digit preference on 0 and 5 for encoding patient age.

2.3 Adverse drug reaction signal detection

The detection of early warnings related to potential public health or patient safety issues is the main aim of collecting and analysing individual case safety reports. In the context of ADR surveillance, the WHO defines a signal as:

"Reported information on a possible causal relationship between an adverse event and a drug, the relationship being unknown or incompletely documented previously. Usually more than a single report is required to generate a signal, depending upon the seriousness of the event and quality of the information." (Edwards and Biriell 1994)

As is clear from the definition, single reports in isolation rarely motivate the communication of an early warning of a potential ADR, but there are exceptional examples where single reports of very high quality do (Meyboom et al. 1997). Particularly valuable pieces of information in this respect are those that indicate the effect on the ADR of withdrawing the suspected medication (so-called dechallenge intervention), and the effect of re-exposing the patient to the suspected treatment, after a successful dechallenge (so-called rechallenge intervention) (Edwards et al. 1990). Moreover, Aronson and Hauben (2006) argue that there are certain types of ADRs for which single, well documented incidents may motivate early warning, much in the spirit of the triage algorithms proposed by Ståhl et al. (2004).

Early warning of a potential ADR is possible even in the absence of any individually very strong reports, if there is a large enough number of reports on the drug–ADR pair of interest (Edwards et al. 1990). This is true in particular when alternative systematic explanations to excessive reporting rates, such as reporting biases or strong confounding, can be dismissed and the relative reporting rate remains excessive even after suspected duplicates have been removed.

Figure 2.3: Signal detection process

The aim of ADR signal detection is to generate, strengthen and refine hypotheses related to suspected drug toxicity. Hypothesis testing is not possible on account of the inherently non-systematic nature of data collection and the lack of proper comparison groups. In-depth clinical evaluation and scrutiny of reports remain at the core of the ADR signal detection process. However, the WHO database receives tens of thousands of reports every month and this massive inflow of reports requires efficient computational methods to help clinical experts focus on the groups of reports most likely to represent important public health or patient safety issues (Meyboom et al. 2002). As indicated in Figure 2.3, the signal detection process in routine use on the WHO database consists of a combination of automated knowledge discovery methods (Bate et al. 1998), triage (prioritisation) algorithms and clinical review (Ståhl et al. 2004). The knowledge discovery methods highlight drug–ADR pairs with unexpectedly large numbers of reports relative to the average reporting rates in the database. Triage algorithms use a combination of quantitative and qualitative information to focus attention on the most urgent issues for follow-up (Ståhl et al. 2004). Reports related to drug–ADR pairs singled out by the triage algorithm are forwarded to a panel of international experts for clinical review. In the context of the clinical review, pattern discovery methods may often be useful to profile larger groups of reports and suggest alternative explanations to observed excessive reporting rates. Hypotheses of suspected ADRs first highlighted in automated knowledge discovery that remain after clinical review are routinely communicated to the drug safety community, and some have been published in the mainstream medical literature (Coulter et al. 2001, Sanz et al. 2005). However, the risk of distortion from undiscovered data quality problems and the difficulty of obtaining complete, detailed information on reported ADR incidents mean that signals of suspected ADRs often remain tentative, even after clinical review.

3. Knowledge discovery in adverse drug reaction surveillance

Vast improvements in data storage capacity over the last decades have spurred ever increasing ambitions to analyse large, complex data sets not originally collected for the purpose of statistical analysis. Such investigations require data analysis methodology that scales well with increasing amounts of data and that focuses on discovery and exploration rather than on inference. This area of research and application, on the border between mathematical statistics and computer science, is referred to as knowledge discovery or data mining.

Fayyad et al. (1996) describe data mining as one step in a more general knowledge discovery process. Mannila (1996) and Hand (1998) emphasise the similarity between data mining and exploratory statistical analysis, the latter characterising the difference as one primarily related to data set size and properties: in data mining, contamination, nonstationarity and biases are standard. On account of the complex data sets involved, interpretability is often a main consideration, which may favour simplicity at the expense of prediction accuracy (Glymour et al. 1997). An important dividing line is the choice between model based inference and algorithmic approaches (Breiman 2001). Whereas much of the research on knowledge discovery has been driven by computer science, key contributions from the statistical community include the clarification of inferential processes underlying algorithmic methods, insight into the bias–variance trade-off in determining model complexity, methods for quantifying uncertainty and placing emphasis on the impact on interpretation of potential distortions such as confounding or selection biases (Elder and Pregibon 1996, Glymour et al. 1997, Efron 2001).

In contrast with the more rigid framework for hypothesis testing, knowledge discovery is usually an interactive and iterative process of increasingly refined hypothesis generation. In my view, it should combine an unintimidated attitude towards the analysis of problematic and complex data sets with a proper understanding and clear statement of the limitations in nature and strength of the conclusions that can be drawn.

3.1 Context

Collections of individual case safety reports clearly contain important pieces of rich and very useful information (Finney 1973, Edwards 1997), but they constitute an inherently non-random sample. The presence of reporting biases and violated independence assumptions discussed in Section 2 renders summary statistics potentially deceptive. In particular, the presence of non-independent reports can lead to optimistic precision estimates and invalidate standard tests for association (Finney 1971). As a consequence, the place for statistical methodology in the analysis of collections of individual case safety reports is somewhat out of the ordinary. Its main focus is on providing a framework for effective hypothesis generation and refinement, rather than on hypothesis testing (Bate 2003). Methods for reliably identifying elevated ADR reporting rates in collections of individual case safety reports are already part of routine drug safety signal detection (Ståhl et al. 2004). In the future, methods for highlighting suspected drug–drug interaction, groups of non-independent reports or reporting patterns involving larger sets of drugs and ADRs should allow for even more sophisticated use of this valuable source of information.

The emphasis on hypothesis generation and refinement applies throughout this thesis: the aim of the record matching algorithm in I is to highlight likely duplicates for manual review and the aim of II, III and IV is to determine the most effective approach to highlighting apparently excessive ADR reporting rates for further follow-up. The purpose of implementing the methods in V for prioritisation of drug–ADR pairs for further follow-up as discussed in Section 4.5 would also be effective hypothesis generation.

In knowledge discovery, large numbers of possible associations and patterns are considered simultaneously. Familywise error rates that reflect the probability that any highlighted association corresponds to a false positive are usually less relevant in this context, because all open-ended investigations are bound to produce some false positives. Performance is better evaluated in terms of measures that indicate the proportion of false positives that can be anticipated in a specific study, such as false discovery rates. In our work, we have used two related measures of performance from the literature on Information Retrieval: precision (the number of true positives over the sum of true and false positives) and recall (the number of true positives over the sum of true positives and false negatives). Precision–recall graphs that indicate how the precision and recall vary by the threshold for clinical review are used in both I and III. They provide an informative overview of performance, independent of the selected threshold.
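
The definitions of precision and recall given above can be made concrete with a minimal Python sketch. The scores, labels and thresholds below are hypothetical illustration values, not data from I or III, and the function name precision_recall is an assumption introduced here for illustration only.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every item with score >= threshold is highlighted."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, is_true in pairs if s >= threshold and is_true)
    fp = sum(1 for s, is_true in pairs if s >= threshold and not is_true)
    fn = sum(1 for s, is_true in pairs if s < threshold and is_true)
    # Undefined ratios (nothing highlighted, or no true cases) are reported as NaN.
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical screening scores and reference labels (True = genuine finding).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]

# One (threshold, precision, recall) point per candidate threshold.
curve = [(t, *precision_recall(scores, labels, t)) for t in (0.3, 0.5, 0.75)]
for threshold, precision, recall in curve:
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold in this toy example trades recall for precision. Sweeping the threshold over all observed scores and plotting the resulting pairs would give a precision–recall graph of the kind described above.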

Figure 3.1: Exploratory analysis of collections of individual case safety reports

3.2 Process

Fayyad et al. (1996) define knowledge discovery as:

"The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data."

The knowledge discovery process is not limited to actual data analysis but includes: data collection, cleaning and preparation; reduction and projection; data analysis and interpretation, and finally dissemination, incorporation into existing structures and action based on discovered knowledge. It thus entails the entire ADR signal detection process outlined in Section 2.3, from the collection of reports and their pooling in an international database, through data preparation and transformation including conversion from free text to structured information, data cleaning and duplicate detection, via disproportionality analysis and triage algorithms to clinical review, and finally communication to national centres, pharmaceutical companies and the general public.

The statistical methodology developed in the context of this thesis is applied at two different stages of the knowledge discovery process for ADR surveillance, as indicated in Figure 2.3. On one hand, disproportionality analysis is a core component in screening for excessive ADR reporting rates in first pass analysis of the database. On the other hand, pattern discovery methods are useful in assisting clinical review and highlighting interesting aspects of specific groups of reports in more detailed investigations. Figure 3.1 proposes a general framework for such exploratory analysis. For the purpose of illustration, assume that the data subset of interest consists of all reports involving a particular drug D. At the outset of the exploratory analysis, simple descriptive information such as the total number of reports listing D and from what countries and during what time periods they have been submitted, may be very useful. Together with lists of the most commonly co-reported drugs and ADRs, as well as empirical distributions for patient age and gender, this provides a descriptive overview of the reporting of D which can serve as a useful reference for subsequent discoveries.

Experienced data analysts may react directly to descriptive information that contradicts their subject matter knowledge. For example, a domain expert familiar with the WHO database may react to the observation that a suspiciously large proportion of the reports in a subgroup of interest have been submitted from a country with a low overall reporting rate. The middle box in Figure 3.1 is an attempt to formalise such comparative data analysis. Contrasts between the group of reports of interest and a comparison group (e.g. the database as a whole or all reports involving a drug in the same class of drugs as D) provide insight into what properties of the data subset differentiate it from the comparison group. For example, it may turn out that the relative reporting rate of a rare ADR for D by far exceeds that in the database as a whole. Such discrepancies may well be more enlightening than information on what the most commonly reported ADR is in absolute terms. The discussion of such disproportionality analysis is further extended in Section 3.3.

Both descriptive and comparative studies may be misleading when the group of interest contains distinct subgroups. For example, if D is prescribed on one hand to young males and on the other hand to elderly females, the summary information that the average patient age on reports listing D is 43 years and that the overall proportion of females is 52%, conveys a very insufficient overview. Clustering algorithms allow for automated partitioning of data, with the aim of detecting latent structure, and may allow for much more relevant subsequent descriptive or comparative data analysis, as indicated by Figure 3.1.
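
The point about pooled summaries hiding distinct subgroups can be illustrated with a small Python sketch. The six (age, sex) records are invented for illustration and do not reproduce the figures quoted above, and a fixed age cut-off is assumed here as a stand-in for the output of a clustering algorithm.

```python
from statistics import mean

# Hypothetical (age, sex) pairs for reports listing drug D; not real data.
reports = [(19, "M"), (22, "M"), (25, "M"), (70, "F"), (74, "F"), (78, "F")]


def summarise(rows):
    """Return (mean age, proportion of females) for a group of reports."""
    ages = [age for age, _ in rows]
    share_female = sum(1 for _, sex in rows if sex == "F") / len(rows)
    return mean(ages), share_female


# Pooled summary over all reports.
overall = summarise(reports)

# Assumption: a fixed cut-off at 40 years stands in for a clustering result.
younger = summarise([r for r in reports if r[0] < 40])
older = summarise([r for r in reports if r[0] >= 40])

print("overall:", overall)
print("younger:", younger)
print("older:  ", older)
```

In this toy data set the pooled mean age describes none of the patients, whereas the two subgroup summaries recover the young male and elderly female groups.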

In addition to the iteration of automated partitioning, description and comparison described above, there are other methods for pattern discovery in collections of individual case safety reports. Record matching methods such as that adapted for duplicate detection in I can be used to detect groups of unexpectedly similar reports. Modified Hopfield networks and clustering algorithms such as those evaluated in Orre et al. (2005) may allow groups of often recurring ADRs (syndromes) to be identified. Similarly, interaction detection methods such as those in II and IV can be used to highlight suspected ADR risk factors.

At the outset of a large exploratory study, it is rarely possible to specify a fully automated, all-purpose approach to exploratory data analysis that is appropriate for all possible questions and patterns of potential interest. In addition, knowledge discovery often produces results that relate not to the primary study objective, but to fundamental properties of the data or of the data collection process. Thus, data cleaning and analysis are in practice intertwined, so that the correction of a data quality problem highlighted in initial data analysis allows for more refined subsequent data analysis. For example, in screening the WHO database for reporting patterns indicative of suspected drug interaction in IV, some larger groups of non-independent reports were highlighted. Their removal may allow for more accurate subsequent studies of drug–drug interaction in the WHO database.

3.3 Disproportionality

The frequency or relative frequency of a certain event (or set of events) in a database is sometimes of direct interest. However, in many knowledge discovery applications the discrepancy between the observed (relative) frequency and its expected value under some baseline model is of greater interest. An example from the analysis of purchasing patterns in supermarket sales data is that even if milk is the product most commonly purchased together with the product of interest, because this is true of most products, it may be more enlightening to point out that, for instance, grapefruit juice is purchased four times as often together with the product of interest as overall in the database. Such contrasts provide the basis of disproportionality analysis, which focuses on identifying events whose relative frequency in a given subgroup deviates substantially from the relative frequency of the same event in a given comparison group.

Most modern methods for screening collections of individual case safety reports for excessive ADR reporting rates are based on disproportionality relative to the rest of the database. This is true of the Information Component (IC) (Bate et al. 1998), the Empirical Bayes Geometric Mean (EBGM) (DuMouchel 1999), the Proportional Reporting Ratio (PRR) (Evans et al. 2001) and the Reporting Odds Ratio (ROR) (Egberts et al. 2002). All these measures compare the number of reports on a certain drug–ADR pair to an expected number of reports conditional on the overall reporting rates for the drug and the ADR in the database. The original idea of making comparisons with the database itself as reference goes back to the early days of ADR surveillance (Patwary 1969, Finney 1974). In addition to the lack of reliable external estimates for the international usage of different drugs, an advantage of disproportionality analysis is that marginal reporting biases that affect only the drug or only the ADR cancel out (at least approximately) in a measure of disproportionality. Thus, even though the reporting rates are likely to be higher for serious than for harmless ADRs, this does not have a considerable impact on the measures of disproportionality, as long as the reporting bias affects all drugs to an equal extent. The main drawback of disproportionality measures is that they rely on comparison to the reporting of other drug–ADR pairs. Thus, if a particular drug–ADR pair is massively reported, it will inflate the overall reporting rates for both the drug and the ADR, sometimes to the extent that excessive reporting rates for the same drug with another ADR or for another drug with the same ADR are masked (Evans 2004, Hauben et al. 2005).
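The masking effect can be illustrated with a small numerical sketch. The function name and all counts below are hypothetical and chosen only for illustration; the expected count is simply the product of the marginal counts divided by the total number of reports, as under a baseline independence model.

```python
def observed_to_expected(n_xy, n_x, n_y, n_total):
    """Observed-to-expected ratio for a drug-ADR pair under independence.

    n_xy: reports listing both the drug and the ADR
    n_x, n_y: reports listing the drug / the ADR
    n_total: all reports in the database
    """
    expected = n_x * n_y / n_total
    return n_xy / expected


# Hypothetical counts: 20 reports on drug x with ADR y2.
before = observed_to_expected(n_xy=20, n_x=1_000, n_y=2_000, n_total=1_000_000)

# Now suppose 9,000 additional reports arrive on drug x with a different ADR y1.
# The count for (x, y2) is unchanged, but the marginal count for x grows.
after = observed_to_expected(n_xy=20, n_x=10_000, n_y=2_000, n_total=1_009_000)

print(f"OE before massive reporting: {before:.2f}")
print(f"OE after massive reporting:  {after:.2f}")
```

In this made-up scenario the ratio for the unaffected pair drops from 10 to roughly 1, although nothing about that pair has changed.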

Assume the following contingency table based on the cross-classification of reports according to whether they involve a drug x and an ADR y:

            y      not y
  x         a      b
  not x     c      d

The basis for pairwise disproportionality analysis in the WHO database is an observed-to-expected ratio OE contrasting the relative reporting rate of y given x to the overall relative reporting rate of y in the database. With the notation used in the above contingency table, the observed number of reports on y given x is a, and the expected number of reports conditional on the table marginals is the product of the marginal relative reporting rate of y and the total number of reports on x: $\frac{a+c}{a+b+c+d} \cdot (a+b)$. The observed-to-expected ratio is:

\[
\mathrm{OE} = \frac{a/(a+b)}{(a+c)/(a+b+c+d)} \tag{3.1}
\]

The same measure of disproportionality has been used also in the context of association rule analysis (Agrawal et al. 1996), where it is referred to as the lift or the interest of an association rule involving x and y (Silverstein et al. 1998, Hastie et al. 2001). The similarity between the observed-to-expected ratio and other measures of disproportionality proposed for the analysis of individual case safety reports is clear. The Proportional Reporting Ratio (PRR) based on the above contingency table is (Evans et al. 2001):

\[
\mathrm{PRR} = \frac{a/(a+b)}{c/(c+d)} \tag{3.2}
\]

and the corresponding Reporting Odds Ratio (ROR) is (Egberts et al. 2002):

\[
\mathrm{ROR} = \frac{a/b}{c/d} \tag{3.3}
\]
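As a minimal sketch, the three ratios in (3.1)–(3.3) can be computed directly from the four cells of the contingency table. The function name and the example counts are hypothetical; the sketch makes no attempt to handle zero cells, for which the PRR and ROR are undefined.

```python
def disproportionality(a, b, c, d):
    """Return OE, PRR and ROR for a 2x2 table with cells a, b, c, d.

    a: reports with drug x and ADR y      b: drug x without ADR y
    c: ADR y without drug x               d: neither
    Assumes b, c and d are non-zero (no zero-cell handling in this sketch).
    """
    n = a + b + c + d
    oe = (a / (a + b)) / ((a + c) / n)    # equation (3.1)
    prr = (a / (a + b)) / (c / (c + d))   # equation (3.2)
    ror = (a / b) / (c / d)               # equation (3.3)
    return {"OE": oe, "PRR": prr, "ROR": ror}


# Hypothetical counts for a rare drug-ADR pair in a database of 1,000,000 reports.
measures = disproportionality(a=10, b=990, c=1_990, d=997_010)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

For rare events such as these, the three measures are numerically close, because the reference group (all reports, reports without x, or reports with neither x nor y) is dominated by d in each case.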

The IC measure of disproportionality used in routine knowledge discovery for the WHO database is essentially a conservative version of $\log_2 \mathrm{OE}$, which tends to 0 for rare drug–ADR pairs. The moderation in magnitude is referred to as shrinkage (for details see Section 3.4 below). The availability of thoroughly evaluated shrinkage measures is the main advantage of the OE ratio over the PRR and the ROR. Other strengths are the link to Bayes classifiers described in Norén (2005) and the somewhat better robustness to zero counts in the contingency table than for the PRR and ROR (van Puijenbroek et al. 2002). The main limitation is that the observed-to-expected ratio provides a less distinct contrast between the group of interest and the reference group by including the group of interest in the reference. Another limitation is that the observed-to-expected ratio for a given pair of events by definition cannot exceed the inverse of the marginal relative reporting rate for each event. For example, if one of the events has an overall relative reporting rate of 0.5, then observed-to-expected ratios involving this event can at most reach 2 (if the relative reporting rate of the first event conditional on the other event is 1.00). In practice this limits the usefulness of the observed-to-expected ratio as a measure of disproportionality to events that are reasonably rare.
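The upper bound on the observed-to-expected ratio follows in one step from the contingency table. The symbols $p_x$, $p_y$ and $p_{xy}$ are introduced here only for illustration and denote the relative reporting rates of x, of y, and of x and y together.

```latex
% With n = a+b+c+d:  p_{xy} = a/n,  p_x = (a+b)/n,  p_y = (a+c)/n.
\[
\mathrm{OE} = \frac{p_{xy}}{p_x \, p_y}
\le \frac{\min(p_x, p_y)}{p_x \, p_y}
= \frac{1}{\max(p_x, p_y)},
\qquad \text{since } p_{xy} \le \min(p_x, p_y).
\]
% Example: p_y = 0.5 gives OE <= 2, attained when p_{xy} = p_x,
% i.e. when every report on x also lists y.
```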

While disproportionality analysis is usually carried out at an early stage of the exploratory analysis of collections of individual case safety reports, there are sometimes requests to compute a measure of disproportionality for a drug–ADR pair highlighted for review based on clinical judgement or on account of one or a few very strong reports. If the drug–ADR pair turns out to be disproportionally reported, this may indeed lend added support. However, observed disproportionality must always be interpreted with caution. The possibility of alternative explanations such as report duplication, violated independence assumptions, publication biases or confounding must always be analysed and clearly stated.

3.4 Shrinkage

Shrinkage is an attempt to regularise and reduce the volatility of a measure or parameter estimate of interest, by trading an increase in bias for a decrease in variance. In large and sparse data sets such as national or international collections of individual case safety reports, raw measures of disproportionality sometimes yield very large values based on extremely low numbers of reports, but disproportionality based on just 1 or 2 reports is rarely of practical interest. The problem is that for rare drugs and ADRs, the expected number of reports may be very close to 0, relative to which even a single observed report may constitute a substantial deviation. Very low expected numbers of reports occur in the analysis of collections of individual case safety reports because the 2 by 2 contingency table of Section 3.3 is usually very unbalanced. Even for the most common drugs and ADRs in the WHO database, the number of reports that do not involve either the drug or the ADR, d, is around 3,000,000, whereas b and c are generally in the order of 100 or 1,000 and a is even smaller (a ≤ 10 for 80% of the drug–ADR pairs in the database). In order to reduce the vulnerability to spurious associations, two shrinkage measures of disproportionality have been proposed for the analysis of collections of individual case safety reports: the IC (Bate et al. 1998) and the EBGM (DuMouchel 1999). These measures of disproportionality are versions of the (logarithm of the) observed-to-expected ratio in (3.1) moderated towards a baseline value in the absence of large amounts of data. For the IC, the baseline value is 0, which corresponds to an observed-to-expected ratio of 1. Such shrinkage provides a robust measure of disproportionality moderated towards less extreme values for rare drugs and ADRs. As data accumulate, however, the measure tends to $\log_2 \mathrm{OE}$, as desired.

Figure 3.2: The simplified IC shrinkage measure plotted against the standard IC shrinkage measure for 10,000 randomly selected drug–ADR pairs in the WHO database

The IC shrinkage measure is defined in II as a Bayesian maximum à posteriori estimate of a parameter related to the logarithm of the observed-to-expected ratio in (3.1). It is well approximated by the following simplified shrinkage measure based on observed and expected counts $O_{xy}$ and $E_{xy}$:

\[
\mathrm{IC} \approx \log_2 \frac{O_{xy} + 1/2}{E_{xy} + 1/2} \tag{3.4}
\]

A comparison between the IC shrinkage measure in II and that in (3.4) for 10,000 randomly selected drug–ADR pairs in the WHO database is presented in Figure 3.2. Clearly, the difference between the two shrinkage measures is negligible. The main advantages of the simplified IC shrinkage measure are that it is easier to compute and that it provides a general recipe for shrinkage that can be applied to any measure expressed in terms of an observed-to-expected ratio, such as the Ω measure of drug–drug interaction in IV. This shrinkage can also be implemented for the PRR and ROR, after re-expression in terms of observed-to-expected ratios with $O_{xy} = a$ and $E_{xy} = \frac{(a+b)c}{c+d}$ for the PRR, and with $O_{xy} = a$ and $E_{xy} = \frac{bc}{d}$ for the ROR.
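The effect of the simplified shrinkage in (3.4) can be sketched numerically. The function names and the counts are hypothetical; the sketch implements only the simplified measure, not the Bayesian estimate defined in II.

```python
import math


def simplified_ic(observed, expected):
    """Simplified IC shrinkage measure: log2((O + 1/2) / (E + 1/2))."""
    return math.log2((observed + 0.5) / (expected + 0.5))


def raw_log2_oe(observed, expected):
    """Unshrunk log2 observed-to-expected ratio (undefined for observed == 0)."""
    return math.log2(observed / expected)


# A single report against a tiny expected count: the raw measure is extreme,
# the shrunk measure is moderated towards 0.
rare_raw = raw_log2_oe(1, 0.01)
rare_ic = simplified_ic(1, 0.01)

# The same observed-to-expected ratio based on many reports: hardly any shrinkage.
common_raw = raw_log2_oe(1000, 10)
common_ic = simplified_ic(1000, 10)

# A pair that has never been co-reported still has a finite value close to 0.
zero_ic = simplified_ic(0, 0.01)

# The same recipe applied to the PRR, with E = (a + b) * c / (c + d).
a, b, c, d = 10, 990, 1_990, 997_010
shrunk_log2_prr = simplified_ic(a, (a + b) * c / (c + d))

print(f"rare:   raw {rare_raw:.2f}, shrunk {rare_ic:.2f}")
print(f"common: raw {common_raw:.2f}, shrunk {common_ic:.2f}")
print(f"never co-reported: shrunk {zero_ic:.2f}")
print(f"shrunk log2 PRR: {shrunk_log2_prr:.2f}")
```

Both example pairs have a raw observed-to-expected ratio of 100, but only the one supported by many reports retains a value close to the raw logarithm.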

Empirical Bayes estimation provides an alternative framework for shrinkage, where the prior distribution for a group of parameters is estimated based on the empirical distribution of maximum likelihood estimates for the group. The main advantage of empirical Bayes estimators is that they borrow strength from similar observations to improve the overall accuracy. However, with respect to each parameter, its estimate will only improve under the assumption that it is indeed related to the other parameters. Unlike the IC prior distribution, an empirical Bayes prior for the observed-to-expected ratio will not necessarily be centred at 1, and thus may inflate individual disproportionality measures rather than shrink them towards less extreme values. A practical issue is that for drug–ADR pairs that have never been co-reported, the maximum likelihood estimate of the observed-to-expected ratio is 0. In practice, these drug–ADR pairs appear to be ignored in the estimation of the empirical prior distribution in DuMouchel (1999), and the potential bias due to this is unclear. Berry and Berry (2004) propose a hierarchical empirical Bayes estimator for the observed-to-expected measure of disproportionality, where each measure of disproportionality is shrunk towards the group mean for a smaller group of more closely related ADRs. This should allow for more sophisticated empirical Bayes shrinkage, but the identification of appropriate groups of related ADR terms remains a challenging research problem in its own right.

3.5 Pattern discovery and detection

Pattern recognition is the attempt to partition a group of data points into classes, based on a given set of explanatory variables (Webb 2002). Distinction is made between supervised and unsupervised pattern recognition: in supervised pattern recognition (or discrimination) a classifier is constructed based on training data consisting of labelled data points with the aim of accurately categorising unseen data points; in unsupervised classification (or clustering), the aim is to identify a natural partitioning of the available data set, without labelled training data available, or even a specification of what the classes of interest may be.

For our purposes, the distinction between patterns and models in the context of pattern discovery and detection is more relevant. Hand and Bolton (2004) characterise patterns as related to local features of a data set involving only subsets of the data points and/or subsets of the variables. Whereas a global model provides a high level description of the most important general features of a data set, a pattern may highlight one or a few outlying observations or a strong correlation between two variables. Hand and Bolton (2004) propose the following general definition:

"A pattern is a local structure that generates data with an anomalously high density compared with that expected under the (global) baseline model."

The focus on deviation from a global baseline model applies broadly to the methods described in this PhD thesis. The very aim of disproportionality analysis is to identify groups of events that are co-reported more often than would be expected, based on a baseline independence model. Similarly, in duplicate detection and other record matching applications, the aim is to identify pairs (or small subsets) of unexpectedly similar reports whose similarity deviates from a global baseline model assuming all reports have been submitted independently.

With the exception of the work on Bayes classifiers in V (which relates primarily to supervised pattern recognition by the above definition), this thesis focuses on unsupervised pattern discovery. The aim is to discover structure in data, without strict à priori specification of what the structure of interest is. At the same time, completely open-ended hypothesis generation is not possible as the type of potential patterns is determined by the choice of pattern discovery method, as well as implicitly by a range of other choices such as the variables considered in a given study (Hand 1994, p 319). Thus, while disproportionality analysis may highlight a variety of patterns related to anything from a suspected drug–ADR association to an elevated reporting rate of a certain drug in one particular country, the type of patterns in such studies is restricted to unexpectedly high (or low) relative reporting rates. Similarly, record matching may highlight a variety of non-independent reports, but all highlighted patterns will refer to unexpected report similarity.

3.6 Facilitating interpretation

Interpretation is one of the final steps in the knowledge discovery process (Fayyad et al. 1996), and a key component of the ADR signal detection process. Transparency is of particular importance in the analysis of non-systematically collected data such as individual case safety reports, where the use of overly complex statistical methodology may give a false sense of security and distract domain experts from limitations with the data (Hauben et al. 2005). Breiman (1985) refers to the application of advanced statistical methodology to hide inadequacies with the data as ‘edifice building’; in the ADR signal detection process, the use of overly complex statistical methods may divert clinical experts from careful consideration of alternative explanations to apparently excessive ADR relative reporting rates.

Since the primary aim of applying knowledge discovery methods to collections of individual case safety reports is to guide and support domain experts in their manual review, better transparency is a strong argument in favour of choosing a simple method over a more complicated one. Indeed, better transparency is perhaps the strongest argument for choosing the simple IC shrinkage measure over the more complicated one as discussed in Section 3.4. Statistical sophistication does not necessarily rule out transparency, however. The hit-miss model record matching algorithm in I is based on a rather intricate probabilistic model, but its basis for highlighting a given record pair as suspected duplicates is immediately clear from an overview such as that presented in Figure 4.2 of Section 4.

While sophisticated statistical methods are sometimes required to make the most of the available data, knowledge discovery results should always be presented as transparently as possible. For example, while shrinkage measures of disproportionality have proved a very powerful basis for filtering individual case safety reports for interesting reporting patterns, they may confuse domain experts with little interest in the statistical methodology. Moreover, it is difficult to evaluate the impact of data quality issues such as suspected duplication or reporting biases on shrinkage measures of disproportionality. Observed and expected counts provide a more transparent explanation for why certain drug–ADR pairs have been highlighted for manual review. In the presence of suspected data quality issues, simple arithmetic will indicate to what extent an excessive reporting rate may be due to a group of suspected duplicates, for instance. At the same time, domain experts often do want a sense of whether an observed disproportionality is likely to be due to chance or not. Credibility intervals around measures of disproportionality give some such indication, although the potential for violated independence assumptions means that precision can be overestimated.

Adjustment for potential confounders may complicate interpretation of shrinkage measures of disproportionality. However, as commented on in III, adjusted observed-to-expected ratios sometimes correspond closely to stratum specific ones, and translating adjusted observed-to-expected ratios to stratum specific ones may simplify interpretation. For example, in the example on hypertension and zimeldine in Appendix E.4 of Hopstadius (2006), the IC increases from -0.33 to +1.65 when adjusted for time of reporting and country of origin. A closer investigation of the detailed data available in Appendix F.2 of the same thesis indicates that the discrepancy is due to hypertension being more than twice as common on US reports (1.7%) as on reports from other countries (0.7%), whereas zimeldine was never used in the USA. Additionally, the overall relative reporting rate of hypertension has increased in recent years, whereas zimeldine was primarily used in the early 1980s. Thus, the crude IC, which contrasts the observed relative reporting rate of hypertension given zimeldine to the overall relative reporting rate of hypertension in the entire database, underestimates the disproportionality. Arguably, the best information to present to clinical experts in this case would be the observed number of reports on hypertension for zimeldine, and the expected number of such reports based on the relative reporting rate of hypertension in the countries and time period in which zimeldine was available. To guide domain experts to appropriate interpretation is clearly as important a challenge as method development in knowledge discovery research.
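The difference between a crude and a stratum-based expected count can be sketched as follows. This is an illustration under stated assumptions, not the adjustment procedure evaluated in III: the expected count is taken as a simple sum of stratum-specific expected counts, only two strata are used, and all report counts are hypothetical (only the 0.7% and 1.7% rates echo the example above).

```python
def crude_expected(strata):
    """Expected count based on the pooled marginals of all strata."""
    n_x = sum(s[0] for s in strata)
    n_y = sum(s[1] for s in strata)
    n_total = sum(s[2] for s in strata)
    return n_x * n_y / n_total


def stratified_expected(strata):
    """Sum of stratum-specific expected counts n_x * n_y / n_total."""
    return sum(n_x * n_y / n_total for n_x, n_y, n_total in strata if n_total > 0)


# Each stratum: (reports on the drug, reports on the ADR, all reports).
# Hypothetical counts: the drug was only ever reported in the first stratum,
# where the ADR has a relative reporting rate of 0.7%; in the second stratum
# the ADR rate is 1.7% but the drug does not occur at all.
strata = [
    (1_000, 7_000, 1_000_000),
    (0, 34_000, 2_000_000),
]

crude = crude_expected(strata)
adjusted = stratified_expected(strata)
print(f"crude expected count:    {crude:.1f}")
print(f"adjusted expected count: {adjusted:.1f}")
```

With these made-up counts the crude expected count is almost twice the stratum-based one, so the same observed count would look far less disproportionate in the crude analysis.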

3.7 Future directions

The new methodology proposed in this thesis provides a strong basis for future improvement and further research on knowledge discovery methods for collections of individual case safety reports. The method for drug–drug interaction detection goes beyond simple drug–ADR disproportional reporting rates, and could potentially be used also to screen for other types of ADR risk factors, such as those related to patient gender or age. In general, we must aim to make better use of the rich information available on individual case safety reports. Virtually all knowledge discovery methods of today (including those described in this thesis) are based on raw numbers of reports (Hauben et al. 2005). They do not account for the amount or quality of information on each report nor for suspected duplication. This is in stark contrast with clinical review, in which both the quality of single reports and the quality of sets of reports as a group are carefully scrutinised (Meyboom et al. 1997). Indeed, given that the overall aim of applying knowledge discovery methods to collections of individual case safety reports is to assist and direct clinical review, an important challenge for the future is to achieve better alignment between automated knowledge discovery and clinical review. A first step may be to develop new and improved quality criteria for individual case safety reports similar to those discussed by Edwards et al. (1990). Based on such quality criteria, the number of high quality, distinct reports referring to a particular drug–ADR pair can be identified and potentially provide a useful triage criterion. The possibility to highlight single high quality reports is interesting in its own right.

The extended hit-miss model record matching algorithm has proved very useful for duplicate detection in the WHO database. Its importance is likely to increase even further in the future, as new categories of health care professionals, and even patients, are invited to submit reports. In addition, the hit-miss model record matching algorithm sometimes highlights non-independent reports other than pure suspected duplicates. Non-independent reports distort data analysis, and their identification is important both for effective first pass screening and for clinical review, where the consideration of a group of related reports as independent pieces of information is potentially deceptive. An adapted hit-miss model record matching algorithm should therefore ideally be developed explicitly for the purpose of detecting non-independent reports other than pure duplicates. A main challenge is how to incorporate, in subsequent data analysis, the information that some reports are suspected to be related. One might conceive of an extended disproportionality analysis where reports were weighted according to whether they are part of a suspected cluster or not. Given the tedious process of having suspected duplicates confirmed and removed from collections of individual case safety reports, the same approach could perhaps be used also to account for suspected duplication in first pass screening. In a similar spirit, reports could perhaps also be weighted by their quality of information.

Another important challenge for the future is to further advance the methods for exploring patterns involving large groups of drugs and ADRs in collections of individual case safety reports. In Orre et al. (2005), we use a Hopfield type network and a mixture model based probabilistic clustering algorithm to identify suspected ADR syndromes in the WHO database. The main challenge is that while each syndrome may consist of a large group of ADRs, each report tends to include only a small subset of these, so training data is both noisy and incomplete. Pattern discovery in high-dimensional binary data has been studied in other application areas such as market basket analysis and document retrieval (Bingham et al. 2002), and this research provides a good starting point for further development in our area. An interesting generalisation of the mixture model based clustering algorithm, for high-dimensional binary data, is the subspace clustering method proposed by Patrikainen and Mannila (2004), which models only the most characteristic attributes for each class. For the discovery of reporting patterns based on smaller groups of reports, the hit-miss model based record matching algorithm may potentially prove useful. Its advantage is that it does not attempt to build a global model, but searches for groups of unexpectedly similar reports, based on pairwise comparison.

The importance of individual case safety reports for early post-marketing discovery of previously undetected drug toxicity is clear. At the same time, these data sets are not optimal for all types of ADR-related knowledge discovery. Specifically, each report constitutes a snapshot in time, and any information on the patient’s previous medical history is limited, at best. Therefore, it is difficult to evaluate the potential impact of channelling effects, where those patients that do not respond favourably to one medical treatment are systematically switched to a specific other treatment. Similarly, individual case safety reports usually do not provide enough information to determine whether there are differences in the severity of the underlying disease between patients prescribed different drugs. Yet another limitation of individual case safety reports is that adverse events without clear temporal association with the prescription of the drug are difficult to identify as suspected ADRs, in particular if the background incidence of the adverse event is high (Meyboom et al. 1997). As a consequence, longitudinal patient records listing patients’ entire medical histories are a very interesting complementary source of information. Combined, individual case safety reports and longitudinal patient records may allow for more comprehensive ADR-related knowledge discovery. While the methodology proposed in the context of this thesis has been developed specifically for the exploratory analysis of collections of individual case safety reports, some of it may be relevant also for the analysis of longitudinal patient records. Specifically, the method for interaction detection introduced in IV can be adapted to longitudinal patient records, and the proposed framework for exploratory analysis outlined in Figure 3.1 should, with some modifications, apply also to longitudinal patient records.

4. Overview of the papers

This thesis is based on five original contributions. The order in which they are presented corresponds roughly to their natural order of application in the knowledge discovery process for ADR surveillance. I focuses on improving data quality through identifying suspected duplicate reports. II, III and IV propose improvements to, and evaluate different aspects of, disproportionality analysis for individual case safety reports. Finally, V proposes a bootstrap method to estimate the uncertainty in each prediction of a Bayes classifier. Historically, II and V are based on related work on Bayesian bootstrap analysis in 2003. An earlier version of II was presented at the 25th annual conference of the International Society for Clinical Biostatistics in Leiden, the Netherlands, 2004. The duplicate detection algorithm in I was developed during 2004 and 2005, and a shorter version of this paper was presented at the Eleventh International Conference on Knowledge Discovery and Data Mining in Chicago, 2005. The evaluation of the adjusted observed-to-expected ratio in III was performed during 2005 and 2006, and the statistical methodology for drug–drug interaction detection in IV was developed during 2006. The aim of this section is to provide a conceptual overview of the five papers.

4.1 Paper I

Good data quality is a prerequisite for effective data analysis (Kim et al. 2003, De Veaux and Hand 2005). One important data quality problem in collections of individual case safety reports is that of report duplication. Duplicate reports are unlinked reports related to the same ADR incident, perhaps provided by different health professionals or by the same health professional to different drug safety centres. Their presence is a problem in the analysis of individual case safety reports because the total number of reports on a particular drug–ADR pair is both the basis for automated knowledge discovery and an important piece of information in clinical review of potential drug safety signals. When a single suspected ADR incident yields several reports, this may divert the analysis. Some studies indicate that duplicates may account for as large a proportion as 5% of all reports. More importantly, suspected report duplication appears not to be evenly spread in the data set: whereas most reports have no suspected duplicates, a small minority have several. Requirements and regulations selectively stimulate reporting of previously unknown and serious ADRs, and may also increase the risk of duplicate reports related to such incidents (R. H. B. Meyboom, personal communication). The identification of suspected duplicates is thus an important step towards improved data quality and, ultimately, more effective automated knowledge discovery as well as better informed clinical review.

Figure 4.1: The hit-miss model. A true value T gives rise to the observed values X and Y on the first and second report, each through a hit (probability 1−a−b), a blank (probability b) or a miss (probability a).

The identification of suspected duplicates in collections of individual case safety reports is a difficult challenge. Duplicate reports will often either have been submitted by different individuals or processed in different reporting systems, and as such can be superficially very dissimilar. Different ADR terms may have been used to encode the same incident, patient information may be erroneous or incomplete and the listed drugs may differ between reports related to the same incident. Therefore, simple rule based methods are usually insufficient to reliably detect suspected duplicates. The duplicate detection method proposed in I is based on the hit-miss model for statistical record linkage introduced by Copas and Hilton (1990). The hit-miss model provides a probability model for how discrepancies between related database records occur. It allows for flexible and robust record matching in the presence of a large variety of errors. Under the hit-miss model, each observed value X on a database record (for example a listed patient gender on a report) is based on a true but unobserved value T = t (in this case the true gender of the patient). Observed values on related records are assumed to have been generated in independent identically distributed random processes resulting in i) a miss (with respect to the true value) with probability a, ii) a blank with probability b, or iii) a hit with probability 1−a−b (see Figure 4.1). For a miss, X is a random value independent of T but following the same distribution, for a blank the value of X is missing and for a hit X = t. Hits and misses are unobservable events of an assumed data generating process. In screening for suspected duplicates we make comparisons between distinct reports, based on whether they have matching or mismatching information.

  Field     First report              Second report             Comparison   Match weight
  Date      2002-02-07                2002-02-07                =            +12.0
  Gender    ?                         Female                    ?            ±0
  Age       62 years                  60 years                  ≠            -0.2
  Country   Norway                    Norway                    =            +7.2
  Drug      Sertraline                Sertraline                =            +6.1
  Drug      Mirtazapine               Mirtazapine               =            +8.7
  Drug      (not listed)              Zopiclone                 ≠            -2.3
  ADR       Tachycardia ventricular   Tachycardia ventricular   =            +8.1

  Compensation for correlation between sertraline, mirtazapine and tachycardia: -1.4
  Total match score: +38.2

Figure 4.2: Hit-miss model based scoring of a sample record pair

Duplicate detection in the WHO database is based on patient gender, patient age, outcome, country of origin, date of onset, as well as all listed drugs and ADRs. For each record field, a match weight is calculated based on the likelihood ratio for the observed matching event under the assumption that the two records under study i) relate to the same underlying ADR incident or ii) are unrelated. The total match score is obtained by adding together the match weights for all record fields, as illustrated in Figure 4.2. It can be shown that, under the hit-miss model, matches always receive positive weights, mismatches receive negative weights and missing information on either report results in a match weight of 0. Moreover, matches on rare events receive higher match weights than matches on common events. This is an appealing property since chance matches between unrelated record fields are more likely on common events. The penalty for a mismatch is constant for a given record field but varies between record fields depending on how many mismatches were observed in each record field in the available training data (consisting of confirmed pairs of duplicate reports). Thus, in screening for suspected duplicates, mismatches in error prone record fields are penalised less than mismatches in record fields that are usually reliable.
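The qualitative properties of the match weights can be checked with a short sketch. The closed forms below are derived here from the generative description of the hit-miss model (miss probability a, blank probability b, value probability p_j) and are not copied from I or from Copas and Hilton (1990); the base-2 logarithm, the function names and the parameter values are assumptions made for illustration.

```python
import math


def match_weight(p_j, a, b):
    """log2 likelihood ratio for two records agreeing on value j.

    p_j: probability of value j in the field; a: miss probability;
    b: blank probability. Derived from the hit-miss description:
    LR = 1 + (1 - a - b)^2 * (1/p_j - 1) / (1 - b)^2, which is >= 1.
    """
    hit = 1.0 - a - b
    return math.log2(1.0 + hit**2 * (1.0 / p_j - 1.0) / (1.0 - b) ** 2)


def mismatch_weight(a, b):
    """log2 likelihood ratio for two records disagreeing in a field.

    LR = a * (2 - a - 2b) / (1 - b)^2, independent of the values involved.
    Requires a > 0; the ratio is below 1 whenever 1 - a - b > 0.
    """
    return math.log2(a * (2.0 - a - 2.0 * b) / (1.0 - b) ** 2)


# A blank on either record is equally likely for related and unrelated pairs.
BLANK_WEIGHT = 0.0

# Hypothetical field parameters: 5% misses, 10% blanks.
a, b = 0.05, 0.10
rare_match = match_weight(p_j=0.01, a=a, b=b)
common_match = match_weight(p_j=0.50, a=a, b=b)
mismatch = mismatch_weight(a=a, b=b)

# The total score is the sum of the field weights, as in Figure 4.2.
total = rare_match + common_match + mismatch + BLANK_WEIGHT
print(f"rare match {rare_match:+.2f}, common match {common_match:+.2f}, "
      f"mismatch {mismatch:+.2f}, total {total:+.2f}")
```

Under these assumptions the sketch reproduces the stated behaviour: positive weights for matches, larger for rare values, a constant negative weight for mismatches within a field, and zero for blanks. A more error prone field (larger a) yields a mismatch weight closer to zero.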

In I, we propose two methodological improvements to the standard hit-miss model: a hit-miss mixture model for numerical record fields and an adjustment of the overall match score for violated independence assumptions between matching record fields. The hit-miss mixture model extends the hit-miss model by including the possibility of imperfect matches in numerical record fields, which are less detached from the true value than complete misses. For such imperfect matches, deviations follow a narrow distribution centred at the true value. The compensation for violated independence assumptions is based on an IC disproportionality measure for the overall co-occurrence of two matching events in the database. It reduces the total match score for groups of matched events that occur together more often in the database than would be expected under the assumption of independence. The greatest strengths of the extended hit-miss model are that it provides transparent and intuitive match weights and that its parametrisation allows for robust fitting also in the absence of large numbers of confirmed duplicates.

Because suspected duplicates can be reliably confirmed or refuted, the performance of a proposed duplicate detection method can be easily evaluated. In I, we demonstrate that the extended hit-miss model is able to identify the most likely duplicate for a given database record with high accuracy (94.7% in our test data set). We also show that it effectively discriminates pairs of true duplicates from random matches. In a batch of 1559 Norwegian reports that included 19 confirmed duplicates, the extended hit-miss model identified 12 of the 19 already known duplicates (corresponding to a 63% recall) while additionally highlighting two pairs and one set of three reports as suspected duplicates that were not originally labelled as such (corresponding to a nominal 71% precision). Out of the additional suspected duplicates, one pair was later confirmed by the Norwegian national centre as a set of true duplicates, the other pair remains a set of suspected but unconfirmed duplicates and the set of three suspected duplicates turned out to be separate reports on the same drug–ADR pair submitted by the same dentist, but for three distinct patients.
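The recall and nominal precision quoted for the Norwegian batch can be written out as below. The denominator of 17 is an assumption: it counts the 12 known duplicates, the two additional pairs and the set of three reports as three further highlighted items, which is one reading that reproduces the quoted 71%.

```latex
\[
\text{recall} = \frac{12}{19} \approx 0.63,
\qquad
\text{precision} = \frac{12}{12 + 2 + 3} = \frac{12}{17} \approx 0.71
\]
```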

4.2 Paper II

The IC measure of disproportionality discussed in Sections 3.3 and 3.4 is the basis for routine screening of the WHO database to highlight excessive ADR reporting rates. In its original implementation (Bate et al. 1998), the IC only allowed for the identification of pairwise disproportionality (typically between one drug and one ADR). It relied on large sample approximations to compute credibility intervals and did not accommodate adjustment for suspected confounders. In response to these issues, II proposes credibility intervals accurate also for small samples, adopts a post-stratification approach to adjust for suspected confounders and introduces a simple extension to higher orders for the IC measure of disproportionality. The overall aim of these improvements is to allow more sophisticated and reliable screening for disproportional reporting rates in the WHO database.

The credibility intervals for the IC proposed in Bate et al. (1998) were based on a normal approximation to the posterior IC distribution. In II, we demonstrate by precise Monte Carlo simulation that this is often not accurate enough. As an alternative, we propose an approximate formula for computing credibility intervals of the posterior IC distribution that is accurate also for rare events. It may seem counter-intuitive that the use of small sample methods should be necessary in the analysis of a data set with nearly 4 million records. However, as the focus turns to specific drug–ADR pairs, the number of relevant reports decreases very rapidly. Among the around 720,000 drug–ADR pairs ever co-reported in the WHO database, more than 320,000 are co-reported only once, and an additional 106,000 only twice. More than 80% are co-reported fewer than 10 times. In the context of the variety of challenges involved in analysing these data sets, the importance of very accurate credibility intervals is perhaps limited, but one practically useful aspect of the refined credibility intervals proposed in II over those in Bate et al. (1998) is that they allow examples where the first three reports on a new drug all refer to the same ADR to be highlighted. This may allow for very early warning of some suspected ADRs.
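Why a normal approximation can mislead for rare events may be illustrated with a small simulation. The sketch below is not the model or the approximate formula of II: purely for illustration it assumes a Gamma posterior with shape O + 1/2 and rate E + 1/2 for the observed-to-expected ratio, and it uses a moment-matched normal interval rather than the analytical approximation of Bate et al. (1998). The function names and the counts are hypothetical.

```python
import math
import random
import statistics


def simulate_log2_ratio(observed, expected, n_draws=100_000, seed=1):
    """Sorted posterior draws of log2(observed-to-expected ratio).

    ASSUMPTION (illustrative, not the model of II): the ratio has a
    Gamma(observed + 1/2, rate = expected + 1/2) posterior.
    """
    rng = random.Random(seed)
    shape, rate = observed + 0.5, expected + 0.5
    # random.gammavariate takes (shape, scale), so scale = 1 / rate
    return sorted(math.log2(rng.gammavariate(shape, 1.0 / rate))
                  for _ in range(n_draws))


def monte_carlo_interval(draws, level=0.95):
    """Equal-tailed interval read off the sorted simulated draws."""
    n = len(draws)
    lower = draws[int((1 - level) / 2 * n)]
    upper = draws[int((1 + level) / 2 * n) - 1]
    return lower, upper


def normal_interval(draws, level=0.95):
    """Interval from a normal distribution with the same mean and sd."""
    z = statistics.NormalDist().inv_cdf((1 + level) / 2)
    mean, sd = statistics.fmean(draws), statistics.stdev(draws)
    return mean - z * sd, mean + z * sd


# Three reports observed where 0.01 would be expected (hypothetical numbers).
draws = simulate_log2_ratio(observed=3, expected=0.01)
mc_lower, mc_upper = monte_carlo_interval(draws)
normal_lower, normal_upper = normal_interval(draws)
```

With so few reports the simulated distribution on the log2 scale is skewed, so the symmetric normal interval is shifted relative to the simulated quantiles at both ends.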

There may be a need to eliminate the impact of suspected confounders in disproportionality analysis. In II, we adopt a post-stratification approach to adjusting the observed-to-expected ratio for potential confounders originally proposed by DuMouchel (1999). The adjusted observed-to-expected ratio is an average of stratum specific observed-to-expected ratios weighted by the stratum specific expected numbers of reports:

\[
OE_{adj} = \frac{\sum_z \frac{O^z_{xy}}{E^z_{xy}} \cdot E^z_{xy}}{\sum_z E^z_{xy}} = \frac{O_{xy}}{\sum_z E^z_{xy}} \tag{4.1}
\]

The relative merits of the adjusted observed-to-expected ratio, and the general impact of confounding on disproportionality analysis in the WHO database are further discussed in III.
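A minimal sketch of (4.1) follows. It assumes that the expected count within each stratum is computed under independence of x and y in that stratum, E = n_x · n_y / n, which the text above does not spell out. The dictionary keys and the two strata are hypothetical and chosen so that the association disappears entirely after stratification.

```python
def expected_count(n_x, n_y, n_total):
    """Expected number of co-reports if x and y were independent."""
    return n_x * n_y / n_total


def crude_oe(strata):
    """Observed-to-expected ratio ignoring the stratification."""
    observed = sum(s["n_xy"] for s in strata)
    n_x = sum(s["n_x"] for s in strata)
    n_y = sum(s["n_y"] for s in strata)
    n_total = sum(s["n"] for s in strata)
    return observed / expected_count(n_x, n_y, n_total)


def adjusted_oe(strata):
    """Post-stratified ratio as in (4.1): total observed over summed expected."""
    observed = sum(s["n_xy"] for s in strata)
    expected = sum(expected_count(s["n_x"], s["n_y"], s["n"]) for s in strata)
    return observed / expected


# Hypothetical strata (for example two countries): within each stratum the
# observed count equals the expected count, but x and y are both far more
# commonly reported in the first stratum.
strata = [
    {"n": 100, "n_x": 50, "n_y": 50, "n_xy": 25},
    {"n": 900, "n_x": 30, "n_y": 30, "n_xy": 1},
]
crude = crude_oe(strata)
adjusted = adjusted_oe(strata)
```

In this constructed example the crude ratio is well above 1 although neither stratum shows any disproportionality, which is the situation the adjustment is meant to handle.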

The extension of the IC to higher order associations in II is an important step towards being able to screen for disproportional reporting indicative of effect modification (for example variation across age groups in the risk of a certain ADR due to a particular drug). The higher order IC is simple to estimate and robust to overfitting based on limited amounts of data. Disregarding shrinkage, the third order IC between events x, y and z proposed in II is:

\[
IC_{xyz} = IC_{xy|z} - IC_{xy} \tag{4.2}
\]

where:

\[
IC_{xy|z} = \log_2 \frac{P(y \mid x, z)}{P(y \mid z)} \tag{4.3}
\]

Thus, a positive third order IC value indicates that the presence of event z increases the disproportionality between x and y (and vice versa – the measure is symmetrical in x, y and z). We further show that the third order IC can be expressed as an observed-to-expected ratio for the three-way relative reporting rate, where the expected relative reporting rate is based on a product of factors relating to main effects and pairwise interaction. The advantage of (4.2) relative to the third order IC proposed in Orre et al. (2000) is that (4.2) accounts for pairwise associations in the expected relative reporting rate. The discussion of how to best screen for interaction in ADR surveillance is further extended in IV.
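Equations (4.2) and (4.3) can be evaluated directly from report counts when shrinkage is disregarded. The sketch below plugs in raw relative frequencies for the probabilities, which is an assumption made for illustration. The function name, argument names and counts are hypothetical, and the function does not guard against zero counts.

```python
import math


def third_order_ic(n, n_x, n_y, n_z, n_xy, n_xz, n_yz, n_xyz):
    """Third order IC without shrinkage, from raw report counts."""
    # IC_xy = log2 P(y | x) / P(y)
    ic_xy = math.log2((n_xy / n_x) / (n_y / n))
    # IC_xy|z = log2 P(y | x, z) / P(y | z), as in (4.3)
    ic_xy_given_z = math.log2((n_xyz / n_xz) / (n_yz / n_z))
    return ic_xy_given_z - ic_xy  # (4.2)


# Hypothetical counts: x a drug, y an ADR, z for example an age group.
counts = dict(n=100_000, n_x=1_000, n_y=2_000, n_z=5_000,
              n_xy=40, n_xz=100, n_yz=200, n_xyz=20)
ic_xyz = third_order_ic(**counts)

# Exchanging the roles of x and z should leave the value unchanged.
ic_zyx = third_order_ic(n=100_000, n_x=5_000, n_y=2_000, n_z=1_000,
                        n_xy=200, n_xz=100, n_yz=40, n_xyz=20)
```

Here the pairwise IC between x and y is 1 bit overall but larger among reports that also list z, so the third order value is positive.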

4.3 Paper III

Confounders are covariates that distort the quantitative relationship under study. A textbook example of confounding is that the crude association between coffee drinking and coronary heart disease in observational studies may be due to heavy coffee drinkers also having a greater propensity to smoke (Hennekens et al. 1976). While in experimental studies, randomisation in principle eliminates the impact of all potential confounders, observational studies are non-randomised by design and require each suspected confounder to be individually identified and adjusted for in the analysis (or accounted for in the study design). The potential impact of unaccounted for confounding variables is a constant concern in the interpretation of observational data. It has been argued that routine adjustment for potential confounders is crucial also in first pass screening for excessive ADR reporting rates in collections of individual case safety reports to avoid highlighting disproportional reporting rates driven by other covariates.

In III, we study the relative merits of the post-stratification approach to routine adjustment of the observed-to-expected ratio adopted for the IC in II. We focus on the WHO database, but use both simulated stratification and stratification based on true covariates (in particular patient age, patient gender, country of origin and time of reporting). The two main results are that the adjusted observed-to-expected ratio is sensitive to over-stratification and that routine adjustment for common potential confounders has less impact on signal detection performance than initially believed. In our investigation, with a careful selection of suspected confounders and a more coarse categorisation of these covariates, routine adjustment does improve performance relative to a literature comparison. However, this performance improvement is modest compared to that due to imposing a triage criterion that requires reports from more than one country to highlight a drug–ADR pair for clinical review (see Figure 4.3). These results support the claim by Bate et al. (2003) that confounding may be a less important bias in first pass screening of collections of individual case safety reports for excessive ADR reporting rates than generally assumed.


[Figure 4.3: plot of precision (vertical axis) against recall (horizontal axis) with four curves: crude without triage, adjusted without triage, crude with triage and adjusted with triage.]

Figure 4.3: Precision–recall graphs relative to a literature reference for the crude IC025 and the IC025 simultaneously adjusted for country of origin and reporting time interval, with and without a triage criterion requiring reports from more than 1 country. The graphs plot precision (number of true positives over number of true positives and false positives) vs recall (number of true positives over number of true positives and false negatives) at varying thresholds on IC025.
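The definitions of precision and recall in the caption can be turned into a short sketch that traces out such a graph for a list of scored drug–ADR pairs. The scores, the reference labels and the thresholds below are made up for illustration, and the function name `precision_recall_points` is hypothetical.

```python
def precision_recall_points(scores, in_reference, thresholds):
    """Return (threshold, precision, recall) for each threshold.

    scores:       one score (for example IC025) per drug-ADR pair
    in_reference: True if the pair is listed in the literature reference
    A pair is highlighted when its score exceeds the threshold.
    """
    total_positives = sum(in_reference)
    points = []
    for threshold in thresholds:
        flagged = [ref for score, ref in zip(scores, in_reference)
                   if score > threshold]
        true_positives = sum(flagged)
        false_positives = len(flagged) - true_positives
        false_negatives = total_positives - true_positives
        # Precision is undefined when nothing is highlighted.
        precision = true_positives / len(flagged) if flagged else None
        recall = true_positives / (true_positives + false_negatives)
        points.append((threshold, precision, recall))
    return points


# Hypothetical scores and reference labels for six drug-ADR pairs.
scores = [2.1, 1.5, 0.8, 0.3, -0.2, -1.0]
in_reference = [True, True, False, True, False, False]
points = precision_recall_points(scores, in_reference, thresholds=[0.0, 1.0])
```

Raising the threshold in this toy example trades recall for precision, which is the trade-off the graphs display.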

4.4 Paper IV

Interaction between drug substances may lead to excessive risk of certain ADRs when two drugs are taken at the same time. If previously unknown high risk drug combinations can be identified, they can potentially be avoided in the future, and drugs that would have otherwise been withdrawn can remain on the market with warnings concerning co-medication. Thus, the identification of suspected drug–drug interaction is important both from the individual patient safety perspective and from the general public health perspective. In addition to the higher order IC measure of disproportionality in II, two regression based approaches to screening individual case safety reports for suspected drug–drug interaction have also been proposed (van Puijenbroek et al. 1999, DuMouchel and Pregibon 2001), but no publicly available results indicate that any of the proposed methods have been successfully applied to prospective screening for suspected drug–drug interaction.

An important contribution of IV is the observation that the limited success of earlier proposed methods for drug–drug interaction detection may be due to their use of baseline models where, in the absence of interaction, different risk factors essentially multiply. There are arguments from both public health and individual patient safety perspectives to consider absolute differences in risk rather than relative ones (Rothman et al. 1980): from a public health perspective, we are interested in whether the absolute number of ADR incidents of a certain type in a given population depends on to what extent two different drugs are co-prescribed; from the individual decision-making point of view, we want to know whether the increase in absolute risk of a certain ADR due to the prescription of one drug is modified by the co-prescription of another drug. Based on these arguments, we propose in IV the Ω measure of suspected drug–drug interaction. Ω is a shrinkage measure of three-way disproportional reporting, based on the logarithm of an observed-to-expected ratio for the relative reporting rate of an ADR A under co-prescription of drugs D1 and D2. In our model, the background risk of A and the risks of A attributable to D1 and D2, respectively, are independent. For small attributable risks, this leads to an approximately additive model for risk difference in the population. The main technical contribution is an approach to estimate the expected relative reporting rate of A given D1 and D2 co-prescribed.

In studies of the WHO database, we show that Ω highlights examples of established drug–drug interaction, with excessive relative reporting rates that go undetected with logistic regression. For example, unlike logistic regression Ω indicates that there is suspected interaction between gemfibrozil and cerivastatin with respect to the risk of rhabdomyolysis. This is a well established drug–drug interaction and co-prescription together with gemfibrozil was contraindicated for cerivastatin even as it was introduced to the general public. There are over 1,000 reports in the WHO database on rhabdomyolysis for concomitant use of cerivastatin and gemfibrozil, and the relative reporting rate of rhabdomyolysis given cerivastatin together with gemfibrozil is over 75%. This is to be compared with relative reporting rates of 0.1% in the absence of both cerivastatin and gemfibrozil, 4% for gemfibrozil in the absence of cerivastatin and 25% for cerivastatin in the absence of gemfibrozil. Clearly, a method for drug–drug interaction must highlight this as indicative of suspected drug–drug interaction in order to be practically useful for first pass screening purposes. Ω fulfils this requirement and allows for computationally efficient first pass screening for suspected drug–drug interaction in collections of individual case safety reports.
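The effect of the baseline model can be illustrated with the relative reporting rates quoted above. The sketch below is not the estimator of IV and applies no shrinkage. It treats relative reporting rates as if they were risks, reads "independent" as a background risk and two attributable risks that each act independently, and represents a logistic regression baseline by multiplying odds, with no other covariates. All function names are hypothetical, and 0.75 is used as a lower bound for the observed rate of "over 75%".

```python
import math


def odds(p):
    return p / (1 - p)


def expected_independent(f00, f10, f01):
    """Background risk and the two attributable risks act independently."""
    # P(no A | both) = P(no A | D1 only) * P(no A | D2 only) / P(no A | neither)
    return 1 - (1 - f10) * (1 - f01) / (1 - f00)


def expected_additive(f00, f10, f01):
    """Risk differences relative to background simply add."""
    return f10 + f01 - f00


def expected_multiplicative_odds(f00, f10, f01):
    """Odds ratios relative to background multiply (logistic-type baseline)."""
    combined = odds(f10) * odds(f01) / odds(f00)
    return combined / (1 + combined)


# Relative reporting rates of rhabdomyolysis quoted in the text.
f00 = 0.001  # neither drug
f01 = 0.04   # gemfibrozil without cerivastatin
f10 = 0.25   # cerivastatin without gemfibrozil
f11 = 0.75   # both drugs (quoted as "over 75%")

e_independent = expected_independent(f00, f10, f01)
e_additive = expected_additive(f00, f10, f01)
e_multiplicative = expected_multiplicative_odds(f00, f10, f01)

# Log2 observed-to-expected ratios under the two kinds of baseline.
log_ratio_independent = math.log2(f11 / e_independent)
log_ratio_multiplicative = math.log2(f11 / e_multiplicative)
```

Under the independent-risks baseline the observed rate is far above expectation, whereas the multiplicative baseline already expects a rate above 75% from the two single-drug rates, so the combination does not stand out.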

4.5 Paper V

The aim of V is to demonstrate the usefulness of case-based imprecision estimates for Bayes classifier predictions. Unlike the overall expected prediction error, case-based imprecision estimates indicate the certainty with which each individual data point is predicted. Clearly, this will vary depending on the degree of similarity between the data point of interest and those in training data. Bayes classifiers are generative classifiers that predict class membership indirectly, based on estimated distributions of the explanatory variables given a specific class. This is in contrast with discriminative classifiers, such as logistic regression, whose parameters are optimised directly with respect to the prediction performance on a given set of training data. As noted by Ng and Jordan (2002), generative classifiers may reach their (higher) asymptotic error more rapidly than discriminative classifiers such as logistic regression, and thus be preferable in the absence of large amounts of training data. Despite its often violated assumption of mutual independence between explanatory variables given class membership, the naive Bayes classifier has proved to compare well with more sophisticated classification methods in many real world applications (Domingos and Pazzani 1997, Hand and Yu 2001). However, the exact values of the estimated class probabilities are not trustworthy, as the naive Bayes classifier tends to be too confident in its predictions (Hand and Yu 2001). As an alternative, we propose that the certainty with which each data point is classified be estimated based on Bayesian bootstrap resampling of the original training data. The Bayesian bootstrap produces a large number of slightly modified training data sets by repeatedly assigning Dirichlet Di(1, 1, ..., 1) distributed random weights to the observations in the original training data. Based on each Bayesian bootstrap replicate of the original training data, a Bayes classifier is trained and used to predict the data point(s) of interest. Instead of the predicted probability of class membership based on the original Bayes classifier, we propose that the proportion of Bayesian bootstrap replicates for which the predicted probability of class membership exceeds 0.5 be used as an estimate of the certainty with which a given data point is predicted. We provide results in V that indicate that this reduces the expected loss when some misclassifications are more costly than others.
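The procedure described above can be sketched for a naive Bayes classifier with binary explanatory variables and two classes. This is an illustration under stated assumptions and not the implementation of V: the pseudo-count of 0.5, the scaling of the weights to sum to the number of observations, the tiny training set and all function names are hypothetical choices made here.

```python
import math
import random


def dirichlet_weights(n, rng):
    """Draw Di(1, ..., 1) weights by normalising Exp(1) variables."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def train_naive_bayes(X, y, weights, pseudo=0.5):
    """Weighted naive Bayes for binary features and classes 0 and 1."""
    n = len(X)
    model = {}
    for c in (0, 1):
        # Weighted number of observations in class c (weights sum to n).
        w_c = n * sum(w for w, label in zip(weights, y) if label == c)
        prior = (w_c + pseudo) / (n + 2 * pseudo)
        feature_probs = []
        for j in range(len(X[0])):
            w_cj = n * sum(w for w, row, label in zip(weights, X, y)
                           if label == c and row[j] == 1)
            feature_probs.append((w_cj + pseudo) / (w_c + 2 * pseudo))
        model[c] = (prior, feature_probs)
    return model


def predict_proba(model, x):
    """Predicted probability of class 1 for feature vector x."""
    log_post = {}
    for c, (prior, feature_probs) in model.items():
        log_post[c] = math.log(prior) + sum(
            math.log(p if xj == 1 else 1 - p)
            for xj, p in zip(x, feature_probs))
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return math.exp(log_post[1] - top) / norm


def bootstrap_certainty(X, y, x_new, n_replicates=1000, seed=0):
    """Proportion of replicates predicting class 1 with probability > 0.5."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        model = train_naive_bayes(X, y, dirichlet_weights(len(X), rng))
        hits += predict_proba(model, x_new) > 0.5
    return hits / n_replicates


# Hypothetical training data: four observations per class, two features.
X = [[1, 1], [1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
certainty_clear = bootstrap_certainty(X, y, [1, 1])
certainty_borderline = bootstrap_certainty(X, y, [1, 0])
```

The training set is constructed so that the point [1, 0] is equally compatible with both classes, whereas [1, 1] resembles class 1. The bootstrap proportion separates the two cases in a way that a single predicted probability from very few training points may not.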

A comment made in Norén (2005), which is worth repeating, is that in V, the marginal class probabilities P(y_j) were estimated essentially as the proportion of instances from each class in the available training data. This is appropriate when training data is a representative sample from the population to which the classifier is to be applied. If, on the other hand, the composition of training data does not necessarily represent that of future observations, then P(y_j) must either be estimated based on external data relevant to the population of interest or be based on prior assumptions. The Bayesian bootstrap can be modified to accommodate this, by replacing the numbers of data points from each class in training data n_{y_1}, n_{y_2}, ... (see Table 1 in V) by the corresponding numbers in a data set representative for future samples (or by appropriate pseudo-counts).

V is the only paper in this thesis not to have derived methods explicitly for the purpose of improved ADR surveillance. However, the methods for improved Bayes classification under asymmetrical loss have a potential application in the development of more data driven triage algorithms for ADR surveillance. As implemented today, the triage algorithms are based exclusively on clinical expertise (Ståhl et al. 2004). A Bayes classifier framework may allow for a more data driven approach where clinical judgement of the value of previously highlighted drug–ADR pairs is used as training data for the construction of a Bayes classifier. Useful explanatory variables for such an implementation might include the total number of reports listing a given drug–ADR combination, as well as their quality, diversity and geographical spread; the number of positive de- or rechallenge interventions etc. Given that in ADR signal detection, missed problems are more costly than falsely highlighted ones, the loss functions involved will be asymmetrical, and the Bayesian bootstrap method proposed in V should allow for improved performance.


Acknowledgements

I would like to express my gratitude to all those who have provided support and encouragement during the work that has led to this PhD thesis.

• Professor Rolf Sundberg for inspiration and advice, and for providing an excellent example of how to combine a profound knowledge of mathematical statistics with a genuine interest in solving real world problems.

• Professor Ralph Edwards for helping me to broaden my views in pharmacovigilance, for identifying a wide variety of challenging research problems and for providing an ambitious overall vision for the work of the Uppsala Monitoring Centre.

• Andrew Bate for encouraging me to enrol as a PhD student, for day to day support and advice and for a very rewarding and productive collaboration.

• Marie Lindquist, Ron Meyboom and Sten Olsson for sharing their knowledge of ADR signal detection and of the WHO programme.

• All colleagues at the Uppsala Monitoring Centre for providing an excellent work environment, in particular the members of the R&D team, past and present: Erik Swahn, Jonathan Edwards, Malin Ståhl, Sven Purbe, Johan Hopstadius, Kristina Star, Johanna Strandell and Ola Caster.

• Roland Orre for expert computational support and advice.

• All members of the Division of Mathematical Statistics at Stockholm University for providing a stimulating research environment and for welcoming me to the group.

Finally, I would like to thank all my family and friends from Järbo, New Bethlehem, Göteborg, Uppsala and elsewhere. A special thank you to my parents Hasse and Christina Norén for your everlasting support and encouragement, and to Minna, my love, for making it all worthwhile.

Uppsala, March 2007,

Niklas Norén


Bibliography

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A. I.: 1996, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, pp. 307–328.

Aronson, J. K. and Hauben, M.: 2006, Anecdotes that provide definitive evidence, British Medical Journal 333(7581), 1267–1269.

Bate, A.: 2003, The Use of a Bayesian Confidence Propagation Neural Network in Pharmacovigilance, PhD thesis, Umeå University.

Bate, A., Edwards, I. R., Lindquist, M. and Orre, R.: 2003, Violation of homogeneity: the author’s reply, Drug Safety 26, 364–366.

Bate, A., Lindquist, M., Edwards, I. R., Olsson, S., Orre, R., Lansner, A. and De Freitas, R. M.: 1998, A Bayesian neural network method for adverse drug reaction signal generation, European Journal of Clinical Pharmacology 54, 315–321.

Berry, S. M. and Berry, D. A.: 2004, Accounting for multiplicities in assessing drug safety: a three-level hierarchical mixture model, Biometrics 60(2), 418–426.

Bingham, E., Mannila, H. and Seppänen, J. K.: 2002, Topics in 0–1 data, KDD’02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM Press, New York, NY, USA, pp. 450–455.

Breiman, L.: 1985, Nail finders, edifices and Oz, Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. I, Wadsworth, Belmont, CA, USA, pp. 201–214.

Breiman, L.: 2001, Statistical modeling: the two cultures (with comments and a rejoinder by the author), Statistical science 16(3), 199–231.

Copas, J. and Hilton, F.: 1990, Record linkage: statistical models for matching computer records, Journal of the Royal Statistical Society: Series A 153(3), 287–320.

Coulter, D. M., Bate, A., Meyboom, R. H., Lindquist, M. and Edwards, I. R.: 2001, Antipsychotic drugs and heart muscle disorder in international pharmacovigilance: data mining study, British Medical Journal 322(7296), 1207–1209.


De Veaux, R. D. and Hand, D. J.: 2005, How to lie with bad data, Statistical Science 20(3), 231–238.

Domingos, P. and Pazzani, M.: 1997, On the optimality of the simple Bayesian classifier under zero-one loss, Machine learning 29, 103–130.

DuMouchel, W.: 1999, Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system, American Statistician 53, 177–202.

DuMouchel, W. and Pregibon, D.: 2001, Empirical Bayes screening for multi-item associations, KDD ’01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 67–76.

Edwards, I. R.: 1997, Adverse drug reactions: finding the needle in the haystack, British Medical Journal 315(7107), 500.

Edwards, I. R.: 1999, Spontaneous reporting – of what? Clinical concerns about drugs, British Journal of Clinical Pharmacology 48(2), 138–141.

Edwards, I. R. and Aronson, J. K.: 2000, Adverse drug reactions: definitions, diagnosis and management, Lancet 356(9237), 1255–1259.

Edwards, I. R. and Biriell, C.: 1994, Harmonisation in pharmacovigilance, Drug Safety 10(2), 93–102.

Edwards, I. R., Wiholm, B.-E., Lindquist, M. and Napke, E.: 1990, Quality criteria for early signals of possible adverse drug reactions, Lancet 336(8708), 156–158.

Efron, B.: 2001, [Statistical modeling: the two cultures]: Comment, Statistical science 16(3), 218–219.

Egberts, A. C., Meyboom, R. H. and van Puijenbroek, E. P.: 2002, Use of measures of disproportionality in pharmacovigilance: three Dutch examples, Drug Safety 25(6), 453–458.

Elder, J. F. and Pregibon, D.: 1996, A statistical perspective on knowledge discovery in databases, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, USA, pp. 83–113.

Evans, S. J. W.: 2000, Pharmacovigilance: a science or fielding emergencies?, Statistics in Medicine 19(23), 3199–3209.

Evans, S. J. W.: 2004, Statistics: analysis and presentation of safety data, in J. Talbott and P. Waller (eds), Stephens’ detection of new adverse drug reactions, John Wiley & Sons, Chichester, England, pp. 301–328.

Evans, S. J. W., Waller, P. C. and Davis, S.: 2001, Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports, Pharmacoepidemiology and Drug Safety 10(6), 483–486.


Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P.: 1996, The KDD process for extracting useful knowledge from volumes of data, Communications of the ACM 39(11), 27–34.

Finney, D. J.: 1966, Monitoring adverse reactions to drugs – its logic and its weaknesses, Proceedings of the European Society for the study of Drug Toxicity 7, 198–207.

Finney, D. J.: 1971, Statistical logic in the monitoring of reactions to therapeutic drugs, Methods of Information in Medicine 10(4), 237–245.

Finney, D. J.: 1973, The detection of causation of adverse events, Proceedings of the 39th session of the International Statistical Institute, pp. 387–393.

Finney, D. J.: 1974, Systematic signalling of adverse reactions to drugs, Methods of Information in Medicine 13(1), 1–10.

Glymour, C., Madigan, D., Pregibon, D. and Smyth, P.: 1997, Statistical themes and lessons for data mining, Data Min. Knowl. Discov. 1(1), 11–28.

Hand, D. J.: 1994, Deconstructing statistical questions, Journal of the Royal Statistical Society. Series A (Statistics in Society) 157(3), 317–356.

Hand, D. J.: 1998, Data mining: Statistics and more?, The American Statistician 52, 112–118.

Hand, D. J. and Bolton, R.: 2004, Pattern discovery and detection: A unified statistical methodology, Journal of Applied Statistics 31(8), 885–924.

Hand, D. J. and Yu, K.: 2001, Idiot’s Bayes – not so stupid after all?, International Statistical Review 69(3), 385–398.

Hastie, T., Tibshirani, R. and Friedman, J.: 2001, The elements of statistical learning: data mining, inference and prediction, Springer-Verlag, New York, NY, USA.

Hauben, M., Madigan, D., Gerrits, C. M., Walsh, L. and van Puijenbroek, E. P.: 2005, The role of data mining in pharmacovigilance, Expert Opinion on Drug Safety 4(5), 929–948.

Hennekens, C., Drolette, M., Jesse, M., Davies, J. and Hutchison, G.: 1976, Coffee drinking and death due to coronary heart disease, New England Journal of Medicine 294(12), 633–636.

Hopstadius, J.: 2006, Methods to control for confounding variables in screening for associations in the WHO drug safety database, Master’s thesis, Uppsala University.


Kim, W. Y., Choi, B.-J., Hong, E. K., Kim, S.-K. and Lee, D.: 2003, A taxonomy of dirty data, Data Mining and Knowledge Discovery 7(1), 81–99.

Lindquist, M.: 2003, Seeing and Observing in International Pharmacovigilance – Achievements and Prospects in Worldwide Drug Safety, PhD thesis, Katholieke Universiteit Nijmegen.

Mannila, H.: 1996, Data mining: machine learning, statistics, and databases, Proceedings of the 8th International Conference on Scientific and Statistical Database Management (SSDBM ’96), pp. 2–9.

Meyboom, R. H. B., Egberts, A. C. G., Edwards, I. R., Hekster, Y. A., de Koning, F. H. P. and Gribnau, F. W. J.: 1997, Principles of signal detection in pharmacovigilance, Drug Safety 16(6), 355–365.

Meyboom, R. H. B., Lindquist, M., Egberts, A. C. G. and Edwards, I. R.: 2002, Signal selection and follow-up in pharmacovigilance, Drug Safety 25(6), 459–465.

Ng, A. Y. and Jordan, M. I.: 2002, On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes, in T. G. Dietterich, S. Becker and Z. Ghahramani (eds), Advances in Neural Information Processing Systems 14, MIT Press, Cambridge, MA.

Norén, G. N.: 2005, Statistical methods for large scale exploratory analysis of post-marketing drug safety data. Licentiate thesis, Stockholm University.

Olsson, S.: 1998, The role of the WHO programme on international drug monitoring in coordinating worldwide drug safety efforts, Drug Safety 19(1), 1–10.

Orre, R., Bate, A., Norén, G. N., Swahn, E., Arnborg, S. and Edwards, I. R.: 2005, A Bayesian recurrent neural network for unsupervised pattern recognition in large incomplete data sets, International Journal of Neural Systems 15(3), 207–222.

Orre, R., Lansner, A., Bate, A. and Lindquist, M.: 2000, Bayesian neural networks with confidence estimations applied to data mining, Computational Statistics & Data Analysis 34, 473–493.

Patrikainen, A. and Mannila, H.: 2004, Subspace clustering of high dimensional binary data – a probabilistic approach, Proc. Fourth SIAM Int’l Conf. Data Mining, Workshop Clustering High Dimensional Data and Its Applications, pp. 57–65.

Patwary, K. M.: 1969, Report on statistical aspects of the pilot research project for international drug monitoring, Technical report, Report prepared for the World Health Organization, Geneva.

Purcell, P. M.: 2003, Data mining in pharmacovigilance, International Journal of Pharmaceutical Medicine 17(2), 63–64.


Rawlins, M. D.: 1988, Spontaneous reporting of adverse drug reactions. II: Uses, British Journal of Clinical Pharmacology 1(26), 7–11.

Rothman, K. J., Greenland, S. and Walker, A. M.: 1980, Concepts of interaction, American Journal of Epidemiology 112(4), 467–470.

Sanz, E. J., De-las-Cuevas, C., Kiuru, A., Bate, A. and Edwards, I. R.: 2005, Selective serotonin reuptake inhibitors in pregnant women and neonatal withdrawal syndrome: a database analysis, The Lancet 365, 482–487.

Savage, R. L.: 1985, Adverse drug reaction monitoring, Master’s thesis, University of Newcastle upon Tyne.

Silverstein, C., Brin, S. and Motwani, R.: 1998, Beyond market baskets: generalizing association rules to dependence rules, Data mining and Knowledge Discovery 2, 39–68.

Ståhl, M., Lindquist, M., Edwards, I. R. and Brown, E. G.: 2004, Introducing triage logic as a new strategy for the detection of signals in the WHO drug monitoring database, Drug Safety 13(6), 355–363.

van Puijenbroek, E. P., Bate, A., Leufkens, H. G. M., Lindquist, M., Orre, R. and Egberts, A. C. G.: 2002, A comparison of measures of disproportionality for signal detection in spontaneous reporting systems for adverse drug reactions, Pharmacoepidemiology and Drug Safety 11(1), 3–10.

van Puijenbroek, E. P., Egberts, A. C., Meyboom, R. H. B. and Leufkens, H. G. M.: 1999, Signalling possible drug-drug interactions in a spontaneous reporting system: delay of withdrawal bleeding during concomitant use of oral contraceptives and itraconazole, British Journal of Clinical Pharmacology 47, 689–693.

Webb, A.: 2002, Statistical pattern recognition, 2 edn, John Wiley & Sons, Chichester, England.

