data mining in pharmacovigilance
DATA MINING IN PHARMACOVIGILANCE
Dr. Bhaswat S. Chakraborty
Sr. VP & Chair, R&D Core Committee
Cadila Pharmaceuticals Ltd., Ahmedabad
Presented at Indian Pharmacological Society Meeting, Ahmedabad, October 5, 2013
CONTENTS
Pharmacovigilance (PV)
PV process
PV databases
Data mining in PV
Toxic signals & signal detection (SD)
Non-Bayesian SD
  Disproportionality
Bayesian SD
  Multi-item gamma Poisson shrinker (MGPS)
  Bayesian confidence propagation neural network (BCPNN)
Examples
Concluding remarks
PREMATURE APPROVAL, INCOMPLETE SAFETY PROFILE?
Many drugs have been approved while their complete safety profile was still unknown
In some cases, drugs were approved despite identification of serious adverse events (SAEs) in premarketing trials
  Alosetron hydrochloride – ischemic colitis
  Grepafloxacin hydrochloride – QT prolongation and deaths
  Rofecoxib – heart attack and stroke (long-term, high-dosage use)
They were all subsequently withdrawn from the market because of these SAEs
Among currently marketed drugs, black box warnings (for SAEs caused by prescription drugs) are very common
CHANCES TO OBSERVE SAEs THROUGH CLINICAL TRIALS (CTs)

Reaction Rate   Sample Size   Pr(at least 1)   Pr(at least 2)
1%              500           0.993            0.960
0.5%            500           0.918            0.713
0.5%            1000          0.993            0.960
0.1%            1500          0.777            0.442
0.1%            3000          0.950            0.801
0.01%           6000          0.451            0.122
0.01%           10000         0.632            0.264
0.01%           20000         0.865            0.594
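The tabulated probabilities are consistent with a simple binomial model in which each trial subject independently experiences the reaction at the stated rate. The sketch below reproduces them under that assumption; the function name `prob_at_least` is illustrative and does not come from the slides.

```python
from math import comb


def prob_at_least(k, rate, n):
    """Probability of seeing at least k events among n subjects.

    Assumes independent subjects with a common event rate (binomial model).
    """
    # Probability of seeing fewer than k events, summed term by term.
    below = sum(comb(n, i) * rate**i * (1 - rate) ** (n - i) for i in range(k))
    return 1 - below


# Reaction rate 0.1% with 3000 subjects, as in the table.
print(round(prob_at_least(1, 0.001, 3000), 3))
print(round(prob_at_least(2, 0.001, 3000), 3))
```

Even with 3000 subjects, a reaction occurring in 1 of 1000 patients has roughly a one-in-five chance of being seen fewer than twice, which is the motivation for post-marketing surveillance.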
PHARMACOVIGILANCE (PV)
Monitoring, evaluation and implementation of drug safety
Detection and quantitation of adverse drug reactions (ADRs) that are
  novel or only partially known in their clinical nature, severity or frequency
  previously unknown
  known hazards with increased (↑) frequency or severity
THE PHARMACOVIGILANCE PROCESS
Source: A.L. Gould, Internet PPT
PHARMACOVIGILANCE DATABASES
PV is usually practiced by agencies and pharmaceutical companies by focusing on SD in large databases
These databases are very large, e.g.,
  USFDA database, AERS: > 6.2 million records
  WHO database, VIGIBASE: > 7.2 million records
  GSK database, OCEANS: > 2 million records
In one study, the highest power for finding a true signal was achieved by combining the databases with the most drug-specific data
Early safety SD should therefore involve multiple large global databases
Reliance on a single database may reduce statistical power and the diversity of ADRs captured
Hammond IW et al. (2007). Expert Opin Drug Saf. 6:713-21
DESIRABLE ATTRIBUTES OF AE DATABASE SOFTWARE
Should be well integrated with clinical data management software
User friendly
Individual reports management features
Easy to query
Line listing of the entire database, or part of it, is possible and easy
Data extraction is easy, with desirable filters
May also keep track of postmarketing Rx utility and complaints data
DATA MINING
Getting something useful from lots and lots and lots of data
Although it might appear so, the methodology is not linear: it involves building and assessing models, with some steps carried out simultaneously and others serially
DRUG TOXIC SIGNALS
WHO: “reported information on a possible causal relationship between an adverse event and a drug, the relationship being unknown or incompletely documented previously.”
More than a single report is needed
Suggests a drug-ADR (D-R) association (does not establish causality)
An alert from any available source
  Pre- or post-marketing data
  Data mining, especially of post-marketing safety databases
SIGNAL DETECTION
Comes originally from electronics engineering
In signal detection theory, a receiver operating characteristic (ROC) curve plots the true positive rate (true positives out of the positives) against the false positive rate (false positives out of the negatives) at various threshold settings
Sensitivity is high when the false negative rate is low
Specificity is high when the false positive rate is low
Increasing the threshold means fewer false positives (and more false negatives). The actual shape of the curve is determined by the overlap of the two distributions.
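The threshold trade-off can be sketched with two overlapping normal distributions, one for "noise" (no true association) and one for "signal". This is an illustration only: the unit standard deviations and the signal mean of 1.5 are assumed values, and `rates_at_threshold` is a hypothetical helper name.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def rates_at_threshold(threshold, signal_mean=1.5):
    """Return (true positive rate, false positive rate) for one threshold.

    Noise scores ~ N(0, 1) and signal scores ~ N(signal_mean, 1); both are
    assumed distributions chosen purely for illustration.
    """
    fpr = 1.0 - normal_cdf(threshold, 0.0)          # noise above threshold
    tpr = 1.0 - normal_cdf(threshold, signal_mean)  # signal above threshold
    return tpr, fpr


# Sweeping the threshold traces out points on the ROC curve.
for t in (0.0, 0.75, 1.5):
    tpr, fpr = rates_at_threshold(t)
    print(f"threshold={t:4.2f}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

Raising the threshold lowers both rates at once; moving the two distributions further apart (a larger `signal_mean`) pushes the curve toward the top-left corner.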
GOALS FOR ADR SIGNALS
Few false positive signals
  The drug-ADR association should be real
Few false negative signals
  Should not miss any drug-ADR signal
Early detection of signals is desirable
False discovery rate → 0
Association
  Bupropion – seizures
  Olanzapine – thrombosis
  Pergolide – increased libido
  Risperidone – diabetes mellitus
  Terbinafine – stomatitis
  Rosiglitazone – liver function abnormalities
Dis-association
  Isotretinoin – suicide
Source: LAREB
DATA MINING & SD PROTOCOL
Report collection → Database cleaning → Quantitative assessment → Qualitative assessment → Evaluation → Communication
Gavali, Kulkarni, Kumar and Chakraborty (2009), Ind J Pharmacol, 41, 162-166
DATA DISPLAY & MINING METHODS IN PV

No. of reports   Target R   Other R   Total
Target D         a          b         n_TD
Other D          c          d         n_OD
Total            n_TA       n_OA      n

Basic approach: possible signal when R = a/E(a) is “large”
Methods for mining
  Reporting Ratio (RR): $E(a) = n_{TD}\, n_{TA} / n$
  Proportional Reporting Ratio (PRR): $E(a) = n_{TD}\, c / n_{OD}$
  Odds Ratio (OR): $E(a) = b\, c / d$
Need to accommodate uncertainty, especially if a is small
Bayesian approaches provide a way to do this
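The three expected counts, and the resulting ratio a/E(a), can be computed directly from the 2×2 table. The sketch below uses made-up cell counts purely for illustration, and the function names are hypothetical.

```python
def expected_counts(a, b, c, d):
    """Expected value of cell a under each of the three methods."""
    n_td = a + b          # all reports for the target drug
    n_od = c + d          # all reports for other drugs
    n_ta = a + c          # all reports of the target reaction
    n = a + b + c + d     # all reports in the database
    return {
        "RR": n_td * n_ta / n,
        "PRR": n_td * c / n_od,
        "OR": b * c / d,
    }


def disproportionality(a, b, c, d):
    """Ratio a / E(a) for each method; a 'large' value suggests a signal."""
    return {name: a / e for name, e in expected_counts(a, b, c, d).items()}


# Illustrative counts only (not taken from any real database).
print(disproportionality(a=20, b=80, c=100, d=9800))
```

With these counts the three ratios are about 16.7, 19.8 and 24.5. They differ only in which reference group defines "expected", so they tend to agree when the target drug contributes a small share of the database.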
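CRITERIA FOR A TOXIC DISPROPORTIONAL ADR
$\mathrm{PRR} = \dfrac{a/(a+b)}{c/(c+d)}$
$\mathrm{ROR} = \dfrac{a/b}{c/d}$
$\chi^2 = \sum \dfrac{(\mathrm{Observed} - \mathrm{Expected})^2}{\mathrm{Expected}}$
A significant disproportional signal is detected when $\chi^2 \ge 4.0$ and the rest (PRR and ROR) are ≥ 2.0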
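A minimal sketch of this decision rule follows, assuming the χ² statistic is the Pearson statistic summed over the four cells of the 2×2 table without continuity correction (the slides do not say which variant was used). Function names and the example counts are illustrative.

```python
def chi_square(a, b, c, d):
    """Pearson chi-square for a 2x2 table: sum of (O - E)^2 / E over cells."""
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    total = 0.0
    for o, r, k in zip(observed, row_totals, col_totals):
        expected = r * k / n  # expected count if drug and reaction are independent
        total += (o - expected) ** 2 / expected
    return total


def is_disproportional_signal(a, b, c, d):
    """Apply the thresholds: chi-square >= 4.0, PRR >= 2.0 and ROR >= 2.0.

    Assumes b, c and d are non-zero; zero cells need separate handling.
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a / b) / (c / d)
    return chi_square(a, b, c, d) >= 4.0 and prr >= 2.0 and ror >= 2.0


# Illustrative counts only.
print(is_disproportional_signal(20, 80, 100, 9800))   # strong disproportion
print(is_disproportional_signal(10, 990, 100, 9900))  # same rate in both rows
```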
CASE STUDY EXAMPLE: PROPRANOLOL-BRADYCARDIA
Gavali, Kulkarni, Kumar and Chakraborty (2009), Ind J Pharmacol, 41, 162-166
BAYESIAN STATISTICS IN SD
$\dfrac{\Pr(R \mid D)}{\Pr(R)} = \dfrac{\Pr(R, D)}{\Pr(R)\,\Pr(D)}$
where Pr(R|D) is the posterior probability of observing a specific adverse event R given that a specific drug D is the suspect drug.
Pr(R) and Pr(D) are the prior probabilities of observing R and D in the entire database.
Pr(R,D) is the joint probability that R and D are both observed in the same report.
The identity follows from Pr(R|D) = Pr(R,D)/Pr(D).
MULTI-ITEM GAMMA POISSON SHRINKER (MGPS)
It ranks drug-event combinations according to how “interestingly large” the number of reports of an R-D combination is compared with what would be expected if the drug and event were statistically independent
Unlike the Information Component (IC) approach, the MGPS technique gives an overall ranking of R-D combinations
The IC is a kind of non-relative measure, computed for each R-D combination on its own
MULTI-ITEM GAMMA POISSON SHRINKER (MGPS)
Reporting ratio
  → (stratification by gender, age, year, etc.) → Modified reporting ratio
  → (Bayesian shrinkage for cell sizes) → Modeled reporting ratio
  → Empirical Bayes Geometric Mean (EBGM)
If the lower bound of the 90% CI of the EBGM (EB05) is ≥ 2, the R-D combination occurs at least twice as often as expected; also, for N > 20 or so, N/E ≈ EBGM ≈ PRR
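The effect of Bayesian shrinkage on small cell counts can be sketched with a deliberately simplified model. MGPS itself uses a mixture of two gamma priors whose parameters are estimated from the whole database; the sketch below instead assumes a single gamma prior with made-up parameters `alpha = beta = 1`, so it illustrates only the shrinkage idea, not the MGPS algorithm. With N ~ Poisson(λE) and λ ~ Gamma(α, β), the posterior is Gamma(α + N, β + E), and its geometric mean is exp(ψ(α + N) − ln(β + E)), where ψ is the digamma function.

```python
from math import exp, log


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:            # shift x upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + log(x) - 0.5 / x - series


def shrunk_ratio(n_obs, expected, alpha=1.0, beta=1.0):
    """Posterior geometric mean of the reporting ratio (single-gamma sketch).

    alpha and beta are hypothetical prior parameters, not fitted values.
    """
    return exp(digamma(alpha + n_obs) - log(beta + expected))


# Same raw ratio N/E = 10, very different amounts of evidence.
print(shrunk_ratio(1, 0.1))    # one report: pulled strongly toward the prior
print(shrunk_ratio(100, 10))   # many reports: stays close to N/E
```

A single report against an expected 0.1 gives a raw ratio of 10 but a shrunk value below 2, while 100 reports against an expected 10 stay close to 10. This is the behaviour summarised by N/E ≈ EBGM for larger N.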
Hauben & Zhou. (2003) Drug Safety 26, 159-186
BAYESIAN CONFIDENCE PROPAGATION NEURAL NETWORK (BCPNN)
The Uppsala Monitoring Centre (UMC) uses the BCPNN architecture for SD in the WHO database
Neural networks are highly organized & efficient
  Give a simple probabilistic interpretation of network weights
  Analogous to a living neuron with its multiple dendrites and single axon
BCPNN calculates cell counts for all potential R-D combinations in the database, not just those appearing in at least one report
Done with two fully interconnected layers
  One for all drugs and one for all adverse events
INFORMATION COMPONENT (IC)
$\mathrm{IC} = \log_2 \dfrac{\Pr(R, D)}{\Pr(R)\,\Pr(D)}$
The IC is used to decide whether the joint probability of a drug-ADR pair differs from what independence of D and R would give.
This makes sense because, if D and R are independent, knowledge of one contributes no new information about the other and does not reduce the uncertainty about it.
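As a rough illustration, the IC can be computed from raw report counts by replacing each probability with its observed relative frequency. This plug-in estimate is an assumption made for the sketch: the BCPNN obtains its IC estimate and confidence interval from Bayesian posterior distributions rather than raw frequencies. The function name and the counts are hypothetical.

```python
from math import log2


def information_component(n_rd, n_r, n_d, n_total):
    """Plug-in IC estimate from report counts.

    n_rd: reports listing both reaction R and drug D
    n_r, n_d: reports listing R and reports listing D, respectively
    n_total: all reports in the database
    """
    p_rd = n_rd / n_total
    p_r = n_r / n_total
    p_d = n_d / n_total
    return log2(p_rd / (p_r * p_d))


# Illustrative counts only.
print(information_component(10, 100, 1000, 10000))  # co-reporting as expected
print(information_component(40, 100, 1000, 10000))  # four times as expected
```

When the pair is reported exactly as often as independence predicts, the IC is 0; four times as often gives an IC of 2, since the scale is logarithmic to base 2.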
POSITIVE IC AND TIME SCANS
If the probability of co-occurrence of R & D is the same as the product of the individual probabilities of R & D, the Bayesian likelihood estimator Pr(R,D)/[Pr(R) Pr(D)] equals 1
  This means equal prior and posterior probabilities
  log2(1) = 0, therefore IC = 0
However, when the posterior probability Pr(R|D) exceeds the prior probability Pr(R), the IC becomes positive
An IC whose lower 95% CI bound is > 0 and which increases over sequential time scans is a positive, stable signal
CAPTOPRIL AND COUGH
The diagram shows the IC for the drug-ADR association. Error bars: ± 95% CI.
R. Orre et al. (2000) Computational Statistics & Data Analysis 34, 473-493
A well-known signal: suprofen and back pain. The diagram shows the IC for the drug-ADR association. Error bars: ± 95% CI.
R. Orre et al. (2000) Computational Statistics & Data Analysis 34, 473-493
The development from 1973 to 1990 of the IC for the drug azapropazone vs. the photosensitivity reaction, with 95% CI.
R. Orre et al. (2000) Computational Statistics & Data Analysis 34, 473-493
CHARACTERISTICS OF IC
The preceding diagrams show how the IC for a D-R association (e.g., suprofen-back pain) varies over a span of time (e.g., 1983–1990)
The cumulative probability of the IC being greater than zero, Pr(IC > 0), develops over time. This association is seen with 80% certainty after Q1 1984.
DIGOXIN & RASH: AN INTERESTING CASE
Although the overall IC was negative, when examined across age groups, increasing age was associated with a positive IC.
R. Orre et al. (2000) Computational Statistics & Data Analysis 34, 473-493
PACLITAXEL-TACHYCARDIA
Change in IC between 1970 and 2010 for the paclitaxel-tachycardia association, plotted in five-year intervals with 95% CI
Singhal & Chakraborty. Unpublished data
DOCETAXEL - FLUSHING
Change in IC between 1970 and 2010 for the docetaxel-flushing association.
Singhal & Chakraborty. Unpublished data
[Figure: E(IC), on an axis from -2 to 7, plotted against time (year) in intervals from 1970-1975 to 2006-2010]
CONCLUDING REMARKS
Statistical data mining for drug-adverse reaction associations offers a useful, non-invasive and sophisticated tool for detecting unknown or incompletely documented signals
Mainly proportional reporting ratios (PRR) and Bayesian data mining, including Empirical Bayesian Screening (EBS) & Bayesian Confidence Propagation Neural Network (BCPNN), are used
PRRs and EBS are comparable; EBS has an advantage for D-R combinations with very small numbers of reports, but it is based on relative ranking
BCPNN provides an IC (a kind of threshold) for signaling that applies to any D-R cell irrespective of ranking
The signals do not establish causality; they only indicate a very strong association between D & R
With all methods of data mining (especially PRR, EBS & BCPNN), the quality & size of the database are very important (they can amplify or dilute a signal)
THANK YOU VERY MUCH
Acknowledgement: Ms. Raji Nair