extracting drug-drug interaction from text using negation...

8
Extracting Drug-Drug Interaction from Text Using Negation Features Estudio del efecto de la negaci´on en la detecci´on de interacciones entre f´armacos Behrouz Bokharaeian, Alberto D´ ıaz NIL Group Universidad Complutense de Madrid Madrid, Spain [email protected], [email protected] Miguel Ballesteros Natural Language Processing Group Universitat Pompeu Fabra Barcelona, Spain [email protected] Resumen: La extracci´ on de relaciones entre entidades es una tarea muy impor- tante dentro del procesamiento de textos biom´ edicos. Se han desarrollado muchos algoritmos para este prop´ osito aunque s´ olo unos pocos han estudiado el tema de las interacciones entre f´ armacos. En este trabajo se ha estudiado el efecto de la negaci´ on para esta tarea. En primer lugar, se describe c´ omo se ha extendido el corpus DrugDDI con anotaciones sobre negaciones y, en segundo lugar, se muestran una serie de experimentos en los que se muestra que tener en cuenta el efecto de la negaci´ on puede mejorar la detecci´ on de interacciones entre f´ armacos cuando se combina con otros m´ etodos de extracci´ on de relaciones. Palabras clave: Interacciones entre f´ armacos, negaci´ on, funciones kernel, m´ aquinas de vectores de soporte, funciones kernel. Abstract: Extracting biomedical relations from text is an important task in BioMedical NLP. There are several systems developed for this purpose but the ones on Drug-Drug interactions are still a few. In this paper we want to show the ef- fectiveness of negation features for this task. We firstly describe how we extended the DrugDDI corpus by annotating it with the scope of negation, and secondly we report a set of experiments in which we show that negation features provide benefits for the detection of drug-drug interactions in combination with some simple relation extraction methods. Keywords: Drug-Drug interaction, Negation, Support vector machines, kernel- based methods 1. Introduction A drug-drug interaction (DDI) occurs when one drug affects the level or activity of an- other drug, this may happen, for instance, in the case of drug concentrations. This interac- tion can result on decreasing its effectiveness or possibly altering its side effects that may even the cause of health problems to patients (Stockley, 2007). There is a great amount of DDI databases and this is why health care experts have dif- ficulties to be kept up-to-date of everything published on drug-drug interactions. This fact means that the development of tools for automatically extracting DDIs from biomed- ical resources is essential for improving and updating the drug knowledge and databases. There are also many systems on the ex- traction of biomedical relations from text; however the research on studying the effect of negation in biomedical relation extraction is still limited. On the other hand, nega- tion is very common in clinical texts and it is one of the main causes of making er- rors in automated indexing systems (Chap- man et al., 2001); the medical personnel is mostly trained to include negations in their reports. Particularly when we are detect- ing the interaction between drugs, the pres- ence of negations can produce false positives classifications, for instance, the sentence Co- administration of multiple doses of 10 mg of lenalidomide had no effect on the single dose pharmacokinetics of R- and S- warfarin a DDI between lenalidomide and warfarin could be detected as a practicable fact if negation is not taken into account. We there- fore believe that detecting the words that Procesamiento del Lenguaje Natural, Revista nº 51, septiembre de 2013, pp 49-56 recibido 30-04-2013 revisado 18-06-2013 aceptado 21-06-2013 ISSN 1135-5948 © 2013 Sociedad Española Para el Procesamiento del Lenguaje Natural

Upload: others

Post on 22-Jul-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Extracting Drug-Drug Interaction from Text Using Negation Featuresrua.ua.es/dspace/bitstream/10045/30587/1/PLN_51_05.pdf · 2016. 4. 28. · for the detection of drug-drug interactions

Extracting Drug-Drug Interactionfrom Text Using Negation Features

Estudio del efecto de la negacion en la deteccion de interacciones entrefarmacos

Behrouz Bokharaeian, Alberto DıazNIL Group

Universidad Complutense de MadridMadrid, Spain

[email protected], [email protected]

Miguel BallesterosNatural Language Processing Group

Universitat Pompeu FabraBarcelona, Spain

[email protected]

Resumen: La extraccion de relaciones entre entidades es una tarea muy impor-tante dentro del procesamiento de textos biomedicos. Se han desarrollado muchosalgoritmos para este proposito aunque solo unos pocos han estudiado el tema delas interacciones entre farmacos. En este trabajo se ha estudiado el efecto de lanegacion para esta tarea. En primer lugar, se describe como se ha extendido elcorpus DrugDDI con anotaciones sobre negaciones y, en segundo lugar, se muestranuna serie de experimentos en los que se muestra que tener en cuenta el efecto dela negacion puede mejorar la deteccion de interacciones entre farmacos cuando secombina con otros metodos de extraccion de relaciones.Palabras clave: Interacciones entre farmacos, negacion, funciones kernel, maquinasde vectores de soporte, funciones kernel.

Abstract: Extracting biomedical relations from text is an important task inBioMedical NLP. There are several systems developed for this purpose but the oneson Drug-Drug interactions are still a few. In this paper we want to show the ef-fectiveness of negation features for this task. We firstly describe how we extendedthe DrugDDI corpus by annotating it with the scope of negation, and secondly wereport a set of experiments in which we show that negation features provide benefitsfor the detection of drug-drug interactions in combination with some simple relationextraction methods.Keywords: Drug-Drug interaction, Negation, Support vector machines, kernel-based methods

1. Introduction

A drug-drug interaction (DDI) occurs whenone drug affects the level or activity of an-other drug, this may happen, for instance, inthe case of drug concentrations. This interac-tion can result on decreasing its effectivenessor possibly altering its side effects that mayeven the cause of health problems to patients(Stockley, 2007).

There is a great amount of DDI databasesand this is why health care experts have dif-ficulties to be kept up-to-date of everythingpublished on drug-drug interactions. Thisfact means that the development of tools forautomatically extracting DDIs from biomed-ical resources is essential for improving andupdating the drug knowledge and databases.

There are also many systems on the ex-traction of biomedical relations from text;

however the research on studying the effectof negation in biomedical relation extractionis still limited. On the other hand, nega-tion is very common in clinical texts andit is one of the main causes of making er-rors in automated indexing systems (Chap-man et al., 2001); the medical personnel ismostly trained to include negations in theirreports. Particularly when we are detect-ing the interaction between drugs, the pres-ence of negations can produce false positivesclassifications, for instance, the sentence Co-administration of multiple doses of 10 mgof lenalidomide had no effect on the singledose pharmacokinetics of R- and S- warfarina DDI between lenalidomide and warfarincould be detected as a practicable fact ifnegation is not taken into account. We there-fore believe that detecting the words that

Procesamiento del Lenguaje Natural, Revista nº 51, septiembre de 2013, pp 49-56 recibido 30-04-2013 revisado 18-06-2013 aceptado 21-06-2013

ISSN 1135-5948 © 2013 Sociedad Española Para el Procesamiento del Lenguaje Natural

Page 2: Extracting Drug-Drug Interaction from Text Using Negation Featuresrua.ua.es/dspace/bitstream/10045/30587/1/PLN_51_05.pdf · 2016. 4. 28. · for the detection of drug-drug interactions

are affected by negations may be an essen-tial part in most biomedical text mining tasksthat try to obtain automatically the accurateknowledge from textual data.

In order to avoid errors derived of us-ing automatic negation detection algorithmssuch as NegEx (Amini et al., 2011), we anno-tated a DDI corpus - previously developed-with the scope of negations. The corpusis called DrugDDI corpus (Segura-Bedmaret al., 2011b), and it was developed forthe Workshop on Drug-Drug Interaction Ex-traction (Segura-Bedmar et al., 2011a) thattook place in 2011 in Huelva, Spain. TheDrugDDI corpus contains 579 documents ex-tracted from the DrugBank database. We an-alyzed the corpus and we annotated the sen-tences within with the scope of negation inorder to find the effect of negation features inthe detection of DDIs. We annotated it usingthe same guidelines of the BioScope corpus(Vincze et al., 2008), that is, we annotatedcues and scopes affected by negation state-ments into sentences in a linear format.

For detecting the DDIs we used a fast ver-sion of a support vector machine (henceforth,SVM) classifier with a linear kernel based ona bag of words (henceforth, BOW) represen-tation obtained from the extracted features.We carried out some experiments with dif-ferent kernels (global context, subtree, short-est path), with and without negation infor-mation. The results presented in this papershow that negation features can improve theperformance of relation extraction methods.

The rest of the paper is structured asfollows. In Section 2 we discuss previousrelated work about biomedical relation ex-traction and relevant information about theDrugDDI corpus. In Section 3 we explainhow we annotated the corpus with the scopeof negation. In Section 4 we explain how weused the obtained information from negationtags to improve the DDI detection task. InSection 5 we discuss the results obtained. Fi-nally, in Section 6, we show our conclusionsand suggestions for future work.

2. Related work

In this Section we describe the DrugDDI cor-pus and we present some related work onkernel-based relation extraction.

2.1. DrugDDI corpus

There are some annotated corpora thatwere developed with the intention of study-ing biomedical relation extraction, such as,Aimed (Bunesu et al., 2005), LLL (Nedellecet al., 2005), BioCreAtIvE-PPI (Krallingeret al., 2008) on protein-protein interactions(PPI) and DrugDDI (Segura-Bedmar et al.,2011b), on drug-drug interactions. In par-ticular, the DrugDDI corpus is the first an-notated corpus on the phenomenon of inter-actions among drugs and it is the one thatwe used for our experiments. It was de-signed with the intention of encouraging theNLP community to conduct further researchon this type of interactions. The DrugBankdatabase (Wishart et al., 2008) was used assource to develop this corpus. This databasecontains unstructured textual information ondrugs and their interactions.

The DrugDDI corpus is available in twodifferent formats: (i) the first one con-tains the information provided by MMTX(Aronson, 2001) and the unified formatadapted from PPI corpora format proposedin (Pyysalo et al., 2008). The unified XMLformat (see Figure 1) does not contain anylinguistic information; it only provides theplain text sentences, their drugs and their in-teractions. Each entity (drug) includes ref-erence (origId) to the sentence identifier inthe MMTX format corpus. For each sen-tence contained in the unified format, theannotations correspond to all the drugs enti-ties and the possible DDI candidate pair thatrepresents the interaction. Each DDI candi-date pair is represented as a pair ID node inwhich the identifiers of the interacting drugsare registered on its e1 and e2 attributes. Ifthe pair is a DDI, the interaction attributeis set to true, otherwise this attribute is setto false. Table 1 shows related statistics ofthe DrugDDI corpus (Segura-Bedmar et al.,2011b).

2.2. Biomedical RelationExtraction

Nowadays, there are many systems developedfor extracting biomedical relations from textthat can be categorized in (i) feature basedand (ii) kernel-based approaches. Feature-based approaches transform the context ofentities into a set of features; this set isused to train a data-driven algorithm. Onthe other hand, kernel-based approaches are

Behrouz Bokharaeian, Alberto Díaz y Miguel Ballesteros

50

Page 3: Extracting Drug-Drug Interaction from Text Using Negation Featuresrua.ua.es/dspace/bitstream/10045/30587/1/PLN_51_05.pdf · 2016. 4. 28. · for the detection of drug-drug interactions

Figure 1: The unified XML format in the DrugDDI corpus.

No.Documents 579Sentences 5,806Drugs 8,260Sentences with at least two drug 3,775Sentences with at least one DDI 2,044Sentences with no DDI 3,762Candidate drug pairs 30,757Positive interactions 3,160Negative interactions 27,597

Table 1: Basic statistics for the DrugDDIcorpus.

based on similarity functions. This idea pro-vides the option of checking structured rep-resentations, such as parse trees and comput-ing the similarity between different represen-tations directly. Combing kernel based andfeature based approaches were investigatedby Thomas et al. (2011), they developed avoting system (based on majority) that ben-efits from the outcomes of several methods.

So far, the sequence and tree kernels arethe ones that have shown a superior perfor-mance for the detection of biomedical rela-tions from text (Bunesu et al., 2005). Inparticular, global context kernel, subtree andshortest path kernels are three important ker-nel methods that were applied successfullyfor biomedical relation extraction task. Forinstance, Giuliano et al. (2005) applied byconsidering three different patterns and theycalculated the similarity between two sen-tences by computing common n-grams of twodifferent patterns.

The shortest path kernel (Bunescu yMooney, 2005) uses the shortest path be-tween two entities (or drugs) in a phrasestructure tree. The subtree kernel (Mos-chitti, 2006) counted the number of commonsubtrees in whole parse trees by comparingtwo different sentences. Moreover, a com-parative survey about different kernel-basedapproaches and their performances can befound in (Frunza y Inkpen, 2010).

More recent research on tree kernels were

carried out by Guodong et al. (2010). Theyintroduced a ”context-sensitive“ convolutiontree kernel, which specifies both ”context-free“ and ”context-sensitive“ sub-trees bytraversing the paths of their ancestor nodesas their contexts to capture structural infor-mation in the tree structure. Another mo-tivating work was reported by Chen et al.(2011), that presented a proteing-protein in-teraction pair extractor, it consists on a SVMclassifier that exploits a linear kernel with acomplete set of features.

Finally, Simoes et al. (2013) introducedan approach for Relation Extraction (RE)based on labeled graph kernels, they pro-posed an implementation of a random walkkernel (Neuhaus y Bunke, 2006) that mainlyexplores two characteristics: (i) the words be-tween the candidate entities and (ii) the com-bined information from different sources.

3. Annotating the DrugDDIcorpus with negations

We followed the Bioscope guidelines in orderto annotate the corpus (Vincze et al., 2008).The main idea is based in the detection of aset of negation cues, like ’no’ or ’not’. Af-ter this, the scope of the cue is calculatedbased on its syntactic context. There areseveral systems that annotate the scope ofnegation, in our approach we used the onepublished by Ballesteros et al. (Ballesteros etal., 2012), which is publicly available,1 rule-based system that works on biomedical litera-ture (Bioscope) and the input is just the sen-tence without any required annotation, whichserves very well for our purposes.

We used as input all the sentences of theDrugDDI corpus, containing 5,806 sentencesand 579 files. The output was therefore aset of sentences annotated with the scope ofnegation. After applying the system, we ob-served that there were a set of 1,340 sen-tences containing negations in the corpus,

1http://minerva.fdi.ucm.es:8888/ScopeTagger/

Extracting Drug-Drug interaction from text using negation features

51

Page 4: Extracting Drug-Drug Interaction from Text Using Negation Featuresrua.ua.es/dspace/bitstream/10045/30587/1/PLN_51_05.pdf · 2016. 4. 28. · for the detection of drug-drug interactions

which conforms 23% of the corpus.Taking into account that the negation

scope detection system is fully automatic, wemanually checked the outcome correcting theannotations when needed. In order to do so,we divided the annotated corpus in 3 differentsets that were assigned to 3 different evalu-ators. The evaluators checked all the sen-tences contained in each set and correctedthe sentences that contained annotation er-rors. After this revision, a different evaluatorrevised all the annotations produced by theother 3 evaluators. Finally, we got the wholeset of 1,340 sentences (correctly) annotatedwith the scope of negation.

The algorithm produced errors -accordingto the evaluators- in 350 sentences fromthe 1,340, including false positives matches(there were 16 cases). Which means that 74%of sentences was annotated correctly in anautomatic way, when considering a full scopematch. The main errors produced by the al-gorithm were related with the processing ofpassive voice sentences, commas, and copu-lative keywords (and, or). In particular theproblem of passive voice sentences was re-lated with the pattern It + to be + not +past participle, which seems that it was notcaptured by the system, at least in all cases.The false positives were related with the cuefailure, which is not a negation when it is anoun modified by an adjective, for instance,renal failure or heart failure. In the DrugDDIcorpus these words appear always as nouns,and therefore all of the performed annota-tions were incorrect.

The following paragraph shows some ex-amples and corrections made by the evalua-tors:

Scope closed in an incorrect way con-taining words from two different clausessuch as: Example: It is [{not} clearwhether this represents an interactionwith TIKOSYN or the presence of moresevere structural heart disease in pa-tients on digoxin;]. The scope should beclosed in or.

Scope closed in an incorrect way incopulative coordinated sentences: Ex-ample: The following medications havebeen administered in clinical trials withSimulect? with [{no} increase in adversereactions: ATG/ALG] , azathioprine,corticosteroids, cyclosporine, mycophe-

nolate mofetil, and muromonab-CD3.

Scope opened incorrectly in coordinatedcopulative sentences: Example: In an invitro study, cytochrome P450 isozymes1A2, 2A6, 2C9, 2C19, 2D6, 2E1, [and3A4 were{not} inhibited by exposure tocevimeline].

Some passive voice sentences were notdetected. In particular, as it is alreadymentioned, sentences with the format ’It(this and that) + finite form of to be +not + past participle’. Example: [Con-comitant use of bromocriptine mesylatewith other ergot alkaloids is{not} recom-mended].

We also carried out some analysis concern-ing the number of different cues in the corpusand the number of different errors observed.Table 2 shows that not and no are by farthe most frequent cues in the corpus. It canbe observed that the most problematic cue isneither ... nor.

Cue No. MODFs RateNot 855 266 31.1%No 439 58 13.2%without 47 8 17.0%Neither ... nor 14 12 85.7%Absence 10 5 50%Lack 8 1 12.5%cannot 7 4 57.1%

Table 2: Statistics of negations cues in thecorpus and modifications for each cue in the

manual checking process.

We finally explored the sentences that arenot automatically annotated but they indeedshow a negative statement in order to findfalse negatives. We looked into several nega-tions cues that are not detected by the systemsuch as unaffected, unchanged or nonsignifi-cant. We detected and corrected 75 differentsentences that belong to this problem.

Here we show some examples of false neg-atives:

[The pharmacokinetics of naltrexoneand its major metabolite 6-beta-naltrexol were {unaffected} followingco-administration with Acamprosate].

[Mean T max and mean plasma elimi-nation half-life of albendazole sulfoxidewere {unchanged}].

Behrouz Bokharaeian, Alberto Díaz y Miguel Ballesteros

52

Page 5: Extracting Drug-Drug Interaction from Text Using Negation Featuresrua.ua.es/dspace/bitstream/10045/30587/1/PLN_51_05.pdf · 2016. 4. 28. · for the detection of drug-drug interactions

Monoamine Oxidase Inhibition: Line-zolid is a reversible, [{nonselective} in-hibitor of monoamine oxidase].

Therefore, the corpus finally contains1,399 sentences annotated with the scope ofnegation, of which 932 correspond to sen-tences in which there are at least two drugsmentioned. It is worth mentioning that thereare 1,731 sentences with 2 or more drug men-tions but no DDI, and 2,044 with 2 or moredrugs and at least one interaction.

Finally, the extension of the DrugDDI cor-pus consists of adding a new tag in the an-notation of each sentence with the scope ofnegations. Figure 3 shows an example. Theproduced corpus is available for public use.2

4. DDI detection

In this Section, we explain in detail the ex-periments we carried out by using negationfeatures. First, we illustrate in detail themethods we used without negation features,and then we present our proposed combinednegation method, see figure 4. All the experi-ments were carried out by using the Stanfordparser3 for tokenization and constituent pars-ing (Cer et al., 2010), and the SVMs providedby Weka as training engine.

4.1. DDI detection withoutnegation features

The DDI extraction method consists of fourdifferent processes: (1) initial prepossessing,(2) feature extraction, (3) Bag of Words com-putation and (4) classification. The prepro-cessing step (1) consists of removing somestop words and tokens, for instance remov-ing question marks at the beginning of thesentence. We also carried out a normaliza-tion task for some tokens due to the usage ofdifferent encoding and processing methods,mainly HTML tags. In the feature extractionstep (2), we extracted three different featuresets corresponding to different used relationextraction methods. The feature extractionstep for global context kernel consists of ex-tracting fore-between, between, and between-after tokens that we mentioned in Section 2.The feature extraction step for shortest pathkernel method included constituent parsing

2http://nil.fdi.ucm.es/sites/default/files/NegDrugDDI.zip

3http://nlp.stanford.edu/software/lex-parser.shtml

of the sentence and then extracting short-est path between two drugs in the gener-ated parse tree. And for the subtree kernelwe also extracted all subtrees from the men-tioned constituent parse tree. After extract-ing features, we applied the BOW method (3)to generate new feature sets that the SVMclassifier uses. The aim of this step is pro-ducing a new representation of the instanceswhich is used in the classification step. Andfinally in the classification step (4), we ap-plied the Weka SVM classifier (Platt, 1998)(SMO), with a linear composition of featuresproduced by the BOW method to detect theinteractions among drugs. The Inner productof new features was used as kernel functionbetween two new representations.

4.2. DDI detection with negationfeatures

In this section, we explain our proposedmethod that merges negation features withthe features mentioned in Section 4.1. Wedivided the corpus in instances affected bynegation and instances without negationstatements. The last ones were classified inthe same way as in Section 4.1, while forthe instances with negations we added nega-tion features to the representation. The pos-itive instances were classified in the sameway as previous approaches but the sentencescontaining negations were categorized usingnegation features in addition to the other pre-vious features. As in previous subsection, thecombined method for instances containingnegations consists of 4 steps. After a simplepreprocessing step we carried out a featureextraction process. In this step, we generatedsix negation features in addition to three fea-ture sets corresponding to global context ker-nel (GCK), Shortest Path and Subtree kernelmethods. Negation feature consists of tokensinside the negation scope, left side tokensoutside of the negation scope and right sidetokens, and the negation cue tokens, negationcue, and position of open and closed negationscope. For instance in the sentence shown inFigure 3: tokens inside brackets create mid-dle scope features, right side tokens constructright features and tokens in the left side of thenegation scope form left scope features. Asin the previous subsection, we used a BOWmethod to convert negation string features toword features. Finally, the new feature set isused to classify the drug-drug interactions by

Extracting Drug-Drug interaction from text using negation features

53

Page 6: Extracting Drug-Drug Interaction from Text Using Negation Featuresrua.ua.es/dspace/bitstream/10045/30587/1/PLN_51_05.pdf · 2016. 4. 28. · for the detection of drug-drug interactions

Figure 2: A sentence annotated with the Scope of Negation in our version of the DrugDDIcorpus.

making use of the Weka SVM.

Figure 3: Left, middle and right side scopeand negation cue in a negative sentence.

In summation, our approach is a featurebased method that uses a bag of word kernelutilizing basic features to compute simple ba-sic kernels and negation features. We applieda fast implementation of the support vectormachine provided in Weka, which uses se-quential minimal optimization. By carryingout some experiments we also limited the sizeof the words in each feature bag in the BOWapproach to 1000 words per feature class.

Figure 4: The different processes followedby our proposed approach.

5. Evaluation

5.1. Evaluation Setup

In order to demonstrate the improvementsprovided by using negation features, our ex-periments consisted of a 10-fold cross valida-tion over the training part of the DrugDDIcorpus. Therefore, our results are not di-rectly comparable to the ones provided inthe DDI challenge (Segura-Bedmar et al.,2011a). The training DrugDDI corpus con-tains 437 documents extracted from Drug-Bank database. It consists of 4267 sentenceswith average of 9.8 sentences per documentand 25,209 instances with 2,402 interactionsbetween different drugs.

Our measurement metrics included truepositive, false positive, false negative, totalnumber of positive instances, Precision, Re-call and F-1 score.

5.2. Results

Table 3 shows the outcomes of the exper-iments by computing the metrics and bytraining over the DDI corpus. The ta-ble shows results for Global context ker-nel (GCK), Subtree and shortest path ker-nel (SubtreeK) and corresponding combinednegation methods (GCKNS = GCK withnegation features; SubtreeKNS = Subtreekernel with negation features). The firstthree rows of the table show the performanceof the three basic kernels and the last threeones (with NS postfix) show the outcomes forthe combined version that includes negationfeatures. The best result was obtained withGCKNS, and the worst result was obtainedby the shortest path tree approach. More-over, the best improvement was obtainedby the GCK approach; it improves 3.8 per-centual points of the F score.

As we can see in the table, there is animproved performance when we applied thenegation features for classification. This fact

Behrouz Bokharaeian, Alberto Díaz y Miguel Ballesteros

54

Page 7: Extracting Drug-Drug Interaction from Text Using Negation Featuresrua.ua.es/dspace/bitstream/10045/30587/1/PLN_51_05.pdf · 2016. 4. 28. · for the detection of drug-drug interactions

Method TP FP FN Total P R F1GCK: 902 1094 1500 2402 0.452 0.376 0.410SubtreeK: 818 1105 1584 2402 0.425 0.341 0.378ShortestPathTK: 795 1066 1607 2402 0.427 0.331 0.373GCKNS: 987 1021 1415 2402 0.492 0.411 0.448SubtreeKNS: 919 1280 1483 2402 0.418 0.383 0.399ShortestPathTKNS: 936 1240 1466 2402 0.430 0.390 0.409

Table 3: 10- cross validation results for the methods that do not use negation features and themethods that use negation features.

demonstrates our hypothesis and the empha-sizes the purpose of the present work.

6. Conclusions and Future Work

Due to the huge amount of drug related in-formation in bio-medical documents and theimportance of detecting dangerous drug-druginteractions in medical treatments, we believethat implementing automatic Drug-Drug in-teraction extraction methods from text iscritical. The DrugDDI corpus is the firstannotated corpus for Dug-Drug interactiontasks used in the DDI Extraction 2011 chal-lenge.

In this paper, after reviewing relatedwork on biomedical relation extraction, wefirst explained the process of annotating theDrugDDI corpus with negation tags; andthen we explored the performance of combingnegation features with three simple relationextraction methods. Our results show the su-perior performance of the combined methodutilizing negation features over the three ba-sic experimented relation extraction meth-ods.

However, the experiments also show thatthe application of negation features can in-deed improve the relation extraction perfor-mance but the obtained improvement clearlydepends on the number and rate of positiveand negative relations, rate of negative cuesin the corpus, and other relation extractionfeatures. It is also true that combining nega-tion features with a huge number of other fea-tures may not improve the performance andeven may hurt the final result, and this iswhy we used a limited number of features.It is therefore obvious that corpora havingmore sentences with negation cues can bene-fit more from using negation features.

For further work, we plan to use a differenttype of annotation such as negation events

instead of scopes, and also handling hedgecues and speculative statements in conjunc-tion with negations.

References

Amini, I., M. Sanderson, D. Martinez y X. Li.2011. Search for clinical records: Rmitat trec 2011 medical track. En Proceed-ings of the 20th Text REtrieval Conference(TREC 2011).

Aronson, A. 2001. Effective mappingof biomedical text to the UMLSMetathesaurus: the MetaMap pro-gram. En Proceedings of the AMIASymposium, paginas 17–27. URLhttp://metamap.nlm.nih.gov/.

Ballesteros, M., V. Francisco, A. Diaz, J.Herrera y P. Gervas. 2012. Inferringthe scope of negation in biomedical doc-uments. En Proceedings of the 13th In-ternational Conference on Intelligent TextProcessing and Computational Linguistics(CICLING 2012).

Bunescu, R., R. Ge, R. Kate, E. Mar-cotte, R. Mooney, A. Ramani y Y. Wong.2005. Comparative experiments on learn-ing information extractors for proteinsand their interactions. Artificial Intelli-gence in Medicine, 33(2):139–155.

Bunescu R. y R. Mooney. 2005. A short-est path dependency kernel for relation ex-traction. En Proceedings of the conferenceon Human Language Technology and Em-pirical Methods in Natural Language Pro-cessing, paginas 724–731.

Cer, D., M. de Marneffe, D. Jurafsky y C. D.Manning. 2010. Parsing to Stanford De-pendencies: Trade-offs between speed andaccuracy. En Proceedings of the 7th In-

Extracting Drug-Drug interaction from text using negation features

55

Page 8: Extracting Drug-Drug Interaction from Text Using Negation Featuresrua.ua.es/dspace/bitstream/10045/30587/1/PLN_51_05.pdf · 2016. 4. 28. · for the detection of drug-drug interactions

ternational Conference on Language Re-sources and Evaluation(LREC 2010) .

Chapman, W., W. Bridewell, P. Hanbury, G.F. Cooper y B. G. Buchanan. 2001. ASimple Algorithm for Identifying NegatedFindings and Diseases in Discharge Sum-maries. Journal of Biomedical Informac-tics, 34(5):301-310.

Chen, Y., F. Liu y B. Manderick. 2011. Ex-tract Protein-Protein Interactions Fromthe Literature Using Support Vector Ma-chines with Feature Selection. BiomedicalEngineering, Trends, Researchs and Tech-nologies.

Frunza, O. y D. Inkpen. 2010. Extrac-tion of disease-treatment semantic rela-tions from biomedical sentences. Proceed-ings of the 2010 Workshop on BiomedicalNatural Language Processing, paginas 91–98.

Giuliano, C., A. Lavelli y L. Romano. 2005.Exploiting shallow linguistic informationfor relation extraction from biomedical lit-erature. En Proceedings of the EleventhConference of the European Chapter of theAssociation for Computational Linguistics(EACL-2006), paginas 5–7.

Guodong, Z., Q. Longhua y F. Jianxi. 2010.Tree kernel-based semantic relation ex-traction with rich syntactic and semanticinformation. International Journal on In-formation Sciences, 180(8):1313–1325.

Krallinger, M., A. Valencia y L. Hirschman.2008. Linking genes to literature: textmining, information extraction, and re-trieval applications for biology. GenomeBiology , 9(Suppl 2):S8.

Moschitti, A. 2006. Making Tree KernelsPractical for Natural Language Learning.En Proccedings of the 11th Conference ofthe European Chapter of the Associationfor Computational Linguistics.

Nedellec, C. 2004. Machine Learning for In-formation Extraction in Genomics - Stateof the Art and Perspectives. Text Miningand its Applications, Springer Verlag.

Neuhaus, M. y H. Bunke. 2006. A RandomWalk Kernel Derived from Graph EditDistance. Lecture Notes in Computer Sci-ence, 4109(5):191-199.

Platt, J. 1998. Sequential Minimal Opti-mization: A Fast Algorithm for TrainingSupport Vector Machines. Advances inkernel methods - Support vector learning.

Pyysalo, S., A. Airola, J. Heimonen, J.Bjorne, F. Ginter y T. Salakoski. 2008.Comparative analysis of five protein-protein interaction corpora. BMC bioin-formatics, 9(Suppl 3):S6.

Segura-Bedmar, I., P. Martınez y D. Sanchez-Cisneros. 2011. En Proceedings of the 1stChallenge task on Drug-Drug InteractionExtraction (DDIExtraction 2011). CEURWorkshop Proceedings, Vol. 761.

Segura-Bedmar, I., P. Martınez y C. de PabloSanchez. 2011. Using a shallow linguis-tic kernel for drug-drug interaction extrac-tion. Journal of Biomedical Informatics,44(5):789–804.

Simoes, G., D. Matos y H. Galhardas. 2013.A Labeled Graph Kernel for RelationshipExtraction. CoRR, abs/1302.4874.

Stockley, I. H. 2007. Stockley’s Drug Inter-action. Pharmaceutical Press.

Thomas, P., M. Neves, I. Solt, D. Tikk y U.Leser. 2011. Relation extraction for drug-drug interactions using ensemble learn-ing. En Proceedings of the First Challengetask on Drug-Drug Interaction Extraction(DDIExtraction 2011), pp:11–17.

Vincze, V., G. Szarvas, R. Farkas, G. Moray J. Csirik. 2008. The BioScope cor-pus: annotation for negation, uncertaintyand their scope in biomedical texts. BMCBioinformatics, 9(Suppl 11):S9.

Wishart, D. R., C. Knox , A. C. Guo, D.Cheng, S. Shrivastava, D. Tzur, B. Gau-tam, M. Hassanali. 2008. DrugBank:a knowledgebase for drugs, drug actionsand drug targets. Nucleic Acids Res.,36(Suppl 1):D901-D906.

Behrouz Bokharaeian, Alberto Díaz y Miguel Ballesteros

56