![Page 1: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/1.jpg)
Intro Projects
Social Media Mining in the context of a pharmaceutical company, and other applications
Fabio Rinaldi
SUPSI/IDSIA and University of Zurich, Switzerland
Swiss Institute of Bioinformatics
Fondazione Bruno Kessler, Trento, Italy
February 4, 2020
![Page 2: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/2.jpg)
Projects
• Extraction of Information from the Scientific Literature
  • PsyMine (COGITO)
  • MelanoBase (SNF)
  • Author Name Disambiguation (Roche)
• Clinical records: SwissMADE (SNF/NRP74)
• Social Media: MedMon (InnoSuisse)
• Veterinary reports: collaboration with VetSuisse
• Assisted curation (NIH), collaboration with RegulonDB
• Tools and Resources: BioTermHub, OGER
http://www.ontogene.org/
![Page 3: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/3.jpg)
PsyMine
![Page 4: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/4.jpg)
From disorders to etiological factors
![Page 5: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/5.jpg)
Creation of a reference corpus
![Page 6: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/6.jpg)
MelanoBase
• Most serious type of skin cancer
• Develops from the pigment-containing cells (melanocytes)
• Primary cause is UV exposure
![Page 7: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/7.jpg)
MelanoBase
![Page 8: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/8.jpg)
Author name disambiguation
![Page 9: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/9.jpg)
SwissMADE: The challenge of clinical text
[http://carecentra.com/clinical-notes-mining/]
• SwissMADE (Monitoring of Adverse Drug Events)
• older patients (aged ≥ 65 years)
• antithrombotic drugs
• using structured and unstructured parts of the EHRs
• involves five hospitals
![Page 10: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/10.jpg)
MedMon
• Bring patient insights into the lifecycle of pharmaceutical products: from development to surveillance (pharmacovigilance)
• Mining the web and social networks for mentions of Adverse Drug Reactions
• Collaboration with a major Pharma Company and another Swiss University
![Page 11: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/11.jpg)
VetMine
![Page 12: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/12.jpg)
VetMine
![Page 13: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/13.jpg)
VetMine
![Page 14: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/14.jpg)
VetMine
![Page 15: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/15.jpg)
Assisted curation
• The OntoGene/BioMeXT group has been active in assisted curation since 2010 with the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature).
• Since 2013 we have been collaborating with the RegulonDB database in a project aimed at testing and gradually introducing assisted curation techniques in their curation pipeline.
• RegulonDB is a database of the regulatory network of Escherichia coli K-12.
![Page 16: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/16.jpg)
![Page 17: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/17.jpg)
Example
We additionally found that expression of the mntP gene is upregulated by manganese through MntR.
• Given: MntR [+] mntP
• To identify: condition [manganese]
![Page 18: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/18.jpg)
OxyR experiment
• TOPIC: oxidative stress by OxyR
• CORPUS: 46 papers, curated in RegDB
• METHODS: automated annotations of entities via OntoGene, selection of sentences via ODIN filters, manual validation
• RESULTS: 100% of RIs retrieved, including TF, EFFECT and their TG
• Identified the growth conditions for 15 of the 20 RIs of OxyR by checking only a limited set of sentences (about 10% of the article is read)
![Page 19: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/19.jpg)
![Page 20: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/20.jpg)
![Page 21: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/21.jpg)
![Page 22: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/22.jpg)
MedMon - Monitoring Social Media Content for Patient-related Information: Approaches and Challenges
Tilia Ellendorff, Fabio Rinaldi
Universität Zürich, Institut für Computerlinguistik
February 4, 2020
MedMon February 4, 2020 1 / 15
![Page 23: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/23.jpg)
Introduction
The MedMon Project: Monitoring of internet resources for pharmaceutical research and development
Innosuisse project together with a big pharmaceutical company and researchers from the University of Applied Sciences of the Grisons
Main motivations and goals:
• Discovery of unmet medical needs of rare disease patients
• Assessment of patients’ perception of their specific disease burden
• Early detection and tracking of epidemic outbreaks
• Optimization of patient recruitment for clinical studies (location, age groups, etc.)
• Monitoring of conversations on clinical trials and treatments
![Page 24: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/24.jpg)
MedMon: Data Sources
Use-cases:
• Parkinsons
• Multiple Sclerosis
• Angelmans
• Viral Infections
Data Access via MedMonPortal:
• Twitter
• Reddit
• Patient Forums (e.g. parkinson.forum.org)
Languages: English, German, French, Spanish (but focus on English)
![Page 25: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/25.jpg)
MedMon Processing Pipeline
![Page 26: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/26.jpg)
Data Retrieval
Retrieval of relevant micro-posts from different data sources
• Twitter
• Reddit
• Medical Forums
So far retrieval by keyword search: “Parkinson’s Disease”, “PD”, “Multiple Sclerosis”, “MS”
Future: Disease-agnostic retrieval of micro-posts
![Page 27: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/27.jpg)
Pre-processing and Removal of Duplicates
Domain-specific data pre-processing
Aims: reduce sparsity; remove bias introduced by duplicates
Examples:
• User names and numbers are replaced with placeholders: “@Jah423” → “@user”, “70 kids” → “NUMBER kids”
• URLs are truncated to their domain names: “http://tinyurl.com/yereux6” → “http://tinyurl.com”
• Hash symbols are stripped from hashtags: “#flu” → “flu”
• Camel-cased expressions are split into their component words: “SideEffects” → “Side Effects”
• Frequent colloquial abbreviations are resolved to their full version: “w/” → “with”
• Repetitions of letters (> 3) are replaced with a single or double letter: “greaaaaat” → “great”, “freeeeeze” → “freeze”
Identification and removal of duplicate micro-posts
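The pre-processing rules on this slide can be sketched in Python as follows. This is a minimal illustration and not the MedMon implementation: the function names, the regular expressions, the `ABBREVIATIONS` table and the tiny `VOCAB` lexicon (used to choose between a single and a double letter) are all assumptions introduced here, and duplicates are detected by exact match on the normalised text only.

```python
import re
from urllib.parse import urlparse

# Hypothetical resources; a real system would use much larger tables.
ABBREVIATIONS = {"w/": "with"}
VOCAB = {"great", "freeze"}

URL_RE = re.compile(r"https?://\S+")
REPEAT_RE = re.compile(r"([A-Za-z])\1{3,}")  # a letter repeated more than 3 times


def truncate_url(url):
    """Keep only scheme and domain name of a URL."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}"


def squeeze_repeats(token, vocab=VOCAB):
    """Reduce long letter runs to one or two letters, preferring a known word."""
    if not REPEAT_RE.search(token):
        return token
    single = REPEAT_RE.sub(r"\1", token)
    double = REPEAT_RE.sub(r"\1\1", token)
    for candidate in (single, double):
        if re.sub(r"[^a-z]", "", candidate.lower()) in vocab:
            return candidate
    return double  # fallback when neither form is in the lexicon (assumption)


def preprocess(text):
    text = URL_RE.sub(lambda m: truncate_url(m.group(0)), text)
    text = re.sub(r"@\w+", "@user", text)              # user names
    text = re.sub(r"#(\w+)", r"\1", text)              # hashtags
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)   # camel case
    text = re.sub(r"\b\d+\b", "NUMBER", text)          # numbers
    tokens = [ABBREVIATIONS.get(t.lower(), t) for t in text.split()]
    return " ".join(squeeze_repeats(t) for t in tokens)


def drop_duplicates(posts):
    """Keep the first post of each group that is identical after normalisation."""
    seen, unique = set(), []
    for post in posts:
        key = preprocess(post).lower()
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique
```

Normalising before comparing means that posts differing only in a shortened link or a user handle collapse into one entry, which is one plausible way to reduce the duplicate bias mentioned in the aims.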
![Page 28: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/28.jpg)
Spam-filtering and Filtering by Disease Mentions
Identification of micro-post relevance
• Does it mention a disease?
  “Having a blast at Takin’ the Mic tonight with Karl Parkinson and Caelainn Hogan @CaelainnH https://t.co/L2jr4y5HSG. ”
• Spam vs. relevant content
  “Seratopicin Healing Pain Relief this cream incorporates all-natural, vegan components. [...] This product can be used for maximum of your commonplace muscle and joint pains.”
![Page 29: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/29.jpg)
Personal Health-Mention Identification
Identification of Personal Health Mentions (PHM)
• Is the micro-post about a specific patient?
• Is the micro-post authored by the patient, by a relative or a health-care provider?
Participation in Social Media Mining for Health Shared Task (SMM4H 2019)
• Task 4: Generalizable identification of personal health experience mentions
• Focus of shared task: generalize from two given health contexts/health conditions (influenza vaccination and infection) in the training set to three unknown health contexts in the test set
![Page 30: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/30.jpg)
SMM4H 2019:Our Best-Performing Approach to PHM Identification
• BERT Classifier: pre-trained Bidirectional Encoder Representations from Transformers, with a sequence classification head (tuned to the task)
• Merging of all model parameters into a single model
  • Average: pointwise average of all model parameters
  • Weighted Average: models are weighted by their performance on the development fold (measured as F-score and transformed by Softmax)
![Page 31: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/31.jpg)
SMM4H 2019:Our Best-Performing Approach to PHM Identification
Weighted average of model parameters:
• 1 x Merged Model from models trained on vaccination
• 9 x Merged Model from models trained on infection (sub-models were averaged)
• all models trained for 4 epochs
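The two merging strategies (pointwise average, and an average weighted by softmax-transformed F-scores) can be illustrated with a small sketch. It assumes each model is represented as a plain dictionary mapping parameter names to flat lists of floats; the function names and the toy values are hypothetical, and a real BERT checkpoint would hold tensors rather than lists.

```python
import math


def softmax(scores):
    """Turn development-fold F-scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_parameters(models, weights=None):
    """Pointwise (optionally weighted) average of parameters across models."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)  # plain average
    merged = {}
    for name in models[0]:
        columns = zip(*(model[name] for model in models))
        merged[name] = [sum(w * v for w, v in zip(weights, col)) for col in columns]
    return merged


# Toy "models" with a single two-element parameter each.
fold_a = {"w": [0.0, 2.0]}
fold_b = {"w": [2.0, 4.0]}

average = merge_parameters([fold_a, fold_b])
by_f_score = merge_parameters([fold_a, fold_b], weights=softmax([0.70, 0.80]))
one_to_nine = merge_parameters([fold_a, fold_b], weights=[0.1, 0.9])
```

The last call mirrors the 1 x / 9 x combination of the vaccination and infection models by normalising the two factors to 0.1 and 0.9.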
![Page 32: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/32.jpg)
SMM4H 2019: Results
Results Task 4 (PHM Identification)
![Page 33: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/33.jpg)
Identification of Disease Mentions and Symptoms
• Span-level annotation of Diseases/Disorders/Health Conditions (+ normalization)
  • Recognition of layman terminology: “Angelman Syndrome”, “Angelmans”, “Angel”, “Angels”, “AS”
• Span-level recognition of Symptoms (+ normalization)
  • Recognition of layman terminology
  • Recognition of multi-word expressions: “I’ve been having troubles falling asleep at night”
  • Distinction from adverse drug reaction mentions (ADRs): “One woman new to our support group never had a tremor until recently when she was given pain killer containing epinephrine and she has not stopped having tremors since that dental appointment.”
![Page 34: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/34.jpg)
Identification of Patients’ Attributes
• Span-level recognition of Medication Intake
• Span-level recognition of Patients’ Attributes
  • Location: where the patient is located
  • Age: age at post and current age; estimated year of birth
  • Gender: female vs. male vs. third gender
• Currently: dictionary look-up and regular expressions at micro-post level
• Future: Patients’ attributes and medication intake should be recognized in relation to mention of patient
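As a rough illustration of the current approach (dictionary look-up and regular expressions at micro-post level), the sketch below extracts age and gender cues from a post. The pattern `AGE_RE`, the word lists in `GENDER_TERMS` and the function name are assumptions made for this example; they are not the project's actual resources, and the toy dictionary covers only two of the gender categories listed on the slide.

```python
import re

# Hypothetical pattern: "67 years old", "67 yo", "67 y/o".
AGE_RE = re.compile(r"\b(\d{1,3})\s*(?:years?\s*old|y/?o)\b", re.IGNORECASE)

# Hypothetical dictionary of gender cue words.
GENDER_TERMS = {
    "female": {"she", "her", "woman", "mother", "wife"},
    "male": {"he", "his", "man", "father", "husband"},
}


def extract_attributes(post):
    """Collect age and gender cues anywhere in the micro-post."""
    ages = [int(m.group(1)) for m in AGE_RE.finditer(post)]
    tokens = set(re.findall(r"[a-z]+", post.lower()))
    genders = sorted(g for g, terms in GENDER_TERMS.items() if tokens & terms)
    return {"age": ages, "gender": genders}
```

Because the look-up works on the whole post, a cue about the author and a cue about a relative cannot be told apart, which is the limitation the "Future" item refers to.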
![Page 35: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/35.jpg)
Conclusions: (Current) Challenges
• Disease-agnostic retrieval of micro-posts
• Identification of micro-post relevance: filtering of spam, identification of personal health mentions
• Span-level recognition of Symptom Mentions: Symptoms vs. ADRs
• Span-level recognition of Patients’ Attributes and Medication (Intake): Recognition in relation to patient
![Page 36: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/36.jpg)
Sequence Tagging for Concept Recognition
OntoGene Tools and CRAFT shared task 2019
Lenz Furrer, Joseph Cornelius, Fabio Rinaldi
Institute of Computational Linguistics, University of Zurich
February 4, 2020
![Page 37: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/37.jpg)
The CRAFT Shared Task 2019
CRAFT: the Colorado Richly Annotated Full-Text corpus
• 67 articles for training (released 2012)
• 30 articles for testing (released now)
• annotated with bio entities, dependencies and coreferences
• one sub-task each in the competition
Task (CA)
• Named Entity Recognition (NER) and Normalisation (NEN)
• 10 entity types, plain + extended → 20 separate evaluations
Team: Lenz Furrer, Joseph Cornelius
![Page 38: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/38.jpg)
Traditional Approach to NER and NEN: Pipeline
A common human skin tumour is caused by activating mutations in beta-catenin.
NER: O O O B-dis E-dis O O O O O O S-chem
![Page 39: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/39.jpg)
Traditional Approach to NER and NEN: Pipeline
A common human skin tumour is caused by activating mutations in beta-catenin.
NER: O O O B-dis E-dis O O O O O O S-chem
NEN: skin tumour
...
MESH:D012876 Skin Disease, Parasitic
MESH:D012877 Manifestation, Skin
MESH:D012877 Skin Manifestation
MESH:D012878 Neoplasm, Skin
MESH:D012878 Skin Neoplasm
MESH:D012878 Cancer of the Skin
MESH:D012878 Cancer, Skin
MESH:D012878 Skin Cancer
MESH:D012883 Skin Ulcer
MESH:D012883 Ulcer, Skin
MESH:D012887 Fractures, Skull
...
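The pipeline on this slide (tag spans first, then look each span up in a terminology) can be sketched as below. The BIOES decoding follows the tag sequence shown above; the small `MESH_NAMES` table reuses a few entries from the slide, while the `SYNONYMS` map and the function names are hypothetical additions so that the toy look-up can resolve "skin tumour".

```python
def decode_bioes(tokens, tags):
    """Turn a BIOES tag sequence into (mention, label) pairs."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":                       # single-token entity
            spans.append((tokens[i], label))
            start = None
        elif prefix == "B":                     # entity begins
            start = i
        elif prefix == "E" and start is not None:   # entity ends
            spans.append((" ".join(tokens[start:i + 1]), label))
            start = None
        elif prefix == "O":
            start = None
    return spans


# A few terminology entries taken from the slide (lower-cased).
MESH_NAMES = {
    "skin neoplasm": "MESH:D012878",
    "skin cancer": "MESH:D012878",
    "skin ulcer": "MESH:D012883",
}
SYNONYMS = {"tumour": "neoplasm"}  # hypothetical word-level mapping


def normalise(mention):
    """Dictionary look-up after a naive word-level synonym replacement."""
    key = " ".join(SYNONYMS.get(word, word) for word in mention.lower().split())
    return MESH_NAMES.get(key)  # None when the mention is not in the table


sentence = "A common human skin tumour is caused by activating mutations in beta-catenin"
tokens = sentence.split()
tags = "O O O B-dis E-dis O O O O O O S-chem".split()
mentions = decode_bioes(tokens, tags)
linked = [(mention, label, normalise(mention)) for mention, label in mentions]
```

Any boundary mistake made in the first step is passed on unchanged to the look-up, which is the error propagation that joint approaches try to avoid.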
![Page 40: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/40.jpg)
NEN: Sequence-to-Sequence
Negacy Degefa Hailu (2019). “Investigation of traditional and deep neural sequence models for biomedical concept recognition”. PhD thesis. University of Colorado at Denver, Anschutz Medical Campus
![Page 41: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/41.jpg)
Joint Tackling of NER and NEN
• avoid error propagation
• mutual benefit of feedback across tasks
[Figure: pipeline (NER followed by NEN) vs. joint training (NER and NEN side by side)]
![Page 42: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/42.jpg)
Multi-Task Learning for Joint Neural NER+NEN
[Figure: multi-task tagger applied to “Takotsubo syndrome secondary to Zolmitriptan”. Above an embedding layer (word + character), each token receives a MER tag and a MEN tag, connected through the feedback matrices U and V: B-DISEASE / D054549-P, I-DISEASE / D054549-P, O / NULL, O / NULL, B-CHEMICAL / C089750.]
Figure 3: The main architecture of our neural multi-task learning model with two explicit feedback strategies for MER and MEN. The character embedding is computed by CNN in Figure 2. Then the character representation vector is concatenated with the word embedding before feeding into the Bi-LSTM. Dashed arrows from the left to the right is the feedback from MER to MEN. Dashed arrows from the right to the left is the feedback from MEN to MER. Orange arrows indicate dropout layers applied on both the input and output vectors of Bi-LSTM.
For a k-layer Bi-LSTM tagger for MER and MEN we get:

$$
\begin{aligned}
\mathrm{MER}(w_{1:n}, i) &= y_i^{\mathrm{MER}} = \arg\max \hat{y}_i^{\mathrm{MER}} \\
\hat{y}_i^{\mathrm{MER}} &= f_{\mathrm{MER}}(v_i^k) \\
\mathrm{MEN}(w_{1:n}, i) &= y_i^{\mathrm{MEN}} = \arg\max \hat{y}_i^{\mathrm{MEN}} \\
\hat{y}_i^{\mathrm{MEN}} &= f_{\mathrm{MEN}}(v_i^k) \\
v_i^k &= F_\theta^k(x_{1:n}, i) \\
x_{1:n} &= E(w_1), E(w_2), \ldots, E(w_n)
\end{aligned}
$$

where $E$ is an embedding function mapping each word in the vocabulary into a d-dimensional vector, $\hat{y}_i^{\mathrm{MER}}$ is the log-probabilities vector with the length of MER tag space, $y_i^{\mathrm{MER}}$ is the output tag of MER, $\hat{y}_i^{\mathrm{MEN}}$ is the log-probabilities vector with the length of MEN tag space, $y_i^{\mathrm{MEN}}$ is the output tag of MEN, and $v_i^k$ is the output of the kth Bi-LSTM layer as defined above. All the parameters are trained separately for MER and MEN because we model MER and MEN as different sequence labeling tasks.
Multi-task Mode with Explicit Feedback Strategies
The dependencies between MER and MEN inspire us to explore their potential mutual benefits. In order to make the most of the mutual benefits between MER and MEN, we propose to feed the above mentioned Bi-LSTM and its variants into multi-task learning framework with two explicit feedback strategies, as shown in Figure 3. This method (1) is able to convert hierarchical tasks into parallel multi-task mode while maintaining mutual supports between tasks; (2) benefits from general representations of both tasks provided by multi-task learning; (3) is effective in determining boundaries of medical named entities through explicit feedback strategies thus improves the performance of both MER and MEN.
We experiment with a multi-task learning architecture based on stacked Bi-LSTM, CNNs and CRF. Multi-task learning can be seen as a way of regularizing model induction by sharing representations with other inductions.
We use stacked Bi-LSTM-CNNs-CRF with task supervision from multiple tasks, sharing Bi-LSTM-CNNs layers among the tasks.
MER and MEN are hierarchical tasks and their outputs potentially have mutual benefits for each other as well. It means MEN can take MER results as input, while the results of MEN can be also useful for MER. However, MER and MEN can be implemented independently as different sequence tagging tasks. Therefore, we 1) follow the popular strategy of multi-task learning to share representations between MER and MEN; and 2) propose to use mutual feedback between MER and MEN, i.e., the result of MER is fed into the MEN as part of the input and the result of MEN is fed into the MER as part of the input. The multi-task learning with two explicit feedback strategies for MER and MEN is defined as:
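
$$
\begin{aligned}
\mathrm{MER}(w_{1:n}, i) &= y_i^{\mathrm{MER}} = \arg\max \hat{y}_i^{\mathrm{MER}} \\
\hat{y}_i^{\mathrm{MER}} &= f_{\mathrm{MER}}(v_i^{\mathrm{MER}}) \\
\mathrm{MEN}(w_{1:n}, i) &= y_i^{\mathrm{MEN}} = \arg\max \hat{y}_i^{\mathrm{MEN}} \\
\hat{y}_i^{\mathrm{MEN}} &= f_{\mathrm{MEN}}(v_i^{\mathrm{MEN}}) \\
v_i^{\mathrm{MER}} &= v_i^k \circ (v_i^k + \hat{y}_i^{\mathrm{MEN}} U) \\
v_i^{\mathrm{MEN}} &= v_i^k \circ (v_i^k + \hat{y}_i^{\mathrm{MER}} V) \\
v_i^k &= F_\theta^k(x_{1:n}, i) \\
x_{1:n} &= E(w_1), E(w_2), \ldots, E(w_n)
\end{aligned}
$$

where $f_{\mathrm{MER}}(v_i^{\mathrm{MER}})$ is the MER multi-class classification function and $f_{\mathrm{MEN}}(v_i^{\mathrm{MEN}})$ the MEN multi-class classification function. $v_i^{\mathrm{MER}}$ is the input of MER multi-class classification function, which combines the output of the shared stacked Bi-LSTM-CNNs and the explicit feedback from MEN. $v_i^{\mathrm{MEN}}$ is the input of MEN multi-class classification function, which combines the output of the shared Bi-LSTM-CNNs and the explicit feedback from MER. $U$ is the matrix to map the feedback from MEN to MER, $V$ maps the feedback from MER to MEN. You can consider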
Sendong Zhao et al. (2019). “A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization”. In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), pp. 817–824. DOI: 10.1609/aaai.v33i01.3301817
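The explicit feedback step quoted from Zhao et al. can be made concrete with a small numeric sketch. It reads the symbol ∘ as an element-wise product (an assumption), uses plain Python lists instead of tensors, and the function names and toy values are hypothetical. The same function covers both directions: pass the MEN log-probabilities and U to obtain the MER input, or the MER log-probabilities and V to obtain the MEN input.

```python
def project(row, matrix):
    """Row vector times matrix: maps tag-space scores into the hidden space."""
    width = len(matrix[0])
    return [sum(r * matrix[j][c] for j, r in enumerate(row)) for c in range(width)]


def feedback_input(v, y_other, mapping):
    """Compute v * (v + y_other @ mapping) element-wise."""
    projected = project(y_other, mapping)
    return [vi * (vi + pi) for vi, pi in zip(v, projected)]


# Toy values: hidden size 2, MEN tag space of size 3.
v_k = [1.0, 2.0]                  # shared Bi-LSTM output for token i
y_men = [1.0, 0.0, 0.0]           # stand-in for the MEN score vector
U = [[0.5, 1.0], [0.0, 0.0], [0.0, 0.0]]

v_mer = feedback_input(v_k, y_men, U)
```

With an all-zero score vector the result reduces to the element-wise square of the shared representation, so the feedback term acts as an additive correction inside the product.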
![Page 43: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/43.jpg)
BiLSTM with Synchronous NER and NEN
[Figure: BiLSTM architecture with the layers embedding, BiLSTM, NER and NEN, applied to the example “RanBP2 modulates Hexokinase I activities”. Per-token NER / NEN outputs: RanBP2 S / 13712, modulates O / NIL, Hexokinase B / 8608, I E / 8608, activities O / NIL.]
![Page 44: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/44.jpg)
Fine-Tuning BioBERT
[Figure: BERT, further pretrained on PubMed (1M), yields BioBERT. Input tokens [CLS], Tok1, Tok2, ..., TokN are mapped to embeddings E[CLS], E1, E2, ..., EN and outputs C, T1, T2, ..., TN, with per-token labels such as NIL, CHEBI:8608, NIL.]
Train two independent models:
• span detector (NER)
• ID tagger (NEN)
![Page 45: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/45.jpg)
Unseen IDs: Pretraining (BiLSTM)
Pretraining
• pretrain on ontology names
  → top 1000 concepts only
  → copy output label to feature input
  → 20 epochs
• then continue training on corpus sentences (6-fold cross-validation, early stopping)
![Page 46: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/46.jpg)
Unseen IDs: Back-Off (BERT)
[Figure: back-off on the example “RanBP2 modulates Hexokinase I activities”. Rows labelled IDs, spans, OGER and predictions show per-token ID labels (e.g. PR:13712, PR:8608, PR:29546, NIL) and span tags (S, O, B, E, O) being combined into the final predictions.]
![Page 47: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/47.jpg)
OGER
[http://www.ontogene.org/resources/oger]
![Page 48: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/48.jpg)
OGER: annotation service
The OntoGene’s Biomedical Entity Recogniser (OGER)
• RESTful web service, using BTH terminologies
• Allows annotation of a collection of documents.
• Evaluated in the Bio Text Mining services challenge BioCreative/TIPS
  • best results according to several of the evaluation metrics.
[http://www.ontogene.org/resources/oger]
![Page 49: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/49.jpg)
BioCreative V.5 / TIPS
![Page 50: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/50.jpg)
Bio Term Hub
The Bio Term Hub is an aggregator of biomedical terminologies, which is kept up-to-date by automatically integrating content from manually curated databases.
[http://www.ontogene.org/resources/termdb]
![Page 51: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/51.jpg)
Bio Term Hub
[http://www.ontogene.org/resources/termdb]
![Page 52: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/52.jpg)
Metrics
Slot Error Rate (Bossy et al. 2013)
• count matches (M), insertions (I), deletions (D), substitutions (S)
  → substitutions: penalty for incorrect boundaries and distance to correct ontology entry
  → find optimal alignment

$$\mathrm{SER} = \frac{S + I + D}{N}$$

Precision/Recall/F1
• count matches (M) and partial matches ($M_p = 1 - S$)

$$\mathrm{Recall} = \frac{M + M_p}{N} \qquad \mathrm{Precision} = \frac{M + M_p}{P}$$

N: # of ground-truth annotations, P: # of predicted annotations
![Page 53: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/53.jpg)
Competing Systems

• Baseline: plain dictionary-based system (OGER)
• BiLSTM
  • 6-fold CV, early stopping
  • + pretraining
  • + multiple runs per fold, pick best
• BioBERT fine-tuned for 55 epochs
  • ID tagger
  • span tagger combined with OGER
  • combination of the previous two (back-off)
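
The back-off combination in the last item can be pictured with a minimal sketch. The slide does not say which system is primary or how "no prediction" is encoded, so the sketch makes two assumptions: the ID tagger is the primary system, and a token without a concept carries the label `"O"`. The names `back_off` and `NO_PREDICTION` and the placeholder IDs are hypothetical.

```python
from typing import List

NO_PREDICTION = "O"  # assumed label for "no concept predicted"


def back_off(primary: List[str], fallback: List[str]) -> List[str]:
    """Combine two per-token label sequences by back-off.

    Keep the primary system's label wherever it predicts a concept;
    otherwise fall back to the secondary system's label.
    """
    if len(primary) != len(fallback):
        raise ValueError("both systems must label the same tokens")
    return [
        p if p != NO_PREDICTION else f
        for p, f in zip(primary, fallback)
    ]


if __name__ == "__main__":
    # Placeholder concept IDs, invented for illustration.
    id_tagger = ["O", "ID_A", "O"]
    spans_plus_dictionary = ["ID_B", "ID_C", "O"]
    print(back_off(id_tagger, spans_plus_dictionary))
```

With this ordering the fallback system only contributes where the primary one is silent, so it can add recall without overriding the primary system's decisions.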
![Page 54: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/54.jpg)
Results: Legend

[Figure: legend for the results plots. Axes: F1 and SER, each from 0.0 to 1.0. Systems shown:]
• OGER (baseline)
• BiLSTM
• BiLSTM, pretrained
• BiLSTM, pretrained, pick-best
• BERT-IDs
• BERT-spans+OGER
• BERT-IDs+BERT-spans+OGER
![Page 55: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/55.jpg)
Results

[Figure: F1 and SER (each on a 0.0 to 1.0 scale) per system for the annotation sets CHEBI, CL, GO_BP, GO_CC, GO_MF, MOP, NCBITaxon, PR, SO, and UBERON, and for their extended (EXT) variants.]
![Page 56: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/56.jpg)
Examples: Correct Unseen IDs
BERT-IDs+BERT-spans+OGER predicts CHEBI_PR_EXT:somatostatin (twice):
However, the somatostatin receptor 2 (SSTR-2) antagonist PRL-2903 does not interfere with the ability of glucose (at 3 and 7 mM) to inhibit glucagon secretion from mouse islets [47].

BERT-IDs+BERT-spans+OGER predicts CHEBI:60004:
Adult mouse testes were homogenized in a buffer containing 20 mM Tris, pH 7.5, 100 mM KCl, 5 mM MgCl2, 0.3% NP-40, 40 U/ml of Rnasin ribonuclease inhibitor (Promega, Madison, WI), and a mixture of 10 protease inhibitors provided [...]

BiLSTM pick-best predicts PR:000008373:
Decreased Osteogenic Differentiation Correlates with Abnormal Distribution of Cx43
![Page 57: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/57.jpg)
Unseen IDs: Precision/Recall

Each system cell gives precision / recall.

| Annotation set | unique | occ. | OGER | pretraining | BERT spans | BERT back-off |
|---|---|---|---|---|---|---|
| CHEBI | 110 | 447 | 0.33 / 0.65 | 1.00 / 0.00 | 0.74 / 0.47 | 0.70 / 0.11 |
| CHEBI_EXT | 134 | 538 | 0.37 / 0.71 | 1.00 / 0.00 | 0.62 / 0.49 | 0.76 / 0.09 |
| CL | 52 | 484 | 0.72 / 0.31 | 1.00 / 0.00 | 0.88 / 0.22 | 0.59 / 0.04 |
| CL_EXT | 52 | 484 | 0.72 / 0.31 | 1.00 / 0.00 | 0.71 / 0.25 | 0.71 / 0.11 |
| GO_BP | 120 | 484 | 0.21 / 0.25 | 1.00 / 0.00 | 0.56 / 0.12 | 0.66 / 0.06 |
| GO_BP_EXT | 126 | 508 | 0.22 / 0.28 | 1.00 / 0.00 | 0.29 / 0.18 | 0.62 / 0.07 |
| GO_CC | 32 | 184 | 0.19 / 0.35 | 1.00 / 0.00 | 0.50 / 0.17 | 0.49 / 0.06 |
| GO_CC_EXT | 36 | 231 | 0.28 / 0.47 | 1.00 / 0.00 | 0.58 / 0.19 | 0.60 / 0.07 |
| GO_MF | 1 | 1 | 0.10 / 0.50 | 1.00 / 0.00 | 1.00 / 0.00 | 1.00 / 0.00 |
| GO_MF_EXT | 73 | 416 | 0.38 / 0.22 | 1.00 / 0.00 | 0.57 / 0.15 | 0.54 / 0.04 |
| NCBITaxon | 40 | 87 | 0.02 / 0.50 | 1.00 / 0.00 | 0.40 / 0.34 | 0.75 / 0.22 |
| NCBITaxon_EXT | 44 | 95 | 0.02 / 0.54 | 1.00 / 0.00 | 0.43 / 0.35 | 0.85 / 0.25 |
| PR | 278 | 4782 | 0.26 / 0.86 | 0.63 / 0.00 | 0.81 / 0.74 | 0.69 / 0.15 |
| PR_EXT | 309 | 5156 | 0.27 / 0.84 | 0.34 / 0.01 | 0.84 / 0.73 | 0.65 / 0.20 |
| SO | 16 | 101 | 0.04 / 0.87 | 1.00 / 0.00 | 0.10 / 0.06 | 0.52 / 0.02 |
| SO_EXT | 25 | 123 | 0.05 / 0.78 | 1.00 / 0.00 | 0.28 / 0.47 | 0.85 / 0.41 |
| UBERON | 203 | 1297 | 0.47 / 0.33 | 0.69 / 0.00 | 0.74 / 0.25 | 0.59 / 0.06 |
| UBERON_EXT | 207 | 1308 | 0.47 / 0.33 | 0.87 / 0.00 | 0.78 / 0.27 | 0.60 / 0.06 |
![Page 58: Social Media Mining in the context of a pharmaceutical ... › cl › rinaldi › PRESENTATIONS › blah2020.pdf · Social Media Mining in the context of a pharmaceutical company,](https://reader033.vdocuments.us/reader033/viewer/2022042403/5f162ef7e17c15430564b7a2/html5/thumbnails/58.jpg)
Conclusions
• Solid, easy-to-use, efficient dictionary-based solution with constantly up-to-date resources
• Bio Term Hub: a one-stop site for obtaining up-to-date biomedical terminological resources. http://www.ontogene.org/resources/termdb
• OGER: an efficient text annotation tool using the BTH terminologies. Provides spans and IDs (NER and CR). http://www.ontogene.org/resources/oger
• Integration with state-of-the-art disambiguation approaches for specific applications
• Applications over several text types: literature, clinical records, social media

http://www.ontogene.org/
https://github.com/OntoGene/craft-st