biomedical articles per year
Ion Androutsopoulos, February 2014 www.bioasq.org
BioASQ
A challenge on large-scale biomedical semantic indexing and question answering
Ion Androutsopoulos
NLP Group, Department of Informatics, Athens University of Economics and Business
Intelligent Information Management Targeted Competition Framework ICT-2011.4.4(d)
Biomedical articles per year
• In 2011: approx. 20 000 new PubMed articles per week!
Questions of biomedical experts
• Yes/no question: “Are there any DNMT3 proteins present in plants?”
– Exact answer: “Yes”
– Ideal answer: “Yes. The plant DOMAINS REARRANGED METHYLTRANSFERASE2 (DRM2) is a homolog of the mammalian de novo methyltransferase DNMT3. DRM2 contains a novel arrangement of the motifs required for DNA methyltransferase catalytic activity.”
Questions of biomedical experts (II)
• Factoid question: “What is the methyl donor of DNA (cytosine-5)-methyltransferases?”
– Exact answer: “S-adenosyl-L-methionine”
– Ideal answer: “S-adenosyl-L-methionine (AdoMet, SAM) is the methyl donor of DNA (cytosine-5)-methyltransferases. DNA (cytosine-5)-methyltransferases catalyze the transfer of a methyl group from S-adenosyl-L-methionine to the C-5 position of cytosine residues in DNA.”
Questions of biomedical experts (III)
• List question: “Which species may be used for the biotechnological production of itaconic acid?”
– Exact answer: { “Aspergillus terreus”, “Aspergillus niger”, “Ustilago maydis” }
– Ideal answer: “In 1955, the production of itaconic acid was firstly described for Ustilago maydis. Some Aspergillus species, like A. itaconicus and A. terreus, show the ability to synthesize this organic acid and A. terreus can secrete significant amounts to the media. Itaconic acid is mainly supplied by biotechnological processes with the fungus Aspergillus terreus. Cloning of the cadA gene into the citric acid producing fungus A. niger showed that it is possible to produce itaconic acid also in a different host organism.”
Questions of biomedical experts (IV)
• Summary question: “How do histone methyltransferases cause histone modification?”
– Exact answer: none
– Ideal answer: “Histone methyltransferases (HMTs) are responsible for the site-specific addition of covalent modifications on the histone tails, which serve as markers for the recruitment of chromatin organization complexes. There are two major types of HMTs: histone-lysine N-methyltransferases and histone-arginine N-methyltransferases. The former methylate specific lysine (K) residues such as 4, 9, 27, 36, and 79 on histone H3 and residue 20 on histone H4. The latter methylate arginine (R) residues such as 2, 8, 17, and 26 on histone H3 and residue 3 on histone H4. Depending on what residue is modified and the degree of methylation (mono-, di- and tri-methylation), lysine methylation of histones is linked to either transcriptionally active or silent chromatin.”
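The four question types shown on these slides (yes/no, factoid, list, summary) differ mainly in the shape of their “exact” answer. The sketch below is one way to model that in Python. The class name, field names, and type labels are hypothetical choices made for illustration; this is not the official BioASQ data format.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical labels for the four question types shown on the slides.
QUESTION_TYPES = {"yesno", "factoid", "list", "summary"}


@dataclass
class Question:
    qtype: str                      # one of QUESTION_TYPES
    body: str                       # the question text
    ideal_answer: str               # paragraph-sized answer
    # "Yes"/"No", a single string, a list of strings, or None (summary).
    exact_answer: Optional[Union[str, List[str]]] = None


def is_consistent(q: Question) -> bool:
    """Check that the exact answer has the shape its question type expects."""
    if q.qtype not in QUESTION_TYPES:
        return False
    if q.qtype == "summary":
        return q.exact_answer is None
    if q.qtype == "list":
        return isinstance(q.exact_answer, list) and len(q.exact_answer) > 0
    if q.qtype == "yesno":
        return q.exact_answer in ("Yes", "No")
    # factoid: a single non-empty string
    return isinstance(q.exact_answer, str) and bool(q.exact_answer)


yesno = Question(
    qtype="yesno",
    body="Are there any DNMT3 proteins present in plants?",
    ideal_answer="Yes. The plant DOMAINS REARRANGED METHYLTRANSFERASE2 (DRM2) "
                 "is a homolog of the mammalian de novo methyltransferase DNMT3.",
    exact_answer="Yes",
)
itaconic = Question(
    qtype="list",
    body="Which species may be used for the biotechnological production "
         "of itaconic acid?",
    ideal_answer="Itaconic acid is mainly supplied by biotechnological "
                 "processes with the fungus Aspergillus terreus.",
    exact_answer=["Aspergillus terreus", "Aspergillus niger", "Ustilago maydis"],
)
```

Keeping the exact answer optional reflects the summary question above, whose exact answer is “none”.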
Finding relevant snippets
Not only texts: ontologies, linked data, …
Information from structured data
List question: “Which forms of cancer is the Tpl2 gene associated with?”
• Related RDF triple:
– Subject: http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3003 (lung cancer)
– Predicate: http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene
– Object: http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/TPL2
• Related concepts:
– http://www.disease-ontology.org/api/metadata/DOID:162 (cancer)
– http://www.uniprot.org/uniprot/M3K8_RAT (TPL2 synonym)
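To make the role of the triple concrete, here is a small sketch of how such a subject/predicate/object record could help answer the list question. It uses only the Python standard library and the single triple quoted on the slide. The `Triple` type, the `labels` dictionary, and the `diseases_for_gene` helper are hypothetical names introduced for illustration, not part of any BioASQ tool.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# URI prefix taken from the slide; the constant names are assumptions.
DISEASOME = "http://www4.wiwiss.fu-berlin.de/diseasome/resource/"
ASSOCIATED_GENE = DISEASOME + "diseasome/associatedGene"
TPL2 = DISEASOME + "genes/TPL2"

# A toy in-memory store holding only the triple shown on the slide.
triples: List[Triple] = [
    Triple(DISEASOME + "diseases/3003", ASSOCIATED_GENE, TPL2),
]

# Human-readable label for the subject, as given in parentheses on the slide.
labels: Dict[str, str] = {DISEASOME + "diseases/3003": "lung cancer"}


def diseases_for_gene(gene_uri: str, store: List[Triple]) -> List[str]:
    """Return the subjects linked to gene_uri by the associatedGene predicate."""
    return [
        t.subject
        for t in store
        if t.predicate == ASSOCIATED_GENE and t.obj == gene_uri
    ]


answer = [labels[uri] for uri in diseases_for_gene(TPL2, triples)]
```

With only this one triple in the store, `answer` contains the single label “lung cancer”. A real system would query a full linked-data source rather than a hand-built list.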
What is BioASQ?
A competition funded by the European Union (FP7).
Task A: Hierarchical text classification
• Organizers distribute new unclassified PubMed articles.
• Participants assign MeSH terms to the articles.
• Evaluation based on annotations of PubMed curators.
Task B: IR, QA, summarization, …
• Organizers distribute English biomedical questions.
• Participants provide: relevant articles, snippets, concepts, triples, “exact” answers, “ideal” answers.
• Evaluation: both automatic (GMAP, MRR, ROUGE etc.) and manual (by biomedical experts).
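Two of the automatic measures named above, MRR and GMAP, can be sketched in a few lines of Python. The code follows the textbook definitions: mean reciprocal rank of the first relevant item, and the geometric mean of per-question average precision. The function names, the `eps` smoothing constant, and the toy data are assumptions for illustration; the official BioASQ evaluation may differ in its details.

```python
import math
from typing import List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant item, or 0.0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[List[str]], golds: List[Set[str]]) -> float:
    """Average of the reciprocal ranks over all questions."""
    return sum(reciprocal_rank(r, g) for r, g in zip(runs, golds)) / len(runs)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def gmap(runs: List[List[str]], golds: List[Set[str]], eps: float = 1e-5) -> float:
    """Geometric mean of average precision; eps (an assumed value) avoids log(0)."""
    logs = [math.log(average_precision(r, g) + eps) for r, g in zip(runs, golds)]
    return math.exp(sum(logs) / len(logs))


# Toy data: two questions, each with a ranked list and a gold set of documents.
runs = [["d3", "d1", "d2"], ["d5", "d4"]]
golds = [{"d1"}, {"d5", "d4"}]

mrr_score = mean_reciprocal_rank(runs, golds)
gmap_score = gmap(runs, golds)
```

Because GMAP multiplies per-question scores, one question answered very badly pulls the overall score down much more than it would under an arithmetic mean.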
Two cycles
2013 Schedule
• March 2013: Evaluation infrastructure & dry-run data
• June 2013: Start of the challenge
• August 2013: End of the challenge
• September 2013: BioASQ workshop
2014 Schedule
• February 2014: Start of Task 2A
• March 2014: Start of Task 2B
• May 2014: End of the challenge
• September 2014: BioASQ workshop
► Both tasks run twice, in two cycles (two years).
► 1st cycle completed, workshop collocated with CLEF-2013.
► 2nd cycle starting! Part of CLEF QA track! (http://nlp.uned.es/clef-qa/)
► Participation can be partial (any task, subtask, response type).
► Prizes for each task/subtask.
More info
Questions (300+500) and gold articles, snippets, concepts, triples, “exact” and “ideal” answers prepared by biomedical experts from around Europe.
► Using tools/infrastructure developed by BioASQ.
Data sources include both text and structured info.
► PubMed abstracts, PubMed Central articles, MeSH.
► Gene Ontology, UniProt, Jochem, Disease Ontology.
BioASQ datasets, infrastructure, evaluation services etc. available beyond the end of the project:
► Plus social net to help extend data, set up new challenges.
Advisory board: both academia and industry.
► NLM, NIST, CMU, IBM, MSR, NaCTeM etc.
Annotation tool
Annotation tool (II)
Annotation tool (III)
BioASQ social network
BioASQ social network (II)
How they all fit together
First year participants
First year participants (II)
• Task 1A (46 systems, 11 teams)
– Mayo Clinic, USA
– University of Alberta, CANADA
– Aristotle University of Thessaloniki + Atypon, GREECE
– University of Vigo, SPAIN
– University of Colorado, USA
– NCBI, NLM, USA
– Université de Rouen, FRANCE
– Fudan University, CHINA
– UCSD, USA
– Toyota Technological Institute, JAPAN
– Imran, PAKISTAN
First year participants (III)
• Task 1B, Phase A (4 systems, 2 teams)
– Mayo Clinic, USA
– University of Alberta, CANADA
• Task 1B, Phase B (7 systems, 2 teams)
– University of Alberta, CANADA
– Toyota Technological Institute, JAPAN
• More participants needed in Task 1B, esp. from Europe!
• Workshop (30 participants)
• Invited speakers
– Alan Aronson, Lister Hill Center, U.S. National Library of Medicine, USA
– Jennifer Chu-Carroll, IBM T.J. Watson Research Center, USA
First year technology/results overview
• Task 1A
– Mainly SVMs and learning-to-rank.
– Mostly flat classification, ignoring class taxonomy.
– Mediocre results by hierarchical methods.
– One of the systems outperformed NLM’s system.
• Task 1B
– Phase A (retrieve relevant documents, concepts, snippets, triples): low performance (compared to baselines).
– Phase B (formulate ‘exact’ and ‘ideal’ answers): poor performance for ‘exact’ answers (except for yes/no questions); high performance for ‘ideal’ answers (paragraph-sized summaries), but starting with gold documents, snippets etc.
• Large scope for improvements, esp. in Task 1B.
“Exact” answer results (batch 2/3)
“Ideal” answer results (batch 2/3)
Project Consortium
1. National Centre for Scientific Research “Demokritos” – NCSR “D” (EL)
2. Transinsight GmbH – TI (D)
3. Université Joseph Fourier – UJF (F)
4. University Leipzig – ULEI (D)
5. Université Pierre et Marie Curie Paris 6 – UPMC (F)
6. Athens University of Economics and Business – Research Centre – AUEB-RC (EL)