
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644753 (KConnect). Medical-domain Machine Translation in KConnect Pavel Pecina Charles University, Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics Czech Republic Apr 4th, 2017 – QT21 workshop, Valencia, Spain


Page 1: Medical-domain Machine Translation in KConnect · Medical-domain Machine Translation in KConnect Pavel Pecina Charles University, Prague Faculty of Mathematics and Physics Institute

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644753 (KConnect).

Medical-domain Machine Translation in KConnect

Pavel Pecina
Charles University, Prague
Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
Czech Republic

Apr 4th, 2017 – QT21 workshop, Valencia, Spain


Outline

● Context of the project (Khresmoi)

● Project details, goals and objectives

● Role of MT in the project

● Industry requirements/constraints

● Solutions and tools

● Prototypes/Demos

● What is still needed


Khresmoi

● „Collect and make sense of biomedical information, then make it freely and easily available in several languages.“

● FP7-ICT, No. 257528, Collaborative project

● Total cost: EUR ~10M, 2010/09-2014/08

● Topic: ICT-2009.4.3 - Intelligent Information Management

● Coordinator: Henning Müller, University of Applied Sciences Western Switzerland, Sierre

● Consortium: 12 institutions

● http://www.khresmoi.eu/


Khresmoi objectives

● Effective automated information extraction from (unstructured) biomedical documents

● Linking information extracted from unstructured biomedical texts/images to structured information in knowledge bases

● Support of cross-language search, including multi-lingual queries, and returning machine-translated pertinent excerpts

● Adaptive user interfaces to assist in formulating queries and display search results via ergonomic/interactive visualizations

● Automated analysis and indexing for medical images


Khresmoi results (MT related)

● MT component to allow cross-lingual search and access

● Based on Moses and domain-adaptation techniques

● Deployed as a (cloud-based) web service

● Translation in two „modes“:

– Translation of search queries from user languages to the document languages (query translation)

– Translation of sentences from automatically created summaries of medical documents (summary translation)

● Languages: Czech, German, French ↔ English


KConnect – a follow-up of Khresmoi

● „Development and commercialization of cloud-based services for multilingual Semantic Annotation, Semantic Search and Machine Translation of Electronic Health Records and medical publications.“

● H2020 project, No. 644753, Innovation action

● Total cost: EUR ~4M, 2015/02–2017/07

● Topic: ICT-15-2014 Big data and open data innovation and take-up

● Coordinator: Allan Hanbury, Technical University of Vienna (TU Wien)

● Consortium: 10 institutions (5 from Khresmoi)

● http://www.kconnect.eu


Consortium

● Academia:

– Technische Universitaet Wien (Austria) – coordination

– University of Sheffield (United Kingdom)

– King’s College London (United Kingdom)

– Charles University, Prague (Czech Republic)

● Industry:

– Findwise AB (Sweden)

– Precognox Informatikai Kft (Hungary)

– Ontotext AD (Bulgaria)

– Trip Database Ltd (United Kingdom)

– Health on the Net Foundation (Switzerland)

– Jönköpings Län (Sweden)


KConnect objectives

● Productisation of the multilingual medical text processing tools developed in Khresmoi.

● Creating a professional-services community of companies trained to build solutions based on the KConnect services.

● Development of toolkits for straightforward adaptation of the commercialised services to new languages.

● Adapting the services to Electronic Health Records processing, which is particularly challenging due to misspellings, neologisms, organisation-specific acronyms, etc.

● Languages: Hungarian, Polish, Spanish, Swedish ↔ English


MT Application Scenarios

1. Query translation

– Translation of medical/health-related search queries from a user language to the document language(s)

– Queries are usually non-grammatical, short sequences of terms

– Lay-people queries vs. expert queries

2. Summary translation

– Sentences taken from automatically created abstracts of medical documents, translated back to the user language

– Usually longer, highly informative sentences
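The two scenarios fit together as one cross-lingual search round trip. The sketch below is a minimal illustration under stated assumptions: `translate` and `search` are hypothetical stand-ins for an MT web service and a search index, and the word-by-word toy lexicon replaces a trained Moses system. None of these names come from the KConnect services.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: translate(text, source_lang, target_lang) -> translation
Translator = Callable[[str, str, str], str]
# Hypothetical search: query -> list of hits, each hit a list of summary sentences
Searcher = Callable[[str], List[List[str]]]


def cross_lingual_search(query: str, user_lang: str, doc_lang: str,
                         translate: Translator, search: Searcher) -> List[List[str]]:
    """Query translation, search, then summary translation back to the user."""
    # Scenario 1: translate the (short, often ungrammatical) query.
    doc_query = translate(query, user_lang, doc_lang)
    # Search the document-language collection.
    hits = search(doc_query)
    # Scenario 2: translate each summary sentence back to the user language.
    return [[translate(s, doc_lang, user_lang) for s in summary] for summary in hits]


# Toy word-by-word lexicon standing in for a real MT system (illustration only).
TOY_LEXICON: Dict[Tuple[str, str], Dict[str, str]] = {
    ("cs", "en"): {"léčba": "treatment", "cukrovka": "diabetes"},
    ("en", "cs"): {"treatment": "léčba", "is": "je", "effective": "účinná"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    table = TOY_LEXICON[(src, tgt)]
    # Unknown words are passed through unchanged.
    return " ".join(table.get(w, w) for w in text.split())


def toy_search(query: str) -> List[List[str]]:
    # Pretend index: one English summary for any query mentioning diabetes.
    return [["treatment is effective"]] if "diabetes" in query.split() else []


results = cross_lingual_search("léčba cukrovka", "cs", "en", toy_translate, toy_search)
# results == [["léčba je účinná"]]
```

The Czech query is a bare sequence of terms, as the slide notes is typical, so a real query translator has to cope with input that carries little grammatical context.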


Requirements, constraints

● Requirements

– Cloud-based solution, easily accessible as a webservice

– Local installation (hospitals)

– Instant response, scalable

– Low computational resources (local installations)

– Easily (re)trainable

● Constraints

– No (or very limited) domain-specific in-house training data


Solutions and tools

● Moses (phrase-based, domain-adapted)

● MT Monkey – MT webservice architecture

● Eman Lite – MT training pipeline

● Manually translated dev/test sets for medical domain

● Training data collected and made available for WMT 17
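The slides do not say which domain-adaptation techniques the Moses systems use. One widely used option for phrase-based MT is to interpolate an in-domain and a general-domain language model, p(w) = λ·p_in(w) + (1 − λ)·p_out(w), choosing λ to minimize perplexity on in-domain development data. The sketch below illustrates that technique with add-one smoothed unigram models and a grid search over λ. It is an assumption for illustration, not the KConnect implementation, and the toy corpora are made up.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

LM = Callable[[str], float]


def unigram_lm(corpus: List[str], vocab_size: int) -> LM:
    """Add-one smoothed unigram model (toy stand-in for an n-gram LM)."""
    counts = Counter(w for line in corpus for w in line.split())
    total = sum(counts.values())
    return lambda w: (counts[w] + 1) / (total + vocab_size)


def perplexity(prob: LM, dev: List[str]) -> float:
    tokens = [w for line in dev for w in line.split()]
    return math.exp(-sum(math.log(prob(w)) for w in tokens) / len(tokens))


def tune_lambda(p_in: LM, p_out: LM, dev: List[str], steps: int = 101) -> Tuple[float, float]:
    """Grid-search the interpolation weight lambda in [0, 1] on dev data."""
    best_lam, best_ppl = 0.0, math.inf
    for i in range(steps):
        lam = i / (steps - 1)
        mixed = lambda w, lam=lam: lam * p_in(w) + (1 - lam) * p_out(w)
        ppl = perplexity(mixed, dev)
        if ppl < best_ppl:
            best_lam, best_ppl = lam, ppl
    return best_lam, best_ppl


# Made-up toy data: a general corpus, a small medical corpus, a mixed dev set.
general = ["the patient went home", "the weather is nice today", "stock prices rose"]
medical = ["the patient has diabetes", "insulin therapy for diabetes"]
dev = ["diabetes patient insulin", "the patient went home"]

vocab = {w for line in general + medical + dev for w in line.split()}
p_out = unigram_lm(general, len(vocab))
p_in = unigram_lm(medical, len(vocab))
lam, ppl = tune_lambda(p_in, p_out, dev)
```

Because the dev set mixes medical and everyday words, the tuned λ lands strictly between 0 and 1, and the mixture is never worse than either model alone, since both endpoints are on the grid.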


MT Monkey

● Webservice architecture

● Developed at CUNI within Khresmoi

● Actively extended and maintained within KConnect

● Scalable (see Tamchyna et al., 2013 for evaluation)

● Recently Dockerized
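The actual MT Monkey design is described in Tamchyna et al. (2013). The sketch below only illustrates the general idea behind a scalable translation web service: a front end forwards each request to one of several workers serving the requested language pair, so throughput grows by adding workers. The `Dispatcher` class, its methods and the round-robin policy are assumptions for illustration, not MT Monkey's API.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Tuple

Worker = Callable[[str], str]  # hypothetical: a worker translates one text


class Dispatcher:
    """Route each request to a worker for its language pair, round-robin."""

    def __init__(self) -> None:
        self._pools: Dict[Tuple[str, str], List[Worker]] = {}
        self._cycles: Dict[Tuple[str, str], Iterator[Worker]] = {}

    def register(self, src: str, tgt: str, worker: Worker) -> None:
        pool = self._pools.setdefault((src, tgt), [])
        pool.append(worker)
        # Rebuild the rotation so the new worker starts receiving requests.
        self._cycles[(src, tgt)] = itertools.cycle(list(pool))

    def translate(self, text: str, src: str, tgt: str) -> str:
        try:
            worker = next(self._cycles[(src, tgt)])
        except KeyError:
            raise ValueError(f"no worker registered for {src}->{tgt}") from None
        return worker(text)


d = Dispatcher()
d.register("cs", "en", lambda text: "worker1:" + text)
d.register("cs", "en", lambda text: "worker2:" + text)
outputs = [d.translate("bolest hlavy", "cs", "en") for _ in range(3)]
# outputs alternate between the two workers: worker1, worker2, worker1
```

This toy is single-threaded; a production dispatcher would also need concurrency control, timeouts and worker health checks.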


Eman Lite

● Fully automated MT system training

● Command-line and web-based interface
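Eman Lite's own configuration format is not shown in the slides. As a hedged illustration of what fully automated training involves, the sketch orders the usual stages of a phrase-based Moses training run (corpus preparation, word alignment, phrase-table extraction, language-model training, tuning, evaluation) by their dependencies and runs them in a valid order. The stage names and the `run_pipeline` helper are hypothetical; the code uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical stages of a phrase-based training run, each mapped to its prerequisites.
STAGES: Dict[str, Set[str]] = {
    "prepare_corpus": set(),
    "word_alignment": {"prepare_corpus"},
    "phrase_table": {"word_alignment"},
    "language_model": {"prepare_corpus"},
    "tuning": {"phrase_table", "language_model"},
    "evaluation": {"tuning"},
}


def run_pipeline(stages: Dict[str, Set[str]],
                 actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every stage after all of its prerequisites; return the order used."""
    order = list(TopologicalSorter(stages).static_order())
    for name in order:
        actions[name]()  # in a real pipeline: invoke the corresponding tool
    return order


# Stand-in actions that only record which stage ran.
log: List[str] = []
actions = {name: (lambda n=name: log.append(n)) for name in STAGES}
order = run_pipeline(STAGES, actions)
```

Word alignment and language-model training share no edge, so a real runner could execute them in parallel; the topological order only guarantees that each stage follows its prerequisites.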


Prototypes/demos

● Trip database search

– https://www.tripdatabase.com

– Search in medical articles (clinical trials, research papers, ...)

● Health-on-the-Net Search

– http://everyone.khresmoi.eu/

– Health-focused web search engine

– Readability and trustworthiness prediction

● Demos

– http://quest.ms.mff.cuni.cz/khresmoi/demo/

– http://quest.ms.mff.cuni.cz/khresmoi/client/

Trip Database Search [figures, slides 15-16]

HON Search [figures, slides 17-19]

HON Search, new version [figures, slides 20-22]


Prototypes/demos

● Trip database search

– https://www.tripdatabase.com

– Search in medical articles (clinical trials, research papers, ...)

● Health-on-the-Net Search

– http://everyone.khresmoi.eu/

– http://jupiter.honservices.org/beta/

– Health-focused web search engine

– Readability and trustworthiness prediction

● Demos

– http://quest.ms.mff.cuni.cz/khresmoi/demo/


Issues

● Availability of (in-domain) training data

● Training data licences unclear (UMLS, MeSH, SNOMED CT)

● Translation quality for some languages (e.g. Hungarian)

● Lay-people language vs. expert language