HLT R&D in South Africa HLT Collaboration between South Africa and the Low Countries Workshop 24 November 2008 Noordhoek, South Africa


Page 1

HLT R&D in South Africa

HLT Collaboration between South Africa

and the Low Countries Workshop

24 November 2008

Noordhoek, South Africa

Page 2

Overview

Specific R&D challenges
Areas of active research
  Text processing
  Speech processing
  Applications of HLT
Main projects: current and recent
Research institutions active in HLT
Main R&D sponsors

Page 3

Specific R&D Challenges

Incompleteness of basic linguistic knowledge

Scarcity of resources
  Linguistic data
  Technology components

Uniqueness of user populations and languages

Page 4

Research areas (1)

Text processing:
  Computational morphological analysis, POS tagging
  Spelling checkers, grammar checkers
  Machine translation, machine-aided translation
  Computational lexicography
  Wordnets

Research focus:
  Development of basic required components and tools
  Data collection and corpus development
  Technology transfer, cross-language learning, bootstrapping, language distances
  Morphological analysis (MA) for agglutinative languages

Page 5

Research areas (2)

Speech processing:
  ASR, TTS, spoken dialogue systems
  Phonetic investigations for HLT
  Speaker verification, S-LID
  Speech tools (diarization, channel normalisation, speech detection)

Research focus:
  Development of basic required components and tools
  Data collection and corpus development
  Technology transfer, cross-language learning, bootstrapping, language distances
  Timing information in speech
  Multi-accent and multilingual acoustic modelling
  Higher order Markov models and other non-standard acoustic models

Page 6

Research areas (3)

Applications of HLT
  Telephone-based information systems
  Computer assisted language learning
  Document proofing tools
  Accessibility devices
  Mobile devices

Page 7

Main R&D initiatives

Department of Arts and Culture (DAC)
  Applications that support multilingualism, especially related to government service delivery
  DAC A: Spelling checkers
  DAC B: Machine-aided translation
  DAC C: Lwazi: Multilingual telephony-based information delivery

Department of Science and Technology (DST)
  Directed research in HLT aimed at addressing SA national priorities
  National HLT Network projects
  International collaborative projects

Various individual research projects

Page 8

Main R&D projects

Text processing:
  Computational morphological analysis: Unisa
  Spellcheckers: DAC A
  Machine translation: EtsaTrans, DAC B

Speech:
  Phonetic investigations: NHN PAST
  ASR/TTS/spoken dialogue systems: AST, Limpopo ASR, OpenPhone, Lwazi (DAC C)

Mobile E-learning for Africa (MELFA)

Page 9

UNISA Computational Morphological Analysis

Development of parsing tools for Bantu languages:
  computational morphological analysers
  disambiguators
  syntactic parsers

Development of supporting resources for development and testing, including extensive underlying machine-readable lexicons

Status:
  Initiated in 2002 (for isiZulu morphological analyser)
  Various prototypes under development (isiZulu, isiXhosa, Siswati, isiNdebele, Northern Sotho and Setswana)
  Extended until 2010

Principal researchers:
  Sonja Bosch (Project Leader), Laurette Pretorius
  Ansu Berg, Axel Fleisch, Albert Kotze, Petro Kotze, Memezi Mfusi, Lydia Mojapelo, Rigardt Pretorius, Linda van Huyssteen, Biffy Viljoen

Sponsor: NRF

Page 10

DAC A: Spelling checkers for public administration domain

Development of spelling checkers for 10 official SA languages
  Specifically for use in government departments
  Spelling checkers for isiNdebele, isiXhosa, isiZulu and Siswati include morphological analysers for effective spellchecking of these agglutinative languages

Status: Final evaluation by client in progress

Principal researchers: MJ Puttkammer (NWU), S Pilon (NWU), DJ Prinsloo (UP), SE Bosch (Unisa)

Sponsor: Department of Arts and Culture, CText

Page 11

EtsaTrans Machine Translation

Development of a functional machine translation system
  Focus domain: mainly administrative documents
  Main languages: English to Afrikaans, Afrikaans to English
  Other languages: English to Xhosa, English to Southern Sotho

Harvesting previously translated information to create parallel corpora

Status:
  Initiated in 2003, ongoing
  Prototypes in use

Principal researchers: JA Naudé, L Jordaan

Sponsor: UFS

Page 12

DAC B: Machine-aided translation tools

Development of translation tools:
  An integrated translation environment (ITE)
  Word translators
  Machine translation systems for three language pairs
  Terminology management system
  Document management system

Status:
  Under development (2007-2010)
  All tools, data and research output to be made available publicly

Principal researchers: HJ Groenewald, S Pilon (NWU), DJ Prinsloo (UP)

Sponsor: DAC

Page 13

NHN PAST: Phonetics for Advanced Speech Technology

Technology-orientated investigation and description of the vowel system of the Sotho languages and of tone in the Sotho and Nguni languages

Status: Initiated May 2008, due for completion June 2009

Principal researchers: E. Barnard (Meraka), B. Khoali (independent consultant), D. Wissing (NWU), S. Zerbian (Wits)

Sponsor: National HLT Network (DST/Meraka)

Page 14

African Speech Technologies (AST)

Development of a multilingual telephone-based hotel reservation system
  Developed corpora and technology components (TTS, ASR, dialogue systems) for SAE, Afrikaans, isiZulu, isiXhosa and Sesotho

Status:
  Completed 2004
  Gave rise to commercial company: Catchword
  Data available for research purposes (release imminent)

Principal researchers:
  J.C. Roux, E.C. Botha, J. du Preez
  Various collaborators

Sponsor: DACST (Innovation Fund)

Page 15

Limpopo ASR

Development of baseline automatic speech recognition systems for the major languages of the Limpopo Province
  Languages: Sepedi (Sesotho sa Leboa), Setswana, Tshivenda and Xitsonga
  Telephone speech data collection and manual annotation

Extension to text-to-speech synthesis and domain-specific prototype dialogue systems

Status:
  Baseline ASR systems completed (2004-2006)
  Extension ongoing

Principal researchers: HJ Oosthuizen and MJD Manamela

Sponsor: Telkom and other industry partners

Page 16

OpenPhone

Demonstrated use of telephone-based information services in providing health information in a rural setting.

Automated health information system that provides information to caregivers looking after HIV-positive children living in the vicinity of Gaborone in Botswana

Includes Setswana TTS and ASR development

Status: Completed 2008, currently live.

http://www.meraka.org.za/hlt_projects_ophone.htm

Principal researchers: Etienne Barnard, Marelie Davel, Madelaine Plauche

Sponsor: OSI/OSISA, DST

Page 17

Lwazi

Development and piloting of a fully Open Source multilingual telephone-based information system
  ASR and TTS systems in 11 official languages
  ASR and TTS integrated into a telephony platform
  Open Source resources and tools
  Various pilots: first significant pilot with DPSA Community Development Workers

Status:
  Initiated September 2006
  On track for completion September 2009

Principal researchers: Etienne Barnard, Marelie Davel, Gerhard van Huyssteen

Sponsor: DAC

Page 18

Mobile E-learning for Africa (MELFA)

Mobile solutions for on-site literacy training and skills development for workers in the Building and Construction Industry

Includes text-to-speech, speech-to-speech translation
Initially 30 test persons in Western Cape are involved in testing the modules for interactive M-learning.

Status: Initiated in 2007, completing in 2009.

Principal researchers: JC Roux (Project leader, SA), A Visagie, H Engelbrecht, A Magnusdottir, P Scholtz.

Sponsor: Danida (Danish government organisation)

Page 19

Research institutions: Text

Institution | Areas of interest | Size [1] | Language focus
UNISA (University of South Africa) | Morphological analysis, POS disambiguation, syntactic parsing | 8/2 | Bantu family languages
CTexT (North-West University) | Document proofing tools, machine aided translation, machine translation, computer assisted language learning, syntactic parsing | 2/8 | Afrikaans (other official languages, African languages)
UP (University of Pretoria) | Morphological analysis, POS disambiguation, syntactic parsing, computational lexicography | 2/0 | Sepedi
UWC (University of Western Cape) | POS disambiguation, computational lexicography, localization, machine translation | 2/x | isiXhosa
Wits (1) (University of Witwatersrand) | Morphological analysis | 1/0 | isiZulu
UFS (University of Free State) | Machine aided translation, machine translation | 1/0 | English/Afrikaans (Sesotho/E, isiXhosa/E)

[1] Size: senior researchers / post-graduate students

Page 20

Research institutions: Speech

Institution | Areas of interest | Size | Language focus
SU-CLaST (University of Stellenbosch) | ASR, TTS, spoken dialogue systems, speaker verification, S-LID, computer assisted language learning, machine translation, speech-to-speech translation | 6/6 | SAE, isiXhosa, Afrikaans
Meraka (CSIR Meraka Institute) | ASR, TTS, spoken dialogue systems, tone modelling, pronunciation modelling, speaker verification, language distances, channel normalisation, S-LID | 4/15 | All SA official languages
Wits (University of Witwatersrand) | Tone modelling, TTS | 2/1 | Sotho and Nguni languages
Limpopo (University of Limpopo) | ASR, TTS, language modelling | 1/2 | Sepedi, Xitsonga, Tshivenda, Setswana

Page 21

Main R&D sponsors

Department of Arts and Culture (DAC)

Applications that support multilingualism, especially related to government service delivery

Department of Science and Technology (DST)

Directed research in HLT aimed at addressing SA national priorities.

National Research Foundation (NRF)

Support for individual researchers

Industry:

Addressing industry-specific needs
  ASR/TTS (Telkom, Intelleca, IBM, Google and others)
  Spelling checkers (Microsoft)
  Speech processing tools (Grintek, Armscor)
  Speech-to-speech translation (Armscor)

International donor funding

Addressing developmental needs
  Open Society Initiative (OSI/OSISA), Danish Danida, UK Dept for International Development (DfID), Canadian International Development Research Centre (IDRC), and others

Host institutions (Universities, CSIR, etc)
