HLT in South Africa: Yesterday, Today and Tomorrow

Workshop: HLT Collaboration, 23-26 November 2008. HLT in South Africa: Yesterday, Today and Tomorrow. Justus Roux, Stellenbosch University Centre for Language and Speech Technology


Page 1: HLT in South Africa:  Yesterday, Today and Tomorrow

Workshop: HLT Collaboration, 23-26 November 2008

HLT in South Africa: Yesterday, Today and Tomorrow

Justus Roux

Stellenbosch University Centre for Language and Speech Technology

Page 2: HLT in South Africa:  Yesterday, Today and Tomorrow


AIM

• Brunfelsia Latifolia

• Focus on official government policy development on HLT in South Africa

• Role players in policy making

• Wish list regarding future planning and policies

Page 3: HLT in South Africa:  Yesterday, Today and Tomorrow

YESTERDAY

1999 / 2000:

• First initiative by Pan South African Language Board (PanSALB) and the Department of Arts, Culture, Science and Technology (DACST) towards setting up a “Human Language Technology Project”

• Joint Steering Committee: DACST, PanSALB, Universities: Stellenbosch, Pretoria, UNISA, Bloemfontein, ICOMTEK (CSIR), private translation company

• Task to develop a Strategic Plan for HLT development in South Africa

Page 4: HLT in South Africa:  Yesterday, Today and Tomorrow


YESTERDAY

Thinking at that time very much influenced by

– European Model for ‘Language Engineering’ and FP5 funding for HLT in Europe

– Recognition of particular realities in SA

• Academic & technical realities – limited – training and reskilling programmes – technology transfer

• Financial realities – co-operation to be sought from Government, Academia, Private sector

• Political realities – official language situation > development of National Lexicographic Units (NLUs)

Page 5: HLT in South Africa:  Yesterday, Today and Tomorrow


YESTERDAY

September 2000 – Report – The development of Human Language Technologies in South Africa – Strategic Planning.

Three steps

• Step 1: Create a SA model for HLT development and implementation

– Component 1: Applied research and capacity building (specialised courses at tertiary institutions, short informal courses)

– Component 2: Production of language resources – standards – “Regulatory forum”

– Component 3: Developing enabling technologies – support to innovative projects – funding from Innovation Fund of DACST

– Component 4: Conscious steps to develop HLT industry

Page 6: HLT in South Africa:  Yesterday, Today and Tomorrow


YESTERDAY

Step 2 Creation of a legal framework to ensure systematic acquisition of government resources

Amendment of the Legal Deposit Act (1997)

Step 3 Development of physical infrastructure to manage the implementation of the model

(NB Role of the NLUs as integral part)

• Virtual National Language and Speech Resource Centre

• Virtual National Electronic Language and Speech Data Network

• Regulatory Forum for Human Language Technologies

Page 7: HLT in South Africa:  Yesterday, Today and Tomorrow


Page 8: HLT in South Africa:  Yesterday, Today and Tomorrow

YESTERDAY

• Strategic plan was accepted (by DACST) and on 8 November 2001 a Ministerial Advisory Panel on HLT was inaugurated, tasked with assessing the viability of establishing a “virtual national electronic language and speech network”

• 8 members – three of whom are at this meeting

• Report delivered to the Minister in September 2002

Page 9: HLT in South Africa:  Yesterday, Today and Tomorrow

YESTERDAY

Recommendations

#1 A virtual HLT Centre to be established with a hub and spoke / nodes configuration (Accepted)

Page 10: HLT in South Africa:  Yesterday, Today and Tomorrow

Structure of National Resource Centre for HLT (Virtual Centre: Hub and connected nodes)

[Diagram: Managerial Hub with connected nodes]

Managerial Hub
• Coordination of Node Activities
• Data acquisition
• Data enhancement
• Data management & backup
• Training

Nodes
• Centre Y: SA Eng, Afrikaans
• Uni D: N Sotho, Sign Lang
• Uni B: Venda, Tsonga
• Uni A: Xhosa, Swati
• Centre X: Zulu, Ndebele
• Uni C: N Sotho, Tswana
• NLU (?), Lang (?)

LE = Language experts

Page 11: HLT in South Africa:  Yesterday, Today and Tomorrow

YESTERDAY

Recommendation #2 (Not accepted)

Establishment of an interim Implementation Secretariat for a period of one year

Instead, an HLT Steering Committee was appointed to oversee implementation within a period of five years

Recommendation #3 (Accepted – not implemented)

HLT development should take place in co-operation with the Presidential National Commission on Information Society and Development

Recommendation #4 (Not accepted – not necessary)

Amendment of Legal Deposit Act (1997)

Page 12: HLT in South Africa:  Yesterday, Today and Tomorrow


YESTERDAY

2002

Department of Science and Technology (DST) – National Research and Development Strategy – reference to ICT / HLT (Handout)

2003

• National Language Policy Framework (NLPF) approved by Cabinet (February) – specific reference to HLT in Section 3 (3.3)

• The development of an official HLT Strategy as one of the implementation mechanisms of the NLPF is suggested – Section 4 (4.8) (Refer “TODAY”)

• Establishment of an HLT Unit within National Language Service

• HLT Steering Committee appointed to oversee implementation of an HLT Resource Centre within a period of five years in collaboration with the HLT Unit of the National Language Service (NLS) (2003-2007)

Page 13: HLT in South Africa:  Yesterday, Today and Tomorrow


YESTERDAY

2004

Department of Trade and Industry Report

Benchmarking of Technology – Trends and Technology Developments

Emphasis on the important role of HLT within the economic sector in South Africa.

Page 14: HLT in South Africa:  Yesterday, Today and Tomorrow

Summary of technologies with potential high impact on ICT sector

(SA Dept Trade and Industry Report 2004: 10)

[Figure: technologies plotted by potential impact on industry (Limited to Pervasive) against South Africa's ability to respond (Low to High). Technologies shown: Mobile, Wireless, HLT, OSS, Telemedicine, Grid computing, Geomatics, RFID, Manufacturing (CAD, Robotics).]

Page 15: HLT in South Africa:  Yesterday, Today and Tomorrow

YESTERDAY

2005

• Establishment of Meraka Institute with HLT Research Group
– Initiative of Department of Science and Technology (DST)

• National Workshop on HLT (May 2005 – CSIR Conference Centre) – Roadmapping – Main issues and recommendations are in handout.

• During this period several workshops and conference tracks were held:
– PRASA annual conferences
– ALASA SIG on Language and Speech Technology Development
– ALASA International Conferences (special track)
– Roadmapping workshop with State IT Agency (SITA) – Steven Krauwer (BLARKS)

Page 16: HLT in South Africa:  Yesterday, Today and Tomorrow

TODAY

Progress of Steering Committee to set up Resource Centre in collaboration with NLS (HLT Unit) (1)

• Draft HLT National Strategy document developed and submitted (Detail Dr Jokweni)

• Great amount of work, but little progress

• The Steering Committee had a strained working relationship with previous Chief Director of NLS, hence two instances of disagreement:

– Unilateral call by DAC (NLS) (2005) for tenders as management agent for the envisaged National Resource Centre – failure – no funds available

– Unilateral call for development proposals by DAC (2006) – Steering Committee was not involved (amount distributed to successful applicants – outputs imminent)

Page 17: HLT in South Africa:  Yesterday, Today and Tomorrow


TODAY

Progress of Steering Committee to set up Resource Centre in collaboration with NLS (HLT Unit) (2)

• The Steering Committee has a good working relationship with the new Chief Director and staff of the NLS

– Submissions for funding submitted

Page 18: HLT in South Africa:  Yesterday, Today and Tomorrow

Research Role Players in South Africa: Universities

[Diagram: role players grouped around three areas]

Language Resources
• Text corpora
• Spoken corpora
• Dictionaries
• Lexicons
• Grammars
• Terminology banks

Enabling Technologies
• Speech recognition
• Morph analysis
• Speech generation
• POS tagging
• Syntactic analysis
• Semantic analysis

Standardise Formats & Protocols
• International Standards Organisation (ISO TC 37)
• SABS TC 37

Research
• Universities: Engineering, Computer Science; Dedicated R&D Centres; Meraka Institute; DST
• Universities: Languages, Linguistics; Dedicated R&D Centres; NLS; PanSALB; DAC

Page 19: HLT in South Africa:  Yesterday, Today and Tomorrow

TOMORROW

Wish list - Planning and policy

• Restructuring of the HLT Steering Committee: Real role players are needed to contribute to the debate (Request to the Minister through NLS / DAC)

• Establishment of the HLT Resource Centre as a priority.
– Render support services to HLT community
– Source of job creation

• Co-ordinated academic training at national level
– Standard curricula over and above specialised curricula
– Staff exchange programme (national & international)
– Recognition of modules across accredited institutions

• Applied research conducted in accordance with national priorities set by, for example, a body of experts from user sectors. (Roadmaps, annually updated.)

• Blue sky research within HLT remains imperative, also from a funding perspective.

Page 20: HLT in South Africa:  Yesterday, Today and Tomorrow

TOMORROW

• National funding procedures for HLT research and training should be transparent and equitable
– Task for a Select Committee of National and International Experts (?)

• Address the particular interest in HLT research and training within Africa: imminent projects – Algeria, Morocco, Kenya, Nigeria and Gabon.
– Possibility of international funding, e.g. Association of African Universities (AAU) staff & student exchange programme

• Hopefully more insights to be gained from this workshop, not only with respect to international co-operation, but also regarding the positioning of HLT activities in South Africa.

Page 21: HLT in South Africa:  Yesterday, Today and Tomorrow


THANK YOU

JC Roux

[email protected]

Page 22: HLT in South Africa:  Yesterday, Today and Tomorrow


Page 23: HLT in South Africa:  Yesterday, Today and Tomorrow


FUNCTIONS OF HLT CENTRE

Page 24: HLT in South Africa:  Yesterday, Today and Tomorrow


Importance of a National Resource Centre for HLT

• Acquiring, enhancing and managing text and speech data for HLT applications:
– Extremely costly
– Extremely time consuming
– Requires skilled language experts

• Therefore: Need to develop reusable resources

• General practice world wide:
– ELSNET (Europe), LDC (USA), (Japan)

Page 25: HLT in South Africa:  Yesterday, Today and Tomorrow


Functions of a National Resource Centre for HLT

• Constitutes one of the integral components for effective HLT product development in all official languages of SA.

• Will interact with all other role players in the field to expedite service delivery in HLT applications.

• It will serve as a depository of raw and enhanced reusable text and speech resources of all SA languages for use by different communities / institutions for language related purposes, e.g. NLUs, terminology development sections, translation services, education etc.

• It will serve as a language archive to document language and speech phenomena of the official languages of SA over a period of time as part of cultural heritage. (SA lost its ‘Sound Archive’)

Page 26: HLT in South Africa:  Yesterday, Today and Tomorrow


Tasks of a National Resource Centre for HLT

Data acquisition

• Text data
– Different types / genres
  • Official / Formal (announcements, legislation)
  • Informal (magazines etc)
  • Literary (novels, drama etc)
– Sources:
  • Printed media: News agencies, Publishers
  • Government services (all levels, including Hansard)

Page 27: HLT in South Africa:  Yesterday, Today and Tomorrow


Tasks of a National Resource Centre for HLT

Data acquisition

• Speech data
– Different types
  • Read speech
  • Spontaneous speech
– Different domains & conditions
  • Sport, news, interviews / noisy environments
– Different transmission modes
  • Telephone speech: mobile, fixed lines
  • Recorded speech (microphone)
– Different subjects
  • Male, Female, young, old, impaired
– Sources:
  • SABC archives
  • Own initiatives (!)

Page 28: HLT in South Africa:  Yesterday, Today and Tomorrow


Tasks of a National Resource Centre for HLT

Data enhancement

Text

• Development and application of
– Tokenisers (word identification)
– Parts of speech taggers (nouns, verbs, adverbs etc)
– Morphological analysers (composition of words)
– Syntactic parsers (composition of phrases / sentences)
(With tools to be developed in collaboration with experts from Technology Component)

• Creation of machine readable lexicons (XML format)
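As a rough illustration of the first tool in the list above (tokenisers, i.e. word identification), here is a minimal Python sketch. It is not the tooling envisaged for the Centre. The function name `tokenise` and the regular expression are assumptions made for this example only, and a real tokeniser for the official languages would need language-specific rules.

```python
import re

# Assumption for illustration: a word is a run of letters/digits that may
# contain internal hyphens or apostrophes; any other non-space character
# is emitted as its own punctuation token.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenise(text):
    """Split raw text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


# Example sentence taken from the speech sample later in this presentation.
print(tokenise("Ngithi ukujabula manje."))
```

The output of such a tokeniser would typically feed the later steps in the list (part-of-speech tagging, morphological analysis, syntactic parsing).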

Page 29: HLT in South Africa:  Yesterday, Today and Tomorrow


A partial XML entry for the noun -ntu, class 1-2, is as follows

<Entry>
  <Head>
    <Stem>ntu</Stem>
  </Head>
  <Body>
    <Tone>3.2.9</Tone>
    <MSI>
      <POS>
        <Noun>
          <Noun-features>
            <Class-pf-s>umu</Class-pf-s>
            <Class-pf-p>aba</Class-pf-p>
            <Class-no>1-2</Class-no>
            <Label>n</Label>
            <Dim>
              <Form>umntwana</Form>
              <Sense>baby, small child</Sense>
            </Dim>
            <Loc>
              <Form>kumuntu</Form>

Bosch SE, Pretorius L & Jones, J. Towards machine-readable lexicons for South African Bantu Languages. Nordic Journal of African Studies 16 (2): 131-145 (2007)
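To show why a machine-readable entry of this kind is useful, the following Python sketch reads a few fields from it with the standard library. The element names and values come from the partial entry on the slide. The closing tags after `<Loc>` are an assumption added only so that the fragment parses, since the original entry continues beyond what is shown. The function name `read_entry` and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Element names and values follow the partial entry on the slide.
# ASSUMPTION: the closing tags from </Loc> onwards are added here only to
# make the fragment well-formed; the real entry contains more content.
ENTRY_XML = """
<Entry>
  <Head><Stem>ntu</Stem></Head>
  <Body>
    <Tone>3.2.9</Tone>
    <MSI><POS><Noun><Noun-features>
      <Class-pf-s>umu</Class-pf-s>
      <Class-pf-p>aba</Class-pf-p>
      <Class-no>1-2</Class-no>
      <Label>n</Label>
      <Dim>
        <Form>umntwana</Form>
        <Sense>baby, small child</Sense>
      </Dim>
      <Loc><Form>kumuntu</Form></Loc>
    </Noun-features></Noun></POS></MSI>
  </Body>
</Entry>
"""


def read_entry(xml_text):
    """Return selected fields of a noun entry as a plain dictionary."""
    root = ET.fromstring(xml_text)
    features = root.find(".//Noun-features")
    return {
        "stem": root.findtext("./Head/Stem"),
        "tone": root.findtext("./Body/Tone"),
        "singular_prefix": features.findtext("Class-pf-s"),
        "plural_prefix": features.findtext("Class-pf-p"),
        "class_no": features.findtext("Class-no"),
        "diminutive": features.findtext("./Dim/Form"),
        "locative": features.findtext("./Loc/Form"),
    }


print(read_entry(ENTRY_XML))
```

Once entries are available in this form, tools such as morphological analysers can look up class prefixes and derived forms without re-keying dictionary content.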

Page 30: HLT in South Africa:  Yesterday, Today and Tomorrow


Tasks of a National Resource Centre for HLT

Data enhancement (2)

Speech

• Orthographic transcriptions of speech (S to T)
• Phonetic transcription and annotation of speech
– Sound like utterances
  • Fluent speech
  • Repetitions, false starts etc
– Non sound like utterances
  • Background noise
  • Lip smacks etc
• Supportive software programmes (e.g. Praat)
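The annotation categories above can be pictured as time-aligned segments on a recording. The Python sketch below is one possible representation, not the format used by Praat or by the Centre. The class name `Segment`, its field names, the label strings and the timing values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One annotated stretch of a recording (times in seconds)."""
    start: float
    end: float
    kind: str       # "sound_like" or "non_sound_like", after the slide's two groups
    label: str      # e.g. "fluent", "false_start", "background_noise", "lip_smack"
    text: str = ""  # orthographic transcription, if any


def sound_like_duration(segments):
    """Total duration of segments annotated as sound like utterances."""
    return sum(s.end - s.start for s in segments if s.kind == "sound_like")


# Hypothetical annotation of a short recording; the timings are invented.
example = [
    Segment(0.0, 1.5, "sound_like", "fluent", "Ngithi ukujabula manje"),
    Segment(1.5, 2.0, "non_sound_like", "lip_smack"),
    Segment(2.0, 3.0, "sound_like", "false_start", "uku"),
]

print(sound_like_duration(example))
```

Keeping the non sound like events in the annotation, rather than discarding them, lets later tools either filter them out or model them explicitly.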

Page 31: HLT in South Africa:  Yesterday, Today and Tomorrow


[Figure: annotated speech sample, "Ukuja(bula)". Speaker One – Ngithi ukujabula manje. Segment labels shown: u k u]

Page 32: HLT in South Africa:  Yesterday, Today and Tomorrow


Tasks of a National Resource Centre for HLT

Data management & Software development

• Determine data needs in collaboration with HLT Unit in NLS for government applications

• Acquire the data with the assistance of language specialists at different nodes of the Centre

• Solicit development of appropriate software

• Manage, back-up, distribute data to users

• Commercialise resources: private sector developers

Page 33: HLT in South Africa:  Yesterday, Today and Tomorrow


Tasks of a National Resource Centre for HLT

Training and Consultation

• Identify training needs and potential trainers

• Develop non-formal training curricula for the reskilling of interested language practitioners

• Organise HLT training workshops at different venues in the country encouraging language bodies to participate

• Create awareness of HLT potential in collaboration with the HLT Unit of NLS

Page 34: HLT in South Africa:  Yesterday, Today and Tomorrow


Structure of National Resource Centre for HLT (Virtual Centre: Hub and connected nodes)

[Diagram: Managerial Hub with connected nodes]

Managerial Hub
• Coordination of Node Activities
• Data acquisition
• Data enhancement
• Data management & backup
• Training

Nodes
• Centre Y: SA Eng, Afrikaans
• Uni D: N Sotho, Sign Lang
• Uni B: Venda, Tsonga
• Uni A: Xhosa, Swati
• Centre X: Zulu, Ndebele
• Uni C: N Sotho, Tswana
• NLU (?), Lang (?)

LE = Language experts

Page 35: HLT in South Africa:  Yesterday, Today and Tomorrow


Relationships

Seatla se sengwe se tlhapiswa ke se sengwe

(The one hand washes the other)

• No infringements on current lexicographic or terminological activities - Different foci

• Complementary activities:
– Raw or enhanced data to be supplied to NLUs / PanSALB / NLS
– NLUs could contribute to National depository

• Win-win situation for the sake of technological development of our languages

Page 36: HLT in South Africa:  Yesterday, Today and Tomorrow


Concluding remarks

• Attempt to speed up activities in the development of HLT applications to provide services in a language of choice.

• To provide new resources and tools for lexicographic and terminological development.

• To provide a new range of job opportunities for graduates in African languages

• Keep South Africa abreast with new developments in the Information Society and avoid the marginalisation of the indigenous languages.