Page 1: Software - Terminology Coordination Unit · • Mysql (database management) • Tomcat (web server) • Lucene (indexing) • Java applications • Iterative process • Documentation

Västra vägen 7 B 169 61 Solna

Telefon: 08-446 66 00 Telefax: 08-446 66 29

Webbplats: www.tnc.se E-post: [email protected]

© Terminologicentrum TNC

Software = hard for national termbanks?

Henrik Nilsson & Sandra Cuadrado i Camps, Terminologicentrum TNC & Termcat

IITF Colloquium Vienna, Austria

9 July 2015

Outline

• ”National termbank”

  • The concept and some examples

• Rikstermbanken

• Cercaterm

• State-of-the-art (TERMINTRA)

• Aspects – and related technical challenges

  • getting (and presenting) content

  • harmonizing content

  • users

  • digital age …

  • reuse

  • getting funding

”National” could imply

• a government responsibility and financing

• a link to a national terminology centre

• a basis in the ”national” conceptual world

• a certain language choice (monolingual, only ”national languages”)

• a certain quality

• a certain accessibility (free of charge, adapted)

• a certain scope (e.g. cover all terminology in the nation, nothing ”foreign” etc.)

• a certain status (affecting usage)

• a marketing gimmick

”National” should imply

• a certain coverage (as to contents)

• a certain status (acknowledged by professionals and a language or terminology institution)

• accessibility (open and free of ownership claims)

[Termintra, Oslo, 2012]

[Guidelines for Terminology Policies, Unesco]

”the national termbank, which attempts to serve a general purpose role in coordinating the creation and use of terminologies within a country, and hence is theoretically multifunctional, multilingual and exploited by widely differing kinds of users”

[McNaught, 1987]

national terminology database: database containing mono- or multilingual terminological data […] established at country level

Why a national term bank?

”I have been a manager […] within the U.S. Federal Government for over 30 years. In that time, I have observed that the dominant cause of ineffectiveness, inefficiency, and unresponsiveness in operations is the inconsistent terms used across the various boundaries of government, their contractors, industry, non-profits, and citizens. There are terminology boundaries between locations, organizations, offices within the organizations, work functions, processes, resources (e.g. people, intelligence, funds, skills, materiel, facilities, services), and capability requirements (e.g. missions, information systems).”

[Roebuck, 2009]

”Next, the vocabulary of these functions would be automatically collected, organized, and placed into a National Terminology database to enable integration, interoperability, unification, and federation of operations”

[Roebuck, 2009]

→ technical challenges!?

European ”national” termbanks

Slovenia: Evroterm

Wales: National Terminology Portal

Stofnun Árna Magnússonar í íslenskum fræðum, Iceland: Orðabanki

NL-Term, Netherlands: Nedterm

Foras na Gaeilge, Ireland: Téarma.ie

Eter, Estonia: ESTERM

Latvia: EuroTermBank

UZEI, Basque Country: Euskalterm

TSK, Finland: Vetenskapstermbanken, TEPA, Valter

TNC, Sweden: Rikstermbanken

Société française de terminologie, France: FranceTerme

LKI, Lithuania: Terminų bankas

Türk Dil Kurumu, Turkey: Bilim ve Sanat Terimleri

Croatia: National Terminology Portal (incl. Struna)

Norway: Termportalen, Snorre

Denmark: (DTB)

Confédération suisse, Switzerland: Termdat

Termcat, Catalonia: Cercaterm

Struna (CR)

FranceTerme (FR)

Terminų Bankas (LT)

BFT (FI)

National Terminology Portal (Wales)

Risten (Sápmi)

Orðabanki (ISL)

Téarma.ie (IRL)

Slovenská terminologická databáza (SK)

AkadTerm (LV)

Euskalterm (Basque Country)

Türk Dil Kurumu (TR)

Terminoloģijas portāls (LV)

Nedterm (NL)

Other termbanks

• EuroTermBank

• National Termbank (RSA)

• IATE

• ISO Online Browsing Platform

• UNTERM

• EAA Glossary

• Electropedia

• METEOTERM

• ILOTERM

• FAOTERM

EuroTermBank

IATE

www.rikstermbanken.se

Background

· IT-propositionen (Prop. 2004/05:175), 2005

· ”Bästa språket” (Prop. 2005/06:2), 2005

· Grant from the Ministry of Industry, Employment and Communications: 2005: 1 500 000 SEK; 2007: 750 000 SEK; 2009: 0; 2011: discussion about a semantic resource!

· IATE, EU; evaluation 2004

· Terminų Bankas, Lithuania & EuroTermBank

· TISS, 2002–2004

· Nordterm-Net, 1999; Brussels Declaration, 2002 et al.

”The fast development of society requires constant work on creating and making accessible agreed-upon terminologies, within more and more subject fields. Easy access to terms via the Internet in a national termbank [rikstermbank] promotes such a development.”

”the establishment of a national central term bank, a ’rikstermbank’, is a prerequisite for easy access to, and quality assurance of, Swedish terms in all domains.”

Rikstermbanken as a tool

for search and retrieval

for terminology work, research

for storage

”Rikstermbanken should mainly reflect concepts of the Swedish society; however, this does not mean that the termbank would comprise only Swedish terms. In order to make it function in the way it is planned, the termbank should also contain term equivalents in foreign languages, and not only in English but also in various immigrant languages and in the official minority languages of Sweden.”

[IT-propositionen, Prop. 2004/05:175]

Current contents

• no limitations as to domains!

• Swedish conceptual world = starting point

• complete glossaries, but also parts of documents and excerpts

• some digitized material

• quality control by terminologists (and at times the supplier)

• presentation phase → consolidation phase

overview → harmonisation

Rikstermbanken in numbers

• 106 000 term records

• 300 000 terms (incl. look-up terms, synonyms, equivalents)

• 28 languages

• 71 % definitions (in Swedish)

• ca 1 500 unique sources

• ca 500 suppliers

Contents

• priorities

• selection, types

• preparation (enhancing, record making & breaking)

• harmonization (duplicates …)

• updating

• addition of new material

• quality – quantity?

Preparation of the material

• termbank adaptation (reformatting according to NTRF-RTB, exclusion of remaining ”book-related” aspects)

• selection

• changes for consistency

• linguistic and content-related adjustments (incl. removal of target group adaptations)

• discussion with suppliers

• illustrations

• semi-automatic three-step import control tool

Technology

• experience from Termdok development and Nordterm-Net (MLIS-project)

• comparisons to existing TMS-software and standards (ISO, LISA et al)

• IATE evaluation

• co-operation with IATE, EuroTermBank

• → proper software

• open source: Lucene, MySQL, Tomcat, Java

Technical development – Rikstermbanken

• Oracle replaced by open source:

  • MySQL (database management)

  • Tomcat (web server)

  • Lucene (indexing)

  • Java applications

• Iterative process

• Documentation via internal wiki
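
Lucene's job in this stack is indexing. As a rough illustration of what an index gives a term search, here is a toy inverted index in Python; it is not Lucene and does not use Lucene's Java API. Each case-folded word in a term points to the ids of the term records containing it, and a query returns the records that contain every query word. The record ids and the term "Klinisk studie" are illustrative assumptions; real Lucene analyzers add stemming, prefix queries and ranking that this sketch leaves out.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Case-fold and split on non-word characters; \w is Unicode-aware,
    # so å, ä, ö and ø stay inside words.
    return re.findall(r"\w+", text.casefold())


class TermIndex:
    """Toy inverted index: word -> ids of the term records containing it."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[int]] = defaultdict(set)
        self._terms: dict[int, str] = {}

    def add(self, record_id: int, term: str) -> None:
        self._terms[record_id] = term
        for word in tokenize(term):
            self._postings[word].add(record_id)

    def search(self, query: str) -> list[str]:
        """Return the terms whose records contain every query word (AND)."""
        words = tokenize(query)
        if not words:
            return []
        # .get() avoids creating empty postings for unknown words
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return [self._terms[i] for i in sorted(hits)]


index = TermIndex()
for rid, term in [(1, "offset"), (2, "incidens"), (3, "klinisk prövning"), (4, "Klinisk studie")]:
    index.add(rid, term)

print(index.search("KLINISK"))           # ['klinisk prövning', 'Klinisk studie']
print(index.search("klinisk prövning"))  # ['klinisk prövning']
print(index.search("incidenskvot"))      # [] (whole-word matching only)
```

The last query shows the main limitation of the sketch: without compound splitting or prefix search, "incidens" is not found inside "incidenskvot".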

Cercaterm (CAT)

Cercaterm

• online platform designed, supported and updated by Termcat (since 2000)

• updated with the results of Termcat’s development of terminological products, terminology standardisation and terminology consulting service

• Termcat’s terminology production, standardized terminology, resolved queries + other material

• 230 000 records (more than 925 000 designations)

• new functions in 2010 (based on a user survey): search, sources

• 3 million visits in 2014

• also other information

TERMINTRA

• Forum for discussion on national termbanks

• The concept of ”national termbank”

• Aspects: General, Contents, Users, Funding, Organization, Technology

• First seminar in Oslo 2012, second in Zagreb 2013

• Participants from Catalonia, Croatia, Denmark, Finland, France, Ireland, Iceland, Latvia, Norway, Sápmi, Sweden, Switzerland, Wales

TERMINTRA: Technology

• What technical solutions are in use today, and are some more appropriate than others?

• Should a national term bank be based on a distributed solution or not? Or, rather, constitute a kind of portal? Pros and cons?

• What standards should be the basis for national terminology databases (storage and exchange formats, etc.)?

• Are the current terminology management systems suitable for the demands which could be made on a national term bank? To what extent are today’s national terminology banks based on proprietary software (use of open source or not)?

”The current situation is that most of the bigger existing term banks use purpose-built software, although there are cases where general purpose information retrieval software is used. Although computerized term banks have been in existence for a number of years, there seems to be little agreement as to how they should operate, and if the present situation persists, their use will continue to be low. If term banks are to become widely used certain changes in practice will be necessary; changes which in turn have implications for the software that must be used for term bank operation.”

[Negus, 1979]

”the longer established term banks tend to use purpose built software, partly because nothing generally available at the time was found to be suitable, and partly because each is aimed at providing a range of services not found elsewhere, using terminological records and searching methods which are more or less unique. […] all systems should attempt to maintain the greatest flexibility in their approach. However, this is difficult to achieve where specially created software is concerned; there is an inevitable tendency to provide what is definitely required at the time of program specification, perhaps giving little thought to what services might be required, or facilities demanded, at some indeterminate time in the future.”

[Negus, 1979]

”As to the technological aspects of national termbanks, it became clear during the presentations and discussions that most of the represented termbanks had developed their own technical solution (which, however, in many cases relied on international standards). The exception was the Finnish termbank using Wiki-technology and open source software.”

[Proceedings, TERMINTRA I, 2013]

Perspective

Aspect Contents Technology

Manager

Users

Suppliers

Financing bodies

Organisation

X X X

X X

X X (X)

X (X)

(X)

(X)

Challenge: getting content

• term extraction as part of software (or separate)?

• automatic ”record breaking” into data categories (definition indicators etc.)? And ”record making”?

• automatically ”fill in the gaps”? (automatic classification)
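
Whether term extraction sits inside the termbank software or in a separate tool, the simplest statistical approach looks much the same: count word sequences and keep the frequent ones as candidates for a terminologist to review. A minimal Python sketch of that idea follows. The stop list, the two-word limit and the frequency threshold are assumptions chosen for the example, and real extractors add linguistic filters (part-of-speech patterns, lemmatization) that are omitted here.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one is language-specific.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "by", "for", "on"}


def candidate_terms(text: str, max_len: int = 2, min_count: int = 2) -> list[tuple[str, int]]:
    """Return n-grams of 1..max_len words that occur at least min_count times.

    N-grams that start or end with a stop word are skipped, a common first
    filter in frequency-based term extraction. Results are sorted by
    descending frequency, then alphabetically.
    """
    words = re.findall(r"\w+", text.lower())
    counts: Counter[str] = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i : i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return sorted(
        ((g, c) for g, c in counts.items() if c >= min_count),
        key=lambda item: (-item[1], item[0]),
    )


text = (
    "The offset plate transfers the image to a rubber blanket. "
    "The rubber blanket transfers the image to the paper. "
    "Offset printing uses an offset plate."
)
for gram, count in candidate_terms(text):
    print(count, gram)
```

Multi-word candidates such as "offset plate" and "rubber blanket" surface alongside single words; deciding which of them are terms remains a manual step.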

Various sources

[Heid (1991) in Martin & van der Vliet, 2003]

Import process (of glossaries)

1. inventory (weekly) & preliminary assessment

2. formal inquiry

3. collection

4. formatting

5. review

6. (feedback)

7. first import

8. adjustments

9. second import

10. updating
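
The ten steps can be written down as an ordered workflow. The Python sketch below is a hypothetical helper, not part of Rikstermbanken's documented tooling. It checks a glossary's import log against that order. It assumes that the parenthesized step 6 (feedback) is optional and that updating (step 10) may recur; both readings are inferred from the slide rather than stated on it.

```python
from __future__ import annotations

# The ten import steps from the slide, in order.
STEPS = (
    "inventory & preliminary assessment",
    "formal inquiry",
    "collection",
    "formatting",
    "review",
    "feedback",
    "first import",
    "adjustments",
    "second import",
    "updating",
)
OPTIONAL = {"feedback"}    # assumption: "(feedback)" on the slide means optional
REPEATABLE = {"updating"}  # assumption: updating recurs after the second import


def check_log(done: list[str]) -> list[str]:
    """Return the problems found in an import log; an empty list means OK."""
    problems: list[str] = []
    position = -1  # index in STEPS of the last accepted step
    for step in done:
        if step not in STEPS:
            problems.append(f"unknown step: {step}")
            continue
        idx = STEPS.index(step)
        if idx == position and step in REPEATABLE:
            continue  # another round of updating is fine
        if idx <= position:
            problems.append(f"out of order or repeated: {step}")
            continue
        # Mandatory steps between the last accepted step and this one were skipped.
        for skipped in STEPS[position + 1 : idx]:
            if skipped not in OPTIONAL:
                problems.append(f"skipped: {skipped}")
        position = idx
    return problems


complete = [s for s in STEPS if s not in OPTIONAL] + ["updating"]
print(check_log(complete))  # []
print(check_log(["inventory & preliminary assessment", "collection"]))
# ['skipped: formal inquiry']
```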

Term bank contents: challenges

• Selection: all or nothing – or a little?

• Interpretation of contents, ”decontextualisation”

• Term choice (variants, synonyms etc.)

• Definition vs. explanation

• Updating vs. archiving → consistency changes?

• “Decustomization” (= depersonalisation)

• “Record breaking” & “record making”

• Document types: legal documents …

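”Record breaking” (1): Before / After

svTE offset

svDF litografisk plantryckmetod där tryckplåten är preparerad så att färggivande ytor gjorts färgmottagliga och vattenbortstötande och icke färggivande partier gjorts vattenmottagliga och färgbortstötande

svRETE litografi, djuptryck, direktlito

svAN Överföringen av tryckbilden från offsetplåten sker indirekt via en gummiduk till papperet.

(English: term “offset”; definition “lithographic planographic printing method in which the printing plate is prepared so that image areas are made ink-receptive and water-repellent and non-image areas are made water-receptive and ink-repellent”; related terms “lithography, gravure printing, direct litho”; note “The printed image is transferred from the offset plate to the paper indirectly, via a rubber blanket.”)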

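”Record breaking” (2): Before / After

svTE incidens HONR 1

svFK Antalet fall av en viss sjukdom som uppträder i en befolkning under viss tid; anges t.ex. som antalet diagnoser per 1 000 invånare per år.

svTE incidens HONR 2

svUPTE incidenskvot

svFK Antalet av en viss studerad händelse i en klinisk prövning eller kohortundersökning, dividerat med antalet deltagare i gruppen. Graden av skillnad mellan två gruppers incidenstal kan uttryckas genom att det ena divideras med det andra till en incidenskvot.

svRETE händelse

(English: “incidens” (incidence), homonym 1: the number of cases of a particular disease occurring in a population during a given period, expressed e.g. as the number of diagnoses per 1 000 inhabitants per year. Homonym 2, also “incidenskvot” (incidence ratio): the number of a particular studied event in a clinical trial or cohort study, divided by the number of participants in the group; the difference between two groups’ incidence rates can be expressed by dividing one by the other to give an incidence ratio. Related term: “händelse” (event).)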
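
Record breaking of this kind can be partly automated once a glossary has been converted into a flat, tagged text. The Python sketch below assumes exactly that input: the field tags seen on these slides (svTE, svDF, svFK, svAN, svRETE, svUPTE, HONR), with every record starting at an svTE tag. It splits the stream into one record per term, so the two homonyms of incidens (HONR 1 and 2) end up as separate records. The tag list and the splitting rule are assumptions for illustration, not a description of the NTRF-RTB format or of TNC's import tool.

```python
from __future__ import annotations

import re

# Hypothetical tag list, taken from the slide examples only (sv = Swedish).
TAGS = ("svRETE", "svUPTE", "svTE", "svDF", "svFK", "svAN", "HONR")
TAG_RE = re.compile(r"\b(" + "|".join(TAGS) + r")\b")


def break_records(flat: str) -> list[dict[str, list[str]]]:
    """Split a flat tagged text into one dict per record, starting at each svTE."""
    # With a capturing group, re.split returns [text_before, tag, value, tag, value, ...]
    parts = TAG_RE.split(flat)
    records: list[dict[str, list[str]]] = []
    for tag, value in zip(parts[1::2], parts[2::2]):
        if tag == "svTE":
            records.append({})  # a new term starts a new record
        if not records:
            raise ValueError(f"{tag} appears before the first svTE")
        records[-1].setdefault(tag, []).append(value.strip())
    return records


flat = (
    "svTE incidens HONR 1 svFK Antalet fall av en viss sjukdom som uppträder "
    "i en befolkning under viss tid. "
    "svTE incidens HONR 2 svUPTE incidenskvot svFK Antalet av en viss studerad "
    "händelse i en klinisk prövning, dividerat med antalet deltagare i gruppen. "
    "svRETE händelse"
)
for record in break_records(flat):
    print(record)
```

Any field after the last svTE is attached to the last record, so here svRETE händelse lands on homonym 2. Whether that matches the intended assignment cannot be read from the slide and would need a terminologist's check.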

Challenge: getting content

• term extraction as part of software (or separate)?

• automatic ”record breaking” into data categories (definition indicators etc.)? And ”record making”?

• automatically ”fill in the gaps”? (automatic classification)

• mirroring (QA?) or double storage (updating)?
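
One crude form of automatic record breaking into data categories is to treat the first sentence of a glossary explanation as the definition and move the rest into a note. The Python sketch below does only that. The sentence-boundary rule (a full stop, question mark or exclamation mark followed by whitespace and a capital letter, so that abbreviations like t.ex. do not trigger a split) and the svDF/svAN field names are assumptions for illustration. The input is a hypothetical unbroken version of the offset record, with definition and note run together.

```python
from __future__ import annotations

import re

# Assumed sentence boundary: . ! or ? followed by whitespace and a capital
# letter (Swedish Å, Ä, Ö included), so "t.ex. som" is not split.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-ZÅÄÖ])")


def split_definition(text: str) -> dict[str, str]:
    """First sentence -> definition (svDF); any remaining text -> note (svAN)."""
    first, *rest = SENTENCE_END.split(text.strip(), maxsplit=1)
    # Assumption: definitions are stored without a final full stop.
    fields = {"svDF": first.rstrip(".")}
    if rest:
        fields["svAN"] = rest[0]
    return fields


unbroken = (
    "litografisk plantryckmetod där tryckplåten är preparerad så att färggivande "
    "ytor gjorts färgmottagliga och vattenbortstötande och icke färggivande partier "
    "gjorts vattenmottagliga och färgbortstötande. Överföringen av tryckbilden från "
    "offsetplåten sker indirekt via en gummiduk till papperet."
)
print(split_definition(unbroken))
```

A sentence split cannot tell a second defining sentence from a genuine note, so output like this is a suggestion for review rather than a finished record.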

Distributed or not?

+ ”All terms in one place”

+ consistency

+ control

+ not many other termbanks around …

+ pragmatic: simpler at the time …, traditional

– double storage

– updating needs

– administration of contributors

– higher technology demands on contributors
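
The alternative to ”all terms in one place” is a distributed set-up, or a portal that asks each contributing termbank at query time. A minimal Python sketch of such a federated lookup follows. The sources are hypothetical stand-in functions (a real portal would call each termbank's own search service), and a source that fails is simply skipped.

```python
from __future__ import annotations

from typing import Callable

# A source answers a query with a list of hits (here just display strings).
Source = Callable[[str], list[str]]


def federated_lookup(query: str, sources: dict[str, Source]) -> dict[str, list[str]]:
    """Query every source at request time and group non-empty answers by source."""
    results: dict[str, list[str]] = {}
    for name, search in sources.items():
        try:
            hits = search(query)
        except Exception:
            # An unreachable termbank should not take the whole portal down.
            hits = []
        if hits:
            results[name] = hits
    return results


# Hypothetical stand-ins for three contributing termbanks.
def printing_glossary(query: str) -> list[str]:
    return ["offset (printing method)"] if query == "offset" else []


def offline_bank(query: str) -> list[str]:
    raise ConnectionError("termbank not reachable")


def general_bank(query: str) -> list[str]:
    return [f"{query} (general language)"]


sources = {"printing": printing_glossary, "offline": offline_bank, "general": general_bank}
print(federated_lookup("offset", sources))
# {'printing': ['offset (printing method)'], 'general': ['offset (general language)']}
```

The design trades double storage and separate updating for a dependence on every source being available and answering in a compatible form; harmonizing the answers is still left to the user.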

Challenge: presenting content

• automatic compounding of term records

• visualization (ontologies etc.)
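
Visualizing a concept system can start from nothing more than the broader-concept link stored on each record. The Python sketch below renders such links as an indented text tree. The Danish concepts form a simplified, hypothetical excerpt in the spirit of the baked-goods (bagværk) diagram in this presentation, not a transcription of it. The sketch assumes the links form a tree: no cycles, and one broader concept per record.

```python
from __future__ import annotations

from collections import defaultdict


def render_tree(broader: dict[str, str], root: str) -> str:
    """Render concepts as an indented tree, given concept -> broader concept."""
    narrower: defaultdict[str, list[str]] = defaultdict(list)
    for concept, parent in broader.items():
        narrower[parent].append(concept)

    lines: list[str] = []

    def walk(concept: str, depth: int) -> None:
        lines.append("  " * depth + concept)
        for child in sorted(narrower[concept]):  # alphabetical within a level
            walk(child, depth + 1)

    walk(root, 0)
    return "\n".join(lines)


# Hypothetical, simplified excerpt (concept -> broader concept).
broader = {
    "brød": "bagværk",
    "kage": "bagværk",
    "småkage": "bagværk",
    "lagkage": "kage",
    "gulerodskage": "kage",
}
print(render_tree(broader, "bagværk"))
```

Graphical ontology viewers work from the same links. A plain text tree is simply the cheapest way to check that the links are consistent before drawing them.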

[Figure: concept diagram of Danish baked goods (bagværk), including småkage, brød, kage, lagkage and tærte, with kage for 1 person vs. kage for > 1 person; question marks (konfekt?, lagkage?, kaffebrød?) mark unresolved placements]

Challenge: harmonizing content

• indicate various statuses (”primaries”)

• automatic handling of doublettes

• automatic calculation of ”definition similarity”?

• version management

• automatic updating of content

• automatic notification of updates (to users, of existing links etc.)
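Version management and automatic notification can be combined in one small structure. In this sketch every update is stored as a new version (older ones stay retrievable), and every registered listener, for example a user or a page that links to the record, is told about the new version number. The class, the callback signature and the record ID are hypothetical and are not taken from Rikstermbanken or Cercaterm.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical callback type: receives (record_id, new_version_number).
Listener = Callable[[str, int], None]


@dataclass
class VersionedRecord:
    record_id: str
    versions: list[dict[str, str]] = field(default_factory=list)
    listeners: list[Listener] = field(default_factory=list)

    @property
    def current_version(self) -> int:
        return len(self.versions)

    def update(self, content: dict[str, str]) -> int:
        """Store content as a new version (older ones are kept) and notify listeners."""
        self.versions.append(dict(content))
        for notify in self.listeners:
            notify(self.record_id, self.current_version)
        return self.current_version

    def get(self, version: int | None = None) -> dict[str, str]:
        """Return a 1-based version, or the latest one if no version is given."""
        v = self.current_version if version is None else version
        if not 1 <= v <= self.current_version:
            raise LookupError(f"{self.record_id} has no version {v}")
        return self.versions[v - 1]


# Usage: a subscriber is told about each update of the record.
log: list[str] = []
rec = VersionedRecord("record-001")
rec.listeners.append(lambda rid, v: log.append(f"{rid} updated to version {v}"))
rec.update({"term": "termbank", "definition": "first draft"})
rec.update({"term": "termbank", "definition": "revised definition"})
print(log)
```

Keeping old versions lets existing links point to a fixed version while new readers see the latest one.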

From presentation to consolidation

”need one accepted definition of a concept”

[Figure: amount of content plotted against time]

User survey

16. If your search for a particular term generated several hits, what do you think about that?

• Good: 84.3 % (172)
• Bad: 2.0 % (4)
• No opinion: 13.7 % (28)

27 skipped the question; 17 comments.
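Recomputing the shares from the reported counts shows which base the percentages use. They match the 204 respondents who answered, not the 231 (204 + 27) who were shown the question. That base is inferred from the numbers rather than stated on the slide.

```python
# Answer counts for question 16, as reported on the slide.
counts = {"Good": 172, "Bad": 4, "No opinion": 28}

answered = sum(counts.values())  # 204 respondents answered the question
shares = {label: round(100 * n / answered, 1) for label, n in counts.items()}

print(answered, shares)
```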

Resource harmonisation

• on a national level: Rikstermbanken
  • background & perspectives & user survey
• content revision
  • harmonisation within a source
    • definition – explanation
  • harmonisation between sources (i.e. within the termbank as a whole)
    • doublettes
• problems and solutions
  • content presentation
  • content updating

Harmonisation: problems

• Within and between sources
• Definitions vs explanations – which to choose?
• Certainty of domain?
• Breaking of the conceptual whole; break in macro- and microstructures
• Role of publication date
• Homonyms, synonyms
• Degree (%) of similarity between definitions?
• Handling of diverging interests (be shown – disappear etc.)
• Different sources for different data categories → indication of doublettes or a problem?

Harmonisation: within a source

• often semasiological presentation → redundancy (e.g. synonyms in separate records)

• choice of definition or explanation

• with respect to macrostructure (cross-references etc.)

• homonyms
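The redundancy created by a semasiological presentation can be reduced by merging records that cross-reference each other into one concept-oriented entry. This sketch assumes hypothetical record fields (`term`, `definition`, and `see` for a cross-reference) and uses a simple union-find to collect linked records. All definitions in a group are kept, so a choice between them (or between definition and explanation) remains visible for a terminologist. The sample data is invented.

```python
from collections import defaultdict


def group_synonym_records(records: list[dict[str, str]]) -> list[dict[str, list[str]]]:
    """Merge term-oriented records linked by 'see' cross-references into concept entries."""
    parent = {r["term"]: r["term"] for r in records}

    def find(term: str) -> str:
        # Follow parent links to the representative term, compressing the path.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    # A 'see' reference to another record in the same source joins the two.
    for r in records:
        target = r.get("see")
        if target in parent:
            parent[find(r["term"])] = find(target)

    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for r in records:
        groups[find(r["term"])].append(r)

    # One entry per concept; all definitions are kept so conflicts stay visible.
    return [
        {
            "terms": [m["term"] for m in members],
            "definitions": [m["definition"] for m in members if m.get("definition")],
        }
        for members in groups.values()
    ]


records = [
    {"term": "sponge", "see": "sponge cake"},
    {"term": "sponge cake", "definition": "light cake made from eggs, sugar and flour"},
    {"term": "tart", "definition": "pastry case with a filling"},
]
concepts = group_synonym_records(records)
print(concepts)
```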

Harmonisation: between sources

• (automatic) removal of absolute doublettes (but other information, other languages etc.?) →

• limit (%) of ”definition similarity” – calculation?

• combination of several sources in one record instead?

• several organizations using the same definition is in itself an interesting piece of information → special marking in hit list?

• source respect? © issues?
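The slide leaves open how a limit for ”definition similarity” would be calculated. One possible answer uses Jaccard word overlap, which is an assumed measure here (edit distance or other methods would be equally possible). Identical definitions for the same term are merged into one hit that keeps all sources and is marked as shared. Near-doublettes above a percentage limit are only flagged for review. The 80 % default, the field names and the sample sources are hypothetical. A real merge would also have to preserve the other data categories and languages that the first bullet asks about.

```python
import re
from itertools import combinations


def _words(text: str) -> list[str]:
    # Lower-cased word tokens; punctuation and case are ignored.
    return re.findall(r"\w+", text.lower())


def definition_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two definitions' word sets, in percent."""
    wa, wb = set(_words(a)), set(_words(b))
    if not (wa | wb):
        return 100.0
    return 100.0 * len(wa & wb) / len(wa | wb)


def harmonize(entries: list[tuple[str, str, str]], limit: float = 80.0):
    """entries: (source, term, definition) triples collected from several sources."""
    merged: dict[tuple[str, str], dict] = {}
    for source, term, definition in entries:
        # Absolute doublettes: same term and same wording (ignoring case/punctuation).
        key = (term.lower(), " ".join(_words(definition)))
        hit = merged.setdefault(key, {"term": term, "definition": definition, "sources": []})
        hit["sources"].append(source)

    hits = list(merged.values())
    for hit in hits:
        # Several organizations sharing one definition: candidate for a special marking.
        hit["shared"] = len(hit["sources"]) > 1

    # Near-doublettes above the limit are only flagged for manual review, not merged.
    candidates = []
    for a, b in combinations(hits, 2):
        if a["term"].lower() == b["term"].lower():
            score = definition_similarity(a["definition"], b["definition"])
            if score >= limit:
                candidates.append((a["term"], a["sources"], b["sources"], round(score, 1)))
    return hits, candidates


entries = [
    ("Source A", "tart", "Pastry case with a sweet or savoury filling."),
    ("Source B", "Tart", "pastry case with a sweet or savoury filling"),
    ("Source C", "tart", "Pastry case with a sweet filling."),
]
hits, candidates = harmonize(entries, limit=70.0)
print(hits)
print(candidates)
```

With these samples, Sources A and B collapse into one shared hit, while Source C's shorter definition (75 % word overlap) is flagged at a 70 % limit but would pass unflagged at the default 80 %.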

National term bank

”[…] a large, general term bank to serve an entire nation. Such a bank would satisfy the needs of users with a variety of tasks, of prior knowledge, of organisational adherence, or of requirements for a specific product.”

[Åström, 1987]

Challenge: users

• satisfy all user groups?

• measures of usability?

“for a successful operation of a term bank, today’s imperative is reaching out for the user and delivering the required content, wherever it may reside, with the method and in the format required by the user. The area of user participation and interaction is identified […] as yet to be successfully integrated in the design of terminology portals.”

[Vasiljevs, Rirdance and Gornostay, 2010]

User adaptation!?

• Important for terminology products!

• But: sometimes overestimated, especially concerning human users and the layout of term banks?

• Demand and frequency of usage vs. development costs?

Challenge: digital age …

• crowdsourcing → nichesourcing

• wiki-technology

• voting procedures

• moderating functionalities

• access rights, roles and responsibilities etc. → new administrator interfaces etc.

• usage on new devices (tablets, phones etc.) → app
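Voting, moderation and access roles interact, and a small model makes that interaction concrete. In this sketch, contributors and experts vote with different (hypothetical) weights as a nod to nichesourcing, a proposal becomes ready for review once its score reaches a threshold, and only the moderator role may accept or reject it. The roles, weights and threshold are illustrative assumptions, and in a real system the role would come from authentication rather than being passed in by the caller.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    CONTRIBUTOR = "contributor"  # may propose terms and vote
    EXPERT = "expert"            # nichesourcing: a domain expert whose vote weighs more
    MODERATOR = "moderator"      # may accept or reject proposals


# Hypothetical vote weights per role.
VOTE_WEIGHT = {Role.CONTRIBUTOR: 1, Role.EXPERT: 3, Role.MODERATOR: 1}


@dataclass
class Proposal:
    term: str
    definition: str
    votes: dict[str, int] = field(default_factory=dict)  # user -> weighted vote
    status: str = "pending"

    def vote(self, user: str, role: Role, approve: bool) -> None:
        # One vote per user; voting again replaces the earlier vote.
        self.votes[user] = VOTE_WEIGHT[role] if approve else -VOTE_WEIGHT[role]

    @property
    def score(self) -> int:
        return sum(self.votes.values())

    def ready_for_review(self, threshold: int = 5) -> bool:
        return self.status == "pending" and self.score >= threshold

    def moderate(self, role: Role, accept: bool) -> None:
        # Access control: only the moderator role may change the status.
        if role is not Role.MODERATOR:
            raise PermissionError(f"role {role.value!r} may not moderate proposals")
        self.status = "accepted" if accept else "rejected"


p = Proposal("example term", "example definition")
p.vote("user1", Role.CONTRIBUTOR, True)
p.vote("user2", Role.EXPERT, True)
p.vote("user3", Role.CONTRIBUTOR, False)
print(p.score, p.ready_for_review())
```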

Critics

“Crowdsourcing killed indie rock … ‘cause crowds have terrible taste.” [Weingarten in Keats, 2011]

“government needs smart-sourcing, not crowdsourcing.” [Peterson in Keats, 2011]

“Collectively based lexicography is often regarded with scepticism by professional lexicographers since anyone can contribute anything and there’s no possibility to keep the quality level of the contributions under control. This way of working has even been described as a potential danger to all serious lexicography since these dictionaries risk disturbing the trust in the two qualities that users generally associate with professionally produced dictionaries: quality and reliability.”

[Doherty in Svensén, 2004]

Challenge: reuse

• linked open data etc.

• APIs, URIs

• web tracking

• version management?

• thematic portals

• integration, plug-ins

• CAT, Word etc.

• federations

• (© issues)

→ ”semantic resource”

”Semantic Resource […] refers to all ontology-similar entities, such as taxonomies, dictionaries, thesauri, etc.” (Lima et al., 2010?)

Fackverket 3.0 – linked open data banisters
• TNC, Wikimedia, Bobitek
• funded by the Swedish Agency for Innovation Systems
• aims: enhance the use of linked open terminologies by coordinating and further developing existing resources and tools

Challenge: getting funding …

• few existing national termbanks use off-the-shelf software (OTS)

• not good enough? (evaluation criteria? new demands?)

• easier to obtain funding if you develop your own software?

”What will be the needs of linguistic data bank users in the future? These can of course vary to a large extent, but I believe that the ones we should pay attention to are the simple, down-to-earth requests, which can be summed up under the following keywords: simplicity, quality and service.”

[Åström, 1982]

Links

· [email protected]

· TNC: www.tnc.se

· Rikstermbanken: www.rikstermbanken.se

· [email protected]

· Termcat: http://www.termcat.cat/

· Cercaterm: http://www.termcat.cat/ca/Cercaterm/Fitxes/
