Terminology Management Angelika Zerfass

Post on 20-Jul-2020


Page 1: Terminology Management - ZAACZerfass@zaac.de 18 Statistical extraction •Monolingual or bilingual •Suitable for every language / language combination (for example from a translation

Terminology Management

Angelika Zerfass


Introduction

•Degree in translation (Chinese/Japanese into German), Computational Linguistics

•Worked for Trados in Japan, Germany, USA

•Since 2000, independent trainer and consultant for translation tools (TM tools and terminology management tools)

•2 employees for technical support and terminology management


Agenda

•What kind of terminology work are you doing?

•What is terminology?

•Collecting terminology

•Terminology lists and term bases

•Terminology checking


WHAT KIND OF TERMINOLOGY WORK ARE YOU DOING?


Terminology Work

• Terminology projects are often initiated during the translation process.

• During the creation of a bilingual terminology list, the need for source language standardization arises

– Allowed and forbidden terms, additional information like definitions…

• Standardized terminology can now be used for source language checking

• The source language term lists are then complemented by the target languages (if there is a relay language, this needs to be completed first)

– Relay language example: source language Japanese, relay language English; further translation starts from English, not from Japanese

• Bilingual term lists can be used for translation and terminology checks (in the Translation Memory tools) in the target language

• The content of a term base can be published on the intranet/internet, or access can be granted to users other than translators


What is terminology?

• In our industry, terminology usually means

– product and company names

– product-specific terms

– company-specific terms

– subject matter-specific terms

– special abbreviations and acronyms

– Terminology that is prescribed by norms and standards

– our terminology versus the terminology of our competitors


Multilingual Workflow Management


Where do you find terminology?

• Product development, text creation, specifications

• Marketing

• Sales

• Contracts, agreements

• Product manuals / documentation

• …


How is terminology represented?

• Dictionary

– Lists all the meanings of one term

– Bank = financial institution, elevation, bench…

• Terminology Database for translation

– One entry per meaning of a term

• bank (= financial institution)

• bank (= bench)

• bank (= elevation like a sand bank)
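The difference between the two representations can be sketched in a few lines. This is only an illustration: the dictionary content comes from the "bank" example above, while the field names (`id`, `term`, `meaning`) are assumptions and not the structure of any particular tool.

```python
# Dictionary view: one headword that lists all of its meanings.
dictionary = {
    "bank": ["financial institution", "bench", "elevation like a sand bank"],
}


def to_term_base(dictionary):
    """Turn each (headword, meaning) pair into its own term base entry."""
    entries = []
    for headword, meanings in dictionary.items():
        for meaning in meanings:
            entries.append(
                {"id": len(entries) + 1, "term": headword, "meaning": meaning}
            )
    return entries


# Term base view: one entry per meaning, so "bank" appears three times.
term_base = to_term_base(dictionary)
```

Each entry can then carry its own translations and metadata, which is what makes the term base usable for translation.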


Term base

Bank (financial institution)

Bank (elevation)

Bank (bench)

Bank (cloud bank)

Bank (to bank)

Dictionary Thefreedictionary.com


Creation of term lists: term extraction / setting up a term list

• Ideally, term lists with new terms and company/product-specific terms are collected where the terms are created

• Product development / engineering

• (Technical) authoring

• Marketing / Sales

– Ideally these groups use rules on how to create terms – air flow sensor vs. air-flow sensor vs. Air Flow sensor vs. sensor airflow…

– Körperwärmesensor, Sensor für Körperwärme, Sensor Körperwärme (German variants of "body heat sensor")
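A simple way to spot such spelling variants in an existing list is to reduce every term to a comparison key. The sketch below is an assumption about how this could be done, not a feature of a specific tool; the function names are made up for the example.

```python
import re
from collections import defaultdict


def variant_key(term):
    """Ignore case, hyphens and spaces so spelling variants share one key."""
    return re.sub(r"[\s\-]+", "", term.casefold())


def group_variants(terms):
    """Group terms whose comparison keys are identical."""
    groups = defaultdict(list)
    for term in terms:
        groups[variant_key(term)].append(term)
    return dict(groups)


groups = group_variants(
    ["air flow sensor", "air-flow sensor", "Air Flow sensor", "sensor airflow"]
)
```

The first three variants end up in one group. "sensor airflow" stays separate, because a key like this cannot detect a changed word order; such cases still need a human decision and a term formation rule.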


Creation of term lists: term extraction / setting up a term list

• During translation

– Asking translators to either create new entries in the term base or fill in the target language equivalents of the source language terms

• Extraction of terms from existing (monolingual) documents or bilingual translation resources (TMs), after a translation project or independent of any translation project

– manually

– tool-assisted

– For texts of up to 20,000 words, manual and tool-assisted extraction take about the same time for reading/checking the segments.

– About 3% of all words, or 20-30% of all terms extracted by a tool, can be considered real term candidates.
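As a rough worked example of these rules of thumb: the sketch below applies the 3% figure to a 20,000-word text and then, as an assumption of this example, combines it with the 20-30% figure to estimate how long the raw tool output might be. The two percentages are given independently above, so the combination is only an illustration.

```python
def estimate_real_terms(word_count, term_share=0.03):
    """About 3% of all words are expected to be real term candidates."""
    return round(word_count * term_share)


def raw_list_size(real_terms, usable_low=0.20, usable_high=0.30):
    """If only 20-30% of the tool output is usable, the raw list is larger."""
    smallest = round(real_terms / usable_high)
    largest = round(real_terms / usable_low)
    return smallest, largest


real_terms = estimate_real_terms(20_000)
smallest, largest = raw_list_size(real_terms)
```

For 20,000 words this gives about 600 real term candidates, which would correspond to a raw extraction list of roughly 2,000 to 3,000 items that somebody has to read.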


Terminology Extraction

• Tool-assisted

– Manual extraction assisted by translation tools

– Concordance tools – create a list of all terms in the document (monolingual)

– Statistical extraction tools – create a list of term candidates according to frequencies (monolingual and bilingual)

– Linguistic extraction tools – create a list of term candidates according to rules, e.g. noun phrases of up to 3 elements (monolingual and bilingual)


Manual extraction

• Some Translation Memory tools offer a way to collect term pairs from the alignment of two documents and send them directly to a term base (memoQ)


Extraction of source language terms

• Some Translation Memory tools will create a list of the source language terms of the documents that are imported into a TM system (e.g. across).

• Workflows allow for term translation as a first step

– Without context, this might not be feasible…

• Terminology goes into the term base directly (across, memoQ, SDL Trados via a separate extraction tool)


Monolingual Extraction

• Extraction of terms from documents in one language.

– Creation of term lists…

• important terms

– Who defines what is important?

– How can a tool “know” what is important?

• frequent terms

– What is frequent? 3 times / 10 times…

– Are frequent terms also important?

• new terms

– According to whose level of subject matter knowledge?

– Compared to which term list / term database?


Concordance lists

• List of all terms in a document

• Frequency of each term

• Extraction can be controlled through stop word lists

• Terms in context

• Simple Concordance Program (SCP), freeware (European languages and Arabic only)
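What a concordance list contains can be sketched as follows. This is a minimal illustration of the idea (word list, frequency, stop words, context), not how SCP works internally; the stop word list and the sample text are made up.

```python
import re
from collections import Counter

# Illustrative stop word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "of", "is", "to", "and", "in"}


def concordance(text, stop_words=STOP_WORDS, width=20):
    """Return word frequencies and a context snippet for every occurrence."""
    frequencies = Counter()
    contexts = {}
    for match in re.finditer(r"\w+", text):
        word = match.group().lower()
        if word in stop_words:
            continue
        frequencies[word] += 1
        # Keep some characters to the left and right as context.
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        contexts.setdefault(word, []).append(text[start:end])
    return frequencies, contexts


text = "The sensor measures the air flow. The air flow sensor is connected to the pump."
frequencies, contexts = concordance(text)
```

Note that the list contains single words only: "air", "flow" and "sensor" are counted separately, and it is up to the reader to recognize "air flow sensor" as one term from the context.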


Term candidates – statistical extraction


Statistical extraction

• Monolingual or bilingual

• Suitable for every language / language combination (for example from a translation memory)

• The larger the collection of extraction material, the better the extracted lists

• Stop word lists

• Context sentences

• In theory for all languages, but in practice Asian languages need more selection work than others


Term candidates – statistical extraction


Statistical extraction

• Settings are focused on words surrounded by spaces as delimiters

• Result: "stupid" term candidates
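Why space-delimited counting produces poor candidates can be shown with a small sketch. It is a simplified assumption of what a statistical extractor does (count word sequences, keep the frequent ones); the function name, the settings and the sample text are invented for the example.

```python
from collections import Counter


def ngram_candidates(text, max_len=3, min_freq=2, stop_words=frozenset()):
    """Count word sequences of 1..max_len tokens and keep the frequent ones."""
    tokens = text.split()  # spaces are the only delimiter, punctuation stays attached
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip sequences that start or end with a stop word.
            if gram[0].lower() in stop_words or gram[-1].lower() in stop_words:
                continue
            counts[" ".join(gram)] += 1
    return {gram: count for gram, count in counts.items() if count >= min_freq}


text = "Open the view settings. The view settings dialog appears. Close the view settings dialog."
candidates = ngram_candidates(text, stop_words={"the"})
```

With these settings, "view settings dialog" is not found although it occurs twice: the second occurrence is counted as "view settings dialog." because the full stop stays attached to the word. The tool has no idea what a word or a term is; it only sees strings between spaces.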


This is how a text looks to a statistical extraction tool…

Vot gnig harengoga fuor tok gnig nor shewerginhatz. Mirhon bortup tip trewshu gnig batbo loqtet. Bortup ter, bortup nofdas, semsel nih furpo ayano bliktreptat. Mirhon granbevtrov driktopret grig go wasbrekit mut mirkep taptro gnig suf. Aktrep zitpek nitnit bortup mil. Setrimb ak troptan bur metlatkento.


Bilingual Extraction

• Term extraction from bilingual sources like translation memory files or bilingual translation files

– Creation of parallel lists of terms and their translation(s)

• All forms of the term and all its translations

• Only basic form

• Most frequent translation of source term
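The "most frequent translation" option can be sketched as a simple count. The example assumes that the source/target term pairs have already been identified in the TM units, which is the hard part that the extraction tool has to do; the function name and the sample pairs are illustrative.

```python
from collections import Counter, defaultdict


def most_frequent_translations(term_pairs):
    """term_pairs: (source term, target term) observations from a TM."""
    observed = defaultdict(Counter)
    for source, target in term_pairs:
        observed[source][target] += 1
    # Keep only the translation that was seen most often for each source term.
    return {source: counts.most_common(1)[0][0] for source, counts in observed.items()}


pairs = [
    ("monitor", "Bildschirm"),
    ("monitor", "Monitor"),
    ("monitor", "Bildschirm"),
    ("selector key", "Auswahltaste"),
]
best = most_frequent_translations(pairs)
```

The most frequent translation is not necessarily the correct one, so the resulting list still needs to be checked.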


Statistical Extraction with SDL MultiTerm Extract

List of source terms List of possible translations

Context sentences from TM Notes field, definition field; add context sentences to entry in term base


Linguistic Extraction Tool

• Tool knows about the structure of the language

• Extracted terms can be reduced to their basic form with the help of dictionaries and rules

• User can define the rules used for extraction – like noun phrases up to 3 words…

• Monolingual or bilingual extraction

• Extraction limited to supported languages (mostly Western European)
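A rule such as "noun phrases up to 3 words" can be sketched on text that has already been tagged with parts of speech. The tags below are written by hand for the example; a real linguistic extraction tool would get them from its own language analysis, which is why such tools only support certain languages. The tag names and the function are assumptions of this sketch.

```python
def noun_phrases(tagged_tokens, max_len=3):
    """Collect sequences of adjectives/nouns that end in a noun, up to max_len words."""
    allowed = {"ADJ", "NOUN"}
    phrases = []
    for start in range(len(tagged_tokens)):
        for length in range(1, max_len + 1):
            span = tagged_tokens[start:start + length]
            if len(span) < length:
                break  # ran past the end of the text
            if any(tag not in allowed for _, tag in span):
                break  # a verb, article, etc. ends the phrase
            if span[-1][1] == "NOUN":
                phrases.append(" ".join(word for word, _ in span))
    return phrases


tagged = [
    ("the", "DET"), ("air", "NOUN"), ("flow", "NOUN"), ("sensor", "NOUN"),
    ("measures", "VERB"), ("body", "NOUN"), ("heat", "NOUN"),
]
phrases = noun_phrases(tagged)
```

The rule finds "air flow sensor", but also its parts ("air", "air flow", "flow sensor"…), so the candidate list still has to be cleaned up.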


Results of a linguistic Extraction with Context

(TerminologyWizard)


Linguistic Extraction with SDL PhraseFinder

List of source terms List of possible translations

Grammar, comment fields

Context sentences from TM


Test Results…

• Statistical extraction "works" for all language pairs, but the tools are better with European languages than Asian languages

• For bilingual extraction Asian languages work better as target languages

• Concordance tools are limited in file formats that they can extract from

• Manual extraction might not be the fastest method, but it is the best if you have a certain goal (company-specific terms…)

• Cleaning up these extracted lists takes TIME!


Online Extraction Japanese/Chinese

• Gensen web (uses a POS tagger)

• Test with Japanese Yahoo website, homepage

• Test with China Daily news article


Term extraction issues

• Terminology extraction is a highly individual process

– Goal of extraction, subject matter expertise, available time

• Tools use different methods for terminology extraction

– Concordance, statistics, linguistics

• Tools support different file formats for extraction and export

– Monolingual, bilingual, export formats

• Tools sometimes don’t show the context from which the term was taken


Example – Bilingual extraction from a TM, 60,000 units

– Several statistical extractions with varying settings to produce lists with more or less "noise" (ca. 10 min / extraction)

– General stop word list

– 800 extracted terms (medical) English-German took us 5-6 hours

– Deleting terms that were too general, product names that stayed the same, cleaning up rubbish strings…

– Looking up context to determine which of the translations should be marked as term candidates

– Export to Excel (+ sorting from short to long, number of words in a term… 0.5 hours)

– And now the list needs to be checked by experts!
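To put the clean-up effort from this example into perspective, the figures can be converted into time per term. The calculation only uses the 800 terms and the 5-6 hours mentioned above; the extraction runs and the Excel export are not included.

```python
def seconds_per_term(terms, hours):
    """Average clean-up time per extracted term, in seconds."""
    return hours * 3600 / terms


fastest = seconds_per_term(800, 5)
slowest = seconds_per_term(800, 6)
```

This works out to roughly 22 to 27 seconds per term, before any subject matter expert has looked at the list.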


ADDITIONAL INFORMATION


What else belongs to a term?

• A term and its equivalent in other languages is often not enough. More information is needed…

– Source (where did you find the term?)

– What product, business area… does the term belong to?

– Definition (how do you write a good definition?)

– Images

– What is allowed and what is forbidden

– Additional notes, comments, examples…


A set of questions to create good definitions

(as suggested by Kurt Hilgenberg)

• Object-oriented questions

– What is XXX?

– Where does XXX appear?

• Functional questions

– How does XXX work?

– What are the characteristics of XXX?

– How does XXX differ from YYY?

• Questions on reasons and conditions

– Under what conditions does XXX appear?

• Instrumental questions

– What are the objectives of XXX?

– How is XXX used?


Example for the term DIALOG

• Object

– What is a dialog? A part of the user interface.

• Function

– What are the characteristics of a dialog? A dialog has checkboxes, radio buttons, input fields, dropdown menus or other ways of selecting or inputting values.

– How does a dialog differ from a window? A dialog is used for activating / deactivating settings. A window is used for showing the result of the settings.

• Condition / Reason

– Under what conditions does a dialog appear? The dialog appears when the user selects a menu item ending with “…”.

• Instrumental / Usage

– What is the dialog used for? The dialog is used to activate / deactivate the options and input values.


Example for the term SWITCH

• Object

– What is a switch? A piece of hardware, or a selection item on a dialog in a software user interface.

• Function

– What are the characteristics of a switch? Hardware: the switch can be turned to different positions. Software: the switch is a square box that can contain a mark or be empty.

• Condition / Reason

– What is the reason for the switch? The switch allows the changing of the settings.

• Instrumental / Usage

– How is the switch used? Hardware: the switch is turned clockwise or counter-clockwise. Software: the switch box is clicked and then shows a mark; it is clicked again to remove the mark.


Tips on terminology work

• As much useful information as possible

– Definition, context example, source information, graphics, status, customer

• ...but only as much as you need to decide if the term is to be used in a certain situation or not

• Enter the term as it will appear in the text

– Incorrect: Screen (Monitor)

– Correct: Screen Monitor

• Enter base / singular form of the term

• For Asian languages, which are more contextual than others, add longer expressions


EXCHANGING TERMINOLOGY


Exchanging Terminology – Working with a client

• Ask for term lists

• Create a template for the customer to add terms and answer questions

• Define clear rules and color marking on who is allowed to do what (nobody to add rows or columns, only fill the column that is indicated for you to work in…)


Exchanging Terminology

• Working with translators

• Usually the first step is an Excel table

• Translators will either import it into their tool of choice or will consult the list manually during translation

• It is hard to check whether the terminology was used consistently if there is no term base with term checking features in the translation process

• If you know what software your translators are using, you can send them a pre-defined term base to attach to their translation system


Developments in Terminology Work

• If possible

– All users of terminology work on an online system

• Web-based version of the terminology component of the translation tool (WebTerm, MultiTerm Online, qTerm, crossTerm Web…)

• Self-developed web-based interface for terminology work by a translation vendor (Lookup by DOG, TermXplorer…)

• Web-based terminology system like TermWeb (Interverbum)

• Connection to a TM system is a plus


Terminology Work

• Terminology takes a lot of time

– Limit the exchanges on terminology questions to a certain amount of communication back and forth

– Someone needs to have the power to decide what to use (even if the client is not responsive)

– Terminology work needs good documentation

– Terminology is a work in progress!


What about the TBX standard exchange format?

• Not all tools are able to read this format yet, and files exported by different tools may differ

– TBX is quite complex and needs a lot of attention to detail

– Some examples for TBX support:

– MultiTerm 2009/2011 (export; import only with conversion)

– memoQ qTerm (yes), internal term base (no)

– MultiTrans (TBX import and export)

– across (TBX import and export) (TBX between across and MultiTerm works well in our experience)
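To give an idea of what such a file contains, the sketch below builds one simplified entry in the style of TBX. It is an assumption-laden illustration: the element names (`termEntry`, `langSet`, `tig`, `term`) follow the 2008 edition of the standard as far as I know it, newer editions use different names, and a real TBX file also needs a header and further required elements that are left out here. The output is not a valid file for import into any of the tools above.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def term_entry(entry_id, terms_by_language):
    """Build one simplified TBX-style entry: entry -> language -> term."""
    entry = ET.Element("termEntry", {"id": entry_id})
    for language, terms in terms_by_language.items():
        lang_set = ET.SubElement(entry, "langSet", {XML_LANG: language})
        for term in terms:
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return entry


entry = term_entry("c1", {"en": ["monitor"], "de": ["Bildschirm"]})
xml_text = ET.tostring(entry, encoding="unicode")
```

Even this small fragment shows where files from different tools can differ: which elements and attributes are used for status, definitions or custom fields is exactly the detail that needs attention.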


TERMINOLOGY LISTS/ TERM BASES


Content for a term list / term base

• Subject matter or company/product-specific terms

• ProductName, Company - Name

• “compatibility across platforms”

• view settings – Dialog with settings for viewing something or a button to view the settings of something?

• Collect terms, synonyms, abbreviations

• screen, scr., monitor

• Base form of the term

• Decide on term status field

• forbidden, deprecated, pending, confirmed by…, used by us, used by competitor…


Terminology Retrieval during Translation


Known terms marked in the text in blue

Additional Information from the term base (examples, explanations…)

Blue = allowed term

Black = forbidden term


Metadata in term lists / term bases

• Depending on the intended usage

– Definitions, status, source, pictures for translators

– Gender, grammar, examples for non-translators

– More grammatical information for use in machine translation systems

– Explanations with examples and pictures for educational purposes


Levels on the Entry

• A terminology entry usually has 2 to 3 levels where you can add information

– Entry level: Information on the entry as a whole, like a picture, a general note, a product name…

– Language level: Information that applies to all terms in one language (definition…) – this level is not often used

– Term level: Information that applies to a certain term, like status forbidden, source of the term, context example, links to other, similar terms, gender information…
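The three levels can be pictured as a nested data structure. The sketch below is a generic illustration, not the data model of MultiTerm or any other tool; all class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """Term level: information about one specific term."""
    text: str
    status: str = "pending"
    source: str = ""
    context: str = ""


@dataclass
class LanguageSet:
    """Language level: information shared by all terms of one language."""
    language: str
    definition: str = ""
    terms: List[Term] = field(default_factory=list)


@dataclass
class Entry:
    """Entry level: information about the entry (concept) as a whole."""
    entry_id: int
    note: str = ""
    languages: Dict[str, LanguageSet] = field(default_factory=dict)

    def add_term(self, language, term):
        self.languages.setdefault(language, LanguageSet(language)).terms.append(term)


entry = Entry(entry_id=1, note="display hardware")
entry.add_term("en", Term("screen", status="confirmed"))
entry.add_term("en", Term("scr.", status="forbidden"))
entry.add_term("de", Term("Bildschirm"))
```

Synonyms and forbidden variants sit next to each other inside the same entry, each with its own status on the term level.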


Term Base Entry (MultiTerm)

Term level information (free text)

Synonyms

Term level information (categories)

Entry level information


Term list -> Term Base

• Getting the data into a term base

– Import format for a term database depends on the tool that is going to be used to maintain the terminology

• Row layout – 1 row = 1 terminology entry

• ID layout – several rows with the same ID = 1 terminology entry
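The two layouts can be sketched as follows. Both functions read simplified in-memory rows instead of a real Excel file, and the column structure is an assumption for the example; the point is only how rows are grouped into entries.

```python
from collections import defaultdict


def entries_from_row_layout(rows):
    """Row layout: every row (language -> term) is one terminology entry."""
    return [{lang: [term] for lang, term in row.items() if term} for row in rows]


def entries_from_id_layout(rows):
    """ID layout: all (entry_id, language, term) rows with the same ID form one entry."""
    entries = defaultdict(lambda: defaultdict(list))
    for entry_id, language, term in rows:
        entries[entry_id][language].append(term)
    return {entry_id: dict(langs) for entry_id, langs in entries.items()}


row_entries = entries_from_row_layout([{"en": "screen", "de": "Bildschirm"}])

id_entries = entries_from_id_layout([
    (1, "en", "screen"),
    (1, "en", "monitor"),
    (1, "de", "Bildschirm"),
    (2, "en", "selector key"),
    (2, "de", "Auswahltaste"),
])
```

The ID layout makes it easy to keep several synonyms per language in one entry, while the row layout needs additional columns for that.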


Row-based term list (one row = one terminology entry)


ID-based term list (all rows with one ID belong to the same entry)


Term Bases of TM systems

• Some examples

• SDL MultiTerm is a freely configurable term base system. It can be used online as well.

• memoQ can import TMX files or delimited text files (saved from Excel) directly, but the user interface is fixed and the number of fields is limited

• memoQ qTerm (web-based term base) is similar to MultiTerm in that it is freely configurable and can accommodate as many fields as you like

• TermStar/WebTerm offer a pre-defined interface where custom fields can be added


Import routines

• Preparation steps to import a list (most often Excel) into a term base:

– memoQ: save Excel as Unicode TXT

– MultiTerm: convert Excel to MultiTerm import format (XML) through the MultiTerm Convert tool.
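If the term list is not maintained in Excel, a comparable tab-delimited text file can also be produced by a script. The sketch below is only an illustration: the column names are made up, and it assumes that the "Unicode text" format Excel saves is tab-separated UTF-16 text. Whether a given tool accepts the result depends on its import settings.

```python
import csv
import io


def to_tab_delimited(rows, columns):
    """Write the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"English": "screen", "German": "Bildschirm"},
    {"English": "selector key", "German": "Auswahltaste"},
]
text = to_tab_delimited(rows, ["English", "German"])

# Assumption: encode as UTF-16 (with byte order mark) before saving to disk.
data = text.encode("utf-16")
```

The first line of the file names the columns, which are then mapped to the fields of the term base during import.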


Structure of a memoQ entry

List of entries in the term base

The 2 selected languages for display in the list

Metadata for term in language 1

Metadata for term in language 2

Metadata for the entire entry (all languages)


Excel for memoQ

Entry level data

Term level data

In memoQ, the string NonTerm can be used to mark a term as forbidden. The corresponding checkbox will be set during import.


Import settings (mapping Excel to memoQ structure)

Columns are mapped to the corresponding field in memoQ

The status column, which contains the NonTerm information, goes into the Term information field.


Excel for MultiTerm

Create as many fields as you need and name them as you like

Hyperlinks will be available as links in MultiTerm as well


Import settings (mapping Excel to MultiTerm structure)

Columns are mapped to the field types in MultiTerm

Metadata columns are connected to the correct level (entry level or term level)


Imported term base in MultiTerm


Maintaining terminology within the term base

• Terminology is an ongoing process

– Adding new terms

– Adding new languages

– Adding additional information (pictures…)

– Changing translations / incorporating feedback from users

– Changing the status of a term (forbidden, allowed, deprecated, pending…)

– Separating or combining terminology resources

– Converting terminology resources for use within other tools


TERMINOLOGY CHECKING


Checking terminology during translation and the authoring process

• Tools that integrate with the authoring environment to check the source documents for

– Forbidden terms, use of synonyms, correct spelling of the words, grammatical or structural errors

• Tools inside translation memory systems that check the source language-target language sentence pairs against a term base for

– Forbidden terms

– Terms that are in the term base where the translation of the term has not been used

– Missing terms in the term base

• Stand-alone QA Tools that offer a terminology checking component for bilingual translation documents
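The kind of check a TM tool runs on a sentence pair can be sketched as follows. This is a deliberately naive illustration with plain substring matching; real tools also deal with inflected forms, word boundaries and several allowed translations. The term base structure and the function name are assumptions.

```python
def check_segment(source, target, term_base):
    """Check one source/target pair against a small term base."""
    issues = []
    source_lower, target_lower = source.lower(), target.lower()
    for entry in term_base:
        if entry["source"].lower() not in source_lower:
            continue  # the term does not occur in this segment
        for forbidden in entry.get("forbidden", []):
            if forbidden.lower() in target_lower:
                issues.append(("forbidden term", forbidden))
        if entry["target"].lower() not in target_lower:
            issues.append(("term not used", entry["target"]))
    return issues


term_base = [
    {"source": "selector key", "target": "Auswahltaste", "forbidden": ["Auswahlschalter"]},
]

wrong = check_segment("Press the selector key.", "Drücken Sie den Auswahlschalter.", term_base)
correct = check_segment("Press the selector key.", "Drücken Sie die Auswahltaste.", term_base)
```

The check can only report what the term base contains: a term that is missing from the term base, or a forbidden variant that was never entered, will not be flagged.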


SDL Author Assistant

Acrolinx


Term Check inside a TM tool environment (SDL Trados 2009/11)

- Forbidden term (phrase)

- Wrong term (set instead of sentence)


17-20 Nov. Buenos Aires, Argentina


Term Check Results

Wrong translation

(selector key = Auswahltaste not Auswahlschalter)

Wrong translation

(monitor = Bildschirm, not Monitor)


Term Check Results

Wrong translation

Several translations

Missing translation in term database

- Forbidden term (Monitor)

- Wrong term (Auswahlschalter)

- No target (menu)


Summary

• Each checking routine only checks some possibilities; none checks the whole range

• Term from the term base was not used

• Missing translation in term list / term base

• Term with several translations

• Missing source term (reverse check)

• Setup possibilities of the term base define the range of checks


Term base setup with forbidden terms (SDL MultiTerm)


memoQ term base with wildcards ( * or | ) for terminology matching
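As a rough illustration of what such wildcards do, the following Python sketch converts a term entry into a regular expression. It assumes that a pipe marks the end of the invariable stem and that a trailing asterisk allows any ending. Both readings are assumptions made for this example, and the function name `term_pattern` is hypothetical; the actual matching rules in memoQ may differ.

```python
import re


def term_pattern(entry: str) -> re.Pattern:
    """Build a regex for a term entry (assumed wildcard semantics, see above)."""
    if "|" in entry:
        # "translat|e": only the part before the pipe must match
        stem = entry.split("|", 1)[0]
        body = re.escape(stem) + r"\w*"
    elif entry.endswith("*"):
        # "translat*": any ending may follow the given prefix
        body = re.escape(entry[:-1]) + r"\w*"
    else:
        # no wildcard: the whole word must match exactly
        body = re.escape(entry)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


print(bool(term_pattern("translat|e").search("She is translating the manual")))
print(bool(term_pattern("menu").search("Open the menus")))
```

Without a wildcard, the entry only matches the exact word form, which is why term recognition in inflecting languages usually needs some form of stem or prefix matching.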


WHAT ELSE…


Using terminology outside of the translation process

• What else is terminology good for?

– Company dictionary

• Help new employees / sub-contractors to understand the products

– Training machine translation systems (additional grammatical information needed)

– Training search engines (to search for synonyms for the term the user entered)

– Setting a standard by publishing the terminology (see Microsoft glossaries)


Terminology Processes

[Diagram: terminology workflow. Steps shown: Authoring, Terminology extraction, Terminology approval, Import, Terminology Database, Translation and Terminology Check, Import of translations, Term check during authoring, Online publication of term database (intranet/internet). Term lists are passed between the steps. The term list following the translation and terminology check contains term translations, new terms and change requests; the term list following the term check during authoring contains new terms and change requests. Both go through terminology approval.]


From terms back to sentences

• If terminology is used for source language checking, then sentence checking can be added as well

• A list of standard sentences is compared against the document to see where the author deviated from the standard sentence structures

• Standard sentences can be extracted (manually) from translation memory systems

• Authoring memory systems then use these sentences for source language checking
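The comparison against a list of standard sentences can be sketched with Python's standard `difflib` module. This is a minimal illustration, not the algorithm of any authoring memory system: the function name `best_matches`, the threshold of 0.7 and the character-based similarity measure are assumptions for this example, and commercial tools calculate their match values differently.

```python
from difflib import SequenceMatcher


def best_matches(sentence: str, standard_sentences: list[str], threshold: float = 0.7):
    """Return (match value in percent, standard sentence) pairs, best match first."""
    scored = []
    for std in standard_sentences:
        # ratio() is a character-based similarity between 0.0 and 1.0
        ratio = SequenceMatcher(None, sentence.lower(), std.lower()).ratio()
        if ratio >= threshold:
            scored.append((round(ratio * 100), std))
    return sorted(scored, reverse=True)


# Hypothetical list of standard sentences (taken from the sentence check slide)
standard_sentences = [
    "Select the first menu point \"open current project\"",
    "Select the second option – show open projects",
    "Select the view menu and click on project",
]

written = "Select the second menu point – open recent project"
for value, std in best_matches(written, standard_sentences):
    print(f"{value}%  {std}")
```

An author would then see the closest standard sentences next to the text being written and could decide whether to adopt one of them.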


Sentence Check

• Authoring memory systems check for consistent use of standard sentences

[Figure: the text you write is compared with similar sentences, shown with match values of 95%, 92% and 90%. Sentences shown:]

– Select the second menu point – open recent project

– Select the first menu point "open current project"

– Select the second option – show open projects

– Select the view menu and click on project


Online terminology

• Terminology management used to be done either in a stand-alone terminology management tool or as part of a translation memory system.

• Not everybody who needs to work on the terminology owns the corresponding terminology management system or wants to work with one.

• Online systems make it possible to work on terminology together with clients and translators through a browser interface.

• Users are given specific rights for the entries (making comments, adding entries, deleting terms…)


MultiTerm online


memoQ qTerm


Interverbum


QUESTIONS?


Some Terminology Extraction Tools

– Concordance tools (freeware)

• Simple Concordance Program (SCP), http://www.textworld.com/scp/

• ExtPhr32, http://publish.uwo.ca/~craven/freeware.htm

– Term extraction tools / components of translation memory tools

– Online term extraction for Japanese, Chinese

• http://gensen.dl.itc.u-tokyo.ac.jp/gensenweb_eng.html

– Statistical Extraction

• MultiTerm Extract, Déjà Vu Lexicon, Heartsome Dictionary Editor, across

• TermiDOG (www.dog-gmbh.de), Chamblon Terminology Extractor (http://www.chamblon.com/terminologyextractor.htm), Terminotix Synchroterm (http://www.terminotix.com)…

– Linguistic Extraction

• Synthema Terminology Wizard (http://www.synthema.it/english/servizi/traduzioni.html); does not seem to be commercially available anymore

• SDL PhraseFinder…
