An Introduction to the Internationalisation of eContent

A course provided by the Localisation Research Centre (LRC) as part of the EU-Funded ELECT Project

Instructor: TBC
Date: TBC

Course Outline

• Three 1½ hour sessions
• Each session = 1 hour lecture + ½ hour exercise
• Session 1:
  – Introduction of Basic Concepts
  – Character, Script, Font, Character Set
  – Character Encoding
• Session 2:
  – Writing for an International Audience
  – Formatting Conventions
  – Graphics
  – Other Cultural Issues
• Session 3:
  – File Formats
  – Typical Internationalisation Problems (Web Forms, Text Expansion…)
  – Internationalisation Checklist

Exercises

• 3 exercises during today’s course

• Overall aim – To design a basic internationalised web site

• Examine this sample web page

• Can you spot potential problems for international visitors?

• Today YOU will create an internationalised version of this web page and more!

Session 1: Introduction to the Basic Concepts

• eContent
  – Any material made available in electronic format
  – Examples: Web pages, PDF documents, material distributed via mobile technology
  – This course will deal with web-based material only
• Internationalisation
  – The process of adapting a product or its contents so that it can deal effectively with multiple languages, writing directions, cultural conventions and so forth without the need for redesign
  – Involves creating an interface that is globally inoffensive
  – A prerequisite for localisation
  – Often abbreviated to i18n

Session 1: Introduction to the Basic Concepts (*continued)

• Localisation
  – Takes internationalisation a step further
  – The process of taking internationalised content and adapting it both linguistically and culturally to a specific target market
  – Localised content should look and feel as though it was created for the target market, in the target market
  – Often abbreviated to l10n
  – Note: If content is internationalised properly, the localisation process will be much more straightforward

Session 1: The Business Case for eContent i18n and l10n

• Why internationalise and/or localise eContent?
  – The Internet is an ideal way to reach the global market
  – Total online population of 619 million
  – Only 36.5% of this population speak English as their first language
  – Consumers are 3 times more likely to buy from web sites that use their native language
• What are the native languages of today’s Internet users?
  – The following chart taken from www.eMarketer.com illustrates this…

Session 1: The Business Case for eContent i18n and l10n (*continued)

[Chart: native languages of Internet users (source: www.eMarketer.com)]

Session 1: Deciding on a Strategy

• When internationalising an existing web site, you must review each part of the eContent and decide whether to:

– Adapt it

– Translate it

– Remove it or

– Replace it

Session 1: More Definitions…

• Characters
  – The most basic logical components of written language that have semantic value, i.e. a single textual symbol
  – Can be a letter, a number, a punctuation mark, a currency symbol etc. Some examples include: £ % / g 7
• Glyph
  – Any character can be displayed in a variety of ways. A glyph is the graphical representation or shape of a character, e.g. A A a: all different glyphs of the same character
• Bit, Byte and Double Byte
  – A bit is the smallest unit of data; a byte (8 bits) is the smallest addressable unit of storage in most computer architectures
  – In single-byte encodings, each character of most Western languages is represented by one byte
  – However, languages with very large character repertoires (such as Chinese) need more than the 256 codes one byte allows, so their traditional encodings use 16 bits or 2 bytes of storage space per character. Therefore, these are known as double-byte languages
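
The byte counts described above can be checked with a short Python sketch. The sample characters and the choice of encodings are illustrative assumptions, not part of the course material.

```python
# Encoded size of one character under several encodings.
samples = [
    ("A", "ascii"),    # Latin letter in a single-byte encoding
    ("A", "utf-8"),
    ("é", "cp1252"),   # Western European code page, still one byte
    ("中", "big5"),    # Traditional Chinese, a double-byte encoding
    ("中", "utf-8"),   # the same character takes 3 bytes in UTF-8
]

def encoded_length(char: str, encoding: str) -> int:
    """Return how many bytes `char` occupies in `encoding`."""
    return len(char.encode(encoding))

for char, encoding in samples:
    print(f"{char!r} in {encoding}: {encoded_length(char, encoding)} byte(s)")
```

Note that the size of a character depends on the encoding chosen, not only on the language.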

Session 1: Definitions… (*continued)

• Script
  – A collection of symbols used to represent textual information in one or more writing system(s)
  – Examples are Latin, Greek and Cyrillic
• Writing Systems
  – A set of rules for using one or more script(s) to write a particular language
  – An example is the American English writing system
  – Writing Systems can differ in their directionality. Some read left-to-right, others right-to-left, while others still are multi-directional
• Character set
  – A collection of all of the possible characters that can be used in the writing system of a particular language, or group of languages
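
Directionality is recorded per character in the Unicode character database, which Python exposes through the standard `unicodedata` module. The sketch below is a simplification: the three-way grouping into "ltr", "rtl" and "neutral" is an assumption made for illustration, and real bidirectional layout follows a much fuller algorithm.

```python
import unicodedata

def direction(char: str) -> str:
    """Classify a character as 'ltr', 'rtl' or 'neutral' from its bidi class."""
    bidi_class = unicodedata.bidirectional(char)
    if bidi_class == "L":
        return "ltr"
    if bidi_class in ("R", "AL"):  # Hebrew-style and Arabic-style letters
        return "rtl"
    return "neutral"               # digits, punctuation, spaces and so on

for ch in ["A", "я", "א", "ب"]:
    print(ch, unicodedata.name(ch), direction(ch))
```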

Session 1: Definitions… (*continued)

• Character Encoding
  – Mapping of characters from a character set to a set of unique numeric codes. A table holding one such mapping is called a code page, and each computer uses a particular code page by default
• Font
  – A collection of glyphs used for the visual display of character sets. Fonts can usually be displayed in a variety of sizes
  – An example is Times New Roman, Size 12
  – Note: Each font is only designed to display a certain number of characters. If you choose the incorrect font for your eContent files, certain characters may appear distorted
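
To make the idea of an encoding as a character-to-number mapping concrete, here is a small Python sketch using the euro sign. The character and the encodings are chosen for illustration only.

```python
char = "€"

# The Unicode code point is the character's abstract number.
print(f"code point: U+{ord(char):04X}")

# Each encoding maps that character to its own byte value(s).
for encoding in ("cp1252", "iso-8859-15", "utf-8", "utf-16-be"):
    print(f"{encoding:12} -> {char.encode(encoding).hex()}")

# ISO-8859-1 has no euro sign at all, so encoding fails.
try:
    char.encode("iso-8859-1")
except UnicodeEncodeError:
    print("iso-8859-1 cannot represent", char)
```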

Session 1: Character Encoding

• When you hit a key on your keyboard:
  – The keyboard sends the numeric code of the key/character to the processor
  – The processor looks up the encoding of the computer to determine which character is represented by the specific numeric code
  – It then sends this data to the display device and the appropriate character appears on your screen
  – Problems can occur if somebody views what you have typed on another machine. If their machine uses a different encoding, certain characters may display incorrectly
• Why does this problem occur?
  – Different code pages use the same numeric code to represent different characters. So, for example, the numeric code for œ in one code page could be the code for © in another
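
A minimal Python sketch of the same numeric code meaning different characters. The two code pages (Windows-1252 and the original IBM PC code page 437) are picked as an example and are not named in the course.

```python
raw = bytes([0x9C])  # one stored byte, numeric code 156

as_windows = raw.decode("cp1252")  # Windows Western European
as_dos = raw.decode("cp437")       # original IBM PC code page

print(as_windows, as_dos)  # two different characters from the same byte
```

The stored data is identical in both cases; only the table used to interpret it differs.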

Session 1: Character Encoding (*continued)

Sample taken from the online version of the French newspaper, Le Monde

By changing the charset attribute from windows-1252 to shift-jis, the output changes to the following…

Session 1: Character Encoding (*continued)

Now you should really see how important character encoding is!
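
The effect of declaring the wrong charset can be reproduced in a few lines of Python. The sample sentence is invented; it is not the text from the Le Monde page.

```python
original = "Le Monde : actualité et économie à la une"

# The page is stored as windows-1252 bytes...
stored = original.encode("cp1252")

# ...but the browser is told the bytes are Shift-JIS.
# errors="replace" substitutes U+FFFD for byte sequences that are invalid in Shift-JIS.
misread = stored.decode("shift_jis", errors="replace")

print("intended:", original)
print("misread: ", misread)
print("correct: ", stored.decode("cp1252"))
```

The bytes never change; the accented letters are damaged only because they are interpreted with the wrong table.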

Session 1: Important Encoding Standards

• ASCII (American Standard Code for Information Interchange)
  – One of the earliest standard encodings used in computer systems
  – 7-bit character set = only enough encoding space for 128 characters
  – Most languages use more than 128 characters
• Unicode
  – An attempt to unify all of the existing character encoding systems into a single standard
  – Originally designed as a 16-bit encoding system; the code space has since been extended to code points U+0000 to U+10FFFF, with roughly 96,000 characters assigned at the time of writing
  – Aims to allow designers to use unique, unambiguous representations for every character in virtually every writing system in the world
  – Your eContent files should be encoded using a Unicode encoding (such as UTF-8) to ensure that all characters will display correctly
  – Work on this standard is ongoing. Find out more about Unicode at: http://www.unicode.org/
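
A short Python sketch of the limits described above: 7 bits give 128 codes, and text outside that range needs a larger encoding. The sample words and the helper name `can_encode` are illustrative assumptions.

```python
ASCII_CODES = 2 ** 7  # 7 bits give 128 possible codes

def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for sample in ["Hello", "Fáilte", "Привет", "こんにちは"]:
    print(sample, {enc: can_encode(sample, enc) for enc in ("ascii", "iso-8859-1", "utf-8")})
```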

Session 1: Important Encoding Standards (*continued)

• Other encoding standards
  – ISO-8859-1 (Most western European languages)
  – Big5 (Traditional Chinese)
  – Shift-JIS (Japanese)
• Additional information on character sets and encoding standards is available in your handouts

Session 1: Exercise

• At this stage, you should begin to create a web page, using the material provided
• You should…
  – Pay particular attention to the character set and fonts that are used in the HTML files
  – Test different options and view the different results
  – End by encoding the files in UTF-8, with suitable fonts used throughout
• Tags and attributes to experiment with include:
  – <META HTTP-EQUIV…..charset=….>
  – <META NAME="keywords" …….lang=………>
  – Any of the <Font> tags
  – You can insert the Dir attribute into one of the Paragraph tags, to specify the direction of the characters to be displayed, e.g. <P Dir="rtl"> for right-to-left
  – You can insert quotation marks by using the Q tag and the appropriate Lang attribute, e.g. <Q Lang="de">text</Q>
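
One possible way to produce such a page is sketched below in Python. The function name `build_page`, the page title and the sample text are hypothetical; the key point is that the bytes saved to disk use the same encoding that the META tag declares. The sketch does not escape special characters in its inputs.

```python
def build_page(title: str, body_html: str, lang: str = "en") -> bytes:
    """Return a minimal HTML page as UTF-8 bytes with a matching charset declaration."""
    page = (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}">\n'
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{title}</title>\n"
        "</head>\n"
        f"<body>\n{body_html}\n</body>\n"
        "</html>\n"
    )
    # The bytes written to disk must use the same encoding the meta tag declares.
    return page.encode("utf-8")

body = (
    '<p dir="rtl" lang="he">שלום עולם</p>\n'        # right-to-left paragraph
    '<p>She said <q lang="de">Guten Tag</q>.</p>'    # Q tag with a Lang attribute
)
html_bytes = build_page("i18n exercise", body)
print(html_bytes.decode("utf-8"))
```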

Session 2: Writing for an International Audience

• Make all text as clear and simple as possible, resulting in:
  – A smaller word count
  – Text that is easier and cheaper to translate (if you wish to localise it at a later stage)
• Use consistent terminology
• Basic Content Guidelines. Avoid using…
  – Slang, jargon or acronyms
  – Religious and political references
  – Humour
  – Culturally-Specific examples

Session 2: Writing for an International Audience (*continued)

• Some Technical Guidelines. Avoid using…
  – Very long sentences
  – The passive voice
  – Long noun strings
  – Too many synonyms
  – A telegraphic style of writing
• Note: These guidelines should be followed when your original eContent is being authored!

Session 2: Formatting Conventions

Almost every country in the world has specific conventions for dealing with the format of:

• Numbers
  – Numeric Separators: 1,234.56 versus 1.234,56 or 1 234,56
  – Numerals Used: Arabic, Roman…
  – Expressions: A billion = 1,000,000,000 or 1,000,000,000,000
  – Lucky/Unlucky Numbers
• Time
  – Time Separators: 11:30:40 versus 11:30.40
  – 12 hour versus 24 hour: 10:15 versus 22:15 - inclusion of am/pm?
  – Time Differences: Your web site automatically displays the current time, but for where?
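
The numeric separator conventions for numbers can be sketched in Python as follows. The function `format_number` and the `CONVENTIONS` table are hypothetical helpers written for this example; a production site would more likely rely on a locale library.

```python
def format_number(value: float, group_sep: str, decimal_sep: str) -> str:
    """Format `value` with two decimals using the given separators."""
    # Start from Python's built-in "1,234.56" style, then swap the separators.
    us_style = f"{value:,.2f}"
    integer_part, fraction = us_style.split(".")
    return integer_part.replace(",", group_sep) + decimal_sep + fraction

# Hypothetical convention table: (grouping separator, decimal separator).
CONVENTIONS = {
    "comma/point": (",", "."),
    "point/comma": (".", ","),
    "space/comma": (" ", ","),
}

for name, (group_sep, decimal_sep) in CONVENTIONS.items():
    print(f"{name:12} {format_number(1234.56, group_sep, decimal_sep)}")
```

Keeping the separators in data rather than in the page text means a new market only needs a new table entry.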

Session 2: Formatting Conventions (*continued)

• Date
  – Date Order: 12/10/03 = October 12, 2003 or December 10, 2003 or even October 3, 2012
  – Date Separator: 1/12/03 versus 1.12.03
  – Calendar Used: Gregorian, Lunar, Imperial…
  – Solution: Avoid abbreviations & write the full date
• Currency
  – Currency Separators: The same as numeric separators
  – Currency Symbols: €, $ and so forth. Displayed before or after the numeric digits? € 12.50 versus 12.50 €
  – Currency Conversion: May be necessary if dealing with multiple locales
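
The date-order ambiguity can be demonstrated by parsing the same string under three field orders. This is a sketch; the unambiguous YYYY-MM-DD output is an option added here alongside the slide's advice to write the date in full.

```python
from datetime import datetime

AMBIGUOUS = "12/10/03"

# Three common field orders applied to the same string.
ORDERS = {
    "day/month/year": "%d/%m/%y",
    "month/day/year": "%m/%d/%y",
    "year/month/day": "%y/%m/%d",
}

readings = {
    name: datetime.strptime(AMBIGUOUS, fmt).date()
    for name, fmt in ORDERS.items()
}

for name, value in readings.items():
    # isoformat() gives the unambiguous YYYY-MM-DD form.
    print(f"{name:15} -> {value.isoformat()}")
```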

Session 2: Formatting Conventions (*continued)

• Units of Measurement
  – Temperatures, degrees in Fahrenheit or Celsius?
  – Distance, measured in miles or kilometres?
  – Size of clothes, shoes, paper…
• Address Formats
  – Address Order: Name of Addressee, Company, Street, Town, City, State, Zip Code and Country. Always in this order?
  – Codes: Zip code versus Postal code (or maybe no code at all!)
• Telephone/Fax Numbers and so forth
  – Give international numbers or country dialling codes, e.g. +353 61 202881
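
For units of measurement, one simple approach is to show both systems side by side. The Python sketch below assumes this dual-display design; the function names and sample values are illustrative.

```python
KM_PER_MILE = 1.609344  # exact, by international definition of the mile

def fahrenheit_to_celsius(deg_f: float) -> float:
    return (deg_f - 32) * 5 / 9

def miles_to_km(miles: float) -> float:
    return miles * KM_PER_MILE

def describe(deg_f: float, miles: float) -> str:
    """Show both unit systems side by side so no reader has to convert."""
    return (
        f"{deg_f:.0f} °F ({fahrenheit_to_celsius(deg_f):.0f} °C), "
        f"{miles:.0f} miles ({miles_to_km(miles):.0f} km)"
    )

print(describe(68, 26))
```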

Session 2: Graphics

• If possible, avoid using graphics with embedded text
• If you do add text, create layered graphics
  – Use a graphics program like Adobe Photoshop for example
  – Allow space for text expansion
  – Keep a copy of each layer to facilitate localisation (you will only have to take a single layer and translate the text contained in it)
• Use culturally-neutral graphics
  – Avoid using flags, stars, body parts or animals
  – Can you imagine why?

Session 2: Graphics (*continued)

• Some images/icons make sense in one culture, but not in another
• Take the image on the top right for example
  – A classic American mailbox representing email
  – Is this icon suitable for all cultures?
  – Would the icon below be more universally recognised?
• Can you think of other icons or images that could cause problems for an international audience?

Session 2: Other Cultural Issues

Colours have very different meanings around the world…

Colour   US         Spain                         France       Japan                  China
Red      Danger     Power, Passion, Fire, Danger  Aristocracy  Anger, Danger          Happiness
Green    Safety     Hope                          Criminality  Future, Youth, Energy  Ming Dynasty, Heaven
Yellow   Cowardice  Bad Luck                      Transience   Grace, Nobility        Birth, Wealth, Power
White    Purity     Purity                        Neutrality   Death                  Death, Purity

(Colour names are inferred from the associations listed; the original slide showed colour swatches.)

Session 2: Other Cultural Issues (*continued)

• Symbols
  – The swastika, for example, is viewed differently in different locales
  – An ancient symbol of good luck, prosperity and long life, used in cultures such as India and China
  – Associated with National Socialism in Europe
• Gestures
  – Remember the web page we saw earlier. It showed a photograph of a handshake
  – Is a handshake used universally as a gesture of greeting?
• Representations of People
  – Be aware of issues such as gender, race and dress code
  – Is it appropriate to have a woman wearing a short skirt, photographed in the workplace for example?

Session 2: Exercise

• At this stage, take the web pages from earlier and edit/remove/replace anything that you feel could be culturally-offensive

• You can also edit the text, to make it clearer or more internationally-aware

Session 3: File Formats

• Choice of file formats – SGML, HTML, XML…
• What tagging language should be used and why?
• XML – eXtensible Markup Language
  – Recommended for international web sites
  – Powerful in many areas: single-source, multilingual storage, structured data, rendering
  – Allows you to separate eContent (text) from program code. Therefore, facilitates translation
  – Supports Unicode (and many other encodings)
  – Offers support for industry standards (TMX, XLIFF)
  – Allows you to use CSS and XSL to render web pages
  – Offers an xml:lang attribute
  – Supports ID-based leveraging (reuse of existing translations matched by element ID)
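
As a rough illustration of multilingual storage with xml:lang, the Python sketch below parses a small content file and looks up one string by ID and language. The element names, IDs and the English fallback rule are invented for this example and are not taken from the course files.

```python
import xml.etree.ElementTree as ET

# Hypothetical content file: element names and ids are invented for illustration.
CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<messages>
  <message id="greeting">
    <text xml:lang="en">Welcome</text>
    <text xml:lang="fr">Bienvenue</text>
    <text xml:lang="de">Willkommen</text>
  </message>
</messages>
"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def lookup(root: ET.Element, message_id: str, lang: str) -> str:
    """Return the text of `message_id` in `lang`, falling back to English."""
    message = root.find(f"./message[@id='{message_id}']")
    by_lang = {t.get(XML_LANG): t.text for t in message.findall("text")}
    return by_lang.get(lang, by_lang["en"])

root = ET.fromstring(CONTENT.encode("utf-8"))
print(lookup(root, "greeting", "fr"))
```

Because the text lives in the XML file and is addressed by ID, the program code does not change when a language is added.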

Session 3: File Formats (*continued)

Extract from an XML file

Session 3: File Formats (*continued)

• XLIFF – XML Localisation Interchange File Format
• A standard developed by the OASIS consortium, specifically for localisation
  – Interchange of localisable data and its related information, without any loss
  – Tool-neutral
  – As it is based on XML, it offers all of the advantages of XML
• XLIFF documents can capture anything needed for a localisation project:
  – Localisable objects (e.g. text strings) in source and target languages
  – Supplementary information (e.g. glossaries or material to recreate the original format)
  – Administrative information (e.g. workflow data)
  – Custom data (e.g. initialisation information for tools)
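
To show what the source and target strings look like in practice, here is a hand-written XLIFF 1.2 fragment read with Python's standard XML parser. The file name, IDs and strings are invented, and the helper `translation_pairs` is a sketch rather than part of any XLIFF tool.

```python
import xml.etree.ElementTree as ET

# A hand-written XLIFF 1.2 fragment; the file name, ids and strings are invented.
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="index.html" source-language="en" target-language="fr" datatype="html">
    <body>
      <trans-unit id="btn_submit">
        <source>Submit</source>
        <target>Soumettre</target>
      </trans-unit>
      <trans-unit id="lbl_name">
        <source>Name</source>
        <target>Nom</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def translation_pairs(xliff_text: str) -> dict:
    """Map each trans-unit id to its (source, target) strings."""
    root = ET.fromstring(xliff_text)
    pairs = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.findtext("x:source", default="", namespaces=NS)
        target = unit.findtext("x:target", default="", namespaces=NS)
        pairs[unit.get("id")] = (source, target)
    return pairs

print(translation_pairs(XLIFF))
```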

Session 3: Typical i18n Problems - Web Forms

• A potential problem area!
• Design forms that:
  – Handle multi-lingual data
  – Accept multi-directional data
  – Deal with text expansion
  – Accommodate different formatting conventions (date, currency etc.)
  – Meet international standards
• Be careful when specifying mandatory fields
  – Zip/Postal Codes

Session 3: Typical i18n Problems - Web Forms (*continued)

Web form taken from www.ryanair.com

Session 3: Typical i18n Problems - Text Expansion

• In web forms & other parts of your web site, remember to make allowances for possible text expansion
• How?
  – Leave sufficient white space in the original version

Language   Percentage Expansion (when translated from English)
French     20%
German     25%
Italian    20%
Chinese    30 – 40%

Session 3: Typical i18n Problems - Text Expansion (*continued)

• If text expansion is not considered during i18n, it may result in truncation during l10n, i.e. text will be clipped
• Example – a button to submit a web form
  – “Submit” in English = “Soumettre” in French
• Short passages of text have the potential to create even bigger problems!
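
The button example can be worked through numerically. The Python sketch below measures expansion in characters, which is only a rough stand-in for on-screen width; the helper names and the 25% allowance are assumptions.

```python
import math

def expansion_percent(source: str, target: str) -> float:
    """Growth of `target` over `source`, in percent of the source length."""
    return (len(target) - len(source)) / len(source) * 100

def padded_width(source: str, allowance_percent: float) -> int:
    """Characters to reserve for `source` plus a chosen expansion allowance."""
    return math.ceil(len(source) * (1 + allowance_percent / 100))

print(expansion_percent("Submit", "Soumettre"))  # growth of the button label
print(padded_width("Submit", 25))                # room for a 25% allowance
print(len("Soumettre") <= padded_width("Submit", 25))
```

Here the six-letter label grows by half, so an allowance based on an average figure would still clip the French text. This is why short strings need more generous space than long passages.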

Session 3: Typical i18n Problems - Technology

• Levels of technology vary greatly from country to country
• Your content should be technically accessible to as wide an audience as possible
• Try to avoid creating content that is difficult to download or that requires large bandwidth
  – Flash presentations
  – Large graphics files

Session 3: Typical i18n Problems – Legal Issues

• Very specialised area – consult an expert!
• Essential that you comply with international legislation
  – Copyright regulations
  – Domain names
  – Data protection/Privacy
  – Contracts
  – Payment regulations
  – Customer services
• Examples:
  – Yahoo France – auction of Nazi materials
  – Ryanair Germany – comparison of prices with Lufthansa

Developing an i18n Checklist

We have now reached the end of the Internationalisation course. To recap on what we have learned today, we will go through a basic i18n checklist.

– Can the web site handle multiple languages, scripts and writing directions?
– Is the web site encoded in Unicode?
– Is all of the text in the web site suitable for an international audience?
– Is consistent terminology used throughout?
– Are all graphics used culturally-neutral?
– If any of the graphics contain text, is this text accessible (via layers)?
– Are dates, numbers and other formats displayed in an internationally-friendly manner?
– Are appropriate colours, gestures and symbols used in the web site?

Developing an i18n Checklist (*continued)

– Has XML and/or XLIFF been used to separate user interface text from the actual code of the web site?
– Are all web forms used in the site capable of accepting international data?
– Is the web site designed with space for text expansion?
– Have the technical limitations of the international market been taken into account?
– Does the design of the site meet international legal requirements?

Session 3: Exercise

• Unfortunately this course is relatively short. Therefore, we will not deal with converting our files to XML

• Design a web form using the instructions provided. Ensure that it will be capable of handling international data (remember to think about text expansion!)

• Make a note of any problems that you can still see with your files (technical, legal…)

End of Day 1