Creating Interfaces: Localization
Language & other issues
character codes
Homework: preparation for future topics
Finish presentations
• Everyone post constructive comments on at least 2 other projects.
• (Note: catch up on other postings.)
Many, interconnected issues
• Create web site for use in several specific 'local' places.
• Create multiple web sites, each for use in specific place.
– in an efficient, effective manner so any underlying common content does not need to be duplicated (and commonality diluted).
• Develop tools (networking s/w, standards, etc.) that promote Web as "global, interoperable tool of communication"
– www.w3c.org
Localization
• not just language
– language is not just character code
– UCS (Universal Character Set) and Unicode, many, many related standards to address encoding issues.
• dates
– local date and also way to express 'western' date
• time
• money
• position on and flow across page
• acceptable images, photography, icons
• ?
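To make the date point concrete, here is a minimal sketch that renders one calendar date in several local conventions. The DATE_PATTERNS table and the format_date helper are hypothetical names made up for this illustration; a real site would normally take such patterns from a locale database rather than hard-code them.

```python
from datetime import date

# Hypothetical lookup table, for illustration only.
DATE_PATTERNS = {
    "en-US": "{m:02d}/{d:02d}/{y}",  # month first
    "en-GB": "{d:02d}/{m:02d}/{y}",  # day first
    "de-DE": "{d:02d}.{m:02d}.{y}",  # day first, dots as separators
    "ja-JP": "{y}年{m}月{d}日",       # year, month, day
}


def format_date(day: date, locale_tag: str) -> str:
    """Render the same calendar date using the pattern for one locale."""
    pattern = DATE_PATTERNS[locale_tag]
    return pattern.format(y=day.year, m=day.month, d=day.day)


when = date(2024, 3, 4)
for tag in DATE_PATTERNS:
    print(tag, format_date(when, tag))
```

Note that the US and UK renderings of this date use the same digits in a different order, so a reader cannot tell which is meant without knowing the locale.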
Character code
• Note: European languages plus several other 'small' alphabets easily handled.
• We/I (typical monolingual American) can hardly appreciate the challenge:
– two Chinese (hanzi) character sets: simplified (mainland China) and traditional (Taiwan + most of the Chinese diaspora)
– 'ruby': small annotation symbols (e.g. pronunciation guides) placed 'over' ideographs
http://www.cs.tut.fi/~jkorpela/chars.html#code
character repertoire: A set of distinct characters.
character code: A mapping, often presented in tabular form, which defines a one-to-one correspondence between characters in a character repertoire and a set of nonnegative integers.
character encoding: A method (algorithm) for presenting characters in digital form by mapping sequences of code numbers of characters into sequences of octets. In the simplest case, each character is mapped to an integer in the range 0 - 255 according to a character code and these are used as such as octets. Naturally, this only works for character repertoires with at most 256 characters. For larger sets, more complicated encodings are needed. Encodings have names, which can be registered.
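The difference between a character code and a character encoding can be sketched in a few lines of Python. The helper names code_number and octets are hypothetical, chosen for this illustration; ord and str.encode are the standard built-ins.

```python
from typing import List, Optional


def code_number(ch: str) -> int:
    """Character code: the nonnegative integer assigned to one character."""
    return ord(ch)


def octets(ch: str, encoding: str) -> Optional[List[int]]:
    """Character encoding: the octets for one character, or None if the
    encoding's repertoire does not contain the character."""
    try:
        return list(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in "Aé€":
    print(ch, code_number(ch), octets(ch, "iso-8859-1"), octets(ch, "utf-8"))
```

The code number of a character stays the same while the octets change with the encoding. The euro sign has no octets at all in ISO 8859-1, because it lies outside that 256-character repertoire.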
charset
Using the terms just defined, the charset value in an HTML meta tag names an encoding, not a character set:
<meta http-equiv="Content-Type" content= "text/html;charset=utf-8" />
<meta http-equiv="Content-Type" content= "text/html;charset=ISO-8859-1" />
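A short sketch of why the declared charset matters: the same octets read back differently depending on which encoding the reader assumes. The decode_as helper and the sample word are assumptions made for this illustration; they stand in for a browser interpreting a page according to its meta tag.

```python
def decode_as(raw: bytes, declared_charset: str) -> str:
    """Interpret octets the way a reader would, given a declared charset."""
    return raw.decode(declared_charset)


# The page author saved the text as UTF-8.
raw = "café".encode("utf-8")

print(decode_as(raw, "utf-8"))       # declaration matches the octets
print(decode_as(raw, "iso-8859-1"))  # mismatch: the accented letter is garbled
```

With the mismatched declaration the two octets of the accented letter are shown as two separate Latin 1 characters, which is the typical symptom of a wrong charset.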
Language
• Attribute of html tag
<html lang="en-us">
MAY be used by browsers (spell-check, hyphenation, speech synthesizers), search engines, other tools.
See two-letter codes:
www.w3c.org/WAI/ER/IG/ert/iso639.htm
… more
• A glyph is a presentation of a particular shape which a character may have when rendered or displayed.
– speak of same glyph in italic, bold, etc.
• A repertoire of glyphs comprises a font. In a more technical sense, as the implementation of a font, a font is a numbered set of glyphs. The numbers correspond to code positions of the characters (presented by the glyphs). Thus, a font in that sense is character code dependent. An expression like "Unicode font" refers to such issues and does not imply that the font contains glyphs for all Unicode characters.
Examples
• ASCII is a character repertoire, code and encoding. Note: there is frequent confusion about 7- vs 8-bit ASCII; ASCII proper is a 7-bit code.
• ISO Latin 1 alias ISO 8859-1 standard defines a repertoire, code and encoding of which ASCII is a subset. ISO 8859 is a family of many encodings, indicated by the –n. ISO 8859-5 handles Cyrillic.
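As a rough illustration of these examples, the sketch below decodes the same octets under ASCII, ISO 8859-1 and ISO 8859-5. The decode_octet helper is a hypothetical name for this illustration; the codec names are the ones Python's standard library accepts.

```python
from typing import Optional


def decode_octet(value: int, encoding: str) -> Optional[str]:
    """Decode a single octet, or return None if the encoding rejects it."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None


# ASCII is a subset: octets below 128 mean the same thing in all three.
for enc in ("ascii", "iso-8859-1", "iso-8859-5"):
    print(enc, decode_octet(0x41, enc))

# An octet above 127 is outside ASCII and differs between the 8859-n parts.
print(decode_octet(0xE9, "ascii"))       # None: not a 7-bit value
print(decode_octet(0xE9, "iso-8859-1"))  # Latin small letter e with acute
print(decode_octet(0xE9, "iso-8859-5"))  # a Cyrillic small letter
```

The upper half of the octet range is where the members of the ISO 8859 family disagree, which is why a document must say which part it uses.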
Unicode … provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. This is the goal.
The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646.
It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.
Note
• Unicode goal is universal coverage…
• Unicode is product of a consortium of 'mostly US companies'.
• Some controversy in its treatment of things
– Combining certain kanji characters (Han unification)
Unicode consortium
• Go to http://www.unicode.org/unicode/standard/WhatIsUnicode.html
• Examine the Translations on the left. See which languages' characters do not appear on your computer.
– Select one and
– Go to Display Problems and see if you can fix it.
XML progress
• XML 1.0 to XML 1.1
• Issue: complaint that new standard had features to suit IBM
• The IBM-specific problem that XML 1.1 aims to fix has to do with a special character that designates to IBM mainframe systems the end of a line of text. XML 1.0 chokes on that character, but version 1.1 would recognize it.
– ZDNet News: http://zdnet.com.com/2100-1104-962392.html
Techniques
• One web site / screen provides options to go to different pages
– use symbols/icons that are meaningful to audience
• tricky. Flags may not be appropriate.
– use images containing text in the specific language
– risky choice: hope that computer/platform/browser has character encoding and font to display language
– poor choice: use English word for other language.
http://www.lionbridge.com/ Example of company/site supporting 'global reach'.
quiz
What is the word in that language for
– Spanish
– Chinese (Mandarin? Hainese?)
– Korean
– Japanese
– Hebrew
– Russian
– French
– Finnish
– Arabic (Classical?, ?)
– Hindi (Urdu?, ?)
What is the direction of text? What is the format for dates? Time? Money? Relevant cultural issues?
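For the direction-of-text question, one rough way to check a sample is to look at the Unicode bidirectional class of its characters. The sketch below is a simplified "first strong character" heuristic, not the full Unicode Bidirectional Algorithm, and it says nothing about vertical text. The base_direction helper and the sample words (native names of a few of the languages above) are assumptions made for this illustration.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess 'ltr' or 'rtl' from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # right-to-left letters (Hebrew, Arabic)
            return "rtl"
        if bidi_class == "L":          # left-to-right letters
            return "ltr"
    return "ltr"  # assumed default when no strong character is found


for sample in ("English", "עברית", "العربية", "日本語", "русский"):
    print(sample, base_direction(sample))
```

A page could use such a guess to set the dir attribute on an element, though declaring the direction explicitly for each localized version is the safer choice.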
Homework
• Next: Accessibility discussion, exercises
• Prepare
– download Instant Saxon: standalone translator for xml and xslt.
– download Nokia Mobile Internet Toolkit. Need to register (no costs).
– register with studio.tellme.com