Open-Source Approaches to Unicode Enablement
Panel Discussion
16th International Unicode Conference Amsterdam, the Netherlands, March 2000
C14, C15: Panel on Open-Source Approaches to Unicode Enablement
Agenda
Panel Introductions
Library Descriptions and Demos
What is Open Source?
What is the Open Source experience?
Q and A
Today’s Panel
Arnt Gulbrandsen
Bob Verbrugge
Frank Tang
Helena Shih
Mark Leisher
Steven Loomis
Steven Watt
Tex Texin
Yves Arrouye
Library Descriptions and Demos
Troll: Qt Free Edition
CRL: Assorted Unicode Support
Mozilla: International Library of Mozilla
IBM: International Components for Unicode
Troll’s Qt Free Edition
Arnt Gulbrandsen
Troll Tech
CRL’s Unicode Support
Mark Leisher
Computing Research Laboratory
New Mexico State University
CRL’s Unicode Support
Goal: Provide example resources usable on Unix.
Fonts.
Encoding mapping tables.
Unicode character information.
Algorithms.
Other resources.
Resource availability.
CRL’s Unicode Support
Fonts.
Three bitmap fonts in BDF format were developed and made available.
Arabic
Devanagari
ClearlyU
CRL’s Unicode Support
Encoding mapping tables.
The Unicode Consortium provides mapping tables for converting many of the more common character sets to Unicode. The CSets archive provides supplementary mapping tables for character sets and encodings that are not supplied by the Unicode Consortium.
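As a rough illustration of how such mapping tables are typically consumed, the sketch below parses text in the layout used by the Unicode Consortium's published tables (a hex byte value, a hex Unicode value, then a `#` comment) and uses it to decode bytes. The three sample entries are from ISO 8859-2. The function names `parse_mapping_table` and `decode` are hypothetical and are not part of the CSets archive.

```python
# Sample entries in the column layout of the Unicode Consortium's tables.
SAMPLE_TABLE = """\
# byte\tUnicode\tname
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0xA1\t0x0104\t# LATIN CAPITAL LETTER A WITH OGONEK
0xA3\t0x0141\t# LATIN CAPITAL LETTER L WITH STROKE
"""


def parse_mapping_table(text):
    """Return a dict mapping byte values to Unicode code points."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode assignment
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def decode(data, table, replacement="\ufffd"):
    """Decode single-byte data, substituting U+FFFD for unmapped bytes."""
    return "".join(chr(table[b]) if b in table else replacement for b in data)


table = parse_mapping_table(SAMPLE_TABLE)
decoded = decode(b"\x41\xa1\xa3\xff", table)
```

This handles single-byte character sets only; multi-byte encodings need a stateful decoder on top of the same table idea.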
CRL’s Unicode Support
Unicode character information.
To facilitate development of Unicode-capable software, a simple API and library for character information and partial bidirectional reordering were developed early on, before standardization efforts really gained momentum. These are the UCData package and the Pretty Good Bidi Algorithm.
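To give a sense of the kind of per-character information such a library exposes, here is a minimal sketch using Python's standard `unicodedata` module as a stand-in. It is not the UCData API; the `describe` helper is hypothetical, and the property values come from whatever Unicode version the Python runtime ships.

```python
import unicodedata


def describe(ch):
    """Collect a few Unicode Character Database properties for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),          # general category, e.g. Lu
        "bidi_class": unicodedata.bidirectional(ch),   # input to bidi reordering
        "combining_class": unicodedata.combining(ch),  # 0 for non-combining marks
    }


# Latin letter, Hebrew letter, Arabic-Indic digit, combining acute accent
for ch in "A\u05d0\u0661\u0301":
    print(describe(ch))
```

The bidirectional class is the property a reordering algorithm consults first: strong left-to-right (`L`), strong right-to-left (`R`), Arabic numbers (`AN`), non-spacing marks (`NSM`), and so on.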
CRL’s Unicode Support
Algorithms.
To further encourage independent development of Unicode-capable software, a few basic text search algorithms were converted to use Unicode text. These include:
A Boyer-Moore string search routine.
A glob matching routine called Wildmat.
An almost minimal DFA regular expression routine.
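The main adjustment when moving a Boyer-Moore style search to Unicode text is that the bad-character shift table can no longer be a 256-entry array. The sketch below shows one way to handle that, using the Horspool simplification of Boyer-Moore with a sparse table keyed by code point. It is an illustration only, not CRL's routine, and the name `horspool_search` is hypothetical.

```python
def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Sparse bad-character table: only code points that occur in the pattern
    # (excluding its last position) get an entry. Any other code point
    # shifts the window by the full pattern length.
    shift = {ch: m - i - 1 for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift according to the text character under the pattern's last slot.
        pos += shift.get(text[pos + m - 1], m)
    return -1


index = horspool_search("日本語のテキスト検索", "テキスト")
```

Python strings are indexed by code point, so supplementary characters need no special handling here; an implementation over UTF-16 code units would have to decide how to treat surrogate pairs.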
CRL’s Unicode Support
Other resources.
Some of the other resources made available by CRL are:
Code to test wchar_t type support in C/C++ compilers.
Keyboard arrangements for various languages that have been collected over the years.
Resource Availability.
All of the resources mentioned are freeware and can be found at http://crl.nmsu.edu/~mleisher/.
International Library for Mozilla
Frank Tang
Netscape Communications
Mozilla
International Components for Unicode (ICU)
Helena Shih and Steven Loomis
IBM Unicode Technology Center
Unicode support in the Industry
Lack of a complete set of features in most implementations.
Inconsistent across different environments. Win32 vs. POSIX, for example.
Poor portability.
Unable to share the resources with other products.
Almost no extensibility and customization.
Not a concern for most companies when a product is first designed.
[Diagram: ICU shown with AS/400 e-Server 720, Netfinity Server, S/390 Server, Apple G3 Macintosh, Microsoft NT Workstation, Sun Ultra 60 Workstation, IBM’s DB/2 Product, and the World Wide Web]
ICU Objectives
Quality Unicode & I18N support across platforms
Consistent results in both C/C++ and Java
Powerful, portable API available to the Open-Source development community
Important resources sharing mechanism
Outside feedback & contributions improve quality and feature set
ICU Features
Parallel to the i18n architecture in JDK
All components multi-thread safe
Full Unicode string manipulation
Complete locale support, e.g. > 145 locales
Fast and flexible character set conversion
Efficient data loading mechanism
Hierarchical resource bundles with Unicode data
Extensive calendar and timezone support
Date, time, currency, number and message formatting
ICU Features
Locale sensitive sorting (including Thai)
Locale sensitive text boundary detection
Customizable transliteration interface
Unicode text compression algorithm
Fast and compliant Unicode 3.0 Bidi algorithm
Unicode 3.0 normalization support
Most up-to-date Unicode 3.0 character properties
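To illustrate why normalization support matters, the following minimal sketch shows two canonically equivalent spellings of "é" that compare unequal until they are normalized. It uses Python's standard `unicodedata` module rather than ICU's API, and it follows the Unicode version bundled with the Python runtime, not Unicode 3.0 specifically.

```python
import unicodedata

composed = "\u00e9"     # é as one precomposed code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# The two strings render identically but differ code point by code point.
raw_equal = composed == decomposed

# NFC composes where possible; NFD fully decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

print(raw_equal, nfc == composed, nfd == decomposed)
```

Normalizing both sides to the same form before comparing, searching, or sorting is what makes the two spellings interchangeable.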
Platform Support
Reference Platforms:
– AIX
– OS/390
– AS/400
– RedHat Linux
– Solaris
– Windows 98, NT4.0 and Win2000
– HP-UX
Working Partners: Sun, IBM, NCR, Xerox, Netscape, Progress, RealNames, Versant, Compuware, GlobalSight, Hotmail, Lotus ...
ICU Documentation
API Documentation
– Updated from header files (like javadoc)
– Available on external web site
User Guide
– Work in progress, feedback welcome
– Initial draft available
ICU4J - ICU for Java
IBM developed extensive I18N library
I18N code added to Java JDK 1.1
Java code ported to C++ -> ICU
ICU available on alphaWorks
Both ICU and Java classes continue development
– Sometimes “leapfrogging” each other with features
ICU open source, moves to developerWorks
2000 March: Java Code open source as “ICU4J”
ICU4J Features
Builds on Java 2 feature set
Feature summary:
– Advanced text boundary detection
– Calendars: Hebrew, Hijri/Islamic, Japanese Gengou, Thai Buddhist
– Spelled-out numbers
– Normalization
– Transliteration
– Standard Unicode compression
Reference Information
ICU Web Sites
– http://oss.software.ibm.com/icu/
developerWorks Unicode site
– http://www.ibm.com/developer/unicode/
The Unicode Standard
– http://www.unicode.org/
developerWorks Java site
– http://www.ibm.com/developer/java/
Demos
Locale Explorer
xliterate-It!
Qt Demo
Agenda
Panel Introductions
Library Descriptions and Demos
What is Open Source?
What is the Open Source experience?
Q and A
ICU OpenSource Objectives
Promotes a cross-platform Unicode strategy
Produces a Unicode technology implementation
Supports important OpenSource products
Linux, Apache, Mozilla, XML etc.
Open-Source Models
The Apache model
– Web access for CVS repository
– Technical committees
Developer community support
– [email protected] support account
– news.alphaworks.ibm.com discussion newsgroup
Commercial product partnership
– RealNames, Versant, GE ...
Open-Source Models
The Troll Tech model
– Free and Professional Editions
– Distinguish private, open source use from commercial, closed source use
– All contributions accepted and used in both versions
– Source updated daily
Why contribute to Open Source?
Bob Verbrugge:
– Requires robust I18n and portability
– Implementing alone, cost is considerable
– Sharing development is cost effective
– Shared knowledge with experts
– Ability to influence the end-result
Why contribute to Open Source?
Steve Watt:
– Requires portability and interoperability
– Upgrading existing library to Unicode version 3.0 is a sizable effort
– Commercial libraries did not meet our needs
– Shared effort means our development focus is now aligned with our needs
Why contribute to Open Source?
Steve Watt’s concerns:
– Giving away proprietary technology
– Design by committee
– Will release schedules fit product schedules?
– Will library and product stay in synch?
– Do all participants have common objectives?
Why contribute to Open Source?
Yves Arrouye:
– Share expertise, give something
– Benefits from features developed by others
• Normalization, optimized algorithms
• Character set conversions
– Access to source code
– Using multiple Open Source products
Why contribute to Open Source?
Yves Arrouye’s concerns:
– Management Perceptions
“If it’s free, it must be for play…”
– Entry requirements and qualifications to be able to affect direction or design
– Patch integration, Release control and schedules
– Build stability
Agenda
Panel Introductions
Library Descriptions and Demos
What is Open Source?
What is the Open Source experience?
Q and A