Internationalization
Bob Alcorn, Blackboard Inc.
And now a word from our lawyers…
Any statements in this presentation about future expectations, plans and prospects for Blackboard and other statements containing the words "believes," "anticipates," "plans," "expects," "will," and similar expressions, constitute forward-looking statements within the meaning of The Private Securities Litigation Reform Act of 1995. Actual results may differ materially from those indicated by such forward-looking statements as a result of various important factors, including the factors discussed in the "Risk Factors" section of our most recent 10-K filed with the SEC. In addition, the forward-looking statements included in this press release represent the Company's views as of April 11, 2005. The Company anticipates that subsequent events and developments will cause the Company's views to change. However, while the Company may elect to update these forward-looking statements at some point in the future, the Company specifically disclaims any obligation to do so. These forward-looking statements should not be relied upon as representing the Company's views as of any date subsequent to April 11, 2005. Blackboard, in its sole discretion, may delay or cancel the release of any product or functionality described in this presentation.
Internationalization Overview
• Internationalization (i18n) vs. Localization (l10n)
• Character sets, Character Encoding
  – Unicode, ISO
  – Pitfalls
• Blackboard Learning System™ Application Pack 2 Features
• Looking Ahead
I18N vs L10N
• I18N is the process of building the infrastructure to support multiple locales
  – String extraction
  – Data formatting
  – Character set encoding
• L10N is the process of enabling a locale
  – Providing resource bundles
  – Dependent on depth of i18n
Concepts
• Characters – the smallest components of written language that have semantic value
• Glyphs – the shapes that characters can have when they are rendered or displayed
  – Not a one-to-one correspondence with characters, e.g., different fonts, ligatures
  – Not what we care about vis-à-vis I18N…
Concepts
• Character Sets
  – Collection of characters used to express a given written language, each assigned a numeric value
  – Kinda sorta language specific
• Character Set Encoding
  – Binary encoding for the numeric values in a character set
  – Sometimes used interchangeably with character set…
    • E.g., MIME type “text/html;charset=iso-8859-1”… the “character set” is ISO-8859-1.
    • For our intents, we can treat them as synonymous.
Concepts
• Universal Character Set (UCS)
  – Unambiguous numeric value for every character in every language (more or less)
  – UCS can be unambiguously encoded with…
    • UTF-16
    • UCS-2 (subset of UTF-16)
    • UTF-8
    • Requires multi-byte encoding
Concepts
• Unambiguous encoding enables “co-existence” of several different languages in a single data stream
  – Single-byte encodings require a “marker” to know that any given value above 127 (e.g., 232) is to be interpreted as a different character
  – Example: you couldn’t directly encode Hebrew and Cyrillic (with ISO-8859-8 and ISO-8859-5, respectively) in the same database field.
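The ambiguity is easy to see from Java. The following is a minimal sketch, not part of the original deck: it decodes the single byte 0xE8 under three ISO 8859 parts (Latin-1, Cyrillic, and Greek, chosen because those charsets are commonly bundled with the JDK). The class and method names are illustrative, and it assumes the runtime provides the ISO-8859-5 and ISO-8859-7 charsets.

```java
import java.nio.charset.Charset;

class SingleByteAmbiguity {
    // Decode one raw byte under the named single-byte charset.
    static String decode(byte value, String charsetName) {
        return new String(new byte[] { value }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xE8;
        // The same byte maps to a different character in each part of ISO 8859.
        for (String name : new String[] { "ISO-8859-1", "ISO-8859-5", "ISO-8859-7" }) {
            String decoded = decode(raw, name);
            System.out.printf("0xE8 as %s -> U+%04X%n", name, (int) decoded.charAt(0));
        }
    }
}
```

Without an out-of-band marker naming the encoding, nothing in the byte itself says which of the three characters was meant.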
Concepts
• The “ISO 8859 Planes”
  – Set of single-byte encodings (256 UCS characters per encoding)
  – ISO-8859-X, where X = 1 . . 16 (part 12 was never published)
  – Superset of 7-bit US-ASCII (values 0-127 are identical)
• Windows encoding is NOT ISO-8859-1
  – It is CP1252: similar, except that CP1252 assigns printable characters to the range that ISO-8859-1 reserves for control characters (0x80-0x9F)
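A small sketch of the CP1252 difference, assuming the JDK charset name "windows-1252" (the class and helper names are made up for illustration). It decodes a few byte values both ways and prints the resulting code points.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252VersusLatin1 {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode a single byte value (0-255) and return the resulting code point.
    static int codePoint(int rawByte, Charset charset) {
        return new String(new byte[] { (byte) rawByte }, charset).codePointAt(0);
    }

    public static void main(String[] args) {
        // 0x80 and 0x93 fall in the range where the two encodings disagree;
        // 0xE8 is outside it and decodes identically.
        for (int raw : new int[] { 0x80, 0x93, 0xE8 }) {
            System.out.printf("0x%02X: ISO-8859-1 -> U+%04X, CP1252 -> U+%04X%n",
                    raw,
                    codePoint(raw, StandardCharsets.ISO_8859_1),
                    codePoint(raw, CP1252));
        }
    }
}
```

This is why text pasted from Windows applications (curly quotes, the euro sign) tends to break when a page is labeled ISO-8859-1.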
The Pipeline
• Browser: may see arbitrary text streams from servers. Posts in same encoding.
• Application Server: internally Unicode. Handles translation from browser to database.
• Database Server: dependent on database storage type (char, nchar) and various settings.
The Browser
The Application Server
• Internally Unicode (Java uses UTF-16 internally)
• Java I/O APIs take encoding into consideration
  – Specifically java.io.Reader and java.io.Writer
  – Map bytes from HTTP input stream to characters
The Database
• CHAR vs. NCHAR
  – Single vs. multi-byte
  – CHAR can still be “internationalized” with different collations
What’s Wrong Here?
Client:
String value = "problème";
socketStream.write( value.getBytes() );
Server:
byte[] byteBuf = new byte[1024];
int count = socketStream.read( byteBuf );
String value = new String( byteBuf, 0, count );
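The problem in this snippet is that both getBytes() and new String(byte[], int, int) fall back to the platform default encoding, so client and server only agree by accident. One possible fix, sketched here with in-memory streams standing in for the socket (the class and method names are illustrative, and UTF-8 is just one reasonable choice of agreed encoding):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ExplicitWireEncoding {
    // Client side: encode with an agreed charset, not the platform default.
    static void send(OutputStream out, String value) throws IOException {
        out.write(value.getBytes(StandardCharsets.UTF_8));
    }

    // Server side: decode with the same charset the client used.
    static String receive(InputStream in) throws IOException {
        byte[] byteBuf = new byte[1024];
        int count = in.read(byteBuf);
        return count < 0 ? "" : new String(byteBuf, 0, count, StandardCharsets.UTF_8);
    }

    // Round trip through an in-memory "wire" to keep the sketch self-contained.
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            send(wire, value);
            return receive(new ByteArrayInputStream(wire.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("probl\u00e8me").equals("probl\u00e8me"));
    }
}
```

On a real socket a single read() may return only part of the message, possibly splitting a multi-byte character, so production code would usually wrap the stream in an InputStreamReader with the same charset instead of decoding one buffer at a time.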
What’s Wrong Here?
File path = storeFileFromRequest();
FileReader fr = new FileReader( path );
char[] buf = new char[1024];
StringBuffer str = new StringBuffer();
while( fr.read( buf ) != -1 )
{
    str.append( buf );
}
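Two things go wrong in this snippet: FileReader decodes with the platform default encoding rather than the encoding the file was uploaded in, and append( buf ) appends the whole 1024-character buffer regardless of how many characters were actually read. A hedged sketch of a corrected read loop follows. It reads from an in-memory stream so it is self-contained; the names are illustrative, and how the real encoding is obtained (request header, configuration) is left as an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitFileDecoding {
    // Read the whole stream as text, decoding with the charset the caller names.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder str = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                str.append(buf, 0, n); // append only the characters actually read
            }
        }
        return str.toString();
    }

    // Convenience wrapper for in-memory data, used to keep the sketch runnable.
    static String decode(byte[] bytes, Charset charset) {
        try {
            return readAll(new ByteArrayInputStream(bytes), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = "probl\u00e8me".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(stored, StandardCharsets.UTF_8).length());
    }
}
```

For a real file, the same loop works with a FileInputStream in place of the in-memory stream.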
Encodings and Transformations
Text: P r o b l è m e
ISO-8859-1: x50 x72 x6F x62 x6C xE8 x6D x65
  byte[] bytes = value.getBytes()
UTF-8: x50 x72 x6F x62 x6C xC3 xA8 x6D x65
  byte[] bytes = value.getBytes( "UTF-8" )
Reading UTF-8 as ISO-8859-1: P r o b l Ã ¨ m e
  new String( bytes )
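The byte sequences on this slide can be reproduced with a short program. This sketch names both charsets explicitly, because a bare getBytes() or new String( bytes ) only behaves as shown when the platform default happens to be ISO-8859-1. The class and helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Render bytes in the slide's "xNN" notation.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode as UTF-8, then (wrongly) decode as ISO-8859-1.
    static String misread(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String value = "Probl\u00e8me";
        System.out.println(hex(value.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(hex(value.getBytes(StandardCharsets.UTF_8)));
        System.out.println(misread(value));
    }
}
```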
Encodings and Transformations
Text: P r o b l è m e
UTF-16 (LE): x50 x00 x72 x00 x6F x00 x62 x00 x6C x00 xE8 x00 x6D x00 x65 x00
Reading UTF-16 (LE) as ISO-8859-1: P□r□o□b□l□è□m□e□
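The same kind of check works for the UTF-16 case. In this illustrative sketch (names are assumptions, not from the deck), every character in the sample is below U+0100, so each one encodes to its Latin-1 byte followed by a zero byte, and the misread string has a NUL after every letter.

```java
import java.nio.charset.StandardCharsets;

class Utf16Misread {
    // Encode as little-endian UTF-16, then (wrongly) decode as ISO-8859-1.
    static String misread(String value) {
        byte[] utf16 = value.getBytes(StandardCharsets.UTF_16LE);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misread("Probl\u00e8me");
        // Twice as many characters as the original: letter, NUL, letter, NUL, and so on.
        System.out.println(garbled.length());
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```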
Blackboard Academic Suite™ Version 6, Application Pack 2
• Internationalized!
  – Text extracted into locale-specific resource bundles
  – Application code uses locale settings for formatting (numbers, dates, names)
  – ISO-8859-1 only
    • In theory, any 1-byte encoding could be used, but it is not being tested
Blackboard Academic Suite™ Version 6, Application Pack 3
• Internationalized!
  – Support multiple locales simultaneously
  – Per-course, per-user settings
• Blackboard Building Blocks™ view doesn’t change
  – Locale negotiation is still transparent
Blackboard Academic Suite™ Release 7
• Final stage in internationalization
  – Full multi-byte support from browser to database
  – Multi-byte file name handling, independent of server file system
  – Localizable Blackboard Building Blocks manifests
  – “Language Pack Editor”
Application Pack 2 – Blackboard Building Blocks View
• What’s the current locale?
• Display this datum using current locale settings…
• Parse this datum using the current locale settings…
Making it Work
• Automatically handled through tag library
<bb:docTemplate></bb:docTemplate>
HTTP Header:
Content-type: text/html;charset=ISO-8859-1
HTML:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Making it Work
• If you’re not using the tag, you can still set the appropriate values
String encoding = BbServiceManager
    .getConfigurationService()
    .getBbProperty( "bbconfig.webserver.charset" );
response.setContentType( "text/html;charset=" + encoding );
I18N API
• LocaleManager
  – Fail-safe service (no service exceptions; falls back to en_US locale)
  – Hides locale negotiation
  – AP2 is per-VI; the ML feature set is per-course or per-user.
LocaleManager localeManager = BbServiceManager.getLocaleManager();
I18N API
• BbLocale
  – Wraps JDK locale object
  – Auto-negotiated for current context
    • Important for dynamic, multi-locale capabilities
  – Wraps utility functions
BbLocale locale = localeManager.getLocale();
locale.getLocale();
Application
<%
Locale locale = BbServiceManager.getLocaleManager()
    .getLocale().getLocaleObject();
ResourceBundle bundle = ResourceBundle.getBundle( "resources", locale );

String pageTitle = bundle.getString( "index.page.title" );
String formatDemoTitle = bundle.getString( "format.demo.title" );
String formatDemoDesc = bundle.getString( "format.demo.desc" );
String inputDemoTitle = bundle.getString( "input.demo.title" );
String inputDemoDesc = bundle.getString( "input.demo.desc" );
%>

<bbUI:docTemplate title="<%=pageTitle%>">
<bbUI:titleBar><%=pageTitle%></bbUI:titleBar>
  <bbUI:caretList>
    <bbUI:caret title="<%=formatDemoTitle%>" href="<%=Util.getFullUri( request, Constants.URI_LOCALE_DATA )%>">
      <%=formatDemoDesc%>
    </bbUI:caret>
    <bbUI:caret title="<%=inputDemoTitle%>" href="<%=Util.getFullUri( request, Constants.URI_DATA_INPUT )%>">
      <%=inputDemoDesc%>
    </bbUI:caret>
  </bbUI:caretList>
</bbUI:docTemplate>
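The JSP relies on the standard ResourceBundle lookup: for a French locale the JDK looks for resources_fr_FR, then resources_fr, then the base resources bundle, and a key missing from the more specific bundle is resolved from its parent. The sketch below illustrates that fallback outside of Blackboard. To stay self-contained it serves the bundles from in-memory strings through a custom ResourceBundle.Control instead of .properties files on the classpath; the bundle contents and class name are invented for the example, and it assumes the code runs on the classpath (custom Controls are not supported in named modules).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class BundleFallbackSketch {
    // Stand-ins for resources.properties and resources_fr.properties (hypothetical contents).
    static final Map<String, String> SOURCES = new HashMap<>();
    static {
        SOURCES.put("resources", "index.page.title=Home\nformat.demo.title=Formatting demo\n");
        SOURCES.put("resources_fr", "index.page.title=Accueil\n");
    }

    static final ResourceBundle.Control IN_MEMORY = new ResourceBundle.Control() {
        @Override
        public List<String> getFormats(String baseName) {
            return FORMAT_PROPERTIES;
        }

        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload) throws IOException {
            // toBundleName yields "resources", "resources_fr", "resources_fr_FR", ...
            String text = SOURCES.get(toBundleName(baseName, locale));
            return text == null ? null : new PropertyResourceBundle(new StringReader(text));
        }

        @Override
        public Locale getFallbackLocale(String baseName, Locale locale) {
            return null; // skip the JVM default locale so results do not depend on the host
        }
    };

    static String lookup(Locale locale, String key) {
        return ResourceBundle.getBundle("resources", locale, IN_MEMORY).getString(key);
    }

    public static void main(String[] args) {
        System.out.println(lookup(Locale.FRANCE, "index.page.title"));
        System.out.println(lookup(Locale.FRANCE, "format.demo.title"));
        System.out.println(lookup(Locale.forLanguageTag("es-ES"), "index.page.title"));
    }
}
```

In a real Building Block the bundles would simply be resources.properties, resources_fr.properties, and so on, loaded with the two-argument getBundle call shown in the JSP.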
What’s Wrong Here?
String dateString = dateValue.toString();
out.println( dateString );
Date.toString() does not output a locale-appropriate string or give any options for formatting.
toString(): Mon Jul 19 21:02:13 GMT-05:00 2004
In French: 19 juil. 2004 21 h 02 GMT-05:00
In English: Jul 19, 2004 9:02:13 PM GMT-05:00
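With plain JDK classes, locale-aware output comes from java.text.DateFormat rather than Date.toString(). A minimal sketch follows; the class name is illustrative, the epoch value corresponds to the timestamp on the slide, and the exact strings printed vary with the JDK version and its locale data, so none are shown here.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class LocaleDateDemo {
    // Format a timestamp using the conventions of the given locale.
    static String format(Date when, Locale locale) {
        DateFormat df = DateFormat.getDateTimeInstance(DateFormat.MEDIUM, DateFormat.LONG, locale);
        df.setTimeZone(TimeZone.getTimeZone("GMT-05:00"));
        return df.format(when);
    }

    public static void main(String[] args) {
        // 2004-07-19 21:02:13 at GMT-05:00, expressed as milliseconds since the epoch.
        Date when = new Date(1090288933000L);
        System.out.println("toString(): " + when);
        System.out.println("French:     " + format(when, Locale.FRENCH));
        System.out.println("English:    " + format(when, Locale.US));
    }
}
```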
Displaying Data
value = locale.formatDate( dateValue, BbLocale.Date.SHORT );
value = locale.formatNumber( floatValue );
value = locale.formatName( user, BbLocale.Name.SHORT );
Displaying Data
• Corresponds to Java libraries (see java.text.*), but with formats predefined to Blackboard UI conventions.
  – DateFormat.format()
  – DecimalFormat.format()
  – NumberFormat.getPercentInstance().format()
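For comparison, this is roughly what the underlying java.text calls look like without the BbLocale wrapper. It is a sketch using only standard JDK classes, not Blackboard's implementation, and the class name is illustrative.

```java
import java.text.NumberFormat;
import java.util.Locale;

class LocaleNumberDemo {
    // Format a plain number with the locale's grouping and decimal separators.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Format a fraction (0.25) as a percentage in the locale's style.
    static String percent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(number(100000.5, Locale.US));      // 100,000.5
        System.out.println(number(100000.5, Locale.GERMANY)); // 100.000,5
        System.out.println(percent(0.25, Locale.US));         // 25%
    }
}
```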
Format Enumerations
• BbLocale.Name
  – LONG, MEDIUM, SHORT, GREETING
• BbLocale.Date
  – LONG, MEDIUM, SHORT
• BbLocale.Time
  – LONG, MEDIUM, SHORT
What’s Wrong Here?
String numberString = "100,000.00";
float floatVal = Float.parseFloat( numberString );
The parse methods on the Number subclasses (Float.parseFloat(), Double.parseDouble(), and so on) do not perform locale-sensitive transformations.
Many European locales, for example, use a comma as the decimal separator and a period as the grouping separator. E.g., 100.000,00
Reading Data
BbLocale locale = BbServiceManager.getLocaleManager().getLocale();
float floatValue = locale.parseNumberAsFloat( input );
double doubleValue = locale.parseNumber( input );
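The BbLocale parse methods presumably sit on top of java.text.NumberFormat; that is an assumption, not something the deck states. The plain-JDK equivalent, sketched with illustrative names, shows both the locale-aware parse and the failure of Float.parseFloat() on grouped input.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class LocaleNumberParsing {
    // Parse user input according to the separators of the given locale.
    static double parse(String input, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(input).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in " + locale + ": " + input, e);
        }
    }

    // Float.parseFloat knows nothing about grouping separators or locales.
    static boolean parseFloatRejects(String input) {
        try {
            Float.parseFloat(input);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("100.000,00", Locale.GERMANY)); // 100000.0
        System.out.println(parse("100,000.00", Locale.US));      // 100000.0
        System.out.println(parseFloatRejects("100,000.00"));     // true
    }
}
```

NumberFormat.parse( String ) is lenient: it stops at the first character it cannot interpret and returns what it has parsed so far, so input validation may also need the ParsePosition overload to confirm the whole string was consumed.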
Limitations
• Pre-R7, B2 Manifest is single locale
  – Block installs as en_US, and always displays as en_US
  – E.g., if Locale is es_ES, links are not rendered properly
• Pre-R7, multi-byte locales not supported
  – Incompatible single-byte locales not verified (e.g., ISO-8859-5 will not co-exist with ISO-8859-1)
Looking Ahead – Blackboard Academic Suite™ Release 7
• Complete internationalization
  – Additional extraction and localization
  – Platform changes to support additional, non-Latin languages
• Multi-byte I/O support
  – Database (NVARCHAR, etc.)
  – UTF-8/16 encoding browser to application, application to database
    • UCS-2 on Windows
Looking Ahead – Blackboard Academic Suite™ Release 7
• Blackboard Building Blocks Resource Bundles
  – Register bundles to display appropriate end-user text
• Multiple Locales in Blackboard Building Blocks
  – Installation, default, and fall-back rules
Thank You!