
Extended Reach: An Efficient Content Management Technique for Sharing and Localizing Content

IBM Technical Report TR-40.0032 December, 2003

Sheila Monheit IBM Corporate Webmasters San Jose, CA, United States [email protected]

Sara Elo Dean IBM Corporate Webmasters Helsinki, Finland [email protected]

David Leip IBM Corporate Webmasters Hawthorne, NY, United States [email protected]

Hidekazu Shirayama IBM Corporate Webmasters Tokyo, Japan [email protected]

Table of Contents

1 Introduction
2 Objectives
3 IBM URI Taxonomy
4 Approach
    4.1 ibm.com Content Model
    4.2 Multi-Page Publish Scheme
    4.3 Enabling Localized Content
    4.4 Automating Country Code References
    4.5 Shared vs. Localized Text Blocks
    4.6 Leadspace Rotation
    4.7 Hybrid Approach
5 Evaluating Extended Reach in Pilots
    5.1 Pilot 1: Basic extended reach with identical content
    5.2 Pilot 2: Enhanced extended reach with localized content
    5.3 Pilot 2 Evaluation
6 Future Work

1 Introduction

For global companies such as IBM, it is important from a marketing and brand perspective to be seen as “in touch” with the many local national markets in which they do business. This applies to all aspects and representations of the corporation, including its web presence. In some cases these markets can be quite small, and it can be difficult to justify the investment to create and maintain separate web content for each of them individually. The alternative, simply grouping countries together and creating a single web site for a region, is not particularly attractive: it leaves that set of end users feeling that they are not on par with the corporation’s larger markets.

A large corporate web site such as ibm.com faces the challenge of serving as wide a set of customers as efficiently as possible. Two strategies exist for achieving this goal. The first is to leverage the same content across different formats. For example, the ibm.com corporate news content is shared across XHTML for the standard web browsers, WML, HDML and cHTML for pervasive devices, and RSS for content syndication. The second is to share the same content across different sites. This paper discusses the second approach, named Extended Reach. Specifically, the paper explains the way IBM has set up multiple country portals that can be managed, from a content maintenance perspective, as a single portal.

2 Objectives

The Extended Reach project has three main business goals:

1. To make ibm.com available on a wider basis worldwide
2. To reduce the workload of maintaining country portals, especially for smaller countries
3. To flaunt the “I” (International) in IBM

IBM took the early lead in establishing a web presence in more countries than its competitors. In recent years some of its larger competitors (Dell, HP and Microsoft) surpassed IBM, creating a web presence in more countries. With the rollout of the Extended Reach project, IBM has regained the leadership position. Today IBM presents a country portal in 83 countries, while Dell, HP, Microsoft and other competitors cover fewer countries.

3 IBM URI Taxonomy

The IBM URI taxonomy centers on subject matter keywords in English and the ISO standard two-letter country and language codes [1]. These elements present a web site visitor with consistent naming conventions across applications and web sites worldwide. Examples:

• http://www.ibm.com/ibm/au (About IBM in Australia)
• http://www.ibm.com/news/ve (News in Venezuela)
• http://www.ibm.com/servers/de (Servers in Germany)

If more than one language is used for a country, URIs follow the /<cc>/<lc> format, where <cc> is the two-letter country code specified in ISO 3166-1 (the ISO 3166-1 alpha-2 code elements) and <lc> is the two-letter language code. Examples:

• http://www.ibm.com/e-business/ch/fr (e-business in Switzerland, French version)
• http://www.ibm.com/products/ca/fr (Products & Services in Canada, French version)

Top-level, or root-level, directories are restricted to IBM registered trademarks and service marks, and to major, global, cross-divisional content areas such as /e-business, /thinkpad, /services and /products. These keywords must be in English only. For worldwide consistency, URIs are not translated to the local language. Use of regional web sites and regional URIs is strongly discouraged. Furthermore, if consistent URIs cannot be implemented for strategic pages due to application constraints, the ibm.com web servers are configured with redirects so that the advertised URI still abides by the URI taxonomy. Examples:

• http://www.ibm.com/shop/it/customerservice (Online customer support Italy) redirects to http://www-134.ibm.com/webapp/wcs/stores/servlet/HelpDisplay?subject=2294556&storeId=380&catalogId=-380&langId=-4

• http://www.ibm.com/shop/uk/help (Online shopping support UK) redirects to http://www-134.ibm.com/webapp/wcs/stores/servlet/HelpDisplay?storeId=826&catalogId=-826&dualCurrId=20&langId=826&subject=2294556

The Extended Reach technique builds on the fact that the IBM web URI taxonomy is country code centric. URIs between corresponding pages for countries vary in general only by country code. This enables URIs to be programmatically localized for countries within a group.
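The idea of programmatic localization can be illustrated with a minimal Python sketch. It is not part of the IBM publishing system; the function name swap_country_code is hypothetical, and the sketch assumes that the country code always appears as a complete path segment.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_country_code(uri: str, source_cc: str, target_cc: str) -> str:
    """Return uri with each path segment equal to source_cc replaced by target_cc.

    Hypothetical helper: it assumes the country code is a whole path segment,
    as in the URI taxonomy examples above.
    """
    parts = urlsplit(uri)
    segments = [
        target_cc if segment == source_cc else segment
        for segment in parts.path.split("/")
    ]
    return urlunsplit(parts._replace(path="/".join(segments)))


# Derive the Indonesian Services URI from the Malaysian one.
print(swap_country_code("http://www.ibm.com/services/my/", "my", "id"))
```

A URI without a country segment, such as http://www.ibm.com/planetwide/select, passes through unchanged. A keyword segment that happens to coincide with a country code would also be rewritten, which is one reason the technique described later uses explicit placeholders in the content instead.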

4 Approach

The Extended Reach technique is applicable to a group of country web sites that meets the following criteria:

• Maximum content sharing across multiple countries. The goal is to share most of the content that makes up the web site, with only a small amount of unique information maintained separately for each country. Allow for variation in content where a country has a local business need.
• Group similar small-market countries together based on common language and region. Due to translation issues, it is not possible to share content between different languages. For example:
    o 20 Caribbean English language countries
    o 7 ASEAN English language countries
• Enforce a standard layout.
• Support rotation of content to give a greater sense of freshness and even uniqueness across countries.
• Comply with a standard URI taxonomy to enable the automated localization of standard URIs.
• Cater for automated country name substitution, but with care.

4.1 ibm.com Content Model

Today, a content management system based on the Extensible Markup Language (XML) is used to create and maintain ibm.com country portals. By encoding content in XML and layout logic in the Extensible Stylesheet Language (XSL), the system enforces the separation of content and presentation. The system also supports reusable XML fragments and manages the dependencies between such fragments. Using a Java-based user interface, a content editor can upload XSL stylesheets and multimedia objects, create and edit XML content fragments, compose pages out of fragments, preview pages, review final published pages, and reject them or promote them to the final stage in the publishing flow [2].

Every ibm.com web page consists of several fragments: a masthead, footer, left and right navigation bars, and the main white space. Each of these is built as a separate XML fragment included in one or more XML documents, or servables. The XML fragments and servables conform to Document Type Definitions (DTDs). Fragments correspond to reusable components such as a navigation bar, an image, or a link, and servables to specific page types, such as an index page, a homepage, or a news article. An XML servable may contain fragments that are unique to the white space of the page type as well as reusable fragments. An XML servable is transformed to output pages in various formats by dedicated XSL stylesheets that control the presentation of a page. Thus content input and output presentation are tightly controlled by the appropriate servable and fragment DTDs and the XSL stylesheets.

4.2 Multi-Page Publish Scheme

For countries not within Extended Reach, ibm.com corporate portal country pages are generated on a one-to-one basis. One input XML servable transformed with one XSL stylesheet generates one output page (in HTML, WML, HDML, or RSS format) for one country in one language. Thus, ten XML servables tagged for ten different countries are transformed by one XSL stylesheet, generating ten resulting pages. In this way the IBM standard layout, along with tight DTD control over the page content, is ensured across every country portal page.

Extended Reach presented the challenge of creating more than one output page from one XSL transformation of one input XML. The input XML was now a fully reusable XML servable, made up of already reused fragments. The existing content model and content were analyzed to identify how content could be efficiently shared across a group of countries. Countries that share a common language and common content could be grouped together.

The first design introduced no changes to the DTDs in order to avoid the maintenance of two sets of DTDs, one set for countries with unique content and one for the Extended Reach countries with identical content. The Extended Reach technique was implemented as a multi-page publishing scheme in the XSL stylesheets. The existing XSL logic was enhanced to include a looping mechanism. The new logic could generate multiple outputs from a single XML and result in a distinct ibm.com corporate portal page for each specified target country. The output pages were identical in content, apart from the automated localization of the masthead, footer and URIs. Once the groupings of countries had been identified, rendering the countries to IBM standard layouts became very straightforward. Within every XML servable is a COUNTRY element tag, which specifies the target country page being generated. By adding this tag multiple times, the stylesheet can process any number of countries.

Single country tagging:

    <COMMON>
        <LANGUAGE>en</LANGUAGE>
        <COUNTRY>bd</COUNTRY>
    </COMMON>

Multiple country tagging:

    <COMMON>
        <LANGUAGE>en</LANGUAGE>
        <COUNTRY>bd</COUNTRY>
        <COUNTRY>lk</COUNTRY>
        <COUNTRY>vn</COUNTRY>
        <COUNTRY>ph</COUNTRY>
        <COUNTRY>my</COUNTRY>
        <COUNTRY>th</COUNTRY>
        <COUNTRY>id</COUNTRY>
        <AUDIENCE>all</AUDIENCE>
    </COMMON>

An XML servable also contains the STYLESHEET tag, which identifies the XSL stylesheet to transform with:

    <STYLESHEET>regional_newsindex_xml_html.xsl</STYLESHEET>
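The multi-page publishing loop can be sketched outside XSL as follows. This is a minimal Python illustration, not the stylesheet logic used on ibm.com: the root element NEWS_INDEX, the TITLE element, the output paths and the generated markup are all assumptions made for the example. Only the COMMON, LANGUAGE, COUNTRY and STYLESHEET elements come from the samples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical servable: NEWS_INDEX and TITLE are illustrative names only.
SERVABLE = """<NEWS_INDEX>
  <TITLE>News</TITLE>
  <COMMON>
    <LANGUAGE>en</LANGUAGE>
    <COUNTRY>bd</COUNTRY>
    <COUNTRY>lk</COUNTRY>
    <COUNTRY> id</COUNTRY>
  </COMMON>
  <STYLESHEET>regional_newsindex_xml_html.xsl</STYLESHEET>
</NEWS_INDEX>"""


def publish(servable_xml: str) -> dict:
    """Generate one output page per COUNTRY element of a single servable."""
    root = ET.fromstring(servable_xml)
    title = root.findtext("TITLE")
    pages = {}
    for country in root.findall("COMMON/COUNTRY"):
        cc = country.text.strip()  # tolerate stray whitespace around the code
        # Assumed output location and markup, one page per country code.
        pages[f"/{cc}/index.html"] = (
            f"<html><head><title>{title} ({cc})</title></head></html>"
        )
    return pages


for path in publish(SERVABLE):
    print(path)
```

Adding another COUNTRY element to the servable yields another output page without touching the transformation logic, which is the property the multi-page scheme relies on.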

4.3 Enabling Localized Content

The first Extended Reach implementation successfully created multiple near-identical, automatically localized output pages and enforced the IBM layout standard. However, the approach was too rigid: identical pages left no room for unique country distinctions. Some ASEAN Extended Reach candidate countries were unable to adopt the technique because the design did not allow for any localization on the pages. An enhanced design needed to allow some custom content to be identified within the existing page structures defined in the DTDs.

A content analysis of countries in the same region provided insight into the localization requirements. Fig 1 and Fig 2 show the www.ibm.com homepages for Malaysia and Indonesia:

Fig 1: www.ibm.com/my

Fig 2: www.ibm.com/id

[Figure callouts for Figs 1 and 2: Every link on the page refers either to a country-specific page or to a general www.ibm.com page. The country code can occur anywhere within the URI, or not at all, for example www.ibm.com/my/offers/thinkpad/, www.ibm.com/services/my/, www.ibm.com/services/bcs/id/, www.ibm.com/services/id/ and www.ibm.com/planetwide/select. Leadspace views rotate per hit, for every country.]

Fig 3: Services links (Malaysian homepage Services section and Indonesian homepage Services section)

Further investigation of content, such as the lists of links in Fig 3, reveals the following:

• The URI taxonomy is consistent within defined sections on a page, so the insertion of country references can be automated.
• Some links appear only for a subset of countries in a group, so country tagging of a link must be enabled.
• Some text blocks are identical across all countries with the exception of the local country name, so automatic country references within text could be enabled.

4.4 Automating Country Code References

Before Extended Reach, links such as the ones in Fig 3 were defined in XML as ITEM_TITLE and ITEM_URL element pairs. The sample below defines the left navigation bar on the www.ibm.com/us homepage:

    <PRIMARY_LINKS>
        <ITEM>
            <ITEM_TITLE>Home / home office</ITEM_TITLE>
            <ITEM_URL>http://www.ibm.com/homeoffice/</ITEM_URL>
        </ITEM>
    </PRIMARY_LINKS>
    <PRIMARY_LINKS>
        <ITEM>
            <ITEM_TITLE>Small &amp; medium business</ITEM_TITLE>
            <ITEM_URL>http://www.ibm.com/businesscenter/us/</ITEM_URL>
        </ITEM>
    </PRIMARY_LINKS>
    <PRIMARY_LINKS>
        <ITEM>
            <ITEM_TITLE>Large enterprise</ITEM_TITLE>
            <ITEM_URL>http://www.ibm.com/largeenterprise/us/</ITEM_URL>
        </ITEM>
    </PRIMARY_LINKS>

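[Fig 3 callouts: the Malaysian Services section includes an optional link to www.ibm.com/financing/my; the Indonesian Services section has no optional link.]

The transforming XSL loops over all the PRIMARY_LINKS elements and generates the following output HTML:

[Figure: rendered left navigation bar whose entries link to http://www.ibm.com/homeoffice/, http://www.ibm.com/businesscenter/us/ and http://www.ibm.com/largeenterprise/us/]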

Based on the definition in the XML, all the links point to US URIs and every title and URI pair is included in the output. No mechanism, or need, exists to specify conditions of links, such as their presence or absence in the output, because the navigation bar is dedicated to the US. The following patterns were defined to enable flexible localization of links. Content and XSL stylesheets were enhanced to respectively include and process the new logic.

%%CC
    Substitute every country (cc) listed under <COMMON/COUNTRY> in the URI string.

%%INCLIST_cc_cc_%%
    Substitute ONLY countries included in the INCLIST string.

[[%%INCLIST_cc_cc_%%]]
    Include this link (which contains no CC references at all, e.g. www.ibm.com) for countries in the INCLIST. Note: this string is added at the end of the URI string.

%%EXCLIST_cc_cc_%%
    Substitute ONLY countries NOT included in the EXCLIST string.

[[%%EXCLIST_cc_cc_%%]]
    Include this link (which contains no CC references at all, e.g. www.ibm.com) for countries NOT included in the EXCLIST. Note: this string is added at the end of the URI string.

Going back to the sample services section in Fig 3, the XML for that section in the new syntax becomes:

    <SERVICES_BOX>
        <SERVICES_GRAY_TITLE>Services</SERVICES_GRAY_TITLE>
        <SERVICES_LINKS>
            <LINK_TEXT>Business and IT services</LINK_TEXT>
            <LINK_URL>http://www.ibm.com/services/%%CC/</LINK_URL>
        </SERVICES_LINKS>
        <SERVICES_LINKS>
            <LINK_TEXT>Business consulting services</LINK_TEXT>
            <LINK_URL>http://www.ibm.com/bcs/%%CC/</LINK_URL>
        </SERVICES_LINKS>
        <SERVICES_LINKS>
            <LINK_TEXT>Infrastructure services</LINK_TEXT>
            <LINK_URL>http://www.ibm.com/services/%%CC/strategy/capability/fullinfra.html</LINK_URL>
        </SERVICES_LINKS>
        <SERVICES_LINKS>
            <LINK_TEXT>On demand services</LINK_TEXT>
            <LINK_URL>http://www.ibm.com/services/%%CC/ondemand/</LINK_URL>
        </SERVICES_LINKS>
        <SERVICES_LINKS>
            <LINK_TEXT>Financing</LINK_TEXT>
            <LINK_URL>http://www.ibm.com/financing/%%INCLIST_my_th_ph_%%/</LINK_URL>
        </SERVICES_LINKS>
    </SERVICES_BOX>

Further down in the same XML servable the country definitions are:

    <COMMON>
        <LANGUAGE>en</LANGUAGE>
        <COUNTRY>my</COUNTRY>
        <COUNTRY>ph</COUNTRY>
        <COUNTRY>th</COUNTRY>
        <COUNTRY>id</COUNTRY>
    </COMMON>

The output seen in Fig 3 for the Malaysian and Indonesian homepages is generated by the Extended Reach XSL below:

    <xsl:template name="regionalLinks">
      <xsl:param name="cc"/>
      <xsl:param name="link"/>
      <xsl:choose>
        <!-- %%CC: substitute the current country code into the URI -->
        <xsl:when test="contains($link,'%%CC')">
          <xsl:value-of select="concat(substring-before($link,'%%CC'),$cc,substring-after($link,'%%CC'))"/>
        </xsl:when>
        <!-- %%INCLIST: keep the link only for the listed countries -->
        <xsl:when test="contains($link,'%%INCLIST_')">
          <xsl:variable name="IncList" select="substring-before(substring-after($link,'%%INCLIST_'),'%%')"/>
          <!--xsl:value-of select="concat('this is dolist variable:', $doList)"/-->
          <xsl:choose>
            <xsl:when test="contains($IncList,$cc)">
              <xsl:choose>
                <!-- bracketed form: the URI has no country code, so strip the pattern -->
                <xsl:when test="contains($link,'[[%%INCLIST_')">
                  <xsl:value-of select="substring-before($link,'[[%%INCLIST')"/>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:value-of select="concat(substring-before($link,'%%INCLIST_'),$cc,substring-after($link,'_%%'))"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="''"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:when>
        <!-- %%EXCLIST: keep the link for every country except the listed ones -->
        <xsl:when test="contains($link,'%%EXCLIST_')">
          <xsl:variable name="ExcList" select="substring-before(substring-after($link,'%%EXCLIST_'),'%%')"/>
          <!--xsl:value-of select="concat('this is dolist variable:', $doList)"/-->
          <xsl:choose>
            <xsl:when test="contains($ExcList,$cc)">
              <xsl:value-of select="''"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:choose>
                <xsl:when test="contains($link,'[[%%EXCLIST_')">
                  <xsl:value-of select="substring-before($link,'[[%%EXCLIST')"/>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:value-of select="concat(substring-before($link,'%%EXCLIST_'),$cc,substring-after($link,'_%%'))"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:when>
        <!-- no pattern: pass the link through unchanged -->
        <xsl:otherwise>
          <xsl:value-of select="$link"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>
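For readers less familiar with XSL, the branching of the regionalLinks template can be sketched in Python. This is an illustrative port under stated assumptions, not the production logic: the function name localize_link is hypothetical, and the sketch compares whole country codes after splitting the list on underscores, whereas the XSL uses a substring test with contains().

```python
def localize_link(link: str, cc: str) -> str:
    """Return the URI for country cc, or "" if the link does not apply to cc."""
    # %%CC: substitute the country code at the placeholder position.
    if "%%CC" in link:
        return link.replace("%%CC", cc, 1)

    for marker, is_include_list in (("%%INCLIST_", True), ("%%EXCLIST_", False)):
        if marker not in link:
            continue
        # Country codes sit between the marker and the closing %%.
        listed = link.split(marker, 1)[1].split("%%", 1)[0]
        codes = [code for code in listed.split("_") if code]
        applies = (cc in codes) if is_include_list else (cc not in codes)
        if not applies:
            return ""  # blank string: the caller drops the title/URI pair
        bracketed = "[[" + marker
        if bracketed in link:
            # Bracketed form: the URI carries no country code, strip the pattern.
            return link.split(bracketed, 1)[0]
        return link.split(marker, 1)[0] + cc + link.split("_%%", 1)[1]

    return link  # no pattern: pass through unchanged


links = [
    "http://www.ibm.com/services/%%CC/",
    "http://www.ibm.com/financing/%%INCLIST_my_th_ph_%%/",
]
# Outer loop over countries, inner loop over links, as in the for-each processing.
for cc in ["my", "id"]:
    for link in links:
        print(cc, repr(localize_link(link, cc)))
```

Matching whole codes avoids a false hit if one listed string ever contained another as a substring. With two-letter codes separated by underscores, the two approaches behave the same for the examples in this section.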

A detailed explanation of the XSL follows. The XSL template receives two parameters from the parent routine:

1. cc, which is the country code of the current pass of the for-each loop over COMMON/COUNTRY:

    <COMMON>
        <LANGUAGE>en</LANGUAGE>
        <COUNTRY>my</COUNTRY>
        <COUNTRY>ph</COUNTRY>
        <COUNTRY>th</COUNTRY>
        <COUNTRY>id</COUNTRY>
    </COMMON>

   In the first pass cc=my (Malaysia), then ph (Philippines) and so on.

2. link, which is the string containing the URI information, that is, the contents of the <LINK_URL> element:

    <LINK_URL>http://www.ibm.com/financing/%%INCLIST_my_th_ph_%%/</LINK_URL>

The template above is executed within the COMMON/COUNTRY for-each loop N times, once for each time a URI requires processing. In this example, the cc variable does not change values until all the LINK_URL elements are processed. At that point the cc variable is assigned the value of the next COUNTRY element and the processing for each LINK_URL is repeated.

The links being processed are, in order:

1. http://www.ibm.com/services/%%CC/
2. http://www.ibm.com/bcs/%%CC/
3. http://www.ibm.com/services/%%CC/strategy/capability/fullinfra.html
4. http://www.ibm.com/financing/%%INCLIST_my_th_ph_%%/

The following XSL logic occurs for each pass through these links:

For the first three links (1, 2 and 3) the value of cc, the country being processed, is substituted directly into the link string at the exact location of the %%CC notation. Thus, when processing the first COMMON/COUNTRY element my, the first three links print as

    http://www.ibm.com/services/my/
    http://www.ibm.com/bcs/my/
    http://www.ibm.com/services/my/strategy/capability/fullinfra.html

and when processing the second COMMON/COUNTRY element ph, the same links print as

    http://www.ibm.com/services/ph/
    http://www.ibm.com/bcs/ph/
    http://www.ibm.com/services/ph/strategy/capability/fullinfra.html

The processing of the fourth link is more complicated.

    http://www.ibm.com/financing/%%INCLIST_my_th_ph_%%/

When the XSL encounters the %%INCLIST or %%EXCLIST pattern, it triggers two conditional tests:

1. First, it parses the string up to the closing _%% to see whether the current cc variable is relevant for this string. In this case, the Malaysia (my), Thailand (th) and Philippines (ph) homepages should all include this URI. The Indonesia (id) homepage should not include it.

   This could also have been represented as:

    http://www.ibm.com/financing/%%EXCLIST_id_%%/

   and would have produced the same results. For the EXCLIST pattern, the conditional test parses the string to see if the current cc is NOT in the list, and if so, the link is included.

2. Second, if the URI string is applicable to the current cc variable, the next conditional test determines whether the URI string contains a country reference in its syntax, or whether it is a general ibm.com URI that has no country reference in it at all. This test performs a second parse to determine whether the INCLIST or EXCLIST pattern is at the end of the URI string and whether the [[ opening and ]] closing brackets surround it. If so, the URI string does not include a country reference. An example is the pattern:

    http://www.lotus.com/[[%%INCLIST_my_%%]]

   which prints out the link without any country code, http://www.lotus.com/, on the Malaysian page only.

Last, if the string being processed with the INCLIST or EXCLIST pattern is not applicable to the current cc variable, the XSL returns a blank string. This is necessary for later processing when the URI and TITLE are both processed for the final output. The TITLE is always included in the input XML, regardless of country tagging, so to ensure that no TITLE without a corresponding URI is inserted into output HTML, a blank string is required for a last test before the HTML output is created. If the returned URI string is blank, no TITLE/URI combination is included in the HTML; if it isn’t blank, the returned string, now containing the correct country tags, along with the corresponding TITLE, is included in the HTML.
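The final blank-string test can be sketched as follows. This Python fragment is an illustration only: the function name render_links and the generated anchor markup are assumptions, and the input is taken to be title/URI pairs whose URIs have already been localized for one country.

```python
from html import escape


def render_links(pairs):
    """Return HTML anchors for (title, localized_uri) pairs, skipping blank URIs."""
    anchors = []
    for title, uri in pairs:
        if not uri:
            # The localization step returned "" for this country:
            # emit neither the title nor the URI.
            continue
        anchors.append(f'<a href="{escape(uri, quote=True)}">{escape(title)}</a>')
    return anchors


# Indonesian pass: the Financing link was excluded, so its URI is blank.
indonesian_pairs = [
    ("Business and IT services", "http://www.ibm.com/services/id/"),
    ("Financing", ""),
]
for anchor in render_links(indonesian_pairs):
    print(anchor)
```

Returning a blank string rather than omitting the entry keeps the title and URI lists aligned, so the pairing test can be made in one place just before output.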

4.5 Shared vs. Localized Text Blocks

A comparison of About IBM pages provides a good example of the types of text blocks shared among and localized by countries.

Fig 4: www.ibm.com/ibm/my

Fig 5: www.ibm.com/ibm/id

[Figure callouts for Figs 4 and 5: text block that all countries share, which may include the country name in the text; text block with localized information; localized photo (optional); shared financial info, with an additional section for localized financial info allowed.]

The examples in Figs 4 and 5 illustrate different types of text blocks, namely shared and localized. A shared text block is reusable, but requires some processing to allow for minor localization in order to give the text a country specific feel. For example, in the first section, it would be ideal if a country could use the general text, and insert one or more localized sentences.

A localized text block is specific to one country only, for example the history of IBM in the country, the picture of the local general manager, or the contact information for the country shown in Fig 6.

Fig 6: www.ibm.com/ibm/my continued

[Figure callouts for Fig 6: text block that all countries share, which may include the country name in the text; shared financial info, with an additional section for localized financial info allowed; text block with localized information; localized photo (optional).]

For the shared text block, the text processing XSL template is modified to accept a TAG, %%COUNTRYNAME, that serves as a placeholder and country identifier within a text block. Using standard XSL, this text processing template can be invoked with the country name (for example, Malaysia). The XSL processing is a standard text substitution template, one that recursively parses a string and substitutes every instance of TAG with the passed-in parameter value.

A less obvious, but equally beneficial, outcome of this first type of text substitution is its application to the HTML Meta tags:

Malaysian Meta tags:

    <meta name="IBM.Country" content="my"/>
    <meta name="Description" content="The IBM Malaysia home page, entry point to information about IBM products and services."/>
    <meta name="Abstract" content="The IBM Malaysia home page, entry point to information about IBM products and services."/>

The corresponding Indonesian Meta tags:

    <meta name="IBM.Country" content="id"/>
    <meta name="Description" content="The IBM Indonesia home page, entry point to information about IBM products and services."/>
    <meta name="Abstract" content="The IBM Indonesia home page, entry point to information about IBM products and services."/>
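The recursive substitution described for %%COUNTRYNAME can be sketched in Python. The function name replace_tag is hypothetical, and the shared source text in the example is an assumption about how the Description content would be written before substitution.

```python
def replace_tag(text: str, tag: str, value: str) -> str:
    """Recursively replace every occurrence of tag in text with value."""
    if not tag or tag not in text:
        return text  # base case: nothing left to substitute
    before, _, after = text.partition(tag)
    # Substitute the first occurrence, then recurse on the remainder.
    return before + value + replace_tag(after, tag, value)


# Assumed shared source text for the Description meta tag.
SHARED_DESCRIPTION = (
    "The IBM %%COUNTRYNAME home page, entry point to information "
    "about IBM products and services."
)

for country_name in ["Malaysia", "Indonesia"]:
    print(replace_tag(SHARED_DESCRIPTION, "%%COUNTRYNAME", country_name))
```

The recursion mirrors the structure an XSL 1.0 template needs, since that language has no built-in replace function. In Python, str.replace would do the same job in one call.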

The second type, the localized text block, requires a change beyond the Extended Reach approach described so far where only XSL processing and content are enhanced. Minor DTD changes need to be introduced to accommodate the inclusion of localized blocks of text in an XML servable. The DTD for the About IBM page, along with all the other portal pages, already accommodates the inclusion of reusable XML fragments.

The root element for About IBM DTD:

    <!ELEMENT ABOUT_IBM (SYSTEM, TITLE, TITLE_GRAPHIC?, LONG_DESCRIPTION?,
        SITE_SECTION, LEFT_NAVBAR, PHOTO?, PHOTO_URL?, CAPTION?, BLUE_TITLE?,
        COMPANY_INFO?, COUNTRY_COMPANY_INFO?, CONTACT_INFO?, FINANCIAL?,
        ADDITIONAL_INFO*, INLINE_ELEMENTS?, PUBLISHINFO+, COMMON,
        META_INFORMATION)>

In this example, the underlined elements are subfragments, reusable pieces of XML that can be included in the full About IBM XML servable. To accommodate the requirements for localized text blocks, the About IBM DTD was modified to create the Regional About IBM DTD:

    <!ELEMENT REGIONAL_ABOUTIBM (SYSTEM, TITLE, TITLE_GRAPHIC?, LONG_DESCRIPTION?,
        SITE_SECTION, LEFT_NAVBAR, PHOTO_SECTION*, BLUE_TITLE?, COMPANY_INFO?,
        COUNTRY_COMPANY_INPUT*, CONTACT_INFO*, FINANCIAL*, ADDITIONAL_INFO*,
        INLINE_ELEMENTS?, PUBLISHINFO+, COMMON, META_INFORMATION)>

The differences between the two versions of the DTD are the additional fragment elements in the regional version: PHOTO_SECTION and COUNTRY_COMPANY_INPUT. In addition, some of the fragments formerly defined as ‘cardinality zero or one’ (?) were modified to ‘cardinality zero or more’ (*). These changes provide the ability to include separate XML fragments for the localized text blocks. For example, in the regional About IBM servable created for Bangladesh (bd), Sri Lanka (lk), Vietnam (vn), Philippines (ph), Malaysia (my), Thailand (th) and Indonesia (id), the following XML fragments are included:

    <PHOTO_SECTION SUBFRAGMENTTYPE="COUNTRY_PHOTO">
        <COUNTRY_PHOTO>
            . . .
            <COMMON DATATYPE="NOLABEL">
                <LANGUAGE DATATYPE="ASSOCLIST">en</LANGUAGE>
                <COUNTRY DATATYPE="ASSOCLIST">my</COUNTRY>
                <AUDIENCE DATATYPE="ASSOCLIST" LINKABLE="AUDIENCE">all</AUDIENCE>
            </COMMON>
        </COUNTRY_PHOTO>
    </PHOTO_SECTION>
    <PHOTO_SECTION SUBFRAGMENTTYPE="COUNTRY_PHOTO">
        <COUNTRY_PHOTO>
            . . .
            <COMMON DATATYPE="NOLABEL">
                <LANGUAGE DATATYPE="ASSOCLIST">en</LANGUAGE>
                <COUNTRY DATATYPE="ASSOCLIST">ph</COUNTRY>
                <AUDIENCE DATATYPE="ASSOCLIST" LINKABLE="AUDIENCE">all</AUDIENCE>
            </COMMON>
        </COUNTRY_PHOTO>
    </PHOTO_SECTION>
    <PHOTO_SECTION SUBFRAGMENTTYPE="COUNTRY_PHOTO">
        <COUNTRY_PHOTO>
            . . .
            <COMMON DATATYPE="NOLABEL">
                <LANGUAGE DATATYPE="ASSOCLIST">en</LANGUAGE>
                <COUNTRY DATATYPE="ASSOCLIST">id</COUNTRY>
                <AUDIENCE DATATYPE="ASSOCLIST" LINKABLE="AUDIENCE">all</AUDIENCE>
            </COMMON>
        </COUNTRY_PHOTO>
    </PHOTO_SECTION>
    <PHOTO_SECTION SUBFRAGMENTTYPE="COUNTRY_PHOTO">
        <COUNTRY_PHOTO>
            . . .
            <COMMON DATATYPE="NOLABEL">
                <LANGUAGE DATATYPE="ASSOCLIST">en</LANGUAGE>
                <COUNTRY DATATYPE="ASSOCLIST">th</COUNTRY>
                <AUDIENCE DATATYPE="ASSOCLIST" LINKABLE="AUDIENCE">all</AUDIENCE>
            </COMMON>
        </COUNTRY_PHOTO>
    </PHOTO_SECTION>
    <PHOTO_SECTION SUBFRAGMENTTYPE="COUNTRY_PHOTO">
        <COUNTRY_PHOTO>
            . . .
            <COMMON DATATYPE="NOLABEL">
                <LANGUAGE DATATYPE="ASSOCLIST">en</LANGUAGE>
                <COUNTRY DATATYPE="ASSOCLIST">vn</COUNTRY>
                <AUDIENCE DATATYPE="ASSOCLIST" LINKABLE="AUDIENCE">all</AUDIENCE>
            </COMMON>
        </COUNTRY_PHOTO>
    </PHOTO_SECTION>

Similar sections of XML exist for other selected subfragment types such as COUNTRY_COMPANY_INPUT and CONTACT_INFO.

This is a collapsed view of the XML containing only the ID of each included subfragment. Note the different number of fragments of each type due to the fact that localized fragments are of cardinality zero or more.

Expanding any of the subfragments reveals the XML elements that identify the applicable country.

    <COMMON DATATYPE="NOLABEL">
        <LANGUAGE DATATYPE="ASSOCLIST">en</LANGUAGE>
        <COUNTRY DATATYPE="ASSOCLIST">ph</COUNTRY>
        <AUDIENCE DATATYPE="ASSOCLIST" LINKABLE="AUDIENCE">all</AUDIENCE>
    </COMMON>

During XSL processing of this servable, within the for-each loop for the servable COMMON/COUNTRY, a test is performed to verify the existence of a localized fragment and its applicability to the current cc variable. If the cc variable of the servable matches the cc variable of the fragment, the contents of the XML fragment are included in the generation of the output.

This test within the XSL is shown below:

<xsl:if test="boolean(../../COUNTRY_COMPANY_INPUT[COUNTRY_COMPANY_INFO/COMMON/COUNTRY=$cc])">
  <xsl:apply-templates select="../../COUNTRY_COMPANY_INPUT[COUNTRY_COMPANY_INFO/COMMON/COUNTRY=$cc]">
    <xsl:with-param name="directoryPrefix" select="$directoryPrefix"/>
    <xsl:with-param name="countryName" select="$countryName"/>
    <xsl:with-param name="cc" select="$cc"/>
  </xsl:apply-templates>
</xsl:if>
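The same existence-and-match test can be sketched outside XSL. The following Python fragment is only an illustration of the logic, not the production stylesheet: the SERVABLE wrapper, the NAME element, and the placeholder values are assumptions made for the example, while COUNTRY_COMPANY_INPUT, COUNTRY_COMPANY_INFO, COMMON, and COUNTRY follow the element names used above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified servable. SERVABLE and NAME are invented for
# this sketch; the other element names follow the paper's examples.
SERVABLE_XML = """
<SERVABLE>
  <COUNTRY_COMPANY_INPUT>
    <COUNTRY_COMPANY_INFO>
      <NAME>placeholder-ph</NAME>
      <COMMON><COUNTRY>ph</COUNTRY></COMMON>
    </COUNTRY_COMPANY_INFO>
  </COUNTRY_COMPANY_INPUT>
  <COUNTRY_COMPANY_INPUT>
    <COUNTRY_COMPANY_INFO>
      <NAME>placeholder-id</NAME>
      <COMMON><COUNTRY>id</COUNTRY></COMMON>
    </COUNTRY_COMPANY_INFO>
  </COUNTRY_COMPANY_INPUT>
</SERVABLE>
"""


def fragments_for_country(root, cc):
    """Return the COUNTRY_COMPANY_INPUT fragments tagged for country code cc."""
    matches = []
    for fragment in root.findall("COUNTRY_COMPANY_INPUT"):
        tagged = [c.text for c in
                  fragment.findall("COUNTRY_COMPANY_INFO/COMMON/COUNTRY")]
        if cc in tagged:
            matches.append(fragment)
    return matches


def company_names(root, cc):
    """Collect the (hypothetical) NAME text of every fragment matching cc."""
    return [f.findtext("COUNTRY_COMPANY_INFO/NAME")
            for f in fragments_for_country(root, cc)]


root = ET.fromstring(SERVABLE_XML)
```

Calling company_names(root, "ph") selects only the fragment tagged ph, and a country with no localized fragment, such as "vn" in this sample, yields an empty list, which corresponds to the case where the xsl:if test fails and nothing is emitted.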

4.6 Leadspace Rotation

The www.ibm.com homepages have a unique requirement: the ability to display rotating leadspace fragments at the top of the white space for each homepage. This feature is shown in Figures 1 and 2, where the leadspaces differ between the Malaysian and Indonesian homepages. The feature is enabled by the homepage engine, which is run for every www.ibm.com homepage, regardless of how the page was created. The homepage engine required no modifications to work with the extended reach model. However, using the engine with the extended reach model adds another level of uniqueness to each country page generated from a single XML source.


4.7 Hybrid Approach

During the first phase of the Extended Reach project, the design was restricted to an implementation that would not require modification of the existing DTDs. Not having to rebuild existing content was a major consideration. For the most part, existing DTDs could accommodate content for the multi-publish output model. In the second phase of Extended Reach, an opportunity arose to add a new set of pages, with no existing DTDs, to the www.ibm.com Corporate Portal: the Software pages for the ASEAN countries. Since there were no existing DTDs for this set of pages, a completely new design could be implemented, limited only by the restrictions set by the content management system.

The design team decided that the earlier approach, with some modifications, would work best. The %%CC notation within an XML tag is still used as a placeholder for country code substitutions. However, rather than using the inclusion/exclusion notation within the XML tag, e.g.

http://www.ibm.com/financing/%%INCLIST_my_th_ph_%%/

editors add discrete country tags in the XML to identify the applicable countries. This approach makes content preparation simpler and less error-prone for editors. They can choose a country from a dropdown list rather than typing out a string in the defined syntax for each URI. Furthermore, this approach is more consistent with standard XML tagging, as it separates the URI from the country restrictions set upon it.

<PHOTO_SECTION SUBFRAGMENTTYPE="COUNTRY_PHOTO">
  <COUNTRY_PHOTO>
    <TITLE DATATYPE="STRING" LINKABLE="TITLE">asean Software Home #Lead Image - IBM Lotus Workplace</TITLE>
    <PHOTO DATATYPE="STRING" SUBFRAGMENTTYPE="IMAGE">
      <IMAGE> … </IMAGE>
    </PHOTO>
    <PHOTO_URL>
      <ITEM_URL>http://www.ibm.com/software/%%CC/lotusworkplace/</ITEM_URL>
    </PHOTO_URL>
    <COMMON DATATYPE="NOLABEL">
      <LANGUAGE DATATYPE="ASSOCLIST">en</LANGUAGE>
      <COUNTRY DATATYPE="ASSOCLIST">id</COUNTRY>
      <COUNTRY DATATYPE="ASSOCLIST">ph</COUNTRY>
      <AUDIENCE DATATYPE="ASSOCLIST" LINKABLE="AUDIENCE">all</AUDIENCE>
    </COMMON>
  </COUNTRY_PHOTO>
</PHOTO_SECTION>

In the example above, the PHOTO_SECTION element is a fragment tagged for id (Indonesia) and ph (Philippines) only. It is not applied to the other countries named in the top-level country tags that denote the overall applicability of the page.

Note the element


<ITEM_URL>http://www.ibm.com/software/%%CC/lotusworkplace/</ITEM_URL>

and the following elements

<COUNTRY DATATYPE="ASSOCLIST">id</COUNTRY> <COUNTRY DATATYPE="ASSOCLIST">ph</COUNTRY>

The XSL processes the ITEM_URL element and includes it only in the output pages for id and ph.
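To make the combined effect of the country tags and the %%CC placeholder concrete, the following Python sketch applies both to the fragment shown above. It is an illustration under stated assumptions rather than the actual XSL: the function name localized_urls and the list of page countries are hypothetical, and the sketch assumes %%CC is replaced by the lowercase two-letter country code.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the PHOTO_SECTION fragment from the text (TITLE, PHOTO,
# LANGUAGE, and AUDIENCE omitted for brevity).
FRAGMENT_XML = """
<PHOTO_SECTION SUBFRAGMENTTYPE="COUNTRY_PHOTO">
  <COUNTRY_PHOTO>
    <PHOTO_URL>
      <ITEM_URL>http://www.ibm.com/software/%%CC/lotusworkplace/</ITEM_URL>
    </PHOTO_URL>
    <COMMON DATATYPE="NOLABEL">
      <COUNTRY DATATYPE="ASSOCLIST">id</COUNTRY>
      <COUNTRY DATATYPE="ASSOCLIST">ph</COUNTRY>
    </COMMON>
  </COUNTRY_PHOTO>
</PHOTO_SECTION>
"""


def localized_urls(section, page_countries):
    """Map each applicable country code to its localized ITEM_URL.

    A country gets an entry only if it is both a country of the page
    (page_countries) and named in one of the fragment's COUNTRY tags.
    """
    photo = section.find("COUNTRY_PHOTO")
    tagged = {c.text for c in photo.findall("COMMON/COUNTRY")}
    template = photo.findtext("PHOTO_URL/ITEM_URL")
    # Assumption: %%CC is substituted with the lowercase country code.
    return {cc: template.replace("%%CC", cc)
            for cc in page_countries if cc in tagged}


section = ET.fromstring(FRAGMENT_XML)
# Hypothetical top-level country list for the page.
urls = localized_urls(section, ["id", "my", "ph", "th"])
```

With this sample input, urls contains entries for id and ph only; my and th are skipped because the fragment carries no COUNTRY tag for them, even though they appear in the page-level list.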

5 Evaluating Extended Reach in Pilots

The first Extended Reach pilot supported the output of identical pages with minimal automated localization. The second pilot, Enhanced Extended Reach, supported localized content within otherwise identical pages.

5.1 Pilot 1: Basic extended reach with identical content

The Extended Reach functionality was first rolled out in the fall of 2002 for two groups: twenty Caribbean English-speaking countries and three ASEAN English-speaking countries. At the time, the definition of the Extended Reach technique was strict: the countries in an Extended Reach group had to share identical content. The only localization the model allowed was the automatic replacement of the ISO country code in URIs. Each portal page was otherwise identical across the countries, with an automatically localized masthead and footer.

This model proved to fit the countries with the fewest resources, where little or no localized content existed. Such country portals consisted of little more than the minimum 9 required top-level pages and a sufficient flow of news articles to keep the news section up to date. The three ASEAN countries that adopted this technique, namely Bangladesh, Sri Lanka, and Vietnam, as well as the twenty Caribbean countries, did benefit from the feature to an extent. A single update in the content management system published out to three web pages, thus reducing the time and money required to keep the sites fresh. An additional benefit was the reduced time required to launch new sites. The twenty Caribbean country portals did not exist before Extended Reach. Their parallel launch took less than one hour, instead of roughly twenty times that had each one been managed and launched as a separate portal.

However, when evaluating whether the pilot had improved site quality, it became obvious that the Extended Reach approach did not solve the problem of content creation. The countries have so few resources that even uploading news articles of regional relevance, written and published for the larger ASEAN or Americas markets, could not be done. This result led to a re-examination of the Extended Reach model itself.

A second round of requirements was gathered from the ASEAN web management. Each Extended Reach group of countries clearly needs at least one country with sufficient funds to create fresh content on an ongoing basis. As the content is uploaded into the content management system, the other countries in the same group immediately benefit from the content updates. The question to the ASEAN team was: How does the restriction on identical content need to be relaxed in order to accommodate countries with localized content in the same Extended Reach group?


5.2 Pilot 2: Enhanced extended reach with localized content

During the summer of 2003, the ASEAN web management articulated the requirements for including Indonesia, Malaysia, the Philippines, and Thailand in the existing Extended Reach group. The web manager analyzed each page and stated the need for localization, i.e. which areas of the page needed to be optional and filled in with content for a subset of the countries in the group. For example, one requirement was: “The right hand navigation modules on the Products & Services page need the ability to be localized as they are used to link to features that do not exist for all countries.”

To keep the approach general and allow for future modification, each requirement was generalized into a rule. For example, the specific requirement above turned into the following rule in the content model: “The element called related_info in the portal DTDs must be able to be tagged for one, some, or all of the countries in the Extended Reach group, and should only appear on the output pages for the tagged countries.” The technique described in Section 4.5 enables this functionality today for all pages, not only the Products & Services page.

5.3 Pilot 2 Evaluation

Enhanced Extended Reach for ASEAN Country Portals was successfully deployed on September 24, 2003 for the following 7 countries: Malaysia, Indonesia, the Philippines, Thailand, Sri Lanka, Bangladesh, and Vietnam. An improvement of 85.8% in time, and thus in web site maintenance cost, was achieved for news articles.

Enhanced Extended Reach for the ASEAN Software Portal was successfully deployed on October 8, 2003 for the following 5 countries: India, Singapore, Malaysia, Thailand, and the Philippines.

Quotes from the ASEAN team on reduced workload:

From Yee Nam Sng, ASEAN Site manager:

“The News section provides the most savings and efficiency. This is because most news articles are replicated without any change (except for local URIs) across all countries.”

“Homepage marketing modules provide savings. We are able to achieve faster turnaround and some savings by planning our updates and marketing modules across ASEAN carefully.”

From AP Creative Services editors:


“Out of the three Enhanced Extended Reach implementations, the news fragments gain the most benefits. Although it might only save around 15 minutes per news/country, it has saved us from tedious job to replicate the content and manually reposition the tiers. It also has limited the chance of errors. Publishing is now a bliss. Less fragments to load, review and publish.”

“The psychological efficiency is what we feel most. It's really tedious to duplicate the same thing over and over again. This Enhanced Extended Reach approach has increased the "Morale" of the editor by taking off these duplicate tasks.”

6 Future Work

Given the success of the Enhanced Extended Reach model and the cost savings it has demonstrated, new country groupings will certainly be created. Some candidates include regions where ibm.com does not yet have existing country portals:

• Americas Spanish
• Middle East Arabic
• Africa English
• Africa French

Another direction is to apply the same technique to new sets of pages, much as was done for the Software pages for the ASEAN countries. In addition, the results of the Software page pilot will certainly provide lessons to the ibm.com Software group on how best to include the software portals worldwide in the same framework. Yet another approach is to multi-publish output pages from one XML source regardless of pre-determined country groupings. For example, the legal statements for many IBM countries are the same, regardless of the region or size of market. XML pages for wireless.ibm.com are generated using this approach.

7 Acknowledgements

The authors wish to thank Dikran Meliksetian, Rosa Bolger, Lisa Intravio, Chris Wang, and Marie Shafi, who helped put the methods described in this paper into practice, and who have consistently supported and contributed to their further development.

8 References


[1] ISO web site at http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html
[2] "XML Content Management: Challenges and Solutions" XML Europe 2001 Nianjun Zhou, Dikran Meliksetian, Louis Weitzman, Sara Elo Dean, Jeff Milton, Peter Davis, Jessica Wu. May 2001