Systematic validation of localization across all languages
By Martin Ørsted, Microsoft Ireland
For the LRC XIII Conference
October 2008
Content
• The upstream effort
• Downstream bullet-proofing
• The single resource approach
• Generic rules across a group of resources
• Adding the languages
• Conclusion
• Questions?
Microsoft Ireland, Martin Ørsted
The upstream effort
• Nothing beats fixing at dev time:
  – Use of newer programming languages with more built-in error checking
  – Use of pseudo-localization upstream
  – Educating developers
  – Use of controlled English
  – Source reuse systems
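Pseudo-localization can be illustrated with a small sketch. This is not the tool used at Microsoft; the accent map, the 30% expansion factor and the set of protected tokens are assumptions. Letters are swapped for accented look-alikes, placeholders and escaped new-lines are copied verbatim, the text is padded to simulate expansion, and brackets make truncation visible on screen.

```python
import re

# Assumed accent map: ASCII letters -> accented look-alikes (same length on both sides).
ACCENTS = str.maketrans("aceinouyACEINOUY", "àçéîñöûýÀÇÉÎÑÖÛÝ")

# Tokens that must survive untouched: %1-style inserts, %s/%d, and escaped new-lines.
PROTECTED = re.compile(r"%\d+|%[sd]|\\n")


def pseudo_localize(source: str, expansion: float = 0.3) -> str:
    """Return a pseudo-localized copy of `source`.

    Letters outside protected tokens are accented, the string is padded with
    '~' to simulate text expansion, and brackets make truncation visible.
    """
    parts = []
    pos = 0
    for match in PROTECTED.finditer(source):
        parts.append(source[pos:match.start()].translate(ACCENTS))
        parts.append(match.group())  # copy the protected token verbatim
        pos = match.end()
    parts.append(source[pos:].translate(ACCENTS))
    padding = "~" * round(len(source) * expansion)
    return "[" + "".join(parts) + padding + "]"


print(pseudo_localize("Open %1"))  # [Öpéñ %1~~]
```

Because the placeholder is left intact, a pseudo build exercises the real formatting code paths while still showing which strings are hard-coded (no accents) or truncated (missing closing bracket).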
The upstream effort
• The upstream effort won't be perfect, due to:
  – Deadlines
  – Tradeoffs
  – The inadequacy of the development languages
  – Certain issues are difficult to bullet-proof (law of diminishing returns)
  – Choose your own favourite
Downstream bullet-proofing
• Downstream bullet-proofing addresses shortcomings of upstream bullet-proofing
• But it also adds further benefits
• The more languages there are, the more it makes sense to invest in it
• What benefit can we realize from doing many languages?
Single resource issues
• Over-localization: The string should not have been translated.
• Buffer limitation: The translation of the resource must not exceed a given number of characters, generally referred to as a string length limitation.
• Illegal characters: Certain characters may not be allowed in the string.
• Dependency: Two resources may have to be translated the same way; in effect, one resource depends on (references) the other.
• Backward compatibility: A special case of dependency; changing a string from one version to the next could break backward compatibility.
• Uniqueness: The string belongs to a group of strings that must all have unique names (translations), for example a list of commands.
• Placeholder over-localized: Some localizable strings contain placeholders. If a placeholder gets localized, the program cannot insert the information into the placeholder and display it.
• Needed string decoration: Some strings have control characters at the beginning or end that should not be localized.
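Several of these issues can be caught with simple per-resource checks. The sketch below assumes each resource is a US source string plus one translation, and that metadata such as `locked` (do not translate), `max_length` and `illegal_chars` is available from some rule store; these parameter names are hypothetical. Dependency, backward compatibility and uniqueness involve more than one resource and are left out here.

```python
import re

# %1-style numbered inserts and C-style %s/%d conversions.
PLACEHOLDER = re.compile(r"%\d+|%[sd]")
# Leading and trailing decoration: runs of escaped control characters (\n, \r, \t).
DECORATION = re.compile(r"^((?:\\[nrt])*).*?((?:\\[nrt])*)$", re.DOTALL)


def check_resource(source, target, *, locked=False, max_length=None, illegal_chars=""):
    """Return the names of single-resource issues found in `target`."""
    issues = []
    if locked and target != source:
        issues.append("over-localization")
    if max_length is not None and len(target) > max_length:
        issues.append("buffer limitation")
    if any(ch in target for ch in illegal_chars):
        issues.append("illegal characters")
    # Same multiset of placeholders; order is a separate question (see printf below).
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(target)):
        issues.append("placeholder over-localized")
    # Leading and trailing decoration must match exactly.
    if DECORATION.match(source).groups() != DECORATION.match(target).groups():
        issues.append("needed string decoration")
    return issues
```

For example, `check_resource("\\n\\nOpen\\n\\n", "Åbn")` returns `["needed string decoration"]`, because the made-up Danish translation dropped the new-lines.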
Examples of single resource issues

Rule | US string | Example loc | Issue description
Over-localization | Common Files | | Might refer to a registry string. Rather than using a localized string, the program looks up the localized name in the registry.
Placeholder | The file %1 could not be opened because %2 | | %1 and %2 are placeholders.
Decoration | \n\nOpen\n\n | | \n is a new-line character, sometimes used in DOS-style applications.
Placeholder | The file %s was last opened on %d %d | On %d%d the file %s was last opened | %s and %d are consumed in order, so their order has to be maintained; reordering them as shown can cause an intermittent memory protection fault.
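The two placeholder rows behave differently. Numbered inserts like %1 and %2 (as used, for instance, by Windows `FormatMessage`) are matched by number, so a translator may reorder them. C-style `%s`/`%d` conversions are consumed in sequence, so the translation must keep their order. A rough order check, assuming only simple printf conversions (positional `%1$s` forms are not handled):

```python
import re

# Simplified printf conversion: optional flags/width/precision, then a type letter.
# "%%" (a literal percent sign) is matched too, but yields an empty group.
PRINTF_SPEC = re.compile(r"%%|%[-+ #0]*\d*(?:\.\d+)?([diouxXeEfgGcs])")


def printf_order(text: str) -> list[str]:
    """Return the conversion letters of printf-style placeholders, in order."""
    return [conv for conv in PRINTF_SPEC.findall(text) if conv]


def order_preserved(source: str, target: str) -> bool:
    """True if the target consumes arguments in the same order as the source."""
    return printf_order(source) == printf_order(target)


us = "The file %s was last opened on %d %d"
loc = "On %d%d the file %s was last opened"
print(printf_order(us), printf_order(loc))  # ['s', 'd', 'd'] ['d', 'd', 's']
print(order_preserved(us, loc))             # False
```

In the reordered string, the first `%d` would receive the `char *` file name argument, which is the kind of mismatch that leads to the fault described in the table.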
The LocVer rule
• Hypothetical example:
• String in Excel: Current Accounts
• A localized string causes a bug; we realize that the translation has to be 30 characters or less
• We create a rule: MaxLength=30
• We apply the rule to all languages
• If other languages break the rule, we will know
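The workflow above (one rule, attached to one resource, checked in every language) might look like this in outline. LocVer's actual rule format is not shown in these slides; the `MaxLength` class, the resource ID and the translations below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxLength:
    """Hypothetical rule: the translation may be at most `limit` characters."""
    limit: int

    def violated_by(self, text: str) -> bool:
        return len(text) > self.limit


# Rules attached to individual resources; the resource ID is made up.
RULES = {"excel.current_accounts": [MaxLength(30)]}


def apply_rules(rules, translations):
    """Check every rule against every language.

    translations: {language: {resource_id: translated text}}
    Returns (language, resource_id, rule) for each violation.
    """
    violations = []
    for language, resources in sorted(translations.items()):
        for resource_id, resource_rules in rules.items():
            text = resources.get(resource_id)
            if text is None:  # resource not present in this language
                continue
            for rule in resource_rules:
                if rule.violated_by(text):
                    violations.append((language, resource_id, rule))
    return violations


# Made-up translations; the second one is deliberately too long.
translations = {
    "da-DK": {"excel.current_accounts": "Løbende konti"},
    "xx-XX": {"excel.current_accounts": "An overly long translation of Current Accounts"},
}
print(apply_rules(RULES, translations))
```

The rule is written once, after the first bug, and every other language is checked against it without further effort; this is the "find once, fix everywhere" benefit.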
The approach
Benefit and cost
• + Find once, fix everywhere
• + Enables reduced testing: no need for regression testing against other languages
• - Management overhead: review new strings, edit rules for changed strings
• - Only viable with a good number of languages
• - Manual effort: either inspect strings as they are added or add rules as bugs occur
Last words on single resource
• Very valuable approach
• But least preferred due to overhead
• Much used
Verification across a group of resources
Groups of resources
• Look for patterns, for example:
  – Placeholders: %1, %2, %3
  – Commands, which might be identifiable by resource name
• Apply a generic rule to them:
  – The rule will automatically cover new resources that match the pattern, and will automatically change if the resource changes
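Generic rules can be sketched as functions that select resources by pattern rather than by ID, so new or changed resources are picked up without editing any rule. The resource naming convention (a `cmd.` prefix for commands) and the Danish sample data are hypothetical.

```python
import re
from collections import Counter

NUMBERED = re.compile(r"%\d+")


def placeholder_rule(resources):
    """Generic rule: any resource whose US string has %1, %2, ... must keep them.

    resources: {resource_id: (us_text, translated_text)} for one language.
    New resources are covered automatically because selection is by pattern.
    """
    failures = []
    for resource_id, (us_text, translated) in resources.items():
        expected = sorted(NUMBERED.findall(us_text))
        if expected and sorted(NUMBERED.findall(translated)) != expected:
            failures.append(resource_id)
    return failures


def unique_commands_rule(resources, prefix="cmd."):
    """Generic rule: resources named like commands must have unique translations.

    The "cmd." naming convention is an assumption for this sketch.
    """
    commands = {rid: t for rid, (_, t) in resources.items() if rid.startswith(prefix)}
    counts = Counter(commands.values())
    return sorted(rid for rid, t in commands.items() if counts[t] > 1)


# Made-up resources; cmd.save deliberately duplicates cmd.close.
resources = {
    "msg.open_failed": ("The file %1 could not be opened because %2",
                        "Filen %1 kunne ikke åbnes"),
    "cmd.open": ("Open", "Åbn"),
    "cmd.close": ("Close", "Luk"),
    "cmd.save": ("Save", "Luk"),
}
print(placeholder_rule(resources))      # ['msg.open_failed']
print(unique_commands_rule(resources))  # ['cmd.close', 'cmd.save']
```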
Groups of resources
• Positive:
  – Less management overhead
  – Automatically adjusts to changes
  – Can become quite advanced
• Limits:
  – Only works if you can identify a pattern
  – But much preferred in those cases
  – Fallback is individual resource rules
SQL queries across a pool of resources
• Works (almost) the same way as LocVer does for functional issues
• Query for things like:
  – US contains 2007, localized doesn’t
  – US contains Microsoft, localized doesn’t
  – Localized contains Xdocs (the code name for the first version of InfoPath until late in development)
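The queries above can be tried against any database holding one row per resource and language. The schema, column names and sample rows below are assumptions, using Python's built-in `sqlite3` module; note that SQLite's `LIKE` is case-insensitive for ASCII letters by default.

```python
import sqlite3

# Assumed schema: one row per resource and language.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (resource_id TEXT, language TEXT, us_text TEXT, loc_text TEXT)"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO resources VALUES (?, ?, ?, ?)",
    [
        ("about.title", "da-DK", "About Microsoft Office 2007", "Om Microsoft Office 2007"),
        ("about.title", "fr-FR", "About Microsoft Office 2007", "À propos de Microsoft Office"),
        ("form.caption", "de-DE", "Form template", "Xdocs-Formularvorlage"),
    ],
)


def us_has_loc_lacks(term):
    """Rows where the US string contains `term` but the localized one does not."""
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT resource_id, language FROM resources "
        "WHERE us_text LIKE ? AND loc_text NOT LIKE ? ORDER BY resource_id, language",
        (pattern, pattern),
    ).fetchall()


def loc_contains(term):
    """Rows where the localized string contains `term`, e.g. an old code name."""
    return conn.execute(
        "SELECT resource_id, language FROM resources WHERE loc_text LIKE ? "
        "ORDER BY resource_id, language",
        (f"%{term}%",),
    ).fetchall()


print(us_has_loc_lacks("2007"))       # [('about.title', 'fr-FR')]
print(us_has_loc_lacks("Microsoft"))  # []
print(loc_contains("Xdocs"))          # [('form.caption', 'de-DE')]
```

Each hit still needs human review: the French row may be a legitimate choice rather than a bug, which is the false-positive trade-off discussed next.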
SQL queries
• Queries are run once or twice
• Loads of false positives
• But may be worthwhile to review
• Gets smarter with the added language dimension
Adding the languages
• The more languages added, the more intelligence can be applied
• Idea: Break the linear dependency between the number of languages and engineering and test costs
• Several possibilities
Patterns across languages
• With 10 languages or more, you can look for patterns per resource, such as:
  – If 9 out of 10 languages start with \n, should #10 also?
  – If 9 out of 10 languages contain “Microsoft”, should #10 also?
  – If 9 out of 10 languages localize two resources the same way, should #10 also?
  – So both linguistic and functional issues will be caught
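The 9-out-of-10 heuristic amounts to a majority vote per resource. A minimal sketch, assuming the translations of one resource are available as a language-to-text mapping; the 90% threshold and the feature functions are illustrative choices, not part of any existing tool.

```python
from collections import Counter


def outliers(translations, feature, threshold=0.9):
    """Flag languages that disagree with a strong majority on `feature`.

    translations: {language: text} for ONE resource.
    feature: function mapping a text to a hashable value, e.g. True/False.
    Returns [] when there is no data or no value reaches `threshold`.
    """
    values = {lang: feature(text) for lang, text in translations.items()}
    if not values:
        return []
    majority, count = Counter(values.values()).most_common(1)[0]
    if count / len(values) < threshold:
        return []  # no clear consensus, so nothing to flag
    return sorted(lang for lang, value in values.items() if value != majority)


def starts_with_newline(text):
    return text.startswith("\\n")  # escaped new-line, as stored in resource files


def contains_microsoft(text):
    return "Microsoft" in text


# Made-up data: ten languages, one of which dropped the leading \n.
texts = {f"lang{i:02d}": "\\nOpen" for i in range(9)}
texts["lang09"] = "Open"
print(outliers(texts, starts_with_newline))  # ['lang09']
```

The same function covers the third question if each language maps to a pair of strings and the feature is `lambda pair: pair[0] == pair[1]`.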
Benefits across languages
• Examples:
  – DAL, Dynamic Auto Layout
  – Hotkey fixer, a way of programmatically assigning hotkeys per language
  – Grouping on code pages, and testing only one language per group
  – Make pseudo-localization understand LocVer rules, test on the pseudo build, and reduce testing on the real languages
  – Controlled English becomes viable
  – Transliteration, MT
  – Test cases versus test design specifications: the introduction of randomness
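The slides do not describe how the hotkey fixer works. One plausible approach, shown here purely as a hypothetical sketch, is a greedy pass over the labels of one dialog that gives each label the first unused letter, preferring word-initial letters. It assumes labels contain no literal ampersands.

```python
def assign_hotkeys(labels):
    """Hypothetical greedy hotkey fixer for the labels of ONE dialog.

    Gives each label a unique '&' access key, preferring the first letter of a
    word, then any other letter. Labels with no free letter get no hotkey.
    Assumes labels contain no literal ampersands (existing '&' are stripped).
    """
    used = set()
    result = []
    for label in labels:
        clean = label.replace("&", "")
        letters = [i for i, ch in enumerate(clean) if ch.isalpha()]
        word_starts = [i for i in letters if i == 0 or clean[i - 1] == " "]
        candidates = word_starts + [i for i in letters if i not in word_starts]
        chosen = next((i for i in candidates if clean[i].lower() not in used), None)
        if chosen is None:
            result.append(clean)
        else:
            used.add(clean[chosen].lower())
            result.append(clean[:chosen] + "&" + clean[chosen:])
    return result


# Made-up Danish labels: "Gem" and "Gem som" would otherwise clash on G.
print(assign_hotkeys(["Åbn", "Gem", "Gem som", "Luk"]))  # ['&Åbn', '&Gem', 'Gem &som', '&Luk']
```

Because the assignment is computed rather than translated, it costs the same for the tenth language as for the first, which is the scaling property the conclusion relies on.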
The end result
Conclusion
• The linear dependency between test and engineering cost and the number of languages can be broken
• At the same time, quality can be systematically improved
• The trick is to design solutions where the work effort, and hence the cost, does not grow linearly with added languages
Conclusion continued
• DAL, SQL queries, generic rules, single rules and the hotkey fixer all scale to extra languages with no extra effort
• But they come with various degrees of overhead
• Learnings across languages can introduce further efficiencies
Questions?
• Thank you for your time!