TRANSCRIPT
Internationalized Domain Names (IDNs)
Yale A2K2 Conference
New Haven, USA
April 27, 2007
Ram Mohan [email protected]
Building a Sustainable Framework For A Multilingual Internet
April 27, 2007 Yale Access 2 Knowledge Conference Ram Mohan
Agenda
• Role of IDNs in Access To Knowledge
• Multi-lingual Internet Basics
  – A special case study in India
• Current state of Technical Readiness
• An IDN Policy Framework
  – Basics
  – Technical Principles
  – Policy Principles
The Language Barrier Limits Access to the Internet
• The language barrier limits Internet usage
• Domain Names are the single most important way to locate resources on the Internet
• 65% of the world’s Internet users don’t speak English
• In China, 90% of Internet users prefer to access content in their local languages [1]
• Software applications now integrate websites/email seamlessly
http://glreach.com/globstats/
[1] CNNIC Statistical Survey, 2005
Technology can help
[Chart: "Total" by month, January 2005 through July 2006; vertical axis 0 to 3500. Annotated events: German IDN launch; IE7B1 (27-July-05); IE7B2 (24-Apr-06); Additional IDNs launch; IE7B3 (29-Jun-06).]
Source: PIR ICANN Reports
Utility of IDNs
• Makes the Internet more friendly to non-English speakers
• Provides more accessibility to applications like Email, FTP, etc.
• It is the most effective way to popularize use of the Internet in non-English speaking communities
• Guarantees cultural diversity and protects the special interests of people in different regions
• Allows national cultures and under-represented languages to stay alive
Multi-lingual Internet Basics
• Internationalized Domain Names (IDNs)
  – Domain names represented in characters used in local languages
• Allows entire domain name to be represented in a local language character set
  – example.日本, or 日本.日本
• These names have to…
  – Work everywhere
  – Be backwards compatible
  – Not break application software
  – Support languages appropriately
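As a rough illustration of the "backwards compatible" requirement, the sketch below uses Python's built-in `idna` codec (which implements the 2003 IDNA standard) to turn a Unicode name into the ASCII form the DNS carries, and back again. The helper names `to_ascii_name` and `to_unicode_name` are assumptions made for this example and do not come from the talk.

```python
def to_ascii_name(name: str) -> str:
    """Convert each label of a domain name to its ASCII-compatible form."""
    # The built-in "idna" codec splits on dots and encodes label by label.
    return name.encode("idna").decode("ascii")


def to_unicode_name(name: str) -> str:
    """Convert an ASCII-compatible name back to Unicode for display."""
    return name.encode("ascii").decode("idna")


if __name__ == "__main__":
    for name in ["example.com", "example.日本", "日本.日本"]:
        ascii_name = to_ascii_name(name)
        print(name, "->", ascii_name, "->", to_unicode_name(ascii_name))
```

Plain ASCII names pass through unchanged, which is what allows existing resolvers and applications to keep working while only the non-ASCII labels are rewritten.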
Access for India’s billion people
• Total Population 1.002 billion (2001 Census)
• 22 Official Languages
  – Devanagari script based (North Indian):
    • Hindi, Marathi, Sanskrit, Kashmiri, Sindhi, Nepali, Manipuri
  – Dravidian script based (South Indian):
    • Tamil, Telugu, Kannada, Malayalam, Konkani
  – Arabic Script Based: Urdu
  – Some Languages representable in more than one script
  – Other script basis: Bengali, Oriya, Gujarati, Punjabi, Assamese
• Worldwide Audience:
  – Hindi - 400 Million Speakers
  – Bengali - 200 Million Speakers
  – Tamil - 60 Million Speakers
  – Telugu - 70 Million Speakers
• Movies released in 15 languages
• Schools teach in 58 different languages
• Radio programs broadcast in 71 languages
• Newspapers publish in 87 languages
And … one Internet
Building Indian IDNs
• China has 1.6 billion Chinese language speakers … with two major scripts and shared characters with Japanese & Korean language communities
• About 12 Indian languages are based on Devanagari scripts … leading to potential variant issues
• A “many-to-many” problem for India
  – Multiple languages share common scripts
  – Multiple scripts used in multiple languages
• Other Challenges:
  – Bi-directional text
  – Multiple diacritic positioning
  – Word Breaking
Adequate Standards Exist
• IETF Open Standards published 2002-2004:
  – RFC 3490, 3491, 3492
  – RFC 3454
  – RFC 3743
• ICANN IDN Guidelines
• China-Japan-Korea (CJK) common CDNC language tables provide an example of how to build community support
• Indic Scripts – standards creation effort provides new learning
• Successful root server tests for IDNs at the top level in Dec 2006
Much of the “technology” & “protocol” part of internationalizing domain names is complete
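The division of labour between these RFCs can be sketched with Python's standard library, which implements the 2003 versions: Nameprep (RFC 3491, a Stringprep profile per RFC 3454) normalizes a label, Punycode (RFC 3492) encodes it as ASCII, and IDNA (RFC 3490) prepends the `xn--` prefix. The function name `to_a_label` is made up for this illustration, and the sketch deliberately skips the error handling and ASCII-only shortcuts of the full ToASCII operation.

```python
from encodings import idna  # stdlib module behind the "idna" codec

ACE_PREFIX = "xn--"  # ASCII-compatible encoding prefix defined by RFC 3490


def to_a_label(label: str) -> str:
    """Encode one non-ASCII label, making each standard's step explicit."""
    # RFC 3491 (Nameprep): case mapping, NFKC normalization, prohibited checks.
    prepared = idna.nameprep(label)
    # RFC 3492 (Punycode): represent the Unicode label using ASCII only.
    encoded = prepared.encode("punycode").decode("ascii")
    # RFC 3490 (IDNA): mark the result so applications can recognize it.
    return ACE_PREFIX + encoded


if __name__ == "__main__":
    label = "Häst"
    print(to_a_label(label))
    # The one-step stdlib call should agree with the manual pipeline.
    print(idna.ToASCII(label).decode("ascii"))
```

Because Nameprep runs first, labels that differ only in case end up as the same ASCII string.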
Domain Name Technical Constraints Must be considered
• Normal Unicode-Punycode conversion
  – flod18häst (.xn--flod18hst-12a)
• Performance with a 63-character long TLD string
  – .hippo18potamushippo18potamushippo18potamushippo18po
• Right to left, embedded characters with opposing directional properties
• Left to right script with sophisticated shaping properties
• Non-alphabetic script
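A few of these constraints can be checked mechanically. The sketch below is only an illustration, not the procedure used in the root server tests: it converts a label with Python's built-in `idna` codec (which rejects labels longer than the DNS limit of 63 octets) and flags labels that mix right-to-left and left-to-right characters. The function names `a_label` and `mixes_directions` are assumptions.

```python
import unicodedata


def a_label(label: str) -> str:
    """Return the ASCII-compatible form of a label, or raise ValueError."""
    try:
        return label.encode("idna").decode("ascii")
    except UnicodeError as exc:
        # The codec refuses empty or over-long labels and prohibited characters.
        raise ValueError(f"cannot encode {label!r}: {exc}") from exc


def mixes_directions(label: str) -> bool:
    """True if the label contains both strong RTL and strong LTR characters."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    has_rtl = bool(classes & {"R", "AL"})  # Hebrew-like and Arabic-like letters
    has_ltr = "L" in classes
    return has_rtl and has_ltr


if __name__ == "__main__":
    print(a_label("flod18häst"))         # the slide gives xn--flod18hst-12a
    print(len(a_label("a" * 63)))        # longest label the DNS allows
    print(mixes_directions("abc\u05d0")) # Latin letters followed by Hebrew alef
```

Shaping behaviour and non-alphabetic scripts are rendering concerns and are not something a check like this can cover.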
Introducing an IDN Policy Framework
• Provides clarity to IDN issues
• Ensures that registries give due consideration to key elements of IDN policies
• Allows governments or other authorities to follow / evaluate the process for IDN deployment at the ccTLD
  – Involvement of government / authorities at different stages of process (e.g. List of Valid Characters, Contextual Rules, Variants)
• Provides language communities, civil society, businesses input into policy creation prior to rollout
• IDN Policies & Registry System
  – Policy decisions may have profound technical implications for the registry system
Founding technical principles in IDN implementation
• Build Character Inclusion Table (List of valid characters) ... governments, linguists, technologists needed
• Variant Mapping Consideration ... allow only one form of character set for IDN
• Contextual Rules
  – Minimum & Maximum Length
  – Prohibited prefixes or suffixes
  – Potential contextual rules: prohibited character sequences
• Register and operate the Internationalized TLD in the root DNS Server in the form of IDNA Punycode
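To make these principles concrete, here is a minimal sketch of how a registry might check a requested label against a character inclusion table, a variant mapping, and a few contextual rules. Every table entry, limit, and name below is a hypothetical placeholder chosen for illustration; real tables would come out of the work with governments, linguists, and technologists.

```python
from dataclasses import dataclass, field


@dataclass
class LabelPolicy:
    """Hypothetical registration policy for one language table."""
    allowed: set[str]                                        # character inclusion table
    variants: dict[str, str] = field(default_factory=dict)  # variant -> preferred form
    min_len: int = 1
    max_len: int = 63            # counted in Unicode characters, not Punycode octets
    bad_prefixes: tuple[str, ...] = ("-",)
    bad_suffixes: tuple[str, ...] = ("-",)
    bad_sequences: tuple[str, ...] = ("--",)

    def canonical(self, label: str) -> str:
        """Fold variant characters so that only one form can be registered."""
        return "".join(self.variants.get(ch, ch) for ch in label)

    def problems(self, label: str) -> list[str]:
        """Return the rule violations for a label; an empty list means OK."""
        label = self.canonical(label)
        found = []
        if not self.min_len <= len(label) <= self.max_len:
            found.append("length out of range")
        extra = sorted(set(label) - self.allowed)
        if extra:
            found.append("characters not in inclusion table: " + "".join(extra))
        if label.startswith(self.bad_prefixes):
            found.append("prohibited prefix")
        if label.endswith(self.bad_suffixes):
            found.append("prohibited suffix")
        if any(seq in label for seq in self.bad_sequences):
            found.append("prohibited character sequence")
        return found


if __name__ == "__main__":
    # Illustrative table only: three Chinese characters plus the hyphen.
    policy = LabelPolicy(allowed=set("中国网-"), variants={"國": "国", "網": "网"})
    for candidate in ["中國", "-中国", "中国x"]:
        print(candidate, policy.canonical(candidate), policy.problems(candidate))
```

Only a label that passes such checks would then be converted to its IDNA Punycode form for the zone.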
Founding policy principles in IDN implementation
1. Avoid ASCII-Squatting
2. Consult with government for Geo-political Impact of new top level domain
3. Actively solicit Language Community Input for evaluation of new IDN gTLD Strings
4. One String per new IDN gTLD
5. Limit Variant Confusion and Collision
6. Limit Confusingly Similar Strings
Founding policy principles (cont.)
7. (No) Priority Rights for new gTLD strings and new domain names
8. Approach Aliasing as a Policy matter
9. Adhere to a Single Script (ASCII exception, other restrictions)
10. UDRP sufficient for dispute resolution in new IDN TLDs
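Principle 9 lends itself to a simple automated check. The sketch below is a heuristic rather than a standards-based implementation: Python's standard library does not expose the Unicode Script property, so the script is approximated here by the first word of each character's Unicode name (for example DEVANAGARI or TAMIL). The function names, and the choice to treat ASCII as always permitted, are assumptions made for illustration.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Approximate a character's script from its Unicode name (heuristic)."""
    if ch.isascii():
        return "ASCII"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(label: str) -> set[str]:
    """Collect the approximate scripts used anywhere in the label."""
    return {script_of(ch) for ch in label}


def is_single_script(label: str, ascii_exception: bool = True) -> bool:
    """True if the label uses one script, optionally ignoring ASCII."""
    scripts = scripts_in(label)
    if ascii_exception:
        scripts.discard("ASCII")
    return len(scripts) <= 1


if __name__ == "__main__":
    print(is_single_script("भारत"))            # Devanagari only
    print(is_single_script("भारत123"))         # Devanagari plus ASCII digits
    print(is_single_script("\u092d\u0ba4"))    # Devanagari BHA plus Tamil TA
```

The "other restrictions" part of the principle matters in practice: this heuristic would, for instance, treat a Japanese label that combines kanji and hiragana as two scripts, and the ASCII exception lets ASCII letters sit next to look-alike letters from another script.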
General IDN TLD Rollout Principles
• Retain global uniqueness of the TLD system
  – Domain names remain unique and unambiguous
• Maintain interoperability of the TLD system
  – Domain names work the same way regardless of the geography they are accessed from
  – .भारत needs to point applications and users to the same place regardless of accessing the domain from India, UK or Greece
• Promote “Future-Proof” solutions
  – Define Unicode characters to be allowed
  – Provides ability for adding new languages, new characters far in the future
• Avoid User Confusion
• Promote multi-stakeholder involvement
Let’s make it happen
Ram Mohan [email protected]
Building a Sustainable Framework For A Multilingual Internet