Character Set and Language Negotiation in Z39.50 Version 3
ZIG Tutorial Stockholm, 10 August 1999
Scope
• Negotiate language of messages
• Negotiate character set of InternationalString
• Z39.50 “message” strings
• Optionally retrieve records in negotiated character set
• Character set negotiation only valid for version 3
Negotiation Basics
• Carried in UserInfo external object in Init
• Similar to option negotiation
  – origin proposes list of possibilities
  – target selects one from list
• Only a single round of negotiation takes place
• Applies to complete session
• Cannot change during session
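The propose-and-select round can be sketched roughly as follows. This is only an illustration: the function name `select_proposal` and the string labels are hypothetical, a real target works on the decoded ASN.1 structures, and treating the order of proposals as a preference order is an assumption of the sketch.

```python
from typing import Optional, Sequence, Set


def select_proposal(proposed: Sequence[str], supported: Set[str]) -> Optional[str]:
    """Target side: pick the first origin proposal the target supports.

    Returns None when nothing matches (no selection is made).
    Assumption: the origin lists its proposals in order of preference.
    """
    for candidate in proposed:
        if candidate in supported:
            return candidate
    return None


# Single round: the origin proposes once in Init, the target answers once,
# and the outcome holds for the whole session.
origin_proposals = ["utf-8", "iso2022-latin1"]  # hypothetical labels
session_charset = select_proposal(origin_proposals, {"iso2022-latin1", "utf-8"})
```

Because there is only one round, the origin has to put every acceptable alternative into its single proposal list.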
UserInfoFormat-charSetandLanguageNegotiation-2 {1 2 840 10003 10 2} DEFINITIONS ::=
BEGIN
CharSetandLanguageNegotiation ::= CHOICE {
    proposal [1] IMPLICIT OriginProposal,
    response [2] IMPLICIT TargetResponse
}
Character Sets
• ISO 2022 is “code page” approach to character set
• ISO 10646 is ~ Unicode
• Different procedures for negotiating character sets:
  – ISO 2022
  – ISO 10646
• Can negotiate “private” character set
OriginProposal ::= SEQUENCE {
    proposedCharSets [1] IMPLICIT SEQUENCE OF CHOICE{
        iso2022 [1] Iso2022,
        iso10646 [2] IMPLICIT Iso10646,
        private [3] PrivateCharacterSet} OPTIONAL,
        -- proposedCharSets must be omitted
        -- if origin proposes version 2
}
ISO 2022
• Supports 7- and 8-bit environments
• “Page” is 96 graphic characters (“G set”) and 32 control characters (“C set”)
• 2 G pages active at any one time (G-Left [hex 20-7F], G-Right [hex A0-FF])
• 2 C sets active (C0 [00-1F], C1 [80-9F])
• Can define 4 G pages and swap into GL, GR as needed
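As a small illustration of the 8-bit code table, the sketch below maps a byte value to the region it falls in, using the standard ISO 2022 layout: C0 at hex 00-1F, GL at 20-7F, C1 at 80-9F and GR at A0-FF. The function name `code_region` is hypothetical.

```python
def code_region(byte: int) -> str:
    """Name the region of the 8-bit ISO 2022 code table a byte falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value (0..255)")
    if byte <= 0x1F:
        return "C0"  # control characters, left half
    if byte <= 0x7F:
        return "GL"  # graphic characters, left half
    if byte <= 0x9F:
        return "C1"  # control characters, right half
    return "GR"      # graphic characters, right half


regions = [code_region(b) for b in (0x1B, 0x41, 0x85, 0xE9)]
```

In a 7-bit environment only the C0 and GL regions exist, so sets meant for the right half have to be reached by shifting.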
ISO 2022 Escapes
• Assign character sets to pages G0-G3, C0-C1
• Make G pages active in GL, GR
• Character sets identified by 1 or 2 characters in the escape sequence
• Character sets and the escape sequences to identify them are registered:
  – http://www.itscj.or.jp/ISO-IR/index.htm
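To make the escape mechanism concrete, here is a minimal sketch that builds the sequences assigning a registered graphic set to one of the pages G0-G3. The helper name `designate` is hypothetical, and only single-byte 94- and 96-character sets are covered. The final bytes `B` (ASCII) and `A` (right half of Latin-1, a 96-character set) are the registered identifiers for those two sets.

```python
ESC = b"\x1b"

# Intermediate bytes selecting the target page for a 94-character set.
DESIGNATE_94 = {0: b"(", 1: b")", 2: b"*", 3: b"+"}
# Intermediate bytes for a 96-character set (G0 cannot hold one).
DESIGNATE_96 = {1: b"-", 2: b".", 3: b"/"}


def designate(page: int, final: bytes, size: int = 94) -> bytes:
    """Build the escape sequence assigning a registered set to G0..G3."""
    table = DESIGNATE_94 if size == 94 else DESIGNATE_96
    if size not in (94, 96) or page not in table:
        raise ValueError(f"cannot designate a {size}-set to G{page}")
    return ESC + table[page] + final


ascii_to_g0 = designate(0, b"B")            # ESC ( B
latin1_to_g1 = designate(1, b"A", size=96)  # ESC - A
```

Negotiating the initial assignments in Init means neither side has to send such sequences at the start of the session.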
ISO 2022 negotiation
• Negotiate initial assignment of G0-G3
• Negotiate initial assignment of GL, GR
• Sequence of origin proposals for all of these
• Target response chooses one of these proposals
• In absence of negotiation must assume IRV in GL with GR undefined
  – no characters above hex 7F
Iso2022 ::= CHOICE{
    originProposal [1] IMPLICIT SEQUENCE{
        proposedEnvironment [0] Environment OPTIONAL,
        proposedSets [1] IMPLICIT SEQUENCE OF INTEGER,
        proposedInitialSets [2] IMPLICIT SEQUENCE OF InitialSet,
        proposedLeftAndRight [3] IMPLICIT LeftAndRight
    },
}
Environment ::= CHOICE{
    sevenBit [1] IMPLICIT NULL,
    eightBit [2] IMPLICIT NULL
}
InitialSet ::= SEQUENCE{
    g0 [0] IMPLICIT INTEGER,
    g1 [1] IMPLICIT INTEGER,
    g2 [2] IMPLICIT INTEGER,
    g3 [3] IMPLICIT INTEGER,
    c0 [4] IMPLICIT INTEGER,
    c1 [5] IMPLICIT INTEGER
}
LeftAndRight ::= SEQUENCE{
    gLeft [3] IMPLICIT INTEGER
        {g0 (0), g1 (1), g2 (2), g3 (3)},
    gRight [4] IMPLICIT INTEGER
        {g1 (1), g2 (2), g3 (3)}
}
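One way to hold these two structures in a program is sketched below. The class and field names are hypothetical Python renderings of the ASN.1 names, and rejecting values outside the named numbers is an assumption of the sketch, since ASN.1 named values do not by themselves constrain an INTEGER. The sample values are the ones used in the tutorial's Init example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitialSet:
    """Registration numbers of the sets initially assigned to each page."""
    g0: int
    g1: int
    g2: int
    g3: int
    c0: int
    c1: int


@dataclass(frozen=True)
class LeftAndRight:
    """Which G pages are initially active in GL and GR."""
    g_left: int
    g_right: int

    def __post_init__(self) -> None:
        # Assumption: only the named values from the ASN.1 are accepted.
        if self.g_left not in (0, 1, 2, 3):
            raise ValueError("gLeft must name g0..g3 (0..3)")
        if self.g_right not in (1, 2, 3):
            raise ValueError("gRight must name g1..g3 (1..3)")


initial = InitialSet(g0=2, g1=1001, g2=1001, g3=1001, c0=1, c1=67)
active = LeftAndRight(g_left=0, g_right=1)
```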
ISO 10646
• Defines a single set of 2^32 possible characters (4+ billion !!!)
• Divided into “planes” of 2^16 characters
• Only first plane currently has characters defined: “Basic Multilingual Plane” (BMP)
• BMP is co-terminous with Unicode
• Z39.50 negotiates ISO 10646, not Unicode per se
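The arithmetic behind these figures is small enough to spell out. In the sketch below the names `plane_of` and `in_bmp` are hypothetical, and planes are simply counted consecutively from zero across the code space.

```python
CODE_SPACE = 2 ** 32  # size of the 4-octet code space
PLANE_SIZE = 2 ** 16  # characters per plane


def plane_of(code_point: int) -> int:
    """Index of the plane holding a code point, counting from plane 0."""
    if not 0 <= code_point < CODE_SPACE:
        raise ValueError("code point outside the 32-bit code space")
    return code_point // PLANE_SIZE


def in_bmp(code_point: int) -> bool:
    """True when the code point lies in the Basic Multilingual Plane."""
    return plane_of(code_point) == 0


number_of_planes = CODE_SPACE // PLANE_SIZE
```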
Unicode Encoding Rules
• UCS-4: 32-bit characters
• UCS-2: 16-bit character encoding, limited to the BMP
• UTF-16: like UCS-2, with “surrogate” mechanism for characters in planes above 0
• UTF-8: 8-bit character encoding, with variable length multi-byte characters for all characters other than first 128
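The surrogate mechanism, as defined for UTF-16, splits a character from a plane above 0 into two 16-bit units. A minimal sketch follows; the function name `utf16_code_units` is hypothetical, and the result can be checked against Python's built-in `utf-16-be` codec.

```python
from typing import List


def utf16_code_units(code_point: int) -> List[int]:
    """Return the 16-bit code units that represent one code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a code point that 16-bit units can represent")
    if code_point < 0x10000:
        return [code_point]            # BMP: one unit, same value
    offset = code_point - 0x10000      # 20 bits to split across two units
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return [high, low]


units = utf16_code_units(0x1D11E)      # a character in plane 1
as_bytes = b"".join(u.to_bytes(2, "big") for u in units)
```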
UTF-8
• Intended to be a “file system safe” encoding
• Guarantees that every byte with value below hex 80 is an ASCII character, including hex 00
• All characters with values above 7F are encoded as 2, 3 or 4 bytes
• Transformation between UTF-8 and UCS-2 is simple and efficient
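The byte counts follow directly from the value of the character. The sketch below (the function name `utf8_length` is hypothetical) gives the length for code points up to hex 10FFFF and can be compared with what Python's own `utf-8` codec produces.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point up to hex 10FFFF."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range for this sketch")
    if code_point < 0x80:
        return 1  # ASCII range, byte value equals character value
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3  # rest of the BMP
    return 4      # planes above 0


samples = [0x00, 0x41, 0xE9, 0x20AC, 0x1D11E]
lengths = [utf8_length(cp) for cp in samples]
```

Pure ASCII text therefore has the same byte sequence in UTF-8 as in ASCII.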
Negotiating ISO 10646
• Specify the “character repertoire” (i.e. the subset of the full UCS that will be used)
• Specify the encoding
• Handled by object identifiers
• For Unicode:
  – character repertoire is the full BMP
  – encoding can be UTF-16 or UTF-8
Iso10646 ::= SEQUENCE{
    collections [1] IMPLICIT OBJECT IDENTIFIER,
        -- oid of form 1.0.10646.1.implementationLevel
        --     .repertoireSubset.arc1.arc2. ....
        -- [use 1.0.10646.1.2.1.3 for Unicode]
    encodingLevel [2] IMPLICIT OBJECT IDENTIFIER
        -- oid of form 1.0.10646.1.0.form
        -- where value of 'form' is 2, 4, 5, or 8
        -- for ucs-2, ucs-4, utf-16, utf-8
}
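A small sketch of how an origin might build the encodingLevel identifier from an encoding name. The mapping of names to `form` values comes from the comment in the ASN.1. The prefix `1.0.10646.1.0` is taken from the Init example in this tutorial (which proposes `{1 0 10646 1 0 8}` for UTF-8) and should be treated as an assumption; the names `ENCODING_FORMS` and `encoding_level_oid` are hypothetical.

```python
# 'form' values listed in the ASN.1 comment.
ENCODING_FORMS = {"ucs-2": 2, "ucs-4": 4, "utf-16": 5, "utf-8": 8}

# Assumption: prefix as used in the tutorial's Init example.
ENCODING_LEVEL_PREFIX = "1.0.10646.1.0"


def encoding_level_oid(name: str) -> str:
    """Return the dotted encodingLevel OID for an encoding name."""
    try:
        form = ENCODING_FORMS[name.lower()]
    except KeyError:
        raise ValueError(f"no ISO 10646 encoding form known for {name!r}")
    return f"{ENCODING_LEVEL_PREFIX}.{form}"


utf8_oid = encoding_level_oid("UTF-8")
```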
Language Negotiation
• Instances of InternationalString are either “message” or “name”
• Language negotiation applies to “message strings”
• Origin proposes one or more language codes
• Codes from Z39.53
• Target may choose 1 of these proposed codes
-- OriginProposal (continued)
    proposedLanguages [2] IMPLICIT SEQUENCE OF LanguageCode OPTIONAL,
    recordsInSelectedCharSets [3] IMPLICIT BOOLEAN OPTIONAL
        -- default 'false'
initRequest { -- SEQUENCE
  referenceId -- "9" --,
  protocolVersion 'e0'H,
  options 'eda2'H,
  preferredMessageSize 15000,
  exceptionalRecordSize 15000,
  implementationName -- "Amicus Professional Workstation" --,
  implementationVersion -- "3.0" --,
  otherInfo { -- SEQUENCE OF
    { -- SEQUENCE
      category { -- SEQUENCE
        categoryTypeId {1 2 840 10003 10 2},
        categoryValue 0
      },
      information externallyDefinedInfo { -- SEQUENCE
        direct-reference {1 2 840 10003 10 2},
        encoding single-ASN1-type proposal { -- SEQUENCE
          proposedCharSets { -- SEQUENCE OF
            iso10646 { -- SEQUENCE
              collections {1 0 10646 1 2 1 3},
              encodingLevel {1 0 10646 1 0 8}
            },
            iso2022 originProposal { -- SEQUENCE
              proposedEnvironment eightBit NULL,
              proposedSets { -- SEQUENCE OF
                2, 1000, 1001, 1002, 1003, 1, 67
              },
              proposedInitialSets { -- SEQUENCE OF
                { -- SEQUENCE
                  g0 2, g1 1001, g2 1001, g3 1001, c0 1, c1 67
                }
              },
              proposedLeftAndRight { -- SEQUENCE
                gLeft 0,
                gRight 1
              }
            }
          },
          proposedlanguages { -- SEQUENCE OF
            -- "ENG"
          },
          recordsInSelectedCharSets TRUE
        }
      }
    }
  }
}
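The `protocolVersion` and `options` fields in this request are ASN.1 BIT STRINGs written in hex. A short sketch for reading them is given below; the function name `set_bits` is hypothetical. In an ASN.1 BIT STRING, bit 0 is the most significant bit of the first octet, and in Z39.50 bits 0, 1 and 2 of `protocolVersion` stand for versions 1, 2 and 3.

```python
from typing import List


def set_bits(hex_digits: str) -> List[int]:
    """List the positions of the bits set in a hex-encoded BIT STRING.

    Bit 0 is the most significant bit of the first octet.
    """
    positions = []
    for byte_index, byte in enumerate(bytes.fromhex(hex_digits)):
        for offset in range(8):
            if byte & (0x80 >> offset):
                positions.append(byte_index * 8 + offset)
    return positions


version_bits = set_bits("e0")    # protocolVersion 'e0'H
option_bits = set_bits("eda2")   # options 'eda2'H
proposes_version_3 = 2 in version_bits
```

Here the origin proposes versions 1 through 3, which is the precondition for the character set proposal carried in `otherInfo`.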