Character Set and Language Negotiation in Z39.50 Version 3

ZIG Tutorial Stockholm, 10 August 1999

Scope
• Negotiate language of messages
• Negotiate character set of InternationalString
• Z39.50 “message” strings
• Optionally retrieve records in negotiated character set
• Character set negotiation only valid for version 3

Negotiation Basics
• Carried in UserInfo external object in Init
• Similar to option negotiation
  – origin proposes list of possibilities
  – target selects one from list
• Only a single round of negotiation takes place
• Applies to complete session
• Cannot change during session
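The propose-and-select exchange described above can be sketched in a few lines of Python. This is only an illustration: the function name `select_proposal`, the string labels, and the "first match in the origin's order" policy are assumptions, not part of Z39.50, which leaves the choice to the target.

```python
from typing import AbstractSet, Optional, Sequence


def select_proposal(proposals: Sequence[str],
                    supported: AbstractSet[str]) -> Optional[str]:
    """Return the first origin proposal the target supports, or None.

    Hypothetical policy: honor the origin's order of preference.
    """
    for candidate in proposals:
        if candidate in supported:
            return candidate
    return None


# Hypothetical labels standing in for the real ASN.1 proposal structures.
origin_proposals = ["iso10646-utf8", "iso2022-latin1"]
target_supported = {"iso2022-latin1", "iso10646-utf16"}

# Single round: the origin proposes once, the target answers once,
# and the result then holds for the whole session.
selected = select_proposal(origin_proposals, target_supported)
print(selected)
```

Because there is only one round, an origin that gets no selection back has to fall back to the defaults rather than retry with a new list.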

UserInfoFormat-charSetandLanguageNegotiation-2 {1 2 840 10003 10 2} DEFINITIONS ::=
BEGIN
CharSetandLanguageNegotiation ::= CHOICE {
  proposal [1] IMPLICIT OriginProposal,
  response [2] IMPLICIT TargetResponse
}

Character Sets
• ISO 2022 is “code page” approach to character set
• ISO 10646 is ~ Unicode
• Different procedures for negotiating character sets:
  – ISO 2022
  – ISO 10646
• Can negotiate “private” character set

OriginProposal ::= SEQUENCE {
  proposedCharSets [1] IMPLICIT SEQUENCE OF CHOICE{
      iso2022  [1] Iso2022,
      iso10646 [2] IMPLICIT Iso10646,
      private  [3] PrivateCharacterSet} OPTIONAL,
      -- proposedCharSets must be omitted
      -- if origin proposes version 2
  -- proposedLanguages and recordsInSelectedCharSets
  -- are shown on the Language Negotiation slides
}

ISO 2022
• Supports 7- and 8-bit environments
• “Page” is 96 graphic characters (“G set”) and 32 control characters (“C set”)
• 2 G pages active at any one time (G-Left [hex 20-7F], G-Right [hex A0-FF])
• 2 C sets active (C0 [00-1F], C1 [80-9F])
• Can define 4 G pages and swap into GL, GR as needed
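As a small illustration of the 8-bit code table layout, the sketch below maps a byte value to the area it falls in: C0 is hex 00-1F, GL is hex 20-7F, C1 is hex 80-9F, and GR is hex A0-FF. The function name `code_area` is made up for this example.

```python
def code_area(byte: int) -> str:
    """Name the ISO 2022 8-bit code table area that a byte value falls in."""
    if not 0x00 <= byte <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    if byte <= 0x1F:
        return "C0"   # control set 0
    if byte <= 0x7F:
        return "GL"   # graphic characters, left half
    if byte <= 0x9F:
        return "C1"   # control set 1
    return "GR"       # graphic characters, right half


for value in (0x1B, 0x41, 0x85, 0xE9):
    print(f"{value:02X} -> {code_area(value)}")
```

Which character a GL or GR byte actually denotes depends on the G page currently swapped into that half, which is what the negotiation has to settle.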

ISO 2022 Escapes
• Assign character sets to pages G0-G3, C0-C1
• Make G pages active in GL, GR
• Character sets identified by 1 or 2 characters in the escape sequence
• Character sets and the escape sequences to identify them are registered:
  – http://www.itscj.or.jp/ISO-IR/index.htm
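To make the escape mechanism concrete, here is a minimal sketch that builds the three-byte sequences ISO 2022 uses to assign a single-byte graphic set to G0-G3. The helper name `designate` is hypothetical. The intermediate bytes are those defined by ISO 2022, and the two final bytes used (`B` for ASCII, `A` for the right half of Latin-1) come from the ISO-IR registry. Multi-byte sets, C0/C1 designation, and the shift functions that make a page active in GL or GR are not covered.

```python
ESC = 0x1B

# Intermediate byte selecting the target page for a 94-character set.
DESIGNATE_94 = {"G0": 0x28, "G1": 0x29, "G2": 0x2A, "G3": 0x2B}
# 96-character sets cannot be assigned to G0.
DESIGNATE_96 = {"G1": 0x2D, "G2": 0x2E, "G3": 0x2F}


def designate(page: str, final: str, size: int = 94) -> bytes:
    """Build the escape sequence assigning a registered set to a G page."""
    if size == 94:
        table = DESIGNATE_94
    elif size == 96:
        table = DESIGNATE_96
    else:
        raise ValueError("size must be 94 or 96")
    if page not in table:
        raise ValueError(f"{page} cannot hold a {size}-character set")
    return bytes([ESC, table[page], ord(final)])


print(designate("G0", "B"))        # ASCII into G0
print(designate("G1", "A", 96))    # Latin-1 right half into G1
```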

ISO 2022 negotiation
• Negotiate initial assignment of G0-G3
• Negotiate initial assignment of GL, GR
• Sequence of origin proposals for all of these
• Target response chooses one of these proposals
• In absence of negotiation must assume IRV (the ISO 646 International Reference Version) in GL with GR undefined
  – no characters above hex 7F

Iso2022 ::= CHOICE{
  originProposal [1] IMPLICIT SEQUENCE{
    proposedEnvironment  [0] Environment OPTIONAL,
    proposedSets         [1] IMPLICIT SEQUENCE OF INTEGER,
    proposedInitialSets  [2] IMPLICIT SEQUENCE OF InitialSet,
    proposedLeftAndRight [3] IMPLICIT LeftAndRight
  },
  -- remaining alternative not shown on the slide
}

Environment ::= CHOICE{
  sevenBit [1] IMPLICIT NULL,
  eightBit [2] IMPLICIT NULL
}

InitialSet ::= SEQUENCE{
  g0 [0] IMPLICIT INTEGER,
  g1 [1] IMPLICIT INTEGER,
  g2 [2] IMPLICIT INTEGER,
  g3 [3] IMPLICIT INTEGER,
  c0 [4] IMPLICIT INTEGER,
  c1 [5] IMPLICIT INTEGER
}

LeftAndRight ::= SEQUENCE{
  gLeft  [3] IMPLICIT INTEGER
             {g0 (0), g1 (1), g2 (2), g3 (3)},
  gRight [4] IMPLICIT INTEGER
             {g1 (1), g2 (2), g3 (3)}
}
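One detail of `LeftAndRight` is easy to miss: the named values allow any of G0-G3 in GL, but only G1-G3 in GR. The sketch below models that as a Python dataclass. The class and field names are illustrative only and do not come from any Z39.50 toolkit.

```python
from dataclasses import dataclass

# Named numbers from the ASN.1 definition above.
G_LEFT_VALUES = {0: "g0", 1: "g1", 2: "g2", 3: "g3"}
G_RIGHT_VALUES = {1: "g1", 2: "g2", 3: "g3"}


@dataclass(frozen=True)
class LeftAndRight:
    """Which G page is initially active in GL and in GR."""
    g_left: int
    g_right: int

    def __post_init__(self) -> None:
        if self.g_left not in G_LEFT_VALUES:
            raise ValueError("gLeft must be one of g0..g3 (0-3)")
        if self.g_right not in G_RIGHT_VALUES:
            raise ValueError("gRight must be one of g1..g3 (1-3)")


# G0 in GL and G1 in GR, a typical 8-bit arrangement.
initial = LeftAndRight(g_left=0, g_right=1)
print(G_LEFT_VALUES[initial.g_left], G_RIGHT_VALUES[initial.g_right])
```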

ISO 10646
• Defines a single set of 2^32 possible characters (4+ billion !!!)
• Divided into “planes” of 2^16 characters
• Only first plane currently has characters defined: “Basic Multilingual Plane” (BMP)
• BMP is coterminous with Unicode
• Z39.50 negotiates ISO 10646, not Unicode per se

Unicode Encoding Rules
• UCS-4: 32-bit characters
• UCS-2: 16-bit character encoding, limited to the BMP
• UTF-16: like UCS-2, with “surrogate” mechanism for characters in planes above 0
• UTF-8: 8-bit character encoding, with variable-length multi-byte characters for all characters other than the first 128
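The surrogate mechanism can be shown with a few lines of arithmetic. The sketch below splits a code point from a plane above 0 into a pair of 16-bit units and checks the result against Python's built-in `utf-16-be` codec. The helper name `surrogate_pair` and the sample character U+10400 are chosen only for illustration.

```python
from typing import Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into high and low 16-bit surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("surrogates only encode U+10000..U+10FFFF")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


sample = "\U00010400"                  # a character in plane 1
high, low = surrogate_pair(ord(sample))
print(hex(high), hex(low))             # 0xd801 0xdc00

# The built-in codec produces the same two 16-bit units.
assert sample.encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")

# A BMP character needs one 16-bit unit; the 32-bit form always needs four bytes.
print(len("A".encode("utf-16-be")), len(sample.encode("utf-32-be")))   # 2 4
```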

UTF-8
• Intended to be a “file system safe” encoding
• Guarantees that every byte with value below hex 80 is an ASCII character, including hex 00
• All characters with values above 7F are encoded as 2, 3 or 4 bytes
• Transformation between UTF-8 and UCS-2 is simple and efficient
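These properties are easy to check with Python's built-in codecs. The sample characters below are arbitrary; the sketch shows the encoded length growing with the code point, that no byte of a multi-byte sequence falls in the ASCII range, and a round trip between UTF-8 and a 16-bit encoding.

```python
def utf8_length(character: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(character.encode("utf-8"))


# U+0041, U+00E9, U+20AC, U+10400
for character in ("A", "\u00e9", "\u20ac", "\U00010400"):
    print(f"U+{ord(character):04X}: {utf8_length(character)} byte(s)")

# ASCII text is byte-for-byte unchanged.
assert "Z39.50".encode("utf-8") == b"Z39.50"

# Every byte of a multi-byte sequence is hex 80 or above, so such a byte
# can never be mistaken for an ASCII character such as "/" or NUL.
assert all(byte >= 0x80 for byte in "\u00e9\u20ac".encode("utf-8"))

# Round trip: UTF-8 -> 16-bit units -> UTF-8.
utf8_bytes = "caf\u00e9".encode("utf-8")
sixteen_bit = utf8_bytes.decode("utf-8").encode("utf-16-be")
assert sixteen_bit.decode("utf-16-be").encode("utf-8") == utf8_bytes
```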

Negotiating ISO 10646
• Specify the “character repertoire” (i.e. the subset of the full UCS that will be used)
• Specify the encoding
• Handled by object identifiers
• For Unicode:
  – character repertoire is the full BMP
  – encoding can be UTF-16 or UTF-8

Iso10646 ::= SEQUENCE{
  collections   [1] IMPLICIT OBJECT IDENTIFIER,
      -- oid of form 1.0.10646.implementationLevel
      -- .repertoireSubset.arc1.arc2. ....
      -- [use 1.0.10646.1.2.1.3 for Unicode]
  encodingLevel [2] IMPLICIT OBJECT IDENTIFIER
      -- oid of form 1.0.10646.1.0.form
      -- where value of 'form' is 2, 4, 5, or 8
      -- for ucs-2, ucs-4, utf-16, utf-8
}
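A small helper makes the `encodingLevel` convention concrete. In this sketch the `form` values (2, 4, 5, 8) are taken from the comment above, while the prefix 1.0.10646.1.0 follows the sample initRequest later in this deck, which sends `{1 0 10646 1 0 8}` for UTF-8. The function and table names are hypothetical.

```python
from typing import Tuple

# 'form' arc for each encoding, as listed in the ASN.1 comment.
ENCODING_FORMS = {"ucs-2": 2, "ucs-4": 4, "utf-16": 5, "utf-8": 8}

# Assumed prefix, matching the sample initRequest: {1 0 10646 1 0 form}.
ENCODING_LEVEL_PREFIX = (1, 0, 10646, 1, 0)


def encoding_level_oid(encoding: str) -> Tuple[int, ...]:
    """Return the encodingLevel OID arcs for a named encoding."""
    try:
        form = ENCODING_FORMS[encoding.lower()]
    except KeyError:
        raise ValueError(f"no encodingLevel form for {encoding!r}") from None
    return ENCODING_LEVEL_PREFIX + (form,)


def dotted(oid: Tuple[int, ...]) -> str:
    """Render OID arcs in dotted notation."""
    return ".".join(str(arc) for arc in oid)


print(dotted(encoding_level_oid("utf-8")))    # 1.0.10646.1.0.8
print(dotted(encoding_level_oid("UTF-16")))   # 1.0.10646.1.0.5
```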

Language Negotiation
• Instances of InternationalString are either “message” or “name”
• Language negotiation applies to “message strings”
• Origin proposes one or more language codes
• Codes from Z39.53
• Target may choose 1 of these proposed codes

  proposedLanguages         [2] IMPLICIT SEQUENCE OF LanguageCode OPTIONAL,
  recordsInSelectedCharSets [3] IMPLICIT BOOLEAN OPTIONAL
      -- default 'false'

initRequest { -- SEQUENCE
  referenceId -- "9" --,
  protocolVersion 'e0'H,
  options 'eda2'H,
  preferredMessageSize 15000,
  exceptionalRecordSize 15000,
  implementationName -- "Amicus Professional Workstation" --,
  implementationVersion -- "3.0" --,
  otherInfo { -- SEQUENCE OF
    { -- SEQUENCE
      category { -- SEQUENCE
        categoryTypeId {1 2 840 10003 10 2},
        categoryValue 0
      },
      information externallyDefinedInfo { -- SEQUENCE
        direct-reference {1 2 840 10003 10 2},
        encoding single-ASN1-type proposal { -- SEQUENCE
          proposedCharSets { -- SEQUENCE OF
            iso10646 { -- SEQUENCE
              collections {1 0 10646 1 2 1 3},
              encodingLevel {1 0 10646 1 0 8}
            },
            iso2022 originProposal { -- SEQUENCE
              proposedEnvironment eightBit NULL,
              proposedSets { -- SEQUENCE OF
                2, 1000, 1001, 1002, 1003, 1, 67
              },
              proposedInitialSets { -- SEQUENCE OF
                { -- SEQUENCE
                  g0 2, g1 1001, g2 1001, g3 1001, c0 1, c1 67
                }
              },
              proposedLeftAndRight { -- SEQUENCE
                gLeft 0,
                gRight 1
              }
            }
          },
          proposedlanguages { -- SEQUENCE OF
            -- "ENG"
          },
          recordsInSelectedCharSets TRUE
        }
      }
    }
  }
}
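The `protocolVersion 'e0'H` value in this request is an ASN.1 BIT STRING, where bit 0 is the most significant bit of the first octet. The sketch below lists which bits are set in such a hex dump. In the Z39.50 `ProtocolVersion` definition, bits 0, 1 and 2 stand for versions 1, 2 and 3, so 'e0'H proposes all three, which is what allows character set negotiation to be included. The helper name `set_bits` is made up for this example, and the `options` bits are printed only as numbers because their names are not given in this deck.

```python
from typing import List


def set_bits(hex_string: str) -> List[int]:
    """Return the numbers of the bits set in a BIT STRING given as hex.

    Bit 0 is the most significant bit of the first octet.
    """
    bits = []
    for octet_index, octet in enumerate(bytes.fromhex(hex_string)):
        for offset in range(8):
            if octet & (0x80 >> offset):
                bits.append(octet_index * 8 + offset)
    return bits


version_bits = set_bits("e0")
print([bit + 1 for bit in version_bits])   # versions proposed: [1, 2, 3]

print(set_bits("eda2"))                    # option bits set in 'eda2'H
```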