Posted on 28-Dec-2015


Globalisation & Computer Systems

Week 4: Writing systems and their implications for globalisation
Character representation: ASCII, extended ASCII, code pages
Practical: code pages in VB

Week 6: Writing systems and their implications for globalisation
Directionality (Arabic, Hebrew)
Code space: Chinese
Context-sensitive characters: Arabic
Compositionality (Amharic)

Representation: bits and bytes, characters, code points, glyphs, fonts, standardization

Representation

What is a bit? "A binary digit", i.e. either 0 or 1.

What is a byte? "The fixed number of bits that can be treated as a unit by the computer hardware".

A byte can be used to express a character such as "A".
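As a minimal sketch of that idea (in Python, an assumption, since the course practical itself uses VB), the snippet below prints the ASCII code of "A" and the 8-bit pattern a single byte would hold for it. The helper `byte_to_bits` is a hypothetical name introduced only for illustration.

```python
def byte_to_bits(value: int) -> str:
    """Hypothetical helper: return the 8-bit binary pattern of one byte value (0-255)."""
    if not 0 <= value <= 255:
        raise ValueError("a byte can only hold values 0-255")
    return format(value, "08b")  # zero-padded to 8 binary digits

code = ord("A")  # ASCII code of the character "A"
bits = byte_to_bits(code)
print(code, bits)  # 65 01000001
```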

Representation: ASCII

American Standard Code for Information Interchange: a standard character encoding system.

ASCII originally used 7 bits per character. Given this, how many bit patterns are there?

Each bit pattern maps onto a decimal code point, and that code point maps onto a character.
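A short Python sketch of the question above: it counts the 7-bit patterns and follows a few of them from bit pattern to decimal code point to character. It relies on Python's `chr()`, whose first 128 code points coincide with ASCII; the three sample patterns are arbitrary choices.

```python
BITS = 7
pattern_count = 2 ** BITS  # every combination of seven 0s and 1s
print(pattern_count)  # 128

# bit pattern -> decimal code point -> character
for pattern in ("1000001", "1100001", "0110000"):
    code_point = int(pattern, 2)  # read the pattern as a base-2 number
    print(pattern, code_point, chr(code_point))
```

This prints `1000001 65 A`, `1100001 97 a` and `0110000 48 0` after the count.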

Representation: Glyphs

Glyphs are the pictures used to represent a given character. The relationship is many-to-one: many glyphs can stand for one character.

The character "A" -> AAAAAAAAA


Fonts: the collection, or "picture gallery", of glyphs

Representation: ASCII

The problem with 7-bit bytes: what about French "la tête"? What about Greek "κεφαλη"?

Solution: extend ASCII to 8-bit bytes, standardized by ISO (International Organization for Standardization). Now there are 256 bit patterns.
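The Python sketch below checks whether each word fits in 7-bit ASCII, then encodes the accented words with two 8-bit ISO code pages. The choice of ISO 8859-1 (Western European) and ISO 8859-7 (Greek) is an assumption for illustration; the slide names only ISO. `fits_in_7_bits` is a hypothetical helper.

```python
def fits_in_7_bits(text: str) -> bool:
    """Hypothetical helper: True if every character has a code below 2**7 = 128."""
    return all(ord(ch) < 2 ** 7 for ch in text)

for word in ("la tete", "la tête", "κεφαλη"):
    print(word, fits_in_7_bits(word))  # only the unaccented word fits

# 8-bit ISO code pages place these characters in the range 128-255.
french = "la tête".encode("latin-1")  # ISO 8859-1 (Western European)
greek = "κεφαλη".encode("iso8859_7")  # ISO 8859-7 (Greek)
print(french)  # b'la t\xeate'
print(greek)
```

Each accented or Greek letter still takes one byte, but only because the byte's upper half (128-255) is given a language-specific meaning.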

Representation: Extended ASCII

With 8-bit bytes you get 256 bit patterns.

For consistency, the first 128 code points remain the same as in ISO-7.

The next 128 are used for a range of languages.

For each language, you need an interpretation of these 128 code points.

The encoding is handled by a code page.
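One way to picture a code page is as a 256-entry lookup table from byte value to character. The Python sketch below builds such tables from Python's `cp1250`, `cp1251` and `cp1253` codecs (Windows Central European, Cyrillic and Greek; the choice of these three is an assumption) and confirms that the lower 128 entries agree while the upper half differs. `code_page_table` is a hypothetical helper.

```python
def code_page_table(codec: str) -> list:
    """Hypothetical helper: map every byte value 0-255 to a character under one code page.
    Byte values the code page leaves undefined decode to U+FFFD."""
    return [bytes([b]).decode(codec, errors="replace") for b in range(256)]

tables = {cp: code_page_table(cp) for cp in ("cp1250", "cp1251", "cp1253")}

# Lower half: the first 128 entries are identical in every table.
shared = all(len({t[b] for t in tables.values()}) == 1 for b in range(128))
print(shared)  # True

# Upper half: the same byte value means something different per language.
for cp, table in tables.items():
    print(cp, table[225])
```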

Representation: Extended ASCII

For code point 154: CP_EASTEUROPE (code page 1250) gives š; CP_RUSSIAN (code page 1251) gives љ. What about code point 65 in these two code pages?

Now represent your names with your own orthographies in mind, using the code pages.
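A Python sketch of the exercise: it interprets code points 154 and 65 through both code pages, then encodes two names. `CP_EASTEUROPE` and `CP_RUSSIAN` are the slide's labels; `cp1250` and `cp1251` are Python's codec names for the same Windows code pages. The helper `char_at` and the example names are hypothetical.

```python
CODE_PAGES = {"CP_EASTEUROPE": "cp1250", "CP_RUSSIAN": "cp1251"}

def char_at(code_point: int, codec: str) -> str:
    """Hypothetical helper: interpret a single byte value through the given code page."""
    return bytes([code_point]).decode(codec)

for label, codec in CODE_PAGES.items():
    print(label, codec, char_at(154, codec), char_at(65, codec))

# Writing a name in its own orthography (example names are hypothetical).
print("Šimić".encode("cp1250"))
print("Иван".encode("cp1251"))
```

Code point 154 differs between the two code pages, while 65 is "A" in both, because it lies in the shared lower half.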

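Representation: Code pages in VB

    ' Character-set identifiers that can be assigned to StdFont.Charset
    Public Enum ValidCharsets
        ANSI_CHARSET = 0
        GREEK_CHARSET = 161
        THAI_CHARSET = 222
    End Enum

    Private Sub Form_Load()
        ' Build a font that uses the Greek character set
        Dim X As New StdFont
        X.Charset = GREEK_CHARSET
        X.Bold = True
        X.Size = 8
        X.Name = "Times New Roman"

        ' Apply the font to the form and its controls
        Set frmTest.Font = X
        Set frmTest.Label1.Font = X
        Set frmTest.Text1.Font = X

        ' Codes 181, 225 and 226 are displayed through the Greek character set
        frmTest.Label1.Caption = Chr(181) & Chr(225) & Chr(226)
        frmTest.Text1.Text = Chr(181) & Chr(225) & Chr(226)
    End Sub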
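To see why the `Charset` setting matters, the Python sketch below decodes the same three byte values used in the VB practical under two Windows code pages. It assumes that GREEK_CHARSET (161) corresponds to code page 1253 and that ANSI_CHARSET on a Western European system means code page 1252; the helper `render` is hypothetical.

```python
# Byte values passed to Chr() in the VB practical.
CODES = (181, 225, 226)

def render(codes, codec: str) -> str:
    """Hypothetical helper: decode a sequence of byte values through one code page."""
    return bytes(codes).decode(codec)

greek = render(CODES, "cp1253")    # assumed equivalent of GREEK_CHARSET (161)
western = render(CODES, "cp1252")  # assumed ANSI_CHARSET on a Western European system
print(greek)
print(western)
```

Bytes 225 and 226 come out as Greek "αβ" under the first code page and as "áâ" under the second: identical bytes, different characters.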

Representation and UNICODE

What about Chinese? It has thousands of characters, so 256 bit patterns are clearly not enough.
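A quick Python check of the size problem: it prints the code points of a Latin letter, an accented letter and two Chinese characters (中 and 文, chosen as examples), together with the number of bits each code point needs, and shows that a single-byte code page such as cp1252 cannot encode them.

```python
# Code points of a few characters; anything above 255 cannot fit in one 8-bit byte.
for ch in ("A", "é", "中", "文"):
    cp = ord(ch)
    print(ch, cp, cp.bit_length(), "fits in a byte" if cp < 256 else "needs more than 8 bits")

try:
    "中文".encode("cp1252")  # a single-byte Western code page
except UnicodeEncodeError as err:
    print("cp1252 cannot encode:", err.object[err.start:err.end])
```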

Make the units bigger: with 16 bits per character there are 2^16 = 65,536 bit patterns. This is UNICODE. (Later versions of Unicode extend the code space beyond 16 bits, to 1,114,112 code points.)
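A Python sketch of the 16-bit idea using UTF-16, one of Unicode's encoding forms: each character below is stored as a single 16-bit code unit. The big-endian, no-byte-order-mark variant (`utf-16-be`) is chosen only so the output is easy to read. Characters above U+FFFF need a pair of 16-bit units, which this sketch does not show.

```python
print(2 ** 16)  # 65536 distinct 16-bit patterns

# In UTF-16 (big-endian, no byte-order mark) each of these characters
# occupies exactly one 16-bit code unit.
for ch in ("A", "é", "中"):
    unit = ch.encode("utf-16-be")
    print(ch, unit.hex(), len(unit) * 8, "bits")
```

The hex values printed (`0041`, `00e9`, `4e2d`) are the characters' Unicode code points.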

UNICODE: design principles

Reference: The Unicode Standard, Version 3.0 (2000).

Online: http://www.unicode.org/unicode/uni2book/
