http://proglit.com/

bits and text

CC BY-SA

byte (the size of a cell of addressable memory)

8 bits on all modern systems

octet = 8 bits

kilobyte: 1,000 (10^3) bytes or 1,024 (2^10) bytes

megabyte: 1,000,000 (10^6) bytes or 1,048,576 (2^20) bytes

gigabyte: 1,000,000,000 (10^9) bytes or 1,073,741,824 (2^30) bytes

terabyte (10^12 bytes or 2^40 bytes)

petabyte (10^15 bytes or 2^50 bytes)

exabyte (10^18 bytes or 2^60 bytes)

zettabyte (10^21 bytes or 2^70 bytes)

kibibyte (2^10 bytes)

mebibyte (2^20 bytes)

gibibyte (2^30 bytes)
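
The difference between the decimal and binary readings of each prefix can be checked with a little arithmetic. The following is a minimal Python sketch; the constant names and the helper gap_percent are made up for illustration.

```python
# Decimal (SI) prefixes versus binary (IEC) prefixes, measured in bytes.
KILOBYTE, KIBIBYTE = 10**3, 2**10
MEGABYTE, MEBIBYTE = 10**6, 2**20
GIGABYTE, GIBIBYTE = 10**9, 2**30


def gap_percent(decimal_size: int, binary_size: int) -> float:
    """Return how much larger the binary unit is, as a percentage (hypothetical helper)."""
    return (binary_size - decimal_size) / decimal_size * 100


for name, dec, binary in [
    ("kilo/kibi", KILOBYTE, KIBIBYTE),
    ("mega/mebi", MEGABYTE, MEBIBYTE),
    ("giga/gibi", GIGABYTE, GIBIBYTE),
]:
    print(f"{name}: {dec:,} vs {binary:,} bytes ({gap_percent(dec, binary):.1f}% larger)")
```

The gap grows with each prefix, which is one reason the unambiguous kibi/mebi/gibi names exist.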

kilobit (10^3 bits or 2^10 bits)

megabit (10^6 bits or 2^20 bits)

gigabit (10^9 bits or 2^30 bits)

etc…

kilobit (kb)

kilobyte (kB)

?
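
Since a byte is 8 bits, a quantity in bits is eight times smaller when stated in bytes. A small Python sketch of the conversion, assuming decimal prefixes; the 100-megabit link speed is a made-up example value.

```python
BITS_PER_BYTE = 8


def bits_to_bytes(bits: int) -> float:
    """Convert a count of bits to a count of bytes."""
    return bits / BITS_PER_BYTE


kilobit = 10**3  # 1 kb, decimal reading
print(bits_to_bytes(kilobit))  # 125.0 bytes, far less than 1 kB

# Hypothetical link speed, for illustration only: 100 megabits per second.
link_bits_per_second = 100 * 10**6
megabytes_per_second = bits_to_bytes(link_bits_per_second) / 10**6
print(megabytes_per_second)  # 12.5
```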

“banana”

b a n a n a

2 1 14 1 14 1

b a n a n a

“banana”

b a n a n a

52 97 4 97 4 97

character set (a mapping of characters to numbers)
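
A character set can be modeled as a pair of lookup tables. The sketch below assumes a toy mapping of a=1 through z=26, which reproduces the first "banana" numbering; the names TO_NUMBER, TO_CHAR, encode, and decode are illustrative only.

```python
import string

# Toy character set (an assumption for illustration): a=1, b=2, ..., z=26.
TO_NUMBER = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}
TO_CHAR = {number: ch for ch, number in TO_NUMBER.items()}


def encode(text: str) -> list[int]:
    """Map each character to its number in the toy character set."""
    return [TO_NUMBER[ch] for ch in text]


def decode(numbers: list[int]) -> str:
    """Map each number back to its character."""
    return "".join(TO_CHAR[number] for number in numbers)


numbers = encode("banana")
print(numbers)          # [2, 1, 14, 1, 14, 1]
print(decode(numbers))  # banana
```

Any other one-to-one mapping would work just as well, as long as the writer and the reader agree on it.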

ASCII (American Standard Code for Information Interchange)

128 characters

whitespace character (a character representing spacing)

“A banana”

A b a n a n a

65 32 98 97 110 97 110 97
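
The ASCII numbers for a string can be looked up with Python's built-in ord, which returns a character's code point (identical to the ASCII value for these characters). A minimal sketch:

```python
text = "A banana"

# ord() gives the number assigned to each character; the space is a character too.
codes = [ord(ch) for ch in text]
print(codes)  # [65, 32, 98, 97, 110, 97, 110, 97]

# Every ASCII value fits in the range 0 to 127.
print(all(code < 128 for code in codes))  # True
```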

whitespace character (a character representing spacing)

space, tab, linefeed, carriage return

control character (signals an action to be taken by the reader)

• LF (line feed)
• CR (carriage return)
• FF (form feed)
• BEL (bell)
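
These control characters have ordinary ASCII numbers like any other character, they just have no visible form. A short Python sketch using the language's standard escape sequences; the dictionary name controls is illustrative.

```python
# Standard Python escape sequences for the four control characters listed above.
controls = {"LF": "\n", "CR": "\r", "FF": "\f", "BEL": "\a"}

for name, ch in controls.items():
    # isprintable() is False because control characters have no glyph.
    print(name, ord(ch), ch.isprintable())

control_codes = {name: ord(ch) for name, ch in controls.items()}
print(control_codes)  # {'LF': 10, 'CR': 13, 'FF': 12, 'BEL': 7}
```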

plain text (no formatting, only characters)

• no italics, underline, or bold
• no fonts, font sizes, or colors
• no margins, columns, or page breaks etc.

character (a unit of written language and notation)

glyph (an actual visual representation of a character)

j j

character encoding (scheme for representing characters as bits)

ASCII = 1 byte per character

c a t

99 97 116

0x63 0x61 0x74
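
Encoding text as ASCII produces exactly one byte per character. A minimal Python sketch using the built-in "ascii" codec:

```python
data = "cat".encode("ascii")

print(len(data))   # 3 bytes for 3 characters
print(list(data))  # [99, 97, 116]
print(" ".join(f"0x{byte:02X}" for byte in data))  # 0x63 0x61 0x74
```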

Unicode (the world standard character set and its encodings)

U+0000 to U+10FFFF

U+0000 – U+FFFF: plane 0, BMP (Basic Multilingual Plane)
U+10000 – U+1FFFF: plane 1, SMP (Supplementary Multilingual Plane)
U+20000 – U+2FFFF: plane 2, SIP (Supplementary Ideographic Plane)
U+30000 – U+DFFFF: planes 3 to 13, currently unassigned
U+E0000 – U+EFFFF: plane 14, SSP (Supplementary Special-purpose Plane)
U+F0000 – U+FFFFF: plane 15, PUA (Private Use Area)
U+100000 – U+10FFFF: plane 16, PUA (Private Use Area)
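
Each plane holds 0x10000 code points, so the plane number is simply the code point with its low 16 bits dropped. A small Python sketch; the function name plane_of is made up for illustration.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside U+0000 to U+10FFFF")
    # Dropping the low 16 bits leaves the plane number.
    return code_point >> 16


for cp in (0x0065, 0x17711, 0x3F010, 0x10FF00):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```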

UTF-32 (4 bytes per character)

U+3FF01   0000_0000 0000_0011 1111_1111 0000_0001   00 03 FF 01

U+40077   0000_0000 0000_0100 0000_0000 0111_0111   00 04 00 77

U+0065    0000_0000 0000_0000 0000_0000 0110_0101   00 00 00 65
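
UTF-32 just stores the code point as a 4-byte integer. The sketch below assumes big-endian byte order (most significant byte first), which matches the byte order of the examples above; utf32_be is an illustrative name, and the result is cross-checked against Python's built-in "utf-32-be" codec.

```python
def utf32_be(code_point: int) -> bytes:
    """Encode one code point as 4 big-endian bytes (UTF-32BE)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside U+0000 to U+10FFFF")
    return code_point.to_bytes(4, byteorder="big")


for cp in (0x3FF01, 0x40077, 0x0065):
    manual = utf32_be(cp)
    # Python's built-in codec should agree with the manual version.
    assert manual == chr(cp).encode("utf-32-be")
    # bytes.hex() with a separator needs Python 3.8 or later.
    print(f"U+{cp:04X}", manual.hex(" ").upper())
```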

UTF-16 (2 or 4 bytes per character)

U+0065   0000_0000 0110_0101   00 65

U+F10F   1111_0001 0000_1111   F1 0F

1101_10pp ppcc_cccc 1101_11cc cccc_cccc

(1101_10 and 1101_11 are fixed; p = plane bits, the plane number minus 1; c = character bits)

U+3F010    1101_1000 1011_1100 1101_1100 0001_0000   D8 BC DC 10

U+10FF00   1101_1011 1111_1111 1101_1111 0000_0000   DB FF DF 00

U+17711    1101_1000 0001_1101 1101_1111 0001_0001   D8 1D DF 11

surrogates: U+D800 to U+DFFF
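
For code points above U+FFFF, the two 16-bit halves can be computed by subtracting 0x10000 and splitting the remaining 20 bits into two groups of 10. A minimal Python sketch of that calculation, assuming big-endian output; utf16_surrogates is an illustrative name, and the result is compared with Python's built-in "utf-16-be" codec.

```python
def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000        # 20 bits: 4 plane bits + 16 character bits
    high = 0xD800 | (offset >> 10)       # fixed 1101_10 + top 10 bits
    low = 0xDC00 | (offset & 0x3FF)      # fixed 1101_11 + bottom 10 bits
    return high, low


for cp in (0x3F010, 0x10FF00, 0x17711):
    high, low = utf16_surrogates(cp)
    manual = high.to_bytes(2, "big") + low.to_bytes(2, "big")
    # Python's built-in codec should agree with the manual version.
    assert manual == chr(cp).encode("utf-16-be")
    print(f"U+{cp:04X}", manual.hex(" ").upper())
```

Both halves always land in U+D800 to U+DFFF, which is why that range is reserved and never assigned to characters.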

UTF-8 (1 to 4 bytes per character)

U+0000 – U+007F: 0xxx_xxxx

U+0080 – U+07FF: 110x_xxxx 10xx_xxxx

U+0800 – U+FFFF: 1110_xxxx 10xx_xxxx 10xx_xxxx

U+10000 – U+10FFFF: 1111_0xxx 10xx_xxxx 10xx_xxxx 10xx_xxxx
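
The four bit patterns above translate directly into shifts and masks. The following is a sketch of a hand-written encoder in Python, not how any particular library does it; utf8_encode is an illustrative name, and the result is compared with Python's built-in "utf-8" codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the four UTF-8 bit patterns."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp <= 0x7F:       # 0xxx_xxxx
        return bytes([cp])
    if cp <= 0x7FF:      # 110x_xxxx 10xx_xxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:     # 1110_xxxx 10xx_xxxx 10xx_xxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 1111_0xxx 10xx_xxxx 10xx_xxxx 10xx_xxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x0031, 0x0700, 0x86FF, 0x50000):
    encoded = utf8_encode(cp)
    # Python's built-in codec should agree with the manual version.
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X}", " ".join(f"{byte:08b}" for byte in encoded))
```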

UTF-8 (1 to 4 bytes per character)

U+0031: 0011_0001

U+0700: 1101_1100 1000_0000

U+86FF: 1110_1000 1001_1011 1011_1111

U+50000: 1111_0001 1001_0000 1000_0000 1000_0000

UTF-8 (1 to 4 bytes per character)

U+0031: (valid) 0011_0001

U+0031: (invalid) 1111_0000 1000_0000 1000_0000 1011_0001

text editor (a program for creating and editing text files)

• notepad
• vi/vim
• emacs
