http://proglit.com/
bits and text
License: CC BY-SA
byte (the size of a cell of addressable memory)
8 bits on all modern systems
octet = 8 bits
kilobyte: 1,000 (10^3) bytes or 1,024 (2^10) bytes
megabyte: 1,000,000 (10^6) bytes or 1,048,576 (2^20) bytes
gigabyte: 1,000,000,000 (10^9) bytes or 1,073,741,824 (2^30) bytes
terabyte (10^12 bytes or 2^40 bytes)
petabyte (10^15 bytes or 2^50 bytes)
exabyte (10^18 bytes or 2^60 bytes)
zettabyte (10^21 bytes or 2^70 bytes)
kibibyte (2^10 bytes)
mebibyte (2^20 bytes)
gibibyte (2^30 bytes)
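The gap between the decimal and binary readings of each prefix grows with the size of the unit. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative assumptions, not part of the original slides.

```python
# Decimal (SI) and binary (IEC) sizes for the prefixes listed above.
DECIMAL = {"kilo": 10**3, "mega": 10**6, "giga": 10**9}
BINARY = {"kibi": 2**10, "mebi": 2**20, "gibi": 2**30}


def binary_excess_percent(decimal_size: int, binary_size: int) -> float:
    """Return how much larger the binary unit is than the decimal one, in percent."""
    return (binary_size - decimal_size) / decimal_size * 100


for (d_name, d_size), (b_name, b_size) in zip(DECIMAL.items(), BINARY.items()):
    excess = binary_excess_percent(d_size, b_size)
    print(f"{d_name}byte = {d_size:,} bytes; {b_name}byte = {b_size:,} bytes (+{excess:.1f}%)")
```

The difference is about 2.4% at the kilo level and about 7.4% at the giga level, which is why the unambiguous kibi/mebi/gibi names exist.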
kilobit (10^3 bits or 2^10 bits)
megabit (10^6 bits or 2^20 bits)
gigabit (10^9 bits or 2^30 bits)
etc…
kilobit (kb)
kilobyte (kB)
?
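A common convention, though not a universally followed one, writes bits with a lowercase b and bytes with an uppercase B. Under that assumption, and with 8 bits per byte, converting between the two is a division by 8. The names below are hypothetical and chosen only for illustration.

```python
BITS_PER_BYTE = 8  # true on all modern systems, as noted earlier


def kilobits_to_kilobytes(kilobits: float) -> float:
    """Convert a quantity in kilobits (kb) to kilobytes (kB)."""
    return kilobits / BITS_PER_BYTE


# Example: 8,000 kb of data is 1,000 kB.
print(kilobits_to_kilobytes(8000))
```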
“banana”
b a n a n a
2 1 14 1 14 1
b a n a n a
52 97 4 97 4 97
character set (a mapping of characters to numbers)
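The first numbering of "banana" above (2 1 14 1 14 1) can be read as a toy character set in which each letter maps to its position in the alphabet. A minimal sketch, assuming lowercase a to z only; the function name is made up for this example.

```python
def alphabet_position_codes(word: str) -> list:
    """Toy character set (illustrative only): a=1, b=2, ..., z=26."""
    return [ord(ch) - ord("a") + 1 for ch in word]


print(alphabet_position_codes("banana"))  # [2, 1, 14, 1, 14, 1]
```

Any other consistent mapping would work equally well, which is why writer and reader must agree on the character set in use.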
ASCII (American Standard Code for Information Interchange)
128 characters
whitespace character (a character representing spacing)
“A banana”
A (space) b a n a n a
65 32 98 97 110 97 110 97
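The ASCII codes for a string can be checked directly. The sketch below uses Python's built-in `ord` and `chr`; the variable names are illustrative.

```python
text = "A banana"

# ord() returns the code of a character; for ASCII characters this is the ASCII value.
codes = [ord(ch) for ch in text]
print(codes)  # [65, 32, 98, 97, 110, 97, 110, 97]

# chr() is the inverse mapping, from code back to character.
round_trip = "".join(chr(code) for code in codes)
print(round_trip)  # A banana
```

Note that the space is a character like any other and has its own code, 32.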
whitespace characters: space, tab, line feed, carriage return
control character (signals the reader to take an action)
• LF (line feed)
• CR (carriage return)
• FF (form feed)
• BEL (bell)
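Each of these control characters has an ASCII code like any printable character. A small sketch that prints them, using Python's standard escape sequences; the dictionary name is an assumption for this example.

```python
# Python escape sequences for the four control characters listed above.
CONTROL_CHARACTERS = {
    "BEL": "\a",  # bell
    "LF": "\n",   # line feed
    "FF": "\f",   # form feed
    "CR": "\r",   # carriage return
}

for name, ch in CONTROL_CHARACTERS.items():
    print(f"{name}: decimal {ord(ch)}, hex 0x{ord(ch):02X}")
```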
plain text (no formatting, only characters)
• no italics, underline, or bold
• no fonts, font sizes, or colors
• no margins, columns, or page breaks
etc.
character (a unit of written language and notation)
glyph (an actual visual representation of a character)
j j (one character; the original slide shows it as two different glyphs)
character encoding (scheme for representing characters as bits)
ASCII = 1 byte per character
c a t
99 97 116
0x63 0x61 0x74
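Encoding a string as ASCII yields exactly one byte per character. A minimal sketch using Python's built-in `str.encode`; the variable names are illustrative.

```python
word = "cat"

# Encoding as ASCII produces one byte per character.
encoded = word.encode("ascii")

decimal_values = list(encoded)
hex_values = " ".join(f"0x{byte:02X}" for byte in encoded)

print(len(encoded))     # 3
print(decimal_values)   # [99, 97, 116]
print(hex_values)       # 0x63 0x61 0x74
```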
Unicode (the world standard character set and its encodings)
U+0000 to U+10FFFF
U+0000 – U+FFFF plane 0, BMP (Basic Multilingual Plane)
U+10000 – U+1FFFF plane 1, SMP (Supplementary Multilingual Plane)
U+20000 – U+2FFFF plane 2, SIP (Supplementary Ideographic Plane)
U+30000 – U+DFFFF planes 3 to 13 currently unassigned
U+E0000 – U+EFFFF plane 14, SSP (Supplementary Special-purpose Plane)
U+F0000 – U+FFFFF plane 15, PUA (Private Use Area)
U+100000 – U+10FFFF plane 16, PUA (Private Use Area)
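Because each plane spans 0x10000 code points, the plane number is simply the code point with its low 16 bits dropped. A short sketch; `plane_of` is a hypothetical helper name.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000 to U+10FFFF")
    # Each plane holds 2**16 code points, so shifting right by 16 gives the plane.
    return code_point >> 16


print(plane_of(0x0065))    # 0
print(plane_of(0x3F010))   # 3
print(plane_of(0x10FFFF))  # 16
```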
UTF-32 (4 bytes per character)
U+3FF01: 0000_0000 0000_0011 1111_1111 0000_0001 = 00 03 FF 01
U+40077: 0000_0000 0000_0100 0000_0000 0111_0111 = 00 04 00 77
U+0065: 0000_0000 0000_0000 0000_0000 0110_0101 = 00 00 00 65
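UTF-32 stores the code point itself as a 4-byte integer. The sketch below assumes big-endian byte order, matching the byte order shown in the examples above; `utf32_be_hex` is a name made up for illustration.

```python
def utf32_be_hex(code_point: int) -> str:
    """Return the big-endian UTF-32 bytes of a code point as spaced hex."""
    raw = code_point.to_bytes(4, "big")  # the code point itself, padded to 4 bytes
    return " ".join(f"{byte:02X}" for byte in raw)


print(utf32_be_hex(0x3FF01))  # 00 03 FF 01
print(utf32_be_hex(0x40077))  # 00 04 00 77
print(utf32_be_hex(0x0065))   # 00 00 00 65

# Python's own codec produces the same bytes.
assert chr(0x3FF01).encode("utf-32-be") == (0x3FF01).to_bytes(4, "big")
```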
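UTF-16 (2 or 4 bytes per character)
U+0065: 0000_0000 0110_0101 = 00 65
U+F10F: 1111_0001 0000_1111 = F1 0F
1101_10pp ppxx_xxxx 1101_11xx xxxx_xxxx
(1101_10 and 1101_11 are fixed; pppp is the plane number minus 1; the x bits are the character's position within its plane)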
U+3F010: 1101_1000 1011_1100 1101_1100 0001_0000 = D8 BC DC 10
U+10FF00: 1101_1011 1111_1111 1101_1111 0000_0000 = DB FF DF 00
U+17711: 1101_1000 0001_1101 1101_1111 0001_0001 = D8 1D DF 11
surrogates: U+D800 to U+DFFF
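The surrogate-pair examples above can be reproduced by subtracting 0x10000 from the code point and splitting the remaining 20 bits into two 10-bit halves. This is a sketch of that calculation, not a full codec; `utf16_units` is a hypothetical helper name.

```python
def utf16_units(code_point: int) -> list:
    """Return the UTF-16 code units (one or two 16-bit values) for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are reserved and are not characters")
    if code_point < 0x10000:
        return [code_point]                # BMP: a single 16-bit unit
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 | (offset >> 10)         # 1101_10 + top 10 bits
    low = 0xDC00 | (offset & 0x3FF)        # 1101_11 + bottom 10 bits
    return [high, low]


for cp in (0x0065, 0x3F010, 0x10FF00, 0x17711):
    units = " ".join(f"{unit:04X}" for unit in utf16_units(cp))
    print(f"U+{cp:04X}: {units}")
```

Reserving U+D800 to U+DFFF for surrogates is what lets a decoder tell a lone 2-byte character from half of a 4-byte one.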
UTF-8 (1 to 4 bytes per character)
U+0000 – U+007F: 0xxx_xxxx
U+0080 – U+07FF: 110x_xxxx 10xx_xxxx
U+0800 – U+FFFF: 1110_xxxx 10xx_xxxx 10xx_xxxx
U+10000 – U+10FFFF: 1111_0xxx 10xx_xxxx 10xx_xxxx 10xx_xxxx
U+0031: 0011_0001
U+0700: 1101_1100 1000_0000
U+86FF: 1110_1000 1001_1011 1011_1111
U+50000: 1111_0001 1001_0000 1000_0000 1000_0000
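The four bit patterns translate directly into shifts and masks. The following is a minimal hand-written encoder for illustration, checked against Python's built-in codec; `utf8_bytes` is an assumed name, and real programs should simply call `str.encode("utf-8")`.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Encode one code point as UTF-8 using the shortest valid form."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point < 0x80:                  # 0xxx_xxxx
        return bytes([code_point])
    if code_point < 0x800:                 # 110x_xxxx 10xx_xxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:               # 1110_xxxx 10xx_xxxx 10xx_xxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # 1111_0xxx plus three 10xx_xxxx
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for cp in (0x0031, 0x0700, 0x86FF, 0x50000):
    encoded = utf8_bytes(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in codec
    print(f"U+{cp:04X}:", " ".join(f"{byte:08b}" for byte in encoded))
```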
U+0031: (valid) 0011_0001
U+0031: (invalid, because a character must use the shortest possible encoding) 1111_0000 1000_0000 1000_0000 1011_0001
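A conforming decoder rejects the padded 4-byte form even though its payload bits spell out U+0031. The sketch below checks this with Python's built-in UTF-8 codec; the variable names are illustrative.

```python
valid = bytes([0b0011_0001])
overlong = bytes([0b1111_0000, 0b1000_0000, 0b1000_0000, 0b1011_0001])

# The one-byte form decodes to the character "1" (U+0031).
decoded = valid.decode("utf-8")
print(repr(decoded))

# The padded four-byte form is rejected by the decoder.
try:
    overlong.decode("utf-8")
    overlong_accepted = True
except UnicodeDecodeError:
    overlong_accepted = False

print("overlong form accepted:", overlong_accepted)
```

Rejecting such forms means every character has exactly one byte sequence, which matters for security checks that compare raw bytes.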
text editor (a program for creating and editing text files)
• notepad
• vi/vim
• emacs