chapter 8_1 bits and the "why" of bytes: representing information digitally

Chapter 8_1

Bits and the "Why" of Bytes:Representing Information Digitally


Digitizing Discrete Information

• Digitize: Represent information with digits (numerals 0 through 9)

• Limitation of Digits– Alternative Representation: Any set of symbols

could represent phone number digits, as long as the keypad is labeled accordingly

• Symbols, Briefly– Digits have the advantage of having short names

(easy to say)• But computer professionals are shortening symbol

names (exclamation point is pronounced "bang")


Ordering Symbols

• Advantage of digits for encoding info is that items can be listed in numerical order

• To use other symbols, we need an ordering system (collating sequence)

– Agreed order from smallest to largest value

• In choosing symbols for encoding, consider how symbols interact with things being encoded


Encoding with Dice

• Consider a representation based on dice

– Single die has six sides, and the patterns on the sides can be used for digital representation


Encoding with Dice (cont'd)

• Consider representing the Roman alphabet with dice

– 26 letter, only 6 patterns on a die

– We use multiple patterns to represent each letter

– How many patterns are required?• 2 dice can produce 36 pattern sequences (6*6)

• 3 dice can produce 216 (6*6*6)

• n dice can produce 6n sequences



• Call each pattern produced by pairing two dice a symbol:– 1 and 1 == A– 1 and 2 == B– 1 and 3 == C– 1 and 4 == D– 1 and 5 == E– 1 and 6 == F– 2 and 1 == G– etc.


Extending the Encoding

• 26 letters of the Roman alphabet are represented; 10 spaces are blank

• These spaces can be used for the Arabic numerals

• What is we need punctuation? We only have 36 available spaces in the two-dice system. How can we avoid going to a three-dice system?


The Escape: Creating More Symbols

• We can use the last two-dice symbol as "escape". It does not match any legal character, so it will never be needed for normal text digitization

• It indicates that the digitization is "escaping from the basic representation" and applying a secondary representation

• We could also use a "double escape" to enter a tertiary representation system


The Fundamental Representation of Information

• The fundamental patters used in IT come when the physical world meets the logical world

• The most fundamental form of information is the presence or absence of a physical phenomenon

• In the logical world, the concepts or true and false are important

– By associating true with the presence of a phenomenon and false with its absence, we use the physical world to implement the logical world, and produce information technology


The PandA Representation

• PandA is the mnemonic for "presence and absence"

• It is discrete (distinct or separable)—the phenomenon is present or it is not (true or false). There in no continuous gradation in between


A Binary System

• Two patterns—Present and Absent—make PandA a binary system

• We can give any names to these two patterns as long as we are consistent

• The PandA basic unit is known as a "bit" (short for binary digit)


Bits in Computer Memory

• Memory is arranged inside a computer in a very long sequence of bits (places where a phenomenon can be set and detected)

• Analogy: Sidewalk Memory

– Each sidewalk square represents a memory slot, and stones represent the presence or absence

– If a stone is on the square, the value is 1, if not the value is 0


Units of DataMeasures the size or volume of data

BIT: Single binary digit, either a 1 or a 0

BYTE: A group of eight bits, a character (Basic unit for measuring the size of the data)KILOBYTE: 210 or 1024 bytes (not 1000 used in scientific notation for the prefix kilo)MEGABYTE: 220 or 1,048,576 bytes (not 1000000 used in scientific notation for the prefix mega-)

WORD: the number of adjacent bits that are manipulated or stored as a unit in a particular computer system


Alternative PandA Encodings

• There are other ways to encode two states using physical phenomena

– Use stones on all squares, but black stones for one state and white for the other

– Use multiple stones of two colors per square, saying more black than white means 0 and more white than black means 1

– Stone in center for one state, off-center for the other

– etc.


Combining Bit Patterns

• Since we only have two patterns, we must combine them into sequences to create enough symbols to encode necessary information

• PandA has p=2 patterns, arranging them into n-length sequences, we can create 2n symbols


Hex Explained

• Recall in Chapter 4, we specified custom colors in HTML using hex digits– e.g., <font color ="#FF8E2A">

– Hex is short for hexadecimal, base 16

• Why use hex? Writing the sequence of bits is long, tedious, and error-prone


The 16 Hex Digits

• 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F

• Sixteen digits (or hexits) can be represented perfectly by the 16 symbols of 4-bit sequences

• Changing hex digits to bits and back again:

– Given a sequence of bits, group them in 4's and write the corresponding hex digit

– Given hex, write the associated group of 4 digits

chapter 8_1 bits and the "why" of bytes: representing information digitally

Documents