Chapter 3: Data Representation

Types of Data

Numbers
– 2324, -34.35, 34567890123.12345
Characters and symbols
– A, B, C, … Z, a, b, c, … z
– 0, 1, 2, 3, … 9, +, -, ), (, *, &, etc.
Images
– Photos, charts, drawings
Audio
– Sound, music, etc.
Video
– Video clips and movies
Instructions
– Computer instructions are coded in sequences of 0's and 1's

Binary Number System

Binary hardware is the cheapest and simplest to design and engineer
Switch: on = 1; off = 0
Circuit: voltages
– 1.7 volts or higher: 1
– 0.0 volts to 1.3 volts: 0
– Voltages between 1.3 and 1.7 volts are avoided in the design
Mathematics: binary numbers
– Using digits 0 and 1 only

Decimal vs. Binary

Decimal # system
– 10 symbols: 1, 2, 3, … 9, 0
– Base = 10 (we have 10 fingers)
– Decimal number 2324 reads "two thousand three hundred twenty-four"
Binary # system
– 2 symbols: 0 and 1
– Base = 2
– Binary number 1101 = ?

Decimal vs. Binary

Decimal # System:
Digit:                   2         3         2         4
Position value (base):   10^3      10^2      10^1      10^0
Position value:          1000      100       10        1
Each digit represents:   2*1000    3*100     2*10      4*1

Value in decimal:
$$2 \times 1000 + 3 \times 100 + 2 \times 10 + 4 \times 1 = 2324$$

Binary # System:
Digit:                   1         1         0         1
Position value (base):   2^3       2^2       2^1       2^0
Position value:          8         4         2         1
Each digit represents:   1*8       1*4       0*2       1*1

Value in decimal:
$$1 \times 8 + 1 \times 4 + 0 \times 2 + 1 \times 1 = 13$$

Storage Units

Binary digits: bits
8 bits = 1 byte
2^10 bytes = 1024 bytes = 1 kilobyte = 1 KB
2^20 bytes = 2^10 KB = 1 megabyte = 1 MB
2^30 bytes = 2^10 MB = 1 gigabyte = 1 GB
2^40 bytes = 2^10 GB = 1 terabyte = 1 TB

Representation of Numbers

Fixed-size-storage approach:
– Computers allocate a specified amount of space for a number
Integers
– 1 bit: 0 to 1
– 2 bits: 00, 01, 10, 11 (0 to 3)
– 4 bits: 0000, 0001, 0010, … 1111 (0 to 15)
– 1 byte (unsigned): 0 to 255
– 2 bytes (signed): -32,768 to +32,767
– 4 bytes (signed): -2,147,483,648 to +2,147,483,647
Note: with 4 bytes for integers, any number smaller than -2,147,483,648 or larger than +2,147,483,647 would be incorrectly represented.

Representation of Numbers

Binary # System:
Digit:                   1       0       .   1       1       1
Position value (base):   2^1     2^0         2^-1    2^-2    2^-3
Position value:          2       1           1/2     1/4     1/8
Each digit represents:   1*2     0*1         1*0.5   1*0.25  1*0.125

Value in decimal:
$$1 \times 2 + 0 \times 1 + 1 \times \tfrac{1}{2} + 1 \times \tfrac{1}{4} + 1 \times \tfrac{1}{8} = 2.875$$

Binary representation of real numbers

Floating-point numbers for real numbers
– Three parts of the representation:
1. Sign (always 1 bit: 0 for + and 1 for -)
2. Significant digits (e.g., six bits)
3. The power of 2 for the leftmost digit (e.g., 3 bits)
– Example for binary -1111.01B
Sign: 1 (negative)
Significant digits: 111101B
Power of 2: 011B
– Example for binary +100.1101B
Sign: 0 (positive)
Significant digits: 100110B
Power of 2: 010B
Note: the last digit is lost, which is 1/16 in decimal.

Single-precision floating-point numbers
1. Sign (always 1 bit: 0 for + and 1 for -)
2. Significant digits: 23 bits
3. Exponent: 8 bits

Double-precision floating-point numbers
1. Sign (always 1 bit: 0 for + and 1 for -)
2. Significant digits: 52 bits
3. Exponent: 11 bits

What you should know
– Computers can represent numbers only with limited accuracy.
E.g., when you enter a 20-digit decimal number into a program that uses single precision, only about 7 digits are actually stored; the rest are lost.
– Real examples:
Designing aircraft on p.35
The Vancouver Stock Exchange Index on pp. 38-39

Representation of Numbers

// file: public_html/2005f-html/cil102/accuracy.c
#include <stdio.h>

int main(void) {
    int x, y, result;  // x, y, and result all use 32 bits to represent integers
                       // (-2,147,483,648 to +2,147,483,647)
    char op;
    int i;

    for (i = 0; i < 100; i++) {
        printf("please enter an expression:\n");
        if (scanf("%d %c %d", &x, &op, &y) != 3)  // stop on malformed input or end of input
            break;

        if (op == '+')
            result = x + y;  // signed overflow is undefined in C; it typically wraps around
        else if (op == '-')
            result = x - y;
        else {
            printf("Invalid operator!!\n");
            break;
        }
        printf("%d %c %d = %d\n", x, op, y, result);
    }
    return 0;
}
// When you enter 2000000000 + 500000000, the result is -1794967296

Variable-size-storage approach:
– Allows a wide range of numbers to be stored accurately
– Needs significantly more time to process
– The fixed-size approach is more commonly used than the variable-size approach.

Representation of characters

There are no visual letters A, B, C, etc. stored in computers the way we picture them.
Letters and symbols are encoded in 8 bits (one byte) of 0's and 1's.
– The keyboard converts keys A, B, C, etc. to their corresponding codes, and
– the monitor converts the codes into the visual letters A, B, C, etc. on screen.
Two commonly used coding schemes:
– ASCII: American Standard Code for Information Interchange
– EBCDIC: Extended Binary Coded Decimal Interchange Code

Representation of characters

Character     EBCDIC     ASCII
A             11000001   01000001
B             11000010   01000010
a             10000001   01100001
b             10000010   01100010
0             11110000   00110000
1             11110001   00110001
2             11110010   00110010
, (comma)     01101011   00101100
- (dash)      01100000   00101101

Representation of characters

Foreign characters: two approaches
– Use one byte per character, e.g.,
  ISO-8859-1 for Western (Roman)
  ISO-8859-7 for Greek
  ISO-2022-CN for simplified Chinese
  Webpage: use "META charset=…" to specify which encoding is used.
– Use two bytes per character/symbol
  16 bits have 65,536 combinations (characters)
  Unicode coding system (originally 16 bits; Unicode has since grown beyond 65,536 characters, so encodings such as UTF-16 use two or four bytes per character)

Representation of Images

A picture is treated as a matrix of dots, called pixels.
The pixels are so small and close together that we cannot see them as separate dots.
Resolution: dots per inch (dpi)
– 72 dpi for Web images
– 600 or 1200 dpi for professional printers or home photo printers

The color of each pixel is represented using bits.
Black/White: one bit per pixel
– 1 = white and 0 = black
Gray scale: one byte per pixel
– 256 different degrees of gray (00000000 to 11111111)
– 00000000 black, 01111111 intermediate gray, 11111111 white
Color: three bytes per pixel
– Red, green, blue color
– One byte for the intensity of each of the three colors
– 256 possible red, 256 green, 256 blue intensities
Pure red: 11111111 for the red byte, 00000000 for green and blue
White: 11111111 for all three bytes
Black: 00000000 for all three bytes

Image storage: size
Gray scale: one byte per pixel
E.g., a 3 x 5 inch picture with 300 dpi resolution
– 3 * 300 = 900 pixels per column
– 5 * 300 = 1500 pixels per row
– 900 * 1500 = 1,350,000 pixels/picture
– Needed storage = 1,350,000 bytes/picture ≈ 1.3 MB/picture
Color: three bytes per pixel
E.g., a 3 x 5 inch picture with 300 dpi resolution
– 3 * 300 = 900 pixels per column
– 5 * 300 = 1500 pixels per row
– 900 * 1500 = 1,350,000 pixels/picture
– Needed storage = 3 (bytes per pixel) * 1,350,000 = 4,050,000 bytes/picture ≈ 3.9 MB/picture: TOO BIG

Image compression
Color table
– Most pictures contain a small # of different colors
– Use a table to define the colors that are actually used in the picture
– Each pixel has an index into the color table.
– Each image contains a color table and table indices
– Example: for a picture with 100 different colors, the color table would contain 100 entries, three bytes per entry (one entry per color). One byte can be used as the index into the table for each pixel.

Drawing commands
– Draw the picture using basic commands
– Just as artists draw using a pencil or a brush and other basic movements
– Example: a house is drawn by sketching various elements (doors, windows, walls), adding color to them, and moving them to the desired position.

Data averaging or sampling
– Condense the size by selecting a smaller collection of information to store.
– There are many different ways of sampling and data averaging.
– An example: choose to store only every other pixel in an image (sampling), reducing the size by half. To display the full picture, the computer needs to fill in the missing data with, for example, the average of neighboring pixels (data averaging).
– The resulting picture cannot be as sharp as the original.
– Lossy data compression

Image Formats

Commonly used image file formats - 1
– Bitmap (.bmp)
  Pixel-by-pixel storage of all color information for each pixel.
  Lossless representation
  Files are huge.
– Graphics Interchange Format (.gif)
  Uses one or more color tables (the color table technique)
  Each table contains up to 256 colors.
  Suitable for pictures with a small # (<256) of different colors (e.g., organization charts)
  Not suitable for pictures with shading (e.g., photos)

Image Formats

Commonly used image file formats - 2
– PostScript (.ps)
  Employs the drawing commands technique
  "moveto" sets a new current position, "lineto" draws a line from the current position to a new one, and "arc" draws an arc given its center, radius, etc.
  General shapes can be used in multiple places
  Fonts can be reused.
  Useful when the picture can be rendered as a drawing or it contains many of the same elements (e.g., text in the same fonts)
– Joint Photographic Experts Group (JPEG) (.jpg)
  Uses data averaging and sampling on 8*8 pixel blocks
  The user determines the level of detail and clarity
  High-quality image: 8*8 blocks maintain their contents
  Low-quality image: more information in the 8*8 blocks is discarded, giving smaller files

Comparison of .jpg, .gif, and .ps

Comparison of .jpg and .gif: http://www.siriusweb.com/tutorials/gifvsjpg/

More on .jpg and .gif: http://www.wfu.edu/~matthews/misc/jpg_vs_gif/JpgVsGif.htm

Summary of Image Representations

Other commonly used formats
– TIFF: Tagged Image File Format
– PNG: Portable Network Graphics
– New formats will emerge
Understand the format and know its pros and cons
To learn: Google the format
Use programs (e.g., GIMP) to convert between formats

ADC and DAC

ADC: Analog to Digital Converter
Use 8 bits to represent voltages from 0 to 5 volts
– Input = 5 volts, output = 1111 1111
– Input = 3 volts, output = 1001 1001
– Input = 0 volts, output = 0000 0000

DAC: Digital to Analog Converter
Use 8 bits to represent voltages from 0 to 5 volts
– Input = 1111 1111, output = 5 volts
– Input = 1001 1001, output = 3 volts
– Input = 0000 0000, output = 0 volts

Analog Audio
[Figure: sound wave]

Digital Recording - 1
[Figure: digital recording at a low sampling rate, and digital replaying]

Digital Recording - 2
[Figure: digital recording at a high sampling rate, and digital replaying]

Music CD

Sample rate: 44,100 samples/second
# of bits per sample (height of the wave): 16 bits
# of channels: 2
Total bytes/second:
44,100 samples/s x 2 bytes/sample x 2 channels = 176,400 bytes/second
Total bytes on a 74-minute CD:
176,400 bytes/s * 74 minutes * 60 seconds/minute = 783,216,000 bytes ≈ 783 million bytes (about 747 MB with 1 MB = 2^20 bytes)

MP3 Format

Compresses the audio based on the following:
– People cannot hear sound at very low and very high frequencies
– When there are two sounds, people hear the louder one, not the softer one
– Humans hear some sounds better than others
Lossy format

MP3 Quality

Bit rate: # of bits per second encoded in MP3
Bit rate: 96 to 320 kbps (kilobits per second)
Quality
– 320 kbps: listeners generally cannot tell the difference from the original music CD
– 120 kbps: like hearing music on the radio
– 160 kbps or higher: for a better experience

Music CD to MP3 Files
[Figure: music CD (finest quality), ripper, PC hard disk, MP3 encoder or compressor, MP3 files, data CD]

Listening to Music and MP3
[Figure: music CD (finest quality) played on a music CD player; MP3 files on a data CD played on an MP3 player]
