Data Encoding CTPS 2018 - intranet.cb.amrita.edu

Data Encoding CTPS 2018 Department of CSE,Coimbatore LN #8 (2 Hrs)


Post on 26-Sep-2020


Objectives

❖ To understand positional numeral systems.

❖ To depict how complex information such as text, colors, pictures, and sound can be encoded as bit strings.

Positional Number System

• A number system defines how a number can be represented using distinct symbols.

• A number can be represented differently in different systems. For example, the two numbers (2A)₁₆ and (52)₈ both refer to the same quantity, (42)₁₀, but their representations are different.
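Here is a concrete version of the positional idea. This Python sketch uses a helper called `to_decimal`, a name we made up rather than one from the slides. It multiplies the running total by the base and then adds each digit in turn, which is the same as weighting each digit by a power of the base. Python's built-in `int(text, base)` should give the same answer.

```python
def to_decimal(digits: str, base: int) -> int:
    """Evaluate a numeral written in `base` using positional weights."""
    symbols = "0123456789ABCDEF"
    value = 0
    for ch in digits.upper():
        d = symbols.index(ch)          # digit value: '0'->0 ... 'F'->15
        if d >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + d       # shift one place left, add the digit
    return value

print(to_decimal("2A", 16))            # 2*16 + 10 = 42
print(to_decimal("52", 8))             # 5*8 + 2 = 42
print(int("2A", 16), int("52", 8))     # built-in check: 42 42
```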


Common Number Systems

System      | Base | Symbols              | Used by humans? | Used in computers?
Decimal     | 10   | 0, 1, … 9            | Yes             | No
Binary      | 2    | 0, 1                 | No              | Yes
Octal       | 8    | 0, 1, … 7            | No              | No
Hexadecimal | 16   | 0, 1, … 9, A, B, … F | No              | No


Bits and binary

• All computer data is represented and stored using binary, a number system that uses only the digits 0 and 1.

• Binary digits can be grouped together into bytes.

• A binary digit, or bit, is the smallest unit of data in computing. It is represented by a 0 or a 1.

• Binary numbers are made up of binary digits (bits), e.g. the binary number 1001.

S = {0, 1}


1. What is the biggest binary number one can write with n bits?

2. How many unique patterns does a sequence of 5 bits generate?

3. Write all the patterns of a sequence of 5 bits.

1. What is the biggest binary number one can write with n bits? n 1s (11…1), which equals 2^n - 1 in decimal.

2. How many unique patterns does a sequence of 5 bits generate? 2^5 = 32.

3. Write all the patterns of a sequence of 5 bits. 00000, 00001, 00010, …, 11111.
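The three answers can be checked mechanically. This minimal Python sketch lists every 5-bit pattern with `itertools.product`, then confirms both the count and the largest value.

```python
from itertools import product

n = 5
# All n-bit patterns: every way of choosing '0' or '1' for each of the n places.
patterns = ["".join(bits) for bits in product("01", repeat=n)]

print(len(patterns))                 # 32, i.e. 2**5
print(patterns[:3], patterns[-1])    # ['00000', '00001', '00010'] 11111
print(int("1" * n, 2))               # largest n-bit value: 2**5 - 1 = 31
```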

Table: Four positional number systems

Bits and binary

• The circuits in a computer's processor are made up of billions of transistors.

• A transistor is a tiny switch that is activated by the electronic signals it receives.

• The digits 1 and 0 used in binary reflect the on and off states of a transistor.

• Computer programs are sets of instructions.

• Each instruction is translated into machine code: simple binary codes that control the CPU.

• Programmers write computer code, and a translator (for example, a compiler or assembler) converts it into binary instructions that the processor can execute.

Byte to Terabyte

• Bits can be grouped together to make them easier to work with.

• A group of 8 bits is called a byte.

Other groupings include:

• Nibble - 4 bits (half a byte)

• Byte - 8 bits

• Kilobyte (KB) - 1024 bytes (or 1024 x 8 = 8192 bits)

• Megabyte (MB) - 1024 kilobytes (or 1,048,576 bytes)

• Gigabyte (GB) - 1024 megabytes

• Terabyte (TB) - 1024 gigabytes

• These 1024-based sizes are the traditional binary convention (strictly named KiB, MiB, GiB and TiB by the IEC); in SI units, as in the Big Data table, 1 KB = 1000 bytes.

• Most computers can process millions of bits every second.

• A hard drive's storage capacity is measured in gigabytes or terabytes.

• RAM is often measured in megabytes or gigabytes.
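Below is a small sketch of the 1024-based convention used in this list. The names `UNITS` and `bytes_in` are illustrative and do not come from the slides.

```python
# Each step up the list multiplies by 1024 (strictly, KiB, MiB, GiB, TiB).
UNITS = ["byte", "KB", "MB", "GB", "TB"]

def bytes_in(unit: str) -> int:
    """Number of bytes in one `unit` under the 1024-based convention."""
    return 1024 ** UNITS.index(unit)

for unit in UNITS[1:]:
    size = bytes_in(unit)
    print(f"1 {unit} = {size:,} bytes = {size * 8:,} bits")
```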


Big Data: Volume

Unit      | Symbol | Size       | Example
Byte      | B      | 8 bits     |
Kilobyte  | KB     | 1000 bytes | 30 KB: one page of text
Megabyte  | MB     | 1000 KB    | 5 MB: one song
Gigabyte  | GB     | 1000 MB    | 5 GB: one movie
Terabyte  | TB     | 1000 GB    | 1 TB: 6 million books
Petabyte  | PB     | 1000 TB    | 1 PB: 55 storeys of DVDs
Exabyte   | EB     | 1000 PB    | 5 EB: all data up to 2003
Zettabyte | ZB     | 1000 EB    | 1.8 ZB: data in 2011
Yottabyte | YB     | 1000 ZB    | 1 YB: NSA data center

Using Hexadecimal

• Hex codes are used in many areas of computing to simplify binary codes.

• It is important to note that computers do not use hexadecimal: it is used by humans to shorten binary into a more easily understandable form.

• Hexadecimal is translated into binary for computer use.

• Some examples of where hex is used include:

  • Colour references

  • Error messages

  • Assembly language programs

Color References: Hex colour model

• Hex can be used to represent colours on web pages and in image-editing programs using the format #RRGGBB (RR = red, GG = green, BB = blue). The # symbol indicates that the number has been written in hex format.

• For example, #FF6600. The hex colour model uses two hex digits for each colour component.

#FF 66 00

As one hex digit represents 4 bits, two hex digits together make 8 bits (1 byte).
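One hex digit always stands for exactly 4 bits, so converting hex to binary is a digit-by-digit lookup. The following minimal Python sketch shows this; the helper `hex_to_binary` is hypothetical.

```python
def hex_to_binary(hex_code: str) -> str:
    """Replace each hex digit by its 4-bit binary equivalent (one nibble)."""
    digits = hex_code.lstrip("#")
    nibbles = [format(int(d, 16), "04b") for d in digits]
    return " ".join(nibbles)           # space between nibbles, as on the slides

print(hex_to_binary("#FF6600"))        # 1111 1111 0110 0110 0000 0000
```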


• The values for each colour run between 00 and FF.

• In binary,

  • 00 is 0000 0000

  • FF is 1111 1111

• That provides 2^8 = 256 possible values for each of the three colours.

• That gives a total spectrum of 256 reds x 256 greens x 256 blues = 16,777,216 colours, which is over 16 million colours in total.
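As a quick Python check, here is the arithmetic behind the "over 16 million" figure.

```python
values_per_colour = 2 ** 8                 # 00 to FF: 256 values
total_colours = values_per_colour ** 3     # reds x greens x blues
print(values_per_colour, f"{total_colours:,}")  # 256 16,777,216
print(total_colours == 2 ** 24)            # True: 24 bits per colour code
```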


• #FF0000 will be the purest red - red only, no green or blue.

• Black is #000000 - no red, no green and no blue.

• White is #FFFFFF.

• An orange colour can be represented by the code #FF6600.

• The hex code is much easier to read than the binary equivalent

1111 1111 0110 0110 0000 0000


Colour model

• The figure shows the additive mixing of red, green and blue primaries to form the three secondary colors yellow (red + green), cyan (blue + green) and magenta (red + blue), and white (red + green + blue).

• RGB model: used in computer displays.

Colour models

• The figure shows the three subtractive primaries (cyan, magenta and yellow), their pairwise combinations forming red, green and blue, and finally black, formed by subtracting all three primaries from white.

• CMYK model: used in printing.

• If you are making a web page with HTML or CSS, you can use hex codes to choose the colours.

• The RGB model (additive) is used for colour monitors and most video cameras.

• Hex values have equivalents in the RGB colour model.

• The RGB model is very similar to the hex colour model: you use a decimal value between 0 and 255 for each colour.

• So an orange colour that is #FF 66 00 in hex would be 255, 102, 0 in RGB.

• Cyan is 0, 255, 255.

• Teal is 0, 128, 128.
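Converting between the two notations just means reading each pair of hex digits as a number from 0 to 255, or doing the reverse. The Python sketch below shows both directions; the helper names `hex_to_rgb` and `rgb_to_hex` are illustrative.

```python
def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Split #RRGGBB into three pairs of hex digits and read each as 0-255."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Format each 0-255 value as two upper-case hex digits."""
    return f"#{r:02X}{g:02X}{b:02X}"

print(hex_to_rgb("#FF6600"))    # (255, 102, 0), the orange above
print(rgb_to_hex(0, 128, 128))  # #008080, teal
```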


Color, Hex and RGB color codes

• Red #FF0000 (255,0,0)

• Tomato #FF6347 (255,99,71)

• Coral #FF7F50 (255,127,80)

• Indian red #CD5C5C (205,92,92)


HTML / CSS Name   | Hex Code #RRGGBB | Decimal Code (R,G,B)
Black             | #000000          | (0,0,0)
White             | #FFFFFF          | (255,255,255)
Red               | #FF0000          | (255,0,0)
Lime              | #00FF00          | (0,255,0)
Blue              | #0000FF          | (0,0,255)
Yellow            | #FFFF00          | (255,255,0)
Cyan / Aqua       | #00FFFF          | (0,255,255)
Magenta / Fuchsia | #FF00FF          | (255,0,255)
Silver            | #C0C0C0          | (192,192,192)
Gray              | #808080          | (128,128,128)
Maroon            | #800000          | (128,0,0)
Olive             | #808000          | (128,128,0)
Green             | #008000          | (0,128,0)
Purple            | #800080          | (128,0,128)
Teal              | #008080          | (0,128,128)
Navy              | #000080          | (0,0,128)

Error messages using hex

• Hex is often used in error messages on your computer.

• The hex number refers to the memory location of the error.

• This helps programmers to find and then fix problems.

Text, image, audio, video…

Figure: Different forms of data (text, image, audio, video)

In a computer, data in any form is represented in binary.

All software, music, documents, and any other information processed by a computer are stored using binary.

Different Forms of Data


Inside a computer, all data is stored as (binary) numbers:

• Numbers are stored as numbers, obviously!

• Text characters are stored as a code that represents each character, e.g. ASCII.

• Images are stored as numbers representing the amounts of red, green and blue for each pixel.

• Sounds are stored as numbers representing the loudness at given intervals.

Storage space for data

Different types of data require different amounts of storage space.

Data                                                   | Storage
One extended-ASCII character in a text file (e.g. 'A') | 1 byte
The word 'Monday' in a document                        | 6 bytes
A plain-text email                                     | 2 KB
64 pixel x 64 pixel GIF                                | 12 KB
Hi-res 2000 x 2000 pixel RAW photo                     | 11.4 MB
Three minute MP3 audio file                            | 3 MB
One minute uncompressed WAV audio file                 | 15 MB
One hour film compressed as MPEG4                      | 4 GB
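Some rows of this table can be reproduced with simple arithmetic, given a few assumptions:

* each ASCII character takes 1 byte;
* an image pixel takes 3 bytes (24-bit RGB), with no compression and no file header;
* 1 KB = 1024 bytes.

Real GIF files use a colour palette and compression, so this sketch only approximates where such figures come from.

```python
# Assumptions: 1 byte per ASCII character; 3 bytes (24-bit RGB) per pixel,
# no compression, no file header; 1 KB = 1024 bytes and 1 MB = 1024 KB.
def text_bytes(text: str) -> int:
    return len(text.encode("ascii"))

def raw_image_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    return width * height * bytes_per_pixel

print(text_bytes("Monday"))                               # 6
print(raw_image_bytes(64, 64) / 1024)                     # 12.0 (KB)
print(round(raw_image_bytes(2000, 2000) / 1024 ** 2, 1))  # 11.4 (MB)
```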


Bit number patterns

• Computer systems and files have limits that are measured in bits. For example, image and audio files have a bit depth.

• The bit depth is the number of bits used to store each value, so it determines how many different binary numbers are available.

• This is similar to the number of combinations available on a padlock.

• The more number wheels a padlock has, the more combinations are possible.

• The greater the bit depth, the more combinations of binary numbers are possible.

Bit number patterns

Every time the bit depth increases by one, the number of binary combinations is doubled.

• A 1-bit system uses combinations of numbers up to one place value (1). There are just two options: 0 or 1.

• A 2-bit system uses combinations of numbers up to two place values (11). There are four options: 00, 01, 10 and 11.

Binary combinations

One bit

• Maximum binary number = 1

• Maximum denary number = 1

• Binary combinations = 2

Two bit

• Maximum binary number = 11

• Maximum denary number = 3

• Binary combinations = 4


Three bit

• Maximum binary number = 111

• Maximum denary number = 7

• Binary combinations = 8

Bit depth | Max (binary) | Max (denary) | Combinations available
1         | 1            | 1            | 2
2         | 11           | 3            | 4
3         | 111          | 7            | 8
4         | 1111         | 15           | 16
5         | 11111        | 31           | 32

A 1-bit image can have 2 colours, a 4-bit image can have 16 colours, an 8-bit image can have 256 colours, and a 16-bit image can have 65,536 colours.
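The bit-depth table follows from two formulas. An n-bit value has 2^n combinations, and its largest value is 2^n - 1. The minimal Python sketch below regenerates the table rows and the colour counts above; the function name `bit_depth_row` is ours.

```python
def bit_depth_row(n: int) -> tuple[str, int, int]:
    """Return (max binary, max denary, combinations) for an n-bit value."""
    combinations = 2 ** n          # doubles every time n grows by one
    max_denary = combinations - 1  # all n bits set to 1
    return format(max_denary, "b"), max_denary, combinations

for n in (1, 2, 3, 4, 5, 8, 16):
    print(n, *bit_depth_row(n))
```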


Encoding and Decoding

• Encoding is the process of putting a sequence of characters (letters, numbers, punctuation, and certain symbols) into a specialized digital format for efficient transmission or storage.

• Decoding is the opposite process: the conversion of a digital signal back into a sequence of characters.

• Encoding and decoding are used in data communications, networking, and storage.
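In Python, for instance, `str.encode` turns text into bytes and `bytes.decode` turns the bytes back into text. Here is a minimal round-trip sketch, assuming the UTF-8 encoding.

```python
message = "Hello, é!"
encoded = message.encode("utf-8")   # encoding: characters -> bytes
decoded = encoded.decode("utf-8")   # decoding: bytes -> characters

print(encoded)             # b'Hello, \xc3\xa9!'
print(decoded == message)  # True
```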


• Everything on a computer is represented as streams of binary numbers.

• Audio, images and characters all look like binary numbers in machine code.

• These numbers are encoded in different data formats to give them meaning, e.g. the 8-bit pattern 01000001 could be the number 65, the character 'A', or a colour in an image.
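This sketch shows how a single byte takes on different meanings depending on how a program interprets it. The grey-level reading is a hypothetical format, chosen only for illustration.

```python
pattern = "01000001"
value = int(pattern, 2)   # read as an unsigned binary number
print(value)              # 65
print(chr(value))         # A  (read as a character code)

# Hypothetical image format: one byte per pixel as a grey level
# (0 = black, 255 = white), shown as an RGB triple.
grey = (value, value, value)
print(grey)               # (65, 65, 65), a dark grey
```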


Encoding formats

• Encoding formats have been standardised to help compatibility across different platforms:

  • audio is encoded in audio file formats, e.g. MP3, WAV, AAC

  • video is encoded in video file formats, e.g. MPEG-4, H.264

  • text is encoded in character sets, e.g. ASCII, Unicode

  • images are encoded in image file formats, e.g. BMP, JPEG, PNG

• The more bits used in a pattern, the more combinations of values become available.

• This larger number of combinations can be used to represent many more things, e.g. a greater number of different symbols, or more colours in a picture.

Character sets

Different languages use different keyboard layouts.

Figure: a QWERTY keyboard and a keyboard with Japanese characters

• A French keyboard has an é.

• If we were writing in Japanese or Arabic, we would need even more choices of characters.

• In theory, anyone can create a character set. But it is important that computers can communicate, so we use global standards for character sets.

• Every word is made up of symbols or characters.

• When you press a key on a keyboard, a number is generated that represents the symbol for that key.

• This is called a character code.

• A complete collection of characters is a character set.


Representing Characters

You can check what character encoding your web browser is using by looking in your browser settings (these menu paths come from older browser versions and may differ or be absent in current ones):

• Mozilla Firefox > Tools > Page Info: Encoding

• Microsoft Internet Explorer > View > Encoding

• Google Chrome > Tools > Encoding

If all our messages are made up of the eight symbols A, B, C, D, E, F, G, and H, we can choose a code with ----------------- bits per character.

• If all our messages are made up of the eight symbols A, B, C, D, E, F, G, and H, we can choose a code with three bits per character (since 2^3 = 8):

• A 000 C 010 E 100 G 110

• B 001 D 011 F 101 H 111

• A 000 C 010 E 100 G 110

• B 001 D 011 F 101 H 111

• With this code, the message BACADAEAFABBAAAGAH is encoded as the string of ---------------- bits

• A 000 C 010 E 100 G 110

• B 001 D 011 F 101 H 111

• With this code, the message BACADAEAFABBAAAGAH is encoded as the string of 54 bits (18 characters x 3 bits):

001000010000011000100000101000001001000000000110000111
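The encoding above can be reproduced, and reversed, with a simple lookup table. Here is a minimal Python sketch; the names `CODE`, `encode` and `decode` are illustrative.

```python
CODE = {"A": "000", "B": "001", "C": "010", "D": "011",
        "E": "100", "F": "101", "G": "110", "H": "111"}

def encode(message: str) -> str:
    """Concatenate the 3-bit code of every character."""
    return "".join(CODE[ch] for ch in message)

def decode(bits: str) -> str:
    """Fixed-length code: cut the bit string into 3-bit chunks and look each up."""
    reverse = {v: k for k, v in CODE.items()}
    return "".join(reverse[bits[i:i + 3]] for i in range(0, len(bits), 3))

bits = encode("BACADAEAFABBAAAGAH")
print(len(bits))     # 54
print(bits)
print(decode(bits))  # BACADAEAFABBAAAGAH
```

Decoding works only because every code is exactly 3 bits long. That fixed length tells the receiver where each character ends.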


Text Encoding

• Characters are usually encoded as integer values using encoding schemes.

• The associations between numbers and text are known collectively as a character encoding scheme.

ASCII - American Standard Code for Information Interchange

• Covers unaccented English letters.

• Every letter (upper and lower case), digit, punctuation mark, etc. is represented by a code from 0 to 127.

• E.g. space, 32; “A”, 65; “a”, 97.

• Only the 7-bit patterns were standardized under ASCII.

• Standard 8-bit ASCII codes start with a zero-valued bit (followed by the 7-bit ASCII code).

• Extended ASCII codes start with a one-valued bit.

  • These codes are not standard and vary in meaning among different manufacturers and equipment.

• The first 32 patterns (codes 0-31) are control codes:

  • the most common of these are 0Ah (Line Feed) and 0Dh (Carriage Return).
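You can check the example codes with Python's `ord`. It returns a character's code point, which equals its ASCII code for characters in the range 0-127. When written in 8 bits, each standard ASCII code starts with a 0 bit.

```python
# 8-bit patterns for the example characters. Codes 0-127 always start
# with a 0 bit when written in 8 bits.
codes = {ch: format(ord(ch), "08b") for ch in [" ", "A", "a"]}
for ch, bits in codes.items():
    print(repr(ch), ord(ch), bits)
# ' ' 32 00100000
# 'A' 65 01000001
# 'a' 97 01100001

# The two most common control codes: line feed (0Ah) and carriage return (0Dh).
print(ord("\n") == 0x0A, ord("\r") == 0x0D)  # True True
```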

Table: ASCII Chart

EBCDIC (Extended Binary Coded Decimal Interchange Code)

• Developed by IBM.

• Restricted mainly to IBM or IBM-compatible mainframes.

• Conversion software to/from ASCII is available.

• Common in archival data.

• Character codes differ from ASCII:

Character | ASCII | EBCDIC
Space     | 20₁₆  | 40₁₆
A         | 41₁₆  | C1₁₆
b         | 62₁₆  | 82₁₆
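Python ships codecs for several EBCDIC code pages. This sketch reproduces the table, assuming code page 037 (`cp037`, EBCDIC for US/Canada), which is one of several EBCDIC variants.

```python
# Assumption: EBCDIC code page 037 (US/Canada); other EBCDIC variants exist.
table = {}
for ch in [" ", "A", "b"]:
    ascii_hex = ch.encode("ascii").hex().upper()
    ebcdic_hex = ch.encode("cp037").hex().upper()
    table[ch] = (ascii_hex, ebcdic_hex)
    print(repr(ch), ascii_hex, ebcdic_hex)
# ' ' 20 40
# 'A' 41 C1
# 'b' 62 82
```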


Unicode

• Unicode characters are stored using between 8 and 32 bits each, depending on the character and the encoding (UTF-8, for example, uses 1 to 4 bytes).

• It can represent characters from languages all around the world.

• It is commonly used across the internet.

• As it can use more bits per character than ASCII, it might take up more storage space when saving documents.

Global companies, like Facebook and Google, would not use the ASCII character set because their users communicate in many different languages.
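To see the variable size in practice, here is a short Python sketch using UTF-8, one of the Unicode encodings. In UTF-8 a character takes 1 to 4 bytes.

```python
# UTF-8 stores each Unicode character in 1 to 4 bytes (8 to 32 bits).
samples = ["A", "é", "日", "\U0001F600"]   # last one is the grinning-face emoji
for ch in samples:
    data = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", len(data) * 8, "bits:", data.hex(" "))
```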

• Multilingual: defines codes for

  • nearly every character-based alphabet;

  • a large set of ideographs for Chinese, Japanese and Korean;

  • composite characters for vowels and syllabic clusters required by some languages.

• Allows software modifications for local languages.

ASCII contains only 128 characters (codes 0-127).

An extended version of ASCII exists with 256 characters.

This is far from enough, as it covers little beyond English and a few other Western European languages.

UNICODE was developed to alleviate this problem: version 5.1.0, for example, contains more than 100,000 characters, covering most existing languages.

For more information, see:

http://www.unicode.org/versions/Unicode5.1.0/

Image Encoding

• Binary representation of bitmap images

• All bitmap images are stored as an array of pixels.

• A monochrome image stores

  • 1 for a black pixel and

  • 0 for a white pixel

  • (or vice versa, depending on the encoding protocol).

• It may also be necessary to store the dimensions of the image.


Bitmap


What is the bitmap?


Bitmap


✓ Show how to encode


Answer

This image could be represented as the following 35 binary digits (which fit in 5 bytes, since 35 bits round up to 5 whole bytes of 8 bits):

00100 01010 01010 10001 11111 10001 00000
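This Python sketch redraws the bitmap and packs the 35 bits into bytes. Because the original image is not reproduced here, it rests on three assumptions:

* each group of five digits is one row of a 5-pixel-wide image;
* 1 stands for a black pixel;
* 0 stands for a white pixel.

```python
rows = "00100 01010 01010 10001 11111 10001 00000".split()
width, height = len(rows[0]), len(rows)   # 5 x 7 pixels

# Render: 1 = black pixel (#), 0 = white pixel (.)
for row in rows:
    print("".join("#" if bit == "1" else "." for bit in row))

# Pack the 35 bits into whole bytes, padding the last byte with zeros.
bits = "".join(rows)
n_bytes = (len(bits) + 7) // 8            # round up to whole bytes
padded = bits.ljust(n_bytes * 8, "0")
packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
print(width, height, len(bits), len(packed))   # 5 7 35 5
```

The packing step pads the final byte with zeros. That is why 35 bits occupy 5 bytes rather than 4.375.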


Color Images


Representing Color


• Each pixel of the rose flower is defined using 24 bits (8 bits per colour, RGB):

  • the first 8 bits specify the shade of red,

  • the next 8 bits specify the shade of green, and

  • the last 8 bits specify the shade of blue.
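Packing the three 8-bit shades into one 24-bit number only takes bit shifts. The Python sketch below uses illustrative helper names and the orange #FF6600 from the hex colour slides.

```python
def pack_rgb(r: int, g: int, b: int) -> int:
    """Red in the top 8 bits, green in the middle 8, blue in the low 8."""
    return (r << 16) | (g << 8) | b

def unpack_rgb(pixel: int) -> tuple[int, int, int]:
    """Recover the three 8-bit shades by shifting and masking."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF

pixel = pack_rgb(255, 102, 0)        # orange, #FF6600
print(format(pixel, "024b"))         # 111111110110011000000000
print(unpack_rgb(pixel))             # (255, 102, 0)
```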


Color Images


Representing Sound

Sound is produced by the vibration of a medium such as air or water.

Audio refers to sound within the range of human hearing.

Sound is stored in a computer as binary codes.

• A microphone detects the changes in air pressure and converts them into an electrical wave form.

• A converter within the sound card of the computer takes readings (samples) many times each second; CD-quality audio, for example, uses 44,100 samples per second.

• These readings are positions (voltages, actually) on the wave in relation to the zero line.

• They are recorded and stored as binary numbers.

Sound Data As Bytes:

• The data is represented as a pair of numbers.

• The first part represents the time and the second part represents the voltage value (from 0000 = low to 1111 = high).

A sound signal is analog, i.e. continuous in both time and amplitude.

To store and process sound information in a computer or to transmit it through a computer network, we must first convert the analog signal to digital form using an analog-to-digital converter (ADC).

The conversion involves two steps: (1) sampling, and (2) quantization.

• Sampling is the process of examining the value of a continuous function at regular intervals.

• Sampling usually occurs at uniform intervals, which are referred to as sampling intervals.

• The number of samples taken in a second is called the sampling rate.
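Here is a minimal sampling sketch in Python. A hypothetical 1 Hz sine wave stands in for the analogue signal. The sampling rate of 8 samples per second is chosen only to keep the output short.

```python
import math
from typing import Callable

def sample(signal: Callable[[float], float], sampling_rate: float,
           duration: float) -> list[float]:
    """Read signal(t) once every sampling interval (1 / sampling_rate seconds)."""
    interval = 1.0 / sampling_rate
    n_samples = int(duration * sampling_rate)
    return [signal(i * interval) for i in range(n_samples)]

def wave(t: float) -> float:
    """Hypothetical analogue input: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * t)

samples = sample(wave, sampling_rate=8, duration=1.0)
print(len(samples))                    # 8 samples in one second
print([round(s, 2) for s in samples])
```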


To represent the varying values of a sound wave, its height must be measured at regular intervals and each measurement given a binary code.

The sampled measurements make up the digital sound file.

[Figure: an analogue signal (amplitude against time) sampled at regular intervals set by the sampling rate]


Quantization is the process of limiting the value of a sample of a continuous function to one of a predetermined number of allowed values, which can then be represented by a finite number of bits.

The number of bits used to store each sample value determines the accuracy of the digital sound.
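Uniform quantization can be sketched in Python as dividing the amplitude range into 2^bits equal regions and replacing each sample by the number of the region it falls in. The sketch assumes samples are normalized to the range -1.0 to 1.0 and that all regions have equal width; `quantize` is a hypothetical name, and the sample values are made up for illustration.

```python
def quantize(x, bits, lo=-1.0, hi=1.0):
    """Map a sample in [lo, hi] to one of 2**bits equal-width regions, numbered 0 .. 2**bits - 1."""
    levels = 2 ** bits
    x = min(max(x, lo), hi)                       # clip samples outside the range
    step = (hi - lo) / levels                     # width of one quantization region
    return min(int((x - lo) / step), levels - 1)  # x == hi belongs to the top region


samples = [0.0, 0.7071, 1.0, 0.7071, 0.0, -0.7071, -1.0, -0.7071]
codes = [format(quantize(s, bits=2), "02b") for s in samples]
print(codes)  # ['10', '11', '11', '11', '10', '00', '00', '00']
```

With 2 bits every sample collapses to one of only four codes, which is why low bit depths sound coarse.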


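• Using 2-bit samples to represent the audio signal, the amplitude range is divided into four regions, labelled 11, 10, 01 and 00 from top to bottom.

[Figure: the analogue signal plotted against time, with sampling instants t1, t2, t3, t4, …, t10]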


• At t1: 01


• At t2: 00

• So far we have: 01 00


• At t3: 01

• So far we have: 01 00 01


• The complete wave is represented by recording the region each sample falls in: at t1 it is in region 01, at t2 in region 00, and so on.

• Time does not need to be stored, because samples are taken at uniform intervals (t = 1, 2, 3, …); the position of each code in the sequence gives its time.


The complete representation of the signal is:

01 00 01 01 11 01 10 01 11 01
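A short Python sketch of why time can be left out: fixed-width 2-bit codes are simply concatenated, and splitting the stream every 2 bits recovers one region per sampling instant, in order. The region numbers below are read off the ten codes of the signal at t1 to t10; the helper names `encode` and `decode` are hypothetical.

```python
def encode(levels, bits=2):
    """Concatenate fixed-width binary codes; time is implied by position."""
    return "".join(format(level, f"0{bits}b") for level in levels)


def decode(bitstring, bits=2):
    """Split the bit string back into one region number per sampling instant."""
    return [int(bitstring[i:i + bits], 2) for i in range(0, len(bitstring), bits)]


regions = [1, 0, 1, 1, 3, 1, 2, 1, 3, 1]  # regions 01 00 01 01 11 01 10 01 11 01 at t1 .. t10
stream = encode(regions)
print(stream)                     # 01000101110110011101
print(decode(stream) == regions)  # True
```

Ten samples at 2 bits each need 20 bits in total.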


Adding one bit doubles the number of quantization levels, so each level covers half as much of the range and the sample is twice as accurate.
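The doubling can be checked with a few lines of Python: n bits give 2^n levels, and splitting a fixed amplitude range evenly among them halves the step size with every extra bit. The helper names and the normalized range of 1.0 are assumptions made for illustration.

```python
def quantization_levels(bits):
    """Number of distinct values an n-bit sample can take."""
    return 2 ** bits


def step_size(bits, full_range=1.0):
    """Width of one quantization region when full_range is split evenly."""
    return full_range / quantization_levels(bits)


for b in (2, 3, 4, 8, 16):
    print(f"{b:2d} bits -> {quantization_levels(b):6d} levels, step {step_size(b):.6f}")
```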


How much space do we need to store one minute of music?

- 60 seconds

- 44,100 samples per second

- 16 bits (2 bytes) per sample

- 2 channels (stereo)

S = 60 × 44,100 × 2 × 2 = 10,584,000 bytes ≈ 10 MB!

1 hour of music would be more than 600 MB!
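The storage estimate above is a straight product, sketched here in Python. It assumes the samples are stored raw, with no compression and no file header, and counts 1 MB as 1,000,000 bytes; `raw_audio_size_bytes` is a hypothetical name.

```python
def raw_audio_size_bytes(seconds, sample_rate, bytes_per_sample, channels):
    """Uncompressed size: duration x samples per second x bytes per sample x channels."""
    return seconds * sample_rate * bytes_per_sample * channels


one_minute = raw_audio_size_bytes(60, 44_100, 2, 2)
one_hour = raw_audio_size_bytes(60 * 60, 44_100, 2, 2)
print(one_minute)  # 10584000  (about 10 MB)
print(one_hour)    # 635040000 (about 635 MB)
```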


Data ≠ Information

• Data and information are not synonymous terms!

• Data is the means by which information is conveyed.

• Data compression aims to reduce the amount of data

while preserving as much information as possible.


[Figure, after H.R. Pourreza: a block of DATA split into INFORMATION and REDUNDANT DATA]

DATA = INFORMATION + REDUNDANT DATA


The same information can be represented by different amounts of data:

1. Your wife, Helen, will meet you at Logan Airport in Boston at 5

minutes past 6:00 pm tomorrow night

2. Your wife will meet you at Logan Airport at 5 minutes past

6:00 pm tomorrow night

3. Helen will meet you at Logan at 6:00 pm tomorrow night


Data Compression

• Data compression is the art of reducing the number of bits needed to store or transmit data.

• It reduces the volume of data to be transmitted (text, fax, images).

• It reduces the bandwidth required for transmission and the storage requirements (speech, audio, video).


Classification

• Lossless compression

  • Used for legal and medical documents and computer programs.

  • Information preserving.

  • Low compression ratios.

• Lossy compression

  • Used for digital audio, images and video, where some errors or loss can be tolerated.

  • Not information preserving.

  • High compression ratios.


Trade-off: information loss vs compression ratio


Video and Audio Compression

• Video and audio files are very large.

• Unless we build and maintain networks with very high bandwidth (gigabytes per second or more), we have to compress the data.

• Relying on ever-higher bandwidth is not a good option.

• Compression has therefore become part of the representation or coding scheme of popular audio, image and video formats.


Run-length Encoding

• This encoding method is frequently applied to images (or to the pixels in a scan line).

• It is also used as one small component of JPEG compression.

• Each run of identical image elements is replaced by a single pair: the sequence X1, X2, …, Xn is mapped to pairs (c1, l1), (c2, l2), …, (ck, lk), where ci is the image intensity or colour of the ith run and li is the length of that run in pixels.
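Run-length encoding of one scan line can be sketched in Python with `itertools.groupby`, which groups consecutive equal values; each group becomes one (value, length) pair, and decoding expands the pairs again. The function names and the sample scan line (0 = white, 1 = black) are hypothetical.

```python
from itertools import groupby


def rle_encode(pixels):
    """Map a sequence X1 .. Xn to (value, run length) pairs (c_i, l_i)."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(pairs):
    """Expand each (value, length) pair back into a run of pixels."""
    out = []
    for value, length in pairs:
        out.extend([value] * length)
    return out


scan_line = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]  # hypothetical black (1) / white (0) row
pairs = rle_encode(scan_line)
print(pairs)                           # [(0, 4), (1, 2), (0, 3), (1, 1)]
print(rle_decode(pairs) == scan_line)  # True
```

RLE is lossless: decoding returns exactly the original pixels, and the longer the runs, the fewer pairs are needed.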


Black and White Image


Black and White


Improve Efficiency


Color Images


Run-length Encoding

Figure: An encoded figure


Run-length encode the image.


Run-length encode the image.


Run-length encoding isn't a good approach for text

compression. Why?


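Long runs of identical characters rarely appear in natural language, so most runs have length 1 and storing a (character, count) pair for each run can make the text larger rather than smaller.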
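A quick Python check of this point: encoding an ordinary phrase produces almost as many runs as characters. The sketch assumes, for illustration only, that each (character, count) pair costs 2 bytes (one for the character, one for the count) while each plain character costs 1 byte; the phrase and the name `rle_encode` are hypothetical.

```python
from itertools import groupby


def rle_encode(text):
    """Group consecutive identical characters into (character, count) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


text = "see the moon"
pairs = rle_encode(text)
plain_bytes = len(text)        # 1 byte per character (assumed)
encoded_bytes = 2 * len(pairs)  # 1 byte for the character + 1 byte for the count (assumed)
print(len(pairs), plain_bytes, encoded_bytes)  # 10 12 20
```

Only "ee" and "oo" form runs longer than 1, so the encoded form is larger than the original.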


Data compression ratio

• Data compression ratio, also known as compression power, is a computer science term used to quantify the reduction in data-representation size produced by a data compression algorithm.

• The data compression ratio is analogous to the physical compression ratio used to measure physical compression of substances.

• Data compression ratio is defined as the ratio between the uncompressed size and the compressed size:

$$\text{Compression Ratio} = \frac{\text{Uncompressed Size}}{\text{Compressed Size}}$$
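The definition translates directly into Python. Both sizes must be in the same unit; the function names are hypothetical, and the example sizes (the one-minute audio clip of 10,584,000 bytes compressed to a quarter of its size) are made up for illustration.

```python
def compression_ratio(uncompressed_size, compressed_size):
    """Uncompressed size divided by compressed size (same units for both)."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_size / compressed_size


def as_explicit_ratio(ratio):
    """Format a ratio such as 4.0 in the 'N:1' notation."""
    return f"{ratio:g}:1"


ratio = compression_ratio(10_584_000, 2_646_000)  # hypothetical sizes in bytes
print(as_explicit_ratio(ratio))  # 4:1
```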


• A representation that compresses a 10 MB file to 2 MB has a compression ratio of 10/2 = 5, often notated as an explicit ratio, 5:1, or as an implicit ratio, 5/1.

• Sometimes the space savings is given instead, defined as the reduction in size relative to the uncompressed size:

$$\text{Space Savings} = 1 - \frac{\text{Compressed Size}}{\text{Uncompressed Size}}$$

• A representation that compresses a 10 MB file to 2 MB would yield a space savings of 1 - 2/10 = 0.8, often notated as a percentage, 80%.
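A Python sketch of the space-savings formula, applied to the 10 MB to 2 MB example (sizes in MB); `space_savings` is a hypothetical name.

```python
def space_savings(uncompressed_size, compressed_size):
    """Reduction in size relative to the uncompressed size (0.0 means no savings)."""
    return 1 - compressed_size / uncompressed_size


savings = space_savings(10, 2)  # the 10 MB -> 2 MB example
print(f"{savings:.0%}")         # 80%
```

Because Compressed Size / Uncompressed Size is the reciprocal of the compression ratio, the two measures are linked by Space Savings = 1 - 1 / Compression Ratio; a 5:1 ratio corresponds to 1 - 1/5 = 80%.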


What has been described?

• The positional number system.

• Binary representation.

• Data encoding schemes for text, color, images and sound.

• Compression techniques, and how data can be compressed using the RLE method.

Credits

▪ Foundations of Computer Science, Behrouz Forouzan and Firouz Mosharraf

▪ www.bbc.co.uk, KS3 Computing: Data representation

▪ Google Images