Page 1:

ENGS 4 - Lecture 11
Technology of Cyberspace

Winter 2004
Thayer School of Engineering

Dartmouth College

Instructor: George Cybenko, x6-3843

[email protected]

Assistant: Sharon Cooper (“Shay”), x6-3546

Course webpage: http://thayer.dartmouth.edu/~engs004/

Page 2:

Today’s Class

• Sharon’s mini-lecture

• Shannon theory and coding theory basics

• Break

• Leo’s mini-lecture

• Ryan’s mini-lecture

Page 3:

Future mini-lectures

• Feb 19 – Sarah (social inequality/digital divide), Ryan (internet dating), Noah R. (nanotechnology), Scott (digital watermarking)

• Feb 24 – Dason (pornography), En Young (persistence), Rob (GPS), Simon (online games)

• Topics – persistence, piracy, IP telephony, blogs, IM, global adoption and trends, GPS, online games

Page 4:

Constructing Prefix Codes: Huffman coding

[Huffman tree for a fair die: six symbols 1–6, each with probability 1/6. Pairs are merged into subtrees of weight 1/3, 1/3, 1/3, then 2/3, then 1, giving codewords 000, 001, 010, 011, 10, 11.]

Average length = (4/6) x 3 + (2/6) x 2 = 2 2/3 bits per symbol < 3
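Not part of the original slide, but as a rough illustration of the construction just described, here is a minimal Huffman-coding sketch in Python; the `huffman_code` helper and its names are my own, applied to the fair-die example above:

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}.

    Repeatedly merges the two least-probable subtrees, prepending
    '0' to codewords in one subtree and '1' in the other.
    """
    # Each heap entry: (probability, unique tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

# Fair-die example from the slide: six symbols, each with probability 1/6.
die = {str(k): 1/6 for k in range(1, 7)}
code = huffman_code(die)
avg = sum(die[s] * len(w) for s, w in code.items())
print(code)   # four 3-bit codewords and two 2-bit codewords
print(avg)    # 2.666... = 2 2/3 bits per symbol, below 3
```

The tie-breaking counter only keeps the heap comparisons well defined; any tie-break yields a code with the same average length.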

Page 5:

Another example

• What about two symbols, a,b with P(a)=0.9 and P(b)=0.1?

• The way to build efficient codes for such cases is to use “block codes”

• That is, consider pairs of symbols: aa, ab, ba, bb or aaa, aab, aba, abb, baa, bab, bba, bbb, etc

• Let’s do an example on the board.
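As a sketch of that board example (reusing the illustrative `huffman_code` function from the earlier block, which is not course code), Huffman-coding pairs of symbols already brings the average length per original symbol well below 1 bit:

```python
from math import log2
from itertools import product

p = {"a": 0.9, "b": 0.1}

# Pairs of symbols (a "block code" of length 2); for an independent source
# the probabilities multiply: P(aa) = 0.81, P(ab) = P(ba) = 0.09, P(bb) = 0.01.
pairs = {x + y: p[x] * p[y] for x, y in product(p, repeat=2)}
code = huffman_code(pairs)                      # sketch from the earlier example
avg_per_pair = sum(pairs[s] * len(w) for s, w in code.items())
print(avg_per_pair / 2)                         # ~0.645 bits per original symbol
print(-sum(q * log2(q) for q in p.values()))    # entropy ~0.469 bits; longer blocks get closer
```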

Page 6:

How about the other example?

[Code tree: two symbols S1 and S2, assigned codewords 0 and 1, one bit each.]

Average number of bits per symbol = 1.

Is this the best possible? No.

Page 7:

Entropy of a source

Alphabet = { s1, s2, s3, ..., sn }

Prob(sk) = pk

Entropy = H = - Σk pk log2(pk)

Shannon’s Source Coding Theorem:

H <= Average number of bits per symbol used by any decodable code.
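The bound is easy to compute; here is a small Python helper for the entropy formula above (my own sketch, not course code):

```python
from math import log2

def entropy(probs):
    """Shannon entropy H = -sum_k p_k * log2(p_k), in bits per symbol.

    Symbols with zero probability contribute nothing (0 * log 0 is taken as 0).
    """
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))   # fair die: log2(6) ~ 2.585 bits
print(entropy([1.0, 0.0]))  # the "sun rises" source: 0 bits
print(entropy([0.9, 0.1]))  # the a/b example: ~0.469 bits
```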

Page 8:

Examples

H(die example) = - 6 x 1/6 x log2(1/6) = log2(6) ≈ 2.585 bits

H(Sun rise) = - (0 log2 0 + 1 log2 1) = 0 (taking 0 log 0 = 0)

How can we achieve the rate determined by the entropy bound?

Block codes can do better than pure Huffman coding.

“Universal codes”.... e.g., Lempel-Ziv.

Page 9:

Break

Page 10:

Models of a source

We can easily measure the probability of each symbol in English. How?

How about the probabilities of each pair of symbols?

Triples?

Sentences?

True model of English = model of language = model of thought

This is very hard.

Page 11:

Lempel-Ziv Coding (zip codes, etc)

Want to encode a string: aaabcababcaaa into 0’s and 1’s.

Step 1: Convert to 0’s and 1’s by any prefix substitution.

a = 00, b = 01, c = 11

00000001110001000111000000

Page 12:

Lempel-Ziv Coding (zip codes, etc)

Step 2: Parse the string into “never before seen” strings.

00000001110001000111000000

0,00,000,01,1,10,001,0001,11,0000,00

Page 13:

Lempel-Ziv Coding (zip codes, etc)

Step 3: Assign binary numbers to each.

0,00,000,01,1,10,001,0001,11,0000,00

0001,0010,0011,0100,0101,0110,0111,1000,1001,1010,1011

Step 4: Encode each string as the number of the previously seen string it extends (its prefix, numbered in Step 3; 0000 if the prefix is empty), followed by its last bit.

00000,00010,00100,00011,00001,01010,00101,

00000000100010000011000010101000101...
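Steps 2–4 can be sketched in a few lines of Python. This is an illustrative reconstruction of the procedure on the slides, not course code; the function names `lz_parse`/`lz_encode` and the fixed 4-bit index width are my own choices.

```python
def lz_parse(bits):
    """Step 2: split a bit string into phrases never seen before.

    Each phrase is the shortest prefix of the remaining input that has
    not already been produced (the final phrase may repeat an old one).
    """
    phrases, seen, current = [], set(), ""
    for b in bits:
        current += b
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:                      # leftover bits at the end
        phrases.append(current)
    return phrases

def lz_encode(phrases, width=4):
    """Steps 3-4: each phrase -> (index of its prefix, last bit).

    Index 0 means "empty prefix"; indices are written with `width` bits.
    """
    index = {}
    out = []
    for i, ph in enumerate(phrases, start=1):
        prefix, last = ph[:-1], ph[-1]
        out.append(format(index.get(prefix, 0), f"0{width}b") + last)
        index.setdefault(ph, i)
    return out

phrases = lz_parse("00000001110001000111000000")
print(phrases)            # ['0', '00', '000', '01', '1', '10', '001', '0001', '11', '0000', '00']
print(lz_encode(phrases)) # starts 00000, 00010, 00100, 00011, 00001, ... as on the slide
```

A practical implementation would grow the index width as the dictionary grows rather than fixing it in advance; the fixed width here simply mirrors Step 3.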

Page 14:

Lempel-Ziv Coding (zip codes, etc)

Step 5: Store this string plus length of labels in bits.

00000,00010,00100,

1111000000001000100....

This may be inefficient for small examples, but for very long inputs it achieves the entropy for the best model.

LZW = Lempel-Ziv-Welch, GIF image interchange standard

Page 15:

Example

What is the Lempel-Ziv encoding of 00000...0000 (N 0’s)?

What is the entropy of the source?

How many bits per symbol will be used in the encodeddata as N goes to infinity?

Let’s work out the details.

How about 010010001000010000010000001.... ?
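A quick way to work the first question out numerically (reusing the illustrative `lz_parse` sketch from the earlier block, not course code): for a run of N zeros the phrases are 0, 00, 000, ..., so there are only about sqrt(2N) of them, and the encoded bits per symbol fall toward the source entropy of 0 as N grows.

```python
from math import ceil, log2

for N in (100, 10_000, 100_000):
    phrases = lz_parse("0" * N)                    # phrases 0, 00, 000, ...
    width = max(1, ceil(log2(len(phrases) + 1)))   # bits needed per phrase index
    encoded_bits = len(phrases) * (width + 1)      # index + one literal bit per phrase
    print(N, len(phrases), encoded_bits / N)       # bits per symbol shrinks with N
```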

Page 16:

Properties of Lempel-Ziv

For most sources (alphabets + probabilities), the Lempel-Ziv algorithm will result in

average number of bits per symbol → entropy of the source (any-order model)

if the string/data to be compressed is long enough.

How about compressing the compressed string? That is, applying Lempel-Ziv again and again?

Answer: The compressed bit string will look completely random: 0 or 1 with probability 1/2. Entropy = 1 means 1 bit per symbol on average. No improvement is possible.

Page 17:

Analog vs Digital

• Most “real world” phenomena are continuous:

• images
• vision
• sound
• touch

• To transmit them, we must convert continuous signals into digital signals.

• Important note: there is a fundamental shift from continuous to digital representation of the real world.

Page 18:

The Fundamental Shift

The telephone system is designed to carry analog voice signals using circuit switching. The whole infrastructure is based on that.

When a modem connects your computer to the network over a telephone line, the modem must disguise the computer data as a speech/voice signal.

The infrastructure of the internet is totally digital. Sending voice over the internet requires disguising voice as digital data!!!

This is a fundamental turnaround.... the same will hold for TV, cable TV, audio, etc.

Page 19:

Analog to Digital Conversion

[Figure: an analog waveform sampled at times d, 2d, 3d, ..., 12d; each dot marks a sample, quantized to one of eight 3-bit levels labelled 000 through 111.]

Page 20:

Analog to Digital Conversion

[Same figure as above: samples of the waveform at times d, 2d, ..., 12d, quantized to 3-bit levels 000 through 111.]

d is the “sampling interval”, 1/d is the sampling “rate”

Page 21:

Sampling and quantization

In this example, we are using 8 quantization levels, which requires 3 bits per sample. Using 8 bits per sample would lead to 256 quantization levels, etc.

If the sampling interval is 1/1,000,000 second (a microsecond), the sampling rate is 1,000,000 samples per second or 1 megaHertz.

Hertz means “number per second” so 20,000 Hertz means 20,000 per second.

So sampling at 20 kiloHertz means “20,000 samples per second”
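As a small illustration of these two knobs (sampling interval d and bits per sample), here is a Python sketch; the `quantize` helper, the amplitude range, and the 1 Hz test signal are my own assumptions, not from the slides:

```python
from math import sin, pi

def quantize(x, bits=3, lo=-1.0, hi=1.0):
    """Map an analog value in [lo, hi] to one of 2**bits levels, as a bit string."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    k = min(int((x - lo) / step), levels - 1)   # clamp the top edge into the last level
    return format(k, f"0{bits}b")               # '000' ... '111' when bits=3

rate = 12                                  # samples per second = 1/d
d = 1 / rate                               # the sampling interval
signal = lambda t: sin(2 * pi * t)         # an assumed 1 Hz sine wave
samples = [quantize(signal(n * d)) for n in range(12)]
print(samples)                             # twelve 3-bit samples: 3 x 12 = 36 bits per second
```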

Page 22:

Analog frequencies

All real world signals can be represented as a sum or superposition of sine waves with different frequencies - the Fourier representation theorem.

The frequency of a sine wave is the number of timesit oscillates in a second.

A sine wave with frequency 20 will complete a cycle or period once every 1/20th of a second, so 20 times a second, etc.

We say that a sine wave with frequency 20 is a 20 Hertz signal..... it oscillates 20 times a second.

Page 23:

Fourier Java Applet

http://www.falstad.com/fourier/

Page 24:

Nyquist Sampling Theorem

• If an analog signal is “bandlimited” (i.e., consists of frequencies in a finite range [0, F]), then sampling must be at or above twice the highest frequency, 2F, to reconstruct the signal perfectly.

• Does not take quantization into account.

• Sampling at lower than the Nyquist rate will lead to “aliasing”.
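A numeric illustration of the aliasing point (my own sketch, not from the slides): a 7 Hz cosine sampled at only 10 Hz, below its Nyquist rate of 14 Hz, produces exactly the same samples as a 3 Hz cosine.

```python
from math import cos, pi

fs = 10                       # sampling rate in Hz (samples per second)
t = [n / fs for n in range(10)]

# A 7 Hz cosine sampled at 10 Hz: 7 > fs/2 = 5, so the Nyquist condition fails.
samples_7hz = [cos(2 * pi * 7 * tn) for tn in t]
# A 3 Hz cosine (= |7 - 10| Hz) sampled the same way.
samples_3hz = [cos(2 * pi * 3 * tn) for tn in t]

# The two sample sequences agree, so the 7 Hz signal "aliases" to 3 Hz:
# from the samples alone, the two signals cannot be told apart.
print(all(abs(a - b) < 1e-9 for a, b in zip(samples_7hz, samples_3hz)))  # True
```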

Page 25:

Sampling for Digital Voice

• High-quality human voice occupies frequencies up to about 4000 Hz

• Sampling rate is 8000 Hz

• 8-bit quantization means 8000 samples/second x 8 bits/sample = 64,000 bits per second

• Phone system built around such a specification

• Computer communication over voice telephone lines is limited to about 56 kbps

Page 26:

Implications for Digital Audio

• Human ear can hear up to 20 kHz

• Sampling at twice that rate means 40 kHz

• Quantization at 8 bits (256 levels)

• 40,000 samples/second x 8 bits/sample translates to 320,000 bits per second or 40,000 bytes per second.

• 60 seconds of music: 2,400,000 Bytes

• 80 minutes: about 190 Mbytes

• Audio CD??
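The arithmetic on this slide, spelled out as a short Python sketch; the comparison line at the end uses the standard audio-CD parameters (44,100 samples/s, 16 bits, 2 channels), which are not on the slide:

```python
# The slide's numbers: 20 kHz hearing -> 40 kHz sampling, 8-bit mono samples.
rate = 40_000                               # samples per second
bits_per_sample = 8
bits_per_second = rate * bits_per_sample    # 320,000 bits per second
bytes_per_second = bits_per_second // 8     # 40,000 bytes per second
print(bytes_per_second * 60)                # 60 s of music: 2,400,000 bytes
print(bytes_per_second * 60 * 80 / 1e6)     # 80 minutes: ~192 MB ("about 190 Mbytes")

# For comparison, audio CDs use 44,100 samples/s, 16-bit samples, 2 channels:
print(44_100 * 16 * 2)                      # 1,411,200 bits per second
```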

Page 27:

Some Digital Audio Links
• http://www.musiq.com/recording/mp3/index.html
• http://www.musiq.com/recording/digaudio/bitrates.html

Aliasing in Images
• http://www.telacommunications.com/nutshell/pixelation.htm#enlargement

Other
• http://www.physics.nyu.edu/faculty/sokal/#papers