TRANSCRIPT
ENGS 4 - Lecture 11: Technology of Cyberspace
Winter 2004, Thayer School of Engineering
Dartmouth College
Instructor: George Cybenko, x6-3843
Assistant: Sharon Cooper (“Shay”), x6-3546
Course webpage: http://thayer.dartmouth.edu/~engs004/
Today’s Class
• Sharon’s mini-lecture
• Shannon theory and coding theory basics
• Break
• Leo’s mini-lecture
• Ryan’s mini-lecture
Future mini-lectures
• Feb 19 – Sarah (social inequality/digital divide), Ryan (internet dating), Noah R. (nanotechnology), Scott (digital watermarking)
• Feb 24 – Dason (pornography), En Young (persistence), Rob (GPS), Simon (online games)
• Topics – persistence, piracy, IP telephony, blogs, IM, global adoption and trends, GPS, online games
Constructing Prefix Codes: Huffman coding
[Figure: Huffman tree for a fair die. Six symbols 1-6, each with probability 1/6, are merged pairwise into subtrees of weight 1/3, 1/3, 1/3, then 2/3, then 1, giving the codewords 000, 001, 010, 011, 10, 11.]
Average length = 3 × 4/6 + 2 × 2/6 = 2 2/3 < 3 bits per symbol.
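The merge procedure is mechanical enough to automate. Below is a minimal Python sketch (the function name huffman_code and its dictionary-based bookkeeping are illustrative, not from the lecture), applied to the die example above:

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}.

    Repeatedly merge the two least probable subtrees; a symbol's
    codeword is the sequence of branch labels from root to leaf.
    """
    # Heap entries: (probability, unique tiebreaker, {symbol: partial code}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

die = {s: 1 / 6 for s in "123456"}
code = huffman_code(die)
avg = sum(die[s] * len(c) for s, c in code.items())
print(code)   # a prefix code: two symbols get 2 bits, four get 3
print(avg)    # 8/3 ≈ 2.67 bits/symbol, beating 3
```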
Another example
• What about two symbols, a,b with P(a)=0.9 and P(b)=0.1?
• The way to build efficient codes for such cases is to use “block codes”
• That is, consider pairs of symbols: aa, ab, ba, bb; or triples: aaa, aab, aba, abb, baa, bab, bba, bbb; etc.
• Let’s do an example on the board.
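One version of that board example, worked in Python: for pairs, the block probabilities are 0.81, 0.09, 0.09, 0.01, and a Huffman code built by hand for those four blocks already beats one bit per symbol (the code table below is one valid assignment, not the only one):

```python
from itertools import product
from math import log2

p = {"a": 0.9, "b": 0.1}

# Probabilities of all two-symbol blocks (independent source assumed).
pairs = {x + y: p[x] * p[y] for x, y in product(p, repeat=2)}

# A Huffman code for these four probabilities, built by hand:
# merge ba+bb (0.10), then +ab (0.19), then +aa (1.0).
code = {"aa": "0", "ab": "10", "ba": "110", "bb": "111"}

bits_per_pair = sum(pairs[s] * len(c) for s, c in code.items())
print(bits_per_pair / 2)   # ≈ 0.645 bits/symbol, vs 1 bit coding a,b directly
print(-sum(q * log2(q) for q in p.values()))   # entropy ≈ 0.469, the floor
```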
How about the other example?
[Figure: a two-leaf code tree assigning S1 = 0 and S2 = 1.]
Average number of bits per symbol = 1.
Is this the best possible? No.
Entropy of a source
SOURCE with alphabet = { s1 , s2 , s3 , ..., sn }, Prob(sk) = pk
Entropy = H = − Σk pk log2(pk)
Shannon’s Source Coding Theorem:
H ≤ average number of bits per symbol used by any decodable code.
Examples
H(die example) = −6 × 1/6 × log2(1/6) = log2 6 ≈ 2.58 bits
H(sunrise) = −(0 log2 0 + 1 log2 1) = 0, with the convention 0 log 0 = 0
How can we achieve the rate determined by the entropy bound?
Block codes can do better than pure Huffman coding.
“Universal codes”, e.g., Lempel-Ziv.
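A quick sketch to check these numbers (the entropy helper is illustrative; note the 0 log 0 = 0 convention):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; 0·log(0) is taken to be 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))   # die: log2(6) ≈ 2.585 bits/symbol
print(entropy([0.0, 1.0]))  # sunrise: 0 bits -- the outcome is certain
print(entropy([0.9, 0.1]))  # the a/b source: ≈ 0.469 bits/symbol
```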
Break
Models of a source
We can easily measure the probability of each symbol in English. How?
How about the probabilities of each pair of symbols?
Triples?
Sentences?
True model of English = model of language = model of thought
This is very hard.
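For the single-symbol and pair models, the measurement is just counting. A minimal sketch (the sample string stands in for a real corpus, which you would load from a file):

```python
from collections import Counter

def frequencies(text, n=1):
    """Empirical probabilities of length-n blocks of characters."""
    blocks = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(blocks)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

sample = "the quick brown fox jumps over the lazy dog " * 100
singles = frequencies(sample, n=1)   # single-symbol model
pairs = frequencies(sample, n=2)     # pair (bigram) model
print(sorted(singles.items(), key=lambda kv: -kv[1])[:5])
```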
Lempel-Ziv Coding (zip codes, etc)
Want to encode a string: aaabcababcaaa into 0’s and 1’s.
Step 1: Convert to 0’s and 1’s by any prefix substitution.
a = 00, b = 01, c = 11
00000001110001000111000000
Lempel-Ziv Coding (zip codes, etc)
Step 2: Parse the string into “never before seen” strings.
00000001110001000111000000
0,00,000,01,1,10,001,0001,11,0000,00
Lempel-Ziv Coding (zip codes, etc)
Step 3: Assign binary numbers to each.
0,00,000,01,1,10,001,0001,11,0000,00
0001,0010,0011,0100,0101,0110,0111,1000,1001,1010,1011
Step 4: To each string, assign the number of its longest proper prefix (a previously seen substring) plus its last bit.
00000,00010,00100,00011,00001,01010,00101,
00000000100010000011000010101000101...
Lempel-Ziv Coding (zip codes, etc)
Step 5: Store this string plus length of labels in bits.
00000,00010,00100,
1111000000001000100....
This may be inefficient for small examples, but for very long inputs it achieves the entropy of the best model.
LZW = Lempel-Ziv-Welch, the basis of the GIF image interchange standard.
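The parsing and numbering in Steps 2-4 can be written compactly. A minimal sketch, using a dictionary from phrases to their numbers (details such as the handling of a leftover final phrase vary between presentations):

```python
def lz_encode(bits):
    """Parse a 0/1 string into never-before-seen phrases (Step 2),
    then emit (number of longest proper prefix phrase, last bit)
    pairs (Steps 3-4), each number in a fixed-width binary label."""
    phrases = {"": 0}           # phrase -> its number; "" is phrase 0
    pairs = []
    cur = ""
    for b in bits:
        cur += b
        if cur not in phrases:  # first time we've seen this string
            phrases[cur] = len(phrases)
            pairs.append((phrases[cur[:-1]], b))
            cur = ""
    if cur:                     # leftover suffix, seen before
        pairs.append((phrases[cur], ""))
    width = (len(phrases) - 1).bit_length()   # label width (Step 5)
    return "".join(format(i, f"0{width}b") + b for i, b in pairs)

s = "00000001110001000111000000"   # the string from Step 1
print(lz_encode(s))   # begins 00000 00010 00100 ... as in Step 4
```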
Example
What is the Lempel-Ziv encoding of 00000...0000 (N 0’s)?
What is the entropy of the source?
How many bits per symbol will be used in the encoded data as N goes to infinity?
Let’s work out the details.
How about 010010001000010000010000001.... ?
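For the first question: parsing N zeros gives the phrases 0, 00, 000, ..., so k phrases cover 1 + 2 + ... + k = k(k+1)/2 ≈ N symbols, i.e. about √(2N) phrases, each costing roughly log2(k) + 1 bits. The source that always emits 0 has entropy 0, and the sketch below (a rough count, ignoring the stored label width) shows bits per symbol shrinking toward that bound:

```python
for N in (100, 10_000, 1_000_000):
    # Parsing 0^N gives phrases 0, 00, 000, ...: k phrases where
    # k(k+1)/2 <= N, so k ≈ sqrt(2N).
    k, total = 0, 0
    while total + k + 1 <= N:
        k += 1
        total += k
    bits = k * (k.bit_length() + 1)   # k labels, each an index plus one bit
    print(N, k, bits / N)             # bits/symbol shrinks toward 0 = entropy
```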
Properties of Lempel-Ziv
For most sources (alphabets + probabilities), the Lempel-Ziv algorithm will result in
average number of bits per symbol → entropy of the source (any-order model)
if the string/data to be compressed is long enough.
How about compressing the compressed string? That is, applying Lempel-Ziv again and again?
Answer: The compressed bit string will look completely random: 0 or 1 with probability 1/2. Entropy = 1 means 1 bit per symbol on average. No improvement is possible.
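This is easy to observe with any Lempel-Ziv-family compressor, e.g. Python’s zlib (DEFLATE). A quick demonstration; exact sizes will vary:

```python
import zlib

data = (b"abab" * 5000) + (b"aaab" * 5000)   # highly redundant input
once = zlib.compress(data)
twice = zlib.compress(once)
print(len(data), len(once), len(twice))
# Typical result: the first pass shrinks the data dramatically; the
# second pass makes it slightly LARGER -- compressed output looks random.
```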
Analog vs Digital
• Most “real world” phenomena are continuous:
• images
• vision
• sound
• touch
• To transmit them, we must convert continuous signals into digital signals.
• Important note: there is a fundamental shift from continuous to digital representation of the real world.
The Fundamental Shift
The telephone system is designed to carry analog voice signals using circuit switching. The whole infrastructure is based on that.
When a modem connects your computer to the network over a telephone line, the modem must disguise the computer data as a speech/voice signal.
The infrastructure of the internet is totally digital. Sending voice over the internet requires disguising voice as digital data!!!
This is a fundamental turnaround.... the same will hold for TV, cable TV, audio, etc.
Analog to Digital Conversion
[Figure: an analog waveform sampled at times d, 2d, 3d, ..., 12d; dots mark the samples, and the vertical axis shows eight quantization levels labeled 000 through 111.]
d is the “sampling interval”; 1/d is the sampling “rate”.
Sampling and quantization
In this example, we are using 8 quantization levels, which requires 3 bits per sample. Using 8 bits per sample would lead to 256 quantization levels, etc.
If the sampling interval is 1/1000000 second (a microsecond), the sampling rate is 1000000 samples per second, or 1 megahertz.
Hertz means “number per second”, so 20,000 Hertz means 20,000 per second.
So sampling at 20 kilohertz means “20,000 samples per second”.
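A minimal sketch of sampling plus quantization (the 440 Hz test tone and the 8000 Hz rate are illustrative choices, not from the slide):

```python
from math import sin, pi, floor

rate = 8000     # samples per second (1/d)
levels = 8      # 8 levels -> 3 bits per sample
f = 440         # a 440 Hz test tone, as an example

def sample_and_quantize(duration=0.001):
    out = []
    for n in range(int(duration * rate)):
        x = sin(2 * pi * f * n / rate)                    # analog value in [-1, 1]
        q = min(levels - 1, floor((x + 1) / 2 * levels))  # nearest level, 0..7
        out.append(q)                                     # store as a 3-bit integer
    return out

print(sample_and_quantize())   # e.g. [4, 5, 6, 7, ...]
```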
Analog frequencies
All real-world signals can be represented as a sum or superposition of sine waves with different frequencies - the Fourier representation theorem.
The frequency of a sine wave is the number of times it oscillates in a second.
A sine wave with frequency 20 will complete a cycle or period once every 1/20th of a second, so 20 times a second, etc.
We say that a sine wave with frequency 20 is a 20 Hertz signal..... it oscillates 20 times a second.
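The superposition idea can be seen numerically: summing odd harmonics with amplitudes 1, 1/3, 1/5, ... approximates a square wave (the standard textbook series, shown here as a sketch):

```python
from math import sin, pi

def square_approx(t, terms=5):
    """Partial Fourier sum for a 1 Hz square wave: odd harmonics
    1, 3, 5, ... with amplitudes 1, 1/3, 1/5, ..."""
    return (4 / pi) * sum(sin(2 * pi * (2 * k + 1) * t) / (2 * k + 1)
                          for k in range(terms))

for t in (0.05, 0.25, 0.45):
    print(t, round(square_approx(t), 3))  # close to +1 on (0, 0.5), with ripple
```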
Fourier Java Applet
http://www.falstad.com/fourier/
Nyquist Sampling Theorem
• If an analog signal is “bandlimited” (i.e., consists of frequencies in a finite range [0, F]), then sampling must be at or above twice the highest frequency to reconstruct the signal perfectly.
• Does not take quantization into account.
• Sampling at lower than the Nyquist rate will lead to “aliasing”.
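A numeric illustration of aliasing (the frequencies are chosen for the example): sampled at 1000 Hz, a 900 Hz sine produces exactly the same samples, up to sign, as a 100 Hz sine.

```python
from math import sin, pi

rate = 1000      # sampling rate in Hz
f_high = 900     # above the Nyquist limit of rate/2 = 500 Hz
f_alias = 100    # 900 Hz aliases to |900 - 1000| = 100 Hz

for n in range(5):
    t = n / rate
    print(round(sin(2 * pi * f_high * t), 4),
          round(sin(2 * pi * f_alias * t), 4))
# The two columns match up to sign -- the 900 Hz tone is
# indistinguishable from 100 Hz at this sampling rate.
```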
Sampling for Digital Voice
• High-quality human voice contains frequencies up to 4000 Hz
• Sampling rate is 8000 Hz (twice the highest frequency)
• 8-bit quantization means 8000 × 8 = 64,000 bits per second
• Phone system built around such a specification
• Computer communications over voice telephone lines is limited to about 56 kbps
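The arithmetic behind the 64 kbps figure, as a tiny sketch:

```python
voice_band = 4000          # Hz, highest voice frequency of interest
rate = 2 * voice_band      # Nyquist: 8000 samples per second
bits_per_sample = 8        # 256 quantization levels
print(rate * bits_per_sample)   # 64000 bits/second -- the classic
                                # 64 kbps digital telephone channel
```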
Implications for Digital Audio
• Human ear can hear up to 20 kHz
• Sampling at twice that rate means 40 kHz
• Quantization at 8 bits (256 levels)
• 40,000 samples/second × 8 bits/sample translates to 320,000 bits per second, or 40,000 bytes per second.
• 60 seconds of music: 2,400,000 Bytes
• 80 minutes: about 190 Mbytes
• Audio CD??
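Working the numbers (the CD parameters 44,100 Hz, 16 bits, stereo are the actual audio CD format; everything else follows the slide):

```python
rate = 40_000            # samples/second: twice the 20 kHz hearing limit
for bits in (8, 16):     # the slide's 8-bit case, and CD-style 16-bit
    bps = rate * bits
    print(bits, bps, bps // 8 * 60, "bytes per minute")

# Actual audio CDs sample at 44,100 Hz with 16-bit samples in stereo:
cd = 44_100 * 16 * 2
print(cd)                          # 1,411,200 bits/second
print(cd // 8 * 80 * 60 / 1e6)     # ≈ 847 MB of raw audio for 80 minutes
```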
Some Digital Audio Links
• http://www.musiq.com/recording/mp3/index.html
• http://www.musiq.com/recording/digaudio/bitrates.html
Aliasing in Images
• http://www.telacommunications.com/nutshell/pixelation.htm#enlargement
Other
• http://www.physics.nyu.edu/faculty/sokal/#papers