python lecture 06

Python & Perl Lecture 06 Department of Computer Science Utah State University

Upload: tanwir-zaman

Post on 25-May-2015


Page 1: Python lecture 06

Python & Perl

Lecture 06

Department of Computer Science, Utah State University

Page 2: Python lecture 06

Outline

● Data Abstraction: Building Huffman Trees with Lists and Tuples

● List Comprehension

Page 3: Python lecture 06

Data Abstraction

Building Huffman Trees with Lists and Tuples

Page 4: Python lecture 06

Background

● In information theory, coding refers to methods that represent data in terms of bit sequences (sequences of 0's and 1's)

● Encoding is a method of taking data structures and mapping them to bit sequences

● Decoding is a method of taking bit sequences and outputting the corresponding data structure

Page 5: Python lecture 06

Example: Standard ASCII & Unicode

● Standard ASCII encodes each character as a 7-bit sequence

● Using 7 bits allows us to encode 2^7 = 128 possible characters

● Unicode has three encoding forms: UTF-8 (uses 8-bit code units), UTF-16 (uses 16-bit code units), and UTF-32 (uses 32-bit code units); in UTF-8 and UTF-16 a character may take more than one code unit

● UTF stands for Unicode Transformation Format

● Python 2.X's Unicode support: “Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled.”

Page 6: Python lecture 06

Two Types of Codes

● There are two types of codes: fixed-length and variable-length

● Fixed-length codes (e.g., ASCII, UTF-32) encode every character in terms of the same number of bits

● Variable-length codes (e.g., Morse, Huffman) encode characters in terms of variable numbers of bits: more frequent symbols are encoded with fewer bits

Page 7: Python lecture 06

Example: Fixed-Length Code

● A – 000   C – 010   E – 100   G – 110

● B – 001   D – 011   F – 101   H – 111

● AADF = 000000011101

● The encoding of AADF is 12 bits

Page 8: Python lecture 06

Example: Variable-Length Code

● A – 0     C – 1010   E – 1100   G – 1110

● B – 100   D – 1011   F – 1101   H – 1111

● AADF = 0010111101

● The encoding of AADF is 10 bits
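The two examples can be checked with a short sketch. The code tables are copied from the slides; the helper name encode_with_table is an assumption made for this illustration and is not part of the lecture code.

```python
# Code tables copied from the fixed-length and variable-length example slides.
FIXED = {'A': '000', 'B': '001', 'C': '010', 'D': '011',
         'E': '100', 'F': '101', 'G': '110', 'H': '111'}
VARIABLE = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011',
            'E': '1100', 'F': '1101', 'G': '1110', 'H': '1111'}

def encode_with_table(message, table):
    """Concatenate the bit string of every character in message.

    Hypothetical helper: a plain table lookup, not a Huffman tree walk.
    """
    return ''.join(table[ch] for ch in message)

fixed_bits = encode_with_table('AADF', FIXED)
variable_bits = encode_with_table('AADF', VARIABLE)
print('%s (%d bits)' % (fixed_bits, len(fixed_bits)))
print('%s (%d bits)' % (variable_bits, len(variable_bits)))
```

The variable-length table wins here only because A, the most frequent symbol, has a 1-bit code; a message made mostly of E to H would come out longer than with the fixed-length table.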

Page 9: Python lecture 06

End of Character in Variable-Length Code

● One of the challenges in variable-length codes is knowing where one character ends and the next one begins

● Morse uses a special character (separator code)

● Prefix coding is another solution: no character's code is a prefix of another character's code, so the end of each character can be recognized without a separator
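The prefix property can be tested directly. The sketch below assumes the variable-length table from the earlier example slide; is_prefix_free and decode_with_table are hypothetical helpers written for this illustration.

```python
VARIABLE = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011',
            'E': '1100', 'F': '1101', 'G': '1110', 'H': '1111'}

def is_prefix_free(table):
    """Return True if no code in table is a prefix of a different code."""
    codes = list(table.values())
    for i, first in enumerate(codes):
        for j, second in enumerate(codes):
            if i != j and second.startswith(first):
                return False
    return True

def decode_with_table(bits, table):
    """Decode a bit string without separators.

    Because the code is prefix-free, a symbol can be emitted as soon as
    the pending bits match a complete code.
    """
    reverse = dict((code, symbol) for symbol, code in table.items())
    decoded = []
    pending = ''
    for bit in bits:
        pending += bit
        if pending in reverse:
            decoded.append(reverse[pending])
            pending = ''
    if pending:
        raise ValueError('leftover bits: ' + pending)
    return ''.join(decoded)

print(is_prefix_free(VARIABLE))
print(decode_with_table('0010111101', VARIABLE))
```

A table such as {'A': '0', 'B': '01'} fails the check: after reading 0 the decoder cannot tell whether A has ended or B has just started.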

Page 10: Python lecture 06

Huffman Code

● Huffman code is a variable-length code that takes advantage of relative frequencies of characters

● Huffman code is named after David Huffman, the researcher who discovered it

● Huffman code is represented as a binary tree where leaves are individual characters and their frequencies

● Each non-leaf node is a set of characters in all of its sub-nodes and the sum of their relative frequencies

Page 11: Python lecture 06

Huffman Tree Example

(0) marks a left branch, (1) marks a right branch

{A, B, C, D, E, F, G, H}: 17
    (0) A: 8
    (1) {B, C, D, E, F, G, H}: 9
        (0) {B, C, D}: 5
            (0) B: 3
            (1) {C, D}: 2
                (0) C: 1
                (1) D: 1
        (1) {E, F, G, H}: 4
            (0) {E, F}: 2
                (0) E: 1
                (1) F: 1
            (1) {G, H}: 2
                (0) G: 1
                (1) H: 1

Page 12: Python lecture 06

Using Huffman Tree to Encode/Decode Characters

● For the tree on the previous slide, these are the encodings:

A is encoded as 0

B is encoded as 100

C is encoded as 1010

D is encoded as 1011

E is encoded as 1100

F is encoded as 1101

G is encoded as 1110

H is encoded as 1111

Page 13: Python lecture 06

Building The Huffman Tree

Page 14: Python lecture 06

Simple Huffman Tree

{A, B, D, C}: 8
    A: 4
    {B, D, C}: 4
        B: 2
        {D, C}: 2
            D: 1
            C: 1

Page 15: Python lecture 06

Constructing Leaves

### a leaf is a tuple whose first element is the symbol
### represented as a string and whose second element is
### the symbol's frequency
def make_leaf(symbol, freq):
    return (symbol, freq)

def is_leaf(x):
    return isinstance(x, tuple) and \
           len(x) == 2 and \
           isinstance(x[0], str) and \
           isinstance(x[1], int)

Page 16: Python lecture 06

Constructing Leaves

### return the character (symbol) of the leaf
def get_leaf_symbol(leaf):
    return leaf[0]

### return the frequency of the leaf's character
def get_leaf_freq(leaf):
    return leaf[1]

Page 17: Python lecture 06

Constructing Huffman Trees

### A non-leaf node (internal node) is represented as
### a list of four elements:
### 1. left branch
### 2. right branch
### 3. list of symbols
### 4. combined frequency of symbols
[left_branch, right_branch, symbols, frequency]

Page 18: Python lecture 06

Accessing Huffman Trees

def get_leaf_symbol(leaf):
    return leaf[0]

def get_leaf_freq(leaf):
    return leaf[1]

def get_left_branch(huff_tree):
    return huff_tree[0]

def get_right_branch(huff_tree):
    return huff_tree[1]

Page 19: Python lecture 06

Accessing Huffman Trees

def get_symbols(huff_tree):
    if is_leaf(huff_tree):
        return [get_leaf_symbol(huff_tree)]
    else:
        return huff_tree[2]

def get_freq(huff_tree):
    if is_leaf(huff_tree):
        return get_leaf_freq(huff_tree)
    else:
        return huff_tree[3]

Page 20: Python lecture 06

Constructing Huffman Trees

### A Huffman tree is constructed from its left branch, which can
### be a Huffman tree or a leaf, and its right branch, another
### Huffman tree or a leaf. The new tree has the symbols of the
### left branch followed by the symbols of the right branch, and
### the sum of the frequencies of the left branch and the right branch
def make_huffman_tree(left_branch, right_branch):
    return [left_branch,
            right_branch,
            get_symbols(left_branch) + get_symbols(right_branch),
            get_freq(left_branch) + get_freq(right_branch)]

Page 21: Python lecture 06

MAKE_HUFFMAN_TREE Example

{A, B, D, C}: 8
    A: 4
    {B, D, C}: 4
        B: 2
        {D, C}: 2
            D: 1
            C: 1

ht01 = make_huffman_tree(make_leaf('A', 4),
                         make_huffman_tree(make_leaf('B', 2),
                                           make_huffman_tree(make_leaf('D', 1),
                                                             make_leaf('C', 1))))

Page 22: Python lecture 06

MAKE_HUFFMAN_TREE Example

{A, B, D, C}: 8
    A: 4
    {B, D, C}: 4
        B: 2
        {D, C}: 2
            D: 1
            C: 1

Python data structure that represents the Huffman tree above:

[('A', 4),

[('B', 2), [('D', 1), ('C', 1), ['D', 'C'], 2], ['B', 'D', 'C'], 4],

['A', 'B', 'D', 'C'],

8]
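As a check, the constructors and accessors from the preceding slides can be put in one file and compared against the literal structure above. This is a sketch that restates the slide functions; the name expected is introduced only for the comparison.

```python
def make_leaf(symbol, freq):
    return (symbol, freq)

def is_leaf(x):
    return (isinstance(x, tuple) and len(x) == 2 and
            isinstance(x[0], str) and isinstance(x[1], int))

def get_symbols(huff_tree):
    # A leaf contributes a one-element symbol list.
    if is_leaf(huff_tree):
        return [huff_tree[0]]
    return huff_tree[2]

def get_freq(huff_tree):
    if is_leaf(huff_tree):
        return huff_tree[1]
    return huff_tree[3]

def make_huffman_tree(left_branch, right_branch):
    return [left_branch,
            right_branch,
            get_symbols(left_branch) + get_symbols(right_branch),
            get_freq(left_branch) + get_freq(right_branch)]

ht01 = make_huffman_tree(
    make_leaf('A', 4),
    make_huffman_tree(make_leaf('B', 2),
                      make_huffman_tree(make_leaf('D', 1),
                                        make_leaf('C', 1))))

# Literal structure copied from the slide.
expected = [('A', 4),
            [('B', 2), [('D', 1), ('C', 1), ['D', 'C'], 2], ['B', 'D', 'C'], 4],
            ['A', 'B', 'D', 'C'],
            8]
print(ht01 == expected)
```

Leaves are tuples and internal nodes are lists, so is_leaf can tell the two apart by type alone.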

Page 23: Python lecture 06

Customizing sort()

### comparator for sorting leaves by frequency
### (Python 2.X: uses cmp and the comparator argument of sort())
def leaf_freq_comp(leaf1, leaf2):
    return cmp(get_leaf_freq(leaf1),
               get_leaf_freq(leaf2))

huff_leaves = [make_leaf('A', 8), make_leaf('C', 1), make_leaf('B', 3),
               make_leaf('D', 1), make_leaf('F', 1), make_leaf('E', 1),
               make_leaf('H', 1), make_leaf('G', 1)]

print huff_leaves

huff_leaves.sort(leaf_freq_comp)

print huff_leaves

OUTPUT:

[('A', 8), ('C', 1), ('B', 3), ('D', 1), ('F', 1), ('E', 1), ('H', 1), ('G', 1)]

[('C', 1), ('D', 1), ('F', 1), ('E', 1), ('H', 1), ('G', 1), ('B', 3), ('A', 8)]

Page 24: Python lecture 06

Customizing sort()

### comparator for sorting leaves by symbol
def leaf_symbol_comp(leaf1, leaf2):
    return cmp(get_leaf_symbol(leaf1),
               get_leaf_symbol(leaf2))

huff_leaves2 = [make_leaf('A', 8), make_leaf('C', 1), make_leaf('B', 3),
                make_leaf('D', 1), make_leaf('F', 1), make_leaf('E', 1),
                make_leaf('H', 1), make_leaf('G', 1)]

print huff_leaves2

huff_leaves2.sort(leaf_symbol_comp)

print huff_leaves2

OUTPUT:

[('A', 8), ('C', 1), ('B', 3), ('D', 1), ('F', 1), ('E', 1), ('H', 1), ('G', 1)]

[('A', 8), ('B', 3), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1), ('H', 1)]
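The comparator style above relies on cmp and on passing a comparison function to sort(), both of which were removed in Python 3. A sketch of the same two sorts with key functions, which works in Python 2.4+ and Python 3, is shown below; by_freq and by_symbol are names chosen for this illustration.

```python
def get_leaf_symbol(leaf):
    return leaf[0]

def get_leaf_freq(leaf):
    return leaf[1]

huff_leaves = [('A', 8), ('C', 1), ('B', 3), ('D', 1),
               ('F', 1), ('E', 1), ('H', 1), ('G', 1)]

# key= receives one element and returns the value to sort by.
# sorted() is stable, so leaves with equal frequency keep their
# original relative order, as in the comparator version.
by_freq = sorted(huff_leaves, key=get_leaf_freq)
by_symbol = sorted(huff_leaves, key=get_leaf_symbol)

print(by_freq)
print(by_symbol)
```

When an existing comparator has to be reused, functools.cmp_to_key can wrap it into a key function.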

Page 25: Python lecture 06

Encoding & Decoding Messages with Huffman Trees

Page 26: Python lecture 06

Sample Huffman Tree

(0) marks a left branch, (1) marks a right branch

{A, B, C, D, E, F, G, H}: 17
    (0) A: 8
    (1) {B, C, D, E, F, G, H}: 9
        (0) {B, C, D}: 5
            (0) B: 3
            (1) {C, D}: 2
                (0) C: 1
                (1) D: 1
        (1) {E, F, G, H}: 4
            (0) {E, F}: 2
                (0) E: 1
                (1) F: 1
            (1) {G, H}: 2
                (0) G: 1
                (1) H: 1

Page 27: Python lecture 06

Symbol Encoding

1. Given a symbol s and a Huffman tree ht, set current_node to the root node and encoding to an empty list (you can also check if s is in the root node's symbol list and, if not, signal error)

2. If current_node is a leaf, return encoding

3. Check if s is in current_node's left branch or right branch

4. If in the left, add 0 to encoding, set current_node to the root of the left branch, and go to step 2

5. If in the right, add 1 to encoding, set current_node to the root of the right branch, and go to step 2

6. If in neither branch, signal error

Page 28: Python lecture 06

Example

● Encode B with the sample Huffman tree

● Set current_node to the root node

● B is in current_node's right branch, so add 1 to encoding & recurse into the right branch (current_node is set to the root of the right branch, {B, C, D, E, F, G, H}: 9)

● B is in current_node's left branch, so add 0 to encoding & recurse into the left branch (current_node is {B, C, D}: 5)

● B is in current_node's left branch, so add 0 to encoding & recurse into the left branch (current_node is B: 3)

● current_node is a leaf, so return 100 (value of encoding)
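The symbol-encoding steps can be sketched as an iterative function. The lecture does not show its own encode_symbol, so the function below is one possible reading of the steps. It assumes the list/tuple tree representation from the earlier slides, uses a simplified is_leaf (any tuple is a leaf), and introduces build_sample_tree as a hypothetical helper that rebuilds the sample tree.

```python
def is_leaf(node):
    # Simplified check (assumption): leaves are tuples, internal nodes are lists.
    return isinstance(node, tuple)

def get_symbols(node):
    return [node[0]] if is_leaf(node) else node[2]

def get_freq(node):
    return node[1] if is_leaf(node) else node[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

def build_sample_tree():
    """Rebuild the 8-symbol sample tree (A: 8, B: 3, C to H: 1 each)."""
    bcd = make_huffman_tree(('B', 3), make_huffman_tree(('C', 1), ('D', 1)))
    efgh = make_huffman_tree(make_huffman_tree(('E', 1), ('F', 1)),
                             make_huffman_tree(('G', 1), ('H', 1)))
    return make_huffman_tree(('A', 8), make_huffman_tree(bcd, efgh))

def encode_symbol(symbol, huff_tree):
    """Return the list of bits (0 = left, 1 = right) leading to symbol."""
    if symbol not in get_symbols(huff_tree):
        raise ValueError('symbol not in tree: %r' % (symbol,))
    encoding = []
    current_node = huff_tree
    while not is_leaf(current_node):
        left, right = current_node[0], current_node[1]
        if symbol in get_symbols(left):
            encoding.append(0)
            current_node = left
        elif symbol in get_symbols(right):
            encoding.append(1)
            current_node = right
        else:
            raise ValueError('symbol in neither branch: %r' % (symbol,))
    return encoding

sample_tree = build_sample_tree()
print(encode_symbol('B', sample_tree))
```

Each step searches a branch's symbol list, so encoding one symbol costs time proportional to the number of symbols at each level it passes through.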

Page 29: Python lecture 06

Message Encoding

● Given a sequence of symbols message and a Huffman tree ht

● Concatenate the encoding of each symbol in message from left to right

● Return the concatenation of encodings

Page 30: Python lecture 06

Example

● Encode ABBA with the sample Huffman tree

● Encoding for A is 0

● Encoding for B is 100

● Encoding for B is 100

● Encoding for A is 0

● Concatenation of encodings is 01001000
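Message encoding is then a loop over the symbols. The sketch below is self-contained and therefore repeats the tree helpers; encode_symbol is written recursively here, and both it and encode_message are illustrative names rather than the lecture's own code. It assumes the root of the tree is an internal node.

```python
def is_leaf(node):
    # Simplified check (assumption): leaves are tuples, internal nodes are lists.
    return isinstance(node, tuple)

def get_symbols(node):
    return [node[0]] if is_leaf(node) else node[2]

def get_freq(node):
    return node[1] if is_leaf(node) else node[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

def build_sample_tree():
    """Rebuild the 8-symbol sample tree (hypothetical helper)."""
    bcd = make_huffman_tree(('B', 3), make_huffman_tree(('C', 1), ('D', 1)))
    efgh = make_huffman_tree(make_huffman_tree(('E', 1), ('F', 1)),
                             make_huffman_tree(('G', 1), ('H', 1)))
    return make_huffman_tree(('A', 8), make_huffman_tree(bcd, efgh))

def encode_symbol(symbol, node):
    """Recursive symbol encoding; returns a list of 0/1 bits."""
    if is_leaf(node):
        return []
    left, right = node[0], node[1]
    if symbol in get_symbols(left):
        return [0] + encode_symbol(symbol, left)
    if symbol in get_symbols(right):
        return [1] + encode_symbol(symbol, right)
    raise ValueError('symbol not in tree: %r' % (symbol,))

def encode_message(message, huff_tree):
    """Concatenate the encodings of the symbols from left to right."""
    encoding = []
    for symbol in message:
        encoding.extend(encode_symbol(symbol, huff_tree))
    return encoding

sample_tree = build_sample_tree()
bits = encode_message('ABBA', sample_tree)
print(''.join(str(b) for b in bits))
```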

Page 31: Python lecture 06

Message Decoding

1. Given a sequence of bits message and a Huffman tree ht, set current_node to the root and decoding to an empty list

2. If current_node is a leaf, add its symbol to decoding and set current_node to ht's root

3. If current_node is ht's root and message has no more bits, return decoding

4. If no more bits in message & current_node is not a leaf, signal error

5. If message's current bit is 0, set current_node to its left child, read the bit, & go to step 2

6. If message's current bit is 1, set current_node to its right child, read the bit, & go to step 2

Page 32: Python lecture 06

Example

● Decode 0100 with the sample Huffman tree

● Read 0, go left to A: 8, add A to decoding & reset current_node to the root

● Read 1, go right to {B, C, D, E, F, G, H}: 9

● Read 0, go left to {B, C, D}: 5

● Read 0, go left to B: 3

● Add B to decoding & reset current_node to the root

● No more bits & current_node is the root, so return AB
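The decoding steps can be sketched as a single pass over the bits. As before, the tree helpers are repeated so the block runs on its own; decode_message is an illustrative name, bits are assumed to be the integers 0 and 1, and the root is assumed to be an internal node.

```python
def is_leaf(node):
    # Simplified check (assumption): leaves are tuples, internal nodes are lists.
    return isinstance(node, tuple)

def get_symbols(node):
    return [node[0]] if is_leaf(node) else node[2]

def get_freq(node):
    return node[1] if is_leaf(node) else node[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

def build_sample_tree():
    """Rebuild the 8-symbol sample tree (hypothetical helper)."""
    bcd = make_huffman_tree(('B', 3), make_huffman_tree(('C', 1), ('D', 1)))
    efgh = make_huffman_tree(make_huffman_tree(('E', 1), ('F', 1)),
                             make_huffman_tree(('G', 1), ('H', 1)))
    return make_huffman_tree(('A', 8), make_huffman_tree(bcd, efgh))

def decode_message(bits, huff_tree):
    """Walk the tree bit by bit; emit a symbol at each leaf, then restart."""
    decoding = []
    current_node = huff_tree
    for bit in bits:
        if bit == 0:
            current_node = current_node[0]   # left child
        elif bit == 1:
            current_node = current_node[1]   # right child
        else:
            raise ValueError('not a bit: %r' % (bit,))
        if is_leaf(current_node):
            decoding.append(current_node[0])
            current_node = huff_tree
    if current_node is not huff_tree:
        # The bits ran out in the middle of a code.
        raise ValueError('message ends in the middle of a code')
    return decoding

sample_tree = build_sample_tree()
print(''.join(decode_message([0, 1, 0, 0], sample_tree)))
```

Decoding never searches symbol lists: each bit moves one level down the tree, so the cost is proportional to the number of bits.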

Page 33: Python lecture 06

List Comprehension

Page 34: Python lecture 06

List Comprehension

● List comprehension is a syntactic construct in some programming languages for building lists from list specifications

● List comprehension derives its conceptual roots from the set-former (set-builder) notation in mathematics

[Y for X in LIST]

● List comprehension is available in other programming languages such as Common Lisp, Haskell, and OCaml

Page 35: Python lecture 06

Set-Former Notation Example

$$\{\, 4x \mid x \in \mathbb{N},\ x^2 < 100 \,\}$$

$4x$ is the output function

$x$ is the variable

$\mathbb{N}$ is the input set

$x^2 < 100$ is the predicate

Page 36: Python lecture 06

Set-Former Notation Examples

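$\{\, x \mid x \in \{a, b\}^*,\ |x| \le 3 \,\}$ is the set of all strings over $\{a, b\}$ whose length is 0, 1, 2, or 3.

$\{\, a^n b^n \mid n \ge 1 \,\}$ is the set of non-empty strings over $\{a, b\}$ such that $a$'s precede $b$'s and the number of $a$'s is equal to the number of $b$'s.

$\{\, xy \mid x \in \{a, b\},\ y \in \{aa, cc\} \,\}$ is the set of strings where $x$ is $a$ or $b$ followed by $aa$ or $cc$.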

Page 37: Python lecture 06

For-Loop Implementation

### building the list of the set-former example with for-loop

>>> rslt = []
>>> for x in xrange(201):
...     if x ** 2 < 100:
...         rslt.append(4 * x)
...
>>> rslt
[0, 4, 8, 12, 16, 20, 24, 28, 32, 36]

Page 38: Python lecture 06

List Comprehension Equivalent

### building the same list with list comprehension

>>> s = [ 4 * x for x in xrange(201) if x ** 2 < 100]

>>> s

[0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
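The parts of the set-former notation map one-to-one onto the parts of a list comprehension. The sketch below makes that mapping explicit with a hypothetical helper, set_former, whose parameter names follow the labels on the set-former slide; it uses range so that it runs on both Python 2 and Python 3.

```python
def set_former(output, input_set, predicate):
    """Build [output(x) for every x in input_set that satisfies predicate].

    output    -- the output function (4x in the slide example)
    input_set -- the input set (a finite slice of N here)
    predicate -- the filter (x ** 2 < 100 in the slide example)
    """
    return [output(x) for x in input_set if predicate(x)]

multiples = set_former(lambda x: 4 * x,
                       range(201),
                       lambda x: x ** 2 < 100)
print(multiples)
```

Unlike the mathematical notation, the comprehension needs a finite input: range(201) stands in for the natural numbers, which is safe here because the predicate already fails for every x above 9.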

Page 39: Python lecture 06

For-Loop

### building list of squares of even numbers in [0, 10]
### with for-loop

>>> rslt = []
>>> for x in xrange(11):
...     if x % 2 == 0:
...         rslt.append(x**2)
...
>>> rslt
[0, 4, 16, 36, 64, 100]

Page 40: Python lecture 06

List Comprehension Equivalent

### building the same list with list comprehension

>>> [x ** 2 for x in xrange(11) if x % 2 == 0]

[0, 4, 16, 36, 64, 100]

Page 41: Python lecture 06

For-Loop

## building list of squares of odd numbers in [0, 10]

>>> rslt = []
>>> for x in xrange(11):
...     if x % 2 != 0:
...         rslt.append(x**2)
...
>>> rslt
[1, 9, 25, 49, 81]

Page 42: Python lecture 06

List Comprehension Equivalent

## building list of squares of odd numbers in [0, 10]

## with list comprehension

>>> [x ** 2 for x in xrange(11) if x % 2 != 0]

[1, 9, 25, 49, 81]

Page 43: Python lecture 06

List Comprehension with For-Loops

Page 44: Python lecture 06

For-Loop

>>> rslt = []
>>> for x in xrange(6):
...     if x % 2 == 0:
...         for y in xrange(6):
...             if y % 2 != 0:
...                 rslt.append((x, y))
...
>>> rslt
[(0, 1), (0, 3), (0, 5), (2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)]

Page 45: Python lecture 06

List Comprehension Equivalent

>>> [(x, y) for x in xrange(6) if x % 2 == 0 \

for y in xrange(6) if y % 2 != 0]

[(0, 1), (0, 3), (0, 5), (2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)]
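The for and if clauses of a comprehension are read left to right in the same order as the nested statements of the loop version. A small sketch of that correspondence follows; pairs_loop and pairs_comp are names chosen for this illustration, and range is used so the code runs on both Python 2 and Python 3.

```python
# Loop version: outer for, its filter, inner for, its filter.
pairs_loop = []
for x in range(6):
    if x % 2 == 0:
        for y in range(6):
            if y % 2 != 0:
                pairs_loop.append((x, y))

# Comprehension version: the clauses appear in the same order.
# Inside brackets a line break needs no backslash.
pairs_comp = [(x, y)
              for x in range(6) if x % 2 == 0
              for y in range(6) if y % 2 != 0]

print(pairs_loop == pairs_comp)
```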

Page 46: Python lecture 06

List Comprehension with Matrices

Page 47: Python lecture 06

List Comprehension with Matrices

● List comprehension can be used to scan rows and columns in matrices

>>> matrix = [

[10, 20, 30],

[40, 50, 60],

[70, 80, 90]

]

### extract all rows

>>> [r for r in matrix]

[[10, 20, 30], [40, 50, 60], [70, 80, 90]]

Page 48: Python lecture 06

List Comprehension with Matrices

>>> matrix = [

[10, 20, 30],

[40, 50, 60],

[70, 80, 90]

]

### extract column 0

>>> [r[0] for r in matrix]

[10, 40, 70]

Page 49: Python lecture 06

List Comprehension with Matrices

>>> matrix = [

[10, 20, 30],

[40, 50, 60],

[70, 80, 90]

]

### extract column 1

>>> [r[1] for r in matrix]

[20, 50, 80]

Page 50: Python lecture 06

List Comprehension with Matrices

>>> matrix = [

[10, 20, 30],

[40, 50, 60],

[70, 80, 90]

]

### extract column 2

>>> [r[2] for r in matrix]

[30, 60, 90]

Page 51: Python lecture 06

List Comprehension with Matrices

### turn matrix columns into rows

>>> rslt = []
>>> for c in xrange(len(matrix)):
...     rslt.append([matrix[r][c] for r in xrange(len(matrix))])
...
>>> rslt
[[10, 40, 70], [20, 50, 80], [30, 60, 90]]
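The loop above uses len(matrix) for both the row count and the column count, so it is only correct for square matrices. A sketch of two alternatives that also handle rectangular matrices is shown below; transposed, zipped, and wide are names introduced for this illustration.

```python
matrix = [[10, 20, 30],
          [40, 50, 60],
          [70, 80, 90]]

# Nested comprehension: the outer clause walks column indices
# (taken from the first row), the inner one walks the rows.
transposed = [[row[c] for row in matrix] for c in range(len(matrix[0]))]

# zip(*matrix) groups the i-th elements of all rows; list() turns
# each resulting tuple back into a list.
zipped = [list(column) for column in zip(*matrix)]

# A 2 x 3 matrix becomes a 3 x 2 matrix.
wide = [[1, 2, 3],
        [4, 5, 6]]
wide_transposed = [[row[c] for row in wide] for c in range(len(wide[0]))]

print(transposed)
print(wide_transposed)
```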

Page 52: Python lecture 06

List Comprehension with Matrices

● List comprehension can work with iterables (e.g., dictionaries)

>>> dict = {'a' : 'A', 'bb' : 'BB', 'ccc' : 'CCC'}

>>> [(item[0], item[1], len(item[0]+item[1])) \

for item in dict.items()]

[('a', 'A', 2), ('ccc', 'CCC', 6), ('bb', 'BB', 4)]

Page 53: Python lecture 06

List Comprehension

● If the expression inside [ ] is a tuple, the parentheses around it are required

>>> cubes = [(x, x**3) for x in xrange(5)]

>>> cubes

[(0, 0), (1, 1), (2, 8), (3, 27), (4, 64)]

● Sequences can be unpacked in list comprehension

>>> sums = [x + y for x, y in cubes]

>>> sums

[0, 2, 10, 30, 68]

Page 54: Python lecture 06

List Comprehension

● for-clauses in list comprehensions can iterate over any sequence:

>>> rslt = [ c * n for c in 'math' for n in (1, 2, 3)]

>>> rslt

['m', 'mm', 'mmm', 'a', 'aa', 'aaa', 't', 'tt','ttt', 'h', 'hh', 'hhh']

Page 55: Python lecture 06

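List Comprehension & Loop Variables

● In Python 2.X, the loop variables used in list comprehensions (and in regular for-loops) remain defined after execution. In Python 3, this holds only for regular for-loops: a comprehension's loop variable is local to the comprehension.

>>> for i in [1, 2, 3]: print i
...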

1

2

3

>>> i + 4

7

>>> [j for j in xrange(10) if j % 2 == 0]

[0, 2, 4, 6, 8]

>>> j * 2

18
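The session above is Python 2 behavior. The sketch below checks at run time whether a comprehension's loop variable is still visible afterwards, so the same file can be run under either version; both function names are assumptions made for this illustration.

```python
import sys

def for_loop_variable_survives():
    """A regular for-loop variable stays defined in Python 2 and Python 3."""
    for i in [1, 2, 3]:
        pass
    return i + 4

def comprehension_variable_leaks():
    """Return True if j is still visible after the list comprehension."""
    evens = [j for j in range(10) if j % 2 == 0]
    try:
        j  # bound to 9 on Python 2; raises NameError on Python 3
        return True
    except NameError:
        return False

print(for_loop_variable_survives())
print('Python %d: comprehension variable leaks = %s'
      % (sys.version_info[0], comprehension_variable_leaks()))
```

Code that reads a comprehension variable after the comprehension, as j * 2 does above, therefore breaks when moved to Python 3.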

Page 56: Python lecture 06

When To Use List Comprehension

● For-loops are easier to understand and debug

● List comprehensions may be harder to understand

● List comprehensions are usually faster in the interpreter than equivalent for-loops that build a list with append()

● List comprehensions are worth using to speed up simpler tasks

● For-loops are worth using when logic gets complex

Page 57: Python lecture 06

Reading & References

● www.python.org

● http://docs.python.org/library/stdtypes.html#typesseq

● doc.python.org/howto/unicode.html

● Ch 02, M. L. Hetland. Beginning Python: From Novice to Professional, 2nd Ed., APRESS

● Ch 02, H. Abelson and G. Sussman. Structure and Interpretation of Computer Programs, MIT Press

● S. Roman, Coding and Information Theory, Springer-Verlag