sequences - george mason universitymarks/112/slides/4.sequences.pdf · sequences •sequence: an...

Post on 07-Jul-2020


Sequences: lists, tuples, and strings

CS 112 @ GMU

Topics

• sequences and operations
• lists and operations
• loops and lists

Sequences

sequences

• sequence: an ordered group of values
  → each spot in a sequence is numbered.
  → example: a string is a sequence of characters

Type     mutable? (modifiable)   representations
String   immutable               'enclosed' "in quotes" '''of various''' """kinds"""
List     mutable                 [commas, surrounded, by, brackets]
                                 [ ] [5] [10,15]
Tuple    immutable               separated, by, commas
                                 (often, between, parentheses, too)
                                 ( ) (4,) (8, 12)

list examples

[ ]
[3]
[[3]]
[5,6,7]
["hello", 1, True]
[ [1,2,3], ["a","b","c"], 6]
[ [[1,2,3],[4,5,6]], [[7,8,9],[10,11,12]] ]
["one", "two", "three"]
[True, False, False]

notes:
• the empty list exists! [ ]
• [ 3 ] != 3
• [[3]] != [3]

tuple examples

( )
("one", "two", "three")
(4,)
(True, False, False)
( (1,2,3), ["a","b","c"], 6)
( [3], )
"hello", 1, True
5,

notes:
• the empty tuple exists! ( )
• one-tuples need the comma. (3) != (3,)
• (3) is an integer with parentheses around it.
• (3,) is a tuple of length one.
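The one-tuple note above can be checked interactively. A minimal sketch; the variable names are illustrative and not from the slides:

```python
# Parentheses alone do not make a tuple; the comma does.
not_a_tuple = (3)     # just the int 3 with parentheses around it
one_tuple = (3,)      # a tuple of length one
bare_tuple = 5,       # the parentheses are optional

print(type(not_a_tuple).__name__)        # int
print(type(one_tuple).__name__)          # tuple
print(len(one_tuple), len(bare_tuple))   # 1 1
print([3] != 3, [[3]] != [3])            # True True
```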

indexing

indexing means accessing (by spot number) the value at a particular spot.
• indexing begins at zero, and goes up each spot.
• negative indexing begins at the end with -1

msg = "index"

    i    n    d    e    x
    0    1    2    3    4
   -5   -4   -3   -2   -1

xs = [8.5, 100, -16.3, 2.5]

    8.5   100   -16.3   2.5
    0     1     2       3
   -4    -3    -2      -1
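The two numbering schemes in the diagrams above line up as follows. This is a small sketch using the slide's own `msg` and `xs`; the names `same_spot` and `out_of_bounds` are illustrative only:

```python
msg = "index"
xs = [8.5, 100, -16.3, 2.5]

# Index -k names the same spot as index len(seq) - k.
same_spot = True
for k in range(1, len(msg) + 1):
    if msg[-k] != msg[len(msg) - k]:
        same_spot = False

print(msg[0], msg[-1])   # i x
print(xs[1], xs[-4])     # 100 8.5
print(same_spot)         # True

# Indexing past either end raises IndexError.
try:
    xs[4]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)     # True
```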

Practice Problem

sequence lengths: what is the length of each of these tuples?
→ (what does len(expr) yield for each expr?)

(1,2,3)
(4,)
((5,6,7,8),)
((9,10), (11,(12,13)), (14))
(True, "(3,2,1)", 6, [1,2,3])
("(1,2,3,4)",5)

Poll – 4A

• indexing into sequences

Slicing

>>> msg = "hello there"
>>> msg[3:9]
'lo the'
>>> msg[1:11:3]
'eohe'
>>> msg[8:2:-1]
'eht ol'
>>> msg[0:10:-1]
''
>>> msg[1:1000]
'ello there'
>>> msg[6:]
'there'
>>> msg[:5]
'hello'
>>> msg[:]
'hello there'

• we can grab a sub-sequence instead of just one index's value.
• give [start : stop] or [start : stop : step] indexes, similar to range() (but with :'s!)
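To make the comparison with range() concrete, here is a sketch that rebuilds a string slice one index at a time. `slice_by_range` is a hypothetical helper written for this illustration; it relies on the built-in `slice.indices()` method to clamp out-of-bounds values the way slicing does:

```python
msg = "hello there"

def slice_by_range(seq, start, stop, step=1):
    """Hypothetical helper: rebuild the string slice seq[start:stop:step]."""
    # slice.indices() clamps start/stop to the sequence length,
    # which is why msg[1:1000] works while msg[1000] would not.
    lo, hi, st = slice(start, stop, step).indices(len(seq))
    result = ""
    for i in range(lo, hi, st):
        result += seq[i]
    return result

print(slice_by_range(msg, 3, 9))       # lo the
print(slice_by_range(msg, 8, 2, -1))   # eht ol
print(slice_by_range(msg, 1, 1000))    # ello there
```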

Poll – 4B

• slicing

Sequence Operations

operation            meaning                                 result type
x in s               checks if an item in s equals x.        bool
x not in s           checks if no items in s equal x.        bool
s + t                concatenation                           same seq. type
s*n (or: n*s)        n shallow copies of s, concatenated     same seq. type
len(s)               length of s                             int
s.count(x)           find # items in s equal to x            int (#matches)
s.index(x[,i[,j]])   give index of first x in s.             int
                     (if not found, raises ValueError)

(these are all expressions)
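Each row of the table can be tried on a small list. A minimal sketch; `s` and `t` are arbitrary example values, not taken from the slides:

```python
s = [1, 2, 3, 2]
t = [9]

print(2 in s, 7 not in s)          # True True
print(s + t)                       # [1, 2, 3, 2, 9]
print(t * 3)                       # [9, 9, 9]
print(len(s), s.count(2))          # 4 2
print(s.index(2), s.index(2, 2))   # 1 3  (second call starts searching at index 2)

# index() raises ValueError when the item is absent.
try:
    s.index(42)
    found = True
except ValueError:
    found = False
print(found)                       # False
```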

POLL – 4C

sequence operations

lists: mutable sequences

• When a sequence is mutable (as lists are), we can update part of the structure, leaving the rest alone:

    xs = [1,2,3]
    xs[1] = 99
    print(xs) # prints out [1, 99, 3]

• There are many operations available on mutable sequences (see next slides).

list operations

operation        meaning
s[i] = x         replace ith item of s with x
s[i:j] = t       replace slice i:j with t. (lengths needn't match!)
s[i:j:k] = t     replace slice i:j:k with t. (lengths must match!)
del s[i]         remove ith item from s.
del s[i:j]       remove slice i:j from s.
del s[i:j:k]     remove slice i:j:k from s.

try interactively.
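One possible interactive session for the table above, written as a script. The starting list is arbitrary; the comments show the list after each step:

```python
s = [0, 1, 2, 3, 4, 5]

s[1:3] = ["a", "b", "c"]        # plain slice: 2 items replaced by 3
print(s)                        # [0, 'a', 'b', 'c', 3, 4, 5]

s[::2] = ["w", "x", "y", "z"]   # stepped slice: exactly 4 slots, 4 values
print(s)                        # ['w', 'a', 'x', 'c', 'y', 4, 'z']

# A stepped slice rejects a replacement of the wrong length.
try:
    s[::2] = [1, 2]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)        # True

del s[1]                        # ['w', 'x', 'c', 'y', 4, 'z']
del s[0:2]                      # ['c', 'y', 4, 'z']
del s[::2]                      # ['y', 'z']
print(s)
```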

list operations

operation       meaning                                                  returned value
s.append(x)     add x as a single value at end of s.                     None value
s.extend(t)     individually append each item of t to the end of s.      None value
s.insert(i,x)   make space (push other spots to the right),              None value
                put x value at location i.
s.pop(i)        remove value at index i from sequence;                   item that was at index i
                return the value that was there
s.remove(x)     find first occurrence of x, remove it.                   None
s.reverse()     reverse the ordering of items.                           None
s.sort()        sort the items in increasing order.                      None

append: attach a value to the list.
extend: attach a sequence to the list.
try interactively.
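The difference between append and extend, plus the other methods in the table, in one short sketch (the example values are arbitrary):

```python
xs = [1, 2]
xs.append([3, 4])     # the whole list becomes ONE new item
print(xs)             # [1, 2, [3, 4]]

ys = [1, 2]
ys.extend([3, 4])     # each item is appended individually
print(ys)             # [1, 2, 3, 4]

ys.insert(1, 99)      # [1, 99, 2, 3, 4]
popped = ys.pop(1)    # removes and returns 99
ys.remove(3)          # [1, 2, 4]
ys.reverse()          # [4, 2, 1]
ys.sort()             # [1, 2, 4]
print(popped, ys)     # 99 [1, 2, 4]
```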

Programming TRAP

• many mutable sequence operations return the None value
  → the value is directly modified: rather than returning a modified copy, the operation returns the None value
  → assigning the result back to the variable discards the value!

    xs = [2,5,4,1,3]
    ys = [2,5,4,1,3]
    xs.sort()
    ys = ys.sort()    # the trap: ys now stores None

    print(xs, type(xs))
    print(ys, type(ys))

output when run:

    [1, 2, 3, 4, 5] <class 'list'>
    None <class 'NoneType'>

ys did get sorted, but then we threw out the whole list by storing a None value into ys.
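When a sorted copy is what you actually want, the built-in sorted() function returns a new list instead of None. This is an alternative the slide does not cover, shown here as a small sketch with illustrative variable names:

```python
original = [2, 5, 4, 1, 3]

# sorted() returns a new sorted list and leaves its argument alone.
in_order = sorted(original)

# .sort() reorders the list in place and returns None.
working = original[:]     # slice copy, so original stays untouched
result = working.sort()

print(original)           # [2, 5, 4, 1, 3]
print(in_order)           # [1, 2, 3, 4, 5]
print(working, result)    # [1, 2, 3, 4, 5] None
```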

lists in memory

• So far we've drawn simple boxes next to names for our variables:
    x = 5
• Now, we will draw an arrow from a variable to the block of values it contains:
    xs = [6,7,8]

[memory diagram: x is a box holding 5; xs holds an arrow to the block 6 7 8]

Memory Usage

• These arrows help us understand complex data, such as lists of lists.
• Every variable always stores one value in a box.
• The only new concept is that sometimes the contents of the box is an arrow (a reference) to some other value in memory.

    xs = [4,5,6]
    ys = [7,8,9]
    both = [xs,ys]

[memory diagram: xs → 4 5 6; ys → 7 8 9; both → a two-slot block whose slots point to the same blocks as xs and ys]

Poll – 4D

• mutable sequences

Sequences and Loops

Sequences and Loops

Loops are most useful with sequences.
Each iteration of the loop can inspect/use/modify one value in the sequence.

    xs = [5, 2, 14, 63]
    sumval = 0
    for x in xs:
        sumval += x
    print(sumval)

"Value" For-Loop

• For-loops assign each value of the supplied sequence to the loop variable.
• We directly traverse the values in the list themselves.

    # print some words out.
    words = ["you", "are", "great"]
    for word in words:
        print(word)

    # sum up some numbers.
    vals = [1.5, 2.25, 10.75, -2.0]
    total = 0
    for curr_val in vals:
        total += curr_val
    print("sum of vals is",total)

    # what is the largest value?
    vals = [17, 10, 99, 14, 50]
    max_val = vals[0]
    for val in vals:
        if val > max_val:
            max_val = val
    print("largest:",max_val)

"Index" For-Loop

We can generate all the valid indexes we'd like to visit, and supply those to a for-loop instead of the values-sequence itself.

We are thus aware of our position (i) as well as the value at the current position (vals[i])

    # where is the largest value located?
    vals = [2,5,3,6,4,1]
    max_loc = 0
    for i in range(len(vals)):
        if vals[i]>vals[max_loc]:
            max_loc = i
    print("maxval="+str(vals[max_loc]))
    print("max val @"+str(max_loc))

Naming Loop Variables

When we intend to directly supply values of our sequence to the for-loop, we choose a loop variable name that represents one thing of the sequence.

    for word in words:
        …word…

    for val in vals:
        …val…

When we intend to supply indexes of a sequence to the loop (and use them to access values in the actual data sequence), we choose an 'index' name for the loop variable, such as i, j, k, nums_i, etc.

    for i in range(len(xs)):
        …xs[i]…

    for bird_i in range(len(birdSpeciesList)):
        …birdSpeciesList[bird_i]…

more loop examples

    # is v in the list xs?
    found = False
    for x in xs:
        if x==v:
            found = True
    print("found?",found)

    # count occurrences of v
    count = 0
    for x in xs:
        if x==v:
            count += 1
    print("#occurrences:",count)

    # where does v show up in xs?
    loc = None
    for i in range(len(xs)):
        if xs[i]==v:
            loc = i
            break
    if loc is None:
        print("not found!")
    else:
        print("location:", loc)

loops recipe

• We want to get some property/answer based on a list. Example: "I want to print the max value".
• create a variable to hold the answer; give it a safe starting value (sum starts at zero; max starts at first value in list; num_occurrences starts at zero)
• create a loop that inspects each item in the list
• inside the loop, incorporate the current value to improve your answer (found a new max; added to the sum; incremented num_occurrences)
• after loop, answer is ready!

look at previous slide: do you see the recipe in use?
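As one more instance of the recipe, here is a sketch that counts the even numbers in a list. The function name `count_evens` is chosen for this illustration and does not appear in the slides:

```python
def count_evens(nums):
    """Follow the loops recipe to count how many values in nums are even."""
    count = 0                 # safe starting value: no evens seen yet
    for n in nums:            # inspect each item in the list
        if n % 2 == 0:        # incorporate the current value
            count += 1
    return count              # after the loop, the answer is ready

print(count_evens([5, 2, 14, 63]))   # 2
print(count_evens([]))               # 0
```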

when do we need index loops?

• when the location matters
• when we need to update the list's contents (updating individual slots)
• when we want to visit locations of the sequence in other orders/patterns than first-to-last (in reverse, every other, all-but-the-last-one, etc.)

Indexing in other orders

By constructing a different call to range(), we can index through our sequence in more sophisticated ways than just "in-order, all elements":

    vals = [10,11,12,13,14,15,16,17]

    for i in range(0, len(vals), 2):
        print(vals[i])

    for i in range(len(vals)-1, -1, -1):
        print(vals[i])

watch out! using range(), you must get the indexes exactly right (never out of bounds). Slicing gracefully ignores out-of-bounds issues; indexing does not.
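The warning about bounds can be seen directly. The sketch below reuses the slide's `vals`; the result lists and the `raised` flag are illustrative names:

```python
vals = [10, 11, 12, 13, 14, 15, 16, 17]

# Every other value, collected with an index loop.
every_other = []
for i in range(0, len(vals), 2):
    every_other.append(vals[i])

# All but the last value, visited in reverse.
reversed_but_last = []
for i in range(len(vals) - 2, -1, -1):
    reversed_but_last.append(vals[i])

print(every_other)         # [10, 12, 14, 16]
print(reversed_but_last)   # [16, 15, 14, 13, 12, 11, 10]

# Slicing forgives an out-of-bounds stop; indexing does not.
print(vals[6:100])         # [16, 17]
try:
    vals[100]
    raised = False
except IndexError:
    raised = True
print(raised)              # True
```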

Nested Value Loops

    xss = [[5,6,7],[8,9,10]]
    total = 0
    for xs in xss:
        for x in xs:
            print("\t+ "+str(x))
            total += x
    print("total:",total)

output when run:

        + 5
        + 6
        + 7
        + 8
        + 9
        + 10
    total: 45

• when we have multiple dimensions to our lists, we can use that many nested loops to access each item individually.
• Note the access pattern, as well as the total calculation.

Nested Index Loops

• Create an index for each dimension of your sequence.
• Nest loops for each dimension.
• Access each element individually (and starting from the entire structure like xss below), no matter how many dimensions.

    xss = [[5,6,7],[8,9,10]]
    for i in range(len(xss)):
        for j in range(len(xss[i])):
            print(xss[i][j])

output when run:

    5
    6
    7
    8
    9
    10

Nested Index Loops

• Our data doesn't have to have multiple dimensions for our algorithm to find use for nested loops.

    # are there any duplicates in the list?
    xs = [2,3,5,4,5,1,7,8]
    has_dupes = False
    for i in range(len(xs)):
        for j in range(len(xs)):
            if (i!=j) and xs[i]==xs[j]:
                has_dupes = True
                break
    print("any dupes?",has_dupes)

    # are there any duplicates in the list?
    xs = [2,3,5,4,5,1,7,8]
    has_dupes = False
    for i in range(len(xs)):
        for j in range(i+1, len(xs)):
            if xs[i]==xs[j]:
                has_dupes = True
                break
    print("any dupes?",has_dupes)

• note: what is different/better about the second version?
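One way to explore that question is to count how many pairs each version would compare on a list with no duplicates (so no break ever fires). The two counting functions below are hypothetical helpers written for this sketch:

```python
def pairs_checked_all(xs):
    """Count comparisons when j runs over every index other than i."""
    checks = 0
    for i in range(len(xs)):
        for j in range(len(xs)):
            if i != j:
                checks += 1
    return checks

def pairs_checked_later(xs):
    """Count comparisons when j only runs over indexes after i."""
    checks = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            checks += 1
    return checks

no_dupes = [2, 3, 5, 4, 6, 1, 7, 8]
print(pairs_checked_all(no_dupes))     # 56
print(pairs_checked_later(no_dupes))   # 28
```

The second pattern never compares the same pair twice and never needs the i!=j test.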

building lists with loops

    n = 42
    divisors = []
    for i in range(1,n+1):
        if n%i==0:
            divisors.append(i)
    print("divisors of %d: %s" % (n, divisors))

We can start with an empty list and .append() to it repeatedly to build up a list with a loop.

output when run:

    divisors of 42: [1, 2, 3, 6, 7, 14, 21, 42]

Poll – 4E

• loops and sequences

for-loop with sequences of tuples

- We can dissect each tuple with our for-loop variable(s).
- This is called tuple unpacking. Provide a pattern of variables.

    tups = [('a',1), ('b',2), ('c',3)]
    for (c,n) in tups:
        print(c*n)

output when run:

    a
    bb
    ccc

Modifying Lists

    # make all values in the list non-negative
    xs = [1,-2, 3, -4, -5, 100, 150, -30, 123]
    for i in range(len(xs)):
        if xs[i]<0:
            xs[i] = -xs[i] # make non-negative

• To update spots with a loop, we must use index-loops.
• (A value loop would modify the loop variable only, not the list)

Modifying Lists – can't use value loops

    xs = [1,-2, 3, -4, -5, 100, 150, -30, 123]
    ys = xs[:] # here's a copy of xs' original value.
    for x in xs:
        if x<0:
            x = -x # we try, but fail, to modify part of xs
    if xs==ys:
        print("failed to modify!") # this does print.

This code shows how a value loop won't succeed. You should trace through this code to see why (with the visualizer).
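The two approaches can be put side by side on a shorter list. A minimal sketch; the flag name `value_loop_changed_list` is illustrative:

```python
xs = [1, -2, 3, -4]

# Value loop: x = -x rebinds the name x; the list slot is untouched.
for x in xs:
    if x < 0:
        x = -x
value_loop_changed_list = xs != [1, -2, 3, -4]

# Index loop: xs[i] = ... assigns into slot i of the list itself.
for i in range(len(xs)):
    if xs[i] < 0:
        xs[i] = -xs[i]

print(value_loop_changed_list)   # False
print(xs)                        # [1, 2, 3, 4]
```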

Aliases Example

    xs = [1,2,3]
    ys = [4,5,6]
    both = [xs,ys]
    xs[1] = 7
    print("xs is",xs)
    print("both is", both)
    ys = [8,9]
    print("ys is",ys)
    print("both is", both)

program output:

    xs is [1, 7, 3]
    both is [[1, 7, 3], [4, 5, 6]]
    ys is [8, 9]
    both is [[1, 7, 3], [4, 5, 6]]

What is happening?

• variables are not the same as values.
• alias: when multiple names for the same location exist (such as xs vs both[0]) – changing the value by any name is witnessed from all others
• reassigning a variable re-establishes what the variable stores
• updating part of a value doesn't change which variables currently refer to the value
• We draw multiple arrows to the same value in our memory diagrams.

id( ) built-in function

• id(thing) returns an int value that is unique to that object for as long as the object exists.
• detect aliases when id(x)==id(y). The actual int value doesn't matter, only whether they are the same or not.
• memory diagrams: two aliases both point to the shared value

    >>> xs = [1,2,3]
    >>> ys = [4,5,6]
    >>> lists = [xs,ys]
    >>> id(xs)
    4302079040
    >>> id(ys)
    4301525288
    >>> id(lists)
    4301525360
    >>> id(lists[0])
    4302079040
    >>> id(lists[1])
    4301525288
    >>> xs = [7,8,9]
    >>> id(xs)
    4301525864
    >>> id(lists[0])
    4302079040

When are aliases Preserved?

• Re-assigning a variable (xs = newExpr) can point it to some different memory location, and can disassociate aliases.
  → id(xs) result will change

• Updating part of a value (xs[0] = newval) reuses the same memory location, so any aliases are preserved.
  → id(xs) result will stay the same
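Both rules can be checked without reading the id numbers by hand. The sketch below also uses Python's `is` operator, which tests whether two names refer to the same object; the slides use id() comparisons instead, and the flag names here are illustrative:

```python
xs = [1, 2, 3]
lists = [xs]
before = id(xs)

# Updating part of the value keeps the same object, so the alias survives.
xs[0] = 99
alias_kept = (id(xs) == before) and (lists[0] is xs)

# Re-assigning the variable points xs at a new object; lists[0] is unaffected.
xs = [7, 8, 9]
alias_broken = lists[0] is not xs

print(alias_kept, alias_broken)   # True True
print(lists[0])                   # [99, 2, 3]
```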

Poll – 4F

• aliases and memory

Extra Materials

Practice Problems

Here are some sample tasks you should try, either as functions or simple scripts:
• Ask the user how many numbers they'll enter, then store them all in a list. (what methods will we use?)
• calculate the sum of a list of numbers
• count how many numbers in a list are even.
• step through a list and make all negative numbers positive (take their absolute value)
• Find the maximum number in a list of positive numbers
• Find the sum of a list-of-lists-of-numbers. (2D list of nums)
• Find the sum of a 3D list of numbers.

finding 2D indexes

    xss = [[2,5,3],[1,4],[5,7,6,8]]
    val = 7
    row_loc = 0
    col_loc = 0
    found = False
    for i in range(len(xss)):
        for j in range(len(xss[i])):
            if xss[i][j]==val:
                row_loc = i
                col_loc = j
                found = True
                break
    if not found:
        print("not found!")
    else:
        print("found at ("+str(row_loc)+","+str(col_loc)+")")

Bizarre Corner Cases of For-Loops

• When we modify the list over which we want to iterate, strange things can happen. Avoid modifying the list's length during the loop.
• Python actually finds out once, at the very start of running a loop, what structure it'll iterate over. This "iterator" can't be changed to some other iterator during the loop's execution.
• Following are some examples where modifying the list we're iterating over causes problems – don't code in this style!

Example – modifying length, with value-loops

Code

    xs = [6,7,8,9,10]
    for x in xs:
        print(xs.pop( ))

Output

    10
    9
    8
    <doesn't crash, stops 'early'>

The loop's iterator is an alias with xs. As xs' length changes, and we grab the next popped value from the list's end each iteration, the end of the list gets closer twice as fast!

→ it turns out Python does a bit of indexing behind the scenes when we write a value loop after all… implementation details are being exposed. boooo.

Example – modifying length, with index-loops

Code

    xs = [6,7,8,9,10]
    for i in range(len(xs)):
        print(xs.pop( ))

Output

    10
    9
    8
    7
    6

Our loop's iterator walks the sequence of indexes 0,1,2,3,4 produced by range(len(xs)). That sequence is computed once, before the loop starts, and never gets modified (though the list that xs refers to certainly does!).

Example – Iterator Unchanged

Setup

    >>> xs = [1,2,3,4,5]
    >>> id(xs)
    4301695384

Usage

    for x in xs:
        print ("xs @", id(xs))
        print ("\tx = " +str(x))
        xs = [9,9,9]

Output

    xs @ 4301695384
        x = 1
    xs @ 4301695456
        x = 2
    xs @ 4301695528
        x = 3
    xs @ 4301695456
        x = 4
    xs @ 4301695528
        x = 5
    >>> xs
    [9, 9, 9]

The iterator is determined once and for all as we enter the loop. The loop iterates not over the thing named xs, but over the thing that xs referred to the moment we began the loop.
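If items really must be removed while looping, one common workaround (not covered in the slides) is to iterate over a slice copy so that the list being changed is not the one being traversed. The two function names below are hypothetical, chosen for this sketch:

```python
def remove_evens_unsafely(nums):
    """Removes while iterating over the same list (the style to avoid)."""
    for n in nums:
        if n % 2 == 0:
            nums.remove(n)    # shifts later items left; the loop skips one
    return nums

def remove_evens_safely(nums):
    """Iterates over a slice copy, so removals never disturb the traversal."""
    for n in nums[:]:
        if n % 2 == 0:
            nums.remove(n)
    return nums

print(remove_evens_unsafely([2, 4, 6, 8]))   # [4, 8]
print(remove_evens_safely([2, 4, 6, 8]))     # []
```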

Aliases

• When we have multiple ways to access the same spot in memory, we call these alternate names "aliases."
  → xs and lists[0] are aliases.
  → ys and lists[1] are aliases.

    xs = [1,2,3]
    ys = [4,5,6]
    lists = [xs,ys]

    lists[0][0] = 88
    print ("xs:", xs)

    ys[2] = 77
    print ("lists:", lists)


Practice Problem

What are the final values of xs, ys, and listval?
(drawing out memory/arrows to names helps.)

    xs = [1,2,3]
    ys = [4,5,6]
    listval = [xs,ys]
    xs[2] = 7
    ys[:] = ["hi","mom"]
    listval[0][1] = 8
    listval[1] = [True,False]
    ys[1] = 9

xs contains [1,8,7]
ys contains ['hi',9]
listval contains [[1,8,7],[True,False]]

Practice Problem

The following code doesn't put the value 99 into biglist anywhere. Why?

    xs = [1,2,3]
    biglist = xs*3
    xs[1] = 99

because biglist looks up the value of xs and *3's it, copying the contents of xs without making aliases. There are no complex values (e.g. lists) inside biglist.

biglist = [xs]*3 would create the list of lists, so each sub-list is a complex value that can exhibit reference updating. (of course, the meaning is slightly different too: originally biglist was one-dimensional, now it would be two-dimensional.)

Shallow Copies

code:

    xss = [[1,2,3],[4,5,6]]
    ys = xss + xss
    ys[0][0] = 8
    print(xss)
    print(ys)
    ys[1] = [100,200,300]
    print(xss)
    print(ys)

output:

    [[8, 2, 3], [4, 5, 6]]
    [[8, 2, 3], [4, 5, 6], [8, 2, 3], [4, 5, 6]]
    [[8, 2, 3], [4, 5, 6]]
    [[8, 2, 3], [100, 200, 300], [8, 2, 3], [4, 5, 6]]

Another effect driven by the data references – the "copies" made are simply multiple references to the same objects.

We call these "shallow copies".
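The same effect shows up with the * operator. The sketch below contrasts xs*3 with [xs]*3 and then uses copy.deepcopy from the standard library to get fully independent sub-lists; deepcopy is not mentioned in the slides and is included only as one way to avoid the sharing:

```python
import copy

xs = [1, 2, 3]
flat = xs * 3        # the ints are copied in; no sub-lists, no aliases
nested = [xs] * 3    # three references to the SAME list object

xs[1] = 99
print(flat)          # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(nested)        # [[1, 99, 3], [1, 99, 3], [1, 99, 3]]

# deepcopy duplicates the sub-lists too, so later updates to xs are not seen.
independent = copy.deepcopy(nested)
xs[0] = -1
print(nested[0][0], independent[0][0])   # -1 1
```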
