
Post on 14-Dec-2015


CSC1015F – Chapter 5, Strings and Input

Michelle Kuttel

mkuttel@cs.uct.ac.za

The String Data Type

Used for operating on textual information. Think of a string as a sequence of characters.

To create string literals, enclose them in single, double, or triple quotes as follows:
a = "Hello World"
b = 'Python is groovy'
c = """Computer says 'Noooo'"""


Comments and docstrings
It is common practice for the first statement of a function to be a documentation string describing its usage. For example:

def hello():
    """Hello World function"""
    print("Hello")
    print("I love CSC1015F")

This is called a "docstring" and can be printed thus: print(hello.__doc__)


Comments and docstrings
Try printing the docstring for functions you have been using, e.g.:

print(input.__doc__)

print(eval.__doc__)
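A small sketch of the same idea with a function of our own. The names circle_area and no_docs are hypothetical, made up for illustration; the point is that the docstring is simply stored on the function object, and a function without one has __doc__ set to None.

```python
import math

def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius * radius

def no_docs(x):
    return x

# The docstring is stored in the function's __doc__ attribute
print(circle_area.__doc__)   # Return the area of a circle with the given radius.
# A function without a docstring has __doc__ set to None
print(no_docs.__doc__)       # None
```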


Checkpoint Str1: Strings and loops. What does the following function do?

def oneAtATime(word):
    for c in word:
        print("give us a '",c,"' ... ",c,"!", sep='')
    print("What do you have? -",word)


Checkpoint Str1a: Indexing examples. What does this function do?

def str1a(word):
    for i in word:
        if i in "aeiou":
            continue
        print(i,end='')


Some built-in string functions/methods
s.capitalize()   Capitalizes the first character (and lowercases the rest).
s.count(sub)     Counts the number of occurrences of sub in s.
s.isalnum()      Checks whether all characters are alphanumeric.
s.isalpha()      Checks whether all characters are alphabetic.
s.isdigit()      Checks whether all characters are digits.
s.islower()      Checks whether all letters are lowercase.
s.isspace()      Checks whether all characters are whitespace.


Some built-in string functions/methods
s.istitle()      Checks whether the string is a title-cased string (first letter of each word capitalized).
s.isupper()      Checks whether all letters are uppercase.
s.join(t)        Joins the strings in sequence t with s as a separator.
s.lower()        Converts to lowercase.
s.lstrip([chrs]) Removes leading whitespace or characters supplied in chrs.
s.upper()        Converts a string to uppercase.


Some built-in string functions/methods
s.replace(oldsub,newsub)  Replaces all occurrences of oldsub in s with newsub.
s.find(sub)               Finds the index of the first occurrence of sub in s (returns -1 if sub is not found).
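A quick sketch trying several of the methods above on one sample string (the string itself is just an illustrative choice). Note that each method returns a new string or value; the original string s is never changed.

```python
s = "hello world"

print(s.capitalize())             # Hello world
print(s.count("o"))               # 2
print(s.find("world"))            # 6
print(s.find("xyz"))              # -1 (not found)
print(s.replace("o", "0"))        # hell0 w0rld
print(s.upper())                  # HELLO WORLD
print("-".join(["a", "b", "c"]))  # a-b-c
print("   padded".lstrip())       # padded
print("2015".isdigit())           # True
print("Hello World".istitle())    # True
print(s)                          # hello world (the original is unchanged)
```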

9

BUILT IN String functions/methods

Try printing the docstring for str functions:

print(str.isdigit.__doc__)


The String Data Type
As a string is a sequence of characters, we can access individual characters; this is called indexing.

form: <string>[<expr>]

Indexes start at 0, so the last character in a string of n characters has index n-1.


Indexes of the string "Hello Bob" (index 5 is the space):
0 1 2 3 4 5 6 7 8
H e l l o   B o b
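A short sketch of indexing on "Hello Bob", matching the diagram above: indexes run from 0 to len(greet)-1, and going past the end raises an IndexError.

```python
greet = "Hello Bob"

print(greet[0])                # H
print(greet[4])                # o
print(repr(greet[5]))          # ' ' (index 5 is the space)
print(greet[len(greet) - 1])   # b, the last character (index n-1)

# Indexing past the last character is an error
try:
    greet[9]
except IndexError as err:
    print("IndexError:", err)
```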

String functions: len
len tells you how many characters there are in a string:

len("Jabberwocky")

len("Twas brillig and the slithy toves did gyre and gimble in the wabe")
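A small sketch of what len counts: every character, including spaces, and the empty string has length 0. It also shows how len gives the last valid index, n-1.

```python
print(len("Hello Bob"))   # 9 (the space counts as a character)
print(len(""))            # 0 (the empty string)

word = "Jabberwocky"
n = len(word)
# Valid indexes run from 0 to n-1
print(n, word[0], word[n - 1])   # 11 J y
```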


Checkpoint Str2: Indexing examples. What does this function do?

def str2(word):
    for i in range(0,len(word),2):
        print(word[i],end='')


More indexing examples: indexing from the end
What is the output of these lines?
greet = "Hello Bob"

greet[-1]

greet[-2]

greet[-3]
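To see the rule behind negative indexes without giving away the answers above, here is a sketch on a different word: for 1 <= k <= len(word), word[-k] is the same character as word[len(word) - k].

```python
word = "Jabberwocky"   # 11 characters: indexes 0..10, or -11..-1

print(word[-1])    # y (same as word[10])
print(word[-2])    # k (same as word[9])
print(word[-11])   # J (same as word[0])

# The general rule, checked for every valid negative index
for k in range(1, len(word) + 1):
    assert word[-k] == word[len(word) - k]
```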


Checkpoint Str3: What does this function do?

def str3(word):
    for i in range(len(word)-1,-1,-1):
        print(word[i],end='')


Chopping strings into pieces: slicing
The previous examples can be done much more simply with slicing.

form: <string>[<start>:<end>]

Slicing indexes a range and returns a substring, starting at position start and running up to, but not including, position end.
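A minimal slicing sketch on "Jabberwocky" (chosen so as not to answer the "Hello Bob" exercise that follows). It shows that the end index is excluded, that either index may be omitted, and that s[:k] + s[k:] always rebuilds the original string.

```python
word = "Jabberwocky"

print(word[0:4])   # Jabb   (indexes 0, 1, 2, 3; index 4 is excluded)
print(word[6:9])   # woc
print(word[:6])    # Jabber (start omitted: from the beginning)
print(word[6:])    # wocky  (end omitted: to the end)

# Splitting at any position k and joining the pieces gives back the original
k = 3
assert word[:k] + word[k:] == word
```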


Examples: slicing
What is the output of these lines?
greet = "Hello Bob"

greet[0:3]

greet[5:9]

greet[:5]

greet[5:]

greet[:]


Checkpoint Str4: Strings and loops. What does the following function do?

def sTree(word):
    for i in range(len(word)):
        print(word[0:i+1])


Checkpoint Str5: Strings and loops. What does the following code output?

def sTree2(word):
    step=len(word)//3
    for i in range(step,step*3+1,step):
        for j in range(i):
            print(word[0:j+1])
        print("**\n**\n")

sTree2("strawberries")


More info on slicing
The slicing operator may be given an optional stride, s[i:j:stride], that causes the slice to skip elements. Then i is the starting index, j is the ending index, and the produced subsequence is the elements s[i], s[i+stride], s[i+2*stride], and so forth until index j is reached (which is not included).

The stride may also be negative. If the starting index i is omitted, it is set to the beginning of the sequence if the stride is positive, or the end of the sequence if the stride is negative.

If the ending index j is omitted, it is set to the end of the sequence if the stride is positive, or the beginning of the sequence if the stride is negative.
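The rule above (take s[i], s[i+stride], s[i+2*stride], ... and stop before index j) can be sketched as an explicit loop. manual_slice is a hypothetical helper written for illustration only; it assumes explicit indexes with 0 <= i < len(s) and 0 <= j <= len(s), and does not handle omitted or negative indexes the way real slicing does.

```python
def manual_slice(s, i, j, stride):
    """Hypothetical helper: rebuild s[i:j:stride] with a loop.

    Assumes 0 <= i < len(s) and 0 <= j <= len(s); omitted or
    negative indexes are not handled.
    """
    if stride == 0:
        raise ValueError("stride cannot be zero")
    result = ""
    k = i
    if stride > 0:
        while k < j:          # counting up, stop before reaching index j
            result += s[k]
            k += stride
    else:
        while k > j:          # counting down, stop before reaching index j
            result += s[k]
            k += stride       # stride is negative, so k decreases
    return result

a = "Jabberwocky"
print(manual_slice(a, 0, 5, 2), a[0:5:2])     # Jbe Jbe
print(manual_slice(a, 5, 0, -2), a[5:0:-2])   # rba rba
```

Comparing each result with the real slice side by side is a handy way to check your own understanding of where a stride stops.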


More on slicing
Here are some examples with strides:

a = "Jabberwocky"
b = a[::2]     # b = 'Jbewcy'
c = a[::-2]    # c = 'ycwebJ'
d = a[0:5:2]   # d = 'Jbe'
e = a[5:0:-2]  # e = 'rba'
f = a[:5:1]    # f = 'Jabbe'
g = a[:5:-1]   # g = 'ykcow'
h = a[5::1]    # h = 'rwocky'
i = a[5::-1]   # i = 'rebbaJ'
j = a[5:0:-1]  # j = 'rebba'


Checkpoint Str6: strides
What is the output of these lines?
greet = "Hello Bob"

greet[8:5:-1]


Checkpoint Str7: Slicing with strides
How would you do this function in one line with no loops?

def str2(word):
    for i in range(0,len(word),2):
        print(word[i],end='')


Checkpoint Str8: What does this code display?

#checkpointStr8.py
def crunch(s):
    m=len(s)//2
    print(s[0],s[m],s[-1],sep='+')

crunch("omelette")
crunch("bug")


Example: filters
Pirate, Elmer Fudd, and Swedish Chef filters produce parodies of English speech.

How would you write one in Python?
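One possible sketch of such a filter, using only s.replace. The rule used here (every r and l becomes w, keeping case) is an assumption chosen to mimic Elmer Fudd; elmer_fudd is a hypothetical name, and a real filter would likely need more rules.

```python
def elmer_fudd(text):
    """Hypothetical Elmer Fudd filter: replace r and l with w (keeping case)."""
    for old, new in [("r", "w"), ("l", "w"), ("R", "W"), ("L", "W")]:
        text = text.replace(old, new)
    return text

print(elmer_fudd("Be very quiet, I'm hunting rabbits"))
# Be vewy quiet, I'm hunting wabbits
```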


Example: Genetic Algorithms (GAs)
GAs attempt to mimic the process of natural evolution in a population of individuals. They use the principles of selection and evolution to produce several solutions to a given problem, using biologically derived techniques such as inheritance, mutation, natural selection, and recombination.

A GA is a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals) to an optimization problem evolves toward better solutions.

In nature, over time, those genetic changes which enhance the viability of an organism tend to predominate.

Bioinformatics example: crossover (recombination)
Evolution works at the chromosome level through the reproductive process: portions of the genetic information of each parent are combined to generate the chromosomes of the offspring. This is called crossover.

Crossover methods: single-point crossover
A randomly located cut is made at the pth bit of each parent, and the parts after the cut are exchanged (crossover occurs). This produces 2 different offspring.

Gene splicing example (for genetic algorithms)
We can now do a crossover!

Crossover3.py
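The contents of Crossover3.py are not shown here, so this is only a sketch of single-point crossover on strings, assuming both parents have the same length (at least 2). single_point_crossover and its optional cut position p are hypothetical names; slicing does all the work: each child takes the part before the cut from one parent and the part after it from the other.

```python
import random

def single_point_crossover(parent1, parent2, p=None):
    """Sketch of single-point crossover on two equal-length strings.

    If the cut position p is not given, it is chosen at random so that
    each child gets at least one character from each parent.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if p is None:
        p = random.randint(1, len(parent1) - 1)   # randint includes both ends
    child1 = parent1[:p] + parent2[p:]
    child2 = parent2[:p] + parent1[p:]
    return child1, child2

print(single_point_crossover("AAAAAA", "CCCCCC", 2))   # ('AACCCC', 'CCAAAA')
print(single_point_crossover("GATTACA", "CCGGTTA"))    # cut point chosen at random
```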


Example: palindrome program
palindrome |ˈpalɪndrəʊm| noun: a word, phrase, or sequence that reads the same backward as forward, e.g., madam or nurses run.

In Python, write a program to check whether a word is a palindrome.

You don't need to use loops...
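One possible loop-free sketch, relying on a negative-stride slice to reverse the string. Removing spaces and ignoring case are assumptions made here so that phrases such as "nurses run" also count; is_palindrome is a hypothetical name.

```python
def is_palindrome(phrase):
    """Return True if phrase reads the same backward as forward.

    Spaces are removed and case is ignored, so "nurses run" counts.
    """
    letters = phrase.replace(" ", "").lower()
    return letters == letters[::-1]   # [::-1] reverses the string

print(is_palindrome("madam"))        # True
print(is_palindrome("nurses run"))   # True
print(is_palindrome("python"))       # False
```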


String representation and message encoding
On the computer hardware, strings are also represented as zeros and ones. Computers represent characters as numeric codes, a unique code for each character. An entire string is stored by translating each character to its equivalent code and then storing the whole thing as a sequence of binary numbers in computer memory.

There used to be a number of different codes for storing characters, which caused serious headaches!


ASCII (American Standard Code for Information Interchange)
ASCII is an important character encoding standard. Its codes are used to represent the characters found on a typical (American) computer keyboard, as well as some special control codes used for sending and receiving information.

A-Z use values in the range 65-90; a-z use values in the range 97-122.

It has been in use for a long time: it was developed for teletypes.

ASCII is American-centric, so extended ASCII codes have been developed.


Unicode
A character set that includes all the ASCII characters plus many more exotic characters: http://www.unicode.org


Python supports the Unicode standard:
ord returns the numeric code of a character
chr returns the character corresponding to a code

(Figure: Unicodes for Cuneiform)
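A small sketch of ord and chr, confirming the A-Z and a-z ranges given for ASCII and showing that the two functions undo each other. to_upper_letter is a hypothetical helper for illustration: it relies on each lowercase letter's code being exactly 32 more than its uppercase partner's (in practice you would just use s.upper()).

```python
print(ord("A"), ord("Z"))   # 65 90
print(ord("a"), ord("z"))   # 97 122
print(chr(72) + chr(105))   # Hi

# chr and ord are inverses of each other
assert chr(ord("q")) == "q"

def to_upper_letter(c):
    """Hypothetical helper: convert one lowercase letter a-z to uppercase."""
    if "a" <= c <= "z":
        return chr(ord(c) - 32)   # 'a' is 97 and 'A' is 65
    return c                      # anything else is returned unchanged

print(to_upper_letter("g"))   # G
```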

Characters in memory
The smallest addressable piece of memory is usually 8 bits, or a byte. How many characters can be represented by a byte?

256 different values (2^8). Is this enough?

256 different values is enough for ASCII (only a 7-bit code) but not enough for Unicode, with 100 000+ possible characters. Unicode uses different schemes for packing Unicode characters into sequences of bytes. UTF-8 is the most common: it uses a single byte for ASCII characters and up to 4 bytes for more exotic characters.
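A sketch that makes the UTF-8 claim visible with str.encode: an ASCII letter takes 1 byte, while characters with larger codes take 2, 3, or 4 bytes. The sample characters are illustrative choices: e-acute (U+00E9), the euro sign (U+20AC), and U+12000 from the Cuneiform block.

```python
samples = ["A", "\u00e9", "\u20ac", chr(0x12000)]

for ch in samples:
    encoded = ch.encode("utf-8")           # a bytes object
    print(hex(ord(ch)), len(encoded), "byte(s)")
    assert encoded.decode("utf-8") == ch   # decoding gives the character back

# 0x41 1 byte(s)
# 0xe9 2 byte(s)
# 0x20ac 3 byte(s)
# 0x12000 4 byte(s)
```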


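Comparing strings
Conditions may compare numbers or strings. When strings are compared, the order is lexicographic: strings are put into order based on the Unicode values of their characters, compared one position at a time.

e.g. "Bbbb" < "bbbb" and "B" < "a" (uppercase letters have smaller codes than lowercase letters).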
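A few comparisons that follow from the Unicode-value rule, including the common surprise that any uppercase word sorts before any lowercase one. Converting both sides to the same case first is one simple way to compare alphabetically regardless of case.

```python
print("apple" < "banana")   # True: 'a' (97) comes before 'b' (98)
print("Zebra" < "apple")    # True: 'Z' (90) comes before 'a' (97)
print("app" < "apple")      # True: a prefix sorts before the longer string
print("Bbbb" < "bbbb")      # True

# Comparing case-insensitively: convert both to the same case first
print("Zebra".lower() < "apple".lower())   # False
```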


The min function
min(iterable[, key=func]) -> value
min(a, b, c, ...[, key=func]) -> value

With a single iterable argument, return its smallest item.
With two or more arguments, return the smallest argument.
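A sketch of both calling forms, using lists rather than strings so as not to answer the checkpoint below. The key argument changes what "smallest" means: with key=len, the shortest item wins instead of the alphabetically first one.

```python
print(min(3, 7, 1))                              # 1 (two or more arguments)
print(min([8.5, 2.0, 4.25]))                     # 2.0 (a single iterable)
print(min(["pear", "fig", "banana"]))            # banana (alphabetically first)
print(min(["pear", "fig", "banana"], key=len))   # fig (shortest)
```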


Checkpoint: What do these statements evaluate as?

min("hello")

min("983456")

min("Peanut")


Example 2: DNA Reverse Complement Algorithm
A DNA molecule consists of two strands of nucleotides. Each nucleotide is one of the four molecules adenine, guanine, thymine, or cytosine. Adenine always pairs with thymine, and guanine always pairs with cytosine.

A pair of matched nucleotides is called a base pair.

Task: write a Python program to calculate the reverse complement of any DNA strand (reverse the strand and replace each base with the base it pairs with).
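One possible sketch of the task, assuming the strand is written using only the letters ACGT. It uses str.maketrans and str.translate to swap every base for its partner in one step, then a [::-1] slice to reverse the result; reverse_complement and COMPLEMENT are hypothetical names.

```python
# Translation table mapping each base to its complement (A<->T, C<->G)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA strand written with ACGT."""
    return dna.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))    # GCAT
print(reverse_complement("AAGTC"))   # GACTT
```

Doing the reversal and the complement in the opposite order gives the same answer, since each base is handled independently.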


Scrabble letter scores
Different languages should have different scores for the letters. How do you work this out? What is the algorithm?


Related example: calculating character (base) frequency
DNA has the alphabet ACGT.

BaseFrequency.py
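BaseFrequency.py itself is not shown, so this is only a sketch of one way to count bases, using s.count for each letter of the ACGT alphabet and then turning the counts into percentages. base_frequency and the sample sequence are hypothetical.

```python
def base_frequency(dna):
    """Hypothetical sketch: count how many times each base occurs."""
    dna = dna.upper()
    counts = {}
    for base in "ACGT":
        counts[base] = dna.count(base)
    return counts

seq = "AACGTTTG"
counts = base_frequency(seq)
for base in "ACGT":
    print(base, counts[base], 100 * counts[base] / len(seq))
# A 2 25.0
# C 1 12.5
# G 2 25.0
# T 3 37.5
```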


Why would you want to do this? You can calculate the melting temperature of DNA from the base pair percentage in a DNA strand. References:

Breslauer et al. Proc. Natl. Acad. Sci. USA 83, 3746-3750

Baldino et al. Methods in Enzymol. 168, 761-777


Input/Output as string manipulation
eval evaluates a string as a Python expression. It is very general and can be used to turn strings into nearly any other Python data type: the "Swiss army knife" of string conversion, e.g. eval("3+4").

We can also use the Python numeric type conversion functions: int("4"), float("4").

But the string must be a numeric literal of the appropriate form, or we will get an error.

We can also convert numbers to strings with the str function.
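A short sketch of these conversions, including what "an error" looks like in practice: int() raises a ValueError when the string is not a valid integer literal, as with "4.5" or "four".

```python
print(eval("3+4"))     # 7
print(int("4") + 1)    # 5
print(float("4"))      # 4.0
print(str(42) + "!")   # 42!

# A string that is not a valid literal of the right form raises ValueError
for text in ["4.5", "four"]:
    try:
        print(int(text))
    except ValueError:
        print("cannot convert", repr(text), "with int()")
```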


String formatting with format
The built-in s.format() method is used to perform string formatting. The {} slots show where the values will go. You can "name" the values, or access them by their position (counting from zero).

>>> a = "Your name is {0} and your age is {age}"
>>> a.format("Mike", age=40)
'Your name is Mike and your age is 40'


Example 4: Better output for calculating character (base) frequency: BaseFrequency2.py


More on format
You can add an optional format specifier to each placeholder using a colon (:) to specify column widths, decimal places, and alignment.

The general format is: [[fill]align][sign][0][width][.precision][type]

where each part enclosed in [] is optional. The width specifies the minimum field width to use. The align specifier is one of '<', '>', or '^' for left, right, and centered alignment within the field. An optional fill character is used to pad the space.


More on format
For example:

name = "Elwood"
r = "{0:<10}".format(name)   # r = 'Elwood    '
r = "{0:>10}".format(name)   # r = '    Elwood'
r = "{0:^10}".format(name)   # r = '  Elwood  '
r = "{0:=^10}".format(name)  # r = '==Elwood=='


format: the type specifier indicates the type of data.


More on format
The precision part supplies the number of digits of accuracy to use for decimals. If a leading '0' is added to the field width for numbers, numeric values are padded with leading 0s to fill the space.

x = 42
r = '{0:10d}'.format(x)      # r = '        42'
r = '{0:10x}'.format(x)      # r = '        2a'
r = '{0:10b}'.format(x)      # r = '    101010'
r = '{0:010b}'.format(x)     # r = '0000101010'
y = 3.1415926
r = '{0:10.2f}'.format(y)    # r = '      3.14'
r = '{0:10.2e}'.format(y)    # r = '  3.14e+00'
r = '{0:+10.2f}'.format(y)   # r = '     +3.14'
r = '{0:+010.2f}'.format(y)  # r = '+000003.14'
r = '{0:+10.2%}'.format(y)   # r = '  +314.16%'


Example: FormatEg.py


Checkpoint: Write down the exact output for the following code:
txt="{name}-{0}*{y}+{1}"

print(txt.format("cat","dog",name="hat",y="rat"))

print(txt.format(1,0,name=2,y=3))

print(txt.format(2,3))


Using format to improve output
BaseFrequency2.py

restuarant2.py
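BaseFrequency2.py is not reproduced here, so this is a hypothetical sketch of the kind of improvement format makes: fixed-width, aligned columns for a base frequency table. The column widths (6, 6, and 9 characters) are arbitrary choices; '<' left-aligns the base, '>' right-aligns the numbers, and '.1f' keeps one decimal place.

```python
def frequency_table(dna):
    """Hypothetical sketch: one formatted line per base with count and percentage."""
    lines = ["{0:<6}{1:>6}{2:>9}".format("Base", "Count", "Percent")]
    for base in "ACGT":
        count = dna.count(base)
        percent = 100 * count / len(dna)
        # width 8 for the number plus the '%' sign lines up with the 9-wide header
        lines.append("{0:<6}{1:>6d}{2:>8.1f}%".format(base, count, percent))
    return lines

for line in frequency_table("AACGTTTG"):
    print(line)
# Base   Count  Percent
# A          2    25.0%
# C          1    12.5%
# G          2    25.0%
# T          3    37.5%
```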

