Week3: Files and Strings

Upload: charlene-potter

Post on 11-Jan-2016


List

A list is a sequence of values of any type.

[ "Hello", 1, 3.7, None, True, "You" ]

A list is accessed with the bracket [] operator.

Using an index, we can access any element of a list.

We can read a value like L[0], L[1], …

We can change a value like L[0] = 1, L[2] = "Mine"
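A minimal sketch of reading and changing list elements, using the example list from this slide. It is written for Python 3, where print is a function (an assumption; the slides use the Python 2 print statement).

```python
# A list can hold values of different types.
L = ["Hello", 1, 3.7, None, True, "You"]

first = L[0]     # read an element by index
L[0] = 1         # change the first element
L[2] = "Mine"    # change the third element

print(first)
print(L)
```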


Tuple

A tuple is like a list, but it cannot be modified.

Once a tuple is created, it can't be mutated.

T = ("Samsung", "2013/04", 30)

Elements are accessed by index: T[0], T[1], …

There is no way to change an element:

T[0] = "Apple" # raises a TypeError

Don't confuse this with packing and unpacking!
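The difference between mutating and unpacking can be sketched as follows. The variable names maker, changed, name, date, and amount are illustrative, not from the slides.

```python
T = ("Samsung", "2013/04", 30)

# Reading by index works exactly as with a list.
maker = T[0]

# Assigning to an element fails, because tuples are immutable.
try:
    T[0] = "Apple"
    changed = True
except TypeError as err:
    changed = False
    print("Cannot modify a tuple:", err)

# Unpacking copies the elements into separate names; T itself is unchanged.
name, date, amount = T
print(name, date, amount)
```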


String

A string behaves like a tuple of characters.

Name = "Tom"

print Name[0], Name[1] # will print "T o"

Strings provide many convenient functions.

split(): separate a sentence into words

strip(): remove the leading and trailing whitespace

find(): find a word in a sentence
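A short sketch of the three string functions listed above. The sample sentence is made up for illustration.

```python
sentence = "  I love honey \n"

clean = sentence.strip()         # leading and trailing whitespace removed
words = clean.split()            # split on whitespace into a list of words
position = clean.find("love")    # index where "love" starts
missing = clean.find("jam")      # find() returns -1 when nothing is found

print(clean, words, position, missing)
```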


String data type


String

A string is a tuple of characters.

Instead of the () construction operator, use " or '.

Each character in a string can be accessed with the [] operator.

print L[0] # will print 'T'

print L[2] # will print 'M'

(Figure: the characters of the string L in a row of indexed boxes, starting T, O, M, …)


Substring

A substring is any part of a string, including the string itself and the empty string "".

A substring of a string can be accessed with the [] operator and a slice.

A slice is similar to the range function but has a simpler form.

range(1,3) → [1:3], range(1,len(S)) → [1:]

S = "abcdefg"

S[1:3] → "bc"    S[1:] → "bcdefg"
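The correspondence between a slice and range() can be checked with a small sketch. The names part_by_range and rest_by_range are illustrative.

```python
S = "abcdefg"

part = S[1:3]    # characters at indices 1 and 2 (the limit 3 is excluded)
rest = S[1:]     # from index 1 to the end of the string

# The same indices, spelled out with range().
part_by_range = "".join(S[i] for i in range(1, 3))
rest_by_range = "".join(S[i] for i in range(1, len(S)))

print(part, rest)
```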



Indexing for List/Tuple/String


More Substring Examples

S = "I love honey"

S[-1] → "y" # a negative index counts from the end

S[0:5:2] → "Ilv" # we can use steps: the characters at indices 0, 2, and 4

S[-1:-6:-1] → "yenoh" # using a negative step

S[:5] → "I lov" # an empty beginning means 0

S[2:] → "love honey" # an empty limit means the length of the string
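These slices can be tried directly. The variable names below are illustrative only.

```python
S = "I love honey"

last = S[-1]             # last character
every_other = S[0:5:2]   # indices 0, 2, 4
backwards = S[-1:-6:-1]  # last five characters, reversed
head = S[:5]             # first five characters
tail = S[2:]             # everything from index 2 on

print(last, every_other, backwards, head, tail)
```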


List and tuple indexing

Indexing works the same way for lists and tuples.

L = [ 4, 3, 2, "hello" ] or T = ( 4, 3, 2, "hello" )

L[1:] → [ 3, 2, "hello" ]

T[:2] → ( 4, 3 )

L[::2] → [ 4, 2 ]

T[::-1] → ( "hello", 2, 3, 4 )


Indexing Summary

An index used in the bracket [ ] operator has the form

<begin>:<limit>[:<step>]

<begin>: the beginning index, or 0 if omitted

<limit>: the limit of slicing, or the length of the given sequence if omitted

<step>: the step of slicing, or 1 if omitted
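A small sketch of these defaults, applied to a string, a list, and a tuple. It assumes a positive step; the sample values come from the earlier slides.

```python
S = "I love honey"
L = [4, 3, 2, "hello"]
T = (4, 3, 2, "hello")

# Omitted parts fall back to their defaults: begin 0, limit len(...), step 1.
same_head = S[:5] == S[0:5:1]
same_tail = S[2:] == S[2:len(S):1]

every_other = L[::2]    # begin and limit omitted, step 2
reversed_T = T[::-1]    # a step of -1 walks the tuple backwards

print(same_head, same_tail, every_other, reversed_T)
```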


Special Characters

Whitespaces

Tab '\t': a single character, usually displayed by advancing to the next tab stop (often every 8 columns)

Carriage return '\r': move to the beginning of the current line

New line '\n': move to the next line

Space ' '
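A brief sketch showing that each of these escape sequences is a single whitespace character. The sample text is made up.

```python
text = "a\tb\nc d"

tab_length = len("\t")               # an escape sequence is one character
lines = text.split("\n")             # split at the newline character
expanded = "a\tb".expandtabs(8)      # replace the tab by spaces up to column 8
all_whitespace = all(ch.isspace() for ch in "\t\r\n ")

print(repr(text), lines, repr(expanded))
```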


String comparisons

The equality operators ('==' and '!=') also work with strings, and the comparison is case sensitive.

"Hello" == "hello" → False

"Abc" == "Abcd" → False

"Hello" != "hello" → True

"Abc" != "Abcd" → True

a = "hello"

a == "" → False

a == "hello" → True


String comparisons

The comparison operators (>, <, >=, <=) compare a pair of strings in lexicographical order.

A string that appears earlier in a dictionary is smaller than one that appears later.

"abc" < "bcd" → True

"Abc" < "abc" → True (uppercase letters come first)

"abc" < "abcd" → True (if one string is a prefix of the other, the shorter one comes first)

"1abc" < "abcd" → True (digits come before letters)
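The ordering follows the character codes, which can be inspected with ord(). A minimal sketch; the list of names passed to sorted() is made up for illustration.

```python
# Character codes explain the ordering: digits < uppercase < lowercase.
codes = [ord("1"), ord("A"), ord("a")]

upper_first = "Abc" < "abc"
prefix_first = "abc" < "abcd"
digit_first = "1abc" < "abcd"

# sorted() uses the same comparison.
names = sorted(["Tom", "batty", "Kim", "1st"])

print(codes, names)
```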


String comparisons

We can use the min2(x,y) function from the previous quiz.

min2(min2("Tom", "Batty"), "Kim") → "Batty"

Similarly, the max2(x,y) function could be used.

max2(max2("Tom", "Batty"), "Kim") → "Tom"
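The quiz definitions of min2 and max2 are not shown in this transcript, so the versions below are assumed, minimal definitions that only rely on the < and > operators.

```python
def min2(x, y):
    """Return the smaller of two values (assumed definition)."""
    return x if x < y else y


def max2(x, y):
    """Return the larger of two values (assumed definition)."""
    return x if x > y else y


smallest = min2(min2("Tom", "Batty"), "Kim")
largest = max2(max2("Tom", "Batty"), "Kim")

print(smallest, largest)
```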


String comparisons

count_Tom(count, x) could be defined as:

def count_Tom(count, x):
    if x == "Tom":
        return count + 1
    else:
        return count

count_if(["Tom", "Batty", "Tom", "Kim"], count_Tom) → 2
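The count_if function is used here but not defined in this transcript. The sketch below assumes it starts from 0 and passes the running count and each element to the given function; that signature is inferred from the call above, not confirmed by the source.

```python
def count_Tom(count, x):
    if x == "Tom":
        return count + 1
    else:
        return count


def count_if(items, counter):
    """Hypothetical helper: thread a running count through counter."""
    count = 0
    for x in items:
        count = counter(count, x)
    return count


total = count_if(["Tom", "Batty", "Tom", "Kim"], count_Tom)
print(total)
```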


Python Feature: Ternary If-Else Statement


Quick If-Else Operation

def count_Tom(count, x):
    if x == "Tom":
        return count + 1
    else:
        return count

The above function is long compared to its logic.

In a short form, we can use

count = count + 1 if x == "Tom" else count
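As a sketch, the whole function body can be reduced to one conditional expression. The loop and sample names are for illustration only.

```python
def count_Tom(count, x):
    # "A if condition else B" evaluates to A when the condition is true, else B.
    return count + 1 if x == "Tom" else count


count = 0
for x in ["Tom", "Batty", "Tom", "Kim"]:
    count = count_Tom(count, x)

print(count)
```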


Data as a string or a list of strings


Number and String

Numbers have a string representation, such as

1 → "1"

103 → "103"

32 → "032"

Converting an integer to a string is done by the str() function.

str(1) → "1", str(103) → "103", str(32) → "32", …


Number and String

A string is converted back to a number by the int() or float() function.

Whether to use int or float is up to you, the programmer.

int("32") → 32

int("-33") → -33

"32" + "33" → "3233"

"32" + 33 will raise a TypeError

int("32") + 33 → 65


Number and String

The int() function cannot parse a string that contains a floating-point number.

int("3.24")

ValueError: invalid literal for int() with base 10: '3.24'

If '.' is in a string, use the float() function.

float("3.24") → 3.24

These casting functions only work with valid strings.

int(" 3 4 "), float("3.24 3.25"), int(" x3") → ValueError
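One way to handle strings that may or may not be valid numbers is to catch the ValueError. The helper to_number below is hypothetical and only illustrates the idea: try int() first, then float(), and give up with None.

```python
def to_number(text):
    """Hypothetical helper: int if possible, else float, else None."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None


results = [to_number(s) for s in ["32", "-33", "3.24", " 3 4 ", " x3"]]
total = int("32") + 33

print(results, total)
```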


String as a record

A string is a wonderful record.

Storing a sequence of integers:

my_nums = "1,2,3,4,5"

your_nums = "1 2 3 4 5"

Heterogeneous data tuple:

me = "Joohwi 10/24 5.11"

you = "Tom 3/2 5.6"
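Such record strings can be turned back into typed values with split() and the casting functions. A sketch under the assumption that the last field of me is a plain decimal number; the names nums_a, nums_b, value, and record are illustrative.

```python
my_nums = "1,2,3,4,5"
your_nums = "1 2 3 4 5"
me = "Joohwi 10/24 5.11"

# A sequence of integers, separated by commas or by spaces.
nums_a = [int(n) for n in my_nums.split(",")]
nums_b = [int(n) for n in your_nums.split()]

# A heterogeneous record: keep the text fields, convert the numeric one.
name, date, value = me.split()
record = (name, date, float(value))

print(nums_a, nums_b, record)
```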


String as a collection

A string can contain multiple records.

class_info = "Tom 732 Dave 733 Dorothy 734 … "

another_class_info = "Tom 732, Dave 733, Dorothy 734, …"

maybe_another_info = "Tom 732\nDave 733\nDorothy 734\n…"
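Splitting twice recovers the individual records: once on the record separator and once inside each record. The sketch uses shortened sample strings without the elided entries.

```python
by_comma = "Tom 732, Dave 733, Dorothy 734"
by_line = "Tom 732\nDave 733\nDorothy 734"

# First split into records, then split each record into its fields.
records_a = [tuple(item.split()) for item in by_comma.split(", ")]
records_b = [tuple(line.split()) for line in by_line.split("\n")]

print(records_a)
```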


Find a substring from a string

S = "Romeo, Juliet, Mulan, Fiona"

I want to know whether "Mickey" is included in S.

Easy! Use the in operator.

"Romeo" in S → True

"Jul" in S → True

However, the in operator doesn't tell you where the substring is.

Instead of the in operator, use the find() function.


‘find()’ function

Ask the given string S whether it contains w.

S.find(w) will return the index where w starts.

If w is a substring of S, find() returns its position; otherwise it returns -1.

"Mickey Mouse".find("Mouse") → 7

S = "Mini Mouse"

S.find("Mouse") → 5
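A short sketch contrasting the in operator with find(), using the name list from the previous slide.

```python
S = "Romeo, Juliet, Mulan, Fiona"

has_mickey = "Mickey" in S          # membership only: True or False
has_jul = "Jul" in S

where_juliet = S.find("Juliet")     # index of the first character of the match
where_mickey = S.find("Mickey")     # -1 means "not found"

print(has_mickey, has_jul, where_juliet, where_mickey)
```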


‘replace()’ function

How do we correct a string?

Use the replace(u, v) function.

The replace() function of the string type replaces a substring u with another substring v.

S = "Hello"

S.replace("ello", "ELLO") produces "HELLO"

Note that replace() always creates a new string; S itself is unchanged.
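The sketch below shows that the original string is left untouched and that every occurrence is replaced. The "banana" example is made up.

```python
S = "Hello"
T = S.replace("ello", "ELLO")

# replace() changes every occurrence, not only the first one.
all_changed = "banana".replace("a", "o")

print(S, T, all_changed)
```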


String Formatting

Formatting output is one of the most frequently used operations.

From a tuple T = ("Tom", "Jack", "Kim"),

let's make greetings for each name.

Old way:

print "Hello,", T[0] → Hello, Tom

print "Hello,", T[1] → Hello, Jack


String Formatting

A new way!

Prepare a format string (output pattern)

message_template = "Hello, {name}"

Replace the placeholder with an actual value: Tom, Jack, ..

message_template.replace("{name}", "Tom") → "Hello, Tom"


String Formatting

Assumption: the placeholder text {name} must not appear in the values themselves.

Another example:

form = "{Name}'s score is {Score}"

data = [ ("Tom", 100), ("Jack", 99) ]

The replace() function will help:

form.replace("{Name}", data[0][0]).replace("{Score}", data[0][1]) → "Tom's score is 100" (the intended result)


String Formatting

form.replace("{Name}", data[0][0]).replace("{Score}", data[0][1]) will raise a TypeError instead of producing "Tom's score is 100".

data[0][1] is an integer, and replace() accepts only strings as arguments.

Instead, use the str() function to convert the number into a string.

form.replace("{Name}", data[0][0]).replace("{Score}", str(data[0][1]))
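Applied to every record, the corrected call can be sketched as below. The names messages and error_name are illustrative.

```python
form = "{Name}'s score is {Score}"
data = [("Tom", 100), ("Jack", 99)]

# Convert the score with str() before handing it to replace().
messages = [
    form.replace("{Name}", name).replace("{Score}", str(score))
    for name, score in data
]

# Passing the integer directly fails.
try:
    form.replace("{Score}", data[0][1])
    error_name = None
except TypeError as err:
    error_name = type(err).__name__

print(messages, error_name)
```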


String Formatting

It is inconvenient to call the str() function for every value.

What if a marker in the format string gave a hint about the data type?

Python already has this feature.

The '%' operator for a string will do that.

"%s's score is %d" % ("Tom", 100) → "Tom's score is 100"


String Formatting

The placeholders in a format string are replaced, in order, by the values of the tuple given to the '%' operator.

"%s %s %s" % ("I", "am", "a boy") → "I am a boy"

"%d %s %f" % (3, ">", 2.9) → "3 > 2.900000" (%f prints six decimal places by default; use %.1f to get "3 > 2.9")

'%s', '%d', and '%f' take a string, a decimal integer, and a floating-point number, respectively.
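A short sketch of the three placeholders, including the default precision of %f and a precision specifier.

```python
greeting = "%s's score is %d" % ("Tom", 100)

default_float = "%d %s %f" % (3, ">", 2.9)     # %f uses six decimal places
one_decimal = "%d %s %.1f" % (3, ">", 2.9)     # %.1f keeps one decimal place

print(greeting)
print(default_float)
print(one_decimal)
```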


File

A file is a string stored in external storage.

A file is another source of input.

It is useful for reading large amounts of data automatically.

Otherwise, we would have to type the data into our Python code.

A file is another destination for output.

You can store your data permanently on a disk (HDD).


Data Processing

Data flows from a source to a destination.

Common practice for data processing:

1. Read a file and make a string, or a list of strings separated by lines.

2. Transform each line into a tuple.

   Numeric strings are transformed into a float or an int.

   Date and time strings are transformed into a datetime object.

3. Process those tuples and produce outputs.

4. Format the processed data into a string.
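The four steps above can be sketched end to end. To stay self-contained, the sketch reads from an in-memory io.StringIO object instead of a real file, and the "%Y/%m" date format and the report wording are assumptions chosen for this example.

```python
import io
from datetime import datetime

# Stand-in for a file on disk (assumption: two sample records).
source = io.StringIO("Galaxy S5, 2014/05, 32\niPhone 5s, 2014/05, 108\n")

# 1. Read the source into a list of lines.
lines = [line.strip() for line in source.readlines()]

# 2. Transform each line into a tuple with typed fields.
records = []
for line in lines:
    name, date, amount = line.split(", ")
    records.append((name, datetime.strptime(date, "%Y/%m"), int(amount)))

# 3. Process the tuples.
total = sum(amount for _, _, amount in records)

# 4. Format the result into a string.
report = "Total for %s: %d" % (records[0][1].strftime("%Y/%m"), total)
print(report)
```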


Reading a list of strings from a file


Example Data: CSV format

Name, Date, Amount

Galaxy S5, 2014/05, 32

iPhone 5s, 2014/05, 108

Galaxy Note, 2014/05, 12

iPhone 4, 2014/05, 7

Galaxy S5, 2014/04, 98

Galaxy Note, 2014/04, 1

Moto X, 2014/04, 16

iPhone 5s, 2014/04, 99


Reading a file

A file is an external resource.

In order to read a file, the operating system must help.

File behavior may differ between Mac and Windows.

A file is located by its path name.

Basic knowledge of file systems is assumed in this class.


How to read a file

Python provides the open() function.

open(<string>, <string>,…) → <file object>

The file object provides a set of functions to access a file.


Example of file reading

f = open("test.csv", "r")

lines = f.readlines()

f.close()

The lines variable holds a list of strings, one for each line of the file 'test.csv'.
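A common alternative to calling close() by hand is the with statement, which closes the file automatically. This is a swapped-in technique, not the form used on the slide. The sketch first writes a two-line sample file into a temporary directory so that it can run anywhere; the file content is a shortened version of the example data.

```python
import os
import tempfile

# Create a small sample file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "w") as f:
    f.write("Name, Date, Amount\nGalaxy S5, 2014/05, 32\n")

# The with statement closes the file when the block ends, even after an error.
with open(path, "r") as f:
    lines = f.readlines()

print(lines)
```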


readlines()

>>> print lines
['Name, Date, Amount\n', 'Galaxy S5, 2014/05, 32\n', 'iPhone 5s, 2014/05, 108\n',
 'Galaxy Note, 2014/05, 12\n', 'iPhone 4, 2014/05, 7\n', 'Galaxy S5, 2014/04, 98\n',
 'Galaxy Note, 2014/04, 1\n', 'Moto X, 2014/04, 16\n', 'iPhone 5s, 2014/04, 99\n']


File Processing


First Processing

Remove the leading and trailing whitespace (\n).

The strip() function will do this.

We have to apply the strip() function to every element of the list. How?

This is an example of mapping!

[ x0.strip(), x1.strip(), x2.strip(), …, xn.strip() ] ← [ x0, x1, x2, …, xn ]

Use a list comprehension, or collect_mapping_if().


Remove Whitespaces

lines = [ l.strip() for l in lines ]


String to Words

After removing whitespace, each line should be split into words at the delimiter, which is ', ' here.

Let's do this.

words = [ l.split(", ") for l in lines ]


String to int

We have a list of words for each line.

They are all strings.

To compute with the amount as a number, a type conversion from string to int is required.

The int() function converts a string to an integer.


String to int

Let’s do this

tuples = [ (w[0], w[1], int(w[2])) for w in words[1:] ]

Why words[1:]? What does it mean? (Hint: words[0] is the header row, which has no number to convert.)
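The three processing steps (strip, split, convert) can be run together on a few sample lines. The sketch uses a shortened copy of the example data instead of reading the file, and total is an illustrative extra step.

```python
lines = [
    "Name, Date, Amount\n",
    "Galaxy S5, 2014/05, 32\n",
    "iPhone 5s, 2014/05, 108\n",
]

lines = [l.strip() for l in lines]         # remove the trailing "\n"
words = [l.split(", ") for l in lines]     # one list of words per line

# Skip the header row (words[0]) and convert the amount to an int.
tuples = [(w[0], w[1], int(w[2])) for w in words[1:]]

total = sum(t[2] for t in tuples)
print(tuples, total)
```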


Ready for Processing

Now, we have a proper form of data.

Each individual item has been separated out of its string.

Each number has been converted to an integer so that we can perform arithmetic operations.

We are ready for further analysis.


More Problems

What if a file is too large to load into our memory?

What if data is stored in a different format?

What other formats do we need?

In this class, we will deal with CSV, XML, and Excel.

What if data is scattered across many different files?

What if data items are related to each other?

How could we represent relationships such as a social network?


Writing strings into a file


How to write a file?

Let’s write the contents we have back into a file.

In the same way, open a file.

f = open("testout.csv", "w")

"w" states that the file is opened for writing.


How to write a file?

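Use the write() function of the file object and give it a string.

The string is formatted with the % operator:

"%s, %s, %d" % (t[0], t[1], t[2])

When the % operator is used with a string, it is the format operator, not modulo.

Loop over the converted tuples rather than words, because %d needs an integer and the items in words are still strings:

for t in tuples:
    f.write("%s, %s, %d\n" % (t[0], t[1], t[2]))

f.close()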
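A self-contained sketch of writing the records back out. It assumes a list of already converted tuples (a shortened sample here), writes into a temporary directory rather than the working directory, adds the header line explicitly, and reads the file back to check the result.

```python
import os
import tempfile

# Sample of converted records: (name, date, amount as int).
tuples = [("Galaxy S5", "2014/05", 32), ("iPhone 5s", "2014/05", 108)]

path = os.path.join(tempfile.mkdtemp(), "testout.csv")

# "w" opens the file for writing; the with statement closes it afterwards.
with open(path, "w") as f:
    f.write("Name, Date, Amount\n")
    for name, date, amount in tuples:
        f.write("%s, %s, %d\n" % (name, date, amount))

# Read the file back to confirm what was written.
with open(path) as f:
    written = f.read()

print(written)
```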