Knowledge Representation in Digital Humanities
Antonio Jiménez Mavillard
Department of Modern Languages and Literatures
Western University
Lecture 5
Knowledge Representation in Digital Humanities, Antonio Jiménez Mavillard
* Contents:
  1. Why this lecture?
  2. Discussion
  3. Chapter 5
  4. Assignment
  5. Bibliography
Why this lecture?
* This lecture...
  · goes deeper into the development of programming skills
  · introduces strings as a means of text representation, the subject of study for the rest of the course
Last assignment discussion
* Time to...
  · consolidate the ideas and concepts covered in the readings
  · discuss issues that arose in the specific solutions to the projects
Chapter 5
Text Representation in Python
1. More programming in Python
2. Complex data types
Chapter 5
1 More programming in Python
  1.1 Functions
  1.2 Basic data types
Chapter 5
2 Complex data types
  2.1 Strings
More programming in Python
Functions
* Debugging
  · Syntax errors
    + missing colon at the end of the def line
    + wrong indentation inside the def body
  · Logic errors
    + infinite recursion
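A minimal sketch of the infinite-recursion logic error listed above, and one way to fix it. The function names countdown_broken and countdown are hypothetical, chosen only for illustration. The print calls use parentheses so that the block also runs under Python 3.

```python
def countdown_broken(n):
    # Bug: there is no base case, so the function keeps calling itself.
    return countdown_broken(n - 1)


def countdown(n):
    # Fixed: the base case stops the recursion once n reaches 0.
    if n <= 0:
        return 0
    return countdown(n - 1)


try:
    countdown_broken(3)
    outcome = "finished"
except RuntimeError:
    # Python interrupts runaway recursion with a RuntimeError
    # (in Python 3, its subclass RecursionError).
    outcome = "maximum recursion depth exceeded"

print(outcome)
print(countdown(3))
```

The broken version is syntactically valid, which is why this counts as a logic error rather than a syntax error.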
Functions
* Definition
  · A function is a named sequence of statements that performs a task
  · To use a function:
    1. Define it
    2. Call it
Functions
* Definition
  · Syntax:
    + Definition

    def function_name(parameters):   # definition
        statements                   # body
Functions
* Definition
  · Syntax:
    + Call

    function_name(arguments)
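A minimal sketch that puts the two steps together: first the definition, then the call. The function greet and its parameter name are hypothetical examples, not part of the course material.

```python
# Step 1: define the function.
def greet(name):
    # Body: build a greeting and hand it back to the caller.
    return "Hello, " + name + "!"


# Step 2: call it. The body runs with name set to "Ada".
message = greet("Ada")
print(message)
```

Nothing happens when the definition is read; the body only runs when the function is called.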
Functions
* Arguments vs parameters
  · The arguments are the values passed in the function call
  · The arguments are assigned to the parameters, the names listed in the function definition
Functions
* Arguments vs parameters
  · Example:

    # print version (mean1.py)
    def mean(x, y):
        print (x + y) / 2

    In [1]: from mean1 import mean

    In [2]: mean(2, 4)
    3

    In [3]:

Functions
* Arguments vs parameters
  · Example:
    + The parameter x takes the value of the first argument, 2
    + The parameter y takes the value of the second argument, 4
    + The function calculates (2 + 4) / 2 and prints the result
Functions
* Scope
  · A block is a section of code consisting of one or more statements grouped together
  · Examples: the branches of an if statement, the code inside for and while loops, the body of a function...
Functions
* Scope
  · Variables created in a function are local to the function and do not exist outside it
  · Two ways to communicate with the outside:
    + arguments (input)
    + return statement (output)
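A minimal sketch of local scope, assuming a hypothetical function double. It shows that a variable created inside a function cannot be reached from outside, and that the argument and the return statement are the channels in and out.

```python
def double(x):
    # result is created inside the function, so it is local to it.
    result = x * 2
    return result


# Input goes in through the argument, output comes back through return.
value = double(21)
print(value)

try:
    result                 # the local name does not exist out here
    visible = True
except NameError:
    visible = False

print(visible)
```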
Functions
* Scope
  · Example:

    # print version (mean1.py)
    def mean(x, y):
        print (x + y) / 2

    # return version (mean2.py)
    def mean(x, y):
        return (x + y) / 2

    In [1]: from mean1 import mean

    In [2]: m = mean(2, 4)
    3

    In [3]: m

    In [4]: from mean2 import mean

    In [5]: m = mean(2, 4)

    In [6]: m
    Out[6]: 3

    In [7]:
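A minimal sketch of why the session above shows nothing for m after calling the print version. The names mean_print and mean_return are hypothetical, used here only so that both versions can live in one file. The code is written to run under Python 3, where print is a function and / produces a float.

```python
def mean_print(x, y):
    # Shows the result on screen but gives nothing back to the caller.
    print((x + y) / 2)


def mean_return(x, y):
    # Hands the result back, so the caller can store it or reuse it.
    return (x + y) / 2


m1 = mean_print(2, 4)    # prints the mean; m1 is None
m2 = mean_return(2, 4)   # prints nothing; m2 holds the mean

print(m1 is None)
print(m2 + 1)
```

A function without a return statement returns None, so only the return version can be used inside a larger expression.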
Functions
* Exercise 1
  · An integer y is a divisor of the integer x if the remainder of the division x / y is equal to 0
  · An integer is prime if it is greater than 1 and has no divisors other than 1 and itself
Functions
* Exercise 1
  · Write a function that prints the list of prime numbers less than 100
Functions
* Exercise 1 (solution)

    # prime numbers: print every prime from 1 to n
    def prime_list(n):
        i = 1
        while i <= n:
            if is_prime(i):
                print i
            i = i + 1

Functions
* Exercise 1 (solution)

    def is_prime(n):
        result = n > 1   # a prime must be greater than 1
        i = 1
        while i <= n:
            # a divisor other than 1 and n itself means n is not prime
            if is_divisor(i, n) and i != 1 and i != n:
                result = False
                break
            i = i + 1
        return result

Functions
* Exercise 1 (solution)

    def is_divisor(x, y):
        # True if x is a divisor of y
        return y % x == 0

Functions
* Exercise 1 (solution)

    In [1]: from prime import prime_list

    In [2]: prime_list(20)
    2
    3
    5
    7
    11
    13
    17
    19

    In [3]:
Functions
* About functions
  · There are predefined functions ready to be used
  · Programmers can define new functions
  · A function can call other functions
Functions
* Why functions?
  · Functions are reusable, so they make a program shorter by eliminating repetitive code
  · Long programs divided into functions are easier to write, read, understand and debug
References
Downey, Allen. “Chapter 3: Functions.” Think Python. Sebastopol, CA: O’Reilly, 2012. Print.
Basic data types
* Debugging
  · Logic errors
    + mistaking the data type of a variable
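A minimal sketch of this kind of logic error, assuming two hypothetical values that arrive as text (for example, read from a file). The program runs without complaint, but the + operator does something different depending on the type of its operands.

```python
first = "3"    # strings, not numbers
second = "4"

joined = first + second             # + concatenates strings
total = int(first) + int(second)    # + adds numbers

print(joined)
print(total)
```

Checking a suspicious variable with the type function is a quick way to spot this mistake.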
Basic data types
* int
  · Type for integer numbers
  · Examples: 1, 1234567890
* long
  · Type for long integers
  · Examples: 10**1000 (a one followed by a thousand zeros)
Basic data types
* float
  · Type for floating-point numbers
  · Examples: 1.0, 3.1416
* bool
  · Type for logical values
  · Examples: True, False
Basic data types
* The type function
  · Returns the type of a value, variable or expression

    In [1]: type(1)
    Out[1]: int

    In [2]: x = 10**1000 + 1

    In [3]: type(x)
    Out[3]: long

    In [4]:
Basic data types
* The type function
  · Returns the type of a value, variable or expression

    In [4]: y = 3.1 + 2.21

    In [5]: type(y)
    Out[5]: float

    In [6]: type(x == y)
    Out[6]: bool

    In [7]:
Basic data types
* Type conversion functions
  · int: converts to int (if possible)

    In [1]: int("123")
    Out[1]: 123

    In [2]: int(3.1416)
    Out[2]: 3

    In [3]:
Basic data types
* Type conversion functions
  · float: converts to float (if possible)

    In [1]: float(123)
    Out[1]: 123.0

    In [2]: float('3.1416')
    Out[2]: 3.1416

    In [3]:
Basic data types
* Type conversion functions
  · bool: converts to bool (if possible)

    In [1]: bool([1, 2, 3])
    Out[1]: True

    In [2]: bool(0)
    Out[2]: False

    In [3]:
Basic data types
* Type conversion functions
  · str: converts to str (if possible)

    In [1]: str(123)
    Out[1]: '123'

    In [2]: str(not True)
    Out[2]: 'False'

    In [3]:
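A minimal sketch of what "if possible" means for the conversion functions: when a string does not look like a number, int raises a ValueError. The helper to_int is hypothetical, written only to make the failure visible without stopping the program.

```python
def to_int(text):
    # Hypothetical helper: convert text to int,
    # or return None when the conversion is not possible.
    try:
        return int(text)
    except ValueError:
        return None


print(to_int("123"))
print(to_int("12a"))

# int does not accept a string with a decimal point,
# so the text is converted to float first.
print(int(float("3.1416")))
```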
References
“5. Built-in Types — Python v2.7.6 Documentation.” N. p., n.d. Web. 17 Feb. 2014.
Complex data types
Strings
* Debugging
  · Syntax errors
    + not closing the quotes ('' or "")
  · Semantic errors
    + not accessing the first and/or last element
Strings
* Debugging
  · Logic errors
    + modifying an element
    + accessing a non-existent element
      - index out of range
Strings
* str
  · Type for strings
  · Examples: 'hello world!', "hello world!"
  · A string is a sequence of characters
  · Suitable for representing texts
Strings
* Indices
  · Three ways to access a string:
    + As a whole
      - Example: word
    + Its characters one at a time
      - Syntax: string[index]
      - Example: word[1]
Strings
* Indices
  · Three ways to access a string:
    + Slices
      - Syntax: string[index_1:index_2]
      - Example: word[2:5]
Strings
* Exercise 2
  · Figure out the range of indices for a string
  · Try out several examples
  · Extract a general pattern for any string
Strings
* Exercise 2 (solution)

    In [1]: s = 'digital'

    In [2]: s[1]
    Out[2]: 'i'

    In [3]: s[0]
    Out[3]: 'd'

    In [4]: s[7]
    IndexError: string index out of range

    In [5]: s[6]
    Out[5]: 'l'

    In [6]:
Strings
* Exercise 2 (solution)

    In [6]: s = 'humanities'

    In [7]: s[1]
    Out[7]: 'u'

    In [8]: s[0]
    Out[8]: 'h'

    In [9]: s[10]
    IndexError: string index out of range

    In [10]: s[9]
    Out[10]: 's'

    In [11]:
Strings
* Exercise 2 (solution)
  · From 0 to the string's number of characters minus 1
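A minimal sketch of the general pattern, using the built-in len function to obtain the number of characters. The variable names first and last are chosen for illustration only.

```python
word = 'humanities'

first = 0               # the first valid index is always 0
last = len(word) - 1    # the last valid index is the length minus 1

print(word[first])
print(word[last])

try:
    word[last + 1]      # one past the end
    in_range = True
except IndexError:
    in_range = False

print(in_range)
```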
Strings
* Indices

    word = 'digital'

    d  i  g  i  t  a  l
    0  1  2  3  4  5  6

    In [1]: word = 'digital'

    In [2]: word
    Out[2]: 'digital'

    In [3]: word[1]
    Out[3]: 'i'

    In [4]: word[2:5]
    Out[4]: 'git'
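A short sketch of how slice boundaries work: the character at index_1 is included and the character at index_2 is not. The forms with an omitted index are standard Python but go slightly beyond what the slides show, so treat them as an extra.

```python
word = 'digital'

middle = word[2:5]   # characters at indices 2, 3 and 4
start = word[:3]     # omitted first index: from the beginning
end = word[3:]       # omitted second index: up to the end

print(middle)
print(start)
print(end)
print(start + end == word)
```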
Strings
* Immutability
  · Strings are immutable (cannot be modified)
  · To modify a string, it is necessary to assign the changed value to a new (or the same) variable
Strings
* Immutability
  · Example: word += 's'
    + Equivalent to word = word + 's'
    + Accesses the value of the variable word, concatenates an s, and reassigns the result to the variable word
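A minimal sketch of immutability in practice: assigning to a single character fails with a TypeError, while building a new string and reassigning it works. The flag changed_in_place is a hypothetical name used only to record what happened.

```python
word = 'digital'

try:
    word[0] = 'D'            # item assignment is not allowed on strings
    changed_in_place = True
except TypeError:
    changed_in_place = False

# Build a new string and reassign it to the same variable.
# word[1:] is everything from index 1 onwards.
word = 'D' + word[1:]

print(changed_in_place)
print(word)
```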
Strings
* Some functions and operators
  · The len function returns the number of characters in a string

    In [1]: len('digital')
    Out[1]: 7

    In [2]:
Strings
* Exercise 3
  · Write a function that receives a string and returns the number of characters (do not use the len function and do use a for loop)
Strings
* Exercise 3 (solution)

    def count(s):
        counter = 0
        for c in s:
            counter += 1
        return counter
Strings
* Exercise 4
  · What does this function do?

    def any_function(string, char):
        result = -1
        index = 0
        while index < len(string):
            if string[index] == char:
                result = index
                break
            index += 1
        return result
Strings
* Exercise 4 (solution)
  · It returns the (first) index of a character in a string, or -1 if the character is not found
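A short check of that answer against Python's built-in string method find, which follows the same convention of returning -1 when nothing is found. The function is repeated here under the hypothetical name first_index so that the block is self-contained.

```python
def first_index(string, char):
    # Scan from left to right and stop at the first match.
    result = -1
    index = 0
    while index < len(string):
        if string[index] == char:
            result = index
            break
        index += 1
    return result


print(first_index('digital', 'i'))
print(first_index('digital', 'z'))
print(first_index('digital', 'i') == 'digital'.find('i'))
```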
Strings
* Exercise 5
  · Write a function that counts the number of occurrences of a character in a string
Strings
* Exercise 5 (solution)

    def count(s, ch):
        counter = 0
        for c in s:
            if c == ch:
                counter += 1
        return counter
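A short comparison of this solution with the built-in string method count, which gives the same answer for single characters. The function is repeated under the hypothetical name count_char so that the block runs on its own.

```python
def count_char(s, ch):
    # Walk through the string and count the matching characters.
    counter = 0
    for c in s:
        if c == ch:
            counter += 1
    return counter


print(count_char('humanities', 'i'))
print('humanities'.count('i'))
```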
Strings
* Some functions and operators
  · The in operator checks whether a string is contained in another string

    In [1]: s = 'abcde'

    In [2]: 'bc' in s
    Out[2]: True

    In [3]: 'rs' in s
    Out[3]: False

    In [4]:
References
Downey, Allen. “Chapter 8: Strings.” Think Python. Sebastopol, CA: O’Reilly, 2012. Print.
Assignment
* Assignment 5: Lexicon
  · Readings
    + Word play (Think Python)
    + Files (Think Python)
Assignment
* Assignment 5: Lexicon
  · Project
    + Grady Ward, as part of the Moby lexicon project, has collected a list of 113,809 official crosswords; that is, words that are considered valid in crossword puzzles and other word games
Assignment
* Assignment 5: Lexicon
  · Project
    + Download a copy of the word list from http://thinkpython.com/code/words.txt
Assignment
* Assignment 5: Lexicon
  · Project
    + Many words in English have endings (suffixes) that identify them as nouns
    + Some of the suffixes common to nouns are (non-exhaustive list):
Assignment
* Assignment 5: Lexicon
  · Project
    -age, -ance, -ant, -cy, -dom, -ee, -ence, -ent, -er, -hood, -ing, -ism, -ist, -ity, -ment, -ness, -or, -ry, -ship, -sion, -tion, -tude
Assignment
* Assignment 5: Lexicon
  · Project
    + Write a program that reads the file words.txt and:
      - prints all the nouns (according to the previous list)
      - prints the number of nouns
      - prints the total number of words
Assignment
* Assignment 5: Lexicon
  · Project
    + Write a program that reads the file words.txt and:
      - calculates and prints the proportion (expressed in % with decimals) of nouns with respect to the total number of words
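A possible starting point for the suffix test and the percentage, shown on a small made-up list instead of words.txt. The names NOUN_SUFFIXES, looks_like_noun and noun_percentage are hypothetical, and reading the file is left out on purpose. This is a sketch under those assumptions, not the expected solution.

```python
# Suffixes copied from the list in the assignment.
NOUN_SUFFIXES = ('age', 'ance', 'ant', 'cy', 'dom', 'ee', 'ence', 'ent',
                 'er', 'hood', 'ing', 'ism', 'ist', 'ity', 'ment', 'ness',
                 'or', 'ry', 'ship', 'sion', 'tion', 'tude')


def looks_like_noun(word):
    # str.endswith accepts a tuple and is True if any suffix matches.
    return word.endswith(NOUN_SUFFIXES)


def noun_percentage(words):
    # Percentage (with decimals) of words that end in a noun suffix.
    if not words:
        return 0.0
    nouns = 0
    for word in words:
        if looks_like_noun(word):
            nouns += 1
    return 100.0 * nouns / len(words)


sample = ['kingdom', 'happiness', 'quickly', 'blue']   # made-up sample
print(noun_percentage(sample))
```

Multiplying by 100.0 rather than 100 keeps the decimals under Python 2.7 as well, where dividing two integers would drop them.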
References
Downey, Allen. “Chapter 14: Files.” Think Python. Sebastopol, CA: O’Reilly, 2012. Print.
Downey, Allen. “Chapter 9: Case Study - Word Play.” Think Python. Sebastopol, CA: O’Reilly, 2012. Print.
“Moby Project.” Wikipedia, the free encyclopedia 19 Jan. 2014. Wikipedia. Web. 20 Feb. 2014.
Bibliography
“5. Built-in Types — Python v2.7.6 Documentation.” N. p., n.d. Web. 17 Feb. 2014.
Downey, Allen. Think Python. Sebastopol, CA: O’Reilly, 2012. Print.
“Moby Project.” Wikipedia, the free encyclopedia 19 Jan. 2014. Wikipedia. Web. 20 Feb. 2014.