MH3400 Lecture 1, Jan 13, 2015

Upload: son-nguyen

Post on 03-Mar-2016

DESCRIPTION

Discrete Math

TRANSCRIPT

  • 1 MH3400 Algorithms for the Real World

    Andrew Lim

  • 2 Course Administration: Instructor Information

    Andrew Lim
    Email: [email protected]
    Office Phone: 6513 8652
    Skype Contact: [email protected] (please identify yourself when contacting me for the first time)

    Lecture: every TUE, 1330-1530 hrs, SPMS LT2 (please take note of some changes)

    Tut/Lab: every FRI, 0830-1030 hrs, COMPLAB1, Wk2-Wk13. Some tut/labs may be taught by my grad students.

  • 3 Course Administration: Grading Criteria

    2-hour Final Written Exam: 50%
    Continual Assessment: 50%
      Written and Lab Assignments (a weight will be given for each assignment)
      Class participation coupon

    Text Books and Materials
    No required textbook. You can use the internet; it is a great source. Below are some texts on algorithms:
      Dasgupta, Papadimitriou and Vazirani, Algorithms, 2006
      Mehlhorn and Sanders, Algorithms and Data Structures: The Basic Toolbox, 2007
      Lehman and Leighton, Mathematics for Computer Science

    More public materials will be forthcoming when required.

  • 4 Schedule and Changes - Lectures

    Jan 13: Lect 1
    Jan 20: Postponed
    Jan 27: Postponed
    Feb 3: Lect 2
    Feb 10: Lect 3
    Feb 17: Lect 4
    Feb 24: Lect 5
    Feb 28: ** Makeup (1000-1200 hrs: Lect 6; 1330-1530 hrs: Lect 7)
    Mar 10: Lect 8
    Mar 17: Lect 9
    Mar 24: Lect 10
    Mar 31: Lect 11
    Apr 7: Lect 12
    Apr 14: Lect 13

    Note that Mar 2-6 is the term break.

  • 5 Schedule - Lab/Tutorial

    Jan 16: No Lab
    Jan 23: Lab 1
    Jan 30: Lab 2
    Feb 6: Lab 3
    Feb 13: Lab 4
    Feb 20: ** Lunar NY
    Feb 27: Lab 5
    Mar 13: Lab 6
    Mar 20: Lab 7
    Mar 27: Lab 8
    Apr 3: ** Good Friday
    Apr 10: Lab 9
    Apr 17: Lab 10

    Note that Mar 2-6 is the term break.

  • 6 Properties of an algorithm?

    Input: zero or more quantities are externally supplied.
    Output: at least one quantity is produced.
    Definiteness: each instruction is clear and unambiguous.
    Finiteness: terminates after a finite number of steps.
    Effectiveness: every instruction must be basic enough that it can be carried out, in principle, by a person using only pencil and paper. It not only needs to be definite; it also has to be feasible.

  • 7 Four distinct areas of study

    How to devise algorithms
    How to validate algorithms
    How to analyze algorithms
    How to test algorithms

  • 8 Sorting

  • 9 Selection Sort // third attempt

    for (i=1; i

  • 10

    Selection Sort // third attempt

    void selectionsort(Type a[], int n)
    // sort the array a[1:n] into non-decreasing order
    {
        for (i=1; i
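
The loop body on this slide is cut off in the transcript. As a minimal sketch of the selection sort idea (repeatedly pick the smallest remaining element and swap it into place), the following assumes a 0-based `std::vector<int>` rather than the slide's `Type a[1:n]`; the name `selectionSortSketch` is hypothetical and this is not necessarily the lecturer's exact code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch only: sorts `a` into non-decreasing order.
// Assumption: 0-based indexing and int elements (the slide uses a[1:n] and Type).
void selectionSortSketch(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        // Find the position of the smallest element in a[i..n-1].
        std::size_t smallest = i;
        for (std::size_t k = i + 1; k < n; ++k) {
            if (a[k] < a[smallest]) smallest = k;
        }
        // Move it to the front of the unsorted part.
        std::swap(a[i], a[smallest]);
    }
}
```

Calling `selectionSortSketch(v)` on a vector sorts it in place; the inner loop always scans the whole unsorted part, which is why the number of comparisons does not depend on the input order.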

  • 11

    Four distinct areas of study

    How to devise algorithms: Selection Sort method, what other methods?
    How to validate algorithms: how to prove correctness?
    How to analyze algorithms: what about performance? Theoretically and in practice?
    How to test algorithms: correctness, performance

  • 12

    Other Sorting Algorithms: Insertionsort, Bubblesort, Mergesort, Quicksort

  • 13

    Insertion Sort

    Input array: 5 2 4 6 1 3

    At each iteration, the array is divided into two sub-arrays: a sorted left sub-array and an unsorted right sub-array.

  • 14

    Insertion Sort
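
The figure for this slide did not survive. A minimal sketch of insertion sort, matching the sorted-left / unsorted-right description, might look as follows; the function name and the use of a 0-based `std::vector<int>` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Sketch only: a[0..i-1] is the sorted left sub-array,
// a[i..n-1] is the unsorted right sub-array.
void insertionSortSketch(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        const int key = a[i];  // first element of the unsorted part
        std::size_t j = i;
        // Shift larger sorted elements one slot to the right.
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;  // insert into its place in the sorted part
    }
}
```

On an already sorted input the inner `while` loop never runs, which is where the linear best case for insertion sort comes from.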

  • 15

    Bubble Sort

    Array state            Pass
    1 3 2 9 6 4 8          i = 1
    3 1 2 9 6 4 8          i = 1
    3 2 1 9 6 4 8          i = 1
    3 2 9 1 6 4 8          i = 1
    3 2 9 6 1 4 8          i = 1
    3 2 9 6 4 1 8          i = 1
    3 2 9 6 4 8 1          i = 1
    3 2 9 6 4 8 1          i = 2
    3 9 6 4 8 2 1          i = 3
    9 6 4 8 3 2 1          i = 4
    9 6 8 4 3 2 1          i = 5
    9 8 6 4 3 2 1          i = 6
    9 8 6 4 3 2 1          i = 7
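
The trace above ends in non-increasing order, with the smallest element sinking to the right end in each pass. A minimal sketch consistent with that trace is given below; it assumes 0-based indexing and swaps an adjacent pair whenever the left element is smaller. The function name is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch only: bubble sort into non-increasing order, as in the slide's trace.
// After pass i, the last i positions hold the i smallest elements.
void bubbleSortDescendingSketch(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t j = 0; j + i < n; ++j) {
            if (a[j] < a[j + 1]) std::swap(a[j], a[j + 1]);
        }
    }
}
```

Flipping the comparison to `a[j] > a[j + 1]` gives the more common non-decreasing variant.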

  • 16

    Mergesort - Divide

    Index:     1 2 3 4 5 6 7 8
    Array:     5 2 4 7 1 3 2 6

    Halves:    [1..4] 5 2 4 7      [5..8] 1 3 2 6
    Quarters:  [1..2] 5 2    [3..4] 4 7    [5..6] 1 3    [7..8] 2 6
    Singles:   [1] 5  [2] 2  [3] 4  [4] 7  [5] 1  [6] 3  [7] 2  [8] 6

  • 17

    Mergesort - Conquer and Merge

    Singles:   [1] 5  [2] 2  [3] 4  [4] 7  [5] 1  [6] 3  [7] 2  [8] 6
    Pairs:     [1..2] 2 5    [3..4] 4 7    [5..6] 1 3    [7..8] 2 6
    Halves:    [1..4] 2 4 5 7      [5..8] 1 2 3 6
    Merged:    [1..8] 1 2 2 3 4 5 6 7
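
The divide and merge steps pictured on these two slides can be sketched as follows. This is a simple copying version written for clarity, not an in-place or tuned implementation; the names `mergeSorted` and `mergeSortSketch` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Merge two sorted vectors into one sorted vector.
std::vector<int> mergeSorted(const std::vector<int>& left,
                             const std::vector<int>& right) {
    std::vector<int> out;
    out.reserve(left.size() + right.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        if (left[i] <= right[j]) out.push_back(left[i++]);
        else out.push_back(right[j++]);
    }
    while (i < left.size()) out.push_back(left[i++]);
    while (j < right.size()) out.push_back(right[j++]);
    return out;
}

// Divide: split in half. Conquer: sort each half recursively. Merge the results.
std::vector<int> mergeSortSketch(const std::vector<int>& a) {
    if (a.size() <= 1) return a;
    const auto mid = static_cast<std::ptrdiff_t>(a.size() / 2);
    const std::vector<int> left(a.begin(), a.begin() + mid);
    const std::vector<int> right(a.begin() + mid, a.end());
    return mergeSorted(mergeSortSketch(left), mergeSortSketch(right));
}
```

The split point does not depend on the values, so the recursion depth is about log n on every input and each level does linear work in the merges.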

  • 18

    Quicksort

    Note: the idea is to take the first number of the array as the pivot.

    Note: the numbers < 5 go into the left array, and the numbers > 5 go into the right array.
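
A minimal sketch of the idea in the notes (first element as pivot, smaller values to a left array, the rest to a right array) is shown below. It copies into temporary vectors for readability instead of partitioning in place, and sending values equal to the pivot to the right array is an assumption, since the notes do not say. The function name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sketch only: quicksort with the first element as pivot.
std::vector<int> quickSortSketch(const std::vector<int>& a) {
    if (a.size() <= 1) return a;
    const int pivot = a.front();
    std::vector<int> left, right;
    for (std::size_t k = 1; k < a.size(); ++k) {
        if (a[k] < pivot) left.push_back(a[k]);
        else right.push_back(a[k]);  // assumption: ties go right
    }
    std::vector<int> sorted = quickSortSketch(left);
    sorted.push_back(pivot);
    const std::vector<int> sortedRight = quickSortSketch(right);
    sorted.insert(sorted.end(), sortedRight.begin(), sortedRight.end());
    return sorted;
}
```

With a first-element pivot, an already sorted input makes one side empty at every level, which is the quadratic worst case.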

  • 19

    Type of Analysis

    Worst case: provides an upper bound on running time.
    Best case: provides a lower bound on running time.
    Average case: provides a prediction about the running time; assumes that the input is random.
    ** Benchmark case: provides a prediction about the running time on cases that are relevant to the problem the algorithm is solving.

  • 20

    What to compare objectively?

    Compare execution times?
    Count the number of statements executed?

    Express running time as a function of the input size n (i.e., f(n)).
    Compare different functions corresponding to running times.
    Such an analysis is independent of machine time, programming style, etc.

  • 21

    Selection Sort
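
The body of this slide did not survive in the transcript. As a hedged illustration of expressing running time as a function of n, one can count the element comparisons made by a standard selection sort on a[1:n]; the symbol C(n) is introduced here for illustration and is not from the slides.

```latex
% Pass i compares the current minimum against a[i+1], ..., a[n],
% i.e. (n - i) comparisons, for i = 1, ..., n - 1.
C(n) = \sum_{i=1}^{n-1} (n - i) = \frac{n(n-1)}{2} = \frac{1}{2}n^2 - \frac{1}{2}n
```

The count is the same for every input of size n, so under this measure the best, average and worst cases of selection sort all grow like n^2.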

  • 22

    Asymptotic Analysis

    To compare two algorithms with running times f(n) and g(n), we need a rough measure that characterizes how fast each function grows.

    We use rate of growth: compare functions in the limit, that is, asymptotically (i.e., for large values of n).

  • 23

    Rate of Growth

    The low-order terms in a function are relatively insignificant for large n:

    $n^4 + 10n^3 + 100n^2 + 1000n + 10 \sim n^4$

    We say that $n^4 + 10n^3 + 100n^2 + 1000n + 10$ and $n^4$ have the same rate of growth.
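
A quick numeric check of this claim, as a sketch: the hypothetical helper below returns the ratio of the full polynomial to its leading term, which should approach 1 as n grows.

```cpp
// Sketch only: ratio (n^4 + 10n^3 + 100n^2 + 1000n + 10) / n^4.
double growthRatio(double n) {
    const double n2 = n * n;
    const double n4 = n2 * n2;
    const double full = n4 + 10.0 * n2 * n + 100.0 * n2 + 1000.0 * n + 10.0;
    return full / n4;
}
```

For small n the low-order terms still matter (at n = 10 each of the first four terms equals 10^4), while for n in the millions the ratio differs from 1 only in the fifth decimal place.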

  • 24

    Asymptotic notation

  • 25

    Asymptotic notation

  • 26

    Asymptotic notation

    $\Theta(g(n))$ is the set of functions with the same order of growth as g(n)
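
The formulas on the three "Asymptotic notation" slides were lost in the transcript. For reference, the standard textbook definitions are given below; it is an assumption that the slides used exactly these forms.

```latex
% Upper bound
O(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that }
          0 \le f(n) \le c\, g(n) \ \text{for all } n \ge n_0 \,\}

% Lower bound
\Omega(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that }
          0 \le c\, g(n) \le f(n) \ \text{for all } n \ge n_0 \,\}

% Tight bound
\Theta(g(n)) = \{\, f(n) : \exists\, c_1, c_2 > 0,\ n_0 > 0 \text{ such that }
          0 \le c_1 g(n) \le f(n) \le c_2 g(n) \ \text{for all } n \ge n_0 \,\}
```

In particular, f(n) is in Θ(g(n)) exactly when it is in both O(g(n)) and Ω(g(n)).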

  • 27

    O-notation

    $n^4 + 10n^3 + 100n^2 + 1000n + 10$ is $O(n^4)$
    12345 is $O(1)$
    $n^2 + 3n$ is $O(n^2)$

    Selection Sort? Bubblesort? Insertion sort? Mergesort? Quicksort? Best case? And average case?

    Note (Best - Average - Worst):
      Selection Sort: $O(n^2)$ - $O(n^2)$ - $O(n^2)$
      Bubble Sort: $O(n)$ - $O(n^2)$ - $O(n^2)$
      Insertion Sort: $O(n)$ - $O(n^2)$ - $O(n^2)$
      Merge Sort: $O(n \log n)$ - $O(n \log n)$ - $O(n \log n)$
      Quick Sort: $O(n \log n)$ - $O(n \log n)$ - $O(n^2)$

  • 28

    Class Exercise

    For each of the following pairs of functions, either f(n) is $O(g(n))$, f(n) is $\Omega(g(n))$, or f(n) = $\Theta(g(n))$. Determine which relationship is correct.

      $f(n) = \log n^2$; $g(n) = \log n + 5$
      $f(n) = n$; $g(n) = \log n^2$
      $f(n) = \log \log n$; $g(n) = \log n$
      $f(n) = n$; $g(n) = \log^2 n$
      $f(n) = n \log n + n$; $g(n) = \log n$
      $f(n) = 10$; $g(n) = \log 10$
      $f(n) = 2^n$; $g(n) = 10n^2$
      $f(n) = 2^n$; $g(n) = 3^n$

  • 29

    Identifying the Repeated Element

    Consider an array a[] of n numbers that has n/2 distinct elements and n/2 copies of another element. Propose an algorithm to find the repeated element.

    Method 1?

    Method 2?

  • 30

    An Algorithm to identify repeated element

    int RepeatedElement(Type a[], int n)
    {
        while (1) {
            int i = random()%n+1;
            int j = random()%n+1;
            // i and j are random numbers in [1,n]
            if ((i!=j) && (a[i]==a[j])) return (i);
        }
    }
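
A self-contained variant of the routine above, as a sketch: it assumes 0-based indexing, `int` elements, and the C++ `<random>` facilities instead of `random()`. Like the slide's version it returns an index, and it loops forever if the input does not actually contain a repeated element. The function name is hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Sketch only: Las Vegas search for the repeated element.
// Precondition (assumed): a has at least two equal entries.
std::size_t repeatedElementIndex(const std::vector<int>& a, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, a.size() - 1);
    while (true) {
        const std::size_t i = pick(rng);
        const std::size_t j = pick(rng);
        // Two different positions holding the same value: found it.
        if (i != j && a[i] == a[j]) return i;
    }
}
```

The answer is always correct when the routine returns; only the number of loop iterations is random, which is what makes this a Las Vegas algorithm.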

  • 31

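    Identifying the Repeated Element

    The probability that, in a given iteration, the repeated elements are found and the loop quits is

    $P = \dfrac{(n/2)(n/2 - 1)}{n^2} \ge \dfrac{1}{5}$ for all $n \ge 10$.

    This means that the probability that it won't quit in a given iteration is $\le \frac{4}{5}$.

    Probability that it won't quit in 10 iterations is $\le (4/5)^{10} < 0.1074$.

    Probability that it won't quit in 100 iterations is $\le (4/5)^{100} < 2.04 \times 10^{-10}$.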
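
The numbers on this slide can be checked with a few lines of arithmetic. The sketch below uses two hypothetical helpers: one evaluates the per-iteration quit probability (n/2)(n/2 - 1)/n^2, the other the bound (4/5)^k on not quitting within k iterations.

```cpp
#include <cmath>

// Sketch only: probability that a single iteration finds the repeated element.
double quitProbability(double n) {
    return (n / 2.0) * (n / 2.0 - 1.0) / (n * n);
}

// Sketch only: upper bound on the probability of not quitting within
// k iterations, valid when quitProbability(n) >= 1/5, i.e. n >= 10.
double failureBound(int k) {
    return std::pow(4.0 / 5.0, k);
}
```

Since the per-iteration probability simplifies to 1/4 - 1/(2n), it rises toward 1/4 as n grows, so the (4/5)^k bound becomes more conservative for large arrays.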

  • 32

    Randomized Algorithm Asymptotic Complexities

    Just as the $O()$ notation is used to characterize the run times of non-randomized algorithms, $\tilde{O}()$ is used for characterizing the run times of Las Vegas algorithms. We say a Las Vegas algorithm has a resource (time, space, etc.) bound of $\tilde{O}(g(n))$ if there exists a constant $c$ such that the amount of resource used by the algorithm (on any input of size $n$) is no more than $c \alpha g(n)$ with probability $\ge 1 - \frac{1}{n^{\alpha}}$. We shall refer to these bounds as high probability bounds.

  • 33

    Identifying the Repeated Element

    For a few million elements, any deterministic algorithm will certainly spend a few million steps, while our simplistic randomized algorithm will almost certainly quit in 100 steps.

    In general, the probability that the algorithm does not quit in the first $c \alpha \log n$ iterations is

    $\le (4/5)^{c \alpha \log n} = n^{-c \alpha \log(5/4)} \le n^{-\alpha}$ if we pick $c \ge \dfrac{1}{\log(5/4)}$.

    Note that $\log x^y = y \log x$ and $a^{\log_b x} = x^{\log_b a}$.

  • 34

    Identifying the Repeated Element

    This means that the algorithm will terminate in $\dfrac{\alpha \log n}{\log(5/4)}$ iterations or less with probability $\ge 1 - n^{-\alpha}$.

    Since each iteration of the while loop takes $O(1)$ time, the run time of this algorithm is $\tilde{O}(\log n)$.
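
To get a feel for the size of this bound, the sketch below evaluates the iteration count alpha * log n / log(5/4), rounded up. The helper name is hypothetical; the base of the logarithm cancels in the ratio, so natural logarithms are used.

```cpp
#include <cmath>

// Sketch only: number of iterations after which the algorithm has
// terminated with probability at least 1 - n^(-alpha).
long iterationBound(double n, double alpha) {
    return static_cast<long>(
        std::ceil(alpha * std::log(n) / std::log(5.0 / 4.0)));
}
```

For an array of a million elements and alpha = 1 this gives a few dozen iterations, compared with the millions of steps a scan of the whole array would take.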

  • 35

    And the Real World

    Math + Computing: a killer combination.
    At about week 7, Algorithms (Lecture) and Real World (Lab/Tut) will meet!
    Real-world dataset from a famous fashion company.
    The dataset has undergone transformation and masking to prevent leakage of company-sensitive information.
    The transformation maintained essential properties of the data for analysis.

  • 36

    Additional Course Materials Repository

    URL: http://data.computational-logistics.org:8081/NTU2015MH3400/

    Username: NTU2015MH3400 Password: diFAcyugLeDu

  • 37

    Data Sample

  • 38

    Assignment 1 - Individual (150 marks)
    Deadline: at the beginning of Lab 1 on Jan 23

    Part 1 (50 marks)
      Visit the URL http://dev.mysql.com/downloads/mysql/
      Download and install the MySQL database on your notebook/computer.
      Visit the URL http://www.heidisql.com/
      Download and install HeidiSQL on your notebook/computer.
      Visit the URL http://data.computational-logistics.org:8081/NTU2015MH3400/
      Download the database dump sku_s_d_f.sql
      Create a new database using HeidiSQL.
      Load the above database dump into HeidiSQL.

  • 39

    Assignment 1 - Individual
    Part 2 (100 marks)

    Write the relevant SQL statements to generate the following, and paste screenshots of the results into a Word document.
      a) Find the # of product groups
      b) Find the # of products
      c) Find the # of sizes
      d) Find the # of different colors
      e) Find the # of stores

  • 40

    Assignment 1 - Individual
    Part 2 (100 marks)
      f) Find, for each store, its total sales/revenue for the years 2012, 2013, 2014
      g) Find the total sales/revenue for each day of the week (i.e., Monday, Tuesday, Wednesday, etc.)
      h) Find the total sales/revenue for 2012, 2013, 2014
      i) Find the total sales/revenue for each month from Jan to Dec for the year 2013
      j) Find the total sales of each store for the year 2013

    The assignment is not simple, start early!

    Note:
      - Remember that price is after-tax and after-discount.
      - year_for_week and year_for_month can differ.
      - age_by_... is computed from the transaction date.