stats lecture 01 intro

Upload: katherine-sauer

Post on 06-Apr-2018


  • 8/3/2019 Stats Lecture 01 Intro

    1/28


    6/26/12

    An Introduction to Statistics

    Statistics is a discipline concerned with the

    - collection, classification and interpretation of quantitative data

    - application of probability theory to the analysis and estimation of population parameters

    A statistic is a sample characteristic.

    - a sample is a subset of the population

    - a population is the entire set of objects being studied

    A parameter is a population characteristic.


    A. Random Samples

    When selecting a sample from the population, a random sample should be used.

    - minimize any bias

    simple random sample = sample in which every member of the population has an equal, non-zero and known probability of being selected

    random sample = sample in which every member of the population has a non-zero, known but not necessarily equal probability of being selected
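    As an illustration of the simple random sample definition, here is a minimal Python sketch. The population list of 300 employee labels and the helper name simple_random_sample are assumptions made up for this example; they are not part of the lecture. With 30 draws from 300 members, each member has the same known chance (30/300 = 0.10) of being selected.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population list (labels invented for illustration).
employees = [f"employee_{i}" for i in range(1, 301)]

sample = simple_random_sample(employees, 30, seed=1)
inclusion_probability = 30 / len(employees)  # equal and known for everyone

print(len(sample), inclusion_probability)
```

    Sampling is done without replacement, so no member can appear in the sample twice.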


    Problems with Random Samples:

    1. population list may be unavailable

    2. may be time consuming and/or costly

    3. non-response may be high

    4. may be biased if certain subgroups are numerically small but important to the study


    Stratified Random Sampling

    - ensures key sub-groups in the population are included in the random sample

    First, divide the population into sub-groups (strata).

    Calculate how many observations are needed from each stratum to reflect its proportion in the population.

    Then, choose a random sample of that many observations from each sub-group.


    Example: Suppose a firm's employees are classified in the following way:

    Classification of Employees

    Category          Male   Female   Total
    Management          10       20      30
    Professional        50       40      90
    Administration      40       60     100
    Services            60       20      80
    Total              160      140     300

    This firm wishes to administer an office climate survey to a random sample of 30 employees.

    How many people from each stratum should get the survey?


    Step 1: Calculate the proportion of the population which is required for the sample.

    proportion of the population to be sampled = total sample size / total population size

                                               = 30 / 300

                                               = 0.10

    So, 1/10 of each group needs to be selected for the sample.


    Step 2: Multiply each group's size (from the Classification of Employees table) by the proportion you calculated.

    - round to the nearest whole number

    Number Needed from Each Group

    Category          Male   Female
    Management           1        2
    Professional         5        4
    Administration       4        6
    Services             6        2
    Total               16       14

    16 + 14 = 30
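    Steps 1 and 2 can also be sketched in a few lines of Python. The group sizes come from the Classification of Employees table; the variable names are assumptions chosen for this illustration. Note that Python's round() rounds exact halves to the nearest even number, which does not matter here because every product is a whole number.

```python
# Group sizes from the Classification of Employees table.
population = {
    ("Management", "Male"): 10, ("Management", "Female"): 20,
    ("Professional", "Male"): 50, ("Professional", "Female"): 40,
    ("Administration", "Male"): 40, ("Administration", "Female"): 60,
    ("Services", "Male"): 60, ("Services", "Female"): 20,
}
sample_size = 30

# Step 1: proportion of the population to be sampled.
total_population = sum(population.values())
proportion = sample_size / total_population

# Step 2: number needed from each group, rounded to a whole number.
allocation = {
    group: round(size * sample_size / total_population)
    for group, size in population.items()
}

for (category, sex), needed in allocation.items():
    print(f"{category:15s} {sex:7s} {needed}")
print("Total:", sum(allocation.values()))
```

    In general, rounding can make the group counts add up to slightly more or less than the intended sample size, so it is worth checking the total as the slide does.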


    Step 3: Select the required number from each group randomly.

    Ex: Assign a number between 1 and 10 to each man from management.

    Name        Assigned ID
    Ben         1
    Damian      2
    Greg        3
    Jeremy      4
    Matt        5
    Mohammad    6
    Nick        7
    Simon       8
    Teddy       9
    Will        10

    Use the RANDBETWEEN function in Excel to randomly generate a number between 1 and 10.

    The number that Excel generates will be the ID number of the person chosen for the sample.


    In Excel, when you type =r in any cell, a list of functions that start with the letter r will pop up.


    Since you know you want to use the RANDBETWEEN function, finish typing it and end the command with (1,10), giving =RANDBETWEEN(1,10), to indicate you want a random number between 1 and 10.

    Hit enter to see your random number.

    Looking back at our table, Mohammad (ID 6) would be chosen for the sample.

    We only need one member from this group.
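    For readers working outside Excel, a rough Python counterpart of this step is sketched below. random.randint(1, 10) includes both endpoints, like RANDBETWEEN(1,10). The dictionary and the helper name pick_one are assumptions for illustration only.

```python
import random

# ID numbers assigned to the men in management (from the slide's table).
men_in_management = {
    1: "Ben", 2: "Damian", 3: "Greg", 4: "Jeremy", 5: "Matt",
    6: "Mohammad", 7: "Nick", 8: "Simon", 9: "Teddy", 10: "Will",
}

def pick_one(ids_to_names, rng=random):
    """Generate a random ID between 1 and the group size and look up the name."""
    chosen_id = rng.randint(1, len(ids_to_names))  # both endpoints included
    return chosen_id, ids_to_names[chosen_id]

chosen_id, chosen_name = pick_one(men_in_management)
print(chosen_id, chosen_name)
```

    When a group needs more than one member (for example, 5 professional men), repeated draws can return the same ID twice; random.sample(list(ids_to_names), k) avoids that by drawing without replacement.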


    Cluster Sampling

    Clusters are geographical areas or units like schools, households, etc.

    Once the clusters have been defined, the required number of clusters is selected randomly.

    Then, depending on the nature of the research, all or some of the individuals in each cluster are surveyed. This is called One-Stage Clustering.

    If each cluster is divided into smaller clusters and a random sample of them is chosen, it is called Two-Stage Clustering.

    Multi-Stage Clustering is another option.
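    The difference between one-stage and two-stage clustering can be sketched in Python as follows. The schools, classes and students are invented for this illustration, and the function names one_stage and two_stage are assumptions; in this sketch the one-stage version surveys everyone in the chosen clusters.

```python
import random

# Hypothetical data: schools (clusters) -> classes (smaller clusters) -> students.
schools = {
    f"school_{s}": {
        f"class_{s}{c}": [f"student_{s}{c}_{i}" for i in range(1, 6)]
        for c in "ABCD"
    }
    for s in range(1, 7)
}

def one_stage(clusters, n_clusters, rng):
    """Randomly select clusters, then survey every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person
            for name in chosen
            for members in clusters[name].values()
            for person in members]

def two_stage(clusters, n_clusters, n_subclusters, rng):
    """Randomly select clusters, then randomly select smaller clusters within each."""
    surveyed = []
    for name in rng.sample(sorted(clusters), n_clusters):
        subclusters = clusters[name]
        for sub in rng.sample(sorted(subclusters), n_subclusters):
            surveyed.extend(subclusters[sub])
    return surveyed

rng = random.Random(0)
print(len(one_stage(schools, 2, rng)))     # 2 schools x 4 classes x 5 students
print(len(two_stage(schools, 2, 2, rng)))  # 2 schools x 2 classes x 5 students
```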


    Problems with Stratified and Cluster Sampling:

    A stratified sample suffers from the same problems as a simple random sample.

    Each cluster should be representative of the population, but in reality this may be difficult to achieve.


    B. Non-Random Sampling

    A population list may not be available. The researcher may have to use their judgment to determine the selection of the sample (judgment samples).

    Stratified Quota Sample

    - calculate the number needed from each stratum

    - the selection of individuals from each stratum is not random

    Other

    - self-selected sample

    - focus group

    - opportunity sample


    II. Sorting and Classifying Data

    Qualitative or Categorical Data = defined by some characteristic or quality

    nominal data = a group characteristic like gender or profession

    ordinal data = the result of ranking something in order of preference (e.g. products, TV shows)


    Quantitative or Numeric Data = described numerically by counts or measurements

    discrete = can only take certain distinct values

    e.g. a die can only turn up 1,2,3,4,5 or 6 when thrown once

    continuous = can be any value from a continuous set of values

    e.g. temperature

    - usually round to a specified number of decimal places


    Suppose we have the following raw data on MP3 player sales for the month of January.

    Number of MP3 Players Sold Daily

    30  11  29  34  54  36
    49  31  42  45  25  25
    15  18  13  25  13
    55  55  38  31  43
    38  22  37  20  36

    Let's construct a Frequency Distribution Table in order to make some sense of the numbers.

    Typically between 5 and 20 intervals are chosen.

    Let's choose 10-19, 20-29, 30-39, 40-49 and 50-59 as our intervals.


    First, we need to sort our data. In Excel, click on the cell at the top of your data column. Then, from the Data tab, click Sort.


    Once the data are sorted, tally the frequency of sales in each interval.

    January Sales of MP3 Players

    Daily Sales   Frequency
    10 to 19          5
    20 to 29          6
    30 to 39          9
    40 to 49          4
    50 to 59          3
    Total            27

    Now that we have sorted and tallied our data, we can more easily make observations about it.

    For example, daily sales of 30 to 39 players were the most common, occurring on 9 of the 27 days recorded.
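    The same tally can be reproduced with a short Python sketch. The data are the 27 daily sales figures given earlier and the intervals are the ones chosen above; the function name frequency_table is an assumption for this illustration.

```python
# Raw daily MP3 player sales for January (27 values from the slide).
daily_sales = [
    30, 11, 29, 34, 54, 36, 49, 31, 42, 45, 25, 25, 15, 18,
    13, 25, 13, 55, 55, 38, 31, 43, 38, 22, 37, 20, 36,
]

# Intervals chosen in the lecture; both endpoints are included.
intervals = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]

def frequency_table(values, intervals):
    """Count how many values fall inside each (low, high) interval."""
    table = {interval: 0 for interval in intervals}
    for value in sorted(values):
        for low, high in intervals:
            if low <= value <= high:
                table[(low, high)] += 1
                break
    return table

table = frequency_table(daily_sales, intervals)
for (low, high), count in table.items():
    print(f"{low} to {high}: {count}")
print("Total:", sum(table.values()))
```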


    We can also present the frequency data in a graph.

    Highlight the top left cell.

    Click the Insert tab.

    Select Column.

    Select 2-D.


    You'll need to change the title of your graph and add appropriate axis labels. Also, turn off the legend.

    [Figure: column chart of the frequency data; vertical axis: Frequency]


    Now that we have constructed our graph, we can visually make sense of the data.

    [Figure: Daily MP3 Player Sales in January. Horizontal axis: Number of MP3 Players Sold in a Day; vertical axis: Frequency]


    A histogram is a graphical representation of frequency distributions for numeric data.

    - the area of each rectangle is proportional to the frequency of the interval

    - intervals may be equal or unequal

    - typically no gaps in bars

    [Figure: histogram of Daily MP3 Player Sales in January. Horizontal axis: Number of MP3 Players Sold in a Day; vertical axis: Frequency]

    In an Excel chart, right-click on a bar and select Format Data Series. Select No Gap.


    Example of unequal intervals:

    Total International Emigration by Age: 1995 and 2002
    Outflow (thousands)

    Age            1995    2002
    Under 15       32.6    25
    15 - 24        69.1    91.9
    25 - 44       106.5   186.4
    45 - 64        21      46.2
    65 and over     7.3     9.9

    Why did they choose the ages they did for these intervals?

    The age group Under 15 contains all ages that round to a whole number from 0 up to, but not including, 15. So it has a lower bound of 0 and an upper bound of 14.4999...

    - interval of about 14.5

    The age group 15 - 24 contains all ages that round to a whole number from 15 to 24.

    - interval of 10 (14.5 to 24.4999...)


    The age group 25 - 44 contains all ages that round to a whole number from 25 to 44.

    - interval of 20

    The age group 45 - 64 contains all ages that round to a whole number from 45 to 64.

    - interval of 20

    The age group 65 and over has no upper bound. We may wish to choose a reasonable one. If we choose 84 (reasonable), the interval will be 20.

    There is a way to calculate the height of the histogram bars by hand, but most people simply use the command in their data processing software.
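    The lecture does not spell out the hand calculation. One common approach, assumed here, follows from the rule that each rectangle's area is proportional to its frequency: height = frequency / interval width (often called frequency density). The sketch below applies it to the 1995 emigration figures, using the interval boundaries discussed above and the chosen upper limit of 84 for the last group; the function name bar_heights is hypothetical.

```python
# Class boundaries: 0, 14.5, 24.5, 44.5, 64.5 and 84.5 (84 chosen as the top age).
boundaries = [0, 14.5, 24.5, 44.5, 64.5, 84.5]
age_groups = ["Under 15", "15 - 24", "25 - 44", "45 - 64", "65 and over"]
outflow_1995 = [32.6, 69.1, 106.5, 21.0, 7.3]  # thousands, from the table

def bar_heights(frequencies, boundaries):
    """Height of each bar = frequency / interval width, so area tracks frequency."""
    widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]
    return [freq / width for freq, width in zip(frequencies, widths)]

heights = bar_heights(outflow_1995, boundaries)
for group, height in zip(age_groups, heights):
    print(f"{group:12s} {height:.3f}")
```

    Under this assumption the 15 - 24 group has the tallest bar even though the 25 - 44 group has the largest outflow, because the 25 - 44 interval is twice as wide.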


    A Cumulative Frequency Graph (Ogive) depicts the total number of data that have values less than the upper class boundary of each interval as given in the frequency distribution table.

    Recall our frequency distribution table for MP3 players.

    January Sales of MP3 Players

    Daily Sales   Frequency
    10 to 19          5
    20 to 29          6
    30 to 39          9
    40 to 49          4
    50 to 59          3
    Total            27

    Re-working it, we get:

    Daily Sales   Frequency   Less than   Cumulative frequency
    0 to 9            0          10               0
    10 to 19          5          20               5
    20 to 29          6          30              11
    30 to 39          9          40              20
    40 to 49          4          50              24
    50 to 59          3          60              27
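    The cumulative frequency column is a running total of the frequency column. A minimal Python sketch of that re-working, using the values from the table above (the variable names are assumptions for illustration):

```python
from itertools import accumulate

# "Less than" boundaries and frequencies from the re-worked table.
less_than = [10, 20, 30, 40, 50, 60]
frequencies = [0, 5, 6, 9, 4, 3]

# Running total of the frequencies gives the cumulative frequency.
cumulative = list(accumulate(frequencies))

# Each (boundary, cumulative frequency) pair is one point on the ogive.
ogive_points = list(zip(less_than, cumulative))

for boundary, total in ogive_points:
    print(f"less than {boundary}: {total}")
```

    The last cumulative value equals the total number of observations (27), which is a quick check that no interval was missed.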


    The resulting ogive graph:

    [Figure: Cumulative Frequency of Daily MP3 Player Sales. Horizontal axis: Daily Sales of MP3 Players; vertical axis: Cumulative Frequency]


    Skills:

    - basic terminology of statistics

    - given raw data:

        - perform stratified random sampling

        - construct frequency distribution table

        - construct frequency distribution chart

        - construct ogive graph