STA 291 Summer 2010 Lecture 1 Dustin Lueker


Page 1: Title slide

STA 291 Summer 2010
Lecture 1
Dustin Lueker

Page 2: Topics

• Statistical terminology
• Descriptive statistics
• Probability and distribution functions
• Inferential statistics
  ◦ Estimation (confidence intervals)
  ◦ Hypothesis testing
• Simple linear regression and correlation

Page 3: Why study Statistics?

• Research in all fields is becoming more quantitative
  ◦ Research journals
  ◦ Most graduates will need to be familiar with basic statistical methodology and terminology
• Newspapers, advertising, surveys, etc.
  ◦ Many statements contain statistical arguments
• Computers make complex statistical methods easier to use

Page 4: Lies, Damn Lies, and Statistics

• Many times statistics are used in an incorrect and misleading manner
• Purposely misused
  ◦ Companies/people wanting to further their agenda
  ◦ Cooking the data: completely making up data
  ◦ Massaging the numbers: altering values to get the desired result
• Accidentally misused
  ◦ Using inappropriate methods
  ◦ Vital to understand a method before using it

Page 5: What is Statistics?

• Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
• Applicable to a wide variety of academic disciplines
  ◦ Physical sciences
  ◦ Social sciences
  ◦ Humanities
• Statistics are used for making informed decisions
  ◦ Business
  ◦ Government

Page 6: General Statistical Methodology

Design
• Planning research studies
• How to best obtain the required data
• Ensuring that our data are representative of the entire population

Description
• Summarizing data
• Exploring patterns in the data
• Extracting/condensing information

Inference
• Making predictions based on the data
• ‘Inferring’ from sample to population
• Summarizing results

Page 7: Basic Terminology

• Population
  ◦ Total set of all subjects of interest: the entire group of people, animals, products, etc. about which we want information
• Elementary Unit
  ◦ Any individual member of the population
• Sample
  ◦ Subset of the population from which the study actually collects information
  ◦ Used to draw conclusions about the whole population

Page 8: Basic Terminology

• Variable
  ◦ A characteristic of a unit that can vary among subjects in the population/sample
  ◦ Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291
• Parameter
  ◦ Numerical characteristic of the population, calculated using the whole population
• Statistic
  ◦ Numerical characteristic of the sample, calculated using the sample
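The difference between a parameter and a statistic can be illustrated with a short Python sketch. The population below is simulated and purely hypothetical (1,000 heights in cm, with a fixed seed so the run is repeatable); the names `population`, `sample`, `population_mean`, and `sample_mean` are chosen for illustration and do not come from the lecture.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 1,000 elementary units.
rng = random.Random(291)
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: numerical characteristic calculated using the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population that the study actually measures.
sample = rng.sample(population, 50)

# Statistic: numerical characteristic calculated using the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, because the whole population cannot be measured; the statistic is what the study can actually compute.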

Page 9: Data Collection and Sampling Theory

• Why take a sample? Why not take a census? Why not measure all of the units in the population?
  ◦ Accuracy: may not be able to find every unit in the population
  ◦ Time: speed of response from units
  ◦ Money
  ◦ Infinite population
  ◦ Destructive sampling or testing

Page 10: Example

• University Health Services at UK conducts a survey about alcohol abuse among students
  ◦ 200 of the students are sampled and asked to complete a questionnaire
  ◦ One question is “Have you regretted something you did while drinking?”
• What is the population?
• What is the sample?

Page 11: ‘Flavors’ of Statistics

• Descriptive Statistics
  ◦ Summarizing the information in a collection of data
• Inferential Statistics
  ◦ Using information from a sample to make conclusions/predictions about the population
  ◦ Ex: using a sample statistic to estimate a population parameter

Page 12: Example

• The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
• It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level
  ◦ Are these numbers statistics or parameters?
• The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%
  ◦ Is this an example of descriptive or inferential statistics?
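A small Python sketch can relate the two “FH” figures in this example. It uses only the numbers on the slide; the variable names are illustrative, and how the report derived its range is not stated in the lecture, so the code checks only the arithmetic.

```python
# Figures reported on the slide for "FH" families (percent below poverty level).
sample_percent = 26.5             # computed from the surveyed households
lower, upper = 25.5, 27.5         # range stated for all FH families in the USA

midpoint = (lower + upper) / 2    # centre of the stated range
half_width = (upper - lower) / 2  # distance from the centre to either bound

print(midpoint == sample_percent)  # True: the range is centred on the sample figure
print(half_width)                  # 1.0 percentage point on either side
```

The range is the sample figure plus or minus one percentage point, which is the usual shape of a statement that uses a sample to say something about the whole population.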

Page 13: Scales of Measurement

• Quantitative or Numerical
  ◦ Variables with numerical values associated with them
• Qualitative or Categorical
  ◦ Variables without numerical values associated with them

Page 14: Qualitative Variables

• Ordinal
  ◦ Ex: disease status, company rating, grade in STA 291
  ◦ Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
  ◦ One unit can have more of a certain property than another unit
• Nominal
  ◦ Ex: gender, nationality, hair color, state of residence
  ◦ Nominal variables have a scale of unordered categories
  ◦ It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
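The remark that ordinal grades are often treated quantitatively can be sketched in Python. Only A = 4.0 and B = 3.0 come from the slide; the values for C, D, and E below assume the usual 4-point pattern, and the list of grades is hypothetical.

```python
from statistics import mean

# A = 4.0 and B = 3.0 come from the slide; the remaining values assume the
# usual 4-point pattern and are illustrative only.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.0}

grades = ["B", "A", "C", "A"]  # hypothetical grades for one student

# Ordinal use: only the order of the categories matters.
best_to_worst = sorted(grades, key=GRADE_POINTS.get, reverse=True)

# Quantitative use: treat the codes as numbers and average them.
gpa = mean(GRADE_POINTS[g] for g in grades)

print(best_to_worst)  # ['A', 'A', 'B', 'C']
print(gpa)            # 3.25
```

Sorting needs only the ordering of the categories. Averaging additionally assumes that the gap between A and B is the same size as the gap between B and C, which the ordinal scale itself does not guarantee.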

Page 15: Quantitative Variables

• Quantitative
  ◦ Ex: age, income, height
  ◦ Quantitative variables are measured numerically; that is, for each subject a number is observed
  ◦ The scale for quantitative variables is called the interval scale

Page 16: Example

• A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following
  ◦ Nominal (Qualitative): Requires assistance from staff? (Yes, No)
  ◦ Ordinal (Qualitative): Plaque score (No visible plaque, Small amounts of plaque, Moderate amounts of plaque, Abundant plaque)
  ◦ Interval (Quantitative): Number of teeth

Page 17: Example

• A birth registry database collects the following information on newborns
  ◦ Birth weight: in grams
  ◦ Infant’s condition: Excellent, Good, Fair, Poor
  ◦ Number of prenatal visits
  ◦ Ethnic background: African-American, Caucasian, Hispanic, Native American, Other
• What are the appropriate scales?
  ◦ Quantitative (Interval)
  ◦ Qualitative (Ordinal, Nominal)

Page 18: Importance of Different Types of Data

• Statistical methods vary for quantitative and qualitative variables
• Methods for quantitative data cannot be used to analyze qualitative data
• Quantitative variables can be treated in a less quantitative manner
  ◦ Height, measured in cm/in, is Interval (Quantitative)
  ◦ It can also be treated as Qualitative
  ◦ Ordinal: Short, Average, Tall
  ◦ Nominal: “<60in or >72in” vs. “60in-72in”
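The height example can be sketched in Python as two recoding functions. The 60 in and 72 in cutoffs for the nominal grouping come from the slide; reusing them for Short/Average/Tall is an assumption, since the slide gives no cutoffs for the ordinal version. The function names and sample heights are hypothetical.

```python
def height_ordinal(height_in: float) -> str:
    """Ordered categories; the 60 in and 72 in cutoffs are assumed here."""
    if height_in < 60:
        return "Short"
    if height_in <= 72:
        return "Average"
    return "Tall"


def height_nominal(height_in: float) -> str:
    """Two unordered groups, as on the slide."""
    return "60in-72in" if 60 <= height_in <= 72 else "<60in or >72in"


heights = [58.0, 66.5, 74.0]  # hypothetical measurements in inches
print([height_ordinal(h) for h in heights])  # ['Short', 'Average', 'Tall']
print([height_nominal(h) for h in heights])
# ['<60in or >72in', '60in-72in', '<60in or >72in']
```

Each recoding discards information: the original measurement cannot be recovered from its category, which is why the move only goes from quantitative to qualitative and not back.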

Page 19: Other Notes on Variable Types

• Try to measure variables in as much detail as possible
  ◦ Quantitative: more detailed data can be analyzed in further depth
  ◦ Caution: sometimes ordinal variables are treated as quantitative (ex: GPA)

Page 20: Discrete Variables

• A variable is discrete if it can take on a finite number of values
  ◦ Gender
  ◦ Nationality
  ◦ Hair color
  ◦ Disease status
  ◦ Grade in STA 291
  ◦ Favorite MLB team
• Qualitative variables are discrete

Page 21: Continuous Variables

• Continuous variables can take an infinite continuum of possible real number values
  ◦ Ex: time spent studying for STA 291 per day (43 minutes, 2 minutes, 27.487 minutes, 27.48682 minutes)
  ◦ Can be subdivided into more precise values, therefore continuous

Page 22: Examples

• Number of children in a family
• Distance a car travels on a tank of gas
• % grade on an exam

Page 23: Discrete or Continuous

• Quantitative variables can be discrete or continuous
• Age, income, height?
  ◦ Depends on the scale
  ◦ Age is potentially continuous, but usually measured in years (discrete)
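The point about age can be sketched in Python. The exact ages below are hypothetical; the sketch shows how recording age in completed years turns a potentially continuous measurement into a discrete one.

```python
import math

# Hypothetical exact ages in years (continuous: any real value is possible).
exact_ages = [20.4375, 20.9, 21.0, 21.75]

# Age as usually recorded: completed whole years (discrete).
recorded_ages = [math.floor(age) for age in exact_ages]

print(recorded_ages)            # [20, 20, 21, 21]
print(len(set(exact_ages)))     # 4 distinct exact values
print(len(set(recorded_ages)))  # 2 distinct recorded values
```

Four different exact ages collapse to two recorded values, so whether the variable behaves as discrete or continuous depends on the scale on which it is measured.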