
Post on 16-Jan-2016


STA 291 Summer 2010

Lecture 1
Dustin Lueker

Topics
• Statistical terminology
• Descriptive statistics
• Probability and distribution functions
• Inferential statistics
  ◦ Estimation (confidence intervals)
  ◦ Hypothesis testing
• Simple linear regression and correlation

STA 291 Summer 2010 Lecture 1

Why study Statistics?
• Research in all fields is becoming more quantitative
  ◦ Research journals
  ◦ Most graduates will need to be familiar with basic statistical methodology and terminology
• Newspapers, advertising, surveys, etc.
  ◦ Many statements contain statistical arguments
• Computers make complex statistical methods easier to use


Lies, Damn Lies, and Statistics
• Many times statistics are used in an incorrect and misleading manner
• Purposely misused
  ◦ Companies/people wanting to further their agenda
    ◦ Cooking the data: completely making up data
    ◦ Massaging the numbers: altering values to get the desired result
• Accidentally misused
  ◦ Using inappropriate methods
    ◦ Vital to understand a method before using it


What is Statistics?
• Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
• Applicable to a wide variety of academic disciplines
  ◦ Physical sciences
  ◦ Social sciences
  ◦ Humanities
• Statistics are used for making informed decisions
  ◦ Business
  ◦ Government


General Statistical Methodology
• Design
  ◦ Planning research studies
  ◦ How to best obtain the required data
  ◦ Ensuring that our data are representative of the entire population
• Description
  ◦ Summarizing data
  ◦ Exploring patterns in the data
  ◦ Extracting/condensing information
• Inference
  ◦ Making predictions based on the data
  ◦ ‘Inferring’ from sample to population
  ◦ Summarizing results


Basic Terminology
• Population
  ◦ Total set of all subjects of interest
    ◦ Entire group of people, animals, products, etc. about which we want information
• Elementary Unit
  ◦ Any individual member of the population
• Sample
  ◦ Subset of the population from which the study actually collects information
  ◦ Used to draw conclusions about the whole population


Basic Terminology
• Variable
  ◦ A characteristic of a unit that can vary among subjects in the population/sample
    ◦ Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291
• Parameter
  ◦ Numerical characteristic of the population
    ◦ Calculated using the whole population
• Statistic
  ◦ Numerical characteristic of the sample
    ◦ Calculated using the sample
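The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of ages, its size, the sample size, and the seed are all made-up assumptions, not data from the lecture.

```python
import random
from statistics import mean

random.seed(291)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 subjects (made-up values).
population = [random.randint(18, 30) for _ in range(10_000)]

# Parameter: numerical characteristic calculated using the whole population.
parameter = mean(population)

# Statistic: numerical characteristic calculated using only a sample.
sample = random.sample(population, 200)  # 200 units drawn without replacement
statistic = mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two printed values are usually close but not identical. In practice the parameter is typically unknown, and the statistic is what the study can actually compute.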


Data Collection and Sampling Theory
• Why take a sample? Why not take a census? Why not measure all of the units in the population?
  ◦ Accuracy
    ◦ May not be able to find every unit in the population
  ◦ Time
    ◦ Speed of response from units
  ◦ Money
  ◦ Infinite Population
  ◦ Destructive Sampling or Testing


Example
• University Health Services at UK conducts a survey about alcohol abuse among students
  ◦ 200 of the students are sampled and asked to complete a questionnaire
  ◦ One question is “have you regretted something you did while drinking?”
• What is the population? What is the sample?


‘Flavors’ of Statistics
• Descriptive Statistics
  ◦ Summarizing the information in a collection of data
• Inferential Statistics
  ◦ Using information from a sample to draw conclusions/make predictions about the population
    ◦ Ex: using a sample statistic to estimate a population parameter


Example
• The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
• It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level
  ◦ Are these numbers statistics or parameters?
• The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%
  ◦ Is this an example of descriptive or inferential statistics?
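One way to read the last bullet: the reported interval looks like the sample figure for “FH” families plus or minus one percentage point. The short Python sketch below only checks that arithmetic using the numbers on the slide. The variable names are illustrative, and the slide does not say how the survey computed its margin.

```python
# Percentages from the slide (Current Population Survey, 2002).
# They come from the surveyed households, i.e. from a sample.
sample_pct_below_poverty = {"MC": 5.3, "FH": 26.5, "MH": 12.1}

# Interval the report states for ALL "FH" families in the USA.
reported_low, reported_high = 25.5, 27.5

point_estimate = sample_pct_below_poverty["FH"]
center = (reported_low + reported_high) / 2   # midpoint of the interval
margin = (reported_high - reported_low) / 2   # half-width of the interval

print(f"sample figure: {point_estimate:.1f}%")
print(f"interval center: {center:.1f}%, margin: {margin:.1f} percentage points")
```

The interval is centered on the sample figure, which is consistent with a statement about the whole population being built from a sample value plus a margin of error.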


Scales of Measurement
• Quantitative or Numerical
  ◦ Variables with numerical values associated with them
• Qualitative or Categorical
  ◦ Variables without numerical values associated with them


Qualitative Variables
• Ordinal
  ◦ Disease status, company rating, grade in STA 291
    ◦ Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
    ◦ One unit can have more of a certain property than another unit
• Nominal
  ◦ Gender, nationality, hair color, state of residence
    ◦ Nominal variables have a scale of unordered categories
    ◦ It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
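The ordinal/nominal distinction can be mirrored in code. The Python sketch below is an illustration under assumptions: the class names are hypothetical, and the grade letters below B (C, D, E) are assumed, since the slide lists only A and B.

```python
from enum import Enum, IntEnum

class Grade(IntEnum):
    """Ordinal: the categories have a meaningful order."""
    E = 0
    D = 1
    C = 2
    B = 3
    A = 4

class HairColor(Enum):
    """Nominal: the categories are labels with no order."""
    GREEN = "green"
    ORANGE = "orange"

# Ordinal categories can be ranked.
print(Grade.A > Grade.B)            # True
print(max(Grade.C, Grade.A).name)   # A

# Nominal categories support only equality checks.
print(HairColor.GREEN == HairColor.ORANGE)  # False
try:
    HairColor.GREEN > HairColor.ORANGE
except TypeError:
    print("ordering is not defined for hair colors")
```

Using a plain `Enum` for the nominal variable makes an accidental "greater than" comparison fail loudly instead of returning a meaningless answer.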


Quantitative Variables
• Quantitative
  ◦ Age, income, height
    ◦ Quantitative variables are measured numerically, that is, for each subject a number is observed
    ◦ The scale for quantitative variables is called the interval scale


Example
• A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following
  ◦ Nominal (Qualitative): Requires assistance from staff?
    ◦ Yes
    ◦ No
  ◦ Ordinal (Qualitative): Plaque score
    ◦ No visible plaque
    ◦ Small amounts of plaque
    ◦ Moderate amounts of plaque
    ◦ Abundant plaque
  ◦ Interval (Quantitative): Number of teeth


Example
• A birth registry database collects the following information on newborns
  ◦ Birth weight: in grams
  ◦ Infant’s Condition:
    ◦ Excellent
    ◦ Good
    ◦ Fair
    ◦ Poor
  ◦ Number of prenatal visits
  ◦ Ethnic background:
    ◦ African-American
    ◦ Caucasian
    ◦ Hispanic
    ◦ Native American
    ◦ Other
• What are the appropriate scales?
  ◦ Quantitative (Interval)
  ◦ Qualitative (Ordinal, Nominal)


Importance of Different Types of Data
• Statistical methods vary for quantitative and qualitative variables
• Methods for quantitative data cannot be used to analyze qualitative data
• Quantitative variables can be treated in a less quantitative manner
  ◦ Height: measured in cm/in
    ◦ Interval (Quantitative)
  ◦ Can be treated as Qualitative
    ◦ Ordinal: Short, Average, Tall
    ◦ Nominal: <60in or >72in, 60in-72in
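The height example can be sketched as two small Python functions that discard detail from a measured value. The 60 in and 72 in cutoffs come from the nominal row of the slide; reusing them as the boundaries for Short/Average/Tall is an assumption, and the function names are hypothetical.

```python
def height_ordinal(height_in: float) -> str:
    """Collapse a height in inches into ordered categories (assumed cutoffs)."""
    if height_in < 60:
        return "Short"
    if height_in <= 72:
        return "Average"
    return "Tall"

def height_nominal(height_in: float) -> str:
    """Collapse a height in inches into the slide's two unordered groups."""
    if 60 <= height_in <= 72:
        return "60in-72in"
    return "<60in or >72in"

for h in (58.5, 66.0, 74.0):  # made-up heights in inches
    print(h, height_ordinal(h), height_nominal(h))
```

The conversion only goes one way: a height of 58.5 in can be turned into "Short", but "Short" cannot be turned back into 58.5 in.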


Other Notes on Variable Types
• Try to measure variables in as much detail as possible
  ◦ Quantitative
• More detailed data can be analyzed in further depth
  ◦ Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)
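The GPA caution can be made concrete with a short Python sketch. The values A = 4.0 and B = 3.0 appear earlier in the lecture; the points for C, D, and E and the list of grades are assumptions made for illustration.

```python
from statistics import mean, median

# Letter grades are ordinal; mapping them to points treats them as quantitative.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.0}  # C, D, E assumed

grades = ["A", "B", "B", "C", "A"]           # made-up transcript
points = [GRADE_POINTS[g] for g in grades]   # [4.0, 3.0, 3.0, 2.0, 4.0]

gpa = mean(points)            # relies on the spacing between grades being equal
middle_grade = median(points) # relies only on the ordering of the grades

print(f"GPA (mean of points): {gpa:.2f}")
print(f"median grade points:  {middle_grade:.1f}")
```

The mean is meaningful only if the step from B to A is "worth" the same as the step from C to B, which is exactly the assumption made when an ordinal variable is treated as quantitative. The median needs only the ordering.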


Discrete Variables
• A variable is discrete if it can take on a finite number of values
  ◦ Gender
  ◦ Nationality
  ◦ Hair color
  ◦ Disease status
  ◦ Grade in STA 291
  ◦ Favorite MLB team
• Qualitative variables are discrete


Continuous Variables
• Continuous variables can take an infinite continuum of possible real number values
  ◦ Time spent studying for STA 291 per day
    ◦ 43 minutes
    ◦ 2 minutes
    ◦ 27.487 minutes
    ◦ 27.48682 minutes
  ◦ Can be subdivided into more precise values
    ◦ Therefore continuous


Examples
• Number of children in a family
• Distance a car travels on a tank of gas
• % grade on an exam


Discrete or Continuous
• Quantitative variables can be discrete or continuous
• Age, income, height?
  ◦ Depends on the scale
    ◦ Age is potentially continuous, but usually measured in years (discrete)
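The point about age can be sketched in Python: the underlying quantity is continuous, but recording it in completed years yields a discrete variable. The function name and the example ages are hypothetical.

```python
import math

def age_in_whole_years(exact_age_years: float) -> int:
    """Record a potentially continuous age as completed years (discrete)."""
    return math.floor(exact_age_years)

exact_ages = [19.25, 20.999, 21.0]  # made-up exact ages in years
recorded = [age_in_whole_years(a) for a in exact_ages]
print(recorded)  # [19, 20, 21]
```

Whether the variable counts as discrete or continuous therefore depends on how it is recorded, not only on what is being measured.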

