lecture 1 dustin lueker. statistical terminology descriptive methods probability and distribution...
Lecture 1
Dustin Lueker
Statistical terminology
Descriptive methods
Probability and distribution functions
Estimation (confidence intervals)
Hypothesis testing
Inferential methods for two samples
Simple linear regression and correlation
STA 291 Summer 2008 Lecture 1
Research in all fields is becoming more quantitative
  ◦ Look at research journals
  ◦ Most graduates will need to be familiar with basic statistical methodology and terminology
Newspapers, advertising, surveys, etc.
  ◦ Many statements contain statistical arguments
Computers make complex statistical methods easier to use
Statistics are often used in an incorrect and misleading manner
Purposely misused
  ◦ Companies/people wanting to further their agenda
    ◦ Cooking the data
      ◦ Completely making up data
      ◦ Massaging the numbers
Incidentally misused
  ◦ Using inappropriate methods
    ◦ Vital to understand a method before using it
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
Applicable to a wide variety of academic disciplines
  ◦ Physical sciences
  ◦ Social sciences
  ◦ Humanities
Statistics are used for making informed decisions
  ◦ Business
  ◦ Government
Population
  ◦ Total set of all subjects of interest
    ◦ Entire group of people, animals, products, etc. about which we want information
Elementary Unit
  ◦ Any individual member of the population
Sample
  ◦ Subset of the population from which the study actually collects information
  ◦ Used to draw conclusions about the whole population
Variable
  ◦ A characteristic of a unit that can vary among subjects in the population/sample
    ◦ Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291
Parameter
  ◦ Numerical characteristic of the population
    ◦ Calculated using the whole population
Statistic
  ◦ Numerical characteristic of the sample
    ◦ Calculated using the sample
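The difference between a parameter and a statistic can be sketched in a few lines of Python. The population below is made up for illustration (1,000 simulated heights in inches); the seed, the size of 50, and all variable names are assumptions, not part of the lecture.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated heights in inches (illustration only).
rng = random.Random(291)
population = [rng.gauss(67, 4) for _ in range(1000)]

# Parameter: numerical characteristic calculated using the whole population.
population_mean = statistics.mean(population)

# Statistic: the same characteristic calculated using only a sample.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two numbers will usually be close but not equal, which is why the sample value is treated as an estimate of the population value.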
Why take a sample? Why not take a census? Why not measure all of the units in the population?
  ◦ Accuracy
    ◦ May not be able to find every unit in the population
  ◦ Time
    ◦ Speed of response from units
  ◦ Money
  ◦ Infinite Population
  ◦ Destructive Sampling or Testing
University Health Services at UK conducts a survey about alcohol abuse among students
  ◦ 200 of the students are sampled and asked to complete a questionnaire
  ◦ One question is “have you regretted something you did while drinking?”
What is the population? Sample?
Descriptive Statistics
  ◦ Summarizing the information in a collection of data
Inferential Statistics
  ◦ Using information from a sample to make conclusions/predictions about the population
The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level
  ◦ Are these numbers statistics or parameters?
The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%
  ◦ Is this an example of descriptive or inferential statistics?
Univariate data
  ◦ Consists of observations on a single attribute
Multivariate data
  ◦ Consists of observations on several attributes
  ◦ Special case: Bivariate Data
    ◦ Consists of observations on two attributes
Quantitative or Numerical
  ◦ Variables with numerical values associated with them
Qualitative or Categorical
  ◦ Variables without numerical values associated with them
Nominal
  ◦ Gender, nationality, hair color, state of residence
    ◦ Nominal variables have a scale of unordered categories
    ◦ It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
Ordinal
  ◦ Disease status, company rating, grade in STA 291
    ◦ Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
    ◦ One unit can have more of a certain property than does another unit
Quantitative
  ◦ Age, income, height
    ◦ Quantitative variables are measured numerically; that is, for each subject a number is observed
    ◦ The scale for quantitative variables is called the interval scale
A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following
  ◦ Nominal (Qualitative): Requires assistance from staff?
    ◦ Yes
    ◦ No
  ◦ Ordinal (Qualitative): Plaque score
    ◦ No visible plaque
    ◦ Small amounts of plaque
    ◦ Moderate amounts of plaque
    ◦ Abundant plaque
  ◦ Interval (Quantitative): Number of teeth
A birth registry database collects the following information on newborns
  ◦ Birthweight: in grams
  ◦ Infant’s Condition:
    ◦ Excellent
    ◦ Good
    ◦ Fair
    ◦ Poor
  ◦ Number of prenatal visits
  ◦ Ethnic background:
    ◦ African-American
    ◦ Caucasian
    ◦ Hispanic
    ◦ Native American
    ◦ Other
What are the appropriate scales?
  ◦ Quantitative (Interval)
  ◦ Qualitative (Ordinal, Nominal)
Statistical methods vary for quantitative and qualitative variables
Methods for quantitative data cannot be used to analyze qualitative data
Quantitative variables can be treated in a less quantitative manner
  ◦ Height: measured in cm/in
    ◦ Interval (Quantitative)
  ◦ Can be treated as Qualitative
    ◦ Ordinal: Short, Average, Tall
    ◦ Nominal: "<60in or >72in" versus "60in-72in"
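Recoding height from the interval scale to an ordinal or nominal scale can be sketched as below. The nominal categories come from the slide; the slide gives no cutoffs for Short/Average/Tall, so reusing 60in and 72in for the ordinal version is an assumption, as are the function names.

```python
def height_ordinal(height_in: float) -> str:
    """Recode height (inches) into ordered categories.

    The 60in and 72in cutoffs are assumed for illustration only.
    """
    if height_in < 60:
        return "Short"
    if height_in <= 72:
        return "Average"
    return "Tall"


def height_nominal(height_in: float) -> str:
    """Recode height (inches) into the two unordered categories from the slide."""
    if 60 <= height_in <= 72:
        return "60in-72in"
    return "<60in or >72in"


for h in (58.0, 66.5, 75.0):
    print(h, height_ordinal(h), height_nominal(h))
```

Note that the recoding only goes one way: once a height is stored as "Average", the original measurement cannot be recovered.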
Try to measure variables in as much detail as possible
  ◦ Quantitative
    ◦ More detailed data can be analyzed in further depth
  ◦ Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)
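The GPA caution can be made concrete with a small sketch. Only A = 4.0 and B = 3.0 appear in the lecture; the remaining point values and the list of grades are assumptions chosen to continue the pattern.

```python
import statistics

# Point values for letter grades. A and B are from the lecture;
# C, D, and F are assumed here for illustration.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# Hypothetical grades for one student.
grades = ["A", "B", "B", "C"]

# Treating the ordinal grades as quantitative: map to points, then average.
# This implicitly assumes the gap between A and B equals the gap between B and C.
gpa = statistics.mean(GRADE_POINTS[g] for g in grades)
print(gpa)  # 3.0
```

The average is only meaningful if the equal-spacing assumption in the mapping is acceptable, which is exactly why the slide flags it.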
A variable is discrete if it can take on a finite (or countably infinite) number of values
  ◦ Gender
  ◦ Nationality
  ◦ Hair color
  ◦ Disease status
  ◦ Grade in STA 291
  ◦ Favorite MLB team
Qualitative variables are discrete
Continuous variables can take an infinite continuum of possible real number values
  ◦ Time spent studying for STA 291 per day
    ◦ 43 minutes
    ◦ 2 minutes
    ◦ 27.487 minutes
    ◦ 27.48682 minutes
    ◦ Can be subdivided into more precise values
    ◦ Therefore continuous
Number of children in a family
Distance a car travels on a tank of gas
% grade on an exam
Quantitative variables can be discrete or continuous
Age, income, height?
  ◦ Depends on the scale
    ◦ Age is potentially continuous, but usually measured in years (discrete)
In a simple random sample (SRS), each possible sample of a given size has the same probability of being selected
The sample size is usually denoted by n
Population of 4 students: Alf, Buford, Charlie, Dixie
Select an SRS of size n = 2 to ask them about their smoking habits
  ◦ 6 possible samples of size 2
    ◦ A,B
    ◦ A,C
    ◦ A,D
    ◦ B,C
    ◦ B,D
    ◦ C,D
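The list of possible samples can be checked by enumeration. This is a minimal sketch using Python's standard `itertools.combinations`; the variable names are illustrative.

```python
from itertools import combinations

population = ["Alf", "Buford", "Charlie", "Dixie"]

# Every unordered sample of size n = 2, drawn without replacement.
samples = list(combinations(population, 2))

for s in samples:
    print(s)
print(len(samples))  # 6
```

The count matches the number of ways to choose 2 students out of 4 when order does not matter.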
Each of the six possible samples has to have the same probability of being selected
  ◦ How could we do this?
    ◦ Roll a die
    ◦ Random number generator
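Both suggestions can be sketched in Python for the four-student population. The function names are hypothetical; the "die" is simulated with `random.randint(1, 6)`, with each face assigned to one of the six possible samples of size 2.

```python
import random
from itertools import combinations

population = ["Alf", "Buford", "Charlie", "Dixie"]
samples = list(combinations(population, 2))  # the six possible samples


def draw_by_die(rng: random.Random) -> tuple:
    """Simulate rolling a fair die; face k selects the k-th possible sample."""
    face = rng.randint(1, 6)  # inclusive on both ends
    return samples[face - 1]


def draw_by_rng(rng: random.Random) -> tuple:
    """Let the random number generator pick 2 distinct students directly."""
    return tuple(rng.sample(population, 2))


rng = random.Random()  # unseeded, so results differ from run to run
print(draw_by_die(rng))
print(draw_by_rng(rng))
```

With a fair die each sample is selected with probability 1/6. The second function reaches the same end without listing the samples first, which matters once the population is too large to enumerate.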
Convenience sample
  ◦ Selecting subjects that are easily accessible to you
Volunteer sample
  ◦ Selecting the first two subjects who volunteer to take the survey
What are the problems with these samples?
  ◦ Proper representation of the population
  ◦ Bias
Examples
  ◦ Mall interview
  ◦ Street corner interview