Data Variables & Unit of Observation

Upload: astra

Post on 23-Feb-2016



TRANSCRIPT

Page 1: Data Variables & Unit of Observation

Data Variables & Unit of Observation

Page 2: Data Variables & Unit of Observation

Statistics

(The field of) Statistics is the systematic study of data.

The word “data” is plural: “The data are the price gains of 200 stocks on the NYSE.”

Singular? “Datum.” (Uncommon.) Shares of Exxon-Mobil gained 2.3%. The datum is 2.3%.

What characterizes data is variability.

Page 3: Data Variables & Unit of Observation

Variables / Units of Observation

Units of observation: Set of entities (things / objects) being studied

Variable: An attribute of units

Suppose X describes a variable and U describes the units.

“X varies among the (statistical) units.”

Page 4: Data Variables & Unit of Observation

Units of observation: Math 158-800 students.

Variable: Gender.

Gender is a categorical variable.

Gender varies among Math 158-800 students.

Page 5: Data Variables & Unit of Observation

Units of observation: Math 158-800 students.

Variable: Number of FB friends.

Number of FB friends is a quantitative variable.

Number of FB friends varies among Math 158-800 students.

Page 6: Data Variables & Unit of Observation

1. An experiment was conducted to test the performance of four brands of batteries in three different environments (room temperature; hot and humid; cold). For each combination of brand and environment, batteries were put into a flashlight. The flashlight was then turned on and allowed to run until the light went out. The amount of time until the flashlight stopped shining (in minutes) was recorded. Do brand and environment play a role in the lifetime of these batteries?

Minutes are measurement units. Most quantitative variables have a measurement unit. If I want the measurement unit, I’ll say exactly that. By “unit” I mean “unit of observation” = thing / object that is studied.

Page 7: Data Variables & Unit of Observation

2. 55-year-old men are recruited into a study about heart attacks. The heart rate of each man is recorded. Each man is tracked for a one-year period, and whether or not he has a heart attack is determined.

Page 8: Data Variables & Unit of Observation

3. A student runs an experiment to study the effect of tire pressure on gas mileage. He devises a system so that his car uses gasoline from a one-liter container. Each time the container is filled, he randomly selects a tire pressure between 20 and 35 psi, then drives the car at 60 mph on a divided highway. When he runs out of gas, he records the distance driven on that fill. Does tire pressure impact the distance driven?

Something like “drives” (the noun, not the verb) would also suffice as a name for the units: each run of the car on one fill of the container is one unit.

Page 9: Data Variables & Unit of Observation

Types of variables

Quantitative Variable: Naturally measured as numbers for which ordering and at least some of the usual operations (addition, subtraction, multiplication, etc.) make sense.

Discrete
All the possible values are easily listed.
“Ties” are frequent.
Often counts or related to counts.

Continuous
Technically: “ties” are impossible.
In practice: ties are uncommon.

Page 10: Data Variables & Unit of Observation

Types of variables

Categorical (Qualitative) Variable: Not quantitative (usually verbal, but sometimes expressed as numbers having little or no numerical meaning).

Categorical variables are discrete. So, the term discrete is rarely used in speaking about categorical variables – it is redundant.

Page 11: Data Variables & Unit of Observation

Distribution

A variable’s distribution is a description of what values it takes and how often it takes them.

Categorical Variables

Distributions are always summarized in terms of percents (falling into each category).

Graphs include the pie chart, bar chart, and Pareto chart.
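
As a rough illustration of summarizing a categorical variable by percents, the sketch below tabulates transmission type for the four car models displayed in the car table later in these slides. The full data set is not shown there, so the percentages describe only those four rows; the function name `percent_distribution` is made up for this example.

```python
from collections import Counter

# Transmission type for the four car models displayed in the car table
# (the slides show only these rows, so this is not the full data set).
transmission = ["Automatic", "Manual", "Automatic", "Automatic"]


def percent_distribution(values):
    """Return {category: percent of units falling into that category}."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}


distribution = percent_distribution(transmission)
for category, percent in distribution.items():
    print(f"{category}: {percent:.0f}%")
```

The percents over all categories add up to 100, which is what a pie chart or bar chart of the same variable would display.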

Page 12: Data Variables & Unit of Observation

Distribution

A variable’s distribution is a description of what values it takes and how often it takes them.

Quantitative Variables

There are many ways to summarize quantitative variables.

Graphs: Relative frequency charts and histograms (very similar); boxplots; dotplots; stemplots; etc.

Quantitative summaries: Mode; Range; Mean + Standard Deviation; Median + Interquartile Range
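
The quantitative summaries listed above can be computed with Python's standard `statistics` module. This is only a sketch: it uses the 50 monthly internet fees that appear later in these slides, the variable names are my own, and the quartiles use the module's default ("exclusive") method, which is an assumption since other quartile conventions give slightly different interquartile ranges.

```python
import statistics

# Monthly fees (in $) for the 50 sampled internet users from the slides.
fees = [
    42, 31, 33, 34, 65, 47, 37, 38, 32, 40,
    32, 36, 31, 42, 32, 32, 72, 42, 45, 37,
    41, 46, 39, 38, 34, 31, 41, 51, 42, 37,
    32, 42, 31, 43, 40, 32, 37, 34, 44, 41,
    36, 34, 45, 45, 42, 35, 39, 83, 30, 39,
]

modes = statistics.multimode(fees)       # every most-frequent value
value_range = max(fees) - min(fees)      # largest minus smallest
mean = statistics.mean(fees)
sd = statistics.stdev(fees)              # sample standard deviation
median = statistics.median(fees)
q1, q2, q3 = statistics.quantiles(fees, n=4)  # quartiles, default method
iqr = q3 - q1                            # interquartile range

print("modes:", sorted(modes))
print("range:", value_range)
print(f"mean: {mean:.2f}  sd: {sd:.2f}")
print(f"median: {median}  IQR: {iqr}")
```

Mean and standard deviation are usually reported together, as are median and interquartile range, which is why the slide pairs them.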

Page 13: Data Variables & Unit of Observation

In a data table each unit takes a row; each variable occupies a column.

Column headers identify variable names.

There are other ways to organize data, and some are preferable when the idea is to display the data efficiently. However, in most cases, a data table is how data are organized in a spreadsheet.
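
One way to picture the row and column layout in code is a list of records, one record per unit. The sketch below uses the four car models displayed in the car table later in these slides; the key names (`car_model`, `city_mpg`, and so on) and the helper `column` are hypothetical choices for this example, not part of the original data.

```python
# Each dict is one row (one unit: a car model).
# Each key other than the identifier is one column (one variable).
cars = [
    {"car_model": "BMW 3030CI", "vehicle_type": "Subcompact",
     "transmission": "Automatic", "cylinders": 6, "city_mpg": 19, "highway_mpg": 27},
    {"car_model": "BMW 3030CI", "vehicle_type": "Subcompact",
     "transmission": "Manual", "cylinders": 6, "city_mpg": 21, "highway_mpg": 30},
    {"car_model": "Buick Century", "vehicle_type": "Midsize",
     "transmission": "Automatic", "cylinders": 6, "city_mpg": 20, "highway_mpg": 29},
    {"car_model": "Chevrolet Blazer", "vehicle_type": "4-wheel drive",
     "transmission": "Automatic", "cylinders": 6, "city_mpg": 15, "highway_mpg": 20},
]

UNIT_ID = "car_model"  # identifies the unit; it is not itself a variable

# Column headers identify the variable names.
variables = [name for name in cars[0] if name != UNIT_ID]


def column(table, variable):
    """Return the values of one variable, one value per unit."""
    return [row[variable] for row in table]


print(len(cars), "units;", len(variables), "variables:", variables)
print("city_mpg:", column(cars, "city_mpg"))
```

Reading down a column gives the values of one variable across the units, which is the raw material for that variable's distribution.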

Page 14: Data Variables & Unit of Observation

Here are the monthly fees (in $) paid by a random sample of 50 users of internet service providers in 2008:

42 31 33 34 65 47 37 38 32 40
32 36 31 42 32 32 72 42 45 37
41 46 39 38 34 31 41 51 42 37
32 42 31 43 40 32 37 34 44 41
36 34 45 45 42 35 39 83 30 39

VARIABLE: ____________

UNITS: ____________

Page 15: Data Variables & Unit of Observation

VARIABLE: Monthly fee (for use of internet)

UNITS: Users of internet service

Page 16: Data Variables & Unit of Observation

User      Monthly Fee ($)
User 1*   42
User 2*   31
User 3*   33
:         :

*Perhaps identified by name? (Names aren’t given here.) Often, unit identifiers will not be given or displayed.

You can start almost any problem in this course by first asking:

What are the units?
What is the variable?

Page 17: Data Variables & Unit of Observation

Variables / Statistical Units

The units of observation are the companies listed on the New York Stock Exchange.

Describe a variable.

Write the sentence: _________ (variable) varies from company to company.

Is the variable quantitative or categorical?
If quantitative, is it discrete or continuous?

Page 18: Data Variables & Unit of Observation

Distribution

The distribution of a variable tells us what values it takes and the likelihood of those values.

User      Monthly Fee ($)
User 1*   42
User 2*   31
User 3*   33
:         :

What the fees are.

How often those fees occur.
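
To make "what the fees are" and "how often those fees occur" concrete, here is a minimal sketch that tabulates the 50 monthly fees from the earlier slide. The names `counts` and `relative_frequency` are my own, and treating the relative frequency in the sample as the "likelihood" of a value is an approximation for illustration.

```python
from collections import Counter

# Monthly fees (in $) for the 50 sampled internet users from the slides.
fees = [
    42, 31, 33, 34, 65, 47, 37, 38, 32, 40,
    32, 36, 31, 42, 32, 32, 72, 42, 45, 37,
    41, 46, 39, 38, 34, 31, 41, 51, 42, 37,
    32, 42, 31, 43, 40, 32, 37, 34, 44, 41,
    36, 34, 45, 45, 42, 35, 39, 83, 30, 39,
]

counts = Counter(fees)   # what values occur, and how many times each
n = len(fees)

# Relative frequency: the share of units taking each value.
relative_frequency = {fee: counts[fee] / n for fee in sorted(counts)}

for fee, share in relative_frequency.items():
    print(f"${fee}: {counts[fee]} users ({share:.0%})")
```

A relative frequency chart or histogram is a picture of this same table.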

Page 19: Data Variables & Unit of Observation

Car model          Vehicle type    Transmission type   Number of cylinders   City MPG   Highway MPG
:
BMW 3030CI         Subcompact      Automatic           6                     19         27
BMW 3030CI         Subcompact      Manual              6                     21         30
Buick Century      Midsize         Automatic           6                     20         29
Chevrolet Blazer   4-wheel drive   Automatic           6                     15         20
:

Page 20: Data Variables & Unit of Observation

UNITS: the car models (one per row of the car table).

VARIABLES (there are 5): the remaining columns.

X (variable) varies from unit to unit.

Page 21: Data Variables & Unit of Observation

City MPG varies from car model to car model.

Page 22: Data Variables & Unit of Observation

Number of cylinders varies from car model to car model.

Page 23: Data Variables & Unit of Observation

Transmission type varies from car model to car model.

Page 24: Data Variables & Unit of Observation

Transmission type is a categorical variable.

Page 25: Data Variables & Unit of Observation

City MPG is a quantitative variable.

Page 26: Data Variables & Unit of Observation

Mutual Fund                 Category          Net assets ($ millions)   2008 return   Expense Ratio
:
Fidelity Low-Priced Stock   Small cap value   19,378                    -36.2%        0.98%
Price International Stock   International     3,828                     -48.0%        0.87%
Vanguard 500 Index          Large cap blend   74,886                    -37.0%        0.15%
:

Page 27: Data Variables & Unit of Observation

UNITS: the mutual funds (one per row of the mutual fund table).

VARIABLES (there are 4): Category, Net assets, 2008 return, and Expense Ratio.

Page 28: Data Variables & Unit of Observation

The Science of Statistics

Data vary.

A population is a collection of all the units of interest. If we have information on all the units of a population, we have a complete description of the variation in the data. Such a description of a population is a census. Characteristics of populations are parameters.

A sample is an incomplete collection of units from the population. A sample necessarily provides incomplete information. Characteristics of samples are called statistics (the word, as distinct from the field of Statistics).
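
The difference between a parameter and a statistic can be sketched with a simulation. Everything in the block below is synthetic: the population of 1,000 made-up monthly fees, its center of 40 and spread of 8, the sample size of 50, and the seed are all assumptions chosen only for illustration and do not come from the slides.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic population: a made-up monthly fee for each of 1,000 hypothetical units.
population = [round(rng.gauss(40, 8), 2) for _ in range(1000)]

# Census: every unit is measured, so this mean is a parameter.
parameter = statistics.mean(population)

# Sample: an incomplete collection of units drawn from the population.
sample = rng.sample(population, k=50)

# The same characteristic computed from the sample is a statistic.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

Drawing a different sample would generally give a different statistic, while the parameter stays fixed; that is the sense in which a sample provides incomplete information.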

Page 29: Data Variables & Unit of Observation

Variables / Statistical Units

The units of observation are the countries (of the world).

Describe a variable.

Write the sentence: _________ varies among __________.

Is the variable quantitative or categorical?

Page 30: Data Variables & Unit of Observation

Variables / Statistical Units

With the countries (of the world) as the units of observation, the sentence reads: _________ varies among countries.

Page 31: Data Variables & Unit of Observation

GDP per capita and Longevity

Country           GDP / person   Longevity (years)
Qatar             $86006         75.6
U. S.             $47440         78.2
Spain             $30589         80.9
[world average]   $10433         67.2
Haiti             $1317          60.9
:                 :              :

Page 32: Data Variables & Unit of Observation

NOT a unit: the [world average] row of the GDP per capita and Longevity table.

Page 34: Data Variables & Unit of Observation

NOT a unit: the [world average] row of the GDP per capita and Longevity table.

The world-average values are parameters, not statistics.

Page 35: Data Variables & Unit of Observation

Purposes of variables

Explanatory and Response Variable

Changing the value of the explanatory variable (EV) results in a change in the distribution of the response variable (RV).

Loosely: A change in the explanatory variable alters the prediction of the response variable.

Page 37: Data Variables & Unit of Observation

Variable: Form of study.

Units: The (200) college students involved in the experiment.

Form of study varies from student to student.

Page 38: Data Variables & Unit of Observation

Variable: Score on the short answer test.

Units: The (200) college students involved in the experiment.

Score on the short answer test varies from student to student.

Page 40: Data Variables & Unit of Observation

Experimental study

The explanatory variable is assigned (often by the people conducting the study).

Units do not enter the study with a value for this variable.

Observational study

The explanatory variable is a characteristic of the unit.
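
In an experimental study the assignment of the explanatory variable is often done at random. As a hedged sketch, the code below mimics the tire-pressure experiment from earlier in the slides, where a pressure between 20 and 35 psi is randomly selected for each fill. The number of fills (10), the rounding to 0.1 psi, the seed, and the function name `assign_tire_pressures` are assumptions for illustration; the slides do not specify them.

```python
import random


def assign_tire_pressures(n_fills, low=20.0, high=35.0, seed=None):
    """Randomly assign a tire pressure (psi) to each fill of the container.

    The experimenter, not the unit, determines the value of the
    explanatory variable, which is what makes the study experimental.
    """
    rng = random.Random(seed)
    return [round(rng.uniform(low, high), 1) for _ in range(n_fills)]


# Hypothetical run: 10 fills, seeded so the assignment can be reproduced.
pressures = assign_tire_pressures(n_fills=10, seed=1)

for fill_number, psi in enumerate(pressures, start=1):
    print(f"fill {fill_number}: {psi} psi")
```

In an observational version of the same question, the pressure would simply be recorded as a characteristic each unit already has, with no assignment step.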