lesson 1 04 types of data


Upload: perla-pelicano-corpez

Post on 20-Jan-2017


Chapter 1 Describing Data – Lesson 4

Chapter 1:

Describing Data

Lesson 4: Data Sources, Variables (and Types of Variables) and Measurement Scales

TIME FRAME: 1 hour session

OVERVIEW OF LESSON

In this activity, students will be given lectures (and assessments) on the typical sources of data (primary data sources such as surveys, censuses, direct observation, and experiments, as well as secondary data sources such as survey reports, books, magazines, and blogs), on variables and their general classifications (quantitative and qualitative), and on the different measurement scales for data.

LEARNING OUTCOME(S): At the end of the lesson, the learner is able to

list various data sources;

define and distinguish between qualitative and quantitative variables, and between

discrete and continuous variables (that are quantitative); and,

identify different scales of measurement.

LESSON OUTLINE:

1. Introduction/Motivation

2. Lesson Proper: Sources of Data; Types of Data Sources; and Variables (including Broad

Classifications of Variables)

3. Simulation Activity: Analysis of Measurement Scales

4. Advanced Lesson (for Enrichment): Types of Data by Time Dependence

DEVELOPMENT OF THE LESSON

(A) Introduction/Motivation

Begin by asking students to recall from past lessons that

“data” (records collected from experiments, observations, experience, surveys, and administrative forms) have variability;

there are various ways of generating measurements/data, and there are issues behind these measurements; and

we can summarize this variability (or distribution) of data by way of graphs (pie charts, bar charts, pictograms) or some descriptive summary numbers (such as modes and medians).


Mention the importance of knowing the manner in which data are collected, as this has implications for the way the data will be analyzed. Illustrate this point with the following question:

Is 301 smaller than 302? Numerically, 301 is smaller than 302, but the numbers may be room numbers (and room 301 might, in fact, be bigger than room 302). The numbers 301 and 302 may also be classification codes (say, for occupations or for geographic locations), in which case the order of the codes may carry no meaning (that is, one code is not really smaller than another).

(B) Lesson Proper

(1) Sources of Data

For 5 minutes, ask students to identify various types of sources of data. These may include:

Experiments

Direct observation (e.g., measuring the amount of land owned by a farmer)

Surveys (sample survey or census)

o Face-to-face interviews

o Telephone surveys

o Mail/e-mail/SMS/Online surveys

Administrative records

Internet (social media, search engines, etc.)

Reports of surveys conducted, reports of administrative records

News articles, blogs, Facebook posts about survey reports

Ask students how they think the proportion of total land devoted to farming is obtained (there

can be a direct measurement of land, but this information can also be sourced from admin

data, or from a census or sample survey of farmers with their reported answers).

(2) Types of Data Sources

Suggest that all these data sources listed above can be classified into primary and secondary

sources. Ask students what differentiates primary data from secondary data.

Special Note: When data are collected, they may be gathered for purposes of answering a research question. Primary data are collected first-hand by a researcher directly from the source of information, while secondary data are acquired through existing records (one step removed from the original source; they usually describe, summarize, analyze, or evaluate primary source materials, or are derived from or based on them).

(3) Variables


For 5 minutes, tell students to imagine that their seatmates are unknown to them. What three questions would they want to ask, other than their seatmate’s name, Facebook username, address, email address, and telephone number? Examples may be:

What is your height?

In what province were you born?

What type of movies do you enjoy watching?

How many members are there in your family?

How old is your mother?

Inform students that when the units being studied (persons, families, companies, farms, countries, events, objects) have properties or characteristics that take on different values, these properties or characteristics are called variables. For example, the height of a student is a variable, since height varies from student to student. Occasionally, a characteristic can assume only one value; it is then called a constant. For instance, in a class of fifteen-year-olds, the age in years of the students is constant.

Broad Classification of Variables

Mention to students that variables can be broadly classified as either quantitative or qualitative, with the former further classified into discrete and continuous types (see Figure 1).

(i) Qualitative variables express a categorical attribute, such as sex (male or female), religion, marital status, region of residence, or highest educational attainment. Qualitative variables do not strictly take on numeric values (although we can have numeric codes for them, e.g., for the sex variable, 1 and 2 may refer to male and female, respectively). Qualitative data answer the question “what kind?” Sometimes there is a sense of ordering in qualitative data, e.g., income data grouped into high-, middle-, and low-income status. Data on sex or religion do not have a sense of ordering, as there is no such thing as a weaker or stronger sex, or a better or worse religion. Qualitative variables are sometimes referred to as categorical variables.

(ii) Quantitative (otherwise called numerical) data, whose sizes are meaningful, answer

questions such as “how much” or “how many”. Quantitative variables have actual

units of measure. Examples of quantitative variables include the height, weight,

number of registered cars, household size, and total household expenditures/income

of survey respondents. Quantitative data may be further classified into:

a. Discrete data are those data that can be counted, e.g., the number of days for cellphones to fail, the ages of survey respondents measured to the nearest year, and the number of patients in a hospital. These data assume only a finite or countably infinite number of values.


b. Continuous data are those that can be measured, e.g., the exact height of a survey respondent and the exact volume of some liquid substance. The possible values are uncountably infinite.
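The classification above can be read as two yes/no questions: does arithmetic on the values make sense, and can the possible values be counted? The Python sketch below encodes that reading. The names `VariableType` and `classify_variable` are hypothetical. Both judgments still have to be made by a person, since code cannot tell a room number from a quantity.

```python
from enum import Enum


class VariableType(Enum):
    QUALITATIVE = "qualitative"
    DISCRETE = "quantitative: discrete"
    CONTINUOUS = "quantitative: continuous"


def classify_variable(arithmetic_is_meaningful: bool, values_are_countable: bool) -> VariableType:
    """Hypothetical helper: classify a variable from two yes/no judgments.

    arithmetic_is_meaningful: do sums and differences of the values have a real interpretation?
    values_are_countable: can the possible values be counted (a finite or countably infinite set)?
    """
    if not arithmetic_is_meaningful:
        # Categories, and numeric codes for them, make the variable qualitative.
        return VariableType.QUALITATIVE
    if values_are_countable:
        return VariableType.DISCRETE
    return VariableType.CONTINUOUS


# Examples from the lesson, with the two judgments filled in by hand.
examples = {
    "religion": (False, True),
    "number of patients in a hospital": (True, True),
    "exact height of a survey respondent": (True, False),
}
for name, (arithmetic, countable) in examples.items():
    print(f"{name}: {classify_variable(arithmetic, countable).value}")
```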

Special Note:

For quantitative data, arithmetical operations have some physical interpretation. One can add 301 and 302 if these have quantitative meanings. However, if they refer to room numbers, as pointed out earlier, then adding these numbers does not make any sense. A variable that takes numerical values is not necessarily quantitative! The issue is whether performing arithmetical operations on its values would make any sense. It would certainly not make sense to sum two zip codes or multiply two room numbers.
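One way to make the Special Note concrete in code is to store codes such as room numbers as labels rather than as numbers. The sketch below uses a hypothetical `RoomNumber` type, written as a frozen Python dataclass. It supports equality checks, which are meaningful for labels. It defines no addition, so an attempt to add two room numbers fails instead of producing a meaningless sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoomNumber:
    """Hypothetical wrapper: a room number is a label, not a quantity."""
    code: str


a, b = RoomNumber("301"), RoomNumber("302")

# Equality is meaningful: we can tell whether two rooms are the same room.
print(a == b)                  # False
print(a == RoomNumber("301"))  # True

# Arithmetic is not: the class defines no __add__, so Python raises TypeError.
try:
    _ = a + b
except TypeError as err:
    print("cannot add room numbers:", err)
```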

(C) Simulation Activity

Tell students to imagine that they are psychologists who want to study whether eating breakfast helps kids focus in school. (Perhaps students who eat a healthy breakfast will do best on a quiz, students who eat an unhealthy breakfast will get an average performance, and students who do not eat anything for breakfast will do worst on a quiz.) How should the study begin?

Figure 1. Broad classification of variables (variables are either qualitative or quantitative; quantitative variables are either discrete or continuous)

Ask students to identify the variables that need to be studied here. They are:

the type of breakfast the child ate: a healthy breakfast, an unhealthy breakfast, or no breakfast (this clearly varies from child to child, and perhaps even from day to day)

performance on a quiz (Teresa might do poorly on a quiz, while Joselito may do

well. Or Teresa might do poorly today but she may do well tomorrow. Scores on a

quiz change, and thus, the performance on a quiz is a variable).

How do we measure these variables?

Inform students that there are four major scales of measurement of variables: nominal, ordinal, interval, and ratio. The appropriate scale depends on the variable and on how it is measured.

(a) Nominal scale of measurement arises when we have variables that are categorical

and non-numeric or where the numbers have no sense of ordering. In other words,

the numbers or categories can be put into any order, and it will not really matter.

Consider the numbers on the uniforms of basketball players. Is the player wearing

a number 7 a worse player than the player wearing number 10? Maybe, or maybe

not, but the number on the uniform does not have anything to do with their

performance. The numbers on the uniform merely help identify the basketball

player. Other examples of the nominal scale include sex, marital status, and religious affiliation. For the research on the effect of breakfast on school performance, children can be coded 1 for having a healthy breakfast, 2 for an unhealthy breakfast, and 3 for no breakfast. These numerical codes do not really matter; any other three distinct codes would serve equally well.
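The sketch below shows why the codes do not matter. It uses eight hypothetical children (the data are invented for illustration) and two arbitrary codings of the same three breakfast categories. The category counts and the mode are the same under either coding. The mean of the codes is not, which shows that it has no meaning for nominal data.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical data for eight children, coded as in the lesson:
# 1 = healthy breakfast, 2 = unhealthy breakfast, 3 = no breakfast.
codes = [1, 3, 2, 1, 1, 2, 3, 1]
labels = {1: "healthy", 2: "unhealthy", 3: "none"}

# An equally valid (hypothetical) alternative coding of the same categories.
to_alt = {1: 30, 2: 10, 3: 20}
alt_labels = {30: "healthy", 10: "unhealthy", 20: "none"}
alt_codes = [to_alt[c] for c in codes]

# Frequencies and the mode describe the categories, so the coding does not affect them.
print(Counter(labels[c] for c in codes))
print(labels[mode(codes)], alt_labels[mode(alt_codes)])  # healthy healthy

# The mean of the codes changes with the coding, so it carries no meaning here.
print(mean(codes), mean(alt_codes))  # 1.75 22.5
```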

(b) Ordinal scale also deals with categorical variables, but where order is important.

Suppose that, instead of looking at scores on a specific quiz for the research on

effect of breakfast on school performance, we examine the letter grades overall

for the course for each student.

Grade Scores

A 90-100

B 80-89

C 70-79

D 60-69

F 59 and below

So Joselito has an A, Teresa has a C, and other students have Bs, Ds, and Fs. The letters here have a meaningful sense of ordering. Unlike the numbers on basketball player uniforms, the letter grades suggest that Joselito is doing better than Teresa. Other examples of the ordinal scale include socioeconomic status (A to E, where A is wealthy and E is poor), IQ test score, difficulty of questions in an exam (easy, medium, difficult), rank in a contest (first place, second place, etc.), and perceptions measured on Likert scales.


Note to Teacher: While there is a sense of ordering, there is no zero point in an ordinal scale. In addition, there is no way to find out how much “distance” there is between one category and another. On a scale from 1 to 10, the difference between 7 and 8 may not be the same as the difference between 1 and 2.
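The letter-grade example can be sketched in Python using the cut-offs from the grade table. The scores given to Joselito and Teresa are hypothetical. The numeric ranks in `GRADE_RANK` are a device that captures order only, so differences between ranks should not be interpreted.

```python
# Cut-offs from the grade table in the lesson (A: 90-100, B: 80-89, ..., F: 59 and below).
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

# Ordinal rank: a larger number means a better grade; only the order matters.
GRADE_RANK = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}


def score_to_grade(score: int) -> str:
    """Map a score (assumed to be 0-100) to a letter grade using the table's cut-offs."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, grade in GRADE_CUTOFFS:
        if score >= lower_bound:
            return grade
    raise AssertionError("unreachable: the last cut-off is 0")


joselito = score_to_grade(95)  # hypothetical score
teresa = score_to_grade(72)    # hypothetical score
print(joselito, teresa)                            # A C
print(GRADE_RANK[joselito] > GRADE_RANK[teresa])   # True: the order is meaningful

# The distance is not: an A can come from 90 or from 100, so "A minus C"
# does not correspond to a fixed number of points.
print(score_to_grade(90), score_to_grade(100))     # A A
```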

(c) Interval scale tells us that one unit differs by a certain amount of the property from another unit. When measuring temperature in Celsius, a 10-degree difference has the same meaning anywhere along the scale: the difference between 10 and 20 degrees Celsius is the same as that between 80 and 90 degrees Celsius. However, we cannot say that 80 degrees Celsius is twice as hot as 40 degrees Celsius, because the scale has no true zero, only an arbitrary zero point. A measurement of 0 degrees Celsius does not reflect a true "lack of temperature." Thus, the Celsius scale is an interval scale. Other examples of the interval scale include quiz results (even if a student gets a zero, it does not mean the student has no knowledge) and the IQ of a person. With IQ, we can tell not only which person ranks higher but also by how much, yet zero IQ does not mean no intelligence.

Special Note: The interval scale allows addition and subtraction operations, but it

does not possess an absolute zero. Zero is arbitrary as it does not mean the

variable does not exist. Zero only represents an additional measurement point.
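The Celsius example can be checked numerically. The sketch below converts Celsius to Kelvin by adding 273.15, the standard offset between the two scales. Shifting the zero point leaves differences unchanged but changes ratios, which is why ratios of Celsius readings are not meaningful.

```python
def celsius_to_kelvin(celsius: float) -> float:
    # The Kelvin scale shifts Celsius so that 0 K is absolute zero.
    return celsius + 273.15


# Differences are meaningful on the Celsius (interval) scale.
print(20 - 10 == 90 - 80)  # True: a 10-degree step means the same everywhere

# Ratios are not: 80 degrees Celsius is not "twice as hot" as 40 degrees Celsius.
naive_ratio = 80 / 40
true_ratio = celsius_to_kelvin(80) / celsius_to_kelvin(40)
print(naive_ratio)           # 2.0
print(round(true_ratio, 3))  # 1.128
```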

(d) Ratio scale also tells us that one unit has so many times as much of the property

as does another unit. The ratio scale possesses a meaningful (unique and non-

arbitrary) absolute, fixed zero point and allows all arithmetic operations. The

existence of the zero point is the only difference between ratio and interval

measurement. Examples of the ratio scale include mass, heights, weights, energy

and electric charge. A temperature of zero on the Kelvin scale is absolute zero, which makes the Kelvin scale a ratio scale. If one temperature is three times as high as another on the Kelvin scale, then the average kinetic energy of the particles at the first temperature is three times that at the second. As regards mass, the difference between 120 grams and 135 grams is 15 grams, which is the same as the difference between 380 grams and 395 grams. A unit step represents the same amount at any point on the scale, and a measurement of 0 reflects a complete lack of mass. Money is also on a ratio scale. Money has the properties of an interval scale, and we can also say that 2,000 pesos is twice as much as 1,000 pesos. In addition, money has a true zero point: having zero money implies the absence of money.
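The kinetic-energy statement can be justified with a short derivation. This assumes the substance behaves as an ideal gas, for which the average translational kinetic energy of a molecule is proportional to the absolute temperature ($k_B$ is the Boltzmann constant):

```latex
\bar{E}_k = \tfrac{3}{2} k_B T
\quad\Longrightarrow\quad
\frac{\bar{E}_{k,1}}{\bar{E}_{k,2}}
  = \frac{\tfrac{3}{2} k_B T_1}{\tfrac{3}{2} k_B T_2}
  = \frac{T_1}{T_2} = 3
\quad \text{when } T_1 = 3\,T_2 \text{ (in kelvins).}
```

The same step fails for Celsius readings. Because $T_C = T - 273.15$ has a shifted zero, $T_{C,1} / T_{C,2}$ is not equal to $T_1 / T_2$.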

In summary, we have the following measurement scales:

Type of scale   Characteristics of scale                         Basic empirical operation
Nominal         No order, distance, or origin                    Determination of equality
Ordinal         Has order, but no distance or unique origin      Determination of greater or lesser values
Interval        Has order and distance, but no unique origin     Determination of equality of intervals or differences
Ratio           Has order, distance, and unique origin           Determination of equality of ratios
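The summary table suggests that the scales are cumulative: each one adds a basic operation to those of the scale before it. The Python sketch below encodes that reading. The `Scale` enumeration and the `meaningful_operations` helper are hypothetical names, and treating the scales as strictly cumulative is an assumption drawn from the table.

```python
from enum import IntEnum


class Scale(IntEnum):
    # Ordered so that each scale supports everything the previous one does.
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Basic empirical operation introduced by each scale (from the summary table).
NEW_OPERATION = {
    Scale.NOMINAL: "equality",
    Scale.ORDINAL: "greater or lesser",
    Scale.INTERVAL: "equality of differences",
    Scale.RATIO: "equality of ratios",
}


def meaningful_operations(scale: Scale) -> list[str]:
    """Hypothetical helper: operations that make sense for data measured on `scale`.

    Assumes the scales are cumulative: a higher scale keeps every operation
    of the lower scales and adds its own.
    """
    return [NEW_OPERATION[s] for s in Scale if s <= scale]


print(meaningful_operations(Scale.ORDINAL))  # ['equality', 'greater or lesser']
print(meaningful_operations(Scale.RATIO))
```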

The scale of measurement depends mainly on the method of measurement, not on the property measured. The weight of primary school students measured in kilograms is on a ratio scale. However, the students can also be labeled as overweight, normal, or underweight, in which case weight is measured on an ordinal scale. Also, many scales are only interval because their zero point is arbitrarily chosen.
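The weight example shows one property measured on two scales. The sketch below maps weights in kilograms (ratio scale) to the categories underweight, normal, and overweight (ordinal scale). The cut-off values and sample weights are hypothetical and chosen only to illustrate the conversion; they are not a health standard.

```python
# Hypothetical cut-offs in kilograms, for illustration only (not a health standard).
UNDERWEIGHT_BELOW_KG = 20.0
OVERWEIGHT_ABOVE_KG = 35.0

# The categories in increasing order; only this order survives the conversion.
CATEGORY_ORDER = ["underweight", "normal", "overweight"]


def weight_category(weight_kg: float) -> str:
    """Convert a ratio-scale weight into an ordinal category."""
    if weight_kg < UNDERWEIGHT_BELOW_KG:
        return "underweight"
    if weight_kg > OVERWEIGHT_ABOVE_KG:
        return "overweight"
    return "normal"


weights_kg = [18.5, 27.0, 40.2]  # hypothetical students
categories = [weight_category(w) for w in weights_kg]
print(categories)  # ['underweight', 'normal', 'overweight']

# On the ratio scale, "more than twice as heavy" is meaningful ...
print(weights_kg[2] / weights_kg[0] > 2)  # True
# ... but after categorizing, only the order of the categories can be compared.
print(CATEGORY_ORDER.index(categories[2]) > CATEGORY_ORDER.index(categories[0]))  # True
```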

(D) Advanced Lesson (for Enrichment; may be skipped)

Types of Variables according to Time Dependence

Inform students that data may also be classified according to their dependence on time.

If individuals, establishments, households, events, etc., are observed at the same point in time (or under the same general conditions), the data set is called cross-sectional.

Table 1. Example of cross-sectional data: ages and monthly incomes of selected voters

Person   Name        Monthly income   Age (years)
1        Linda       6475             55
2        Romualdo    18600            32
3        Kerwin      23150            34
4        Ramon       9200             26
5        Randolph    24100            35
6        Carminda    13500            28
..
19       Dominador   7250             22
20       Grace       13450            64

Time series data record the changes in an indicator for a subject over the course of time. For example, the total number of primary students in a certain school district by year, the monthly interest rates charged by a bank, and the hourly blood pressure readings of a patient in a hospital are time series data sets.

Table 2. Example of time series data: pupil-to-teacher ratio at the primary level in the Philippines, 2000-2013

Year   Pupil-teacher ratio, secondary
2000   20.36
2001   20.32
2002   18.91
2003   16.18
2004   16.64
2005   16.73
2006   16.21
2007   16.39
2008   16.24
2009   16.53
2010   15.99
2011   16.69
2012   16.75
2013   15.46
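A distinguishing feature of time series data is that the order of the observations matters. The Python sketch below stores the values from Table 2 keyed by year and computes the change from each year to the next. The year-to-year changes are only an illustration, not an analysis from the lesson.

```python
# Pupil-teacher ratios from Table 2, keyed by year.
ratio_by_year = {
    2000: 20.36, 2001: 20.32, 2002: 18.91, 2003: 16.18, 2004: 16.64,
    2005: 16.73, 2006: 16.21, 2007: 16.39, 2008: 16.24, 2009: 16.53,
    2010: 15.99, 2011: 16.69, 2012: 16.75, 2013: 15.46,
}

# In a time series the order of observations matters, so sort by year first.
years = sorted(ratio_by_year)

# Year-to-year change: each value compared with the one for the previous year.
changes = {
    year: round(ratio_by_year[year] - ratio_by_year[prev], 2)
    for prev, year in zip(years, years[1:])
}
print(changes[2002])  # -1.41
print(round(ratio_by_year[2013] - ratio_by_year[2000], 2))  # -4.9
```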

Some data sets have both time series and cross-sectional features. These are called panel data. Panel data contain observations of multiple variables obtained over several time periods for the same subjects (persons, establishments, countries, etc.).

Table 3. Example of panel data: value of production of selected metal manufacturing firms

Firm   Type of firm   Year   Production
1      Steel          2012   2.2 MT
1      Steel          2013   2.4 MT
1      Steel          2014   2.8 MT
2      Copper         2012   4.2 MT
2      Copper         2013   4.3 MT
2      Copper         2014   4.5 MT
3      Tin            2012   3.8 MT
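The three types can be told apart by looking only at which subjects are observed at which times. The Python sketch below applies the lesson's definitions to (subject, time) pairs through a hypothetical helper, `time_structure`. It assumes panel data means at least one subject is observed at more than one time, and it rejects inputs that fit none of the three types. Table 1 has no time column, so the sketch assumes all its voters were observed on a single occasion.

```python
from collections.abc import Hashable, Iterable


def time_structure(observations: Iterable[tuple[Hashable, Hashable]]) -> str:
    """Classify (subject, time) pairs as cross-sectional, time series, or panel data."""
    times_per_subject: dict[Hashable, set[Hashable]] = {}
    for subject, time in observations:
        times_per_subject.setdefault(subject, set()).add(time)

    all_times = set().union(*times_per_subject.values())
    if len(times_per_subject) > 1 and len(all_times) == 1:
        return "cross-sectional"  # many subjects, one point in time
    if len(times_per_subject) == 1 and len(all_times) > 1:
        return "time series"  # one subject, many points in time
    if len(times_per_subject) > 1 and any(len(t) > 1 for t in times_per_subject.values()):
        return "panel"  # several subjects, at least one followed over time
    raise ValueError("data do not match any of the three types")


# Table 1 (a single observation time is assumed).
table1 = [(name, "survey") for name in ["Linda", "Romualdo", "Kerwin", "Ramon"]]
# Table 2: one country, several years.
table2 = [("Philippines", year) for year in range(2000, 2014)]
# Table 3: firm numbers and years, as listed.
table3 = [(1, 2012), (1, 2013), (1, 2014), (2, 2012), (2, 2013), (2, 2014), (3, 2012)]

print(time_structure(table1))  # cross-sectional
print(time_structure(table2))  # time series
print(time_structure(table3))  # panel
```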


KEY POINTS

Data sources: Primary Data Sources (direct observation, experiments, surveys, censuses, admin data) and Secondary Data Sources (survey reports, reports on admin data, news articles, blogs, Facebook posts about survey reports, etc.)

Types of Data: qualitative and quantitative (the latter further broken down into discrete and continuous)

Measurement scales: nominal, ordinal, interval, ratio


REFERENCES

Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo Patungan, Nelia Marquez). Rex Bookstore.

Takahashi, S. (2009). The Manga Guide to Statistics. Trend-Pro Co. Ltd.

Institute of Statistics, UP Los Banos. Workbooks in Statistics 1 (11th ed.). College, Laguna 4031.


ASSESSMENT

1. The website of Philippine Airlines provides a questionnaire instrument that can be answered

electronically. Which of the following methods of data collection is involved when people

complete the questionnaire?

a) Published sources

b) Survey

c) Experimentation

d) Direct Observation

ANSWER: B

2. Noel's father was planning to meet with his boss to discuss a raise in his annual salary. In

preparation, he wanted to use the Consumer Price Index (generated by the Philippine Statistics

Authority) to determine the percentage increase in his salary in terms of real income over the last

three years. Which of the following methods of data collection was involved when he used the

Consumer Price Index?

a) Published sources

b) Experimentation

c) Survey

d) Direct Observation

ANSWER: A

3. In Metro Manila, one may want to record how long it takes to go from one end of the MRT to

another... Which of the following methods of data collection is involved here?

a) Published sources

b) Experimentation

c) Survey

d) Direct Observation

ANSWER: D

4. Which of the following are qualitative variables? Among the quantitative variables, classify

them as discrete or continuous.

height of students measured in number of centimeters

weight of teachers in school measured in number of kilograms

number of days it rained

hair color


sex

average daily temperature

civil status

brand of soap being used

highest educational attainment

total household expenditures last month in pesos

number of children in a household

number of customers waiting to be served at a supermarket counter

waiting time of a customer standing in a queue for service at a bank

amount spent on rice last week by a household

distance traveled by a student going to school

time spent on Facebook on a particular day

ANSWER:

height of students measured in number of centimeters (quantitative: continuous)

weight of teachers in school measured in number of kilograms (quantitative:

continuous)

number of days it rained (quantitative: discrete)

hair color (qualitative)

sex (qualitative)

average daily temperature (quantitative: continuous)

civil status (qualitative)

brand of soap being used (qualitative)

highest educational attainment (qualitative)

total household expenditures last month in pesos (quantitative: discrete)

number of children in a household (quantitative: discrete)

number of customers waiting to be served at a supermarket counter (quantitative:

discrete)

waiting time of a customer standing in a queue for service at a bank (quantitative: continuous)

amount spent on rice last week by a household (quantitative: discrete)

distance traveled by a student going to school (quantitative: continuous)

time spent on Facebook on a particular day (quantitative: continuous)

5. A survey of students in a certain school is conducted. The survey questionnaire covers the following information: (a) number of family members who are working; (b) ownership of a cell phone among family members; (c) length (in minutes) of the longest call made on each cell phone owned, per month; (d) ownership/rental of dwelling; (e) amount spent on food in one week; (h) occupation of household head; (i) total family income; (j) number of years of schooling of each family member; (k) access of family members to social media; (l) amount of time last week spent by each family member using the internet. For each of these variables, determine whether the variable is qualitative or quantitative, and if the latter, state whether it is discrete or continuous.

ANSWER:

(a) number of family members who are working (quantitative: discrete);

(b) ownership of a cell phone among family members (qualitative);

(c) length (in minutes) of longest call made on each cell phone owned per month

(quantitative: continuous);

(d) ownership/rental of dwelling (qualitative);

(e) amount spent in pesos on food in one week (quantitative: discrete);

(h) occupation of household head (qualitative);

(i) total family income (quantitative: discrete);

(j) number of years of schooling of each family member (quantitative: discrete);

(k) access of family members to social media (qualitative);

(l) amount of time last week spent by each family member using the internet (quantitative: continuous; Note to Teacher: if time is measured in countable units, then discrete)

6. (May be skipped if advanced lesson was not discussed) The Philippine Statistics Authority

(PSA) conducts a triennial Family Income and Expenditure Survey to determine the income and

expenditure patterns in the country. This survey, based on a probability sample of about 50

thousand households, is also the main source of official poverty data, and it provides weights for

the consumer price index, also generated by the PSA. What type of data source is this survey?

a) Cross section survey

b) Time series

c) Panel data survey

ANSWER : A

7. (May be skipped if advanced lesson was not discussed) Every quarter, the PSA releases the

growth in the Gross Domestic Product (GDP) as a measure of the economic performance in the

country. This data source is considered a

a) Cross section data source

b) Time series

c) Panel data

ANSWER : B

8. (May be skipped if advanced lesson was not discussed) From 2003 up to 2009, the PSA interviewed about 15,000 sample households across rounds of the Family Income and Expenditure Survey, as well as the Annual Poverty Indicator Survey. This data source is considered a

a) Cross section data source

b) Time series

c) Panel data

ANSWER : C

9. Identify the scale of measurement for the following:

body temperature measured in degrees Fahrenheit (interval)

military title: Private, Sergeant, Lieutenant, Captain, Major, Colonel, General (ordinal)

clothing: hat, shirt, shoes, pants (nominal)

a score on a 5-point quiz measuring knowledge of probability and statistics (ordinal)

place (city/municipality) of birth (nominal)

Explanatory Note:

Teachers have the option to ask these assessment items orally to the entire class, to group students and ask them to identify answers, to give this as homework, or to use some questions/items here for a chapter examination.


HANDOUT FOR STUDENTS

General Classification of Variables

(i) Qualitative variables (also called categorical) do not strictly take on numeric values (but we can develop numeric codes for them, e.g., for a variable representing sex, the values 1 and 2 may indicate male and female, respectively). Qualitative data answer the question “what kind?” Sometimes there is a sense of ordering in qualitative data, for example, for income data grouped into high-, middle-, and low-income status. Data pertaining to sex or religion, on the other hand, do not have a sense of ordering, as there is no such thing as a weaker or stronger sex, or a better or worse religion.

(ii) Quantitative (otherwise called numerical) data, whose sizes are meaningful, answer

questions such as “how much” or “how many”. Quantitative variables have actual

units of measure. Quantitative data may be further classified into:

a. Discrete data are those data that can be counted, e.g., the number of days before some equipment fails, the ages of survey respondents measured to the nearest year, and the number of patients in a hospital. These data assume only a countable number of values.

b. Continuous data are those that can be measured, e.g., the exact height of a survey respondent and the exact volume of some liquid substance.

Scales of Measurement

(i) The nominal scale of measurement arises when we have variables that are categorical and non-numeric, or where the numbers have no sense of ordering. In other words, the numbers or categories can be put into any order, and it will not really matter. Examples of the nominal scale include sex, marital status, and religious affiliation.

(ii) The ordinal scale also deals with categorical variables, but where order is important.

Examples of the ordinal scale include socioeconomic status (A to E, where A is wealthy and E is poor), IQ test score, difficulty of questions in an exam (easy, medium, difficult), rank in a contest (first place, second place, etc.), and perceptions measured on Likert scales.

(iii) The interval scale tells us that one unit differs by a certain amount of the property

from another unit. Examples of the interval scale include temperature measured in Celsius, quiz results, and the IQ of a person. The interval scale does not possess an absolute zero.

(iv) The ratio scale also tells us that one unit has so many times as much of the property as does another unit. The ratio scale possesses a meaningful (unique and non-arbitrary) absolute, fixed zero point and allows all arithmetic operations. Examples of the ratio scale include mass, height, weight, energy, electric charge, and temperature on the Kelvin scale.

From Takahashi, S. (2009). The Manga Guide to Statistics. Trend-Pro Co. Ltd.
