MAT0144

hmz/june2016 1

ERRATA: CHAPTER 1 INTRODUCTION TO STATISTICS

Page 5: (please cancel “Types of data”)

Variables

Variables can be classified as either qualitative or quantitative. Quantitative variables are further

classified as either discrete or continuous. The following chart summarises the classifications.

Variables

Quantitative (numerical), further classified as:

Discrete: Can be counted. Can assume only fixed values with no intermediate values. Example: number of children, shoe size.

Continuous: Can be measured. Can assume any value in an interval. Example: height, duration, temperature, expenses.

Qualitative (non-numerical): Can be placed into distinct categories, according to some characteristic or attribute. Often referred to as categorical variables. Example: gender, race, colour.

Levels of Measurement

In addition to being classified as qualitative or quantitative, variables can be classified by how they are categorised, counted or measured. This type of classification uses measurement scales, of which there are four types: nominal, ordinal, interval and ratio. The types of variables and their relation to the levels of measurement are summarised in the following chart.

Variables

Qualitative: Nominal, Ordinal

Quantitative (Discrete or Continuous): Interval, Ratio

The levels of measurement of variables are briefly explained in the following table.

LEVEL DESCRIPTION

Nominal
Data are qualitative.
No inherent order between categories (we cannot say that one particular category is better than another).
The lowest of the four ways to characterise data.
Deals with names, categories or labels.
Example: blood group, gender, yes or no response to a survey, favourite breakfast food.

Ordinal
Data are qualitative.
Data can be ordered.
There are no meaningful differences between the data ranks (the difference between two ranks of an ordinal scale cannot be assumed to be the same as the difference between two other ranks).
The next level after nominal.
Example: ranks (1st, 2nd, 3rd), ranks (Good, Better, Best), Likert scale (Strongly Agree/ Agree/ Neutral/ Disagree/ Strongly Disagree), examination grades, size of T-shirt.

Interval
Data are quantitative.
Data can be ranked.
No true 0 (“0” does not mean absence of the quantity being measured).
The next level after ordinal.
Example: temperature.
Temperature does not have a true 0 point even if one of the scaled values happens to carry the name "zero". The Celsius scale illustrates the issue: 0℃ does not represent the complete absence of temperature (the absence of any molecular kinetic energy).
Other examples: IQ, date when measured from an arbitrary epoch (AD, BC), direction measured in degrees from true or magnetic north.

Ratio
Data are quantitative.
0 is meaningful (“0” indicates absence of the quantity being measured).
The highest level of measurement.
Example: amount of money.
Money is measured on a ratio scale because, in addition to having the properties of an interval scale, it has a true 0 point: if you have 0 money, this implies the absence of money. Since money has a true 0 point, it makes sense to say that someone with RM50 has twice as much money as someone with RM25.
Other examples: weight, height, time taken.
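The difference between an interval scale and a ratio scale can be checked with a little arithmetic. The sketch below is only an illustration: the two Celsius readings are made-up values, and the helper name celsius_to_kelvin is introduced here for the example. It shows that a ratio of Celsius readings is not meaningful, whereas a ratio of amounts of money is.

```python
# Interval scale: Celsius has an arbitrary zero, so ratios of readings mislead.
# Ratio scale: money (and kelvin) has a true zero, so ratios are meaningful.

def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to kelvin, whose zero is absolute zero."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0          # hypothetical readings in degrees Celsius

naive_ratio = warm_c / cool_c        # 2.0, but 20 C is not "twice as hot" as 10 C
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # about 1.035

money_ratio = 50 / 25                # RM50 really is twice as much as RM25

print(naive_ratio, round(true_ratio, 3), money_ratio)
```

Subtracting the two Celsius readings (a difference of 10 degrees) is still meaningful, which is what separates an interval scale from an ordinal one.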

Example 4

State whether the following are nominal, ordinal, interval or ratio data.

(a) A Statistics test taken by students is classified as either easy, difficult or very difficult, and these alternatives are coded 1, 2 and 3.

(b) The IQ scores of 300 MENSA members in Malaysia recorded upon signing up.

(c) The platelet count of dengue patients at a hospital recorded within three days of admission.

(d) The make (brand) of cars reviewed by a newspaper columnist in a year.

(e) A list of temperatures in degrees Kelvin for the month of May compiled by a meteorologist.

(f) The most expensive cars for the year 2015 listed by a car magazine.

Data Collection and Sampling Techniques

Sampling is the process of selecting a number of subjects for a study in such a way that the subjects

represent the larger group from which they were selected. The reason for conducting a sample survey is

to estimate the value of some attribute of a population. The true value of a population attribute is called a

population parameter. A sample statistic which is obtained from sample data is used as an estimate of a

population parameter.

The quality of a sample statistic (i.e., accuracy, precision, representativeness) is strongly affected by the

way that sample elements are chosen, that is, by the sampling method.
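As a rough illustration of the distinction between a population parameter and a sample statistic, the sketch below builds a hypothetical population of 1,000 ages (randomly generated values, not real data) and compares the population mean with the mean of a simple random sample of 50. The population size, sample size and seed are all assumptions chosen for the example.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population: ages of 1,000 people (made-up values).
population = [random.randint(18, 70) for _ in range(1000)]

population_mean = mean(population)       # population parameter (usually unknown)
sample = random.sample(population, 50)   # sample drawn without replacement
sample_mean = mean(sample)               # sample statistic, used as the estimate

print(f"parameter: {population_mean:.2f}  statistic: {sample_mean:.2f}")
```

Changing the seed gives a different sample and hence a different estimate, which is why the sampling method matters for the quality of the statistic.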

As a group, sampling methods fall into one of two categories.

Probability samples: Each subject in the population has a known (non-zero) chance of being chosen for the sample.

Non-probability samples: The probability that each subject in the population will be chosen is unknown, and/or it cannot be determined that each subject in the population has a non-zero chance of being chosen.

In this syllabus, we will only discuss probability sampling methods. The key benefit of probability sampling methods is that every subject's chance of selection is known, which makes the sample likely to be representative of the population and allows the sampling error to be estimated. This supports valid statistical conclusions.

The main types of probability sampling methods are simple random sampling, systematic sampling,

stratified sampling, and cluster sampling. These types of probability sampling methods are explained in

the table below.

METHOD DESCRIPTION

Simple random sampling
A basic type of sampling which can be a component of other more complicated sampling methods.
Every subject in the population has an equal and known chance of being selected.
Subjects are selected by random numbers (random numbers can be generated by using MS Excel).
It is free of classification error and requires minimum advance knowledge of the population. Its simplicity also makes it relatively easy to interpret data collected in this manner.
Best suits situations where not much information is available about the population and data collection can be efficiently conducted on randomly distributed items, or where the cost of sampling is small enough to make efficiency less important than simplicity.

Systematic sampling
Also called an Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population subjects.
As long as the list does not contain any hidden order, this sampling method is as good as the simple random sampling method.
Its only advantage over the simple random sampling technique is simplicity.
Frequently used to select a specified number of records from a computer file.

Stratified sampling
A commonly used probability method that can be superior to simple random sampling because it can reduce sampling error.
A stratum is a subset of the population that shares at least one common characteristic. Examples of strata might be males and females, juniors and seniors in a university, managers and non-managers, or groups based on geography (north, south, east, west).
The relevant strata and their actual representation in the population are identified. Simple random sampling is then used to select a sufficient number of subjects from each stratum. ("Sufficient" refers to a sample size large enough for us to be reasonably confident that the stratum represents the population.)
Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.

Cluster sampling
The population is divided into groups, called clusters.
A number of clusters to be included in the sample is selected using a probability sampling method (usually simple random sampling).
Each subject of the population can be assigned to one, and only one, cluster.
Only subjects within sampled clusters are surveyed.
There are two types of cluster sampling method:
One-stage sampling: all of the subjects within selected clusters are included in the sample.
Two-stage sampling: a subset of subjects within selected clusters is randomly selected for inclusion in the sample.
The main disadvantage of cluster sampling is that it generally provides less precision than either simple random sampling or stratified sampling.
Cluster sampling should only be used when it is economically justified, that is, when the reduced cost can be used to overcome the losses in precision.
One version of cluster sampling is area cluster sampling or geographical cluster sampling.
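Simple random sampling and systematic sampling, described earlier in the table, can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the function names, the list of 100 numbered records and the random starting point used for the systematic sample are all assumptions made for the example (the notes only say that every Nth record is selected).

```python
import random

def simple_random_sample(population, n, seed=None):
    """Select n subjects so that every subject has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Select every k-th record after a random start within the first k records."""
    k = len(population) // n          # sampling interval (the "N" in "Nth record")
    rng = random.Random(seed)
    start = rng.randrange(k)          # assumed: random start between 0 and k - 1
    return [population[start + i * k] for i in range(n)]

records = list(range(1, 101))         # hypothetical list of 100 numbered records

srs = simple_random_sample(records, 10, seed=0)
sys_sample = systematic_sample(records, 10, seed=0)

print(sorted(srs))
print(sys_sample)
```

The systematic sample is evenly spaced through the list, which is why a hidden periodic order in the list can bias it.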

The difference between stratified sampling and cluster sampling methods:

Stratified sampling: the sample includes elements from each stratum.

Cluster sampling: the sample includes elements only from sampled clusters.
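The contrast between the two methods can be seen in a small sketch. The four geographic groups and their subject IDs below are hypothetical, and the function names are introduced only for this illustration. The same groups are treated first as strata (a few subjects from every group) and then as clusters (every subject from a few randomly chosen groups, i.e. one-stage cluster sampling).

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum subjects from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep all of their subjects."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population of 80 subject IDs split into four groups of 20.
groups = {
    "north": list(range(0, 20)),
    "south": list(range(20, 40)),
    "east": list(range(40, 60)),
    "west": list(range(60, 80)),
}

strat = stratified_sample(groups, 5, seed=0)       # 5 subjects from each group
clust = one_stage_cluster_sample(groups, 2, seed=0)  # all subjects from 2 groups

print({name: len(ids) for name, ids in strat.items()})
print({name: len(ids) for name, ids in clust.items()})
```

Every group appears in the stratified sample, while only the selected groups appear in the cluster sample.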

(please note additional example)

Example 5

Classify each sample as random, systematic, stratified or cluster.

(a) Every tenth car owner using the valet service of a shopping mall in Kuantan is asked to rate the service.

(b) Employees of three oil and gas companies are selected using random numbers to determine annual

salaries.

(c) In a large school district, teachers from nine schools are interviewed to determine if they believe the

newly implemented school-based assessment system has been more effective than the old system.

(d) Students in a university are divided into six groups according to their gender and according to

whether they drive or take public transport to campus. Then 10 students are selected from each group

and interviewed to determine how long they take to come to class every day.

(e) Every 100th cupcake baked is checked to determine its trans-fat content.

Solution

Example 6

An auto analyst is conducting a satisfaction survey, sampling from a list of 10,000 consumers. The list

includes 2,500 Proton buyers, 2,500 Perodua buyers, 2,500 Honda buyers, and 2,500 Toyota buyers. The

analyst selects a sample of 400 car buyers, by randomly sampling 100 buyers of each brand.

Is this an example of simple random sampling? Explain.

(please use this data set instead)

Example 4

The ages of owners in a new residential area are shown below. Construct a frequency distribution for the

data using seven equal classes. Compute the relative frequency and cumulative relative frequency of each

class.

41 54 47 40 39 35 50 37 49 42 70 32 44 52 39 50 40 30 34 69 39 45 33 42 44 63 60 27 42 34 50 42 52 38 36 45 35 43 48 46 31 27 55 63 46 33 60 62 45 56 45 34 53 50 50

Solution
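One possible way to carry out the tabulation is sketched below in Python. The class width rule used here (range divided by the number of classes, rounded up) and the choice to start the first class at the smallest value are assumptions; the course text may follow a different convention, in which case the class limits would change.

```python
import math

ages = [41, 54, 47, 40, 39, 35, 50, 37, 49, 42,
        70, 32, 44, 52, 39, 50, 40, 30, 34, 69,
        39, 45, 33, 42, 44, 63, 60, 27, 42, 34,
        50, 42, 52, 38, 36, 45, 35, 43, 48, 46,
        31, 27, 55, 63, 46, 33, 60, 62, 45, 56,
        45, 34, 53, 50, 50]

num_classes = 7
low, high = min(ages), max(ages)
width = math.ceil((high - low) / num_classes)   # assumed rule: round the width up

rows = []
cumulative = 0
for i in range(num_classes):
    lower = low + i * width
    upper = lower + width - 1                   # class limits for whole-number ages
    freq = sum(lower <= age <= upper for age in ages)
    cumulative += freq
    rows.append((lower, upper, freq, freq / len(ages), cumulative / len(ages)))

print("class   freq  rel.freq  cum.rel.freq")
for lower, upper, freq, rel, cum_rel in rows:
    print(f"{lower}-{upper}  {freq:>4}  {rel:8.3f}  {cum_rel:12.3f}")
```

The relative frequencies sum to 1, so the last cumulative relative frequency should equal 1, which is a quick check on the tabulation.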

(please use these questions instead)

Exercise 1.2

The expenses (in RM) per visit of patients to a cardiologist’s clinic are tabled below. Construct

a frequency distribution using seven classes. Hence, draw a histogram, a frequency polygon

and an ogive for the data.

130 190 140 80 100 120 220 220 110 100

210 130 100 90 210 120 200 120 180 120

190 210 120 200 130 180 260 270 100 160

190 240 80 120 90 190 200 210 190 180

115 210 110 225 190 130

(a) From the histogram, give the class containing the cardiologist’s fee that most patients pay.

(b) From the ogive, determine the middle value of the cardiologist’s fee.

A study was conducted to find out if part time jobs affect the academic performance of

secondary school and university students in Malaysia. The pie chart below gives a breakdown

of part time jobs that Malaysian students do.

(a) Are there any part time jobs that involve more than 25% of the students?

(b) Which two part time jobs appear to have the closest percentages of student involvement?

[Pie chart: part time jobs of Malaysian students. Categories: Construction, Small Business, Cashier, Food Stall, Workshop, Factory, Cinema.]

Finding quartiles for an odd data set:

Firstly, arrange the data in ascending order:

3, 5, 7, 8, 12, 13, 14, 18, 21

Lower half: 3, 5, 7, 8. Upper half: 13, 14, 18, 21.

Median = $Q_2 = 12$

$Q_1$ = median of lower half = $\frac{5 + 7}{2} = 6$

$Q_3$ = median of upper half = $\frac{14 + 18}{2} = 16$

Finding quartiles for an even data set:

Firstly, arrange the data in ascending order:

3, 5, 7, 8, 12, 13, 14, 17, 18, 21

Lower half: 3, 5, 7, 8, 12. Upper half: 13, 14, 17, 18, 21.

Median = $Q_2 = \frac{12 + 13}{2} = 12.5$

$Q_1$ = median of lower half = 7

$Q_3$ = median of upper half = 17
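The median-of-halves procedure used in the two cases above can be written as a short Python function. This is a sketch of that procedure only; the function name quartiles is introduced for the example, and other quartile conventions (for instance those used by spreadsheet software) can give different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    With an odd number of values, the overall median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(data), median(upper)

odd_set = [3, 5, 7, 8, 12, 13, 14, 18, 21]
even_set = [3, 5, 7, 8, 12, 13, 14, 17, 18, 21]

print(quartiles(odd_set))    # (6.0, 12, 16.0)
print(quartiles(even_set))   # (7, 12.5, 17)
```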

(please change example 12 (b))

Example 12

(a) Find 𝑃33 for Example 10(a)

(b) Find 𝑃60 for Example 10(c)

Solution

(Exercise 1.3 No 5 please use this question instead)

The data below represent the scores of a Placement Test for a group of pre-university students:

SCORE FREQUENCY

196.5 – 217.5 5

217.5 – 238.5 17

238.5 – 259.5 22

259.5 – 280.5 48

280.5 – 301.5 22

301.5 – 322.5 6

(a) Find the mean, median and standard deviation.

(b) Compute Pearson’s coefficient of skewness and hence comment on the skewness of the distribution.

(c) Construct a percentile graph (use graph paper). Then, find:

i) the number of students who scored 270 and higher.

ii) the percentage of students who obtain 250 to 300 marks.
