7/31/2019 Onboarding Training - I (1)
Introduction to Statistics - I
Mari Sudha
-
Outline
Glossary
Levels of Measurement
Sampling
Organizing Data
Statistics
-
Glossary
Population: Group of individuals under study.
Sample: A finite subset of statistical individuals in a population.
Parameter: A value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic. For example, the population mean is a parameter that is often used to indicate the average value of a quantity. Denoted by Greek letters, e.g., μ, σ.
Statistic: A quantity that is calculated from a sample of data. It is possible to draw more than one sample from the same population, so the value of a statistic will in general vary from sample to sample. Often assigned Roman letters (e.g., m and s).
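The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population is synthetic, and the mean of 50,000, the spread of 12,000, the sample size of 100, and the seed are all made-up assumptions.

```python
import random
import statistics

# Hypothetical population: 10,000 synthetic values (not real data).
rng = random.Random(42)
population = [rng.gauss(50_000, 12_000) for _ in range(10_000)]

# Parameter: computed from the whole population (usually unknown in practice).
mu = statistics.fmean(population)

# Statistic: computed from a sample; it varies from sample to sample.
sample_means = []
for _ in range(5):
    sample = rng.sample(population, k=100)
    sample_means.append(statistics.fmean(sample))

print(f"population mean (parameter): {mu:.0f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.0f}")
```

Each of the five sample means lands near the population mean, but no two are expected to coincide.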
-
Glossary (cont.)
Sample Size: Number of individuals in a sample.
Population Frame: List of sampling units from which the sample is selected (directories, maps, registered voters, lists, etc.).
Statistical Inference: Makes use of information from a sample to draw conclusions (inferences) about the population from which the sample was taken.
Experiment: Any process or study which results in the collection of data, the outcome of which is unknown.
-
Glossary (cont.)
Random Process: An experiment, trial, or observation that can be repeated numerous times under the same conditions, the outcomes of which are independent and identically distributed. An outcome is in no way affected by any previous outcome and cannot be predicted with certainty.
Random Variable: A variable whose value results from a measurement on some type of random process, e.g., the tossing of a coin. Can be classified as either discrete (a random variable that may assume either a finite number of values or an infinite sequence of values) or continuous (a variable that may assume any numerical value in an interval or collection of intervals).
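A small simulation may make the discrete/continuous distinction concrete. The function names, the choice of 10 tosses, and the seed are illustrative assumptions, not part of the training material.

```python
import random

rng = random.Random(0)

def heads_in_tosses(n_tosses=10):
    """Discrete random variable: number of heads in n tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))

def uniform_draw():
    """Continuous random variable: any value in the interval [0, 1)."""
    return rng.random()

heads = [heads_in_tosses() for _ in range(5)]
draws = [uniform_draw() for _ in range(5)]

print("heads per 10 tosses:", heads)  # whole numbers between 0 and 10
print("uniform draws:", draws)        # real numbers in [0, 1)
```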
Independent Variables: Variables that are manipulated and whose effects are measured and compared; also known as treatments; may include price levels, advertising themes, etc.
-
Glossary (cont.)
Experimental/Test Unit: Individuals, organizations, or other entities whose response to the independent variables or treatments is examined; may include consumers, stores, or geographic areas.
Dependent Variables: Variables that measure the effect of the independent variables on the test units; may include sales, profits, and market share.
Extraneous Variables: Variables other than the independent variables that affect the response of the test units; they can confound the dependent variable measures in a way that weakens or invalidates the results of the experiment. Examples include store size, store location, and competitive effort.
Raw Data: Data collected in original form.
-
Glossary (cont.)
Frequency: The number of times a given value or class occurs in the data.
Frequency Distribution: The organization of raw data in table form with classes and frequencies.
-
Measurement Scales
Variables differ in how well they can be measured, i.e., in how much measurable information their measurement scale can provide.
There is some measurement error involved in every measurement, which limits the amount of information that we can obtain.
Another factor that determines the amount of information that can be provided by a variable is its type/level of measurement scale.
-
Levels of Measurement
Data obtained from measurement are classified using numbers (in order to determine the way we are going to measure the variables).
Classification can be done with different levels of precision, or levels of measurement.
It is important to know the level of measurement (LOM) we are working with, because it partly determines the arithmetic and statistical operations that can be carried out on the data.
-
Levels of Measurement (cont.)
There are four levels of measurement.
In ascending order of precision, they are:
- Nominal
- Ordinal
- Interval
- Ratio
-
Levels of Measurement (cont.): Nominal
Numbers are used only to classify data; words or letters would be equally appropriate.
Variables assessed on a nominal scale are called categorical variables.
Examples include
- Religion (Protestant, Catholic, Jewish, Buddhist, etc.)
- Race (Caucasian, African-American, Hispanic, Asian, etc)
- Linguistic Group
- Marital Status (Married, Single, Divorced)
- Credit Card Numbers, Bank Account Numbers, Employee ID
-
Nominal (cont.)
Simple and widely used when the relationship between two variables is to be studied.
Nominal scale numbers are no more than labels; they are used specifically to identify different categories of responses.
E.g.,
What is your gender?
[ ] Male
[ ] Female
-
Nominal (cont.)
E.g., a survey of retail stores done on two dimensions: way of maintaining stocks and daily turnover.
How do you stock items at present?
[ ] By product category
[ ] At a centralized store
[ ] Department wise
[ ] Single warehouse
Daily turnover of consumers is?
[ ] Between 100 and 200
[ ] Between 200 and 300
[ ] Above 300
-
Levels of Measurement (cont.): Ordinal
Simplest attitude-measuring scale used in Marketing Research.
Values given to measurements can be ordered.
There is a rough quantitative sense to their measurement, but the differences between scores are not necessarily equal.
Example: shoe size
Shoes are assigned a number to represent the size, and larger numbers mean bigger shoes (an ordered relationship between numbered items). We know that a shoe size of 8 is bigger than a shoe size of 4. What you can't say, though, is that a shoe size of 8 is twice as big as a shoe size of 4.
-
Ordinal (cont.)
E.g., results of a horse race, which say only which horses arrived first, second, third, etc. but include no information about times.
Textual labels can be used instead of numbers to represent the category responses.
-
Ordinal (cont.)
E.g.1, Rank the following attributes (1-5) on their importance in a microwave oven:
1. Company Name
2. Functions
3. Price
4. Comfort
5. Design
The most important attribute is ranked 1 by the respondents and the least important is ranked 5. Instead of numbers, letters or symbols can also be used to rate on an ordinal scale. Such a scale makes no attempt to measure the degree of favorability of different rankings.
-
Ordinal (cont.)
If there are 4 different types of fertilizers and they are ordered on the basis of quality as Grade A, Grade B, Grade C, and Grade D, this is again an ordinal scale.
If there are 5 different brands of talcum powder and a respondent ranks them based on, say, freshness, with Rank 1 having maximum freshness, Rank 2 the second maximum freshness, and so on, an ordinal scale results.
-
Levels of Measurement (cont.): Interval
Measurements are classified and ordered, with equal distances between each interval on the scale (right along the scale, from the low end to the high end).
Does not have an absolute zero; zero is arbitrary, with further numbers placed at equal intervals.
Also termed Rating Scales.
E.g., temperature in centigrade: the distance between 96 °C and 98 °C is the same as between 100 °C and 102 °C. A measurement of 100 °C does not mean that the temperature is 10 times hotter than something measuring 10 °C, even though the value given on the scale is 10 times as large.
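The temperature example can be checked numerically. The sketch below assumes the standard Celsius-to-Kelvin offset of 273.15 and uses Kelvin only as an example of a scale with a true zero; the helper name is illustrative.

```python
def celsius_to_kelvin(celsius):
    """Shift Celsius to Kelvin, a scale whose zero is not arbitrary."""
    return celsius + 273.15

# Differences are meaningful on an interval scale: both gaps are 2 degrees.
diff_low = 98 - 96
diff_high = 102 - 100

# Ratios are not: the "10 times" disappears once the arbitrary zero is removed.
ratio_celsius = 100 / 10
ratio_kelvin = celsius_to_kelvin(100) / celsius_to_kelvin(10)

print(diff_low, diff_high)      # 2 2
print(ratio_celsius)            # 10.0
print(round(ratio_kelvin, 2))   # 1.32
```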
-
Levels of Measurement (cont.): Interval
E.g., How do you rate your present refrigerator for the following qualities?

Company Name           Less Known          1 2 3 4 5   Well Known
Functions              Few                 1 2 3 4 5   Many
Price                  Low                 1 2 3 4 5   High
Design                 Poor                1 2 3 4 5   Good
Overall Satisfaction   Very Dissatisfied   1 2 3 4 5   Very Satisfied

Tells us that position 5 on the scale is above position 4, and also that the distance from 5 to 4 is the same as the distance from 4 to 3.
Does not permit the conclusion that position 4 is twice as strong as position 2, because no zero position has been established.
-
Interval (cont.)
E.g.2, Calendar years are an interval scale. The arbitrary 0 (or 1, depending on your viewpoint) was assigned when Christ was born, and time before this is labeled BC.
E.g.3, The difference between values of the following is measured by a fixed scale:
- Money
- People
- Education (in years)
-
Levels of Measurement (cont.): Ratio
Has a natural zero point, and further numbers are placed at equally appearing intervals.
Values given to measurements can be ordered.
Divisions between the points on the scale have the same distance between them, and numbers on the scale are ranked according to size.
Not widely used in Marketing Research unless a base value is available for comparison.
Examples include scales for measuring physical quantities like length, weight, etc.
-
Ratio (cont.)
Data on certain demographic or descriptive attributes, if they are obtained through open-ended questions, will have ratio-scale properties. E.g.,
What is your annual income before taxes? $ ______
How far is the theater from your home? ______ miles
Answers to these questions have a natural, unambiguous starting point, namely zero. Since the starting point is not chosen arbitrarily, computing and interpreting ratios makes sense. For example, we can say that a respondent with an annual income of $40,000 earns twice as much as one with an annual income of $20,000.
-
Levels of Measurement (cont.)
Nominal: The mode is frequently used to summarize the response categories.
Ordinal: The central tendency can be represented by its mode or its median, but the mean cannot be defined.
Interval: The central tendency can be represented by its mode, its median, or its arithmetic mean. Statistical dispersion can be measured by range, inter-quartile range, and standard deviation.
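As a rough illustration of which summary fits which level, the sketch below uses Python's standard `statistics` module on three small made-up data sets; the values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical data at three levels of measurement.
marital_status = ["Married", "Single", "Married", "Divorced", "Married"]  # nominal
importance_ranks = [1, 3, 2, 5, 3, 4, 3]                                  # ordinal
temperatures_c = [96.0, 98.0, 100.0, 102.0]                               # interval

# Nominal: only the mode is meaningful.
mode_status = statistics.mode(marital_status)

# Ordinal: the median (and mode) are meaningful; the mean is not defined.
median_rank = statistics.median(importance_ranks)

# Interval: the mean and standard deviation become meaningful as well.
mean_temp = statistics.mean(temperatures_c)
spread_temp = statistics.stdev(temperatures_c)

print(mode_status)            # Married
print(median_rank)            # 3
print(mean_temp)              # 99.0
print(round(spread_temp, 3))  # 2.582
```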
-
Levels of Measurement (cont.)
Scale Type: Nominal (also denoted as categorical or discrete)
- Permissible Statistics: Mode, chi-square
- Admissible Scale Transformation: One-to-one (equality (=))
- Mathematical Structure: Standard set structure (unordered)
Scale Type: Ordinal
- Permissible Statistics: Median, percentile
- Admissible Scale Transformation: Monotonic increasing (order(
-
Levels of Measurement (cont.)
OK to compute                                          Nominal  Ordinal  Interval  Ratio
Frequency Distribution                                 Yes      Yes      Yes       Yes
Median and Percentiles                                 No       Yes      Yes       Yes
Add or Subtract                                        No       No       Yes       Yes
Mean, Standard Deviation, Standard Error of the Mean   No       No       Yes       Yes
Ratio or Coefficient of Variation                      No       No       No        Yes
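The table above can be encoded as a small lookup, which is one way to guard an analysis script against applying an operation to the wrong kind of variable. The names `LEVELS`, `MINIMUM_LEVEL`, and `is_ok_to_compute` are illustrative assumptions, not an existing library API.

```python
# Levels in ascending order of precision.
LEVELS = ("nominal", "ordinal", "interval", "ratio")

# Lowest level at which each operation is marked "Yes" in the table.
MINIMUM_LEVEL = {
    "frequency distribution": "nominal",
    "median and percentiles": "ordinal",
    "add or subtract": "interval",
    "mean, standard deviation, standard error of the mean": "interval",
    "ratio or coefficient of variation": "ratio",
}

def is_ok_to_compute(operation, level):
    """Return True if the operation is permitted at the given level."""
    return LEVELS.index(level) >= LEVELS.index(MINIMUM_LEVEL[operation])

print(is_ok_to_compute("median and percentiles", "nominal"))  # False
print(is_ok_to_compute("add or subtract", "interval"))        # True
```

Because every row of the table switches from "No" to "Yes" exactly once, storing only the minimum level per operation is enough to reproduce it.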
-
Sampling
Depends upon the nature of the data and the type of enquiry.
Procedure for selecting a sample:
- Decide on the target population/audience
- Identify the population frame
- Select the sampling procedure/technique
- Decide the sample size
- Execute the sampling process (select the sample individuals)
The technique for selecting a sample can be broadly classified under three heads:
- Non-Probability Sampling
- Probability Sampling
- Mixed Sampling
-
Sampling (cont.)
Non-Probability Sampling
- Not every individual in the population has an equal chance of being selected
- Suffers from the drawbacks of favoritism and nepotism, depending upon the beliefs and prejudices of the investigator
- Statistically valid statements cannot be made about the precision of the estimates (i.e., predictive value is weak)
- Methods of Non-Probability Sampling:
1. Convenience Sampling
2. Judgment Sampling
3. Quota Sampling
4. Snowball Sampling
-
Sampling (cont.)
Mixed Sampling
- Samples are selected partly according to some laws of chance and partly according to a fixed sampling rule (with no assignment of probabilities)
-
Sampling (cont.)
Probability Sampling
- Every individual in the population has a definite, pre-assigned probability of being selected
- Different types of Probability Sampling:
I. Each individual has an equal chance of being selected
II. Sampling units have different probabilities of being selected
III. Probability of selection of an individual is proportional to the sample size
- Forms of Probability Sampling:
I. Simple Random Sampling
II. Stratified Simple Random Sampling
III. Systematic Sampling
IV. Cluster Sampling (simple and multistage)
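Three of the forms listed above can be sketched with the standard `random` module. This is a minimal illustration under stated assumptions: the population frame is a hypothetical list of 100 unit IDs, the two strata are an arbitrary split of that frame, and the function names are made up for this example. Cluster sampling is left out for brevity.

```python
import random

rng = random.Random(7)
frame = list(range(1, 101))  # hypothetical population frame: unit IDs 1..100

def simple_random_sample(frame, n):
    """Every unit has the same chance of selection; no unit is drawn twice."""
    return rng.sample(frame, n)

def systematic_sample(frame, n):
    """Pick a random start, then take every k-th unit, where k = len(frame) // n."""
    step = len(frame) // n
    start = rng.randrange(step)
    return frame[start::step][:n]

def stratified_sample(strata, n_per_stratum):
    """Draw a simple random sample separately within each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

strata = {"low": frame[:50], "high": frame[50:]}  # arbitrary illustrative strata

srs = simple_random_sample(frame, 10)
sys_sample = systematic_sample(frame, 10)
strat = stratified_sample(strata, 5)

print(srs)
print(sys_sample)
print(strat)
```

The systematic sample always shows a constant gap between consecutive IDs, while the stratified sample guarantees that both strata are represented.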
-
Organizing Data
The first step in the analysis of the data is organizing the collected numbers.
A frequency distribution is a tool for organizing data.
The first step in drawing a frequency distribution is to construct a frequency table.
A frequency table is a way of organizing the data by listing every possible score (including those not actually obtained in the sample) as a column of numbers and the frequency of occurrence of each score as another.
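A frequency table of this kind can be built with `collections.Counter`. The scores below are hypothetical ratings on a 1-5 scale; note that the score 4 never occurs but is still listed with a frequency of 0, as the definition requires.

```python
from collections import Counter

scores = [3, 5, 2, 5, 3, 5, 1, 3, 5]  # hypothetical raw data
possible_scores = range(1, 6)         # every possible score, observed or not

counts = Counter(scores)              # a missing key counts as 0
frequency_table = [(score, counts[score]) for score in possible_scores]

print("Score  Frequency")
for score, frequency in frequency_table:
    print(f"{score:<6} {frequency}")
```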
-
Organizing Data (cont.): Frequency Distribution
Contingency Table
- Frequency tables of two variables presented simultaneously
Information contained in the frequency table may be transformed to a graphical or pictorial form, like:
I. Histograms
II. Absolute Frequency Polygons
III. Relative Frequency Polygons
IV. Absolute Cumulative Frequency Polygons
V. Relative Cumulative Frequency Polygons
VI. Box Plots
VII. Pie Charts, etc.
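The sketch below builds a contingency table for two survey variables and then derives the relative and cumulative relative frequencies that the polygons listed above would plot. The six responses are made up, loosely following the retail store survey used earlier in the training; no plotting library is assumed.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical responses: (stocking method, daily turnover band).
responses = [
    ("By product category", "100-200"),
    ("Department wise", "200-300"),
    ("By product category", "Above 300"),
    ("Single warehouse", "100-200"),
    ("By product category", "100-200"),
    ("Department wise", "100-200"),
]

# Contingency table: frequencies of two variables presented simultaneously.
cell_counts = Counter(responses)
rows = sorted({method for method, _ in responses})
cols = ["100-200", "200-300", "Above 300"]
table = {row: [cell_counts[(row, col)] for col in cols] for row in rows}

# Frequencies for the turnover variable alone (the column totals).
absolute = [sum(table[row][i] for row in rows) for i in range(len(cols))]
relative = [count / len(responses) for count in absolute]
cumulative_relative = list(accumulate(relative))

for row in rows:
    print(f"{row:<22}", table[row])
print("absolute:", absolute)
print("cumulative relative:", [round(x, 2) for x in cumulative_relative])
```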
-
Data Analysis
Data must be accurately scored and systematically organized to facilitate data analysis.
The steps in the analysis of the data include:
I. Scoring: assigning a total to each participant's instrument
II. Tabulating: the mechanics of organizing the data
III. Coding: assigning numerals (e.g., ID) to data
IV. Performing both the initial and more detailed analysis
-
Statistics
Descriptive Statistics: Gives numerical and graphic procedures to summarize a collection of data in a clear and understandable way.
Inferential Statistics: Provides procedures to draw inferences about a population from a sample.
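The two branches can be contrasted on one small sample. In this sketch the ten values are hypothetical, and the interval uses a normal approximation purely for simplicity; with a sample this small, a t-based interval would usually be preferred.

```python
import math
import statistics

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical observations

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: use the sample to say something about the population.
standard_error = sample_sd / math.sqrt(len(sample))
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"mean = {sample_mean}, sd = {sample_sd:.3f}")
print(f"approximate 95% interval for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```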
-
"An unsophisticated forecaster uses statistics as a drunken man uses lamp-posts - for support rather than for illumination."
~ Andrew Lang