amr concept note-1 (freq dist, cross tab, t-test and anova)

ADVANCED MARKETING RESEARCH Concept Note-I

Dr. VIKAS GOYAL

For class circulation only


• A frequency distribution is a convenient way of looking at the consolidated values of a variable.

• In a frequency distribution, one variable is considered at a time.

• A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.

• A frequency distribution is a tool for organizing data. We use it to group data into categories and show the number of observations in each category.

• A frequency distribution also indicates the extent of valid responses.

• A frequency distribution is a mathematical distribution whose objective is to obtain a count of the number of responses associated with different values of one variable and to express these counts in percentage terms.

• A supplementary extension of the frequency distribution is to look at the percentage of values that fall in each category. This is called a relative frequency distribution or percent frequency distribution.

• The frequency data may be used to construct a histogram, or vertical bar chart, in which the values of the variable are portrayed along the X-axis and the absolute or relative frequencies of the values are placed along the Y-axis.
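The frequency table described above (counts, percentages, and cumulative percentages over the valid responses) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `frequency_table`, the use of `None` to mark a missing response, and the rating data are all assumptions made for the example.

```python
from collections import Counter

def frequency_table(values):
    """Return (rows, n_valid, n_missing); each row is
    (value, count, percent, cumulative percent) over valid responses."""
    # Assumption: None marks a missing (invalid) response
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    n = len(valid)
    rows, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / n, 100 * cumulative / n))
    return rows, n, len(values) - n

# Hypothetical responses on a 5-point scale, one of them missing
ratings = [1, 2, 2, 3, 3, 3, 4, 4, None, 5]
rows, n_valid, n_missing = frequency_table(ratings)
for value, count, pct, cum_pct in rows:
    print(f"{value:>5} {count:>5} {pct:>7.1f} {cum_pct:>7.1f}")
```

Percentages here are computed on valid responses only, which is why the count of missing responses is returned separately.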

• A frequency distribution can help in assessing the following three characteristics of any variable:

o Measures of Location: Central tendency (Mean, Median, Mode)

o Measures of Variability: Range, Variance and Standard Deviation (s), Coefficient of Variation (s/mean)

o Measures of Shape: Skewness, Kurtosis (excess kurtosis is zero for a normal distribution)

• Variance is the mean squared deviation of all the values from the mean.

• Standard deviation is the square root of the variance.

• The coefficient of variation expresses the standard deviation as a percentage of the mean; it is a useful measure in sampling theory.

• Skewness and kurtosis provide an idea of the shape of the distribution.

• Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. With the usual convention of subtracting 3 (excess kurtosis), the kurtosis of a normal distribution is zero.

• Skewness is a characteristic of a distribution that assesses its symmetry about the mean.
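As a rough illustration of the measures of variability and shape listed above, the sketch below computes them directly from their definitions. The function name `describe` and the data are hypothetical. The variance uses the sample form (dividing by n - 1), while skewness and kurtosis use simple moment ratios without small-sample corrections, so statistical packages may report slightly different values.

```python
import math

def describe(values):
    """Location, variability, and shape measures for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance (n - 1 in the denominator)
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    sd = math.sqrt(variance)
    # Central moments (divided by n) used for the shape measures
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "range": max(values) - min(values),
        "variance": variance,
        "sd": sd,
        "cv": sd / mean,                     # coefficient of variation
        "skewness": m3 / m2 ** 1.5,          # 0 for a symmetric distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3  # 0 for a normal distribution
    }

stats = describe([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical data
print(stats)
```

A positive skewness, as in this example, indicates a longer tail to the right of the mean.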

• A null hypothesis is a statement of the status quo, one of no difference or no effect. The null hypothesis can either be rejected or fail to be rejected, but it cannot be accepted. The null hypothesis is always stated about population parameters rather than sample statistics.

• An alternative hypothesis is one in which some difference or effect is expected. Accepting the alternative hypothesis will lead to changes in opinions or actions.



• The test statistic measures how close the sample has come to the null hypothesis.

• The test statistic often follows a well-known distribution, such as the normal, t, or chi-square distribution.

• Type I error occurs when the sample results lead to the rejection of the null hypothesis when it is in fact true. The probability of type I error (α) is also called the level of significance.

• Type II error occurs when, based on the sample results, the null hypothesis is not rejected when it is in fact false. The probability of type II error is denoted by β.
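The two error probabilities can be written compactly as conditional probabilities. The last expression, the power of the test, is not mentioned in the note but follows directly from the definition of β.

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}), \qquad
\beta = P(\text{do not reject } H_0 \mid H_0 \text{ is false}), \qquad
\text{power} = 1 - \beta
```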

• Cross-tabulation is a statistical technique that describes two or more variables simultaneously and results in tables that reflect the joint distribution of two or more variables that have a limited number of categories or distinct values.

• A cross-tabulation is the merging of the frequency distributions of two or more variables in a single table.

• Cross-tabulation with two variables is also known as bivariate cross-tabulation.

• Cross-tabulation tables are also called contingency tables.

• The chi-square statistic (χ²) is used to test the statistical significance of the observed association in a cross-tabulation.

• The chi-square distribution is a skewed distribution whose shape depends solely on the number of degrees of freedom. As the number of degrees of freedom increases, the chi-square distribution becomes more symmetrical.

• Chi-square must be computed from raw frequency counts, not from percentages or ratios.

• As a rule of thumb, chi-square should not be calculated if the expected frequency in any cell is less than 5.

• The phi coefficient (φ) is used as a measure of the strength of association between the two variables, in the special case of a table with two rows and two columns (a 2 x 2 table).

• Cramér's V can be used to measure the strength of the relationship in a cross-tabulation of any table size.
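To make the cross-tabulation statistics concrete, here is a minimal sketch that computes the Pearson chi-square, its degrees of freedom, and Cramér's V from a table of observed counts. The function name `chi_square` and the 2 x 2 counts are hypothetical. For a 2 x 2 table, Cramér's V has the same magnitude as the phi coefficient.

```python
import math

def chi_square(table):
    """Pearson chi-square, degrees of freedom, and Cramér's V for a
    contingency table given as a list of rows of raw counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    min_dim = min(len(table), len(table[0])) - 1
    cramers_v = math.sqrt(chi2 / (n * min_dim))
    return chi2, df, cramers_v

# Hypothetical 2 x 2 table of raw counts
observed = [[30, 20],
            [10, 40]]
chi2, df, v = chi_square(observed)
print(round(chi2, 3), df, round(v, 3))
```

The computed chi-square is then compared with the critical value of the chi-square distribution for `df` degrees of freedom at the chosen level of significance.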

• t tests are conducted for examining hypotheses about means.

• A t test can be conducted on the mean of one sample or on the means of two samples of observations.

• t tests are used to draw inferences about the means of the parent populations.

• Parametric tests are hypothesis-testing procedures that assume that the variables of interest are measured on at least an interval scale.

• Nonparametric tests are hypothesis-testing procedures that assume that the variables are measured on a nominal or ordinal scale.

• A single-sample t-test is performed when we wish to test a hypothesis about the mean of a variable against an absolute number (say, that the mean height is greater than 4).
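The single-sample case can be sketched as follows: the t statistic is the difference between the sample mean and the hypothesized value, divided by the standard error of the mean. The function name `one_sample_t` and the measurements are hypothetical; the hypothesized value of 4 follows the example in the note.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

heights = [5, 6, 7, 8, 9]                  # hypothetical measurements
t, df = one_sample_t(heights, 4)
print(round(t, 3), df)
```

The resulting t is compared with the critical value of the t distribution with n - 1 degrees of freedom. The standard library has no t distribution, so that lookup is left to a table or a statistics package.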



• Two-sample t-tests test the difference in means for independent samples, paired samples, or overlapping samples.

• The two independent samples t-test allows researchers to evaluate the mean difference between two populations using the data from two separate samples.

• The two independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared.

• The paired samples t-test is used when the hypothesis compares the means of two different variables for a single population, without identifying separate groups. Thus, it is called the paired or related samples t-test. It characteristically comprises a sample of matched pairs, or one group of units that has been tested twice.
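The difference between the independent and paired cases shows up clearly in how the t statistic is formed. The sketch below is illustrative only: the function names and data are hypothetical, and the independent-samples version assumes equal population variances (the pooled-variance form), which the note does not state.

```python
import math
import statistics

def independent_t(a, b):
    """Pooled-variance t statistic for two independent samples
    (assumes equal population variances)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, na + nb - 2

def paired_t(first, second):
    """t statistic on the pairwise differences; both lists must list
    the same units in the same order."""
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Hypothetical data: two separate groups, then one group measured twice
t_ind, df_ind = independent_t([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
t_pair, df_pair = paired_t([10, 12, 9, 11], [12, 15, 10, 13])
print(round(t_ind, 3), df_ind, round(t_pair, 3), df_pair)
```

The paired test works on a single column of differences, so it has n - 1 degrees of freedom, whereas the independent test has n1 + n2 - 2.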

• The Kolmogorov-Smirnov (K-S) one-sample test is a nonparametric goodness-of-fit test that compares the cumulative distribution function for a variable with a specified distribution.

• The Mann-Whitney U test is a statistical test for a variable measured on an ordinal scale, comparing the difference in the location of two populations based on observations from two independent samples.

• The Wilcoxon matched-pairs signed-ranks test can be used as the nonparametric equivalent of the paired samples t-test.
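As one illustration of a rank-based test, the Mann-Whitney U statistic can be computed by ranking the pooled observations and summing the ranks of one sample. The sketch below uses average ranks for ties; the function name `mann_whitney_u` and the scores are hypothetical, and the significance lookup is not shown.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples, using average
    ranks for tied observations."""
    pooled = sorted(list(a) + list(b))
    # Map each distinct value to the average of the ranks it occupies
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    rank_sum_a = sum(rank_of[x] for x in a)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical ordinal scores from two independent groups
u_a, u_b = mann_whitney_u([3, 4, 2, 6], [5, 7, 4, 8])
print(u_a, u_b)
```

The smaller of the two U values is usually the one compared with the tabulated critical value.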

• Analysis of variance (ANOVA) and analysis of covariance (ANCOVA) are used for examining the differences in the mean values of the dependent variable associated with the effect of independent variables that have more than two categories, levels, or treatments. For ANOVA, the dependent variable needs to be metric and the independent variables need to be categorical.

• Analysis of variance (ANOVA) is a statistical technique for examining the differences among means for two or more populations.

• A treatment in ANOVA is a particular combination of factor levels or categories.

• One-way ANOVA is a technique in which there is only one factor or independent variable.

• SS_y is the total variation in Y, i.e. the total sum of squares. It is the sum of SS between groups and SS within groups.

• SS_between, also denoted SS_x, is the variation in Y related to the variation in the means of the different categories of X. It represents the variation between the categories of X, or the portion of the sum of squares in Y related to X.

• SS_within, also referred to as SS_error, is the variation in Y due to the variation within each of the categories of X.
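The decomposition of the total sum of squares into between-group and within-group parts can be checked numerically. The following is a minimal sketch with hypothetical data and a hypothetical function name, `one_way_anova`. It also returns the F ratio (mean square between divided by mean square within), the usual one-way ANOVA test statistic, which the note does not spell out.

```python
def one_way_anova(groups):
    """Return (SS_total, SS_between, SS_within, F) for a list of groups,
    each group being the Y values observed in one category of X."""
    all_values = [y for g in groups for y in g]
    n, c = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    # Variation of the category means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own category mean
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    f_ratio = (ss_between / (c - 1)) / (ss_within / (n - c))
    return ss_total, ss_between, ss_within, f_ratio

# Hypothetical Y values for three categories of X
ss_total, ss_between, ss_within, f_ratio = one_way_anova(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(ss_total, ss_between, ss_within, f_ratio)
```

In this example the between-group and within-group sums of squares add up to the total, as the definition of SS_y requires.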



• The strength of the effect of an individual X (independent variable or factor) on Y (dependent variable) is measured by eta-squared (η²). The value of η² varies between 0 and 1.
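In terms of the sums of squares defined earlier, eta-squared is the share of the total variation in Y that is related to X. Written out, using the same SS notation as the note:

```latex
\eta^2 = \frac{SS_x}{SS_y} = \frac{SS_y - SS_{error}}{SS_y}
```

A value near 0 means X explains almost none of the variation in Y, and a value near 1 means it explains almost all of it.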

• N-way ANOVA is a model in which two or more factors are involved.

• The strength of the relationship between an individual factor and the dependent variable can be measured using omega-squared (ω²).
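The note does not give the formula for omega-squared. One commonly used form for a factor X is shown below as a sketch, where df_x is the degrees of freedom of the factor and MS_error is the error mean square; these symbols are assumptions introduced here, not notation from the note.

```latex
\omega_x^2 = \frac{SS_x - (df_x \times MS_{error})}{SS_{total} + MS_{error}}
```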

• Analysis of covariance (ANCOVA) is an advanced analysis of variance procedure in which the effects of one or more metric-scaled independent variables are removed from the dependent variable before conducting the ANOVA. In ANCOVA, the metric independent variables are treated as covariates.

• The covariates are generally used as control variables in ANCOVA.

• Multivariate analysis of variance (MANOVA) is similar to analysis of variance (ANOVA), except that instead of one metric dependent variable, we have two or more.

[Figure: A Classification of Hypothesis Testing Procedures for Examining Differences]

[Figure: A Concept Map for Frequency Distribution]



[Figure: A Concept Map for Cross-Tabulation]

[Figure: A Concept Map for Conducting a t-Test]



[Figure: Relationship Amongst t-Test, Analysis of Variance, Analysis of Covariance, and Regression]

[Figure: A Concept Map for One-Way ANOVA]


