Introduction to Quantitative Data Analysis
Quantitative Data Analysis
- Types of Statistics
  - Descriptive
  - Inferential: probabilistic sampling techniques, notion of random
- Data Preparation (Coding & Cleaning Data)
- Common Ways of Presenting Statistics
  - Tables
  - Charts
  - Graphs
Presenting Data (Raw Data)
Regan, T. (1985). In search of sobriety: Identifying factors contributing to the recovery from alcoholism. Kentville, NS.
- univariate := one variable
- “raw count” (frequencies, percentages)
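A raw count for one variable can be tabulated in a few lines. The following is a minimal sketch, assuming a hypothetical list of survey responses; the `responses` data and the `frequency_table` helper are illustrative and do not come from the slides.

```python
from collections import Counter

# Hypothetical raw responses for one variable (not taken from the slides).
responses = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]


def frequency_table(values):
    """Return (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


table = frequency_table(responses)
for category, freq, pct in table:
    print(f"{category:<10}{freq:>4}{pct:>8.1f}")
```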
Simple Univariate Tables of Frequency Distributions and Percentages
Neuman (2000: 318)
Revision of Example: Collapsing Categories and Treatment of Missing Data in Tables
Johnson, A. G. (1977). Social Statistics Without Tears. Toronto: McGraw Hill.
Example: Raw Data Frequencies
Types of Missing Data
- Examples: non-response, don’t know, refusal, etc.
- Categories of missing data
  - Missing data completely at random (MCAR): equipment malfunction, illness, etc.
  - Missing data at random: can be explained by controlling for another variable
  - Missing data that is not random
Some techniques for dealing with missing data
- Omission (may involve using statistical techniques or logic to decide who to omit, e.g. add all like cases based on other responses)
- Imputation (guess at what the likely responses would be by comparing with other response patterns)
  - Match other characteristics
  - Distribute equally or use weighted responses
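The two "distribute" options can be sketched as follows. This is only one plausible reading of the bullet, under stated assumptions: the counts are illustrative (they mirror the alienation example used in these slides), and the helper names `distribute_equally` and `distribute_weighted` are hypothetical.

```python
# Illustrative counts of valid responses and of missing cases (assumed values).
valid = {"High": 30, "Medium": 100, "Low": 20}
n_missing = 60


def distribute_equally(counts, missing):
    """Give every category the same share of the missing cases."""
    share = missing / len(counts)
    return {cat: n + share for cat, n in counts.items()}


def distribute_weighted(counts, missing):
    """Give each category a share proportional to its observed frequency."""
    total = sum(counts.values())
    return {cat: n + missing * n / total for cat, n in counts.items()}


equal = distribute_equally(valid, n_missing)
weighted = distribute_weighted(valid, n_missing)
print(equal)
print(weighted)
```

Weighted distribution leaves the percentage distribution of the valid responses unchanged, whereas equal distribution shifts it toward the smaller categories.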
Treatment of Missing Data (Omission vs. Inclusion)
Table 5-1 Alienation of Workers
Level of Alienation      F      %
High                    30     14
Medium                 100     48
Low                     20     10
No Response             60     29
(Total)                210    100
Comparison of % distributions with and without non-respondents
Table 5-1 Alienation of Workers
Level of Alienation      F      %
High                    30     20
Medium                 100     67
Low                     20     13
(Total)                150    100
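The two percentage columns can be recomputed from the Table 5-1 frequencies. A minimal sketch; the function name `percent_distribution` and its `exclude` parameter are illustrative choices, not part of the source.

```python
# Frequencies from Table 5-1 (Alienation of Workers).
counts = {"High": 30, "Medium": 100, "Low": 20, "No Response": 60}


def percent_distribution(freqs, exclude=()):
    """Whole-number percentages, optionally dropping some categories first."""
    kept = {cat: n for cat, n in freqs.items() if cat not in exclude}
    total = sum(kept.values())
    return {cat: round(100 * n / total) for cat, n in kept.items()}


included = percent_distribution(counts)
eliminated = percent_distribution(counts, exclude=("No Response",))
print(included)
print(eliminated)
```

With non-respondents included the base is 210 and the rounded percentages sum to 101; with them eliminated the base drops to 150 and every remaining category's share rises.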
Comparison with high & medium alienation collapsed
Treatment of Missing Data & collapsing categories (creating new variables after data collection)
Table 5-1 Alienation of Workers (non-respondents included)
Level of Alienation      F      %
High & Medium          130     62
Low                     20     10
No Response             60     29
(Total)                210    100
Table 5-1 Alienation of Workers (non-respondents eliminated)
Level of Alienation      F      %
High & Medium          130     87
Low                     20     13
(Total)                150    100
Treatment of Missing Data
Comparison with medium & low collapsed
Table 5-1 Alienation of Workers (non-respondents included)
Level of Alienation      F      %
High                    30     14
Medium & Low           120     58
No Response             60     29
(Total)                210    100
Table 5-1 Alienation of Workers (non-respondents eliminated)
Level of Alienation      F      %
High                    30     20
Medium & Low           120     80
(Total)                150    100
Effects of Collapsing Response Categories
Comparison of two different ways of collapsing response categories
Table 5-1 Alienation of Workers
Level of Alienation      F      %
High & Medium          130     87
Low                     20     13
(Total)                150    100
Table 5-1 Alienation of Workers
Level of Alienation      F      %
High                    30     20
Medium & Low           120     80
(Total)                150    100
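Both collapsed versions come from the same three valid-response frequencies. The sketch below recomputes them; the helpers `collapse` and `percentages` are hypothetical names introduced for illustration.

```python
# Valid responses from Table 5-1 (non-respondents eliminated).
valid = {"High": 30, "Medium": 100, "Low": 20}


def collapse(freqs, groups):
    """Sum old categories into new ones; groups maps new label -> old labels."""
    return {new: sum(freqs[old] for old in olds) for new, olds in groups.items()}


def percentages(freqs):
    """Whole-number percentage for each category."""
    total = sum(freqs.values())
    return {cat: round(100 * n / total) for cat, n in freqs.items()}


high_medium = percentages(
    collapse(valid, {"High & Medium": ["High", "Medium"], "Low": ["Low"]})
)
medium_low = percentages(
    collapse(valid, {"High": ["High"], "Medium & Low": ["Medium", "Low"]})
)
print(high_medium)
print(medium_low)
```

The first grouping suggests that 87% of workers are alienated to a notable degree; the second suggests that 80% are not highly alienated. The underlying data are identical.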
Collapsing categories (U.N. example)
Babbie, E. (1995). The practice of social research. Belmont, CA: Wadsworth.
Collapsing Categories & omitting missing data
Babbie, E. (1995). The practice of social research. Belmont, CA: Wadsworth.
Grouping Response Categories
- To make new categories
- Facilitate analysis of trends
- But decisions have effects on the interpretation of patterns
- Importance of understanding logic, conceptual and operational definitions
- Same data can produce totally different-looking results
Bivariate Tables (Cross Tabulations): Tables Presenting Relationship between Two Variables
Singleton, R., Straits, B. & Straits, M. (1993). Approaches to social research. Toronto: Oxford.
Expected outcomes (Null Hypothesis)
Singleton, R., Straits, B. & Straits, M. (1993). Approaches to social research. Toronto: Oxford.
Interpretation issues (Bivariate Tables)
- Percentages within categories of attributes of independent variable
- In example:
  - Independent variable: gender
  - Dependent variable: fear of walking alone at night
  - Women more afraid than men
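The cell counts expected under the null hypothesis (no relationship) and the percentages within each category of the independent variable can both be computed from a cross tabulation. This is a sketch only: the counts below are hypothetical, since the figures from Singleton et al. are not reproduced in these notes, and the function names are illustrative.

```python
# Hypothetical counts: rows are the independent variable (gender),
# columns are the dependent variable (fear of walking alone at night).
observed = {
    "Women": {"Afraid": 60, "Not afraid": 40},
    "Men": {"Afraid": 20, "Not afraid": 80},
}


def expected_counts(table):
    """Expected cell count under independence: row total * column total / N."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for col, n in cols.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / grand_total for col in col_totals}
        for row in table
    }


def row_percentages(table):
    """Percentage of each row (independent-variable category) in each column."""
    return {
        row: {col: round(100 * n / sum(cols.values())) for col, n in cols.items()}
        for row, cols in table.items()
    }


expected = expected_counts(observed)
percents = row_percentages(observed)
print(expected)
print(percents)
```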
Styles of Presentation of Percentaged Tables (Bivariate)
Table 1. Percentage in support of strike by type of school

Type of School      Percent supporting strike
Secondary           60% (800)
Elementary          30% (1000)
__________________________________________________________
ε = .30    N = 1800

Labelled parts of the table:
- Serial number (Table 1)
- Descriptive caption (Percentage in support of strike by type of school)
- Dependent variable (support for strike)
- Independent variable (type of school)
- Variable categories (Secondary, Elementary)
- One category of dichotomous dependent variable (percent supporting strike)
- Marginals for independent variable ((800), (1000))
- Percentage difference, epsilon (ε = .30)
- Total sample (N = 1800)
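The percentage difference (epsilon) and the total sample can be derived directly from the percentages and marginals in Table 1. A minimal sketch; the variable names are illustrative.

```python
# Percentages and marginals (bases) as shown in Table 1.
support_pct = {"Secondary": 60, "Elementary": 30}
bases = {"Secondary": 800, "Elementary": 1000}

# Epsilon: difference between the two percentages, expressed as a proportion.
epsilon = (support_pct["Secondary"] - support_pct["Elementary"]) / 100
total_sample = sum(bases.values())
print(f"epsilon = {epsilon:.2f}, N = {total_sample}")
```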
Factors to consider when reading table
- Sampling technique? Or total population?
- Conceptual & operational definitions (validity & reliability issues)
- What measure was used? How was it used?
- Data preparation and cleaning issues (treatment of inconsistencies, non-responses, etc.)
- Data analysis issues
Other Ways of Presenting Same Data & Interpretation Issues
- Deciding on direction of calculation of percentages?
- Depends on objectives (research questions), for example:
  - Are we interested in the patterns within each school type?
  - Are we interested in overall support of strike?
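The two research questions lead to different calculations on the same strike data. The sketch below assumes the supporter counts can be recovered from the percentages and bases in Table 1 (60% of 800 and 30% of 1000); the variable names are illustrative.

```python
# Percentages and bases from Table 1.
bases = {"Secondary": 800, "Elementary": 1000}
support_pct = {"Secondary": 60, "Elementary": 30}

# Number of supporters per school type, recovered from percentage and base.
supporters = {s: bases[s] * support_pct[s] // 100 for s in bases}

# Question 1: pattern within each school type (percentages run within school type).
within_school = dict(support_pct)

# Question 2: overall support of the strike across the whole sample.
overall_support = round(100 * sum(supporters.values()) / sum(bases.values()), 1)

# Percentaging in the other direction: where do the supporters come from?
share_of_supporters = {
    s: round(100 * n / sum(supporters.values()), 1) for s, n in supporters.items()
}

print(within_school, overall_support, share_of_supporters)
```

Percentaged within school type, secondary teachers look twice as supportive (60% vs. 30%); percentaged the other way, they make up about 62% of all supporters, a figure that also reflects how many teachers of each type were sampled.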
Other Ways of Presenting Bivariate Relationships in tabular form (e.g. ratios)
Control variables: Trivariate Tables Men/Women Drivers
In Say It with Figures, Hans Zeisel presents the following data:

Automobile Accidents by Sex
------------------------------------------
            Per Cent Accident Free
Women       68% (6,950)
Men         56% (7,080)
------------------------------------------

Automobile Accidents by Sex and Distance Driven
----------------------------------------------------------------------------
                            Distance
            Under 10,000 km             Over 10,000 km
            Per Cent Accident Free      Per Cent Accident Free
Women       75% (5,035)                 48% (1,915)
Men         75% (2,070)                 48% (5,010)
----------------------------------------------------------------------------

Women have fewer accidents than men because women tend to drive less frequently than do men, and people who drive less frequently tend to have fewer accidents.
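The bivariate percentages can be recovered from the trivariate table as weighted averages, which shows where the apparent sex difference comes from. A minimal sketch using the figures in the two tables; the function names are illustrative.

```python
# (per cent accident free, number of drivers) by sex and distance driven.
strata = {
    "Women": {"Under 10,000 km": (75, 5035), "Over 10,000 km": (48, 1915)},
    "Men": {"Under 10,000 km": (75, 2070), "Over 10,000 km": (48, 5010)},
}


def overall_rate(groups):
    """Accident-free percentage for all drivers, weighting each stratum by size."""
    drivers = sum(n for _, n in groups.values())
    accident_free = sum(pct / 100 * n for pct, n in groups.values())
    return round(100 * accident_free / drivers)


def share_under_10k(groups):
    """Percentage of the group's drivers in the low-distance stratum."""
    drivers = sum(n for _, n in groups.values())
    return round(100 * groups["Under 10,000 km"][1] / drivers)


for sex, groups in strata.items():
    print(sex, overall_rate(groups), share_under_10k(groups))
```

Within each distance category the rates are identical for women and men; the 68% vs. 56% gap arises only because about 72% of the women but about 29% of the men drive under 10,000 km.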
Another Way to Present Percentaged Tables (Trivariate)
Table 2. Percentage who support strike by type of school and sex

                                    Sex
                    Female                        Male
Type of School      Per cent supporting strike    Per cent supporting strike
Secondary           60% (400)                     60% (400)
Elementary          30% (900)                     30% (100)
__________________________________________________________
Female ε = .30 : Male ε = .30    N = 1800

Labelled parts of the table:
- Dependent variable (support for strike)
- Independent variable (type of school)
- Control variable (sex)
- Categories of control variable (Female, Male)
Common Types of Charts & Graphs
- Bar charts
- Histograms
- Pie charts
- Line graphs/polygons
- Scattergrams
Bar Chart
- Parallel bars or rectangles with lengths proportional to the frequency with which specified quantities occur in a set of data
- Graphic representation of frequency distribution, generally used for discrete data.
A Bar Chart (flat; best for 2-dimensional data)
Bar Chart with break
World Population Growth Showing Projections (Time to add billions)
Histograms
- Graphically represent grouped data of a frequency distribution
- Baseline typically depicts the classes, and the vertical scale represents the frequencies or percentages
- For continuous data.
Example
- A survey of people between the ages of 18 and 74 to determine the number of bike users, categorized by age group.
- Q. Which age group do you belong to?
  - 18 to 24
  - 25 to 34
  - 35 to 44
  - 45 to 54
  - 55 to 64
  - 65 to 74
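Grouping continuous ages into the survey's classes is the step that turns raw data into histogram bars. The sketch below uses the six age groups from the question; the `ages` list is hypothetical, and the text bars are a stand-in for a real plot.

```python
from collections import Counter

# Class boundaries from the survey question (inclusive on both ends).
AGE_GROUPS = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74)]


def age_group(age):
    """Return the label of the class that contains this age."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low} to {high}"
    raise ValueError(f"age {age} is outside the surveyed range")


# Hypothetical ages of bike users (not from the slides).
ages = [19, 23, 31, 38, 41, 44, 52, 67]
counts = Counter(age_group(a) for a in ages)

# One text bar per class, in class order; empty classes still get a row.
for low, high in AGE_GROUPS:
    label = f"{low} to {high}"
    print(f"{label:<10}{'#' * counts[label]}")
```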
Histogram
Pie Chart
- Circular chart divided into sectors, illustrating relative magnitudes or frequencies.
- Arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents.
- Sectors create a full disk.
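The proportionality rule means each sector's central angle is its share of the total times 360 degrees. A minimal sketch with hypothetical category totals; `sector_angles` is an illustrative helper.

```python
# Hypothetical category totals (assumed values, not real election data).
totals = {"A": 50, "B": 30, "C": 20}


def sector_angles(values):
    """Central angle in degrees for each category; the angles sum to 360."""
    total = sum(values.values())
    return {label: 360 * v / total for label, v in values.items()}


angles = sector_angles(totals)
print(angles)
```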
Example: 2004 Election Results of EU
Exploded pie chart
- One or more sectors separated from the rest of the disk
Example: 2004 Election Results of EU
Presentation of identical data in pie and bar charts
- Problem with pie charts: bar charts are easier to compare visually and make differences in proportions easier to see
Line and Scatter Charts (Graph)
- Starts with mapping quantitative data points.
- Usually a dot or a small circle represents a single data point.
- One mark (point) for every data point
- Visual distribution of the data
- When both variables are quantitative, the line segment that connects the two points on the chart expresses a slope
- Slope can be visually interpreted relative to the slope of other lines.
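The slope of a segment between two quantitative data points is the rise divided by the run. A small sketch with hypothetical points; `slope` is an illustrative helper.

```python
def slope(p1, p2):
    """Slope of the segment joining two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical segment: slope is undefined")
    return (y2 - y1) / (x2 - x1)


# Hypothetical points: both segments start at (1, 2) and span the same run.
steep = slope((1, 2), (3, 8))    # rises 6 over a run of 2
shallow = slope((1, 2), (3, 3))  # rises 1 over a run of 2
print(steep, shallow)
```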
Example of Frequency Distribution Table from Textbook
Frequency Polygon Showing Same Data (Graph Plotting Frequency Distribution)
Common types of Distributions
- Normal Distribution (bell-shaped curve)
- Skewed Distributions
- Bi-Modal Distributions
Normal Distribution
Neuman (2000: 319)
Skewed Distributions
Neuman (2000: 319)
Multiple Line charts
Multi-symbol Line chart
Combining Quantitative & Qualitative Info. in Graphs: Temperatures during Napoleon’s March (E. Tufte)
Line Chart (Poor example)
- Example of bad choice of graphic representation
- Data discrete
- Connecting dots does not make sense because measures of colours are nominal here
Scattergrams
Design & Interpretation Issues: Choice of Scales
- Same data presented using different scales for x and y axis
Core Notions in Basic Univariate Statistics
- Ways of describing data about one variable (“uni” = one)
- Measures of central tendency
  - Summarize information about one variable (“averages”)
- Measures of dispersion
  - Variations or “spread”
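The slides do not name specific measures of dispersion; the range, variance and standard deviation are common choices. A minimal sketch on hypothetical scores, assuming the population (not sample) formulas from Python's `statistics` module.

```python
import statistics

# Hypothetical scores for one variable (not from the slides).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)    # distance between extremes
variance = statistics.pvariance(scores)    # mean squared deviation from the mean
std_dev = statistics.pstdev(scores)        # square root of the variance

print(value_range, variance, std_dev)
```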
Measures of Central Tendency
- Summarize information about one variable in a single number
  - Mode
  - Median
  - Mean
Use of Measures of Central Tendency
- To summarize common “overall” “centralized” trends
- Doesn’t show variability, spread, dispersion
Mode
Babbie (1995: 378)
- Most common or frequently occurring case (for all types of data)
Median
Babbie (1995: 378)
- Middle point (only for ordinal, interval or ratio data)
Mean (arithmetic mean)
Babbie (1995: 378)
- “Average” = sum of values divided by number of cases (only for ratio and interval data)
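The three measures can be computed with Python's standard `statistics` module. A minimal sketch on a hypothetical list of values; the data are illustrative and are not Babbie's example.

```python
import statistics

# Hypothetical ratio-level values (not from the slides).
values = [1, 2, 2, 3, 4, 7, 9]

mode = statistics.mode(values)      # most frequently occurring value
median = statistics.median(values)  # middle value of the sorted cases
mean = statistics.mean(values)      # sum of values divided by number of cases

print(mode, median, mean)
```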
Normal Distribution & Measures of Central Tendency
Neuman (2000: 319)
Skewed Distributions & Measures of Central Tendency
Neuman (2000: 319)
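In a symmetric, single-peaked distribution the mode, median and mean coincide, while in a skewed distribution the mean is typically pulled toward the long tail. The sketch below illustrates this with two small hypothetical data sets; neither is taken from Neuman's figures.

```python
import statistics


def central_tendency(data):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)


# Hypothetical data: one symmetric set and one with a long right tail.
symmetric = [1, 2, 3, 3, 3, 4, 5]
right_skewed = [20, 22, 22, 25, 30, 35, 126]

print(central_tendency(symmetric))
print(central_tendency(right_skewed))
```

For the right-skewed set the single extreme value raises the mean well above the median, which is why the median is often preferred for skewed variables such as income.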