Conducting a Cross Tabulation Analysis in the Qualtrics Research Suite

8/1/2016 Conducting a Cross Tabulation Analysis in the Qualtrics Research Suite

http://scalar.usc.edu/works/c2cdigitalmagazinefall2016winter2017/conductingacrosstabulationanalysisqualtricsresearchsuite?path=index 1/7

C2C Digital Magazine (Fall 2016 / Winter 2017), Colleague 2 Colleague, Author

Conducting a Cross Tabulation Analysis in the Qualtrics Research Suite

By Shalin Hai-Jew, Kansas State University

It used to be that online survey tools enabled the rich capture of respondent data and then enabled researchers to download the data for analysis in other tools. While that workflow is still valid in many cases, many online survey systems have become their own “research suites” and enable data analytics, data visualizations, auto-created data dashboards, and report creation.

Figure 1: A Cross Tabulation Table with Attribute Values as Variables

One of the data analytics methods built into the Qualtrics Research Suite is cross tabulation analysis, a common tool used with categorical (or nominal) and “non-parametric” data. Computational cross tabulation enables the identification of patterns in survey question responses that might well remain latent otherwise, at computer speeds and with big(ger) data. (The limits of “big data” analytics are not fully clear since Qualtrics is a cloud-based tool and may be hosted on servers with large-scale processing capabilities, but processing may be limited based on user account type.) This article introduces some capabilities of the cross tabulation feature in Qualtrics.

A Generic Cross Tabulation Analysis

A cross tabulation table (also known as a “contingency table”) captures the frequency distribution of multiple variables and their interrelations (if any). This approach was first described by Karl Pearson in 1904 (“Contingency table,” July 6, 2016).

So what are the basic elements of a cross tabulation data table (Figure 2)? Essentially, across the column headers and down the side in the row headers are various types of variables. Each intersecting cell (reading across from the row and down from the selected column) shows the tabulation, or count, of the occurrences of both variables together.
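The counting described above can be sketched in a few lines of Python. This is only an illustration: the respondent records and the values "faculty," "student," "staff," "yes," and "no" are invented for the example and do not come from any survey discussed in this article.

```python
from collections import Counter

# Hypothetical survey records: (row variable value, column variable value)
# for each respondent. These values are made up for illustration.
responses = [
    ("faculty", "yes"), ("faculty", "no"), ("student", "yes"),
    ("student", "yes"), ("staff", "no"), ("student", "no"),
    ("faculty", "yes"), ("staff", "yes"),
]


def cross_tabulate(pairs):
    """Count how often each (row value, column value) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Each cell holds the count for the intersecting row and column values.
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table


rows, cols, table = cross_tabulate(responses)

# Print the table with column headers across the top and row headers down the side.
print("".ljust(10) + "".join(c.rjust(6) for c in cols))
for r in rows:
    print(r.ljust(10) + "".join(str(table[r][c]).rjust(6) for c in cols))
```

Each printed cell is simply the number of respondents who gave both the row value and the column value.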

Binary (or dichotomous) cell data. Some cross tabulations result in a matrix with cells that contain only 1s and 0s, with 1s representing the presence of a relationship and 0s representing its absence. This binary result is a common type of matrix. (If the column and row headers are the same entities, so that B1-H1 = 2A-8A, then a relational graph may be drawn from the data, with the binary results indicating whether or not a relationship exists between each pair of variables.) It can also be that, for the particular table, there are only two types of responses possible, like a positive or negative sentiment rating.

Frequency cell data. Another sort of cross tabulation table contains cells with frequency data. These cells hold numbers that show specific counts for the intersecting rows and columns. The results are often depicted as intensity matrices (with darker and more saturated color in cells that have proportionally higher counts).

Content cell data. In some cross tabulation analyses, the cell data may be textual contents. For example, when cross tabulations are of coded nodes (such as in a qualitative data analytics tool), the intersected cells contain text that was coded to both nodes (in an overlapping way).

Variables in rows or columns? The variables themselves may be put in either the rows or the columns (such tables can be transposed easily), but there is usually a method to their selection, in order to identify particular patterns in the underlying data. Sometimes researchers will run very large cross tabulation analyses in order to find particular variable relationships, which they will then depict in much smaller and targeted cross tabulation data tables for visual coherence in presentation.
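Two of the points above, that a binary matrix with matching row and column entities can be read as a relational graph and that a table can be transposed easily, may be sketched as follows. The labels "A," "B," and "C" and the matrix values are hypothetical.

```python
# Hypothetical entities used as both row and column headers.
labels = ["A", "B", "C"]

# 1 = a relationship is present, 0 = it is absent.
binary = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]


def transpose(matrix):
    """Swap rows and columns so that row variables become column variables."""
    return [list(col) for col in zip(*matrix)]


def edges_from_binary(names, matrix):
    """List each related pair once, reading only the cells above the diagonal."""
    edges = []
    for i, row in enumerate(matrix):
        for j, cell in enumerate(row):
            if cell == 1 and i < j:
                edges.append((names[i], names[j]))
    return edges


print(edges_from_binary(labels, binary))
print(transpose([[1, 2, 3], [4, 5, 6]]))
```

The edge list is the raw material for a relational graph; the sketch assumes the relationships are undirected, which is why only one half of the matrix is read.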

Figure 2: Basic Elements of a Cross Tabulation Table

Figure 2 gives a small sense of some of the analytical dependencies for a cross tabulation analysis. It is important to know how the research was conducted to acquire the underlying variable data and how solid those data are. How the variables were selected is also important. As noted in the figure, observed nominal data may come from experimental conditions or in-world non-experimental ones. The variables in the first context may be predictor variables and dependent variables. In in-world observations, the variables may be of various types. Attribute variables describe features of respondents, such as demographic data, which enable grouping of respondents to see if there are patterns of survey responses among respondent groups. Outcome variables show fixed in-world realities that may be used to categorize respondents into groups to see if there are patterns. Generic variables may have associational relationships with other variables, or even apparent causal relationships. The interpretation of such variable relationships may be informed in part by theory but also by empirical observations and by abductive logic.

What was seen in the data? What was not seen? How astutely a researcher or research team analyzed the respective cells, across cells, across columns, across rows, and through the cross tabulation tables (yes, plural) matters. What computational aids were used to extract patterns? How the researcher(s) hypothesized around the cross tabulation table is central to a successful analysis. How nuanced is the analysis, and how clearly explained are the outcomes?

Cross tabulation analyses are not just conducted to create finalized data summaries. They may be run during the data exploration stage of research work to see if there are data query leads to pursue.

This analytical approach may not necessarily result in reportable findings. There may not be any support for hypothesized associations or relationships between variables. The variables themselves may be unrelated or even independent (based on the frequency counts). Some variables may have only very nuanced or mild associations, and worse, the collected data may be insufficient to capture a real effect. [Even with categorical data and a fairly low “n,” there has to be sufficient data to avoid Type 1 (false positive) and Type 2 (false negative) errors. A Type 1 error is the rejection of a null hypothesis that is actually true (thinking that an effect is there when it isn’t); a Type 2 error is the failure to reject a null hypothesis that is actually false (thinking that an effect is not there when in fact it is). If the research is sufficient (enough data points), in theory, there will be mostly true positives and true negatives.] Even if results are relevant, sometimes these analyses only result in a publishable sentence or paragraph; occasionally, they may merit a data visualization.

In an Online Survey

While many may not have heard of cross tabulation analyses, this analytical approach is quite common: “One estimate is that single variable frequency analysis and cross-tabulation analysis account for more than 90% of all research analyses” (“Cross Tabulation Analysis,” 2013), according to the Qualtrics site. The ease of applying this approach computationally to survey results is a fairly new innovation. (In Figure 3, Qualtrics powers the K-State Survey system.)

Figure 3: Qualtrics Landing Page at Kansas State University

Effective question design. The rules for designing effective and non-biased surveys involve plenty of skill but are beyond the purview of this article. For practical purposes, assuming that a survey itself is correctly designed, there are some additional considerations so that the resulting data may be effectively analyzed and queried with cross tabulation tables.

Response types cannot be directly qualitative, such as text-only responses or uploaded imagery, video, or audio. A cross tabulation assumes that there is a frequency count in the response. What works, then, would be multiple choice questions (with a range of closed-answer options which may be counted), true-false questions, demographic questions with defined selection categories, slider questions with measures of intensity, Likert-scaled questions with intensity responses, and so on. Text-based question results may be quantized using text frequency analyses, but these would have to be exported and analyzed outside Qualtrics (at least at this time). Multimedia responses, such as digital imagery, video, and audio responses (through the file upload feature), would have to be manually analyzed and coded for learning value, again, outside of Qualtrics.

Another important aspect is to ensure that each question (or response elicitation) is single-barreled. A double-barreled or multi-aspect question will muddle the data results. Multicollinearity in the designed variables (respective survey questions) may be used to double-check results, but will add redundancy to the survey. If a needed question was not included in the survey, then some aspect of the potential data will not be available for a cross tabulation analysis (or else that question will have to be addressed differently using other data).

Cleaning data for cross tabulation analysis? Within Qualtrics, there is no real equivalent to the pre-processing and cleaning of data that usually precedes an analysis. Certainly, the data from Qualtrics may be exported in filtered reports that will enable data cleaning in external tools, but within Qualtrics, there is not an obvious way to clean the data online. This is another reason why proper question design is important early on.

If there are problematic response entries (such as spam), it is possible to delete a response within Qualtrics and decrement any quota counts.

Chi-Squared Statistics (χ²)

With some types of cross tabulation analyses, it may be relevant to run chi-square (or “chi-squared”) statistics. Essentially, this statistic extends the power of a cross tabulation data table beyond basic counting by enabling a feature of quantitative data analytics: the ability to “reject the null hypothesis.” What that phrase means is that a researcher can, with a certain level of confidence, suggest that the data he or she is observing is likely not just due to random chance but is a result of some potential causal or associational factor (commonly at an alpha (α) level of .05, meaning p < .05, or at the stricter standard of p < .01).

In this case, based on categorical data, the baseline is not set on any normal curve; rather, it is set on “expected frequency values” (a statistically derived assumed distribution) in a particular cell as compared to “observed frequency values.” The expected frequency values are based on the known underlying classes and what researchers would expect to see in terms of data values based on those classes if the variables were unrelated; in practice, they are computed from the row and column totals of the table. This may loosely be considered a form of "bootstrapping," in which the baseline distribution is derived from the totals at hand rather than assumed in advance. ["Bootstrapping" here refers to the use of whatever existing resources one has to achieve a particular aim in an environment of scarcity or challenge, and not to the statistical resampling technique of the same name.]
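For a test of independence, the expected frequency for the cell in row i and column j is conventionally computed from the table's own margins. The symbols r (row total), c (column total), and n (grand total) are introduced here for illustration, and the numbers in the worked example are hypothetical:

```latex
e_{ij} = \frac{r_i \, c_j}{n}
\qquad \text{for example} \qquad
e = \frac{40 \times 50}{100} = 20
```

In words: a cell sitting in a row that holds 40 of 100 responses and a column that holds 50 of 100 responses would be expected to contain 20 responses if the two variables were unrelated.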

The chi-square equation reads as follows:

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

or chi-squared equals the sum, over all cells, of the squared difference between the observed value (o) and the expected value (e), divided by the expected value. The squaring ensures that the difference from the expected value is rendered as a positive number whether the observed frequency is larger or smaller than the expected frequency.
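The calculation just described may be sketched in Python as follows. The observed counts are hypothetical, and the expected counts are computed from the row and column totals under the assumption that the two variables are unrelated (the usual approach for a test of independence, which is an assumption here about the kind of test being run).

```python
def expected_counts(observed):
    """Expected cell counts from row and column totals, assuming independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum over all cells of (observed - expected) squared, divided by expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of observed counts.
observed = [[30, 10], [20, 40]]

print(expected_counts(observed))        # [[20.0, 20.0], [30.0, 30.0]]
print(round(chi_square(observed), 3))   # 16.667
```

Each of the four cells differs from its expected value by 10, so the statistic is 100/20 + 100/20 + 100/30 + 100/30, or about 16.667.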

If the observed data follows the theorized expected distribution (created from the expected values), whether it skews left or right or is bimodal or has other expected frequency curve features, then it may be assumed that the null hypothesis cannot be legitimately rejected (so the assumption is that only random chance is influencing the variance in the observed data).

For the observed frequency data to count as sufficiently anomalous, the chi-square value has to be higher than the critical value listed in a Chi-Square Distribution Table. This table gives the critical chi-square value for a given number of degrees of freedom or “df” (for a cross tabulation table, the number of rows minus 1 multiplied by the number of columns minus 1; for a single variable, the number of possible outcomes minus 1) and a chosen alpha level (the significance threshold against which the p-value is compared). If a calculated χ² value is higher than the critical value in the table, there is sufficient confidence that the null hypothesis may be rejected (usually at levels of 95% or 99% confidence). If it fails to exceed the critical value, then the findings are insufficient to reject the null hypothesis (“There is no significant statistical difference between the observed and expected frequencies of this categorical data”).
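The table lookup described above may be sketched as follows. The critical values are the commonly published ones from a standard chi-square distribution table for one to four degrees of freedom; the function names are hypothetical, and the degrees-of-freedom rule used is the standard one for a contingency table, (rows − 1) × (columns − 1).

```python
# Critical values from a standard chi-square distribution table,
# keyed first by degrees of freedom and then by alpha level.
CRITICAL_VALUES = {
    1: {0.05: 3.841, 0.01: 6.635},
    2: {0.05: 5.991, 0.01: 9.210},
    3: {0.05: 7.815, 0.01: 11.345},
    4: {0.05: 9.488, 0.01: 13.277},
}


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a contingency table with the given dimensions."""
    return (n_rows - 1) * (n_cols - 1)


def can_reject_null(chi_square_value, df, alpha=0.05):
    """True if the statistic exceeds the critical value for this df and alpha."""
    return chi_square_value > CRITICAL_VALUES[df][alpha]


df = degrees_of_freedom(2, 2)                   # a 2 x 2 table has 1 degree of freedom
print(can_reject_null(16.667, df, alpha=0.01))  # True: 16.667 exceeds 6.635
print(can_reject_null(3.0, df))                 # False: 3.0 does not exceed 3.841
```

This sketch only covers the small range of table entries listed; larger tables would need more rows from the distribution table.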

In Qualtrics, the Chi-Square Distribution Table does not have to be consulted directly because the p-value is automatically calculated. Further, the resulting table itself can be layered over with additional summary statistics (Figure 4).
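To give a sense of what an automatically calculated p-value represents, the following sketch computes it directly for the two simplest cases, where closed-form expressions exist. This is not a description of how Qualtrics performs the calculation; the function name is hypothetical, and the sketch deliberately covers only one and two degrees of freedom.

```python
import math


def p_value(chi_square_value, df):
    """Upper-tail probability of the chi-square distribution for df = 1 or 2."""
    if df == 1:
        # With 1 degree of freedom, the statistic is a squared standard normal value.
        return math.erfc(math.sqrt(chi_square_value / 2))
    if df == 2:
        # With 2 degrees of freedom, the distribution is exponential with mean 2.
        return math.exp(-chi_square_value / 2)
    raise ValueError("this sketch only covers df = 1 or df = 2")


print(p_value(3.841, 1))   # close to 0.05, the conventional threshold
print(p_value(5.991, 2))   # also close to 0.05
```

A p-value below the chosen alpha level corresponds to a chi-square value above the critical value in the distribution table, so the two ways of reading the result agree.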

Figure 4: An Example of a Cross Tabulation Analysis from Qualtrics (with Chi-Square Statistics)

While the chi-square statistic requires at least a context of two possible outcomes, or one degree of freedom, a cross tabulation analysis requires at least a two-dimensional table but can include a wide range of dimensions.

While the chi-square test can inform researchers about whether or not they may reject the null hypothesis with confidence, the analysis does not stop there. The chi-square test may suggest that observed data is sufficiently out-of-norm to be statistically significant, which suggests that something more than chance is affecting the observed frequencies. The nature of the apparent association between defined variables is not spelled out by this test. The interpretation of the findings may be better informed by the researcher’s expertise. Part of expertise involves the deft use of language to explain the findings, so as not to over-claim or under-claim or otherwise miss out on what may legitimately be assertable.

Cross Tabulation Analysis in Qualtrics

So how does a researcher create a cross tabulation analysis using Qualtrics?

Basic Steps to Starting a Cross Tabulation Analysis Using Qualtrics

1. Log into the Qualtrics Research Suite survey site.

2. Navigate to the target survey.

3. Click the “Data & Analysis” tab.

4. In the ribbon, select “Cross Tabs.”

5. Click the green “+ Create a new Cross Tabulation” button at the top left.

6. In the left columns of checkboxes, select the desired Banner elements (column headers).

7. In the left columns of checkboxes, select the desired Stub elements (row headers).

8. At the bottom right, click “Create Cross Tabulation.” The Cross Tabulation table appears, and the chi-square statistics appear below the main table.

9. To add elaboratory cell information, an additional step is needed. In the Data Options dropdown menu, select the following: Expected Frequencies, Actual – Expected, Row Percents, Column Percents, Show Banner Means, and Show Stub Means.

10. To change the default name of the cross tabulation analysis (which is an automated concatenation of the survey name and “Cross Tabulation”), click on the name at the top left.

11. Click on the Custom Highlights button at the top, and manually highlight the cells which show relevant patterning.
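Several of the cell elaborations named in step 9 (expected frequencies, actual minus expected, row percents, and column percents) follow standard definitions, and a sketch may help in reading them. The function below is hypothetical and is not Qualtrics code; the observed counts are invented, and the banner and stub means are left out because their calculation is not described here.

```python
def elaborate(observed):
    """Return, for each cell, the count plus common cross tabulation elaborations."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    cells = []
    for i, row in enumerate(observed):
        out_row = []
        for j, count in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            out_row.append({
                "count": count,
                "expected": expected,
                "actual_minus_expected": count - expected,
                "row_percent": 100 * count / row_totals[i],
                "column_percent": 100 * count / col_totals[j],
            })
        cells.append(out_row)
    return cells


# Hypothetical 2 x 2 table of observed counts.
cells = elaborate([[30, 10], [20, 40]])
for row in cells:
    for cell in row:
        print(cell)
```

Cells with a large positive or negative difference between actual and expected counts are natural candidates for the manual highlighting in step 11.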

There are tools to enhance researcher interactivity with the data. A Row/Column Selector enables homing in on a particular cell and highlights the entire row and column. A “Puller” tool enables navigating around a particularly large cross tabulation table by pulling the table up and down, and side-to-side, as needed.

To change up the data, additional banner and stub elements may be added on the fly. At the banner and stub levels, users may “Add Multilevel Drill Down” features to the data for more complex dimensionality. Additional question elements may be brought into play to add nuance to the crosstab analysis. The existing data may be filtered (by question responses, by embedded data) and the cross tabulation table recalculated. Custom equations may be applied to respective banners and stubs for further complex analysis.

Under "Data Options" > "Advanced Options," it is possible to change how the cross tabulation table handles the statistics, whether calculating statistics based on respondents or on responses. The notes explain that statistics based on responses are calculated as follows: "Percentages and other stats are calculated based on the number of responses. (For multiple answer questions the number of responses may be greater than the number of respondents to that particular question. This method is not recommended.)" The default is set to the calculating of statistics based on respondents. Also, researchers may choose to "Ignore nonresponses" (default), or they may choose to "Show nonresponses," which would draw an additional column for each question with the number of survey respondents who skipped that question.
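The difference between statistics based on respondents and statistics based on responses matters mostly for multiple answer questions, and a small sketch may make it concrete. The answer choices below are invented, and the function is a hypothetical illustration of the distinction, not the Qualtrics calculation itself.

```python
from collections import Counter

# Hypothetical multiple answer question: each inner list holds
# the choices one respondent selected.
answers = [
    ["email", "chat"],
    ["email"],
    ["chat", "phone"],
    ["email", "phone", "chat"],
]


def percentages(selections, base="respondents"):
    """Percent choosing each option, using respondents or responses as the base."""
    counts = Counter(choice for picked in selections for choice in picked)
    if base == "respondents":
        denominator = len(selections)        # 4 people answered
    else:
        denominator = sum(counts.values())   # 8 selections were made in total
    return {choice: 100 * k / denominator for choice, k in counts.items()}


print(percentages(answers, base="respondents"))  # email 75.0, chat 75.0, phone 50.0
print(percentages(answers, base="responses"))    # email 37.5, chat 37.5, phone 25.0
```

With respondents as the base, the percentages can sum to more than 100 because one person may select several options; with responses as the base, they sum to 100 but no longer describe the share of people.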

The color scheme applied to the cross tabulation table may be changed up for a different look-and-feel.

Finally, the cross tabulation tables may be exported to Excel or PDF formats. In Excel format, the table data may be further analyzed in other data analytics tools. In the PDF format, the look-and-feel of the visualizations is captured and may be reversioned into digital image format for presentation purposes.

Conclusion

This article touched on cross tabulation analysis in a general way and then showed how this classic analytics approach may be applied in Qualtrics, using responses to questions to identify statistically significant associations between survey responses (as variables). While this used an online survey as an example, there are many ways to use an online research suite: for online polling, electronic Delphi studies, large-scale trainings and related assessments, crowd-sourced sampling, and other types of research.

These approaches have their own underlying assumptions and data strengths / limitations. Even so, the cross tabulation analysis tool within Qualtrics may be used to identify empirical data patterns and create insights.

This article is not meant to be a complete introduction to the full complexities of the Cross Tabs analytic tool in the Qualtrics Research Suite but a light (albeit somewhat complicated) introduction.

References

“Contingency Table.” (2016, July 6). Wikipedia. Retrieved July 9, 2016, from https://en.wikipedia.org/wiki/Contingency_table.

“Cross Tabulation Analysis.” (2013). Qualtrics site. Retrieved July 6, 2016, from https://www.qualtrics.com/wpcontent/uploads/2013/05/CrossTabulationTheory.pdf.

About the Author

Shalin Hai-Jew works as an instructional designer at Kansas State University. She has conducted data analyses using Qualtrics on grant-funded projects. She has no official tie to Qualtrics. She may be reached at [email protected].

Version 23 of this page, updated 01 August 2016. C2C Digital Magazine (Fall 2016 / Winter 2017) by Colleague 2 Colleague.