COM 633: Content Analysis CATA Kimberly A. Neuendorf, Ph.D. Cleveland State University Fall 2010

COM 633 Fall 2010 CATA Presentations

• Kate & Julie: LIWC & PCAD
• Jen & Diane: LIWC & MCCALite?
• Fran & Dongwoo: CATPAC & WordStat
• Jon & Elizabeth: Yoshikoder & General Inquirer
• Joe: Diction

CATA: Computer Aided Text Analysis

• Why might you want to use CATA rather than traditional human-coding techniques?
• CATA programs typically have been written by researchers with a specific need; thus, their utility is often limited.
• Online search and acquisition opportunities (e.g., Nexis) have made CATA easier and more attractive.

Purposes of CATA

1. Descriptive (e.g., word counts)
   • Modell project, using VBPro (M. Mark Miller, 1980s software)

2. Coding of Open-ended Survey Responses
   • WordStat, SimStat adjunct program (Provalis Research; Normand Peladeau)

Purposes of CATA

Standard Dictionaries: Most of the following applications use internal “standard” dictionaries:

3. Linguistic and Sociolinguistic Measures
   • General Inquirer, Philip Stone, 1966
     • Harvard IV Dictionary
   • MCCALite, Don McTavish & Ellen Pirro
     • 116 “idea categories” are applied to multiple characters in a script
   • CATPAC, Joseph Woelfel
     • Semantic “neural” networks; no actual dictionary

Purposes of CATA

4. Psychometric Measures (or “Thematic Content Analysis”, per Smith)
   • General Inquirer
     • e.g., Lasswell Values Dictionary

5. Clinical Psychological/Psychiatric Diagnoses
   • PCAD, Louis Gottschalk & Robert Bechtel
     • Computer version of Gottschalk’s earlier human-coded schemes, devised to provide alternative diagnostic techniques

Purposes of CATA

6. Verbal Style or Communicator Style
   • LIWC, Pennebaker, Booth, & Francis
     • e.g., positive emotions, cognitive processes
     • Also includes many linguistic measures and some that might be used as psychometrics
   • Diction, Rod Hart
     • Computer application of Hart’s earlier human-coded schemes aimed at measuring characteristics of political speech (e.g., aggression, cooperation, ambivalence)

Purposes of CATA

7. Authorship Attribution
   • Most use simple counts of letters or words to attribute authorship (e.g., the Federalist Papers; Raymond Chandler; Shakespeare)
   • Basic computer/word processing programming is sufficient
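As a rough illustration of the "simple counts" idea, the following Python sketch compares the rate of a few marker words in a disputed text against samples of known authorship. The marker-word list, the sample texts, and the distance measure are all assumptions made for illustration; this is not the procedure of any particular attribution study.

```python
import re

# Assumed marker words; real studies choose these empirically.
FUNCTION_WORDS = ["the", "of", "and", "to", "upon", "while", "whilst"]

def rates(text):
    """Rate per 1,000 tokens for each marker word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [1000 * tokens.count(word) / total for word in FUNCTION_WORDS]

def distance(profile_a, profile_b):
    """Sum of absolute differences between two rate profiles."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))

def attribute(disputed, known_samples):
    """Return the author whose known sample is closest to the disputed text."""
    disputed_profile = rates(disputed)
    return min(
        known_samples,
        key=lambda author: distance(disputed_profile, rates(known_samples[author])),
    )

# Hypothetical toy samples, far too short for real use.
known = {
    "A": "upon the hill upon the road while the sun set",
    "B": "whilst the rain fell on the town and the river rose",
}
guess = attribute("upon the bridge while the wind blew upon us", known)
print(guess)
```

Real attribution work needs far longer texts and a defensible choice of marker words; the sketch only shows that the counting itself is simple.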

Measurement in CATA

Three choices:

• Custom Dictionaries
  • Complicated, time-consuming
• Standard Dictionaries
  • A task of matching one’s conceptualization to someone else’s operationalization; sometimes a scavenger hunt
  • Similar to the challenge of finding an appropriate scale for a survey
• “Emergent” Coding: outcome based on language patterns that emerge (e.g., CATPAC)
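To make the dictionary approach concrete, here is a minimal Python sketch of dictionary-based word counting. The category names and word lists are hypothetical, and the tokenizer is deliberately naive; it is not the algorithm of any program discussed in these slides.

```python
import re
from collections import Counter

# Hypothetical custom dictionaries: category name -> set of lowercase words.
CUSTOM_DICTIONARIES = {
    "optimism": {"hope", "good", "improve", "success"},
    "conflict": {"fight", "attack", "war", "oppose"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def dictionary_counts(text, dictionaries):
    """Return {category: number of tokens matching that category's word list}."""
    token_freq = Counter(tokenize(text))
    return {
        category: sum(token_freq[word] for word in words)
        for category, words in dictionaries.items()
    }

sample = "We hope to improve, and we will not attack. Hope wins."
counts = dictionary_counts(sample, CUSTOM_DICTIONARIES)
print(counts)
```

Note that "attack" is counted under "conflict" even though the sentence says "will not attack"; a plain word count has no notion of context.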

Quantitative CATA Programs

Program | Author | Original Purpose
VBPro | M. Mark Miller | Newspaper articles
Yoshikoder | Will Lowe | Political documents
WordStat | Normand Peladeau | Part of SimStat, a statistical analysis package
General Inquirer | Philip Stone | General mainframe computer application (1960s)
Profiler Plus | Michael Young | Communications of world leaders
LIWC 2007 | Pennebaker, Booth, & Francis | Linguistic characteristics & psychometrics
Diction 5.0 | Rod Hart | Political speech
PCAD 2000 | Gottschalk & Bechtel | Psychiatric diagnoses
WORDLINK | James Danowski | Network analysis/communication
CATPAC | Joseph Woelfel | Consumer behavior/marketing

Quantitative CATA Programs

Program | Type
VBPro | Word count/researcher-created dictionaries only
Yoshikoder | Word count/researcher-created dictionaries only
WordStat | Word count/researcher-created dictionaries only
General Inquirer | Word count with pre-set dictionaries
Profiler Plus | Word count with pre-set dictionaries
LIWC 2007 | Word count with pre-set dictionaries (researcher-created dictionaries may be added)
Diction 5.0 | Word count with pre-set dictionaries
PCAD 2000 | Word count with pre-set dictionaries (researcher-created dictionaries may be added)
WORDLINK | Word co-occurrence
CATPAC | Word co-occurrence

Validity and CATA

• Validation as part of the development of a CATA system (e.g., Lin et al., 2009: genres of online discussion threads)
• Validation of thematic CA (psychometrics) against self-report: rare and uncertain (e.g., McClelland et al., 1992)
• A comprehensive model for assessing content, external, and predictive validity when using CATA: Short, Broberg, Cogliser, & Brigham (2010), as applied to “entrepreneurial orientation”:
  • Content validity: an inductive/deductive combination
  • External validity: use multiple sampling frames
  • Predictive validity: measure non-CATA variables that should relate
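One simple way to check whether a CATA score relates to a non-CATA variable is a correlation. The sketch below computes Pearson's r in plain Python; the two lists of numbers are made up for illustration, and nothing here reproduces the analysis in the studies cited above.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a CATA dictionary score per text, and an external
# (non-CATA) measure for the same cases.
cata_scores = [1.0, 2.0, 3.0, 4.0]
external_measure = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(cata_scores, external_measure)
print(round(r, 3))
```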

Validity of Standard Dictionaries

Trusting the Standard Dictionary: an issue of face validity

• Few CATA programs reveal the full dictionary lists (e.g., Diction, General Inquirer)
• None reveal the full algorithm, including disambiguation (e.g., well, pot, leaves)
• None account for negation

Construct and Criterion Validity

• Rod Hart’s Diction: “normed” rather than validated
• Gottschalk and Bechtel’s PCAD: validated against standard psychiatric diagnoses
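The negation problem is easy to see in a small Python sketch. The word list, the negator list, and the two-token look-back window below are all hypothetical choices; the point is only to contrast a plain dictionary count with one that skips hits preceded by a negator.

```python
import re

POSITIVE_WORDS = {"good", "happy", "pleased"}  # hypothetical dictionary
NEGATORS = {"not", "no", "never"}              # hypothetical negator list

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def naive_count(tokens, words):
    """Count every token that appears in the dictionary."""
    return sum(1 for token in tokens if token in words)

def negation_aware_count(tokens, words, negators, window=2):
    """Count dictionary hits unless a negator occurs in the preceding `window` tokens."""
    hits = 0
    for i, token in enumerate(tokens):
        if token in words:
            preceding = tokens[max(0, i - window):i]
            if not any(p in negators for p in preceding):
                hits += 1
    return hits

tokens = tokenize("I am not happy. The food was good, but I was never pleased.")
naive = naive_count(tokens, POSITIVE_WORDS)
aware = negation_aware_count(tokens, POSITIVE_WORDS, NEGATORS)
print(naive, aware)
```

Even this crude fix changes the score substantially on the sample sentence, which is why a hidden algorithm makes face validity hard to judge.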

Quantitative CATA Programs

Program | Type | Validation
VBPro | Word count/researcher-created dictionaries only | N/A (all custom dictionaries)
Yoshikoder | Word count/researcher-created dictionaries only | N/A (all custom dictionaries)
WordStat | Word count/researcher-created dictionaries only | N/A (all custom dictionaries)
General Inquirer | Word count with pre-set dictionaries | No; dictionaries adapted from Harvard IV, Lasswell values, other standard linguistic and socio-psychological scales
Profiler Plus | Word count with pre-set dictionaries | Proprietary
LIWC 2007 | Word count with pre-set dictionaries (researcher-created dictionaries may be added) | Some dimensions have been validated against assessments by human judges
Diction 5.0 | Word count with pre-set dictionaries | No; based on R. Hart’s substantive work
PCAD 2000 | Word count with pre-set dictionaries (researcher-created dictionaries may be added) | Long history of development of a human-coded scheme; both human & CATA heavily validated against clinical diagnoses
WORDLINK | Word co-occurrence | N/A (emergent dimensions)
CATPAC | Word co-occurrence | N/A (emergent dimensions)

Yoshikoder

About Yoshikoder

Created by Will Lowe at Harvard’s Department of Government

Can be downloaded free at www.yoshikoder.org

A cross-platform, multi-lingual CATA program

Must run one case at a time

Assumes the researcher will create dictionaries

Can import external dictionaries

Exports results into Excel

Yoshikoder: KWIC and Concordance
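For readers unfamiliar with KWIC (key word in context), a concordance simply lists each occurrence of a keyword with a few words on either side. The Python sketch below is a generic illustration under that assumption; the function name and the `width` parameter are made up here and say nothing about how Yoshikoder implements its display.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

concordance = kwic("The pot boiled over. She bought a pot of tea.", "pot", width=2)
for left, key, right in concordance:
    print(f"{left:>12} | {key} | {right}")
```

Reading hits in context like this is one way to spot ambiguous words before trusting a dictionary count.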

Yoshikoder: Dictionary Report

WordStat

About WordStat

• Created by Normand Peladeau, as part of the SimStat suite for quantitative data analysis (a counterpart to SPSS)

• Must be run as part of SimStat

• Particularly suited to analyzing open-ended responses, in that the data set typically includes both numeric and textual variables, which can immediately be cross-tabulated

• The “standard” dictionaries that are included are incomplete and should be avoided

• Also includes KWIC
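The idea of cross-tabulating a numeric variable with a coded textual variable can be sketched in a few lines of Python. The records, the group codes, and the "cost" word list below are invented for illustration, and this is not WordStat's own procedure.

```python
import re
from collections import defaultdict

def crosstab(records, category_words):
    """Cross-tabulate a grouping variable by whether the text mentions the category.

    records: iterable of (group, open_ended_response) pairs.
    """
    table = defaultdict(lambda: {"mentions": 0, "no_mention": 0})
    for group, text in records:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        key = "mentions" if tokens & category_words else "no_mention"
        table[group][key] += 1
    return {group: dict(cells) for group, cells in table.items()}

# Hypothetical survey data: (respondent group code, open-ended answer).
records = [
    (1, "The price is too high"),
    (1, "Great service"),
    (2, "Cost matters"),
    (2, "Too expensive for me"),
]
COST_WORDS = {"price", "cost", "expensive"}  # hypothetical custom dictionary
table = crosstab(records, COST_WORDS)
print(table)
```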

The WordStat Interface (within SimStat)

Selection of Independent & Dependent Variables—Including Textual Variable

Standard WordStat “Dictionaries”

Breakdown of very limited WordStat “Dictionary”

WordStat Output: Word counts

WordStat Output: Dendrogram

WordStat Output: Crosstab with bar graph

WordStat Output: Crosstab and 3D representation

WordStat Output: KWIC

General Inquirer (PC/MAC version)

About General Inquirer

• Created by Philip Stone in the Department of Social Relations at Harvard in the 1960s; ran on mainframe for many years
• The current version combines the "Harvard IV-4" dictionary content-analysis categories, the "Lasswell" dictionary content-analysis categories, and five categories based on the social cognition work of Semin and Fiedler, making for 182 categories (dictionaries) in all

The General Inquirer (PC) Interface

• Input and output files must be named
• Two choices: Tags (application of dictionaries) & Words

General Inquirer Output: Tags (data file that may easily be exported to Excel & SPSS)

First row of each set is the ‘r’ (raw count) form of the output. This corresponds to frequencies.

Second row of each set is the ‘s’ (scaled count) form of the output. This corresponds to percentages (of total).
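The relationship between the two rows can be sketched as follows, assuming that "scaled" means the raw count expressed as a percentage of the total words in the text. The tag names are used only as labels and the counts are hypothetical; the exact denominator General Inquirer uses is not stated on the slide.

```python
def scaled_counts(raw_counts, total_words):
    """Convert raw tag frequencies to percentages of total words (assumed denominator)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return {tag: 100.0 * count / total_words for tag, count in raw_counts.items()}

# Hypothetical raw ('r') counts for two tags in a 200-word text.
raw = {"Positiv": 12, "Negativ": 3}
scaled = scaled_counts(raw, total_words=200)
print(scaled)
```

Scaled counts are what make texts of different lengths comparable.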

General Inquirer Output: Words

PCAD

About PCAD

• Developed by Gottschalk & Bechtel, using scales developed by Gottschalk & Gleser for human coding in the 1960s
• Diagnostic: assesses one text at a time
• Intended for naturally occurring speech or writing, minimum 80 words
• Measures states of neuropsychiatric interest such as:
  • Anxiety
  • Hostility
  • Cognitive impairment
  • Depression
  • Schizophrenia
  • Achievement Strivings
  • Hope

The PCAD Interface

PCAD Interface-2

PCAD Output: 4 Types (Clauses, Summaries, Analyses, Diagnoses)

PCAD Output: Analyses

PCAD Output: Diagnoses

LIWC

About LIWC

James W. Pennebaker & Martha E. Francis

• Created by Pennebaker, Booth, & Francis

• “Looks at how people write & their state of mind”

• Intended to measure both affective and cognitive constructs

• 84 Output Variables (standard dictionaries):

  • 17 Standard linguistic dimensions (e.g., number of pronouns)

  • 25 Word categories (e.g., “psychological constructs – affect, cognition”)

  • 10 Time categories (e.g., “space, motion”)

  • 19 Personal concerns (e.g., “home”)

LIWC Dictionaries (dimensions) with sample words

http://www.liwc.net/descriptiontable1.php

The LIWC Interface

LIWC Output: Data Matrix (Each row is a case/text, each column a dictionary)
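The case-by-dictionary layout can be sketched in Python as below. The two dictionaries, the sample texts, and the choice to report raw counts alongside a word count are assumptions for illustration; this is not LIWC's dictionary content or its output format.

```python
import csv
import io
import re

# Hypothetical dictionaries; real LIWC categories are far larger.
DICTIONARIES = {
    "affect": {"happy", "calm", "sad"},
    "cognition": {"think", "know", "because"},
}

def score_cases(cases, dictionaries):
    """Build one row per case, with one column per dictionary (raw counts)."""
    rows = []
    for case_id, text in cases.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        row = {"case": case_id, "word_count": len(tokens)}
        for name, words in dictionaries.items():
            row[name] = sum(1 for token in tokens if token in words)
        rows.append(row)
    return rows

def to_csv(rows):
    """Serialize the rows as CSV text, ready for a spreadsheet or stats package."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

cases = {"case1": "I feel happy and calm", "case2": "I think therefore I know"}
rows = score_cases(cases, DICTIONARIES)
csv_text = to_csv(rows)
print(csv_text)
```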

Diction

About Diction

• Created by Roderick P. Hart, University of Texas, originally for the purpose of analyzing political discourse

• To measure “semantic features”, uses a series of 31 standard dictionaries and five “Master Variables” (scales constituted of combinations of the 31):

  • Activity

  • Optimism

  • Certainty

  • Realism

  • Commonality

• Users can create custom dictionaries in addition to standard dictionaries.

• The program can accept individual or multiple passages.

The Diction Interface

Diction Output: Calculated & Master Variables

Diction Output: Dictionary Totals with Normative Values

Diction Output: Interactively Changing Normative Values

Diction: Custom Dictionaries as Simple .txt Files

Diction Output: Data file may be exported to SPSS

SPSS Syntax Editor

CATPAC

About CATPAC

• Created by Joseph Woelfel, communication scientist at the University at Buffalo
• Part of the GALILEO suite of software that analyzes and displays various types of networks
• CATPAC uses a neural network approach, identifying the most frequent words and determining patterns of connection based on co-occurrence
• A scanning window is used to measure the association/co-occurrence
• Uses cluster analysis to present results of this co-occurrence procedure
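The scanning-window idea can be illustrated with a short Python sketch that slides a fixed-size window over the text and counts how often pairs of frequent words fall inside the same window. The window size, the `top_n` cutoff, and the simple pair counting are assumptions for illustration; CATPAC's neural-network procedure and its clustering step are not reproduced here.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(text, window=5, top_n=10):
    """Count co-occurrences of the top_n most frequent words within a sliding window."""
    tokens = re.findall(r"[a-z]+", text.lower())
    vocab = {word for word, _ in Counter(tokens).most_common(top_n)}
    pairs = Counter()
    # Slide the window one token at a time across the text.
    for start in range(max(1, len(tokens) - window + 1)):
        present = sorted(set(tokens[start:start + window]) & vocab)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs

pairs = cooccurrence("cat dog cat bird", window=2, top_n=10)
print(pairs)
```

A matrix of such pair counts is the kind of input a cluster analysis or a dendrogram could then be built from.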

The CATPAC Interface

Text input will appear in CATPAC main screen

CATPAC Output: Descending Frequency List, Alphabetically Sorted List

CATPAC Output: Dendrogram

CATPAC Output: 3D Plot (using ThoughtView, another part of the GALILEO suite)

VBPro

About VBPro

• Created by M. Mark Miller at the University of Tennessee
• For use with MS-DOS (!!)
• Entirely do-it-yourself; no standard dictionaries
• Quantitative: frequencies & coding texts in numeric format for analysis in statistical software
• Qualitative: can provide KWIC (key word in context)

VBPro: Preparing the text

Multiple cases within one file are each prefixed with an identification tag, and the file is saved as a .txt file (NOT .asc, the old standard)

VBPro: Preparing Dictionaries

Each search dictionary is headed with >>#<<

The VBPro Interface

VBPro Output: Data matrix (each row is a case/text, each column a dictionary)

VBPro Output: Alphabetization

VBPro Output: Word Frequency

MCCALite

About MCCALite

• Created by Donald G. McTavish & Ellen B. Pirro, sociologists at the University of Minnesota, 1990
• Full name: Minnesota Contextual Content Analysis
• Measures the frequency of words in 116 “idea categories” (dictionaries) and compares these frequencies to the norms of general usage statistics for the English language
• There are standard dictionaries (categories) and KWIC, DIMAP
• Two types of dictionary scores are reported: E-Scores (emphasis) and C-Scores (context)

The ideal content for MCCALite is multiple-person transcripts (plays, hearings, interviews, TV)

The MCCALite Interface & Output

MCCALite: One more example (of many possible)

end