text analysis and visualisation: an overview of tools · summit 1 jul 2014 more analyze data...
Text Analysis and Visualisation:
An Overview of Tools
□ Segmentation or tokenisation
□ Often based on the fact that there are generally spaces in between words
□ Types are the unique words in a document; tokens are the total number of words
He cried in a whisper at some image, at some vision,--he cried out twice, a cry that was no more than a breath-- 'The horror! The horror!'
28 tokens and 21 types
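The counts for this sentence can be reproduced with a minimal Python sketch. It assumes that a token is any unbroken run of letters and that types are case-sensitive, so "He" and "he" count as two types; the regular expression and the variable names are illustrative choices, not taken from any of the tools in this overview.

```python
import re

text = ("He cried in a whisper at some image, at some vision,--he cried out "
        "twice, a cry that was no more than a breath-- 'The horror! The horror!'")

# Assumption: a token is a maximal run of ASCII letters;
# punctuation and whitespace only act as separators.
tokens = re.findall(r"[A-Za-z]+", text)

# Types are the distinct tokens. Kept case-sensitive here.
types = set(tokens)

# Folding case first merges "He" and "he" into one type.
types_folded = {t.lower() for t in tokens}

print(len(tokens), "tokens")
print(len(types), "types (case-sensitive)")
print(len(types_folded), "types (case-folded)")
```

Under these assumptions the sketch gives 28 tokens and 21 types; folding case lowers the type count to 20, which shows how much the result depends on the tokenisation rules chosen.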
Studies based on vocabulary
Frequency lists
Frequency list produced using TaporWare
Stopword filtering
Frequency list produced using TaporWare
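As a rough illustration of what a frequency list and stopword filtering involve, the sketch below counts lower-cased words with Python's `collections.Counter` and then repeats the count with a stopword list removed. The `frequency_list` helper and the very short `STOPWORDS` set are hypothetical and exist only for this example; they are not the list or the method used by TaporWare.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this example only;
# real tools ship much longer, language-specific lists.
STOPWORDS = frozenset({"a", "at", "he", "in", "more", "no", "out",
                       "some", "than", "that", "the", "was"})

def frequency_list(text, stopwords=frozenset()):
    """Count lower-cased words, skipping any word found in `stopwords`."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)

text = ("He cried in a whisper at some image, at some vision,--he cried out "
        "twice, a cry that was no more than a breath-- 'The horror! The horror!'")

raw = frequency_list(text)
filtered = frequency_list(text, STOPWORDS)

# Most frequent words first, before and after filtering.
print(raw.most_common(5))
print(filtered.most_common(5))
```

Without filtering, function words such as "a" dominate the list; with the stopwords removed, content words such as "cried" and "horror" move to the top.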
Distribution
Voyant Type Frequencies Chart
Voyant BubbleLines
Tapor Distribution
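The idea behind a distribution view can be sketched by cutting the text into equal-sized segments and counting how often a word falls into each one. The `distribution` function and the choice of four segments are assumptions made for this example; Voyant and Tapor may segment and count differently.

```python
import re

def distribution(tokens, word, segments=4):
    """Return how many times `word` occurs in each of `segments` equal slices."""
    counts = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == word:
            # Map the token position to a segment index in 0..segments-1.
            counts[i * segments // len(tokens)] += 1
    return counts

text = ("He cried in a whisper at some image, at some vision,--he cried out "
        "twice, a cry that was no more than a breath-- 'The horror! The horror!'")
tokens = re.findall(r"[a-z]+", text.lower())

print("cried: ", distribution(tokens, "cried"))
print("horror:", distribution(tokens, "horror"))
```

Plotting such per-segment counts for several words side by side gives a simple picture of where in a text each word is concentrated.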
Collocation
List produced using TaporWare
List produced using AntConc
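A collocation list can be approximated by counting the words that appear within a fixed window around a chosen node word. The sketch below is a minimal version under that assumption: `collocates` and its `window` parameter are hypothetical names, and it reports raw counts only, whereas collocation tools often rank candidates with an association statistic.

```python
import re
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words occurring within `window` positions of each `node` hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Neighbours to the left and right, excluding the node itself.
        counts.update(tokens[lo:i] + tokens[i + 1:hi])
    return counts

text = ("He cried in a whisper at some image, at some vision,--he cried out "
        "twice, a cry that was no more than a breath-- 'The horror! The horror!'")
tokens = re.findall(r"[a-z]+", text.lower())

print(collocates(tokens, "cried", window=2).most_common())
```

On a text this short the counts are only illustrative; a window-based list becomes informative when the node word occurs many times across a larger corpus.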
Co-occurrence
List produced using TaporWare
Clustering
Dendrogram produced using Lexomics
PCA created using Tapor and R
Information extraction
Conclusions
□ Text analysis tools may produce new views on the text
□ There are caveats: tools are based on assumptions about how texts ought to be analysed
□ Customisation of existing tools is generally needed for more specific research questions
□ Identification of appropriate tools via library support