PowerConc: An R-gram Based Corpus Analysis Tool
Jiajin Xu & Yunlong Jia
Beijing Foreign Studies University
PowerConc
• National Research Centre for Foreign Language Education, Beijing Foreign Studies University
• A general-purpose tool for corpus analysis
• Developed in Delphi
• Can deal with any ANSI-encoded texts
– E.g. on a Simplified Chinese OS
– Works well with Simplified/Traditional Chinese texts, (un)tokenised or raw/POS-tagged, as well as raw/POS-tagged English texts
PowerConc
• Size: 1.5 MB, compressed package less than 1 MB
• Installation: Doesn’t require any installation.
• OS: Works only on Windows now.
Design principles for PowerConc
Ideally
• Most powerful: can do what a concordancer can do, and what it cannot.
• Involves least effort in learning to use it
• Doing MORE with less
• Reductionism in software design
Fewer buttons and/or tabs
Frequency count
SearchList
Freq. Count
• Concordance
• N-gram list
• Collocation & Colligation
• Key n-gram list
More possibilities in tool development
• Corpus-informed/related ‘grammars’
– Pattern grammar (local grammar)
– Collostruction
– Lexical grammar (natural grammar, real grammar)
– Lexical priming (textual colligation)
– Longman grammar: Biber et al. grammar register variation
• Tool development lags behind
From phraseology to R-gram
• Many of the ‘grammars’ can be seen as some sort of phraseology
• We coined a technical term, ‘R-gram’.
– An operational parallel to phraseology
– The unit of language can be words, lemmata, phrases, POS, POS sequences, and combinations of all these.
– Can be linguistic structures with uncertain words or categories (e.g. be passive/get passive).
• a * of: collocational framework
• It be ADJ that: evaluative construction
• Noun noun compounds
• Bi-nominal constructions
• Passive constructions: be/get ADV. V-EN
• All these could be matched with Regular Expressions.
• But Regex is too difficult for lay users.
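To give a feel for what matching these structures with regular expressions involves, here is a minimal Python sketch. It assumes POS-tagged text in word_TAG format, with noun tags starting with N, adjective tags with J and forms of ‘be’ tagged VB…; the CLAWS-style tags in the sample sentence are an illustration only, and none of this is taken from PowerConc itself.

```python
import re

# Assumed input: space-separated "word_TAG" tokens (illustrative CLAWS-style tags).
TEXT = ("It_PPH1 is_VBZ clear_JJ that_CST a_AT1 lot_NN1 of_IO "
        "language_NN1 teachers_NN2 use_VV0 corpora_NN2 ._.")

# Hand-written patterns for three of the structures listed above.
PATTERNS = {
    # 'a', then any single token, then 'of'
    "a * of": r"\ba_\S+\s\S+_\S+\sof_\S+",
    # 'It', a form of 'be' (tag VB...), an adjective (tag J...), 'that'
    "It be ADJ that": r"\bIt_\S+\s\S+_VB\S*\s\S+_J\S*\sthat_\S+",
    # two consecutive tokens whose tags start with N
    "noun noun": r"\S+_N\S*\s\S+_N\S*",
}

def find_all(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

for name, pattern in PATTERNS.items():
    print(name, "=>", find_all(pattern, TEXT))
```

Even for these short structures the expressions are dense and depend on the tag format, which is the difficulty for lay users that the last bullet points to.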
Easy search with enhanced hits
• Smart Input
• Three meta-characters in Smart Input syntax, the simplest grammar ever.
• @be returns all inflectional forms of ‘be’
• #n returns all nouns
• * refers to any single word
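One way to picture the three meta-characters is as shorthand that expands into a regular expression. The Python sketch below is an assumption about how such a translation could work, not PowerConc's actual implementation: `smart_to_regex`, `LEMMA_FORMS` and `TAG_PREFIX` are hypothetical names, the lemma table is a toy, and the input is assumed to be word_TAG text with noun tags starting with N and adjective tags with J.

```python
import re

# Hypothetical lookup tables; a real tool would need full lemma and tag lists.
LEMMA_FORMS = {"be": ["be", "am", "is", "are", "was", "were", "been", "being"]}
TAG_PREFIX = {"n": "N", "adj": "J"}

def smart_to_regex(query):
    """Translate a Smart-Input-like query into a regex over word_TAG text."""
    parts = []
    for item in query.split():
        if item == "*":
            # any single token
            parts.append(r"\S+_\S+")
        elif item.startswith("@"):
            # all listed inflectional forms of the lemma, case-insensitive
            lemma = item[1:]
            forms = "|".join(re.escape(f) for f in LEMMA_FORMS.get(lemma, [lemma]))
            parts.append(rf"(?i:{forms})_\S+")
        elif item.startswith("#"):
            # any token whose tag starts with the assumed prefix
            parts.append(rf"\S+_{TAG_PREFIX[item[1:]]}\S*")
        else:
            # a literal word, case-insensitive, with any tag
            parts.append(rf"(?i:{re.escape(item)})_\S+")
    # anchor the match on token boundaries
    return re.compile(r"(?<!\S)" + r"\s".join(parts) + r"(?!\S)")

text = ("It_PPH1 was_VBDZ obvious_JJ that_CST a_AT1 number_NN1 of_IO "
        "students_NN2 and_CC teachers_NN2 left_VVD ._.")

for query in ["It @be #adj that", "a * of", "#n and #n"]:
    match = smart_to_regex(query).search(text)
    print(query, "=>", match.group(0) if match else None)
```

The user types only the short query; the expansion into tag-aware regex happens behind the scenes.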
• a * of => a * of
• It be ADJ that => It @be #adj that
• Noun noun compound => #n #n
• Bi-nominal => #n and #n
• Passive => \S+_VB\S+\s(\S+_[RXPJDN]\S+\s)*\S+_V\S*N
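The passive expression above, read as one continuous pattern, can be tried out in Python. The sketch assumes word_TAG input with CLAWS-style tags (a ‘be’ form tagged VB…, a past participle tagged V…N); the two sample sentences and the helper `find_passive` are made up for illustration.

```python
import re

# The passive pattern from the slide: a 'be' form, optional intervening
# tokens whose tags start with R, X, P, J, D or N, then a V...N participle.
PASSIVE = re.compile(r"\S+_VB\S+\s(\S+_[RXPJDN]\S+\s)*\S+_V\S*N")

# Hypothetical tagged sentences (illustrative CLAWS-style tags).
passive_sentence = ("The_AT window_NN1 was_VBDZ quickly_RR broken_VVN "
                    "by_II the_AT boy_NN1 ._.")
active_sentence = "The_AT boy_NN1 was_VBDZ happy_JJ ._."

def find_passive(text):
    """Return the first passive-like span in text, or None."""
    match = PASSIVE.search(text)
    return match.group(0) if match else None

print(find_passive(passive_sentence))  # was_VBDZ quickly_RR broken_VVN
print(find_passive(active_sentence))   # None
```

The pattern depends entirely on the tag inventory, so a corpus tagged with a different scheme would need a different expression.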
Limitation
• Speed
• A concordancer without applying indexing
• Can't process texts larger than a few million words anyway.
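The speed limitation follows from searching without an index: every query has to scan the whole text. The Python sketch below contrasts a linear scan with a simple positional index, to show the general trade-off only; it does not reflect PowerConc's code, and the function names and toy corpus are assumptions.

```python
from collections import defaultdict

def linear_search(tokens, word):
    """No index: every query walks the whole corpus."""
    return [i for i, tok in enumerate(tokens) if tok == word]

def build_index(tokens):
    """One pass over the corpus, mapping each word to its positions."""
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        index[tok].append(i)
    return index

def kwic(tokens, position, span=2):
    """Format one keyword-in-context line around a hit."""
    left = " ".join(tokens[max(0, position - span):position])
    right = " ".join(tokens[position + 1:position + 1 + span])
    return f"{left} [{tokens[position]}] {right}"

tokens = "the cat sat on the mat and the dog sat too".split()
index = build_index(tokens)

# Both approaches find the same hits; the index answers by lookup.
print(linear_search(tokens, "sat"))
print(index["sat"])
for pos in index["sat"]:
    print(kwic(tokens, pos))
```

An index costs extra memory and build time up front, which is the price an unindexed tool avoids at the expense of slower searches on larger corpora.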
Download PowerConc
• www.fleric.org.cn/powerconc/
• http://www.bfsu-corpus.org/channels/tools
Thank you!