personalised statistical writing analysis

Post on 21-Jan-2015


DESCRIPTION

Powerpoint slides from JAECS, 2013, Sendai, Japan.

TRANSCRIPT

John Blake
Japan Advanced Institute of Science and Technology

Personalised statistical writing analysis

Overview
• Introduction
– context, impetus
– focus, process
• Five aspects
– statistical analysis
• Personalised writing analysis
– sample extracts
• Interview survey
• Future direction

Context
• Proofreading for faculty
• Writing assistance for PhD candidates

70% 50% science

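Impetus

21 email exchanges on various points, including:
• I would like to standardise on “minor scary incident”.
• I would like to standardise on “minor scary incident” rather than “near miss”.
• I asked the place of submission. “Near accident” seems to be the usual term. I have revised the text accordingly.
• I changed it to “near-miss incident”. … . My professor advised me to follow the instructions.
• I changed every “Near miss incident” to “Near miss incidents”.

From one research article (RA): “minor scary incident”, “near-miss incident”, ヒヤリ・ハット (hiyari-hatto, Japanese for a near miss)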

Focus
Enable research articles to meet generic expectations of:
• Accuracy by being factually correct
• Clarity by avoiding ambiguity
• Formality by adopting appropriate style

Rhetorical structure, logic, originality, flawed method, etc. = important, but…

Five aspects of generic integrity

1. Vocabulary fit
2. Readability
3. Word type balance
4. Style and usage
5. Lexicogrammatical errors
Summary statistics

Bhatia, V. K. (1993). Analysing genre: Language use in professional settings. London: Longman.

Process for each research article
• Create target corpus (TC)
• Analyse RA and TC
• Identify errors in RA
• Compile ratios where possible
• Create feedback document

Five aspects
• Vocabulary fit – keyness of RA & TC
• Readability – readability statistics of RA & TC
• Word type balance – ratio of GSL, AWL and off-list words for RA & TC
• Style and usage – markedness, modality, register
• Lexico-grammar – vocabulary & grammatical errors

1. Vocabulary fit
Scott & Tribble (2006, p. 56): “keyness [is what a text] boils down to”
Hyland (2011): paper-journal fit

Hyland, K. (2011). Welcome to the Machine: Thoughts on writing for scholarly publication. Journal of Second Language Teaching and Research, 1 (1), 58–68.

Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam, Philadelphia: John Benjamins.

TC: firm, knowledge, market, international, foreign, performance, research, variables, markets, countries, export, country, relationship, business, model

RA: organizational, TMSs, coordination, DOPPO, expertise, interactions, mechanisms, BLOCK, employee, leader, team, coordinate, informal, information, management

Prepared using AntConc 3.2.4w with Brown Corpus as reference
TC = 243 RAs, c. 2.1 million words
RA = 10k words
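The slides do not state which keyness statistic was selected in AntConc. The sketch below assumes log-likelihood, one commonly used option, and runs on made-up toy tokens. The function names `log_likelihood` and `keywords` are illustrative and are not part of AntConc.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word (study corpus vs. reference corpus)."""
    total = size_study + size_ref
    combined = freq_study + freq_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def keywords(study_tokens, ref_tokens, top_n=5):
    """Rank words that are unusually frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = []
    for word, freq in study.items():
        # Keep positive keywords only: relatively more frequent in the study corpus.
        if freq / n_study > ref[word] / n_ref:
            scored.append((word, log_likelihood(freq, n_study, ref[word], n_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy data only (assumption); a real run would compare the RA or TC with a reference corpus.
study_tokens = "the team coordination of the team leader".split()
ref_tokens = "the cat sat on the mat and the dog sat too".split()
top = keywords(study_tokens, ref_tokens, top_n=3)
```

Running the same function once for the RA and once for the TC would give two ranked lists that can be compared side by side, as in the lists above.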

TC: firm, knowledge, market, international, foreign, performance, research, variables, markets, countries, export, country, relationship, business, model

RA: [word cloud, prepared using Wordle with RA, 10k words]

2. Readability

[Bar chart: Gunning fog index, Flesch-Kincaid grade level and mean sentence length, Draft vs. Target; axis 0 to 25]

Bogert, J. (1985). In Defense of the Fog Index. Business Communication Quarterly, 48 (2), 9-12.
Gilquin, G., & Paquot, M. (2008). Too chatty: Learner academic writing and register variation. English Text Construction, 1 (1), 41-61.
McClure, G. (1987). Readability Formulas: Useful or Useless. IEEE Transactions on Professional Communication, 30 (1), 12-15.

Bogert (1985) & McClure (1987) – factors affecting readability
Gilquin & Paquot (2008) – learner academic writing is rather “chatty”
Research articles tend to have a higher reading difficulty.
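The three readability measures can be computed roughly as follows. This is a minimal sketch using the standard Gunning fog and Flesch-Kincaid grade formulas; the syllable counter is a crude vowel-group heuristic (an assumption), and the fog calculation skips the usual exclusions for proper nouns and common suffixes, so scores will differ from those of dedicated tools.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text):
    """Return mean sentence length, Gunning fog index and Flesch-Kincaid grade."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    # Fog treats words of three or more syllables as "complex".
    complex_words = sum(1 for n in syllables if n >= 3)
    mean_sentence_length = len(words) / len(sentences)
    fog = 0.4 * (mean_sentence_length + 100 * complex_words / len(words))
    fk_grade = 0.39 * mean_sentence_length + 11.8 * sum(syllables) / len(words) - 15.59
    return {
        "mean_sentence_length": mean_sentence_length,
        "gunning_fog": fog,
        "flesch_kincaid_grade": fk_grade,
    }


# Illustrative two-sentence draft (assumption), not taken from a real RA.
draft = "This study answers two questions. Informal coordination mechanisms are examined."
scores = readability(draft)
```

Applying `readability` to both the draft and the target corpus gives the paired Draft/Target values that the chart compares.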

3. Word type balance

Levels, academic text (Nation, 2001, p. 17)
• 1st 1000: 73.5%
• 2nd 1000: 4.6%
• AWL: 8.5%
• Other: 13.3%

RA analysed by WebVP classic v4 (Cobb, 2013)
• First 2k words: 69%
• AWL: 16%
• Off-list: 15%

Cobb, T. (2013). Web Vocabprofile. www.lextutor.ca/vp/
Nation, I.S.P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.

Used in EAP courses at PolyU and CityU in Hong Kong
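The word type balance is a coverage count: each token is assigned to the first 2k words, the AWL or off-list, and the shares are reported as percentages. A minimal sketch, assuming tiny placeholder word lists (`FIRST_2K` and `AWL` below are invented for illustration) and exact token matching, whereas real profilers match whole word families:

```python
import re
from collections import Counter


def word_type_balance(text, first_2k, awl):
    """Percentage of tokens in each band: first 2k words, AWL, off-list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    bands = Counter()
    for token in tokens:
        if token in first_2k:
            bands["first_2k"] += 1
        elif token in awl:
            bands["awl"] += 1
        else:
            bands["off_list"] += 1
    return {band: 100 * bands[band] / len(tokens) for band in ("first_2k", "awl", "off_list")}


# Placeholder lists (assumption); the real GSL and AWL are far larger.
FIRST_2K = {"the", "of", "in", "and", "is", "team", "information", "study", "this"}
AWL = {"coordinate", "mechanisms", "interactions", "research"}

balance = word_type_balance(
    "This study is research in the coordination mechanisms of TMSs", FIRST_2K, AWL
)
```

Because matching is exact here, "coordination" falls off-list even though "coordinate" is in the placeholder AWL set; a family-based lookup would count it differently.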

4. Style and usage errors

Marked usage | Ratio | Suggestion
People provide first | 0:9 COCA | People first provide

Hyland (1998) – hedging
Robb (2003) – “Google as a quick ‘n’ dirty corpus tool”

Hyland, K. (1998). Hedging in scientific research articles. Amsterdam: John Benjamins.
Robb, T. (2003). Google as a quick ‘n’ dirty corpus tool. TESL-EJ, 7(2).

Corpora: IS, KS, MS, BNC, COCA, WAC

5. Lexicogrammatical errors

Grammatical or vocabulary errors

No. | Incorrect form | Correct form | Comment
1 | Taking account differences | Taking account of differences | preposition
2 | this study answers to two questions | this study answers two questions | answer to s.b. / answer s.th.
3 | former employee | a former employee | employee [singular]
4 | to participate to this study | to participate in this study | collocation (participate in)
5 | emphasis is given on XX | emphasis is placed on XX | collocation (give to / place on)
6 | for being responsible | to be responsible | general vs. specific purpose

Summary statistics

Based on requests for a simple-to-understand evaluation

Caveat: subjective evaluations disguised as statistics

Personalised writing analysis

Selected statistics for subject 1

Readability | Yours | Target
Gunning fog index | 13.2 | 13.2
Mean sentence length | 15.49 | 19.37
Mean number of clauses/sentence | 1.19 | 1.54
Lexical density | 0.63 | 0.57

Word type balance | Yours % | Target %
1k words | 68.58 | 74.39
2k words | 6.69 | 5.29
AWL | 16.36 | 7.67
Off-list words | 8.36 | 12.65
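One way to read the word type balance figures is as gaps between "Yours" and "Target". The sketch below uses the subject 1 values from the table; ranking by absolute gap is an assumption made for illustration, not a step described in the slides.

```python
# Values copied from the subject 1 table; the dictionary keys are illustrative labels.
yours = {"1k words": 68.58, "2k words": 6.69, "AWL": 16.36, "Off-list words": 8.36}
target = {"1k words": 74.39, "2k words": 5.29, "AWL": 7.67, "Off-list words": 12.65}


def gaps(yours, target):
    """Difference (yours minus target) in percentage points, largest gap first."""
    diffs = {band: round(yours[band] - target[band], 2) for band in yours}
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


ranked = gaps(yours, target)
for band, diff in ranked:
    print(f"{band}: {diff:+.2f} percentage points")
```

For subject 1 the largest gap is in AWL words, where the draft is well above the target share.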

Selected statistics for subject 4

Style and usage

No. | Sentence | Ratio | Comment or correction
1 | minor scary incidents | 1:58,700 WAC | near-miss incidents
2 | falling-accident | 0:19 COCA | slips, trips and falls OR falling objects
3 | a medical examination by interview | 1:525 WAC; 0:1 COCA | a medical consultation
4 | According to sex | 1:18 WAC | According to the gender
5 | 175 indoor workers | n/a | Use One hundred and ….
6 | Tomio,T. (1995) proposes | n/a | Omit initials in in-text citations unless …

Selected statistics for subject 7

Style and usage

No. | Sentence | Ratio | Comment or correction
1 | people provide first their expertise … | 0:9 COCA | people first provide their expertise …
2 | XX also engage into XX | 1:9000 COCA | XX also engage in XX
3 | The XX structure limits become | n/a | Use limits for boundaries and limitations for restrictions/inabilities
4 | future studies are able to | n/a | Use may be to show uncertainty
5 | employee simultaneous participation | 0:5 WAC | simultaneous participation of employees

Interview survey
Interviewer = me
Subjects = 4 faculty, 1 PhD candidate
Nationalities = 3 Japanese, 2 non-Japanese
Number = 5 participants
Interview time = 30 minutes
Location = private office on campus
Dates of interview = Jun-Jul 2013

Semi-structured interviews

e.g. “What revisions did you make to your paper since…..?” “How can I make the feedback more useful?”

Survey results

• Explanatory notes – too long
• Key word lists – couldn't understand
• Three readability scores – too complex
• Raw ratios – too difficult, e.g. 47:211,120, 1:4500

• Lexico-grammatical errors
• Word type balance
• Ratios for style and usage

Incremental improvements (made)
1. Create summary statistic scorecard
2. Use word tag cloud for vocabulary fit
3. Shorten explanatory notes
4. Simplify and approximate ratios
5. Show word type balance graphically with percentages
6. Select “most useful” readability measure(s) – mean sentence and word length?
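Simplifying and approximating a ratio could look like the sketch below. It assumes ratios are stored as two raw hit counts and are reduced to a rounded 1:N form at two significant figures; the function name `approximate_ratio` and the rounding rule are illustrative choices, not the procedure used for the feedback documents.

```python
import math


def approximate_ratio(hits_marked, hits_suggested, sig_figs=2):
    """Reduce a raw ratio such as 47:211,120 to a rounded 1:N form."""
    if hits_marked == 0:
        # Nothing to scale; report the raw counts, e.g. 0:9.
        return f"0:{hits_suggested:,}"
    per_one = hits_suggested / hits_marked
    if per_one < 1:
        # Marked form is the more frequent one; leave the raw counts untouched.
        return f"{hits_marked:,}:{hits_suggested:,}"
    # Round to the requested significant figures, never finer than whole numbers.
    magnitude = math.floor(math.log10(per_one))
    step = 10 ** max(magnitude - sig_figs + 1, 0)
    rounded = round(per_one / step) * step
    return f"1:{rounded:,}"


simplified = approximate_ratio(47, 211_120)
```

With these assumptions, 47:211,120 reduces to about 1:4,500.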

Future developments
• Integration of metrics into one-stop online portal (thanks to reviewer for idea) for researchers to submit drafts
• Statistical comparison of draft and published versions to evaluate success of feedback

Any questions, suggestions or comments?

John Blake johnb@jaist.ac.jp
