CFL Lexical Feature Marker: A User Walk-through Guide

Posted on 03-Dec-2021

• First, if you don’t have it, download the program and data:

• Ctrl-click this link: http://www.cfl-toolkit.com

• Then click on Lexical Feature Marker

• You will see this screen.

The download is a zip file containing the full program and other files. Unzip it into a folder of your choice. You can run the LFM2021.jar file like a Windows program by clicking on the file name. This works as long as Java is already installed on your computer. If it isn’t, type 'java download' into a search engine and go to the Java download page, which will identify the version you need for your computer.

The online version is there in case your computer doesn’t allow Java to run. It is slightly different from the Java program but does most of the same things. The demo version has preloaded files for you to use alongside this introduction. The other two options allow you to use files from your own computer.

CFL LFM

The LFM is designed to help us examine small numbers of individual texts, rather than the large corpora typically analysed with concordancers such as AntConc or WordSmith.

The LFM contains built-in lists of what are known as function or grammar words. These are the top four items in the Show* tick-box list in the screenshot to the right. (*In this presentation, red text indicates the red headers on the screenshots, to help you find your way around the program.)

Getting started

• Select Files (top left in the screenshot). Select a file or files, which must be plain text (.txt files), from a folder on your computer.

• Click on one or more of the Show check boxes and the program will show you where the words in those sets occur in the currently selected file.

Pronouns

• Here is the Prime Minister of the UK, Boris Johnson, announcing the January 2021 lockdown.

• The program starts with Pronouns selected unless you change it.

• The list contains only personal, demonstrative and reflexive pronouns. It excludes demonstrative ‘that’, because ‘that’ is polysemantic and polyfunctional.

• In the text box on the left, all the pronouns are shown in red so that you can read them in context.

• The Statistics panel shows you the totals for each pronoun in descending order of frequency together with the percentage of the text that they represent.

• You can clearly see that the emphasis in the speech is on ‘we’ and ‘you’.
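The Statistics figures (count per pronoun, descending order, percentage of the text) can be approximated outside the program. The Python sketch below is not the LFM's own code: it assumes a simple tokenizer (lower-cased runs of letters and apostrophes) and a short, hypothetical pronoun list, then counts each listed pronoun and expresses it as a percentage of all tokens.

```python
import re
from collections import Counter

# Hypothetical, shortened pronoun list for illustration only; the LFM's
# built-in list is longer. Demonstrative 'that' is left out, as in the LFM.
PRONOUNS = {
    "i", "me", "we", "us", "you", "he", "him", "she", "her", "it",
    "they", "them", "this", "these", "those",
    "myself", "ourselves", "yourself", "himself", "herself", "themselves",
}

def tokenize(text):
    """Lower-case the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z'’]+", text.lower())

def set_statistics(text, word_set):
    """Return (word, count, % of all tokens) for each item of word_set found
    in text, most frequent first (ties keep their order of first appearance)."""
    tokens = tokenize(text)
    if not tokens:
        return []
    counts = Counter(t for t in tokens if t in word_set)
    return [(w, n, round(100 * n / len(tokens), 2)) for w, n in counts.most_common()]

speech = "We must act. You know we can do it, and we will."
for word, count, pct in set_statistics(speech, PRONOUNS):
    print(f"{word:<6}{count:>3}{pct:>8.2f}%")
```

The same function works for any other word set (prepositions, modals, Core) by passing a different list.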

Prepositions

• You can view any of the function word sets individually. Each set has its own colour, in this case bold blue.

• Prepositions make up 14% of this text, and their organizing role in the text is very clear from this screenshot.

• Function words normally account for between 45% and 55% of any English text.

• ‘of’ is the second most common word in English, after ‘the’, which we don’t show in the LFM because it makes up around 7% of most texts.

• ‘to’ comes top of this listing because of the number of infinitive verbs it introduces, saying what we can and cannot do, particularly in paragraph 3.

• You can also see the rapid drop in quantity after ‘of’, ‘to’ and ‘in’. This is an example of Zipf’s Law, which you may already have heard about. If not, look it up.
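Zipf’s Law is usually stated as an approximate relationship between a word’s frequency and its rank in a frequency list:

```latex
f(r) \approx \frac{C}{r^{s}}, \qquad s \approx 1
```

Here f(r) is the frequency of the word at rank r and C is a constant for the text. With s close to 1, the second-ranked word occurs roughly half as often as the first and the third roughly a third as often. This is only an approximation: the fit varies from text to text, and a single short speech follows it only loosely.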

Modals and ‘to be’

• You can see more than one function set at a time. Here we are selecting Modals and the various parts of the irregular verb ‘to be’.

• Here the modal verbs are shown in bold blue, with the different parts of the verb ‘to be’ shown in purple.

• Modal verbs are normally followed by the bare infinitive form of the verb (the base form, without ‘to’). Look at the verbs that follow ‘will’ in the second paragraph.
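Checking which word follows each modal is easy to automate. This Python sketch uses a hypothetical modal list (the LFM’s Modals set may differ) and simply reports the next token, so a negation such as ‘will not go’ shows ‘not’ rather than the verb.

```python
import re

# Hypothetical modal list for illustration; the LFM's built-in Modals set may differ.
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}

def words_after_modals(text, modals=MODALS):
    """Return (modal, next_token) pairs for every modal followed by another token.
    Sentence boundaries are ignored in this simple version."""
    tokens = re.findall(r"[a-z'’]+", text.lower())
    return [(tokens[i], tokens[i + 1])
            for i in range(len(tokens) - 1)
            if tokens[i] in modals]

print(words_after_modals("You will need to stay at home. We must act."))
```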

All four together

• When we view all the function words from our sets that Mr. Johnson uses, we can see that more than one is normally present in each sentence, and that they frequently work together. The words shown represent almost 26% of the text, which is similar to what is typically found in literary or newspaper texts.

• There are other words that are normally considered functional but are not included in these sets: ‘the’; the conjunctions ‘and’ and ‘but’; ‘what’ and ‘that’, which are multifunctional; ‘not’ and ‘no’; and a few others. You can see them using the Function checkbox, which is explained after the Distribution section.

The Distribution

• When you go to the Distribution tab you can view 1) the location of each of the words on the list across the full text in the top box and 2) the sentences where the words occur in the lower box. Here we are looking at pronouns; the box in the centre-left shows which pronoun is highlighted in the lower box.

• The top box is read as follows:

• A dot indicates that the word does not occur in that sentence.

• A number indicates how many times the word appears in that sentence.

• Here we can see that ‘we’ is used throughout, ‘you’ is concentrated in particular stretches of the speech, and ‘I’ occurs mostly near the end.
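The dot-and-number rows in the top box follow a simple rule, sketched here in Python. The sentence splitter is an assumption (a break after ‘.’, ‘!’ or ‘?’ followed by white space) and will not match the LFM’s own sentence detection exactly.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by white space."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def distribution_row(text, word):
    """One entry per sentence: '.' if the word is absent, otherwise its count."""
    row = []
    for sentence in split_sentences(text):
        n = re.findall(r"[a-z'’]+", sentence.lower()).count(word.lower())
        row.append("." if n == 0 else str(n))
    return " ".join(row)

speech = "We will act. You must stay home. We know we can do it."
for pronoun in ("we", "you"):
    print(f"{pronoun:<4}{distribution_row(speech, pronoun)}")
```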

The distribution and use of ‘under’

• Looking at ‘under’, which you do by selecting the word in the centre-left box, you can see three occurrences of the word, in just two sentences early in the speech.

• In sentence 8 ‘hospitals are under pressure’

• In sentence 15

• the ‘country is under extreme measures’ and

• the variant needs bringing ‘under control’.

Core and Function words

• The central purpose of the LFM is to draw attention to the way in which a small number of words help to construct meaning by organizing the structure of a sentence.

• The LFM includes two further lists of words, Core and Function.

• Core is a list of words that occur frequently in books used in schools to teach reading to 4- to 14-year-olds. It was created as part of a review of the words that children might be expected to see frequently as their reading skills develop. These words are retained in adult writing, of course, but different writers use them in different proportions, so they can be useful in authorship attribution.

• Function is the full list of function words used by the developer of the LFM in other programs. There are about 450 of them, and function words represent between 45% and 55% of most English texts.

• There is no Distribution display for Core, Function or Content, because virtually every sentence contains words from these lists.

Core

• You can see that most of the sentences contain more than one Core vocabulary item, and that the vocabulary itself consists of relatively short words that are likely to be found in writing of all sorts. There are 140 tokens, representing almost 12% of the text.

Function

• You can see here that there are many words highlighted in blue, including ‘and’, ‘but’, ‘as’ and ‘already’.

• You can probably see that function words make up the majority of the words in most sentences. In this text they account for nearly 55% of the running words.

Content

• Content words are defined negatively: they are all the words that are not on the Function list, so they include any Core words that are not function words.

• ‘Boris’ and ‘Johnson’ appear 13 times because of the reporting style and metadata of the website from which this text was collected.

• After those, the repetition of ‘covid’, ‘new’, ‘schools’, ‘variant’ and ‘virus’, seven times each, shows clearly what this speech is about.
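Because content is defined negatively, a content list falls out of a function list directly. This Python sketch uses a tiny hypothetical stand-in for the LFM’s roughly 450-item Function list, so its percentages will differ from the program’s; the sample sentence is invented.

```python
import re
from collections import Counter

# Tiny hypothetical stand-in for the LFM's Function list (about 450 items).
FUNCTION_WORDS = {
    "the", "a", "an", "and", "but", "of", "to", "in", "on", "is", "are",
    "we", "you", "it", "this", "that", "will", "must", "not", "no",
}

def tokenize(text):
    """Lower-case the text and return its word tokens."""
    return re.findall(r"[a-z'’]+", text.lower())

def function_share(text, function_words=FUNCTION_WORDS):
    """Percentage of running words that are on the function list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 100 * sum(t in function_words for t in tokens) / len(tokens)

def content_words(text, function_words=FUNCTION_WORDS):
    """Content words, defined negatively: every token NOT on the function list."""
    return Counter(t for t in tokenize(text) if t not in function_words)

sample = "The virus is spreading and the new variant of the virus is a risk to schools."
print(f"function words: {function_share(sample):.2f}%")
print(content_words(sample).most_common())
```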

Function and Content

• By ticking both the Function and Content checkboxes you can therefore see all of the text, with a mixture of function and content words in almost every sentence. (It is possible to have sentences made up entirely of function words or entirely of content words, but they are unusual.)

• Here, the importance of the topics makes the content words appear close to the top of the Statistics box, as the position of the slider to the right of the box shows; this is unusual. Normally content words appear much lower down, because their structural role makes function words far more frequent than even the words for the central topic of a text.

Using names in a conversation or debate - 1

• Here we have the first debate between Republican President Donald Trump and his Democratic rival in the 2020 presidential election, Joe Biden.

• You can find out who uses which words, and how often, by entering the label that introduces each speaker’s turn. Here each label ends with the speaker’s surname followed by a colon (e.g. Vice President Joe Biden:). Enter the labels in the Names area at the top right of the screen, one at a time, clicking Add to put each one in the box. You only need the final part of each label, e.g. Biden:.

• When you highlight the file (e.g. TrumpBiden1.txt) in the Select Files area, the Statistics panel shows the frequency of each word in the currently selected function word set, broken down by speaker name.

• You can see a clear difference between the two speakers in their frequency of ‘you’, ‘he’, ‘it’ and ‘I’.
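The Names mechanism can be pictured as attributing each line to the most recent speaker label. This Python sketch assumes that a label such as ‘Biden:’ marks the start of a turn and that turns continue until the next label; the function name and the sample lines are invented for illustration and do not come from the LFM.

```python
import re
from collections import Counter, defaultdict

def counts_by_speaker(transcript, delimiters, word_set):
    """Count word_set items separately for each speaker.

    A line containing one of the delimiters (e.g. 'Biden:') starts that
    speaker's turn; following lines belong to the same speaker until another
    delimiter appears. Text before the first delimiter is ignored.
    """
    counts = defaultdict(Counter)
    speaker = None
    for line in transcript.splitlines():
        for d in delimiters:
            if d in line:
                speaker = d
                # Drop everything up to and including the label,
                # e.g. 'Vice President Joe Biden:'.
                line = line.split(d, 1)[1]
                break
        if speaker is None:
            continue
        for token in re.findall(r"[a-z'’]+", line.lower()):
            if token in word_set:
                counts[speaker][token] += 1
    return counts

debate = """Chris Wallace: Good evening.
Vice President Joe Biden: I think you know it.
President Donald J. Trump: You are wrong. You know it.
It is true."""
by_speaker = counts_by_speaker(debate, ["Wallace:", "Biden:", "Trump:"], {"i", "you", "he", "it"})
for speaker, counts in by_speaker.items():
    print(speaker, dict(counts))
```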

Using names in a conversation or debate - 2

• If you now go to the Distribution tab, you will see that the display has changed. It shows the most frequent word and its usage by each speaker: the first name on the list (in this case the debate chair) is shown in blue and the other participants in red.

• This was a very long debate, but you can follow any of the words across the whole of it by using the slider at the bottom of the distribution display.

Save

• The Save button under the Show tickboxes saves the following HTML files:

1. The whole of the current Text window contents, to the filename you select.

2. The Distribution, adding a _D suffix to the filename (only if there is a Distribution).

3. The current Statistics window contents, adding a _S suffix to the filename.

• You can import these files into Excel, which recognizes HTML files as a data source. This gives you Excel’s full sorting and calculation features. Save the result as an XLSX file to preserve that functionality.

Your word list - 1

• The Your word list box lets you investigate any set of words you choose: type each word into the top box in the middle of the display.

• To add a word, type it in the thin white box and then click Add. The word will appear in the larger box below and will be marked up in red in the text below that. Here we have chosen the word ‘million’.

• You can have as many such words as you want, of course.

• The Remove button takes them out.

• You can see the different frequencies of use of ‘million’ for Trump and Biden in the Statistics and you can see their uses by clicking on each file in turn and then reading the text examples.

• The next screen shows the sentences that contain the term.

Your word list - 2

• Here you can read all the sentences with the word ‘million’ in them to see the full context.
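Pulling out every sentence that contains a chosen word is the core of this display. A Python sketch, assuming a naive sentence splitter and whole-word, case-insensitive matching (the LFM’s own matching rules may differ); the sample text is invented.

```python
import re

def sentences_with(text, word):
    """Return every sentence that contains `word` as a whole word (any case).
    Sentences are split naively after '.', '!' or '?' followed by white space."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

sample = "We created a million jobs. Prices rose. Millions lost work, a million or more."
for sentence in sentences_with(sample, "million"):
    print(sentence)
```

In this sketch ‘Millions’ does not count as a match for ‘million’, so each word form has to be added separately.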

Transcripts I

• The Transcript tab lets you load court transcripts, conversations, debates and similar texts in which the speakers can be clearly identified. It works much like Names, but it also reports on vocabulary usage, including which items are shared between speakers and which are unique to one speaker.

• In this case we have used the two presidential election debates mentioned earlier.

• First, before uploading the file, add a line at the top of your text file that says simply TRANSCRIPT.

• Then, in the Delimiters box, add each participant exactly as they are labelled in the transcript; in this case, surnames followed by a colon. (You can also Save the delimiters to a file and Load it again later.)

• Click the Split Transcript button and the program splits the vocabulary by speaker.

• You can then see what vocabulary is unique to each of the leaders, what is shared by two speakers, and what all three speakers use at some point in the debate.

• The double exclamation marks (!!) show where the speakers differ markedly in their use of the same word. B0 is Biden and T0 is Trump, and the frequencies are shown after each name. You can see clear differences in usage between the protagonists in this particularly stormy debate.
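The split into vocabulary unique to one speaker, shared by two, or used by all three amounts to grouping every word by the exact set of speakers who use it. This Python sketch illustrates that idea and is not the LFM’s implementation; the speaker vocabularies are invented, and the rule the program uses to add ‘!!’ is not modelled.

```python
from collections import defaultdict

def vocabulary_sharing(vocab_by_speaker):
    """Group every word by the exact set of speakers who use it.

    vocab_by_speaker maps a speaker label to the set of words that speaker uses.
    The result maps frozenset(speakers) -> words used by exactly those speakers:
    a one-speaker key holds that speaker's unique words, a two-speaker key the
    words shared by exactly those two, and so on.
    """
    users = defaultdict(set)  # word -> speakers who use it
    for speaker, words in vocab_by_speaker.items():
        for word in words:
            users[word].add(speaker)
    groups = defaultdict(set)
    for word, speakers in users.items():
        groups[frozenset(speakers)].add(word)
    return dict(groups)

# Invented vocabularies for illustration.
vocab = {
    "Biden:": {"folks", "jobs", "you", "the"},
    "Trump:": {"deal", "you", "the"},
    "Wallace:": {"question", "the"},
}
for speakers, words in vocabulary_sharing(vocab).items():
    print(" + ".join(sorted(speakers)), len(words), sorted(words))
```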

Transcripts II

• Here we are using two transcripts in the same file, each identified by the word TRANSCRIPT on a separate line at the start of each debate.

• The advantage of this is that the Names sharing vocabulary box shows how much vocabulary is shared with one or more other speakers, and how much is unique to individual speakers, across the two debates.

• For example, in line 1, we have Biden:-1=330. This means that there are 330 items of vocabulary in debate 1 that he doesn’t share with anyone else, or, indeed, with himself in debate 2. These words are, therefore, unique to him in debate 1.

• Similarly, in line 2 we see that there are 122 vocabulary items that Biden uses in both debates. He shares these words with himself.

• The blue highlighted line shows that, remarkably, Donald Trump and Joe Biden share only 72 words across the two debates; those words and their usage by each name are shown in the Vocabulary and usage by name box on the right.

• Two lines below that, you can see that both candidates share 230 words with the two different chairs of the debates, reflecting their responses to the topics for discussion that the chairs introduced.
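Counts such as Biden:-1=330 can be approximated by keying each speaker’s vocabulary by speaker and transcript number, then removing every word that appears under any other key. The Python sketch below assumes that each transcript starts at a line reading exactly TRANSCRIPT and that turns begin with the delimiter labels; the function names are hypothetical, the key format only imitates the display, and the sample text is invented.

```python
import re
from collections import defaultdict

def vocab_by_speaker_and_debate(file_text, delimiters):
    """Build one vocabulary set per speaker per transcript, keyed like 'Biden:-1'.

    A line reading exactly TRANSCRIPT starts the next transcript; a line
    containing a delimiter starts that speaker's turn.
    """
    vocab = defaultdict(set)
    debate, speaker = 0, None
    for line in file_text.splitlines():
        if line.strip() == "TRANSCRIPT":
            debate, speaker = debate + 1, None
            continue
        for d in delimiters:
            if d in line:
                speaker, line = d, line.split(d, 1)[1]  # drop the speaker label
                break
        if debate == 0 or speaker is None:
            continue  # ignore text before the first TRANSCRIPT line or first label
        vocab[f"{speaker}-{debate}"].update(re.findall(r"[a-z'’]+", line.lower()))
    return dict(vocab)

def unique_counts(vocab):
    """For each key, count the words found under no other key
    (not even the same speaker in the other transcript)."""
    result = {}
    for key, words in vocab.items():
        others = set().union(*(w for k, w in vocab.items() if k != key))
        result[key] = len(words - others)
    return result

sample = """TRANSCRIPT
Wallace: Welcome tonight.
Biden: Folks know the truth.
Trump: Wrong, the truth.
TRANSCRIPT
Welker: Welcome back.
Biden: The folks again.
Trump: Wrong again."""
vocab = vocab_by_speaker_and_debate(sample, ["Wallace:", "Welker:", "Biden:", "Trump:"])
for key, n in unique_counts(vocab).items():
    print(f"{key}={n}")
```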

Conclusion

• This should give you a flavour of what you can do with the LFM on this module and in the future.

• Over the years students have used it to explore many different texts, datasets and corpora that they have been given or have built for themselves.

• Building on the foundations of what is built into the LFM, you can explore in your analysis whatever selection of texts takes your fancy, learning more about how language works and how it is used.

• The decisions about how you interpret the output are down to you.

• We are the textual analysts; the computer just gives us data that we can use to try to understand, interpret, and make our points.

References

• Transcripts of the Boris Johnson broadcast and the 2020 US election debates were obtained from https://www.rev.com/blog/transcripts