corpus and semantics final

MARIA CAROLINA FILIPE RODRIGUES FABIO RONDINELLI ERICO CAETANO Corpus linguistics and semantics studies

Upload: filipe-santos

Post on 04-Jul-2015


Page 1: Corpus and semantics final

MARIA CAROLINA

FILIPE RODRIGUES

FABIO RONDINELLI

ERICO CAETANO

Corpus linguistics and semantics studies

Page 2: Corpus and semantics final

What is corpus linguistics?

• Collection and analysis of a specific set of data

• Corpus characteristics

• Technology and corpora

• Access to language in proper use

• Quantification

• Easier access to the material

Page 3: Corpus and semantics final

Best-known corpora

• Brown Corpus

• British National Corpus

• Oxford English Corpus

• International Corpus of English

• Example: http://corpus.byu.edu/coca/

Page 4: Corpus and semantics final

Corpus linguistics in semantic prosody

“Prosody” in the term “semantic prosody” is borrowed from Firth (1957), who used it to refer to phonological colouring which spreads beyond semantic boundaries. To give an example, the word animal has so strong a nasal prosody that the vowel sound of the letter a is endowed with a nasal quality through assimilation, simply because a is closely adjacent to the nasal sound of n. In the same way, lexical items share this particular phenomenon of “prosody” in lexical patterning. Enlightened by Firthian sense of a “prosody”, Bill Louw coins the term “semantic prosody” and endows it with its first definition, a “consistent aura of meaning with which a form is imbued by its collocates” (Louw, 1993: 157).

Page 5: Corpus and semantics final

Louw illustrates SP with several examples, such as the adverb utterly, the phrase bent on and the expression symptomatic of, all of which carry negative SP. These three items are followed by expressions which refer to undesirable things, such as destroying, ruining, clinical depression, multitude of sins, etc.
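One way to look for a prosody of this kind is to collect the words that follow a node item in a corpus and inspect them. The sketch below only illustrates that idea: the function name `right_collocates`, the window size and the three-sentence toy corpus are assumptions made for this example, not data or procedures from Louw (1993).

```python
import re
from collections import Counter

def right_collocates(text, node, window=3):
    """Count the words that occur up to `window` tokens after `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    node_tokens = node.lower().split()
    n = len(node_tokens)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        # Compare a slice of the text with the (possibly multi-word) node.
        if tokens[i:i + n] == node_tokens:
            counts.update(tokens[i + n:i + n + window])
    return counts

# Hypothetical toy corpus, invented for illustration only.
toy_corpus = (
    "He seemed bent on destroying the evidence. "
    "The mob was bent on ruining the party. "
    "She was bent on destroying her rival."
)

collocates = right_collocates(toy_corpus, "bent on", window=1)
print(collocates.most_common())
```

With a real corpus, the analyst would still have to judge by hand whether the most frequent collocates are predominantly negative, positive or neutral.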

Page 6: Corpus and semantics final

Semantics vs. Pragmatics

Semantic meaning and pragmatic meaning are the two extremes in the meaning system: semantic meaning can be seen as the meaning which arises only from linguistic factors in a piece of communication, while pragmatic meaning is the meaning imposed by the non-linguistic elements which have an impact on communication.

Page 7: Corpus and semantics final

Studies on Corpus Linguistics and Semantics

- Chishman and Teixeira (2009) provide us with an interesting study on nominal compounds based on Corpus Linguistics.

- Data from 10 digital issues of National Geographic were analyzed with software.

- It recalls a common difficulty Brazilian students of English may face: when trying to say “bolo de maçã”, for instance, they may try “cake of apple” or even “apple's cake” before getting to “apple cake”, the correct nominal compound.

Page 8: Corpus and semantics final

- Identification and categorization of recurrent semantic relations between the elements of nominal compounds.

Examples: in memory drugs we find a relation of telicity, for those drugs aim at serving memory purposes. In school play, there is a relation of localization, while in rice bag the relation is one of meronymy, for one element contains the other.

Page 9: Corpus and semantics final

Such an analysis could inspire us to observe the relations within compound nouns and even suggest that Brazilian students of English take a deeper look at them. For instance, what kind of relation would students find in the following compounds? How could they explain it in their own words?

- car accident - fruit bat - skin cancer

- island culture - lemon tree - cameraman

- metal armor - ethanol production

Page 10: Corpus and semantics final

A Corpus-Driven Approach to Genre Analysis

- The paper shows that an exhaustive corpus-driven approach, mixed with statistics, is the most effective analytical method for comparing texts across genres.

- By using the resources above, the author examines the characteristics of each genre, looking at words and phrasal behavior.

- According to the author, such an approach can contribute much to the pragmatic analysis of written texts.

Page 11: Corpus and semantics final

Genres

Prior conceptions of genre considered external criteria (Biber 1988, 1993).

With the new approach, genre can be based on internal criteria.

Instead of using a priori listings, genre can emerge through quantitative research in linguistics.

Biber (1988) and the multianalytical approach: if some linguistic features are frequent in a text, other features will appear less frequently.

Page 12: Corpus and semantics final

Corpus compilation

The general reference includes academic texts, newspapers and literature from 6 pre-existing corpora.

The sizes of the resulting genre corpora are as follows:

- academic corpus (MicroConcord B + text category J of the 4 corpora): 1,662,106 running words

- newspaper corpus (MicroConcord A + text category A, B, C texts of the 4 corpora): 1,760,664 running words

- literature corpus (text category K-R texts from the 4 corpora): 1,019,254 running words

The size of the general reference corpus derived from mixing the 4 corpora (hereafter referred to as the 'GR' corpus) was 4,071,830 running words.

Page 13: Corpus and semantics final

Vocabulary variety and difficulty

The ranked order is 1. newspaper, 2. literature and 3. academic. Both the S-TTR and Guiraud values therefore suggest that newspaper English uses the most varied vocabulary, literary English an intermediate one, and academic English the least varied, if estimators of lexical density are used.
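For readers unfamiliar with the two measures, here is a minimal sketch of how they are commonly computed. It reads S-TTR as a standardized type-token ratio (the mean TTR over consecutive chunks of equal size) and Guiraud's index as types divided by the square root of tokens. The chunk size, the tokenizer and the function names are assumptions; the slides do not give the settings used in the study.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def guiraud(tokens):
    """Guiraud's index: number of types / square root of number of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))

def standardized_ttr(tokens, chunk_size=1000):
    """Mean type-token ratio over consecutive full chunks of `chunk_size` tokens."""
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    return sum(len(set(c)) / len(c) for c in chunks) / len(chunks)

# Toy sentence invented for illustration; a real study would use whole corpora.
sample_tokens = tokenize("The cat sat on the mat and the dog sat on the log")
print(guiraud(sample_tokens))
print(standardized_ttr(sample_tokens, chunk_size=4))
```

Averaging over fixed-size chunks keeps the ratio comparable across corpora of different sizes, since a plain TTR falls as a text gets longer.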

From a solely empirical perspective, a larger share of longer words is taken to mean that a text has more difficult words. By this measure the ranked order is 1. academic, 2. newspaper and 3. literature.
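A rough sketch of such a word-length measure follows. The threshold of seven letters and the name `long_word_ratio` are assumptions chosen for illustration; the slides do not state how "longer words" were defined in the study.

```python
import re

def long_word_ratio(text, min_length=7):
    """Share of word tokens that have at least `min_length` letters."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    return sum(len(t) >= min_length for t in tokens) / len(tokens)

# Toy sentences invented for illustration only.
academic_like = "The methodology demonstrates significant correlations"
everyday_like = "The cat sat on the mat"

print(long_word_ratio(academic_like))
print(long_word_ratio(everyday_like))
```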

Page 14: Corpus and semantics final

N-gram analysis

This analysis was done by comparing multi-word units between genre corpora, in particular 4-word units occurring in each genre corpus. Coniam (2004) used KfNgram (Fletcher 2002) to compute 4-word units occurring in specific genre texts taken from applied linguistics articles.

N-grams are able to identify the commonest collocations in a discourse far more effectively than single-word analysis. There is an overall tendency toward using multi-word fixed units in academic texts as opposed to other genres.
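The idea of counting 4-word units can be sketched in a few lines. This is not how KfNgram is implemented; it is a simplified illustration with an assumed tokenizer that ignores sentence boundaries, and the function name and toy text are invented for the example.

```python
import re
from collections import Counter

def ngram_counts(text, n=4):
    """Count every run of `n` consecutive word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy text invented for illustration only.
toy_text = (
    "On the other hand, the results differ. "
    "On the other hand, the sample is small."
)

counts = ngram_counts(toy_text, n=4)
for ngram, freq in counts.most_common(2):
    print(" ".join(ngram), freq)
```

Comparing the most frequent units of each genre corpus then shows which fixed expressions are typical of one genre and rare in the others.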

Page 15: Corpus and semantics final

Personality in texts: I, we and passives

Kuo researched the use of the personal pronoun in academic texts from an empirical viewpoint. The use of the personal pronoun provides an environment that creates an interpersonal interaction between the writer and the readers (Kuo 1999:123).

Literature overuses “I”, while academic texts and newspapers underuse it. Academic texts and literature use “we” more often than newspapers.

The passive voice is used much more in academic texts.
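Over- and underuse are usually judged by comparing normalized frequencies between corpora. The sketch below computes a frequency per 1,000 words for a given pronoun; the function name, the tokenizer and the toy text are assumptions, and a real comparison would add a significance test against the reference corpus.

```python
import re

def per_thousand(text, word):
    """Frequency of `word` per 1,000 word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    # Contractions such as "I'm" stay single tokens and are not counted here.
    return 1000 * tokens.count(word.lower()) / len(tokens)

# Toy text invented for illustration only.
toy_text = "I think we agree. I know I am right."

print(per_thousand(toy_text, "I"))
print(per_thousand(toy_text, "we"))
```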

Page 16: Corpus and semantics final

Nominalization

Biber et al. (1998:58) suggest that, “studying a morphological characteristic in a corpus can teach us both about the frequency and distribution of the characteristic and about the differing functions of particular variants”.

Nominalization creates forms ending in -tion, -sion, -ness, -ment and -ity, including plural forms.
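A simple way to approximate such counts is to match word endings, as in the sketch below. It is an assumption-laden illustration: pure suffix matching also catches words that are not nominalizations (for example city or moment), so results from a real corpus would need manual checking. The names `SUFFIXES` and `nominalization_counts` are invented for this example.

```python
import re
from collections import Counter

# Each pattern covers the singular and the plural form of one suffix.
SUFFIXES = {
    "-tion": re.compile(r".+tions?"),
    "-sion": re.compile(r".+sions?"),
    "-ness": re.compile(r".+ness(es)?"),
    "-ment": re.compile(r".+ments?"),
    "-ity": re.compile(r".+(ity|ities)"),
}

def nominalization_counts(text):
    """Count tokens whose ending matches one of the listed suffixes."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for label, pattern in SUFFIXES.items():
            if pattern.fullmatch(token):
                counts[label] += 1
                break  # a token is counted under one suffix only
    return counts

# Toy sentence invented for illustration only.
toy_text = ("The evaluation of abilities and the development "
            "of awareness require decisions.")

suffix_counts = nominalization_counts(toy_text)
print(suffix_counts)
```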

Page 17: Corpus and semantics final

Academic texts show nominalization at a higher rate than other genres; they tend to use nominalizations ending in -ity and -ment and, at a much lower frequency, -ness.

Newspapers show a use of nominalization similar to that of academic texts, but the -ment form is predominant.

Literary works use these three nominalizations almost equally, and the -ness form is the most salient.

Page 18: Corpus and semantics final

References

ZHANG, Changu. An Overview of Corpus-based Studies of Semantic Prosody. Asian Social Science, vol. 6, June 2010.

CHISHMAN, Rove; TEIXEIRA, Lilian F. A semântica dos compostos nominais em língua inglesa: um estudo de corpus. Veredas on-line – Linguística de Corpus e Computacional, 2/2009, p. 84-99.

NISHINA, Yasunori. A Corpus-Driven Approach to Genre Analysis: The Reinvestigation of Academic, Newspaper and Literary Texts. ELR Journal, 1 (2), 2007.