Syntax: If words “are more like humans than machines”- Let’s party!


Page 1

Syntax: If words “are more like humans than machines” - Let’s party!

Page 2

What is syntax?

• Syntax is, as you know, the process that governs the way in which words are combined.

• But to understand it, we need to start by understanding functions.

Page 3

The nature of computation

• Syntax is a form of computation

• Computation is essentially a mapping: a → b

• In the simplest ‘computers’ (finite automata), the mappings are deterministic, from state to state:
– State a → State b

• In more complex machines, we get non-deterministic mappings depending on context (memory): the same state may map onto any one of several states due to memory, and that memory may be under the control of the machine itself

• Alan Turing (1936): Very simple machines can compute anything computable
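
A minimal sketch of the deterministic case in Python (the state names and transition table below are invented for illustration):

    # A finite automaton as a plain transition table: each (state, input)
    # pair maps deterministically onto exactly one next state.
    transitions = {
        ("a", 0): "b", ("a", 1): "a",
        ("b", 0): "a", ("b", 1): "b",
    }

    def run(start, inputs):
        state = start
        for symbol in inputs:
            state = transitions[(state, symbol)]  # State a -> State b
        return state

    print(run("a", [0, 1, 0]))  # -> 'a'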

Page 4

Functions

• A function is just a mapping from a specific input to a specific output
– The input and the output don’t have to be numbers

– NameProf(x) takes in the number of a course and maps it onto the name of the person teaching that course

• So: NameProf(357) → Westbury

– RazeTheHouse(x), which takes a house as input, and returns that house destroyed as output
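
As a quick Python illustration that inputs and outputs need not be numbers (the lookup table is hypothetical apart from the slide’s 357 → Westbury example):

    # NameProf: maps a course number onto the name of its instructor.
    course_instructors = {357: "Westbury"}  # only the slide's example is real

    def name_prof(course_number):
        return course_instructors[course_number]

    print(name_prof(357))  # -> Westbury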

Page 5

Primitive Functions

• A primitive is a lowest-level function: one that can’t be defined in terms of any other

• Let’s consider an old favorite: ‘+’

• If we want to define a non-primitive function ‘AddOne’, we can:
AddOne(x) = x + 1

• We haven’t added new functionality: we’ve just re-named what we had in a way that is convenient

Page 6

Functions of Functions

• Functions can call other functions, including themselves (recursion)

• Let’s define a function AddTwo, which adds two

• We already have AddOne

• We can just define AddTwo as:
AddTwo(x) = AddOne(AddOne(x)) = AddOne(x + 1) = (x + 1) + 1

• We haven’t added new functionality: we’ve just named something in a way that is convenient
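
A minimal runnable version of these definitions in Python, composing AddTwo out of AddOne just as the slide does:

    def add_one(x):
        return x + 1  # defined in terms of the primitive '+'

    def add_two(x):
        return add_one(add_one(x))  # a function defined by calling another

    print(add_two(5))  # -> 7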

Page 7

Functions of Functions of Functions

• Let’s say we want to define a function AddThree, which adds three

• We already have AddTwo and AddOne

• We can just define AddThree as:
AddThree(x) = AddTwo(AddOne(x))

• At some point we may get tired of this game: we are wasting time and energy trying to name all these silly little functions, AddOne, AddTwo, AddThree…. Will it never end?

Page 8

Generalizing functions

• A more general solution would add ANY number to any input

• But we already know how to do that, since we have addition as a primitive:
AddN(x, n) = x + n

• Notice the difference we had to introduce: we had to add a second input, or parameter

• Why? Because the way AddOne was defined had a constant in it

• We just said “let that constant be a variable” - and so we got a much more powerful function that eliminated the need for thousands of other more specialized functions: all the AddOne, AddSixteen, AddSeventy, etc. (see the sketch below)
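
The same move as running Python: one parameterized function replaces the whole family of specialized ones (the function and example values follow the slides):

    def add_n(x, n):
        return x + n  # the constant has become a variable (parameter)

    # Every specialized function is now just add_n with a fixed parameter:
    print(add_n(5, 1))    # AddOne(5)          -> 6
    print(add_n(5, 16))   # AddSixteen(5)      -> 21
    print(add_n(5, 300))  # AddThreeHundred(5) -> 305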

Page 9

The magic of parameters

• By adding one variable we got rid of an infinite number of functions, collapsing them all into a single function with two arguments

• What we noticed, in essence, is that all cases of addition were similar: they could all be computed in the same way we were computing our primitive, +

• Parameterization can be traded off against computation

Page 10

Hey, what about language?

• This is the kind of functional collapse that Chomsky wants to do

• He wants to show that many things that appear to be different are minor variations of the same function, just in the same way that AddOne and AddThreeHundred are minor variations of the same function

• He wants to do it in the same kind of way we did: by saying, look, you have N functions here that are really just 1 function, plus an extra parameter

Page 11

How little can we get by with?

• The question becomes: What is the simplest representation of the computation that is sentence-making?

• This breaks down into the related questions:
– What are the most primitive functions?
– What are their parameters?

• If we can identify a few primitive universal functions and some universal parameters, we may find deep underlying similarities between languages that appear on the surface to be different (as multiplication might appear different from addition at first sight)

Page 12

What syntax is not

• One possible way syntax might work would be Markov chaining, i.e. probabilistic word chaining:

– Calculate the likelihood that one word follows another (its transition probability), and then select only from those words that actually have a probability > 0 of following the current word

– A frequentist approach (sketched below)
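
A minimal sketch of such a word-chaining device in Python; the toy corpus is invented for illustration:

    import random
    from collections import defaultdict

    corpus = "the dog bit me . the dog ran . the cat ran .".split()

    # Record which words were observed to follow which (bigram frequencies).
    following = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev].append(nxt)

    # Chain words: each next word is sampled only from words that actually
    # follow the current one in the corpus (transition probability > 0).
    word, sentence = "the", ["the"]
    while word != ".":
        word = random.choice(following[word])
        sentence.append(word)
    print(" ".join(sentence))  # e.g. 'the dog bit me .'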

Page 13

Two arguments against chaining

• Chomsky’s initial claim to fame was his claim to have proven that there is no possible way that word-chaining devices could account for syntax
– Not everyone is convinced, but everyone does agree that simple word-chaining devices won’t work

• Chomsky basically had two main arguments against them:
– i.) Zero probability transitions
– ii.) Relational dependencies

Page 14

i.) Zero probability transitions

• We can produce and understand transitions that have zero probability (= have never been encountered before)
– e.g. ‘colorless green’ and ‘sleep furiously’ had probably never been uttered before Chomsky wrote “Colorless green ideas sleep furiously”, but we can all agree that the sentence is grammatical; therefore grammar cannot be only transitions

– This means we can’t be chaining on words

– It also indicates the autonomy of syntax from semantics
• We can judge the grammaticality of sentences independently of their meaning

Page 15

ii.) Relational dependencies

• Some sentences contain relational dependencies of a kind that simply cannot be captured by transition probabilities

• For example, consider: "If I show you this sentence, then you will understand the problem”
– There is a long-distance dependency from 'if' to 'then' that can (provably) not be captured by a particular kind of transition-calculating device called a finite state machine

– In ordinary language, we can say that the problem is simply that transition devices don't have a memory, so they can't 'force' a later transition to match an earlier one

– An aside: there are ways to make transition devices deal with these problems, but they require all sorts of very clunky machinery (hugely redundant encoding) that seems very implausible

Page 16

ii.) Relational dependencies

• The problem gets even more complicated because we can embed long-distance dependencies
– Consider: "If either I show you this sentence or I explain the problem clearly, then you will understand what Chomsky's point was.”

– Now we have a sentence we can all understand, but we have a second dependency: the 'if' has to first close up the 'either' clause while also remembering that it still needs a 'then'

• There is not necessarily a simple lexical marker: I can also say "If I show you this sentence or I explain the problem clearly, you will understand what Chomsky's point was." - now there is no 'either' or 'then' to trigger the memory
– Listen to language and you'll see the point: such long-distance dependencies are not at all rare, but occur in many sentences and from a very early age

Page 17

ii.) Relational dependencies

• There is a well-known grammatically correct sentence that ends with 5 prepositions closing 4 embeddings, said by a young child to his father:

"Daddy, what did you bring that book that I don't want to be read to out of up for?"

• By the time he gets to "read", the child has to remember the following dependencies:
– i.) 'to be read' requires 'to’
– ii.) 'that book that' requires 'out of’
– iii.) 'bring' requires 'up’
– iv.) 'what' requires 'for’

And he does….!

Page 18

Sentences aren’t beads on a string

• Chomsky's solution was one that many take for granted now: it was to suggest that sentences are not flat lists of words, but have a tree structure, and that it is not the individual words but parts of the tree that are the units of language
– i.e. syntactic constraints apply not at the single-word level but at the role level, where a role may be played by a multiword string or a single word

– Each element that can fill a role is called a constituent

Page 19

A constituent

• An example is an NP (noun phrase), which is defined in Chomsky's original tree notation as
NP → (det) A* N

• This just means that it contains one optional determiner (like 'a', 'the', 'some', 'many'), plus any number of adjectives (including 0), plus a noun

• 'dog' is a noun phrase

• So is 'A big hairy rabid frightening nasty dog'
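
A minimal sketch of the NP → (det) A* N rule as a checker in Python, using a regular expression over part-of-speech tags (the tiny word-to-tag lexicon is invented for illustration):

    import re

    # A toy lexicon mapping words onto part-of-speech tags.
    lexicon = {"a": "D", "the": "D", "big": "A", "hairy": "A", "rabid": "A",
               "frightening": "A", "nasty": "A", "dog": "N"}

    # NP -> (det) A* N: an optional determiner, any number of adjectives
    # (including zero), then exactly one noun.
    np_rule = re.compile(r"^D?A*N$")

    def is_np(phrase):
        tags = "".join(lexicon[w] for w in phrase.lower().split())
        return bool(np_rule.match(tags))

    print(is_np("dog"))                                      # -> True
    print(is_np("A big hairy rabid frightening nasty dog"))  # -> True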

Page 20

So what?

• When our units are defined at the constituent level, instead of the word level, we can easily understand how we can re-use parts in different places, as in 'A big hairy rabid frightening dog bit me' and 'I gave the big hairy rabid frightening dog a steak’
– It also bears on the dependency problem, because we can have trees that constitute a 'memory' for the whole sentence

Page 21

So what?

• We can have functions (= rules) like:
S → Either S or S
S → If S then S

• This kind of self-referentiality, in which an object (here, a sentence) is defined in terms of itself, is recursion

• Recursion allows for very tightly defined functions, which simplify complex calculations by defining them in terms of simpler cases (see the sketch below)
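
A minimal sketch of those two rules as a recursive generator in Python (the simple clauses used as base cases are invented stand-ins):

    import random

    # Simple clauses to stop the recursion (invented examples).
    simple = ["my dog is rabid", "my brakes are faulty", "I will go mad"]

    def sentence(depth=2):
        if depth == 0 or random.random() < 0.5:
            return random.choice(simple)  # base case: a simple clause
        # Recursive case: S -> Either S or S | If S then S
        rule = random.choice(["either {} or {}", "if {} then {}"])
        return rule.format(sentence(depth - 1), sentence(depth - 1))

    print(sentence())
    # e.g. 'if either my dog is rabid or I will go mad then my brakes are faulty'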

Page 22

A classic example: Factorial

Factorial(x):
If x = 1 → 1
Otherwise → x * Factorial(x - 1)
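
The same definition as a runnable Python sketch:

    def factorial(x):
        if x == 1:
            return 1  # base case
        return x * factorial(x - 1)  # recursive case: defined in terms of itself

    print(factorial(4))  # -> 24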

Page 23

Calling each other

• With recursion in language you can also calculate a very complex output with very simple rules:
S → Either S or S
S → If S then S

• With these two rules we can get sentences like:
"If either my big hairy frightening dog is rabid or my unrepaired car brakes are faulty, then either I will be going to the scary grey hospital this afternoon or I will be going mad.”

• This seems to match our ‘mentalese’: ‘A big hairy rabid frightening dog’ is certainly a dog, and we want to be able to move our attention around from the dog to the brakes and the hospital without being ‘thrown off’ by the number of adjectives or qualifying clauses attached to those things in the sentence.

Page 24

Example

• "Tonight's program will discuss stress, exercise, and sex with Celtic forward Scott Wedman, Dr. Ruth Westheimer, and Dick Cavett."

• This can be parsed as VP → VP NP PP
– VP (verb phrase) = ‘will discuss’
– NP (noun phrase) = ‘stress, exercise, and sex’
– PP (prepositional phrase) → P NP
• P = ‘with’
• NP = ‘Celtic forward Scott Wedman, Dr. Ruth Westheimer, and Dick Cavett’

Page 25

Example

• "Tonight's program will discuss stress, exercise, and sex with Celtic forward Scott Wedman, Dr. Ruth Westheimer, and Dick Cavett."
– This can also be parsed as VP → VP NP
– VP = ‘will discuss’
– NP → N PP [‘…sex with Dick Cavett…’]
• N = ‘stress, exercise, and sex’
• PP → P NP
– P = ‘with’
– NP = ‘Celtic forward Scott Wedman, Dr. Ruth Westheimer, and Dick Cavett’
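
A sketch of the ambiguity as nested Python tuples, one per parse; the labels follow the slides, and the tuples are only meant to show the two tree shapes:

    guests = "Celtic forward Scott Wedman, Dr. Ruth Westheimer, and Dick Cavett"

    # Parse 1: VP -> VP NP PP ... the discussing is done WITH the guests.
    parse1 = ("VP", ("VP", "will discuss"),
                    ("NP", "stress, exercise, and sex"),
                    ("PP", ("P", "with"), ("NP", guests)))

    # Parse 2: VP -> VP NP, NP -> N PP ... the sex is WITH the guests.
    parse2 = ("VP", ("VP", "will discuss"),
                    ("NP", ("N", "stress, exercise, and sex"),
                           ("PP", ("P", "with"), ("NP", guests))))

    # Same words, different trees: the PP attaches to the VP in parse1,
    # but inside the NP in parse2.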

Page 26

How do we know what is what?

• Each part of speech is defined by the role it plays

– so a noun is anything that can go in the NP slot

• There are two main principles for understanding slots:

– i.) The head determines the meaning

– ii.) Slots determine what roles each element in a sentence can play

Page 27

i.) The head determines the meaning

• ‘Fox in socks’ is about a fox, not about socks

• ‘Flying to Rio before the taxman catches him’ is about flying, not about catching

• There are hard rules in every language which determine which component plays the head role

• We saw one English rule above: NP → N PP
– So, ‘sex with Dick Cavett’ is about a specific kind of sex, not about a specific attribute of Dick Cavett
– ‘with Dick Cavett’ also fills a slot, called a modifier

Page 28

ii.) The choreographing of roles

• Slots determine what roles each element in a sentence can play
– "Ruth Westheimer discussed sex with Dick Cavett" choreographs three things: the discusser (Ruth), the object (sex), and the recipient (Cavett)

• Each one of these roles is called an argument, to make clear that they are being fed into a function; that function is determined by the tree structure
– Every end-point (branch) of the tree has to be filled, so the number of branches = the number of arguments (see the sketch below)
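
A sketch of the roles-as-arguments idea in Python; the function name and role labels are invented for illustration, following the slide's example:

    # The verb's tree determines a function with one slot per argument.
    def discussed(discusser, topic, recipient):
        return f"{discusser} discussed {topic} with {recipient}"

    # Three roles, three arguments, one per end-point of the tree:
    print(discussed("Ruth Westheimer", "sex", "Dick Cavett"))
    # -> 'Ruth Westheimer discussed sex with Dick Cavett'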

Page 29

So what?

• When we start to think of things in terms of trees with arguments, we can start to see some deep regularities in language

• For example, NP and VP turn out to be very similar in their abstract structure…

• Tune in next time…