LING 581: Advanced Computational Linguistics Lecture Notes January 19th

Post on 21-Dec-2015



Administrivia

• New room
– Shantz 338
– (I have asked Jennifer Columbus to investigate refund: however, I’m told it may not happen)

Marshall 480

Shantz 338

Penn Treebank

• Availability
– Source:
• Linguistic Data Consortium (LDC)
• U. of Arizona is a (fee-paying) member of this consortium
• Resources are made available to the community through the main library
• URL
– http://sabio.library.arizona.edu/search/X

Penn Treebank (V3)

• Call Record

Penn Treebank

1. Tagging Guide
2. Arpa94 paper
3. Parse Guide

Penn Treebank

sections 00-24

tregex

• Tregex is a Tgrep2-style utility for matching patterns in trees.

• Written in Java
• Run via the run-tregex-gui.command shell script
• -mx flag: the 300m default memory size will need to be increased depending on the platform

tregex

• Select the PTB directory
– TREEBANK_3/parsed/mrg/wsj/
• Browse
• Deselect any unwanted files
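The same file selection can be scripted outside the GUI. The following is a minimal sketch, assuming the usual layout in which each WSJ section is a two-digit subdirectory (00 to 24) holding .mrg files; the function name select_mrg_files is hypothetical and the root path has to be adjusted to wherever the corpus is installed.

```python
from pathlib import Path


def select_mrg_files(wsj_root, sections):
    """Return the sorted .mrg files found in the requested section directories.

    wsj_root: path to the wsj directory (e.g. TREEBANK_3/parsed/mrg/wsj)
    sections: iterable of section numbers, e.g. range(0, 25) for 00-24
    """
    root = Path(wsj_root)
    # Section directories are assumed to be named with two digits: 00, 01, ...
    wanted = {f"{s:02d}" for s in sections}
    return sorted(p for p in root.glob("*/*.mrg") if p.parent.name in wanted)


if __name__ == "__main__":
    # Assumed location; prints how many files were found in sections 00-24.
    files = select_mrg_files("TREEBANK_3/parsed/mrg/wsj", range(0, 25))
    print(len(files), "files selected")
```

Leaving a section number out of the second argument corresponds to deselecting that directory in the file browser.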

tregex

• Search

tregex

Help

tregex

• Pattern:
– (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)
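Read operator by operator, the pattern asks for an NP whose first child (<,) is an NP, immediately followed ($+) by a comma node, then an NP, then a second comma that is also the last child (<-) of the outer NP; the name =comma ties the two mentions to the same node. The Python sketch below hand-codes that one reading to make the operators concrete. It is not Tregex: the helper names are hypothetical, the @ basic-category stripping is a rough approximation, every bracket is assumed to carry a label, and the example tree is an invented toy, not a Treebank sentence.

```python
import re


def parse(s):
    """Parse one bracketed tree into (label, children); leaves are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def basic(label):
    """Rough stand-in for Tregex's @: NP-SBJ-1 -> NP (an approximation)."""
    return label if label.startswith("-") else re.split(r"[-=]", label)[0]


def is_np(n):
    return isinstance(n, tuple) and basic(n[0]) == "NP"


def is_comma(n):
    # Emulates /,/: the label contains a comma somewhere.
    return isinstance(n, tuple) and "," in n[0]


def appositive_nps(tree):
    """Collect NPs whose children are exactly: NP , NP , (second comma last)."""
    hits = []

    def walk(n):
        if isinstance(n, str):
            return
        label, kids = n
        if (basic(label) == "NP" and len(kids) == 4
                and is_np(kids[0]) and is_comma(kids[1])
                and is_np(kids[2]) and is_comma(kids[3])):
            hits.append(n)
        for k in kids:
            walk(k)

    walk(tree)
    return hits


# Invented toy tree in Treebank-style bracketing.
toy = parse("(S (NP-SBJ (NP (NNP Mr.) (NNP Smith)) (, ,) "
            "(NP (DT the) (NN chairman)) (, ,)) (VP (VBD resigned)))")
print([n[0] for n in appositive_nps(toy)])
```

Because each $+ demands an immediately adjacent sister and the final comma must be the last child, the outer NP ends up with exactly four children, which is why the sketch can test the child list directly.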

tregex

• Different results from:
– @SBAR < /^WH.*-([0-9]+)$/#1%index << (@NP < (/^-NONE-/ < /^\*T\*-([0-9]+)$/#1%index))
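In this pattern, #1%index stores the first regex group (the numeric index on the WH label) in a variable called index, and the second #1%index requires the index on the *T* trace to be the same string. So it finds an SBAR with a WH child that is coindexed with a trace sitting under an NP somewhere below the SBAR. The Python sketch below emulates just that coindexation check. It is not Tregex: the helper names are hypothetical, the @ stripping is approximate, every bracket is assumed to carry a label, and the tree is an invented toy.

```python
import re

WH = re.compile(r"^WH.*-([0-9]+)$")        # e.g. WHNP-1, WHADVP-2
TRACE = re.compile(r"^\*T\*-([0-9]+)$")    # e.g. *T*-1


def parse(s):
    """Parse one bracketed tree into (label, children); leaves are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def basic(label):
    """Rough stand-in for Tregex's @: NP-SBJ-1 -> NP (an approximation)."""
    return label if label.startswith("-") else re.split(r"[-=]", label)[0]


def nodes(n):
    """Yield n and every labeled node below it."""
    if isinstance(n, tuple):
        yield n
        for k in n[1]:
            yield from nodes(k)


def np_trace_indices(sbar):
    """Indices of *T* traces found as NP < -NONE- < *T*-i anywhere below sbar."""
    found = set()
    for n in nodes(sbar):
        if n is sbar or basic(n[0]) != "NP":
            continue
        for child in n[1]:
            if isinstance(child, tuple) and child[0].startswith("-NONE-"):
                for leaf in child[1]:
                    m = TRACE.match(leaf) if isinstance(leaf, str) else None
                    if m:
                        found.add(m.group(1))
    return found


def wh_trace_sbars(tree):
    """Collect SBARs whose WH child shares an index with an NP trace below."""
    hits = []
    for n in nodes(tree):
        if basic(n[0]) != "SBAR":
            continue
        wh_idx = set()
        for child in n[1]:
            m = WH.match(child[0]) if isinstance(child, tuple) else None
            if m:
                wh_idx.add(m.group(1))
        if wh_idx & np_trace_indices(n):
            hits.append(n)
    return hits


# Invented toy relative clause in Treebank-style bracketing.
toy = parse("(NP (NP (DT the) (NN man)) (SBAR (WHNP-1 (WP who)) "
            "(S (NP-SBJ (-NONE- *T*-1)) (VP (VBD left)))))")
print(len(wh_trace_sbars(toy)))
```

Since the WH regex accepts any label beginning with WH, a WHADVP-n child would pass the first test just as WHNP-n does; whether the SBAR is then reported depends on a matching trace under an NP.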

tregex

Example: WHADVP also possible (not just WHNP)

Ungraded Homework Exercise

• Search for NP trace relative clauses as defined below:

Be ready to compare search pattern and number found next time in class