using selectors for nouns, verbs and adjectives

Using Selectors for nouns, verbs and adjectives as features to estimate Sentence Similarity

Andres Vargas

Computer Understanding of Natural Language

Proposed Topic

• Would using selectors for other parts of speech increase the power of the selector similarity metric?

Selectors

• Selectors are words that take the place of an instance of a target word within its local context.

Hansen A. Schwartz and Fernando Gomez. 2008. Acquiring knowledge from the web to be used as selectors for noun sense disambiguation. In Proceedings of the Twelfth Conference on Computational Natural Language Learning (CoNLL '08). Association for Computational Linguistics, Stroudsburg, PA, USA, 105-112.

“The kid plays in the garden”
“The boy plays in the garden”

minor, child, schoolboy

Selectors

• Similarity measures
• Information Content
• The similarity between two concepts is the extent to which they share information in common.

The kid plays in the garden
minor, kindergarten, child, baby, schoolboy

The boy plays in the garden
schoolboy, teenager, minor, juvenile, child

Purpose

• Extend the work of Jha et al. by improving the performance of selectors for sentence similarity.

Sneha Jha, H. Andrew Schwartz, and Lyle H. Ungar. 2012. Penn: using word similarities to better estimate sentence similarity. In Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval '12). Association for Computational Linguistics, Stroudsburg, PA, USA, 679-683.

Method

• Identify the focus words in each sentence by POS tag.
• Get n-grams from the focus words.
• Get selectors from each n-gram for its focus word.
• Find the similarity between vectors of selectors using cosine.
  • Cosine value close to 1: similar
  • Cosine value close to 0: not similar
• Store the cosine value in a semantic matrix.
• Build the sentence vector by mapping the max value between words in the sentence.
• Calculate the similarity between the sentence vectors using cosine.
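The cosine steps of the method can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: it assumes each selector list is a bag of words with a count of 1 per selector (the slides give no counts), and `sentence_vector` reflects only one possible reading of "mapping the max value", namely keeping the best match in each row of the semantic matrix. The function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty selector vector shares nothing
    return dot / (norm_a * norm_b)


def semantic_matrix(selectors_a, selectors_b):
    """matrix[i][j] = cosine between focus word i of A and focus word j of B."""
    return [[cosine(va, vb) for vb in selectors_b] for va in selectors_a]


def sentence_vector(matrix):
    """Assumed reading of 'max value': best match for each row of the matrix."""
    return [max(row) if row else 0.0 for row in matrix]


# Selector lists for "kid" and "boy" from the example sentences,
# each selector counted once (an assumption).
kid = Counter(["minor", "kindergarten", "child", "baby", "schoolboy"])
boy = Counter(["schoolboy", "teenager", "minor", "juvenile", "child"])
kid_boy = cosine(kid, boy)  # 3 shared selectors out of 5 each, about 0.6
```

With unit counts, the two lists share three of their five selectors, so the cosine is about 0.6, which places "kid" and "boy" closer to "similar" than to "not similar".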

Acquire selectors

• Take the n-gram that contains the focus word.
• Perform a LIKE query on the Hive table, replacing the focus word with %.
• Get all the sentences that match the LIKE pattern and extract the 10 most frequent words.
• These words are the selectors used as input.
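The acquisition steps above can be sketched in Python. This is only an illustration under stated assumptions: an in-memory list of `(ngram_text, match_count)` pairs stands in for the Hive table, the rows are made up, candidate words are weighted by `match_count`, and only single-word fillers of the wildcard slot are kept. All function names are hypothetical.

```python
import re
from collections import Counter


def like_pattern(ngram_tokens, focus_index):
    """Replace the focus word with the SQL LIKE wildcard %."""
    tokens = list(ngram_tokens)
    tokens[focus_index] = "%"
    return " ".join(tokens)


def like_to_regex(pattern):
    """Translate a LIKE pattern into a regex that captures what % matched."""
    parts = [re.escape(p) for p in pattern.split("%")]
    return re.compile("^" + "(.*)".join(parts) + "$")


def top_selectors(rows, pattern, focus_word, k=10):
    """Return the k most frequent words that fill the wildcard slot.

    rows: iterable of (ngram_text, match_count) pairs, a stand-in
    for the rows a Hive LIKE query would return (assumption).
    """
    regex = like_to_regex(pattern)
    counts = Counter()
    for text, match_count in rows:
        m = regex.match(text)
        if not m:
            continue
        filler = m.group(1)
        # Keep single-word fillers only, and skip the focus word itself.
        if filler and " " not in filler and filler != focus_word:
            counts[filler] += match_count
    return [word for word, _ in counts.most_common(k)]


# Made-up rows for illustration only.
rows = [
    ("The boy plays in the", 40),
    ("The child plays in the", 25),
    ("The kid plays in the", 30),
    ("The dog runs in the", 50),
    ("The child plays in the", 5),
]
pattern = like_pattern("The kid plays in the".split(), 1)
selectors = top_selectors(rows, pattern, "kid")
```

Here `pattern` is `The % plays in the`, and `selectors` holds the words that occupied the slot of "kid", ordered by frequency.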

Components

• 2 EC2 instances
• Amazon Simple Storage Service (S3) bucket
• Apache Hadoop and Hive
• Python
• NLTK

Dataset

• Google N-grams
• 6% of total books
• Version 2 is POS-tagged.
• Version 2 is ordered alphabetically.

2009 format: ngram TAB year TAB match_count TAB page_count TAB volume_count NEWLINE

2012 format: ngram TAB year TAB match_count TAB volume_count NEWLINE
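As a minimal sketch, the two tab-separated layouts above could be read in Python as shown below. The record type and function name are hypothetical, and the sample lines are made up for illustration; the layout is told apart by the number of fields.

```python
from typing import NamedTuple, Optional


class NgramRecord(NamedTuple):
    ngram: str
    year: int
    match_count: int
    page_count: Optional[int]  # present only in the 2009 layout
    volume_count: int


def parse_ngram_line(line):
    """Parse one line in either the 2009 (5 fields) or 2012 (4 fields) layout."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 5:  # ngram, year, match_count, page_count, volume_count
        ngram, year, matches, pages, volumes = fields
        return NgramRecord(ngram, int(year), int(matches), int(pages), int(volumes))
    if len(fields) == 4:  # ngram, year, match_count, volume_count
        ngram, year, matches, volumes = fields
        return NgramRecord(ngram, int(year), int(matches), None, int(volumes))
    raise ValueError(f"unexpected number of fields: {len(fields)}")


# Made-up sample lines, one per layout.
rec_2012 = parse_ngram_line("The kid plays in the\t2001\t12\t9\n")
rec_2009 = parse_ngram_line("The kid plays in the\t2001\t12\t11\t9\n")
```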

Dataset

• Microsoft Research Paraphrase Corpus
• 5,801 pairs of sentences
• Human annotations
• For this work the test file was used (1664 sentences).

Experiment

• Host
  • 7 GiB of memory
  • 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
  • 1690 GB of instance storage
  • 64-bit platform
  • I/O Performance: High
  • EBS-Optimized Available: 1000 Mbps
• Map the 5-gram file to a Hive table.
• Clean the Hive table by removing special characters.

Results

Conclusions and Future Work

• For labs and personal work, Amazon Web Services is the best solution for processing big data.
• The very high recall ensures that the algorithm returns most of the relevant results.
• As future work, combining selectors with Named Entity Recognition algorithms could increase the precision.
• The use and comparison of different semantic metrics is left for a future study.

References

• Hansen A. Schwartz and Fernando Gomez. 2008. Acquiring knowledge from the web to be used as selectors for noun sense disambiguation. In Proceedings of the Twelfth Conference on Computational Natural Language Learning (CoNLL '08). Association for Computational Linguistics, Stroudsburg, PA, USA, 105-112.

• Sneha Jha, H. Andrew Schwartz, and Lyle H. Ungar. 2012. Penn: using word similarities to better estimate sentence similarity. In Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval '12). Association for Computational Linguistics, Stroudsburg, PA, USA, 679-683.

• H. Andrew Schwartz, Fernando Gomez, and Lyle H. Ungar. 2012. Improving Supervised Sense Disambiguation with Web-scale selectors. In Proceedings of COLING 2012: Technical Papers, pages 2423–2440, COLING 2012, Mumbai, December 2012.
