Course grading. Project: 75%, broken into several incremental deliverables. Paper appraisal/evaluation/project tool evaluation in early May: 25%

Upload: flora-stevens

Post on 04-Jan-2016



Course grading

Project: 75%
  Broken into several incremental deliverables
Paper appraisal/evaluation/project tool evaluation in early May: 25%


Paper appraisal

Read and critically appraise the system/tool your project uses and/or a recent research paper (e.g. from the SIGIR or WWW conferences) that is relevant to your project.
  By April 24, obtain instructor confirmation on the content you will read.
  Propose what to do no later than April 17.
By May 12, turn in a slide report/web site.
  Summarize and include relevant content.
  Compare it to other work in the area.
  Discuss interesting issues/research directions that arise.
  Get ready for a short presentation.


Project

Opportunity to devote time to a substantial research project.
  Typically a substantive programming project.
  Or an in-depth analysis/comparison of methods and evaluation.
Work in teams of 1 or 2 students.
Topic can be selected according to your interests.
  Meet with me to discuss options.


Project

Due April 14: Project group and project idea
  Decision on project group
  Brief description of project area/topic
  We’ll provide initial feedback
Due April 21: Project proposal
  Should break project execution into three phases: Block 1, Block 2 and Block 3
  Each phase should have a tangible deliverable
  Block 1 delivery due May 5
  Block 2 due May 19
  Block 3 (final project report) due June 9
Week of June 7: Student project presentations


Project - breakdown

10% for initial project proposal
  Scope, timeline, cleanliness of measurements
  Writeup should state the problem being solved, related prior work, the approach you propose and what you will measure.
10% each for deliveries of Blocks 1 and 2
30% for final delivery of Block 3
  Must turn in a writeup
  Components measured will be overall scope, writeup, code quality, fit/finish.
  Writeup should be ~8 pages


Project Presentations

Project presentations in class (about 10 mins per group):
  Great opportunities to get feedback.
  April 23/26: Students present project plans
  Week of June 7: Final project presentations


What is next?

Project examples
Examples of tools
  WordNet
  Google API
  Amazon Web Services / Alexa
  Lucene
  Stanford WebBase


Project examples

Leveraging existing theory/data/software is encouraged, e.g.:
  Web services
  WordNet
  Algorithms and concepts from research papers
  Etc.
Most projects: compare performance of several methods, or test a new idea against some baseline


Project Ideas

Build a search engine for UCSB technical reports. Compare and improve the ranking algorithm.

Crawl pages on a particular subject and build a specialized database and ranking (e.g. Wikipedia).

Classify pages based on Wikipedia or DMOZ categories.


Lucene

http://jakarta.apache.org/lucene/docs/index.html

Easy-to-use, efficient Java library for building and querying your own text index

Could use it to build your own search engine, experiment with different strategies for determining document relevance, …
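To make the idea of "your own text index" concrete, here is a minimal sketch in plain Python. It is not Lucene's API or design: the class `TinyIndex`, its methods, and the three sample documents are all hypothetical, chosen only to show an inverted index with a swappable relevance strategy (plain term frequency versus TF-IDF).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy in-memory inverted index (illustrative only, not Lucene's design)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf
        self.doc_count += 1

    def search(self, query, use_idf=True):
        """Return doc ids ranked by summed TF (or TF-IDF) over the query terms."""
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rare terms get a larger weight when IDF is switched on.
            idf = math.log(1 + self.doc_count / len(docs)) if use_idf else 1.0
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up sample documents.
index = TinyIndex()
index.add("d1", "web search engines rank web pages")
index.add("d2", "lucene is a java library for text search")
index.add("d3", "cooking pasta at home")
results = index.search("java search")
print(results)
```

Toggling `use_idf` is one simple way to experiment with different relevance strategies on the same index; a project would compare such variants against a baseline.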


Google API

http://code.google.com/intl/en/apis/ajaxsearch
Web service for querying Google from your software
  ~1M queries per day.
  Web search, site search, news search, blog search
Note: within search requests you can use special commands like link, related, intitle, etc.
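As a small illustration of the special commands, the sketch below composes a query string and URL-encodes it. The helper `build_query` and its parameters are hypothetical; only the operator names (link, related, intitle) come from the slide, and nothing here calls the web service itself.

```python
from urllib.parse import quote_plus


def build_query(terms, intitle=None, link=None, related=None):
    """Compose a query string from plain terms plus optional operators.

    Hypothetical helper: only the operator names come from the slide.
    """
    parts = list(terms)
    if intitle:
        parts.append(f"intitle:{intitle}")
    if link:
        parts.append(f"link:{link}")
    if related:
        parts.append(f"related:{related}")
    return " ".join(parts)


query = build_query(["information", "retrieval"], intitle="syllabus")
encoded = quote_plus(query)  # URL-encoded form, usable as a parameter value
print(query)
print(encoded)
```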


WordNet

http://wordnet.princeton.edu/
Java API available (already installed)
Useful tool for semantic analysis
Represents the English lexicon as a graph
  Each node is a “synset”: a set of words with similar meanings
  Nodes are connected by various relations such as hypernym/hyponym (X is a kind of Y), troponym, pertainym, etc.
Could use for query reformulation, document classification, …
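The graph view can be sketched with a few lines of Python. The synsets and hypernym edges below are hand-made toy data, not taken from WordNet, and the function names are hypothetical; the point is only to show how walking hypernym edges could feed query reformulation.

```python
from collections import deque

# Hand-made toy data: synset id -> member words. Not real WordNet content.
SYNSETS = {
    "car.n": {"car", "automobile"},
    "truck.n": {"truck", "lorry"},
    "vehicle.n": {"vehicle"},
    "entity.n": {"entity"},
}

# Hypernym edges: each synset points to a more general one ("X is a kind of Y").
HYPERNYM = {
    "car.n": ["vehicle.n"],
    "truck.n": ["vehicle.n"],
    "vehicle.n": ["entity.n"],
}


def synsets_of(word):
    """Return the ids of all synsets that contain the word, sorted."""
    return sorted(s for s, words in SYNSETS.items() if word in words)


def hypernym_chain(synset, max_depth=2):
    """Breadth-first walk up the hypernym edges, at most max_depth levels."""
    seen, queue, out = {synset}, deque([(synset, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for parent in HYPERNYM.get(node, []):
            if parent not in seen:
                seen.add(parent)
                out.append(parent)
                queue.append((parent, depth + 1))
    return out


def expand_query(word, max_depth=1):
    """Add synonyms and words from nearby hypernym synsets to a query term."""
    expanded = {word}
    for s in synsets_of(word):
        expanded |= SYNSETS[s]
        for parent in hypernym_chain(s, max_depth):
            expanded |= SYNSETS[parent]
    return sorted(expanded)


print(expand_query("car"))
```

Keeping `max_depth` small matters in practice: walking too far up the hierarchy reaches very general terms that would likely hurt precision.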


Stanford WebBase

http://www-diglib.stanford.edu/~testbed/doc2/WebBase/

They offer various relatively small web crawls (the largest is about 100 million pages), with cached pages and link structure data

They provide code for accessing their data


Recommendation systems

Data: http://www.grouplens.org/node/12
  Ratings of 270K books from 278K users.
  Ratings of 100 jokes from 73K users.


“Natural language” search

Present an interface that invites users to type in queries in natural language
Find a means of parsing questions from important categories into full-text queries for the engine.
  What is
  Why is
  How to
Evaluate the relevancy of query answering.
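One very simple way to approach the parsing step is pattern matching on the question prefix, followed by stopword removal. The sketch below assumes that approach; the category labels, the regular expressions and the tiny stopword list are hypothetical choices, not part of the project description.

```python
import re

# Question categories from the slide, mapped to hypothetical intent labels.
PATTERNS = [
    (re.compile(r"^what\s+(is|are)\b", re.I), "definition"),
    (re.compile(r"^why\s+(is|are|do|does)\b", re.I), "reason"),
    (re.compile(r"^how\s+to\b", re.I), "procedure"),
]

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "is", "are", "do", "does"}


def parse_question(question):
    """Map a natural-language question to (category, keyword query)."""
    text = question.strip().rstrip("?").strip()
    category = "other"
    for pattern, label in PATTERNS:
        match = pattern.match(text)
        if match:
            category = label
            text = text[match.end():]  # drop the question prefix
            break
    keywords = [w for w in re.findall(r"[a-z0-9]+", text.lower())
                if w not in STOPWORDS]
    return category, " ".join(keywords)


print(parse_question("What is an inverted index?"))
print(parse_question("How to build a web crawler"))
```

The returned category could select a ranking strategy (for example, favoring definitional pages for "What is" questions), while the keyword string is passed to the full-text engine.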


Text spamming detection


Detecting index spamming

Lots of “invisible” text in the background color
  There is less of that now, as search engines check for it as a sign of spam
Questions:
  Can one use term weighting strategies to make an IR system more resistant to spam?
  Can one detect and filter pages attempting index spamming?
    E.g. a language model run over pages

[From the other direction, are there good ways to hide spam so it can’t be filtered??]
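As one possible angle on the term weighting question, the sketch below compares raw term frequency with sublinear (logarithmic) scaling on a keyword-stuffed page. Sublinear scaling is only one candidate strategy, picked here for illustration; the two page texts and all function names are made up.

```python
import math
from collections import Counter


def raw_tf(count):
    """Raw term frequency: the score grows linearly with repetition."""
    return float(count)


def log_tf(count):
    """Sublinear scaling, 1 + ln(count): repetition has diminishing returns."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def score(page_text, query_term, weight):
    """Score a page for a single query term under the given weighting."""
    counts = Counter(page_text.lower().split())
    return weight(counts[query_term])


# Made-up pages: one ordinary, one stuffed with a repeated keyword.
honest = "cheap flights to paris and other cheap deals"
stuffed = "cheap " * 200 + "flights"

raw_ratio = score(stuffed, "cheap", raw_tf) / score(honest, "cheap", raw_tf)
log_ratio = score(stuffed, "cheap", log_tf) / score(honest, "cheap", log_tf)
print(f"raw tf advantage of stuffed page: {raw_ratio:.1f}x")
print(f"log tf advantage of stuffed page: {log_ratio:.1f}x")
```

Under raw term frequency the stuffed page gains in direct proportion to its repetition, while the log-scaled weight shrinks that advantage considerably. It does not remove it, which is why the detection and filtering question remains relevant.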