
Cross-Language Retrieval

LBSC 708A/CMSC 838L

Philip Resnik and Douglas W. Oard

Session 9, November 13, 2001

Agenda

• Questions

• Overview
– The information
– The users

• Cross-Language Search

• User Interaction

The Grand Plan

• Phase 1: What makes up an IR system?
– Perspectives on the elephant

• Phase 2: Representations
– Words, ratings

• Phase 3: Beyond English text
– Ideas applied in many settings

A Driving Example

• Visual History Foundation
– Interviews with Holocaust survivors
– 39 years' worth of audio/video
– 32 languages; accented, emotional speech
– 30 people, 2 years: $12 million

• Joint project: MALACH
– VHF, IBM, JHU, UMD
– http://www.clsp.jhu.edu/research/malach

[Diagram: the process spans Information Access (search: a query matched against documents; select; examine) and Information Use, with a translingual counterpart at each step: translation, translingual search, and translingual browsing.]

A Little (Confusing) Vocabulary

• Multilingual document
– Document containing more than one language

• Multilingual collection
– Collection of documents in different languages

• Multilingual system
– Can retrieve from a multilingual collection

• Cross-language system
– Query in one language finds a document in another

• Translingual system
– Queries can find documents in any language

Who needs Cross-Language Search?

• When users can read several languages
– Eliminate multiple queries
– Query in the most fluent language

• Monolingual users can also benefit
– If translations can be provided
– If it suffices to know that a document exists
– If text captions are used to search for images

Motivations

• Commerce

• Security

• Social

Global Internet Hosts

[Bar chart: Internet hosts in millions (log scale, 0.1–100) by language, estimated by domain: English, Japanese, German, French, Dutch, Finnish, Spanish, Chinese, Swedish]

Source: Network Wizards Jan 99 Internet Domain Survey

Global Web Page Languages

[Pie chart: English 72%, Japanese 7%, German 5%, French 2%, Chinese 2%, Spanish 1%, Italian 1%, Swedish 1%, Malay 1%, Korean 1%, Others & Unknown 7%; the legend also lists Portuguese, Dutch, Danish, Czech, Finnish, Russian, Polish, Hungarian, Norwegian, Estonian, Greek, Bulgarian, Croatian, Basque, Thai, Turkish, Arabic, and Albanian]

Source: Jack Xu, Excite@Home, 1999

European Web Content

[Pie chart: Native Tongue Only 42%, Bilingual 25%, English Only/UK & Ireland 14%, English Only 11%, Other 8%]

Source: European Commission, Evolution of the Internet and the World Wide Web in Europe, 1997

European Web Size Projection

[Chart: projected web size in billions of words (log scale, 0.1–10,000), English vs. other European languages]

Source: Extrapolated from Grefenstette and Nioche, RIAO 2000

Global Internet Audio

[Chart: almost 2000 Internet-accessible radio and television stations: 529 in English, 1367 in other languages]

source: www.real.com, Feb 2000

13 Months Later

[Chart: about 2500 Internet-accessible radio and television stations: 1062 in English, 1438 in other languages]

source: www.real.com, Mar 2001

User Needs Assessment

• Who are the potential users?

• What goals do we seek to support?

• What language skills must we accommodate?

Global Languages

[Bar chart: speakers in millions (0–800): Chinese, English, Hindi-Urdu, Spanish, Portuguese, Bengali, Russian, Arabic, Japanese]

Source: http://www.g11n.com/faq.html

Global Trade

[Bar chart: exports and imports in billions of US dollars (1999), 0–1000: USA, Germany, Japan, China, France, UK, Canada, Italy, Netherlands, Belgium, Korea, Mexico, Taiwan, Singapore, Spain]

Source: World Trade Organization 2000 Annual Report

Global Internet User Population

[Charts: Internet user population by language, 2000 vs. 2005 (projected), highlighting the English and Chinese shares]

Source: Global Reach

Agenda

• Questions

• Overview

• Cross-Language Search

• User Interaction

The Search Process

[Diagram: an author infers concepts and chooses document-language terms; a searcher selects terms to express a query. A monolingual searcher chooses document-language terms directly, while a cross-language searcher chooses query-language terms; retrieval then reduces to query-document matching between the two sets of terms.]

Multilingual Information Access

[Concept map of related fields:
• Information Science: information retrieval (cross-language retrieval, indexing languages, machine-assisted indexing), multilingual metadata, digital libraries, information use (international information flow, diffusion of innovation), automatic abstracting
• Natural Language Processing: machine translation, information extraction, text summarization
• Ontological Engineering: multilingual ontologies
• Knowledge Discovery: textual data mining
• Artificial Intelligence: machine learning
• Human-Computer Interaction: localization, information visualization
• World-Wide Web: web internationalization
• Other fields: speech processing (topic detection and tracking), document image understanding (multilingual OCR)]

Some History: From Controlled Vocabulary to Free Text

• 1964 International Road Research
– Multilingual thesauri

• 1970 SMART
– Dictionary-based free-text cross-language retrieval

• 1978 ISO Standard 5964 (revised 1985)
– Guidelines for developing multilingual thesauri

• 1990 Latent Semantic Indexing
– Corpus-based free-text translingual retrieval

Multilingual Thesauri

• Build a cross-cultural knowledge structure
– Cultural differences influence indexing choices

• Use language-independent descriptors
– Matched to language-specific lead-in vocabulary

• Three construction techniques
– Build it from scratch
– Translate an existing thesaurus
– Merge monolingual thesauri

Free Text CLIR

• What to translate?
– Queries or documents

• Where to get translation knowledge?
– Dictionary or corpus

• How to use it?

Translingual Retrieval Architecture

[Diagram: a Chinese query passes through Chinese term selection; incoming documents pass through language identification and then English or Chinese term selection. Cross-language retrieval matches the query against the English documents (e.g., ranked 3: 0.91, 4: 0.57, 5: 0.36), while monolingual Chinese retrieval handles the Chinese documents (e.g., ranked 1: 0.72, 2: 0.48).]

Evidence for Language Identification

• Metadata
– Included in HTTP and HTML

• Word-scale features
– Which dictionary gets the most hits?

• Subword features
– Character n-gram statistics
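To make the subword approach concrete, here is a minimal character n-gram language identifier (an illustrative Python sketch only, not one of the systems cited in these slides; the training samples and names are hypothetical):

    from collections import Counter

    def char_ngrams(text, n=3):
        # Overlapping character n-grams, padded so word edges count.
        padded = " %s " % text.lower()
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    def train_profile(samples, n=3):
        # Relative n-gram frequencies from sample text in one language.
        counts = Counter()
        for s in samples:
            counts.update(char_ngrams(s, n))
        total = float(sum(counts.values()))
        return {g: c / total for g, c in counts.items()}

    def identify(text, profiles, n=3):
        # The language whose profile best covers the text's n-grams wins.
        grams = char_ngrams(text, n)
        return max(profiles,
                   key=lambda lang: sum(profiles[lang].get(g, 0.0)
                                        for g in grams))

    # Hypothetical usage:
    # profiles = {"en": train_profile(english_samples),
    #             "de": train_profile(german_samples)}
    # identify("ein kleines Beispiel", profiles)  # -> "de"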

Query-Language Retrieval

[Diagram: query-language retrieval. English document terms are mapped into Chinese by document translation; monolingual Chinese retrieval then matches the Chinese query terms against them (e.g., ranked 3: 0.91, 4: 0.57, 5: 0.36).]

Example: Modular use of MT

• Select a single query language

• Translate every document into that language

• Perform monolingual retrieval

Is Machine Translation Enough?

[Chart: retrieval effectiveness on TDT-3 Mandarin broadcast news, comparing Systran document translation with balanced 2-best translation]

Document-Language Retrieval

[Diagram: document-language retrieval. Chinese query terms are mapped into English by query translation; monolingual English retrieval then matches them against the English document terms (e.g., ranked 3: 0.91, 4: 0.57, 5: 0.36).]

Query vs. Document Translation

• Query translation
– Efficient for short queries (not relevance feedback)
– Limited context for ambiguous query terms

• Document translation
– Rapid support for interactive selection
– Need only be done once (if the query language is the same)

• Merged query and document translation
– Can produce better effectiveness than either alone

The Short Query Challenge

[Chart: average number of terms per query (0–3) in 1997, 1998, and 1999, for English vs. other European languages (German, French, Italian, Dutch, Swedish)]

Source: Jack Xu, Excite@Home, 1999

Interlingual Retrieval

[Diagram: interlingual retrieval. Chinese query terms are mapped by query translation, and English document terms by document translation, into a shared interlingua where matching produces the ranked list (e.g., 3: 0.91, 4: 0.57, 5: 0.36).]

Key Challenges in CLIR

[Examples:
– Which translation? (probe, survey, take samples)
– No translation? (cymbidium goeringii)
– Wrong segmentation? (restrain + oil vs. petroleum)]

Sources of Evidence for Translation

• Corpus statistics

• Lexical resources

• Algorithms

• The user

[Image: the Rosetta Stone, bearing the same text in Hieroglyphic, Egyptian Demotic, and Greek]

Types of Bilingual Corpora

• Parallel corpora: translation-equivalent pairs
– Document pairs
– Sentence pairs
– Term pairs

• Comparable corpora: topically related
– Collection pairs
– Document pairs

Exploiting Parallel Corpora

• Automatic acquisition of translation lexicons

• Statistical machine translation

• Corpus-guided translation selection

• Document-linked techniques

Lexicon Acquisition from the WWW

[Pipeline diagram: STRAND finds translated page pairs on the WWW (3378 document pairs); chunk-level alignment yields 63K chunks (500K words), e.g., "… cannot understand crew commands …" aligned with "ne comprenez pas les instructions de l'equip…"; word alignment (GIZA), association statistics, and frequency-based thresholding then produce a translation lexicon of 170K entries.]

Corpus-Guided Translation Selection

• Rank translation alternatives for each term
– Pick English word e that maximizes Pr(e)
– Pick English word e that maximizes Pr(e|c)
– Pick English words e1…en maximizing Pr(e1…en|c1…cm) = statistical machine translation!

• Unigram language models are easy to build
– Can use the collection being searched
– Limits uncommon translation and spelling error effects
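A minimal sketch of the Pr(e) strategy (illustrative only; the tokenized collection, candidate lists, and names are assumptions, and smoothing is omitted):

    from collections import Counter

    def unigram_model(collection_tokens):
        # Estimate Pr(e) directly from the collection being searched.
        counts = Counter(collection_tokens)
        total = float(sum(counts.values()))
        return lambda e: counts.get(e, 0) / total

    def select_translation(alternatives, prob):
        # Keep the candidate the collection itself makes most plausible;
        # rare (often erroneous) alternatives lose automatically.
        return max(alternatives, key=prob)

    # Hypothetical usage, with candidates from a bilingual term list:
    # prob = unigram_model(english_collection_tokens)
    # select_translation(["petroleum", "oil", "restrain"], prob)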

Corpus-Based CLIR Example

Top ranked FrenchDocumentsFrench

IR SystemEnglish

IR System

FrenchQueryTerms

EnglishTranslations

Top rankedEnglish

DocumentsParallelCorpus

Exploiting Comparable Corpora

• Blind relevance feedback
– Existing CLIR technique + collection-linked corpus

• Lexicon enrichment
– Existing lexicon + collection-linked corpus

• Dual-space techniques
– Document-linked corpus

Blind Relevance Feedback

• Augment a representation with related terms
– Find related documents, extract distinguishing terms

• Multiple opportunities:
– Before document translation: enrich the vocabulary
– After document translation: mitigate translation errors
– Before query translation: improve the query
– After query translation: mitigate translation errors

• Short queries get the most dramatic improvement
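The feedback loop is simple enough to sketch in a few lines (hypothetical names throughout; `search` is assumed to return ranked document ids and `docs` to map ids to token lists):

    import math
    from collections import Counter

    def expand_query(query_terms, search, docs, k=5, m=10):
        # Blind (pseudo-relevance) feedback: assume the top-k documents
        # are relevant and add the m terms that best distinguish them.
        top_ids = search(query_terms)[:k]
        df = Counter()
        for terms in docs.values():
            df.update(set(terms))
        fb_counts = Counter()
        for doc_id in top_ids:
            fb_counts.update(docs[doc_id])
        def score(term):  # feedback frequency, favoring low-DF terms
            return fb_counts[term] * math.log(len(docs) / df[term])
        candidates = [t for t in fb_counts if t not in query_terms]
        expansion = sorted(candidates, key=score, reverse=True)[:m]
        return list(query_terms) + expansion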

Example: Post-Translation “Document Expansion”

[Diagram: each Mandarin newswire document to be indexed is automatically segmented and translated term-to-term into English; the translated document is run as a query against an English corpus, and terms selected from the top 5 retrieved documents are added before indexing. An English query is then matched against the expanded index to produce results.]

Why Document Expansion Works

• Story-length objects provide useful context

• Ranked retrieval finds signal amid the noise

• Selective terms discriminate among documents
– Enrich the index with low-DF terms from top documents

• Similar strategies work well in other applications
– CLIR query translation
– Monolingual spoken document retrieval

Lexicon Enrichment

[Example: an English context region containing "… Cross-Language Evaluation Forum …" is aligned with a foreign-language region containing the unknown terms "… Solto Extunifoc Tanixul Knadu …"; which unknown term is the translation? Similar techniques can guide translation selection.]

Lexicon Enrichment

• Use a bilingual lexicon to align "context regions"
– Regions with high coincidence of known translations

• Pair unknown terms with unmatched terms
– Unknown: language A, not in the lexicon
– Unmatched: language B, not covered by translation

• Treat the most surprising pairs as new translations

• Not yet tested in a CLIR application

Learning From Document Pairs

[Table: term occurrence counts across five linked document pairs (Doc 1–Doc 5), with English terms E1–E5 and Spanish terms S1–S4 as columns; because linked documents share topics, translation-equivalent terms such as E1 and S1 show similar count patterns across the pairs.]

Similarity “Thesauri”

• For each term, find the most similar in the other language
– Terms E1 & S1 (or E3 & S4) are used in similar ways

• Treat top related terms as candidate translations
– Applying dictionary-based techniques

• Performed well on a comparable news corpus
– Automatically linked based on date and subject codes
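A sketch of the underlying computation (hypothetical names; the real systems used more careful association measures, but cosine over linked-pair count vectors shows the idea):

    import math

    def term_vectors(docs, vocab):
        # Each term's vector of counts across the linked document pairs;
        # `docs` is a list of token lists, one per linked pair.
        return {t: [d.count(t) for d in docs] for t in vocab}

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = (math.sqrt(sum(a * a for a in u))
                * math.sqrt(sum(b * b for b in v)))
        return dot / norm if norm else 0.0

    def candidate_translations(eng_vecs, spa_vecs, top=3):
        # Rank Spanish terms by how similarly they are used across the
        # aligned pairs; top candidates feed dictionary-based techniques.
        return {e: sorted(spa_vecs,
                          key=lambda s: cosine(ev, spa_vecs[s]),
                          reverse=True)[:top]
                for e, ev in eng_vecs.items()}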

Generalized Vector Space Model

• “Term space” of each language is different
– Document links define a common “document space”

• Describe documents based on the corpus
– Vector of similarities to each corpus document

• Compute cosine similarity in document space

• Very effective in a within-domain evaluation

Latent Semantic Indexing

• Cosine similarity captures noise with signal
– Term choice variation and word sense ambiguity

• Signal-preserving dimensionality reduction
– Conflates terms with similar usage patterns

• Reduces term choice effect, even across languages

• Computationally expensive
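A minimal cross-language LSI sketch using a truncated SVD (assumes NumPy and term-by-document count matrices over a document-aligned training corpus; all names are hypothetical):

    import numpy as np

    def cl_lsi(X_eng, X_spa, k=100):
        # Stack English and Spanish term-by-document matrices so terms of
        # both languages share one document space, then reduce with SVD.
        X = np.vstack([X_eng, X_spa])
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        term_proj = U[:, :k] * s[:k]       # k-dimensional term coordinates
        eng_terms = term_proj[:X_eng.shape[0]]
        spa_terms = term_proj[X_eng.shape[0]:]
        return eng_terms, spa_terms

    # Queries and documents are then folded into the same k-dimensional
    # space and compared by cosine similarity, regardless of language.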

Cognate Matching

• Dictionary coverage is inherently limited
– Translation of proper names
– Translation of newly coined terms
– Translation of unfamiliar technical terms

• Strategy: model derivational translation
– Orthography-based
– Pronunciation-based

Matching Orthographic Cognates

• Retain untranslatable words unchanged
– Often works well between European languages

• Rule-based systems
– Even off-the-shelf spelling correction can help!

• Character-level statistical MT
– Trained using a set of representative cognates
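For instance, a simple length-normalized edit-distance matcher (a sketch with hypothetical names; a trained character-level model would replace the uniform costs):

    def edit_distance(a, b):
        # Levenshtein distance by dynamic programming.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def cognate_candidates(word, target_vocab, max_ratio=0.34):
        # Treat target-language words within a length-normalized distance
        # as candidate cognates for an untranslatable word.
        return [t for t in target_vocab
                if edit_distance(word.lower(), t.lower())
                <= max_ratio * max(len(word), len(t))]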

Matching Phonetic Cognates

• Forward transliteration
– Generate all potential transliterations

• Reverse transliteration
– Guess the source string(s) that produced a transliteration

• Match in phonetic space

Sources of Evidence for Translation

• Corpus statistics

• Lexical resources

• Algorithms

• The user

Types of Lexical Resources

• Ontology

• Thesaurus

• Lexicon

• Dictionary

• Bilingual term list

Dictionary-Based Query Translation

Original query: El Nino and infectious diseases
Term selection: "El Nino" infectious diseases
Term translation: (dictionary coverage: "El Nino" is not found)
Translation selection: …
Query formulation: …
Structure: …

The Unbalanced Translation Problem

• Common query terms may have many translations

• Some of the translations may be rare

• IR systems give rare translations higher weights

• The wrong documents get highly ranked

Solution 1: Balanced Translation

• Replace each term with plausible translations
– Common terms have many translations
– Specific terms are more useful for retrieval

• Balance the contribution of each translation
– Modular: duplicate translations
– Integrated: average the contributions

Solution 2: “Structured Queries”

• Weight of term a in document i depends on:
– TF(a,i): frequency of term a in document i
– DF(a): how many documents term a occurs in

• Build pseudo-terms from alternate translations
– TF(syn(a,b),i) = TF(a,i) + TF(b,i)
– DF(syn(a,b)) = |{docs with a} ∪ {docs with b}|

• Downweights terms with any common translation
– Particularly effective for long queries

(Query Terms: 1: 2: 3: )

Computing Weights

• Unbalanced: overweights query terms that have many translations

$$w = \frac{1}{3}\left[w(TF_1,DF_1) + w(TF_2,DF_2) + w(TF_3,DF_3)\right]$$

• Balanced (#sum): sensitive to rare translations

$$w = \frac{1}{2}\left[\frac{1}{2}\left(w(TF_1,DF_1) + w(TF_2,DF_2)\right) + w(TF_3,DF_3)\right]$$

• Pirkola (#syn): deemphasizes query terms with any common translation

$$w = \frac{1}{2}\left[w(TF_1+TF_2,\,DF_1+DF_2) + w(TF_3,DF_3)\right]$$

(Here the first query term has translations 1 and 2, the second has only translation 3, and w(TF, DF) is the system's term weighting function.)
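A sketch of the #syn computation (hypothetical names; a simple TF.IDF stands in for Inquery's actual weighting):

    import math

    def syn_weight(translations, postings, doc_id, n_docs):
        # Pirkola-style pseudo-term: merge the alternative translations'
        # statistics before weighting. `postings[t]` maps a document-
        # language term to {doc_id: tf}.
        tf = sum(postings.get(t, {}).get(doc_id, 0) for t in translations)
        union = set()
        for t in translations:
            union.update(postings.get(t, {}))   # union of posting lists
        df = len(union)
        if tf == 0 or df == 0:
            return 0.0
        return (1 + math.log(tf)) * math.log(n_docs / df)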

Relative Effectiveness

[Bar chart: mean average precision (0–0.4) for Pirkola vs. balanced translation, using all translations vs. top-3 translations]

NTCIR-2 ECIR Collection, CETA+LDC Dictionary, Inquery 3.1p1

Exploiting Part-of-Speech (POS)

• Constrain translations by part-of-speech
– Requires a POS tagger and a POS-tagged lexicon

• Works well when queries are full sentences
– Short queries provide little basis for tagging

• Constrained matching can hurt monolingual IR
– Nouns in queries often match verbs in documents

Evaluation Collections

• TREC
– TREC-6/7/8: English, French, German, Italian text
– TREC-9: Chinese text
– TREC-10: Arabic text

• CLEF
– CLEF-1: English, French, German, Italian text
– CLEF-2: the above, plus Spanish and Dutch

• TDT
– TDT-2/3: Chinese and English, text and speech

• NTCIR
– NTCIR-1: Japanese and English text
– NTCIR-2: Japanese, Chinese, and English text

Percent of Monolingual Performance

• Create a document collection in language F

• Create queries and relevance judgments
– Queries in language F, with relevance judgments
– Translated queries in language E

• Monolingual run uses F queries, F docs
– Upper bound on performance

• Cross-language run uses E queries, F docs
– Measure as a percentage of the upper bound

• Caveat: “query expansion” effect!

A Real Example (HLT’2001)

• Combining two translation lexicons
– Contained in a downloaded dictionary
– Extracted from bilingual WWW documents

• Interaction between stemming and translation
– 4-stage backoff, balanced translation

• Evidence combination
– Baseline: dictionary with four-stage backoff
– Merging: reranking informed by the WWW tralex
– Backoff: from baseline to the WWW tralex

• Experiment and results

Four-stage Backoff

• Tralex might contain stems, surface forms, or some combination of the two.

[Table: four matching stages between a document term and translation lexicon entries, illustrated with French mangez → English eat:
– surface form ↔ surface form: mangez matches lexicon entry mangez → eat
– stem ↔ surface form: mangez stems to mange, matching lexicon entry mange → eats (normalized to eat)
– surface form ↔ stem: the document form is matched against stemmed lexicon entries → eat
– stem ↔ stem: mange matches the stem of lexicon entry mangent → eat]

French stemmer: Oard, Levow, and Cabezas (2001); English: Inquery's kstem
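A sketch of the lookup logic (the stemmers and the lexicon's structure are assumptions):

    def backoff_lookup(word, lexicon, stem_f, stem_e):
        # Four-stage backoff: progressively more aggressive matching
        # between a French document word and lexicon entries.
        stemmed_lex = {stem_f(f): e for f, e in lexicon.items()}
        stages = [
            lambda: lexicon.get(word),               # surface <-> surface
            lambda: lexicon.get(stem_f(word)),       # stem <-> surface
            lambda: stemmed_lex.get(word),           # surface <-> stem
            lambda: stemmed_lex.get(stem_f(word)),   # stem <-> stem
        ]
        for stage in stages:
            hit = stage()
            if hit is not None:
                return stem_e(hit)   # normalize the English side too
        return None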

Tralex Merging by Voting

[Example: for French term F, the ranked dictionary proposes e1 (100 votes), e2 (99), e3 (98); the corpus tralex proposes e2 (100), e3 (99), then e4 and e5. Summing votes gives the ranked merge: e2 (199), e3 (197), e1 (100), then e4 and e5.]
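A minimal voting merge consistent with the example above (positional votes; names are hypothetical):

    def merge_by_voting(ranked_lists, depth=100):
        # The i-th entry of each list earns (depth - i) votes, so a
        # translation ranked high in both lists outscores one ranked
        # high in only one.
        votes = {}
        for entries in ranked_lists:
            for i, term in enumerate(entries[:depth]):
                votes[term] = votes.get(term, 0) + (depth - i)
        return sorted(votes, key=votes.get, reverse=True)

    # merge_by_voting([["e1", "e2", "e3"], ["e2", "e3", "e4", "e5"]])
    # -> ["e2", "e3", "e1", "e4", "e5"]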

Tralex Backoff

• Attempt 4-stage backoff using dictionary

• If not found, use corpus tralex as 5th stage

[Diagram: dictionary-based 4-stage backoff, followed by lookup in the WWW-based tralex as a 5th stage (e.g., surface form ↔ surface form: mangez → eat)]

Experiment

• CLEF-2000 French document collection
– ~21M words, Le Monde, document translation

• English queries (34 topics with relevant docs)

• Inquery (using the #sum operator)

• Conditions (balanced top-2 translation of docs):
– Baseline: downloaded dictionary alone
– Corpus (STRAND) tralex, thresholds N=1, 2, 3
– Tralex merging by voting
– Backoff from dictionary tralex to corpus tralex

Results

Condition / Mean Average Precision
– STRAND corpus tralex (N=1): 0.2320
– STRAND corpus tralex (N=2): 0.2440
– STRAND corpus tralex (N=3): 0.2499
– Merging by voting: 0.2892
– Baseline: downloaded dictionary: 0.2919
– Backoff from dictionary to corpus tralex: 0.3282 (+12% relative over the baseline, p < .01)

Results Detail

[Chart: mean average precision by topic]

Topic Detection and Tracking

• English and Chinese news stories
– Newswire, radio, and television sources

• Query-by-example, mixed language/source
– Merged multilingual result set

• Set-based retrieval measures
– Focus on utility

TDT: Evaluating CL Speech Retrieval

[Timeline diagram: development collection TDT-2 (Jan–Jun 98), 2265 manually segmented stories, 17 topics with a variable number of exemplars; evaluation collection TDT-3 (Oct–Dec 98), 3371 manually segmented stories, 56 topics with a variable number of exemplars. English text topic exemplars from the Associated Press and New York Times; Mandarin audio broadcast news from the Voice of America. Exhaustive relevance assessment based on event overlap.]

[Diagram: query by example, with English newswire exemplars retrieving Mandarin audio stories]

English exemplar: President Bill Clinton and Chinese President Jiang Zemin engaged in a spirited, televised debate Saturday over human rights and the Tiananmen Square crackdown, and announced a string of agreements on arms control, energy and environmental matters. There were no announced breakthroughs on American human rights concerns, including Tibet, but both leaders accentuated the positive …

Mandarin story: 美国总统克林顿的助手赞扬中国官员允许电视现场直播克林顿和江泽民在首脑会晤后举行的联合记者招待会。。特别是一九八九镇压民主运动的决定。他表示镇压天安门民主运动是错误的 , 他还批评了中国对西藏精神领袖达 国家安全事务助理伯格表示 , 这次直播让中国人第一次在种公开的论坛上听到围绕敏感的人权问题的讨论。在记者招待会上 …

[English gloss of the Mandarin story: Aides to U.S. President Clinton praised Chinese officials for allowing live television coverage of the joint press conference Clinton and Jiang Zemin held after their summit … in particular the 1989 decision to suppress the democracy movement. He said the crackdown on the Tiananmen democracy movement was wrong, and he also criticized China's treatment of Tibet's spiritual leader … National Security Adviser Berger said the live broadcast let the Chinese people hear, for the first time in such an open forum, discussion of sensitive human rights issues. At the press conference …]

Known Item Retrieval

• Design queries to retrieve a single document

• Measure the rank of that document in the list

• Average the inverse of the rank across queries

• Use sign test for statistical significance

• Useful first-pass evaluation strategy
– Avoids the cost of relevance judgments

– Poor mean inverse rank implies poor average precision

– Does not distinguish well among fairly close systems
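A minimal mean-inverse-rank computation (hypothetical input format):

    def mean_inverse_rank(ranks):
        # `ranks` maps each query to the 1-based rank at which its single
        # target document was retrieved, or None if it was missed.
        inverse = [1.0 / r if r else 0.0 for r in ranks.values()]
        return sum(inverse) / len(inverse)

    # mean_inverse_rank({"q1": 1, "q2": 3, "q3": None})
    # -> (1 + 1/3 + 0) / 3 ≈ 0.44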

Evaluating Corpus-Based Techniques

• Within-domain evaluation (upper bound)
– Partition a bilingual corpus into training and test sets
– Use the training part to tune the system
– Generate relevance judgments for the evaluation part

• Cross-domain evaluation (fair)
– Use existing corpora and evaluation collections
– No good metric for the degree of domain shift

Evaluating Tralex Coverage

• Lexicon size

• Vocabulary coverage of the collection

• Term instance coverage of the collection

• Term weight coverage of the collection

• Term weight coverage on representative queries

• Retrieval performance on a test collection

Outline

• Questions

• Overview

• Search

• Browsing

Interactive CLIR

• Important: strong support for interactive relevance judgments can make up for less accurate nominations
– [Hersh et al., SIGIR 2000]

• Practical: interactive relevance judgments based on imperfect translations can beat fully automatic nominations alone
– [Oard and Resnik, IP&M 1997]

User-Assisted Query Translation

Reverse Translation

Query in English: Swiss bank

Click on a box to remove a possible translation:

bank:
– Bankgebäude ( )
– bankverbindung (bank account, correspondent)
– bank (bench, settle)
– damm (causeway, dam, embankment)
– ufer (shore, strand, waterside)
– wall (parapet, rampart)

[Search] [Continue]

Selection

• Goal: Provide information to support decisions

• May not require very good translations
– e.g., term-by-term title translation

• People can "read past" some ambiguity
– May help to display a few alternative translations

Language-Specific Selection

Query in English: Swiss bank [Search]
German query: (Swiss) (Bankgebäude, bankverbindung, bank)

English results:
1 (0.72) Swiss Bankers Criticized, AP, June 14, 1997
2 (0.48) Bank Director Resigns, AP, July 24, 1997

German results:
1 (0.91) U.S. Senator Warpathing, NZZ, June 14, 1997
2 (0.57) [Bankensecret] Law Change, SDA, August 22, 1997
3 (0.36) Banks Pressure Existent, NZZ, May 3, 1997

Translingual Selection

Query in English: Swiss bank [Search]
German query: (Swiss) (Bankgebäude, bankverbindung, bank)

1 (0.91) U.S. Senator Warpathing, NZZ, June 14, 1997
2 (0.57) [Bankensecret] Law Change, SDA, August 22, 1997
3 (0.52) Swiss Bankers Criticized, AP, June 14, 1997
4 (0.36) Banks Pressure Existent, NZZ, May 3, 1997
5 (0.28) Bank Director Resigns, AP, July 24, 1997

Merging Ranked Lists

• Types of evidence
– Rank
– Score

• Evidence combination
– Weighted round robin
– Score combination

• Parameter tuning
– Condition-based
– Query-based

[Example: two ranked lists (voa4062 .22, voa3052 .21, voa4091 .17, …, voa4221 .04 and voa4062 .52, voa2156 .37, voa3052 .31, …, voa2159 .02) merge into one ranking: voa4062, voa3052, voa2156, …, voa4201]
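A sketch of score combination with min-max normalization (one common recipe; weighted round robin would instead interleave entries by rank):

    def merge_ranked_lists(lists, weights=None):
        # Each input is a list of (doc_id, score) pairs from one system;
        # normalize scores per list, weight them, and sum per document.
        weights = weights or [1.0] * len(lists)
        combined = {}
        for entries, w in zip(lists, weights):
            scores = [s for _, s in entries]
            lo, hi = min(scores), max(scores)
            for doc_id, s in entries:
                norm = (s - lo) / (hi - lo) if hi > lo else 0.0
                combined[doc_id] = combined.get(doc_id, 0.0) + w * norm
        return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)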

Examination Interface

• Two goals
– Refine document delivery decisions
– Support vocabulary discovery for query refinement

• Rapid translation is essential
– Document translation retrieval strategies are a good fit
– Focused on-the-fly translation may be a viable alternative

Uh oh…

State-of-the-Art Machine Translation

Term-by-Term Gloss Translation

Experiment Design

[Table: counterbalanced design. Four participants each search four topics, two narrow (11, 13) and two broad (17, 29); topic order (e.g., Topic 11, Topic 17 vs. Topic 17, Topic 11) and the order of System A and System B alternate across participants.]

An Experiment Session

• Task and system familiarization (30 minutes)
– Gain experience with both systems

• 4 searches (20 minutes x 4):
– Read the topic description (in a language you know)
– Examine translations (into that same language)
– Select one of 5 relevance judgments, in two clusters: Relevant vs. Somewhat relevant, Not relevant, Unsure, Not judged
– Instructed to seek high precision

• 8 questionnaires
– Initial (1), each topic (4), each system (2), final (1)

Measure of Effectiveness

• Unbalanced F-measure:

$$F_{\alpha} = \frac{1}{\alpha\frac{1}{P} + (1-\alpha)\frac{1}{R}}$$

– P = precision, R = recall, α = 0.8

• Favors precision over recall

• This models an application in which:
– Fluent translation is expensive
– Missing some relevant documents would be okay
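For example, with P = 0.8 and R = 0.4, F₀.₈ = 1 / (0.8/0.8 + 0.2/0.4) = 1/1.5 ≈ 0.67, while the mirror case (P = 0.4, R = 0.8) yields only 1/2.25 ≈ 0.44: low precision is penalized far more than low recall.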

French Results Overview

[Chart: French results, CLEF vs. AUTO conditions]

English Results Overview

[Chart: English results, CLEF vs. AUTO conditions]

Maryland Experiments

• MT is almost always better
– Significant overall and for narrow topics alone (one-tailed t-test, p < 0.05)

• The F measure is less insightful for narrow topics
– Always near 0 or 1

[Bar chart: average F₀.₈ on two topics (0–1.2) for participants umd01–umd04, MT vs. GLOSS, broad topics vs. narrow topics]

Some Observations

• Small agreement with CLEF assessments!
– Time pressure, precision bias, strict judgments

• Systran was fairly consistent across sites
– Only when the language pair was the same

• Monolingual > Systran > Gloss
– In both recall and precision

• UNED's phrase translations improve recall
– With no adverse effect on precision

Delivery

• Use may require high-quality translation
– Machine translation quality is often rough

• Route to the best translator based on:
– Acceptable delay
– Required quality (language and technical skills)
– Cost

Summary

• Controlled vocabulary
– Mature, efficient, easily explained

• Dictionary-based
– Simple, broad coverage

• Collection-aligned corpus-based
– Generally helpful

• Document- and term-aligned corpus-based
– Effective in the same domain

• User interface
– Only very preliminary results available

Research Opportunities

[Chart: research areas plotted by perceived opportunity vs. past investment: user interaction, translation selection, transliteration, comparable corpora, parallel corpora, dictionaries, term selection, segmentation & phrase indexing, lexical coverage]

No “muddiest point” question…

… it’s all CLIR!

Concerns about Social Filtering

• Subjectivity: is what's good for me good for you? (6)
• Sparse recommendations (4)
• Privacy and trust are in tension (3)
• Insufficient perceived self-interest (3)
• Difficulty of making inferences from behavior
• Changing information needs
• How to observe user behavior?
• Non-cooperative users
• Scalable ratings servers
• Adoption of new technology

Muddiest Points

• Architectures for using implicit feedback (4)

• Rocchio (4)
– Is it commonly used?

• Machine learning techniques (2)
– Especially hill climbing and rule induction

• How are links related to social filtering?
• The math behind Google
• How to reduce the marginal cost of ratings?
• Relationship between filtering and routing
