Structured Queries for Legal Search TREC 2007 Legal Track Yangbo Zhu, Le Zhao, Jamie Callan, Jaime Carbonell Language Technologies Institute School of Computer Science Carnegie Mellon University 11/06/2007


Page 1: Structured Queries for Legal Search

TREC 2007 Legal Track

Yangbo Zhu, Le Zhao, Jamie Callan, Jaime Carbonell

Language Technologies Institute, School of Computer Science, Carnegie Mellon University

11/06/2007

Page 2: Agenda

Introduction
Main task: ad hoc search
Routing task: relevance feedback

Page 3: What is legal search

Goal: retrieve all documents for production requests.
Production request: describes a set of documents that the plaintiff forces the defendant to produce.
Recall-oriented: high risk (value) of missing (finding) important documents.

Sample request text: All documents discussing, referencing, or relating to company guidelines, strategies, or internal approval for placement of tobacco products in movies that are mentioned as G-rated.

[Figure: the Final query drawn as an operator tree, with AND, OR, and W/5 nodes over the terms guide, strategy, approval, family, “G rated”, movie, film.]

Page 4: Data set

7 million business records from tobacco companies and research institutes.
Metadata: title, author, organizations, etc.
OCR text: contains errors.
50 topics generated from four hypothetical complaints created by lawyers.

Page 5: Main task: ad hoc search

Indri query formulation

Without boolean constraint:
#combine(ranking function)

With boolean constraints:
#filreq( #band(boolean constraint) #combine(ranking function) )
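The two query templates above can be sketched as plain string builders. This is a minimal illustration, not the authors' code: the function names `combine` and `with_boolean_constraint` and the sample terms are assumptions made for the example.

```python
def combine(ranking_function: str) -> str:
    """Wrap a ranking function (space-separated terms or operators) in #combine."""
    return f"#combine({ranking_function})"


def with_boolean_constraint(boolean_constraint: str, ranking_function: str) -> str:
    """Keep only documents that satisfy the constraint, then rank them with #combine."""
    return f"#filreq( #band({boolean_constraint}) {combine(ranking_function)} )"


# Illustrative inputs only; real constraints come from the topic's Final Query.
ranking = "guide strategy approval"
constraint = "#syn(guide strategy approval)"

print(combine(ranking))
print(with_boolean_constraint(constraint, ranking))
```

Keeping the constraint and the ranking function as separate arguments mirrors the slide: the same ranking function can be run with or without the filter.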

Page 6: Boolean constraint

Translate the Final Query

Original expression      Indri operator
x AND y                  #uw(x y)
x OR y                   #syn(x y)
x BUT NOT y              #filrej(y x)
Phrase: “x y”            #1(x y)
Proximity: (x W/k y)     #uw(k+2)(x y)
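A small recursive translator makes the mapping in the table concrete. It is a sketch under stated assumptions: the nested-tuple representation and the operator tags (`"AND"`, `"OR"`, `"BUT_NOT"`, `"PHRASE"`, `"W"`) are invented for this example, `#uw(k+2)` is rendered as Indri's `#uwN` window with N = k + 2, and the shape of the sample tree is a guess at the query drawn on the slides.

```python
def to_indri(node) -> str:
    """Translate a nested-tuple Boolean query into Indri operators (per the table)."""
    if isinstance(node, str):          # a bare term
        return node
    op, *args = node
    if op == "AND":                    # x AND y -> #uw(x y)
        return "#uw(" + " ".join(to_indri(a) for a in args) + ")"
    if op == "OR":                     # x OR y -> #syn(x y)
        return "#syn(" + " ".join(to_indri(a) for a in args) + ")"
    if op == "BUT_NOT":                # x BUT NOT y -> #filrej(y x)
        keep, reject = args
        return f"#filrej({to_indri(reject)} {to_indri(keep)})"
    if op == "PHRASE":                 # "x y" -> #1(x y)
        return "#1(" + " ".join(args) + ")"
    if op == "W":                      # (x W/k y) -> unordered window of size k+2
        k, left, right = args
        return f"#uw{k + 2}({to_indri(left)} {to_indri(right)})"
    raise ValueError(f"unknown operator: {op}")


# Assumed tree shape for the sample query.
final_query = (
    "AND",
    ("OR", "guide", "strategy", "approval"),
    ("W", 5, ("OR", "family", ("PHRASE", "G", "rated")), ("OR", "movie", "film")),
)

print(to_indri(final_query))
```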

[Figure: the Final query operator tree (AND, OR, and W/5 nodes over guide, strategy, approval, family, “G rated”, movie, film), repeated on this slide.]

Page 7: Ranking functions

Bag of words:
(guide strategy approval family G rated movie film)

Respect phrase operators:
(guide strategy approval family #1(G rated) movie film)

Group synonyms together:
(#syn(guide strategy approval) #syn(family #1(G rated)) #syn(movie film))
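The three ranking functions differ only in how much of the query structure they keep, which a short sketch can show. The data layout (a list of synonym groups, with a tuple marking a phrase) and the helper names are assumptions for illustration.

```python
# Each inner list is one synonym group; a tuple marks a phrase.
groups = [
    ["guide", "strategy", "approval"],
    ["family", ("G", "rated")],
    ["movie", "film"],
]


def render(item) -> str:
    """Render a phrase tuple as #1(...) and leave single terms unchanged."""
    return "#1(" + " ".join(item) + ")" if isinstance(item, tuple) else item


def bag_of_words(groups) -> str:
    """Drop all structure: phrases are split into their words."""
    words = []
    for group in groups:
        for item in group:
            words.extend(item if isinstance(item, tuple) else [item])
    return " ".join(words)


def with_phrases(groups) -> str:
    """Keep phrase operators but ignore synonym grouping."""
    return " ".join(render(item) for group in groups for item in group)


def with_synonyms(groups) -> str:
    """Keep phrases and wrap each group in #syn."""
    return " ".join(
        "#syn(" + " ".join(render(item) for item in group) + ")" for group in groups
    )


for build in (bag_of_words, with_phrases, with_synonyms):
    print(f"#combine({build(groups)})")
```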

[Figure: the Final query operator tree (AND, OR, and W/5 nodes over guide, strategy, approval, family, “G rated”, movie, film), repeated on this slide.]

Page 8: Experiments and findings

Boolean constraints improve recall and precision.
Structured queries outperform bag-of-words ones.

[Figure: four bar charts, est_R@B, est_P@B, est_R@1K, and est_R@25K, each comparing runs without (w/o BC) and with (w/ BC) boolean constraints for three ranking functions: Terms, Phrases, Synonyms.]

* B is the number of documents matching the Final Query. Its average value is 5000.
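For readers unfamiliar with cutoff-based measures, precision and recall at a cutoff such as B can be sketched as follows. This shows only the plain definitions on a fully judged toy ranking; the est_ measures reported on the slide are estimates, and that estimation is not reproduced here. The function name and the document ids are hypothetical.

```python
def precision_recall_at(ranked_ids, relevant_ids, cutoff):
    """Return (precision, recall) over the top `cutoff` documents of a ranking."""
    top = ranked_ids[:cutoff]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    precision = hits / cutoff if cutoff else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Toy data: B plays the role of the number of documents matching the Final Query.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}
B = 4

p_at_b, r_at_b = precision_recall_at(ranked, relevant, B)
print(p_at_b, r_at_b)
```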

Page 9: Per topic performance (difference from the median of 29 manual runs)

[Figure: two per-topic plots, est_RB and est_PB, with Topic ID (50 to 110) on the horizontal axis and the difference from the median on the vertical axis.]

Page 10: Routing task of Legal Track 2007

Structured queries are known to be hard to construct, but not with supervision.

Questions:
Do weighted queries help?
Do metadata and annotations help?

A definitive answer comes from Supervised Structured Query Construction.

Page 11: Structured query

#weight( w1 t1 w2 t2 … wn tn)

0.00851 trademark.sentence        0.00846 trademark
0.00665 gmp.product               0.00653 basement.product
0.00625 steenland                 0.00606 steenland.sentence
0.00602 gouda.sentence            0.00600 gouda
0.00587 steenland.organization    0.00561 toi
0.00550 toi.sentence              0.00544 lett.product
0.00486 chocol.ti                 0.00479 legal.sentence
0.00474 children.per_desc         0.00467 legal.s
0.00459 legal                     0.00453 legal.organization
0.00435 kid.sentence              0.00433 kid
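Assembling such a query from (weight, term) pairs is mechanical, as the sketch below suggests. The helper name `weight_query` and the five-decimal formatting are assumptions; the three pairs are taken from the list above.

```python
def weight_query(weighted_terms) -> str:
    """Build an Indri #weight query from (weight, term) pairs."""
    parts = [f"{weight:.5f} {term}" for weight, term in weighted_terms]
    return "#weight( " + " ".join(parts) + " )"


# First three pairs from the slide; term.field restricts a term to one field.
top_terms = [
    (0.00851, "trademark.sentence"),
    (0.00846, "trademark"),
    (0.00665, "gmp.product"),
]

print(weight_query(top_terms))
```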

Page 12: Supervised Structured Query Construction

Relevance feedback => supervised learning

Train a linear SVM with keyword and keyword.field features.

SVM classifier:

$$P(\text{relevant} \mid d) \propto \sum_{i \in \text{UniqTerms}(d)} w_i f_i$$

f_i: training weights for terms, chosen to be tf-idf/LM scores.

Retrieval: #weight( w1 t1 w2 t2 … )
f_i: tf-idf/LM scores for terms.

Advantage: given enough training data, we know for sure whether one type of feature helps.
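The idea of learning one weight per term from judged documents can be sketched with a tiny linear SVM. This is an assumption-laden illustration, not the authors' setup: it uses batch subgradient descent on the regularised hinge loss rather than any particular SVM package, the hyperparameters are arbitrary, and the toy documents and their feature values stand in for real tf-idf/LM scores.

```python
from collections import defaultdict


def score(weights, features):
    """Linear score: sum of w_i * f_i over the unique terms of one document."""
    return sum(weights.get(term, 0.0) * value for term, value in features.items())


def train_linear_svm(examples, epochs=200, learning_rate=0.1, reg=0.01):
    """Batch subgradient descent on the L2-regularised hinge loss.

    examples: list of (features, label), label +1 (relevant) or -1 (not relevant).
    """
    weights = defaultdict(float)
    n = len(examples)
    for _ in range(epochs):
        gradient = defaultdict(float)
        for term, w in weights.items():              # regularisation term
            gradient[term] += reg * w
        for features, label in examples:
            if label * score(weights, features) < 1.0:   # margin violated
                for term, value in features.items():
                    gradient[term] -= label * value / n
        for term, g in gradient.items():
            weights[term] -= learning_rate * g
    return dict(weights)


# Toy judged documents (hypothetical feature values).
examples = [
    ({"chocol": 2.0, "cigarett": 1.0}, +1),
    ({"chocol": 1.0, "kid.sentence": 1.0}, +1),
    ({"cigarett": 1.0, "tax": 2.0}, -1),
    ({"tax": 1.0, "filter": 1.0}, -1),
]

learned = train_linear_svm(examples)
for term, w in sorted(learned.items(), key=lambda kv: -kv[1]):
    print(f"{w:+.3f} {term}")
```

The learned weights are the w_i that would then be written into a #weight query at retrieval time.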

Page 13: Example Query

<RequestNumber>13</RequestNumber>
<RequestText>All documents to or from employees of a tobacco company or tobacco organization referring to the marketing, placement, or sale of chocolate candies in the form of cigarettes.</RequestText>
<FinalQuery>(cand! OR chocolate) w/10 cigarette!</FinalQuery>

Page 14: Annotations

Feedback query:
NE: bush.person
sentence: violate.sent
meta: television.title
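Features of the keyword.field form can be generated by pairing each term with the annotation it occurs in. The sketch below assumes a hypothetical document representation (a token list plus a mapping from field name to the terms inside that field) and uses raw counts where the slides use tf-idf/LM scores.

```python
from collections import Counter


def extract_features(tokens, annotations):
    """Count plain keywords plus keyword.field features.

    tokens: all terms in the document.
    annotations: mapping from field name (e.g. "person", "sent", "title")
                 to the terms that occur inside that field.
    """
    features = Counter(tokens)
    for field, field_tokens in annotations.items():
        for term in field_tokens:
            features[f"{term}.{field}"] += 1
    return features


# Hypothetical document echoing the three examples on the slide.
doc_tokens = ["bush", "violate", "television", "ad"]
doc_annotations = {"person": ["bush"], "sent": ["violate"], "title": ["television"]}

features = extract_features(doc_tokens, doc_annotations)
print(sorted(features.items()))
```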

[Figure: bar chart of the share of query terms (Query Terms %, axis from 0 to 0.45) for each term type: Keyword, NE, sentence, meta. The slide repeats the weighted term list from Page 11 next to it.]

Page 15: Performance

[Figure: bar charts of MAP, Recall@B, and R-Prec for the uw, pw, and pnw runs. On 39 topics of Legal 2006 (2/3 of judged documents for training, the rest for testing).]

[Figure: bar chart of est_R@B for the uw, pw, and pnw runs. On 10 topics of the Legal 2007 routing task.]

Legend: keywords, structured, structured w/o meta.

Page 16: Routing Conclusions

A principled way of constructing structured queries:
Annotations
Query term weights

Answers from a supervised learning algorithm:
Weights help, annotations less so.

Page 17: Thank you!

Questions?