Structured Queries for Legal Search
TREC 2007 Legal Track
Yangbo Zhu, Le Zhao, Jamie Callan, Jaime Carbonell
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
11/06/2007
Agenda
Introduction
Main task – ad hoc search
Routing task – relevance feedback
What is legal search?
Goal: retrieve all documents responsive to a production request.
Production request: describes a set of documents that the plaintiff compels the defendant to produce.
Recall-oriented: missing important documents carries high risk, and finding them carries high value.
Sample request text: All documents discussing, referencing, or relating to company guidelines, strategies, or internal approval for placement of tobacco products in movies that are mentioned as G-rated.
Final query (drawn as a tree):
(guide OR strategy OR approval) AND ((family OR “G rated”) W/5 (movie OR film))
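The Final Query combines OR groups, a phrase, and a W/5 proximity operator under a top-level AND. As an illustration of how such a Boolean constraint could be checked against a single tokenized document, here is a minimal Python sketch. The node classes and the `positions`/`matches` helpers are hypothetical; it assumes W/k means the start positions of the two operands are at most k tokens apart, and that "G-rated" is tokenized as `g rated`.

```python
from dataclasses import dataclass
from typing import List, Set, Union

# Hypothetical node types for a Boolean query tree (an illustration, not the authors' code).
@dataclass
class Term:
    words: List[str]  # one word, or several words for a phrase such as "G rated"

@dataclass
class Or:
    children: list

@dataclass
class And:
    children: list

@dataclass
class Within:
    left: "Node"
    right: "Node"
    k: int  # assumption: operand start positions are at most k tokens apart

Node = Union[Term, Or, And, Within]

def positions(node: Node, tokens: List[str]) -> Set[int]:
    """Start positions where a Term, Or, or Within node matches the token list."""
    if isinstance(node, Term):
        n = len(node.words)
        return {i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == node.words}
    if isinstance(node, Or):
        hits: Set[int] = set()
        for child in node.children:
            hits |= positions(child, tokens)
        return hits
    if isinstance(node, Within):
        left, right = positions(node.left, tokens), positions(node.right, tokens)
        near = {p for p in left if any(abs(p - q) <= node.k for q in right)}
        near |= {q for q in right if any(abs(p - q) <= node.k for p in left)}
        return near
    raise TypeError(f"positions() does not support {type(node).__name__}")

def matches(node: Node, tokens: List[str]) -> bool:
    """True if the document satisfies the Boolean constraint."""
    if isinstance(node, And):
        return all(matches(child, tokens) for child in node.children)
    return bool(positions(node, tokens))

# The Final Query for the G-rated movie request, written as a tree.
FINAL_QUERY = And([
    Or([Term(["guide"]), Term(["strategy"]), Term(["approval"])]),
    Within(Or([Term(["family"]), Term(["g", "rated"])]),
           Or([Term(["movie"]), Term(["film"])]), k=5),
])
```

For example, `matches(FINAL_QUERY, "the marketing strategy for g rated movie placement".split())` is true, while a document where "family" and "film" are ten tokens apart fails the W/5 condition.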
Data set
7 million business records from tobacco companies and research institutes
Metadata: title, author, organizations, etc.
OCR text: contains errors
50 topics generated from four hypothetical complaints created by lawyers
Main task – Ad hoc search
Indri query formulation
Without Boolean constraints: #combine(ranking function)
With Boolean constraints: #filreq( #band(boolean constraint) #combine(ranking function) )
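The two templates differ only in whether the ranking function is wrapped in a filter. A minimal Python sketch that assembles the query string from its two parts; the function name `indri_query` is hypothetical, and the strings it emits simply follow the templates above.

```python
from typing import Optional

def indri_query(ranking: str, boolean_constraint: Optional[str] = None) -> str:
    """Wrap a ranking function, and optionally a Boolean constraint, in the Indri templates."""
    ranked = f"#combine({ranking})"
    if boolean_constraint is None:
        return ranked
    # #filreq keeps only documents that satisfy the #band constraint,
    # then orders those documents with the #combine ranking function.
    return f"#filreq( #band({boolean_constraint}) {ranked} )"
```

The design keeps the Boolean part and the ranking part independent, so the same ranking function can be evaluated with and without the constraint.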
Boolean constraint
Translate the Final Query
Original expression → Indri operator
x AND y → #uw(x y)
x OR y → #syn(x y)
x BUT NOT y → #filrej(y x)
Phrase: “x y” → #1(x y)
Proximity: (x W/k y) → #uw(k+2)(x y), e.g., W/5 becomes #uw7
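Applying the translation table recursively turns a whole Boolean tree into one Indri expression. A minimal Python sketch, assuming a hypothetical tuple encoding of the tree (`("AND", ...)`, `("OR", ...)`, `("BUT_NOT", x, y)`, `("PHRASE", words...)`, `("W", k, x, y)`, or a bare string for a term). It applies the table mechanically; whether Indri accepts every nesting it produces is not verified here.

```python
def to_indri(node) -> str:
    """Translate a Boolean expression tree into Indri operators using the table's mapping."""
    if isinstance(node, str):  # a single term
        return node
    op, *args = node
    if op == "AND":
        return f"#uw({' '.join(to_indri(a) for a in args)})"
    if op == "OR":
        return f"#syn({' '.join(to_indri(a) for a in args)})"
    if op == "BUT_NOT":
        x, y = args
        return f"#filrej({to_indri(y)} {to_indri(x)})"  # reject y, score by x
    if op == "PHRASE":
        return f"#1({' '.join(args)})"
    if op == "W":
        k, x, y = args
        return f"#uw{k + 2}({to_indri(x)} {to_indri(y)})"  # W/k maps to a window of k+2
    raise ValueError(f"unknown operator: {op}")

# The Final Query: (guide OR strategy OR approval) AND ((family OR "G rated") W/5 (movie OR film))
FINAL = ("AND",
         ("OR", "guide", "strategy", "approval"),
         ("W", 5, ("OR", "family", ("PHRASE", "G", "rated")), ("OR", "movie", "film")))
```

Here `to_indri(FINAL)` yields `#uw(#syn(guide strategy approval) #uw7(#syn(family #1(G rated)) #syn(movie film)))`, which can serve as the `#band` constraint.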
Ranking functions
Bag of words: (guide strategy approval family G rated movie film)
Respect phrase operators: (guide strategy approval family #1(G rated) movie film)
Group synonyms together: (#syn(guide strategy approval) #syn(family #1(G rated)) #syn(movie film))
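All three ranking functions can be derived from the OR groups of the Final Query. A minimal Python sketch; the group encoding (a phrase as a tuple of words) and the names `render`/`ranking_function` are hypothetical.

```python
from typing import List, Sequence, Union

Item = Union[str, Sequence[str]]  # a single word, or a phrase given as a tuple of words

def render(item: Item, phrases: bool) -> str:
    """Render one word or phrase; phrases become #1(...) only when phrases=True."""
    if isinstance(item, str):
        return item
    return f"#1({' '.join(item)})" if phrases else " ".join(item)

def ranking_function(groups: List[List[Item]], variant: str) -> str:
    """Build the bag-of-words, phrase, or synonym ranking function from OR groups."""
    if variant == "terms":
        parts = [render(item, phrases=False) for group in groups for item in group]
    elif variant == "phrases":
        parts = [render(item, phrases=True) for group in groups for item in group]
    elif variant == "synonyms":
        parts = ["#syn(" + " ".join(render(item, phrases=True) for item in group) + ")"
                 for group in groups]
    else:
        raise ValueError(f"unknown variant: {variant}")
    return "(" + " ".join(parts) + ")"

GROUPS = [["guide", "strategy", "approval"], ["family", ("G", "rated")], ["movie", "film"]]
```

The three variants reproduce the three lines above exactly, which makes it easy to compare them with everything else held fixed.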
Experiments and findings
Boolean constraints improve recall and precision.
Structured queries outperform bag-of-words queries.
[Figure: bar charts of est_R@B, est_P@B, est_R@1K, and est_R@25K for the Terms, Phrases, and Synonyms ranking functions, without (w/o BC) and with (w/ BC) Boolean constraints.]
* B is the number of documents matching the Final Query. Its average value is 5000.
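The measures above are cut off at B, the per-topic size of the Boolean result set. The est_ prefix suggests values estimated from sampled relevance judgments; the minimal Python sketch below instead computes exact recall and precision at B, assuming complete judgments on hypothetical toy data.

```python
def recall_at(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k of a ranking."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def precision_at(ranked, relevant, k):
    """Fraction of the top k positions holding relevant documents."""
    return len(set(ranked[:k]) & relevant) / k

# Toy data (hypothetical): B is the number of documents matching the Boolean Final Query.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d3", "d7"}
boolean_matches = {"d1", "d2", "d3", "d5"}
B = len(boolean_matches)
```

With B = 4, the top four documents contain two of the three relevant ones, so R@B = 2/3 and P@B = 0.5.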
Per-topic performance (difference from the median of 29 manual runs)
[Figure: per-topic plots of est_R@B and est_P@B; x-axis: Topic ID; y-axis: difference from median.]
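Each point in these plots is a run's per-topic score minus the median score of the comparison runs on the same topic. A minimal Python sketch; the dictionary layout (topic ID to score) and the toy numbers are hypothetical.

```python
from statistics import median
from typing import Dict, List

def diff_from_median(ours: Dict[int, float], other_runs: List[Dict[int, float]]) -> Dict[int, float]:
    """Per-topic score minus the median score of the comparison runs on that topic."""
    return {topic: score - median(run[topic] for run in other_runs)
            for topic, score in ours.items()}

# Hypothetical scores: our run and three comparison runs on two topics.
ours = {52: 0.4, 53: 0.1}
runs = [{52: 0.2, 53: 0.3}, {52: 0.3, 53: 0.2}, {52: 0.1, 53: 0.4}]
```

Positive values mean the run beat the median on that topic; with 29 runs the median is simply the 15th score in sorted order.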
Routing task of the 2007 Legal Track
Structured queries are known to be hard to construct, but not with supervision.
Questions: Do weighted queries help? Do metadata and annotations help?
A definitive answer comes from supervised structured query construction.
Structured query
#weight( w1 t1 w2 t2 … wn tn )
0.00851 trademark.sentence 0.00846 trademark
0.00665 gmp.product 0.00653 basement.product
0.00625 steenland 0.00606 steenland.sentence
0.00602 gouda.sentence 0.00600 gouda
0.00587 steenland.organization 0.00561 toi
0.00550 toi.sentence 0.00544 lett.product
0.00486 chocol.ti 0.00479 legal.sentence
0.00474 children.per_desc 0.00467 legal.s
0.00459 legal 0.00453 legal.organization
0.00435 kid.sentence 0.00433 kid
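Turning learned (weight, term) pairs into the `#weight` query is a formatting step. A minimal Python sketch; `weight_query` and its `top_n` cutoff are hypothetical. It assumes the annotations are indexed as Indri fields, so a feature such as `trademark.sentence` can be used as a field-restricted term.

```python
from typing import Iterable, Optional, Tuple

def weight_query(pairs: Iterable[Tuple[float, str]], top_n: Optional[int] = None) -> str:
    """Format (weight, term) pairs as an Indri #weight query, highest weight first."""
    ranked = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    body = " ".join(f"{weight:.5f} {term}" for weight, term in ranked)
    return f"#weight( {body} )"
```

For instance, the first two pairs of the list above format as `#weight( 0.00851 trademark.sentence 0.00846 trademark )`.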
Supervised Structured Query Construction
Relevance feedback => supervised learning
Training: a linear SVM classifier with keyword and keyword.field features; f_i are the feature values for terms, chosen to be tf-idf/LM scores.
Retrieval: #weight( w1 t1 w2 t2 … ), where f_i are the tf-idf/LM scores for terms:
$$P(\text{relevant} \mid d) \propto \sum_{i \in \mathrm{UniqTerms}(d)} w_i f_i$$
Advantage: given enough training data, we know for sure whether a type of feature helps.
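The slide names a linear SVM but not the training algorithm. As a stand-in, the minimal Python sketch below uses a Pegasos-style stochastic subgradient method (not necessarily what the authors used), with no bias term so the score matches the sum of w_i f_i. The toy feedback documents, the function names, and the choice to keep only positive weights in the final query are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Features = Dict[str, float]  # feature name (keyword or keyword.field) -> f_i

def train_linear_svm(examples: List[Tuple[Features, int]], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Dict[str, float]:
    """Pegasos-style subgradient training of a linear SVM with no bias term.

    Labels are +1 (relevant) or -1 (non-relevant).
    """
    rng = random.Random(seed)
    w: Dict[str, float] = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(examples, len(examples)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(w[f] * v for f, v in x.items())
            for f in list(w):  # shrink step from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: step toward y * x
                for f, v in x.items():
                    w[f] += eta * y * v
    return dict(w)

def score(w: Dict[str, float], x: Features) -> float:
    """Sum of w_i * f_i over the unique features of one document."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def to_weight_query(w: Dict[str, float], top_n: int = 10) -> str:
    """Keep the top_n positively weighted features as an Indri #weight query."""
    positive = [(f, v) for f, v in w.items() if v > 0]
    top = sorted(positive, key=lambda p: p[1], reverse=True)[:top_n]
    return "#weight( " + " ".join(f"{v:.5f} {f}" for f, v in top) + " )"

# Hypothetical judged feedback documents with tf-idf-like feature values.
EXAMPLES = [
    ({"chocolate": 1.0, "chocolate.sentence": 1.0, "cigarette": 0.5}, +1),
    ({"chocolate": 0.8, "candy.title": 1.0}, +1),
    ({"cigarette": 1.0, "tax": 1.0}, -1),
    ({"tax": 0.9, "tax.title": 1.0}, -1),
]
```

Features that occur only in relevant documents (here `chocolate`) end up with positive weights and appear in the query; features seen only in non-relevant ones (`tax`) get negative weights and are dropped.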
Example Query
<RequestNumber>13</RequestNumber>
<RequestText>All documents to or from employees of a tobacco company or tobacco organization referring to the marketing, placement, or sale of chocolate candies in the form of cigarettes.</RequestText>
<FinalQuery>(cand! OR chocolate) w/10 cigarette!</FinalQuery>
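This Final Query uses `!`, which the translation table does not cover. A minimal Python sketch, assuming `!` is a truncation (root-expansion) operator as in common legal Boolean syntax, and that truncated terms are expanded against the index vocabulary before applying the table's OR → #syn and W/k → #uw(k+2) rules. The vocabulary and helper names are hypothetical.

```python
from typing import Iterable, List, Set

def expand(term: str, vocabulary: Set[str]) -> List[str]:
    """Expand a trailing '!' (truncation) into all vocabulary words with that prefix."""
    if not term.endswith("!"):
        return [term]
    prefix = term[:-1]
    return sorted(word for word in vocabulary if word.startswith(prefix))

def syn(words: List[str]) -> str:
    """Group alternatives with #syn; a single word needs no operator."""
    if not words:
        raise ValueError("truncation matched no vocabulary words")
    return words[0] if len(words) == 1 else "#syn(" + " ".join(words) + ")"

def proximity_query(left_terms: Iterable[str], right_terms: Iterable[str],
                    k: int, vocabulary: Set[str]) -> str:
    """Translate (left OR ...) w/k (right OR ...) using the #uw(k+2) rule."""
    left = syn([w for t in left_terms for w in expand(t, vocabulary)])
    right = syn([w for t in right_terms for w in expand(t, vocabulary)])
    return f"#uw{k + 2}({left} {right})"

# Hypothetical vocabulary; a real one would come from the index.
VOCABULARY = {"candy", "candies", "candidate", "chocolate", "cigarette", "cigarettes"}
```

Here `proximity_query(["cand!", "chocolate"], ["cigarette!"], 10, VOCABULARY)` gives `#uw12(#syn(candidate candies candy chocolate) #syn(cigarette cigarettes))`. Note that `cand!` also pulls in "candidate", a typical cost of truncation.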
Annotations
Feedback query:
NE: bush.person
sentence: violate.sent
meta: television.title
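Each annotated occurrence yields both a plain keyword feature and a keyword.field feature, and the chart groups query terms by feature type. A minimal Python sketch of both steps. The suffixes come from example terms on the slides, but their grouping into NE, sentence, and meta is an assumption, as are the function names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed mapping from a field suffix to the feature types in the chart;
# suffixes are taken from example terms on the slides, the grouping is a guess.
FIELD_TYPE = {
    "person": "NE", "organization": "NE", "product": "NE", "per_desc": "NE",
    "sent": "sentence", "sentence": "sentence",
    "title": "meta", "ti": "meta",
}

def features(annotated_tokens: Iterable[Tuple[str, List[str]]]) -> Counter:
    """Count keyword and keyword.field features for one document."""
    counts: Counter = Counter()
    for term, fields in annotated_tokens:
        counts[term] += 1
        for field in fields:
            counts[f"{term}.{field}"] += 1
    return counts

def feature_type(feature: str) -> str:
    """Classify a feature as keyword, NE, sentence, meta, or other."""
    if "." not in feature:
        return "keyword"
    return FIELD_TYPE.get(feature.rsplit(".", 1)[1], "other")

def type_shares(query_terms: List[str]) -> Dict[str, float]:
    """Fraction of query terms of each feature type."""
    counts = Counter(feature_type(t) for t in query_terms)
    return {kind: n / len(query_terms) for kind, n in counts.items()}
```

Applied to the learned query terms, `type_shares` gives the kind of per-type percentages plotted in the chart.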
[Figure: percentage of query terms by feature type: Keyword, NE, sentence, meta.]
Performance
[Figure: bar charts of MAP, Recall@B, R-Prec, and est_R@B under the uw, pw, and pnw settings; legend: keywords, structured, structured w/o meta.]
MAP, Recall@B, R-Prec: on 39 topics of Legal 2006 (2/3 of judged documents for training, the rest for testing)
est_R@B: on 10 topics of the Legal 2007 routing task
Routing Conclusions
A principled way of constructing structured queries: annotations and query term weights
Answers from a supervised learning algorithm: weights help; annotations help less.
Thank you!
Questions?