Transfer of Predictive Models for Classification of Statutory Texts in Multi-jurisdictional Settings
Jaromir Savelka, Kevin D. Ashley
Intelligent Systems Program, University of Pittsburgh
ISP AI Forum, January 23, 2015
Presentation Overview
Motivation
Task Description
Related and Prior Work
Data from Multiple Jurisdictions
Data Processing
Framework
Experimental Setup
Evaluation and Results
Future Work
Conclusions
Ebola Patient in Texas Presbyterian Hospital
Example Network
Task Description (Manual)
1. Set of candidate statutory texts is retrieved on basis of predefined set of search queries from legal IR system.
2. Expert human annotators go through texts and identify relevant spans, i.e. parts containing relevant legal norms.
3. Each relevant span is represented as numeric code following guidelines provided in codebook (citation and 9 descriptors). [28]
NOTE: 95% confidence interval for average inter-annotator agreement for all tasks
was reported as (63.1%, 74.9%).
[28] PHASYS Codebook [online]
Example Code Assignment
Example statutory provision
The number of patients admitted to any area of the hospital shall not exceed the number for which the area is designed, equipped, and staffed except in cases of emergency, and then only in accordance with the emergency or disaster plan of the hospital. (28 Pa. Code § 101.172)
Corresponding code
28 Pa. Code § 101.172; Hospital (14); Must Do (2); Suspend (29); Rule/Regulations/Restrictions (4); For Emergency Response (2); Non-specified Disaster/Emergency (5); Public/Individuals (27); Silent (0); Silent (0)
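One way to hold such a code in a program is a small record with a citation and nine numeric descriptors. The sketch below is illustrative only: the class name, the field names, and the mapping of the nine values to fields (taken from the order in which they appear in the code) are assumptions, not part of the codebook. It also stores a single value per descriptor, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisionCode:
    """Hypothetical record for one coded provision (names are assumptions)."""
    citation: str
    acting_agent: int
    prescription: int
    action: int
    goal: int
    purpose: int
    emergency_type: int
    receiving_agent: int
    timeframe: int
    condition: int

    def descriptors(self) -> tuple:
        """Return the nine descriptor values in a fixed order."""
        return (self.acting_agent, self.prescription, self.action,
                self.goal, self.purpose, self.emergency_type,
                self.receiving_agent, self.timeframe, self.condition)


# Values copied from the example code for 28 Pa. Code § 101.172;
# which value belongs to which field is assumed from their order.
example = ProvisionCode(
    citation="28 Pa. Code § 101.172",
    acting_agent=14,     # Hospital
    prescription=2,      # Must Do
    action=29,           # Suspend
    goal=4,              # Rule/Regulations/Restrictions
    purpose=2,           # For Emergency Response
    emergency_type=5,    # Non-specified Disaster/Emergency
    receiving_agent=27,  # Public/Individuals
    timeframe=0,         # Silent
    condition=0,         # Silent
)
print(example.descriptors())
```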
Coding Scheme Elements
- Citation
- Relevance
- Acting PHS agent (Who is acting?)
- Prescription
- Action (Which action is being taken?)
- Goal
- Purpose (For what purpose is action being taken?)
- Type of Emergency Disaster
- Receiving PHS agent
- Timeframe (In what timeframe can/must action be taken?)
- Condition
[15] Grabmair et al. 2011, [22] Sweeney et al. 2014
Problem
Task Description (Automated)
- In our work we perform the described tasks automatically, i.e.:
  1. We transform textual data into feature vectors.
  2. We classify vectors in terms of relevance for PHS analysis.
  3. We classify vectors in terms of each of nine code categories.
  4. We evaluate performance of our system with respect to labels created by expert annotators (treated as gold standard).
- In prior work data sparsity was recognized as key element limiting performance.
- We decided to focus on use of data from other jurisdictions as one possible way to mitigate problem of data sparsity.
- Currently, we have developed a framework for transfer of text classification models among different jurisdictions.
Related Work (AI & Law)
1. Classification of legal norms in terms of type. [3], [8], [10], [11], [13]
We classify texts as containing, e.g., obligation (‘must’), permission (‘may’) or prohibition (‘must not’).
2. Classification of legal literature and legislative texts with hierarchically organized topics. [12], [18]
Closely related to classification of the texts in terms of relevance.
3. Rule-based techniques for extraction of specific elements. [3], [10], [11], [13], [24], [25]
We mine texts for presence of similar elements.
4. Classification of EU documents with terms from EuroVoc. [4], [7], [20], [21]
Close to mining texts for specific topical and functional information.
[3] Biagioli et al. 2005, [4] Boella 2012, [7] Daudaravicius 2012, [8] de Maat & Winkels 2007, [10] Francesconi et al. 2010, [11] Francesconi 2009, [12] Francesconi & Peruginelli 2008, [13] Francesconi & Passerini 2007, [18] Opsomer et al. 2009, [20] Pouliquen 2003, [21] Steinberger 2012, [24] Winkels & Hoekstra 2012, [25] Wyner & Peters 2011
Related Work (Transfer Learning)
- Transfer learning, in contrast to traditional ML framework, allows the domains, tasks, and distributions used in training and testing to be different.
- Transfer learning aims to extract the knowledge from one or more source tasks and applies the knowledge to a target task.
[19] Pan & Yang 2010; [9] Evgeniou & Pontil 2004
Prior Work (Results)
| Task | Metric | PA→PA | FL→PA | FL+PA→PA | FL→FL | PA→FL | FL+PA→FL |
|---|---|---|---|---|---|---|---|
| Relevance | F | 0.72 | 0.54 | 0.73 | 0.52 | 0.35 | 0.54 |
| Relevance | P | 0.75 | 0.62 | 0.77 | 0.62 | 0.27 | 0.55 |
| Relevance | R | 0.70 | 0.47 | 0.69 | 0.45 | 0.50 | 0.52 |
| Act. agent | A | 0.49 | 0.30 | 0.52 | 0.36 | 0.25 | 0.44 |
| Prescription | A | 0.76 | 0.72 | 0.77 | 0.77 | 0.75 | 0.75 |
| Action | A | 0.29 | 0.23 | 0.30 | 0.23 | 0.18 | 0.24 |
| Goal | A | 0.32 | 0.17 | 0.32 | 0.20 | 0.16 | 0.25 |
| Purpose | A | 0.59 | 0.53 | 0.61 | 0.58 | 0.61 | 0.62 |
| Emg. Type | A | 0.78 | 0.69 | 0.80 | 0.76 | 0.72 | 0.77 |
| Rec. agent | A | 0.36 | 0.25 | 0.35 | 0.25 | 0.25 | 0.28 |
| Time frame | A | 0.84 | 0.81 | 0.85 | 0.80 | 0.78 | 0.80 |
| Condition | A | 0.77 | 0.68 | 0.75 | 0.65 | 0.65 | 0.67 |
[Figure: two plots over the ten tasks (Relevance, Acting agent, Prescription, Action, Goal, Purpose, Emergency type, Receiving agent, Time frame, Condition); y-axis from 0.1 to 1.]
[15] Grabmair et al. 2011 & [23] Savelka et al. 2014
Prior Work (Similar Traits in Both Jurisdictions)
Intra-jurisdictional classifiers trained for Florida (yellow) and Pennsylvania (blue) show that they both share similar traits.
[Figure: plot of the Florida (yellow) and Pennsylvania (blue) classifiers over the ten tasks (Relevance, Acting agent, Prescription, Action, Goal, Purpose, Emergency type, Receiving agent, Time frame, Condition); y-axis from 0 to 0.9.]
Comparison of Similar PA and FL Provisions
| | COMAR 01.01.2003.18(D)(2) | Fla. Stat. § 943.0312(3) |
|---|---|---|
| Source | CODE OF MARYLAND REGULATIONS | Florida Annotated Statutes |
| Title | TITLE 01. EXECUTIVE DEPARTMENT | TITLE 47. CRIMINAL PROCEDURE AND CORRECTIONS |
| Subtitle / Chapter | SUBTITLE 01. EXECUTIVE ORDERS | CHAPTER 943. DEPARTMENT OF LAW ENFORCEMENT |
| Heading | Establishment of the Governor’s Office Of Homeland Security | Regional domestic security task forces |
| Provision | The Director shall be responsible for the following activities: Advise the Governor on policies, strategies, and measures to enhance and improve the ability to detect, prevent, prepare for, protect against, respond to, and recover from, man-made emergencies or disasters, including terrorist attacks; | The Chief of Domestic Security, in conjunction with the Division of Emergency Management, the regional domestic security task forces, and the various state entities responsible for establishing training standards applicable to state law enforcement officers and fire, emergency, and first-responder personnel shall identify appropriate equipment and training needs, curricula, and materials related to the effective response to suspected or actual acts of terrorism or incidents involving real or hoax weapons of mass destruction [...] |
| Coded representation | Administrative agency [Active agent: 26] of the State [Active agent subset: 2] (homeland security) [Active agent footnote: 502] must [Prescription: 2] advise [Action: 21] the elected officials [Receiving agent: 20] on a plan [Goal: 1] for emergency preparedness, response, and recovery [Purpose: 1, 2 and 4] for an event of terrorist/bioterrorist/biohazardous emergency [Emergency type: 5, 19]. | Law enforcement agency [Active agent: 16] of the State [Active agent subset: 2] must [Prescription: 2] advise [Action: 21] the elected officials [Receiving agent: 20] on a training program, equipment and personnel [Goal: 5, 7, 16] for emergency preparedness, response, and recovery [Purpose: 1, 2 and 4] for an event of terrorist/bioterrorist/biohazardous emergency [Emergency type: 5, 19]. |
Source Data
Partitioning into Subtrees
- Statutory documents are (in comparison to other types of documents) well structured.
- Document can be viewed as a tree graph with given spans of text as nodes and sub-part relations as edges.
- We need to divide each statutory text into smaller parts that could be referred via citations.
Partitioning into Subtrees
Fla. Stat. §101.62
Florida Annotated Statutes
TITLE 9. ELECTORS AND ELECTIONS (Chs. 97-107)
CHAPTER 101. VOTING METHODS AND PROCEDURE
Fla. Stat. §101.62 (2010)
§101.62. Request for absentee ballots
(1) (a) The supervisor shall accept a request for an absentee ballot from an elector in person or in writing. One request shall be deemed sufficient to receive an absentee ballot for all elections through the next regularly scheduled general election, unless the elector or the elector’s designee indicates at the time the request is made the elections for which the elector desires to receive an absentee ballot. Such request may be considered canceled when any first-class mail sent by the supervisor to the elector is returned as undeliverable.
(b) The supervisor may accept a written or telephonic request for an absentee ballot from the elector, or, if directly instructed by the elector, a member of the elector’s immediate family, or the elector’s legal guardian. For purposes of this section, the term “immediate family” has the same meaning as specified in paragraph (4)(b). The person making the request must disclose:
1. The name of the elector for whom the ballot is requested.
2. The elector’s address.
3. The elector’s date of birth.
4. The requester’s name.
5. The requester’s address.
6. The requester’s driver’s license number, if available.
7. The requester’s relationship to the elector.
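The tree view of a statute can be sketched in a few lines. The following is a minimal illustration, not the parser used in this work: the `Node` class, the `text_units` helper, and the way citations are formed by concatenating labels along the path are all assumptions. The tree is built by hand from part of the Fla. Stat. §101.62 example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One span of statutory text; children are its sub-parts (hypothetical type)."""
    label: str   # e.g. "(1)", "(a)", "1."; empty for the section root
    text: str
    children: list = field(default_factory=list)


def text_units(node, prefix):
    """Yield (citation, text) for a node and, recursively, all its sub-parts."""
    citation = prefix + node.label
    yield citation, node.text
    for child in node.children:
        yield from text_units(child, citation)


# Hand-built fragment of the example section (only two list items shown).
root = Node("", "Request for absentee ballots", [
    Node("(1)", "", [
        Node("(a)", "The supervisor shall accept a request for an absentee "
                    "ballot from an elector in person or in writing."),
        Node("(b)", "The person making the request must disclose:", [
            Node("1.", "The name of the elector for whom the ballot is requested."),
            Node("2.", "The elector's address."),
        ]),
    ]),
])

units = dict(text_units(root, "Fla. Stat. § 101.62"))
for citation in units:
    print(citation)
```

Each key of `units` is a citation that points at exactly one node of the tree, which is the property needed to link a text unit back to its place in the statute.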
Selected Properties of Data Sets
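| state | # statutes | # text units | # relevant | # codes |
|---|---|---|---|---|
| AK | 135 | 1965 | 331 | 386 |
| CA | 1174 | 19857 | 2296 | 2712 |
| FL | 464 | 16618 | 1033 | 1476 |
| KS | 304 | 5003 | 713 | 1190 |
| MD | 248 | 7593 | 687 | 760 |
| ND | 208 | 3114 | 458 | 656 |
| PA | 808 | 10882 | 1665 | 1873 |
| TX | 811 | 30474 | 1462 | 1712 |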
- The individual text units are stored in XML files (one for each state).
- These files are the starting point for all of our experiments.
- There are 18,998 unique terms/lemmas (i.e., features) after stop-words removal.
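As a quick worked calculation, the share of relevant text units per state can be derived from the counts in the table. This ratio is not reported on the slide; it is computed here only to illustrate how small the relevant portion of each data set is. The dictionary and function names are illustrative.

```python
# (statutes, text units, relevant units, codes) per state, copied from the table.
DATASETS = {
    "AK": (135, 1965, 331, 386),
    "CA": (1174, 19857, 2296, 2712),
    "FL": (464, 16618, 1033, 1476),
    "KS": (304, 5003, 713, 1190),
    "MD": (248, 7593, 687, 760),
    "ND": (208, 3114, 458, 656),
    "PA": (808, 10882, 1665, 1873),
    "TX": (811, 30474, 1462, 1712),
}


def relevant_share(state):
    """Fraction of a state's text units that were marked relevant."""
    _, units, relevant, _ = DATASETS[state]
    return relevant / units


# List states from the smallest to the largest share of relevant units.
for state in sorted(DATASETS, key=relevant_share):
    print(f"{state}: {relevant_share(state):.1%}")
```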
Labels
[Figure: label frequency plots for the Acting agent, Emergency type, and Prescription tasks.]
Framework: Data Sets
At minimum, framework assumes existence of labeled dataset
$$D_{train} = \langle X_{train}, Y_{train} \rangle \in \mathcal{D}_{target}$$
In addition, there may be an arbitrary number of labeled datasets
$$D_{aux} = \langle X_{aux}, Y_{aux} \rangle \in \mathcal{D}_{aux} \sim \mathcal{D}_{target}$$
Goal is to train $f(\cdot)$ which performs well on unseen $x^{(i)}_{test} \in \mathcal{D}_{target}$.
Framework uses $D_{aux}$ to train $f(\cdot)$ which performs better than predictive function trained on $D_{train}$ only.
Underlying idea is to train a number of different $f_i(\cdot)$ on different $D_i$ and decide about their usefulness in particular contexts.
Framework: Predictive Models
Framework does not rely on a specific model of $f(\cdot)$.
For different datasets different models or combinations of models may be used.
Instead of actual prediction for $x^{(i)}_{test}$, probability distribution over label space is used.
Therefore, $f(\cdot)$ should be capable of providing probability distribution (or at least some score for each possible $y_j$).
$$f(x^{(i)}_{test}) \rightarrow \langle p(y_1), p(y_2), \ldots, p(y_m) \rangle$$
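When a model yields only raw scores per label, one common way to obtain a probability distribution is a softmax normalization. The slides do not say which normalization, if any, was used, so the sketch below is an assumption; the function name is illustrative.

```python
import math


def to_distribution(scores):
    """Map arbitrary real-valued label scores to probabilities via softmax."""
    top = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three labels y1, y2, y3.
probs = to_distribution([2.0, 1.0, -1.0])
print(probs)
```

The output keeps the ranking of the scores and sums to one, which is all the framework requires of $f(\cdot)$.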
Framework: Training
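We train a predictive function $f_{train}(\cdot)$ on $D_{train}$.
In addition, we train $f^{(i)}_{aux}(\cdot)$ for each available $D^{(i)}_{aux}$.
Next we generate accuracy matrix:
$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{i,j} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}$$
where
$$a_{i,j} = \frac{1}{n} \sum_{k=1}^{n} \left[ f^{(i)}(x^{(k)}) = j \right]$$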
Framework: Prediction
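First, we generate a prediction matrix:
$$P(x^{(k)}) = \begin{pmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,n} \\ p_{2,1} & p_{i,j} & \cdots & p_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m,1} & p_{m,2} & \cdots & p_{m,n} \end{pmatrix}$$
We can perform element-wise multiplication of $A$ and $P(x^{(k)})$ to obtain confidence matrix for $x^{(k)}$:
$$C(x^{(k)}) = A \odot P(x^{(k)}) = \begin{pmatrix} a_{1,1} \times p_{1,1} & \cdots & a_{1,n} \times p_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} \times p_{m,1} & \cdots & a_{m,n} \times p_{m,n} \end{pmatrix}$$
Each $a_{i,j} \times p_{i,j}$ can be understood as our confidence that $x^{(k)}$ should be labeled with class $j$ emulated by $f_i(\cdot)$.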
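The element-wise product is simple to sketch with nested lists, with rows standing for models and columns for classes. The matrices below are made-up numbers, and picking the single largest entry of the confidence matrix as the final decision is an assumption: the slide does not state how the matrix is turned into a label.

```python
def confidence_matrix(A, P):
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    if len(A) != len(P) or any(len(ra) != len(rp) for ra, rp in zip(A, P)):
        raise ValueError("A and P must have the same shape")
    return [[a * p for a, p in zip(ra, rp)] for ra, rp in zip(A, P)]


def most_confident(C):
    """Return (model index, class index) of the largest entry of C (assumed rule)."""
    cells = ((i, j) for i, row in enumerate(C) for j in range(len(row)))
    return max(cells, key=lambda ij: C[ij[0]][ij[1]])


# Hypothetical accuracy matrix A and prediction matrix P for 2 models, 2 classes.
A = [[0.9, 0.4],
     [0.6, 0.8]]
P = [[0.3, 0.7],
     [0.2, 0.8]]

C = confidence_matrix(A, P)
print(C)
print(most_confident(C))
```

In this toy case the second model's vote for the second class wins because it is both a confident prediction and one the model has historically handled well.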
Experiments
We generate following data sets:
$$D^{(i)}_{train} = \langle X^{(i)}_{train}, Y^{(i)}_{train} \rangle \quad \text{(100 times)}$$
$$D^{(i)}_{test} = \langle X^{(i)}_{test}, Y^{(i)}_{test} \rangle \quad \text{(100 times)}$$
$$D^{(i)}_{aux} = \langle X^{(i)}_{aux}, Y^{(i)}_{aux} \rangle \quad \text{(\# of auxiliary states)}$$
For each task we conduct 8 related experiments:
(AK, MD, TX, KS, CA, ND, PA)
(KS, PA, AK, ND, CA, TX, MD)
(PA, CA, ND, MD, AK, TX, KS)
In related experiments there are 100 runs for first and eighth experiments and 300 runs for other experiments.
Experiments show how performance changes as we use more $D^{(i)}_{aux}$.
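The run counts can be reproduced with a short calculation. The sketch assumes that each of the three orderings adds auxiliary states one at a time and that every distinct set of auxiliary states is evaluated on all 100 train/test splits; this reading is an interpretation of the slide, chosen because it yields the stated 100 and 300 runs.

```python
# The three orderings of auxiliary states listed on the slide.
ORDERINGS = [
    ("AK", "MD", "TX", "KS", "CA", "ND", "PA"),
    ("KS", "PA", "AK", "ND", "CA", "TX", "MD"),
    ("PA", "CA", "ND", "MD", "AK", "TX", "KS"),
]
SPLITS = 100  # number of generated train/test pairs


def aux_configurations(k):
    """Distinct sets of auxiliary states when the first k states of each ordering are used."""
    return {frozenset(order[:k]) for order in ORDERINGS}


def runs(k):
    """Number of runs in the experiment that uses k auxiliary states."""
    return SPLITS * len(aux_configurations(k))


run_counts = [runs(k) for k in range(8)]
print(run_counts)
```

With no auxiliary states, or with all seven, the three orderings collapse into a single configuration, which explains why only the first and eighth experiments have 100 runs.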
Training and Test Set Vectorization
We create vectorized data sets $X^{n \times m}$ with rows as documents and columns as terms by setting each entry of matrix to:
$$\mathrm{weight}(t, d, D) = \mathrm{tf}(t, d) \cdot \log(\mathrm{idf}(t, D))$$
- $t$: term
- $d$: document
- $D$: document collection
- $\mathrm{tf}(t, d)$: number of occurrences of $t$ in $d$
- $\mathrm{idf}(t, D)$: number of $d \in D$ over number of $d \in D$ containing $t$
Each $x^{(i)} \in X^{n \times m}$ is vector with $m$ dimensions, where $m$ is number of unique terms that occur in document collection.
Each $x^{(i)} \in X^{n \times m}$ is referenced with unique citation connecting vector to text unit from which it originates.
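The weighting scheme can be written out directly from the definitions of tf and idf given here. This is a minimal sketch on three toy documents; tokenization, lemmatization, and stop-word removal are assumed to have happened already, and the function name and sample tokens are illustrative.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build the document-term matrix with entries tf(t, d) * log(N / df(t)).

    docs is a list of token lists; returns (vocabulary, rows).
    """
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    # df[t]: number of documents that contain term t at least once
    df = Counter(t for d in docs for t in set(d))
    rows = []
    for d in docs:
        tf = Counter(d)  # missing terms count as 0
        rows.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


docs = [
    ["hospital", "shall", "admit", "patients"],
    ["director", "shall", "advise", "governor"],
    ["hospital", "emergency", "plan", "hospital"],
]
vocab, rows = tfidf_matrix(docs)
print(vocab)
print(rows[2])
```

A term that occurs in every document receives weight zero under this scheme, since log(1) = 0.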
Evaluation Metrics
Precision
Ratio of correctly retrieved instances over all instances that were retrieved.
$$P(f(\cdot), D) = \sum_{i=1}^{n} \frac{\left| f(x^{(i)}) \cap y^{(i)} \right|}{\left| f(x^{(i)}) \right|}$$
Recall
Ratio of correctly retrieved instances over all instances that should have been retrieved.
$$R(f(\cdot), D) = \sum_{i=1}^{n} \frac{\left| f(x^{(i)}) \cap y^{(i)} \right|}{\left| y^{(i)} \right|}$$
F1 Measure
Harmonic mean of precision and recall where both measures are treated as equally important.
$$F_1(P(f(\cdot), D), R(f(\cdot), D)) = \frac{2 \cdot P(\cdot) \cdot R(\cdot)}{P(\cdot) + R(\cdot)}$$
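For a single text unit, the three measures reduce to set operations on the predicted and gold label sets. The sketch below covers one instance only and leaves aggregation over a data set aside; returning 0.0 for an empty set is an assumed convention, and the label sets are invented for illustration.

```python
def precision(predicted, gold):
    """Share of predicted labels that are also in the gold set."""
    predicted, gold = set(predicted), set(gold)
    return len(predicted & gold) / len(predicted) if predicted else 0.0


def recall(predicted, gold):
    """Share of gold labels that were predicted."""
    predicted, gold = set(predicted), set(gold)
    return len(predicted & gold) / len(gold) if gold else 0.0


def f1(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical label sets for one text unit.
predicted = {2, 21, 43}
gold = {2, 21, 48, 5}

p, r = precision(predicted, gold), recall(predicted, gold)
print(p, r, f1(p, r))
```

As a sanity check on the harmonic mean, `f1(0.56, 0.5)` rounds to 0.53, matching the P, R, and F1 values listed for the 0aux column of the improvement example.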
Results (F1-measure)
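Florida

| task | 0aux | 1aux | 2aux | 3aux | 4aux | 5aux | 6aux | 7aux |
|---|---|---|---|---|---|---|---|---|
| AA | .43 | .45 | .45 | .46 | .47 | .47 | .48 | .48 |
| PR | .78 | .80 | .81 | .82 | .82 | .82 | .82 | .82 |
| AC | .21 | .22 | .23 | .24 | .24 | .25 | .26 | .26 |
| GL | .25 | .27 | .28 | .28 | .29 | .29 | .30 | .30 |
| PP | .67 | .70 | .71 | .71 | .72 | .72 | .72 | .72 |
| ET | .78 | .79 | .79 | .79 | .80 | .80 | .79 | .80 |
| RA | .30 | .30 | .31 | .31 | .32 | .32 | .33 | .33 |
| CN | .62 | .66 | .67 | .67 | .67 | .67 | .67 | .67 |
| TF | .80 | .81 | .82 | .83 | .83 | .83 | .83 | .83 |

Maryland

| task | 0aux | 1aux | 2aux | 3aux | 4aux | 5aux | 6aux | 7aux |
|---|---|---|---|---|---|---|---|---|
| AA | .42 | .44 | .45 | .47 | .48 | .50 | .50 | .51 |
| PR | .86 | .89 | .89 | .89 | .89 | .90 | .90 | .90 |
| AC | .24 | .25 | .26 | .26 | .27 | .27 | .28 | .28 |
| GL | .27 | .29 | .30 | .31 | .32 | .32 | .33 | .33 |
| PP | .74 | .77 | .78 | .78 | .78 | .78 | .79 | .79 |
| ET | .73 | .76 | .76 | .77 | .77 | .77 | .78 | .78 |
| RA | .30 | .30 | .31 | .31 | .31 | .32 | .32 | .32 |
| CN | .58 | .63 | .63 | .63 | .64 | .63 | .64 | .64 |
| TF | .81 | .83 | .84 | .84 | .84 | .84 | .85 | .85 |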
[Figure: two plots, Florida and Maryland, over the tasks AA, PR, AC, GL, PP, ET, RA, CN, TF; y-axis from 0 to 1.]
Comparison to Prior Work
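| task | metric | +0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 |
|---|---|---|---|---|---|---|---|---|---|
| AA | P | .42 | .42 | .42 | .42 | .43 | .43 | .43 | .43 |
| AA | R | .45 | .44 | .44 | .45 | .45 | .45 | .45 | .45 |
| AA | F | .43 | .43 | .44 | .44 | .44 | .44 | .44 | .44 |
| PP | P | .66 | .66 | .66 | .66 | .66 | .66 | .66 | .66 |
| PP | R | .70 | .70 | .70 | .70 | .70 | .70 | .70 | .70 |
| PP | F | .67 | .68 | .68 | .68 | .68 | .68 | .68 | .68 |
| ET | P | .78 | .79 | .79 | .80 | .80 | .80 | .80 | .80 |
| ET | R | .79 | .80 | .80 | .80 | .80 | .80 | .81 | .81 |
| ET | F | .78 | .79 | .79 | .80 | .80 | .80 | .80 | .80 |

| task | metric | +0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 |
|---|---|---|---|---|---|---|---|---|---|
| AA | P | .42 | .42 | .42 | .42 | .42 | .41 | .42 | .42 |
| AA | R | .45 | .48 | .50 | .52 | .54 | .55 | .56 | .57 |
| AA | F | .43 | .45 | .45 | .46 | .47 | .47 | .48 | .48 |
| PP | P | .66 | .65 | .64 | .65 | .65 | .64 | .64 | .64 |
| PP | R | .70 | .75 | .78 | .80 | .81 | .82 | .83 | .84 |
| PP | F | .67 | .70 | .71 | .71 | .72 | .72 | .72 | .72 |
| ET | P | .78 | .76 | .75 | .75 | .75 | .75 | .75 | .75 |
| ET | R | .79 | .82 | .84 | .84 | .84 | .85 | .85 | .86 |
| ET | F | .78 | .79 | .79 | .79 | .80 | .80 | .79 | .80 |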
[Figure: two plots over the tasks AA, PP, ET; y-axis from 0.3 to 0.9.]
Improvement Example
task Manual 0aux 1aux 2-5aux 6aux 7aux
AA 26 20 26 26 26 26
26
PR 2 1 1 1 2 2
2 2
AC 21 43 21 21 21 21
48 43 48 48 48
48
GL 1 50 50 50 50 50
PP 1 1 1 1 1 1
2 2 2 2 2 2
4 4 4 4 4 4
ET 5 5 5 5 5 5
19 19 19 19 19
RA 20 20 20 20 20 20
CN 0 7 7 0 0 0
29 29
30 30
40 40
41 41
TF 0 0 0 0 0 0
P 1 .56 .59 .78 .83 .83
R 1 .5 .78 .89 .89 .89
F1 1 .53 .67 .83 .86 .86
Future Work
- Implement similar framework for the relevance task.
- Experiment with techniques to handle imbalanced and sparse data sets, e.g. SMOTE. [6] Chawla et al. 2002
- Experiment with overlay framework for multi-dimensional classification. [2] Batal et al. 2013
- Generate richer text representation (automatic annotation).
- Experiment with learning tasks simultaneously (multi-task learning). [9] Evgeniou & Pontil 2004
- Experiment with other transfer learning techniques. [19] Pan & Yang 2010
- Utilize existing knowledge:
  - codebook [28] Codebook [online]
  - tables of corresponding agents from different states
  - data generated by network analysis
Conclusions
- We have presented a framework for transfer of text categorization models among different US state jurisdictions.
- Performance of most classifiers gradually improves as we use models from increasing number of states.
- Relatedness of domains as well as tasks we deal with was confirmed.
- Possible way to deal with data sparsity was further explored and confirmed as promising.
- The framework’s potential benefits are not limited to context of United States.
41
![Page 47: Transfer of Predictive Models for Classification of](https://reader034.vdocuments.us/reader034/viewer/2022042201/625865f1df82e1313971908a/html5/thumbnails/47.jpg)
References I
[1] Aggarwal, Charu & Zhai, ChengXiang (eds.). Mining Text Data. Springer, 2012.
[2] Batal, I., Hong, C., and Hauskrecht, M., An Efficient Probabilistic Framework for Multi-Dimensional Classification. ACM Conference on Information and Knowledge Management. San Francisco (2013).
[3] Biagioli, C., Francesconi, E., Passerini, A., Montemagni, S., and Soria, C., Automatic Semantics Extraction in Law Documents, ICAIL 2005 Proceedings, 133–140, ACM Press (2005).
[4] Boella, G., Di Caro, L., Lesmo, L., Rispoli, D., and Robaldo, L., Multi-label Classification of Legislative Text into EuroVoc. JURIX 2012 Proceedings, pp. 21–30, B. Schafer (Ed.), IOS Press (2012).
[5] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
[6] Chawla, N.V., Bowyer, K.W., Hall, L.O., and Kegelmeyer, W.P., SMOTE: synthetic minority over-sampling technique. J. Artif. Int. Res. 16:321–357 (2002).
[7] Daudaravicius, V., Automatic multilingual annotation of EU legislation with Eurovoc descriptors. EEOP2012 Workshop Proceedings (2012).
![Page 48: Transfer of Predictive Models for Classification of](https://reader034.vdocuments.us/reader034/viewer/2022042201/625865f1df82e1313971908a/html5/thumbnails/48.jpg)
References II
[8] de Maat, E., Winkels, R., Categorisation of norms. JURIX 2007, pp. 79–88, IOS Press (2007).
[9] Evgeniou, Theodoros & Pontil, Massimiliano. Regularized Multi-Task Learning. KDD’04. Seattle, WA, USA, 2004.
[10] Francesconi, E., Montemagni, S., Peters, W., and Tiscornia, D., Integrating a Bottom-Up and Top-Down Methodology for Building Semantic Resources for the Multilingual Legal Domain. In Semantic Processing of Legal Texts. LNAI 6036, pp. 95–121. Springer: Berlin (2010).
[11] Francesconi, E., An Approach to Legal Rules Modelling and Automatic Learning. JURIX 2009 Proceedings (G. Governatori, Ed.), 59–68, IOS Press (2009).
[12] Francesconi, E., and Peruginelli, G., Integrated Access to Legal Literature through Automated Semantic Classification. Artificial Intelligence and Law 17:31–49 (2008).
[13] Francesconi, E., and Passerini, A., Automatic Classification of Provisions in Legislative Texts. Artificial Intelligence and Law 15:1–17 (2007).
[14] Jones, Karen Sparck. “A statistical interpretation of term specificity and its application in retrieval.” Journal of Documentation 28.1 (1972): 11–21.
![Page 49: Transfer of Predictive Models for Classification of](https://reader034.vdocuments.us/reader034/viewer/2022042201/625865f1df82e1313971908a/html5/thumbnails/49.jpg)
References III
[15] Grabmair, M., Ashley, K.D., Hwa, R., and Sweeney, P.M., Toward Extracting Information from Public Health Statutes using Text Classification and Machine Learning. JURIX 2011 Proceedings, pp. 73–82 (Katie M. Atkinson ed.) IOS Press 2011.
[16] Kakwani, N., On a class of poverty measures. Econometrica, pp. 437–446 (1980).
[17] Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., and McClosky, D., The Stanford CoreNLP Toolkit. In 52nd Annual Meeting of the ACL: System Demonstrations, pp. 55–60.
[18] Opsomer, R., De Meyer, G., Cornelis, C., van Eetvelde, G., Exploiting Properties of Legislative Texts to Improve Classification Accuracy. JURIX 2009 (G. Governatori, Ed.), 136–145, IOS Press (2009).
[19] Pan, Sinno Jialin & Yang, Qiang. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
[20] Pouliquen, B., Steinberger, R., and Ignat, C., Automatic annotation of multilingual text collections with a conceptual thesaurus. arXiv preprint cs/0609059 (2006).
[21] Steinberger, R., Ebrahim, M., and Turchi, M., JRC EuroVoc Indexer JEX - A freely available multi-label categorisation tool. arXiv preprint arXiv:1309.5223 (2013).
![Page 50: Transfer of Predictive Models for Classification of](https://reader034.vdocuments.us/reader034/viewer/2022042201/625865f1df82e1313971908a/html5/thumbnails/50.jpg)
References IV
[22] Sweeney, P.M., Bjerke, E.F., Potter, M.A., Guclu, H., Keane, C.R., Ashley, K.D., Grabmair, M., Hwa, R., Network Analysis of Manually-Encoded State Laws and Prospects for Automation. In Winkels, R., Lettieri, N., Faro, S., (Eds.) Network Analysis in Law. Diritto Scienza Tecnologia (2014).
[23] Savelka, J., Ashley, K.D., Grabmair, M., Mining Information from Statutory Texts in Multi-jurisdictional Settings. In Hoekstra, R. (Ed.) JURIX 2014. IOS Press (2014).
[24] Winkels, R., and Hoekstra, R., Automatic Extraction of Legal Concepts and Definitions. JURIX 2012, pp. 157–166, IOS Press (2012).
[25] Wyner, A., and Peters, W., On Rule Extraction from Regulations. JURIX 2011, pp. 113–122, IOS Press (2011).
[26] Zhang, M., and Zhou, Z., ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition 40.7:2038–2048 (2007).
[27] Lucene [online]. 2012 [cit. 08/27/2014]. Accessed at: http://lucene.apache.org/core/
[28] PHASYS ARM 2 - LEIP codebook [online]. Revised 11/18/2012 [cit. 08/27/2014]. Accessed at: http://www.phasys.pitt.edu/
![Page 51: Transfer of Predictive Models for Classification of](https://reader034.vdocuments.us/reader034/viewer/2022042201/625865f1df82e1313971908a/html5/thumbnails/51.jpg)
Thank you!
Questions, comments and suggestions are welcome now or any time at [email protected].
This work was supported by the University of Pittsburgh’s University Research Council Multidisciplinary Small Grant Program. This publication was also supported in part by the Cooperative Agreement 5P01TP000304 from the Centers for Disease Control and Prevention. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the CDC.