
MEMO

To:    Students in AI and Legal Reasoning Seminar
From:  Kevin Ashley
Re:    Collaborative Design Problem
Date:  January 3, 2017

Students,

This is a draft that I prepared recently in connection with a proposal for sabbatical funding for spring 2018. It describes the kind of project that I would like us to focus on for the seminar Collaborative Design Problem. Please read this for the first class on January 11, 2017, and we can discuss it.

Best,
KA


ANNOTATING LEGAL CASES AND STATUTES FOR ARGUMENT MINING AND PEDAGOGY

New text analytic techniques promise to revolutionize legal information retrieval, but the machine learning on which they depend requires sets of training instances comprising manually annotated legal cases and statutes. "Annotating" means marking up the texts of case decisions or statutes to identify instances of semantic types of information that are both important for conceptual legal information retrieval and pedagogically relevant. To meet this need for manual annotation of legal texts, students learning law could perform the annotations as part of their studies. As a by-product, they would produce the semantically annotated legal texts with which machine learning programs could be taught to perform similar annotation of new texts automatically.

AN OPPORTUNITY AND AN OBSTACLE

Recent research and developments in question answering, information extraction, and argument mining from text have planted the seeds of this revolution with programs like IBM's Watson and Debater and their underlying open-source information management architectures. These programs will not perform legal reasoning themselves, but their open-source text analysis tools will enable the development of new legal applications. The machine-learning-based tools will identify argument-related information in legal texts that can transform legal information retrieval into a new kind of conceptual information retrieval: argument retrieval. As a result, lay and professional legal practitioners will be able to retrieve information more effectively from legal databases. Conceptual legal information retrieval addresses the need in common law, and increasingly in civil law, jurisdictions to retrieve legal case opinions whose relevance lies in how they can be used in particular argument schemes for analyzing new problems.

While new legal apps such as Lex Machina,[1] Ross,[2] and Ravel[3] employ text analytics, and some can even make predictions, they do not address the substantive merits of a case and cannot make legal arguments or explain their predictions, because they lack computational models of legal reasoning or argument. The computer science research field of Artificial Intelligence and Law (AI & Law) provides such computational models, and semantic annotation helps to fill the gap between legal case texts and those models of legal reasoning, enabling a computer to recognize, understand, and process elements of a court's reasoning.

The newly extracted argument-related information enables the computational models of legal reasoning and argument to deal directly with legal texts. These models will perform legal reasoning; they can generate arguments for and against particular outcomes in problems input as texts, predict a problem's outcome, and explain their predictions with reasons that legal professionals will recognize and can evaluate for themselves. The result will be a new kind of legal app, one that enables cognitive computing, a kind of collaborative activity between humans and computers in which each does the kind of intelligent activity that it can do best. For particular domains, it will support cognitive computing legal applications that can help humans to pose and test legal hypotheses, predict outcomes, and explain the predictions.

For argument mining to succeed, however, the machine learning programs need sets of training instances manually annotated by humans. The legal decisions can be annotated in terms of certain roles that sentences play in legal argument.

[1] Surdeanu, M., Nallapati, R., Gregory, G., Walker, J., and Manning, C. 2011. Risk Analysis for Intellectual Property Litigation. Proceedings of the 13th International Conference on Artificial Intelligence and Law. Pages 116–120. ACM.
[2] Ross Intelligence. 2015. Ross: Your Brand New Super Intelligent Attorney. (Accessed: December 30, 2015).
[3] Ravel Law. 2015a. Ravel: Data Driven Research. (Last accessed: December 30, 2015).


The sentence annotation types include stating a legal rule, expressing a judge's holding that a rule requirement has been satisfied (or not), reporting a finding of fact, describing evidence, and others. In addition, the decisions can be marked up in terms of general structural features of arguments, such as premises and conclusions, as well as substantive features of legal domains. Such features include legal factors, patterns of fact that strengthen or weaken a side's position on a claim. In related work, researchers are developing annotation schemes with pedagogical potential for marking up statutes.

A substantial obstacle, however, involves the question of who will perform the manual annotation of training sets. While crowd-sourced text annotation may succeed with some kinds of texts, marking up legal cases and statutes likely requires annotators with some level of legal expertise. Legal professionals, however, are often unwilling to devote substantial amounts of time to the task.

SOLUTION VISION

From an educational viewpoint, I hypothesize that students of law could learn valuable lessons from annotating legal texts. The annotating task would draw students' attention to key aspects of the reasoning in a legal case: the roles that certain sentences play in legal argument, structural features of argumentation, and substantive strengths and weaknesses of a legal argument involving a particular area of law. Indeed, in US law schools, much of the first-year curriculum is aimed at inculcating skills of identifying these aspects as students learn to "read a case." Annotation exercises supported by new technological techniques could fit well into the existing law school curriculum and reinforce these important lessons. As they learn, the students would produce useful data with which machine learning programs can learn to annotate texts automatically.

Three components are necessary in order to support students in annotation as a learning activity.

First, in order to incorporate annotation into the curriculum for law students (graduate students, undergraduates, and even high school students), one needs to develop convenient web-based markup environments that could be used on tablet computers or laptops and that would make annotation almost as convenient as highlighting texts online.

Second, the annotation environments must be part of an online system that ensures annotator reliability (i.e., agreement) across multiple students annotating the same documents. Maintaining reliability is important for enabling successful machine learning;[4] a sketch of one standard agreement measure appears after the third component below.

Third, one needs to tailor the annotation activities into a legal curriculum by developing a set of pedagogical materials that not only guide students in annotation but also underscore the lessons to be drawn from the annotation experience. This may include computer-supported peer review, in which student annotators critique each other's annotations, as well as formative evaluation instruments for assessing whether and what students have learned from the annotation.
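To make the reliability component concrete, here is a minimal sketch, in Python, of Cohen's kappa, a standard chance-corrected measure of agreement between two annotators. The label names and data are invented for illustration; this is not part of WebAnno or any existing system.

    from collections import Counter

    def cohens_kappa(ann_a, ann_b):
        """Chance-corrected agreement between two annotators who labeled
        the same sequence of sentences."""
        assert len(ann_a) == len(ann_b) and ann_a
        n = len(ann_a)
        observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
        # Agreement expected by chance, from each annotator's label distribution.
        freq_a, freq_b = Counter(ann_a), Counter(ann_b)
        expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
        return (observed - expected) / (1 - expected)

    # Two students label the same six sentences with LUIMA-style types.
    student1 = ["Citation", "LegalRule", "Evidence", "FindingOfFact", "Evidence", "Holding"]
    student2 = ["Citation", "LegalRule", "Evidence", "Evidence", "Evidence", "Holding"]
    print(f"kappa = {cohens_kappa(student1, student2):.2f}")  # kappa = 0.78

An online system could compute such statistics continuously as students annotate and flag documents whose agreement falls below a threshold for discussion or peer review.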

NEED FOR DAAD SUPPORT

My goals during the sabbatical period are to strengthen my connections to, and research collaborations with, particular German research groups whose technology and insights will be instrumental in realizing the vision, especially regarding the first two of the above components. DAAD's support will enable me to pursue these collaborations. In particular, I hope to spend time in extended discussions with the following groups:

• Ubiquitous Knowledge Processing (UKP) Lab, Dept. of Computer Science, Technische Universität Darmstadt (TUDA), Dr. Iryna Gurevych, Director.

[4] See Aharoni, E., Polnarov, A., Lavee, T., Hershcovich, D., Levy, R., Rinott, R., Gutfreund, D., and Slonim, N. 2014. A benchmark dataset for automatic detection of claims and evidence in the context of controversial topics. ACL 2014, 64.


• Institute of Philosophy, Karlsruhe Institute of Technology (KIT), Prof. Gregor Betz.
• Software Engineering for Business Information Systems, Department of Informatics, Technische Universität München (TUM), Prof. Dr. Florian Matthes and Bernhard Waltl.

The UKP Lab in Darmstadt has developed the WebAnno tool, a convenient online annotation environment that supports annotation through crowdsourcing. My research colleagues are already using WebAnno in small-scale annotation applications, and we would like to extend it into a pedagogical environment.

The KIT group is working on the development of text corpora for policy argumentation and of argumentation schemes that extend surface annotations into deeper interpretative argument analyses.

The TUM team is focused on annotating more indirect, inferential citation references in legal texts and on semantic analysis of legal statutory and regulatory texts in connection with business compliance. Since annotation of legal documents (e.g., contracts, briefs, and memoranda) could also be an important training activity for in-house counsel, legal associates, and paralegals, the Munich team's experience applying annotation in corporate legal settings could lead to mutually beneficial collaborations.

BACKGROUND AND RELATED WORK

My book, Artificial Intelligence and Text Analytics: New Tools for Law Practice in the Digital Age, to be published by Cambridge University Press in late spring 2017, explains how the text analytic technologies enable new tools for legal practice using computational models of legal reasoning and argumentation developed by AI & Law researchers.[5] It introduces the techniques for automated question answering, information extraction, and argument mining from texts, as well as various AI & Law computational models of legal reasoning and argument that the text analytic technologies will connect directly to legal texts.

The computational models illustrate how to represent legal cases so that a computer program can reason about whether they are analogous to a case to be decided.[6] They illustrate ways in which a program can compare a problem and cases, select the most relevant cases, and generate legal arguments by analogy for and against a conclusion in a new case. Some of the computational models of legal analogy integrate values and policies into the measures of case relevance, predict case outcomes, and explain the predictions in terms of arguments.[7]

The Artificial Intelligence and Text Analytics book introduces a specialized kind of ontology for information extraction, "type systems," which are a basic text analytic tool. Type systems support automatically marking up or annotating legal texts semantically in terms of important concepts. It explains how case and statutory texts are represented for purposes of applying machine learning, natural language processing, and manually constructed rules to extract information.[8]

[5] Ashley, K. D. 2017. Artificial Intelligence and Legal Analytics. Cambridge University Press, Cambridge; New York.
[6] See, e.g.:
− The Value Judgment-based Argumentative Prediction Model (VJAP), Grabmair, M. 2016. Modeling Purposive Legal Argumentation and Case Outcome Prediction Using Argument Schemes in the Value Judgment Formalism. Ph.D. thesis, University of Pittsburgh, Pittsburgh, PA, USA;
− The Carneades model, Gordon, T., Prakken, H., and Walton, D. 2007. The Carneades model of argument and burden of proof. Artificial Intelligence, Argumentation in Artificial Intelligence, 171(10–15), 875–896;
− The Value-Based Argumentation Framework model (VAF), Atkinson, K., and Bench-Capon, T. 2007. Argumentation and standards of proof. Proceedings of the 11th International Conference on Artificial Intelligence and Law. Pages 107–116. ACM.
[7] Notably, Grabmair's VJAP and Atkinson and Bench-Capon's VAF models, ibid.

Citation: Sentence includes a citation to a legal authority.
Legal rule: Sentence states a legal rule in the abstract, without applying it to particular facts.
Legal rule requirement: Sentence that states a requirement or element of a legal rule in the context of applying the requirement to the facts of the particular case being litigated.
Legal ruling or holding of law: Sentence states a legal ruling or holding of law by the judge.
Evidence-based finding of fact: Sentence reports the fact finder's finding on whether or not evidence in a particular case proves that a rule condition has been satisfied.
Evidence: Sentence summarizes an item of evidence in the case.
Evidence-based intermediate reasoning: Sentence involves reasoning about whether evidence in a particular case proves that a rule condition has been satisfied.
Legal policy: Sentence states a legal policy or value in the abstract without applying it to particular facts.
Applied legal value: Sentence that involves reasoning about the application of a legal policy or value to particular facts.
Legal factor: Sentence in which a judge states as a reason why, or despite which, s/he came to a conclusion of law that a legal rule or requirement did [not] apply in a fact situation and refers to a stereotypical fact pattern that tends to strengthen or weaken the legal conclusion because of its effect on a legal policy or value.
Policy-based reasoning: Sentence involves reasoning about the application of a legal policy to particular facts.
Case-specific process or procedural facts: Sentence refers to the procedural setting of or a procedural issue in the case.

Figure 1: LUIMA sentence-level types that law students need to understand in learning to read a case.

These techniques have automatically identified argument-related information about some of the roles of sentences, for example, as statements of legal rules in the abstract or as applied to specific facts, or as case holdings and findings of fact.[9] Figure 1 shows an extended list of the LUIMA sentence-level types. Law students would benefit from practice identifying the sentences playing these roles in legal cases; their annotations of such roles could enable machine learning programs to learn to successfully extract the remaining roles.
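To suggest how the machine learning side of this works, the sketch below trains a toy bag-of-words sentence classifier with scikit-learn on a few invented, hypothetically student-annotated sentences. It is an illustrative stand-in, not a reproduction of the actual LUIMA sentence annotators described in the cited papers.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical training sentences paired with LUIMA sentence-level types;
    # in practice these pairs would come from the students' annotations.
    sentences = [
        "The plaintiff must prove the information derives independent economic value.",
        "See Smith v. Jones, 123 F.2d 456 (3d Cir. 1950).",
        "The court finds that the plaintiff took reasonable security measures.",
        "A witness testified that the formula was posted on a public website.",
    ]
    labels = ["LegalRule", "Citation", "EvidenceBasedFindingOfFact", "Evidence"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(sentences, labels)

    # Annotate a new, unseen sentence with the most probable role.
    print(model.predict(["The court finds that security measures were reasonable."]))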

Information extraction techniques also have automatically annotated more general roles, such as propositions in arguments, premises or conclusions,[10] and the argument schemes that justify the conclusions given the premises, schemes such as analogizing the current facts to a prior case or distinguishing them.[11]

[8] Ashley, op. cit.
[9] Grabmair, M., Ashley, K., Chen, R., Sureshkumar, P., Wang, C., Nyberg, E., and Walker, V. 2015. Introducing LUIMA: An Experiment in Legal Conceptual Retrieval of Vaccine Injury Decisions using a UIMA Type System and Tools. Proceedings of the 15th International Conference on Artificial Intelligence and Law. ICAIL 2015. Pages 1–10. New York, NY, USA: ACM; Bansal, A., Bu, Z., Mishra, B., Wang, S., Ashley, K., and Grabmair, M. 2016. Document Ranking with Citation Information and Oversampling Sentence Classification in the LUIMA Framework. Legal Knowledge and Information Systems, F. Bex and S. Villata (Eds.), pp. 33–42. IOS Press.
[10] Mochales, R., and Moens, M.-F. 2011. Argumentation mining. Artificial Intelligence and Law, 19(1), 1–22; Moens, M.-F., Boiy, E., Palau, R. M., and Reed, C. 2007. Automatic Detection of Arguments in Legal Texts. Proceedings of the 11th International Conference on Artificial Intelligence and Law. ICAIL '07. Pages 225–230. New York, NY, USA: ACM; Levy, R., Bilu, Y., Hershcovich, D., Aharoni, E., and Slonim, N. 2014. Context Dependent Claim Detection. Pages 1489–1500 of: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. (IBM Debater Project.)
[11] Feng, V. W., and Hirst, G. 2011. Classifying arguments by scheme. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Pages 987–996. Association for Computational Linguistics.



Finally, some progress has been made in machine learning to identify information that affects the strength of an argument, such as evidence factors, a fact-finder's stated reasons and assigned plausibility values for a conclusion,[12] or legal factors, stereotypical fact patterns that strengthen a particular type of legal claim.[13] For example, Figure 2 refers to three such legal factors, F6 Security-Measures, F15 Unique-Product, and F19 No-Security-Measures, that apply in the domain of claims for trade secret misappropriation, as well as their effects on certain policy values that trade secret law protects.

The LUIMA architecture supports applying a type system and text annotation pipeline to process case texts for argument-related information about sentence roles. It employs the techniques for automating conceptual markup of documents and for extracting information from legal case texts and integrates them into a prototype system for conceptual legal information retrieval. The system has modules for automatic sub-sentence-level annotation, machine-learning-based sentence annotation, basic retrieval using a full-text information retrieval system, and machine-learning-based re-ranking of the retrieved documents. The automated annotation and machine learning are based on manually annotated training sets of documents.[14] Evaluations have demonstrated objectively the contribution the system's argument-related information makes to improving the full-text legal information system's rankings.[15]
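The module sequence just described can be pictured with the following minimal sketch of a retrieve-then-re-rank flow; the document layout and scoring heuristics are invented for illustration and do not reproduce the LUIMA implementation.

    # Assumes sentences have already been annotated with types such as "LegalRule".

    def basic_retrieval(query_terms, corpus):
        """Stage 1: full-text retrieval by simple term overlap."""
        scored = [(sum(t in doc["text"].lower() for t in query_terms), doc)
                  for doc in corpus]
        return [doc for score, doc in sorted(scored, key=lambda p: -p[0]) if score]

    def rerank(docs, wanted_type):
        """Stage 2: promote documents whose sentence annotations match the
        argument role the user asked for (e.g., a legal rule sentence)."""
        def annotation_score(doc):
            return sum(1 for s in doc["sentences"] if s["type"] == wanted_type)
        return sorted(docs, key=annotation_score, reverse=True)

    corpus = [
        {"text": "The court held that secrecy was maintained ...",
         "sentences": [{"type": "LegalRuling"}, {"type": "Evidence"}]},
        {"text": "To maintain secrecy a plaintiff must take reasonable measures ...",
         "sentences": [{"type": "LegalRule"}]},
    ]
    hits = basic_retrieval(["secrecy", "maintain"], corpus)
    print([d["text"][:40] for d in rerank(hits, "LegalRule")])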

Thus, evidence supports the LUIMA hypothesis that argument retrieval is feasible: by semantically annotating documents with argument-role information and retrieving them based on the annotations, one can outperform current systems that rely on text matching and current techniques for legal information retrieval. The Artificial Intelligence and Legal Analytics book explores how to extend the LUIMA type system to other kinds of conceptual queries and to statutory information retrieval.[16]

Given robust argument retrieval, an app could support a cognitive computing environment tailored to the legal domain in terms of tasks, interface, inputs, and outputs. Type systems and annotations based on the computational models could help humans frame hypotheses about legal arguments, make predictions, and test them against the documents in a corpus.

[12] Walker, V. R., Carie, N., DeWitt, C. C., and Lesh, E. 2011. A framework for the extraction and modeling of fact-finding reasoning from legal decisions: lessons from the Vaccine/Injury Project Corpus. Artificial Intelligence and Law, 19(4), 291–331.
[13] Ashley, K., and Brüninghaus, S. 2009. Automatically classifying case texts and predicting outcomes. Artificial Intelligence and Law, 17(2), 125–165; Wyner, A., and Peters, W. 2012. Semantic Annotations for Legal Text Processing using GATE Teamware. LREC 2012 Conference Proceedings, Semantic Processing of Legal Texts (SPLeT-2012) Workshop. Pages 34–36.
[14] Grabmair, et al., op. cit.; Bansal, et al., op. cit.
[15] Ibid.
[16] Ashley, op. cit., Chapters 11, 12.

• Plaintiff's property interest is legitimate – because plaintiff's information was unique in that plaintiff was the only manufacturer making the product. (F15 Unique-Product)
• Plaintiff has protected his property interest – because plaintiff took active measures to limit access to and distribution of its information. (F6 Security-Measures)
• Plaintiff has protected his confidentiality interest – because plaintiff took active measures to limit access to and distribution of its information. (F6 Security-Measures)
• Plaintiff has waived his interest in confidentiality – because plaintiff failed to take active measures to limit access to and distribution of its information. (F19 No-Security-Measures)

Figure 2: Some policy values underlying trade secret law (italics) and selected factors affecting them (bold).


Posing and testing legal hypotheses is a paradigmatic cognitive computing activity in which humans and computers can collaborate, each performing the kind of intelligent activity it performs best. Humans know the hypotheses that matter legally; the computer helps them to frame and test these hypotheses based on arguments citing cases and counterexamples. The type system annotations will enable a conceptual legal information system to retrieve case examples relevant to the hypotheses, generate summaries tailored to the users' needs, construct arguments, and explain predictions.[17]

SOLUTION OVERVIEW

By the end of the spring term in 2017, I hope to have gained some insight and experience by engaging law and graduate students in designing and applying an extended pedagogical activity of annotating legal texts. The Artificial Intelligence and Legal Analytics book will serve as the course textbook for my spring-term seminar on AI & Legal Reasoning at the University of Pittsburgh School of Law. As part of the seminar, students will engage in designing and trying out exercises that focus on annotating sentence roles and legal factors in trade secret misappropriation cases. Later in the course, students will observe how their annotations work in conjunction with the LUIMA architecture to enable the Value Judgment-based Argumentative Prediction (VJAP) computational model to make arguments about and predictions for the cases they have annotated.

Figure 3: WebAnno annotation environment for marking up trade secret factors.

ANNOTATION ENVIRONMENT: In particular, this will be our first attempt at using UKP's WebAnno to orchestrate a larger-scale annotation effort as a pedagogical activity, employ multiple annotators per case, and monitor reliability of annotation. As suggested in Figure 3, students read a case on screen and highlight sentences they label as instances of particular factors. A manual will guide them through the process of annotating factors, including providing a list, definitions, and examples of the 26 trade secret factors that have been identified so far, as well as sentence argument roles such as stating findings of fact and legal conclusions.

[17] Ashley, op. cit., Chapter 12.

We will explore: (1) motivating student annotators to achieve higher productivity and reliability of annotation by introducing some competition; and (2) employing computer-supported peer review through which students can critically review each other's annotations in order to achieve reliability.

DEMONSTRATION OF REASONING: Once students have annotated some cases, we will demonstrate inputting them into the VJAP program, which can predict an outcome and generate arguments explaining the prediction. VJAP employs a legal domain model of trade secret misappropriation law, shown in Figure 4. The upper part comprises the rules defining a trade secret claim, including the rules' terms or issues, such as the requirement to Maintain-Secrecy. Each issue has associated factors that can strengthen or weaken a side's argument that the issue has been satisfied.

In realistic legal disputes, it is frequently true that some facts favor a side and others favor an opponent. In Figure 3, for instance, the student has marked instances of the three factors illustrated in Figure 2: F6 Security-Measures, F15 Unique-Product, and F19 No-Security-Measures. F6 and F15 favor the plaintiff, while F19 favors the defendant. In another case, called Dynamics, the factors also conflicted; F15, F6, and F4 (Agreed-Not-To-Disclose) favored the plaintiff, while F27 Disclosure-In-Public-Forum and F5 Agreement-Not-Specific favored the defendant.

Based on its domain model (Figure 4), the VJAP program determines the issues in the rules defining a trade secret claim that are affected by the conflicting factors. It employs additional information based on factors' effects on values protected under trade secret law, as in Figure 2. In light of these effects on values, deciding an issue or the claim for one side or the other protects certain interests at the expense of other interests.
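For concreteness, here is a minimal sketch of how the issue-factor associations of Figure 4 might be represented and queried. The dictionary contents follow the figure; the code itself is an illustrative stand-in, not VJAP's.

    # Two issues of the trade secret domain model (Figure 4) mapped to the
    # factors that bear on them and the side each factor favors (P or D).
    ISSUE_FACTORS = {
        "Maintain-Secrecy": {"F4": "P", "F6": "P", "F10": "D",
                             "F12": "P", "F19": "D", "F27": "D"},
        "Information-Valuable": {"F8": "P", "F11": "D", "F15": "P",
                                 "F16": "D", "F20": "D", "F24": "D"},
    }

    def contested_issues(case_factors):
        """Return the issues on which the case's factors pull in both directions."""
        result = []
        for issue, factors in ISSUE_FACTORS.items():
            sides = {factors[f] for f in case_factors if f in factors}
            if sides == {"P", "D"}:
                result.append(issue)
        return result

    # The annotated case of Figure 3: F6 and F15 favor plaintiff, F19 defendant.
    print(contested_issues({"F6", "F15", "F19"}))  # ['Maintain-Secrecy']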

The VJAP program resolves tradeoffs into confidence values in an argument graph and aggregates them quantitatively using the domain model. The arguments in the graph are constructed with schemes for making and responding to arguments by analogy that a legal rule should apply to a current case based on past cases. These analogies assert that the current case and prior cases present the same (local or inter-issue) tradeoffs in value effects and should therefore have the same results.

Trade-Secret-Misappropriation requires Info-Trade-Secret AND Info-Misappropriated.
Info-Trade-Secret requires Information-Valuable AND Maintain-Secrecy.
Info-Misappropriated requires Information-Used AND Wrongdoing, where Wrongdoing is Confidential-Relationship OR Improper-Means.

The leaf issues and their associated factors (P = favors plaintiff; D = favors defendant):

Information-Valuable: F8 Competitive-Advantage (P), F11 Vertical-Knowledge (D), F15 Unique-Product (P), F16 Info-Reverse-Engineerable (D), F20 Info-Known-to-Competitors (D), F24 Info-Obtainable-Elsewhere (D).
Maintain-Secrecy: F4 Agreed-Not-To-Disclose (P), F6 Security-Measures (P), F10 Secrets-Disclosed-Outsiders (D), F12 Outsider-Disclosures-Restricted (P), F19 No-Security-Measures (D), F27 Disclosure-In-Public-Forum (D).
Confidential-Relationship: F1 Disclosure-In-Negotiations (D), F4 Agreed-Not-To-Disclose (P), F5 Agreement-Not-Specific (D), F13 Noncompetition-Agreement (P), F21 Knew-Info-Confidential (P), F23 Waiver-of-Confidentiality (D).
Improper-Means: F2 Bribe-Employee (P), F3 Employee-Sole-Developer (D), F7 Brought-Tools (P), F14 Restricted-Materials-Used (P), F17 Info-Independently-Generated (D), F22 Invasive-Techniques (P), F25 Info-Reverse-Engineered (D), F26 Deception (P).
Information-Used: F7 Brought-Tools (P), F8 Competitive-Advantage (P), F14 Restricted-Materials-Used (P), F17 Info-Independently-Generated (D), F18 Identical-Products (P), F25 Info-Reverse-Engineered (D).

Figure 4: VJAP domain model for trade secret misappropriation law.


Using these schemes, the program can retrieve cases sharing the same tradeoffs as a current case and apply them in textual arguments. A portion of its argument for the Dynamics case is shown in Figure 5. The underlined phrases in the text illustrate where the argument scheme introduces tradeoffs in effects on values protected by trade secret law. Here, VJAP draws an analogy to the National-Rejectors case, which involves similar tradeoffs in value effects.

The argument graph represents all possible arguments about who should win the case given VJAP's domain knowledge, argument schemes, and the other cases in its corpus of 121 trade secret cases. Figure 6 illustrates an excerpt of the argument graph it generates for the Dynamics case. A quantitative model associated with the argument graph predicts the outcome in the new case given the value tradeoffs in the previous cases.

The model propagates quantitative weights across the graph. The weights correspond to the degree of confidence with which the argument premises can be established and with which the prediction is correct. These depend on the strength of arguments pro and con the premises, which, in turn, depend on the magnitude of promotion or demotion of the value in past case contexts. The confidence measure is increased in relation to the strength of the analogy between a precedent and the current case and decreased to the extent they can be distinguished.
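One way to picture that propagation step is the toy sketch below; the update rules are invented stand-ins for VJAP's actual quantitative model.

    def analogy_strength(shared_tradeoffs, distinctions, weight=0.2):
        """A precedent's contribution: increased with the strength of the
        analogy (shared value tradeoffs), decreased by distinctions."""
        return max(0.0, weight * (shared_tradeoffs - distinctions))

    def premise_confidence(pro_strengths, con_strengths):
        """Toy aggregation: confidence in a premise rises with supporting
        (pro) arguments and falls with attacking (con) ones."""
        pro = max(pro_strengths, default=0.0)
        con = max(con_strengths, default=0.0)
        return max(0.0, min(1.0, 0.5 + 0.5 * (pro - con)))

    # A precedent sharing two tradeoffs (one distinction) supports the premise;
    # another precedent sharing one tradeoff supports its negation.
    pro = [analogy_strength(shared_tradeoffs=2, distinctions=1)]
    con = [analogy_strength(shared_tradeoffs=1, distinctions=0)]
    print(round(premise_confidence(pro, con), 2))  # 0.5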

In an evaluation of VJAP, the system learns in a training step the optimal fact-effect weight parameters to maximize prediction accuracy. It uses simulated annealing, a technique for finding the global maximum of a function like confidence while avoiding local maxima. VJAP achieved an accuracy of 79.3% versus a majority-label baseline of 61%.
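The training step itself can be illustrated generically. The sketch below applies simulated annealing to a one-dimensional toy "accuracy" function with local maxima; the cooling schedule, proposal distribution, and the function are assumptions for illustration, not VJAP's actual parameterization.

    import math
    import random

    def anneal(accuracy, init_weights, steps=10_000, temp0=1.0):
        """Simulated annealing: propose small random changes to the weights
        and sometimes accept a worse solution, with the tolerance for worse
        solutions shrinking as the temperature cools; this lets the search
        escape local maxima of the accuracy function."""
        weights = list(init_weights)
        best, best_acc = list(weights), accuracy(weights)
        current_acc = best_acc
        for step in range(steps):
            temp = temp0 * (1 - step / steps) + 1e-9
            candidate = [w + random.gauss(0, 0.1) for w in weights]
            cand_acc = accuracy(candidate)
            if (cand_acc >= current_acc
                    or random.random() < math.exp((cand_acc - current_acc) / temp)):
                weights, current_acc = candidate, cand_acc
                if current_acc > best_acc:
                    best, best_acc = list(weights), current_acc
        return best, best_acc

    # Toy stand-in for "prediction accuracy as a function of a factor-effect weight".
    bumpy = lambda ws: math.sin(3 * ws[0]) * math.exp(-ws[0] ** 2 / 4)
    print(anneal(bumpy, [2.0]))  # best weight lands near the global maximum (~0.5)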

PEDAGOGICAL IMPACT: From an instructional viewpoint, students and I will brainstorm designing the pedagogical annotation environment.

Example verbalization in Dynamics, defendant on info-valuable: Plaintiff's product information is not sufficiently valuable because the plaintiff has taken such little efforts to maintain the secrecy of the information that, despite the lack of strong evidence for the defendant, it must be assumed that plaintiff's product information is not sufficiently valuable because deciding otherwise would be inconsistent with the purposes underlying trade secret law.

Specifically, regarding the maintenance of secrecy by the plaintiff, the public disclosure amounts to such a clear waiver of property interest, a scenario where usability of public information is critical and such a clear waiver of confidentiality interest regarding the lack of maintenance of secrecy by the plaintiff that the lack of value of the information must be deemed sufficiently established despite the lack of strong evidence for the defendant and the fact that the product information was unique.

A similar inter-issue tradeoff was made in National-Rejectors, which was decided for defendant. There, regarding the maintenance of secrecy by the plaintiff, the disclosure to outsiders amounted to such a clear waiver of property interest and such a clear waiver of confidentiality interest, the public disclosure amounted to such a clear waiver of property interest, a scenario where usability of public information is critical and such a clear waiver of confidentiality interest and the absence of security measures amounted to such a clear waiver of property interest and such a clear waiver of confidentiality interest that the reverse-engineerability qualified as the lack of value of the information despite the fact that the product information had been unique.

Figure 5: Excerpt from VJAP argument for the Dynamics case based on analogy to the National-Rejectors case.

Figure 6: Excerpt of VJAP argument graph for the Dynamics case (π = plaintiff, δ = defendant). The graph links statement nodes, argument nodes, and confidence-propagation nodes for reasoning about a Restatement issue with tradeoffs.


They will receive hands-on experience with annotation tasks and an opportunity to see how these enable a computer program to make predictions and arguments. The Artificial Intelligence and Legal Analytics book provides an overview and selected details on each of the steps in this process. Students will also learn about the relevant metrics and approaches for evaluation in connection with each step. I will prepare pre- and post-test formative instruments to assess the pedagogical effectiveness in terms of learning relevant concepts of argumentation and trade secret law. Between now and my sabbatical, I will be working to prepare the mechanisms and infrastructure for such pedagogical annotation efforts.

PROPOSED DAAD WORK

With DAAD support, I will spend one or two weeks visiting each of the above groups in Germany. Based on the experience of the student design and annotation activity in spring 2017, I will present seminars and demonstrations. I will provide useful feedback to the UKP Lab concerning applying WebAnno to organize a course-focused instructional annotation activity, engage in discussions with the KIT group about corpus development for interpretive arguments concerning policy- and value-based reasoning, and learn from the TUM team how the techniques could be adapted to a more business-oriented annotation exercise involving legal professionals. In particular, I expect that the discussions will focus on the following.

UKP: ADAPTING ANNOTATION ENVIRONMENTS TO LEGAL TEXTS

From the viewpoint of annotation and modeling, legal texts evidence an explicit argumentation structure and a predictable set of semantic types, such as the sentence roles in Figure 1. This is one reason why it makes pedagogical sense to teach law students about these structures by giving them practice in annotating them. These structures include appellate argument and reasoning about evidence, for example, about causation.[18] Other domains like scientific argumentation evidence similar structures, for example, sentences stating a hypothesis or distinguishing previous studies from the current work.

Such structures and types are more domain-specific than the proposition/conclusion structures on which previous argument mining work has focused.[19] The more general structures may be instrumental in helping to identify domain-specific structures and types, but the latter will support the actual domain-focused conceptual information retrieval. For example, sentences that state trade secret factors are likely to be propositions in a legal argument whose conclusion is a legal claim, and more likely to be contained in a court's statement of a finding of fact. Those more general argument types may increase a program's confidence in assigning a factor to a new case, but it is the assigned factor that is more useful in helping users find a more relevant decision text.

A general topic for discussion with the UKP team will be the extent to which WebAnno supports annotation of instances of these more domain-specific structures and semantic types. The experience of the seminar students will indicate whether the current level of support is adequate or if and how it might be improved. For instance, the multiple semantic types increase the degree to which annotations overlap, which can lead to a confusing degree of visual clutter.

In addition, some of the semantic types in Figure 1, such as evidence-based intermediate reasoning, legal policy, applied legal value, policy-based reasoning and, to some extent, legal factor, are likely to be more subjective and to challenge reliability of annotation. In this respect, they are similar to the subjectively defined labeling criteria in the IBM Debater project.[20]

[18] Walker, et al., op. cit.
[19] Mochales and Moens, op. cit.; Moens, et al., op. cit.; Levy, et al., op. cit.


Discussion with the UKP team may focus on the level of support WebAnno provides for flagging and resolving specific disagreements about such annotations.

In this connection, computer-supported peer review may play a role. For example, the web-based SWoRD system (Scaffolded Writing and Rewriting in the Disciplines, http://sword.lrdc.pitt.edu) implements reciprocal peer review of writing. Instructors' detailed criteria guide students in critiquing each other's essays. SWoRD scaffolds a cycle of writing, reviews, back-reviews, and rewriting and performs a variety of statistical reliability analyses. I would like to explore with the UKP team the extent to which peer review could be adapted to the task of helping students to critique each other's annotations, identify disagreements, and resolve them, improving reliability in the process.

KIT: INTERPRETATION AND THE LIMITS OF ANNOTATION

I expect that Dr. Gregor Betz and I will discuss the potential for my pedagogical approach to annotation, and its limitations, in leading students to go beyond surface-level annotations and engage in deeper interpretative argument. Interestingly, some of the sentence-role annotation types in Figure 1 relate to legal policies and values and their application to specific factual scenarios. These annotations indicate where content relevant to interpretive argument is located, although they do not necessarily provide a human or a program with much information about the meaning of that content. The claim-specific legal factors, however, and their associations with effects on underlying values (Figure 2), do provide such semantic information. Thus, the surface-level annotations to some extent connect to elements of the deeper interpretive reasoning on which the KIT team focuses. The VJAP program provides an example of that extension into deeper interpretation. Of course, VJAP takes advantage of certain constraints in legal reasoning and argument that are not generally applicable to the social and scientific policy debates in which the KIT team is most interested. Nevertheless, I expect there will be much to discuss.

TUM: STATUTORY AND REGULATORY ANNOTATION FOR BUSINESS COMPLIANCE

The seminar annotation activity will focus primarily on annotating case decisions, for which we have elaborated a fairly comprehensive type system, as in Figure 1. Annotating statutes and regulations is another application that has important ramifications. There are practical ramifications for business compliance and pedagogical ones, too, since American law schools often do not adequately prepare students to understand and analyze statutes. Discussions with TUM may focus on how to adapt the annotation tools and pedagogical environment to the task of annotating statutory and regulatory texts. This includes identifying appropriate annotation types, for example, statement types such as definition or obligation, statement parts like antecedent, consequent, or exception, and agent types or roles. Moreover, given TUM's expertise in dealing with the commercial legal sector, I will benefit from their ideas about how best to adapt annotation environments for instruction in professional business, as opposed to solely academic, contexts.
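As one possible starting point for such a statutory type system, the sketch below encodes these annotation types as a small data model. Only definition, obligation, antecedent, consequent, and exception come from the discussion above; the remaining type names and the class layout are illustrative assumptions.

    from dataclasses import dataclass
    from typing import Optional

    # "Definition", "Obligation", "Antecedent", "Consequent", and "Exception"
    # come from the text above; the rest are illustrative guesses.
    STATEMENT_TYPES = {"Definition", "Obligation", "Prohibition", "Permission"}
    STATEMENT_PARTS = {"Antecedent", "Consequent", "Exception"}
    AGENT_ROLES = {"Obligor", "Obligee", "Regulator"}

    @dataclass
    class StatutoryAnnotation:
        span: tuple                     # (start, end) character offsets in the statute
        statement_type: str             # e.g., "Obligation"
        part: str                       # e.g., "Antecedent"
        agent_role: Optional[str] = None

    ann = StatutoryAnnotation(span=(120, 185), statement_type="Obligation",
                              part="Consequent", agent_role="Obligor")
    assert ann.statement_type in STATEMENT_TYPES and ann.part in STATEMENT_PARTS
    print(ann)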

My interactions with these groups and the visits will help us to identify fruitful collaborations and to generate fundable research proposals based on this work. I expect that my DAAD-supported activities will result in a novel, technologically based component of a law school curriculum and new techniques that law school instructors can employ to improve teaching and student learning. I also expect it will result in several conference and journal articles.

[20] Aharoni, E., Polnarov, A., Lavee, T., Hershcovich, D., Levy, R., Rinott, R., Gutfreund, D., and Slonim, N. 2014. A benchmark dataset for automatic detection of claims and evidence in the context of controversial topics. ACL 2014, 64.