development of a web application to manage and edit ......development of a web application to manage...

15
Software Engineering for Business Information Systems (sebis) Department of Informatics Technische Universität München, Germany wwwmatthes.in.tum.de Development of a web application to manage and edit semantically annotated texts Thomas Grass, 07. September 2015

Upload: others

Post on 17-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Software Engineering for Business Information Systems (sebis) Department of Informatics Technische Universität München, Germany wwwmatthes.in.tum.de

Development of a web application to manage and edit semantically annotated texts

Thomas Grass, 07. September 2015

Page 2: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Key Facts

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 2

Master‘s Thesis – Information Systems Development of a web application to manage and edit semantically annotated texts

Masterarbeit - Wirtschaftsinformatik Entwicklung einer Web-Anwendung zur Verwaltung und Bearbeitung von semantisch annotierten Textsammlungen

Project LexAlyze - Analysis of Legal Texts

Student Thomas Grass Advisor Bernhard Waltl Supervisor Prof. Dr. Florian Matthes

Date 15.03.2015 – 15.09.2015

Page 3: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Agenda – An Overview

1.  Introduction, Problem & Basic Theory 2.  Scope & Research Questions

3.  Semantic Text Annotations

4.  Architecture & Implementation 1.  Overview 2.  Legal Documents 3.  Generic Importer

5.  Demonstration

6.  Conclusion

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 3

Page 4: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Problem / Basic Theory

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 4

Legal domain Nowadays, legal texts are hard to read and understand

Advanced text analysis Methods for performing advanced text analysis rapidly evolve

+

Usage of quantitative methods of structural network analysis and linguistics provide the possibility to do high class text analysis and comparison of different legal texts

=

Page 5: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Research Questions

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 5

What kind of legal texts exist in the German legislation?

What is a way to implement a generic importer for legal texts that can easily be adapted?

What kind of semantic text annotations exist and what are benefits and drawbacks of those?

How to persist semantic text annotations in order to access them for further semantic processing?

Page 6: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Semantic text annotations (1/3)

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 6

-  Add markups to raw text -  Fit raw text with additional information -  Can be done in-line or stand-off

Semantic text annotations

In-line annotation

-  Single file -  Add semantic text

annotations in raw text file

Stand-off annotation

-  Two files -  Raw text file -  Annotation file

-  Annotation file points at locations in raw text file

Page 7: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Semantic text annotations (2/3)

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 7

-  Add semantic text annotations in raw text file In-line semantic text annotation

Benefits -  Single file usage -  Interpretable by human -  Easy to implement

<sentence>! <subject syllables=“2”>Homer</subject> ! <verb>likes</verb>! <adjective type=“color”>blue</adjective>! <object size=“XXL”>jeans</object>.!</sentence>!

Drawbacks -  Manipulation in raw text file -  Overlapping annotations not

possible -  Analysis needs a bit of work

Page 8: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Semantic text annotations (3/3)

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 8

-  Add second file that contains the annotations Stand-off semantic text annotation

Benefits -  Raw text will not

manipulated -  Easy to perform analysis -  Overlappings are possible

<sentence>! <subject start=“0” end=“5” syllables=“2” /> ! <verb start=“6” end=“11” />! <adjective start=“12” end=“16” type=“color” />! <object start=“17” end=“22” size=“XXL” />!</sentence>!

Drawbacks -  Two files needed -  Update of raw text needs

update of annotation file -  Not interpretable by human

Homer likes blue jeans.!

Page 9: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Architecture & Implementation (1/3)

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 9

SocioCortex

Importer Interface

Data Access

User Interface

Text Mining Engine

LEXIA

Page 10: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Architecture & Implementation (2/3)

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 10

<<abstract>>!LegalDocument

title!

<<abstract>>!LegislativeDocument

shortTitle!

<<abstract>>!JurisprudenceDocument

court!dateOfJudgement!

<<abstract>>!LiteratureDocument

author!publisher!isbn!

Law

promulgDate!

Delegation

authority!

Judgement

transactNr!

Decision

decicionNr!

ArticleContainer

title!

Article

content!

1! *!

1!*!

1!

*!Legal document structure

Page 11: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Architecture & Implementation (3/3)

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 11

Generic importer structure

<<abstract>>!

LiteratureImporter

<<abstract>>!

LegalDocumentImporter

<<abstract>>!

JudgementsImporter <<abstract>>!

LawImporter

<<interface>>!

XMLImporterInterface <<interface>>!

PDFImporterInterface <<interface>>!

JSONImporterInterface

GermanLawsImporter

AktGLawsImporter

BGHJudgementsImporter

LGMJudgementsImporter

Page 12: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Demonstration

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 12

Livedemo

Page 13: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Conclusion and Outlook (1/2)

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 13

-  German Legislation exists of various legal document types -  e.g. Laws, Decisions, Judgements, … -  existing model can easily be extended & adapted

-  Generic importer for different document- and file-types

-  Two types of semantic text annotations -  In-line & Stand-off semantic text annotation -  Prototypical implementation of Stand-off

-  SocioCortex for persisting raw texts and annotations

-  MXL for selecting and quering legal documents

Summary

Open for further work

-  Adding new sources -  Adding any annotation -  Extendable

Open issues & restrictions

-  Slow interaction with SocioCortex (Bulk-Load)

-  Success depends on sources

Page 14: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Conclusion and Outlook (2/2)

© sebis 150907 – Grass Thomas - Development of a web application to manage and edit semantically annotated texts 14

-  Adding of advanced text analysis functionality -  (October 2015, Tobias Waltl)

-  Integration of new data sources, (e.g. contracts) -  Foundation for further advanced text analysis functionality

-  Determination of use cases

Upcoming

Page 15: Development of a web application to manage and edit ......Development of a web application to manage and edit semantically annotated texts Masterarbeit - Wirtschaftsinformatik Entwicklung

Technische Universität München Department of Informatics Chair of Software Engineering for Business Information Systems Boltzmannstraße 3 85748 Garching bei München Tel +49.89.289. Fax +49.89.289.17136 wwwmatthes.in.tum.de

Thomas Grass B.Sc.

17124

[email protected]

Thank you for your attention!