dr. slavko zitnik · 2021. 2. 23. · dr. slavko zitnik (fri) nlp class (63555) february 20211/32....
TRANSCRIPT
Natural language processing class
dr. Slavko Zitnik
University of Ljubljana, Faculty of Computer and Information Science
February 2021
dr. Slavko Zitnik (FRI) NLP class (63555) February 2021 1 / 32
Natural language processing
visualization
association analysis
link mining
information extraction
pattern recognition
lexical analysis
information retrieval
coreference resolution
relation modeling
named entity recognition
document summarization
sentiment analysis
entity extraction
concept extraction
text clustering
text categorization
[Figure: Google search results page for the query "ivan cankar" (about 293,000 results). The page is a ranked list of links with text snippets, including en.wikipedia.org, sl.wikipedia.org, en.wikisource.org, sl.wikisource.org, www.kam.si, www.dijaski.net, www.slovenskabiografija.si, www.hervardi.com and www.vrhnika.si, followed by related searches.]
[Figure: Google search results page for the query "ivan cankar" (about 293,000 results), shown with a knowledge panel next to the ranked list of links.]
Knowledge panel:
Ivan Cankar, Writer
Ivan Cankar was a Slovene writer, playwright, essayist, poet and political activist. Together with Oton Župančič, Dragotin Kette, and Josip Murn, he is considered as the beginner of modernism in Slovene literature. (Wikipedia)
Born: May 10, 1876, Vrhnika
Died: December 11, 1918, Ljubljana
Education: University of Vienna
People also search for: France Prešeren, Oton Župančič, Dragotin Kette, Srečko Kosovel, Josip Murn
Callouts on the slide: DBPedia entry, Text excerpt
Zoogle - Traditional
http://tradicionalni-iskalnik.si
Query: Ivan Cankar [Search]
666 results found:
Ivan Cankar - Wikipedia, the free encyclopedia
  Ivan Cankar was born in the house at Na klancu 141, as one of twelve children of an artisan-proletarian family. In 1882 he enrolled in primary …
Ivan Cankar – the greatest master of the Slovene word
  Ivan Cankar was born in Vrhnika (Na Klancu) on 10 May 1876 as the eighth child in the declining artisan-proletarian family of a market-town tailor …
Cankar's death was a political murder
  Cankar's death was a political murder. Writer, politician and people's tribune Ivan Cankar. Ivan Cankar is regarded as the greatest Slovene writer and …
[PDF] Ivan Cankar (1876 - 1918)
  Ivan Cankar (1876 - 1918). The greatest master of the Slovene word and a central figure of modern literature comes from a poor family …
BIOGRAPHY Ivan Cankar
  Ivan Cankar was born on 10 May 1876 into a peasant family in Vrhnika. There were eight children in the family. He left the family. Because he was a very …
Ivan Cankar memorial house
  Ivan Cankar memorial house. Ivan Cankar (1876 – 1918) is considered to be Slovenia's most important writer. The original house …
1 | 2 | 3 | 4 | 5 | … | Last
Zoogle - Semantic
http://semantični-iskalnik.si
Query: Ivan Cankar [Search]
Ivan Cankar - Wikipedia, the free encyclopedia
  Ivan Cankar was born in the house at Na klancu 141, as one of twelve children of an artisan-proletarian family. In 1882 he enrolled in primary …
Ivan Cankar – the greatest master of the Slovene word
  Ivan Cankar was born in Vrhnika (Na Klancu) on 10 May 1876 as the eighth child in the declining artisan-proletarian family of a market-town tailor …
Information extracted from 25 of the 666 results found (graph centred on Ivan Cankar):
Ivan Cankar - rojenV (born in) - Vrhnika, Na Klancu
Ivan Cankar - seJeŠolal (studied at) - Ljubljanska Realka
Ivan Cankar - mati (mother) - Cankarjeva mati (Cankar's mother)
Ivan Cankar - prijatelj (friend) - Kosovel
Ivan Cankar - prijatelj (friend) - Josip Murn
Ivan Cankar - prijatelj (friend) - Dragotin Kette
Ivan Cankar - jeNapisal (wrote) - Hlapec Jernej in njegova pravica
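The extracted facts on this slide can be read as subject-predicate-object triples. The following is a minimal Python sketch of how such triples might be stored and queried. The pairing of edge labels with nodes is an assumption read off the slide figure, and the helper names `build_index` and `objects_of` are hypothetical, not part of any system shown in the lecture.

```python
from collections import defaultdict

# Triples as read from the slide figure; the pairing of edge labels
# with target nodes is an assumption made for illustration.
TRIPLES = [
    ("Ivan Cankar", "rojenV", "Vrhnika, Na Klancu"),
    ("Ivan Cankar", "seJeŠolal", "Ljubljanska Realka"),
    ("Ivan Cankar", "mati", "Cankarjeva mati"),
    ("Ivan Cankar", "prijatelj", "Kosovel"),
    ("Ivan Cankar", "prijatelj", "Josip Murn"),
    ("Ivan Cankar", "prijatelj", "Dragotin Kette"),
    ("Ivan Cankar", "jeNapisal", "Hlapec Jernej in njegova pravica"),
]


def build_index(triples):
    """Index triples by (subject, predicate) for simple lookups."""
    index = defaultdict(list)
    for subj, pred, obj in triples:
        index[(subj, pred)].append(obj)
    return index


def objects_of(index, subject, predicate):
    """Return all objects linked to `subject` by `predicate`."""
    return index.get((subject, predicate), [])


index = build_index(TRIPLES)
print(objects_of(index, "Ivan Cankar", "prijatelj"))
```

A semantic search engine could answer a query such as "friends of Ivan Cankar" from this index directly, instead of returning only a ranked list of documents.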
Information extraction
Definition
Information extraction
a type of information retrieval
goal: to automatically extract structured data from unstructured data sources
Subtasks
named entity recognition
relationship extraction
coreference resolution
Preprocessing
Information extraction method
Information extraction
Preprocessing
Input: John is married to Jena . They work at OBI .
Sentence detection: John is married to Jena . They work at OBI .
Tokenization: John is married to Jena . They work at OBI .
Lemmatization: John be marry to Jena . They work at OBI .
Part-of-speech tagging: NNP VBZ VBN TO NNP . PRP VBP IN NNP .
Dependency parsing: John is married to Jena . They work at OBI .
  arc labels, first sentence: nsubjpass, auxpass, prep, pobj
  arc labels, second sentence: nsubj, prep, pobj
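The first three preprocessing steps can be sketched in plain Python. This is a toy illustration under stated assumptions: the regular expressions and the two-entry lemma lexicon `LEMMAS` are made up for this example sentence only, and real pipelines rely on trained models or full lexicons. Part-of-speech tagging and dependency parsing are left out because they need such models.

```python
import re

TEXT = "John is married to Jena. They work at OBI."

# Toy lemma lexicon covering only this example (an assumption; real
# lemmatizers use full morphological lexicons or trained models).
LEMMAS = {"is": "be", "married": "marry"}


def split_sentences(text):
    """Split after sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Separate runs of word characters from single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def lemmatize(tokens):
    """Look each token up in the toy lexicon; unknown tokens stay as-is."""
    return [LEMMAS.get(tok.lower(), tok) for tok in tokens]


for sentence in split_sentences(TEXT):
    tokens = tokenize(sentence)
    print(tokens, lemmatize(tokens))
```

The naive sentence splitter breaks on abbreviations such as "dr. Zitnik", which is one reason why sentence detection is treated as a task of its own.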
Information extraction
General approaches
Information extraction
  Pattern-based: Discovery, Rules
  Machine learning-based: Probabilistic, Induction
Methods listed on the slide:
  - HMM, CRF
  - N-gram
  - SVM, naive Bayes, ...
  - Linguistic
  - Structural
  - JAPE
  - Taxonomy label matching
  - Seed expansion
Information extraction Main information extraction tasks
Named entity recognition
Person Person Position Organization
Organization
John is married to Jena . He is a mechanic at OBI and she also works there .
It is a DIY market .
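A pattern-based recognizer for the example sentence could be sketched as a dictionary (gazetteer) lookup. The gazetteer contents and the function name `tag_entities` are assumptions made for illustration; the slide does not say which method produced its labels, and practical systems usually combine such lists with trained sequence models.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "John": "Person",
    "Jena": "Person",
    "mechanic": "Position",
    "OBI": "Organization",
}


def tag_entities(text, gazetteer):
    """Return (surface form, type, start offset) for every gazetteer match."""
    # Longer entries first, so multi-word names would win over their substrings.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.group(1), gazetteer[m.group(1)], m.start())
            for m in pattern.finditer(text)]


TEXT = "John is married to Jena . He is a mechanic at OBI and she also works there ."
for surface, label, start in tag_entities(TEXT, GAZETTEER):
    print(f"{start:3d}  {surface:10s} {label}")
```

A lookup like this cannot label names it has never seen, which is the main motivation for the machine learning-based approaches.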
Information extraction Main information extraction tasks
Relationship extraction
John is married to Jena . He is a mechanic at OBI and she also works there .
It is a DIY market .
employedAt employedAt
isA
hasProfession
marriedWith
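Relations like the ones on this slide can be approximated with hand-written surface patterns. The sketch below is only an illustration: the three regular expressions and the argument order of each relation are assumptions, and the relation names are taken from the slide.

```python
import re

# Hand-written patterns: group "a" is the subject, group "b" the object.
# The patterns and the argument order are illustrative assumptions.
PATTERNS = [
    (re.compile(r"(?P<a>[A-Z]\w+) is married to (?P<b>[A-Z]\w+)"), "marriedWith"),
    (re.compile(r"(?P<a>[A-Z]\w+) is a (?P<b>\w+) at [A-Z]\w+"), "hasProfession"),
    (re.compile(r"(?P<a>[A-Z]\w+) is a \w+ at (?P<b>[A-Z]\w+)"), "employedAt"),
]


def extract_relations(text):
    """Apply every pattern and collect (subject, relation, object) triples."""
    triples = []
    for pattern, relation in PATTERNS:
        for match in pattern.finditer(text):
            triples.append((match.group("a"), relation, match.group("b")))
    return triples


TEXT = ("John is married to Jena . He is a mechanic at OBI "
        "and she also works there . It is a DIY market .")
for triple in extract_relations(TEXT):
    print(triple)
```

Two of the three extracted triples have the pronoun "He" as their subject rather than "John", and "she also works there" is missed entirely. Linking such mentions to entities is the job of coreference resolution.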
Information extraction Main information extraction tasks
Coreference resolution
It is a DIY market .
John is married to Jena . He is a mechanic at OBI and she also works there .
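A very simple coreference heuristic links each anaphor to the closest preceding name with compatible features. The sketch below assumes hand-assigned mention boundaries and gender features (the `MENTIONS` list is made up for this example); it is not the method used in the lecture, and real systems have to predict both mentions and links.

```python
# Mentions in text order with hand-assigned features (assumptions for this
# toy example; a real system would detect mentions and features itself).
MENTIONS = [
    {"text": "John", "kind": "name", "gender": "male"},
    {"text": "Jena", "kind": "name", "gender": "female"},
    {"text": "He", "kind": "anaphor", "gender": "male"},
    {"text": "OBI", "kind": "name", "gender": "neuter"},
    {"text": "she", "kind": "anaphor", "gender": "female"},
    {"text": "there", "kind": "anaphor", "gender": "neuter"},
    {"text": "It", "kind": "anaphor", "gender": "neuter"},
]


def resolve(mentions):
    """Link each anaphor to the closest preceding name with the same gender."""
    clusters = {}  # index of the name -> texts of all mentions in its cluster
    for i, mention in enumerate(mentions):
        if mention["kind"] == "name":
            clusters[i] = [mention["text"]]
            continue
        # Scan backwards for the nearest compatible name.
        for j in range(i - 1, -1, -1):
            candidate = mentions[j]
            if candidate["kind"] == "name" and candidate["gender"] == mention["gender"]:
                clusters[j].append(mention["text"])
                break
    return list(clusters.values())


for cluster in resolve(MENTIONS):
    print(cluster)
```

With the clusters in place, a relation extracted for "He" can be attributed to "John".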
Information extraction Main information extraction tasks
End-to-end information extraction
John is married to Jena . He is a mechanic at OBI and she also works there .
It is a DIY market .
Person Person Position Organization
Organization
employedAt employedAt
isA
hasProfession
marriedTo
About the course
Course goals
Goals
To study algorithms and methods for building computational models of natural language processing.
To study issues involved in understanding natural languages, together with cognitive and linguistic phenomena.
To identify a text processing problem, design a solution and practically solve it.
To get to know existing NLP approaches, techniques, tools and the state of the art in the field.
To become proficient in handling end-to-end text processing problems.
About the course
Syllabus
Proposed syllabus
Corpora acquisition and tagging, preprocessing techniques
Information extraction tasks and systems
Slovene text processing
Regular expressions, rule based systems
Semantic web and ontologies (optional in 2021)
Unsupervised learning and visualisation
Classification and tagging techniques
(Deep) Neural networks for text
Text processing assignment (practical work)
About the course
Timetable - tentative
About the course
Grading
Grading
First defense (10 points): task selection & simple corpus processing/analysis
Introduction, existing solutions, initial ideas.
Interim defense (10 points): at least one example of a solution to a problem
Introduction, related work, implemented baseline, future directions
Final defense (30 points): full submission and presentation
clean Git repository (fully reproducible) and final report
Rules
Attendance is preferred, but it is mandatory only on the assignment defense dates.
Workshop pass condition: at least 25 points.
About the course
Assignment conditions
On the assignment defense dates at least one member of a group must be present; otherwise all members need to provide a doctor's note.
At the last assignment defense all members must be present and must understand all parts of the submitted solution.
Students must work in groups of two to three members! Groups with two members have the same pass conditions: they need less coordination and are more independent.
Each group will have to grant the assistant (GitHub username: szitnik) access to a private Git repository with at least read permissions.
The distribution of work between members should be visible from the commits in the repository.
About the course
Assignment goals
Goals
End-to-end NLP task processing
Understanding of existing works
Presenting results in an article-like report
Start early so that there will be enough time for fine tuning!
Tools
Python 3.6 (Anaconda) and a deep learning library (PyTorch, TensorFlow, Keras)
Preferred IDE: JetBrains PyCharm (free for students, otherwise Community Edition)
Other tools & languages: whichever you prefer (e.g. Scala and IntelliJ IDEA)
Submission
SUBMISSION
Submission
Submission
PDF Report
Follow the IMRAD (Introduction-Methods-Results-And-Discussion) structure. A full paper should have Abstract, Introduction, Related work, (Data), Methods/Algorithms/..., Results/Evaluation/Experiments, (Discussion) and Conclusion.
Use the provided LaTeX template.
Submit a manuscript of at most 6-8 pages, including references. Longer papers should be discussed with the assistant.
Authors of better manuscripts will be allowed to publish them on arXiv.org, while the best could be extended into a journal or conference paper (Slovenscina 2.0, Uporabna informatika, etc.)
LaTeX template
Submission
Submission
Git repository
Organize code and repository structure according to your task.
The proposed structure should keep code, models, data and other materials in separate folders.
README.md should contain a short description of the project, instructions for compiling and running the project, and course and author information.
Follow good practices from existing Git repositories that you will find during the course work.
Related work
RELATED WORK
Related work
Related work
Related work search
NLP Journals, conferences
Computational Linguistics, TACL, NLE, IJCL
ACL 2020, EMNLP 2020, NIPS 2020
NLP Shared Tasks, workshops
CoNLL 2020, SemEval 2020, CodaLab
Code, paper repositories
arXiv.org, Google Scholar, Papers with code
Slovene NLP resources
NLP Journals, conferences
Slovenscina 2.0
JTDH 2020
NLP resources, projects
CLARIN.SI, CJVT, slovenscina.eu
Assignments
ASSIGNMENTS
Assignments
Assignment selection
The selected assignment task should involve richer text data processing, not only feature extraction and direct machine learning classification.
Evaluation and discussion are the main objective.
Possible options:
(A) IMapBook collaborative discussions classification
(B) Offensive language exploratory analysis
(C) Cross-lingual offensive language identification
(D) Automatic language translation (joint work with UL FF)
Custom ideas must be approved by the assistant (proposals must be based on strong knowledge of related work and on concrete solution ideas).
Assignments
(A) IMapBook collaborative discussions classification
Difficulty: easy
Data was gathered within the IMapBook system in schools. After participants (primary school pupils) read a book, book clubs were formed. Each book club had a discussion question to answer. Participants could chat to collaboratively provide the final answer (as in Google Docs). The goal is to classify each of the chat messages into specified categories.
Data (TO BE PROVIDED) contains approx. 800 chat messages, the book text, final responses and annotation instructions.
Goals:
How well can postings be classified into predefined categories?
Could (a) the result of a collaborative discussion or (b) the eBook text help train an algorithm that gives better results?
Assignments
(B) Offensive language exploratory analysis
Difficulty: easy/moderate
Data (TO BE PROVIDED) consists of approx. 65 offensive language datasets (around 25 in English).
Goals:
Offensive language exploratory analysis (importance of specific keywords, relationships between categories, ...).
Description of existing offensive language texts using BoW, TF-IDF and pre-trained word embeddings (non-contextual: Word2Vec, GloVe, fastText; contextual: BERT, ELMo).
Cross-lingual mappings (e.g. from English to Slovene) using e.g. the LASER toolkit, and explanations.
Meaningful visualizations/representations of distances between existing annotation classes.
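The TF-IDF description mentioned in the goals can be sketched in plain Python. This is a minimal illustration under stated assumptions: the three documents are made-up placeholders rather than course data, tokenization is a plain whitespace split, and the weighting is the basic tf * log(N / df) variant, which differs from the smoothed variants that common libraries use by default.

```python
import math
from collections import Counter

# Tiny made-up corpus (placeholder documents, not taken from the course data).
DOCS = [
    "you are a great friend",
    "you are a terrible person",
    "what a great day",
]


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


vectors = tf_idf(DOCS)
# Terms that occur in every document (here "a") get weight 0.
print(sorted(vectors[1].items(), key=lambda kv: -kv[1])[:2])
```

Ranking terms by such weights per annotation class is one simple way to look at the importance of specific keywords.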
Assignments
(C) Cross-lingual offensive language identification
Difficulty: moderate
Data (TO BE PROVIDED) consists of approx. 65 offensive language datasets (around 25 in English).
Offensive language text classification for selected datasets (max. one dataset from Twitter)
Training on English data and transfer of the model to Slovene using multilingual pre-trained models (e.g. CroSloEn BERT, mBERT, XLM-R) or embeddings alignment (see Ales Zagar's master's thesis or Zan Pecovnik's diploma thesis)
Slovene data retrieval and automatic classification into offensive/non-offensive classes. Manual error analysis (at least 100 examples).
Assignments
(D) Automatic language translation (joint work with UL FF)
Difficulty: moderate/hard
Available parallel corpora: https://opus.nlpl.eu. Useful datasets: OpenSubtitles 2016 and 2018 (use other subtitle data with caution!), EUparl, EMEA, DGT, ELRC. Do not use bad corpora such as EUbookshop!
Cooperation with UL FF students (translators) to prepare additional data/validation set, ...
Main task: selection of a translation framework and running models on distributed GPU-enabled machines (e.g. SLING). Detailed description of your work and analyses. Examples of frameworks: Fairseq (FB), Marian NMT (MS), T5 (Google), XLM-RoBERTa.