atul ud paper - jawaharlal nehru...
UNIVERSAL DEPENDENCY TREEBANKS FOR LOW-RESOURCE INDIAN LANGUAGES: THE CASE OF BHOJPURI
Description
Resource Building
Syntactically annotated treebank
UD Framework
4881 annotated tokens
ML-based Tagger and Parser
Data Source: BLTR
Domain: news and non-fiction
5000 sentences (105,174 tokens)
254 sentences (4881 tokens) manually annotated
XPOS and UPOS tags
Support: Hindi Treebank | BIS Tagset
Charles University
Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
[email protected]
Atul Kr. Ojha, Daniel Zeman
Bhojpuri
Accuracy: 57.49% UAS | 45.50% LAS | 79.69% UPOS | 77.64% XPOS
Indo-Aryan language
Bihar | Jharkhand | Uttar Pradesh
Nepal | Trinidad | Mauritius | Guyana | Suriname | Fiji
Speakers: 50,579,447
Resource-poor language for ML
Statistics of morphological features
Statistics of UPOS tags
UD relations: we use 30 of the 37
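Statistics of this kind (UPOS tags, morphological features, relations) can be counted directly from a treebank in CoNLL-U, the file format used by UD. The following is a minimal sketch, not the authors' script. The function name `treebank_statistics` and the three-token English sample are hypothetical and serve only to illustrate the column layout; they are not taken from the Bhojpuri treebank.

```python
from collections import Counter


def treebank_statistics(conllu_text):
    """Count UPOS tags, morphological features and relations in CoNLL-U text."""
    upos, feats, deprels = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip sentence boundaries and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword token ranges and empty nodes
        upos[cols[3]] += 1  # column 4: UPOS
        if cols[5] != "_":  # column 6: FEATS, e.g. Number=Sing|Person=3
            for feature in cols[5].split("|"):
                feats[feature] += 1
        deprels[cols[7].split(":")[0]] += 1  # column 8: DEPREL, subtype dropped
    return upos, feats, deprels


# Hypothetical sample sentence, for illustration only.
sample_rows = [
    ["1", "The", "the", "DET", "DT", "Definite=Def|PronType=Art", "2", "det", "_", "_"],
    ["2", "dog", "dog", "NOUN", "NN", "Number=Sing", "3", "nsubj", "_", "_"],
    ["3", "barks", "bark", "VERB", "VBZ", "Number=Sing|Person=3", "0", "root", "_", "_"],
]
sample = "# text = The dog barks\n" + "\n".join("\t".join(r) for r in sample_rows) + "\n"

upos_counts, feat_counts, deprel_counts = treebank_statistics(sample)
print(upos_counts.most_common())
print(len(deprel_counts), "distinct relations used")
```

The number of distinct keys in `deprel_counts` is the figure one would compare against the 37 universal relations.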
Accuracy of a UDPipe model trained on the Hindi UD treebank (HDTB) and applied to the first 50 Bhojpuri sentences.
UDPipe accuracy of the conducted experiments
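For readers unfamiliar with the parsing metrics reported here: UAS is the percentage of tokens whose predicted head is correct, and LAS additionally requires the correct relation label. The sketch below assumes gold and predicted tokenization are identical, which is a simplification; official evaluation scripts also handle tokenization mismatches and may differ in detail. The function name `attachment_scores` and the four-token example are hypothetical.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) in percent.

    gold, pred: lists of (head_index, relation) pairs, one per token,
    aligned token by token (same tokenization assumed).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    # UAS: head matches; LAS: head and relation label both match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


# Hypothetical four-token example, for illustration only.
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "obj")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")  # UAS = 75.00%, LAS = 50.00%
```

In the example, three of four heads are correct (UAS 75%), but only two tokens have both the correct head and the correct label (LAS 50%), which is why LAS can never exceed UAS.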
Learning curve of the Bhojpuri models
Acknowledgements
This work has been supported by LINDAT/CLARIAH-CZ and Khresmoi, the grants no. LM2018101 and 7E11042 of the Ministry of Education, Youth and Sports of the Czech Republic, and FP7-ICT-2010-6-257528 of the European Union.