At this year's edition of the EACL (European Chapter of the Association for Computational Linguistics) conference, held in Dubrovnik, Gabriela Palka and Artur Nowakowski presented their solution to the 4th Shared Task on SlavNER: Recognition, Normalization, Classification, and Cross-lingual Linking of Named Entities in Slavic Languages.
This year's edition focused on the analysis of named entities in multilingual documents written in Polish, Czech and Russian. The task involved detecting entities and classifying them into categories: person (PER), location (LOC), organization (ORG), product (PRO), and event (EVT).
Rich inflection, free word order, derivation and other phenomena found in Slavic languages make working with named entities a major challenge. Supporting research and development on named-entity problems (mention detection, lemmatization or normalization, classification, and cross-lingual linking) is crucial for cross-lingual access to information and for wider use of NLP in Slavic languages.
Seven teams registered for this year's edition, of which three submitted a solution: Tilde, CTC and AMU. Our team (AMU) achieved high scores in the recognition and lemmatization phases: 88.8 to 91.5 in the former and 76.9 to 82.4 in the latter. Another team in this phase, Tilde, achieved scores in the range of 53.9 to 72.6.
The models developed for this task are available on the CSI profile on Hugging Face.