Evaluating contextualized language models for Hungarian

Ács Judit and Lévai Dániel and Nemeskey Dávid Márk and Kornai András: Evaluating contextualized language models for Hungarian.

[thumbnail of msznykonf_017_015-028.pdf]
Cikk, tanulmány, mű

Download (895kB) | Preview


We present an extended comparison of contextualized language models for Hungarian. We compare huBERT, a Hungarian model against 4 multilingual models including the multilingual BERT model. We evaluate these models through three tasks, morphological probing, POS tagging and NER. We find that huBERT works better than the other models, often by a large margin, particularly near the global optimum (typically at the middle layers). We also find that huBERT tends to generate fewer subwords for one word and that using the last subword for token-level tasks is generally a better choice than using the first one.

Item Type: Conference or Workshop Item
Heading title: Nyelvmodellek
Journal or Publication Title: Magyar Számítógépes Nyelvészeti Konferencia
Date: 2021
Volume: 17
ISBN: 978-963-306-781-9
Page Range: pp. 15-28
Language: English
Event Title: Magyar számítógépes nyelvészeti konferencia (17.) (2021) (Szeged)
Related URLs: http://acta.bibl.u-szeged.hu/73340/
Uncontrolled Keywords: Nyelvészet - számítógép alkalmazása
Additional Information: Bibliogr.: p. 25-28. és a lábjegyzetekben ; összefoglalás angol nyelven
Subjects: 01. Natural sciences
01. Natural sciences > 01.02. Computer and information sciences
06. Humanities
06. Humanities > 06.02. Languages and Literature
Date Deposited: 2021. Sep. 28. 09:57
Last Modified: 2022. Nov. 08. 11:49
URI: http://acta.bibl.u-szeged.hu/id/eprint/73354

Actions (login required)

View Item View Item