HuSpaCy : an industrial-strength Hungarian natural language processing toolkit

Orosz György and Szántó Zsolt and Berkecz Péter and Szabó Gergő and Farkas Richárd: HuSpaCy : an industrial-strength Hungarian natural language processing toolkit. In: Magyar Számítógépes Nyelvészeti Konferencia, (18). pp. 59-73. (2022)

Abstract

Although a couple of open-source language processing pipelines are available for Hungarian, none of them satisfies the requirements of today’s NLP applications. A language processing pipeline should provide close to state-of-the-art lemmatization, morphosyntactic analysis, entity recognition and word embeddings. Industrial text processing applications must also satisfy non-functional software quality requirements, and frameworks supporting multiple languages are increasingly favored. This paper introduces HuSpaCy, an industry-ready Hungarian language processing toolkit. The presented tool provides components for the most important basic linguistic analysis tasks. It is open-source and is available under a permissive license. Our system is built upon spaCy’s NLP components, resulting in an easily usable, fast yet accurate application. Experiments confirm that HuSpaCy has high accuracy while maintaining resource-efficient prediction capabilities.

Item Type: Article
Heading title: Language models
Journal or Publication Title: Magyar Számítógépes Nyelvészeti Konferencia
Date: 2022
Volume: 18
ISBN: 978-963-306-848-9
Page Range: pp. 59-73
Language: English
Place of Publication: Szeged
Event Title: Magyar számítógépes nyelvészeti konferencia (18.) (2022) (Szeged)
Related URLs: http://acta.bibl.u-szeged.hu/75797/
Uncontrolled Keywords: Linguistics - computer applications
Additional Information: Bibliography: pp. 70-73 and in the footnotes; abstract in English
Subjects: 01. Natural sciences
01. Natural sciences > 01.02. Computer and information sciences
06. Humanities
06. Humanities > 06.02. Languages and Literature
Date Deposited: 2022. May. 24. 15:03
Last Modified: 2022. May. 24. 15:03
URI: http://acta.bibl.u-szeged.hu/id/eprint/75865
