Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm

Tóth, Krisztina; Farkas, Richárd; Kocsor, András: Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm. Acta cybernetica, 18 (3), pp. 463-478 (2008).

Abstract

We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm combines a length-based method with an anchor-matching method that uses Named Entity recognition, pairing the speed of length-based models with the accuracy of anchor-finding methods. Because the accuracy of finding cognates is extremely low for the Hungarian-English language pair, we adopted a novel approach that includes Named Entity recognition. Owing to its well-selected anchors, the algorithm was found to outperform the two best sentence alignment algorithms published so far for the Hungarian-English language pair.
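The general idea of combining a length-based score with anchor matching can be sketched as follows. This is a minimal illustration, not the algorithm from the paper: the `anchors` function is a crude stand-in for Named Entity recognition (capitalized non-initial tokens and numbers), and the cost weights `anchor_bonus`, `skip_penalty`, and the merge penalty are hypothetical values chosen only for the example.

```python
import re

def anchors(sentence):
    """Crude proxy for named entities (an assumption, not the paper's NER):
    tokens starting with a digit, plus capitalized tokens that are not
    sentence-initial."""
    tokens = re.findall(r"\w+", sentence)
    found = {t for t in tokens if t[0].isdigit()}
    found |= {t for t in tokens[1:] if t[0].isupper()}
    return found

def bead_cost(src, tgt, anchor_bonus=2.0, skip_penalty=3.0):
    """Cost of aligning a group of source sentences with a group of target
    sentences. All weights are hypothetical."""
    if not src or not tgt:
        return skip_penalty              # sentence left untranslated
    s, t = " ".join(src), " ".join(tgt)
    # Length-based part: relative difference in character length.
    length_cost = abs(len(s) - len(t)) / max(len(s), len(t))
    # Anchor part: reward anchors that appear on both sides.
    shared = len(anchors(s) & anchors(t))
    # Mild penalty for 2-1 and 1-2 beads so 1-1 is preferred.
    merge_penalty = 0.5 * (len(src) + len(tgt) - 2)
    return length_cost + merge_penalty - anchor_bonus * shared

# Allowed bead shapes: (source sentences, target sentences).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

def align(source, target):
    """Dynamic-programming alignment; returns a list of
    (source_indices, target_indices) pairs in document order."""
    n, m = len(source), len(target)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + bead_cost(source[i:ni], target[j:nj])
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the beads.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return pairs[::-1]
```

Calling `align(hungarian_sentences, english_sentences)` on two lists of sentence strings returns index pairs such as `([0], [0])` for a one-to-one bead. Exact string matching of anchors misses inflected forms (for example "Szegedre" versus "Szeged"), which is one reason a real system would need proper Named Entity recognition and normalization rather than this shortcut.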

Item Type: Article
Event Title: 5th Conference for PhD Students in Computer Science, 2006, Szeged
Journal or Publication Title: Acta cybernetica
Date: 2008
Volume: 18
Number: 3
Page Range: pp. 463-478
ISSN: 0324-721X
Language: English
Uncontrolled Keywords: Natural science, Computer science
Additional Information: Bibliography: pp. 476-477; Abstract
Date Deposited: 2016. Oct. 15. 12:25
Last Modified: 2018. Jun. 05. 14:35
URI: http://acta.bibl.u-szeged.hu/id/eprint/12830
