Statistical language models within the algebra of weighted rational languages

Hanneforth, Thomas and Würzner, Kay-Michael: Statistical language models within the algebra of weighted rational languages. Acta cybernetica, (19) 2. pp. 313-356. (2009)

Article, study, work
Hanneforth_2009_ActaCybernetica.pdf

Abstract

Statistical language models are an important tool in natural language processing. They represent prior knowledge about a certain language, which is usually gained from a set of samples called a corpus. In this paper, we present a novel way of creating N-gram language models using weighted finite automata. The construction of these models is formalised within the algebra underlying weighted finite automata and expressed in terms of weighted rational languages and transductions. Besides the algebra, we make use of five special constant weighted transductions which rely only on the alphabet and the model parameter N. In addition, we discuss efficient implementations of these transductions in terms of virtual constructions.

Item Type: Article
Event Title: Weighted Automata: Theory and Applications, 2008, Dresden
Journal or Publication Title: Acta cybernetica
Date: 2009
Volume: 19
Number: 2
Page Range: pp. 313-356
ISSN: 0324-721X
Language: English
Uncontrolled Keywords: Natural sciences, Computer science
Additional Information: Bibliography: pp. 346-349; Abstract
Date Deposited: 15 Oct 2016 12:25
Last Modified: 06 Jun 2018 10:16
URI: http://acta.bibl.u-szeged.hu/id/eprint/12868
