Statistical language models within the algebra of weighted rational languages

Hanneforth, Thomas and Würzner, Kay-Michael: Statistical language models within the algebra of weighted rational languages. In: Acta cybernetica, (19) 2. pp. 313-356. (2009)

Abstract

Statistical language models are an important tool in natural language processing. They represent prior knowledge about a particular language that is usually acquired from a set of samples called a corpus. In this paper, we present a novel way of creating N-gram language models using weighted finite automata. The construction of these models is formalised within the algebra underlying weighted finite automata and expressed in terms of weighted rational languages and transductions. Besides the algebra, we make use of five special constant weighted transductions that depend only on the alphabet and the model parameter N. In addition, we discuss efficient implementations of these transductions in terms of virtual constructions.

Item Type: Article
Journal or Publication Title: Acta cybernetica
Date: 2009
Volume: 19
Number: 2
ISSN: 0324-721X
Page Range: pp. 313-356
Language: English
Place of Publication: Szeged
Event Title: Weighted Automata: Theory and Applications (2008), Dresden
Related URLs: http://acta.bibl.u-szeged.hu/38528/
Uncontrolled Keywords: Computing, Cybernetics
Additional Information: Bibliography: pp. 346-349; summary in English
Subjects: 01. Natural sciences
01. Natural sciences > 01.02. Computer and information sciences
Date Deposited: 2016. Oct. 15. 12:25
Last Modified: 2022. Jun. 17. 09:03
URI: http://acta.bibl.u-szeged.hu/id/eprint/12868
