Forensic authorship classification by paragraph vectors of speech transcriptions

Sztahó Dávid and Beke András and Szaszák György and Fejes Attila: Forensic authorship classification by paragraph vectors of speech transcriptions.

Full text: msznykonf_018_271-279.pdf (Article, study, work; 458 kB)


In forensic comparison, document classification techniques are used mainly for authorship classification and author profiling. In the present study, we aim to introduce paragraph vector modelling (Doc2Vec) into the likelihood-ratio framework of forensic evidence comparison. Transcriptions of spontaneous speech recordings are used as input to train the paragraph vector extraction model. Logistic regression models are trained on the cosine distances of paragraph vector pairs to predict the probability that the two transcriptions come from the same author or from different authors. Results are evaluated separately for each speaking style (that is, for the transcriptions of each speech task available in the dataset). The Cllr and equal error rate (EER) values (the lowest being 0.47 and 0.11, respectively) indicate that the method can be useful as a feature for forensic authorship comparison and may extend voice comparison methods for speaker verification.
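The abstract outlines a scoring chain: cosine distances between paragraph-vector pairs are mapped by logistic regression to same-author versus different-author probabilities, and the resulting system is evaluated with Cllr and EER. The sketch below illustrates that chain in plain Python under several assumptions. The paragraph vectors themselves would come from a trained Doc2Vec model (for example, gensim's `Doc2Vec.infer_vector`), which is not reproduced here. The toy distances, the gradient-descent fitting routine and all function names are hypothetical. Converting the logistic regression output into a log-likelihood ratio by subtracting the training prior log-odds is a standard calibration step assumed here, not a detail taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two paragraph vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def fit_logreg_1d(dists, labels, lr=0.5, epochs=5000):
    """Fit P(same author | d) = sigmoid(w*d + b) by batch gradient descent
    on the mean log loss. Labels: 1 = same author, 0 = different authors.
    (Hypothetical helper; any logistic regression solver would do.)"""
    w, b = 0.0, 0.0
    n = len(dists)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for d, y in zip(dists, labels):
            err = sigmoid(w * d + b) - y
            grad_w += err * d
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def fit_llr_calibrator(d_same, d_diff):
    """Return a function mapping a cosine distance to a natural-log LR.
    Assumption: the training prior log-odds is subtracted so the output is
    a likelihood ratio rather than a posterior log-odds."""
    dists = d_same + d_diff
    labels = [1] * len(d_same) + [0] * len(d_diff)
    w, b = fit_logreg_1d(dists, labels)
    prior_log_odds = math.log(len(d_same) / len(d_diff))
    return lambda d: w * d + b - prior_log_odds


def cllr(llr_same, llr_diff):
    """Log-likelihood-ratio cost in bits, from natural-log LRs:
    0.5 * (mean log2(1 + 1/LR) over same-author pairs
           + mean log2(1 + LR) over different-author pairs)."""
    cost_same = sum(softplus(-l) for l in llr_same) / len(llr_same)
    cost_diff = sum(softplus(l) for l in llr_diff) / len(llr_diff)
    return 0.5 * (cost_same + cost_diff) / math.log(2)


def eer(scores_same, scores_diff):
    """Equal error rate, found by sweeping a threshold over all observed
    scores (decide 'same author' when score >= threshold)."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(scores_same + scores_diff):
        frr = sum(s < t for s in scores_same) / len(scores_same)
        far = sum(s >= t for s in scores_diff) / len(scores_diff)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Hypothetical toy cosine distances, not data from the paper.
    d_same = [0.10, 0.20, 0.25, 0.30, 0.50]
    d_diff = [0.35, 0.60, 0.70, 0.80, 0.90]
    to_llr = fit_llr_calibrator(d_same, d_diff)
    llr_same = [to_llr(d) for d in d_same]
    llr_diff = [to_llr(d) for d in d_diff]
    print("Cllr:", round(cllr(llr_same, llr_diff), 3))
    print("EER:", eer(llr_same, llr_diff))
```

On this toy data one same-author pair (distance 0.50) is farther apart than one different-author pair (0.35), so the EER comes out at 0.2. Cllr additionally penalizes poorly calibrated LR magnitudes, which is why the likelihood-ratio framework reports both measures: an uninformative system that always outputs LR = 1 scores exactly 1 bit.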

Item Type: Conference or Workshop Item
Heading title: Applications
Journal or Publication Title: Magyar Számítógépes Nyelvészeti Konferencia
Date: 2022
Volume: 18
ISBN: 978-963-306-848-9
Page Range: pp. 271-279
Language: English
Place of Publication: Szeged
Event Title: Magyar számítógépes nyelvészeti konferencia (18.) (2022) (Szeged)
Uncontrolled Keywords: Linguistics - computer applications
Additional Information: Bibliography: p. 279; illustrated; abstract in English
Subjects: 01. Natural sciences
01. Natural sciences > 01.02. Computer and information sciences
06. Humanities
06. Humanities > 06.02. Languages and Literature
Date Deposited: 25 May 2022, 10:30
Last Modified: 8 Nov 2022, 11:49
