TY  - CONF
UR  - http://acta.bibl.u-szeged.hu/78412/
N2  - In this study, cross-lingual binary classification and severity estimation of dysphonic speech were carried out. Hand-crafted acoustic feature extraction was replaced by speaker embedding techniques used in speaker verification. Two state-of-the-art deep learning methods for speaker verification were used: the X-vector and the Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN). Embeddings were extracted from Hungarian and Dutch speech samples and used to train a Support Vector Machine (SVM) for binary classification and a Support Vector Regressor (SVR) for severity estimation, in a cross-lingual manner. When the models were trained on Hungarian samples and evaluated on Dutch samples, our results were competitive with manual feature engineering in the binary classification of dysphonic speech and outperformed it in estimating severity; the model achieved Spearman and Pearson correlations of 0.769 and 0.771, respectively. Our results in both classification and regression were also superior to the manual feature extraction technique when the models were trained on Dutch samples and evaluated on Hungarian samples, where only a limited number of samples was available for training: an accuracy of 86.8% was reached with features extracted by the embedding methods, while the maximum accuracy using hand-crafted acoustic features was 66.8%. Overall, the results show that ECAPA-TDNN performs better than the earlier X-vector in both tasks.
KW  - Linguistics - computer applications
TI  - Cross-lingual dysphonic speech detection using pretrained speaker embeddings
Y1  - 2023///
N1  - Bibliography: p. 182-183; summary in English
T2  - Magyar számítógépes nyelvészeti konferencia (19.)
SN  - 978-963-306-912-7
ID  - acta78412
AV  - public
A1  - Aziz Dosti Ali Hama Salih
A1  - Sztahó Dávid
EP  - 183
CY  - Szeged
M2  - Szeged
VL  - 19
SP  - 171
ER  -
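
A minimal sketch of the pipeline summarized in the abstract, assuming SpeechBrain's pretrained ECAPA-TDNN encoder and scikit-learn's SVM/SVR. The model source, file paths, labels, and severity scores below are illustrative assumptions, not the authors' actual setup or code.

# Illustrative sketch only: extract speaker embeddings with a pretrained
# ECAPA-TDNN encoder and train SVM / SVR models on them, as outlined in
# the abstract. Paths, labels, and the model source are assumptions.
import numpy as np
import torchaudio
from sklearn.svm import SVC, SVR
from speechbrain.pretrained import EncoderClassifier

# Pretrained ECAPA-TDNN speaker encoder (VoxCeleb); expects 16 kHz audio.
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_ecapa",
)

def embed(wav_path):
    """Return a fixed-size speaker embedding for one speech sample."""
    signal, _sr = torchaudio.load(wav_path)
    emb = encoder.encode_batch(signal)            # shape: (1, 1, 192)
    return emb.squeeze().detach().cpu().numpy()   # shape: (192,)

# Hypothetical training data: Hungarian recordings with binary labels
# (0 = healthy, 1 = dysphonic) and clinician-rated severity scores.
train_files = ["hu_001.wav", "hu_002.wav"]        # placeholder paths
y_class = np.array([0, 1])                        # placeholder labels
y_severity = np.array([0.0, 2.5])                 # placeholder severities

X_train = np.stack([embed(f) for f in train_files])

clf = SVC(kernel="rbf").fit(X_train, y_class)     # binary detection
reg = SVR(kernel="rbf").fit(X_train, y_severity)  # severity estimation

# Cross-lingual evaluation: apply the Hungarian-trained models to Dutch samples.
X_test = np.stack([embed(f) for f in ["nl_001.wav"]])   # placeholder path
print(clf.predict(X_test), reg.predict(X_test))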