Benjamin van der Woerd, Zhuohao Chen, Nikolaos Flemotomos, Maria Oljaca, Lauren Timmons Sund, Shrikanth Narayanan, and Michael M. Johns. A Machine-Learning Algorithm for the Automated Perceptual Evaluation of Dysphonia Severity. Journal of Voice, 2023.

Abstract

Objectives
Auditory-perceptual assessments are the gold standard for assessing voice quality. This project aims to develop a machine-learning model for measuring perceptual dysphonia severity of audio samples consistent with assessments by expert raters.

Methods
The Perceptual Voice Qualities Database samples were used, including sustained vowel and Consensus Auditory-Perceptual Evaluation of Voice sentences, which were previously expertly rated on a 0–100 scale. The OpenSMILE (audEERING GmbH, Gilching, Germany) toolkit was used to extract acoustic (Mel-Frequency Cepstral Coefficient-based, n = 1428) and prosodic (n = 152) features, pitch onsets, and recording duration. We utilized a support vector machine and these features (n = 1582) for automated assessment of dysphonia severity. Recordings were separated into vowels (V) and sentences (S) and features were extracted separately from each. Final voice quality predictions were made by combining the features extracted from the individual components with the whole audio (WA) sample (three file sets: S, V, WA).

Results
This algorithm has a high correlation (r = 0.847) with estimates of expert raters. The root mean square error was 13.36. Increasing signal complexity resulted in better estimation of dysphonia, whereby combining the features outperformed WA, S, and V sets individually.

Conclusion
A novel machine-learning algorithm was able to perform perceptual estimates of dysphonia severity using standardized audio samples on a 100-point scale. This was highly correlated to expert raters. This suggests that ML algorithms could offer an objective method for evaluating voice samples for dysphonia severity.

Level of Evidence
4
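The modeling setup described in the Methods (per-recording feature vectors from the S, V, and WA file sets concatenated and fed to a support vector machine, evaluated against 0–100 expert ratings with Pearson r and RMSE) can be sketched roughly as below. This is a minimal illustration, not the authors' pipeline: the feature matrices here are random placeholders standing in for OpenSMILE functionals, and the SVR kernel and hyperparameters are assumptions.

```python
# Hedged sketch: support vector regression on concatenated S/V/WA feature
# sets, scored with Pearson correlation and RMSE as in the abstract.
import numpy as np
from scipy.stats import pearsonr
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Placeholder feature matrices standing in for the n = 1582 OpenSMILE
# features extracted from sentence (S), vowel (V), and whole-audio (WA) files.
n_samples = 200
feats_S = rng.normal(size=(n_samples, 1582))
feats_V = rng.normal(size=(n_samples, 1582))
feats_WA = rng.normal(size=(n_samples, 1582))
severity = rng.uniform(0, 100, size=n_samples)  # expert ratings, 0-100 scale

# Combine the three file sets into one feature vector per recording.
X = np.hstack([feats_S, feats_V, feats_WA])

# Standardize, then fit an SVR (kernel choice is an assumption).
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X[:150], severity[:150])

# Predict held-out samples, clipped to the 100-point rating scale.
pred = np.clip(model.predict(X[150:]), 0, 100)

r, _ = pearsonr(pred, severity[150:])
rmse = float(np.sqrt(np.mean((pred - severity[150:]) ** 2)))
print(f"Pearson r = {r:.3f}, RMSE = {rmse:.2f}")
```

On real data, the reported r = 0.847 and RMSE = 13.36 came from the combined S + V + WA feature set; with the random placeholders here the scores are meaningless and serve only to show the evaluation plumbing.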

BibTeX Entry

@article{VANDERWOERD2023,
title = {A Machine-Learning Algorithm for the Automated Perceptual Evaluation of Dysphonia Severity},
journal = {Journal of Voice},
year = {2023},
issn = {0892-1997},
doi = {10.1016/j.jvoice.2023.06.006},
url = {https://www.sciencedirect.com/science/article/pii/S0892199723001790},
author = {Benjamin {van der Woerd} and Zhuohao Chen and Nikolaos Flemotomos and Maria Oljaca and Lauren Timmons Sund and Shrikanth Narayanan and Michael M. Johns},
link = {http://sail.usc.edu/publications/files/ML-voice-JVoice2023.pdf},
keywords = {Machine learning, Voice evaluation, Perceptual voice evaluation, Automation, Artificial intelligence},
abstract = {Summary
Objectives
Auditory-perceptual assessments are the gold standard for assessing voice quality. This project aims to develop a machine-learning model for measuring perceptual dysphonia severity of audio samples consistent with assessments by expert raters.
Methods
The Perceptual Voice Qualities Database samples were used, including sustained vowel and Consensus Auditory-Perceptual Evaluation of Voice sentences, which were previously expertly rated on a 0–100 scale. The OpenSMILE (audEERING GmbH, Gilching, Germany) toolkit was used to extract acoustic (Mel-Frequency Cepstral Coefficient-based, n = 1428) and prosodic (n = 152) features, pitch onsets, and recording duration. We utilized a support vector machine and these features (n = 1582) for automated assessment of dysphonia severity. Recordings were separated into vowels (V) and sentences (S) and features were extracted separately from each. Final voice quality predictions were made by combining the features extracted from the individual components with the whole audio (WA) sample (three file sets: S, V, WA).
Results
This algorithm has a high correlation (r = 0.847) with estimates of expert raters. The root mean square error was 13.36. Increasing signal complexity resulted in better estimation of dysphonia, whereby combining the features outperformed WA, S, and V sets individually.
Conclusion
A novel machine-learning algorithm was able to perform perceptual estimates of dysphonia severity using standardized audio samples on a 100-point scale. This was highly correlated to expert raters. This suggests that ML algorithms could offer an objective method for evaluating voice samples for dysphonia severity.
Level of Evidence
4}
}

Generated by bib2html.pl (written by Patrick Riley) on Sat Oct 28, 2023 08:04:24