Prashanth Gurunath Shivakumar and Shrikanth Narayanan. End-to-end neural systems for automatic children speech recognition: An empirical study. Computer Speech & Language, 72:101289, 2022.

Abstract

A key desideratum for inclusive and accessible speech recognition technology is ensuring its robust performance on children’s speech. Notably, this includes the rapidly advancing neural-network-based end-to-end speech recognition systems. Children’s speech recognition is more challenging than adult speech recognition because of the larger intra- and inter-speaker variability in acoustic and linguistic characteristics. Furthermore, the lack of adequate and appropriate children’s speech resources adds to the challenge of designing robust end-to-end neural architectures. This study provides a critical assessment of automatic children’s speech recognition through an empirical study of contemporary state-of-the-art end-to-end speech recognition systems. Insights are provided on training data requirements, adaptation on children’s data, and the effects on speech recognition performance of children’s age, utterance length, different end-to-end architectures and loss functions, and language models.

BibTeX Entry

@article{GURUNATHSHIVAKUMAR2022101289,
title = {End-to-end neural systems for automatic children speech recognition: An empirical study},
journal = {Computer Speech \& Language},
volume = {72},
pages = {101289},
year = {2022},
issn = {0885-2308},
doi = {10.1016/j.csl.2021.101289},
link = {http://sail.usc.edu/publications/files/Shivakumar-CSL2022.pdf},
url = {https://www.sciencedirect.com/science/article/pii/S0885230821000905},
author = {Prashanth {Gurunath Shivakumar} and Shrikanth Narayanan},
keywords = {Children speech recognition, End-to-end speech recognition, Residual network, Time depth separable convolutional network, Transformer},
abstract = {A key desideratum for inclusive and accessible speech recognition technology is ensuring its robust performance on children’s speech. Notably, this includes the rapidly advancing neural-network-based end-to-end speech recognition systems. Children’s speech recognition is more challenging than adult speech recognition because of the larger intra- and inter-speaker variability in acoustic and linguistic characteristics. Furthermore, the lack of adequate and appropriate children’s speech resources adds to the challenge of designing robust end-to-end neural architectures. This study provides a critical assessment of automatic children’s speech recognition through an empirical study of contemporary state-of-the-art end-to-end speech recognition systems. Insights are provided on training data requirements, adaptation on children’s data, and the effects on speech recognition performance of children’s age, utterance length, different end-to-end architectures and loss functions, and language models.}
}

Generated by bib2html.pl (written by Patrick Riley) on Sat Nov 20, 2021 15:31:35