Athanasios Katsamanis, Matthew P. Black, Panayiotis Georgiou, Louis Goldstein, and Shrikanth S. Narayanan. SailAlign: Robust long speech-text alignment. In Proc. of Workshop on New Tools and Methods for Very-Large Scale Phonetics Research, pp. 28–31, University of Pennsylvania, January 2011.

Abstract

Long speech-text alignment can facilitate large-scale study of rich spoken language resources that have recently become widely accessible, e.g., collections of audio books, or multimedia documents. For such resources, the conventional Viterbi-based forced alignment may often be proven inadequate mainly due to mismatched audio and text and/or noisy audio. In this paper, we present SailAlign which is an open-source software toolkit for robust long speech-text alignment that circumvents these restrictions. It implements an adaptive, iterative speech recognition and text alignment scheme that allows for the processing of very long (and possibly noisy) audio and is robust to transcription errors. SailAlign is evaluated on artificially created long chunks of the TIMIT database. Audio is artificially contaminated with babble noise, and the corresponding transcriptions are corrupted at various levels. We present the corresponding word boundary detection results. Finally, we demonstrate the potential use of the software for the exploitation of audio books for the study of read speech.

BibTeX Entry

@inproceedings{Katsamanis2011SailAlign:Robustlongspeech-text,
 abstract = {Long speech-text alignment can facilitate large-scale study of rich spoken language resources that have recently become widely accessible, e.g., collections of audio books, or multimedia documents. For such resources, the conventional Viterbi-based forced alignment may often be proven inadequate mainly due to mismatched audio and text and/or noisy audio. In this paper, we present SailAlign which is an open-source software toolkit for robust long speech-text alignment that circumvents these restrictions. It implements an adaptive, iterative speech recognition and text alignment scheme that allows for the processing of very long (and possibly noisy) audio and is robust to transcription errors. SailAlign is evaluated on artificially created long chunks of the TIMIT database. Audio is artificially contaminated with babble noise, and the corresponding transcriptions are corrupted at various levels. We present the corresponding word boundary detection results. Finally, we demonstrate the potential use of the software for the exploitation of audio books for the study of read speech.},
 author = {Katsamanis, Athanasios and Black, Matthew P. and Georgiou, Panayiotis and Goldstein, Louis and Narayanan, Shrikanth S.},
 bib2html_rescat = {speechlinks,span},
 booktitle = {Proc. of Workshop on New Tools and Methods for Very-Large Scale Phonetics Research},
 link = {http://sail.usc.edu/publications/files/35448ecec01603f25919f9302f8f368a15c6.pdf},
 location = {Philadelphia, PA},
 month = {jan},
 pages = {28--31},
 publisher = {University of Pennsylvania},
 title = {SailAlign: Robust long speech-text alignment},
 year = {2011}
}
