Speech production, acoustics, perception, synthesis, compression, recognition, transmission. Coding for speech, music, and CD-quality audio. Feature extraction. Echo cancellation. Audio-visual synchronization. Multimedia, Internet use.
DSP (EE483 or equivalent).
Knowledge of linear systems (EE482) and probability (EE464) desired, but not required.
This course will help you learn how modern automatic speech recognition (ASR) systems are built and how they work. The emphasis will be on statistical methods and modeling techniques. You will learn about Hidden Markov Models (HMMs) as generative models for speech (including HMM training, evaluation, and decoding algorithms), acoustic modeling using HMMs, front-end processing for robustness, statistical language models, and dialogue modeling. Finally, you will see how these techniques can be brought together to construct complex, useful applications, such as speech translation systems, multimodal information processing, and even speech synthesis.
EE519 or equivalent.
Probability (EE464); Stochastic Processes (EE562a).
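For readers unfamiliar with the HMM evaluation and decoding algorithms named in the course description above, the following is a minimal sketch in Python, not course material. It assumes a toy discrete HMM: the state names, symbols, and all probabilities are invented for illustration, and the function names `forward` and `viterbi` are hypothetical. Evaluation is shown with the forward algorithm and decoding with the Viterbi algorithm; training (for example, Baum-Welch) is omitted.

```python
# Toy discrete HMM. All states, symbols, and probabilities below are
# made-up illustration values, not taken from any real speech model.
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3},
         "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"a": 0.5, "b": 0.5},
        "s2": {"a": 0.1, "b": 0.9}}


def forward(obs):
    """Evaluation: P(obs | model), summed over all hidden state paths."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())


def viterbi(obs):
    """Decoding: the single most likely state path and its joint probability."""
    # delta[s] = probability of the best path that ends in state s
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    backptrs = []
    for o in obs[1:]:
        prev = delta
        # For each state, remember which predecessor gives the best score.
        best_prev = {s: max(states, key=lambda r: prev[r] * trans[r][s])
                     for s in states}
        delta = {s: prev[best_prev[s]] * trans[best_prev[s]][s] * emit[s][o]
                 for s in states}
        backptrs.append(best_prev)
    # Trace back from the best final state to recover the path.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    observations = ["a", "b", "b"]
    print("P(obs):", forward(observations))
    print("best path:", viterbi(observations))
```

Practical recognizers typically work with log probabilities and continuous acoustic features instead of the raw products and discrete symbols used here; the sketch keeps plain probabilities only for readability.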