
EE 619: Advanced Topics in Automatic Speech Recognition
Spring 2006



Welcome to EE 619: Advanced Topics in Automatic Speech Recognition.

This course will help you learn how modern automatic speech recognition systems (ASRs) are built and how they work. The emphasis will be on statistical methods and modeling techniques. You will learn about Hidden Markov Models as generative models for speech (including HMM training, evaluation, and decoding algorithms), acoustic modeling using HMMs, front end processing for robustness, statistical language models, and dialogue modeling. Finally, you will see how these techniques can be brought together to construct complex, useful applications, such as speech translation systems, multimodal information processing, and even speech synthesis. Course details follow.

Course Goals: Theory, Practice and Research in ASR and speech processing
Meeting Times: Friday 2:00-4:50 pm
Location: RTH 109

Instructor:
Shri Narayanan
Office hours: Wednesday 9:00-11:00 am
Location: EEB 430
E-mail: shri[at]sipi.usc.edu

Teaching Assistant: Shankar Ananthakrishnan
Office hours: Monday 2:00-4:00 pm
Location: RTH 320
E-mail: ananthak[at]usc.edu

Pre-requisites
        - Probability (EE 464) and Speech Processing (EE 519).
        - Knowledge of text-processing tools such as sed or Perl will help a lot.
        - Familiarity with Linux / UNIX environment will be a definite plus.


Syllabus

0. Engineering spoken language systems - overview and problems
        - Review probability, estimation and information theory

1. Overview of speech recognition problem
        - Front-end signal processing, acoustic and language modeling, and search
        - Background Material: Probability, Statistics, Information Theory, Pattern Recognition

2. Hidden Markov Models for speech recognition
        - Search for the most likely state sequence
        - Parameter estimation for HMMs

3. Acoustic modeling
        - Context-dependent and context-independent models
        - Pronunciation modeling

4. Implementing a continuous speech recognizer
        - The speech decoder: beam search, review of the state-of-the-art in performance
        - Search techniques

5. Robust speech recognition
        - Front end signal processing methods: cepstral subtraction, perceptually motivated approaches
        - Parallel model combination techniques
        - Vocal-tract normalization
        - Supervised and unsupervised adaptation (LDA, MLLR, etc.)

6. Language modeling
        - Languages and grammars
        - Stochastic language models (N-grams)

7. Natural language understanding
        - Parsing and robust parsing
        - Semantic representation for machine understanding

8. Spoken dialog systems
        - Design principles and dialog system types
        - Architecture issues
        - Mathematical modeling of spoken dialog systems
        - The Markov decision process model
        - Evaluation and testing of spoken dialog systems

9. Speaker recognition

10. Additional application topics:
        - Audio-visual speech/speaker recognition
        - Distributed speech recognition
        - Speech-to-speech translation
        - Speech recognition techniques (especially HMM) for speech synthesis
        - Music transcription


Course Materials

a) Reference texts:

        * Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, X. Huang, A. Acero, H-W. Hon, Prentice Hall, 2001
        * Cambridge University HTK Book, Steve Young et al. Download from http://htk.eng.cam.ac.uk/docs/docs.html (registration required)
        * Fundamentals of Speech Recognition, Rabiner and Juang, Prentice Hall, 1993
        * Statistical Methods for Speech Recognition, F. Jelinek, MIT Press, 1997
        * Speech and Language Processing, Jurafsky and Martin, Prentice Hall, 2000
        * Automatic Speech and Speaker Recognition: Advanced Topics, Chin-Hui Lee, Frank Soong, K. Paliwal, Kluwer, 1996

b) Other References

Proceedings of

        * ICASSP: International Conference on Acoustics, Speech and Signal Processing
        * ICSLP: International Conference on Spoken Language Processing
        * EuroSpeech Conference
        * ASRU: Automatic Speech Recognition and Understanding Workshop
        * Workshops of ESCA (European Speech Communication Association), DARPA

Journals

        * IEEE Transactions on Speech and Audio Processing
        * IEEE Transactions on Pattern Analysis and Machine Intelligence
        * Speech Communication Journal
        * IEEE Transactions on Multimedia
        * Computer Speech and Language
        * Computational Linguistics


Open Source Software

        * HTK (HMM Toolkit), http://htk.eng.cam.ac.uk/, Cambridge University
        * Sphinx, http://www.speech.cs.cmu.edu/speech/sphinx, CMU


Course Outline

The course will be organized in a seminar style, and some sessions may be hands-on lab work. There will be no homework, but one mini-project-style assignment will be given to help you become familiar with building a speech recognizer using HTK. In addition to this assignment, you are expected to work on a fairly comprehensive term project; a list of possible ideas will be handed out within the first few weeks. The project requires a midterm progress-report presentation, and a final report and presentation. Dates to note:

Midterm presentation: March 3, 2006
Final presentation: April 28, 2006

There will be no other exams.

Grading Policy: Class assignment 25%, course project 75%


Last updated: 13 January 2006