EE 619: Advanced Topics in Automatic Speech Recognition
Spring 2013

Welcome to EE 619: Advanced Topics in Automatic Speech Recognition.

This course will help you learn how modern automatic speech recognition (ASR) systems are built and how they work. The emphasis will be on statistical methods and modeling techniques. You will learn about Hidden Markov Models (HMMs) as generative models for speech (including HMM training, evaluation, and decoding algorithms), acoustic modeling using HMMs, front-end processing for robustness, statistical language models, and dialog modeling. Finally, you will see how these techniques can be brought together to construct complex, useful applications, such as speech translation systems, multimodal information processing, and even speech synthesis. Course details follow.

Course Goals: Theory, Practice and Research in ASR and speech processing
Meeting Times: Friday 12:00-2:50pm
Location: TBD (to be determined)

Instructor:
Shri Narayanan
Office hours: Wednesday 9:00-11:00am
Location: EEB 430
E-mail: shri@sipi.usc.edu

Teaching Assistant: Jangwon Kim
Office hours: Thursday 3-5pm
Location: EEB 413
E-mail: jangwon@usc.edu

Pre-requisites
        - Probability (EE 464) and Speech Processing (EE 519).
        - Knowledge of text-processing tools such as sed, and of either Perl or Python, will help a lot.
        - Familiarity with the Linux/UNIX environment will be a definite plus.


Syllabus

0. Engineering spoken language systems - overview and problems
        - Review of probability, estimation, and information theory

1. Overview of speech recognition problem
        - Front-end signal processing, acoustic and language modeling, and search
        - Background Material: Probability, Statistics, Information Theory, Pattern Recognition

2. Hidden Markov Models for speech recognition
        - Search for the most likely state sequence (Viterbi decoding; see the sketch after this list)
        - Parameter estimation for HMMs
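
To make the decoding step concrete, here is a minimal Python sketch of Viterbi decoding on a made-up two-state, discrete-observation HMM. The states, probabilities, and observation symbols are illustrative assumptions only, not course material; real recognizers use continuous-density models and far larger state spaces.

        # Hedged sketch: Viterbi decoding for a toy discrete-observation HMM.
        # All model parameters below are made up for illustration.
        import math

        states = ["S1", "S2"]
        start = {"S1": 0.6, "S2": 0.4}
        trans = {"S1": {"S1": 0.7, "S2": 0.3},
                 "S2": {"S1": 0.4, "S2": 0.6}}
        emit  = {"S1": {"a": 0.5, "b": 0.5},
                 "S2": {"a": 0.1, "b": 0.9}}

        def viterbi(obs):
            """Return the most likely state sequence and its log probability."""
            # delta[s] = best log score of any path ending in state s after the
            # current frame; backpointers record which predecessor achieved it.
            delta = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
            backpointers = []
            for o in obs[1:]:
                new_delta, back = {}, {}
                for s in states:
                    # pick the predecessor state that maximizes the path score
                    prev = max(states, key=lambda p: delta[p] + math.log(trans[p][s]))
                    new_delta[s] = delta[prev] + math.log(trans[prev][s]) + math.log(emit[s][o])
                    back[s] = prev
                delta = new_delta
                backpointers.append(back)
            # backtrace from the best final state
            last = max(states, key=lambda s: delta[s])
            path = [last]
            for back in reversed(backpointers):
                path.append(back[path[-1]])
            path.reverse()
            return path, delta[last]

        print(viterbi(["a", "b", "b"]))   # e.g. (['S1', 'S2', 'S2'], <log probability>)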

3. Acoustic modeling
        - Context-dependent and context-independent models
        - Pronunciation modeling

4. Implementing a continuous speech recognizer
        - The speech decoder: beam search (a pruning sketch follows this list) and a review of state-of-the-art performance
        - Search techniques
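
As a taste of how beam search keeps decoding tractable, the sketch below prunes, after each frame, every partial hypothesis whose log score falls more than a fixed beam below the current best. Representing hypotheses as a dict from state to log score, and the beam width of 10, are simplifying assumptions for illustration.

        # Hedged sketch of frame-synchronous beam pruning: keep only hypotheses
        # whose log score is within `beam` of the best score in this frame.
        def prune(hyps, beam=10.0):
            """hyps maps a state (or partial-path) id to its log score."""
            if not hyps:
                return hyps
            best = max(hyps.values())
            return {s: score for s, score in hyps.items() if score >= best - beam}

        # With a beam of 10, the hypothesis scoring -25.0 is pruned away.
        print(prune({"s1": -12.0, "s2": -14.5, "s3": -25.0}, beam=10.0))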

5. Robust speech recognition
        - Front-end signal processing methods: cepstral mean subtraction (sketched after this list), perceptually motivated approaches
        - Parallel model combination techniques
        - Vocal-tract length normalization
        - Supervised and unsupervised adaptation (e.g., LDA, MLLR)
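
Cepstral mean subtraction is simple enough to show directly. The sketch below is an illustrative assumption rather than the exact recipe used in class: it subtracts the per-utterance mean of each cepstral coefficient, which suppresses stationary convolutional channel effects that become additive in the cepstral domain.

        # Hedged sketch of per-utterance cepstral mean subtraction (CMS).
        import numpy as np

        def cepstral_mean_subtraction(cepstra):
            """cepstra: (num_frames, num_coeffs) array of cepstral features."""
            return cepstra - cepstra.mean(axis=0, keepdims=True)

        # Random values stand in for real MFCCs; the constant offset mimics a
        # stationary channel and is removed by the normalization.
        feats = np.random.randn(300, 13) + 5.0
        normalized = cepstral_mean_subtraction(feats)
        print(normalized.mean(axis=0))   # approximately zero for every coefficient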

6. Language modeling
        - Languages and grammars
        - Stochastic language models (N-grams; a bigram sketch follows this list)
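
For a flavor of N-gram estimation, here is a bigram model with add-one (Laplace) smoothing trained on a two-sentence toy corpus. The corpus and the smoothing choice are illustrative assumptions; practical systems train on much larger corpora and use better smoothing such as Kneser-Ney.

        # Hedged sketch: bigram language model with add-one smoothing on a toy corpus.
        from collections import Counter

        corpus = [["<s>", "show", "me", "flights", "</s>"],
                  ["<s>", "show", "me", "fares", "</s>"]]

        unigrams = Counter(w for sent in corpus for w in sent)
        bigrams = Counter((sent[i], sent[i + 1]) for sent in corpus
                          for i in range(len(sent) - 1))
        vocab_size = len(unigrams)

        def bigram_prob(prev, word):
            """P(word | prev) with add-one (Laplace) smoothing."""
            return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

        print(bigram_prob("show", "me"))      # seen bigram   -> 3/8
        print(bigram_prob("show", "fares"))   # unseen bigram -> 1/8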

7. Natural language understanding
        - Parsing and robust parsing
        - Semantic representation for machine understanding

8. Spoken dialog systems
        - Design principles and dialog system types
        - Architecture issues
        - Mathematical modeling of spoken dialog systems
        - The Markov decision process model
        - Evaluation and testing of spoken dialog systems

9. Speaker recognition

10. Additional application topics:
        - Audio-visual speech/speaker recognition
        - Distributed speech recognition
        - Speech-to-speech translation
        - Speech recognition techniques (especially HMMs) for speech synthesis
        - Music transcription


Course Materials

a) Reference texts:

        * Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, X. Huang, A. Acero, and H.-W. Hon, Prentice Hall, 2001
        * The HTK Book, S. Young et al., Cambridge University; download from http://htk.eng.cam.ac.uk/docs/docs.html (registration required)
        * Fundamentals of Speech Recognition, L. Rabiner and B.-H. Juang, Prentice Hall, 1993
        * Statistical Methods for Speech Recognition, F. Jelinek, MIT Press, 1997
        * Speech and Language Processing, D. Jurafsky and J. Martin, Prentice Hall, 2000
        * Automatic Speech and Speaker Recognition: Advanced Topics, C.-H. Lee, F. K. Soong, and K. K. Paliwal (eds.), Kluwer, 1996

b) Other References

Conference Proceedings

        * ICASSP: International Conference on Acoustics, Speech and Signal Processing
        * Interspeech: merger of the International Conference on Spoken Language Processing (ICSLP) and Eurospeech (formerly each held biennially)
        * ASRU: Automatic Speech Recognition and Understanding Workshop
        * Workshops of ISCA (International Speech Communication Association, formerly ESCA) and DARPA

Journals

        * IEEE Transactions on Speech and Audio Processing
        * IEEE Transactions on Pattern Analysis and Machine Intelligence
        * Speech Communication
        * IEEE Transactions on Multimedia
        * Computer Speech and Language
        * Computational Linguistics


Open Source Software

        * HTK (HMM Toolkit), http://htk.eng.cam.ac.uk/, Cambridge University
        * Sphinx, http://www.speech.cs.cmu.edu/speech/sphinx, CMU
        * Kaldi, http://kaldi.sourceforge.net/about.html


Course Outline

The course will be organized in a seminar style, and some sessions may be devoted to hands-on lab work. There will be no homework; instead, one mini-project-style assignment will be given to familiarize you with building a speech recognizer using the HTK toolkit. In addition to this assignment, you are expected to carry out a term project with the Kaldi speech recognition toolkit. The project is expected to be fairly comprehensive; a list of possible ideas will be handed out within the first few weeks. The project requires a midterm progress presentation as well as a final report and presentation. Dates to note:

Midterm presentation: TBD
Final presentation: TBD

There will be no other exams.

Grading Policy: Class assignment 25%, course project 75%

Last updated: 16 January 2013