INTRODUCTION
US federal agencies are seeking new technologies to support multinational, multilingual operations.
One significant challenge in such operations is cross-language communication between speakers of
different native languages. DARPA and other DoD research groups envision enabling cross-language
communication via computer-mediated translation of speech between the native languages of the speakers.
The University of Southern California will develop Transonics Solutions prototypes enabling human-human
communication via automated two-way speech-to-speech language translation of English-Farsi and English-Dari.
TRANSONICS SOLUTIONS
The Transonics translation device operates by recognizing the user's
speech input and converting it into text. The text is then
translated and synthesized as speech in the target language. The
following is a more detailed description of the system's
functionality:
The Automatic Speech Recognition (ASR) subsystem produces n-best
lists along with decoding confidence scores. The ASR operates in
real time in two languages: English and Persian (Farsi). The English
ASR operates on a vocabulary of over 22,000 words and the Persian ASR
on over 9,000 words. The ASR uses models of the acoustic patterns of
human speech, trained on example recordings, and statistical knowledge
about the structure of the language, trained on example transcripts.
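As a rough illustration of statistical language knowledge trained from
example transcripts, the sketch below fits a bigram model with add-one
smoothing. The bigram form, the smoothing, and the names train_bigram
and log_prob are assumptions made for illustration; the description
above does not say which kind of language model the ASR uses.

```python
import math
from collections import Counter


def train_bigram(transcripts):
    """Count word contexts and word pairs in example transcripts."""
    contexts, pairs = Counter(), Counter()
    vocab = {"</s>"}
    for line in transcripts:
        words = ["<s>"] + line.lower().split() + ["</s>"]
        vocab.update(words[1:])
        contexts.update(words[:-1])               # every word that has a successor
        pairs.update(zip(words[:-1], words[1:]))  # adjacent word pairs
    return contexts, pairs, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    contexts, pairs, vocab_size = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((pairs[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(words[:-1], words[1:])
    )


model = train_bigram(["do you have a headache", "do you have a fever"])
```

With such a model, a word order seen in the transcripts scores higher
than a scrambled one, which is the kind of evidence a recognizer can
use to rank competing hypotheses.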
The Dialog Manager (DM) acts as the heart of the system, routing
messages between the other components. Every output of the speech
recognizer, tagged with a time stamp, serial number, and other
important information, is received by the DM and is usually forwarded
to the Graphical User Interface for display and to the Machine
Translation unit for translation. However, the DM also has the option
of rejecting recognition results based on several factors, such as the
confidence reported by the ASR.
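The routing behavior described above can be sketched as follows. This
is a minimal illustration, not the actual Transonics code: the field
names, the callback interface, and the 0.5 confidence threshold are all
hypothetical, and confidence is the only rejection factor modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RecognitionResult:
    """One ASR output message (field names are hypothetical)."""
    serial: int
    timestamp: float
    language: str
    nbest: List[Tuple[str, float]]  # (hypothesis, confidence), best first


class DialogManager:
    def __init__(self,
                 show: Callable[[str], None],
                 translate: Callable[[str, str], None],
                 min_confidence: float = 0.5):  # assumed threshold
        self.show = show
        self.translate = translate
        self.min_confidence = min_confidence

    def handle(self, result: RecognitionResult) -> bool:
        """Forward the top hypothesis, or reject it; return True if forwarded."""
        if not result.nbest:
            return False
        text, confidence = result.nbest[0]
        if confidence < self.min_confidence:
            return False                       # rejected: ASR confidence too low
        self.show(text)                        # display on the GUI
        self.translate(text, result.language)  # hand over to the MT unit
        return True
```

Passing the GUI and MT unit in as callbacks keeps the sketch
self-contained; a real system would exchange messages between separate
components.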
The Machine Translation (MT) unit operates in two modes. The
Classifier attempts to assign a concept to an utterance; for
example, a sentence such as "Umm and do you have any headaches" would
become the concept "Do you have a headache". The classifier performs
accurately on in-domain utterances, that is, the sentences that the
system was trained exhaustively to work with (about 1200 utterances).
Out-of-domain utterances are instead translated by the Statistical
Machine Translation (SMT) unit. The SMT breaks the sentence into
segments and translates it piece by piece, reconstructing the result
in the target language.
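The choice between the two modes can be sketched as a classifier with a
fallback. The word-overlap (Jaccard) score below is only a toy stand-in
for the real classifier, and the 0.3 threshold, the function names, and
the placeholder translation strings are all assumptions for
illustration.

```python
import re


def tokens(text):
    """Lowercased word set of an utterance."""
    return set(re.findall(r"[a-z']+", text.lower()))


def classify(utterance, concepts):
    """Return (best concept, overlap score); toy stand-in for the classifier."""
    words = tokens(utterance)
    best, best_score = None, 0.0
    for concept in concepts:
        concept_words = tokens(concept)
        union = words | concept_words
        score = len(words & concept_words) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = concept, score
    return best, best_score


def translate(utterance, concept_translations, smt, threshold=0.3):
    """Use the concept translation when in domain, otherwise fall back to SMT."""
    concept, score = classify(utterance, concept_translations)
    if concept is not None and score >= threshold:
        return "classifier", concept_translations[concept]
    return "smt", smt(utterance)


# Placeholder strings stand in for the stored target-language translations.
concepts = {
    "Do you have a headache": "[translation of: Do you have a headache]",
    "Where does it hurt": "[translation of: Where does it hurt]",
}
mode, output = translate("Umm and do you have any headaches", concepts,
                         smt=lambda text: "[SMT output for: " + text + "]")
```

Here the example utterance from the text shares enough words with the
stored concept to be handled by the classifier path, while an unrelated
sentence would be passed to the smt callable.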
Finally, a unit-selection-based Text To Speech (TTS) synthesizer
provides the spoken output. The TTS operates in several modes. For
utterances produced by the classifier, a natural-sounding human
recording of the utterance is played. If the utterance was translated
statistically, the synthesis unit drops from the utterance level to
the word level and a concatenation of recorded words is played.
Finally, for words with no human recordings, synthesis of the
individual words, and of the sentence, takes place at the diphone
level.
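The back-off between synthesis levels can be sketched as a small
planning function. The function name and the set-based lookup are
hypothetical, and the sketch assumes that only the words lacking a
recording drop to the diphone level, which the description above
leaves open. English text is used purely for readability.

```python
def plan_synthesis(text, utterance_recordings, word_recordings):
    """Pick the largest available unit for each part of the output.

    Returns a list of (level, text) pairs, where level is one of
    "utterance", "word", or "diphone".
    """
    if text in utterance_recordings:
        return [("utterance", text)]      # play the whole human recording
    plan = []
    for word in text.split():
        if word in word_recordings:
            plan.append(("word", word))     # concatenate recorded words
        else:
            plan.append(("diphone", word))  # synthesize from diphones
    return plan


utterances = {"do you have a headache"}
words = {"do", "you", "have", "a", "fever"}
plan = plan_synthesis("do you have a rash", utterances, words)
```

In this example the first four words are served from word recordings
and only "rash" falls back to diphone synthesis.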
PEOPLE
USC Faculty
USC Post-Doctors and Students
HRL Researchers
- Robert Belvin
- Shubha Kadambe
- Howard Neely
PAPERS
- JongHo Shin, Panayiotis G. Georgiou, and Shrikanth Narayanan, A Study of User Modeling in Machine Mediated Spoken Interactions, Submitted to ASRU 2005
- Abhinav Sethy, Panayiotis Georgiou, and Shrikanth Narayanan. Building topic specific language models from webdata using competitive models. In Proc. of EUROSPEECH, Interspeech, Lisbon, Portugal, 2005. PDF
- Dagen Wang and Shrikanth Narayanan. Piecewise linear stylization of pitch via wavelet analysis. In Proc. of EUROSPEECH, Interspeech, Lisbon, Portugal, 2005. PDF
- Dagen Wang and Shrikanth Narayanan. Speech rate estimation via temporal correlation and selected sub-band correlation. In Proc. ICASSP, Philadelphia, PA, March 2005. PDF
- Dagen Wang and Shrikanth Narayanan. An unsupervised quantitative measure for word prominence in spontaneous speech. In Proc. ICASSP, Philadelphia, PA, March 2005. PDF
- Shankar Ananthakrishnan and Shrikanth Narayanan. An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic-prosodic language model. In Proc. ICASSP, Philadelphia, PA, March 2005. PDF
- Joseph Tepperman and Shrikanth Narayanan. Automatic syllable stress detection using prosodic features for pronunciation evaluation of language learners. In Proc. ICASSP, Philadelphia, PA, March 2005. PDF
- Emil Ettelaie, Sudeep Gandhe, Panayiotis Georgiou, Scott Millward, Kevin Knight, Daniel Marcu, Shrikanth Narayanan, David Traum, Robert Belvin, and Howard E. Neely. Transonics: A practical speech-to-speech translator for english-farsi medical dialogues. In Proc. ACL, Ann Arbor, MI, 2005.
- Abhinav Sethy and S. Narayanan. Measuring convergence in language model estimation using relative entropy. In Proceedings of ICSLP, Jeju, Korea, October 2004. PS
- Panayiotis G. Georgiou, Hooman Shirani Mehr, and Shrikanth S. Narayanan. Context dependent statistical augmentation of persian transcripts. In Proceedings of ICSLP, Jeju, Korea, October 2004.
- Robert Belvin, Win May, Shrikanth Narayanan, Panayiotis Georgiou, and Shadi Ganjavi. Creation of a doctor-patient dialogue corpus using standardized patients. In Proc. LREC, Lisbon, Portugal, 2004.
- S. Narayanan, S. Ananthakrishnan, R. Belvin, E. Ettelaie, S. Gandhe, S. Ganjavi, P. G. Georgiou, C. M. Hein, S. Kadambe, K. Knight, D. Marcu, H. E. Neely, N. Srinivasamurthy, D. Traum, and D. Wang. The transonics spoken dialogue translator: An aid for english-persian doctor-patient interviews. In AAAI Fall Symposium, 2004.
- Dagen Wang and Shrikanth Narayanan. A multi-pass linear fold algorithm for sentence boundary detection using prosodic cues. In Proc. ICASSP, Montreal, Canada, May 2004. PDF
- Shadi Ganjavi, Panayiotis Georgiou, and Shrikanth Narayanan. Ascii based transcription systems with the arabic script: The case of persian. In Proc. IEEE ASRU, St. Thomas, U.S. Virgin Islands, December 2003. PDF
- S. Narayanan, S. Ananthakrishnan, R. Belvin, E. Ettelaie, S. Ganjavi, P. Georgiou, C. Hein, S. Kadambe, K. Knight, D. Marcu, H. Neely, N. Srinivasamurthy, D. Traum, and D. Wang. Transonics: A speech to speech system for english-persian interactions. In Proc. IEEE ASRU, St. Thomas, U.S. Virgin Islands, December 2003. PDF
- Naveen Srinivasamurthy and Shrikanth Narayanan. Language-adaptive persian speech recognition. In Proc. Eurospeech, Geneva, Switzerland, 2003. PDF
SCREENSHOTS
DEMOS
The Transonics 2.0 demo is available for viewing in two versions (4.82 MB and 35.8 MB).
Transonics 3.0 demo is coming soon