We have developed and use real-time imaging of the vocal tract with synchronized audio recordings, and we apply state-of-the-art image processing and signal analysis techniques to interpret the phonetic data acquired with these technologies. Our aim is to advance scientific knowledge of the articulatory activity that produces speech, which we consider necessary for a full understanding of human communication.
Real-time, interactive MRI requires an entirely new imaging infrastructure that supports real-time reconstruction and display, as well as real-time operator control of parameters such as scan plane and image contrast. Our acquisition pulse sequences are implemented using the RTHawk real-time imaging system developed at Stanford. We are developing accelerated 2D and 3D imaging techniques tailored to the upper airway and to the accurate depiction of air-tissue boundaries during vocal production. We are also involved in the development and validation of new targeted phased-array coils for imaging the upper airway.
- Yoon-Chul Kim, Michael I. Proctor, Shrikanth S. Narayanan, Krishna Nayak, "Improved Imaging of Lingual Articulation Using Real-Time Multislice MRI", Journal of Magnetic Resonance Imaging, vol. 35, no. 4, pp. 943-948, 2012.
- Yoon-Chul Kim, Cecil Hayes, Shrikanth S. Narayanan, Krishna S. Nayak, "A Novel 16-Channel Receive Coil Array for Accelerated Upper Airway MRI at 3 Tesla", Magnetic Resonance in Medicine, vol. 65, no. 6, pp. 1711-1717, 2011.
- Yinghua Zhu, Yoon-Chul Kim, Michael I. Proctor, Shrikanth S. Narayanan, Krishna S. Nayak, "Dynamic 3D visualization of vocal tract shaping during speech", International Society for Magnetic Resonance in Medicine (ISMRM) 19th Scientific Sessions, Montreal, Canada, p. 4355, 2011.
- Yoon-Chul Kim, Shrikanth S. Narayanan, Krishna S. Nayak, "Flexible Retrospective Selection of Temporal Resolution in Real-Time Speech MRI Using a Golden-Ratio Spiral View Order", International Society for Magnetic Resonance in Medicine (ISMRM) 18th Scientific Sessions, Stockholm, Sweden, p. 4967, 2010.
- Erik Bresch, Yoon-Chul Kim, Krishna S. Nayak, Dani Byrd, Shrikanth S. Narayanan, "Seeing speech: Capturing vocal tract shaping using real-time magnetic resonance imaging", IEEE Signal Processing Magazine, vol. 25, no. 3, pp. 123-132, 2008.
- Shrikanth S. Narayanan, Krishna S. Nayak, Sungbok Lee, Abhinav Sethy, Dani Byrd, "An approach to real-time magnetic resonance imaging for speech production", Journal of the Acoustical Society of America, vol. 115, no. 4, pp. 1771-1776, 2004. (This publication has an associated webpage).
MRI is an acoustically noisy imaging modality, which makes it challenging to acquire high-quality speech audio to accompany rtMRI videos. We are therefore actively researching ways to record and process running speech inside the MRI scanner. The main goals of this work are (a) to ensure synchronicity between image and audio acquisition, and (b) to obtain a signal-to-noise ratio good enough to facilitate further speech analysis and modeling. The audio setup features two fiber-optic microphones. Synchronization is achieved with custom-designed field-programmable gate array (FPGA) hardware. For noise cancellation, we use novel approaches that employ, for instance, a pulse-sequence-specific model of the scanner's gradient noise.
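As a rough illustration of why a model of the gradient noise helps, the sketch below assumes the simplest possible case: the scanner noise repeats exactly every `period` samples, so one period can be estimated by averaging and then subtracted. This is a generic template-subtraction toy, not the method of the publications listed here; the function names, the noise period, and the synthetic "speech" (plain random noise) are all assumptions made for the example.

```python
import numpy as np


def subtract_periodic_noise(audio, period):
    """Remove noise that repeats every `period` samples (period assumed known)."""
    audio = np.asarray(audio, dtype=float)
    n_periods = len(audio) // period
    if n_periods < 2:
        raise ValueError("need at least two full noise periods")
    usable = n_periods * period
    # Average all full periods: the repeating noise adds up coherently,
    # while speech, which is not locked to the period, largely averages out.
    template = audio[:usable].reshape(n_periods, period).mean(axis=0)
    # Repeat the template to cover the whole recording, including a partial tail.
    repeats = -(-len(audio) // period)  # ceiling division
    noise_estimate = np.tile(template, repeats)[: len(audio)]
    return audio - noise_estimate, template


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))


# Synthetic demonstration; every value here is made up for illustration.
rng = np.random.default_rng(0)
period = 200                                   # hypothetical noise period in samples
n_samples = 50 * period + 73                   # deliberately not a whole number of periods
speech = 0.1 * rng.standard_normal(n_samples)  # stand-in for the speech signal
one_period = rng.standard_normal(period)       # one period of loud "scanner" noise
scanner_noise = np.tile(one_period, 51)[:n_samples]
noisy = speech + scanner_noise

cleaned, template = subtract_periodic_noise(noisy, period)
snr_before = snr_db(speech, noisy - speech)
snr_after = snr_db(speech, cleaned - speech)
print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

Real scanner noise is not perfectly periodic and speech is not white, so a practical system would need an adaptive or more detailed noise model; the sketch only shows that knowing the noise structure in advance can recover a large part of the signal-to-noise ratio.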
- Colin Vaz, Vikram Ramanarayanan, Shrikanth S. Narayanan, "A two-step technique for MRI audio enhancement using dictionary learning and wavelet packet analysis", Interspeech, Lyon, France, 2013. Best Student Paper Award.
- Erik Bresch, Jon Nielsen, Krishna S. Nayak, Shrikanth S. Narayanan, "Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans", Journal of the Acoustical Society of America, vol. 120, no. 4, pp. 1791-1794, 2006. (This publication has an associated webpage).
Vocal tract image sequences acquired using rtMRI are rich in information, which poses many challenges for image processing and analysis. Much of our work focuses on developing computational methods that extract low-dimensional representations of image sequences while faithfully capturing the complex structures and actions of the vocal tract. Analysis techniques must satisfy theoretical requirements of scientific and linguistic interpretability as well as practical requirements such as precision, robustness and efficiency.
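To make "low-dimensional representation" concrete, the sketch below applies principal component analysis (PCA) to a synthetic image sequence, reducing each frame to two numbers. PCA is used here only as a familiar generic stand-in; it is not claimed to be the technique of the publications listed here, and the function name, frame size, and synthetic data are assumptions made for the example.

```python
import numpy as np


def pca_image_sequence(frames, n_components):
    """Project an image sequence of shape (n_frames, height, width) onto principal components."""
    frames = np.asarray(frames, dtype=float)
    n_frames = frames.shape[0]
    pixels = frames.reshape(n_frames, -1)       # one row of pixels per frame
    mean_image = pixels.mean(axis=0)
    # SVD of the mean-centred data; rows of vt are spatial patterns.
    u, s, vt = np.linalg.svd(pixels - mean_image, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # per-frame coordinates
    components = vt[:n_components]
    explained = np.sum(s[:n_components] ** 2) / np.sum(s ** 2)
    return scores, components, mean_image, explained


# Synthetic sequence: two fixed spatial patterns whose weights vary over time.
rng = np.random.default_rng(0)
n_frames, height, width = 60, 16, 16
t = np.linspace(0.0, 4.0 * np.pi, n_frames)
pattern_a = rng.standard_normal((height, width))
pattern_b = rng.standard_normal((height, width))
frames = (
    np.sin(t)[:, None, None] * pattern_a
    + np.cos(t)[:, None, None] * pattern_b
    + 0.01 * rng.standard_normal((n_frames, height, width))
)

scores, components, mean_image, explained = pca_image_sequence(frames, 2)
reconstruction = (scores @ components + mean_image).reshape(frames.shape)
print(f"variance explained by 2 components: {explained:.4f}")
```

Each 256-pixel frame is summarized by a row of `scores`. Components found this way are statistically efficient but have no guaranteed articulatory meaning, which is one reason interpretability has to be treated as a separate requirement.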
- Michael I. Proctor, Adam Lammert, Athanasios Katsamanis, Louis Goldstein, Christina Hagedorn, Shrikanth S. Narayanan, "Direct Estimation of Articulatory Kinematics from Real-time Magnetic Resonance Image Sequences", Interspeech, Florence, Italy, 2011.
- Michael I. Proctor, Daniel Bone, Athanasios Katsamanis, Shrikanth S. Narayanan, "Rapid Semi-automatic Segmentation of Real-time Magnetic Resonance Images for Parametric Vocal Tract Analysis", Interspeech, Makuhari, Japan, 2010.
- Adam Lammert, Michael I. Proctor, Shrikanth S. Narayanan, "Data-Driven Analysis of Realtime Vocal Tract MRI using Correlated Image Regions", Interspeech, Makuhari, Japan, 2010.
- Erik Bresch, Shrikanth S. Narayanan, "Region segmentation in the frequency domain applied to upper airway real-time magnetic resonance images", IEEE Transactions on Medical Imaging, vol. 28, no. 3, pp. 323-338, 2009. (This publication has an associated webpage).
Upper airway real-time MRI is combined with linguistically informed analysis of vocal tract constriction actions to investigate the production and cognitive control of the compositional action units of spoken language. Speech is dynamic in nature: it is realized through time-varying changes in vocal tract shaping, which emerge lawfully from the combined effects of multiple constriction events distributed over space (subparts of the vocal tract) and over time. Understanding this dynamical aspect is fundamental to linguistic study, and our research aims to add it to the field's current, essentially static, approach to describing speech. The real-time MRI technology and analysis platform developed by our team allow us to pursue this goal by examining the decomposition of speech into cognitively controlled vocal tract constriction events, or gestures. To further our understanding of linguistic structuring, our project examines three aspects of speech production: (i) compositionality in space: the deployment of concurrent gestures distributed spatially, i.e. over distinct constriction effectors within the vocal tract; (ii) compositionality in time: the deployment of gestures temporally; and (iii) compositionality in cognition: the deployment of gestures during speech planning that mirror those observed during speech production.
- Michael I. Proctor, Louis Goldstein, Adam Lammert, Dani Byrd, Asterios Toutios, Shrikanth S. Narayanan, "Velic Coordination in French Nasals: a Realtime Magnetic Resonance Imaging Study", Interspeech, Lyon, France, 2013.
- Caitlin Smith, Michael I. Proctor, Khalil Iskarous, Louis Goldstein, Shrikanth S. Narayanan, "Stable articulatory tasks and their variable formation: Tamil retroflex consonants", Interspeech, Lyon, France, 2013.
- Fang-Ying Hsieh, Louis Goldstein, Dani Byrd, Shrikanth S. Narayanan, "Truncation of Pharyngeal Gesture in English Diphthong [aI]", Interspeech, Lyon, France, 2013.
- Michael I. Proctor, Rachel Walker, "Articulatory bases of sonority in English liquids", The Sonority Hierarchy (Steve Parker, ed.), De Gruyter Mouton, Berlin, Germany, pp. 289-316, 2012.
We aim to explore how the physical structure of the vocal tract affects articulatory behavior during speaking. We are interested both in how individual differences in vocal tract morphology are reflected in the acoustic speech signal and in the articulatory strategies that speakers adopt to achieve speech invariance in the presence of structural differences. Our objective is to improve scientific understanding of how vocal tract morphology and speech articulation interact, and of how this interaction explains the variant and invariant aspects of speech signal properties within and across talkers. A related goal is to create forward and inverse models relating vocal tract details to the resulting acoustics, which can shed light on individual differences in speech.
- Adam Lammert, Michael I. Proctor, Shrikanth S. Narayanan, "Interspeaker Variability in Hard Palate Morphology and Vowel Production", Journal of Speech, Language, and Hearing Research, vol. 56, pp. S1924-S1933, 2013.
- Adam Lammert, Michael I. Proctor, Shrikanth S. Narayanan, "Morphological Variation in the Adult Hard Palate and Posterior Pharyngeal Wall", Journal of Speech, Language, and Hearing Research, vol. 56, pp. 521-530, 2013.
Vocal expression of emotions is an integral part of human speech communication. State-of-the-art speech emotion research has focused predominantly on surface acoustic properties of speech; open questions remain as to how speech properties co-vary across emotional types, talkers, and linguistic conditions. Given the complex interplay between the linguistic and paralinguistic aspects of speech production, the resulting acoustics alone can reveal only part of the underlying detail. Using multi-modal dynamic articulatory data, we aim to shed further light on the strategies by which linguistic articulation and emotion encoding are achieved in concert.
- Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan, "An Exploratory Study of the Relations between Perceived Emotion Strength and Articulatory Kinematics", Interspeech, Florence, Italy, 2011.
- Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan, "A detailed study of word-position effects on emotion expression in speech", Interspeech, Brighton, UK, 2009.
We study the characteristics of speech production in people with speech disorders, such as verbal apraxia (inability to execute a voluntary movement despite being able to demonstrate normal muscle function), and in people who have undergone surgical removal of part of their tongue (glossectomy) as a consequence of cancer.
- Christina Hagedorn, Adam Lammert, Mary Bassily, Yihe Zu, Uttam Sinha, Louis Goldstein, Shrikanth S. Narayanan, "Characterizing post-glossectomy speech using real-time MRI", International Seminar on Speech Production (ISSP), Cologne, Germany, 2014.
- Christina Hagedorn, Michael I. Proctor, Louis Goldstein, Maria Luisa Gorno Tempini, Shrikanth S. Narayanan, "Characterizing Covert Articulation in Apraxic Speech Using real-time MRI", Interspeech, Portland, OR, 2012.
We are using real-time MRI to study different types of singing, including Western classical soprano and human beatboxing performance, to investigate how human vocal organs are utilized in different performance styles; how this articulation resembles or differs from that of speech; how percussive and linguistic gestures are coordinated; and how we perceive whether different signals are musical or linguistic in nature.
- Michael I. Proctor, Erik Bresch, Dani Byrd, Krishna S. Nayak, Shrikanth S. Narayanan, "Paralinguistic Mechanisms of Production in Human 'Beatboxing:' a Real-time Magnetic Resonance Imaging Study", Journal of the Acoustical Society of America, vol. 133, no. 2, pp. 1043-1054, 2013. (This publication has an associated webpage).
We aim to synthesize fluent speech by simulating the articulatory-to-acoustic relationship in the human vocal tract and copying the dynamics of vocal tract shaping from recorded articulatory data. We also work on estimating articulatory movements from acoustics, a problem known as acoustic-to-articulatory inversion, with a particular interest in the speaker-independent case. A related problem of interest is the development of methods that enable the parallel analysis and application of multiple types of articulatory data, such as electromagnetic articulography (EMA) and real-time MRI.
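The inversion problem can be illustrated with a deliberately simplified toy: fit a linear map from acoustic feature frames to articulator positions, then smooth the estimated trajectories, since articulators move slowly compared with frame-to-frame estimation noise. This is a generic sketch under strong assumptions (a linear articulatory-to-acoustic relation, synthetic data, hypothetical function names and parameter values), not the method of the publications listed here; real inversion is nonlinear and non-unique.

```python
import numpy as np


def fit_linear_inversion(acoustic, articulatory, ridge=1e-3):
    """Ridge-regularized least-squares map from acoustic frames to articulator positions."""
    a = np.hstack([acoustic, np.ones((acoustic.shape[0], 1))])  # add bias column
    return np.linalg.solve(a.T @ a + ridge * np.eye(a.shape[1]), a.T @ articulatory)


def invert(acoustic, weights):
    """Apply the fitted map frame by frame."""
    a = np.hstack([acoustic, np.ones((acoustic.shape[0], 1))])
    return a @ weights


def smooth_trajectory(trajectory, lam=10.0):
    """Penalized least squares: stay close to the data, penalize second differences."""
    n = trajectory.shape[0]
    d2 = np.diff(np.eye(n), n=2, axis=0)  # (n - 2, n) second-difference operator
    return np.linalg.solve(np.eye(n) + lam * d2.T @ d2, trajectory)


def rmse(estimate, reference):
    return np.sqrt(np.mean((estimate - reference) ** 2))


# Synthetic data: two slowly varying "articulator" channels and six noisy
# "acoustic" features that depend linearly on them (an assumption of this toy).
rng = np.random.default_rng(0)
n_frames = 300
t = np.arange(n_frames)
articulatory = np.column_stack(
    [np.sin(2 * np.pi * t / 100), np.cos(2 * np.pi * t / 70)]
)
mixing = rng.standard_normal((2, 6))
acoustic = articulatory @ mixing + 0.3 * rng.standard_normal((n_frames, 6))

train, test = slice(0, 200), slice(200, 300)
weights = fit_linear_inversion(acoustic[train], articulatory[train])
raw = invert(acoustic[test], weights)
smoothed = smooth_trajectory(raw, lam=10.0)

rmse_raw = rmse(raw, articulatory[test])
rmse_smooth = rmse(smoothed, articulatory[test])
print(f"RMSE raw: {rmse_raw:.3f}, smoothed: {rmse_smooth:.3f}")
```

The smoothing weight `lam` trades fidelity to the frame-wise estimate against smoothness of the trajectory; the value used here is arbitrary and would need tuning on real data.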
- Jangwon Kim, Adam Lammert, Prasanta Kumar Ghosh, Shrikanth S. Narayanan, "Co-registration of speech production datasets from electromagnetic articulography and real-time magnetic resonance imaging", Journal of the Acoustical Society of America Express Letters, vol. 135, no. 2, pp. EL115-EL121, 2014.
- Asterios Toutios, Shrikanth S. Narayanan, "Articulatory Synthesis of French Connected Speech from EMA Data", Interspeech, Lyon, France, 2013. (This publication has an associated webpage).
- Prasanta Kumar Ghosh, Shrikanth S. Narayanan, "A generalized smoothness criterion for acoustic-to-articulatory inversion", Journal of the Acoustical Society of America, vol. 128, no. 4, pp. 2162-2172, 2010.
In an attempt to uncover the motor commands underlying speech production, we apply cutting-edge statistical modeling techniques to articulatory data to find movement patterns in space and time. A related problem of interest is the characterization of articulatory setting, i.e. the set of postural configurations that the vocal tract articulators tend to be deployed from and return to in the process of producing fluent and natural speech.
- Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan, "Spatio-temporal articulatory movement primitives during speech production -- extraction, interpretation and validation", Journal of the Acoustical Society of America, vol. 134, no. 2, pp. 1378-1394, 2013.
- Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Shrikanth S. Narayanan, "An investigation of articulatory setting using real-time magnetic resonance imaging", Journal of the Acoustical Society of America, vol. 134, no. 1, pp. 510-519, 2013.