research

We have developed, and routinely use, real-time imaging of the vocal tract with synchronized audio recordings. We apply state-of-the-art image processing and signal analysis techniques to interpret the phonetic data acquired with these technologies. Our aim is to advance scientific knowledge of the articulatory activity that creates speech, which we consider necessary for fully understanding human communication.



real-time MRI technology

Real-time and interactive MRI requires an entirely new imaging infrastructure that supports real-time reconstruction and display, as well as real-time operator controls (such as scan plane and image contrast). Our acquisition pulse sequences are implemented using the RTHawk real-time imaging system developed at Stanford. We are developing accelerated 2D and 3D imaging techniques tailored to the upper airway and accurate depiction of air-tissue boundaries during vocal production. In addition, we are involved in the development and validation of new targeted phased array coils for imaging the upper airway.


audio acquisition technology

MRI is a noisy imaging modality, making it challenging to acquire high-quality speech audio to accompany rtMRI videos. We are therefore actively researching ways to record and process running speech inside the MRI scanner. The main focus of this work is (a) to ensure synchronicity between image and audio acquisition, and (b) to obtain a good signal-to-noise ratio to facilitate further speech analysis and modeling. The audio setup itself features two fiber-optic microphones. Synchronization is achieved with custom-designed field-programmable gate array (FPGA) hardware. We use novel approaches to noise cancellation that employ, for instance, a pulse-sequence-specific model of the scanner's gradient noise.
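
As a rough illustration of the noise-cancellation idea, the sketch below assumes that the gradient noise repeats exactly every `period` samples, as it would if it were locked to the repetition of the pulse sequence, and removes it by averaging the recording at each phase of that period. This is a deliberately simplified stand-in, not the pulse-sequence-specific model mentioned above; the function names, the fixed-period assumption, and the toy signals are all hypothetical.

```python
import math


def estimate_periodic_template(signal, period):
    """Average the samples that share the same phase within the repetition period."""
    if period <= 0 or len(signal) < period:
        raise ValueError("period must be positive and no longer than the signal")
    sums = [0.0] * period
    counts = [0] * period
    for i, sample in enumerate(signal):
        sums[i % period] += sample
        counts[i % period] += 1
    return [s / c for s, c in zip(sums, counts)]


def subtract_periodic_noise(signal, period):
    """Subtract the phase-averaged noise template from every sample."""
    template = estimate_periodic_template(signal, period)
    return [sample - template[i % period] for i, sample in enumerate(signal)]


def rms(values):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    period = 50   # assumed noise repetition period, in samples
    n = 5000      # 100 repetitions of the noise pattern
    # Toy stand-ins: a sawtooth for the gradient noise, a sinusoid for speech.
    noise = [(i % period) / period - 0.5 for i in range(n)]
    speech = [0.3 * math.sin(2 * math.pi * 0.013 * i) for i in range(n)]
    recorded = [s + g for s, g in zip(speech, noise)]

    cleaned = subtract_periodic_noise(recorded, period)
    residual = [c - s for c, s in zip(cleaned, speech)]
    print("noise RMS before:", rms(noise))
    print("residual RMS after:", rms(residual))
```

Averaging works here only because the toy speech signal is not periodic at the noise period. Any speech energy that happens to repeat at that period, along with any constant offset, would be removed together with the noise, which is one reason a more detailed noise model is preferable in practice.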


image processing and analysis

Vocal tract image sequences acquired using rtMRI are rich in information, which presents many challenges in the domain of image processing and analysis. Much of our work is focused on developing computational methods to extract low-dimensional representations of image sequences that faithfully capture the complex structures and actions of the vocal tract. Analysis techniques must take into account theoretical considerations of scientific and linguistic interpretability, in addition to practical concerns such as precision, robustness, and efficiency.
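
To make the notion of a low-dimensional representation concrete, the sketch below reduces each frame of a toy image sequence to a single number: its score on the first principal component of the sequence, found by power iteration. Principal component analysis is used here only as a generic, well-known example and is not necessarily the method applied in our work; the function name and the toy data are hypothetical, and the code is written in plain Python for self-containment, whereas a numerical library would normally be used on real images.

```python
import math


def first_principal_component(frames, iterations=200):
    """Return (mean_frame, component, scores) for a list of flattened frames."""
    n_frames = len(frames)
    n_pixels = len(frames[0])
    mean = [sum(frame[j] for frame in frames) / n_frames for j in range(n_pixels)]
    centered = [[frame[j] - mean[j] for j in range(n_pixels)] for frame in frames]

    # Start from the centered frame with the most energy, so that the
    # initial vector already lies in the span of the data.
    start = max(centered, key=lambda row: sum(x * x for x in row))
    norm = math.sqrt(sum(x * x for x in start))
    if norm == 0.0:
        # All frames are identical: there is no variation to describe.
        return mean, [0.0] * n_pixels, [0.0] * n_frames
    v = [x / norm for x in start]

    for _ in range(iterations):
        # One power-iteration step: v <- normalize(X^T (X v)).
        scores = [sum(r * c for r, c in zip(row, v)) for row in centered]
        w = [sum(scores[t] * centered[t][j] for t in range(n_frames))
             for j in range(n_pixels)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # One score per frame: the one-dimensional representation of the sequence.
    scores = [sum(r * c for r, c in zip(row, v)) for row in centered]
    return mean, v, scores


if __name__ == "__main__":
    # Toy "sequence": a fixed background plus one spatial pattern whose
    # strength varies over time (4 pixels, 20 frames).
    background = [10.0, 20.0, 30.0, 40.0]
    pattern = [1.0, -2.0, 0.5, 3.0]
    frames = [[b + math.sin(t) * p for b, p in zip(background, pattern)]
              for t in range(20)]
    mean, component, scores = first_principal_component(frames)
    print("component:", component)
    print("first five scores:", scores[:5])
```

A representation of this kind is compact, but a principal component is not guaranteed to correspond to any single articulator or constriction, which is where the interpretability considerations mentioned above come in.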


compositionality of speech production

Upper airway real-time MRI is combined with linguistically informed analysis of vocal tract constriction actions in order to investigate the production and cognitive control of the compositional action units of spoken language. Speech is dynamic in nature: it is realized through time-varying changes in vocal tract shaping, which emerge lawfully from the combined effects of multiple constriction events distributed over space (subparts of the vocal tract) and over time. An understanding of this dynamical aspect is fundamental to linguistic study, and our research aims to add it to the field's current, essentially static, approach to describing speech. The real-time MRI technology and analysis platform developed by our team allow us to pursue this goal by examining the decomposition of speech into cognitively controlled vocal tract constriction events, or gestures. To further our understanding of linguistic structuring, our project examines three aspects of speech production: (i) compositionality in space: the deployment of concurrent gestures distributed spatially, i.e. over distinct constriction effectors within the vocal tract; (ii) compositionality in time: the deployment of gestures temporally; and (iii) compositionality in cognition: the deployment of gestures during speech planning that mirror those observed during speech production.


structure-strategy interplay

We aim to explore the ways in which the physical structure of the vocal tract affects articulatory behavior during speaking. We are interested both in how individual differences in vocal tract morphology are reflected in the acoustic speech signal and in what articulatory strategies are adopted in the presence of structural differences to achieve speech invariance. Our objective is to improve our scientific understanding of how vocal tract morphology and speech articulation interact to explain the variant and invariant aspects of speech signal properties within and across talkers. A related goal is to create forward and inverse models that relate vocal tract details to the resultant acoustics and that can shed light on individual differences in speech.


emotional speech production

Vocal expression of emotions is an integral part of human speech communication. The state of the art in speech emotion research has predominantly focused on surface acoustic properties of speech; open questions remain as to how speech properties co-vary across emotional types, talkers, and linguistic conditions. Given the complex interplay between the linguistic and paralinguistic aspects of speech production, there are limits to what can be uncovered about the underlying details from the resultant acoustics alone. Using multi-modal dynamic articulatory data, we aim to shed further light on the strategies by which linguistic articulation and emotion encoding are achieved in concert.


speech pathologies

We study the characteristics of speech production of people suffering from speech disorders, such as verbal apraxia (inability to execute a voluntary movement despite being able to demonstrate normal muscle function), and people who have undergone surgical removal of part of their tongue (glossectomy) as a consequence of cancer.


phonetics of singing

We are using real-time MRI to study different types of singing, including Western Classical Soprano and Human Beatboxing performance, to investigate how human vocal organs are utilized in different performance styles; how this articulation resembles or differs from that of speech; how percussive and linguistic gestures are coordinated; and how we perceive whether different signals are musical or linguistic in nature.


synthesis and inversion

We aim to synthesize fluent speech by simulating the articulatory-to-acoustic relationship in the human vocal tract and copying the dynamics of the vocal-tract shaping from recorded articulatory data. We also work on estimating articulatory movements from acoustics, a problem known as acoustic-to-articulatory inversion, with a particular interest in the speaker-independent case. A related problem of interest is the development of methods that will enable the parallel analysis and application of multiple types of articulatory data, such as EMA and real-time MRI.
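
The two directions described above, from articulation to acoustics and back, can be illustrated with the simplest textbook case: a uniform tube closed at one end (the glottis) and open at the other (the lips), whose resonances fall at odd multiples of a quarter wavelength. The sketch below is that idealization only, far simpler than the simulations and inversion methods we work on; the function names and the assumed speed of sound are hypothetical choices made for the example.

```python
SPEED_OF_SOUND_M_PER_S = 350.0  # assumed value for warm, humid air


def uniform_tube_resonances(length_m, count=3, speed=SPEED_OF_SOUND_M_PER_S):
    """Forward map: resonance frequencies (Hz) of a uniform tube closed at one end."""
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


def tube_length_from_first_resonance(f1_hz, speed=SPEED_OF_SOUND_M_PER_S):
    """Inverse map: tube length (m) implied by the first resonance alone."""
    if f1_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed / (4.0 * f1_hz)


if __name__ == "__main__":
    # A 17.5 cm tube gives resonances near 500, 1500 and 2500 Hz.
    print(uniform_tube_resonances(0.175))
    # Going the other way, a first resonance of 500 Hz implies about 0.175 m.
    print(tube_length_from_first_resonance(500.0))
```

The inverse is trivial here because one parameter determines the whole tube. With a realistic vocal tract shape, many different configurations can produce similar acoustics, which is what makes acoustic-to-articulatory inversion a research problem.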


speech motor control

In an attempt to uncover the motor commands underlying speech production, we apply cutting-edge statistical modeling techniques to articulatory data to find movement patterns in space and time. A related problem of interest is the characterization of articulatory setting, i.e. the set of postural configurations that the vocal tract articulators tend to be deployed from and return to in the process of producing fluent and natural speech.

