Co-registration of multimodal speech production data

This co-registration software is a MATLAB toolbox for the temporal and spatial alignment of two kinds of multimodal speech production data: ElectroMagnetic Articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI) recordings.
The two datasets were collected from the same subject producing the same stimuli, but in separate recording sessions.

The software contains MATLAB code and demo data (a small subset of the USC-TIMIT corpus) for the following:
(1) Post-processing of articulatory data for temporal alignment
(2) Dynamic-time-warping (DTW) based temporal alignment on 13-dimensional Mel-frequency cepstral coefficients (MFCCs)
(3) Joint Acoustic-Articulatory based Temporal Alignment (JAATA)
(4) Evaluation of temporal alignment results in terms of Average Phonetic-boundary Distance (APD)
(5) Spatial alignment based on EMA palate-trace estimation and grid search
(6) Generation of registered videos
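Step (2) can be sketched as follows. This is a minimal Python/NumPy illustration of classic DTW over MFCC frame sequences, not the MATLAB implementation shipped with the toolbox; the function name and interface are hypothetical.

```python
import numpy as np

def dtw_align(X, Y):
    """Align two feature sequences (frames x dims), e.g. 13-dim MFCCs,
    with dynamic time warping. Returns the warping path as a list of
    (i, j) frame pairs and the total alignment cost."""
    n, m = len(X), len(Y)
    # Local cost: Euclidean distance between every pair of frames.
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Accumulated cost with the standard step pattern (match/insert/delete).
    A = np.full((n + 1, m + 1), np.inf)
    A[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i, j] = D[i - 1, j - 1] + min(A[i - 1, j - 1],
                                            A[i - 1, j],
                                            A[i, j - 1])
    # Backtrack from (n, m) to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([A[i - 1, j - 1], A[i - 1, j], A[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], A[n, m]
```

For two identical sequences the recovered path is the main diagonal with zero cost; for the EMA and rtMRI audio tracks, the path maps each frame of one recording to its best-matching frame of the other.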
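The evaluation metric in step (4) reduces to a simple average: the mean absolute time difference between corresponding phone boundaries in the two aligned recordings. A sketch, assuming boundaries are given as matched lists of times in seconds (the function name is illustrative):

```python
def average_phonetic_boundary_distance(ref_bounds, aligned_bounds):
    """Average Phonetic-boundary Distance (APD): the mean absolute
    time difference between corresponding phone boundaries in the
    reference recording and the warped/aligned recording."""
    assert len(ref_bounds) == len(aligned_bounds)
    return sum(abs(r - a) for r, a in zip(ref_bounds, aligned_bounds)) / len(ref_bounds)
```

A lower APD indicates that the warping maps phone boundaries in one modality closer to their counterparts in the other.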
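The grid search in step (5) can be pictured as a brute-force scan over candidate similarity transforms (scale, rotation, translation) mapping the 2-D EMA coordinate frame into the rtMRI image plane. The sketch below is a hypothetical Python/NumPy illustration, not the toolbox's MATLAB routine; it scores each candidate by the mean distance from every transformed EMA point to its nearest point on the MRI contour.

```python
import numpy as np

def grid_search_registration(ema_pts, mri_pts, scales, angles, shifts):
    """Search scale x rotation x translation candidates and return the
    best (cost, (scale, angle, (tx, ty))) pair, where cost is the mean
    nearest-neighbour distance from transformed EMA points (N x 2) to
    the MRI contour points (M x 2)."""
    best = (np.inf, None)
    for s in scales:
        for a in angles:
            R = np.array([[np.cos(a), -np.sin(a)],
                          [np.sin(a),  np.cos(a)]])
            rotated = s * ema_pts @ R.T
            for tx, ty in shifts:
                cand = rotated + np.array([tx, ty])
                # Distance from each candidate point to every contour point.
                d = np.linalg.norm(cand[:, None, :] - mri_pts[None, :, :],
                                   axis=2)
                cost = d.min(axis=1).mean()
                if cost < best[0]:
                    best = (cost, (s, a, (tx, ty)))
    return best
```

In the toolbox this kind of search is anchored by the palate: the palate trace estimated from the EMA sensors is matched against the palate contour visible in the MRI frames.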

See our ICASSP 2013 paper for technical details:
Jangwon Kim, Adam Lammert, Prasanta Ghosh, and Shrikanth S. Narayanan, "Spatial and temporal alignment of multimodal human speech production data: Real time imaging, flesh point tracking and audio," in Proceedings of ICASSP, Vancouver, May 2013.

Contact Jangwon Kim for technical support.
Email: jangwon@usc.edu

Download

Version 1.0 (updated September 23, 2013)

Demo video: Registered EMA and rtMRI

Co-registration results for EMA and rtMRI data.
The two recordings were made with the same stimulus, but in separate sessions.
Stimulus: "Publicity and notoriety go hand in hand" (sent024002).

1. Before alignment



2. Initial alignment (Dynamic Time Warping with only MFCCs)



3. Final alignment (JAATA with MFCCs + articulatory features)



4. Final alignment (JAATA) with vocal airway boundaries





Updated by Jangwon Kim on September 23, 2013