MRI-TIMIT: a Multimodal Real-Time MRI Articulatory Corpus
MRI-TIMIT is a large-scale database of synchronized audio and
real-time magnetic resonance imaging (rtMRI) data for speech research. The database currently consists of
midsagittal upper airway MRI data and phonetically transcribed companion audio, acquired from two male and two
female speakers of American English.
MRI-TIMIT is now publicly available for research purposes as part of the broader USC-TIMIT database.
Subjects
ID | Gender | Age | Birthplace
M1 | Male   | 29  | Buffalo, NY
M2 | Male   | 33  | Ann Arbor, MI
W1 | Female | 23  | Commack, NY
W2 | Female | 32  | Westfield, IA
Corpus
The same 460-sentence
phonetically balanced dataset used in the
MOCHA-TIMIT corpus (Wrench 1999)
was elicited from each subject.
Articulatory Data
Subjects' upper airways were imaged in the midsagittal plane using a
custom real-time MRI protocol (Bresch et al. 2008).
MRI data were acquired at Los Angeles County Hospital on a Signa Excite HD 1.5T scanner using a 13-interleaf
spiral gradient echo pulse sequence (TR = 6.164 ms, FOV = 200 × 200 mm, flip angle = 15°).
Image resolution: 68 × 68 pixels. Video rate: 23.18 frames/sec.
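
To make the relationship between these parameters concrete, the short Python sketch below relates the repetition time and interleaf count to the published video rate. The sliding-window step of 7 interleaves per frame is an assumption, chosen because it reproduces the 23.18 frames/sec figure; the actual reconstruction is described in Bresch et al. (2008).

    # Sketch: how the stated acquisition parameters relate to the video rate.
    # The sliding-window step (7 interleaves per frame) is an assumption,
    # chosen because it reproduces the published 23.18 frames/sec.
    TR_SEC = 6.164e-3        # repetition time per spiral interleaf (sec)
    INTERLEAVES = 13         # spiral interleaves per fully sampled image
    WINDOW_STEP = 7          # assumed window advance, in interleaves per frame

    full_image_time = INTERLEAVES * TR_SEC     # ~80.1 ms for one complete image
    frame_rate = 1.0 / (WINDOW_STEP * TR_SEC)  # ~23.18 frames/sec

    print(f"time per fully sampled image: {full_image_time * 1e3:.1f} ms")
    print(f"reconstructed video rate:     {frame_rate:.2f} frames/sec")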
Acoustic Data
Audio was
simultaneously recorded inside the MRI scanner at a sampling frequency
of 20 kHz, using a custom noise-cancelling fiber-optic microphone system (Bresch et al. 2006)
synchronized with the video signal through the scanner master clock. Time-aligned phonetic transcriptions of
all utterances in the database were generated from the audio recordings using the freely available tool
SailAlign (Katsamanis et al. 2011).
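
Because the audio (20 kHz) and video (23.18 frames/sec) streams are synchronized to the scanner master clock, a phone interval from the time-aligned transcription maps onto both streams by simple proportional scaling. The Python sketch below illustrates this mapping; the interval values are hypothetical, and the corpus's actual file formats may differ.

    # Sketch: mapping a time-aligned phone interval onto the synchronized
    # audio and video streams. The interval used below is hypothetical.
    AUDIO_SR = 20_000    # audio sampling rate (Hz)
    VIDEO_FPS = 23.18    # rtMRI video rate (frames/sec)

    def interval_to_indices(start_sec, end_sec):
        """Convert a phone interval (in seconds) to audio-sample and MRI-frame ranges."""
        audio_range = (round(start_sec * AUDIO_SR), round(end_sec * AUDIO_SR))
        frame_range = (int(start_sec * VIDEO_FPS), int(end_sec * VIDEO_FPS))
        return audio_range, frame_range

    # Example: a hypothetical vowel token spanning 1.20-1.35 sec.
    samples, frames = interval_to_indices(1.20, 1.35)
    print(f"audio samples {samples[0]}..{samples[1]}")  # 24000..27000
    print(f"MRI frames    {frames[0]}..{frames[1]}")    # 27..31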
Example Utterances
[Embedded audio/video examples: two utterances from subject M1, and one each from subjects M2, W1, and W2.]