Logo: University of Southern California

Sail Tools

SailAlign: Robust long speech-text alignment

Long speech-text alignment can facilitate large-scale study of rich spoken language resources that have recently become widely accessible, e.g., collections of audio books, or multimedia documents. SailAlign is an open-source software toolkit for robust long speech-text alignment implementing an adaptive, iterative speech recognition and text alignment scheme that allows for the processing of very long (and possibly noisy) audio and is robust to transcription errors.
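The description above does not spell out the internals of the alignment scheme, so the following Python sketch is only an illustration of the text-alignment half of such a loop, not SailAlign's implementation. It assumes that a speech recognizer has already produced a word hypothesis, and it looks for runs of words that match the reference transcript; such runs could serve as trusted anchors, with the regions between them re-processed in a later iteration. The function name `find_anchors` and the `min_run` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def find_anchors(reference_words, hypothesis_words, min_run=3):
    """Return (ref_start, hyp_start, length) for matching word runs.

    Only runs of at least `min_run` consecutive identical words are kept,
    on the assumption that short matches are too unreliable to trust.
    """
    matcher = SequenceMatcher(None, reference_words, hypothesis_words,
                              autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_run:
            anchors.append((block.a, block.b, block.size))
    return anchors


# Hypothetical transcript and recognizer output, for illustration only.
reference = "the quick brown fox jumps over the lazy dog".split()
hypothesis = "the quick brown box jumps over a lazy dog".split()
anchors = find_anchors(reference, hypothesis, min_run=2)
```

Raising `min_run` makes the anchors more conservative, which matters when the audio is noisy or the transcript contains errors.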

Pitch contour stylization — an optimal piecewise polynomial approximation of pitch trajectory

Pitch contour stylization is often required to represent the pitch trajectory parametrically for applications such as coding and synthesis. It may also be useful for studying intonation patterns in an utterance. This MATLAB-based pitch contour stylization tool stylizes a given pitch trajectory optimally (i.e., the MSE of the approximation is minimum) using a piecewise polynomial function where the number of pieces and the order of the polynomial are user-defined.
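To make the idea of an MSE-optimal piecewise polynomial fit concrete, here is a minimal Python sketch. It is not the MATLAB tool's code: it assumes a dynamic-programming search over breakpoints, fits each piece independently with `numpy.polyfit`, and does not enforce continuity between pieces. The function names `segment_fit` and `stylize` are hypothetical.

```python
import numpy as np


def segment_fit(t, y, order):
    """Least-squares polynomial fit on one segment; returns (SSE, coeffs)."""
    coeffs = np.polyfit(t, y, order)
    resid = y - np.polyval(coeffs, t)
    return float(resid @ resid), coeffs


def stylize(pitch, n_pieces, order):
    """Split `pitch` into `n_pieces` contiguous segments that minimize the
    total squared error of per-segment polynomial fits of degree `order`.

    Returns (boundaries, fitted), where boundaries has n_pieces + 1 indices.
    """
    y = np.asarray(pitch, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    min_len = order + 1  # fewest samples that determine one polynomial
    if n < n_pieces * min_len:
        raise ValueError("too few samples for the requested pieces/order")

    # cost[i, j]: squared error of a single fit over samples i .. j-1
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            cost[i, j] = segment_fit(t[i:j], y[i:j], order)[0]

    # best[k, j]: minimum error covering samples 0 .. j-1 with k pieces
    best = np.full((n_pieces + 1, n + 1), np.inf)
    back = np.zeros((n_pieces + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_pieces + 1):
        for j in range(k * min_len, n + 1):
            for i in range((k - 1) * min_len, j - min_len + 1):
                total = best[k - 1, i] + cost[i, j]
                if total < best[k, j]:
                    best[k, j] = total
                    back[k, j] = i

    # Walk the back-pointers to recover the segment boundaries.
    boundaries = [n]
    j = n
    for k in range(n_pieces, 0, -1):
        j = int(back[k, j])
        boundaries.append(j)
    boundaries.reverse()

    fitted = np.empty(n)
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        fitted[a:b] = np.polyval(segment_fit(t[a:b], y[a:b], order)[1], t[a:b])
    return boundaries, fitted
```

The exhaustive search refits every candidate segment, so this sketch is only practical for short contours; it is meant to show the optimization problem rather than an efficient solution.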

Bark Frequency Transform Using an Arbitrary Order Allpass Filter

The Bark frequency scale closely resembles the frequency analysis scale in the human ear. It ranges from 1 to 24 Barks, corresponding to the first 24 critical bands of hearing. The transformation function between the linear frequency scale and the Bark scale is often modeled by the phase function of an allpass filter, which provides the flexibility of processing the signal in the Bark frequency scale. This MATLAB-based tool offers a function that returns the parameters of an arbitrary-order (user-defined) allpass filter representing the transformation function in the least-MSE sense.
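As an illustration of fitting an allpass phase function to the Bark mapping, the Python sketch below handles only the simplest, first-order case and is not the MATLAB tool's code. It makes two assumptions that do not come from the description above: the Hz-to-Bark conversion uses the Zwicker and Terhardt approximation, and the single coefficient `rho` is found by a plain grid search that minimizes the MSE against the Bark curve rescaled to the range 0 to pi. All function names are hypothetical.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (an assumption here)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def allpass_warp(omega, rho):
    """Frequency mapping given by the phase of a first-order allpass filter
    with real coefficient rho, |rho| < 1 (omega in radians per sample)."""
    return omega + 2.0 * np.arctan2(rho * np.sin(omega),
                                    1.0 - rho * np.cos(omega))


def fit_allpass_coefficient(fs_hz, n_points=512):
    """Grid-search the rho whose warping is closest, in the MSE sense,
    to the Bark curve normalized so that the Nyquist frequency maps to pi."""
    omega = np.linspace(0.0, np.pi, n_points)
    f_hz = omega / (2.0 * np.pi) * fs_hz
    target = np.pi * hz_to_bark(f_hz) / hz_to_bark(fs_hz / 2.0)
    candidates = np.linspace(0.0, 0.99, 1000)
    mse = [np.mean((allpass_warp(omega, r) - target) ** 2) for r in candidates]
    best = int(np.argmin(mse))
    return float(candidates[best]), float(mse[best])


rho, err = fit_allpass_coefficient(16000.0)
```

A higher-order allpass filter has more free parameters and can follow the Bark curve more closely; fitting one would require a multi-parameter optimizer in place of the one-dimensional grid search used here.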

Emotion Twenty Questions Questioner Agent Demo

Emotion 20 Questions (EMO20Q) is a Turing-style test for computational verbal emotion intelligence. In this demo, the computer will try to guess the emotion that you are thinking of using natural language questions.

Emotion Tracking

This package estimates the continuous emotional states of a subject through time based on his/her body language information. More generally, it can be used for estimating a continuous random variable through time based on a set of other continuous random variables. In this package, we are releasing a small subset of the data used in the publication cited below, specifically the extracted body language features and activation values of 2 subjects for 6 recordings. This is a limited release to demonstrate the use of the code. We are not replicating the setup or results of the experiments in that paper, as this would require a full release of the data. We are preparing a full release of the data for the future, so if you are interested please check this website for updates. For more details regarding the features, tracking method and experiments, please see the corresponding publication:
Angeliki Metallinou, Athanasios Katsamanis and Shrikanth Narayanan, Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information, Image and Vision Computing, Special Issue on Continuous Affect Analysis, accepted for publication 2012.

Speech Rate Estimation directly from the acoustic speech signal

Code (C) for a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription.
Dagen Wang and Shrikanth Narayanan. Robust speech rate estimation for spontaneous speech. IEEE Transactions on Audio, Speech and Language Processing. 15(8): 2190 - 2201, November 2007.

Vocal Arousal Rating

Code (MATLAB) for tracking affective arousal from speech.
Daniel Bone, Chi-Chun Lee, and Shrikanth Narayanan, "Robust Unsupervised Arousal Rating: A rule-based framework with knowledge-inspired vocal features", IEEE Transactions on Affective Computing (in Press), 2014.