USC

University of Southern California

Viterbi School of Engineering

Emotions Research

 

Analysis of Expressive Speech

Human speech carries information about both the linguistic content and the emotional or attitudinal state of the speaker. The goal is to obtain detailed acoustic knowledge of how the speech signal is modulated when the speaker moves from an emotionally neutral state to a specific emotionally aroused state.

Expressive Speech Synthesis

Speech synthesis is a complex process that converts input text into intelligible, natural, human-sounding speech. First, the Natural Language Processing (NLP) module processes the text to generate a "meaningful" representation of the input. This representation is then fed to the Digital Signal Processing (DSP) module, which generates the final speech signal. Current speech synthesis technology can produce highly intelligible and natural speech. However, the output still differs from human speech because it is mostly neutral; that is, it conveys no emotion. In our research we are trying to make this output "emotional". We are designing algorithms for an Emotional Speech Generation (ESG) module that will produce expressive speech. Our goal is to generate expressive speech, by modifying prosody and spectral characteristics, that human listeners perceive as conveying the intended emotion. Following an experimental methodology, we investigate the individual and combined effects of modifying each parameter at different levels (phoneme, syllable, word, phrase, sentence) and formulate rules that can be used to impart emotional characteristics to non-emotional (i.e., neutral) speech.
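The idea of a rule that imparts emotional characteristics to neutral speech can be illustrated with a minimal Python sketch. It is not the ESG module itself: the `Segment` and `ProsodyRule` types, their field names, and the numeric factors are all hypothetical and chosen only for illustration. The sketch covers prosody only (pitch, duration, energy) and leaves spectral characteristics aside.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Segment:
    """One unit of neutral speech (phoneme, syllable, word, ...). Hypothetical type."""
    label: str
    f0_hz: List[float]   # pitch contour samples for voiced frames
    duration_s: float
    energy: float        # linear-scale energy, arbitrary units


@dataclass
class ProsodyRule:
    """Hypothetical rule: multiplicative changes relative to the neutral segment."""
    f0_mean_scale: float = 1.0
    f0_range_scale: float = 1.0
    duration_scale: float = 1.0
    energy_scale: float = 1.0


def apply_rule(seg: Segment, rule: ProsodyRule) -> Segment:
    """Return a modified copy of seg; the input segment is left unchanged."""
    if seg.f0_hz:
        centre = mean(seg.f0_hz)
        # Shift the mean pitch and widen or narrow the excursion around it.
        new_f0 = [
            centre * rule.f0_mean_scale + (f - centre) * rule.f0_range_scale
            for f in seg.f0_hz
        ]
    else:
        new_f0 = []
    return Segment(
        label=seg.label,
        f0_hz=new_f0,
        duration_s=seg.duration_s * rule.duration_scale,
        energy=seg.energy * rule.energy_scale,
    )


# Illustrative values only; real factors would come from listening experiments.
neutral = Segment("a", [100.0, 120.0, 140.0], duration_s=0.2, energy=1.0)
aroused = apply_rule(
    neutral,
    ProsodyRule(f0_mean_scale=1.5, f0_range_scale=2.0,
                duration_scale=0.5, energy_scale=2.0),
)
```

Because the rule acts on a single segment, the same function could be applied at any of the levels listed above by changing what a `Segment` represents.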

Recognition of Expressive Speech

The importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. This study addresses the design of an automatic emotion recognition system using spoken language information through signal processing and pattern recognition techniques.

Using the Little Children Multimedia Project database, our goal is to identify the visual, acoustic, and lexical cues correlated with the presence of uncertainty in young children interacting with a computer. Eventually, we hope to automatically identify uncertainty through the automatic detection and fusion of these cues. This research will help enhance the naturalness and efficiency of human-computer interactions, especially those related to educational purposes.
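One simple way to picture the fusion step is a weighted average of per-modality scores. The sketch below is only an assumption about what fusion could look like, not the method used in this project: the function name `fuse_cue_scores`, the modality names, the score range of 0 to 1, and the weights are all hypothetical.

```python
from typing import Dict, Optional


def fuse_cue_scores(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> Optional[float]:
    """Weighted average of per-modality uncertainty scores in [0, 1].

    A score of None means the detector for that modality produced no
    output; such modalities are skipped. Returns None when no usable
    score remains.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for modality, score in scores.items():
        if score is None:
            continue
        weight = weights.get(modality, 0.0)  # unknown modalities get no vote
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight


# Illustrative call: the lexical detector produced no output for this turn.
fused = fuse_cue_scores(
    {"visual": 0.8, "acoustic": 0.4, "lexical": None},
    {"visual": 1.0, "acoustic": 1.0, "lexical": 2.0},
)
```

Skipping missing modalities and renormalizing by the remaining weights keeps the fused score in the same 0 to 1 range even when one detector is silent.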

Expressive Human-Robot Interfaces

We study how humans perceive the emotions of robotic and simulated characters when the vocal and facial expression information is congruent or in conflict. We are researching techniques to further the understanding of face-to-face communication through the use of robotic and computer-simulated characters. The analysis will incorporate factors such as personality type, familiarity with the technology, and gender to investigate how individuals in different groups rate these conflicting and congruent emotional presentations. This research may provide the community with a more fundamental understanding of how individuals interpret emotional expressions with respect to vocal and facial information. This understanding will motivate principles for designing robotic behavior that creates emotional experiences understood by large groups of users.

Multimodal Analysis of Human Expressions

Because the communicative channels (such as speech, gestures, and facial expressions) are not only strongly connected but also systematically synchronized at different scales (phonemes, words, phrases, sentences), a joint analysis of these modalities is needed to fully understand expressive human communication. We are studying the relationship and interplay between gestures and speech during expressive utterances. We are especially interested in using a multimodal approach to analyze how linguistic and affective goals are jointly fulfilled through the modulation of facial expressions and acoustic speech.

Expressive Speech Production

Voicing activity under the control of emotion is studied using electroglottography and inverse filtering. Assuming that emotional state affects the movements of the vocal fold muscles, we are investigating the interplay between voicing activity and other acoustic controls (pitch and energy), as well as the idiosyncratic ways in which this interplay varies across individuals.

Emotions in Text

The study of emotions in text aims to recognize when emotions are expressed in text and to generate text with specified emotions. Our work aims to get at the meaning of language to better determine its emotional content. Some open questions are: (1) how to combine textual emotion recognition with acoustic and other modalities, (2) how to effectively use the web to both display and recognize emotional content, and (3) what the best methods are for detecting emotion in text at both the sentence and document levels.

Laughter Synthesis

Presently, the goal of researchers in the speech synthesis field is to include expressive and emotional content in machine-synthesized speech to enhance its naturalness, which includes incorporating non-verbal cues appropriate to the context. One main motivation comes from the development of interactive applications in entertainment and games, education, and even business services. Synthesis of laughter can be viewed as a part of expressive communication. For instance, synthesized laughter can be used by itself, or along with "happy" speech, to better express the positive emotion of happiness.