USC

University of Southern California

Viterbi School of Engineering

Emotions Research

 



Current Research/Projects:

Analysis of Expressive Speech

Human speech carries information about both the linguistic content and the emotional or attitudinal state of the speaker. The goal is to obtain detailed acoustic knowledge of how the speech signal is modulated when the speaker shifts from an emotionally neutral state to a specific emotionally aroused state.

Analysis of Multimodal Emotion Expression

Emotions in Text

The study of emotions in text aims to recognize when emotions are expressed in text and to generate text with specified emotions. Text abounds on the internet, and emotions are frequently expressed in blogs, news comments, and product reviews. Processing the text of spoken language after speech recognition can also allow for deeper analysis of the emotional content of speech. Some open questions are: (1) what are the best textual features for recognizing emotions in text, (2) how to combine textual emotion recognition with acoustic and other modalities, (3) how to effectively use the web to both recognize and display emotional content, and (4) how to recognize emotion at different scales, such as the word, sentence, and document level.
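As an illustration of recognizing emotion at the word, sentence, and document level, here is a minimal Python sketch. It is not the project's method: the tiny `LEXICON`, the label names, and the majority-vote rule are all assumptions chosen only to show how the three scales can build on one another.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would rely on a curated resource or learned features.
LEXICON = {
    "happy": "joy", "delighted": "joy", "love": "joy",
    "angry": "anger", "furious": "anger", "hate": "anger",
    "sad": "sadness", "disappointed": "sadness",
}


def word_emotions(sentence):
    """Word level: emotion labels of the lexicon words found in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [LEXICON[t] for t in tokens if t in LEXICON]


def sentence_emotion(sentence):
    """Sentence level: most frequent word-level label, or 'neutral' if none."""
    counts = Counter(word_emotions(sentence))
    return counts.most_common(1)[0][0] if counts else "neutral"


def document_emotions(text):
    """Document level: how many sentences received each label."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return Counter(sentence_emotion(s) for s in sentences)


if __name__ == "__main__":
    review = "I love it! The box was gray. I hate the price."
    print(word_emotions(review))
    print(document_emotions(review))
```

A lexicon lookup like this ignores negation, sarcasm, and context, which is part of why the choice of textual features remains an open question.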

Human-Human Interaction Modeling

In human-human interaction, partners often influence each other's behaviors and user states. This phenomenon is commonly referred to as mutual influence, or entrainment, between interlocutors. The goal of interaction modeling is to bring insight into this multi-agent behavior in human communication through detailed analysis of the effect at multiple time scales and through quantitative statistical models that describe it. Applications such as predicting overall dialog attributes, performing automatic meeting analysis, guiding the design of synthetic interactive agents, and inferring an individual's user state during communication can become more reliable and natural by incorporating this notion of entrainment between interlocutors.
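One simple way to look at mutual influence at multiple time scales is to correlate two speakers' feature tracks after averaging them over windows of different sizes. The Python sketch below assumes that setup; the function names, the use of Pearson correlation, and the idea of a per-turn feature such as mean pitch are illustrative assumptions, not the statistical models used in this project.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def window_means(values, size):
    """Average the values over consecutive non-overlapping windows of `size`."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, size)
    ]


def entrainment_by_scale(speaker_a, speaker_b, window_sizes):
    """Correlate the two speakers' tracks at each window size (time scale).

    Both tracks must have the same length, e.g. one value per turn.
    Scales that leave fewer than two windows are skipped.
    """
    result = {}
    for size in window_sizes:
        a = window_means(speaker_a, size)
        b = window_means(speaker_b, size)
        if len(a) >= 2:
            result[size] = pearson(a, b)
    return result
```

A high correlation at coarse scales but not at fine ones would suggest that the partners converge over the conversation as a whole rather than turn by turn.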

Human Perception of Emotion Expressions

Proper design of the behavior of an emotional synthetic agent (humanoid robot, computer avatar, etc.) requires an understanding of how humans perceive emotions. A common design method is expert consultation. However, this method is expensive and does not permit on-the-fly creation of synthetic emotional behavior. Quantitative models of emotion perception will allow us to streamline the emotion creation process.




Past Research/Projects:

Expressive Speech Synthesis

Speech synthesis is a complicated process in which input text is processed to produce intelligible, natural, human-sounding speech. First, the text input is processed by the Natural Language Processing (NLP) module to generate a "meaningful" representation of the input, which is then fed to the Digital Signal Processing (DSP) module to generate the final speech signal. Current speech synthesis technology is capable of producing highly intelligible and natural speech. However, the produced speech is not exactly like human speech because it is mostly neutral; that is, it carries no emotion. In our research we are trying to make this output "emotional". We are designing algorithms to build an Emotional Speech Generation (ESG) module that will produce expressive speech. Our goal is to generate expressive speech, by modifying prosody and spectral characteristics, that human listeners will correctly perceive as carrying the intended emotion. Following an experimental methodology, we investigate the individual and combined effects of modifying each parameter at different levels (phoneme, syllable, word, phrase, sentence) and formulate rules that can be used to impart emotional characteristics to non-emotional (i.e., neutral) speech.
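To make the idea of rule-based prosody modification concrete, here is a minimal Python sketch of applying one rule to one unit of neutral speech. The `Unit` and `ProsodyRule` structures, their field names, and the numeric values in the usage example are hypothetical; they are not the ESG module or rules derived from the experiments described above, and spectral modification is left out entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """One unit of speech (phoneme, syllable, word, ...) with simple prosody."""
    label: str
    f0_hz: List[float]   # pitch contour samples; 0.0 marks unvoiced frames
    duration_s: float
    energy_db: float


@dataclass
class ProsodyRule:
    """Hypothetical rule: how to modify neutral prosody toward an emotion."""
    f0_scale: float = 1.0        # scales the overall pitch level
    range_scale: float = 1.0     # widens or narrows pitch range around the mean
    duration_scale: float = 1.0  # >1 slows the unit down, <1 speeds it up
    energy_shift_db: float = 0.0


def apply_rule(unit: Unit, rule: ProsodyRule) -> Unit:
    """Return a new unit with the rule applied; unvoiced frames stay at 0.0."""
    voiced = [f for f in unit.f0_hz if f > 0]
    mean_f0 = sum(voiced) / len(voiced) if voiced else 0.0
    new_f0 = [
        (mean_f0 + (f - mean_f0) * rule.range_scale) * rule.f0_scale
        if f > 0 else 0.0
        for f in unit.f0_hz
    ]
    return Unit(
        label=unit.label,
        f0_hz=new_f0,
        duration_s=unit.duration_s * rule.duration_scale,
        energy_db=unit.energy_db + rule.energy_shift_db,
    )


if __name__ == "__main__":
    neutral = Unit("hello", [100.0, 0.0, 120.0], duration_s=0.5, energy_db=60.0)
    # Illustrative values only, not experimentally derived.
    rule = ProsodyRule(f0_scale=1.2, range_scale=2.0,
                       duration_scale=0.8, energy_shift_db=3.0)
    print(apply_rule(neutral, rule))
```

Because the same function works on any unit, the same rule format could be applied at the phoneme, syllable, word, phrase, or sentence level and the perceptual effect compared across levels.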

Recognition of Expressive Speech

The importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. This study addresses the design of an automatic emotion recognition system using spoken language information through signal processing and pattern recognition techniques.

Using the Little Children Multimedia Project database, our goal is to identify the visual, acoustic, and lexical cues correlated with the presence of uncertainty in young children interacting with a computer. Eventually, we hope to automatically identify uncertainty through the automatic detection and fusion of these cues. This research will help enhance the naturalness and efficiency of human-computer interactions, especially those related to educational purposes.

Expressive Human-Robot Interfaces

This work studies human perception of robotic and simulated character emotions in the presence of conflicting and congruent vocal and facial expression information. Expressive robot: We are researching techniques to further the understanding of face-to-face communication through the use of robotic and computer-simulated characters. This analysis will incorporate aspects such as personality type, familiarity with the technology, and gender to investigate how individuals in various groups rate these conflicting and congruent emotional presentations. This research may provide the community with a more fundamental understanding of how individuals interpret emotional expressions with respect to vocal and facial information. This understanding will motivate principles for designing robotic behavior that creates emotional experiences understood by large groups of users.

Multimodal Analysis of Human Expressions

Since the communicative channels are not only strongly connected but also systematically synchronized at different scales (phonemes, words, phrases, sentences), a joint analysis of these modalities is needed to fully understand expressive human communication. We are studying the relationship and interplay between gestures and speech during expressive utterances. We are especially interested in using a multimodal approach to analyze how linguistic and affective goals are jointly fulfilled through modulation of facial expressions and acoustic speech.

Expressive speech production

Voicing activity under the control of emotion is studied using electroglottography and inverse filtering. Assuming that emotional state affects the movement of the muscles in the vocal folds, we are investigating the interplay between voicing activity and other acoustic controls (pitch and energy), as well as how it varies idiosyncratically across individuals.

Laughter Synthesis

Presently, the goal of researchers in the speech synthesis field is to include expressive and emotional content in machine-synthesized speech to enhance its naturalness, which includes incorporating non-verbal cues appropriate to the context. One main motivation comes from the development of interactive applications in entertainment and games, education, and even business services. Synthesis of laughter can be viewed as a part of expressive communication. For instance, synthesized laughter can be used by itself or along with "happy" speech to better express the positive emotion of happiness.