Current Research/Projects:
Analysis of Expressive Speech
Human speech carries information about both the linguistic content and the emotional/attitudinal state of the speaker.
The goal is to obtain detailed acoustic knowledge of how the speech signal is modulated by changes from an emotionally
neutral state to a specific emotionally aroused state.
Analysis of Multimodal Emotion Expression
Emotions in Text
The study of emotions in text aims to recognize when emotions are expressed in text and to generate text with specified emotions. Text abounds on the internet, and emotions are frequently expressed in blogs, news comments, and product reviews. In addition, processing transcripts of spoken language produced by speech recognition can enable deeper analysis of the emotional content of speech. Some open questions are: (1) which textual features are best for recognizing emotions in text; (2) how to combine textual emotion recognition with acoustic and other modalities; (3) how to use the web effectively both to recognize and to display emotional content; and (4) how to recognize emotion at different scales, such as the word, sentence, and document levels.
Human-Human Interaction Modeling
In human-human interaction, interacting partners often influence each other's behaviors and user states.
This phenomenon is commonly referred to as mutual influence or entrainment between interlocutors.
The goal of interaction modeling is to bring insight into this multi-agent behavior in human communication through detailed analysis of the effect at multiple time scales and through quantitative statistical models that describe it. Applications such as predicting overall dialog attributes, performing automatic meeting analysis, providing guidelines for the design of synthetic interactive agents, and inferring individuals' user states in communication can become more reliable and natural by incorporating this notion of entrainment between interlocutors.
Human Perception of Emotion Expressions
Properly designing the behavior of emotional synthetic agents (humanoid robots, computer avatars, etc.) requires an understanding of how humans perceive emotions. A common design method is expert consultation; however, this method is expensive and does not permit on-the-fly creation of synthetic emotional behavior. Quantitative models of emotion perception will allow us to streamline the emotion creation process.
Past Research/Projects:
Expressive Speech Synthesis
Speech synthesis is a complex process in which input text
is processed to produce intelligible, natural, human-sounding speech output.
First, the text input is processed by the Natural Language Processing (NLP) module
to generate a "meaningful" representation of the input, which is then fed to the Digital
Signal Processing (DSP) module to generate the final speech signal. Current speech
synthesis technology can produce highly intelligible and natural speech. However,
synthesized speech still differs from human speech because it is mostly neutral, that is,
it conveys no emotion. In our research we are trying to make this output "emotional". We are designing algorithms
to build an Emotional Speech Generation (ESG) module that will produce expressive speech.
Our goal is to modify prosodic and spectral characteristics so as to generate expressive speech
that human listeners correctly perceive as carrying the intended emotion. Following an
experimental methodology, we investigate the individual and combined effects of modifying each
parameter at different levels (phoneme, syllable, word, phrase, sentence) and formulate rules
that can be used to impart emotional characteristics to non-emotional (i.e., neutral) speech.
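The idea of a rule that imparts emotion to neutral prosody can be illustrated with a minimal Python sketch. It assumes a frame-level F0 contour in Hz (0 marking unvoiced frames) and a frame-level energy track. The `ProsodyRule` class, the `RULES` table, its numeric values, and the `F0_FLOOR_HZ` constant are hypothetical placeholders, not rules derived from this research. Only utterance-level scaling of F0 mean, F0 range, energy, and duration is shown; phoneme-, syllable-, or word-level rules and spectral modification are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical lower bound so that range expansion never yields implausible F0.
F0_FLOOR_HZ = 50.0


@dataclass(frozen=True)
class ProsodyRule:
    """Hypothetical utterance-level rule; every field is a multiplicative factor."""
    f0_mean_scale: float   # scales the mean F0 of voiced frames
    f0_range_scale: float  # scales each voiced frame's deviation from that mean
    energy_scale: float    # scales frame energy
    duration_scale: float  # >1 slows speech down, <1 speeds it up


# Placeholder values for illustration only; NOT measured results.
RULES: Dict[str, ProsodyRule] = {
    "neutral": ProsodyRule(1.0, 1.0, 1.0, 1.0),
    "happy": ProsodyRule(1.2, 1.5, 1.1, 0.9),
}


def apply_rule(
    f0: List[float], energy: List[float], rule: ProsodyRule
) -> Tuple[List[float], List[float], float]:
    """Apply a rule to neutral prosody.

    Returns the modified F0 contour (unvoiced frames, f0 == 0, stay 0),
    the scaled energy track, and the duration factor, which a synthesizer's
    DSP stage would still have to realize by time-scaling the signal.
    """
    new_energy = [e * rule.energy_scale for e in energy]
    voiced = [v for v in f0 if v > 0]
    if not voiced:  # nothing to reshape in a fully unvoiced segment
        return list(f0), new_energy, rule.duration_scale
    mean = sum(voiced) / len(voiced)
    new_mean = mean * rule.f0_mean_scale
    new_f0 = [
        max(F0_FLOOR_HZ, new_mean + (v - mean) * rule.f0_range_scale) if v > 0 else 0.0
        for v in f0
    ]
    return new_f0, new_energy, rule.duration_scale


if __name__ == "__main__":
    neutral_f0 = [100.0, 120.0, 0.0, 140.0]  # toy contour in Hz
    neutral_energy = [1.0, 1.0, 0.0, 2.0]    # toy energy values
    print(apply_rule(neutral_f0, neutral_energy, RULES["happy"]))
```

With the placeholder "happy" rule, the toy contour's voiced mean of 120 Hz moves to 144 Hz and its ±20 Hz deviations widen to ±30 Hz, giving approximately [114, 144, 0, 174] Hz. Scaling deviations around the voiced mean preserves the contour's shape while changing its level and excursion; level-specific rules would apply the same operation to each phoneme, syllable, or word segment separately.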
Recognition of Expressive Speech
The importance of automatically recognizing emotions from human speech has
grown with the increasing role of spoken language interfaces in human-computer
interaction applications. This study addresses the design of an automatic emotion
recognition system using spoken language information through signal processing and
pattern recognition techniques.
Using the Little Children Multimedia Project database, our goal is to identify the
visual, acoustic, and lexical cues correlated with the presence of uncertainty in young
children interacting with a computer. Eventually, we hope to automatically identify
uncertainty through the automatic detection and fusion of these cues. This research
will help enhance the naturalness and efficiency of human-computer interactions,
especially those related to educational purposes.
Expressive Human-Robot Interfaces
This project studies human perception of robotic and simulated-character emotions when vocal and
facial expression information is conflicting or congruent. We are using robotic and computer-simulated
characters to further the understanding of face-to-face communication. This analysis
will incorporate personality type, familiarity with the technology, gender,
and other factors to investigate how individuals in various groups rate these conflicting and
congruent emotional presentations. This research may provide the community with
a more fundamental understanding of how individuals interpret emotional expressions with
respect to vocal and facial information. This understanding will motivate design
principles for robotic behavior that creates emotional experiences
understood by large groups of users.
Multimodal Analysis of Human Expressions
Since communicative channels are not only strongly connected but also
systematically synchronized across different scales (phonemes, words, phrases, sentences),
a joint analysis of these modalities is needed to fully understand expressive human
communication. We are studying the relationship and interplay between gestures and speech during
expressive utterances. We are especially interested in analyzing, with a multimodal
approach, how linguistic and affective goals are jointly fulfilled through modulation of
facial expressions and acoustic speech.
Expressive Speech Production
Voicing activity under the control of emotion is studied using electroglottography and
inverse filtering. Assuming that emotional state affects the movements of the vocal-fold muscles,
we are investigating the interplay between voicing activity and other acoustic controls (pitch and energy),
as well as how this interplay varies idiosyncratically across individuals.
Laughter Synthesis
A current goal in the speech synthesis field is to include expressive and emotional
content in machine-synthesized speech to enhance its naturalness, which includes incorporating nonverbal
cues appropriate to the context. One main motivation is the development of interactive applications in entertainment and games,
education, and even business services. Laughter synthesis can be viewed as part of
expressive communication: for instance, synthesized laughter can be used by itself or together with "happy"
speech to better express the positive emotion of happiness.