Analysis of Expressive Speech
Human speech carries information about both the linguistic content and the emotional/attitudinal state of the speaker.
Our goal is to obtain detailed acoustic knowledge of how the speech signal is modulated when the speaker moves from an emotionally
neutral state to a specific emotionally aroused state.
Expressive Speech Synthesis
Speech synthesis is a complex process in which input text
is converted into intelligible, natural, human-sounding speech.
First, the Natural Language Processing (NLP) module processes the text input
to generate a "meaningful" representation of it, which is then fed to the Digital
Signal Processing (DSP) module to generate the final speech signal. Current speech
synthesis technology can produce highly intelligible and natural speech. However,
synthesized speech still differs from human speech because it is mostly neutral, that is,
it conveys no emotion. In our research we are trying to make this output "emotional": we are designing algorithms
to build an Emotional Speech Generation (ESG) module that will produce expressive speech.
Our goal is to modify prosodic and spectral characteristics so that the generated expressive speech
is correctly perceived by human listeners as conveying the intended emotion. Following an
experimental methodology, we investigate the individual and combined effects of modifying each
parameter at different levels (phoneme, syllable, word, phrase, sentence) and formulate rules
that can be used to impart emotional characteristics to non-emotional (i.e., neutral) speech.
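As a rough illustration of what a sentence-level prosody rule might look like, the Python sketch below shifts and expands a neutral F0 contour and scales phoneme durations. The rule table `HYPOTHETICAL_RULES`, its emotion labels, and every numeric value in it are placeholders invented for this example, not rules derived from our experiments; the function name `apply_prosody_rule` is likewise hypothetical, and spectral modification is not covered.

```python
import numpy as np

# Hypothetical sentence-level rules. The emotions and numbers are placeholders
# chosen for illustration, not values measured in this research.
HYPOTHETICAL_RULES = {
    "happy": {"f0_shift": 1.2, "f0_range": 1.5, "duration": 0.9},
    "sad": {"f0_shift": 0.9, "f0_range": 0.6, "duration": 1.2},
}


def apply_prosody_rule(f0, durations, emotion):
    """Apply a sentence-level prosody rule to neutral speech parameters.

    f0: frame-level F0 contour in Hz, with 0 marking unvoiced frames.
    durations: per-phoneme durations in seconds.
    Returns the modified (f0, durations) as new arrays.
    """
    rule = HYPOTHETICAL_RULES[emotion]
    out = np.asarray(f0, dtype=float).copy()
    voiced = out > 0
    if voiced.any():
        # Work on log F0 so that shifting and range scaling act on pitch ratios.
        log_f0 = np.log(out[voiced])
        center = log_f0.mean()
        # Move the center by f0_shift (a ratio) and stretch deviations by f0_range.
        out[voiced] = np.exp(
            center + np.log(rule["f0_shift"]) + rule["f0_range"] * (log_f0 - center)
        )
    # Unvoiced frames keep F0 = 0; every phoneme is scaled by the same factor.
    new_durations = np.asarray(durations, dtype=float) * rule["duration"]
    return out, new_durations
```

Working in the log-F0 domain is a design choice: range scaling there changes pitch intervals proportionally rather than by a fixed number of hertz. Turning the modified parameters back into audio would additionally require a pitch- and duration-modification technique such as TD-PSOLA, which this sketch leaves out.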
Recognition of Expressive Speech
The importance of automatically recognizing emotions from human speech has
grown with the increasing role of spoken language interfaces in human-computer
interaction applications. This study addresses the design of an automatic emotion
recognition system that uses spoken language information, applying signal processing and
pattern recognition techniques.
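The text does not specify which acoustic features or classifier the system uses. With that caveat, the Python sketch below summarizes an utterance by hypothetical prosodic statistics (mean and standard deviation of voiced F0 and of frame energy) and assigns it to the nearest emotion centroid after z-score normalization. The names `utterance_features` and `NearestCentroidClassifier` are illustrative, and the frame-level F0 and energy values are assumed to be precomputed.

```python
import numpy as np


def utterance_features(f0, energy):
    """Summarize frame-level prosody as [mean F0, std F0, mean energy, std energy].

    f0: frame-level F0 in Hz, 0 for unvoiced frames (assumes at least one voiced frame).
    energy: frame-level energy values (e.g. RMS), one per frame.
    """
    f0 = np.asarray(f0, dtype=float)
    energy = np.asarray(energy, dtype=float)
    voiced = f0[f0 > 0]
    return np.array([voiced.mean(), voiced.std(), energy.mean(), energy.std()])


class NearestCentroidClassifier:
    """Labels an utterance with the emotion whose mean normalized feature vector is closest."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # z-score normalization so that features in large units (Hz) do not dominate.
        self.mu_ = X.mean(axis=0)
        self.sigma_ = X.std(axis=0) + 1e-9
        Z = (X - self.mu_) / self.sigma_
        self.labels_ = sorted(set(y.tolist()))
        # One centroid per emotion label, in the normalized feature space.
        self.centroids_ = np.array([Z[y == lab].mean(axis=0) for lab in self.labels_])
        return self

    def predict(self, X):
        Z = (np.asarray(X, dtype=float) - self.mu_) / self.sigma_
        # Euclidean distance from every utterance to every emotion centroid.
        dists = np.linalg.norm(Z[:, None, :] - self.centroids_[None, :, :], axis=2)
        return [self.labels_[i] for i in dists.argmin(axis=1)]
```

scikit-learn's `sklearn.neighbors.NearestCentroid` implements the same decision rule; a practical recognizer would likely add spectral features and a stronger classifier.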
Using the Little Children Multimedia Project database, our goal is to identify the
visual, acoustic, and lexical cues correlated with the presence of uncertainty in young
children interacting with a computer. Eventually, we hope to automatically identify
uncertainty through the automatic detection and fusion of these cues. This research
will help enhance the naturalness and efficiency of human-computer interactions,
especially in educational applications.
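One simple way to fuse cues, assuming each modality already has a detector that outputs a probability of uncertainty, is weighted late fusion. The sketch below is hypothetical: the function `fuse_uncertainty`, the modality names, and the weights are assumptions for illustration, not the project's fusion method.

```python
def fuse_uncertainty(scores, weights):
    """Weighted late fusion of per-modality uncertainty probabilities.

    scores: modality name -> P(uncertain) in [0, 1] from that modality's detector.
    weights: modality name -> non-negative weight (hypothetical values).
    Modalities absent from `scores` are skipped and the remaining weights renormalized.
    """
    present = [m for m in weights if m in scores]
    total = sum(weights[m] for m in present)
    if total <= 0:
        raise ValueError("no modality with positive weight is available")
    return sum(weights[m] * scores[m] for m in present) / total
```

With hypothetical weights `{"visual": 0.3, "acoustic": 0.5, "lexical": 0.2}` and detector outputs `{"visual": 0.8, "acoustic": 0.6, "lexical": 0.1}`, the fused score is approximately 0.56. Renormalizing over the available modalities lets the score degrade gracefully when, for example, no transcript exists for the lexical detector.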
Expressive Human-Robot Interfaces
We study human perception of the emotions expressed by robotic and computer-simulated characters when
their vocal and facial expression information is conflicting or congruent. We are
researching techniques that use robotic and computer-simulated characters to further the
understanding of face-to-face communication. This analysis
will incorporate aspects of personality type, familiarity with the technology, gender,
etc., to investigate how individuals of various groups rate these conflicting and
congruent emotional presentations. This research may provide the community with
a more fundamental understanding of how individuals interpret emotional expressions with
respect to vocal and facial information. This understanding will inform principles
for designing robotic behavior that creates emotional experiences
understood by large groups of users.
Multimodal Analysis of Human Expressions
Since the communicative channels are not only strongly connected but also
systematically synchronized across different scales (phonemes, words, phrases, sentences),
a joint analysis of these modalities is needed to fully understand expressive human
communication. We are studying the relationship and interplay between gestures and speech during
expressive utterances. We are especially interested in using a multimodal approach to analyze
how linguistic and affective goals are jointly achieved through the modulation of
facial expressions and speech acoustics.
Expressive Speech Production
We study voicing activity under the control of emotion using electroglottography and
inverse filtering. Assuming that emotional state affects the movement of the vocal fold muscles,
we are investigating the interplay between voicing activity and other acoustic controls (pitch and energy),
as well as the idiosyncratic ways in which individuals realize it.
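Inverse filtering removes the vocal-tract resonances from speech to expose an estimate of the excitation produced at the glottis. As a simplified sketch, assuming a single stationary voiced frame as input, the Python code below fits an all-pole (LPC) vocal-tract model with the autocorrelation method and the Levinson-Durbin recursion, then applies the inverse filter A(z). The helpers `lpc` and `inverse_filter` are hypothetical names, and plain LPC residual estimation is not necessarily the procedure used in this work; dedicated glottal inverse filtering methods such as Iterative Adaptive Inverse Filtering (IAIF) refine the idea, while electroglottography independently measures vocal fold contact.

```python
import numpy as np


def lpc(frame, order):
    """Return a = [1, a1, ..., a_order] such that A(z) = 1 + sum_j a_j z^-j.

    Uses the autocorrelation method solved with the Levinson-Durbin recursion.
    Assumes the frame is not all zeros.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Autocorrelation r[k] = sum_t x[t] * x[t + k] for lags 0..order.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        # Reflection coefficient for the step from order i-1 to order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update a[1..i-1] with the previous coefficients in reverse order.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(frame, a):
    """Apply A(z) to the frame; the output approximates the excitation signal."""
    x = np.asarray(frame, dtype=float)
    return np.convolve(x, a)[: len(x)]
```

In practice the frame would usually be windowed (e.g. with a Hamming window) before `lpc`, and the order chosen from the sampling rate.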
Emotions in Text
The study of emotions in text aims to recognize when emotions are
expressed in text and to generate text with specified emotions. Our work
aims to get at the meaning of language to better determine its
emotional content. Some open questions are: (1) how to combine
textual emotion recognition with acoustic and other modalities, (2)
how to effectively use the web both to display and to recognize emotional
content, and (3) which methods are best for detecting emotion in text
at both the sentence and document levels.
Laughter Synthesis
A current goal of researchers in the speech synthesis field is to include expressive and emotional
content in machine-synthesized speech to enhance its naturalness, which includes incorporating non-verbal
cues appropriate to the context. One main motivation comes from the development of interactive applications in entertainment and games,
education, and even business services. Laughter synthesis can be viewed as part of
expressive communication. For instance, synthesized laughter can be used by itself or together with "happy"
speech to better express the positive emotion of happiness.