Research

Emotion expression in human communication modulates speech prosody, facial and bodily gestures, and lexical usage. Furthermore, in interactions, these modulations also affect the behaviors of interaction partners, shaping the conversational discourse and altering the communicative environment and outcome. We view emotion as an ever-present background component of the human cognitive system that interacts with perceived events from the environment and determines our responses to the outside world. Please click the subtopics below for more details.



    Multimodal Emotion Expression

    Emotional expression is a complex interplay of complementary, supplementary, or even conflicting multimodal cues, such as facial expressions, body language, and speech prosody. Extracting informative features, as well as understanding and effectively modeling the dynamics of these modalities, is important for building emotion recognition systems. Our work focuses on encoding emotion using multimodal signals, as well as analyzing the effect of emotion on the interplay between multimodal cues.
    People involved: Carlos Busso (Alumni), Angeliki Metallinou (Alumni), Zhaojun Yang (PhD student)
    Corpus: IEMOCAP, CreativeIT
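
    As a simple illustration of feature-level (early) fusion across modalities, the sketch below concatenates utterance-level audio and visual feature vectors and trains a single classifier. This is a minimal example on synthetic data, not the lab's actual pipeline; the feature dimensions, placeholder labels, and scikit-learn classifier are illustrative assumptions.

        # Minimal early-fusion sketch on synthetic data (illustrative only).
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n_utt = 200
        audio_feats = rng.normal(size=(n_utt, 12))    # e.g., pitch/energy statistics per utterance
        visual_feats = rng.normal(size=(n_utt, 20))   # e.g., facial/body marker statistics

        # Placeholder labels that depend on both modalities (purely synthetic, four classes).
        labels = np.digitize(audio_feats[:, 0] + visual_feats[:, 0], bins=[-1.0, 0.0, 1.0])

        # Early fusion: concatenate the modality-specific feature vectors.
        fused = np.hstack([audio_feats, visual_feats])

        X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print("fused-feature accuracy:", clf.score(X_te, y_te))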


    highlighted publications
    • Zhaojun Yang, Angeliki Metallinou, Engin Erzin, Shrikanth Narayanan, "Analysis of Interaction Attitudes Using Data-driven Hand Gesture Phrases", Proc. of ICASSP, pp. 699-703, 2014.
    • Zhaojun Yang, Angeliki Metallinou, Shrikanth Narayanan, "Analysis and Predictive Modeling of Body Language Behavior in Dyadic Interactions from Multimodal Interlocutor Cues", IEEE Transactions on Multimedia, vol. 16, no. 3, pp. 1766-1778, 2014.
    • Zhaojun Yang, Antonio Ortega, Shrikanth Narayanan, "Gesture Dynamics Modeling for Attitude Analysis Using Graph Based Transform", Proc. of ICIP, 2014.
    • Z. Yang, A. Metallinou, S. Narayanan, "TOWARD BODY LANGUAGE GENERATION IN DYADIC INTERACTION SETTINGS FROM INTERLOCUTOR MULTIMODAL CUES", Proc. of ICASSP, 2013.
    • Angeliki Metallinou, Martin Wöllmer, Athanasios Katsamanis, Florian Eyben, Björn Schuller, Shrikanth Narayanan, "Context-sensitive learning for enhanced audiovisual emotion classification", IEEE Transactions on Affective Computing, IEEE, vol. 3, no. 2, pp. 184-198, 2012.
    • Carlos Busso, Shrikanth S Narayanan, "Interrelation between speech and facial gestures in emotional utterances: a single subject study", IEEE Transactions on Audio, Speech, and Language Processing, IEEE, vol. 15, no. 8, pp. 2331-2347, 2007.
    • Carlos Busso, Zhigang Deng, Serdar Yildirim, Murtaza Bulut, Chul Min Lee, Abe Kazemzadeh, Sungbok Lee, Ulrich Neumann, Shrikanth Narayanan, "Analysis of emotion recognition using facial expressions, speech and multimodal information", Proceedings of the 6th international conference on Multimodal interfaces [Ten Year Technical Impact Award, 2014 ICMI], pp. 205-211, 2004.

    Interaction Modeling

    In spontaneous interpersonal interactions, interlocutors' behavioral and mental/affective states influence each other. This mutual influence is an integral part of dialogue and often shapes the overall tone of affective interactions through the natural interactive flow between interlocutors. The goal of interaction modeling is to provide insight into this multi-agent behavioral phenomenon in human communication by exploring how interlocutors' mutual behavioral effects unfold over the course of an interaction to convey underlying mental states.
    People involved: Chi-Chun (Jeremy) Lee (Alumni), Angeliki Metallinou (Alumni), Theodora Chaspari (PhD student), Zhaojun Yang (PhD student)
    Corpus: IEMOCAP, CreativeIT
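
    The toy example below illustrates one simple way to quantify mutual behavioral influence: the windowed correlation between two interlocutors' behavioral streams, with one stream lagged in time. It is a sketch on synthetic signals, not the PCA-based vocal entrainment measure or the predictive body-language models in the publications below.

        # Toy lagged-synchrony measure between two behavioral time series.
        import numpy as np

        def lagged_synchrony(a, b, lag, win):
            """Mean windowed correlation between stream a and stream b shifted by lag."""
            scores = []
            for start in range(0, len(a) - win - lag, win):
                x = a[start:start + win]
                y = b[start + lag:start + lag + win]
                if np.std(x) > 0 and np.std(y) > 0:
                    scores.append(np.corrcoef(x, y)[0, 1])
            return float(np.mean(scores)) if scores else 0.0

        # Synthetic example: speaker B partially mirrors speaker A after a short delay.
        rng = np.random.default_rng(1)
        a = np.cumsum(rng.normal(size=1000))               # e.g., speaker A arousal trace
        b = 0.7 * np.roll(a, 25) + rng.normal(size=1000)   # delayed, noisy copy for speaker B
        print("synchrony at lag 25:", lagged_synchrony(a, b, lag=25, win=100))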


    highlighted publications
    • Zhaojun Yang, Angeliki Metallinou, Shrikanth S. Narayanan, "Analysis and Predictive Modeling of Body Language Behavior in Dyadic Interactions from Multimodal Interlocutor Cues", Transactions on Multimedia, IEEE, vol. 16, no. 3, pp. 1766-1778, 2014.
    • Zhaojun Yang, Angeliki Metallinou, Engin Erzin, Shrikanth Narayanan, "Analysis of Interaction Attitudes Using Data-driven Hand Gesture Phrases", Proc. of ICASSP, pp. 699-703, 2014.
    • Z. Yang, A. Metallinou, S. Narayanan, "Analysis and Predictive Modeling of Body Language Behavior in Dyadic Interactions from Multimodal Interlocutor Cues", IEEE Transactions on Multimedia, pp. 1-13, 2014.
    • Zhaojun Yang, Antonio Ortega, Shrikanth Narayanan, "Gesture Dynamics Modeling for Attitude Analysis Using Graph Based Transform", Proc. of ICIP, 2014.
    • Chi-Chun Lee, Athanasios Katsamanis, Matthew P Black, Brian R Baucom, Andrew Christensen, Panayiotis G Georgiou, Shrikanth S Narayanan, "Computing vocal entrainment: A signal-derived PCA-based quantification scheme with application to affect analysis in married couple interactions", Computer Speech & Language, Elsevier, vol. 28, no. 2, pp. 518-539, 2014.
    • Z. Yang, A. Metallinou, S. Narayanan, "TOWARD BODY LANGUAGE GENERATION IN DYADIC INTERACTION SETTINGS FROM INTERLOCUTOR MULTIMODAL CUES", Proc. of ICASSP, 2013.
    • Angeliki Metallinou, Athanasios Katsamanis, Shrikanth Narayanan, "Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information", Image and Vision Computing, Elsevier, vol. 31, no. 2, pp. 137-152, 2013.

    Emotion Recognition and Tracking

    Emotion is at the core of human behavior, influencing our decision-making. This is quite evident in human communication, in which participants transmit affective signals through multiple modalities (vocal, visual, gestural) that are key to moderating the exchange. Consider, for example, the difficulties that can arise when communicating by text, where such affective modulation is largely missing. We aim to build systems that can track emotions continuously, to support improved human-machine interfaces as well as empirical research. We concentrate on robust systems, continuous tracking from multiple modalities, and simultaneous modeling of joint affective dynamics.
    People involved: Daniel Bone (PhD student), Chi-Chun (Jeremy) Lee (Alumni), Angeliki Metallinou (Alumni)
    Corpus: IEMOCAP, CreativeIT, Vera am Mittag (VAM)
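
    As a schematic example of continuous tracking, the sketch below maps frame-level features to an arousal value with a simple regressor and then smooths the frame-wise predictions over time. The data are synthetic, and the ridge regressor and moving-average window are illustrative choices, not the models used in the publications below.

        # Frame-level regression plus temporal smoothing on synthetic data.
        import numpy as np
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(2)
        n_frames, n_feats = 2000, 30
        trend = np.sin(np.linspace(0, 8 * np.pi, n_frames))              # slowly varying "arousal"
        loading = rng.normal(size=n_feats)
        X = np.outer(trend, loading) + 0.5 * rng.normal(size=(n_frames, n_feats))
        y = trend

        model = Ridge().fit(X[:1500], y[:1500])                          # train on the first part
        raw = model.predict(X[1500:])                                    # frame-wise predictions
        smooth = np.convolve(raw, np.ones(25) / 25, mode="same")         # moving-average smoothing

        def rmse(pred):
            return float(np.sqrt(np.mean((pred - y[1500:]) ** 2)))

        print("RMSE raw:", rmse(raw), " RMSE smoothed:", rmse(smooth))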


    highlighted publications
    • Daniel Bone, Chi-Chun Lee, Alexandros Potamianos, Shrikanth Narayanan, "An Investigation of Vocal Arousal Dynamics in Child-Psychologist Interactions using Synchrony Measures and a Conversation-based Model", INTERSPEECH, 2014.
    • Angeliki Metallinou, Athanasios Katsamanis, Shrikanth Narayanan, "Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information", Image and Vision Computing, Elsevier, vol. 31, no. 2, pp. 137-152, 2013.
    • Chi-Chun Lee, Emily Mower, Carlos Busso, Sungbok Lee, Shrikanth Narayanan, "Emotion recognition using a hierarchical binary decision tree approach", Speech Communication, Elsevier, vol. 53, no. 9, pp. 1162-1171, 2011.
    • Chi-Chun Lee, Carlos Busso, Sungbok Lee, Shrikanth S Narayanan, "Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions.", INTERSPEECH, pp. 1983-1986, 2009.
    • Chul Min Lee, Shrikanth S Narayanan, "Toward detecting emotions in spoken dialogs", Speech and Audio Processing, IEEE Transactions on [IEEE Signal Processing Society Best Paper Award 2009], vol. 13, no. 2, pp. 293-303, 2005.

    Emotional Language and Sentiment Analysis

    The analysis of the emotional content of language is an open research problem, relevant to numerous NLP, web, and multimodal dialogue applications. Popular targets include identifying the emotions expressed in (or perceived from) product reviews, social media posts, movie subtitles, and psychotherapy sessions. The problem is non-trivial even in the simplest of cases, due to the general difficulty of language understanding, and is further complicated by more elaborate expressive devices such as metaphor, humor, and sarcasm. Our research adopts a bottom-up compositional view and uses distributional semantics to estimate emotional content at the word and n-gram levels and upwards to the sentence and utterance levels. Branching from that, we explore the problems of composition and figurative language use, and we attempt to integrate the extracted emotional content over time.
    People involved: Nikolaos Malandrakis, Abe Kazemzadeh
    Applications and Demos: Sentiment analysis
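
    The snippet below is a deliberately simplified illustration of the bottom-up compositional view: per-word valence ratings are combined into an utterance-level score, with a crude polarity flip for negation standing in for composition handling. The tiny hand-made lexicon is a placeholder, not the distributional semantic models from the publications below.

        # Word-level valence combined into a sentence-level score (toy lexicon).
        valence = {"good": 0.8, "great": 0.9, "bad": -0.7, "awful": -0.9, "movie": 0.0, "plot": 0.0}
        negators = {"not", "never", "no"}

        def sentence_valence(tokens):
            scores, flip = [], 1.0
            for tok in tokens:
                tok = tok.lower()
                if tok in negators:
                    flip = -1.0          # flip the polarity of the next rated word
                    continue
                if tok in valence:
                    scores.append(flip * valence[tok])
                    flip = 1.0
            return sum(scores) / len(scores) if scores else 0.0

        print(sentence_valence("The movie was not good but the plot was great".split()))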


    highlighted publications
    • Nikolaos Malandrakis, Michael Falcone, Colin Vaz, Jesse James Bisogni, Alexandros Potamianos, Shrikanth Narayanan, "SAIL: Sentiment Analysis using Semantic Similarity and Contrast Features", Proceedings of SemEval 2014, Association for Computational Linguistics and Dublin City University, Dublin, Ireland, pp. 512-516, 2014.
    • Nikolaos Malandrakis, Alexandros Potamianos, Elias Iosif, Shrikanth Narayanan, "Distributional Semantic Models for Affective Text Analysis", Audio, Speech, and Language Processing, IEEE Transactions on, vol. 21, no. 11, pp. 2379-2392, 2013.

    Emotion Synthesis and Generation

    While traditional speech synthesis work has concentrated on improving intelligibility and naturalness, an additional goal of emotion synthesis research is improving expressiveness. Our past work has addressed this problem through both speech signal manipulation and the automatic generation of multimodal cues (such as head motion gestures) for affective behavior generation. These technologies have applications in enhancing animated characters and human-computer interaction, and they provide tools for analysis-by-synthesis based speech communication and behavioral science research. Our current focus includes emotional articulatory synthesis and cross-lingual transfer of expressive information.
    People involved: Murtaza Bulut (Alumni), Carlos Busso (Alumni), Andreas Tsiartas (Alumni), Jangwon Kim
    Applications and Demos: Examples of resynthesized emotional speech audio: search for Murtaza Bulut
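
    As a rough illustration of expressive signal manipulation, the sketch below raises the pitch and speaking rate of a synthetic neutral signal to give a crude "high-arousal" rendering. The shift amounts are arbitrary and the librosa-based processing is only a stand-in for the concatenative and parametric approaches in the publications below.

        # Crude rule-based prosody manipulation for expressiveness (illustrative only).
        import numpy as np
        import librosa
        import soundfile as sf

        sr = 22050
        t = np.linspace(0, 1.0, sr, endpoint=False)
        neutral = 0.3 * np.sin(2 * np.pi * 150 * t).astype(np.float32)    # stand-in for neutral speech

        excited = librosa.effects.pitch_shift(neutral, sr=sr, n_steps=3)  # raise pitch ~3 semitones
        excited = librosa.effects.time_stretch(excited, rate=1.15)        # speak ~15% faster

        sf.write("excited_version.wav", excited, sr)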


    highlighted publications
    • Murtaza Bulut, Sungbok Lee, Shrikanth Narayanan, "Recognition for synthesis: automatic parameter selection for resynthesis of emotional speech from neutral speech", International Conference on Acoustics, Speech and Signal Processing, pp. 4629 - 4632, 2008.
    • Carlos Busso, Zhigang Deng, Michael Grimm, Ulrich Neumann, Shrikanth Narayanan, "Rigid head motion in expressive speech animation: Analysis and synthesis", Transactions on Audio, Speech, and Language Processing, IEEE, vol. 15, no. 3, pp. 1075 - 1086, 2007.
    • Murtaza Bulut, Carlos Busso, Serdar Yildirim, Abe Kazemzadeh, Chul Min Lee, Sungbok Lee, Shrikanth Narayanan, "Investigating the role of phoneme-level modifications in emotional speech resynthesis.", INTERSPEECH, ISCA, pp. 801 - 804, 2005.
    • Murtaza Bulut, Carlos Busso, Serdar Yildirim, Abe Kazemzadeh, Chul Min Lee, Sungbok Lee, Shrikanth Narayanan, "Expressive Speech Synthesis Using a Concatenative Synthesizer", International Conference on Spoken Language Processing, ISCA, pp. 1265 - 1268, 2002.

    Emotion Representation

    Emotion expression and perception are complex processes that involve ambiguity and inter- and intra-speaker variability. Hence, a small number of categorical emotion labels (e.g., happiness, anger, sadness) may not be enough to capture the subtlety and complexity of emotional states. We address this problem in emotion annotation, modeling, and recognition using speech and body gesture signals. Our past work includes emotion primitive descriptions, emotion profiles (EPs), and continuous arousal/valence labeling and tracking.
    People involved: Michael Grimm (Alumni), Emily Mower Provost (Alumni), Angeliki Metallinou (Alumni)
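    The small sketch below illustrates the "soft label" idea behind representations such as emotion profiles: instead of forcing a single categorical label, an utterance is described by its degree of evidence for each emotion class. Here the profile is built from hypothetical annotator votes for simplicity; the emotion profile work cited below derives related profiles from classifier outputs.

        # Soft emotion representation from (hypothetical) annotator votes.
        from collections import Counter

        def emotion_profile(annotations, classes=("angry", "happy", "neutral", "sad")):
            counts = Counter(annotations)
            total = sum(counts[c] for c in classes)
            return {c: counts[c] / total if total else 0.0 for c in classes}

        # Three annotators disagree; the profile keeps that ambiguity explicit.
        print(emotion_profile(["happy", "neutral", "happy"]))
        # -> happy ~ 0.67, neutral ~ 0.33, angry and sad 0.0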


    highlighted publications
    • Angeliki Metallinou, Shrikanth Narayanan, "Annotation and processing of continuous emotional attributes: Challenges and opportunities", Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pp. 1-8, 2013.
    • Emily Mower, Maja J Mataric, Shrikanth Narayanan, "A framework for automatic human emotion classification using emotion profiles", Audio, Speech, and Language Processing, IEEE Transactions on, IEEE, vol. 19, no. 5, pp. 1057-1070, 2011.
    • Emily Mower, Angeliki Metallinou, Chi-Chun Lee, Abe Kazemzadeh, Carlos Busso, Sungbok Lee, Shrikanth Narayanan, "Interpreting ambiguous emotional expressions", Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on, pp. 1-8, 2009.
    • Michael Grimm, Kristian Kroschel, Emily Mower, Shrikanth Narayanan, "Primitives-based evaluation and estimation of emotions in speech", Speech Communication, Elsevier, vol. 49, no. 10, pp. 787-800, 2007.
    • Michael Grimm, Emily Mower, Kristian Kroschel, Shrikanth Narayanan, "Combining categorical and primitives-based emotion recognition", 14th European Signal Processing Conference (EUSIPCO), Florence, Italy, 2006.

    Emotional Speech Production

    Investigating how emotional information is encoded in speech is important both from a scientific point of view and for technical applications. For speech science, it advances our understanding of human speech generation and speech communication. For technology, it provides novel insights toward better computational tools for the analysis, recognition, and synthesis of emotional speech. Emotional speech production datasets are collected using both ElectroMagnetic Articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). Our interests include the representation, modeling, and synthesis of emotional speech production. First, we explore automatic and robust ways of extracting articulatory information from these datasets. More importantly, we jointly investigate variability and invariant aspects in the articulatory motions and prosodic behaviors of emotional speech. We also improve the performance of speech technology tools by utilizing speech production information.
    People involved: Jangwon Kim
    Applications and Demos: Co-registration of multimodal speech production data, Robust vocal tract parameter extraction
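
    The schematic example below, using synthetic trajectories in place of real EMA flesh-point data, shows the kind of kinematic comparison of articulatory motion across emotion conditions described above (mean articulator speed and movement range). The sensor name, sampling rate, and units are illustrative assumptions.

        # Kinematic comparison of synthetic "tongue tip" trajectories across emotion conditions.
        import numpy as np

        def kinematics(traj, fs):
            """traj: (n_samples, 2) x/y positions of one sensor (mm); fs: sampling rate (Hz)."""
            speed = np.linalg.norm(np.diff(traj, axis=0), axis=1) * fs   # instantaneous speed (mm/s)
            move_range = traj.max(axis=0) - traj.min(axis=0)             # x/y movement range (mm)
            return speed.mean(), move_range

        gen = np.random.default_rng(3)
        fs = 100                                                               # EMA-like sampling rate
        neutral_tt = np.cumsum(gen.normal(scale=0.05, size=(500, 2)), axis=0)  # neutral condition
        angry_tt = np.cumsum(gen.normal(scale=0.12, size=(500, 2)), axis=0)    # larger excursions

        for name, traj in [("neutral", neutral_tt), ("angry", angry_tt)]:
            mean_speed, xy_range = kinematics(traj, fs)
            print(name, "mean speed:", round(mean_speed, 2), "range (x, y):", np.round(xy_range, 2))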


    highlighted publications
    • Jangwon Kim, Naveen Kumar, Sungbok Lee, Shrikanth Narayanan, "Enhanced airway-tissue boundary segmentation for real-time magnetic resonance imaging data", 10-th International Seminar on Speech Production (ISSP), pp. 222 - 225, 2014.
    • Jangwon Kim, Donna Erickson, Sungbok Lee, Shrikanth Narayanan, "A study of invariant properties and variation patterns in the Converter/Distributor model for emotional speech", Interspeech, 2014.
    • Jangwon Kim, Adam Lammert, Prasanta Ghosh, Shrikanth Narayanan, "Spatial and temporal alignment of multimodal human speech production data: Real time imaging, flesh point tracking and audio", ICASSP, pp. 3637 - 3641, 2013.
    • Jangwon Kim, Adam Lammert, Prasanta Ghosh, Shrikanth Narayanan, "Co-registration of speech production datasets from electromagnetic articulography and real-time magnetic resonance imaging", The Journal of the Acoustical Society of America (Express Letter), vol. 135, pp. EL115 - EL121, (in Press) 2014.

    Tools and Corpus

    Emotional information is encoded in a combination of verbal (speech) and non-verbal (face and gesture) channels. To investigate emotional variation in individual channels and across multiple channels, we have collected various types of multimodal corpora, e.g., the "interactive emotional dyadic motion capture database" (referred to as IEMOCAP), "creative and emotive improvisation in theatre performance" (CreativeIT), and an "electromagnetic articulography" (EMA) dataset. We also develop tools for descriptive emotion labeling (Emotion Twenty Questions) and analysis (sentiment analysis toolkit).
    People involved: Carlos Busso (Alumni), Murtaza Bulut (Alumni), Chi-Chun (Jeremy) Lee (Alumni), Abe Kazemzadeh (Alumni), Samuel Kim (Alumni), Serdar Yildirim (Alumni), Emily Mower Provost (Alumni), Angeliki Metallinou (Alumni), Jangwon Kim (PhD student), Nikolaos Malandrakis
    Links: Resources


    highlighted publications
    • Abe Kazemzadeh, James Gibson, Juanchen Li, Sungbok Lee, Panayiotis G Georgiou, Shrikanth Narayanan, "A Sequential Bayesian Dialog Agent for Computational Ethnography", INTERSPEECH, 2012.
    • Abe Kazemzadeh, Sungbok Lee, Panayiotis G Georgiou, Shrikanth Narayanan, "Emotion twenty questions: Toward a crowd-sourced theory of emotions", Affective Computing and Intelligent Interaction, Springer, pp. 1-10, 2011.
    • Angeliki Metallinou, Chi-Chun Lee, Carlos Busso, Sharon Carnicke, Shrikanth Narayanan, "The USC CreativeIT database: A multimodal database of theatrical improvisation", Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality 18 May 2010, Citeseer, pp. 55, 2010.
    • Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, Shrikanth S Narayanan, "IEMOCAP: Interactive emotional dyadic motion capture database", Language resources and evaluation, Springer, vol. 42, no. 4, pp. 335-359, 2008.
    • Sungbok Lee, Serdar Yildirim, Abe Kazemzadeh, Shrikanth Narayanan, "An articulatory study of emotional speech production", INTERSPEECH, pp. 497 - 500, 2005.