Understanding of Emotion Perception from Art


Digbalay Bose, Krishna Somandepalli, Souvik Kundu, Rimita Lahiri, Jonathan Gratch, Shrikanth Narayanan; 4th ICCV CLVL Workshop 2021

Visual art is a rich medium for expressing human thoughts and emotions. The emotion an artwork evokes in viewers is highly subjective. Here, we analyze images together with the text captions in which viewers express their emotions, framing the problem as a multimodal classification task. We obtain performance improvements for the extreme positive and negative emotion classes when a single-stream multimodal model like MMBT is compared against a text-only transformer model like BERT.

Paper Slides

Cross Domain Emotion Recognition using Few Shot Knowledge Transfer


Justin Olah, Sabyasachee Baruah, Digbalay Bose, Shrikanth Narayanan; arXiv:2110.05021

Emotion recognition from text is a challenging task due to diverse emotion taxonomies, lack of reliable labeled data in different domains, and highly subjective annotation standards. Few-shot and zero-shot techniques can generalize across unseen emotions by projecting the documents and emotion labels onto a shared embedding space. In this work, we explore the task of few-shot emotion recognition by transferring the knowledge gained from supervision on the GoEmotions Reddit dataset to the SemEval tweets corpus, using different emotion representation methods.
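As an illustrative sketch of the shared-embedding-space idea, the toy example below (hypothetical 3-d vectors standing in for a real document/label encoder, not the paper's actual representation methods) assigns the emotion label whose embedding is most similar to the document's embedding:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(doc_vec, label_vecs):
    """Pick the emotion label whose embedding lies closest to the
    document embedding in the shared space."""
    scores = {label: cosine(doc_vec, vec) for label, vec in label_vecs.items()}
    return max(scores, key=scores.get)

# Toy 3-d embeddings standing in for a real encoder's output.
label_vecs = {
    "joy":   np.array([1.0, 0.1, 0.0]),
    "anger": np.array([0.0, 1.0, 0.1]),
}
doc_vec = np.array([0.9, 0.2, 0.0])  # embedding of an unseen-domain document
print(zero_shot_classify(doc_vec, label_vecs))  # -> joy
```

Because classification reduces to nearest-label search in the shared space, the same classifier applies unchanged to emotion labels never seen during training.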

Paper

Automated analysis of asymmetry in facial paralysis patients using landmark-based measures


Digbalay Bose, Krishna Somandepalli, Tymon Tai, Courtney Voelker, Shrikanth Narayanan, Amit Kochhar; Facial Plastic Surgery and Aesthetic Medicine (Under Submission)

Facial paralysis, which arises from insult to the facial nerve and/or facial muscles, results in varying degrees of disfigurement. Existing facial paralysis assessment systems rely on guidance from clinical experts and require significant technical expertise. In this work, we explore the use of facial landmarks in an automatic assessment system for clinical videos by proposing a suite of facial asymmetry measures. For the study, we consider 77 subjects across two datasets and perform linear mixed-effects modeling to predict standardized eFACE scores from the proposed asymmetry measures and additional covariates such as gender and age. Certain measures based on eye opening, oral commissure, and brow elevation exhibit statistically significant negative effects when predicting eFACE scores. Further, Spearman rank correlation analysis between certain eFACE scores and asymmetry measures reveals significant negative correlations, capturing the underlying relationship between higher eFACE scores and lower asymmetry.
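A minimal sketch of one such landmark-based asymmetry measure, using hypothetical eyelid landmark coordinates (the paper's actual measures and normalization may differ), compares eye opening on the left and right sides:

```python
import numpy as np

def eye_opening(upper, lower):
    """Vertical eye opening: distance between upper- and lower-eyelid landmarks."""
    return float(np.linalg.norm(np.asarray(upper, float) - np.asarray(lower, float)))

def eye_opening_asymmetry(left_up, left_lo, right_up, right_lo):
    """Normalized left/right difference in eye opening; 0 means perfectly symmetric."""
    l = eye_opening(left_up, left_lo)
    r = eye_opening(right_up, right_lo)
    return abs(l - r) / max(l, r)

# Hypothetical 2-d landmark coordinates (in pixels) from one video frame.
asym = eye_opening_asymmetry((10, 40), (10, 50), (60, 40), (60, 48))
print(asym)  # -> 0.2  (left opening 10 px vs right opening 8 px)
```

Measures of this form are computed per frame and aggregated over a clinical video before entering the mixed-effects model.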

Future Sales Prediction

Machine Learning (CSCI 567), USC

We propose an ensemble of decision-tree-based models for future sales prediction. In particular, we present a detailed description of the feature engineering needed to generate informative training features, and then perform a detailed performance study with various state-of-the-art decision-tree-based models. Our best-performing model achieved an RMSE of 0.87605, a class rank of 6, and a global Kaggle leaderboard rank of 80.
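The evaluation metric and the ensembling step can be sketched as follows (illustrative numbers only; the actual project trained gradient-boosted tree models on engineered features):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, the competition metric."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def ensemble_mean(*predictions):
    """Simple ensemble: average the predictions of several tree models."""
    return np.mean(np.stack([np.asarray(p, float) for p in predictions]), axis=0)

y_true = [3.0, 0.0, 2.0, 7.0]
pred_a = [2.5, 0.0, 2.0, 8.0]  # predictions from one tree model (illustrative)
pred_b = [3.5, 0.5, 1.5, 7.0]  # predictions from another tree model (illustrative)
blend = ensemble_mean(pred_a, pred_b)

# The blend's errors partially cancel, so its RMSE beats either model alone here.
print(rmse(y_true, blend), rmse(y_true, pred_a), rmse(y_true, pred_b))
```

Averaging helps when the base models make partially uncorrelated errors, which is the usual motivation for ensembling diverse tree models.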

Report Slides Github

Visual Question Answering

Deep Learning and its Applications (CSCI 599), USC

Visual Question Answering is a challenging AI task that involves both reasoning about image content and understanding the question in order to provide an open-ended natural language answer. In this work, we explore different deep neural network models for generating natural language answers.

Report Poster Github

Multimodal Emotion Recognition

Deep Learning for Speech Processing (EE 599), USC

We developed a CNN-based (VGGVox) architecture for analyzing speech utterances, followed by fusion with text representations to predict utterance-level emotion labels. We obtained competitive results on the MELD and CMU-MOSEI datasets.
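A minimal sketch of the fusion step, under assumed dimensions (4-d audio embedding, 3-d text embedding, 5 emotion classes; not the project's actual architecture or sizes), concatenates the two modality embeddings and applies a linear classifier head:

```python
import numpy as np

rng = np.random.default_rng(0)

def late_fusion(audio_emb, text_emb, w, b):
    """Late fusion: concatenate per-utterance modality embeddings,
    then apply a linear head to obtain per-emotion logits."""
    fused = np.concatenate([audio_emb, text_emb])
    return w @ fused + b

# Hypothetical embeddings: audio from a VGG-style speech CNN, text from a
# text encoder; 5 emotion classes for the classifier head.
audio_emb = rng.standard_normal(4)
text_emb = rng.standard_normal(3)
w = rng.standard_normal((5, 4 + 3))  # head weights: (num_classes, fused dim)
b = np.zeros(5)

logits = late_fusion(audio_emb, text_emb, w, b)
pred = int(np.argmax(logits))  # predicted utterance-level emotion index
```

In practice the head would be trained jointly with (or on top of) the frozen modality encoders; the sketch only shows the data flow of the fusion itself.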

Slides