Shao-Yen Tseng, Shrikanth Narayanan, and Panayiotis Georgiou. Multimodal Embeddings from Language Models for Emotion Recognition in the Wild. IEEE Signal Processing Letters, 28:608–612, 2021.

Abstract

Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvement across many natural language processing tasks. In this work we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. We show that embeddings extracted from this model integrate paralinguistic cues into word meanings and can provide vital affective information by applying these multimodal embeddings to the task of speaker emotion recognition.

BibTeX Entry

@article{TsengSPL2021,
 abstract = {Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvement across many natural language processing tasks. In this work we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. We show that embeddings extracted from this model integrate paralinguistic cues into word meanings and can provide vital affective information by applying these multimodal embeddings to the task of speaker emotion recognition.},
 author = {Tseng, Shao-Yen and Narayanan, Shrikanth and Georgiou, Panayiotis},
 doi = {10.1109/LSP.2021.3065598},
 issn = {1558-2361},
 journal = {IEEE Signal Processing Letters},
 keywords = {Acoustics;Task analysis;Feature extraction;Convolution;Emotion recognition;Context modeling;Bit error rate},
 link = {http://sail.usc.edu/publications/files/Tseng-SPL2021.pdf},
 pages = {608--612},
 title = {Multimodal Embeddings from Language Models for Emotion Recognition in the Wild},
 volume = {28},
 year = {2021}
}
