Tiantian Feng, Rajat Hebbar, and Shrikanth Narayanan. TRUST-SER: On The Trustworthiness Of Fine-Tuning Pre-Trained Speech Embeddings For Speech Emotion Recognition. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 11201–11205, April 2024.

Abstract

Recent studies have explored using pre-trained embeddings for speech emotion recognition, achieving comparable performance to conventional methods that rely on low-level knowledge-inspired acoustic features. These embeddings are often generated from models trained on large-scale speech datasets using self-supervised or weakly-supervised learning objectives. Despite the significant advancements made in SER through pre-trained embeddings, there is a limited understanding of the trustworthiness of these methods, including privacy breaches, unfair performance, vulnerability to adversarial attacks, and computational cost, all of which may hinder the real-world deployment of these systems. In response, we introduce TrustSER, a general framework designed to evaluate the trustworthiness of SER systems using deep learning methods, focusing on privacy, safety, fairness, and sustainability, offering unique insights into future research in the field of SER. Our code is publicly available under: https://github.com/usc-sail/trust-ser.

BibTeX Entry

@INPROCEEDINGS{10446616,
  author={Feng, Tiantian and Hebbar, Rajat and Narayanan, Shrikanth},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={TRUST-SER: On The Trustworthiness Of Fine-Tuning Pre-Trained Speech Embeddings For Speech Emotion Recognition},
  year={2024},
  pages={11201--11205},
  abstract={Recent studies have explored using pre-trained embeddings for speech emotion recognition, achieving comparable performance to conventional methods that rely on low-level knowledge-inspired acoustic features. These embeddings are often generated from models trained on large-scale speech datasets using self-supervised or weakly-supervised learning objectives. Despite the significant advancements made in SER through pre-trained embeddings, there is a limited understanding of the trustworthiness of these methods, including privacy breaches, unfair performance, vulnerability to adversarial attacks, and computational cost, all of which may hinder the real-world deployment of these systems. In response, we introduce TrustSER, a general framework designed to evaluate the trustworthiness of SER systems using deep learning methods, focusing on privacy, safety, fairness, and sustainability, offering unique insights into future research in the field of SER. Our code is publicly available under: https://github.com/usc-sail/trust-ser.},
  keywords={Deep learning;Emotion recognition;Computational modeling;System performance;Speech recognition;Benchmark testing;Acoustics;speech;emotion recognition;self-supervision;trustworthiness},
  doi={10.1109/ICASSP48485.2024.10446616},
  ISSN={2379-190X},
  link = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446616},
  month={April},}
