About Me

Research Interests

Hi, I'm Rajat Hebbar. I am a PhD student in the Signal Analysis and Interpretation Laboratory (SAIL) at USC.

My research interests include speech and audio understanding technology in diverse, challenging audio domains, spanning multimedia and healthcare applications.

My PhD thesis focused on self-supervised audio representation learning using contrastive techniques and visual-context-aware weak supervision.

I am also interested in adapting speech technologies such as ASR, diarization, and VAD to atypical populations, such as children and older adults, and to egocentric recordings.

My Specialty

My Skills

Programming languages

Python: 80%
Bash: 65%
C/C++: 35%

Packages/Tools

PyTorch: 75%
TensorFlow/Keras: 60%
Matlab: 50%

Education

National Institute of Technology Karnataka

Major: Electronics and Communication Engineering

2012 - 2016

GPA: 8.85/10

University of Southern California

Major: Electrical Engineering (Signal Processing)

2016 - 2018

GPA: 3.85/4

University of Southern California

Major: Electrical Engineering

2018 - 2024

GPA: 3.78/4

Research

Research Experience

Semantically-grounded Audio Representations 2023

Weakly supervised learning of audio representations in movies, using visual captions as the weak-supervision signal with a contrastive objective. Applied to several broad movie-understanding tasks such as genre and scene classification.
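
A minimal sketch of the contrastive objective, assuming precomputed audio and caption embeddings; the batch size, embedding dimension, and temperature below are illustrative, not the values used in the actual work:

import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    # Map both modalities onto the unit sphere
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Similarity of every audio clip against every caption in the batch
    logits = audio_emb @ text_emb.t() / temperature
    # Matched (audio, caption) pairs lie on the diagonal
    targets = torch.arange(logits.size(0))
    # Symmetric InfoNCE: audio-to-caption plus caption-to-audio
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage: a batch of 8 paired 512-dim embeddings
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))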

Self-supervised Multiview Adaptation for Face Clustering in Videos 2021

Large-scale self-supervised mining of 169K face tracks from 240 movies, leveraging temporal and spatial co-occurrence of faces to mine positive and negative samples. Multiview adaptation of face representations outperforms triplet learning for face clustering on a benchmark dataset.
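
A minimal sketch of the two co-occurrence mining rules; the track data structure and field names are hypothetical, for illustration only:

def mine_pairs(tracks):
    # tracks: list of {'id': str, 'frames': set[int]} face tracks from one movie
    positives, negatives = [], []
    for track in tracks:
        frames = sorted(track['frames'])
        if len(frames) >= 2:
            mid = len(frames) // 2
            # Temporal co-occurrence: two halves of one track show the same person
            positives.append((track['id'], frames[:mid], frames[mid:]))
    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            if tracks[i]['frames'] & tracks[j]['frames']:
                # Spatial co-occurrence: two faces in the same frame are different people
                negatives.append((tracks[i]['id'], tracks[j]['id']))
    return positives, negatives

tracks = [{'id': 'a', 'frames': {1, 2, 3}},
          {'id': 'b', 'frames': {2, 3, 4}},
          {'id': 'c', 'frames': {9, 10}}]
pos, neg = mine_pairs(tracks)  # neg contains ('a', 'b'); 'c' never co-occurs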

Foreground speech localization using multiple-instance learning 2019-2020

A multiple-instance learning approach to detecting foreground-speaker speech in egocentric audio recordings. Two-fold detection and localization of foreground segments using existing and novel pooling methods, transfer-learned from SAD embeddings.
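
A minimal sketch of attention-based MIL pooling in PyTorch; the layer choices and dimensions are illustrative, not the exact pooling methods proposed:

import torch
import torch.nn as nn

class MILPool(nn.Module):
    # Scores each frame, then pools to one clip-level foreground probability
    def __init__(self, feat_dim=128):
        super().__init__()
        self.frame_scorer = nn.Linear(feat_dim, 1)
        self.attention = nn.Linear(feat_dim, 1)

    def forward(self, frames):  # frames: (batch, time, feat_dim)
        scores = torch.sigmoid(self.frame_scorer(frames)).squeeze(-1)
        weights = torch.softmax(self.attention(frames).squeeze(-1), dim=1)
        clip_prob = (weights * scores).sum(dim=1)
        # clip_prob trains against the weak clip-level label;
        # per-frame scores localize the foreground segments
        return clip_prob, scores

clip_prob, frame_scores = MILPool()(torch.randn(4, 100, 128))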

Speech Activity Detection in Movies 2018-2019

An automatic, scalable method for extracting labeled data for speech activity detection (SAD) in movies, generating over 100 hours of aligned audio. Proposed lightweight CNN architectures that achieve state-of-the-art performance in movie SAD, outperforming LSTM and ResNet models.
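
A generic sketch of a lightweight CNN for SAD over log-mel spectrogram patches; this is an illustrative stand-in, not the published architecture:

import torch
import torch.nn as nn

class TinySAD(nn.Module):
    # Small CNN over a log-mel patch -> speech/non-speech probability
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):  # x: (batch, 1, n_mels, n_frames)
        return torch.sigmoid(self.net(x)).squeeze(-1)

probs = TinySAD()(torch.randn(2, 1, 64, 100))  # two log-mel patches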

Robust gender identification in audio 2017-2018

Transfer learning of audio-event VGGish embeddings for gender identification. Trained neural network models on weakly labeled AudioSet data, outperforming GMM-based models on movie audio.
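
A minimal sketch of the transfer-learning setup, assuming precomputed VGGish embeddings; the classifier layers and sizes are illustrative:

import torch
import torch.nn as nn

# VGGish yields one 128-dim embedding per ~0.96 s of audio; a small MLP
# on top of the frozen embeddings predicts the speaker's gender.
classifier = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 2),  # two-class logits
)

embeddings = torch.randn(16, 128)  # stand-in for precomputed VGGish features
logits = classifier(embeddings)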

Get in Touch

Contact

643 W 30th Street, Los Angeles, California 90007.