Rajat Hebbar personal website

Semantically-grounded Representations of Audio with Visually Augmented captioNs 2024

Large language model-aided generation of descriptive, visual-context-enhanced audio captions for audio-language pretraining. Generative-augmented pretraining objective to learn semantically-grounded audio representations with zero-shot applications.

Semantically-grounded Audio Representations 2023

Weakly-supervised learning of audio representations in movies, using visual captions as the weak supervision signal with a contrastive objective. Applied to several broad movie understanding tasks, such as genre and scene classification.
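
As a rough illustration of what a contrastive objective between audio clips and their captions can look like, here is a minimal sketch of a symmetric InfoNCE-style loss in plain Python. It is not the method used in this project: the function names, the temperature value, and the toy embeddings are all assumptions made for the example.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(audio_emb, caption_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    audio_emb[i] and caption_emb[i] are treated as a positive pair;
    every other pairing in the batch is treated as a negative.
    The temperature of 0.07 is an assumed, commonly used default.
    """
    audio = [_normalize(v) for v in audio_emb]
    caption = [_normalize(v) for v in caption_emb]
    # Cosine similarity matrix, scaled by the temperature.
    logits = [
        [sum(a * c for a, c in zip(a_vec, c_vec)) / temperature for c_vec in caption]
        for a_vec in audio
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the audio-to-caption and caption-to-audio directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


# Toy 2-D embeddings (hypothetical): aligned pairs vs. deliberately swapped pairs.
audio_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_captions = [[1.0, 0.0], [0.0, 1.0]]
swapped_captions = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = contrastive_loss(audio_batch, aligned_captions)
swapped_loss = contrastive_loss(audio_batch, swapped_captions)
```

The loss is small when each audio embedding is closest to its own caption and large when the pairing is wrong, which is the signal that pulls matching audio and caption representations together.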

Self-supervised Multiview adaptation for Face Clustering in Videos 2021

Large-scale self-supervised mining of 169K face-tracks from 240 movies, leveraging temporal/spatial co-occurrence of faces to mine positive/negative samples. Multiview adaptation of face representations outperforms triplet learning for face clustering on a benchmark dataset.

Foreground speech localization using multiple-instance learning 2019-2020

Multiple-instance learning approach to detect foreground-speaker speech in egocentric audio recordings. Two-fold detection and localization of foreground segments using existing and novel pooling methods, with transfer learning from speech activity detection (SAD) embeddings.
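
To make the detection/localization split concrete, the sketch below shows how per-frame (instance) scores can be pooled into one recording-level (bag) score, and how the same scores can be thresholded to localize segments. It only uses standard pooling functions (max, mean, linear softmax); the novel pooling methods mentioned above are not reproduced here, and all names, the threshold, and the toy scores are assumptions.

```python
def max_pool(probs):
    """Bag score = most confident instance."""
    return max(probs)


def mean_pool(probs):
    """Bag score = average over all instances."""
    return sum(probs) / len(probs)


def linear_softmax_pool(probs):
    """Bag score = instances weighted by their own probability."""
    denom = sum(probs)
    return sum(p * p for p in probs) / denom if denom > 0 else 0.0


def localize(probs, threshold=0.5):
    """Return (start, end) index ranges where the instance score >= threshold.

    `end` is exclusive. The threshold of 0.5 is an assumed default.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(probs)))
    return segments


# Hypothetical per-frame foreground-speech probabilities for one recording.
frame_probs = [0.1, 0.2, 0.9, 0.8, 0.1]

bag_scores = {
    "max": max_pool(frame_probs),
    "mean": mean_pool(frame_probs),
    "linear_softmax": linear_softmax_pool(frame_probs),
}
foreground_segments = localize(frame_probs)
```

Max pooling reacts to a single confident frame, mean pooling dilutes short events, and linear softmax sits between the two, which is why the choice of pooling function matters when only recording-level labels are available.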

Speech Activity Detection in Movies 2018-2019

Automatic, scalable method for extracting labeled data for speech activity detection (SAD) in movies, generating over 100 hours of aligned audio. Proposed lightweight CNN architectures that achieve state-of-the-art performance in movie SAD, outperforming LSTM and ResNet models.

Robust gender identification in audio 2017-2018

Transfer learning of audio-event VGGish embeddings for gender identification. Trained neural-network models on weakly-labelled AudioSet data to outperform GMM-based models in movies.

Hi! I'm Rajat

I am a PhD Researcher

Research Interests

My Skills

Python

PyTorch

Bash

TensorFlow/Keras

C/C++

Matlab

Education

Bachelor of Technology (B.Tech)

Masters Degree (MS)

PhD

Research Experience

Semantically-grounded Representations of Audio with Visually Augmented captioNs 2024

Semantically-grounded Audio Representations 2023

Self-supervised Multiview adaptation for Face Clustering in Videos 2021

Foreground speech localization using multiple-instance learning 2019-2020

Speech Activity Detection in Movies 2018-2019

Contact
