Rajat Hebbar personal website

Who is this guy???

My research work involves developing techniques for automatic processing of speech and audio signals in real-world settings.

At SAIL, I have developed noise-robust machine learning models for early speech-pipeline modules, such as speech activity detection and gender identification, targeting domains with challenging acoustic environments such as movies and real-world egocentric audio.

PS: I also like to play chess.

Self-supervised Multiview adaptation for Face Clustering in Videos 2020-

Large-scale self-supervised mining of 169K face-tracks from 240 movies, leveraging temporal/spatial co-occurrence of faces to mine positive/negative samples. Multiview adaptation of face representations outperforms triplet learning for face clustering on a benchmark dataset.
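As a rough illustration of the mining idea, and not the actual pipeline: one common heuristic is that faces within the same track belong to one person (positives), while tracks that overlap in time must belong to different people (negatives). The `FaceTrack` structure, the frame-index fields, and the `mine_pairs` helper below are hypothetical names chosen for this sketch.

```python
from itertools import combinations
from typing import NamedTuple


class FaceTrack(NamedTuple):
    """Hypothetical container for one face-track in a movie."""
    track_id: str
    start: int     # first frame index (inclusive)
    end: int       # last frame index (inclusive)
    faces: tuple   # identifiers of the face crops in this track


def overlaps(a: FaceTrack, b: FaceTrack) -> bool:
    """True if the two tracks share at least one frame."""
    return a.start <= b.end and b.start <= a.end


def mine_pairs(tracks):
    """Mine (positive, negative) face pairs without any identity labels.

    Assumption for this sketch: faces in one track are the same person,
    and two tracks visible at the same time are different people.
    """
    positives, negatives = [], []
    for track in tracks:
        positives.extend(combinations(track.faces, 2))
    for a, b in combinations(tracks, 2):
        if overlaps(a, b):
            negatives.extend((fa, fb) for fa in a.faces for fb in b.faces)
    return positives, negatives


tracks = [
    FaceTrack("A", 0, 10, ("a1", "a2")),
    FaceTrack("B", 5, 15, ("b1",)),
    FaceTrack("C", 20, 30, ("c1", "c2")),
]
positives, negatives = mine_pairs(tracks)
```

Tracks that never co-occur (such as A and C here) yield no pair at all, since nothing can be inferred about whether they show the same person.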

Foreground speech localization using multiple-instance learning 2019-2020

Multiple-instance learning approach to detect foreground-speaker speech in egocentric audio recordings. Two-fold task: detection and localization of foreground segments, using existing and novel pooling methods, with transfer learning from SAD embeddings.
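To make the role of pooling concrete, here is a minimal sketch of how instance-level scores can be pooled into a single recording-level (bag) score, with the instance scores reused for localization. It shows only standard pooling functions from the multiple-instance learning literature (max, mean, linear softmax), not the novel methods from this project; the function names and the 0.5 threshold are assumptions made for illustration.

```python
def max_pool(scores):
    """Bag score is the most confident instance."""
    return max(scores)


def mean_pool(scores):
    """Bag score is the average over all instances."""
    return sum(scores) / len(scores)


def linear_softmax_pool(scores):
    """Each instance is weighted by its own score: sum(p^2) / sum(p)."""
    total = sum(scores)
    return sum(p * p for p in scores) / total if total > 0 else 0.0


def localize(scores, threshold=0.5):
    """Indices of segments whose instance score reaches the threshold."""
    return [i for i, p in enumerate(scores) if p >= threshold]


# Hypothetical per-segment foreground-speech probabilities for one recording.
segment_scores = [0.1, 0.9, 0.8, 0.2]

bag_max = max_pool(segment_scores)
bag_mean = mean_pool(segment_scores)
bag_linsoft = linear_softmax_pool(segment_scores)
foreground_segments = localize(segment_scores)
```

Max pooling lets a single segment decide the recording-level label, mean pooling treats every segment equally, and linear softmax sits between the two.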

VINA - Analysing gender participation in meetings 2019-

Flask-based web application that deploys state-of-the-art neural network models for gender-based speaking-time estimation from audio. Redis Queue (RQ) runs asynchronous background jobs so that multiple files can be processed.

Speech Activity Detection in Movies 2018-2019

Automatic, scalable method for extracting labeled data for speech activity detection (SAD) in movies, generating over 100 hours of aligned audio. Proposed lightweight CNN architectures that achieve state-of-the-art performance in movie SAD, outperforming LSTM and ResNet models!

Robust gender identification in audio 2017-2018

Transfer learning of audio-event VGGish embeddings for gender identification. Trained neural network models on weakly labelled AudioSet data to outperform GMM-based models in movies.

Hi! I'm Rajat

I am a PhD Researcher

My Skills

Bash

Kaldi

Python

TensorFlow (/Keras)

Matlab

PyTorch

Chess

Education

Bachelor of Technology (B.Tech)

Masters Degree (MS)

PhD Candidate

Research Experience

Self-supervised Multiview adaptation for Face Clustering in Videos 2020-

Foreground speech localization using multiple-instance learning 2019-2020

VINA - Analysing gender participation in meetings 2019-

Speech Activity Detection in Movies 2018-2019

Contact
