About Me

Who is this guy???

Hi, I'm Rajat Hebbar. I am a PhD student in the Signal Analysis and Interpretation Laboratory (SAIL) at USC.

My research focuses on developing techniques for the automatic processing of speech and audio signals in real-world settings.

During my time at SAIL, I have developed noise-robust machine learning models for front-end speech-pipeline modules such as speech activity detection and gender identification, targeting domains with challenging acoustic conditions such as movies and real-world egocentric audio.

PS: I also like to play chess.

My Specialty

My Skills

Scripting languages

TensorFlow (/Keras)
Disclaimer: The skill levels depicted in this section are products of the author's self-adjudication. Any resemblance to actual skill levels (living or dead?) is purely coincidental.

Think of these numbers as confidence scores.



Education

National Institute of Technology Karnataka

Major: Electronics and Communication Engineering

2012 - 2016

GPA: 8.85/10

University of Southern California

Major: Electrical Engineering (Signal Processing)

2016 - 2018

GPA: 3.85/4

University of Southern California

Major: Electrical Engineering

2018 - Present

GPA: 3.8/4


Research Experience

Self-supervised Multiview Adaptation for Face Clustering in Videos 2020-

Large-scale self-supervised mining of 169K face-tracks from 240 movies, leveraging temporal and spatial co-occurrence of faces to mine positive and negative samples. Multiview adaptation of face representations outperforms triplet learning for face clustering on a benchmark dataset.

Foreground speech localization using multiple-instance learning 2019-2020

A multiple-instance learning approach to detecting foreground-speaker speech in egocentric audio recordings. Two-fold detection and localization of foreground segments using existing and novel pooling methods, transfer-learned from speech activity detection (SAD) embeddings.
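The multiple-instance learning idea above can be illustrated with a minimal sketch: each recording is a "bag" of frame-level scores, and a pooling function turns instance scores into one bag-level prediction. This is only an illustrative example with made-up scores, not the actual models or pooling methods from the project.

```python
import numpy as np

def mil_max_pool(frame_scores):
    """Bag-level score = max over frame-level scores (standard MIL max pooling):
    the bag is positive if at least one instance is positive."""
    return float(np.max(frame_scores))

def mil_mean_pool(frame_scores):
    """Bag-level score = mean over frame-level scores (a softer pooling choice)."""
    return float(np.mean(frame_scores))

# Hypothetical per-frame foreground-speech scores for one recording (the "bag").
scores = np.array([0.1, 0.05, 0.9, 0.2])
print(mil_max_pool(scores))   # dominated by the single confident frame
print(mil_mean_pool(scores))  # diluted by the many low-scoring frames
```

Max pooling localizes well but trains on only one frame per bag; mean pooling uses every frame but blurs short positive segments — which is why MIL work often explores pooling functions between these two extremes.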

VINA - Analysing gender participation in meetings 2019-

A Flask-based web application that deploys state-of-the-art neural network models for gender-based speaking-time estimation from audio. Redis Queue (RQ) serves asynchronous background jobs for processing multiple files.

Speech Activity Detection in Movies 2018-2019

An automatic, scalable method for extracting labeled data for speech activity detection (SAD) in movies, generating over 100 hours of aligned audio. Proposed lightweight CNN architectures that achieve state-of-the-art performance in movie SAD, outperforming LSTM and ResNet models.

Robust gender identification in audio 2017-2018

Transfer learning of audio-event VGGish embeddings for gender identification. Trained neural-network models on weakly-labelled AudioSet data to outperform GMM-based models in movies.

Get in Touch


643 W 30th Street, Los Angeles, California 90007. (Please wait until post-COVID for Apt #)

+323 948 7643 (Please don't call me!)