Command Invariant Speaker Representations
Visualization of the shared subspace representation used to discriminate between utterances of different speakers. This setting is similar to text-dependent speaker diarization. The visualizations are generated using t-SNE. During training, we held out 15 words and 146 subjects. The results are shown below.
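The held-out protocol can be illustrated with a minimal sketch. It assumes each utterance is tagged with a speaker ID and a spoken command word. A fixed set of words and a fixed set of speakers are reserved, and training uses only utterances that involve neither. The `Utterance` record and the `split_held_out` helper are hypothetical. The routing rule is also an assumption: an utterance goes to evaluation if either its word or its speaker is held out. The source states only the counts (15 words, 146 subjects), not how the split was drawn.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Utterance:
    # Hypothetical record: one recording of one speaker saying one command word.
    speaker: str
    word: str
    path: str


def split_held_out(
    utterances: Sequence[Utterance],
    n_words: int = 15,
    n_speakers: int = 146,
    seed: int = 0,
) -> Tuple[List[Utterance], List[Utterance]]:
    """Reserve n_words words and n_speakers speakers at random.

    Assumption: any utterance whose word OR speaker is held out is routed
    to the evaluation set, so training never sees a held-out word or speaker.
    """
    rng = random.Random(seed)
    words = sorted({u.word for u in utterances})
    speakers = sorted({u.speaker for u in utterances})
    if n_words > len(words) or n_speakers > len(speakers):
        raise ValueError("not enough distinct words or speakers to hold out")
    held_words = set(rng.sample(words, n_words))
    held_speakers = set(rng.sample(speakers, n_speakers))

    train: List[Utterance] = []
    held: List[Utterance] = []
    for u in utterances:
        if u.word in held_words or u.speaker in held_speakers:
            held.append(u)
        else:
            train.append(u)
    return train, held


# Synthetic toy corpus for illustration only: 300 speakers x 35 words.
data = [Utterance(f"spk{s}", f"word{w}", f"spk{s}_word{w}.wav")
        for s in range(300) for w in range(35)]
train, held = split_held_out(data)
print(len(train), len(held))  # → 3080 7420
```

With this rule, the training set in the toy example contains 154 speakers × 20 words. Every utterance that touches a held-out word or speaker is available for the t-SNE visualization of unseen conditions.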