Generalized Multiview Representation Learning

What are multiple views and modalities?
Data sampled by observing an event with different instruments to capture its various presentations. For example, consider the task of identifying the wake words "OK Google" or "Alexa" regardless of age, accent, acoustic background and other factors. The different variations of a given wake word are multiple views of the underlying task, i.e., recognizing the wake word. Another example is understanding the topic of a video using sound design, visuals and language as multiple modalities.

Why do multiview learning?

So can machines. Learning from different devices or instruments observing multiple aspects of an underlying event can help build robust machine learning models.

Open research questions addressed by this work:

  • How can we model multiple views/modalities in parallel?
    • There is a need to develop methods that scale to a large number (>2) of views.
    • There is a need for view-agnostic methods, i.e., methods that require view correspondence but not the details of view acquisition.
  • Developing mechanisms for multimodal “disentanglement”
    • How can we dissociate modality-specific information from information shared across modalities?
    • There is a need for generalizable unsupervised and self-supervised methodologies.

Central idea of this work:

  • The inherent variability of a semantic class can be uncorrelated across multiple views of the data
  • Maximizing multiview correlation can transform high-dimensional input data streams into a low-dimensional subspace shared across views
  • The subspaces estimated by factoring out the variabilities arising from the many views of an event are naturally discriminative of the classes
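
To make the idea of maximizing multiview correlation concrete, here is a minimal linear sketch in NumPy. It is not the model used in this work. It assumes that all views have the same feature dimension and share a single projection, and it measures multiview correlation as the ratio of between-view to within-view covariance, scaled by the number of views minus one. The function name `multiview_correlation` and the parameters `n_components` and `reg` are hypothetical and introduced only for illustration.

```python
import numpy as np


def multiview_correlation(views, n_components=2, reg=1e-6):
    """Linear sketch: find projections that maximize correlation across views.

    views: sequence of M arrays, each (n_samples, n_features), with rows
    aligned across views (row i of every view observes the same event).
    Returns (rho, W): correlations sorted in descending order and the
    projection matrix of shape (n_features, n_components).
    """
    X = np.asarray(views, dtype=float)            # (M, N, D)
    M, _, D = X.shape
    X = X - X.mean(axis=1, keepdims=True)         # center each view

    # Within-view scatter: sum of per-view covariance terms.
    R_w = sum(x.T @ x for x in X)
    # Total scatter of the summed views; subtracting R_w leaves only
    # the cross-view (between-view) terms.
    s = X.sum(axis=0)
    R_b = s.T @ s - R_w

    # Small ridge term (an assumption) keeps R_w well conditioned.
    R_w = R_w + reg * np.trace(R_w) / D * np.eye(D)

    # Solve R_b v = lambda R_w v by whitening with the Cholesky factor.
    L = np.linalg.cholesky(R_w)
    C = np.linalg.solve(L, np.linalg.solve(L, R_b).T).T
    C = (C + C.T) / 2                             # enforce symmetry
    evals, U = np.linalg.eigh(C)

    order = np.argsort(evals)[::-1][:n_components]
    rho = evals[order] / (M - 1)                  # scale so that rho <= 1
    W = np.linalg.solve(L.T, U[:, order])         # map back from whitened space
    return rho, W


# Toy usage: four noisy views of the same two-dimensional latent signal.
rng = np.random.default_rng(0)
shared = rng.standard_normal((500, 2)) @ rng.standard_normal((2, 5))
views = [shared + 0.1 * rng.standard_normal((500, 5)) for _ in range(4)]
rho, W = multiview_correlation(views, n_components=3)
print(np.round(rho, 3))
```

In this toy setting, the directions that carry the shared signal should typically yield correlations close to 1, while directions dominated by view-specific noise should stay near 0. Projecting any view with `W` gives its coordinates in the estimated shared subspace.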

We evaluated our methodology on several datasets in the domains of audio, video and images. Below are three demos of such applications:

Click on the panels below to explore the learnt embeddings on different datasets

The visualizations above were developed using this tool.

Why does this work?
The objective we maximize, and the evolution of its eigenvalues during training, are illustrated below:

© 2020. All rights reserved. Krishna Somandepalli