
Representation Learning

Learning structured, transferable internal representations from data — the foundation everything else builds on.

Landscape

The core question: what makes a representation good? Useful representations are structured (geometry reflects semantics), general (transfer across tasks), and compact (don't memorize — compress).

Sub-areas

  • Self-supervised learning — learning from data without labels: contrastive (SimCLR, MoCo), non-contrastive (BYOL, SimSiam), masked (MAE, BEiT)
  • Disentanglement — learning independent factors of variation (β-VAE, FactorVAE)
  • Metric learning — structuring embedding space for similarity (triplet loss, ArcFace)
  • Multi-modal representations — aligning representations across modalities (CLIP, ImageBind)
  • Geometric/equivariant representations — encoding symmetries and invariances by construction
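The contrastive branch above can be made concrete with the NT-Xent objective used by SimCLR: each example's two augmented views are positives, everything else in the batch is a negative. A minimal NumPy sketch (batch size, dimensions, and temperature are illustrative; real implementations apply this to encoder outputs over large batches):

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy), as in SimCLR.

    z1, z2: (N, D) embeddings of two augmented views of the same N examples.
    """
    # Normalize rows to unit length so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)           # (2N, D)
    sim = z @ z.T / temperature                    # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                 # exclude self-pairs
    n = len(z1)
    # The positive for row i is row i+n, and vice versa.
    targets = np.concatenate([np.arange(n) + n, np.arange(n)])
    # Cross-entropy of each row's softmax against its positive index.
    logits = sim - sim.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(2 * n), targets].mean()
```

The loss drops as the two views of each example align and rises as unrelated examples collide, which is exactly the "geometry reflects semantics" pressure described above.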

Landmark papers

  • Representation Learning: A Review and New Perspectives — Bengio et al., 2013. Still the best map of the field.
  • A Simple Framework for Contrastive Learning of Visual Representations (SimCLR) — Chen et al., 2020.
  • Masked Autoencoders Are Scalable Vision Learners (MAE) — He et al., 2021.
  • Learning Transferable Visual Models From Natural Language Supervision (CLIP) — Radford et al., 2021.
  • β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework — Higgins et al., 2017.

Key figures

Yoshua Bengio, Yann LeCun (self-supervised), Bernhard Schölkopf (disentanglement, causality), Pieter Abbeel (RL + representations), Stefano Soatto (compression view).


Open Problems

  1. What is a good representation, formally? We have many intuitions (smooth, disentangled, linearly separable) but no unified theory. Most evaluation is downstream-task proxy — not principled.

  2. Does disentanglement require supervision? Evidence suggests unsupervised disentanglement is fundamentally underdetermined without inductive biases. The right inductive biases (causal structure?) remain unclear.

  3. Do self-supervised representations generalize out-of-distribution? In-distribution transfer is well-studied. OOD generalization — especially across modalities or domains — is not.

  4. How do representations interact with continual learning? Catastrophic forgetting is partly a representation stability problem. Can we learn representations that are both plastic and stable?

  5. What is the right geometry for representations? Euclidean space is the default, but hyperbolic space (for hierarchies), spherical space (for contrastive objectives that normalize embeddings), and product spaces may be better suited to different data geometries.
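To make the hyperbolic case in problem 5 concrete: in the Poincaré ball model, geodesic distance blows up near the boundary, giving exponentially more room for tree-like data than Euclidean space. A minimal sketch of the closed-form distance on the unit ball (the `eps` guard is my addition for numerical safety):

```python
import numpy as np

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between points x, y inside the unit Poincare ball."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x**2)) * (1 - np.sum(y**2))
    return np.arccosh(1 + 2 * sq / (denom + eps))
```

Two points near the boundary (e.g. sibling leaves of a hierarchy) can be close in Euclidean terms yet very far apart hyperbolically, which is what lets shallow-vs-deep tree structure survive in low dimension.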


Questions & Ideas

  • If two representations achieve the same downstream accuracy, are they equivalent? What would "representational equivalence" even mean formally?
  • Do contrastive methods implicitly learn causal structure, or just correlational structure that happens to be stable?
  • Can you train a single encoder that produces good representations for both RL (value prediction) and supervised (classification) without task-specific heads?
  • What does a "disentangled" representation look like in a high-dimensional, continuous action space?
  • Is there a theoretical connection between information bottleneck and the representations learned by transformers?
  • Do representations learned from video contain more or less causal information than representations learned from images alone?
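On the equivalence question above, one concrete candidate metric is linear Centered Kernel Alignment (CKA, Kornblith et al., 2019), which scores two representations of the same examples as equivalent when they differ only by rotation and isotropic scaling. A minimal NumPy sketch:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X: (n, d1), Y: (n, d2) -- features for the same n examples.
    Returns a similarity in [0, 1]; 1 means the representations match
    up to an orthogonal transformation and isotropic scaling.
    """
    X = X - X.mean(axis=0)                          # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2      # cross-covariance alignment
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)
```

Whether CKA-equivalence is the *right* formal notion is exactly the open question; it deliberately ignores invertible non-orthogonal transforms that a downstream linear probe could undo.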

My Take

This section evolves as my thinking develops.

The connection between representation learning and causal structure is where I think the most important unsolved work lives. Current representations are fundamentally correlational — they compress statistical patterns, not causal mechanisms. A representation that captures causal structure would generalize in a fundamentally different (and more reliable) way.

The self-supervised + world model connection also feels underexplored: if you're going to predict future states, you're implicitly learning a representation — but most representation learning work and world model work proceed independently.


Journal

2026-02-27 — Starting this research area page. The motivation: representation learning is the foundation that connects everything else in my cluster (RL, world models, continual learning). Before going deep on any of those, I need a clear map of what "good representation" means and where the genuine open problems are.

First order of business: work through the Bengio 2013 review carefully and map it against the current (2024–2025) landscape to see what's been resolved and what's shifted.