Why Augmentations Matter in Contrastive Learning
Status
Planned experiment – not yet executed.
Hypothesis
Augmentations are the most critical design choice in contrastive learning. They define what invariances the model learns – get them wrong and the representations are useless, regardless of architecture or training budget.
Setup
Train SimCLR on CIFAR-10 with different augmentation configurations and measure linear probe accuracy:
| Config | Augmentations |
|---|---|
| Minimal | Random crop only |
| Moderate | Crop + horizontal flip + grayscale |
| Full | Crop + color jitter + flip + grayscale + blur |
| Aggressive | Full + extreme crop ratios + strong color distortion |
All other hyperparameters are held constant across configurations: ResNet-18 encoder, batch size 512, 200 epochs, temperature 0.5.
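The four configurations in the table could be written down as data, so that each run differs only in its augmentation settings. The sketch below is one possible encoding, not a finalized pipeline. The names `AugConfig`, `CONFIGS`, and `build_transform` are hypothetical. Every numeric value (crop scale ranges, jitter strengths, grayscale probability, blur kernel size) is an assumption, since the plan above does not fix them. "Crop" is assumed to mean a random resized crop.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AugConfig:
    """One row of the augmentation table. All defaults are assumed values."""
    crop_scale: Tuple[float, float] = (0.2, 1.0)  # area fraction kept by the crop
    flip: bool = False
    grayscale_p: float = 0.0      # 0 disables random grayscale
    jitter_strength: float = 0.0  # 0 disables color jitter
    blur: bool = False


# Hypothetical settings for the four planned configurations.
CONFIGS: Dict[str, AugConfig] = {
    "minimal": AugConfig(),
    "moderate": AugConfig(flip=True, grayscale_p=0.2),
    "full": AugConfig(flip=True, grayscale_p=0.2, jitter_strength=0.5, blur=True),
    "aggressive": AugConfig(crop_scale=(0.05, 1.0), flip=True, grayscale_p=0.2,
                            jitter_strength=1.0, blur=True),
}


def build_transform(cfg: AugConfig, image_size: int = 32):
    """Turn an AugConfig into a torchvision pipeline for 32x32 CIFAR-10 images."""
    # Imported here so the config table can be inspected without torchvision.
    from torchvision import transforms as T

    ops = [T.RandomResizedCrop(image_size, scale=cfg.crop_scale)]
    if cfg.flip:
        ops.append(T.RandomHorizontalFlip())
    if cfg.jitter_strength > 0:
        s = cfg.jitter_strength
        jitter = T.ColorJitter(brightness=0.8 * s, contrast=0.8 * s,
                               saturation=0.8 * s, hue=0.2 * s)
        ops.append(T.RandomApply([jitter], p=0.8))
    if cfg.grayscale_p > 0:
        ops.append(T.RandomGrayscale(p=cfg.grayscale_p))
    if cfg.blur:
        # Small odd kernel chosen for 32x32 inputs (assumption).
        ops.append(T.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)))
    ops.append(T.ToTensor())
    return T.Compose(ops)
```

With this layout, calling `build_transform(CONFIGS["full"])` once and applying the result twice to the same PIL image would yield the two views of a positive pair.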
Expected Outcome
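- Minimal augmentations will produce weak representations – the contrastive task becomes too easy, because two crops of the same image can be matched through low-level shortcuts (for example, similar color statistics) rather than semantic content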
- Full augmentations will force the model to learn semantic features, producing the strongest embeddings
- Aggressive augmentations may hurt, especially early in training, by making the pretext task too hard (for example, extreme crops of the same image that share little content)
Why This Matters
Most self-supervised learning (SSL) papers relegate augmentations to a hyperparameter table in the appendix. In practice, they are the experiment. The augmentation pipeline implicitly defines what the model treats as “same” vs “different” – which is the entire learning signal in contrastive methods.
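To make the “same” vs “different” signal concrete, here is a minimal, dependency-free sketch of the NT-Xent loss that SimCLR optimizes. It is a slow reference version meant for reading or for checking a vectorized implementation, not for training. The function name `nt_xent` and the list-of-lists input format are assumptions made for illustration. The only positive for each embedding is the other augmented view of the same image; every other embedding in the batch counts as a negative.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def nt_xent(view_a: Sequence[Sequence[float]],
            view_b: Sequence[Sequence[float]],
            temperature: float = 0.5) -> float:
    """NT-Xent loss, where view_a[i] and view_b[i] embed two views of image i."""
    if len(view_a) != len(view_b):
        raise ValueError("both views must contain the same number of embeddings")
    n = len(view_a)
    z = [_normalize(v) for v in list(view_a) + list(view_b)]

    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every embedding except i itself (positive + negatives).
        logits = [_dot(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_logit = _dot(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over the candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - pos_logit
    return total / (2 * n)
```

Because the pairing of `view_a[i]` with `view_b[i]` comes entirely from the augmentation pipeline, changing the augmentations changes which embeddings the loss pulls together, without touching the loss itself.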
Understanding this connection between augmentations and learned invariances is essential before scaling to harder domains (video, medical imaging) where the right invariances are less obvious.