<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Anurag builds things</title>
<link>https://anuragbuildsthings.com/</link>
<atom:link href="https://anuragbuildsthings.com/index.xml" rel="self" type="application/rss+xml"/>
<description>Projects, experiments, and systems built in public.</description>
<generator>quarto-1.9.37</generator>
<lastBuildDate>Wed, 18 Mar 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Self-Supervised Learning</title>
  <link>https://anuragbuildsthings.com/posts/self-supervised-learning.html</link>
  <description><![CDATA[ 





<section id="what-it-is" class="level2">
<h2 class="anchored" data-anchor-id="what-it-is">What It Is</h2>
<p>Self-supervised learning (SSL) trains models on unlabeled data by generating supervision from the data itself. Instead of human-provided labels, the model solves a <em>pretext task</em> – a proxy objective that forces it to learn useful structure.</p>
<p>Examples of pretext tasks:</p>
<ul>
<li><strong>Contrastive</strong>: pull augmented views of the same image together, push different images apart (SimCLR, MoCo)</li>
<li><strong>Masked prediction</strong>: mask part of the input and predict it (BERT, MAE)</li>
<li><strong>Predictive</strong>: predict future frames, next tokens, or missing patches</li>
</ul>
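<p>The masked-prediction idea can be sketched without any model at all. A toy NumPy example (function name hypothetical) showing how unlabeled data supplies its own supervision: masked positions are hidden in the input, and their original values become the targets.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_prediction_pair(x, mask_ratio=0.25):
    """Turn unlabeled data into a (corrupted input, target) pair.

    The supervision comes from the data itself: masked positions are
    zeroed in the model input, and their original values become the
    reconstruction targets -- no human labels involved.
    """
    mask = rng.random(x.shape) < mask_ratio   # True where values are hidden
    corrupted = np.where(mask, 0.0, x)        # what the model sees
    targets = x[mask]                         # what the model must predict
    return corrupted, mask, targets

x = rng.normal(size=(4, 16))                  # a batch of unlabeled vectors
corrupted, mask, targets = masked_prediction_pair(x)
```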
</section>
<section id="intuition" class="level2">
<h2 class="anchored" data-anchor-id="intuition">Intuition</h2>
<p>Labels are expensive. Structure is free.</p>
<p>Images have spatial coherence. Text has sequential coherence. Video has temporal coherence. SSL exploits these natural regularities to learn representations that capture what matters in the data – without anyone telling the model what to look for.</p>
<p>The key insight: a model that can solve a hard pretext task (e.g., reconstruct a masked image region) must have learned something meaningful about the domain.</p>
</section>
<section id="simple-example" class="level2">
<h2 class="anchored" data-anchor-id="simple-example">Simple Example</h2>
<p>Take an image. Crop it twice and apply different augmentations to each crop. The model must learn that both crops came from the same source. To do this, it has to understand <em>content</em> (what’s in the image) and ignore <em>style</em> (color jitter, rotation, scale).</p>
<p>The result: an encoder that maps semantically similar inputs to nearby points in embedding space – without ever seeing a label.</p>
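<p>A toy NumPy illustration of that target geometry. The encoder is faked here by adding small noise to a shared “content” vector; a real encoder would produce the embeddings, but the goal is the same: views of one source land close together on the unit sphere, unrelated inputs do not.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(v):
    return v / np.linalg.norm(v)

# Stand-ins for encoder outputs: two "views" share content; one image is unrelated.
content = rng.normal(size=64)
view_a = l2_normalize(content + 0.1 * rng.normal(size=64))  # augmentation noise
view_b = l2_normalize(content + 0.1 * rng.normal(size=64))
other = l2_normalize(rng.normal(size=64))                   # a different image

pos_sim = float(view_a @ view_b)   # same source: cosine similarity near 1
neg_sim = float(view_a @ other)    # different source: near 0
```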
</section>
<section id="why-it-matters" class="level2">
<h2 class="anchored" data-anchor-id="why-it-matters">Why It Matters</h2>
<ul>
<li><strong>Scale</strong>: unlabeled data is orders of magnitude more available than labeled data</li>
<li><strong>Transfer</strong>: SSL representations often transfer better than supervised ones to new domains</li>
<li><strong>Foundation models</strong>: GPT, CLIP, DINO – the most capable models are pretrained with self-supervision</li>
<li><strong>Cost</strong>: eliminates the annotation bottleneck, especially for domains where labeling requires expertise (medical imaging, satellite data)</li>
</ul>
<p>SSL is not a niche technique. It is the default pretraining paradigm for modern AI systems.</p>


</section>

]]></description>
  <category>self-supervised-learning</category>
  <category>representation-learning</category>
  <category>fundamentals</category>
  <guid>https://anuragbuildsthings.com/posts/self-supervised-learning.html</guid>
  <pubDate>Wed, 18 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Learning Representations Without Labels (SimCLR)</title>
  <link>https://anuragbuildsthings.com/posts/ssl-representation.html</link>
  <description><![CDATA[ 





<section id="goal" class="level2">
<h2 class="anchored" data-anchor-id="goal">Goal</h2>
<p>Learn visual representations without labels using contrastive learning. Specifically, implement SimCLR (Simple Framework for Contrastive Learning of Visual Representations) from scratch and evaluate the quality of learned embeddings on CIFAR-10.</p>
</section>
<section id="plan" class="level2">
<h2 class="anchored" data-anchor-id="plan">Plan</h2>
<p>Implement SimCLR on CIFAR-10 and validate:</p>
<ul>
<li>Can the model learn useful representations without labels?</li>
<li>How sensitive is performance to augmentations?</li>
<li>How does batch size affect learning?</li>
</ul>
</section>
<section id="initial-setup" class="level2">
<h2 class="anchored" data-anchor-id="initial-setup">Initial Setup</h2>
<ul>
<li>Encoder: ResNet-18</li>
<li>Projection head: 2-layer MLP</li>
<li>Loss: NT-Xent</li>
<li>Dataset: CIFAR-10</li>
</ul>
<p>Hyperparameters will be tuned incrementally during experiments.</p>
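<p>For reference while implementing, here is one minimal NumPy sketch of NT-Xent under the common convention that consecutive rows are the two views of one image. This is an unverified sketch, not the training code:</p>

```python
import numpy as np

def nt_xent(z, temperature=0.5):
    """NT-Xent loss for 2N L2-normalized embeddings, ordered so that
    z[2k] and z[2k+1] are the two views of the same image."""
    n = z.shape[0]
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)   # a sample is never its own negative
    partner = np.arange(n) ^ 1       # positive index: 0<->1, 2<->3, ...
    log_denom = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(n), partner]))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))         # toy batch: 4 images x 2 views
z /= np.linalg.norm(z, axis=1, keepdims=True)
loss = nt_xent(z)
```

<p>Since the positive pair also appears in the denominator, the loss is strictly positive; it shrinks as partner views align and negatives spread apart.</p>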
</section>
<section id="what-i-expect" class="level2">
<h2 class="anchored" data-anchor-id="what-i-expect">What I Expect</h2>
<ul>
<li>Augmentations will be critical for learning meaningful representations</li>
<li>Larger batch sizes may improve performance (more negative samples)</li>
<li>Training stability may depend on temperature and normalization</li>
</ul>
</section>
<section id="next-steps" class="level2">
<h2 class="anchored" data-anchor-id="next-steps">Next Steps</h2>
<ul>
<li>Implement data pipeline and augmentations</li>
<li>Implement NT-Xent loss</li>
<li>Run first small-scale training</li>
</ul>
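<p>The two-view step of the data pipeline can be prototyped in plain NumPy before wiring up real transforms (actual training would use torchvision-style crop, flip, color jitter, and blur; the helpers below are hypothetical stand-ins):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size):
    """Crop a random size x size window from an HWC image."""
    top = rng.integers(0, img.shape[0] - size + 1)
    left = rng.integers(0, img.shape[1] - size + 1)
    return img[top:top + size, left:left + size]

def random_flip(img, p=0.5):
    """Flip horizontally with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def two_views(img, size=24):
    """Augment the same image twice, independently: the SimCLR positive pair."""
    return (random_crop(random_flip(img), size),
            random_crop(random_flip(img), size))

img = rng.random((32, 32, 3))        # a CIFAR-10-sized dummy image
v1, v2 = two_views(img)
```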
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><a href="https://arxiv.org/abs/2002.05709">SimCLR Paper (Chen et al., 2020)</a></li>
<li><a href="https://arxiv.org/abs/1708.03888">LARS Optimizer (You et al., 2017)</a></li>
</ul>


</section>

]]></description>
  <category>self-supervised-learning</category>
  <category>contrastive-learning</category>
  <category>computer-vision</category>
  <guid>https://anuragbuildsthings.com/posts/ssl-representation.html</guid>
  <pubDate>Wed, 18 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Why Augmentations Matter in Contrastive Learning</title>
  <link>https://anuragbuildsthings.com/posts/why-augmentations-matter.html</link>
  <description><![CDATA[ 





<section id="status" class="level2">
<h2 class="anchored" data-anchor-id="status">Status</h2>
<p>Planned experiment – not yet executed.</p>
</section>
<section id="hypothesis" class="level2">
<h2 class="anchored" data-anchor-id="hypothesis">Hypothesis</h2>
<p>Augmentations are the most critical design choice in contrastive learning. They define what invariances the model learns – get them wrong and the representations are useless, regardless of architecture or training budget.</p>
</section>
<section id="setup" class="level2">
<h2 class="anchored" data-anchor-id="setup">Setup</h2>
<p>Train SimCLR on CIFAR-10 with different augmentation configurations and measure linear probe accuracy:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Config</th>
<th>Augmentations</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Minimal</td>
<td>Random crop only</td>
</tr>
<tr class="even">
<td>Moderate</td>
<td>Crop + horizontal flip + grayscale</td>
</tr>
<tr class="odd">
<td>Full</td>
<td>Crop + color jitter + flip + grayscale + blur</td>
</tr>
<tr class="even">
<td>Aggressive</td>
<td>Full + extreme crop ratios + strong color distortion</td>
</tr>
</tbody>
</table>
<p>All other hyperparameters held constant (ResNet-18, batch 512, 200 epochs, temperature 0.5).</p>
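<p>The evaluation metric can be sketched cheaply: a linear probe fits a classifier on frozen embeddings and scores held-out data. A least-squares version (a simplification of the usual logistic-regression probe, shown here on synthetic embeddings) looks like:</p>

```python
import numpy as np

def linear_probe_accuracy(train_z, train_y, test_z, test_y, n_classes=10):
    """Fit a linear classifier on frozen embeddings by least squares
    against one-hot targets, then score it on held-out embeddings."""
    targets = np.eye(n_classes)[train_y]          # one-hot labels
    w, *_ = np.linalg.lstsq(train_z, targets, rcond=None)
    preds = (test_z @ w).argmax(axis=1)
    return float((preds == test_y).mean())

# Synthetic "embeddings" clustered by class, standing in for encoder outputs.
rng = np.random.default_rng(0)
means = rng.normal(size=(10, 64))                 # one center per class
y_train = rng.integers(0, 10, size=500)
z_train = means[y_train] + 0.1 * rng.normal(size=(500, 64))
y_test = rng.integers(0, 10, size=200)
z_test = means[y_test] + 0.1 * rng.normal(size=(200, 64))

acc = linear_probe_accuracy(z_train, y_train, z_test, y_test)
```

<p>Well-clustered embeddings probe near 100% accuracy; the interesting signal is how far each augmentation config falls short of that.</p>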
</section>
<section id="expected-outcome" class="level2">
<h2 class="anchored" data-anchor-id="expected-outcome">Expected Outcome</h2>
<ul>
<li><strong>Minimal</strong> augmentations will produce weak representations – the contrastive task becomes too easy (trivial shortcuts, such as matching low-level color statistics between overlapping crops, suffice)</li>
<li><strong>Full</strong> augmentations will force the model to learn semantic features, producing the strongest embeddings</li>
<li><strong>Aggressive</strong> augmentations may hurt early training by making the pretext task too hard</li>
</ul>
</section>
<section id="why-this-matters" class="level2">
<h2 class="anchored" data-anchor-id="why-this-matters">Why This Matters</h2>
<p>Most SSL papers treat augmentations as a hyperparameter table in the appendix. In practice, they are the experiment. The augmentation pipeline implicitly defines what the model treats as “same” vs “different” – which is the entire learning signal in contrastive methods.</p>
<p>Understanding this connection between augmentations and learned invariances is essential before scaling to harder domains (video, medical imaging) where the right invariances are less obvious.</p>


</section>

]]></description>
  <category>self-supervised-learning</category>
  <category>contrastive-learning</category>
  <category>augmentations</category>
  <guid>https://anuragbuildsthings.com/posts/why-augmentations-matter.html</guid>
  <pubDate>Wed, 18 Mar 2026 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
