January 2, 2026

Theory Before the API

Went deep into LLMs. Proper deep. Like, sat down with Andrej Karpathy's "Deep Dive into LLMs like ChatGPT" and didn't surface for hours kind of deep. The whole pretraining flow: tokenization (BPE), token embeddings in actual high-dimensional space (not the vague version), transformer layers doing attention, the softmax that turns logits into a next-token distribution, and sampling from that distribution. Then post-training, where the magic happens: fine-tuning, RLHF, hallucinations, tool use. The entire spectrum. My checkpoint-by-checkpoint notes are here if you want to follow the rabbit holes I went down.
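To make the last step of that flow concrete, here is a minimal sketch of softmax plus sampling in plain Python. The logits are made-up numbers for a toy 4-token vocabulary, and the function names are hypothetical; a real model does the same thing over a vocabulary of tens of thousands of tokens.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical logits for a 4-token vocabulary (illustrative values only).
toy_logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(toy_logits)
sharp = softmax(toy_logits, temperature=0.5)

print("T=1.0:", [round(p, 3) for p in probs])
print("T=0.5:", [round(p, 3) for p in sharp])
print("sampled index:", sample_token(toy_logits))
```

Lowering the temperature pushes more probability onto the highest-scoring token, which is why low-temperature output feels more deterministic.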

Why? Building on sand sucks. If you're going to touch LLMs, the mental model has to come first. Not the "throw a prompt at the API and pray" version. The actual version. How these things work at the layer-by-layer level. It changes how you think about every problem downstream.

Then I built stuff. POCs first. Started with embedding generation (OpenAI's text-embedding-3-small), realized I needed to actually understand vector similarity, so I implemented both cosine similarity and Euclidean distance to compare them. Set up ChromaDB. Tested different chunking strategies because I was curious (mistake? feature? still unclear). Then glued it all together into an end-to-end RAG pipeline. Experiments are here: embedding fundamentals, ChromaDB retrieval patterns, and the pipeline itself. Three solid experiments.
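For anyone who hasn't compared the two metrics side by side, here is a minimal sketch in plain Python. The three vectors are toy values picked to make the difference obvious, not real embeddings, and the function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: compares direction, ignores length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b: sensitive to length as well as direction."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors (assumed for illustration).
query = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]  # query scaled by 2
different = [3.0, 2.0, 1.0]       # same length as query, different direction

for name, vec in [("same_direction", same_direction), ("different", different)]:
    print(name,
          "cosine:", round(cosine_similarity(query, vec), 3),
          "euclidean:", round(euclidean_distance(query, vec), 3))
```

Cosine ranks `same_direction` as the closer match, while Euclidean distance ranks `different` as closer, because it penalizes the difference in length. For unit-length vectors the two metrics always agree on ranking, since the squared distance equals 2 minus twice the cosine.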

The real takeaway: theory and code don't hold hands. Some chunking strategies look brilliant on a whiteboard. In practice? Garbage. Cosine similarity is elegant. Chunk size selection? Feels like art. Pure vibes. But here's the thing. Once you understand why embeddings work (meaning compressed into vectors, so similar text lands close together), debugging why your search sucks gets way faster. You're not just twiddling parameters hoping something sticks. You know which lever to pull. Also, that video made me want to understand RL and DPO and all the weirder stuff that comes after. There's a lot here. Good problems to have.
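For a sense of what the chunk-size lever looks like in code, here is a minimal sketch assuming plain character-based chunks with a fixed overlap. It is one simple strategy among many, not necessarily one of those tested in the experiments, and `chunk_text` and its parameters are hypothetical names.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

# Tiny example so the overlap is easy to see.
sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4, overlap=2))
```

Smaller chunks give more precise matches but less context per hit; larger chunks do the opposite. The overlap is there so a sentence cut at a boundary still appears whole in at least one chunk.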