Ankur Kakroo

Director of Engineering

I’m an asymmetric engineering leader for the AI era with strong business, product, and design sense. I think in second order, which shapes how I hire, build teams, and make decisions.

I maintain a high bar for quality, whether it's talent density, the clarity of a document, the copy in an error message, or a UI detail that's off. I have experience in both B2C and B2B products, including building from zero to one. I care deeply about platforms, and over the years I've built several that are both reliable and scalable.

Log

Thinking out loud while building

February 28, 2026

Two Camps, Same Tool

There's a split I keep noticing. Not between people who use AI and people who don't. It's between people who ask "how do I do this faster?" and people asking "what can I do now that I couldn't do before?" Two camps. Same tool. Completely different orientation.

The optimizers aren't wrong. The gains are real. Writing faster, shipping features at higher velocity, summarizing what took hours in seconds. Useful, all of it. But if that's the entire frame, something's getting missed. Because you don't get exponential from a 2x workflow. You get 2x. Which is fine. But not the thing.

The actual unlock is in the things that used to cost too much. Too slow, too complex, required too many people to coordinate, felt impossible to attempt alone. Those problems didn't just get cheaper. They changed category. Something that was cost-infinity is now near-zero. And that math breaks in ways that are hard to think clearly about until you're inside it.

I started asking myself mid-task: am I optimizing something, or operating in something new? Simple question. Cuts every time. Because the honest answer is almost always: optimization. Habit is sticky. The shape of old work haunts the new work. You pick up a power tool and reach for the same nail.

What's strange is that AI seems to be doing the stretching actively. You build something in a weekend that would've taken a quarter, and something recalibrates. You start asking "wait, what else was I not even trying because it seemed too big?" That shift. That's not a productivity number. It's the ambition adjusting in real time. The ceiling going somewhere you can't fully see yet.

Pretty exciting times to be navigating the map while it's still being drawn.

February 24, 2026

Clicks Feel Wrong Now

Been spending a lot of time with AI tools lately. Not just reading about them, actually living inside them. And something started crystallizing that I hadn't fully articulated before: the UI layer, as we know it, is kind of on borrowed time.

Think about what clicking actually is. You're navigating someone else's mental model. A product designer decided there's a hierarchy, built a nav structure around it, and you spend the first hour with any new product just figuring out where the thing you want is buried. Add some imports. Submit a form. Select all. Confirm. Each click is a toll booth. And when you spend enough time with agents that just... do the thing you described in a sentence, the toll booths stop making sense. Not gradually. Suddenly. The contrast is jarring once you feel it.

I don't think the UI disappears. But I think it needs to look radically different. The whole premise of UI is that you need to form a mental model, traverse hierarchies, find affordances. That premise was built for a world where the software couldn't understand you. Now it can. So who's the navigation for? The hierarchy was a workaround, and we've been iterating on the workaround for forty years like it's the destination.

The hardware piece is what I find genuinely interesting. If the interface is intent-driven instead of click-driven, then a screen full of tappable elements isn't obviously the right form factor anymore. Rabbit R1, the AI Pin, whatever comes next. Most of them got dismissed as half-baked, and honestly some of them were. But the instinct behind them is correct. If the interface is conversation and context rather than navigation and tapping, then maybe the rectangle of glass in your pocket is one answer to a question that now has several. The hardware follows the interaction model. The interaction model is shifting. So.

Here's the honest thing though: I didn't fully feel any of this until I lived it. Saw the Rabbit R1 coverage when it dropped, thought "yeah, interesting, moving on." Because it was easy to process as another gadget in a world already full of gadgets. But you use agentic tools daily, you try to do the same task by clicking through a product, and the friction becomes visceral. Not conceptual. Actual friction. That's when you stop saying "this makes sense intellectually" and start saying "oh, this is actually going to happen."

The part I find uncomfortable is how fast it's moving. There's something disorienting about being deep in the current moment of AI. The pace is so relentless that doing the mental exercise of "what does this look like in three years" feels almost impossible. Not because the question is hard. Because by the time you finish the thought, three new things have happened and the assumptions you started with are already stale. You're in the sphere, not watching it from outside. Useful for doing things. Less useful for thinking about where things are going. Both matter, and right now one is drowning out the other.

February 18, 2026

No Playbook for This

The agent shift is different. I've said this before, but it keeps getting more true the more I sit in it. The cursor era was AI-as-assistant. A really smart autocomplete. A pair programmer that never got tired. That was one thing. This is another. Agents don't just assist. They execute. And that changes the relationship entirely, not just with the tools but with the work itself.

What's started becoming clear is that you can't just point a company at agents and say "go." The technology isn't the hard part. Every company has a different stack, different tools, different ways engineers already work. The real question isn't "which AI tools are best." It's "what does a workflow that actually holds together look like for us?" And that's a harder question. You have to figure out how the pieces stitch. How the agent fits into the existing toolchain without creating a parallel universe nobody maintains. How you get from a promising demo to something the whole team is actually using every day. Budget is a real thing, sure, but it's usually not the wall. The wall is coherence.

And that's before you get to the human side. Some people get it immediately. Others are suspicious. Not of AI, but of the implication that the way they've worked for ten years might need to change. That's not resistance to technology. That's a reasonable response to uncertainty. So now you're navigating both: the org mechanics and the culture shift. At the same time. With no playbook.

The snowball metaphor is the right one though. Before it starts rolling, the only thing that works is a real leap. Top-down mandate, bottom-up energy, both at once. Neither alone is enough. Top-down without bottom-up buy-in gives you a compliance exercise. Nobody actually changes how they work. Bottom-up without top-down support hits budget and access walls immediately and dies there. It's not a nice-to-have combination. It's load-bearing. The snowball needs both hands.

The uncomfortable truth is that companies doing this well right now aren't being thoughtful about it. They're being aggressive. Decisions that used to take a quarter are being made in a week. Some of those decisions are wrong. That's fine. The alternative (deliberate, consensus-driven, committee-approved AI adoption) is just structured latency. By the time you've figured out the perfect workflow, the floor has moved again.

February 14, 2026

Design Was Always It

Had a thought this week that landed differently than I expected. Been working with AI tooling constantly (agents, code generation, the whole stack) and I carried this assumption that every developer would level up. Rising tide, all boats, varying degrees. Seemed obvious. Turns out I was wrong, or at least incomplete.

The developers who came up writing code line by line, sequentially, building understanding as a side effect of typing things out? That sequential process was actually a crutch. Not a bad one. It worked. You wrote the implementation, so you understood the implementation. Tight loop between doing and knowing. But AI obliterates that loop. The distance between "what I want" and "what exists in the codebase" is now enormous. You can conjure a thousand lines in minutes. The question is whether you know what those lines should be.

And that's where design thinking stops being a nice-to-have and becomes the entire game. Not just system architecture whiteboard stuff. Low-level design too. How your components decompose. Where your boundaries sit. How state flows through the system. How your database connections pool. Schema decisions. Separation of concerns at every layer. The people who always thought in systems, who held a mental model of the whole thing before writing a single function. Those people are about to have a decade. Because when you're directing an agent, the agent is the hands. You need to be the brain. And the brain's job is design.

Here's what surprised me about this realization: it's not just senior engineers who win. It's anyone who thinks structurally. Junior devs with taste. Engineering managers who always had strong opinions about how systems should fit together but stopped building because the org chart said so. They just got handed the keys back. The agent doesn't care about your title. It cares about the clarity of your intent. If you can describe the shape of a system, really describe it at multiple levels of abstraction, you can build it now. That's a profound unlock for people who were sidelined by the mechanics of implementation.

The flip side is uncomfortable. People who shipped code by following patterns without understanding why those patterns exist, who relied on the sequential grind to paper over gaps in their mental model. AI doesn't fill those gaps. It widens them. You prompt an agent without a clear design in your head and it'll happily build something. It'll look complete. It'll even work, for a while. But the moment you need to extend it, debug it, reason about it under pressure, there's nothing there. No skeleton. No opinion. Just a pile of generated code shaped by whatever the model defaulted to. And taming that after the fact is brutally hard. Harder than building it right would've been.

So I'm weirdly more optimistic about the people nobody's talking about. The architect who hasn't committed code in three years. The manager who still sketches systems on napkins. The junior who asks "but why is it structured this way?" in every code review. Those instincts are suddenly the most valuable thing in the room. Design was always important. AI just made it non-optional.

February 12, 2026

Rip the Tools Out

We glorify tool use. MCPs, agents with twelve tools strapped on, multi-step chains doing backflips. It looks impressive in a demo. Feels like progress. But ship it to real users and watch what happens. They wait. And wait. And then they close the tab.

Built an agent at work this week. Internal use case. Started the way everyone starts. Gave it a bunch of tools, let it reason its way through the workflow. Textbook agentic pattern. The kind of thing that gets a standing ovation in a conference talk. In production? Brutal. Every tool call is latency. Every reasoning step is a user staring at a spinner. The agent was correct. It was also unusable. Correct and unusable is a special kind of failure because you can't even point at the output and say it's wrong. It's not wrong. It's just slow enough that nobody cares.

So I ripped the tools out. Made the API calls myself. Hardcoded the paths I knew were predictable, restricted tool use to the genuinely ambiguous cases where the model's judgment actually mattered. Less elegant? Sure. Faster? Dramatically. And the outputs got more predictable too, because fewer decision points means fewer places for the model to wander off and get creative when you didn't ask it to.

The thing nobody says out loud: the LLM is the easiest part now. The models are absurdly capable. That stopped being the bottleneck a while ago. What's hard is everything around it. The harness, the orchestration, the part where you figure out how to deliver speed, predictability, and quality simultaneously. Users don't care that your agent can use fourteen tools. Users care that the thing works fast and does what they expected. That's it. That's the entire product requirement.

There's a design instinct in this space that defaults to "give the model more capabilities." More tools, more context, more autonomy. And sometimes that's right. But sometimes the answer is the opposite. Constrain the model, take away options, make decisions on its behalf so it doesn't have to. Not because it can't. Because the round trip costs more than the decision is worth. Every tool call you eliminate is latency you delete. Every decision you make upfront is variance you remove. The craft isn't in how much you let the agent do. It's in knowing exactly where the agent's judgment is worth the cost, and handling everything else yourself.
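That split, hardcode the predictable paths and spend the model only on ambiguity, can be sketched as a simple router. This is a minimal illustration, not the actual agent from the post; the intent names and the `callModel` stub are hypothetical.

```typescript
// Hypothetical sketch: handle predictable requests with direct calls,
// and spend a model round trip only on genuinely ambiguous input.

type Handler = (query: string) => string;

// Predictable paths, hardcoded. Every tool call you eliminate is
// latency you delete; every decision made upfront is variance removed.
const knownIntents: Record<string, Handler> = {
  "status":  () => "fetched status via direct API call",
  "refresh": () => "triggered refresh via direct API call",
};

// Stub standing in for a real LLM call. Only reached when the
// model's judgment is actually worth the round trip.
function callModel(query: string): string {
  return `model decided how to handle: ${query}`;
}

function route(query: string): string {
  const intent = query.trim().toLowerCase();
  const handler = knownIntents[intent];
  // Deterministic case: no tool call, no reasoning step, no spinner.
  if (handler) return handler(intent);
  // Ambiguous case: fall through to the model.
  return callModel(query);
}
```

The point of the sketch is the shape, not the lookup table: in a real system the "known" branch would be direct API calls against your own services, and the fallthrough would be the agent loop.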

February 8, 2026

Speed Has It Backwards

Everyone wants faster agents. Speed is the thing. And look, I know what fast actually looks like. Groq, Cerebras. Tokens flying at you before you've finished reading the last batch. That's a different tier entirely. The incremental speed improvements everyone else is shipping right now? They're not that. Not even close. And unless they reach that level, the marginal gains in latency don't actually change how I work. Counterintuitively, for where I am right now, slower is better.

Here's why. When a model is truly fast, Groq fast, it changes the interaction model completely. That's its own thing, worth exploring. But the in-between? A model that's 20% faster than last month? That doesn't unlock a new workflow. It just means you get responses back slightly before you've finished thinking about the last one. You glance at the output, fire off the next thing. Tight loop. Feels productive. Except you're not actually processing any of it. You're reacting. And reacting isn't engineering. Reacting is email.

Slow models fix this. Counterintuitive, I know. But when Opus takes its time on a response, I'm not sitting idle. I'm spinning up another agent on a different problem. Or reading the code the first agent just touched. Or thinking, actually thinking, about whether the direction even makes sense. The latency becomes space. Space to be parallel. Space to be deliberate.

That's the nuance with the Codex family too. They're not the quickest. Nobody's picking them for speed benchmarks. But that pace matches the kind of work where I need them most: longer steps, deeper reasoning, problems where rushing produces confident garbage. When I'm zeroing in on something and every decision compounds on the last, I don't want a model that outruns my judgment. I want one that keeps pace with it.

So the speed conversation has it backwards, I think. At least for serious work. Fast models optimize for throughput on a single thread. Slow models, paired with a human who knows how to fan out, optimize for total output across many threads. Parallelism is the multiplier. The model doesn't need to be fast. You need to be fast at deciding what to throw at it next.

February 7, 2026

Vibe Coding Hits a Wall

Codex 5.3 dropped. Opus 4.6 dropped. Spent the last 24 hours bouncing between both. Character-wise, each feels similar to its predecessor. Same general temperament, same shape of reasoning. But the improvements show up in specific domains. Some incremental, some genuinely surprising depending on what you throw at them. The Codex family in particular has become something I keep reaching for. Not flashy. Not trying to impress you. Just... solid. Boring in the best way. You ask it to do a thing, it does the thing. That's the whole pitch and it works.

But here's what actually stuck this week. Vibe coding. Everyone's doing it. I've been doing it. And I finally hit the wall that clarifies something important: vibe coding demands you surrender control. That's the deal. You're trading precision for speed, and for throwaway stuff or prototypes, that trade is fine. Nobody cares how the internals look if it's a weekend experiment. The problem starts when you decide to keep the thing. When somebody else has to use it. When you come back Tuesday and need to add a feature.

Built something for work recently. Borderline vibe coded it. Moved fast, felt great. Then a couple bugs surfaced and I needed to extend it. Should've been straightforward. It wasn't. I struggled. The model I was working with struggled. Neither of us had gotten dumber overnight. The code itself was the problem. Decisions made in that initial sprint weren't wrong exactly, they were just... unexamined. The architecture had no opinion. It was shaped by whatever the model suggested and I accepted, layer after layer, without pushing back hard enough. So when I needed to reason about it later, there was nothing solid to grab onto.

That discomfort of sitting in a codebase you built but don't fully understand is a specific kind of frustrating. You wrote every line. Or approved every line. And yet the mental model is thin. That's the cost of surrendered control.

So the insight is unglamorous but real: anything you plan to maintain, anything someone else will touch, anything with a future beyond "demo it once," you need to own the decisions. Not just accept them. The model can be the most capable thing on the planet and it still can't replace your judgment about what this particular system should look like. Human intent isn't a nice-to-have. It's the structural integrity. Skip it and you're building something that looks complete but buckles the moment you lean on it.

February 4, 2026

Know When to Stop

Short one today. But it matters.

Know when to kill a session. That's the skill nobody talks about. You spin up an agent, get deep into something, hit a wall. And instead of grinding through it, the right move is just... stop. Close it. Start fresh. Sounds obvious but it isn't. There's this pull to keep going, to salvage the thread, to squeeze one more useful thing out of a context that's already poisoned. Don't.

Started noticing it recently. Long sessions degrade. Not dramatically. Subtly. The model starts hedging, second-guessing, carrying baggage from five decisions ago that have nothing to do with the current problem. It's not broken. It's just full. And a full context is a slow context. Started cutting sessions shorter on purpose. Fresh start, clean slate, sharp output. The difference is obvious once you do it back to back.

So the meta-skill here isn't "use agents better." It's "know when to stop using this agent." Brave move, honestly. Especially when you've built momentum. Especially when you're close to something. But the best engineers I know have always been good at this. Knowing when to walk away from a problem and come back with fresh eyes. Same thing. Different scale. The session is the unit now, not the day.

February 1, 2026

Beyond the Terminal

Software development is shifting. The interface part, specifically. Been using Conductor long enough now to notice: neither the terminal nor the editor is the right fit for what's coming. They're built for single-project thinking. One window, one context, one linear flow.

Multiple agents? Different story. Running three, four threads at once (not just open tabs, actual parallel work streams), you need a different mental model. And I'm getting there. My job already trains me for this: multiple threads, continuous loops, constant switching. Turns out that maps nicely to orchestrating agents. When one's blocked, spin up another. When that one's waiting on input, shift to the third.

But I still have gaps. Use cases where I should spawn multiple agents? Blank. I know it's possible. I know it's useful. I just can't see it yet. The interface change made it obvious, though. Conductor pulled me out of the single-project default. One session per repo, one task at a time. That whole paradigm just dissolved. And once you see that, you can't unsee it. Building products will look different. Has to.

January 28, 2026

Interfaces Aren't Ready Yet

Wanted to talk about interfaces. Interfaces of tomorrow, really. I started with this belief that the terminal is the right interface. Still think it might be the most fundamental unit of orchestrating an agent. But using multiple agents across multiple repositories? It's clear the terminal might not be the most efficient user experience for the future. Would you even call that an editor? Agent orchestration will happen in user experiences that are very different from what we have now.

Tools like Conductor, Commander. They give us a peek. Been using Conductor heavily. Nice tool. The design of the software itself lets me multitask in a way I couldn't before. Subtle difference in how the UI works, but it changed my workflow. More efficient at hopping between tasks instead of moving terminal window by terminal window, figuring out which session is which. Turns out interface design matters more than I thought.

Tried the Kimi K2.5 thinking model this week. Gave it a vague, UI-heavy problem just to see what happens. Impressed. Actually handled it well. Way better than GLM 4.7 on the same kind of thing. Then I had it generate a logo and we iterated on it naturally, back and forth, refining. That's where NanoBanana breaks down: it caches so aggressively you can't iterate. Probably optimizing for cost, but it kills the flow. Open source models are going to eat here. They already are.

Office stuff ramped up. Product launch coming, so less time for personal repos. But I'm running two or three agents at work now, solving actual problems we hit daily. Real problems, not toy examples. They're shaping up nicely. Started contributing to internal agent tooling too because once you see what's possible, you can't unsee it. You just start building.

Hectic week. But the kind of hectic that's worth it.

January 24, 2026

No Ceiling Anymore

Model testing week. Ran GLM 4.7, Opus, Gemini 3 Pro, GPT 5.2 Codex through their paces. GLM's solid for straight coding but ask it to handle design or something genuinely complex? Falls apart. Design work, I keep reaching for Opus. Frontend skill loaded up, Opus just gets it. Gemini 3 Pro edges ahead slightly on pure design intuition, but pair Opus with that frontend skill and it's a different beast. GLM has the skill too but doesn't execute. So I'm back to Anthropic models every time.

GPT 5.2 Codex is still the choice for gnarly, long-running problems. Just accept that it's slow and move on.

Then I stumbled into Remotion. The MCP server that generates videos. Took my portfolio, fed it through, 20 minutes later had this slick video artifact. Use case unclear but the possibilities are obvious. There's a 0-to-1 product at work where this could create some genuinely cool stuff. Worth exploring.

Second discovery: pencil.dev. Open canvas, hooks into Claude Code, any coding tool really. You brainstorm with Opus, it generates real-time visual samples while you're thinking. Feels like having a designer sitting next to you. Iterate, discard, refine, narrow it down. Then hand the polished plan to Opus to finish functionality. Close the loop. No friction.

What hit me is there's no ceiling anymore. You can stretch your imagination as far as it goes and actually build the thing. The oyster metaphor is overused but accurate here. The world's wide open. These tools just keep revealing what's possible. Humbling, honestly. Next few months should be interesting.

January 19, 2026

Agents Meet Hardware

Claude Opus 4.5 and GPT 5.2 XHigh are really, really good. Tried the Ralph Wiggum technique today; it worked really well. Built a boring app, but it had a lot of components, so it made a good stress test.

GLM 4.7 is surprising me. Somewhere around Sonnet thinking model level, but then it gives you these moments where it feels like Opus 4.5. The limits are extremely generous. Burned around 300 million tokens in a few hours.

Last night I bought an Arduino R4 microcontroller and ran some experiments. Fun to work with my kid on it and program crazy stuff. I worry he's too small and will ruin everything, but I'm thinking of interesting uses and how to build on top of it.

The experiments were small, but they gave me a peek at what's possible. I've worked with embedded systems before; it's tedious. Writing all that code, testing it, etc. But this was seamless. I just told the agent what to do and it interfaced with the microcontroller and loaded the program directly. Rapidly did 4-5 experiments. Pretty fun ones.

Wild times.

January 18, 2026

Watching Models Think

Oh, I'm shit scared of Opus and even Codex High and XHigh. Today I gave a very hairy problem to GPT 5.2 Codex. It was tricky, and it had actually been handed over from GLM 4.7 to Opus and eventually to 5.2 Codex.

I was trying to set up a proxy for my Gemini subscription so I can use it with Claude Code, because I've realized the harness is very important, and I like Claude Code's harness. I think it just makes the models perform better. Watching the thinking tokens on your screen, seeing how these very large models "think", catching multiple moments of brilliance, and watching them latch onto a new and completely different thread that eventually reveals a solution is actually kind of scary and crazy.

Today I also had my longest run for any model in many months, and I was intrigued following it through. Mesmerizing, to say the least.

This is what's currently in progress. Here's the repo: claudecode-antigravity-auth (still in progress). I think people will benefit from this as well. For mere mortals like me, the tendency is to ration models from different providers; I'm just trying to unify the user experience for myself and stick to Claude Code.

January 17, 2026

Harness Beats the Model

The world is changing and it's changing fast. Next five to ten years? Era of builders. Building has never been more exciting, truth be told. AI agents at your disposal can do almost any job. I'm getting more done than ever, given the time I have.

But limits. Great models consume limits very, very quickly. Spent the whole day figuring out subscription permutations. Trying to get bang for buck, decent limits. Tried everything. Opus. Cursor. Claude code. Antigravity. Different combinations, different stacks.

Had this revelation, practically experienced it: harness matters. Possibly more than the model itself. Been using GLM 4.7 with opencode. Not great. Proxied it through Claude Code and suddenly it's good. Like 40% better capability just from changing the harness. Set it up at claude-code-glm-setup if you want to follow that rabbit hole.

Now I've got Google's AI Pro subscription too. Multiple models, solid pricing. Trying to proxy it through Google's auth but still get model access into Claude Code because I've started appreciating how good the harness actually is. And honestly? Anthropic's models, especially Opus. Every time there's a hairy problem, Opus surprises me. Can't use it continuously, limits being limits, but when it counts, it delivers.

Other thing I'm thinking about: need more stimulus. More reasons to build things to explore. Wondering if I should get some hardware to hack on because the last thing I want to do with these models is build websites. Websites are easy, I can do those myself. Typical software engineering, backend, whatever? Not fun unless there's a great product idea behind it. Complex visualizations are fun, sure, but watching models struggle in unfamiliar environments is where you actually learn. Like when I was working with Kindle stuff. Different territory, you learn way more about the model, the tools, the whole stack.

Exciting times ahead. Really enjoying this phase. Equally nervous about how fast it's all moving.

January 15, 2026

Jailbreaking a Kindle

Jailbroke a Kindle Oasis. Because why wouldn't you turn a locked-down e-reader into a desk dashboard? E-ink displays are stupid expensive in India. Hundreds of dollars for a decent panel. But I had a Kindle. Linux-based. E-ink. Perfect.

The jailbreak itself? Tedious. Firmware 5.18.2, which blocks WinterBreak outright. AdBreak technically works but requires Special Offers (ads), and Amazon India doesn't expose that toggle. No ads = no AdBreak path. Got initial SSH access via USB, then handed it over to Claude Code. The agent did maybe 90% of the work. SSH'ing into the Kindle, reading forum threads, piecing together exploits, installing KOReader, setting up usbnetlive. Just back-and-forth between the agent and the device. I'd check in, confirm steps, let it continue.

Then came the real work: rendering a dashboard. Cal.com-inspired. Weather widget, calendar timeline with overlapping events, todos. Full landscape layout at 1680×1100. Sounds simple until you hit Kindle's ancient WebKit. No CSS Grid. No Flexbox. Google Fonts won't load. @font-face gets ignored. Fell back to tables, fixed widths, system fonts (Amazon Ember). Stripped out every modern CSS construct.

But here's the bigger problem: KOReader doesn't execute JavaScript. At all.

Spent hours building a client-side dashboard with XHR calls, DOM updates, intervals refreshing data every 5 minutes. Deployed it to the Kindle. Opened it. Saw the HTML skeleton with placeholder dashes. No weather. No calendar. No todos. Just static markup.

Suspected ES6 issues first. Converted const/let to var, rewrote arrow functions, ditched template literals. Still nothing. Built a test file with simple DOM manipulation (document.getElementById().textContent = 'test'). Pushed it. Text stayed unchanged. JavaScript just doesn't run in KOReader's HTML viewer.

Pivoted to server-side rendering. Built a /kindle endpoint in Express that fetches weather (open-meteo, live data for Gurugram), pulls 30 realistic calendar events with overlap logic, grabs todos, then generates complete static HTML with all data embedded. Added a meta refresh tag (5-minute reload). No JavaScript. Just HTML with data baked in.
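The server-side pivot boils down to: fetch everything at request time, bake it into static markup, and let a meta refresh tag do the polling. Here's a minimal sketch of that shape; the data fields, table layout, and function name are illustrative, not the actual /kindle endpoint.

```typescript
// Illustrative sketch of the server-rendered approach: all data is
// embedded in static HTML at request time, since KOReader's HTML
// viewer executes no JavaScript at all.

interface DashboardData {
  temperature: string;   // e.g. a value fetched from a weather API
  events: string[];      // today's calendar entries
  todos: string[];
}

function renderDashboard(data: DashboardData): string {
  const rows = (items: string[]) =>
    items.map((item) => `<tr><td>${item}</td></tr>`).join("");
  // Old WebKit constraints: tables and fixed widths only.
  // No Grid, no Flexbox, no @font-face.
  // The meta refresh reloads the page every 300 seconds, which is
  // the only "dynamic" behavior available without JavaScript.
  return [
    "<html><head>",
    '<meta http-equiv="refresh" content="300">',
    "</head><body>",
    `<table width="1680"><tr><td>${data.temperature}</td></tr></table>`,
    `<table>${rows(data.events)}</table>`,
    `<table>${rows(data.todos)}</table>`,
    "</body></html>",
  ].join("\n");
}
```

In the real setup this string would be what an Express route sends back; the design choice worth noting is that the refresh interval lives in the markup itself, so the client needs zero logic.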

Deployed it. Worked.

Then the agent found the Chromium binary buried in the system (/usr/bin/chromium/bin/kindle_browser). On its own. Just knew it needed to meet my requirements (landscape dashboard, no URL bar, clean interface) and hunted through the filesystem until it discovered the browser binary and figured out the --content-shell-hide-toolbar flag. URL bar? Gone. KUAL menu integration for easy launching. Slight moment of awe watching a multi-tool agent autonomously solve the problem. This felt like the real breakthrough. Auto-triggering the browser, no chrome, just content.

Tried framebuffer rendering too, basically overriding the screen directly, writing raw pixels to /dev/fb0 (the display memory), bypassing all UI layers. Rotated the image 90° for landscape. Worked perfectly. Until the battery overlay started stomping over it every few minutes. System-level UI elements you can't suppress. Explored, tested, abandoned. Browser-based approach is cleaner.
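The rotation step is the only real math in the framebuffer path: map each pixel of the landscape-composed image to its position in the portrait-oriented display memory before writing it out. A sketch under stated assumptions: one byte per pixel and a tightly packed row-major buffer. A real write to /dev/fb0 also has to respect the panel's stride and pixel format, which this ignores.

```typescript
// Hypothetical sketch: rotate a row-major, 1-byte-per-pixel buffer
// 90 degrees clockwise. The rotated result would then be written to
// the framebuffer device; stride and bit depth are deliberately
// ignored here to keep the mapping itself visible.

function rotate90(src: Uint8Array, width: number, height: number): Uint8Array {
  const dst = new Uint8Array(src.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Pixel (x, y) in a width x height image lands at
      // (height - 1 - y, x) in the rotated height x width image.
      dst[x * height + (height - 1 - y)] = src[y * width + x];
    }
  }
  return dst; // rotated buffer is `height` wide and `width` tall
}
```

For a 2x2 source laid out as rows [1, 2] and [3, 4], the clockwise rotation yields rows [3, 1] and [4, 2], which is a quick way to sanity-check the index arithmetic before trusting it with a full 1680x1100 frame.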

But landscape? Still not solved. Browser appears hardcoded to portrait. Tried mesquite framework orientation locks, tried CSS rotation (old WebKit laughs at transforms), tried physical rotation. Nothing. Browser stays portrait. Landscape is a hard requirement. Not there yet.

The workflow for this was pure agent magic. SSH'd into the Kindle over Wi-Fi, set up a continuous deploy loop where the agent curled the server-rendered HTML, transferred it via scp, tested it, iterated. I barely touched the terminal. Just gave it long-running tasks: "fix the layout," "add calendar overlap," "make the weather widget bigger," "find Chromium and remove the URL bar." Let it burn tokens.

How many tokens? Millions. Easily 200-300 million. Maybe more. The agent had filesystem access, SSH access, web access, multiple tool chains. It ran nodemon servers, killed port conflicts (port 5000 was occupied by ControlCe, moved to 5001), debugged CSS rendering bugs, wrote server routes, fixed API endpoints, created KUAL menu integrations, built screenshot capture tools (~/kindle-screenshot.sh that reads /dev/fb0 and converts to PNG for easy testing). Continuous loop. I'd give it a vague instruction, walk away, come back to a working solution.

Was it worth hundreds of dollars in token cost to render a calendar on a jailbroken Kindle? Absolutely. This is what agents are for. Not toy demos. Real work. Tedious, multi-step, iterative work where you'd normally context-switch yourself into exhaustion. The agent held the entire state: jailbreak logs, server config, SSH credentials, CSS constraints, WebKit limitations, Chromium launch flags, framebuffer stride calculations, deployment commands. I just steered.

The dashboard still isn't perfect. Landscape orientation unsolved (hard blocker). Event overlap rendering could be tighter. But it's there. Live weather. Full calendar. Todos. Auto-refreshing every 5 minutes. On an e-ink display. Code committed to git (kindle-dashboard once I push it). KUAL menu with launch options. Screenshot tool for testing. The whole system.

Tomorrow's a holiday. Probably spend it solving landscape, maybe hooking it to my actual calendar, adding real todo sync. Get it production-ready. Then push the repo public so someone else equally obsessed can skip the 72-hour jailbreak/CSS/SSH/Chromium debugging loop.

Also still learning vector databases on the side. That's the slow burner. Structured plan in place, just grinding through fundamentals. How they work, use cases, indexing strategies. But this? This was the fun chaos. The kind of project where you realize agents will become part of our lives. You can't hold this much state in your head. The context switching alone would kill you. Jailbreak exploits, firmware limitations, WebKit quirks, server routing, SSH deployment, framebuffer rendering, browser flags, orientation locks. Let the agent handle it. Just keep feeding it problems.


Building on sand still sucks. But building with the right tools? That's when it clicks.

January 10, 2026

Particles That Listen

Built an audio-reactive 3D particle visualization from scratch. Like, properly from scratch. Microphone input, Web Audio API, physics simulation, the whole thing. Collaborative session with engineers on my team, just vibing and iterating. Started with a simple waveform line, ended with 20,000 particles forming an elastic mesh that ripples like fabric when you speak into it.

The journey was the fun part. Started with Google's Anti-Gravity. Gemini's surprisingly good at UI stuff. Then bounced to Opus for the heavy optimization work. In the morning session I tried Open Code, another agentic terminal tool, just to feel out what different models bring to the table. Each tool had its strengths: Gemini for the initial visual intuition, Opus for the physics math and performance tuning.

Here's what made it click: I actually got to use physics. Real physics. Particle influence with inverse-distance falloff. Acceleration and deceleration curves. Wave propagation across a 2D grid. The whole "space-time continuum viewed from 45 degrees" thing started as a joke but became the actual architecture. Particles don't just move up and down. They influence neighbors, decay over time, respond to elasticity parameters. It's a connected system.
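Roughly the kind of update loop I mean (a simplified Python sketch, not the actual code; the real thing pushes a 200x100 grid at 60fps with far more tuning):

```python
import math

def apply_impulse(heights, cx, cy, amplitude, falloff=1.0):
    """Displace grid points with inverse-distance falloff from an impulse
    point, so a loud audio frame lifts a smooth bump, not a single spike."""
    for y, row in enumerate(heights):
        for x in range(len(row)):
            d = math.hypot(x - cx, y - cy)
            row[x] += amplitude / (1.0 + falloff * d)

def step(heights, velocities, elasticity=0.1, damping=0.98):
    """One tick of wave propagation: each point is pulled toward the average
    of its 4 neighbors (elastic mesh), with damping so ripples decay."""
    h, w = len(heights), len(heights[0])
    # Phase 1: update velocities from current heights (read-only pass).
    for y in range(h):
        for x in range(w):
            neighbors = [heights[ny][nx]
                         for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                         if 0 <= ny < h and 0 <= nx < w]
            target = sum(neighbors) / len(neighbors)
            velocities[y][x] += elasticity * (target - heights[y][x])
            velocities[y][x] *= damping
    # Phase 2: integrate velocities into positions.
    for y in range(h):
        for x in range(w):
            heights[y][x] += velocities[y][x]
```

The "pointy particles" bug lives in `apply_impulse`: a falloff that's too steep gives you a needle instead of a bump, and no amount of propagation smooths it fast enough.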

The iteration was messy in the best way. Started 1D (line), went 2D (plane), added cross-directional force propagation, doubled particle density multiple times, built a control panel with sliders for every parameter. At some point we were debugging why the particles looked "pointy" during sharp audio spikes. Turned out we needed broader activation patterns for smoother visual flow. The final thing runs at 60fps with a 200x100 grid. Synthwave aesthetic, horizon perspective, depth fog. Actually beautiful.

What stuck: Different models see problems differently. And physics knowledge from years ago? Still useful. Still satisfying to apply. Good problems to have.

January 9, 2026

Building From My Phone

Mobile dev problem? Finally solved. Like, actually solved. Built a shell script that auto-configures everything. Cursor, Termius, remote SSH, the works. Called it ClaudeGo. One command and you're set up. No more "wait let me fix my environment" nonsense. Just works.

Then hit a wall. Different problem. Memory across sessions? Doesn't exist. You spin up a new Claude Code session, you start from scratch. Every. Single. Time. Tried Mem0, wrote a setup guide, but honestly? Felt heavy. Then I was at the airport (traveling back from Bangalore), stuck in that dopamine loop of optimizing my agent ecosystem (again), and found spec story. Stores context as files. Direct chat. No LLM involvement. Simple. Elegant. Exactly what I needed.

Integrated it into my YOLO alias. Now every Claude session auto-sets up spec story. Context persists. Files don't vanish. I'm not nervous anymore. Cloud-based solutions exist, sure, but filesystem-based? For now? Perfect. Sometimes the simple answer is the right answer.

Then something weird happened. Showed my wife how I fixed my portfolio site. Through my phone, live, in front of her. Made a couple of changes. Worked perfectly. She watched. But I don't think it hit her yet. Like, really hit her. We're sitting on a slippery slope. The era of builders is here. Right now. You can build things from your phone while sitting on a couch. The world is changing fast. And I don't think most people realize how fast.

I'm pushing this at work too. Telling people: use Claude Code. Use agentic tools. Stop doing things manually. Some get it. Some don't yet. But they will.

Good last few days. Kept building. Kept shipping. That's the only way forward. Build. Push. Repeat.

January 7, 2026

The Blocker Is Gone

The last few days have been completely chaotic. Good chaotic, mostly. Multiple epiphanies, way too many rabbit holes, and a bunch of building that feels borderline manic. But also: clarity on something that's been a blocker for years.

Started with understanding how people are actually using AI. Not the marketing version. The real version. Read that paper, then fell into Twitter threads and posts for hours. Pattern after pattern. People aren't using AI to replace work. They're using it to compress the distance between idea and execution. That gap, the "I should build this but it'll take too long" gap? That's what AI collapses. Watched it click for dozens of people. Then realized: I've been stuck in that gap forever.

Building was always my mental blocker. I could architect systems, debug production fires, ship features with teams. But solo building? Painful. Slow. Tedious. The distance between "this would be useful" and "this exists" felt too long. Then Claude Code happened. Obsessed doesn't cover it. It's not about the code generation. It's about removing friction. The thing that used to take three hours of setup and boilerplate now takes ten minutes of conversation. So I stopped overthinking and just built stuff.

Built a bunch. Some of it is slop. Some of it is actually useful. local-smart-home-control: privacy-first smart home automation that runs locally because I don't trust cloud vendors with my light switches. claude-usage-widget: macOS menu bar thing that tracks Claude API usage so I stop accidentally burning credits. ai-catchup: terminal CLI for AI news with offline caching because doomscrolling HN and Twitter is inefficient. Built all three in a few days. That would've taken weeks before. The building isn't the hard part anymore. The deciding what to build is.

Then I got fixated on using my phone as an agent controller. The Claude Code web interface is fine but it's not intuitive. More importantly: it's not fun. I want to build from my phone. Lying down. Walking around. Wherever. Got Termius, which is this great SSH app. Spun up a Digital Ocean droplet. Should've been simple. Wasn't.

The sessions kept dying. Just: gone. Mid-work. I'd close the app, come back, fresh terminal. No tmux session. No context. Nothing. Infuriating. Spent way too long debugging this. OOM killer was murdering processes on a 512MB VM. Systemd user sessions were degraded. tmux was trying to create cgroups that didn't have permissions. Mosh was enabled but sessions still died when I switched networks or enabled VPN. The whole stack was fighting me.

Went through this absurdly long troubleshooting journey with ChatGPT. Two conversations. Hours of back and forth. Checked dmesg logs (permission denied). Added swap. Enabled linger. Tried systemd user services. Disabled systemd user services. The frustration was real. Every time I thought it was fixed, it broke again. Then finally: wrap tmux to bypass the broken systemd session entirely. One shell function. Done.

tmux() {
  # Unset the session bus and runtime dir so tmux never talks to the
  # degraded systemd user instance (and never tries to create cgroups
  # it has no permission for).
  command env -u DBUS_SESSION_BUS_ADDRESS -u XDG_RUNTIME_DIR /usr/bin/tmux "$@"
}

That's it. That's what made it work. Not the fancy systemd service. Not the perfectly tuned OOM scores. Just: don't let tmux talk to the degraded user systemd instance. It's been solid since. Phone works. Sessions survive. I can finally build from bed.

The real lesson here isn't about tmux or mosh or systemd. It's that I kept building through the frustration. Before? I would've stopped. "This is too annoying, I'll do it later on my laptop." But the idea that I could build from my phone, that the tooling should work seamlessly, kept me going. That's the shift. Building isn't optional anymore. It's the default mode.

Also realized: AI makes you impatient in a good way. You know things can be fast. You know the tools exist. So when something is slow or broken, you don't accept it. You fix it or you route around it. That impatience compounds. It's why I built three projects in a few days instead of thinking about them for three months.

The learning journey continues but it's on a tangent now. Still going deep on LLMs and embeddings and RAG pipelines. But also: building small useful things. Lots of slop. Some gems. The ratio doesn't matter. The momentum does. The mental blocker is gone. Building is easy now. Deciding what to build is the hard part. Good problems to have.

January 2, 2026

Theory Before the API

Went deep into LLMs. Proper deep. Like, sat down with Andrej Karpathy's "Intro to LLMs like ChatGPT" and didn't surface for hours kind of deep. The whole pretraining flow: tokenization (BPE), token embeddings in actual high-dimensional space (not the vague version), transformer layers doing attention, output distributions, softmax sampling. Then post-training, where the magic happens. RLHF, hallucinations, tool use, fine-tuning. The entire spectrum. Checkpoint-by-checkpoint notes are here if you want to follow the rabbit holes I went down.
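The sampling step at the end of that flow is small enough to sketch (illustrative Python, not from the notes):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw model scores (logits) into a probability distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature=1.0):
    """Sample the next token from the softmax distribution."""
    probs = softmax(logits, temperature)
    return random.choices(tokens, weights=probs, k=1)[0]
```

Temperature is the whole personality knob: below 1.0 the distribution sharpens toward the top logit, above 1.0 it flattens and the model gets weirder.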

Why? Building on sand sucks. If you're going to touch LLMs, the mental model has to come first. Not the "throw a prompt at the API and pray" version. The actual version. How these things work at the layer-by-layer level. It changes how you think about every problem downstream.

Then I built stuff. POCs first. Started with embeddings generation (OpenAI's text-embedding-3-small), realized I needed to actually understand vector similarity so I implemented cosine vs Euclidean. Set up ChromaDB. Tested different chunking strategies because I was curious (mistake? feature? still unclear). Then glued it all together into an end-to-end RAG pipeline. Experiments are here. Embeddings fundamentals, ChromaDB retrieval patterns, the pipeline. Three solid experiments.
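The cosine vs Euclidean comparison boils down to a few lines (a minimal version of what I implemented, simplified):

```python
import math

def cosine_similarity(a, b):
    """Angle-based: compares direction, ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Magnitude-sensitive: straight-line distance between the points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Worth knowing: embeddings from models like text-embedding-3-small come back normalized to unit length, and for unit vectors cosine similarity is just the dot product and ranks results the same as Euclidean distance. The choice matters much more when your vectors aren't normalized.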

The real takeaway: theory and code don't hold hands. Some chunking strategies look brilliant on a whiteboard. In practice? Garbage. Cosine similarity is elegant. Chunk size selection? Feels like art. Pure vibes. But here's the thing. Once you understand why embeddings work (semantic compression of meaning in vectors), debugging why your search sucks gets way faster. You're not just twiddling parameters hoping something sticks. You know what lever to pull. Also that video made me want to understand RL and DPO and all the weirder stuff that comes after. There's a lot here. Good problems to have.
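Chunking with overlap, for the record, can be this simple (the parameters are vibes, which is kind of the point):

```python
def chunk(text, size=200, overlap=50):
    """Fixed-size character chunks with overlap, so retrieval context
    isn't cut mid-thought at every chunk boundary. Assumes overlap < size."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The whiteboard-vs-practice gap shows up immediately: on paper bigger overlap means better recall, in practice it means more near-duplicate chunks crowding your top-k results.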