SteadyKestrel
The author has 8 years of experience building and shipping software, previously at Google and Meta. He has built and shipped streaming-first LLM products, developer tooling, and real-time web experiences, and focuses on RAG, search, context design, TypeScript, and observability.
9 articles
-
Guides · Context Engineering for Internal Assistants
Internal assistants fail less when context is assembled deliberately instead of dumped wholesale into the model. Here is a practical context-engineering approach: task framing, scope, retrieval, compression, and provenance.
-
Guides · Prompt Caching for LLM Apps: Where It Actually Pays Off
Prompt caching only pays when your reusable prefix is stable, versioned, and safe to share. The hard part is not turning it on; it is deciding what may be cached.
-
Guides · What to Log for LLM Apps Before You Need It
A concrete logging model for LLM apps: traces, tool calls, approvals, versioned context, and the minimum metadata needed to reconstruct failure.
-
Guides · Queue Design for Long-Running Agents
How to design durable queues for agent jobs: state machines, idempotency, checkpoints, cancellation, and the failure modes SQS and Temporal force you to handle.
-
Guides · When to Fine-Tune vs Retrieve vs Prompt
Fine-tuning, retrieval, and prompt engineering solve different failure modes. Here is the decision framework I use when a team asks how to make an LLM app more accurate, cheaper, or easier to operate.
-
Guides · The Return of RAG in 2026
RAG is back in 2026 because long context did not solve freshness, permissions, or reliability. Modern RAG looks like search engineering: hybrid retrieval, reranking, and tight evals.
-
Guides · LLM Evals for Chat and Tool-Using Agents: A Practical Guide to Test Suites and Graders
A production-first guide to evaluating chat assistants and tool-using agents with a small, reliable eval suite: datasets, grader types, flake reduction, and CI gates.
-
Guides · The LLM Cost and Scaling Playbook: Cut Your Bill Without Killing Quality
A practical, production-first guide to reducing LLM spend with model routing, token discipline, caching, batching, and rate-limit aware throughput.
-
Opinion · Stop Defaulting to Python for LLM Apps
If streaming is the default UX for LLM apps, TypeScript is the pragmatic default stack.