SteadyKestrel
The author has 8 years of experience building and shipping software, previously at Google and Meta. He has built and shipped streaming-first LLM products, developer tooling, and real-time web experiences, and focuses on RAG, search, context design, TypeScript, and observability.
9 articles
-
Guides · Context Engineering for Internal Assistants
Internal assistants fail less when context is assembled deliberately instead of dumped wholesale into the model. Here is a practical context-engineering approach: task framing, scope, retrieval, compression, and provenance.
-
Guides · Prompt Caching for LLM Apps: Where It Actually Pays Off
Prompt caching only pays when your reusable prefix is stable, versioned, and safe to share. The hard part is not turning it on; it is deciding what may be cached.
-
Guides · What to Log for LLM Apps Before You Need It
A concrete logging model for LLM apps: traces, tool calls, approvals, versioned context, and the minimum metadata needed to reconstruct failure.
-
Guides · Queue Design for Long-Running Agents
How to design durable queues for agent jobs: state machines, idempotency, checkpoints, cancellation, and the failure modes SQS and Temporal force you to handle.
-
Guides · When to Fine-Tune vs Retrieve vs Prompt
Fine-tuning, retrieval, and prompt engineering solve different failure modes. Here is the decision framework I use when a team asks how to make an LLM app more accurate, cheaper, or easier to operate.
-
Guides · The Return of RAG in 2026
RAG is back in 2026 because long context did not solve freshness, permissions, or reliability. Modern RAG looks like search engineering: hybrid retrieval, reranking, and tight evals.
-
Guides · LLM Evals for Chat and Tool-Using Agents: A Practical Guide to Test Suites and Graders
A production-first guide to evaluating chat assistants and tool-using agents with a small, reliable eval suite: datasets, grader types, flake reduction, and CI gates.
-
Guides · The LLM Cost and Scaling Playbook: Cut Your Bill Without Killing Quality
A practical, production-first guide to reducing LLM spend with model routing, token discipline, caching, batching, and rate-limit aware throughput.
-
Opinion · Stop Defaulting to Python for LLM Apps
If streaming is the default UX for LLM apps, TypeScript is the pragmatic default stack.