Stop Defaulting to Python for LLM Apps
If you’re building an LLM app in 2026, the most important decision is how your product feels while the model is thinking, not which model you picked.
That usually means one thing: streaming by default. Not as a nice-to-have. Not “we’ll add it later.” The default.
And this is where Python starts to hurt, not because it’s “bad,” but because most teams use it in a way that makes streaming feel like a science project. Sure, you can do it. But you won’t. You’ll ship the blocking version, call it “MVP,” and hope nobody notices the UI freezing for 8 seconds.
My take: default to TypeScript (or Go) for the application layer. Keep Python where it shines: notebooks, offline evals, batch jobs, and training pipelines.
Why streaming is the default UX now
LLM latency isn’t an edge case anymore. Multi-step tool use, retrieval, and agent workflows routinely take seconds.
If the UI is frozen while your server “thinks,” the product feels broken, even when everything is technically “working.”
Streaming gives you a clean primitive for:
- incremental output (“typing” UX)
- progress updates (“searching…”, “calling tools…”, “summarizing…”)
- cancellation (disconnect → stop spending money)
- observability (events you can trace and replay)
SSE is the boring choice that usually wins. It’s one-way, browser-friendly, easy to proxy, and fits most “chat + updates” product shapes without the operational overhead of WebSockets.[^1]
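Roughly what that looks like on the wire, sketched with Node’s built-in http module (event names and payloads here are illustrative, not a standard):

```ts
import { createServer } from "node:http";

// Minimal SSE endpoint: plain HTTP, one response that stays open and emits framed events.
const server = createServer((req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // Wire format: "event: <name>\ndata: <json>\n\n" — the browser's EventSource parses this for free.
  const send = (event: string, data: unknown) =>
    res.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);

  send("status", { label: "Thinking" });
  const timer = setInterval(() => send("message.delta", { text: "…" }), 250);
  req.on("close", () => clearInterval(timer)); // disconnect → stop producing output
});

server.listen(3000);
```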
What “streaming-first” actually means (beyond token trickle)
Most teams start by streaming raw text. That’s fine for a demo, but apps get dramatically easier to build and debug when you treat the stream as typed events.
Here’s a simple event model that covers 90% of LLM products:
| Event | Purpose | Example payload |
|---|---|---|
| `message.delta` | Incremental assistant text | `{ "text": "…next chunk…" }` |
| `status` | UI-visible progress | `{ "label": "Searching docs" }` |
| `tool.call` | Tool invocation started | `{ "name": "web.search", "args": {...} }` |
| `tool.result` | Tool completed | `{ "name": "web.search", "result": {...} }` |
| `error` | User-safe error | `{ "message": "Timed out" }` |
| `final` | End-of-stream marker | `{ "usage": {...} }` |
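In TypeScript this maps naturally onto a discriminated union. A minimal sketch (event names match the table; the exact payload fields, like the usage shape, are illustrative):

```ts
// One tagged variant per stream event; `type` is the discriminant.
type StreamEvent =
  | { type: "message.delta"; text: string }
  | { type: "status"; label: string }
  | { type: "tool.call"; name: string; args: Record<string, unknown> }
  | { type: "tool.result"; name: string; result: unknown }
  | { type: "error"; message: string }
  | { type: "final"; usage: { inputTokens: number; outputTokens: number } };

// Exhaustive handling: the compiler flags any variant you forget to render.
function render(event: StreamEvent): string {
  switch (event.type) {
    case "message.delta": return event.text;
    case "status":        return `[${event.label}…]`;
    case "tool.call":     return `[calling ${event.name}]`;
    case "tool.result":   return `[${event.name} done]`;
    case "error":         return `Error: ${event.message}`;
    case "final":         return "";
  }
}
```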
Once you have this, “streaming” becomes a real product surface:
- you can render intermediate steps without leaking internal prompts
- you can cancel mid-tool-call without orphaning work
- you can replay a session from an event log
- you can test agents as deterministic event sequences
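That last point is worth making concrete: with typed events, an agent test can be a plain equality check over a recorded event log. A minimal sketch with node:test (the recorded log and expected sequence are made up for illustration):

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

test("agent emits the expected event sequence", () => {
  // In a real test this would come from running the agent with mocked tools.
  const recorded = [
    { type: "status", label: "Searching docs" },
    { type: "tool.call", name: "web.search" },
    { type: "tool.result", name: "web.search" },
    { type: "message.delta", text: "Here is what I found" },
    { type: "final" },
  ];

  // Assert on the event *types* so the test stays stable as payloads evolve.
  assert.deepEqual(
    recorded.map((e) => e.type),
    ["status", "tool.call", "tool.result", "message.delta", "final"],
  );
});
```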
The Python problem (in practice)
Python can absolutely stream. The issue is the path most teams take to production:
- sync-first handlers
- WSGI-by-habit deployments (or ASGI, but only in name)
- proxies that buffer
- “just yield strings” demos that fall apart under load
- no cancellation propagation
To make Python streaming feel truly solid, you need the right ASGI stack, timeouts, worker model, async discipline, and careful I/O. It’s doable. It’s also exactly the kind of “we’ll clean it up later” work that rarely happens.
The outcome is predictable: click → spinner → giant blob of text. It’s a time capsule UX.
Streaming failure modes you only notice in production
These show up regardless of language, but Python teams hit them more often because the defaults skew sync-first.
| Symptom | Common cause | Fix / mitigation |
|---|---|---|
| Client gets nothing, then a full response | Proxy buffering | Disable buffering for SSE routes; set correct headers; test through the real proxy/CDN |
| “Works locally, fails in prod” | WSGI server or middleware buffering | Use an ASGI server end-to-end; remove response transforms on streaming routes |
| Tokens keep streaming after user navigates away | No cancellation wiring | Propagate disconnect signals; cancel downstream tasks; stop tool calls where possible |
| One slow tool blocks everyone | Shared event loop work | Offload CPU-bound work; cap concurrency; isolate per-request tasks |
| Streaming stops randomly | Timeouts at proxy/load balancer | Raise idle timeouts; send heartbeat status events to keep the pipe alive |
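Two of those mitigations are cheap to wire up in code: anti-buffering headers and a heartbeat. A sketch for a Node SSE route (nginx honors `X-Accel-Buffering: no`; other proxies and CDNs need their own setting):

```ts
import type { ServerResponse } from "node:http";

function startSse(res: ServerResponse) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache, no-transform", // no-transform discourages intermediaries from rewriting the body
    "X-Accel-Buffering": "no",                 // tells nginx not to buffer this response
    Connection: "keep-alive",
  });

  // Heartbeat: an SSE comment line every 15s keeps idle proxies/LBs from closing the pipe.
  const heartbeat = setInterval(() => res.write(": keep-alive\n\n"), 15_000);
  res.on("close", () => clearInterval(heartbeat));
}
```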
If your stack makes these footguns easy to trip, most teams won’t spend the time to un-trip them. They’ll ship the non-streaming version and move on.
Why TypeScript is the pragmatic default
If your product lives on the web (or has a web admin, dashboard, internal tool, etc.), TypeScript is the path of least resistance:
- Streaming is a first-class citizen (`fetch`, `ReadableStream`, edge runtimes); see the client sketch after this list
- Types can travel from server → client (events, tool schemas, response shapes)
- You can keep “stream semantics” consistent across local dev, server, and edge
- Most teams already ship JS/TS daily (so the hard parts actually get built)
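Consuming a stream in the browser needs nothing beyond the platform. A minimal sketch with fetch, ReadableStream, and AbortController (the `/chat/stream` endpoint, the `#stop` button, and the `renderChunk` helper are placeholders):

```ts
// Assumes a streaming endpoint at /chat/stream and a renderChunk() UI helper (both hypothetical).
async function streamChat(renderChunk: (text: string) => void) {
  const controller = new AbortController();

  // Leaving the page or clicking "stop" aborts the fetch and closes the stream.
  document.querySelector("#stop")?.addEventListener("click", () => controller.abort());

  const res = await fetch("/chat/stream", { signal: controller.signal });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();

  // Read chunks as they arrive instead of waiting for the full body.
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    renderChunk(decoder.decode(value, { stream: true }));
  }
}
```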
If you want high performance with straightforward concurrency, Go is also a great default for the serving layer.
The “default stack” table (serving layer)
There’s no universal best. The point is to pick a stack where streaming, cancellation, and typed events are the happy path.
| Serving layer | Best for | Trade-offs |
|---|---|---|
| TypeScript on Node | Most web apps; shared types; rapid iteration | You still need discipline around backpressure and cancellation |
| TypeScript on edge runtimes | Low-latency streaming close to users | Runtime constraints; some libraries won’t run; debugging differs |
| Go | High concurrency; predictable performance | Slower iteration for UI-coupled products; fewer shared types by default |
| Python (ASGI) | Teams already deep in Python | Easy to drift into sync; streaming quality depends heavily on deployment details |
Concrete examples (this is already where the world is going)
If you’re looking for signals about where the ecosystem is investing:
- OpenAI Agents SDK (TypeScript): code-first agent workflows with streaming and tracing.
- OpenAI streaming responses: a first-class, evented streaming model for partial outputs and tool calls.
- LangChain for JS: streaming is a core concept (`stream`, `streamEvents`) across models and agents.
When the tooling is designed around typed events, the winning apps are the ones that embrace an evented architecture end-to-end.
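As one concrete taste of that evented style, here’s a sketch using the openai Node SDK’s streaming chat completions in an ES module (the model name is a placeholder, and error handling is omitted):

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini", // placeholder; use whatever model you actually deploy
  messages: [{ role: "user", content: "Summarize SSE in one sentence." }],
  stream: true,
});

// The SDK returns an async iterable of chunks; forward each delta into your own event stream.
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
```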
A benchmark sanity check (a proxy for streaming concurrency)
No benchmark perfectly matches “LLM app streaming,” because real apps include upstream model latency, tool calls, queueing, and long-lived connections.
But if you want a rough proxy for “how well does this stack handle lots of concurrent clients when each client’s work is small?”, the TechEmpower Framework Benchmarks plaintext test is a useful sanity check.[^2]
Here are a few datapoints from the same TFB run, all at 256 concurrent connections:
| Stack (TFB plaintext @ 256 connections) | Requests/sec |
|---|---|
| Bun | 2,689,225.88 |
| Node.js | 448,859.66 |
| Python (Uvicorn) | 404,610.11 |
| Python (Starlette) | 261,746.63 |
| Go (Gin) | 561,519.96 |
These are not “SSE numbers.” But the shape is relevant: event-loop-style runtimes and goroutine-based servers tend to do well when they’re mostly multiplexing I/O.
For a more directly “streaming-shaped” example, here’s a Go-focused writeup explicitly load-testing an SSE endpoint and running into (and then working past) connection limits: Scaling SSE to 1M connections.
A simple rule of thumb
Use Python where it dominates:
- eval notebooks
- data exploration
- offline batch pipelines
- model experiments
But for the “user is waiting for tokens” layer, treat streaming as table stakes and pick a stack where that’s the happy path (TypeScript/Node, edge runtimes, or Go).
That layer wants:
- streaming-first transport (SSE)
- cancellation by default
- typed events
- concurrency you don’t have to fight
- deployment targets that don’t surprise you (Node, edge, Go)
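“Cancellation by default” is mostly about plumbing one AbortSignal through the whole request. A sketch for a Node handler, with a stand-in `callModel` generator to show the shape (in a real app that would be your model/tool client, as long as it accepts a signal):

```ts
import { createServer } from "node:http";
import { setTimeout as sleep } from "node:timers/promises";

// Stand-in for a model/tool call; the important part is that it accepts an AbortSignal.
async function* callModel(prompt: string, signal: AbortSignal): AsyncGenerator<string> {
  for (const word of `echo: ${prompt}`.split(" ")) {
    await sleep(200, undefined, { signal }); // rejects with an AbortError once the signal fires
    yield word + " ";
  }
}

const server = createServer(async (req, res) => {
  const controller = new AbortController();
  req.on("close", () => controller.abort()); // client disconnect → abort downstream work

  res.writeHead(200, { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" });

  try {
    for await (const delta of callModel("hello world", controller.signal)) {
      res.write(`data: ${JSON.stringify({ text: delta })}\n\n`);
    }
  } catch (err) {
    if (!controller.signal.aborted) throw err; // aborts are expected; anything else is a real error
  } finally {
    res.end();
  }
});

server.listen(3000);
```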
If you do choose Python, make it a deliberate choice
Python can be the right answer when your app is effectively a thin UI around a Python-heavy workflow. In that case, pay the streaming tax up front:
- commit to ASGI end-to-end (no “WSGI in prod” surprises)
- explicitly test streaming through your real proxy/CDN
- add cancellation tests (disconnect should stop work)
- standardize an event schema (don’t stream raw strings forever)
- isolate background work (tools, retrieval, CPU tasks) from request I/O
If you don’t want to do that work, don’t pick the stack that requires it.
A quick note on Bun
If you’re already leaning TypeScript, it’s hard not to bring up Bun.
You can build excellent LLM apps on standard Node.js. With the right libraries, configuration, and a clean streaming implementation, Node can match what most teams need (especially if you treat “don’t block the event loop” as a production requirement).[^3]
But Bun tends to pull teams toward a better default experience:
- fast installs and dev startup (which matters more than people admit)
- a modern toolchain posture out of the box
- fewer “papercuts” when you’re iterating quickly on servers + tooling
When a runtime makes the default loop feel that good, it also becomes strategically interesting. It’s the kind of project you could imagine a major platform company wanting to own someday. (Speculation.)
The checklist (if you want your app to feel modern)
If you want the product to feel fast even when it isn’t, build this:
- Stream by default (SSE is fine).
- Propagate cancellation (disconnect → stop work).
- Emit typed events (not just raw text).
- Design for backpressure (don’t buffer forever; see the sketch after this list).
- Trace everything (so you can debug agent workflows).
- Choose a runtime you’ll actually optimize (Node or Bun are both fine - pick one and commit).
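On the backpressure point: in Node the signal is `res.write()` returning false, and the fix is to pause until `'drain'` instead of queueing output unboundedly. A minimal sketch, assuming a Node ServerResponse:

```ts
import { once } from "node:events";
import type { ServerResponse } from "node:http";

// Write one SSE chunk, pausing when the socket's buffer is full instead of queueing forever.
async function writeWithBackpressure(res: ServerResponse, payload: unknown): Promise<void> {
  const ok = res.write(`data: ${JSON.stringify(payload)}\n\n`);
  if (!ok) {
    // The socket buffer is full; wait for it to flush before producing more output.
    await once(res, "drain");
  }
}
```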
Do those things and you’ll ship something that feels modern. Skip them and you’ll end up with the classic “LLM demo” experience: impressive output, frustrating UX.
And yes, there are exceptions. But in 2026, “exceptions” shouldn’t define the default.
Footnotes
[^1]: MDN: Server-sent events (SSE). https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events
[^2]: TechEmpower Framework Benchmarks: plaintext test results and raw outputs. https://tfb-status.techempower.com/results/66d86090-b6d0-46b3-9752-5aa4913b2e33
[^3]: Node.js guide: “Don’t block the event loop.” https://nodejs.org/en/docs/guides/dont-block-the-event-loop/