Stop Defaulting to Python for LLM Apps
If you’re building an LLM app in 2026, the most important decision is how your product feels while the model is thinking, not which model you picked.
That usually means one thing: streaming by default. Not as a nice-to-have. Not “we’ll add it later.” The default.
And this is where Python starts to hurt, not because it’s “bad,” but because most teams use it in a way that makes streaming feel like a science project. Sure, you can do it. But you won’t. You’ll ship the blocking version, call it “MVP,” and hope nobody notices the UI freezing for 8 seconds.
My take: default to TypeScript (or Go) for the application layer. Keep Python where it shines: notebooks, offline evals, batch jobs, and training pipelines.
Why streaming is the default UX now
LLM latency isn’t an edge case anymore. Multi-step tool use, retrieval, and agent workflows routinely take seconds.
If the UI is frozen while your server “thinks,” the product feels broken, even when everything is technically “working.”
Streaming gives you a clean primitive for:
- incremental output (“typing” UX)
- progress updates (“searching…”, “calling tools…”, “summarizing…”)
- cancellation (disconnect → stop spending money)
- observability (events you can trace and replay)
SSE is the boring choice that usually wins. It’s one-way, browser-friendly, easy to proxy, and fits most “chat + updates” product shapes without the operational overhead of WebSockets.[^1]
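Roughly what that looks like on the wire, sketched with Node’s built-in http module (event names and payloads here are illustrative, not a standard):

```ts
import { createServer } from "node:http";

// Minimal SSE endpoint: plain HTTP, one response that stays open and emits framed events.
const server = createServer((req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // Wire format: "event: <name>\ndata: <json>\n\n" — the browser's EventSource parses this for free.
  const send = (event: string, data: unknown) =>
    res.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);

  send("status", { label: "Thinking" });
  const timer = setInterval(() => send("message.delta", { text: "…" }), 250);
  req.on("close", () => clearInterval(timer)); // disconnect → stop producing output
});

server.listen(3000);
```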
What “streaming-first” actually means (beyond token trickle)
Most teams start by streaming raw text. That’s fine for a demo, but apps get dramatically easier to build and debug when you treat the stream as typed events.
Here’s a simple event model that covers 90% of LLM products:
| Event | Purpose | Example payload |
|---|---|---|
| `message.delta` | Incremental assistant text | `{ "text": "…next chunk…" }` |
| `status` | UI-visible progress | `{ "label": "Searching docs" }` |
| `tool.call` | Tool invocation started | `{ "name": "web.search", "args": {...} }` |
| `tool.result` | Tool completed | `{ "name": "web.search", "result": {...} }` |
| `error` | User-safe error | `{ "message": "Timed out" }` |
| `final` | End-of-stream marker | `{ "usage": {...} }` |
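In TypeScript this maps naturally onto a discriminated union. A minimal sketch (event names match the table; the exact payload fields, like the usage shape, are illustrative):

```ts
// One tagged variant per stream event; `type` is the discriminant.
type StreamEvent =
  | { type: "message.delta"; text: string }
  | { type: "status"; label: string }
  | { type: "tool.call"; name: string; args: Record<string, unknown> }
  | { type: "tool.result"; name: string; result: unknown }
  | { type: "error"; message: string }
  | { type: "final"; usage: { inputTokens: number; outputTokens: number } };

// Exhaustive handling: the compiler flags any variant you forget to render.
function render(event: StreamEvent): string {
  switch (event.type) {
    case "message.delta": return event.text;
    case "status":        return `[${event.label}…]`;
    case "tool.call":     return `[calling ${event.name}]`;
    case "tool.result":   return `[${event.name} done]`;
    case "error":         return `Error: ${event.message}`;
    case "final":         return "";
  }
}
```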
Once you have this, “streaming” becomes a real product surface:
- you can render intermediate steps without leaking internal prompts
- you can cancel mid-tool-call without orphaning work
- you can replay a session from an event log
- you can test agents as deterministic event sequences
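That last point is worth making concrete: with typed events, an agent test can be a plain equality check over a recorded event log. A minimal sketch with node:test (the recorded log and expected sequence are made up for illustration):

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

test("agent emits the expected event sequence", () => {
  // In a real test this would come from running the agent with mocked tools.
  const recorded = [
    { type: "status", label: "Searching docs" },
    { type: "tool.call", name: "web.search" },
    { type: "tool.result", name: "web.search" },
    { type: "message.delta", text: "Here is what I found" },
    { type: "final" },
  ];

  // Assert on the event *types* so the test stays stable as payloads evolve.
  assert.deepEqual(
    recorded.map((e) => e.type),
    ["status", "tool.call", "tool.result", "message.delta", "final"],
  );
});
```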
The Python problem (in practice)
Python can absolutely stream. The issue is the path most teams take to production:
- sync-first handlers
- WSGI-by-habit deployments (or ASGI, but only in name)
- proxies that buffer
- “just yield strings” demos that fall apart under load
- no cancellation propagation
To make Python streaming feel truly solid, you need the right ASGI stack, timeouts, worker model, async discipline, and careful I/O. It’s doable. It’s also exactly the kind of “we’ll clean it up later” work that rarely happens.
The outcome is predictable: click → spinner → giant blob of text. It’s a time capsule UX.
Streaming failure modes you only notice in production
These show up regardless of language, but Python teams hit them more often because the defaults skew sync-first.
| Symptom | Common cause | Fix / mitigation |
|---|---|---|
| Client gets nothing, then a full response | Proxy buffering | Disable buffering for SSE routes; set correct headers; test through the real proxy/CDN |
| “Works locally, fails in prod” | WSGI server or middleware buffering | Use an ASGI server end-to-end; remove response transforms on streaming routes |
| Tokens keep streaming after user navigates away | No cancellation wiring | Propagate disconnect signals; cancel downstream tasks; stop tool calls where possible |
| One slow tool blocks everyone | Shared event loop work | Offload CPU-bound work; cap concurrency; isolate per-request tasks |
| Streaming stops randomly | Timeouts at proxy/load balancer | Raise idle timeouts; send heartbeat status events to keep the pipe alive |
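Two of those mitigations are cheap to wire up in code: anti-buffering headers and a heartbeat. A sketch for a Node SSE route (nginx honors `X-Accel-Buffering: no`; other proxies and CDNs need their own setting):

```ts
import type { ServerResponse } from "node:http";

function startSse(res: ServerResponse) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache, no-transform", // no-transform discourages intermediaries from rewriting the body
    "X-Accel-Buffering": "no",                 // tells nginx not to buffer this response
    Connection: "keep-alive",
  });

  // Heartbeat: an SSE comment line every 15s keeps idle proxies/LBs from closing the pipe.
  const heartbeat = setInterval(() => res.write(": keep-alive\n\n"), 15_000);
  res.on("close", () => clearInterval(heartbeat));
}
```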
If your stack makes these footguns easy to trip, most teams won’t spend the time to un-trip them. They’ll ship the non-streaming version and move on.
Why TypeScript is the pragmatic default
If your product lives on the web (or has a web admin, dashboard, internal tool, etc.), TypeScript is the path of least resistance:
- Streaming is a first-class citizen (`fetch`, `ReadableStream`, edge runtimes); see the client sketch after this list
- Types can travel from server → client (events, tool schemas, response shapes)
- You can keep “stream semantics” consistent across local dev, server, and edge
- Most teams already ship JS/TS daily (so the hard parts actually get built)
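Consuming a stream in the browser needs nothing beyond the platform. A minimal sketch with fetch, ReadableStream, and AbortController (the `/chat/stream` endpoint, the `#stop` button, and the `renderChunk` helper are placeholders):

```ts
// Assumes a streaming endpoint at /chat/stream and a renderChunk() UI helper (both hypothetical).
async function streamChat(renderChunk: (text: string) => void) {
  const controller = new AbortController();

  // Leaving the page or clicking "stop" aborts the fetch and closes the stream.
  document.querySelector("#stop")?.addEventListener("click", () => controller.abort());

  const res = await fetch("/chat/stream", { signal: controller.signal });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();

  // Read chunks as they arrive instead of waiting for the full body.
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    renderChunk(decoder.decode(value, { stream: true }));
  }
}
```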
If you want high performance with straightforward concurrency, Go is also a great default for the serving layer.
The “default stack” table (serving layer)
There’s no universal best. The point is to pick a stack where streaming, cancellation, and typed events are the happy path.
| Serving layer | Best for | Trade-offs |
|---|---|---|
| TypeScript on Node | Most web apps; shared types; rapid iteration | You still need discipline around backpressure and cancellation |
| TypeScript on edge runtimes | Low-latency streaming close to users | Runtime constraints; some libraries won’t run; debugging differs |
| Go | High concurrency; predictable performance | Slower iteration for UI-coupled products; fewer shared types by default |
| Python (ASGI) | Teams already deep in Python | Easy to drift into sync; streaming quality depends heavily on deployment details |
Concrete examples (this is already where the world is going)
If you’re looking for signals about where the ecosystem is investing:
- OpenAI Agents SDK (TypeScript): code-first agent workflows with streaming and tracing.
- OpenAI streaming responses: a first-class, evented streaming model for partial outputs and tool calls.
- LangChain for JS: streaming is a core concept (`stream`, `streamEvents`) across models and agents.
When the tooling is designed around typed events, the winning apps are the ones that embrace an evented architecture end-to-end.
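As one concrete taste of that evented style, here’s a sketch using the openai Node SDK’s streaming chat completions in an ES module (the model name is a placeholder, and error handling is omitted):

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini", // placeholder; use whatever model you actually deploy
  messages: [{ role: "user", content: "Summarize SSE in one sentence." }],
  stream: true,
});

// The SDK returns an async iterable of chunks; forward each delta into your own event stream.
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
```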
A benchmark sanity check (a proxy for streaming concurrency)
No benchmark perfectly matches “LLM app streaming,” because real apps include upstream model latency, tool calls, queueing, and long-lived connections.
But if you want a rough proxy for “how well does this stack handle lots of concurrent clients when each client’s work is small?”, the TechEmpower Framework Benchmarks plaintext test is a useful sanity check.[^2]
Here are a few datapoints from the same TFB run, all at 256 concurrent connections:
| Stack (TFB plaintext @ 256 connections) | Requests/sec |
|---|---|
| Bun | 2,689,225.88 |
| Node.js | 448,859.66 |
| Python (Uvicorn) | 404,610.11 |
| Python (Starlette) | 261,746.63 |
| Go (Gin) | 561,519.96 |
These are not “SSE numbers.” But the shape is relevant: event-loop-style runtimes and goroutine-based servers tend to do well when they’re mostly multiplexing I/O.
For a more directly “streaming-shaped” example, here’s a Go-focused writeup explicitly load-testing an SSE endpoint and running into (and then working past) connection limits: Scaling SSE to 1M connections.
A simple rule of thumb
Use Python where it dominates:
- eval notebooks
- data exploration
- offline batch pipelines
- model experiments
But for the “user is waiting for tokens” layer, treat streaming as table stakes and pick a stack where that’s the happy path (TypeScript/Node, edge runtimes, or Go).
That layer wants:
- streaming-first transport (SSE)
- cancellation by default
- typed events
- concurrency you don’t have to fight
- deployment targets that don’t surprise you (Node, edge, Go)
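“Cancellation by default” is mostly about plumbing one AbortSignal through the whole request. A sketch for a Node handler, with a stand-in `callModel` generator to show the shape (in a real app that would be your model/tool client, as long as it accepts a signal):

```ts
import { createServer } from "node:http";
import { setTimeout as sleep } from "node:timers/promises";

// Stand-in for a model/tool call; the important part is that it accepts an AbortSignal.
async function* callModel(prompt: string, signal: AbortSignal): AsyncGenerator<string> {
  for (const word of `echo: ${prompt}`.split(" ")) {
    await sleep(200, undefined, { signal }); // rejects with an AbortError once the signal fires
    yield word + " ";
  }
}

const server = createServer(async (req, res) => {
  const controller = new AbortController();
  req.on("close", () => controller.abort()); // client disconnect → abort downstream work

  res.writeHead(200, { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" });

  try {
    for await (const delta of callModel("hello world", controller.signal)) {
      res.write(`data: ${JSON.stringify({ text: delta })}\n\n`);
    }
  } catch (err) {
    if (!controller.signal.aborted) throw err; // aborts are expected; anything else is a real error
  } finally {
    res.end();
  }
});

server.listen(3000);
```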
If you do choose Python, make it a deliberate choice
Python can be the right answer when your app is effectively a thin UI around a Python-heavy workflow. In that case, pay the streaming tax up front:
- commit to ASGI end-to-end (no “WSGI in prod” surprises)
- explicitly test streaming through your real proxy/CDN
- add cancellation tests (disconnect should stop work)
- standardize an event schema (don’t stream raw strings forever)
- isolate background work (tools, retrieval, CPU tasks) from request I/O
If you don’t want to do that work, don’t pick the stack that requires it.
A quick note on Bun
If you’re already leaning TypeScript, it’s hard not to bring up Bun.
You can build excellent LLM apps on standard Node.js. With the right libraries, configuration, and a clean streaming implementation, Node can match what most teams need (especially if you treat “don’t block the event loop” as a production requirement).[^3]
But Bun tends to pull teams toward a better default experience:
- fast installs and dev startup (which matters more than people admit)
- a modern toolchain posture out of the box
- fewer “papercuts” when you’re iterating quickly on servers + tooling
When a runtime makes the default loop feel that good, it also becomes strategically interesting. It’s the kind of project you could imagine a major platform company wanting to own someday. (Speculation.)
The checklist (if you want your app to feel modern)
If you want the product to feel fast even when it isn’t, build this:
- Stream by default (SSE is fine).
- Propagate cancellation (disconnect → stop work).
- Emit typed events (not just raw text).
- Design for backpressure (don’t buffer forever; see the sketch after this list).
- Trace everything (so you can debug agent workflows).
- Choose a runtime you’ll actually optimize (Node or Bun are both fine - pick one and commit).
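On the backpressure point: in Node the signal is `res.write()` returning false, and the fix is to pause until `'drain'` instead of queueing output unboundedly. A minimal sketch, assuming a Node ServerResponse:

```ts
import { once } from "node:events";
import type { ServerResponse } from "node:http";

// Write one SSE chunk, pausing when the socket's buffer is full instead of queueing forever.
async function writeWithBackpressure(res: ServerResponse, payload: unknown): Promise<void> {
  const ok = res.write(`data: ${JSON.stringify(payload)}\n\n`);
  if (!ok) {
    // The socket buffer is full; wait for it to flush before producing more output.
    await once(res, "drain");
  }
}
```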
Do those things and you’ll ship something that feels modern. Skip them and you’ll end up with the classic “LLM demo” experience: impressive output, frustrating UX.
And yes, there are exceptions. But in 2026, “exceptions” shouldn’t define the default.
Footnotes
[^1]: MDN: Server-sent events (SSE). https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events
[^2]: TechEmpower Framework Benchmarks: plaintext test results and raw outputs. https://tfb-status.techempower.com/results/66d86090-b6d0-46b3-9752-5aa4913b2e33
[^3]: Node.js guide: “Don’t block the event loop.” https://nodejs.org/en/docs/guides/dont-block-the-event-loop/