Problem
The agent backend runs on a single uvicorn worker process with an in-memory checkpointer on a 512MB Render starter instance. This is a global bottleneck — not per-user. All concurrent users share the same event loop, the same memory pool, and the same 200-thread checkpoint limit.
Currently the app breaks at ~3 concurrent connections (#63). At production scale (100-1000 users), it would be effectively unusable.
Architecture bottlenecks
1. Single worker process
- uvicorn runs with 1 worker (default) — all requests share one Python event loop
- Each GPT-5.4 visualization call takes 10-30s
- LangGraph has synchronous sections that block the event loop
- Throughput: ~2-6 visualization requests/minute
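Since the synchronous sections are what starve the single event loop, the standard mitigation is to offload them to a worker thread. Below is a minimal stdlib sketch of that pattern; sync_graph_section is a hypothetical stand-in for a blocking LangGraph section, not the app's actual code:

```python
import asyncio
import time

def sync_graph_section(payload: str) -> str:
    # Hypothetical stand-in for a synchronous LangGraph section that
    # would otherwise block the single uvicorn event loop.
    time.sleep(0.2)  # simulated blocking work
    return payload.upper()

async def handle_request(payload: str) -> str:
    # Offload the blocking section to a thread so the event loop
    # stays free to serve other connections in the meantime.
    return await asyncio.to_thread(sync_graph_section, payload)

async def main() -> list[str]:
    start = time.monotonic()
    results = await asyncio.gather(*(handle_request(p) for p in ["a", "b", "c"]))
    elapsed = time.monotonic() - start
    # Three 0.2s sections overlap instead of serializing to 0.6s.
    assert elapsed < 0.5
    return results

print(asyncio.run(main()))  # → ['A', 'B', 'C']
```

This does not raise throughput by itself (the work still happens), but it keeps the loop responsive so concurrent SSE connections and health checks are not blocked behind one 10-30s call.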
2. In-memory checkpointer (BoundedMemorySaver)
- All conversation state stored in RAM — shared global pool of 200 threads
- FIFO eviction: after 200 conversations across ALL users, oldest threads are silently deleted
- Users lose conversation context mid-session with no error
- Not thread-safe — designed for single-process async only
- On 512MB starter plan, memory pressure builds well before 200 threads
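To make the eviction behavior concrete, here is an illustrative FIFO saver. This is a sketch of the failure mode, not the real BoundedMemorySaver implementation:

```python
from collections import OrderedDict

class BoundedSaverSketch:
    """Illustrative stand-in for BoundedMemorySaver's FIFO eviction.

    Not the real implementation; just enough to show why the 201st
    conversation silently evicts the oldest one, across ALL users.
    """

    def __init__(self, max_threads: int = 200):
        self.max_threads = max_threads
        self._threads: OrderedDict[str, dict] = OrderedDict()

    def put(self, thread_id: str, checkpoint: dict) -> None:
        if thread_id not in self._threads and len(self._threads) >= self.max_threads:
            # Oldest thread is dropped; its user gets no error.
            self._threads.popitem(last=False)
        self._threads[thread_id] = checkpoint

    def get(self, thread_id: str):
        # Returns None for evicted threads: the user just loses context.
        return self._threads.get(thread_id)

saver = BoundedSaverSketch(max_threads=2)
saver.put("t1", {"msgs": 1})
saver.put("t2", {"msgs": 2})
saver.put("t3", {"msgs": 3})  # silently evicts t1
print(saver.get("t1"))        # → None, context lost mid-session
```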
3. No backpressure or error surfacing
- When the backend is saturated, requests hang silently — no timeout, no error, no retry
- Frontend shows no indication that the agent is overloaded
- Health check at /health returns 200 even when the event loop is blocked
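A health check can be made honest about saturation by measuring how long the event loop takes to schedule a no-op. This is a hedged sketch: loop_lag, health, and LAG_BUDGET are invented names, and the wiring into the actual FastAPI/uvicorn app is omitted:

```python
import asyncio
import time

LAG_BUDGET = 0.25  # assumed threshold: tolerated scheduling delay in seconds

async def loop_lag() -> float:
    # Yield to the event loop and measure how long it takes to get
    # control back. If blocking code is hogging the loop, this grows.
    start = time.monotonic()
    await asyncio.sleep(0)
    return time.monotonic() - start

async def health() -> tuple[int, dict]:
    lag = await loop_lag()
    if lag > LAG_BUDGET:
        # Surface overload instead of a blanket 200.
        return 503, {"status": "overloaded", "loop_lag_s": round(lag, 3)}
    return 200, {"status": "ok", "loop_lag_s": round(lag, 3)}

print(asyncio.run(health()))
```

Returning 503 under load lets Render's health checks and any upstream load balancer react before requests start hanging silently.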
Scale projections
| Concurrent users | Behavior |
| --- | --- |
| 1-5 | Works fine |
| 10-20 | Noticeable latency, requests queue |
| 50+ | Requests time out, SSE connections drop |
| 100+ | Effectively down, health checks fail, Render restarts |
Proposed solution
Phase 1 — Quick wins (config changes only)
- Add --workers 4 to the uvicorn startCommand in render.yaml — multiplies throughput ~4x
- Enable rate limiting in render.yaml (RATE_LIMIT_ENABLED=true) with reasonable limits (e.g. 20 req/min per IP)
Phase 2 — Persistent checkpointer
- Replace BoundedMemorySaver with a PostgreSQL or SQLite async checkpointer
- Add the database service to render.yaml
Phase 3 — Error handling and backpressure
- Persist the conversation thread ID client-side (e.g. in sessionStorage) to avoid creating unnecessary threads
Phase 4 — Horizontal scaling
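To illustrate why a persistent checkpointer removes the eviction cliff, here is a stdlib sqlite3 sketch of a durable checkpoint store. This is a stand-in, not LangGraph's checkpointer API; a real deployment would use an async PostgreSQL or SQLite checkpointer as described above:

```python
import json
import sqlite3

class SqliteCheckpointerSketch:
    """Stand-in showing the key property of a persistent checkpointer:
    state survives restarts and is never FIFO-evicted at a fixed cap.
    (Illustrative only; not the real checkpointer interface.)"""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def put(self, thread_id: str, state: dict) -> None:
        # Upsert the latest checkpoint for this conversation thread.
        self.conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?) "
            "ON CONFLICT(thread_id) DO UPDATE SET state = excluded.state",
            (thread_id, json.dumps(state)),
        )
        self.conn.commit()

    def get(self, thread_id: str):
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

cp = SqliteCheckpointerSketch()
for i in range(500):               # far past the old 200-thread cap
    cp.put(f"thread-{i}", {"turn": i})
print(cp.get("thread-0"))          # → {'turn': 0}, still present
```

Because state lives in the database rather than a shared in-process dict, it is also safe to run multiple uvicorn workers (Phase 1) and multiple instances (Phase 4) against the same store.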
Related issues
Key files
- apps/agent/main.py — uvicorn config, BoundedMemorySaver(max_threads=200)
- apps/agent/src/bounded_memory_saver.py — FIFO eviction logic
- render.yaml — Render service config (starter plan, no worker config)
- apps/app/src/app/api/copilotkit/route.ts — Frontend → agent connection