System Architecture: End-to-End Flow of a Single Question
Follow one question through every stage of our backend: classifier, memory, LLM, YouTube, response, feedback, and eval.
Rohit Mehta
5 April 2025
The full path
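User → API Gateway → Classifier → Memory → LLM → YouTube → Response → Feedback → Eval (judge model).
Everything up to Response happens inside the request. Feedback arrives after the answer is delivered, and Eval runs offline as a nightly job.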
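The request path above can be sketched as an ordered list of async stages, each receiving the context the previous stage produced. This is a minimal sketch under stated assumptions, not the production implementation: `QuestionContext`, `Stage`, `runPipeline`, `recordFeedback`, and the stub stages are hypothetical names, and the stubs stand in for the real services (Claude Haiku, Postgres, the LLM, YouTube). Feedback is kept outside the synchronous chain because the eval job consumes it later.

```typescript
// Context passed from stage to stage (field names are illustrative, not the real schema).
interface QuestionContext {
  tenantId: string;
  userId: string;
  question: string;
  category?: string; // set by the classifier stage
  memories?: string[]; // set by the memory stage
  answer?: string; // set by the LLM stage
  videoIds?: string[]; // set by the YouTube stage
}

// Each stage takes the current context and resolves to an enriched copy.
type Stage = (ctx: QuestionContext) => Promise<QuestionContext>;

// Run the online stages strictly in order; each sees the previous stage's output.
async function runPipeline(stages: Stage[], initial: QuestionContext): Promise<QuestionContext> {
  let ctx = initial;
  for (const stage of stages) {
    ctx = await stage(ctx);
  }
  return ctx;
}

// Stub stages (hypothetical). The real classifier is Claude Haiku; this length check is a placeholder.
const classify: Stage = async (ctx) => ({
  ...ctx,
  category: ctx.question.length > 80 ? "hard" : "simple",
});
const recall: Stage = async (ctx) => ({ ...ctx, memories: [`profile:${ctx.userId}`] });
const answer: Stage = async (ctx) => ({ ...ctx, answer: `Answer to: ${ctx.question}` });
const findVideos: Stage = async (ctx) => ({ ...ctx, videoIds: [] });

// The in-request part of the path, in order.
const onlineStages: Stage[] = [classify, recall, answer, findVideos];

// Feedback arrives after the response is sent; the nightly eval job reads this log later.
const feedbackLog: { question: string; answer: string; rating: number }[] = [];

function recordFeedback(ctx: QuestionContext, rating: number): void {
  feedbackLog.push({ question: ctx.question, answer: ctx.answer ?? "", rating });
}
```

Keeping each stage a plain `ctx → ctx` function makes it easy to wrap every stage in a trace span or swap a stub for the real service without touching the runner.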
Tech stack
- Edge: Cloudflare Workers, with TanStack Start for the server-side-rendered (SSR) app.
- Classifier: Claude Haiku, called from a dedicated worker.
- Memory: Postgres with pgvector, for user profiles and recall.
- LLM: GPT-4o-mini for standard answers, GPT-4o for hard cases.
- Video: YouTube, accessed through an MCP (Model Context Protocol) server.
- Eval: a nightly DAG that runs a judge prompt on a larger model.
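The classifier's output decides which answer model handles a question. A minimal routing sketch, assuming the classifier returns a difficulty label and a confidence score (neither field is specified above); the names `chooseModel` and `ClassifierResult` and the escalation threshold `MIN_CONFIDENCE` are hypothetical, and the threshold is an illustrative value, not a tuned one.

```typescript
// Difficulty labels the classifier might emit (assumed; the real label set is not given).
type Difficulty = "simple" | "hard";

interface ClassifierResult {
  difficulty: Difficulty;
  confidence: number; // 0..1, hypothetical field
}

// Model names as listed in the tech stack.
const ANSWER_MODEL = "gpt-4o-mini";
const HARD_CASE_MODEL = "gpt-4o";

// Hypothetical threshold: a "simple" label below this confidence is escalated.
const MIN_CONFIDENCE = 0.7;

// Route hard or uncertain questions to the larger model, everything else to the cheaper one.
function chooseModel(result: ClassifierResult): string {
  if (result.difficulty === "hard") return HARD_CASE_MODEL;
  if (result.confidence < MIN_CONFIDENCE) return HARD_CASE_MODEL;
  return ANSWER_MODEL;
}
```

Escalating on low confidence trades some cost for quality; whether that trade pays off depends on how well calibrated the classifier's confidence is.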
Multi-tenancy
Every record carries a tenant_id. Each institution gets its own slice of the data, isolated with row-level security, plus custom system prompts and per-tenant analytics.
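A minimal sketch of the tenant scoping, assuming an in-memory store in place of Postgres. In production the isolation belongs in the database (Postgres row-level security policies, defined with `CREATE POLICY`); this application-level filter only illustrates the rule that every read is scoped to one `tenant_id`. `TenantScopedStore`, `TenantConfig`, `systemPromptFor`, and the default prompt text are hypothetical.

```typescript
// Every record carries a tenant_id.
interface TenantRecord {
  tenant_id: string;
  id: string;
  payload: string;
}

// Per-tenant settings; only the custom system prompt is described in the text.
interface TenantConfig {
  systemPrompt?: string;
}

// Placeholder text, not the real default prompt.
const DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant.";

// Application-level stand-in for row-level security: reads always take a tenant id,
// so one tenant's rows are never returned to another. Postgres RLS enforces this in the database.
class TenantScopedStore {
  private rows: TenantRecord[] = [];

  insert(tenantId: string, id: string, payload: string): void {
    this.rows.push({ tenant_id: tenantId, id, payload });
  }

  list(tenantId: string): TenantRecord[] {
    return this.rows.filter((row) => row.tenant_id === tenantId);
  }
}

// Use the tenant's custom system prompt when one is configured, else the default.
function systemPromptFor(tenantId: string, configs: Map<string, TenantConfig>): string {
  return configs.get(tenantId)?.systemPrompt ?? DEFAULT_SYSTEM_PROMPT;
}
```

Doing the same check in the database rather than in application code means a forgotten filter in one query cannot leak another institution's rows.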
Observability
Every request emits a trace with one span per stage above. End-to-end p95 latency is currently 2.8 s.
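A minimal sketch of per-stage spans and the p95 calculation, assuming a hand-rolled recorder rather than a tracing library such as OpenTelemetry. `Trace`, `Span`, and `percentile` are hypothetical names; the percentile uses the nearest-rank method, and other methods (for example linear interpolation) can give slightly different values on small samples.

```typescript
// One timed stage within a request trace.
interface Span {
  name: string;
  startMs: number;
  endMs: number;
}

// Minimal span recorder (hypothetical); the clock is injectable for testing.
class Trace {
  readonly spans: Span[] = [];
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Time one stage and record its span, even if the stage throws.
  async span<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const startMs = this.now();
    try {
      return await fn();
    } finally {
      this.spans.push({ name, startMs, endMs: this.now() });
    }
  }
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}
```

Recording the span in `finally` means a failing stage still appears in the trace with its duration, which matters when chasing tail latency.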