Architecture · Backend · System Design

System Architecture: End-to-End Flow of a Single Question

Follow one question through every box in our backend — classifier, memory, LLM, YouTube, response, feedback, eval.

Rohit Mehta
5 April 2025
11 min read

The full path

User → API Gateway → Classifier → Memory → LLM → YouTube → Response → Feedback → Eval (judge model).
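
As a rough illustration, the path above can be modelled as an ordered list of stages folded over a shared request context. This is a minimal sketch, not our actual code: `QuestionContext`, `makeStage`, and `runPipeline` are hypothetical names, the stages are synchronous stand-ins for what would really be network calls, and treating feedback and eval as inline steps is a simplification.

```typescript
// The hops after the user, in the order given in the path above.
const STAGES = [
  "api-gateway",
  "classifier",
  "memory",
  "llm",
  "youtube",
  "response",
  "feedback",
  "eval",
] as const;

type StageName = (typeof STAGES)[number];

// Hypothetical request context that every stage reads and extends.
interface QuestionContext {
  question: string;
  visited: StageName[];
  notes: Record<string, string>;
}

// A stage maps a context to a new context. Real stages would be async;
// they are synchronous here to keep the sketch small.
type Stage = (ctx: QuestionContext) => QuestionContext;

// Builds a placeholder stage that records its name and a note.
function makeStage(name: StageName, note: string): Stage {
  return (ctx) => ({
    ...ctx,
    visited: [...ctx.visited, name],
    notes: { ...ctx.notes, [name]: note },
  });
}

// Runs the stages left to right, threading the context through each one.
function runPipeline(question: string, stages: Stage[]): QuestionContext {
  const initial: QuestionContext = { question, visited: [], notes: {} };
  return stages.reduce((ctx, stage) => stage(ctx), initial);
}

const stages = STAGES.map((name) => makeStage(name, `handled by ${name}`));
const result = runPipeline("What is a binary search tree?", stages);
console.log(result.visited.join(" → "));
```

Keeping each box as a function from context to context means a stage can be swapped or stubbed in tests without touching its neighbours.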

Tech stack

  • Edge — Cloudflare Workers + TanStack Start for the SSR app.
  • Classifier — Claude Haiku via a dedicated worker.
  • Memory — Postgres + pgvector for user profile and recall.
  • LLM — GPT-4o-mini for answers, GPT-4o for hard cases.
  • Video — YouTube MCP.
  • Eval — nightly DAG, judge prompt on a larger model.
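
The split between GPT-4o-mini and GPT-4o implies a routing decision somewhere between the classifier and the LLM call. A minimal sketch of that decision follows, assuming the classifier emits a two-level difficulty label; the `Difficulty` type and `pickModel` function are hypothetical, and how "hard" is actually determined is not covered here.

```typescript
// Assumed classifier output: a coarse difficulty label.
type Difficulty = "easy" | "hard";

interface Classification {
  difficulty: Difficulty;
}

// Model identifiers as exposed by the OpenAI API.
const MODEL_BY_DIFFICULTY: Record<Difficulty, string> = {
  easy: "gpt-4o-mini", // default path for answers
  hard: "gpt-4o", // escalation path for hard cases
};

// Picks the model for a request from the classifier's label.
function pickModel(classification: Classification): string {
  return MODEL_BY_DIFFICULTY[classification.difficulty];
}

console.log(pickModel({ difficulty: "easy" }));
console.log(pickModel({ difficulty: "hard" }));
```

A lookup table keeps the routing rule in one place, so adding a third tier later is a one-line change.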

Multi-tenancy

Every record carries a tenant_id. Institutions get their own row-level-secured slice, custom system prompts, and per-tenant analytics.
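
One common way to get a row-level-secured slice in Postgres is a policy keyed on a per-transaction setting. The sketch below shows that pattern under stated assumptions: the `messages` table, the `app.tenant_id` setting name, the `uuid` column type, and the helper names are all hypothetical, and the default prompt is a placeholder.

```typescript
// One-time setup, shown as SQL text. Table and setting names are assumptions.
// Note: table owners and superusers bypass row-level security by default,
// so the application should connect as a separate, non-owner role.
const RLS_SETUP_SQL = `
ALTER TABLE messages ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON messages
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
`;

// A parameterized query in the { text, values } shape most Postgres clients accept.
interface ParamQuery {
  text: string;
  values: string[];
}

// Builds the statement that binds the tenant for the current transaction.
// set_config's third argument (is_local = true) limits it to that transaction.
function bindTenant(tenantId: string): ParamQuery {
  return {
    text: "SELECT set_config('app.tenant_id', $1, true)",
    values: [tenantId],
  };
}

// Hypothetical per-tenant configuration with an optional custom system prompt.
interface TenantConfig {
  tenantId: string;
  systemPrompt?: string;
}

const DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant.";

// Falls back to the default prompt when the tenant has not set one.
function systemPromptFor(config: TenantConfig): string {
  return config.systemPrompt ?? DEFAULT_SYSTEM_PROMPT;
}

console.log(bindTenant("00000000-0000-0000-0000-000000000001"));
console.log(systemPromptFor({ tenantId: "t1" }));
```

With the policy in place, application queries do not need their own `WHERE tenant_id = ...` clause; the database filters rows for whichever tenant was bound at the start of the transaction.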

Observability

Every request emits a trace with one span per box in the path above. End-to-end p95 latency is currently 2.8 s.
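
For readers unfamiliar with how a p95 figure falls out of trace data, here is a minimal sketch. The `Span` and `Trace` shapes are hypothetical, the sample numbers are made up for illustration, and the nearest-rank percentile is one of several common definitions, not necessarily the one our tracing backend uses.

```typescript
// Hypothetical span and trace shapes; timestamps are in milliseconds.
interface Span {
  name: string;
  startMs: number;
  endMs: number;
}

interface Trace {
  traceId: string;
  spans: Span[];
}

// End-to-end duration: earliest span start to latest span end.
function traceDurationMs(trace: Trace): number {
  const start = Math.min(...trace.spans.map((s) => s.startMs));
  const end = Math.max(...trace.spans.map((s) => s.endMs));
  return end - start;
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up example trace with two of the boxes from the path.
const exampleTrace: Trace = {
  traceId: "example",
  spans: [
    { name: "classifier", startMs: 0, endMs: 120 },
    { name: "llm", startMs: 120, endMs: 2100 },
  ],
};

// Made-up durations: 100 ms, 200 ms, ..., 2000 ms.
const sampleDurations = Array.from({ length: 20 }, (_, i) => (i + 1) * 100);

console.log(traceDurationMs(exampleTrace));
console.log(percentile(sampleDurations, 95));
```

Because each span carries its own start and end, the same data can also give a per-box p95, which is usually the first thing to check when the end-to-end number moves.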
