diff --git a/README.md b/README.md
new file mode 100644
index 0000000..164794c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,713 @@
+# Adiuva Cloud API
+
+**AI-powered project management backend with E2E encrypted cloud storage, LLM orchestration, and a plugin marketplace.**
+
+Built with FastAPI · Python 3.12 · PostgreSQL · LangChain · Stripe · AWS S3
+
+---
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Architecture](#architecture)
+- [Key Features](#key-features)
+- [Tech Stack](#tech-stack)
+- [Getting Started](#getting-started)
+- [Docker Deployment](#docker-deployment)
+- [Environment Variables](#environment-variables)
+- [API Reference](#api-reference)
+- [Data Model](#data-model)
+- [AI Agent System](#ai-agent-system)
+- [Orchestration & Execution Plans](#orchestration--execution-plans)
+- [Middleware](#middleware)
+- [Storage Layer](#storage-layer)
+- [Billing & Tiers](#billing--tiers)
+- [Plugin Marketplace](#plugin-marketplace)
+- [Testing](#testing)
+- [Project Structure](#project-structure)
+- [License](#license)
+
+---
+
+## Overview
+
+Adiuva Cloud API is the FastAPI backend that powers the **Adiuva Electron desktop app**. It provides LLM-powered chat orchestration, end-to-end encrypted cloud storage, a vector search engine, an encrypted backup system, a plugin marketplace with revenue sharing, and Stripe-based subscription billing across four tiers.
+
+### Design Principles
+
+1. **Never persist user data in plaintext** — the database stores only auth, billing, storage metadata, and marketplace data. All user content is E2E encrypted by the client before reaching the server.
+2. **Never expose prompts** — system prompts stay server-side; responses are sanitized to strip any leaked prompt fragments.
+3. **Never decrypt user blobs** — the backend performs only checksum verification; no decryption keys ever reach the server.
+4. **Stateless request handling** — all context comes from the client and JWT; no server-side session state.
+5. **Tier gates enforced server-side** — the server always reads the current tier from the database, never trusting client-reported values.
+
+---
+
+## Architecture
+
+```
+┌──────────────┐      ┌────────────────────────────────────────────────────────┐
+│  Electron    │      │  FastAPI  (Uvicorn / Gunicorn)                         │
+│  Desktop App │────▶│                                                        │
+│  (Client)    │◀────│  Middleware: RateLimit → Sanitizer → CORS → Router     │
+└──────────────┘      │                                                        │
+                      │  ┌──────────────────┐  ┌────────────────────────────┐  │
+                      │  │  Auth Routes     │  │  Chat Routes               │  │
+                      │  │  Billing Routes  │  │    ↓                       │  │
+                      │  │  Storage Routes  │  │  Orchestrator (GPT-4o-mini)│  │
+                      │  │  Backup Routes   │  │    ↓ classify intent       │  │
+                      │  │  Plugin Routes   │  │  Agent Registry            │  │
+                      │  │  Vector Routes   │  │    ↓                       │  │
+                      │  │  Plans Routes    │  │  TaskAgent  | ProjectAgent │  │
+                      │  └──────────────────┘  │  NoteAgent  | CheckptAgent │  │
+                      │                        │  (GPT-4o + LangChain)      │  │
+                      │                        └────────────────────────────┘  │
+                      └────────────────────────────────────────────────────────┘
+                               │              │              │
+                      ┌────────▼───┐  ┌───────▼───────┐  ┌──▼─────────────┐
+                      │ PostgreSQL │  │  AWS S3       │  │ Pinecone /     │
+                      │ (Auth,     │  │  (E2E blobs,  │  │ Qdrant         │
+                      │  Billing,  │  │   backups)    │  │ (Vectors)      │
+                      │  Metadata) │  └───────────────┘  └────────────────┘
+                      └────────────┘
+                               │
+                      ┌────────▼───┐
+                      │  Stripe    │
+                      │  (Billing, │
+                      │   Connect) │
+                      └────────────┘
+```
+
+---
+
+## Key Features
+
+1. **LLM-powered orchestration** — GPT-4o-mini classifies user intent and routes to the appropriate domain agent.
+2. **4 specialized AI agents** — Tasks (8 tools), Projects (6 tools), Checkpoints (4 tools), Notes (5 tools), all powered by GPT-4o via LangChain.
+3. **Execution plans & playbooks** — Server-side prompt template registry; clients receive only opaque template IDs, never raw prompts.
+4. **E2E encrypted cloud storage** — The backend never decrypts user data; SHA-256 checksum verification uses constant-time comparison to prevent timing attacks.
+5. **Cloud vector store** — Pinecone or Qdrant with user-isolated namespaces and encrypted blob payloads.
+6. **Encrypted backup system** — Tiered storage limits with `If-Modified-Since` support for efficient syncing.
+7. **Plugin marketplace** — Catalog, admin review/approval workflow, security checklist, and 70/30 revenue sharing via Stripe Connect.
+8. **Stripe billing** — Four-tier subscription model (Free / Pro / Power / Team) with checkout sessions and full webhook lifecycle handling.
+9. **JWT authentication** — Access + refresh tokens with bcrypt password hashing, SHA-256 token hashing, and automatic rotation.
+10. **Prompt IP protection** — Sanitizer middleware strips system prompts, reasoning markers, tool schemas, and agent routing metadata from all chat responses.
+11. **Tier-based rate limiting** — Sliding-window per-user limiter scaling from 20 to 200 requests/min by subscription tier.
+12. **Zero-trust data model** — User content is never stored in plaintext; the database holds only authentication, billing, and metadata records.
+13. **WebSocket streaming** — Real-time chat with 30-second heartbeat keep-alive and chunked text delivery.
+14. **Alembic migrations** — Versioned schema management with seed data for the plugin marketplace.
+15. **Comprehensive test suite** — In-memory SQLite + moto S3 mocks, per-tier test fixtures, and full API coverage without external dependencies.
+
+---
+
+## Tech Stack
+
+| Package | Version | Purpose |
+|---|---|---|
+| `fastapi` | ≥ 0.115.0 | Web framework |
+| `uvicorn[standard]` | ≥ 0.34.0 | ASGI development server |
+| `gunicorn` | ≥ 22.0.0 | Production process manager |
+| `langchain` | ≥ 0.3.0 | LLM orchestration framework |
+| `langchain-openai` | ≥ 0.3.0 | OpenAI LLM provider integration |
+| `litellm` | ≥ 1.50.0 | Universal LLM gateway (100+ providers) |
+| `pydantic` | ≥ 2.10.0 | Data validation and serialization |
+| `pydantic-settings` | ≥ 2.7.0 | Environment-based configuration |
+| `python-jose[cryptography]` | ≥ 3.3.0 | JWT encoding and decoding |
+| `stripe` | ≥ 11.0.0 | Billing and payment integration |
+| `boto3` | ≥ 1.35.0 | AWS S3 client |
+| `slowapi` | ≥ 0.1.9 | Rate limiting utilities |
+| `sqlalchemy` | ≥ 2.0.0 | Async ORM and query builder |
+| `asyncpg` | ≥ 0.30.0 | PostgreSQL async driver |
+| `alembic` | ≥ 1.14.0 | Database migration management |
+| `bcrypt` | ≥ 4.2.0 | Password hashing |
+| `python-dotenv` | ≥ 1.0.0 | `.env` file loading |
+| `httpx` | ≥ 0.28.0 | Async HTTP client (used in tests) |
+| `websockets` | ≥ 14.0 | WebSocket protocol support |
+| `psycopg2-binary` | ≥ 2.9.0 | Synchronous PostgreSQL driver (Alembic) |
+| `pinecone` | ≥ 5.0.0 | Pinecone vector store client |
+| `qdrant-client` | ≥ 1.7.0 | Qdrant vector store client |
+| `pytest` | ≥ 8.0.0 | Test framework |
+| `pytest-asyncio` | ≥ 0.24.0 | Async test support |
+| `aiosqlite` | ≥ 0.20.0 | In-memory SQLite for tests |
+| `moto[s3]` | ≥ 5.0.0 | AWS S3 mock for tests |
+| `ruff` | ≥ 0.8.0 | Linter and formatter |
+
+---
+
+## Getting Started
+
+### Prerequisites
+
+- Python 3.12+
+- PostgreSQL 16+
+- An OpenAI API key (for LLM features)
+- Stripe API keys (optional — billing stubs gracefully when unconfigured)
+- AWS credentials (optional — needed for S3 storage in production)
+
+### Installation
+
+```bash
+# Clone the repository
+git clone <repo-url> && cd adiuva-api
+
+# Create a virtual environment
+python -m venv .venv && source .venv/bin/activate
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Configure environment
+cp .env.example .env
+# Edit .env with your DATABASE_URL, OPENAI_API_KEY, etc.
+```
+
+### Database Setup
+
+```bash
+# Start PostgreSQL (or use the Docker Compose database)
+docker compose up db -d
+
+# Run migrations
+alembic upgrade head
+```
+
+### Run the Development Server
+
+```bash
+uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
+```
+
+Interactive API docs are available at [http://localhost:8000/docs](http://localhost:8000/docs) in development mode (`ENV=dev`). The `/docs` endpoint is disabled in production.
+
+---
+
+## Docker Deployment
+
+### Quick Start
+
+```bash
+docker compose up --build
+```
+
+This starts two services:
+
+- **app** — FastAPI server on port `8000`
+- **db** — PostgreSQL 16 (Alpine) on port `5432` with a persistent volume and health checks
+
+### Dockerfile Details
+
+The Dockerfile uses a multi-stage build:
+
+1. **Builder stage** — Installs Python dependencies into a virtual environment.
+2. **Runtime stage** — Copies only the venv, app source, and Alembic migrations. Runs as a non-root user (`appuser`).
+3. **Production server** — Gunicorn with 4 Uvicorn workers, 120-second timeout, listening on port 8000.
+
+```bash
+# Production command (run by the container)
+gunicorn app.main:app -k uvicorn.workers.UvicornWorker -w 4 --timeout 120 -b 0.0.0.0:8000
+```
+
+---
+
+## Environment Variables
+
+All variables are loaded from a `.env` file via Pydantic Settings. Source: `app/config/settings.py`
+
+| Variable | Type | Default | Description |
+|---|---|---|---|
+| `DATABASE_URL` | `str` | `postgresql+asyncpg://postgres:postgres@localhost:5432/adiuva` | Async SQLAlchemy connection string |
+| `JWT_SECRET` | `str` | `change-me-in-production` | HMAC secret for JWT signing |
+| `JWT_ALGORITHM` | `str` | `HS256` | JWT signing algorithm |
+| `JWT_ACCESS_TOKEN_EXPIRE_MINUTES` | `int` | `30` | Access token time-to-live |
+| `JWT_REFRESH_TOKEN_EXPIRE_DAYS` | `int` | `30` | Refresh token time-to-live |
+| `STRIPE_SECRET_KEY` | `str` | `""` | Stripe API key (empty = stub mode) |
+| `STRIPE_WEBHOOK_SECRET` | `str` | `""` | Stripe webhook signature secret |
+| `S3_BUCKET` | `str` | `""` | S3 bucket for encrypted blobs and backups |
+| `S3_REGION` | `str` | `us-east-1` | AWS region |
+| `AWS_ACCESS_KEY_ID` | `str` | `""` | AWS credentials |
+| `AWS_SECRET_ACCESS_KEY` | `str` | `""` | AWS credentials |
+| `PINECONE_API_KEY` | `str` | `""` | Pinecone API key (if set, Pinecone is used for vectors) |
+| `PINECONE_INDEX` | `str` | `adiuva` | Pinecone index name |
+| `QDRANT_URL` | `str` | `""` | Qdrant URL (used when Pinecone is not configured) |
+| `QDRANT_API_KEY` | `str` | `""` | Qdrant API key |
+| `OPENAI_API_KEY` | `str` | `""` | OpenAI key for LLM agent calls |
+| `LLM_MODEL` | `str` | `gpt-4o` | LiteLLM model identifier for agents (e.g. `anthropic/claude-3.5-sonnet`, `gemini/gemini-pro`, `ollama/llama3`) |
+| `LLM_ROUTER_MODEL` | `str` | `gpt-4o-mini` | Lighter model used for intent classification / routing |
+| `CORS_ORIGINS` | `list[str]` | `["app://.", "http://localhost:3000", "http://localhost:5173"]` | Allowed CORS origins |
+| `ENV` | `Literal` | `dev` | `dev` or `prod` — controls `/docs` visibility and SQL echo |
+
+---
+
+## API Reference
+
+All routes are prefixed with `/api/v1`. **27 endpoints** total (25 REST + 1 WebSocket + 1 health check).
+
+### Health
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| `GET` | `/api/v1/health` | No | Returns `{"status": "ok", "version": "0.1.0"}` |
+
+### Auth
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| `POST` | `/api/v1/auth/register` | No | Create account with bcrypt-hashed password, returns `AuthTokens` |
+| `POST` | `/api/v1/auth/login` | No | Validate credentials, returns `AuthTokens` |
+| `POST` | `/api/v1/auth/refresh` | No | Rotate refresh token, returns new `AuthTokens` |
+| `GET` | `/api/v1/auth/me` | JWT | Returns `UserProfile` for the authenticated user |
+
+### Chat
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| `POST` | `/api/v1/chat` | JWT | Route message through the orchestrator; returns `ChatResponse` or `ExecutionPlan` depending on execution mode |
+| `WS` | `/api/v1/chat/stream` | JWT (query param `?token=`) | Streaming chat — first frame is a `ChatRequest`, server yields text chunks, final frame is `{"done": true, "response": "...", "actions": [...]}`. 30-second heartbeat ping. |
+
+### Plans
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| `GET` | `/api/v1/plans/playbook` | JWT | List all cached execution plan playbooks |
+| `GET` | `/api/v1/plans/playbook/{plan_id}` | JWT | Retrieve a specific playbook by ID |
+
+### Storage (Cloud Records)
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| `POST` | `/api/v1/storage/records` | JWT | Upload an E2E encrypted record (verifies checksum, enforces storage quota) |
+| `GET` | `/api/v1/storage/records` | JWT | List record metadata with pagination (`?table`, `?page`, `?limit`); no blob bytes returned |
+| `GET` | `/api/v1/storage/records/{id}` | JWT | Download encrypted blob with `X-Checksum` response header |
+| `PUT` | `/api/v1/storage/records/{id}` | JWT | Replace an existing blob (verifies checksum, enforces quota) |
+| `DELETE` | `/api/v1/storage/records/{id}` | JWT | Delete a record and its S3 blob |
+
+### Vectors (Cloud Vector Store)
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| `POST` | `/api/v1/storage/vectors/upsert` | JWT | Verify checksums and upsert encrypted vectors |
+| `POST` | `/api/v1/storage/vectors/search` | JWT | Search user-scoped vector namespace |
+| `DELETE` | `/api/v1/storage/vectors` | JWT | Delete vectors by ID list |
+
+### Backup
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| `PUT` | `/api/v1/backup` | JWT | Upload encrypted backup blob with custom headers (`X-Backup-Version`, `X-Backup-Timestamp`, `X-Backup-Checksum`). Tier quota enforced. |
+| `GET` | `/api/v1/backup` | JWT | Download latest backup blob. Supports `If-Modified-Since`. |
+| `GET` | `/api/v1/backup/history` | JWT | List backup metadata (no blob content) |
+| `DELETE` | `/api/v1/backup/{backup_id}` | JWT | Delete a specific backup |
+
+### Plugins (Marketplace)
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| `GET` | `/api/v1/plugins` | JWT (Power+) | Browse the marketplace (`?category`, `?q`, `?page`, `?sort=rating\|installs\|newest`) |
+| `GET` | `/api/v1/plugins/{id}` | JWT (Power+) | Plugin detail with install count and ratings |
+| `POST` | `/api/v1/plugins/{id}/install` | JWT (Power+) | Install plugin; triggers Stripe Connect revenue split for paid plugins |
+| `DELETE` | `/api/v1/plugins/{id}/install` | JWT | Uninstall plugin |
+
+### Billing
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| `POST` | `/api/v1/billing/checkout` | JWT | Create a Stripe checkout session, returns `{"checkout_url": "..."}` |
+| `POST` | `/api/v1/billing/webhook` | Stripe signature | Handle Stripe events: `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted`, `invoice.payment_failed` |
+| `GET` | `/api/v1/billing/subscription` | JWT | Get current subscription information |
+| `DELETE` | `/api/v1/billing/subscription` | JWT | Cancel subscription and revert to free tier |
+
+---
+
+## Data Model
+
+9 tables managed by Alembic migrations. Source: `app/models.py`
+
+### Tables
+
+| Table | Primary Key | Key Columns | Purpose |
+|---|---|---|---|
+| `users` | `id` (UUID) | `email` (unique), `password_hash`, `tier`, `stripe_customer_id`, timestamps | User accounts |
+| `refresh_tokens` | `id` (UUID) | `user_id` (FK), `token_hash` (SHA-256, unique), `expires_at` | Hashed refresh tokens for rotation |
+| `subscriptions` | `id` (UUID) | `user_id` (FK, unique), `stripe_subscription_id`, `tier`, `status`, `current_period_end` | Stripe subscription records |
+| `storage_records` | `id` (UUID) | `user_id` (FK), `table_name`, `s3_key`, `checksum`, `size_bytes`, timestamps | S3 blob metadata (no plaintext content) |
+| `backup_metadata` | `id` (UUID) | `user_id` (FK), `s3_key`, `version`, `timestamp`, `checksum`, `size_bytes` | Backup manifests |
+| `plugins` | `id` (String) | `name`, `description`, `version`, `author_id` (FK), `category`, `price_cents`, `permissions` (JSON), `status`, `s3_package_key`, `install_count`, `avg_rating` | Marketplace plugin catalog |
+| `plugin_installations` | `id` (UUID) | `plugin_id` (FK), `user_id` (FK), unique constraint on (`plugin_id`, `user_id`) | Per-user install tracking |
+| `plugin_reviews` | `id` (UUID) | `plugin_id` (FK), `reviewer_id` (FK), `decision`, `notes`, `reviewed_at` | Admin review decisions |
+| `revenue_events` | `id` (UUID) | `plugin_id` (FK), `user_id` (FK), `amount_cents`, `developer_share_cents`, `stripe_transfer_id` | 70/30 revenue split ledger |
+
+### Enum Types
+
+| Enum | Values |
+|---|---|
+| `billing_tier` | `free`, `pro`, `power`, `team` |
+| `plugin_status` | `pending_review`, `approved`, `rejected` |
+| `review_decision` | `approved`, `rejected` |
+
+### Migrations
+
+| Version | Description |
+|---|---|
+| `001_initial_schema` | Creates all 9 tables with indexes and foreign key constraints |
+| `002_seed_plugins` | Seeds 3 approved plugins: GitHub Sync (free), Slack Notifier (€4.99), Time Tracker (€9.99) |
+
+---
+
+## AI Agent System
+
+The agent system uses a registry pattern with LangChain tool-calling agents powered by GPT-4o. Source: `app/agents/`, `app/core/agent_registry.py`
+
+### Architecture
+
+- **`BaseAgent`** — Abstract base with `user_id`, `shared_memory`, and `vector_store_context`.
+- **`ChatAgent(BaseAgent)`** — Abstract `handle(query, context)` and `get_tools()` methods, plus a shared `_tool_loop(llm, messages, tools, max_iter=5)` for iterative tool calling.
+- **`AgentRegistry`** — Singleton registry with `@register` decorator, `get(name)`, `list_agents()`, and `call_agent(name, query, context)`.
+
+### Registered Agents
+
+| Agent | Registry Name | Tools | Description |
+|---|---|---|---|
+| **TaskAgent** | `task_agent` | 8 | Full task and comment CRUD. Status: `todo` / `in_progress` / `done`. Priority: `high` / `medium` / `low`. Tools: `list_tasks`, `create_task`, `update_task`, `delete_task`, `list_tasks_due_today`, `list_task_comments`, `add_task_comment`, `delete_task_comment` |
+| **ProjectAgent** | `project_agent` | 6 | Project lifecycle management. Status: `active` / `archived`. Prefers archiving over deletion. Tools: `list_projects`, `list_all_projects`, `get_project`, `create_project`, `update_project`, `delete_project` |
+| **CheckpointAgent** | `checkpoint_agent` | 4 | Project milestones. Requires `project_id` for creation. Supports AI-suggestion and approval workflows. Tools: `list_checkpoints`, `create_checkpoint`, `update_checkpoint`, `delete_checkpoint` |
+| **NoteAgent** | `note_agent` | 5 | Markdown note management. Optionally linked to projects. Tools: `list_notes`, `get_note`, `create_note`, `update_note`, `delete_note` |
+
+All agents use the model configured by `LLM_MODEL` (default: GPT-4o) with `temperature=0` via LiteLLM. Tools return JSON action descriptors that the Electron client interprets and applies locally.
+
+### Switching LLM Providers
+
+The backend uses **LiteLLM** as a universal LLM gateway. All agents and the orchestrator instantiate models through a centralized factory in `app/core/llm.py`. To switch providers, change environment variables — no code changes required:
+
+```bash
+# OpenAI (default)
+LLM_MODEL=gpt-4o
+LLM_ROUTER_MODEL=gpt-4o-mini
+
+# Anthropic
+LLM_MODEL=anthropic/claude-3.5-sonnet
+LLM_ROUTER_MODEL=anthropic/claude-3-haiku
+
+# Google Gemini
+LLM_MODEL=gemini/gemini-pro
+LLM_ROUTER_MODEL=gemini/gemini-flash
+
+# Local Ollama
+LLM_MODEL=ollama/llama3
+LLM_ROUTER_MODEL=ollama/llama3
+
+# AWS Bedrock
+LLM_MODEL=bedrock/anthropic.claude-v2
+LLM_ROUTER_MODEL=bedrock/anthropic.claude-instant-v1
+```
+
+See the [LiteLLM provider docs](https://docs.litellm.ai/docs/providers) for the full list of 100+ supported providers and model naming conventions.
+
+---
+
+## Orchestration & Execution Plans
+
+Source: `app/core/orchestrator.py`, `app/core/execution_plan.py`
+
+### Orchestrator
+
+1. **`classify_intent(message, context, registry)`** — Uses the router model (`LLM_ROUTER_MODEL`, default: GPT-4o-mini) to determine which agent should handle a message. Falls back to `task_agent` when classification is ambiguous.
+2. **`route_single(agent_name, message, context)`** — Routes to a single agent and returns a `ChatResponse`.
+3. **`route_pipeline(agent_names, message, context)`** — Executes agents sequentially; each receives `previous_results` from earlier agents. A final LLM synthesis step merges all results.
+4. **`orchestrate(request)`** — Main entry point. In `direct` mode, returns a `ChatResponse`. In `plan` mode, returns an `ExecutionPlan`.
+5. **`orchestrate_stream(request)`** — Streaming variant that yields 50-character text chunks with a final JSON frame.
+
+### Execution Plans
+
+- **`PromptTemplateRegistry`** — Maps template IDs to server-side prompt text. Clients only ever see opaque IDs, never raw prompts.
+- **`ExecutionPlanBuilder`** — Fluent builder API: `add_step()`, `add_llm_step(template_id, vars)`, `add_data_step(action, data_from_step)`. Validates step references on `build()`.
+- **`PlanCache`** — LRU cache (maxsize 1000) for storing plans as reusable playbooks.
+
+### Built-in Templates (6)
+
+`tpl_task_agent_default`, `tpl_checkpoint_agent_default`, `tpl_project_agent_default`, `tpl_note_agent_default`, `tpl_task_extract_from_project`, `tpl_note_weekly_summary`
+
+### Built-in Playbooks (2)
+
+| Playbook | Description |
+|---|---|
+| `create_tasks_from_project` | LLM extracts actionable tasks from project context, then creates task records |
+| `generate_weekly_note` | LLM generates a weekly summary, then creates a note record |
+
+---
+
+## Middleware
+
+Middleware executes in this order on each request: **TierRateLimit → Sanitizer → CORS → Router**
+
+### JWT Authentication
+
+Source: `app/api/middleware/auth.py`
+
+- FastAPI dependency `get_current_user` validates the `Bearer` JWT and extracts `user_id` and `email`.
+- **Live tier lookup** — The current tier is fetched from the `subscriptions` table on every request (not cached in the JWT), so upgrades and downgrades take immediate effect.
+- Falls back to `free` when no subscription row exists.
+- Raises `401 Unauthorized` on invalid or expired tokens.
+- **Exempt paths:** `/api/v1/auth/register`, `/api/v1/auth/login`, `/api/v1/billing/webhook`
+
+### Tier-Based Rate Limiter
+
+Source: `app/api/middleware/rate_limit.py`
+
+- `TierRateLimitMiddleware` — Sliding-window in-process rate limiter (no Redis dependency).
+- Per-user 60-second window sized by subscription tier:
+
+| Tier | Requests / Minute |
+|---|---|
+| Free | 20 |
+| Pro | 60 |
+| Power | 120 |
+| Team | 200 |
+
+- Returns `429 Too Many Requests` with a `Retry-After` header when the limit is exceeded.
+- **Exempt paths:** register, login, webhook, health
+
+### Response Sanitizer
+
+Source: `app/api/middleware/sanitizer.py`
+
+- Runs only on `/api/v1/chat` endpoints.
+- Scans JSON response bodies and replaces leaked prompt IP fragments with `[REDACTED]`.
+- Detects: system prompt openers, agent routing metadata, LangChain tool schemas, internal reasoning markers (`<thinking>`, `[INST]`), and known prompt fingerprints.
+- Logs sanitization events as `WARNING`.
+- Binary responses (storage, backup) are never touched.
+
+---
+
+## Storage Layer
+
+### Blob Store
+
+Source: `app/storage/blob_store.py`
+
+- S3-backed storage for E2E encrypted blobs.
+- Object keys follow the pattern: `{user_id}/{table}/{record_id}`
+- Server-side SSE-S3 encryption at rest (additional layer on top of client-side E2E encryption).
+- Methods: `upload()`, `download()`, `delete()` (idempotent), `list_keys()`
+- The backend **never inspects or decrypts blob content**.
+
+### Vector Store
+
+Source: `app/storage/vector_store.py`
+
+- Runtime-configurable: **Pinecone** (when `PINECONE_API_KEY` is set) or **Qdrant** (fallback).
+- User isolation: Pinecone uses `namespace=user_id`; Qdrant filters by `user_id` payload field.
+- 32-dimensional SHA-256-derived float vectors (deterministic, not semantically meaningful on encrypted data — a documented trade-off for privacy).
+- Encrypted blobs are stored as base64 in metadata/payload for verbatim retrieval.
+- Methods: `upsert()`, `search()`, `delete()`
+
+### Encryption Utilities
+
+Source: `app/storage/encryption.py`
+
+- `verify_checksum(blob, checksum)` — SHA-256 hash comparison using `hmac.compare_digest` (constant-time to prevent timing attacks).
+- `reject_if_tampered(blob, checksum)` — Raises HTTP 400 on checksum mismatch.
+- **No decryption key ever reaches the backend.**
+
+---
+
+## Billing & Tiers
+
+Source: `app/billing/stripe_service.py`, `app/billing/tier_manager.py`
+
+### Feature Matrix
+
+| Feature | Free | Pro | Power | Team |
+|---|---|---|---|---|
+| AI Agents | 3 | Unlimited | Unlimited | Unlimited |
+| Batch Active | 2 | 10 | Unlimited | Unlimited |
+| Cloud Storage | 0 GB | 5 GB | 25 GB | Unlimited |
+| Backup Storage | 0 GB | 5 GB | 25 GB | Unlimited |
+| LLM Providers | 1 | Unlimited | Unlimited | Unlimited |
+| Batch Builder | — | — | ✓ | ✓ |
+| Plugin Marketplace | — | — | ✓ | ✓ |
+| SSO | — | — | — | ✓ |
+| Rate Limit | 20 req/min | 60 req/min | 120 req/min | 200 req/min |
+
+### Stripe Integration
+
+- **Checkout** — `create_checkout_session(user_id, tier)` creates a Stripe Checkout session. Returns a stub URL when Stripe is not configured.
+- **Webhooks** — Handles `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted`, and `invoice.payment_failed`.
+- **Subscription management** — `get_subscription()` returns the current subscription record; `cancel_subscription()` cancels via the Stripe API and reverts the user to the free tier.
+- **Price IDs:** `price_pro_monthly`, `price_power_monthly`, `price_team_monthly`
+
+### Tier Manager
+
+- `get_tier(user_id)` — Returns the user's current billing tier.
+- `check_feature(tier, feature)` — Boolean feature gate check.
+- `require_feature(tier, feature)` — Raises HTTP 403 if the feature is not available.
+- `enforce_quota(user_id, tier)` / `enforce_backup_quota(user_id, tier)` — Raises HTTP 402 if storage limits are exceeded.
+
+---
+
+## Plugin Marketplace
+
+Source: `app/marketplace/`
+
+### Plugin Registry
+
+- PostgreSQL-backed catalog of submitted and approved plugins.
+- `list_plugins(db, category, query, page, sort)` — Paginated listing (page size: 20) with optional filtering by category, text search, and sorting by `rating`, `installs`, or `newest`.
+- `get_plugin(db, plugin_id)` — Full manifest with install count and ratings.
+- `submit_plugin(db, manifest, s3_key)` — Submits a plugin with `pending_review` status.
+- `approve_plugin()` / `reject_plugin(reason)` — Admin workflow for plugin approval.
+- `record_install()` / `record_uninstall()` — Tracks per-user installations and updates install counts.
+
+### Review Queue
+
+- Automated security checklist before human review:
+  - Plugin ID must match `^[a-z0-9-]+$`
+  - Permissions must be from the allowed set only
+  - No binary blobs in the manifest
+- **Allowed permissions:** `read:tasks`, `write:tasks`, `read:projects`, `write:projects`, `read:notes`, `write:notes`, `read:checkpoints`, `write:checkpoints`, `read:calendar`, `write:calendar`
+- `get_pending(db)` — Lists plugins awaiting review.
+- `submit_review(db, plugin_id, reviewer_id, decision, notes)` — Records the review decision.
+
+### Revenue Sharing
+
+- **70% developer / 30% platform** split on all paid plugin sales.
+- `record_install(db, plugin_id, user_id, amount_cents)` — Records the revenue event and triggers a Stripe Connect transfer for the developer share.
+- `get_earnings(db, developer_id, period)` — Aggregated earnings report for plugin developers.
+- Gracefully stubs transfers when Stripe is not configured.
+
+### Seed Plugins
+
+| Plugin | Category | Price |
+|---|---|---|
+| GitHub Sync | Productivity | Free |
+| Slack Notifier | Communication | €4.99 |
+| Time Tracker | Productivity | €9.99 |
+
+---
+
+## Testing
+
+### Running Tests
+
+```bash
+# Run all tests
+pytest
+
+# Run a specific test file
+pytest tests/test_auth.py
+
+# Run with verbose output
+pytest -v
+```
+
+### Test Infrastructure
+
+- **Database:** Async SQLite in-memory via `aiosqlite` + `StaticPool` — fast, no PostgreSQL needed.
+- **S3 mock:** `moto[s3]` with a fixture that patches `BlobStore` settings.
+- **Auth helpers:** `make_jwt(tier)` and `auth_header(tier)` generate per-tier test tokens.
+- **Seed data:** Auto-creates one `User` + `Subscription` per tier (free/pro/power/team) before each test.
+- **Plugin seeds:** Fixture adds 3 approved plugins for marketplace tests.
+- **FK enforcement:** SQLite `PRAGMA foreign_keys=ON`.
+- **No external dependencies** — all tests run fully offline.
+
+### Test Coverage
+
+| File | Coverage |
+|---|---|
+| `test_auth.py` | Register, login, token access, refresh, expiration |
+| `test_orchestrator.py` | Intent classification, single agent routing, pipeline, plan mode |
+| `test_agents.py` | Each agent with mocked LLM: registration, tools, handle method |
+| `test_storage.py` | Create, list, download, update, delete records; checksum rejection; quota enforcement |
+| `test_backup.py` | Upload, download, history, delete; tier-based storage limits |
+| `test_plugins.py` | List, install, uninstall, revenue events, tier gate enforcement |
+| `test_agent_registry.py` | Registry singleton, registration, lookup, listing |
+| `test_execution_plan.py` | Plan builder, template registry, plan cache |
+| `test_middleware.py` | Rate limiting by tier, sanitizer prompt leak detection |
+
+---
+
+## Project Structure
+
+```
+adiuva-api/
+├── alembic.ini                  # Alembic configuration
+├── BACKEND_PLAN.md              # Architecture & design decisions
+├── docker-compose.yml           # Docker Compose (app + PostgreSQL)
+├── Dockerfile                   # Multi-stage production build
+├── requirements.txt             # Python dependencies
+│
+├── alembic/                     # Database migrations
+│   ├── env.py                   # Alembic environment config
+│   ├── script.py.mako           # Migration template
+│   └── versions/
+│       ├── 001_initial_schema.py    # Tables, indexes, FKs
+│       └── 002_seed_plugins.py      # Seed marketplace plugins
+│
+├── app/                         # Application source
+│   ├── main.py                  # FastAPI app factory, middleware, routes
+│   ├── db.py                    # Async SQLAlchemy engine & session
+│   ├── models.py                # SQLAlchemy ORM models (9 tables)
+│   ├── schemas.py               # Pydantic request/response schemas
+│   │
+│   ├── config/
+│   │   └── settings.py          # Pydantic Settings (env vars)
+│   │
+│   ├── agents/                  # LLM-powered domain agents
+│   │   ├── task_agent.py        # Task & comment CRUD (8 tools)
+│   │   ├── project_agent.py     # Project lifecycle (6 tools)
+│   │   ├── checkpoint_agent.py  # Milestones (4 tools)
+│   │   └── note_agent.py        # Markdown notes (5 tools)
+│   │
+│   ├── core/                    # Orchestration engine
+│   │   ├── agent_registry.py    # BaseAgent, ChatAgent, AgentRegistry
+│   │   ├── llm.py               # LiteLLM factory (get_llm, get_router_llm)
+│   │   ├── orchestrator.py      # Intent classification & routing
+│   │   └── execution_plan.py    # Plan builder, templates, cache
+│   │
+│   ├── api/                     # HTTP layer
+│   │   ├── deps.py              # Shared FastAPI dependencies
+│   │   ├── middleware/
+│   │   │   ├── auth.py          # JWT validation, live tier lookup
+│   │   │   ├── rate_limit.py    # Sliding-window tier rate limiter
+│   │   │   └── sanitizer.py     # Prompt IP leak protection
+│   │   └── routes/
+│   │       ├── auth.py          # Register, login, refresh, me
+│   │       ├── chat.py          # Chat + WebSocket streaming
+│   │       ├── plans.py         # Execution plan playbooks
+│   │       ├── storage.py       # E2E encrypted record CRUD
+│   │       ├── vectors.py       # Vector upsert, search, delete
+│   │       ├── backup.py        # Encrypted backup management
+│   │       ├── plugins.py       # Marketplace browse & install
+│   │       └── billing.py       # Stripe checkout & webhooks
+│   │
+│   ├── storage/                 # Storage backends
+│   │   ├── blob_store.py        # S3 blob storage
+│   │   ├── vector_store.py      # Pinecone / Qdrant vector store
+│   │   └── encryption.py        # Checksum verification utilities
+│   │
+│   ├── billing/                 # Subscription management
+│   │   ├── stripe_service.py    # Stripe API integration
+│   │   └── tier_manager.py      # Feature matrix & quota enforcement
+│   │
+│   └── marketplace/             # Plugin ecosystem
+│       ├── plugin_registry.py   # Catalog CRUD & search
+│       ├── plugin_review.py     # Security checklist & review queue
+│       └── revenue_share.py     # 70/30 split & Stripe Connect
+│
+└── tests/                       # Test suite
+    ├── conftest.py              # Fixtures: DB, S3, auth, seeds
+    ├── test_auth.py
+    ├── test_orchestrator.py
+    ├── test_agents.py
+    ├── test_storage.py
+    ├── test_backup.py
+    ├── test_plugins.py
+    ├── test_agent_registry.py
+    ├── test_execution_plan.py
+    └── test_middleware.py
+```
+
+---
+
+## License
+
+*To be determined.*
diff --git a/app/agents/checkpoint_agent.py b/app/agents/checkpoint_agent.py
index 9410aab..a42f865 100644
--- a/app/agents/checkpoint_agent.py
+++ b/app/agents/checkpoint_agent.py
@@ -7,10 +7,9 @@ from typing import Any
 
 from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_core.tools import tool
-from langchain_openai import ChatOpenAI
 
-from app.config.settings import settings
 from app.core.agent_registry import ChatAgent, registry
+from app.core.llm import get_llm
 
 _SYSTEM_PROMPT = (
     "You are a project checkpoint assistant. Checkpoints are milestone dates that\n"
@@ -112,7 +111,7 @@ class CheckpointAgent(ChatAgent):
         return [list_checkpoints, create_checkpoint, update_checkpoint, delete_checkpoint]
 
     async def handle(self, query: str, context: dict[str, Any]) -> str:
-        llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=settings.OPENAI_API_KEY)
+        llm = get_llm()
         messages = [
             SystemMessage(content=_SYSTEM_PROMPT),
             HumanMessage(
diff --git a/app/agents/note_agent.py b/app/agents/note_agent.py
index 65898cc..905820e 100644
--- a/app/agents/note_agent.py
+++ b/app/agents/note_agent.py
@@ -7,10 +7,9 @@ from typing import Any
 
 from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_core.tools import tool
-from langchain_openai import ChatOpenAI
 
-from app.config.settings import settings
 from app.core.agent_registry import ChatAgent, registry
+from app.core.llm import get_llm
 
 _SYSTEM_PROMPT = (
     "You are a note-taking assistant. You help users create, retrieve, update,\n"
@@ -113,7 +112,7 @@ class NoteAgent(ChatAgent):
         return [list_notes, get_note, create_note, update_note, delete_note]
 
     async def handle(self, query: str, context: dict[str, Any]) -> str:
-        llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=settings.OPENAI_API_KEY)
+        llm = get_llm()
         messages = [
             SystemMessage(content=_SYSTEM_PROMPT),
             HumanMessage(
diff --git a/app/agents/project_agent.py b/app/agents/project_agent.py
index 1054386..b8bc14f 100644
--- a/app/agents/project_agent.py
+++ b/app/agents/project_agent.py
@@ -7,10 +7,9 @@ from typing import Any
 
 from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_core.tools import tool
-from langchain_openai import ChatOpenAI
 
-from app.config.settings import settings
 from app.core.agent_registry import ChatAgent, registry
+from app.core.llm import get_llm
 
 _SYSTEM_PROMPT = (
     "You are a project management assistant. You help users create, find,\n"
@@ -148,7 +147,7 @@ class ProjectAgent(ChatAgent):
         ]
 
     async def handle(self, query: str, context: dict[str, Any]) -> str:
-        llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=settings.OPENAI_API_KEY)
+        llm = get_llm()
         messages = [
             SystemMessage(content=_SYSTEM_PROMPT),
             HumanMessage(
diff --git a/app/agents/task_agent.py b/app/agents/task_agent.py
index df1d3c0..07ac619 100644
--- a/app/agents/task_agent.py
+++ b/app/agents/task_agent.py
@@ -7,10 +7,9 @@ from typing import Any
 
 from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_core.tools import tool
-from langchain_openai import ChatOpenAI
 
-from app.config.settings import settings
 from app.core.agent_registry import ChatAgent, registry
+from app.core.llm import get_llm
 
 _SYSTEM_PROMPT = (
     "You are a task management assistant for a project workspace.\n"
@@ -219,7 +218,7 @@ class TaskAgent(ChatAgent):
         ]
 
     async def handle(self, query: str, context: dict[str, Any]) -> str:
-        llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=settings.OPENAI_API_KEY)
+        llm = get_llm()
         messages = [
             SystemMessage(content=_SYSTEM_PROMPT),
             HumanMessage(
diff --git a/app/config/settings.py b/app/config/settings.py
index c9d7042..ec522c2 100644
--- a/app/config/settings.py
+++ b/app/config/settings.py
@@ -24,6 +24,9 @@ class Settings(BaseSettings):
 
     OPENAI_API_KEY: str = ""
 
+    LLM_MODEL: str = "gpt-4o"
+    LLM_ROUTER_MODEL: str = "gpt-4o-mini"
+
     CORS_ORIGINS: list[str] = ["app://.", "http://localhost:3000", "http://localhost:5173"]
 
     ENV: Literal["dev", "prod"] = "dev"
diff --git a/app/core/llm.py b/app/core/llm.py
new file mode 100644
index 0000000..2787d00
--- /dev/null
+++ b/app/core/llm.py
@@ -0,0 +1,68 @@
+"""LLM factory — centralised model instantiation via LiteLLM.
+
+Every agent and the orchestrator call ``get_llm()`` or ``get_router_llm()``
+instead of directly constructing a provider-specific class.  The model string
+follows the `LiteLLM model naming convention
+<https://docs.litellm.ai/docs/providers>`_:
+
+* OpenAI:     ``gpt-4o``, ``gpt-4o-mini``
+* Anthropic:  ``anthropic/claude-3.5-sonnet``
+* Google:     ``gemini/gemini-pro``
+* Ollama:     ``ollama/llama3``
+* Bedrock:    ``bedrock/anthropic.claude-v2``
+
+Switch providers by changing **LLM_MODEL** / **LLM_ROUTER_MODEL** in ``.env``
+— no code changes required.
+"""
+
+from __future__ import annotations
+
+from langchain_openai import ChatOpenAI
+from litellm import get_supported_openai_params  # noqa: F401 – validates install
+
+from app.config.settings import settings
+
+
+def _api_key_for_model(model: str) -> str | None:
+    """Return the most appropriate API key for the given LiteLLM model string."""
+    if model.startswith("anthropic/"):
+        return getattr(settings, "ANTHROPIC_API_KEY", None) or None
+    if model.startswith("gemini/") or model.startswith("google/"):
+        return getattr(settings, "GOOGLE_API_KEY", None) or None
+    # Default: OpenAI-compatible (covers plain model names like "gpt-4o")
+    return settings.OPENAI_API_KEY or None
+
+
+def get_llm(
+    *,
+    model: str | None = None,
+    temperature: float = 0,
+) -> ChatOpenAI:
+    """Return a LangChain chat model backed by LiteLLM.
+
+    LiteLLM exposes an OpenAI-compatible API, so we use ``ChatOpenAI`` pointed
+    at the LiteLLM proxy endpoint.  In practice, ``litellm`` patches the
+    ``openai`` client transparently when the model string contains a provider
+    prefix (``anthropic/…``, ``gemini/…``, etc.).
+
+    Parameters
+    ----------
+    model:
+        LiteLLM model identifier. Defaults to ``settings.LLM_MODEL``.
+    temperature:
+        Sampling temperature.  ``0`` = deterministic.
+    """
+    model = model or settings.LLM_MODEL
+    return ChatOpenAI(
+        model=model,
+        temperature=temperature,
+        api_key=_api_key_for_model(model),
+    )
+
+
+def get_router_llm(
+    *,
+    temperature: float = 0,
+) -> ChatOpenAI:
+    """Return the lighter model used for intent classification / routing."""
+    return get_llm(model=settings.LLM_ROUTER_MODEL, temperature=temperature)
diff --git a/app/core/orchestrator.py b/app/core/orchestrator.py
index 77d7d9f..4b5afac 100644
--- a/app/core/orchestrator.py
+++ b/app/core/orchestrator.py
@@ -6,10 +6,9 @@ import json
 from typing import Any, AsyncGenerator
 
 from langchain_core.messages import HumanMessage, SystemMessage
-from langchain_openai import ChatOpenAI
 
-from app.config.settings import settings
 from app.core.agent_registry import AgentRegistry
+from app.core.llm import get_router_llm
 from app.core.agent_registry import registry as _default_registry
 from app.schemas import ChatRequest, ChatResponse, ExecutionPlan
 
@@ -29,8 +28,8 @@ _SYNTHESIZE_HUMAN = (
 )
 
 
-def _make_llm(model: str = "gpt-4o-mini") -> ChatOpenAI:
-    return ChatOpenAI(model=model, temperature=0, api_key=settings.OPENAI_API_KEY)
+def _make_llm():
+    return get_router_llm()
 
 
 async def classify_intent(
diff --git a/requirements.txt b/requirements.txt
index 8436567..b7409ab 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,6 +3,7 @@ uvicorn[standard]>=0.34.0
 gunicorn>=22.0.0
 langchain>=0.3.0
 langchain-openai>=0.3.0
+litellm>=1.50.0
 pydantic>=2.10.0
 pydantic-settings>=2.7.0
 python-jose[cryptography]>=3.3.0
diff --git a/tests/test_agents.py b/tests/test_agents.py
index ebbcf86..33c17b9 100644
--- a/tests/test_agents.py
+++ b/tests/test_agents.py
@@ -102,21 +102,21 @@ class TestTaskAgent:
 
     @pytest.mark.asyncio
     async def test_handle_returns_string(self) -> None:
-        with patch("app.agents.task_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.task_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("Task created.")
             result = await TaskAgent().handle("create a task", {})
         assert isinstance(result, str)
 
     @pytest.mark.asyncio
     async def test_handle_no_tool_calls(self) -> None:
-        with patch("app.agents.task_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.task_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("Here are your tasks.")
             result = await TaskAgent().handle("list my tasks", {})
         assert result == "Here are your tasks."
 
     @pytest.mark.asyncio
     async def test_handle_with_create_task_tool_call(self) -> None:
-        with patch("app.agents.task_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.task_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm_with_tool_call(
                 "create_task",
                 {"title": "Buy groceries", "priority": "low"},
@@ -127,7 +127,7 @@ class TestTaskAgent:
 
     @pytest.mark.asyncio
     async def test_handle_accepts_empty_context(self) -> None:
-        with patch("app.agents.task_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.task_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("Done.")
             result = await TaskAgent().handle("help", {})
         assert isinstance(result, str)
@@ -138,7 +138,7 @@ class TestTaskAgent:
             "user_profile": {"id": "u1", "tier": "pro"},
             "recent_tasks": [{"id": "t1", "title": "Old task"}],
         }
-        with patch("app.agents.task_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.task_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("Tasks listed.")
             result = await TaskAgent().handle("show tasks", context)
         assert isinstance(result, str)
@@ -273,14 +273,14 @@ class TestCheckpointAgent:
 
     @pytest.mark.asyncio
     async def test_handle_no_tool_calls(self) -> None:
-        with patch("app.agents.checkpoint_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.checkpoint_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("No checkpoints found.")
             result = await CheckpointAgent().handle("list checkpoints", {})
         assert result == "No checkpoints found."
 
     @pytest.mark.asyncio
     async def test_handle_with_create_tool_call(self) -> None:
-        with patch("app.agents.checkpoint_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.checkpoint_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm_with_tool_call(
                 "create_checkpoint",
                 {"project_id": "p1", "title": "MVP Launch", "date": 1700000000000},
@@ -291,7 +291,7 @@ class TestCheckpointAgent:
 
     @pytest.mark.asyncio
     async def test_handle_accepts_empty_context(self) -> None:
-        with patch("app.agents.checkpoint_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.checkpoint_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("Done.")
             result = await CheckpointAgent().handle("show milestones", {})
         assert isinstance(result, str)
@@ -397,14 +397,14 @@ class TestProjectAgent:
 
     @pytest.mark.asyncio
     async def test_handle_no_tool_calls(self) -> None:
-        with patch("app.agents.project_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.project_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("Project Alpha is active.")
             result = await ProjectAgent().handle("show my projects", {})
         assert result == "Project Alpha is active."
 
     @pytest.mark.asyncio
     async def test_handle_with_create_project_tool_call(self) -> None:
-        with patch("app.agents.project_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.project_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm_with_tool_call(
                 "create_project",
                 {"name": "Pippo"},
@@ -415,7 +415,7 @@ class TestProjectAgent:
 
     @pytest.mark.asyncio
     async def test_handle_accepts_empty_context(self) -> None:
-        with patch("app.agents.project_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.project_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("Done.")
             result = await ProjectAgent().handle("archive old project", {})
         assert isinstance(result, str)
@@ -515,14 +515,14 @@ class TestNoteAgent:
 
     @pytest.mark.asyncio
     async def test_handle_no_tool_calls(self) -> None:
-        with patch("app.agents.note_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.note_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("Note created.")
             result = await NoteAgent().handle("create a note", {})
         assert result == "Note created."
 
     @pytest.mark.asyncio
     async def test_handle_with_create_note_tool_call(self) -> None:
-        with patch("app.agents.note_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.note_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm_with_tool_call(
                 "create_note",
                 {"title": "Daily log", "content": "# Today\nAll good."},
@@ -533,7 +533,7 @@ class TestNoteAgent:
 
     @pytest.mark.asyncio
     async def test_handle_accepts_empty_context(self) -> None:
-        with patch("app.agents.note_agent.ChatOpenAI") as mock_cls:
+        with patch("app.agents.note_agent.get_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("Done.")
             result = await NoteAgent().handle("show notes", {})
         assert isinstance(result, str)
diff --git a/tests/test_orchestrator.py b/tests/test_orchestrator.py
index 4432e33..e157e13 100644
--- a/tests/test_orchestrator.py
+++ b/tests/test_orchestrator.py
@@ -87,21 +87,21 @@ def reg() -> AgentRegistry:
 class TestClassifyIntent:
     @pytest.mark.asyncio
     async def test_routes_to_known_agent(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             result = await classify_intent("add a task", {}, reg)
         assert result == "task_agent"
 
     @pytest.mark.asyncio
     async def test_routes_to_calendar_agent(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("calendar_agent")
             result = await classify_intent("schedule a meeting", {}, reg)
         assert result == "calendar_agent"
 
     @pytest.mark.asyncio
     async def test_falls_back_on_unknown_name(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("nonexistent_agent")
             result = await classify_intent("do something", {}, reg)
         assert result == "task_agent"
@@ -110,14 +110,14 @@ class TestClassifyIntent:
     async def test_empty_registry_returns_fallback_without_llm_call(self) -> None:
         empty_reg = AgentRegistry()
         # No LLM should be instantiated — early return path
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             result = await classify_intent("anything", {}, empty_reg)
             mock_cls.assert_not_called()
         assert result == "task_agent"
 
     @pytest.mark.asyncio
     async def test_whitespace_stripped_from_response(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("  task_agent  \n")
             result = await classify_intent("create task", {}, reg)
         assert result == "task_agent"
@@ -154,7 +154,7 @@ class TestRouteSingle:
 class TestRoutePipeline:
     @pytest.mark.asyncio
     async def test_returns_chat_response(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("synthesized result")
             result = await route_pipeline(
                 ["task_agent", "calendar_agent"], "plan my week", {}, reg
@@ -163,7 +163,7 @@ class TestRoutePipeline:
 
     @pytest.mark.asyncio
     async def test_response_is_synthesis_output(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("synthesized result")
             result = await route_pipeline(
                 ["task_agent", "calendar_agent"], "plan my week", {}, reg
@@ -193,7 +193,7 @@ class TestRoutePipeline:
 
         reg.register(_CapturingAgent)
 
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("done")
             await route_pipeline(["task_agent", "capture"], "hi", {}, reg)
 
@@ -204,7 +204,7 @@ class TestRoutePipeline:
 
     @pytest.mark.asyncio
     async def test_single_agent_pipeline(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("single result")
             result = await route_pipeline(["task_agent"], "one agent", {}, reg)
         assert result.response == "single result"
@@ -218,7 +218,7 @@ class TestOrchestrate:
     async def test_direct_mode_returns_chat_response(
         self, reg: AgentRegistry
     ) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             request = ChatRequest(message="add a task", execution_mode="direct")
             result = await orchestrate(request, reg)
@@ -226,7 +226,7 @@ class TestOrchestrate:
 
     @pytest.mark.asyncio
     async def test_direct_mode_response_content(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             request = ChatRequest(message="add a task", execution_mode="direct")
             result = await orchestrate(request, reg)
@@ -237,7 +237,7 @@ class TestOrchestrate:
     async def test_plan_mode_returns_execution_plan(
         self, reg: AgentRegistry
     ) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             request = ChatRequest(message="plan my tasks", execution_mode="plan")
             result = await orchestrate(request, reg)
@@ -247,7 +247,7 @@ class TestOrchestrate:
     async def test_plan_mode_agent_matches_classified(
         self, reg: AgentRegistry
     ) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("calendar_agent")
             request = ChatRequest(
                 message="schedule something", execution_mode="plan"
@@ -258,7 +258,7 @@ class TestOrchestrate:
 
     @pytest.mark.asyncio
     async def test_plan_mode_has_steps(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             request = ChatRequest(message="plan tasks", execution_mode="plan")
             result = await orchestrate(request, reg)
@@ -269,7 +269,7 @@ class TestOrchestrate:
     async def test_plan_mode_template_id_contains_agent_name(
         self, reg: AgentRegistry
     ) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             request = ChatRequest(message="plan tasks", execution_mode="plan")
             result = await orchestrate(request, reg)
@@ -281,7 +281,7 @@ class TestOrchestrate:
     async def test_default_execution_mode_is_direct(
         self, reg: AgentRegistry
     ) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             # execution_mode defaults to "direct"
             request = ChatRequest(message="help me")
@@ -295,7 +295,7 @@ class TestOrchestrate:
 class TestOrchestrateStream:
     @pytest.mark.asyncio
     async def test_yields_at_least_one_chunk(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             request = ChatRequest(message="add a task", execution_mode="direct")
             chunks = [chunk async for chunk in orchestrate_stream(request, reg)]
@@ -305,7 +305,7 @@ class TestOrchestrateStream:
     async def test_last_chunk_is_final_json_frame(
         self, reg: AgentRegistry
     ) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             request = ChatRequest(message="add a task", execution_mode="direct")
             chunks = [chunk async for chunk in orchestrate_stream(request, reg)]
@@ -319,7 +319,7 @@ class TestOrchestrateStream:
     async def test_final_frame_response_matches_agent_output(
         self, reg: AgentRegistry
     ) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             request = ChatRequest(message="create a task", execution_mode="direct")
             chunks = [chunk async for chunk in orchestrate_stream(request, reg)]
@@ -331,7 +331,7 @@ class TestOrchestrateStream:
     async def test_text_chunks_before_final_frame(
         self, reg: AgentRegistry
     ) -> None:
-        with patch("app.core.orchestrator.ChatOpenAI") as mock_cls:
+        with patch("app.core.orchestrator._make_llm") as mock_cls:
             mock_cls.return_value = _mock_llm("task_agent")
             request = ChatRequest(
                 message="x" * 200, execution_mode="direct"