fix(langfuse): remove invalid user_id/session_id kwargs from start_as_current_observation

Langfuse V3 does not accept user_id/session_id on observation-level calls.
Moved to metadata dict in agent_runner, deep_agent, and agent_setup.

refactor(tests): fixture-based pattern for agent_runner_v2 eval tests

- cases.yaml + data/ fixtures under tests/fixtures/agent_runner_v2/
- pytest_generate_tests parametrizes test_eval_runner from YAML
- _resolve_projects() handles symbolic names and inline dicts
- _evaluate_case() centralizes all assertion logic
- --runner-dir CLI option for custom fixture folders

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
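
Note: a minimal sketch of the kwargs change described in the fix above, under stated assumptions. `build_observation_kwargs` is a hypothetical helper, not code from this changeset, and the `user_id` / `session_id` metadata key names are assumed. It only shows the shape of the move: the two identifiers stop being top-level keyword arguments and travel inside the `metadata` dict instead.

```python
from typing import Any, Dict, Optional


def build_observation_kwargs(
    name: str,
    user_id: Optional[str] = None,
    session_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: fold user_id/session_id into the metadata dict.

    The returned dict has no top-level user_id/session_id keys, so it can be
    splatted into an observation-level call that rejects those kwargs.
    """
    merged: Dict[str, Any] = dict(metadata or {})  # copy; never mutate the caller's dict
    if user_id is not None:
        merged["user_id"] = user_id  # assumed key name
    if session_id is not None:
        merged["session_id"] = session_id  # assumed key name
    return {"name": name, "metadata": merged}


kwargs = build_observation_kwargs(
    "agent_run", user_id="u1", session_id="s1", metadata={"env": "dev"}
)
# Assumed call site, with `langfuse` being an already configured client:
#   with langfuse.start_as_current_observation(**kwargs):
#       ...
```

Copying the incoming `metadata` keeps the helper free of side effects when one dict is reused across several observations.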
feat(local-agent-v2): step 4 — journey produces structured AgentConfig JSON
2026-04-08 00:45:15 +02:00 · 2026-04-08 00:23:58 +02:00 · 2026-04-07 23:04:24 +02:00 · 2026-04-07 16:49:26 +02:00 · 2026-04-07 15:00:32 +02:00 · 2026-04-07 14:26:33 +02:00
90 changed files with 12638 additions and 3107 deletions
--- a/.env.example
+++ b/.env.example
@@ -39,6 +39,13 @@ QDRANT_URL=
 QDRANT_API_KEY=
 # For local Qdrant (homelab): QDRANT_URL=http://qdrant:6333

+# ── Langfuse (leave empty to disable observability) ───────────────────────────
+LANGFUSE_SECRET_KEY=
+LANGFUSE_PUBLIC_KEY=
+# LANGFUSE_HOST=https://cloud.langfuse.com        # EU (default)
+# LANGFUSE_HOST=https://us.cloud.langfuse.com     # US
+# LANGFUSE_HOST=http://localhost:3000             # Self-hosted
+
 # ── CORS ──────────────────────────────────────────────────────────────────────
 # Comma-separated list parsed by Settings (override default if needed)
 # CORS_ORIGINS=["app://.","http://localhost:3000"]
--- a/.gitignore
+++ b/.gitignore
@@ -31,3 +31,4 @@ Thumbs.db

 # Claude Code
 .claude/
+logs/
--- a/BACKEND_PLAN.md
+++ b/BACKEND_PLAN.md
@@ -1,533 +0,0 @@
-# Backend Plan — Adiuva Cloud API
-
-> **Separate repository.** This document defines the FastAPI backend that the Electron app communicates with.
->
-> The backend owns: orchestration logic, chat agent intelligence, prompt IP, auth, billing, E2E backup blob storage, cloud storage (encrypted blobs), cloud vector store, and plugin marketplace.
-> The backend NEVER persists user data in plaintext. Cloud storage blobs are E2E encrypted before upload — the backend only verifies integrity, never decrypts.
-
---
-
-## Project Structure
-
-```
-adiuva-api/
-├── app/
-│   ├── __init__.py
-│   ├── main.py                    # FastAPI entry + CORS + lifespan + router includes
-│   ├── core/
-│   │   ├── __init__.py
-│   │   ├── agent_registry.py      # Base classes + singleton registry
-│   │   ├── orchestrator.py        # LLM-based intent router
-│   │   ├── execution_plan.py      # Plan builder + cache
-│   │   └── plugin_loader.py       # Dynamic agent loading
-│   ├── agents/                    # Chat agents (proprietary logic + prompts)
-│   │   ├── __init__.py            # Auto-registers all agents
-│   │   ├── task_agent.py
-│   │   ├── calendar_agent.py
-│   │   ├── email_agent.py
-│   │   └── analytics_agent.py
-│   ├── api/
-│   │   ├── __init__.py
-│   │   ├── routes/
-│   │   │   ├── __init__.py
-│   │   │   ├── chat.py            # POST /chat + WS /chat/stream
-│   │   │   ├── plans.py           # GET /plans/playbook
-│   │   │   ├── storage.py         # CRUD cloud storage (E2E encrypted blobs)
-│   │   │   ├── vectors.py         # Upsert/search cloud vector store
-│   │   │   ├── backup.py          # PUT/GET /backup
-│   │   │   ├── plugins.py         # Plugin marketplace
-│   │   │   ├── auth.py            # Register/login/refresh
-│   │   │   └── billing.py         # Checkout/webhook/subscription
-│   │   └── middleware/
-│   │       ├── __init__.py
-│   │       ├── auth.py            # JWT validation
-│   │       ├── rate_limit.py      # Tier-aware rate limiting
-│   │       └── sanitizer.py       # Strip prompt metadata from responses
-│   ├── storage/
-│   │   ├── __init__.py
-│   │   ├── blob_store.py          # S3 for E2E encrypted blobs
-│   │   ├── vector_store.py        # Cloud vector store (Pinecone/Qdrant)
-│   │   └── encryption.py          # Integrity verification only — NO decryption
-│   ├── marketplace/
-│   │   ├── __init__.py
-│   │   ├── plugin_registry.py     # Plugin catalog (metadata, versions, ratings)
-│   │   ├── plugin_review.py       # Review queue + approval workflow
-│   │   └── revenue_share.py       # 70/30 split tracking with Stripe Connect
-│   ├── billing/
-│   │   ├── __init__.py
-│   │   ├── stripe_service.py      # Stripe checkout + webhooks
-│   │   └── tier_manager.py        # Feature matrix per tier
-│   └── config/
-│       ├── __init__.py
-│       └── settings.py            # Pydantic BaseSettings (env-based)
-├── tests/
-│   ├── __init__.py
-│   ├── conftest.py                # Fixtures: test client, mock agents, mock LLM
-│   ├── test_orchestrator.py
-│   ├── test_agents.py
-│   ├── test_auth.py
-│   ├── test_backup.py
-│   ├── test_storage.py
-│   └── test_plugins.py
-├── alembic/                       # DB migrations (auth/billing/marketplace tables only)
-│   ├── alembic.ini
-│   └── versions/
-├── requirements.txt
-├── Dockerfile
-├── docker-compose.yml             # App + PostgreSQL + Redis (dev)
-├── .env.example
-└── README.md
-```
-
---
-
-## Step-by-Step Implementation
-
-### Step 1 — Project scaffolding ✅
- [x] Initialize repo with the directory structure above
- [x] Write `requirements.txt`:
-  ```
-  fastapi>=0.115.0
-  uvicorn[standard]>=0.34.0
-  langchain>=0.3.0
-  langchain-openai>=0.3.0
-  pydantic>=2.10.0
-  python-jose[cryptography]>=3.3.0
-  stripe>=11.0.0
-  boto3>=1.35.0
-  slowapi>=0.1.9
-  sqlalchemy>=2.0.0
-  asyncpg>=0.30.0
-  alembic>=1.14.0
-  bcrypt>=4.2.0
-  python-dotenv>=1.0.0
-  httpx>=0.28.0
-  websockets>=14.0
-  pytest>=8.0.0
-  pytest-asyncio>=0.24.0
-  ```
- [x] Write `app/main.py`: FastAPI app with CORS (allow `app://`, `http://localhost:*`), lifespan (init DB pool, init agent registry), include all routers under `/api/v1`
- [x] Write `app/config/settings.py`: `Settings(BaseSettings)` with fields: `DATABASE_URL`, `JWT_SECRET`, `JWT_ALGORITHM` (default HS256), `STRIPE_SECRET_KEY`, `STRIPE_WEBHOOK_SECRET`, `S3_BUCKET`, `S3_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `OPENAI_API_KEY`, `CORS_ORIGINS`, `ENV` (dev/prod), `PINECONE_API_KEY`, `PINECONE_INDEX`, `QDRANT_URL`, `QDRANT_API_KEY`
- [x] Write `Dockerfile`: Python 3.12 slim, multi-stage (builder + runtime), non-root user
- [x] Write `docker-compose.yml`: app, postgres:16, optional redis
- [x] Write `.env.example`
- **Outcome:** Runnable FastAPI skeleton (returns 404 on all routes).
-
-### Step 2 — Pydantic schemas (API contracts) ✅
- [x] Create `app/schemas.py` (mirrors `src/shared/api-types.ts` from Electron repo):
-  - `ChatRequest`: `message: str`, `context: ChatContext`, `execution_mode: Literal['direct', 'plan']`
-  - `ChatContext`: `user_profile: dict`, `relevant_documents: list[str]`, `recent_tasks: list[dict]`, `conversation_history: list[dict]`
-  - `ChatResponse`: `response: str`, `actions: list[PlanAction]`
-  - `PlanAction`: `type: Literal['create_record', 'update_record', 'delete_record', 'index_document', 'send_notification', 'call_agent']`, `table: str | None`, `data: dict | None`, `agent: str | None`
-  - `ExecutionPlan`: `agent: str`, `steps: list[PlanStep]`
-  - `PlanStep`: `action: str`, `prompt_template: str | None`, `variables: dict | None`, `data_from_step: int | None`
-  - `BackupMetadata`: `version: int`, `timestamp: int`, `checksum: str`, `chunk_count: int`
-  - `BillingTier`: `Literal['free', 'pro', 'power', 'team']`
-  - `AuthTokens`: `access_token: str`, `refresh_token: str`, `expires_at: int`
-  - `UserProfile`: `id: str`, `email: str`, `tier: BillingTier`
-  - `StorageRecord`: `id: str`, `user_id: str`, `table: str`, `blob: bytes`, `checksum: str`, `created_at: int`, `updated_at: int` — blob is always E2E encrypted by client
-  - `StorageRecordCreate`: `table: str`, `blob: bytes`, `checksum: str`
-  - `StorageRecordUpdate`: `blob: bytes`, `checksum: str`
-  - `VectorUpsertRequest`: `vectors: list[VectorItem]`
-  - `VectorItem`: `id: str`, `blob: bytes`, `checksum: str` — vector + metadata encrypted by client
-  - `VectorSearchRequest`: `query_blob: bytes`, `top_k: int = 10`
-  - `VectorSearchResponse`: `results: list[VectorSearchResult]`
-  - `VectorSearchResult`: `id: str`, `score: float`, `blob: bytes`
-  - `PluginManifest`: `id: str`, `name: str`, `description: str`, `version: str`, `author: str`, `permissions: list[str]`, `category: str`, `price_cents: int = 0`
-  - `PluginListResponse`: `plugins: list[PluginManifest]`, `total: int`, `page: int`
-  - `PluginInstallRequest`: `plugin_id: str`
- **Outcome:** All request/response models defined and validated.
-
-### Step 3 — Agent Registry + base classes ✅
- [x] `app/core/agent_registry.py`:
-  - `BaseAgent(ABC)`:
-    - `user_id: str`, `shared_memory: dict`, `vector_store_context: list[str]`, `skills: list[str]`
-    - Abstract `get_name() -> str`, `get_description() -> str`
-  - `ChatAgent(BaseAgent)`:
-    - Abstract `async handle(query: str, context: dict) -> str`
-    - Abstract `get_tools() -> list` (LangChain tool definitions)
-    - Concrete `_tool_loop(llm, messages, tools, max_iter=5) -> str` — shared tool-calling loop
-  - `AgentRegistry` (singleton):
-    - `_agents: dict[str, ChatAgent]`
-    - `register(agent_class)` — decorator pattern
-    - `get(name) -> ChatAgent`
-    - `list_agents() -> list[dict]` — returns `[{name, description}]` for orchestrator prompt
-    - `async call_agent(name, query, context) -> str` — for inter-agent calls
- [x] Unit tests: register, get, list, call_agent with mock
- **Outcome:** Pluggable agent framework.
-
-### Step 4 — Orchestrator ✅
- [x] `app/core/orchestrator.py`:
-  - `async classify_intent(message, context, registry) -> str`:
-    - System prompt: "You are an intent classifier. Given the user message and context, decide which agent to route to. Available agents: {registry.list_agents()}. Respond with just the agent name."
-    - Uses gpt-4o-mini via LangChain for low latency
-    - Falls back to `task_agent` if no clear match
-  - `async route_single(agent_name, message, context) -> ChatResponse`:
-    - Instantiates agent from registry
-    - Calls `agent.handle(message, context)`
-    - Returns response + any actions the agent produced
-  - `async route_pipeline(agent_names, message, context) -> ChatResponse`:
-    - Executes agents in sequence
-    - Each agent receives `{...context, previous_results: [...]}`
-    - Final synthesis via LLM: "Summarize these agent results into a coherent response"
-  - `async orchestrate(request: ChatRequest) -> ChatResponse | ExecutionPlan`:
-    - Main entry point
-    - Context is transparent to orchestrator — data may originate from local or cloud storage on the client side
-    - Classifies intent
-    - If `execution_mode == 'direct'`: route + return response
-    - If `execution_mode == 'plan'`: route + return execution plan with template IDs
-  - `async orchestrate_stream(request: ChatRequest) -> AsyncGenerator[str, None]`:
-    - Same as orchestrate but yields tokens for WebSocket streaming
- [x] Integration tests with mocked LLM and mocked agents
- **Outcome:** Intelligent routing with single-agent and pipeline modes.
-
-### Step 5 — Execution Plan generator ✅
- [x] `app/core/execution_plan.py`:
-  - `PromptTemplateRegistry`: dict of `template_id -> prompt_text`. Templates are server-side only — client receives IDs.
-  - `ExecutionPlanBuilder`:
-    - `add_step(action, params) -> self`
-    - `add_llm_step(template_id, variables) -> self`
-    - `add_data_step(action, data_from_step) -> self`
-    - `build() -> ExecutionPlan` — validates step references
-  - `PlanCache`:
-    - In-memory LRU (maxsize=1000)
-    - `cache_plan(key, plan)`, `get_plan(key)`, `get_all_playbooks() -> list[ExecutionPlan]`
-    - Playbooks are pre-built plans for common operations (e.g., "create task from email", "generate weekly report")
- **Outcome:** Plans are cacheable as playbooks. Prompt IP never leaves the server.
-
-### Step 6 — Chat Agents ✅
- [x] `app/agents/task_agent.py` — `@registry.register`:
-  - Description: "Manages tasks and comments: list, create, update, delete, due-today, comments"
-  - Tools (8): `list_tasks(project_id, status, search, order_by)`, `create_task(title, description, status, priority, assignees, due_date, project_id, is_ai_suggested, is_approved)`, `update_task(task_id, ...)`, `delete_task(task_id)`, `list_tasks_due_today()`, `list_task_comments(task_id)`, `add_task_comment(task_id, author, content)`, `delete_task_comment(comment_id)`
-  - status: `todo|in_progress|done`; priority: `high|medium|low`; assignees: JSON-encoded string; due_date: ms timestamp
-  - Accepts flexible context; sentinel `-1` for optional integer update fields
- [x] `app/agents/checkpoint_agent.py` — `@registry.register`:
-  - Description: "Manages project checkpoints (milestones): list, create, update, delete"
-  - Tools (4): `list_checkpoints(project_id)`, `create_checkpoint(project_id, title, date, is_ai_suggested, is_approved)`, `update_checkpoint(checkpoint_id, ...)`, `delete_checkpoint(checkpoint_id)`
-  - `project_id` is required for create; date is a ms timestamp; supports AI-suggestion + approval workflow
- [x] `app/agents/project_agent.py` — `@registry.register`:
-  - Description: "Manages projects: list, get, create, update, archive, delete"
-  - Tools (6): `list_projects(client_id, include_archived)`, `list_all_projects()`, `get_project(project_id)`, `create_project(name, client_id)`, `update_project(project_id, ...)`, `delete_project(project_id)`
-  - status: `active|archived`; prefers archive over deletion (docstring guard on delete)
- [x] `app/agents/note_agent.py` — `@registry.register`:
-  - Description: "Manages notes: list, get, create, update, delete"
-  - Tools (5): `list_notes(project_id)`, `get_note(note_id)`, `create_note(title, content, project_id)`, `update_note(note_id, ...)`, `delete_note(note_id)`
-  - content is Markdown; `get_note` should be called before update to preserve existing content
- [x] `app/agents/__init__.py`: imports all four agent modules to trigger `@registry.register` decorators
- [x] Unit tests per agent with mocked LLM (registration, names, tool counts, handle(), direct tool invocation)
- **Outcome:** Four domain-specific agents matching the UI data model (Tasks, Checkpoints, Projects, Notes), all registered and tested.
-
-### Step 7 — Storage Layer ✅
- [x] `app/storage/blob_store.py`:
-  - `BlobStore`: `async upload`, `async download`, `async delete` (idempotent), `async list_keys`
-  - Keys: `{user_id}/{table}/{record_id}` — backend never inspects blob content
-  - boto3 S3 with SSE-S3 at-rest encryption; client checksum stored in S3 object metadata
- [x] `app/storage/vector_store.py`:
-  - `VectorStore`: `async upsert`, `async search`, `async delete`
-  - Pinecone (default, `namespace=user_id`) or Qdrant (`user_id` payload filter) — runtime-configurable
-  - 32-dim SHA-256-derived float vector; blob stored as base64 in metadata/payload
-  - ANN on encrypted data: known accuracy trade-off, documented
- [x] `app/storage/encryption.py`:
-  - `verify_checksum(blob, checksum) -> bool` — SHA-256 + `hmac.compare_digest` (constant-time)
-  - `reject_if_tampered(blob, checksum)` — raises `HTTP 400` on mismatch
-  - Backend NEVER holds decryption keys
- [x] `app/schemas.py`: added `StorageRecord*`, `VectorItem`, `VectorUpsertRequest`, `VectorSearch*`, `Plugin*` schemas
- [x] `app/config/settings.py`: added `PINECONE_API_KEY`, `PINECONE_INDEX`, `QDRANT_URL`, `QDRANT_API_KEY`
- [x] `requirements.txt`: added `moto[s3]`, `pinecone`, `qdrant-client`
- [x] 37 unit tests covering encryption, BlobStore (moto), VectorStore Pinecone, VectorStore Qdrant
- **Outcome:** Cloud storage layer that handles E2E encrypted blobs without ever accessing plaintext.
-
-### Step 8 — API Routes ✅
-
-#### 8a — Chat endpoint
- [x] `app/api/routes/chat.py`:
-  - `POST /api/v1/chat`:
-    - Request: `ChatRequest`
-    - Calls `orchestrate(request)` or `orchestrate()` + `build_plan()`
-    - Response: `ChatResponse` or `ExecutionPlan`
-  - `WebSocket /api/v1/chat/stream`:
-    - Client sends `ChatRequest` as first JSON frame
-    - Server yields token strings via `orchestrate_stream()`
-    - Final frame: JSON `ChatResponse` with `{"done": true, "response": "...", "actions": [...]}`
-    - Heartbeat ping every 30s to keep connection alive
-
-#### 8b — Plans endpoint
- [x] `app/api/routes/plans.py`:
-  - `GET /api/v1/plans/playbook`: Returns all playbooks available for the user's tier
-  - `GET /api/v1/plans/playbook/{plan_id}`: Returns a specific plan
-
-#### 8c — Storage endpoint (cloud records)
- [x] `app/api/routes/storage.py`:
-  - `POST /api/v1/storage/records`: Create encrypted record
-    - Request: `StorageRecordCreate`
-    - Verifies checksum, stores blob in S3, inserts metadata row in PostgreSQL
-    - Response: `{id: str, created_at: int}`
-  - `GET /api/v1/storage/records`: List record metadata (no blobs)
-    - Query params: `table: str`, `page: int`, `limit: int`
-    - Response: `list[{id, table, checksum, created_at, updated_at}]`
-  - `GET /api/v1/storage/records/{id}`: Download encrypted blob
-    - Response: blob bytes + `X-Checksum` header
-  - `PUT /api/v1/storage/records/{id}`: Update encrypted blob
-    - Request: `StorageRecordUpdate`
-  - `DELETE /api/v1/storage/records/{id}`: Delete record + S3 blob
-  - All routes enforce tier cloud_storage_gb quota via `TierManager.check_quota(user_id)`
-
-#### 8d — Vectors endpoint (cloud vector store)
- [x] `app/api/routes/vectors.py`:
-  - `POST /api/v1/storage/vectors/upsert`:
-    - Request: `VectorUpsertRequest`
-    - Verifies checksums, delegates to `VectorStore.upsert()`
-    - Response: `{upserted: int}`
-  - `POST /api/v1/storage/vectors/search`:
-    - Request: `VectorSearchRequest`
-    - Delegates to `VectorStore.search()`
-    - Response: `VectorSearchResponse`
-  - `DELETE /api/v1/storage/vectors`:
-    - Request: `{ids: list[str]}`
-
-#### 8e — Backup endpoint
- [x] `app/api/routes/backup.py`:
-  - `PUT /api/v1/backup`: Accepts binary blob + metadata headers (`X-Backup-Version`, `X-Backup-Timestamp`, `X-Backup-Checksum`). Stores in S3 keyed by `{user_id}/{timestamp}`. Enforces tier limits:
-    - Free: 0 (no backup)
-    - Pro: 5 GB
-    - Power: 25 GB
-    - Team: unlimited
-  - `GET /api/v1/backup`: Returns latest blob for authenticated user. Supports `If-Modified-Since`.
-  - `GET /api/v1/backup/history`: Returns list of `BackupMetadata` (no blobs).
-  - `DELETE /api/v1/backup/{backup_id}`: Delete specific backup.
-
-#### 8f — Plugins endpoint
- [x] `app/api/routes/plugins.py`:
-  - `GET /api/v1/plugins`:
-    - Query params: `category: str | None`, `q: str | None`, `page: int`, `sort: Literal['rating', 'installs', 'newest']`
-    - Response: `PluginListResponse`
-    - Available from Power tier and above
-  - `GET /api/v1/plugins/{id}`:
-    - Response: `PluginManifest` + ratings + install count
-  - `POST /api/v1/plugins/{id}/install`:
-    - Request: `PluginInstallRequest`
-    - Records installation for the user (billing tracking, analytics)
-    - If plugin is paid: triggers Stripe Connect charge + revenue split (70% developer, 30% platform)
-    - Response: `{ok: true, download_url: str}` — signed S3 URL for plugin package
-  - `DELETE /api/v1/plugins/{id}/install`:
-    - Unregisters installation
-
-#### 8g — Auth endpoint
- [x] `app/api/routes/auth.py`:
-  - `POST /api/v1/auth/register`: `{email, password}` → bcrypt hash → insert user → return `AuthTokens`
-  - `POST /api/v1/auth/login`: Validate credentials → return `AuthTokens`
-  - `POST /api/v1/auth/refresh`: Rotate refresh token → return new `AuthTokens`
-  - `GET /api/v1/auth/me`: Return `UserProfile` for current JWT
-
-#### 8h — Billing endpoint
- [x] `app/api/routes/billing.py`:
-  - `POST /api/v1/billing/checkout`: Creates Stripe checkout session → returns URL
-  - `POST /api/v1/billing/webhook`: Handles Stripe webhooks (subscription lifecycle)
-  - `GET /api/v1/billing/subscription`: Returns current subscription info
-  - `DELETE /api/v1/billing/subscription`: Cancels subscription
-
- **Outcome:** Complete REST + WebSocket API covering orchestration, storage, vectors, backup, marketplace.
-
-### Step 9 — Middleware
-
-#### 9a — Auth middleware
- [x] `app/api/middleware/auth.py`:
-  - FastAPI dependency: `get_current_user(token: str = Depends(oauth2_scheme)) -> UserProfile`
-  - Validates JWT signature, expiry, extracts `user_id` and `tier`
-  - Raises `401` on invalid/expired token
-  - Exempt routes: `/api/v1/auth/register`, `/api/v1/auth/login`, `/api/v1/billing/webhook`
-
-#### 9b — Rate limiter
- [x] `app/api/middleware/rate_limit.py`:
-  - Uses `slowapi` with `Limiter(key_func=get_user_id_from_jwt)`
-  - Tier-based limits:
-    - Free: 20 req/min
-    - Pro: 60 req/min
-    - Power: 120 req/min
-    - Team: 200 req/seat/min
-  - Custom 429 response with `Retry-After` header
-
-#### 9c — Sanitizer
- [x] `app/api/middleware/sanitizer.py`:
-  - Response middleware that scans response bodies
-  - Strips: system prompt fragments, agent internal reasoning, tool schemas, routing metadata
-  - Pattern-based detection + exact match against known prompt fingerprints
-  - Logs sanitization events for monitoring
-
- **Outcome:** Secure, rate-limited API with prompt IP protection.
-
-### Step 10 — Plugin Marketplace ✅
- [x] `app/marketplace/plugin_registry.py`:
-  - `PluginRegistry`:
-    - `async list_plugins(category, query, page, sort) -> PluginListResponse`
-    - `async get_plugin(plugin_id) -> PluginManifest | None`
-    - `async submit_plugin(manifest: PluginManifest, package_s3_key: str) -> str` — returns plugin_id, sets status = 'pending_review'
-    - `async approve_plugin(plugin_id) -> None` — admin only, sets status = 'approved'
-    - `async reject_plugin(plugin_id, reason: str) -> None`
- [x] `app/marketplace/plugin_review.py`:
-  - `ReviewQueue`:
-    - `async get_pending() -> list[dict]`
-    - `async submit_review(plugin_id, reviewer_id, decision, notes) -> None`
-  - Security checklist enforced before approval: manifest schema valid, permissions are from allowed set, no binary blobs in manifest
- [x] `app/marketplace/revenue_share.py`:
-  - `RevenueShare`:
-    - `async record_install(plugin_id, user_id, amount_cents) -> None`
-    - `async payout_developer(plugin_id, period) -> None` — Stripe Connect transfer: 70% to developer
-    - `async get_earnings(developer_id, period) -> dict`
- **Outcome:** Plugin marketplace with catalog, review workflow, and revenue split.
-
-### Step 11 — Billing & Tier management ✅
- [x] `app/billing/stripe_service.py`:
-  - `create_checkout_session(user_id, tier) -> str`
-  - `handle_webhook(payload, sig_header) -> None`: processes `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted`, `invoice.payment_failed`
-  - `get_subscription(user_id) -> dict | None`
-  - `cancel_subscription(user_id) -> None`
- [x] `app/billing/tier_manager.py`:
-  - `TierManager`:
-    - Feature matrix:
-      ```python
-      FEATURES = {
-          'free':  {
-              'agents': 3,
-              'batch_active': 2,
-              'cloud_storage_gb': 0,
-              'backup_gb': 0,
-              'providers': 1,
-              'batch_builder': False,
-              'plugin_marketplace': False,
-              'sso': False,
-          },
-          'pro':   {
-              'agents': -1,          # unlimited
-              'batch_active': 10,
-              'cloud_storage_gb': 5,
-              'backup_gb': 5,
-              'providers': -1,
-              'batch_builder': False,
-              'plugin_marketplace': False,
-              'sso': False,
-          },
-          'power': {
-              'agents': -1,
-              'batch_active': -1,    # unlimited
-              'cloud_storage_gb': 25,
-              'backup_gb': 25,
-              'providers': -1,
-              'batch_builder': True,
-              'plugin_marketplace': True,
-              'sso': False,
-          },
-          'team':  {
-              'agents': -1,
-              'batch_active': -1,
-              'cloud_storage_gb': -1,
-              'backup_gb': -1,
-              'providers': -1,
-              'batch_builder': True,
-              'plugin_marketplace': True,
-              'sso': True,
-          },
-      }
-      ```
-    - `get_tier(user_id) -> BillingTier`
-    - `check_feature(user_id, feature) -> bool`
-    - `get_rate_limit(tier) -> int`
-    - `check_quota(user_id) -> bool` — checks cloud_storage_gb current usage vs limit
- [x] `app/billing/__init__.py`: exports `stripe_service` and `tier_manager` singletons
- [x] `app/api/routes/billing.py`: refactored to delegate to `StripeService`
- [x] `app/api/routes/storage.py` and `backup.py`: `_check_quota` now delegates to `tier_manager.enforce_quota` / `enforce_backup_quota`
- **Outcome:** Stripe integration with tier-based feature gating matching Free/Pro(15€)/Power(29€)/Team(49€/seat).
-
-### Step 12 — Database (auth/billing/marketplace only)
- [x] PostgreSQL schema via Alembic:
-  - `users`: `id UUID PK`, `email UNIQUE`, `password_hash`, `tier` (default 'free'), `stripe_customer_id`, `created_at`, `updated_at`
-  - `refresh_tokens`: `id UUID PK`, `user_id FK`, `token_hash`, `expires_at`, `created_at`
-  - `subscriptions`: `id UUID PK`, `user_id FK`, `stripe_subscription_id`, `tier`, `status`, `current_period_end`, `created_at`
-  - `backup_metadata`: `id UUID PK`, `user_id FK`, `s3_key`, `version`, `timestamp`, `checksum`, `size_bytes`, `created_at`
-  - `storage_records`: `id UUID PK`, `user_id FK`, `table_name VARCHAR`, `s3_key`, `checksum`, `size_bytes`, `created_at`, `updated_at` — metadata only, no plaintext
-  - `plugins`: `id UUID PK`, `name`, `description`, `version`, `author_id FK`, `category`, `status` (pending_review/approved/rejected), `price_cents`, `s3_package_key`, `install_count`, `avg_rating`, `created_at`
-  - `plugin_installations`: `id UUID PK`, `plugin_id FK`, `user_id FK`, `installed_at`
-  - `plugin_reviews`: `id UUID PK`, `plugin_id FK`, `reviewer_id FK`, `decision`, `notes`, `reviewed_at`
-  - `revenue_events`: `id UUID PK`, `plugin_id FK`, `user_id FK`, `amount_cents`, `developer_share_cents`, `stripe_transfer_id`, `created_at`
- [x] Initial Alembic migration
- [x] SQLAlchemy models in `app/models.py`
- **Outcome:** Auth, billing, storage metadata, and marketplace persistence. Zero user data in plaintext.
-
-### Step 13 — Testing & deployment ✅
- [x] `tests/conftest.py`: TestClient fixture, mock LLM fixture (`AsyncMock` returning canned responses), mock agent fixture, test DB (SQLite in-memory for speed), mock S3 (moto), mock Pinecone
- [x] `tests/test_orchestrator.py`: classify_intent routing, single agent, pipeline, plan mode
- [x] `tests/test_agents.py`: each agent with mocked tools
- [x] `tests/test_auth.py`: register → login → access protected → refresh → expired token
- [x] `tests/test_backup.py`: upload → download → history → delete, tier limit enforcement
- [x] `tests/test_storage.py`: create record → list → download → update → delete, checksum rejection, quota enforcement
- [x] `tests/test_plugins.py`: list plugins, install, uninstall, revenue event creation, tier gate (free user blocked)
- [x] `Dockerfile` optimized for production (gunicorn + uvicorn workers)
- [x] GitHub Actions CI: lint (ruff), test (pytest), build Docker image
- **Outcome:** Fully tested, deployable backend.
-
---
-
-## API Contract Summary
-
-| Method | Endpoint | Auth | Request | Response |
-|--------|----------|------|---------|----------|
-| POST | `/api/v1/auth/register` | No | `{email, password}` | `AuthTokens` |
-| POST | `/api/v1/auth/login` | No | `{email, password}` | `AuthTokens` |
-| POST | `/api/v1/auth/refresh` | No | `{refresh_token}` | `AuthTokens` |
-| GET | `/api/v1/auth/me` | JWT | — | `UserProfile` |
-| POST | `/api/v1/chat` | JWT | `ChatRequest` | `ChatResponse \| ExecutionPlan` |
-| WS | `/api/v1/chat/stream` | JWT | `ChatRequest` (first frame) | Token stream + final JSON |
-| GET | `/api/v1/plans/playbook` | JWT | — | `ExecutionPlan[]` |
-| GET | `/api/v1/plans/playbook/:id` | JWT | — | `ExecutionPlan` |
-| POST | `/api/v1/storage/records` | JWT | `StorageRecordCreate` | `{id, created_at}` |
-| GET | `/api/v1/storage/records` | JWT | `?table&page&limit` | `RecordMeta[]` |
-| GET | `/api/v1/storage/records/:id` | JWT | — | Binary blob |
-| PUT | `/api/v1/storage/records/:id` | JWT | `StorageRecordUpdate` | `{ok: true}` |
-| DELETE | `/api/v1/storage/records/:id` | JWT | — | `{ok: true}` |
-| POST | `/api/v1/storage/vectors/upsert` | JWT | `VectorUpsertRequest` | `{upserted: int}` |
-| POST | `/api/v1/storage/vectors/search` | JWT | `VectorSearchRequest` | `VectorSearchResponse` |
-| DELETE | `/api/v1/storage/vectors` | JWT | `{ids: list[str]}` | `{ok: true}` |
-| PUT | `/api/v1/backup` | JWT | Binary blob + headers | `{ok: true}` |
-| GET | `/api/v1/backup` | JWT | — | Binary blob |
-| GET | `/api/v1/backup/history` | JWT | — | `BackupMetadata[]` |
-| DELETE | `/api/v1/backup/:id` | JWT | — | `{ok: true}` |
-| GET | `/api/v1/plugins` | JWT | `?category&q&page&sort` | `PluginListResponse` |
-| GET | `/api/v1/plugins/:id` | JWT | — | `PluginManifest` + stats |
-| POST | `/api/v1/plugins/:id/install` | JWT | `PluginInstallRequest` | `{ok, download_url}` |
-| DELETE | `/api/v1/plugins/:id/install` | JWT | — | `{ok: true}` |
-| POST | `/api/v1/billing/checkout` | JWT | `{tier}` | `{checkout_url}` |
-| POST | `/api/v1/billing/webhook` | Stripe sig | Stripe event | `{ok: true}` |
-| GET | `/api/v1/billing/subscription` | JWT | — | Subscription info |
-| DELETE | `/api/v1/billing/subscription` | JWT | — | `{ok: true}` |
-| GET | `/api/v1/health` | No | — | `{status, version}` |
-
---
-
-## Stack
-
-| Layer | Technology |
-|-------|-----------|
-| Framework | FastAPI + Uvicorn |
-| LLM | LangChain + langchain-openai |
-| Auth | PyJWT + bcrypt + OAuth2 |
-| Billing | stripe-python + Stripe Connect |
-| Blob storage | boto3 (S3) |
-| Vector store | Pinecone or Qdrant (configurable) |
-| Database | PostgreSQL + SQLAlchemy + Alembic |
-| Rate limiting | slowapi |
-| Testing | pytest + pytest-asyncio + httpx + moto (S3 mock) |
-| Deployment | Docker → fly.io / Railway / AWS ECS |
-
---
-
-## Development Rules
-
-1. **NEVER persist user data in plaintext.** The DB stores only auth, billing, storage metadata, and marketplace data. User context arrives in requests and is discarded. Cloud blobs are E2E encrypted client-side — backend only stores opaque bytes.
-2. **NEVER expose prompts.** System prompts are composed server-side from fragments. Responses are sanitized before sending. In plan mode, `prompt_template` fields are reference IDs only.
-3. **NEVER decrypt user blobs.** `app/storage/encryption.py` only verifies checksums. No decryption key ever reaches the backend.
-4. **Stateless request handling.** No server-side session state. All context comes from the client + JWT.
-5. **Type hints everywhere.** All functions have full type annotations.
-6. **Test every agent.** Each chat agent has unit tests with mocked LLM responses.
-7. **Structured logging.** JSON logs with request ID correlation.
-8. **Tier gates are enforced server-side.** Never trust client-reported tier. Always fetch from DB via `TierManager.get_tier(user_id)`.
-9. **One step at a time.** Implement one numbered step per session. When the step is fully done, mark all its checkboxes as `[x]` in this file and commit with message `step N complete: <outcome line>`.
--- a/README.md
+++ b/README.md
@@ -83,7 +83,7 @@ Adiuva Cloud API is the FastAPI backend that powers the **Adiuva Electron deskto
 ## Key Features

 1. **LLM-powered orchestration** — GPT-4o-mini classifies user intent and routes to the appropriate domain agent.
-2. **4 specialized AI agents** — Tasks (8 tools), Projects (6 tools), Checkpoints (4 tools), Notes (5 tools), all powered by GPT-4o via LangChain.
+2. **4 specialized AI agents** — Tasks (8 tools), Projects (6 tools), Timelines (4 tools), Notes (5 tools), all powered by GPT-4o via LangChain.
 3. **Execution plans & playbooks** — Server-side prompt template registry; clients receive only opaque template IDs, never raw prompts.
 4. **E2E encrypted cloud storage** — The backend never decrypts user data; SHA-256 checksum verification uses constant-time comparison to prevent timing attacks.
 5. **Cloud vector store** — Pinecone or Qdrant with user-isolated namespaces and encrypted blob payloads.
@@ -449,7 +449,7 @@ The agent system uses a registry pattern with LangChain tool-calling agents powe
 |---|---|---|---|
 | **TaskAgent** | `task_agent` | 8 | Full task and comment CRUD. Status: `todo` / `in_progress` / `done`. Priority: `high` / `medium` / `low`. Tools: `list_tasks`, `create_task`, `update_task`, `delete_task`, `list_tasks_due_today`, `list_task_comments`, `add_task_comment`, `delete_task_comment` |
 | **ProjectAgent** | `project_agent` | 6 | Project lifecycle management. Status: `active` / `archived`. Prefers archiving over deletion. Tools: `list_projects`, `list_all_projects`, `get_project`, `create_project`, `update_project`, `delete_project` |
-| **CheckpointAgent** | `checkpoint_agent` | 4 | Project milestones. Requires `project_id` for creation. Supports AI-suggestion and approval workflows. Tools: `list_checkpoints`, `create_checkpoint`, `update_checkpoint`, `delete_checkpoint` |
+| **TimelineAgent** | `timeline_agent` | 4 | Project milestones. Requires `project_id` for creation. Supports AI-suggestion and approval workflows. Tools: `list_timelines`, `create_timeline`, `update_timeline`, `delete_timeline` |
 | **NoteAgent** | `note_agent` | 5 | Markdown note management. Optionally linked to projects. Tools: `list_notes`, `get_note`, `create_note`, `update_note`, `delete_note` |

 All agents use the model configured by `LLM_MODEL` (default: GPT-4o) with `temperature=0` via LiteLLM. Tools return JSON action descriptors that the Electron client interprets and applies locally.
@@ -504,7 +504,7 @@ Source: `app/core/orchestrator.py`, `app/core/execution_plan.py`

 ### Built-in Templates (6)

-`tpl_task_agent_default`, `tpl_checkpoint_agent_default`, `tpl_project_agent_default`, `tpl_note_agent_default`, `tpl_task_extract_from_project`, `tpl_note_weekly_summary`
+`tpl_task_agent_default`, `tpl_timeline_agent_default`, `tpl_project_agent_default`, `tpl_note_agent_default`, `tpl_task_extract_from_project`, `tpl_note_weekly_summary`

 ### Built-in Playbooks (2)

@@ -643,7 +643,7 @@ Source: `app/marketplace/`
  - Plugin ID must match `^[a-z0-9-]+$`
  - Permissions must be from the allowed set only
  - No binary blobs in the manifest
-- **Allowed permissions:** `read:tasks`, `write:tasks`, `read:projects`, `write:projects`, `read:notes`, `write:notes`, `read:checkpoints`, `write:checkpoints`, `read:calendar`, `write:calendar`
+- **Allowed permissions:** `read:tasks`, `write:tasks`, `read:projects`, `write:projects`, `read:notes`, `write:notes`, `read:timelines`, `write:timelines`, `read:calendar`, `write:calendar`
 - `get_pending(db)` — Lists plugins awaiting review.
 - `submit_review(db, plugin_id, reviewer_id, decision, notes)` — Records the review decision.

@@ -734,7 +734,7 @@ adiuva-api/
 │   ├── agents/                  # LLM-powered domain agents
 │   │   ├── task_agent.py        # Task & comment CRUD (8 tools)
 │   │   ├── project_agent.py     # Project lifecycle (6 tools)
-│   │   ├── checkpoint_agent.py  # Milestones (4 tools)
+│   │   ├── timeline_agent.py    # Milestones (4 tools)
 │   │   └── note_agent.py        # Markdown notes (5 tools)
 │   │
 │   ├── core/                    # Orchestration engine
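Review note on README Key Feature 4: the constant-time SHA-256 checksum check it describes could look roughly like the sketch below. The function name `verify_blob_checksum` and the hex-digest wire format are assumptions made for illustration; this is not the code in `app/storage/`.

```python
import hashlib
import hmac


def verify_blob_checksum(blob: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 of `blob` matches `expected_hex`.

    Hypothetical helper: the blob stays opaque, only its digest is checked.
    """
    actual_hex = hashlib.sha256(blob).hexdigest()
    # compare_digest takes time independent of where the two strings differ,
    # which is what prevents a timing side channel on the checksum.
    return hmac.compare_digest(actual_hex, expected_hex.lower())
```

A plain `==` on the two digests would return as soon as the first differing character is found, which is the behaviour the README says the backend avoids.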
--- a/alembic/versions/001_initial_schema.py
+++ b/alembic/versions/001_initial_schema.py
@@ -21,18 +21,25 @@ depends_on: Union[str, Sequence[str], None] = None


 def upgrade() -> None:
-    # ── Enum types ────────────────────────────────────────────────────────
-    billing_tier = postgresql.ENUM(
-        "free", "pro", "power", "team", name="billing_tier", create_type=False
-    )
-    plugin_status = postgresql.ENUM(
-        "pending_review", "approved", "rejected", name="plugin_status", create_type=False
-    )
-    review_decision = postgresql.ENUM(
-        "approved", "rejected", name="review_decision", create_type=False
-    )
-    for enum in (billing_tier, plugin_status, review_decision):
-        enum.create(op.get_bind(), checkfirst=True)
+    # ── Enum types — idempotent creation via exception handling ───────────
+    op.execute("""
+        DO $$ BEGIN
+            CREATE TYPE billing_tier AS ENUM ('free', 'pro', 'power', 'team');
+        EXCEPTION WHEN duplicate_object THEN NULL;
+        END $$;
+    """)
+    op.execute("""
+        DO $$ BEGIN
+            CREATE TYPE plugin_status AS ENUM ('pending_review', 'approved', 'rejected');
+        EXCEPTION WHEN duplicate_object THEN NULL;
+        END $$;
+    """)
+    op.execute("""
+        DO $$ BEGIN
+            CREATE TYPE review_decision AS ENUM ('approved', 'rejected');
+        EXCEPTION WHEN duplicate_object THEN NULL;
+        END $$;
+    """)

    # ── users ─────────────────────────────────────────────────────────────
    op.create_table(
@@ -40,7 +47,7 @@ def upgrade() -> None:
        sa.Column("id", postgresql.UUID(as_uuid=False), nullable=False),
        sa.Column("email", sa.String(255), nullable=False),
        sa.Column("password_hash", sa.String(255), nullable=False),
-        sa.Column("tier", sa.Enum("free", "pro", "power", "team", name="billing_tier", create_type=False), nullable=False, server_default="free"),
+        sa.Column("tier", postgresql.ENUM("free", "pro", "power", "team", name="billing_tier", create_type=False), nullable=False, server_default="free"),
        sa.Column("stripe_customer_id", sa.String(255), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
@@ -70,7 +77,7 @@ def upgrade() -> None:
        sa.Column("id", postgresql.UUID(as_uuid=False), nullable=False),
        sa.Column("user_id", postgresql.UUID(as_uuid=False), nullable=False),
        sa.Column("stripe_subscription_id", sa.String(255), nullable=True),
-        sa.Column("tier", sa.Enum("free", "pro", "power", "team", name="billing_tier", create_type=False), nullable=False, server_default="free"),
+        sa.Column("tier", postgresql.ENUM("free", "pro", "power", "team", name="billing_tier", create_type=False), nullable=False, server_default="free"),
        sa.Column("status", sa.String(50), nullable=False, server_default="free"),
        sa.Column("current_period_end", sa.DateTime(timezone=True), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
@@ -125,7 +132,7 @@ def upgrade() -> None:
        sa.Column("category", sa.String(100), nullable=False, server_default=""),
        sa.Column("price_cents", sa.Integer, nullable=False, server_default="0"),
        sa.Column("permissions", sa.Text, nullable=False, server_default="[]"),
-        sa.Column("status", sa.Enum("pending_review", "approved", "rejected", name="plugin_status", create_type=False), nullable=False, server_default="pending_review"),
+        sa.Column("status", postgresql.ENUM("pending_review", "approved", "rejected", name="plugin_status", create_type=False), nullable=False, server_default="pending_review"),
        sa.Column("s3_package_key", sa.String(500), nullable=True),
        sa.Column("install_count", sa.Integer, nullable=False, server_default="0"),
        sa.Column("avg_rating", sa.Float, nullable=False, server_default="0.0"),
@@ -157,7 +164,7 @@ def upgrade() -> None:
        sa.Column("id", postgresql.UUID(as_uuid=False), nullable=False),
        sa.Column("plugin_id", sa.String(255), nullable=False),
        sa.Column("reviewer_id", postgresql.UUID(as_uuid=False), nullable=True),
-        sa.Column("decision", sa.Enum("approved", "rejected", name="review_decision", create_type=False), nullable=False),
+        sa.Column("decision", postgresql.ENUM("approved", "rejected", name="review_decision", create_type=False), nullable=False),
        sa.Column("notes", sa.Text, nullable=True),
        sa.Column("reviewed_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
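Review note on `001_initial_schema.py`: the three `DO $$ ... $$` blocks repeat the same pattern, so a small helper could generate them. The sketch below is only a suggestion; `create_enum_if_missing_sql` is a hypothetical name and does not exist in this repository.

```python
def create_enum_if_missing_sql(name: str, values: list[str]) -> str:
    """Build a PostgreSQL DO block that creates an enum type idempotently.

    `name` is interpolated as-is, so pass only trusted, hard-coded identifiers.
    """
    # Escape single quotes by doubling them, as SQL string literals require.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return (
        "DO $$ BEGIN\n"
        f"    CREATE TYPE {name} AS ENUM ({quoted});\n"
        "EXCEPTION WHEN duplicate_object THEN NULL;\n"
        "END $$;"
    )
```

Inside a migration this would be passed to `op.execute(...)`, once per enum type. The `duplicate_object` handler makes a second run a no-op, which is the same behaviour as the hand-written blocks above.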
--- a/alembic/versions/002_seed_plugins.py
+++ b/alembic/versions/002_seed_plugins.py
@@ -37,12 +37,12 @@ _SEED_PLUGINS = [
    {
        "id": "plugin-slack-notify",
        "name": "Slack Notifier",
-        "description": "Post task and checkpoint updates to Slack channels.",
+        "description": "Post task and timeline updates to Slack channels.",
        "version": "1.2.0",
        "author_name": "Adiuva",
        "category": "communication",
        "price_cents": 499,
-        "permissions": json.dumps(["read:tasks", "read:checkpoints"]),
+        "permissions": json.dumps(["read:tasks", "read:timelines"]),
        "status": "approved",
        "s3_package_key": "plugins/plugin-slack-notify/1.2.0/package.zip",
        "install_count": 0,
--- a/alembic/versions/003_agent_tables.py
+++ b/alembic/versions/003_agent_tables.py
@@ -0,0 +1,127 @@
+"""Add agent config and run log tables: local_agent_configs, cloud_agent_configs, agent_run_logs.
+
+Revision ID: 003
+Revises: 002
+Create Date: 2026-03-05
+"""
+
+from __future__ import annotations
+
+from typing import Sequence, Union
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy.dialects import postgresql
+
+revision: str = "003"
+down_revision: Union[str, None] = "002"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    # ── Enum types — idempotent creation ──────────────────────────────────
+    op.execute("""
+        DO $$ BEGIN
+            CREATE TYPE agent_type AS ENUM ('local', 'cloud');
+        EXCEPTION WHEN duplicate_object THEN NULL;
+        END $$;
+    """)
+    op.execute("""
+        DO $$ BEGIN
+            CREATE TYPE agent_run_status AS ENUM ('running', 'success', 'error', 'partial');
+        EXCEPTION WHEN duplicate_object THEN NULL;
+        END $$;
+    """)
+    op.execute("""
+        DO $$ BEGIN
+            CREATE TYPE cloud_provider AS ENUM ('gmail', 'teams', 'outlook');
+        EXCEPTION WHEN duplicate_object THEN NULL;
+        END $$;
+    """)
+
+    # ── local_agent_configs ───────────────────────────────────────────────
+    op.create_table(
+        "local_agent_configs",
+        sa.Column("id", postgresql.UUID(as_uuid=False), nullable=False),
+        sa.Column("user_id", postgresql.UUID(as_uuid=False), nullable=False),
+        sa.Column("device_id", sa.String(255), nullable=False),
+        sa.Column("name", sa.String(255), nullable=False),
+        sa.Column("directory_paths", sa.JSON, nullable=False, server_default="[]"),
+        sa.Column("data_types", sa.JSON, nullable=False, server_default="[]"),
+        sa.Column("prompt_template", sa.Text, nullable=False, server_default=""),
+        sa.Column("file_extensions", sa.JSON, nullable=False, server_default="[]"),
+        sa.Column("schedule_cron", sa.String(100), nullable=False, server_default="0 */6 * * *"),
+        sa.Column("enabled", sa.Boolean, nullable=False, server_default=sa.true()),
+        sa.Column("last_run_at", sa.DateTime(timezone=True), nullable=True),
+        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
+        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
+        sa.PrimaryKeyConstraint("id"),
+        sa.ForeignKeyConstraint(["user_id"], ["users.id"], ondelete="CASCADE"),
+    )
+    op.create_index("ix_local_agent_configs_user_id", "local_agent_configs", ["user_id"])
+
+    # ── cloud_agent_configs ───────────────────────────────────────────────
+    op.create_table(
+        "cloud_agent_configs",
+        sa.Column("id", postgresql.UUID(as_uuid=False), nullable=False),
+        sa.Column("user_id", postgresql.UUID(as_uuid=False), nullable=False),
+        sa.Column(
+            "provider",
+            postgresql.ENUM("gmail", "teams", "outlook", name="cloud_provider", create_type=False),
+            nullable=False,
+        ),
+        sa.Column("name", sa.String(255), nullable=False),
+        sa.Column("data_types", sa.JSON, nullable=False, server_default="[]"),
+        sa.Column("prompt_template", sa.Text, nullable=False, server_default=""),
+        sa.Column("oauth_token_encrypted", sa.Text, nullable=True),
+        sa.Column("filter_config", sa.JSON, nullable=True),
+        sa.Column("schedule_cron", sa.String(100), nullable=False, server_default="0 */6 * * *"),
+        sa.Column("enabled", sa.Boolean, nullable=False, server_default=sa.true()),
+        sa.Column("last_run_at", sa.DateTime(timezone=True), nullable=True),
+        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
+        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
+        sa.PrimaryKeyConstraint("id"),
+        sa.ForeignKeyConstraint(["user_id"], ["users.id"], ondelete="CASCADE"),
+    )
+    op.create_index("ix_cloud_agent_configs_user_id", "cloud_agent_configs", ["user_id"])
+
+    # ── agent_run_logs ─────────────────────────────────────────────────────
+    op.create_table(
+        "agent_run_logs",
+        sa.Column("id", postgresql.UUID(as_uuid=False), nullable=False),
+        # Plain string — not a FK because it references either local_agent_configs or
+        # cloud_agent_configs depending on agent_type.
+        sa.Column("agent_id", sa.String(255), nullable=False),
+        sa.Column(
+            "agent_type",
+            postgresql.ENUM("local", "cloud", name="agent_type", create_type=False),
+            nullable=False,
+        ),
+        sa.Column("user_id", postgresql.UUID(as_uuid=False), nullable=False),
+        sa.Column(
+            "status",
+            postgresql.ENUM("running", "success", "error", "partial", name="agent_run_status", create_type=False),
+            nullable=False,
+            server_default="running",
+        ),
+        sa.Column("items_processed", sa.Integer, nullable=False, server_default="0"),
+        sa.Column("items_created", sa.Integer, nullable=False, server_default="0"),
+        sa.Column("errors", sa.JSON, nullable=True),
+        sa.Column("started_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
+        sa.Column("completed_at", sa.DateTime(timezone=True), nullable=True),
+        sa.PrimaryKeyConstraint("id"),
+        sa.ForeignKeyConstraint(["user_id"], ["users.id"], ondelete="CASCADE"),
+    )
+    op.create_index("ix_agent_run_logs_user_id", "agent_run_logs", ["user_id"])
+    op.create_index("ix_agent_run_logs_agent_id", "agent_run_logs", ["agent_id"])
+
+
+def downgrade() -> None:
+    op.drop_table("agent_run_logs")
+    op.drop_table("cloud_agent_configs")
+    op.drop_table("local_agent_configs")
+
+    op.execute("DROP TYPE IF EXISTS cloud_provider;")
+    op.execute("DROP TYPE IF EXISTS agent_run_status;")
+    op.execute("DROP TYPE IF EXISTS agent_type;")
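Review note on `003_agent_tables.py`: the `schedule_cron` default `0 */6 * * *` means minute 0 of every sixth hour. The sketch below expands just the minute and hour fields to make that concrete. It is a simplified illustration with hypothetical helper names, it ignores the day, month, and weekday fields, and it only understands `*`, `*/n`, and plain integers.

```python
def expand_cron_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field; supports only '*', '*/n', and a single integer."""
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    return [int(field)]


def daily_run_times(cron: str) -> list[str]:
    """Return the HH:MM times a cron expression fires within one day."""
    minute, hour, *_ = cron.split()
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_cron_field(hour, 0, 23)
        for m in expand_cron_field(minute, 0, 59)
    ]
```

For the default value this gives four runs per day per agent, which is worth keeping in mind when estimating how fast `agent_run_logs` grows.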
--- a/alembic/versions/004_add_memory_tables.py
+++ b/alembic/versions/004_add_memory_tables.py
@@ -0,0 +1,144 @@
+"""Add memory tables and user encryption_key column.
+
+Memory tables:
+  memory_core        — per-user key/value preferences (encrypted)
+  memory_associative — semantic memory with pgvector embedding (encrypted)
+  memory_episodic    — session summaries (encrypted)
+  memory_proactive   — behavioral patterns (encrypted)
+
+Also adds encryption_key column to users table.
+
+Revision ID: 004
+Revises: 003
+Create Date: 2026-03-08
+"""
+
+from __future__ import annotations
+
+from typing import Sequence, Union
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy.dialects import postgresql
+
+revision: str = "004"
+down_revision: Union[str, None] = "003"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    # ── Enable pgvector extension (idempotent) ────────────────────────────────
+    op.execute("CREATE EXTENSION IF NOT EXISTS vector;")
+
+    # ── Add encryption_key to users ───────────────────────────────────────────
+    op.add_column(
+        "users",
+        sa.Column("encryption_key", sa.String(64), nullable=True),
+    )
+
+    # ── memory_core ───────────────────────────────────────────────────────────
+    op.create_table(
+        "memory_core",
+        sa.Column("id", postgresql.UUID(as_uuid=False), primary_key=True),
+        sa.Column(
+            "user_id",
+            postgresql.UUID(as_uuid=False),
+            sa.ForeignKey("users.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+        sa.Column("key", sa.String(255), nullable=False),
+        sa.Column("value_encrypted", sa.Text, nullable=False),
+        sa.Column(
+            "updated_at",
+            sa.DateTime(timezone=True),
+            nullable=False,
+            server_default=sa.func.now(),
+        ),
+    )
+    op.create_index("ix_memory_core_user_id", "memory_core", ["user_id"])
+
+    # ── memory_associative ────────────────────────────────────────────────────
+    # The embedding column uses pgvector's vector(1536) type.
+    op.create_table(
+        "memory_associative",
+        sa.Column("id", postgresql.UUID(as_uuid=False), primary_key=True),
+        sa.Column(
+            "user_id",
+            postgresql.UUID(as_uuid=False),
+            sa.ForeignKey("users.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+        sa.Column("content_encrypted", sa.Text, nullable=False),
+        sa.Column("entity_type", sa.String(100), nullable=True),
+        sa.Column("entity_id", sa.String(255), nullable=True),
+        sa.Column(
+            "updated_at",
+            sa.DateTime(timezone=True),
+            nullable=False,
+            server_default=sa.func.now(),
+        ),
+    )
+    # Add the pgvector column separately (not supported by generic sa types)
+    op.execute(
+        "ALTER TABLE memory_associative ADD COLUMN embedding vector(1536);"
+    )
+    op.create_index("ix_memory_associative_user_id", "memory_associative", ["user_id"])
+    # IVFFlat index for approximate nearest-neighbour search
+    op.execute(
+        "CREATE INDEX ix_memory_associative_embedding "
+        "ON memory_associative USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);"
+    )
+
+    # ── memory_episodic ───────────────────────────────────────────────────────
+    op.create_table(
+        "memory_episodic",
+        sa.Column("id", postgresql.UUID(as_uuid=False), primary_key=True),
+        sa.Column(
+            "user_id",
+            postgresql.UUID(as_uuid=False),
+            sa.ForeignKey("users.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+        sa.Column("summary_encrypted", sa.Text, nullable=False),
+        sa.Column("session_id", sa.String(255), nullable=False),
+        sa.Column(
+            "created_at",
+            sa.DateTime(timezone=True),
+            nullable=False,
+            server_default=sa.func.now(),
+        ),
+    )
+    op.create_index("ix_memory_episodic_user_id", "memory_episodic", ["user_id"])
+    op.create_index("ix_memory_episodic_session_id", "memory_episodic", ["session_id"])
+
+    # ── memory_proactive ──────────────────────────────────────────────────────
+    op.create_table(
+        "memory_proactive",
+        sa.Column("id", postgresql.UUID(as_uuid=False), primary_key=True),
+        sa.Column(
+            "user_id",
+            postgresql.UUID(as_uuid=False),
+            sa.ForeignKey("users.id", ondelete="CASCADE"),
+            nullable=False,
+        ),
+        sa.Column("pattern_encrypted", sa.Text, nullable=False),
+        sa.Column("confidence", sa.Float, nullable=False, server_default="0.5"),
+        sa.Column("source", sa.String(50), nullable=False, server_default="inferred"),
+        sa.Column(
+            "created_at",
+            sa.DateTime(timezone=True),
+            nullable=False,
+            server_default=sa.func.now(),
+        ),
+    )
+    op.create_index("ix_memory_proactive_user_id", "memory_proactive", ["user_id"])
+
+
+def downgrade() -> None:
+    op.drop_table("memory_proactive")
+    op.drop_table("memory_episodic")
+    op.drop_index("ix_memory_associative_embedding", "memory_associative")
+    op.drop_table("memory_associative")
+    op.drop_table("memory_core")
+    op.drop_column("users", "encryption_key")
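Review note on `004_add_memory_tables.py`: the `vector_cosine_ops` index orders rows by cosine distance, which pgvector exposes as the `<=>` operator (1 minus cosine similarity). The pure-Python sketch below shows the exact ranking that the index approximates. The function names and the `(id, embedding)` row shape are assumptions for illustration only.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 1 - (a . b) / (|a| * |b|). Inputs must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(
    query: list[float], rows: list[tuple[str, list[float]]], k: int
) -> list[str]:
    """Return the ids of the k rows closest to `query`, by exact search."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]
```

IVFFlat only probes some of its lists, so its results can differ from this exact ranking. The index is also created here on an empty table, so its list centroids are not derived from real data; rebuilding it after rows exist may improve recall.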
--- a/alembic/versions/818478c251dc_add_name_and_surname_to_users_table.py
+++ b/alembic/versions/818478c251dc_add_name_and_surname_to_users_table.py
@@ -0,0 +1,30 @@
+"""add name and surname to users table
+
+Revision ID: 818478c251dc
+Revises: 004
+Create Date: 2026-03-10 15:10:42.811947
+
+"""
+from __future__ import annotations
+
+from typing import Sequence, Union
+
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic.
+revision: str = '818478c251dc'
+down_revision: Union[str, None] = '004'
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    op.add_column('users', sa.Column('name', sa.String(length=100), nullable=True))
+    op.add_column('users', sa.Column('surname', sa.String(length=100), nullable=True))
+
+
+def downgrade() -> None:
+    op.drop_column('users', 'surname')
+    op.drop_column('users', 'name')
--- a/alembic/versions/9a1f2d0b6c7e_deprecate_backend_agent_config_tables.py
+++ b/alembic/versions/9a1f2d0b6c7e_deprecate_backend_agent_config_tables.py
@@ -0,0 +1,92 @@
+"""Deprecate backend agent config tables.
+
+The Electron client is now the source of truth for agent configuration
+(directory, extract targets, batch interval, custom prompt). Backend keeps
+billing checks and trigger/run logs only.
+
+Revision ID: 9a1f2d0b6c7e
+Revises: 818478c251dc
+Create Date: 2026-03-16
+"""
+
+from __future__ import annotations
+
+from typing import Sequence, Union
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy.dialects import postgresql
+
+revision: str = "9a1f2d0b6c7e"
+down_revision: Union[str, None] = "818478c251dc"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    bind = op.get_bind()
+    inspector = sa.inspect(bind)
+    existing = set(inspector.get_table_names())
+
+    if "cloud_agent_configs" in existing:
+        op.drop_index("ix_cloud_agent_configs_user_id", table_name="cloud_agent_configs")
+        op.drop_table("cloud_agent_configs")
+
+    if "local_agent_configs" in existing:
+        op.drop_index("ix_local_agent_configs_user_id", table_name="local_agent_configs")
+        op.drop_table("local_agent_configs")
+
+
+def downgrade() -> None:
+    op.create_table(
+        "local_agent_configs",
+        sa.Column("id", postgresql.UUID(as_uuid=False), nullable=False),
+        sa.Column("user_id", postgresql.UUID(as_uuid=False), nullable=False),
+        sa.Column("device_id", sa.String(255), nullable=False),
+        sa.Column("name", sa.String(255), nullable=False),
+        sa.Column("directory_paths", sa.JSON, nullable=False, server_default="[]"),
+        sa.Column("data_types", sa.JSON, nullable=False, server_default="[]"),
+        sa.Column("prompt_template", sa.Text, nullable=False, server_default=""),
+        sa.Column("file_extensions", sa.JSON, nullable=False, server_default="[]"),
+        sa.Column("schedule_cron", sa.String(100), nullable=False, server_default="0 */6 * * *"),
+        sa.Column("enabled", sa.Boolean, nullable=False, server_default=sa.true()),
+        sa.Column("last_run_at", sa.DateTime(timezone=True), nullable=True),
+        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
+        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
+        sa.PrimaryKeyConstraint("id"),
+        sa.ForeignKeyConstraint(["user_id"], ["users.id"], ondelete="CASCADE"),
+    )
+    op.create_index("ix_local_agent_configs_user_id", "local_agent_configs", ["user_id"])
+
+    op.execute(
+        """
+        DO $$ BEGIN
+            CREATE TYPE cloud_provider AS ENUM ('gmail', 'teams', 'outlook');
+        EXCEPTION WHEN duplicate_object THEN NULL;
+        END $$;
+        """
+    )
+
+    op.create_table(
+        "cloud_agent_configs",
+        sa.Column("id", postgresql.UUID(as_uuid=False), nullable=False),
+        sa.Column("user_id", postgresql.UUID(as_uuid=False), nullable=False),
+        sa.Column(
+            "provider",
+            postgresql.ENUM("gmail", "teams", "outlook", name="cloud_provider", create_type=False),
+            nullable=False,
+        ),
+        sa.Column("name", sa.String(255), nullable=False),
+        sa.Column("data_types", sa.JSON, nullable=False, server_default="[]"),
+        sa.Column("prompt_template", sa.Text, nullable=False, server_default=""),
+        sa.Column("oauth_token_encrypted", sa.Text, nullable=True),
+        sa.Column("filter_config", sa.JSON, nullable=True),
+        sa.Column("schedule_cron", sa.String(100), nullable=False, server_default="0 */6 * * *"),
+        sa.Column("enabled", sa.Boolean, nullable=False, server_default=sa.true()),
+        sa.Column("last_run_at", sa.DateTime(timezone=True), nullable=True),
+        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
+        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.text("now()")),
+        sa.PrimaryKeyConstraint("id"),
+        sa.ForeignKeyConstraint(["user_id"], ["users.id"], ondelete="CASCADE"),
+    )
+    op.create_index("ix_cloud_agent_configs_user_id", "cloud_agent_configs", ["user_id"])
--- a/alembic/versions/a3b9c0d1e2f3_add_agent_config_to_local_agents.py
+++ b/alembic/versions/a3b9c0d1e2f3_add_agent_config_to_local_agents.py
@@ -0,0 +1,31 @@
+"""add agent_config to local_agent_configs
+
+Revision ID: a3b9c0d1e2f3
+Revises: 9a1f2d0b6c7e
+Create Date: 2026-04-07 00:00:00.000000
+
+"""
+from __future__ import annotations
+
+from typing import Sequence, Union
+
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic.
+revision: str = "a3b9c0d1e2f3"
+down_revision: Union[str, None] = "9a1f2d0b6c7e"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
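+def upgrade() -> None:
+    # Revision 9a1f2d0b6c7e drops local_agent_configs, so add the column only if the table exists.
+    if "local_agent_configs" in sa.inspect(op.get_bind()).get_table_names():
+        op.add_column("local_agent_configs", sa.Column("agent_config", sa.JSON(), nullable=True))
+
+
+def downgrade() -> None:
+    if "local_agent_configs" in sa.inspect(op.get_bind()).get_table_names():
+        op.drop_column("local_agent_configs", "agent_config")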
--- a/app/agents/__init__.py
+++ b/app/agents/__init__.py
@@ -1,5 +1,5 @@
-"""Import all agent modules to trigger @registry.register decorators."""
+"""Expose tool modules used by deep orchestrator-worker graphs."""

-from app.agents import checkpoint_agent, note_agent, project_agent, task_agent
+from app.agents import filesystem_agent, note_agent, project_agent, task_agent, timeline_agent

-__all__ = ["checkpoint_agent", "note_agent", "project_agent", "task_agent"]
+__all__ = ["filesystem_agent", "note_agent", "project_agent", "task_agent", "timeline_agent"]
--- a/app/agents/checkpoint_agent.py
+++ b/app/agents/checkpoint_agent.py
@@ -1,121 +0,0 @@
-"""Checkpoint agent — project milestone management (list, create, update, delete)."""
-
-from __future__ import annotations
-
-import json
-from typing import Any
-
-from langchain_core.messages import HumanMessage, SystemMessage
-from langchain_core.tools import tool
-
-from app.core.agent_registry import ChatAgent, registry
-from app.core.llm import get_llm
-
-_SYSTEM_PROMPT = (
-    "You are a project checkpoint assistant. Checkpoints are milestone dates that\n"
-    "track progress on a project — they are not calendar events.\n\n"
-    "Rules:\n"
-    "  - project_id is REQUIRED for every create; confirm with the user if unknown\n"
-    "  - date is a Unix timestamp in milliseconds; convert human-readable dates\n"
-    "  - is_ai_suggested: 1 when proactively proposing a checkpoint, 0 otherwise\n"
-    "  - is_approved: 0 until the user explicitly confirms; then 1\n"
-    "  - For update_checkpoint, use -1 for integer fields you do not want to change\n"
-    "  - Listing without a project_id returns all checkpoints across projects\n"
-    "  - Always echo the title and formatted date in your confirmation."
-)
-
-
-@tool
-async def list_checkpoints(project_id: str = "") -> str:
-    """List checkpoints. Provide project_id to scope to a specific project."""
-    return json.dumps({
-        "action": "list",
-        "table": "checkpoints",
-        "filters": {"projectId": project_id or None},
-    })
-
-
-@tool
-async def create_checkpoint(
-    project_id: str,
-    title: str,
-    date: int,
-    is_ai_suggested: int = 0,
-    is_approved: int = 0,
-) -> str:
-    """Create a project checkpoint (milestone).
-    project_id: REQUIRED UUID of the parent project
-    title: descriptive name for the milestone
-    date: Unix timestamp in milliseconds
-    is_ai_suggested: 1 if proactively suggested, 0 if user-requested
-    is_approved: 0 until the user confirms
-    """
-    return json.dumps({
-        "action": "create_record",
-        "table": "checkpoints",
-        "data": {
-            "projectId": project_id,
-            "title": title,
-            "date": date,
-            "isAiSuggested": is_ai_suggested,
-            "isApproved": is_approved,
-        },
-    })
-
-
-@tool
-async def update_checkpoint(
-    checkpoint_id: str,
-    title: str = "",
-    date: int = -1,
-    is_approved: int = -1,
-) -> str:
-    """Update a checkpoint. Only pass fields that should change.
-    checkpoint_id: UUID of the checkpoint (required)
-    date: -1 means unchanged; any other value sets the new date (ms timestamp)
-    is_approved: -1 means unchanged; 0 or 1 sets the approval state
-    """
-    updates: dict[str, Any] = {}
-    if title:
-        updates["title"] = title
-    if date != -1:
-        updates["date"] = date
-    if is_approved != -1:
-        updates["isApproved"] = is_approved
-    return json.dumps({
-        "action": "update_record",
-        "table": "checkpoints",
-        "data": {"id": checkpoint_id, "updates": updates},
-    })
-
-
-@tool
-async def delete_checkpoint(checkpoint_id: str) -> str:
-    """Delete a checkpoint permanently by its UUID."""
-    return json.dumps({
-        "action": "delete_record",
-        "table": "checkpoints",
-        "data": {"id": checkpoint_id},
-    })
-
-
-@registry.register
-class CheckpointAgent(ChatAgent):
-    def get_name(self) -> str:
-        return "checkpoint_agent"
-
-    def get_description(self) -> str:
-        return "Manages project checkpoints (milestones): list, create, update, delete"
-
-    def get_tools(self) -> list[Any]:
-        return [list_checkpoints, create_checkpoint, update_checkpoint, delete_checkpoint]
-
-    async def handle(self, query: str, context: dict[str, Any]) -> str:
-        llm = get_llm()
-        messages = [
-            SystemMessage(content=_SYSTEM_PROMPT),
-            HumanMessage(
-                content=f"User query: {query}\nContext: {json.dumps(context)[:1000]}"
-            ),
-        ]
-        return await self._tool_loop(llm, messages, self.get_tools())
--- a/app/agents/filesystem_agent.py
+++ b/app/agents/filesystem_agent.py
@@ -0,0 +1,85 @@
+"""Filesystem agent — tools for reading local directories and files on Electron.
+
+These tools delegate to the Electron client via ``execute_on_client()`` using
+the same WS tool-call round-trip pattern as CRUD tools.  The Electron app
+handles actual disk I/O and responds with ``tool_result`` frames.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+from langchain_core.tools import tool
+
+from app.core.ws_context import execute_on_client
+
+
+@tool
+async def list_directory(path: str) -> str:
+    """List files and folders in a local directory on the user's device.
+
+    Returns a formatted listing of entries with name, type (file/directory),
+    and full path.
+    """
+    result = await execute_on_client(
+        action="list_directory",
+        data={"path": path},
+    )
+    entries: list[dict[str, Any]] = result.get("entries", [])
+    if not entries:
+        return f"Directory '{path}' is empty or does not exist."
+    lines: list[str] = []
+    for entry in entries:
+        entry_type = entry.get("type", "unknown")
+        entry_name = entry.get("name", "")
+        entry_path = entry.get("path", "")
+        lines.append(f"- [{entry_type}] {entry_name}  ({entry_path})")
+    return f"Directory listing for '{path}' ({len(entries)} entries):\n" + "\n".join(lines)
+
+
+@tool
+async def read_file_content(path: str) -> str:
+    """Read the text content of a local file on the user's device.
+
+    Returns the file content as a string.  Large files may be truncated
+    by the Electron client.
+    """
+    result = await execute_on_client(
+        action="read_file_content",
+        data={"path": path},
+    )
+    content: str = result.get("content", "")
+    if not content:
+        return f"File '{path}' is empty or could not be read."
+    return content
+
+
+@tool
+async def get_file_metadata(path: str) -> str:
+    """Get metadata for a local file: size, creation date, modification date, extension.
+
+    Returns a formatted summary of the file's metadata.
+    """
+    result = await execute_on_client(
+        action="get_file_metadata",
+        data={"path": path},
+    )
+    size = result.get("size", "unknown")
+    created = result.get("createdAt", "unknown")
+    modified = result.get("modifiedAt", "unknown")
+    extension = result.get("extension", "unknown")
+    name = result.get("name", path)
+    return (
+        f"File: {name}\n"
+        f"  Extension: {extension}\n"
+        f"  Size: {size} bytes\n"
+        f"  Created: {created}\n"
+        f"  Modified: {modified}"
+    )
+
+
+FILESYSTEM_TOOLS: list[Any] = [
+    list_directory,
+    read_file_content,
+    get_file_metadata,
+]
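Review note on `filesystem_agent.py`: each tool above awaits the client round-trip and then formats the returned payload as plain text for the LLM. The formatting half can be exercised without a live WebSocket. The sketch below is illustrative only: `fake_execute_on_client` and `format_listing` are hypothetical names, and the keyword signature of the real `execute_on_client` is assumed from the call sites in this diff, since `app.core.ws_context` itself is not shown here.

```python
import asyncio
from typing import Any


async def fake_execute_on_client(action: str, data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the WS round-trip; returns a canned payload."""
    if action == "list_directory":
        base = data["path"]
        return {
            "entries": [
                {"name": "notes.txt", "type": "file", "path": base + "/notes.txt"},
                {"name": "mail", "type": "directory", "path": base + "/mail"},
            ]
        }
    return {}


def format_listing(path: str, result: dict[str, Any]) -> str:
    """Same formatting as list_directory in the diff, split out so it can be tested."""
    entries = result.get("entries", [])
    if not entries:
        return f"Directory '{path}' is empty or does not exist."
    lines = [
        f"- [{e.get('type', 'unknown')}] {e.get('name', '')}  ({e.get('path', '')})"
        for e in entries
    ]
    return f"Directory listing for '{path}' ({len(entries)} entries):\n" + "\n".join(lines)


async def demo() -> str:
    # Round-trip through the fake client, then format the payload.
    result = await fake_execute_on_client(action="list_directory", data={"path": "/work"})
    return format_listing("/work", result)


listing = asyncio.run(demo())
```

Separating the formatter from the transport is one possible design choice; the diff keeps both inside the `@tool` function, which is equally valid when the tools are only exercised end to end.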
--- a/app/agents/note_agent.py
+++ b/app/agents/note_agent.py
@@ -2,16 +2,23 @@

 from __future__ import annotations

-import json
+import re
 from typing import Any

-from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_core.tools import tool

-from app.core.agent_registry import ChatAgent, registry
-from app.core.llm import get_llm
+from app.core.llm import embed
+from app.core.ws_context import execute_on_client

-_SYSTEM_PROMPT = (
+_UUID_RE = re.compile(
+    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
+)
+
+
+def _is_uuid(value: str) -> bool:
+    return bool(_UUID_RE.match(value))
+
+NOTE_SYSTEM_PROMPT = (
    "You are a note-taking assistant. You help users create, retrieve, update,\n"
    "and delete Markdown notes in their workspace.\n\n"
    "Rules:\n"
@@ -21,6 +28,7 @@ _SYSTEM_PROMPT = (
    "    before appending or replacing sections\n"
    "  - list_notes without project_id returns all notes; scope with project_id\n"
    "    when the user is working within a specific project\n"
+    "  - project_id must be a UUID; if you only know a project name, do not pass it as project_id\n"
    "  - Do not fabricate note content — reflect what the user provides or what\n"
    "    is already in the note (retrieved via get_note)."
 )
@@ -29,21 +37,27 @@ _SYSTEM_PROMPT = (
@tool
 async def list_notes(project_id: str = "") -> str:
    """List notes, optionally scoped to a project by project_id."""
-    return json.dumps({
-        "action": "list",
-        "table": "notes",
-        "filters": {"projectId": project_id or None},
-    })
+    normalized_project_id = project_id if (project_id and _is_uuid(project_id)) else ""
+    result = await execute_on_client(
+        action="select",
+        table="notes",
+        filters={"projectId": normalized_project_id or None},
+    )
+    rows = result.get("rows", [])
+    if not rows:
+        return "No notes found."
+    lines = [f"- {r['title']} (id: {r['id']})" for r in rows]
+    return f"Found {len(rows)} note(s):\n" + "\n".join(lines)


@tool
 async def get_note(note_id: str) -> str:
    """Fetch a single note by its UUID to read its full Markdown content."""
-    return json.dumps({
-        "action": "get",
-        "table": "notes",
-        "data": {"id": note_id},
-    })
+    result = await execute_on_client(action="get", table="notes", data={"id": note_id})
+    row = result.get("row")
+    if not row:
+        return f"Note {note_id} not found."
+    return f"Note '{row['title']}' (id: {row['id']}):\n\n{row['content']}"


@tool
@@ -57,15 +71,24 @@ async def create_note(
    content: Markdown body text (required)
    project_id: optional UUID linking this note to a project
    """
-    return json.dumps({
-        "action": "create_record",
-        "table": "notes",
-        "data": {
+    result = await execute_on_client(
+        action="insert",
+        table="notes",
+        data={
            "title": title,
            "content": content,
            "projectId": project_id or None,
        },
-    })
+    )
+    row = result["row"]
+    # Index the note content in the vector store.
+    vector = await embed(content)
+    await execute_on_client(
+        action="vector_upsert",
+        data={"id": row["id"], "projectId": row.get("projectId"), "content": content},
+        vector=vector,
+    )
+    return f"Note created: '{row['title']}' (id: {row['id']})."


@tool
@@ -83,40 +106,34 @@ async def update_note(
        updates["title"] = title
    if content:
        updates["content"] = content
-    return json.dumps({
-        "action": "update_record",
-        "table": "notes",
-        "data": {"id": note_id, "updates": updates},
-    })
+    result = await execute_on_client(
+        action="update",
+        table="notes",
+        data={"id": note_id, "updates": updates},
+    )
+    row = result["row"]
+    # Re-index if content changed.
+    if content:
+        vector = await embed(content)
+        await execute_on_client(
+            action="vector_upsert",
+            data={"id": note_id, "projectId": row.get("projectId"), "content": content},
+            vector=vector,
+        )
+    return f"Note updated: '{row['title']}' (id: {row['id']})."


@tool
 async def delete_note(note_id: str) -> str:
    """Delete a note permanently by its UUID."""
-    return json.dumps({
-        "action": "delete_record",
-        "table": "notes",
-        "data": {"id": note_id},
-    })
+    await execute_on_client(action="delete", table="notes", data={"id": note_id})
+    return f"Note {note_id} deleted."


-@registry.register
-class NoteAgent(ChatAgent):
-    def get_name(self) -> str:
-        return "note_agent"
-
-    def get_description(self) -> str:
-        return "Manages notes: list, get, create, update, delete"
-
-    def get_tools(self) -> list[Any]:
-        return [list_notes, get_note, create_note, update_note, delete_note]
-
-    async def handle(self, query: str, context: dict[str, Any]) -> str:
-        llm = get_llm()
-        messages = [
-            SystemMessage(content=_SYSTEM_PROMPT),
-            HumanMessage(
-                content=f"User query: {query}\nContext: {json.dumps(context)[:1000]}"
-            ),
-        ]
-        return await self._tool_loop(llm, messages, self.get_tools())
+NOTE_TOOLS: list[Any] = [
+    list_notes,
+    get_note,
+    create_note,
+    update_note,
+    delete_note,
+]
--- a/app/agents/project_agent.py
+++ b/app/agents/project_agent.py
@@ -2,16 +2,13 @@

 from __future__ import annotations

-import json
 from typing import Any

-from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_core.tools import tool

-from app.core.agent_registry import ChatAgent, registry
-from app.core.llm import get_llm
+from app.core.ws_context import execute_on_client

-_SYSTEM_PROMPT = (
+PROJECT_SYSTEM_PROMPT = (
    "You are a project management assistant. You help users create, find,\n"
    "update, and archive projects in their workspace.\n\n"
    "Rules:\n"
@@ -36,14 +33,19 @@ async def list_projects(
    """List projects, optionally filtered by client_id.
    include_archived: 1 to include archived projects, 0 for active only (default).
    """
-    return json.dumps({
-        "action": "list",
-        "table": "projects",
-        "filters": {
+    result = await execute_on_client(
+        action="select",
+        table="projects",
+        filters={
            "clientId": client_id or None,
            "includeArchived": bool(include_archived),
        },
-    })
+    )
+    rows = result.get("rows", [])
+    if not rows:
+        return "No projects found."
+    lines = [f"- {r['name']} (status: {r['status']}, id: {r['id']})" for r in rows]
+    return f"Found {len(rows)} project(s):\n" + "\n".join(lines)


@tool
@@ -51,20 +53,25 @@ async def list_all_projects() -> str:
    """List every project regardless of client or status.
    Use only when the user wants a complete cross-client overview.
    """
-    return json.dumps({
-        "action": "list_all",
-        "table": "projects",
-    })
+    result = await execute_on_client(action="select", table="projects")
+    rows = result.get("rows", [])
+    if not rows:
+        return "No projects found."
+    lines = [f"- {r['name']} (status: {r['status']}, id: {r['id']})" for r in rows]
+    return f"All projects ({len(rows)}):\n" + "\n".join(lines)


@tool
 async def get_project(project_id: str) -> str:
    """Fetch a single project by its UUID."""
-    return json.dumps({
-        "action": "get",
-        "table": "projects",
-        "data": {"id": project_id},
-    })
+    result = await execute_on_client(action="get", table="projects", data={"id": project_id})
+    row = result.get("row")
+    if not row:
+        return f"Project {project_id} not found."
+    return (
+        f"Project: '{row['name']}' (id: {row['id']}, status: {row['status']}, "
+        f"clientId: {row.get('clientId', 'none')})"
+    )


@tool
@@ -76,14 +83,13 @@ async def create_project(
    name: human-readable project name (required)
    client_id: optional UUID of the owning client
    """
-    return json.dumps({
-        "action": "create_record",
-        "table": "projects",
-        "data": {
-            "name": name,
-            "clientId": client_id or None,
-        },
-    })
+    result = await execute_on_client(
+        action="insert",
+        table="projects",
+        data={"name": name, "clientId": client_id or None},
+    )
+    row = result["row"]
+    return f"Project created: '{row['name']}' (id: {row['id']})"


@tool
@@ -108,11 +114,13 @@ async def update_project(
        updates["status"] = status
    if ai_summary:
        updates["aiSummary"] = ai_summary
-    return json.dumps({
-        "action": "update_record",
-        "table": "projects",
-        "data": {"id": project_id, "updates": updates},
-    })
+    result = await execute_on_client(
+        action="update",
+        table="projects",
+        data={"id": project_id, "updates": updates},
+    )
+    row = result["row"]
+    return f"Project updated: '{row['name']}' (id: {row['id']}, status: {row['status']})"


@tool
@@ -121,37 +129,15 @@ async def delete_project(project_id: str) -> str:
    IMPORTANT: prefer update_project(status='archived') unless the user
    has explicitly confirmed they want permanent deletion.
    """
-    return json.dumps({
-        "action": "delete_record",
-        "table": "projects",
-        "data": {"id": project_id},
-    })
+    await execute_on_client(action="delete", table="projects", data={"id": project_id})
+    return f"Project {project_id} permanently deleted."


-@registry.register
-class ProjectAgent(ChatAgent):
-    def get_name(self) -> str:
-        return "project_agent"
-
-    def get_description(self) -> str:
-        return "Manages projects: list, get, create, update, archive, delete"
-
-    def get_tools(self) -> list[Any]:
-        return [
-            list_projects,
-            list_all_projects,
-            get_project,
-            create_project,
-            update_project,
-            delete_project,
-        ]
-
-    async def handle(self, query: str, context: dict[str, Any]) -> str:
-        llm = get_llm()
-        messages = [
-            SystemMessage(content=_SYSTEM_PROMPT),
-            HumanMessage(
-                content=f"User query: {query}\nContext: {json.dumps(context)[:1000]}"
-            ),
-        ]
-        return await self._tool_loop(llm, messages, self.get_tools())
+PROJECT_TOOLS: list[Any] = [
+    list_projects,
+    list_all_projects,
+    get_project,
+    create_project,
+    update_project,
+    delete_project,
+]
--- a/app/agents/task_agent.py
+++ b/app/agents/task_agent.py
@@ -2,16 +2,24 @@

 from __future__ import annotations

-import json
+from datetime import datetime, timezone
+import re
 from typing import Any

-from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_core.tools import tool

-from app.core.agent_registry import ChatAgent, registry
-from app.core.llm import get_llm
+from app.core.ws_context import execute_on_client

-_SYSTEM_PROMPT = (
+_UUID_RE = re.compile(
+    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
+)
+
+
+def _is_uuid(value: str) -> bool:
+    return bool(_UUID_RE.match(value))
+
+
+TASK_SYSTEM_PROMPT = (
    "You are a task management assistant for a project workspace.\n"
    "You create, update, list, and track tasks and their comments.\n\n"
    "Rules:\n"
@@ -22,7 +30,6 @@ _SYSTEM_PROMPT = (
    "  - project_id is optional; link to a project when the user mentions one\n"
    "  - is_ai_suggested: 1 only when proactively proposing a task the user\n"
    "    did not explicitly request; 0 otherwise\n"
-    "  - is_approved defaults to 0; set to 1 only when the user confirms\n"
    "  - Use list_tasks_due_today for 'what's due today' queries\n"
    "  - For update_task, use -1 for integer fields you do not want to change\n"
    "  - Always confirm the action in plain, user-friendly language."
@@ -41,16 +48,25 @@ async def list_tasks(
 ) -> str:
    """List tasks, optionally filtered by project_id, status (todo|in_progress|done),
    a search string, or an order_by field name (dueDate|priority|createdAt)."""
-    return json.dumps({
-        "action": "list",
-        "table": "tasks",
-        "filters": {
-            "projectId": project_id or None,
+    normalized_project_id = project_id if (project_id and _is_uuid(project_id)) else ""
+    result = await execute_on_client(
+        action="select",
+        table="tasks",
+        filters={
+            "projectId": normalized_project_id or None,
            "status": status or None,
            "search": search or None,
            "orderBy": order_by or None,
        },
-    })
+    )
+    rows = result.get("rows", [])
+    if not rows:
+        return "No tasks found matching the given filters."
+    lines = [
+        f"- {r['title']} (status: {r['status']}, priority: {r['priority']}, id: {r['id']})"
+        for r in rows
+    ]
+    return f"Found {len(rows)} task(s):\n" + "\n".join(lines)


@tool
@@ -63,7 +79,6 @@ async def create_task(
    due_date: int = 0,
    project_id: str = "",
    is_ai_suggested: int = 0,
-    is_approved: int = 0,
 ) -> str:
    """Create a new task.
    title: task title (required)
@@ -74,12 +89,11 @@ async def create_task(
    due_date: Unix timestamp in milliseconds; 0 means no due date
    project_id: optional UUID of the parent project
    is_ai_suggested: 1 if proactively suggested, 0 if user-requested
-    is_approved: 0 until the user confirms; 1 when confirmed
    """
-    return json.dumps({
-        "action": "create_record",
-        "table": "tasks",
-        "data": {
+    result = await execute_on_client(
+        action="insert",
+        table="tasks",
+        data={
            "title": title,
            "description": description or None,
            "status": status,
@@ -88,9 +102,13 @@ async def create_task(
            "dueDate": due_date or None,
            "projectId": project_id or None,
            "isAiSuggested": is_ai_suggested,
-            "isApproved": is_approved,
        },
-    })
+    )
+    row = result["row"]
+    return (
+        f"Task created: '{row['title']}' "
+        f"(id: {row['id']}, status: {row['status']}, priority: {row['priority']})"
+    )


@tool
@@ -103,12 +121,10 @@ async def update_task(
    assignees: str = "",
    due_date: int = -1,
    project_id: str = "",
-    is_approved: int = -1,
 ) -> str:
    """Update fields on an existing task. Only pass fields you want to change.
    task_id: the task's UUID (required)
    due_date: -1 means unchanged; 0 clears the due date; any positive value sets it
-    is_approved: -1 means unchanged; 0 or 1 sets the value
    """
    updates: dict[str, Any] = {}
    if title:
@@ -125,32 +141,41 @@ async def update_task(
        updates["dueDate"] = due_date or None
    if project_id:
        updates["projectId"] = project_id
-    if is_approved != -1:
-        updates["isApproved"] = is_approved
-    return json.dumps({
-        "action": "update_record",
-        "table": "tasks",
-        "data": {"id": task_id, "updates": updates},
-    })
+    result = await execute_on_client(
+        action="update",
+        table="tasks",
+        data={"id": task_id, "updates": updates},
+    )
+    row = result["row"]
+    return f"Task updated: '{row['title']}' (id: {row['id']}, status: {row['status']})"


@tool
 async def delete_task(task_id: str) -> str:
    """Delete a task permanently by its UUID."""
-    return json.dumps({
-        "action": "delete_record",
-        "table": "tasks",
-        "data": {"id": task_id},
-    })
+    await execute_on_client(action="delete", table="tasks", data={"id": task_id})
+    return f"Task {task_id} deleted."


@tool
 async def list_tasks_due_today() -> str:
    """List all tasks whose due date falls on today's date."""
-    return json.dumps({
-        "action": "list_due_today",
-        "table": "tasks",
-    })
+    now = datetime.now(tz=timezone.utc)
+    start_ms = int(datetime(now.year, now.month, now.day, tzinfo=timezone.utc).timestamp() * 1000)
+    end_ms = start_ms + 86_400_000 - 1  # last ms of today (UTC day, not the user's local day)
+    result = await execute_on_client(
+        action="select",
+        table="tasks",
+        filters={"dueDateFrom": start_ms, "dueDateTo": end_ms},
+    )
+    rows = result.get("rows", [])
+    if not rows:
+        return "No tasks are due today."
+    lines = [
+        f"- {r['title']} (priority: {r['priority']}, status: {r['status']}, id: {r['id']})"
+        for r in rows
+    ]
+    return f"Tasks due today ({len(rows)}):\n" + "\n".join(lines)


 # ── Task comment tools ────────────────────────────────────────────────
@@ -159,11 +184,16 @@ async def list_tasks_due_today() -> str:
@tool
 async def list_task_comments(task_id: str) -> str:
    """List all comments on a task by its UUID."""
-    return json.dumps({
-        "action": "list",
-        "table": "taskComments",
-        "filters": {"taskId": task_id},
-    })
+    result = await execute_on_client(
+        action="select",
+        table="taskComments",
+        filters={"taskId": task_id},
+    )
+    rows = result.get("rows", [])
+    if not rows:
+        return f"No comments found for task {task_id}."
+    lines = [f"- [{r['author']}]: {r['content']} (id: {r['id']})" for r in rows]
+    return f"Found {len(rows)} comment(s):\n" + "\n".join(lines)


@tool
@@ -173,56 +203,36 @@ async def add_task_comment(task_id: str, author: str, content: str) -> str:
    author: name or ID of the comment author
    content: comment text
    """
-    return json.dumps({
-        "action": "create_record",
-        "table": "taskComments",
-        "data": {
-            "taskId": task_id,
-            "author": author,
-            "content": content,
-        },
-    })
+    result = await execute_on_client(
+        action="insert",
+        table="taskComments",
+        data={"taskId": task_id, "author": author, "content": content},
+    )
+    row = result.get("row", {})
+    row_author = row.get("author", author)
+    # Electron payloads can vary (taskId vs task_id). Fall back to input task_id.
+    row_task_id = row.get("taskId") or row.get("task_id") or task_id
+    row_comment_id = row.get("id", "unknown")
+    return f"Comment added by {row_author} on task {row_task_id} (comment id: {row_comment_id})."


@tool
 async def delete_task_comment(comment_id: str) -> str:
    """Delete a task comment by its UUID."""
-    return json.dumps({
-        "action": "delete_record",
-        "table": "taskComments",
-        "data": {"id": comment_id},
-    })
+    await execute_on_client(action="delete", table="taskComments", data={"id": comment_id})
+    return f"Comment {comment_id} deleted."


 # ── Agent ─────────────────────────────────────────────────────────────


-@registry.register
-class TaskAgent(ChatAgent):
-    def get_name(self) -> str:
-        return "task_agent"
-
-    def get_description(self) -> str:
-        return "Manages tasks and comments: list, create, update, delete, due-today, comments"
-
-    def get_tools(self) -> list[Any]:
-        return [
-            list_tasks,
-            create_task,
-            update_task,
-            delete_task,
-            list_tasks_due_today,
-            list_task_comments,
-            add_task_comment,
-            delete_task_comment,
-        ]
-
-    async def handle(self, query: str, context: dict[str, Any]) -> str:
-        llm = get_llm()
-        messages = [
-            SystemMessage(content=_SYSTEM_PROMPT),
-            HumanMessage(
-                content=f"User query: {query}\nContext: {json.dumps(context)[:1000]}"
-            ),
-        ]
-        return await self._tool_loop(llm, messages, self.get_tools())
+TASK_TOOLS: list[Any] = [
+    list_tasks,
+    create_task,
+    update_task,
+    delete_task,
+    list_tasks_due_today,
+    list_task_comments,
+    add_task_comment,
+    delete_task_comment,
+]
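Review note on `list_tasks_due_today`: the window is computed from the UTC calendar day, so for a user east or west of UTC "today" can start or end several hours off. A minimal sketch of a timezone-aware variant follows. `day_bounds_ms` and its `tz` parameter are hypothetical; how the server would learn the client's timezone is not covered by this diff and is left as an assumption.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def day_bounds_ms(now: datetime, tz: tzinfo = timezone.utc) -> tuple[int, int]:
    """Return (first_ms, last_ms) of the calendar day containing `now`, as seen in `tz`."""
    local = now.astimezone(tz)
    # Midnight at the start of the local day.
    start = local.replace(hour=0, minute=0, second=0, microsecond=0)
    # Midnight at the start of the next local day.
    end = start + timedelta(days=1)
    start_ms = int(start.timestamp() * 1000)
    end_ms = int(end.timestamp() * 1000) - 1  # last millisecond of the local day
    return start_ms, end_ms


# Same instant, two different "todays".
now = datetime(2026, 4, 7, 23, 30, tzinfo=timezone.utc)
utc_start, utc_end = day_bounds_ms(now)
cest_start, cest_end = day_bounds_ms(now, timezone(timedelta(hours=2)))
```

With the default `tz` the result matches the UTC window used in the diff; passing the user's zone shifts both bounds so they can be fed to the same `dueDateFrom` / `dueDateTo` filters.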
--- a/app/agents/timeline_agent.py
+++ b/app/agents/timeline_agent.py
@@ -0,0 +1,114 @@
+"""Timeline agent — project milestone management (list, create, update, delete)."""
+
+from __future__ import annotations
+
+import re
+from typing import Any
+
+from langchain_core.tools import tool
+
+from app.core.ws_context import execute_on_client
+
+_UUID_RE = re.compile(
+    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
+)
+
+
+def _is_uuid(value: str) -> bool:
+    return bool(_UUID_RE.match(value))
+
+
+TIMELINE_SYSTEM_PROMPT = (
+    "You are a project timeline assistant. Timelines are milestone dates that\n"
+    "track progress on a project — they are not calendar events.\n\n"
+    "Rules:\n"
+    "  - project_id is REQUIRED for every create; confirm with the user if unknown\n"
+    "  - For listing, project_id must be a UUID; never pass plain names as project_id\n"
+    "  - date is a Unix timestamp in milliseconds; convert human-readable dates\n"
+    "  - is_ai_suggested: 1 when proactively proposing a timeline, 0 otherwise\n"
+    "  - For update_timeline, use -1 for integer fields you do not want to change\n"
+    "  - Listing without a project_id returns all timelines across projects\n"
+    "  - Always echo the title and formatted date in your confirmation."
+)
+
+
+@tool
+async def list_timelines(project_id: str = "") -> str:
+    """List timelines. Provide project_id to scope to a specific project."""
+    normalized_project_id = project_id if (project_id and _is_uuid(project_id)) else ""
+    result = await execute_on_client(
+        action="select",
+        table="timelines",
+        filters={"projectId": normalized_project_id or None},
+    )
+    rows = result.get("rows", [])
+    if not rows:
+        return "No timelines found."
+    lines = [f"- {r['title']} (date: {r['date']}, id: {r['id']})" for r in rows]
+    return f"Found {len(rows)} timeline(s):\n" + "\n".join(lines)
+
+
+@tool
+async def create_timeline(
+    project_id: str,
+    title: str,
+    date: int,
+    is_ai_suggested: int = 0,
+) -> str:
+    """Create a project timeline (milestone).
+    project_id: REQUIRED UUID of the parent project
+    title: descriptive name for the milestone
+    date: Unix timestamp in milliseconds
+    is_ai_suggested: 1 if proactively suggested, 0 if user-requested
+    """
+    result = await execute_on_client(
+        action="insert",
+        table="timelines",
+        data={
+            "projectId": project_id,
+            "title": title,
+            "date": date,
+            "isAiSuggested": is_ai_suggested,
+        },
+    )
+    row = result["row"]
+    return f"Timeline created: '{row['title']}' (id: {row['id']}, date: {row['date']})"
+
+
+@tool
+async def update_timeline(
+    timeline_id: str,
+    title: str = "",
+    date: int = -1,
+) -> str:
+    """Update a timeline. Only pass fields that should change.
+    timeline_id: UUID of the timeline (required)
+    date: -1 means unchanged; any other value sets the new date (ms timestamp)
+    """
+    updates: dict[str, Any] = {}
+    if title:
+        updates["title"] = title
+    if date != -1:
+        updates["date"] = date
+    result = await execute_on_client(
+        action="update",
+        table="timelines",
+        data={"id": timeline_id, "updates": updates},
+    )
+    row = result["row"]
+    return f"Timeline updated: '{row['title']}' (id: {row['id']})"
+
+
+@tool
+async def delete_timeline(timeline_id: str) -> str:
+    """Delete a timeline permanently by its UUID."""
+    await execute_on_client(action="delete", table="timelines", data={"id": timeline_id})
+    return f"Timeline {timeline_id} deleted."
+
+
+TIMELINE_TOOLS: list[Any] = [
+    list_timelines,
+    create_timeline,
+    update_timeline,
+    delete_timeline,
+]
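Review note on `_UUID_RE` / `_is_uuid`: the same pattern and helper are now defined in `note_agent.py`, `task_agent.py`, and `timeline_agent.py`. A shared helper could look roughly like the sketch below. The function names `is_uuid` and `normalize_project_id` are hypothetical, as is the idea of a shared module; only the regex is taken from the diff.

```python
import re
import uuid
from typing import Optional

# Pattern copied from the diff: accepts UUID versions 1-5 with an RFC 4122 variant.
_UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
)


def is_uuid(value: str) -> bool:
    """True when `value` matches the UUID pattern used by the list_* tools."""
    return bool(_UUID_RE.match(value))


def normalize_project_id(project_id: str) -> Optional[str]:
    """Return the id when it looks like a UUID, otherwise None (an unscoped listing)."""
    return project_id if project_id and is_uuid(project_id) else None


generated = str(uuid.uuid4())
scoped = normalize_project_id(generated)
unscoped = normalize_project_id("Website redesign")
```

One behaviour worth keeping in mind: because the version digit is restricted to 1-5, an id with any other version digit (for example a version 7 UUID, or the all-zero nil UUID) is treated as "not a UUID", and the listing tools then fall back to returning rows for all projects rather than reporting an error.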
--- a/app/api/middleware/auth.py
+++ b/app/api/middleware/auth.py
@@ -55,11 +55,26 @@ async def get_current_user(
        raise credentials_exc

    # Live tier lookup — subscription row is the authoritative source.
-    from app.models import Subscription  # noqa: PLC0415
+    # In dev, fall back to 'power' (unlimited) so quota limits don't
+    # block local development when no Stripe subscription exists.
+    from app.models import Subscription, User  # noqa: PLC0415

    result = await db.execute(
        select(Subscription.tier).where(Subscription.user_id == user_id)
    )
-    tier: str = result.scalar_one_or_none() or "free"
+    default_tier = "power" if settings.ENV == "dev" else "free"
+    tier: str = result.scalar_one_or_none() or default_tier

-    return UserProfile(id=user_id, email=email, tier=tier)  # type: ignore[arg-type]
+    # Fetch name/surname from user row.
+    user_result = await db.execute(
+        select(User.name, User.surname).where(User.id == user_id)
+    )
+    user_row = user_result.one_or_none()
+
+    return UserProfile(
+        id=user_id,
+        email=email,
+        name=user_row.name if user_row else None,
+        surname=user_row.surname if user_row else None,
+        tier=tier,
+    )  # type: ignore[arg-type]
--- a/app/api/routes/agent_setup.py
+++ b/app/api/routes/agent_setup.py
@@ -0,0 +1,495 @@
+"""Chatbot Journey — WS-based guided conversation to build an AgentConfig.
+
+The journey is driven entirely through WebSocket frames (no REST endpoints).
+The device WS handler dispatches ``journey_start`` and ``journey_message``
+frames to the functions exported here.
+
+Journey flow:
+  1. FE sends ``journey_start`` frame with basic agent info (directory,
+     data_types, schedule).
+  2. Server creates an in-memory session, sets up a WS executor so the
+     setup LLM can use file-system tools, does a first directory scrape,
+     and sends back a ``journey_reply`` with the first question.
+  3. FE sends ``journey_message`` frames for each user reply.
+  4. Server appends the user message, calls the LLM (which may read files
+     via tools), and sends back a ``journey_reply``.
+  5. After 3-5 turns the LLM wraps up by emitting an ``AgentConfig`` JSON
+     block delimited by ``AGENT_CONFIG_START`` / ``AGENT_CONFIG_END``.
+  6. Server parses and validates the JSON with Pydantic, sends
+     ``journey_reply`` with ``done=True`` and the serialised config.
+     FE stores it locally.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import time
+import uuid
+from dataclasses import dataclass, field
+from typing import Any
+
+from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
+
+from app.agents.filesystem_agent import FILESYSTEM_TOOLS
+from app.config.settings import settings
+from app.core.langfuse_client import compile_prompt, extract_usage, get_langfuse, get_prompt_or_fallback
+from app.core.llm import get_llm
+from app.schemas import AgentConfig
+
+logger = logging.getLogger(__name__)
+
+# ── Journey constants ─────────────────────────────────────────────────────
+
+_SESSION_TTL_SECONDS: int = 1800  # 30 minutes
+
+# Sentinel strings used to delimit the LLM-produced AgentConfig JSON.
+_CONFIG_START = "AGENT_CONFIG_START"
+_CONFIG_END = "AGENT_CONFIG_END"
+
+# Minimum turns before we consider nudging the LLM to wrap up.
+_MIN_TURNS_BEFORE_NUDGE: int = 3
+# Hard cap to avoid infinite loops (safety net, not the primary stopping criterion).
+_MAX_TURNS: int = 15
+# Max tool-calling steps per LLM invocation.
+_MAX_TOOL_STEPS: int = 6
+
+# ── In-memory session store ───────────────────────────────────────────────
+
+
+@dataclass
+class JourneySession:
+    session_id: str
+    user_id: str
+    agent_type: str  # "local" | "cloud"
+    directory: str
+    data_types: list[str]
+    history: list[dict[str, Any]] = field(default_factory=list)
+    system_prompt: str = ""
+    langfuse_prompt: Any = None
+    created_at: float = field(default_factory=time.monotonic)
+
+    def is_expired(self) -> bool:
+        return (time.monotonic() - self.created_at) > _SESSION_TTL_SECONDS
+
+
+# session_id → session
+_sessions: dict[str, JourneySession] = {}
+
+
+def get_journey_session(session_id: str, user_id: str) -> JourneySession | None:
+    """Retrieve session; return None on missing, expired, or wrong owner."""
+    s = _sessions.get(session_id)
+    if s is None or s.is_expired():
+        _sessions.pop(session_id, None)
+        return None
+    if s.user_id != user_id:
+        return None
+    return s
+
+
+# ── System prompt ─────────────────────────────────────────────────────────
+
+_JOURNEY_SYSTEM_PROMPT = """\
+You are a friendly assistant helping a freelancer configure a data-extraction agent.
+Your job is to understand what files the user has in their directory and produce a
+structured AgentConfig JSON that the extraction agent will use as its instruction set.
+
+You have access to file-system tools to explore the user's directory:
+- list_directory: see folder structure and file names
+- read_file_content: peek at a file's content
+- get_file_metadata: check file size, extension, dates
+
+The user's configured directory is: {directory}
+Target data types: {data_types}
+
+## Your process
+
+### Step 1 — Explore the directory
+Use list_directory and read_file_content to understand what types of files are present
+(HTML emails, plain-text documents, CSVs, etc.).
+
+### Step 2 — Identify content types
+For each distinct file type found, decide:
+- A short id (e.g. "email_html", "plain_text", "csv")
+- Which preprocessing handler to use: "email_html" for HTML emails, "generic" for everything else
+- A human-readable label and optional detection_hint
+
+### Step 3 — Ask focused questions (one at a time)
+Cover these topics based on what you discovered:
+1. How to map content to entity types (task / note / timeline entry)
+2. Field mapping rules (e.g. email Subject → task title, filename → note title)
+3. Priority or status rules (e.g. "urgent" in subject → high priority)
+4. Date extraction (e.g. "by Friday" → dueDate)
+5. Exclusion rules (e.g. skip newsletters, skip files with no project match)
+
+### Step 4 — Produce the AgentConfig JSON
+Once you are ≥ 90% confident, output the final config between these exact markers
+(each on its own line):
+
+{config_start}
+{{
+  "content_types": [
+    {{
+      "id": "email_html",
+      "label": "Email HTML",
+      "detection_hint": "HTML file with From/To/Subject headers",
+      "preprocessing": "email_html",
+      "extraction_prompt": "Detailed extraction instructions for this content type..."
+    }}
+  ],
+  "global_rules": [
+    "If the file cannot be matched to any project, do not create any entity."
+  ],
+  "data_types": {data_types_json}
+}}
+{config_end}
+
+## Rules for the extraction_prompt field
+- Describe when to create a task vs note vs timeline entry (be specific and concrete)
+- Include field mapping rules based on what you found in the directory
+- Include priority/status/date rules if applicable
+- Do NOT include projectId logic — the runner handles project assignment automatically
+- Do NOT mention isAiSuggested — the runner always sets it to 1
+
+## Constraints
+- Never ask about projects, projectId, or how to link records to projects
+- Never include projectId or project creation logic in the generated config
+- Keep asking questions until ≥ 90% confident, then output the JSON immediately
+
+{existing_section}\
+Begin by exploring the directory, then ask your first question.\
+"""
+
+
+def _build_system_prompt(
+    directory: str,
+    data_types: list[str],
+    existing_config: str | None = None,
+) -> tuple[str, Any]:
+    """Return ``(compiled_system_prompt, langfuse_prompt_obj_or_None)``."""
+    existing_section = (
+        "\nThe user already has the following AgentConfig — refine it based on their answers:\n"
+        f"```json\n{existing_config}\n```\n"
+        if existing_config
+        else ""
+    )
+    template, prompt_obj = get_prompt_or_fallback(
+        "journey_system", _JOURNEY_SYSTEM_PROMPT
+    )
+    compiled = compile_prompt(
+        template,
+        prompt_obj,
+        directory=directory,
+        data_types=", ".join(data_types),
+        data_types_json=json.dumps(data_types),
+        config_start=_CONFIG_START,
+        config_end=_CONFIG_END,
+        existing_section=existing_section,
+    )
+    return compiled, prompt_obj
+
+
+# ── AgentConfig extraction ────────────────────────────────────────────────
+
+
+def _extract_agent_config(text: str) -> str | None:
+    """Return validated AgentConfig JSON string from between markers, or None.
+
+    Parses the JSON with Pydantic to ensure it conforms to the schema before
+    returning.  Returns None if markers are absent or JSON is invalid.
+    """
+    if _CONFIG_START not in text or _CONFIG_END not in text:
+        return None
+    start_idx = text.index(_CONFIG_START) + len(_CONFIG_START)
+    end_idx = text.index(_CONFIG_END)
+    raw = text[start_idx:end_idx].strip()
+    if not raw:
+        return None
+    try:
+        parsed = AgentConfig.model_validate_json(raw)
+        return parsed.model_dump_json()
+    except Exception as exc:
+        logger.warning("agent_setup: failed to parse AgentConfig JSON: %s", exc)
+        return None
+
+
+# ── LLM call with tool support ───────────────────────────────────────────
+
+
+def _as_text(content: Any) -> str:
+    if content is None:
+        return ""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        parts: list[str] = []
+        for item in content:
+            if isinstance(item, str):
+                parts.append(item)
+            elif isinstance(item, dict):
+                text = item.get("text")
+                if isinstance(text, str):
+                    parts.append(text)
+        return "".join(parts)
+    return str(content)
+
+
+async def _call_llm_with_tools(
+    system_prompt: str,
+    history: list[dict[str, Any]],
+    tools: list[Any],
+    *,
+    user_id: str = "",
+    session_id: str = "",
+    langfuse_prompt: Any = None,
+) -> str:
+    """Build LangChain messages from history and invoke the LLM with tools.
+
+    Handles tool-calling loops: if the LLM calls tools, execute them and
+    continue until a final text response is produced.
+    """
+    lf = get_langfuse()
+    messages: list[Any] = [SystemMessage(content=system_prompt)]
+    for turn in history:
+        if turn["role"] == "user":
+            messages.append(HumanMessage(content=turn["content"]))
+        else:
+            messages.append(AIMessage(content=turn["content"]))
+
+    llm = get_llm(model=None, temperature=0.4)
+    llm_with_tools = llm.bind_tools(tools)
+    tool_map = {tool_def.name: tool_def for tool_def in tools}
+
+    _span_ctx = (
+        lf.start_as_current_observation(
+            as_type="span",
+            name="journey-setup",
+            metadata={"user_id": user_id or None, "session_id": session_id or None},
+            input=history[-1]["content"] if history else "",
+        )
+        if lf else None
+    )
+    _span = _span_ctx.__enter__() if _span_ctx else None
+
+    try:
+        for _ in range(_MAX_TOOL_STEPS):
+            _gen_ctx = (
+                lf.start_as_current_observation(
+                    as_type="generation",
+                    name="journey-setup-llm",
+                    model=settings.LLM_MODEL,
+                    prompt=langfuse_prompt,
+                    input=messages,
+                )
+                if lf else None
+            )
+            _gen = _gen_ctx.__enter__() if _gen_ctx else None
+            response: AIMessage = await llm_with_tools.ainvoke(messages)
+            if _gen_ctx:
+                _gen.update(output=_as_text(response.content), usage=extract_usage(response))
+                _gen_ctx.__exit__(None, None, None)
+
+            messages.append(response)
+
+            if not response.tool_calls:
+                if _span:
+                    _span.update(output=_as_text(response.content))
+                return _as_text(response.content)
+
+            for call in response.tool_calls:
+                call_name = str(call.get("name", ""))
+                call_args = call.get("args", {})
+                logger.info(
+                    "agent_setup: journey tool_call name=%s args=%s",
+                    call_name,
+                    json.dumps(call_args, ensure_ascii=True)[:500],
+                )
+
+                tool_fn = tool_map.get(call_name)
+                if tool_fn is None:
+                    tool_output = f"Unknown tool: {call_name}"
+                else:
+                    tool_output = await tool_fn.ainvoke(call_args)
+
+                logger.info(
+                    "agent_setup: journey tool_result name=%s output=%s",
+                    call_name,
+                    str(tool_output)[:800],
+                )
+                messages.append(ToolMessage(content=str(tool_output), tool_call_id=call["id"]))
+
+        # Fallback: exceeded max steps.
+        final = await llm.ainvoke(messages)
+        final_text = _as_text(final.content)
+        if _span:
+            _span.update(output=final_text)
+        return final_text
+    finally:
+        if _span_ctx:
+            _span_ctx.__exit__(None, None, None)
+        if lf:
+            lf.flush()
+
+
+# ── Journey handlers (called from device_ws.py) ──────────────────────────
+
+
+async def handle_journey_start(
+    user_id: str,
+    frame: dict[str, Any],
+) -> dict[str, Any]:
+    """Handle a ``journey_start`` WS frame.
+
+    Creates a session, runs the setup LLM with directory exploration,
+    and returns the ``journey_reply`` payload.
+    """
+    agent_type = frame.get("agent_type", "local")
+    directory = frame.get("directory", "")
+    data_types = frame.get("data_types", [])
+    existing_config = frame.get("existing_config")
+
+    # Use the session_id provided by the FE so the reply matches the
+    # listener key; fall back to a generated one if absent.
+    session_id = frame.get("session_id") or str(uuid.uuid4())
+    system_prompt, langfuse_prompt = _build_system_prompt(directory, data_types, existing_config)
+
+    session = JourneySession(
+        session_id=session_id,
+        user_id=user_id,
+        agent_type=agent_type,
+        directory=directory,
+        data_types=data_types,
+        system_prompt=system_prompt,
+        langfuse_prompt=langfuse_prompt,
+    )
+
+    # Seed with an initial user message — some providers require at least one
+    # user/input message to be present.
+    seed_history: list[dict[str, Any]] = [
+        {"role": "user", "content": "Hi, I'm ready to set up my agent. Please explore my directory and ask me your first question."},
+    ]
+    ai_reply = await _call_llm_with_tools(
+        system_prompt=system_prompt,
+        history=seed_history,
+        tools=list(FILESYSTEM_TOOLS),
+        user_id=user_id,
+        session_id=session_id,
+        langfuse_prompt=langfuse_prompt,
+    )
+
+    session.history.extend(seed_history)
+    session.history.append({"role": "assistant", "content": ai_reply})
+    _sessions[session_id] = session
+
+    logger.info(
+        "agent_setup: journey session %s started for user %s (directory=%s)",
+        session_id,
+        user_id,
+        directory,
+    )
+
+    # Check if the LLM produced the config on the first turn (unlikely but possible).
+    agent_config = _extract_agent_config(ai_reply)
+    done = agent_config is not None
+
+    display_message = ai_reply
+    if done:
+        display_message = (
+            ai_reply[: ai_reply.index(_CONFIG_START)].strip()
+            or "Here is your agent configuration. You can save it or continue refining."
+        )
+        _sessions.pop(session_id, None)
+
+    return {
+        "type": "journey_reply",
+        "session_id": session_id,
+        "message": display_message,
+        "done": done,
+        "agent_config": agent_config,
+    }
+
+
+async def handle_journey_message(
+    user_id: str,
+    frame: dict[str, Any],
+) -> dict[str, Any]:
+    """Handle a ``journey_message`` WS frame.
+
+    Appends the user message, calls the LLM, and returns the
+    ``journey_reply`` payload.
+    """
+    session_id = frame.get("session_id", "")
+    message = frame.get("message", "")
+
+    session = get_journey_session(session_id, user_id)
+    if session is None:
+        return {
+            "type": "journey_reply",
+            "session_id": session_id,
+            "message": "Journey session not found or expired. Please start a new setup.",
+            "done": True,
+            "agent_config": None,
+        }
+
+    # Append user turn.
+    session.history.append({"role": "user", "content": message})
+
+    # Call the LLM with tools.
+    ai_reply = await _call_llm_with_tools(
+        system_prompt=session.system_prompt,
+        history=session.history,
+        tools=list(FILESYSTEM_TOOLS),
+        user_id=session.user_id,
+        session_id=session_id,
+        langfuse_prompt=session.langfuse_prompt,
+    )
+
+    session.history.append({"role": "assistant", "content": ai_reply})
+
+    # Check if the LLM produced the final config.
+    agent_config = _extract_agent_config(ai_reply)
+    done = agent_config is not None
+
+    # If the LLM didn't produce a config, nudge it once it hits the hard safety cap.
+    if not done:
+        turns = sum(1 for t in session.history if t["role"] == "user")
+        if turns >= _MAX_TURNS:
+            nudge_content = (
+                "[System: You have enough information. Please generate the final "
+                f"AgentConfig JSON now, wrapped in {_CONFIG_START} / {_CONFIG_END} markers.]"
+            )
+            session.history.append({"role": "user", "content": nudge_content})
+
+            nudge_reply = await _call_llm_with_tools(
+                system_prompt=session.system_prompt,
+                history=session.history,
+                tools=list(FILESYSTEM_TOOLS),
+                user_id=session.user_id,
+                session_id=session_id,
+                langfuse_prompt=session.langfuse_prompt,
+            )
+            session.history.append({"role": "assistant", "content": nudge_reply})
+
+            agent_config = _extract_agent_config(nudge_reply)
+            if agent_config is not None:
+                done = True
+                ai_reply = nudge_reply
+
+    display_message = ai_reply
+    if done:
+        display_message = (
+            ai_reply[: ai_reply.index(_CONFIG_START)].strip()
+            if _CONFIG_START in ai_reply
+            else "Here is your agent configuration. You can save it or continue refining."
+        )
+        _sessions.pop(session_id, None)
+        logger.info("agent_setup: journey session %s completed for user %s", session_id, user_id)
+
+    return {
+        "type": "journey_reply",
+        "session_id": session_id,
+        "message": display_message,
+        "done": done,
+        "agent_config": agent_config,
+    }
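Review note on `_extract_agent_config`: the marker-delimited extraction above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in this change: it uses plain `json.loads` where the diff validates with the `AgentConfig` Pydantic model, and `split_reply` is a hypothetical helper name. It also looks for the end marker only after the start marker.

```python
import json
from typing import Optional, Tuple

# Sentinel strings copied from the diff above.
CONFIG_START = "AGENT_CONFIG_START"
CONFIG_END = "AGENT_CONFIG_END"


def split_reply(text: str) -> Tuple[str, Optional[dict]]:
    """Return (display_text, parsed_config_or_None) for one LLM reply.

    Hypothetical helper: json.loads stands in for AgentConfig validation.
    """
    start = text.find(CONFIG_START)
    if start == -1:
        return text.strip(), None
    body_start = start + len(CONFIG_START)
    # Search for the end marker only after the start marker.
    end = text.find(CONFIG_END, body_start)
    if end == -1:
        return text.strip(), None
    raw = text[body_start:end].strip()
    try:
        config = json.loads(raw)
    except json.JSONDecodeError:
        return text.strip(), None
    if not isinstance(config, dict):
        return text.strip(), None
    # Everything before the start marker is what the user should see.
    return text[:start].strip(), config
```

Returning the display text and the parsed config together would let both journey handlers share one code path for stripping the JSON block from the user-facing message.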
--- a/app/api/routes/agents.py
+++ b/app/api/routes/agents.py
@@ -0,0 +1,222 @@
+"""Agent routes.
+
+Backend responsibilities are intentionally minimal:
+    GET  /agents/catalog         — static catalog for UI display
+    POST /agents/can-create      — billing eligibility check
+    POST /agents/trigger         — trigger a local agent run
+
+Agent configuration is owned by the Electron app and is not persisted
+in backend agent-config tables.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import uuid
+from datetime import datetime, timezone
+
+from fastapi import APIRouter, Depends, HTTPException, status
+from sqlalchemy import func, select
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from app.api.deps import get_current_user
+from app.billing.tier_manager import FEATURES
+from app.core.agent_runner import is_agent_running, run_local_agent
+from app.core.device_manager import device_manager
+from app.db import get_session
+from app.models import AgentRunLog, LocalAgentConfig
+from app.schemas import (
+    AgentCatalogItem,
+    AgentCreationCheckRequest,
+    AgentCreationCheckResponse,
+    AgentRunLogResponse,
+    AgentTriggerRequest,
+    UserProfile,
+)
+
+router = APIRouter(prefix="/agents", tags=["agents"])
+
+
+# ── Datetime helpers ──────────────────────────────────────────────────
+
+def _dt_ms(dt: datetime) -> int:
+    return int(dt.timestamp() * 1000)
+
+
+def _dt_ms_opt(dt: datetime | None) -> int | None:
+    return int(dt.timestamp() * 1000) if dt else None
+
+
+def _to_data_types(values: list[str]) -> list[str]:
+    normalize = {
+        "task": "tasks",           "tasks": "tasks",
+        "note": "notes",           "notes": "notes",
+        "timeline": "timelines",   "timelines": "timelines",   "timelineEvents": "timelines",
+        "project": "projects",     "projects": "projects",
+    }
+    seen: set[str] = set()
+    result: list[str] = []
+    for v in values:
+        mapped = normalize.get(v)
+        if mapped and mapped not in seen:
+            seen.add(mapped)
+            result.append(mapped)
+    return result
+
+
+def _to_run_log_response(log: AgentRunLog) -> AgentRunLogResponse:
+    return AgentRunLogResponse(
+        id=log.id,
+        agent_id=log.agent_id,
+        agent_type=log.agent_type,  # type: ignore[arg-type]
+        status=log.status,  # type: ignore[arg-type]
+        items_processed=log.items_processed,
+        items_created=log.items_created,
+        errors=log.errors or [],
+        started_at=_dt_ms(log.started_at),
+        completed_at=_dt_ms_opt(log.completed_at),
+    )
+
+
+def _enforce_agent_limit(tier: str, current_count: int) -> int:
+    limit: int = FEATURES.get(tier, FEATURES["free"])["batch_active"]
+    if limit != -1 and current_count >= limit:
+        raise HTTPException(
+            status_code=status.HTTP_403_FORBIDDEN,
+            detail=f"Agent limit ({limit}) reached for your tier. Upgrade to create more.",
+        )
+    return limit
+
+
+async def _enforce_run_frequency(
+    tier: str,
+    user_id: str,
+    db: AsyncSession,
+) -> None:
+    """Raise HTTP 402 if the user has exceeded their daily batch run limit."""
+    limit: int = FEATURES.get(tier, FEATURES["free"])["batch_runs_per_day"]
+    if limit == -1:
+        return  # unlimited
+
+    today_start = datetime.now(timezone.utc).replace(
+        hour=0, minute=0, second=0, microsecond=0
+    )
+    result = await db.execute(
+        select(func.count(AgentRunLog.id)).where(
+            AgentRunLog.user_id == user_id,
+            AgentRunLog.started_at >= today_start,
+        )
+    )
+    runs_today: int = result.scalar_one()
+
+    if runs_today >= limit:
+        raise HTTPException(
+            status_code=status.HTTP_402_PAYMENT_REQUIRED,
+            detail=f"Daily batch run limit ({limit}) reached for your tier. Upgrade for more runs.",
+        )
+
+
+# ── Catalog ───────────────────────────────────────────────────────────
+
+@router.get("/catalog", response_model=list[AgentCatalogItem])
+async def get_agent_catalog(
+    current_user: UserProfile = Depends(get_current_user),
+) -> list[AgentCatalogItem]:
+    """Return the static list of available agent types and their descriptions."""
+    return [
+        AgentCatalogItem(
+            type="local_directory",
+            name="Local Directory Monitor",
+            description="Watches local directories, extracts data from files using AI",
+        ),
+        AgentCatalogItem(
+            type="gmail",
+            name="Gmail Connector",
+            description="Scans Gmail inbox, extracts tasks/notes from emails",
+        ),
+        AgentCatalogItem(
+            type="teams",
+            name="Microsoft Teams Connector",
+            description="Monitors Teams messages, extracts action items",
+        ),
+        AgentCatalogItem(
+            type="outlook",
+            name="Outlook Connector",
+            description="Scans Outlook inbox, extracts tasks/notes",
+        ),
+    ]
+
+
+@router.post("/can-create", response_model=AgentCreationCheckResponse)
+async def can_create_agent(
+    body: AgentCreationCheckRequest,
+    current_user: UserProfile = Depends(get_current_user),
+) -> AgentCreationCheckResponse:
+    """Check if the user can create one more agent based on billing tier.
+
+    Since configuration is client-owned, the Electron app sends its current
+    active agent count and the backend applies tier limits.
+    """
+    limit: int = FEATURES.get(current_user.tier, FEATURES["free"])["batch_active"]
+    allowed = limit == -1 or body.active_agents < limit
+    return AgentCreationCheckResponse(
+        allowed=allowed,
+        tier=current_user.tier,
+        active_agents=body.active_agents,
+        limit=limit,
+    )
+
+
+@router.post("/trigger", response_model=AgentRunLogResponse, status_code=status.HTTP_202_ACCEPTED)
+async def trigger_agent_run(
+    body: AgentTriggerRequest,
+    current_user: UserProfile = Depends(get_current_user),
+    db: AsyncSession = Depends(get_session),
+) -> AgentRunLogResponse:
+    """Trigger a local agent run using client-provided configuration."""
+    _enforce_agent_limit(current_user.tier, body.active_agents)
+    await _enforce_run_frequency(current_user.tier, current_user.id, db)
+
+    config = LocalAgentConfig(
+        id=str(uuid.uuid4()),
+        user_id=current_user.id,
+        device_id=body.device_id,
+        name="Local Directory Monitor",
+        directory_paths=[body.directory],
+        data_types=_to_data_types(body.what_to_extract),
+        prompt_template=body.custom_agent_prompt,
+        file_extensions=[],
+        schedule_cron=body.batch_interval,
+        enabled=True,
+    )
+
+    # Use the FE's stable agent_id if provided, fall back to the ephemeral config id.
+    stable_agent_id = body.agent_id or config.id
+
+    if is_agent_running(stable_agent_id):
+        raise HTTPException(
+            status_code=status.HTTP_409_CONFLICT,
+            detail="Agent is already running. Only one run per agent is allowed at a time.",
+        )
+
+    run_log = AgentRunLog(
+        agent_id=stable_agent_id,
+        agent_type="local",
+        user_id=current_user.id,
+        status="running",
+    )
+    db.add(run_log)
+    await db.commit()
+    await db.refresh(run_log)
+
+    run_context = {
+        "type": "agent_batch",
+        "run_id": run_log.id,
+        "agent_id": stable_agent_id,
+    }
+
+    asyncio.create_task(
+        run_local_agent(current_user.id, config, run_log, device_manager, run_context)
+    )
+
+    return _to_run_log_response(run_log)
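Review note on the tier checks in `agents.py`: both `_enforce_agent_limit` and `_enforce_run_frequency` follow the same pattern, where `-1` means unlimited and an unknown tier falls back to `free`. The sketch below isolates that pattern. The `FEATURES` values here are made up for illustration (the real table lives in `app.billing.tier_manager`), and `has_headroom` and `utc_day_start` are hypothetical helper names.

```python
from datetime import datetime, timezone

UNLIMITED = -1

# Hypothetical tier table; the numbers are illustrative only.
FEATURES = {
    "free": {"batch_active": 1, "batch_runs_per_day": 3},
    "pro": {"batch_active": UNLIMITED, "batch_runs_per_day": UNLIMITED},
}


def has_headroom(tier: str, feature: str, current: int) -> bool:
    """True if `current` usage is still below the tier's limit for `feature`."""
    limit = FEATURES.get(tier, FEATURES["free"])[feature]
    return limit == UNLIMITED or current < limit


def utc_day_start(now: datetime) -> datetime:
    """Midnight UTC of the day containing `now` (expects an aware datetime)."""
    return now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
```

A route could then map a `False` result to whichever status code it prefers (403 for the active-agent cap and 402 for the daily run cap in this diff), keeping the comparison logic in one place.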
--- a/app/api/routes/auth.py
+++ b/app/api/routes/auth.py
@@ -13,6 +13,7 @@ import uuid
 from datetime import datetime, timedelta, timezone

 import bcrypt
+from cryptography.fernet import Fernet
 from fastapi import APIRouter, Depends, HTTPException, status
 from jose import jwt
 from pydantic import BaseModel
@@ -65,6 +66,8 @@ def _make_access_token(user_id: str, email: str, tier: str) -> tuple[str, int]:
 class _RegisterRequest(BaseModel):
    email: str
    password: str
+    name: str | None = None
+    surname: str | None = None


 class _LoginRequest(BaseModel):
@@ -92,8 +95,11 @@ async def register(
    user = User(
        id=str(uuid.uuid4()),
        email=body.email,
+        name=body.name,
+        surname=body.surname,
        password_hash=_hash_password(body.password),
        tier="free",
+        encryption_key=Fernet.generate_key().decode(),
    )
    db.add(user)
    await db.flush()  # get user.id without committing
@@ -191,7 +197,39 @@ async def refresh(
    )


+class _UpdateProfileRequest(BaseModel):
+    name: str | None = None
+    surname: str | None = None
+
+
 @router.get("/me", response_model=UserProfile)
 async def me(current_user: UserProfile = Depends(get_current_user)) -> UserProfile:
    """Return the profile for the authenticated user."""
    return current_user
+
+
+@router.put("/me", response_model=UserProfile)
+async def update_profile(
+    body: _UpdateProfileRequest,
+    current_user: UserProfile = Depends(get_current_user),
+    db: AsyncSession = Depends(get_session),
+) -> UserProfile:
+    """Update the authenticated user's name and surname."""
+    result = await db.execute(select(User).where(User.id == current_user.id))
+    user = result.scalar_one()
+
+    if body.name is not None:
+        user.name = body.name
+    if body.surname is not None:
+        user.surname = body.surname
+
+    await db.commit()
+    await db.refresh(user)
+
+    return UserProfile(
+        id=user.id,
+        email=user.email,
+        name=user.name,
+        surname=user.surname,
+        tier=current_user.tier,
+    )
--- a/app/api/routes/chat.py
+++ b/app/api/routes/chat.py
@@ -1,78 +1,29 @@
-"""Chat routes: POST /chat and WebSocket /chat/stream."""
+"""Chat routes: POST /chat (REST fallback).
+
+WebSocket chat is handled by the unified device WS endpoint (/api/v1/ws/device).
+"""

 from __future__ import annotations

-import asyncio
-import json
-
-from fastapi import APIRouter, Depends, WebSocket, WebSocketDisconnect
+from fastapi import APIRouter, Depends
 from fastapi.responses import JSONResponse
-from jose import JWTError, jwt

 from app.api.deps import get_current_user
-from app.config.settings import settings
-from app.core.orchestrator import orchestrate, orchestrate_stream
+from app.core.deep_agent import run_home
 from app.schemas import ChatRequest, UserProfile

 router = APIRouter(prefix="/chat", tags=["chat"])

-_HEARTBEAT_INTERVAL = 30  # seconds
-

 @router.post("")
 async def chat(
    body: ChatRequest,
    current_user: UserProfile = Depends(get_current_user),
 ) -> JSONResponse:
-    """Route a chat message through the orchestrator.
-
-    Returns ``ChatResponse`` for ``execution_mode='direct'``,
-    or ``ExecutionPlan`` for ``execution_mode='plan'``.
-    """
-    result = await orchestrate(body)
-    return JSONResponse(content=result.model_dump())
-
-
-@router.websocket("/stream")
-async def chat_stream(websocket: WebSocket) -> None:
-    """Streaming chat via WebSocket.
-
-    Auth: ``?token=<jwt>`` query param (Bearer not possible during WS handshake).
-
-    Protocol:
-      1. Client sends ``ChatRequest`` as the first JSON text frame.
-      2. Server streams response text chunks.
-      3. Final frame: JSON ``{"done": true, "response": "...", "actions": [...]}``.
-      4. Server pings every 30 s to keep the connection alive.
-    """
-    # Authenticate before accepting the connection
-    token = websocket.query_params.get("token", "")
-    try:
-        payload = jwt.decode(token, settings.JWT_SECRET, algorithms=[settings.JWT_ALGORITHM])
-        user_id: str | None = payload.get("sub")
-        if not user_id:
-            raise JWTError("missing sub")
-    except JWTError:
-        await websocket.close(code=1008)  # 1008 = Policy Violation
-        return
-
-    await websocket.accept()
-
-    try:
-        raw = await websocket.receive_text()
-        body = ChatRequest.model_validate_json(raw)
-
-        async def _heartbeat() -> None:
-            while True:
-                await asyncio.sleep(_HEARTBEAT_INTERVAL)
-                await websocket.send_text(json.dumps({"ping": True}))
-
-        heartbeat_task = asyncio.create_task(_heartbeat())
-        try:
-            async for chunk in orchestrate_stream(body):
-                await websocket.send_text(chunk)
-        finally:
-            heartbeat_task.cancel()
-
-    except WebSocketDisconnect:
-        pass
+    """REST fallback for home chat when websocket streaming is unavailable."""
+    response = await run_home(
+        user_id=current_user.id,
+        message=body.message,
+        context=body.context.model_dump(),
+    )
+    return JSONResponse(content={"response": response})
--- a/app/api/routes/device_ws.py
+++ b/app/api/routes/device_ws.py
@@ -0,0 +1,417 @@
+"""Device WebSocket endpoint.
+
+Persistent connection from Electron devices to the backend.
+
+  WS  /api/v1/ws/device?token=<jwt>
+
+Auth: JWT passed as ``?token=`` query parameter (Bearer header is not
+available during the WebSocket handshake).
+
+Protocol:
+  1. Client connects → JWT validated → connection accepted.
+  2. Client sends ``device_hello`` frame: ``{ type, device_id, agent_ids }``.
+  3. Backend registers the connection in ``DeviceConnectionManager``.
+  4. Session enters message dispatch loop + heartbeat.
+
+Incoming frame dispatch:
+  - ``tool_result``      → resolves a pending tool-call Future.
+  - ``home_request`` / ``floating_request`` → spawn a home or floating chat handler task.
+  - ``journey_start`` / ``journey_message`` → start or continue a guided setup journey.
+  - ``pong``             → heartbeat acknowledgement (no-op; the connection is alive).
+  - unknown types        → logged, ignored.
+
+Outgoing heartbeat: ``{ "type": "ping" }`` every 30 s.
+
+On disconnect:
+  - Unregisters from DeviceConnectionManager.
+  - Marks all in-progress AgentRunLog rows for this user as ``error``
+    with message "device disconnected".
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+from uuid import uuid4
+
+from fastapi import APIRouter, WebSocket, WebSocketDisconnect
+from jose import JWTError, jwt
+from sqlalchemy import update
+
+from app.api.routes.agent_setup import handle_journey_message, handle_journey_start
+from app.config.settings import settings
+from app.core.agent_runner import trigger_pending_runs
+from app.core.deep_agent import run_floating_stream, run_home_stream
+from app.core.device_manager import device_manager
+from app.core.memory_middleware import MemoryMiddleware
+from app.core.output_formatter import StreamFormatter
+from app.core.ws_context import clear_client_executor, set_client_executor
+from app.db import async_session
+from app.models import AgentRunLog
+from app.schemas import WsFrameType
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/ws", tags=["device-ws"])
+
+_HEARTBEAT_INTERVAL = 30  # seconds
+_PONG_TIMEOUT = 10  # seconds — grace window after a ping
+
+
+@router.websocket("/device")
+async def device_ws(websocket: WebSocket) -> None:
+    """Persistent WebSocket endpoint for Electron device connections.
+
+    Authentication is via ``?token=<jwt>`` query parameter.
+    """
+    # ── 1. Authenticate before accepting ─────────────────────────────
+    token = websocket.query_params.get("token", "")
+    try:
+        payload = jwt.decode(
+            token, settings.JWT_SECRET, algorithms=[settings.JWT_ALGORITHM]
+        )
+        user_id: str | None = payload.get("sub")
+        if not user_id:
+            raise JWTError("missing sub")
+    except JWTError:
+        await websocket.close(code=1008)  # Policy Violation
+        return
+
+    await websocket.accept()
+
+    # ── 2. Await device_hello frame ───────────────────────────────────
+    try:
+        raw = await asyncio.wait_for(websocket.receive_text(), timeout=15.0)
+    except (asyncio.TimeoutError, WebSocketDisconnect):
+        await websocket.close(code=1008)
+        return
+
+    try:
+        hello = json.loads(raw)
+        if hello.get("type") != WsFrameType.device_hello:
+            raise ValueError("expected device_hello as first frame")
+        device_id: str = hello["device_id"]
+        agent_ids: list[str] = hello.get("agent_ids", [])
+    except (KeyError, ValueError, json.JSONDecodeError) as exc:
+        logger.warning("device_ws: invalid device_hello from user=%s: %s", user_id, exc)
+        await websocket.close(code=1008)
+        return
+
+    # ── 3. Register connection ────────────────────────────────────────
+    device_manager.register(user_id, device_id, websocket)
+    logger.info(
+        "device_ws: connected user=%s device=%s agents=%s",
+        user_id,
+        device_id,
+        agent_ids,
+    )
+
+    # Trigger any overdue agent runs now that the device is connected.
+    asyncio.create_task(trigger_pending_runs(user_id, device_id, device_manager))
+
+    # ── 4. Concurrent message loop + heartbeat ────────────────────────
+    try:
+        await asyncio.gather(
+            _message_loop(websocket, user_id),
+            _heartbeat_loop(websocket),
+        )
+    except WebSocketDisconnect:
+        pass
+    except Exception as exc:
+        logger.warning("device_ws: unhandled exception user=%s: %s", user_id, exc)
+    finally:
+        device_manager.unregister(user_id)
+        logger.info("device_ws: disconnected user=%s device=%s", user_id, device_id)
+        await _mark_runs_disconnected(user_id)
+
+
+# ── Message dispatch loop ─────────────────────────────────────────────
+
+async def _message_loop(websocket: WebSocket, user_id: str) -> None:
+    """Receive frames from Electron and dispatch to the appropriate handler."""
+    async for raw in websocket.iter_text():
+        try:
+            frame: dict = json.loads(raw)
+        except json.JSONDecodeError:
+            logger.warning("device_ws: invalid JSON from user=%s", user_id)
+            continue
+
+        frame_type = frame.get("type")
+
+        if frame_type == WsFrameType.tool_result:
+            call_id = frame.get("id")
+            if call_id:
+                device_manager.resolve_pending_call(user_id, call_id, frame)
+            else:
+                logger.warning(
+                    "device_ws: tool_result missing id from user=%s", user_id
+                )
+
+        elif frame_type == WsFrameType.home_request:
+            asyncio.create_task(
+                _handle_home_request(websocket, user_id, frame)
+            )
+
+        elif frame_type == WsFrameType.floating_request:
+            asyncio.create_task(
+                _handle_floating_request(websocket, user_id, frame)
+            )
+
+        elif frame_type == WsFrameType.journey_start:
+            asyncio.create_task(
+                _handle_journey_start(websocket, user_id, frame)
+            )
+
+        elif frame_type == WsFrameType.journey_message:
+            asyncio.create_task(
+                _handle_journey_message(websocket, user_id, frame)
+            )
+
+        elif frame_type == "pong":
+            # Heartbeat ack — nothing to do, connection is alive.
+            pass
+
+        else:
+            logger.debug(
+                "device_ws: unknown frame type %r from user=%s", frame_type, user_id
+            )
+
+
+# ── v3 Chat Handlers ──────────────────────────────────────────────────
+
+async def _make_ws_executor(websocket: WebSocket, user_id: str):
+    """Return a callback that sends tool_call frames and awaits tool_result."""
+    async def _executor(payload: dict) -> dict:
+        payload["type"] = WsFrameType.tool_call
+        future = device_manager.create_pending_call(user_id, payload["id"])
+        await websocket.send_text(json.dumps(payload))
+        return await future
+    return _executor
+
+
+async def _handle_home_request(
+    websocket: WebSocket,
+    user_id: str,
+    frame: dict,
+) -> None:
+    """Handle a home_request frame and stream StreamFormatter output back on the socket."""
+    request_id = frame.get("request_id") or str(uuid4())
+    message: str = frame.get("message", "")
+    session_id: str = frame.get("session_id") or str(uuid4())
+    logger.info(
+        "device_ws: home_request_start user=%s req=%s session=%s msg=%s",
+        user_id,
+        request_id,
+        session_id,
+        message[:200],
+    )
+
+    # ── Memory: enrich context before LLM call ────────────────────────
+    async with async_session() as db:
+        memory = MemoryMiddleware(db)
+        memory_context = await memory.enrich_context(
+            user_id,
+            message,
+            trace_id=request_id,
+            session_id=session_id,
+        )
+
+    context: dict = {
+        "conversation_history": frame.get("conversation_history", []),
+        "_debug": {"request_id": request_id, "session_id": session_id, "user_id": user_id},
+        **memory_context,
+    }
+
+    executor = await _make_ws_executor(websocket, user_id)
+    set_client_executor(executor)
+    response_chunks: list[str] = []
+    try:
+        event_stream = run_home_stream(user_id, message, context)
+        formatter = StreamFormatter(request_id=request_id)
+        async for ws_frame in formatter.format(event_stream):
+            await websocket.send_text(ws_frame.model_dump_json())
+            # Collect text chunks to build the full response for episode storage
+            if ws_frame.type == "stream_text":  # type: ignore[union-attr]
+                response_chunks.append(ws_frame.chunk)  # type: ignore[union-attr]
+    except Exception as exc:
+        logger.error(
+            "device_ws: home_request failed user=%s req=%s: %s",
+            user_id, request_id, exc,
+        )
+    finally:
+        clear_client_executor()
+
+    # ── Memory: store episode after response ──────────────────────────
+    async with async_session() as db:
+        memory = MemoryMiddleware(db)
+        await memory.store_episode(
+            user_id, session_id, message, "".join(response_chunks), trace_id=request_id
+        )
+    logger.info(
+        "device_ws: home_request_end user=%s req=%s session=%s response_chars=%d",
+        user_id,
+        request_id,
+        session_id,
+        len("".join(response_chunks)),
+    )
+
+
+async def _handle_floating_request(
+    websocket: WebSocket,
+    user_id: str,
+    frame: dict,
+) -> None:
+    """Handle a floating_request frame and stream StreamFormatter output back on the socket."""
+    request_id = frame.get("request_id") or str(uuid4())
+    message: str = frame.get("message", "")
+    session_id: str = frame.get("session_id") or str(uuid4())
+    scope: dict = frame.get("scope", {})
+    logger.info(
+        "device_ws: floating_request_start user=%s req=%s session=%s scope=%s msg=%s",
+        user_id,
+        request_id,
+        session_id,
+        json.dumps(scope, ensure_ascii=True)[:200],
+        message[:200],
+    )
+
+    # ── Memory: enrich context before LLM call ────────────────────────
+    async with async_session() as db:
+        memory = MemoryMiddleware(db)
+        memory_context = await memory.enrich_context(
+            user_id,
+            message,
+            trace_id=request_id,
+            session_id=session_id,
+        )
+
+    context: dict = {
+        "scope": scope,
+        "_debug": {"request_id": request_id, "session_id": session_id, "user_id": user_id},
+        **memory_context,
+    }
+
+    executor = await _make_ws_executor(websocket, user_id)
+    set_client_executor(executor)
+    response_chunks: list[str] = []
+    try:
+        event_stream = run_floating_stream(user_id, message, context)
+        formatter = StreamFormatter(request_id=request_id)
+        async for ws_frame in formatter.format(event_stream):
+            await websocket.send_text(ws_frame.model_dump_json())
+            if ws_frame.type == "stream_text":  # type: ignore[union-attr]
+                response_chunks.append(ws_frame.chunk)  # type: ignore[union-attr]
+    except Exception as exc:
+        logger.error(
+            "device_ws: floating_request failed user=%s req=%s: %s",
+            user_id, request_id, exc,
+        )
+    finally:
+        clear_client_executor()
+
+    # ── Memory: store episode after response ──────────────────────────
+    async with async_session() as db:
+        memory = MemoryMiddleware(db)
+        await memory.store_episode(
+            user_id, session_id, message, "".join(response_chunks), trace_id=request_id
+        )
+    logger.info(
+        "device_ws: floating_request_end user=%s req=%s session=%s response_chars=%d",
+        user_id,
+        request_id,
+        session_id,
+        len("".join(response_chunks)),
+    )
+
+
+# ── v4 Journey Handlers ─────────────────────────────────────────────
+
+
+async def _handle_journey_start(
+    websocket: WebSocket,
+    user_id: str,
+    frame: dict,
+) -> None:
+    """Handle a journey_start frame — explores directory and sends first question."""
+    executor = await _make_ws_executor(websocket, user_id)
+    set_client_executor(executor)
+    try:
+        reply = await handle_journey_start(user_id, frame)
+        await websocket.send_text(json.dumps(reply))
+    except Exception as exc:
+        logger.error(
+            "device_ws: journey_start failed user=%s: %s", user_id, exc
+        )
+        await websocket.send_text(json.dumps({
+            "type": "journey_reply",
+            "session_id": frame.get("session_id", ""),
+            "message": f"Failed to start journey: {exc}",
+            "done": True,
+            "prompt_template": None,
+        }))
+    finally:
+        clear_client_executor()
+
+
+async def _handle_journey_message(
+    websocket: WebSocket,
+    user_id: str,
+    frame: dict,
+) -> None:
+    """Handle a journey_message frame — continues the journey conversation."""
+    executor = await _make_ws_executor(websocket, user_id)
+    set_client_executor(executor)
+    try:
+        reply = await handle_journey_message(user_id, frame)
+        await websocket.send_text(json.dumps(reply))
+    except Exception as exc:
+        session_id = frame.get("session_id", "")
+        logger.error(
+            "device_ws: journey_message failed user=%s session=%s: %s",
+            user_id, session_id, exc,
+        )
+        await websocket.send_text(json.dumps({
+            "type": "journey_reply",
+            "session_id": session_id,
+            "message": f"Journey error: {exc}",
+            "done": True,
+            "prompt_template": None,
+        }))
+    finally:
+        clear_client_executor()
+
+
+# ── Heartbeat ─────────────────────────────────────────────────────────
+
+async def _heartbeat_loop(websocket: WebSocket) -> None:
+    """Send a ping frame every ``_HEARTBEAT_INTERVAL`` seconds to keep the connection alive."""
+    while True:
+        await asyncio.sleep(_HEARTBEAT_INTERVAL)
+        await websocket.send_text(json.dumps({"type": "ping"}))
+
+
+# ── Disconnect cleanup ────────────────────────────────────────────────
+
+async def _mark_runs_disconnected(user_id: str) -> None:
+    """Mark all in-progress AgentRunLog rows as 'error' for this user."""
+    try:
+        async with async_session() as db:
+            await db.execute(
+                update(AgentRunLog)
+                .where(
+                    AgentRunLog.user_id == user_id,
+                    AgentRunLog.status == "running",
+                )
+                .values(
+                    status="error",
+                    errors=["device disconnected"],
+                )
+            )
+            await db.commit()
+    except Exception as exc:
+        logger.error(
+            "device_ws: failed to mark runs as disconnected for user=%s: %s",
+            user_id,
+            exc,
+        )
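Review note: the tool-call round trip in `_make_ws_executor` depends on a future being registered under the call id and resolved when the matching `tool_result` frame arrives. The sketch below illustrates that pattern under stated assumptions. `PendingCalls`, `call_tool`, `discard`, and the timeout are hypothetical names for illustration only; the real `device_manager` is not shown in this diff and may differ. The future is registered before the frame is sent, so a fast reply cannot arrive while no future exists.

```python
import asyncio


class PendingCalls:
    """Hypothetical stand-in for the pending-call bookkeeping of device_manager."""

    def __init__(self) -> None:
        self._pending: dict[tuple[str, str], asyncio.Future] = {}

    def create_pending_call(self, user_id: str, call_id: str) -> asyncio.Future:
        # Must be called from inside a running event loop.
        future = asyncio.get_running_loop().create_future()
        self._pending[(user_id, call_id)] = future
        return future

    def resolve_pending_call(self, user_id: str, call_id: str, frame: dict) -> bool:
        # Returns False when nobody is waiting for this call id.
        future = self._pending.pop((user_id, call_id), None)
        if future is None or future.done():
            return False
        future.set_result(frame)
        return True

    def discard(self, user_id: str, call_id: str) -> None:
        self._pending.pop((user_id, call_id), None)


async def call_tool(pending, send, user_id: str, payload: dict, timeout: float = 5.0) -> dict:
    # Register first, then send, then wait with a timeout (assumed value).
    future = pending.create_pending_call(user_id, payload["id"])
    try:
        await send(payload)
        return await asyncio.wait_for(future, timeout)
    finally:
        # Drop the entry on timeout or send failure so it cannot leak.
        pending.discard(user_id, payload["id"])


async def demo() -> dict:
    pending = PendingCalls()

    async def send(payload: dict) -> None:
        # Simulate a device that answers before send() even returns.
        pending.resolve_pending_call("u1", payload["id"], {"id": payload["id"], "ok": True})

    return await call_tool(pending, send, "u1", {"id": "c1"})
```

The `finally` clause is a design choice of this sketch: it trades one extra dictionary lookup for the guarantee that an unanswered call does not stay registered after the caller has given up.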
--- a/app/api/routes/plans.py
+++ b/app/api/routes/plans.py
@@ -1,37 +0,0 @@
-"""Plans routes: GET /plans/playbook and GET /plans/playbook/{plan_id}."""
-
-from __future__ import annotations
-
-from fastapi import APIRouter, Depends, HTTPException, status
-
-from app.api.deps import get_current_user
-from app.core.execution_plan import plan_cache
-from app.schemas import ExecutionPlan, UserProfile
-
-router = APIRouter(prefix="/plans", tags=["plans"])
-
-
-@router.get("/playbook", response_model=list[ExecutionPlan])
-async def list_playbooks(
-    current_user: UserProfile = Depends(get_current_user),
-) -> list[ExecutionPlan]:
-    """Return all cached execution plan playbooks for the authenticated user.
-
-    TODO(Step11): filter by tier — power+ plans gated behind batch_builder feature.
-    """
-    return plan_cache.get_all_playbooks()
-
-
-@router.get("/playbook/{plan_id}", response_model=ExecutionPlan)
-async def get_playbook(
-    plan_id: str,
-    current_user: UserProfile = Depends(get_current_user),
-) -> ExecutionPlan:
-    """Return a specific execution plan playbook by ID."""
-    plan = plan_cache.get_plan(plan_id)
-    if plan is None:
-        raise HTTPException(
-            status_code=status.HTTP_404_NOT_FOUND,
-            detail=f"Plan not found: {plan_id}",
-        )
-    return plan
--- a/app/api/routes/vectors.py
+++ b/app/api/routes/vectors.py
@@ -1,4 +1,4 @@
-"""Vectors routes: upsert, search, and delete cloud vector store entries."""
+"""Vectors routes: upsert, search, delete cloud vector store entries, and embed text."""

 from __future__ import annotations

@@ -6,6 +6,7 @@ from fastapi import APIRouter, Depends
 from pydantic import BaseModel

 from app.api.deps import get_current_user
+from app.core.llm import embed
 from app.schemas import (
    UserProfile,
    VectorSearchRequest,
@@ -24,6 +25,14 @@ class _VectorDeleteRequest(BaseModel):
    ids: list[str]


+class _EmbedRequest(BaseModel):
+    text: str
+
+
+class _EmbedResponse(BaseModel):
+    vector: list[float]
+
+
 @router.post("/vectors/upsert", response_model=dict)
 async def upsert_vectors(
    body: VectorUpsertRequest,
@@ -54,3 +63,17 @@ async def delete_vectors(
    """Delete vectors by ID, scoped to the authenticated user."""
    await _vector_store.delete(current_user.id, body.ids)
    return {"ok": True}
+
+
+@router.post("/vectors/embed", response_model=_EmbedResponse)
+async def embed_text(
+    body: _EmbedRequest,
+    current_user: UserProfile = Depends(get_current_user),
+) -> _EmbedResponse:
+    """Generate a 1536-dim embedding vector for the given text.
+
+    Uses ``text-embedding-3-small`` via OpenAI.  Auth required (JWT).
+    Used by backend tools (note_agent) and Electron (vectordb.ts) alike.
+    """
+    vector = await embed(body.text)
+    return _EmbedResponse(vector=vector)
--- a/app/billing/tier_manager.py
+++ b/app/billing/tier_manager.py
@@ -21,6 +21,7 @@ FEATURES: dict[str, dict[str, Any]] = {
    "free": {
        "agents": 3,
        "batch_active": 2,
+        "batch_runs_per_day": 5,
        "cloud_storage_gb": 0,
        "backup_gb": 0,
        "providers": 1,
@@ -31,6 +32,7 @@ FEATURES: dict[str, dict[str, Any]] = {
    "pro": {
        "agents": -1,           # unlimited
        "batch_active": 10,
+        "batch_runs_per_day": 50,
        "cloud_storage_gb": 5,
        "backup_gb": 5,
        "providers": -1,
@@ -41,6 +43,7 @@ FEATURES: dict[str, dict[str, Any]] = {
    "power": {
        "agents": -1,
        "batch_active": -1,     # unlimited
+        "batch_runs_per_day": -1,  # unlimited
        "cloud_storage_gb": 25,
        "backup_gb": 25,
        "providers": -1,
@@ -51,6 +54,7 @@ FEATURES: dict[str, dict[str, Any]] = {
    "team": {
        "agents": -1,
        "batch_active": -1,
+        "batch_runs_per_day": -1,  # unlimited
        "cloud_storage_gb": -1,  # unlimited
        "backup_gb": -1,         # unlimited
        "providers": -1,
@@ -77,16 +81,18 @@ class TierManager:
    async def get_tier(self, user_id: str, db: AsyncSession) -> BillingTier:
        """Return the current billing tier for ``user_id`` from the DB.

-        Falls back to ``'free'`` when no subscription row exists.
+        Falls back to ``'power'`` in dev (unlimited) or ``'free'`` in prod
+        when no subscription row exists.
        """
        from app.models import Subscription  # noqa: PLC0415
+        from app.config.settings import settings  # noqa: PLC0415

        result = await db.execute(
            select(Subscription.tier).where(Subscription.user_id == user_id)
        )
        tier: str | None = result.scalar_one_or_none()
        if tier is None or tier not in FEATURES:
-            return "free"
+            return "power" if settings.ENV == "dev" else "free"
        return tier  # type: ignore[return-value]

    # ── Feature access ───────────────────────────────────────────────────
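Review note: the new `batch_runs_per_day` entries reuse the existing convention that `-1` means unlimited. A minimal sketch of how such a limit might be checked follows. The helper names (`within_limit`, `can_start_batch_run`) and the `runs_today` counter are assumptions for illustration and do not appear in this diff; only the per-tier values are taken from the `FEATURES` table above.

```python
UNLIMITED = -1

# Subset of the FEATURES table from the diff, limited to the new key.
BATCH_RUNS_PER_DAY = {
    "free": 5,
    "pro": 50,
    "power": UNLIMITED,
    "team": UNLIMITED,
}


def within_limit(limit: int, used: int) -> bool:
    """Return True if one more unit may be consumed under ``limit``."""
    return limit == UNLIMITED or used < limit


def can_start_batch_run(tier: str, runs_today: int) -> bool:
    """Hypothetical gate: may a user on ``tier`` start another run today?"""
    return within_limit(BATCH_RUNS_PER_DAY[tier], runs_today)
```

Checking `limit == UNLIMITED` first keeps the sentinel out of the numeric comparison, so a negative limit is never compared against a usage count.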
--- a/app/config/settings.py
+++ b/app/config/settings.py
@@ -1,5 +1,5 @@
 from typing import Literal
-from pydantic_settings import BaseSettings
+from pydantic_settings import BaseSettings, SettingsConfigDict


 class Settings(BaseSettings):
@@ -26,17 +26,39 @@ class Settings(BaseSettings):
    OPENAI_API_KEY: str = ""
    ANTHROPIC_API_KEY: str = ""
    GOOGLE_API_KEY: str = ""
+    CEREBRAS_API_KEY: str = ""

    LLM_MODEL: str = "gpt-4o"
    LLM_ROUTER_MODEL: str = "gpt-4o-mini"
+    LLM_EMBED_MODEL: str = "text-embedding-3-small"
+
+    # GitHub Copilot OAuth token storage directory.
+    # Leave empty to use the LiteLLM default (~/.config/litellm/github_copilot).
+    # In Docker, set this to a path backed by a named volume so tokens survive restarts.
+    GITHUB_COPILOT_TOKEN_DIR: str = ""
+
+    # OAuth client credentials — used for Gmail and Microsoft (Outlook/Teams) flows.
+    GMAIL_CLIENT_ID: str = ""
+    GMAIL_CLIENT_SECRET: str = ""
+    MS_CLIENT_ID: str = ""
+    MS_CLIENT_SECRET: str = ""
+    # MS_TENANT_ID: set to 'common' to allow multi-tenant (personal + work accounts).
+    MS_TENANT_ID: str = "common"
+
+    # Fernet key (URL-safe base64, 32-byte key) for at-rest encryption of OAuth
+    # tokens stored in cloud_agent_configs.oauth_token_encrypted.
+    # Generate with: from cryptography.fernet import Fernet; Fernet.generate_key()
+    OAUTH_ENCRYPTION_KEY: str = ""

    CORS_ORIGINS: list[str] = ["app://.", "http://localhost:3000", "http://localhost:5173"]

+    LANGFUSE_SECRET_KEY: str = ""
+    LANGFUSE_PUBLIC_KEY: str = ""
+    LANGFUSE_HOST: str = "https://cloud.langfuse.com"
+
    ENV: Literal["dev", "prod"] = "dev"

-    class Config:
-        env_file = ".env"
-        env_file_encoding = "utf-8"
+    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")


 settings = Settings()
--- a/app/core/agent_registry.py
+++ b/app/core/agent_registry.py
@@ -1,4 +1,4 @@
-"""Agent Registry — base classes and singleton registry for chat agents."""
+"""Minimal agent base types retained for compatibility with batch runners."""

 from __future__ import annotations

@@ -7,7 +7,7 @@ from typing import Any


 class BaseAgent(ABC):
-    """Common base for all agents."""
+    """Common base for non-chat agents still using the old base contract."""

    def __init__(
        self,
@@ -27,111 +27,4 @@ class BaseAgent(ABC):

    @property
    def skills(self) -> list[str]:
-        """Override in subclasses to advertise capabilities."""
        return []
-
-
-class ChatAgent(BaseAgent):
-    """Base class for LLM-powered chat agents."""
-
-    @abstractmethod
-    async def handle(self, query: str, context: dict[str, Any]) -> str:
-        """Process a user query and return a text response."""
-        ...
-
-    @abstractmethod
-    def get_tools(self) -> list[Any]:
-        """Return LangChain tool definitions available to this agent."""
-        ...
-
-    async def _tool_loop(
-        self,
-        llm: Any,
-        messages: list[Any],
-        tools: list[Any],
-        max_iter: int = 5,
-    ) -> str:
-        """Shared tool-calling loop.
-
-        Binds *tools* to *llm*, invokes iteratively until the model stops
-        requesting tool calls or *max_iter* is reached, and returns the
-        final text response.
-        """
-        from langchain_core.messages import AIMessage, ToolMessage
-
-        llm_with_tools = llm.bind_tools(tools) if tools else llm
-
-        for _ in range(max_iter):
-            response: AIMessage = await llm_with_tools.ainvoke(messages)
-            messages.append(response)
-
-            if not response.tool_calls:
-                return str(response.content)
-
-            # Execute each requested tool call
-            tool_map = {t.name: t for t in tools}
-            for call in response.tool_calls:
-                tool_fn = tool_map.get(call["name"])
-                if tool_fn is None:
-                    result = f"Unknown tool: {call['name']}"
-                else:
-                    result = await tool_fn.ainvoke(call["args"])
-                messages.append(
-                    ToolMessage(content=str(result), tool_call_id=call["id"])
-                )
-
-        # Exhausted iterations — ask model for a final answer without tools
-        response = await llm.ainvoke(messages)
-        return str(response.content)
-
-
-class AgentRegistry:
-    """Singleton registry for ChatAgent subclasses."""
-
-    _instance: AgentRegistry | None = None
-
-    def __init__(self) -> None:
-        self._agents: dict[str, type[ChatAgent]] = {}
-
-    def __new__(cls) -> AgentRegistry:
-        if cls._instance is None:
-            cls._instance = super().__new__(cls)
-            cls._instance._agents = {}
-        return cls._instance
-
-    # ── public API ───────────────────────────────────────────────────
-
-    def register(self, agent_class: type[ChatAgent]) -> type[ChatAgent]:
-        """Class decorator — registers an agent by its name."""
-        instance = agent_class()
-        name = instance.get_name()
-        self._agents[name] = agent_class
-        return agent_class
-
-    def get(self, name: str) -> ChatAgent:
-        """Return a fresh instance of the named agent."""
-        cls = self._agents.get(name)
-        if cls is None:
-            raise KeyError(f"Agent not found: {name}")
-        return cls()
-
-    def list_agents(self) -> list[dict[str, str]]:
-        """Return ``[{name, description}]`` for the orchestrator prompt."""
-        result: list[dict[str, str]] = []
-        for cls in self._agents.values():
-            inst = cls()
-            result.append(
-                {"name": inst.get_name(), "description": inst.get_description()}
-            )
-        return result
-
-    async def call_agent(
-        self, name: str, query: str, context: dict[str, Any]
-    ) -> str:
-        """Instantiate the named agent and call its ``handle`` method."""
-        agent = self.get(name)
-        return await agent.handle(query, context)
-
-
-# Module-level singleton
-registry = AgentRegistry()
--- a/app/core/agent_runner.py
+++ b/app/core/agent_runner.py
--- a/app/core/deep_agent.py
+++ b/app/core/deep_agent.py
@@ -0,0 +1,962 @@
+"""Single-agent runners for home and floating chat contexts."""
+
+from __future__ import annotations
+
+import json
+import logging
+import re
+from collections.abc import AsyncGenerator
+from datetime import date
+from typing import Any, Literal
+
+from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
+from langchain_core.tools import tool
+
+from app.agents.note_agent import NOTE_TOOLS
+from app.agents.project_agent import PROJECT_TOOLS
+from app.agents.task_agent import TASK_TOOLS
+from app.agents.timeline_agent import TIMELINE_TOOLS
+from app.core.langfuse_client import extract_usage, get_langfuse, get_prompt_or_fallback
+from app.core.llm import get_llm
+from app.config.settings import settings
+from app.core.memory_middleware import MemoryMiddleware
+from app.core.ws_context import clear_tool_result_collector, execute_on_client, set_tool_result_collector
+from app.db import async_session
+
+logger = logging.getLogger(__name__)
+
+FloatingDomainType = Literal["task", "timeline", "project", "node"]
+FloatingDomainSection = Literal["task", "timeline", "note"]
+
+_HOME_SYSTEM_PROMPT = (
+    "You are the home assistant with direct access to all tools: tasks, projects, notes, timelines, and memory tools. "
+    "Always use tools for factual data retrieval before answering. "
+    "When the user asks to remember, forget, or update what you know about them, use memory tools. "
+    "If context.context.resolved_project_id exists, use it as project_id for scoped list calls. "
+    "Return markdown and use tags when relevant: <project>[ids]</project>, <task>[ids]</task>, "
+    "<note>[ids]</note>, <timeline>[ids]</timeline>, <chart>{json}</chart>. "
+    "When listing tasks or timelines, each id tag must be on its own line with no prefix/suffix text. "
+    "Never put titles, priorities, or dates on the same line as <task> or <timeline> tags. "
+    "For questions about upcoming timelines (e.g. 'prossimi eventi'), include only future items in the current month unless the user asks a different range. "
+    "For upcoming tasks, after tag lines add a short recommendation based on due date and priority."
+)
+
+_FLOATING_SYSTEM_PROMPT = (
+    "You are the floating assistant with direct access to all tools: tasks, projects, notes, timelines, and memory tools. "
+    "Stay focused on the floating scope in context.scope and answer concisely. "
+    "Return plain text only. Do not output XML/HTML-like tags such as <task>, <project>, <note>, <timeline>, or any bracketed id tag wrappers. "
+    "Always use tools for factual data retrieval before answering. "
+    "When the user asks to remember, forget, or update what you know about them, use memory tools. "
+    "If context.context.resolved_project_id exists, use it as project_id for scoped list calls. "
+)
+
+_FLOATING_DOMAIN_CLASSIFIER_PROMPT = (
+    "You are a strict domain classifier for websocket floating requests. "
+    "Return ONLY a JSON object with keys: type, id, section. "
+    "Allowed type values: task, timeline, project, node. "
+    "Allowed section values: task, timeline, note, or null. "
+    "Rules: infer from user message intent first; do not blindly trust scope.type. "
+    "If user asks tasks/timeline/notes for a project, set type=project and section accordingly. "
+    "If project id is unknown but context.resolved_project_id exists, use it as id. "
+    "If id is unknown, use null. "
+    "No markdown, no prose, JSON only."
+)
+
+
+def _as_text(content: Any) -> str:
+    if content is None:
+        return ""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        parts: list[str] = []
+        for item in content:
+            if isinstance(item, str):
+                parts.append(item)
+            elif isinstance(item, dict):
+                text = item.get("text")
+                if isinstance(text, str):
+                    parts.append(text)
+        return "".join(parts)
+    return str(content)
+
+
+def _candidate_tokens(message: str) -> list[str]:
+    tokens = re.findall(r"[a-zA-Z0-9_-]+", message.lower())
+    return [token for token in tokens if len(token) >= 3]
+
+
+async def _resolve_project_id_from_message(message: str) -> str | None:
+    """Resolve likely project UUID from user message using client project list."""
+    try:
+        result = await execute_on_client(action="select", table="projects")
+    except Exception as exc:
+        logger.warning("deep_agent: project resolve select failed: %s", exc)
+        return None
+
+    rows = result.get("rows", [])
+    if not isinstance(rows, list) or not rows:
+        return None
+
+    tokens = _candidate_tokens(message)
+    scored: list[tuple[int, dict[str, Any]]] = []
+    for row in rows:
+        if not isinstance(row, dict):
+            continue
+        name = str(row.get("name", "")).lower()
+        score = sum(1 for token in tokens if token in name)
+        if score > 0:
+            scored.append((score, row))
+
+    if not scored:
+        return None
+
+    scored.sort(key=lambda item: item[0], reverse=True)
+    top_score = scored[0][0]
+    top_rows = [row for score, row in scored if score == top_score]
+    if len(top_rows) != 1:
+        return None
+
+    project_id = top_rows[0].get("id")
+    return project_id if isinstance(project_id, str) else None
+
+
+def _needs_project_resolution(message: str) -> bool:
+    lowered = message.lower()
+    return any(keyword in lowered for keyword in ["project", "progetto", "progetti", "whitelist"])
+
+
+async def _prepare_context(message: str, context: dict[str, Any]) -> dict[str, Any]:
+    prepared = dict(context)
+    if _needs_project_resolution(message):
+        resolved_project_id = await _resolve_project_id_from_message(message)
+        if resolved_project_id:
+            prepared["resolved_project_id"] = resolved_project_id
+            logger.info("deep_agent: resolved_project_id=%s", resolved_project_id)
+    return prepared
+
+
+def _all_tools() -> list[Any]:
+    return [*TASK_TOOLS, *PROJECT_TOOLS, *NOTE_TOOLS, *TIMELINE_TOOLS]
+
+
+def _trace_id_from_context(context: dict[str, Any]) -> str | None:
+    debug = context.get("_debug")
+    if isinstance(debug, dict):
+        request_id = debug.get("request_id")
+        if isinstance(request_id, str) and request_id:
+            return request_id
+    return None
+
+
+def _context_for_model(context: dict[str, Any]) -> dict[str, Any]:
+    sanitized = dict(context)
+    sanitized.pop("_debug", None)
+    return sanitized
+
+
+_TAG_LINE_RE = re.compile(r"<(task|timeline)>\[[^\]]+\]</\1>")
+_TIMELINE_DMY_RE = re.compile(r"(?P<d>\d{2})/(?P<m>\d{2})/(?P<y>\d{4})")
+
+
+def _is_upcoming_timeline_query(message: str) -> bool:
+    lowered = message.lower()
+    has_upcoming = "prossim" in lowered or "upcoming" in lowered or "next" in lowered
+    has_timeline_topic = any(
+        token in lowered
+        for token in ("event", "evento", "eventi", "timeline", "milestone", "scaden")
+    )
+    return has_upcoming and has_timeline_topic
+
+
+def _timeline_date_in_current_month_or_future(dmy: str) -> bool:
+    match = _TIMELINE_DMY_RE.search(dmy)
+    if not match:
+        return True
+    try:
+        parsed = date(
+            int(match.group("y")),
+            int(match.group("m")),
+            int(match.group("d")),
+        )
+    except ValueError:
+        return True
+
+    today = date.today()
+    return parsed >= today and parsed.year == today.year and parsed.month == today.month
+
+
+def _normalize_tagged_list_lines(text: str, message: str) -> str:
+    if not text:
+        return text
+
+    upcoming_timeline_only = _is_upcoming_timeline_query(message)
+    output_lines: list[str] = []
+
+    for line in text.splitlines():
+        matches = list(_TAG_LINE_RE.finditer(line))
+        if not matches:
+            output_lines.append(line)
+            continue
+
+        had_non_tag_text = _TAG_LINE_RE.sub("", line).strip(" -\t0123456789.*:)")
+        if not had_non_tag_text and len(matches) == 1:
+            tag_text = matches[0].group(0)
+            if (
+                upcoming_timeline_only
+                and "<timeline>" in tag_text
+                and not _timeline_date_in_current_month_or_future(line)
+            ):
+                continue
+            output_lines.append(tag_text)
+            continue
+
+        for match in matches:
+            tag_text = match.group(0)
+            if (
+                upcoming_timeline_only
+                and "<timeline>" in tag_text
+                and not _timeline_date_in_current_month_or_future(line)
+            ):
+                continue
+            output_lines.append(tag_text)
+
+    return "\n".join(output_lines)
+
+
+_GENERIC_TAG_RE = re.compile(r"</?(task|project|note|timeline|chart)>", re.IGNORECASE)
+_BRACKETED_ID_RE = re.compile(r"\[[A-Za-z0-9_-]{8,}\]")
+_FLOATING_EMPTY_FALLBACK = "No results found."
+
+
+def _strip_floating_markup_fragment(text: str) -> str:
+    if not text:
+        return text
+    cleaned = _GENERIC_TAG_RE.sub("", text)
+    return _BRACKETED_ID_RE.sub("", cleaned)
+
+
+def _strip_floating_markup(text: str) -> str:
+    """Ensure floating responses stay plain text with no XML-like tag wrappers."""
+    if not text:
+        return text
+
+    cleaned = _strip_floating_markup_fragment(text)
+    # Collapse excessive spaces introduced by tag/id removal while preserving lines.
+    lines = [re.sub(r"[ \t]{2,}", " ", line).strip() for line in cleaned.splitlines()]
+    return "\n".join(line for line in lines if line)
+
+
+def _fallback_from_raw_floating_text(raw_text: str) -> str:
+    fallback = _strip_floating_markup_fragment(raw_text or "")
+    fallback = re.sub(r"[ \t]{2,}", " ", fallback).strip()
+    return fallback or _FLOATING_EMPTY_FALLBACK
+
+
+class _FloatingStreamSanitizer:
+    """Streaming sanitizer that removes floating markup without buffering the full answer."""
+
+    def __init__(self) -> None:
+        self._pending = ""
+
+    @staticmethod
+    def _split_safe_boundary(text: str) -> tuple[str, str]:
+        boundary = len(text)
+
+        last_lt = text.rfind("<")
+        if last_lt != -1 and ">" not in text[last_lt:]:
+            boundary = min(boundary, last_lt)
+
+        last_lb = text.rfind("[")
+        if last_lb != -1 and "]" not in text[last_lb:]:
+            boundary = min(boundary, last_lb)
+
+        if boundary == len(text):
+            return text, ""
+        return text[:boundary], text[boundary:]
+
+    def feed(self, chunk: str) -> str:
+        combined = f"{self._pending}{chunk}"
+        safe_text, self._pending = self._split_safe_boundary(combined)
+        return _strip_floating_markup_fragment(safe_text)
+
+    def finalize(self) -> str:
+        # Drop dangling unfinished wrappers at the very end.
+        tail = re.sub(r"<[^>\n]*$", "", self._pending)
+        tail = re.sub(r"\[[^\]\n]*$", "", tail)
+        self._pending = ""
+        return _strip_floating_markup_fragment(tail)
+
+
+def _normalize_memory_label(path_or_label: str) -> str:
+    value = path_or_label.strip()
+    if value.startswith("/memories/"):
+        value = value[len("/memories/"):]
+    value = value.strip("/")
+    return value
+
+
+def _memory_tools(user_id: str, trace_id: str | None) -> list[Any]:
+    @tool
+    async def memory_list_blocks() -> str:
+        """List all core memory blocks currently stored for the user."""
+        logger.info("deep_agent: memory_list_blocks trace=%s user=%s", trace_id or "-", user_id)
+        async with async_session() as db:
+            memory = MemoryMiddleware(db)
+            blocks = await memory.list_core_blocks(user_id)
+        if not blocks:
+            return "No memory blocks found."
+        lines = [f"- {b['label']}: {b['value']}" for b in blocks]
+        return "Memory blocks:\n" + "\n".join(lines)
+
+    @tool
+    async def memory_get(path_or_label: str) -> str:
+        """Get one memory block by label or /memories/<label> path."""
+        label = _normalize_memory_label(path_or_label)
+        logger.info("deep_agent: memory_get trace=%s user=%s label=%s", trace_id or "-", user_id, label)
+        if not label:
+            return "Invalid memory label."
+        async with async_session() as db:
+            memory = MemoryMiddleware(db)
+            value = await memory.get_core_block(user_id, label)
+        if value is None:
+            return f"Memory block '{label}' not found."
+        return f"Memory block '{label}':\n{value}"
+
+    @tool
+    async def memory_create(path_or_label: str, value: str) -> str:
+        """Create or overwrite a memory block value by label or /memories/<label> path."""
+        label = _normalize_memory_label(path_or_label)
+        logger.info("deep_agent: memory_create trace=%s user=%s label=%s", trace_id or "-", user_id, label)
+        if not label:
+            return "Invalid memory label."
+        async with async_session() as db:
+            memory = MemoryMiddleware(db)
+            await memory.update_core(user_id, label, value, trace_id=trace_id)
+        return f"Memory block '{label}' saved."
+
+    @tool
+    async def memory_append(path_or_label: str, content: str) -> str:
+        """Append content to a memory block, creating it if missing."""
+        label = _normalize_memory_label(path_or_label)
+        logger.info("deep_agent: memory_append trace=%s user=%s label=%s", trace_id or "-", user_id, label)
+        if not label:
+            return "Invalid memory label."
+        async with async_session() as db:
+            memory = MemoryMiddleware(db)
+            await memory.append_core(user_id, label, content)
+        return f"Memory block '{label}' appended."
+
+    @tool
+    async def memory_replace(path_or_label: str, old_string: str, new_string: str) -> str:
+        """Replace one exact string in a memory block."""
+        label = _normalize_memory_label(path_or_label)
+        logger.info("deep_agent: memory_replace trace=%s user=%s label=%s", trace_id or "-", user_id, label)
+        if not label:
+            return "Invalid memory label."
+        async with async_session() as db:
+            memory = MemoryMiddleware(db)
+            changed = await memory.replace_core(user_id, label, old_string, new_string)
+        if not changed:
+            return f"No replacement made in '{label}' (old string not found)."
+        return f"Memory block '{label}' updated."
+
+    @tool
+    async def memory_delete(path_or_label: str) -> str:
+        """Delete a memory block by label or /memories/<label> path."""
+        label = _normalize_memory_label(path_or_label)
+        logger.info("deep_agent: memory_delete trace=%s user=%s label=%s", trace_id or "-", user_id, label)
+        if not label:
+            return "Invalid memory label."
+        async with async_session() as db:
+            memory = MemoryMiddleware(db)
+            deleted = await memory.delete_core(user_id, label)
+        if not deleted:
+            return f"Memory block '{label}' not found."
+        return f"Memory block '{label}' deleted."
+
+    @tool
+    async def archival_memory_insert(content: str) -> str:
+        """Insert a long-term archival memory entry."""
+        logger.info("deep_agent: archival_memory_insert trace=%s user=%s", trace_id or "-", user_id)
+        async with async_session() as db:
+            memory = MemoryMiddleware(db)
+            await memory.insert_archival(user_id, content, source="assistant")
+        return "Archival memory saved."
+
+    @tool
+    async def archival_memory_search(query: str, top_k: int = 5) -> str:
+        """Search long-term archival memory (currently keyword-based, not semantic)."""
+        logger.info("deep_agent: archival_memory_search trace=%s user=%s query=%s", trace_id or "-", user_id, query[:80])
+        async with async_session() as db:
+            memory = MemoryMiddleware(db)
+            results = await memory.search_archival(user_id, query, top_k=top_k)
+        if not results:
+            return "No archival memory results found."
+        lines = [f"- {item}" for item in results]
+        return "Archival memory results:\n" + "\n".join(lines)
+
+    @tool
+    async def conversation_search(query: str, top_k: int = 5) -> str:
+        """Search recall memory from prior episodic conversation summaries."""
+        logger.info("deep_agent: conversation_search trace=%s user=%s query=%s", trace_id or "-", user_id, query[:80])
+        async with async_session() as db:
+            memory = MemoryMiddleware(db)
+            results = await memory.search_recall(user_id, query, top_k=top_k)
+        if not results:
+            return "No recall memory results found."
+        lines = [f"- {item}" for item in results]
+        return "Recall memory results:\n" + "\n".join(lines)
+
+    return [
+        memory_list_blocks,
+        memory_get,
+        memory_create,
+        memory_append,
+        memory_replace,
+        memory_delete,
+        archival_memory_insert,
+        archival_memory_search,
+        conversation_search,
+    ]
+
+
+def _all_tools_for_user(user_id: str, trace_id: str | None) -> list[Any]:
+    return [*_all_tools(), *_memory_tools(user_id, trace_id)]
+
+
+def _detect_domain_section(message: str) -> FloatingDomainSection | None:
+    lowered = message.lower()
+    if any(keyword in lowered for keyword in ["timeline", "milestone", "release", "schedule"]):
+        return "timeline"
+    if any(keyword in lowered for keyword in ["task", "tasks", "todo", "attivit", "azione"]):
+        return "task"
+    if any(keyword in lowered for keyword in ["note", "notes", "memo", "document"]):
+        return "note"
+    return None
+
+
+def _normalize_domain_payload(payload: dict[str, Any], fallback_id: str | None) -> dict[str, str | None]:
+    type_raw = str(payload.get("type") or "").strip().lower()
+    domain_type: FloatingDomainType = "task"
+    if type_raw in {"task", "timeline", "project", "node"}:
+        domain_type = type_raw
+
+    id_value = payload.get("id")
+    domain_id = id_value if isinstance(id_value, str) and id_value.strip() else None
+    if domain_type == "project" and not domain_id:
+        domain_id = fallback_id
+
+    section_raw = payload.get("section")
+    section: FloatingDomainSection | None = None
+    if isinstance(section_raw, str):
+        section_candidate = section_raw.strip().lower()
+        if section_candidate in {"task", "timeline", "note"}:
+            section = section_candidate
+
+    if domain_type != "project":
+        section = None
+
+    return {
+        "type": domain_type,
+        "id": domain_id,
+        "section": section,
+    }
+
+
+def _parse_json_object(text: str) -> dict[str, Any] | None:
+    raw = text.strip()
+    if not raw:
+        return None
+    try:
+        parsed = json.loads(raw)
+        return parsed if isinstance(parsed, dict) else None
+    except json.JSONDecodeError:
+        pass
+
+    match = re.search(r"\{.*\}", raw, re.DOTALL)
+    if not match:
+        return None
+    try:
+        parsed = json.loads(match.group(0))
+    except json.JSONDecodeError:
+        return None
+    return parsed if isinstance(parsed, dict) else None
+
+
+def _infer_floating_domain_rule_based(message: str, context: dict[str, Any]) -> dict[str, str | None]:
+    section = _detect_domain_section(message)
+    scope = context.get("scope") if isinstance(context, dict) else None
+    resolved_project_id = context.get("resolved_project_id") if isinstance(context, dict) else None
+    project_id = resolved_project_id if isinstance(resolved_project_id, str) and resolved_project_id else None
+
+    if isinstance(scope, dict):
+        scope_type = str(scope.get("type") or "").strip().lower()
+        scope_id = scope.get("id")
+        scope_id_value = scope_id if isinstance(scope_id, str) and scope_id else None
+
+        if scope_type in {"task", "tasks"}:
+            return {"type": "task", "id": scope_id_value, "section": None}
+        if scope_type in {"project", "projects"}:
+            project_scope_id = scope_id_value or project_id
+            return {
+                "type": "project",
+                "id": project_scope_id,
+                "section": section,
+            }
+        if scope_type in {"note", "notes"}:
+            return {
+                "type": "node",
+                "id": scope_id_value,
+                "section": None,
+            }
+        if scope_type in {"timeline", "timelines"}:
+            return {"type": "timeline", "id": scope_id_value, "section": None}
+
+    lowered = message.lower()
+    if any(keyword in lowered for keyword in ["project", "progetto", "client"]) or project_id:
+        return {
+            "type": "project",
+            "id": project_id,
+            "section": section,
+        }
+    if section == "timeline":
+        return {"type": "timeline", "id": None, "section": None}
+    if section == "note":
+        return {"type": "node", "id": None, "section": None}
+    return {"type": "task", "id": None, "section": None}
+
+
+async def _infer_floating_domain(message: str, context: dict[str, Any]) -> dict[str, str | None]:
+    resolved_project_id = context.get("resolved_project_id") if isinstance(context, dict) else None
+    project_id = resolved_project_id if isinstance(resolved_project_id, str) and resolved_project_id else None
+
+    classifier_context = {
+        "scope": context.get("scope") if isinstance(context.get("scope"), dict) else None,
+        "resolved_project_id": project_id,
+    }
+
+    try:
+        llm = get_llm()
+        # Use the managed prompt text (or the local fallback) so the linked prompt matches what is sent.
+        classifier_prompt, classifier_prompt_obj = get_prompt_or_fallback(
+            "floating_domain_classifier", _FLOATING_DOMAIN_CLASSIFIER_PROMPT
+        )
+        classifier_messages = [
+            SystemMessage(content=classifier_prompt),
+            HumanMessage(
+                content=f"Message:\n{message}\n\n"
+                f"Context:\n{json.dumps(classifier_context, ensure_ascii=True)}"
+            ),
+        ]
+        lf = get_langfuse()
+
+        if lf:
+            with lf.start_as_current_observation(
+                as_type="generation",
+                name="floating-classifier",
+                model=settings.LLM_MODEL,
+                prompt=classifier_prompt_obj,
+                input=classifier_messages,
+            ) as gen:
+                response = await llm.ainvoke(classifier_messages)
+                gen.update(output=_as_text(response.content), usage=extract_usage(response))
+        else:
+            response = await llm.ainvoke(classifier_messages)
+        parsed = _parse_json_object(_as_text(response.content))
+        if parsed is not None:
+            domain = _normalize_domain_payload(parsed, project_id)
+            logger.info(
+                "deep_agent: floating_domain_classified type=%s id=%s section=%s",
+                domain.get("type"),
+                domain.get("id"),
+                domain.get("section"),
+            )
+            return domain
+        logger.warning("deep_agent: floating_domain classifier returned non-json output")
+    except Exception as exc:
+        logger.warning("deep_agent: floating_domain classifier failed: %s", exc)
+
+    return _infer_floating_domain_rule_based(message, context)
+
+
+async def _run_single_agent(
+    *,
+    user_id: str,
+    system_prompt: str,
+    message: str,
+    context: dict[str, Any],
+    max_steps: int = 6,
+    langfuse_prompt: Any = None,
+    agent_name: str = "agent",
+) -> str:
+    trace_id = _trace_id_from_context(context)
+    lf = get_langfuse()
+    llm = get_llm()
+    tools = _all_tools_for_user(user_id, trace_id)
+    model_context = _context_for_model(context)
+    logger.info("deep_agent: run_single_agent_start trace=%s user=%s", trace_id or "-", user_id)
+    llm_with_tools = llm.bind_tools(tools)
+    messages: list[Any] = [
+        SystemMessage(content=system_prompt),
+        HumanMessage(
+            content=(
+                f"User message:\n{message}\n\n"
+                f"Context:\n{json.dumps({'context': model_context}, ensure_ascii=True)[:3500]}"
+            )
+        ),
+    ]
+
+    tool_calls_count = 0
+    collected: list[dict[str, Any]] = []
+    set_tool_result_collector(collected)
+
+    _span_ctx = (
+        lf.start_as_current_observation(
+            as_type="span",
+            name=agent_name,
+            metadata={"user_id": user_id, "session_id": trace_id},
+            input=message,
+        )
+        if lf else None
+    )
+    _span = _span_ctx.__enter__() if _span_ctx else None
+
+    try:
+        for _ in range(max_steps):
+            _gen_ctx = (
+                lf.start_as_current_observation(
+                    as_type="generation",
+                    name=f"{agent_name}-llm",
+                    model=settings.LLM_MODEL,
+                    prompt=langfuse_prompt,
+                    input=messages,
+                )
+                if lf else None
+            )
+            _gen = _gen_ctx.__enter__() if _gen_ctx else None
+            response: AIMessage = await llm_with_tools.ainvoke(messages)
+            if _gen_ctx:
+                _gen.update(output=_as_text(response.content), usage=extract_usage(response))
+                _gen_ctx.__exit__(None, None, None)
+
+            messages.append(response)
+
+            if not response.tool_calls:
+                final_text = _as_text(response.content)
+                logger.info(
+                    "deep_agent: run_single_agent_end trace=%s user=%s tool_calls=%d response_chars=%d",
+                    trace_id or "-",
+                    user_id,
+                    tool_calls_count,
+                    len(final_text),
+                )
+                if _span:
+                    _span.update(output=final_text)
+                return final_text
+
+            tool_map = {tool_def.name: tool_def for tool_def in tools}
+            for call in response.tool_calls:
+                tool_calls_count += 1
+                call_id = str(call.get("id", ""))
+                call_name = str(call.get("name", ""))
+                call_args = call.get("args", {})
+                logger.info(
+                    "deep_agent: AI->Tool tool_call_id=%s tool=%s args=%s",
+                    call_id,
+                    call_name,
+                    json.dumps(call_args, ensure_ascii=True)[:800],
+                )
+
+                tool_fn = tool_map.get(call_name)
+                if tool_fn is None:
+                    tool_output = f"Unknown tool: {call_name}"
+                else:
+                    tool_output = await tool_fn.ainvoke(call_args)
+
+                logger.info(
+                    "deep_agent: Tool->AI tool_call_id=%s tool=%s output=%s",
+                    call_id,
+                    call_name,
+                    str(tool_output)[:1200],
+                )
+
+                messages.append(ToolMessage(content=str(tool_output), tool_call_id=call["id"]))
+
+        final = await llm.ainvoke(messages)
+        final_text = _as_text(final.content)
+        logger.info(
+            "deep_agent: run_single_agent_end trace=%s user=%s tool_calls=%d response_chars=%d fallback=1",
+            trace_id or "-",
+            user_id,
+            tool_calls_count,
+            len(final_text),
+        )
+        if _span:
+            _span.update(output=final_text)
+        return final_text
+    finally:
+        clear_tool_result_collector()
+        if _span_ctx:
+            _span_ctx.__exit__(None, None, None)
+        if lf:
+            lf.flush()
+
+
+async def _run_single_agent_stream(
+    *,
+    user_id: str,
+    system_prompt: str,
+    message: str,
+    context: dict[str, Any],
+    max_steps: int = 6,
+    langfuse_prompt: Any = None,
+    agent_name: str = "agent",
+) -> AsyncGenerator[tuple[str, Any], None]:
+    trace_id = _trace_id_from_context(context)
+    lf = get_langfuse()
+    llm = get_llm()
+    tools = _all_tools_for_user(user_id, trace_id)
+    model_context = _context_for_model(context)
+    logger.info("deep_agent: run_single_agent_stream_start trace=%s user=%s", trace_id or "-", user_id)
+    llm_with_tools = llm.bind_tools(tools)
+    messages: list[Any] = [
+        SystemMessage(content=system_prompt),
+        HumanMessage(
+            content=(
+                f"User message:\n{message}\n\n"
+                f"Context:\n{json.dumps({'context': model_context}, ensure_ascii=True)[:3500]}"
+            )
+        ),
+    ]
+
+    tool_calls_count = 0
+    streamed_chars = 0
+    collected: list[dict[str, Any]] = []
+    set_tool_result_collector(collected)
+
+    _span_ctx = (
+        lf.start_as_current_observation(
+            as_type="span",
+            name=f"{agent_name}-stream",
+            metadata={"user_id": user_id, "session_id": trace_id},
+            input=message,
+        )
+        if lf else None
+    )
+    _span = _span_ctx.__enter__() if _span_ctx else None
+    streamed_text: list[str] = []
+
+    try:
+        for _ in range(max_steps):
+            _gen_ctx = (
+                lf.start_as_current_observation(
+                    as_type="generation",
+                    name=f"{agent_name}-llm",
+                    model=settings.LLM_MODEL,
+                    prompt=langfuse_prompt,
+                    input=messages,
+                )
+                if lf else None
+            )
+            _gen = _gen_ctx.__enter__() if _gen_ctx else None
+            response: AIMessage = await llm_with_tools.ainvoke(messages)
+            if _gen_ctx:
+                _gen.update(output=_as_text(response.content), usage=extract_usage(response))
+                _gen_ctx.__exit__(None, None, None)
+
+            messages.append(response)
+
+            if not response.tool_calls:
+                emitted_any = False
+                async for chunk in llm.astream(messages):
+                    token = _as_text(getattr(chunk, "content", ""))
+                    if token:
+                        streamed_chars += len(token)
+                        streamed_text.append(token)
+                        emitted_any = True
+                        yield "token", token
+
+                # Some providers return final text in `response.content` but stream no chunks.
+                if not emitted_any:
+                    fallback_text = _as_text(response.content)
+                    if fallback_text:
+                        streamed_chars += len(fallback_text)
+                        streamed_text.append(fallback_text)
+                        yield "token", fallback_text
+                logger.info(
+                    "deep_agent: run_single_agent_stream_end trace=%s user=%s tool_calls=%d response_chars=%d",
+                    trace_id or "-",
+                    user_id,
+                    tool_calls_count,
+                    streamed_chars,
+                )
+                if _span:
+                    _span.update(output="".join(streamed_text))
+                return
+
+            tool_map = {tool_def.name: tool_def for tool_def in tools}
+            for call in response.tool_calls:
+                tool_calls_count += 1
+                call_id = str(call.get("id", ""))
+                call_name = str(call.get("name", ""))
+                call_args = call.get("args", {})
+                logger.info(
+                    "deep_agent: AI->Tool tool_call_id=%s tool=%s args=%s",
+                    call_id,
+                    call_name,
+                    json.dumps(call_args, ensure_ascii=True)[:800],
+                )
+
+                tool_fn = tool_map.get(call_name)
+                if tool_fn is None:
+                    tool_output = f"Unknown tool: {call_name}"
+                else:
+                    tool_output = await tool_fn.ainvoke(call_args)
+
+                logger.info(
+                    "deep_agent: Tool->AI tool_call_id=%s tool=%s output=%s",
+                    call_id,
+                    call_name,
+                    str(tool_output)[:1200],
+                )
+
+                messages.append(ToolMessage(content=str(tool_output), tool_call_id=call["id"]))
+
+        async for chunk in llm.astream(messages):
+            token = _as_text(getattr(chunk, "content", ""))
+            if token:
+                streamed_chars += len(token)
+                streamed_text.append(token)
+                yield "token", token
+        logger.info(
+            "deep_agent: run_single_agent_stream_end trace=%s user=%s tool_calls=%d response_chars=%d fallback=1",
+            trace_id or "-",
+            user_id,
+            tool_calls_count,
+            streamed_chars,
+        )
+        if _span:
+            _span.update(output="".join(streamed_text))
+    finally:
+        clear_tool_result_collector()
+        if _span_ctx:
+            _span_ctx.__exit__(None, None, None)
+        if lf:
+            lf.flush()
+
+
+async def run_home(user_id: str, message: str, context: dict[str, Any]) -> str:
+    prepared_context = await _prepare_context(message, context)
+    system_prompt, langfuse_prompt = get_prompt_or_fallback(
+        "home_system", _HOME_SYSTEM_PROMPT
+    )
+    response = await _run_single_agent(
+        user_id=user_id,
+        system_prompt=system_prompt,
+        message=message,
+        context=prepared_context,
+        langfuse_prompt=langfuse_prompt,
+        agent_name="home-agent",
+    )
+    return _normalize_tagged_list_lines(response, message)
+
+
+async def run_floating(user_id: str, message: str, context: dict[str, Any]) -> tuple[str, dict[str, str | None]]:
+    prepared_context = await _prepare_context(message, context)
+    domain = await _infer_floating_domain(message, prepared_context)
+    system_prompt, langfuse_prompt = get_prompt_or_fallback(
+        "floating_system", _FLOATING_SYSTEM_PROMPT
+    )
+    response = await _run_single_agent(
+        user_id=user_id,
+        system_prompt=system_prompt,
+        message=message,
+        context=prepared_context,
+        langfuse_prompt=langfuse_prompt,
+        agent_name="floating-agent",
+    )
+    sanitized = _strip_floating_markup(response)
+    if not sanitized and response:
+        sanitized = _fallback_from_raw_floating_text(response)
+    return sanitized, domain
+
+
+async def run_home_stream(
+    user_id: str,
+    message: str,
+    context: dict[str, Any],
+) -> AsyncGenerator[tuple[str, Any], None]:
+    prepared_context = await _prepare_context(message, context)
+    system_prompt, langfuse_prompt = get_prompt_or_fallback(
+        "home_system", _HOME_SYSTEM_PROMPT
+    )
+    text_chunks: list[str] = []
+    async for event in _run_single_agent_stream(
+        user_id=user_id,
+        system_prompt=system_prompt,
+        message=message,
+        context=prepared_context,
+        langfuse_prompt=langfuse_prompt,
+        agent_name="home-agent",
+    ):
+        event_type, data = event
+        if event_type != "token":
+            yield event
+            continue
+        text_chunks.append(str(data or ""))
+
+    normalized = _normalize_tagged_list_lines("".join(text_chunks), message)
+    if normalized:
+        yield "token", normalized
+
+
+async def run_floating_stream(
+    user_id: str,
+    message: str,
+    context: dict[str, Any],
+) -> AsyncGenerator[tuple[str, Any], None]:
+    prepared_context = await _prepare_context(message, context)
+    domain = await _infer_floating_domain(message, prepared_context)
+    yield "floating_domain", domain
+
+    system_prompt, langfuse_prompt = get_prompt_or_fallback(
+        "floating_system", _FLOATING_SYSTEM_PROMPT
+    )
+    sanitizer = _FloatingStreamSanitizer()
+    emitted_sanitized = False
+    raw_chunks: list[str] = []
+    async for event in _run_single_agent_stream(
+        user_id=user_id,
+        system_prompt=system_prompt,
+        message=message,
+        context=prepared_context,
+        langfuse_prompt=langfuse_prompt,
+        agent_name="floating-agent",
+    ):
+        event_type, data = event
+        if event_type != "token":
+            yield event
+            continue
+
+        raw_chunk = str(data or "")
+        raw_chunks.append(raw_chunk)
+        sanitized_chunk = sanitizer.feed(raw_chunk)
+        if sanitized_chunk:
+            emitted_sanitized = True
+            yield "token", sanitized_chunk
+
+    tail = sanitizer.finalize()
+    if tail:
+        emitted_sanitized = True
+        yield "token", tail
+
+    if not emitted_sanitized and raw_chunks:
+        yield "token", _fallback_from_raw_floating_text("".join(raw_chunks))
+
+
+async def update_core_memory(user_id: str, key: str, value: str) -> None:
+    """Compatibility helper kept for callers that expect an explicit memory-update API."""
+    async with async_session() as db:
+        memory = MemoryMiddleware(db)
+        await memory.update_core(user_id, key, value)
--- a/app/core/device_manager.py
+++ b/app/core/device_manager.py
@@ -0,0 +1,151 @@
+"""Device connection manager.
+
+Maintains in-memory state for all active Electron → backend WebSocket
+connections.  One connection per user (latest replaces previous).
+
+The manager handles the **tool-call round-trip** pattern:
+  - Backend sends ``tool_call`` frame → Electron executes the action →
+    returns ``tool_result`` frame.
+  - ``create_pending_call`` registers a Future keyed by ``call_id``.
+  - ``resolve_pending_call`` fulfils the Future; callers awaiting it
+    receive the result dict from Electron.
+
+This pattern is used by all tools (CRUD, file-system, etc.) via
+``execute_on_client()`` in ``ws_context.py``.
+
+The ``device_manager`` module-level singleton is imported by both the
+device WS route and the agent runner.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+from dataclasses import dataclass, field
+
+from fastapi import WebSocket
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class DeviceConnection:
+    """State for a single connected Electron device."""
+
+    ws: WebSocket
+    device_id: str
+    # Futures indexed by tool_call id — resolved when tool_result arrives.
+    pending_calls: dict[str, asyncio.Future[dict]] = field(default_factory=dict)
+
+
+class DeviceConnectionManager:
+    """Singleton registry of active Electron WebSocket connections.
+
+    Thread/task safety note: asyncio runs all tasks on a single thread.  All
+    mutations happen synchronously between await points on the event loop,
+    so no locking is required for the in-memory dicts.
+    """
+
+    def __init__(self) -> None:
+        self._connections: dict[str, DeviceConnection] = {}
+
+    # ── Registration ──────────────────────────────────────────────────
+
+    def register(self, user_id: str, device_id: str, ws: WebSocket) -> None:
+        """Store the active connection for *user_id*, replacing any previous one."""
+        if user_id in self._connections:
+            old = self._connections[user_id]
+            logger.info(
+                "device_manager: replacing existing connection for user=%s device=%s",
+                user_id,
+                old.device_id,
+            )
+            # Cancel any futures that were waiting on the old connection.
+            for fut in old.pending_calls.values():
+                if not fut.done():
+                    fut.cancel()
+        self._connections[user_id] = DeviceConnection(ws=ws, device_id=device_id)
+        logger.info(
+            "device_manager: registered user=%s device=%s", user_id, device_id
+        )
+
+    def unregister(self, user_id: str) -> None:
+        """Remove the connection for *user_id* and cancel any pending futures."""
+        conn = self._connections.pop(user_id, None)
+        if conn is None:
+            return
+        for fut in conn.pending_calls.values():
+            if not fut.done():
+                fut.cancel()
+        logger.info("device_manager: unregistered user=%s", user_id)
+
+    # ── Presence queries ──────────────────────────────────────────────
+
+    def get_ws(self, user_id: str) -> WebSocket | None:
+        """Return the active WebSocket for *user_id*, or ``None`` if offline."""
+        conn = self._connections.get(user_id)
+        return conn.ws if conn else None
+
+    def is_online(self, user_id: str, device_id: str | None = None) -> bool:
+        """Return ``True`` if the user has an active connection.
+
+        If *device_id* is provided, also check that it matches the connected device.
+        """
+        conn = self._connections.get(user_id)
+        if conn is None:
+            return False
+        if device_id is not None:
+            return conn.device_id == device_id
+        return True
+
+    # ── Frame sending ─────────────────────────────────────────────────
+
+    async def send_frame(self, user_id: str, frame: dict) -> None:
+        """Send *frame* as a JSON text message to the device.
+
+        Raises ``RuntimeError`` if the user is not connected.
+        """
+        conn = self._connections.get(user_id)
+        if conn is None:
+            raise RuntimeError(
+                f"send_frame: user {user_id!r} is not connected"
+            )
+        await conn.ws.send_text(json.dumps(frame))
+
+    # ── Tool-call round-trip ──────────────────────────────────────────
+
+    def create_pending_call(
+        self, user_id: str, call_id: str
+    ) -> asyncio.Future[dict]:
+        """Register a Future that will be resolved when the tool_result arrives.
+
+        Raises ``RuntimeError`` if the user is not connected.
+        """
+        conn = self._connections.get(user_id)
+        if conn is None:
+            raise RuntimeError(
+                f"create_pending_call: user {user_id!r} is not connected"
+            )
+        loop = asyncio.get_running_loop()
+        fut: asyncio.Future[dict] = loop.create_future()
+        conn.pending_calls[call_id] = fut
+        return fut
+
+    def resolve_pending_call(
+        self, user_id: str, call_id: str, result: dict
+    ) -> None:
+        """Fulfil the Future registered under *call_id* with the Electron result.
+
+        No-op if the user is not connected or *call_id* is unknown (already timed out or cancelled).
+        """
+        conn = self._connections.get(user_id)
+        if conn is None:
+            return
+        fut = conn.pending_calls.pop(call_id, None)
+        if fut is not None and not fut.done():
+            fut.set_result(result)
+
+
+# Module-level singleton — import this everywhere.
+device_manager = DeviceConnectionManager()
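
The tool-call round-trip in `DeviceConnectionManager` pairs `create_pending_call` with `resolve_pending_call`. The sketch below shows one way a caller might await such a Future with a timeout and clean up afterwards. `PendingCalls`, `call_tool`, and `run_demo` are hypothetical names used only for illustration; they are not part of this commit, and the timeout handling is an assumption about how the caller behaves.

```python
from __future__ import annotations

import asyncio


class PendingCalls:
    """Hypothetical stand-in for the per-connection ``pending_calls`` dict."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future[dict]] = {}

    def create(self, call_id: str) -> asyncio.Future[dict]:
        # Must be called from a coroutine, so a running loop is available.
        fut: asyncio.Future[dict] = asyncio.get_running_loop().create_future()
        self._pending[call_id] = fut
        return fut

    def resolve(self, call_id: str, result: dict) -> None:
        # Unknown or already-completed calls are ignored.
        fut = self._pending.pop(call_id, None)
        if fut is not None and not fut.done():
            fut.set_result(result)

    def discard(self, call_id: str) -> None:
        self._pending.pop(call_id, None)


async def call_tool(calls: PendingCalls, call_id: str, timeout: float = 5.0) -> dict:
    """Register a pending call, wait for its result, and always clean up."""
    fut = calls.create(call_id)
    try:
        return await asyncio.wait_for(fut, timeout)
    finally:
        # Drop the entry so a late tool_result becomes a no-op.
        calls.discard(call_id)


async def _demo() -> dict:
    calls = PendingCalls()
    # Simulate the device answering shortly after the request is sent.
    asyncio.get_running_loop().call_later(0.01, calls.resolve, "c1", {"ok": True})
    return await call_tool(calls, "c1", timeout=1.0)


def run_demo() -> dict:
    return asyncio.run(_demo())
```

Removing the entry in `finally` means a late result for a timed-out call falls through the "unknown call_id" branch instead of touching a stale Future.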
--- a/app/core/execution_plan.py
+++ b/app/core/execution_plan.py
@@ -1,222 +0,0 @@
-"""Execution Plan generator — builder, template registry, and LRU plan cache."""
-
-from __future__ import annotations
-
-from collections import OrderedDict
-from typing import Any
-
-from app.schemas import ExecutionPlan, PlanStep
-
-
-# ── Prompt Template Registry ──────────────────────────────────────────
-
-
-class PromptTemplateRegistry:
-    """Server-side store mapping template IDs to prompt text.
-
-    Clients only ever receive template IDs (e.g. ``"tpl_task_agent_default"``).
-    The actual prompt text is resolved here on the server, keeping prompt IP
-    out of API responses.
-    """
-
-    def __init__(self) -> None:
-        self._templates: dict[str, str] = {}
-
-    def register(self, template_id: str, prompt_text: str) -> None:
-        self._templates[template_id] = prompt_text
-
-    def get(self, template_id: str) -> str:
-        """Resolve a template ID to its prompt text.
-
-        Raises ``KeyError`` if the template is not registered.
-        """
-        text = self._templates.get(template_id)
-        if text is None:
-            raise KeyError(f"Template not found: {template_id!r}")
-        return text
-
-    def has(self, template_id: str) -> bool:
-        return template_id in self._templates
-
-    def list_ids(self) -> list[str]:
-        """Return all registered template IDs (never the text)."""
-        return list(self._templates.keys())
-
-
-# ── Execution Plan Builder ────────────────────────────────────────────
-
-
-class ExecutionPlanBuilder:
-    """Fluent builder for ``ExecutionPlan`` objects.
-
-    Example::
-
-        plan = (
-            ExecutionPlanBuilder("task_agent")
-            .add_llm_step("tpl_task_agent_default", {"message": user_msg})
-            .add_data_step("create_record", data_from_step=0)
-            .build()
-        )
-    """
-
-    def __init__(self, agent: str) -> None:
-        self._agent = agent
-        self._steps: list[PlanStep] = []
-
-    # ── step adders ──────────────────────────────────────────────────
-
-    def add_step(
-        self, action: str, params: dict[str, Any] | None = None
-    ) -> ExecutionPlanBuilder:
-        """Append a generic action step with optional parameters."""
-        self._steps.append(PlanStep(action=action, variables=params))
-        return self
-
-    def add_llm_step(
-        self, template_id: str, variables: dict[str, Any] | None = None
-    ) -> ExecutionPlanBuilder:
-        """Append an LLM step referencing a server-side template by ID."""
-        self._steps.append(
-            PlanStep(action="llm", prompt_template=template_id, variables=variables)
-        )
-        return self
-
-    def add_data_step(self, action: str, data_from_step: int) -> ExecutionPlanBuilder:
-        """Append a step whose input comes from the output of an earlier step."""
-        self._steps.append(PlanStep(action=action, data_from_step=data_from_step))
-        return self
-
-    # ── build ────────────────────────────────────────────────────────
-
-    def build(self) -> ExecutionPlan:
-        """Validate step references and return the ``ExecutionPlan``.
-
-        Raises ``ValueError`` if any ``data_from_step`` references a
-        non-existent or future step index.
-        """
-        for i, step in enumerate(self._steps):
-            if step.data_from_step is not None:
-                if not (0 <= step.data_from_step < i):
-                    raise ValueError(
-                        f"Step {i}: data_from_step={step.data_from_step} must "
-                        f"reference a preceding step index in range 0..{i - 1}"
-                    )
-        return ExecutionPlan(agent=self._agent, steps=list(self._steps))
-
-
-# ── Plan Cache (LRU) ──────────────────────────────────────────────────
-
-
-class PlanCache:
-    """In-memory LRU cache for ``ExecutionPlan`` objects.
-
-    Plans stored here are accessible as playbooks via ``get_all_playbooks()``.
-    The cache also serves as a runtime memoisation layer so that repeated
-    identical intent classifications can skip re-building the plan.
-    """
-
-    def __init__(self, maxsize: int = 1000) -> None:
-        self._maxsize = maxsize
-        self._cache: OrderedDict[str, ExecutionPlan] = OrderedDict()
-
-    def cache_plan(self, key: str, plan: ExecutionPlan) -> None:
-        """Store *plan* under *key*, evicting the LRU entry if at capacity."""
-        if key in self._cache:
-            del self._cache[key]  # remove so re-insertion places it at the end
-        elif len(self._cache) >= self._maxsize:
-            self._cache.popitem(last=False)  # evict least-recently-used
-        self._cache[key] = plan
-
-    def get_plan(self, key: str) -> ExecutionPlan | None:
-        """Return the cached plan for *key*, or ``None`` if not present.
-
-        Accessing a plan marks it as most-recently used.
-        """
-        if key not in self._cache:
-            return None
-        self._cache.move_to_end(key)
-        return self._cache[key]
-
-    def get_all_playbooks(self) -> list[ExecutionPlan]:
-        """Return all cached plans (most-recently used last)."""
-        return list(self._cache.values())
-
-
-# ── Module-level singletons ───────────────────────────────────────────
-
-template_registry = PromptTemplateRegistry()
-plan_cache = PlanCache()
-
-
-def _register_builtin_templates() -> None:
-    """Register the built-in server-side prompt templates.
-
-    These strings never leave the server.  Clients only receive the IDs.
-    """
-    _tpls: dict[str, str] = {
-        "tpl_task_agent_default": (
-            "You are a task management assistant. Help the user create, update, "
-            "list, and track tasks. Use correct status values (todo, in_progress, "
-            "done) and priority values (high, medium, low) from the workspace model."
-        ),
-        "tpl_checkpoint_agent_default": (
-            "You are a project checkpoint assistant. Help the user create and manage "
-            "milestone checkpoints on their projects. Every checkpoint requires a "
-            "project_id and a date expressed as a Unix timestamp in milliseconds."
-        ),
-        "tpl_project_agent_default": (
-            "You are a project management assistant. Help the user create, find, "
-            "update, and archive projects. Projects have a name, an optional client, "
-            "and a status of either active or archived."
-        ),
-        "tpl_note_agent_default": (
-            "You are a note-taking assistant. Help the user create, retrieve, update, "
-            "and delete Markdown notes. Notes can optionally be linked to a project."
-        ),
-        "tpl_task_extract_from_project": (
-            "Extract all actionable tasks from the provided project context. "
-            "Return a structured list of tasks, each with a title, inferred priority "
-            "(high, medium, or low), suggested status (todo), and a due_date in "
-            "milliseconds where a deadline can be inferred."
-        ),
-        "tpl_note_weekly_summary": (
-            "Generate a weekly project summary note from the provided workspace data. "
-            "Include: tasks completed this week, tasks due soon, active projects, "
-            "and upcoming checkpoints. Format the output as clean Markdown."
-        ),
-    }
-    for tid, text in _tpls.items():
-        template_registry.register(tid, text)
-
-
-def _load_playbooks() -> None:
-    """Pre-build and cache the built-in playbooks."""
-    playbooks: list[tuple[str, ExecutionPlan]] = [
-        (
-            "create_tasks_from_project",
-            ExecutionPlanBuilder("project_agent")
-            .add_llm_step(
-                "tpl_task_extract_from_project",
-                {"source": "project_context"},
-            )
-            .add_data_step("create_record", data_from_step=0)
-            .build(),
-        ),
-        (
-            "generate_weekly_note",
-            ExecutionPlanBuilder("note_agent")
-            .add_llm_step(
-                "tpl_note_weekly_summary",
-                {"period": "last_7_days"},
-            )
-            .add_data_step("create_record", data_from_step=0)
-            .build(),
-        ),
-    ]
-    for key, plan in playbooks:
-        plan_cache.cache_plan(key, plan)
-
-
-# Initialise on module load
-_register_builtin_templates()
-_load_playbooks()
--- a/app/core/langfuse_client.py
+++ b/app/core/langfuse_client.py
@@ -0,0 +1,147 @@
+"""Langfuse observability — singleton client and prompt helpers.
+
+If LANGFUSE_SECRET_KEY / LANGFUSE_PUBLIC_KEY are not set,
+all helpers are no-ops so the app works without Langfuse configured.
+
+Usage
+-----
+Tracing::
+
+    from app.core.langfuse_client import get_langfuse
+
+    lf = get_langfuse()
+    if lf:
+        with lf.start_as_current_observation(as_type="span", name="my-agent") as span:
+            span.update(input=user_message)
+            # ... do work ...
+            span.update(output=result)
+        lf.flush()
+
+Prompt management::
+
+    from app.core.langfuse_client import get_prompt_or_fallback
+
+    text, prompt_obj = get_prompt_or_fallback("home_system", FALLBACK_PROMPT)
+    # Use text as the system prompt; pass prompt_obj to generations for linking.
+
+Linking a prompt to a generation::
+
+    with lf.start_as_current_observation(
+        as_type="generation",
+        name="llm-call",
+        model="gpt-4o",
+        prompt=prompt_obj,   # links generation → prompt version in the UI
+        input=messages,
+    ) as gen:
+        response = await llm.ainvoke(messages)
+        gen.update(output=response.content, usage=extract_usage(response))
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+_client: Any = None
+_initialized: bool = False
+
+
+def get_langfuse() -> Any | None:
+    """Return the Langfuse singleton, or ``None`` when not configured."""
+    global _client, _initialized
+    if _initialized:
+        return _client
+    _initialized = True
+
+    from app.config.settings import settings  # local import to avoid circular deps
+
+    if not settings.LANGFUSE_SECRET_KEY or not settings.LANGFUSE_PUBLIC_KEY:
+        logger.debug("langfuse: not configured — observability disabled")
+        return None
+
+    try:
+        from langfuse import Langfuse
+
+        _client = Langfuse(
+            secret_key=settings.LANGFUSE_SECRET_KEY,
+            public_key=settings.LANGFUSE_PUBLIC_KEY,
+            host=settings.LANGFUSE_HOST,
+        )
+        logger.info("langfuse: client initialized host=%s", settings.LANGFUSE_HOST)
+    except Exception as exc:
+        logger.warning("langfuse: failed to initialize: %s", exc)
+        _client = None
+
+    return _client
+
+
+def get_prompt_or_fallback(name: str, fallback: str) -> tuple[str, Any]:
+    """Fetch a text prompt from Langfuse; fall back to ``fallback`` on any error.
+
+    Returns ``(raw_template, prompt_obj_or_None)``.
+
+    * ``raw_template`` — the uncompiled template string.  Do NOT call ``.format()``
+      on it directly; use :func:`compile_prompt` instead so the correct variable
+      syntax is applied (``{{var}}`` for Langfuse, ``{var}`` for the fallback).
+    * ``prompt_obj`` — the Langfuse prompt object, or ``None`` when Langfuse is
+      unavailable / the fetch failed.  Pass this to generation observations so
+      Langfuse links the generation to the exact prompt version in the UI.
+    """
+    lf = get_langfuse()
+    if lf is None:
+        return fallback, None
+
+    try:
+        prompt = lf.get_prompt(name, label="production", fallback=fallback)
+        # .prompt holds the raw template for text prompts; a fallback object counts as a miss.
+        raw = prompt.prompt if isinstance(getattr(prompt, "prompt", None), str) else fallback
+        return raw, (None if getattr(prompt, "is_fallback", False) else prompt)
+    except Exception as exc:
+        logger.warning("langfuse: get_prompt %r failed: %s — using fallback", name, exc)
+        return fallback, None
+
+
+def compile_prompt(template: str, prompt_obj: Any, **variables: Any) -> str:
+    """Compile *template* with *variables*, choosing the right syntax.
+
+    * When *prompt_obj* is a real Langfuse prompt object, calls
+      ``prompt_obj.compile(**variables)`` which handles ``{{variable}}``
+      substitution as defined in the Langfuse UI.
+    * When *prompt_obj* is ``None`` (Langfuse unavailable or fetch failed),
+      falls back to ``template.format(**variables)`` which handles the
+      ``{variable}`` syntax used in the hardcoded fallback strings.
+
+    This keeps callers oblivious to which syntax is in use.
+    """
+    if prompt_obj is not None:
+        try:
+            compiled = prompt_obj.compile(**variables)
+            # compile() returns a string for text prompts.
+            if isinstance(compiled, str):
+                return compiled
+            # Chat prompts return a list of dicts — join text parts.
+            if isinstance(compiled, list):
+                return "\n".join(
+                    m.get("content", "") for m in compiled if isinstance(m, dict)
+                )
+        except Exception as exc:
+            logger.warning(
+                "langfuse: compile failed for prompt %r: %s — falling back to .format()",
+                getattr(prompt_obj, "name", "?"),
+                exc,
+            )
+    return template.format(**variables)
+
+
+def extract_usage(response: Any) -> dict[str, int]:
+    """Extract token usage from a LangChain AI message into Langfuse format."""
+    meta = getattr(response, "usage_metadata", None)
+    if not meta:
+        return {}
+    return {
+        "input": int(meta.get("input_tokens", 0)),
+        "output": int(meta.get("output_tokens", 0)),
+        "total": int(meta.get("total_tokens", 0)),
+    }
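
`compile_prompt` exists because the two prompt sources use different placeholder syntaxes: `{{var}}` for prompts managed in Langfuse and `{var}` for the hardcoded fallbacks. The sketch below illustrates that dispatch. `FakePrompt` is a hypothetical stand-in, not the Langfuse SDK class, and its `compile` only approximates `{{var}}` substitution; `render` is likewise an illustrative name.

```python
from __future__ import annotations

import re
from typing import Any


class FakePrompt:
    """Hypothetical stand-in for a Langfuse text prompt (not the SDK class)."""

    def __init__(self, template: str) -> None:
        self.prompt = template

    def compile(self, **variables: Any) -> str:
        # Replace {{name}} placeholders; leave unknown names untouched.
        def _sub(match: re.Match[str]) -> str:
            key = match.group(1)
            return str(variables[key]) if key in variables else match.group(0)

        return re.sub(r"\{\{\s*(\w+)\s*\}\}", _sub, self.prompt)


def render(template: str, prompt_obj: FakePrompt | None, **variables: Any) -> str:
    """Pick the substitution syntax based on where the template came from."""
    if prompt_obj is not None:
        # Managed prompt: double-brace placeholders.
        return prompt_obj.compile(**variables)
    # Hardcoded fallback: str.format-style single-brace placeholders.
    return template.format(**variables)
```

A single-brace template handed to the double-brace compiler is returned unsubstituted, which is why callers should not mix the two paths.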
--- a/app/core/llm.py
+++ b/app/core/llm.py
@@ -17,11 +17,30 @@ Switch providers by changing **LLM_MODEL** / **LLM_ROUTER_MODEL** in ``.env``

 from __future__ import annotations

+import os
+import warnings
+
+from openai import AsyncOpenAI
+import litellm
+
 from langchain_openai import ChatOpenAI
+from langchain_litellm import ChatLiteLLM
 from litellm import get_supported_openai_params  # noqa: F401 – validates install

 from app.config.settings import settings

+# Some models (e.g. gpt-5, o-series) reject unsupported params like temperature.
+# Drop them silently instead of raising UnsupportedParamsError.
+litellm.drop_params = True
+
+# Some provider responses include a plain dict in the `usage` field where a
+# richer Pydantic model is expected. This warning is noisy but non-fatal.
+warnings.filterwarnings(
+    "ignore",
+    message=r"PydanticSerializationUnexpectedValue\(Expected `ResponseAPIUsage`",
+    category=UserWarning,
+)
+

 def _api_key_for_model(model: str) -> str | None:
    """Return the most appropriate API key for the given LiteLLM model string."""
@@ -29,6 +48,12 @@ def _api_key_for_model(model: str) -> str | None:
        return settings.ANTHROPIC_API_KEY or None
    if model.startswith("gemini/") or model.startswith("google/"):
        return settings.GOOGLE_API_KEY or None
+    if model.startswith("cerebras/"):
+        return settings.CEREBRAS_API_KEY or None
+    if model.startswith("github_copilot/"):
+        # GitHub Copilot uses OAuth device-flow tokens managed by LiteLLM.
+        # No API key is required; returning None lets LiteLLM handle auth.
+        return None
    # Default: OpenAI-compatible (covers plain model names like "gpt-4o")
    return settings.OPENAI_API_KEY or None

@@ -37,7 +62,7 @@ def get_llm(
    *,
    model: str | None = None,
    temperature: float = 0,
-) -> ChatOpenAI:
+) -> ChatOpenAI | ChatLiteLLM:
    """Return a LangChain chat model backed by LiteLLM.

    LiteLLM exposes an OpenAI-compatible API, so we use ``ChatOpenAI`` pointed
@@ -53,6 +78,16 @@ def get_llm(
        Sampling temperature.  ``0`` = deterministic.
    """
    model = model or settings.LLM_MODEL
+
+    # Point LiteLLM to the custom token directory when configured.
+    if settings.GITHUB_COPILOT_TOKEN_DIR:
+        os.environ.setdefault("GITHUB_COPILOT_TOKEN_DIR", settings.GITHUB_COPILOT_TOKEN_DIR)
+
+    # Use ChatLiteLLM for provider-prefixed models (github_copilot/, anthropic/, etc.)
+    # so LiteLLM handles routing and auth. ChatOpenAI for plain OpenAI model names.
+    if "/" in model:
+        return ChatLiteLLM(model=model, temperature=temperature)
+
    return ChatOpenAI(
        model=model,
        temperature=temperature,
@@ -63,6 +98,28 @@ def get_llm(
 def get_router_llm(
    *,
    temperature: float = 0,
-) -> ChatOpenAI:
+) -> ChatOpenAI | ChatLiteLLM:
    """Return the lighter model used for intent classification / routing."""
    return get_llm(model=settings.LLM_ROUTER_MODEL, temperature=temperature)
+
+
+async def embed(text: str) -> list[float]:
+    """Return an embedding vector for *text*.
+
+    Uses ``settings.LLM_EMBED_MODEL`` so the same provider switch in ``.env``
+    (e.g. ``github_copilot/text-embedding-3-small``) applies here without any
+    code changes.  Falls back to the raw AsyncOpenAI client for plain OpenAI
+    model names to preserve existing behaviour.
+    """
+    model = settings.LLM_EMBED_MODEL
+
+    if "/" in model:
+        # Use LiteLLM for all provider-prefixed models (Copilot, Bedrock, etc.)
+        # so the provider's auth mechanism is applied correctly.
+        response = await litellm.aembedding(model=model, input=[text])
+        return response.data[0]["embedding"]
+
+    # Plain OpenAI model name — use the raw AsyncOpenAI client (existing path).
+    client = AsyncOpenAI(api_key=settings.OPENAI_API_KEY)
+    response = await client.embeddings.create(model=model, input=text)
+    return response.data[0].embedding
--- a/app/core/memory_middleware.py
+++ b/app/core/memory_middleware.py
@@ -0,0 +1,441 @@
+"""Memory Middleware — enrich requests with memory context and store interactions.
+
+Four-tier memory model (MemGPT-style):
+  core         — persistent key/value user preferences, always injected
+  associative  — semantic similarity search via pgvector (top-k)
+  episodic     — recent session summaries (last N)
+  proactive    — behavioral patterns above confidence threshold
+
+All memory content is encrypted at rest using the per-user Fernet key
+stored in User.encryption_key. Decryption happens in-memory only.
+
+Usage:
+    memory = MemoryMiddleware(db_session)
+    context = await memory.enrich_context(user_id, message)
+    # ... run agent ...
+    await memory.store_episode(user_id, session_id, message, response)
+"""
+
+from __future__ import annotations
+
+import logging
+import uuid
+from typing import Any
+
+from cryptography.fernet import Fernet, InvalidToken
+from sqlalchemy import select
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from app.models import (
+    MemoryAssociative,
+    MemoryCore,
+    MemoryEpisodic,
+    MemoryProactive,
+    User,
+)
+
+logger = logging.getLogger(__name__)
+
+# Tuning constants
+_ASSOCIATIVE_TOP_K = 5
+_EPISODIC_RECENT_N = 10
+_PROACTIVE_CONFIDENCE_THRESHOLD = 0.6
+
+
+class MemoryMiddleware:
+    """Enrich orchestrator context with memory and persist interactions after."""
+
+    def __init__(self, db: AsyncSession) -> None:
+        self._db = db
+
+    # ── Public API ────────────────────────────────────────────────────────────
+
+    async def enrich_context(
+        self,
+        user_id: str,
+        message: str,
+        trace_id: str | None = None,
+        session_id: str | None = None,
+    ) -> dict[str, Any]:
+        """Build memory context dict to inject into the orchestrator before LLM call.
+
+        Returns a dict with keys:
+          core_memory        — {key: plaintext_value, ...}
+          associative_memory — [plaintext_content, ...]  (top-k most recently updated)
+          episodic_memory    — [plaintext_summary, ...]  (most recent N)
+          proactive_hints    — [plaintext_pattern, ...]  (above threshold)
+        """
+        fernet = await self._get_fernet(user_id)
+        if fernet is None:
+            return {}
+
+        core = await self._load_core(user_id, fernet)
+        associative = await self._load_associative(user_id, message, fernet)
+        episodic = await self._load_episodic(user_id, fernet, session_id=session_id)
+        proactive = await self._load_proactive(user_id, fernet)
+
+        user_dbg = await self._get_user_debug(user_id)
+        logger.info(
+            "memory: enrich_context trace=%s user=%s tier=%s core=%d associative=%d episodic=%d proactive=%d",
+            trace_id or "-",
+            user_id,
+            user_dbg.get("tier") or "-",
+            len(core),
+            len(associative),
+            len(episodic),
+            len(proactive),
+        )
+
+        return {
+            "core_memory": core,
+            "associative_memory": associative,
+            "episodic_memory": episodic,
+            "proactive_hints": proactive,
+        }
+
+    async def store_episode(
+        self,
+        user_id: str,
+        session_id: str,
+        message: str,
+        response: str,
+        trace_id: str | None = None,
+    ) -> None:
+        """Summarise and store a completed interaction in episodic memory.
+
+        The summary is a simple heuristic concatenation (no LLM call) to keep
+        latency low. Full LLM summarisation can be added in a later step.
+        """
+        fernet = await self._get_fernet(user_id)
+        if fernet is None:
+            return
+
+        summary = f"User: {message[:200]}\nAssistant: {response[:200]}"
+        encrypted = _encrypt(fernet, summary)
+
+        row = MemoryEpisodic(
+            id=str(uuid.uuid4()),
+            user_id=user_id,
+            summary_encrypted=encrypted,
+            session_id=session_id,
+        )
+        self._db.add(row)
+        try:
+            await self._db.commit()
+            user_dbg = await self._get_user_debug(user_id)
+            logger.info(
+                "memory: store_episode trace=%s user=%s tier=%s session=%s",
+                trace_id or "-",
+                user_id,
+                user_dbg.get("tier") or "-",
+                session_id,
+            )
+        except Exception as exc:
+            logger.error("memory: store_episode failed user=%s: %s", user_id, exc)
+            await self._db.rollback()
+
+    async def update_core(self, user_id: str, key: str, value: str, trace_id: str | None = None) -> None:
+        """Upsert a core memory key/value for a user."""
+        fernet = await self._get_fernet(user_id)
+        if fernet is None:
+            return
+
+        encrypted = _encrypt(fernet, value)
+
+        result = await self._db.execute(
+            select(MemoryCore).where(
+                MemoryCore.user_id == user_id,
+                MemoryCore.key == key,
+            )
+        )
+        existing = result.scalar_one_or_none()
+        if existing is not None:
+            existing.value_encrypted = encrypted
+        else:
+            self._db.add(MemoryCore(
+                id=str(uuid.uuid4()),
+                user_id=user_id,
+                key=key,
+                value_encrypted=encrypted,
+            ))
+        try:
+            await self._db.commit()
+            user_dbg = await self._get_user_debug(user_id)
+            logger.info(
+                "memory: update_core trace=%s user=%s tier=%s key=%s",
+                trace_id or "-",
+                user_id,
+                user_dbg.get("tier") or "-",
+                key,
+            )
+        except Exception as exc:
+            logger.error("memory: update_core failed user=%s key=%s: %s", user_id, key, exc)
+            await self._db.rollback()
+
+    async def list_core_blocks(self, user_id: str) -> list[dict[str, str]]:
+        """Return core memory as editable blocks (label/value)."""
+        fernet = await self._get_fernet(user_id)
+        if fernet is None:
+            return []
+
+        result = await self._db.execute(
+            select(MemoryCore)
+            .where(MemoryCore.user_id == user_id)
+            .order_by(MemoryCore.key.asc())
+        )
+        rows = result.scalars().all()
+        out: list[dict[str, str]] = []
+        for row in rows:
+            plaintext = _safe_decrypt(fernet, row.value_encrypted)
+            if plaintext is not None:
+                out.append({"label": row.key, "value": plaintext})
+        logger.debug("memory: list_core_blocks user=%s count=%d", user_id, len(out))
+        return out
+
+    async def get_core_block(self, user_id: str, label: str) -> str | None:
+        """Return a single core memory block value by label."""
+        fernet = await self._get_fernet(user_id)
+        if fernet is None:
+            return None
+
+        result = await self._db.execute(
+            select(MemoryCore).where(
+                MemoryCore.user_id == user_id,
+                MemoryCore.key == label,
+            )
+        )
+        row = result.scalar_one_or_none()
+        if row is None:
+            logger.debug("memory: get_core_block user=%s label=%s found=0", user_id, label)
+            return None
+        value = _safe_decrypt(fernet, row.value_encrypted)
+        logger.debug("memory: get_core_block user=%s label=%s found=%d", user_id, label, 1 if value is not None else 0)
+        return value
+
+    async def delete_core(self, user_id: str, label: str) -> bool:
+        """Delete a core memory block by label. Returns True if deleted."""
+        result = await self._db.execute(
+            select(MemoryCore).where(
+                MemoryCore.user_id == user_id,
+                MemoryCore.key == label,
+            )
+        )
+        row = result.scalar_one_or_none()
+        if row is None:
+            logger.debug("memory: delete_core user=%s label=%s found=0", user_id, label)
+            return False
+
+        await self._db.delete(row)
+        try:
+            await self._db.commit()
+            logger.info("memory: delete_core user=%s label=%s", user_id, label)
+            return True
+        except Exception as exc:
+            logger.error("memory: delete_core failed user=%s label=%s: %s", user_id, label, exc)
+            await self._db.rollback()
+            return False
+
+    async def append_core(self, user_id: str, label: str, content: str) -> None:
+        """Append content to a core block, creating it if missing."""
+        current = await self.get_core_block(user_id, label)
+        if current is None:
+            await self.update_core(user_id, label, content)
+            logger.info("memory: append_core user=%s label=%s created=1", user_id, label)
+            return
+        await self.update_core(user_id, label, f"{current}\n{content}")
+        logger.info("memory: append_core user=%s label=%s created=0", user_id, label)
+
+    async def replace_core(self, user_id: str, label: str, old: str, new: str) -> bool:
+        """Replace one exact string inside a core block. Returns False if not found."""
+        current = await self.get_core_block(user_id, label)
+        if current is None or old not in current:
+            logger.debug("memory: replace_core user=%s label=%s changed=0", user_id, label)
+            return False
+        await self.update_core(user_id, label, current.replace(old, new, 1))
+        logger.info("memory: replace_core user=%s label=%s changed=1", user_id, label)
+        return True
+
+    async def insert_archival(self, user_id: str, content: str, source: str = "manual") -> None:
+        """Insert a long-term archival memory entry."""
+        fernet = await self._get_fernet(user_id)
+        if fernet is None:
+            return
+
+        encrypted = _encrypt(fernet, content)
+        row = MemoryAssociative(
+            id=str(uuid.uuid4()),
+            user_id=user_id,
+            content_encrypted=encrypted,
+            embedding=None,
+            entity_type=source,
+            entity_id=None,
+        )
+        self._db.add(row)
+        try:
+            await self._db.commit()
+            logger.info("memory: insert_archival user=%s source=%s", user_id, source)
+        except Exception as exc:
+            logger.error("memory: insert_archival failed user=%s: %s", user_id, exc)
+            await self._db.rollback()
+
+    async def search_archival(self, user_id: str, query: str, top_k: int = 5) -> list[str]:
+        """Search archival memory (keyword fallback; semantic ranking can replace this)."""
+        fernet = await self._get_fernet(user_id)
+        if fernet is None:
+            return []
+
+        result = await self._db.execute(
+            select(MemoryAssociative)
+            .where(MemoryAssociative.user_id == user_id)
+            .order_by(MemoryAssociative.updated_at.desc())
+            .limit(100)
+        )
+        rows = result.scalars().all()
+        needle = query.strip().lower()
+        out: list[str] = []
+        for row in rows:
+            plaintext = _safe_decrypt(fernet, row.content_encrypted)
+            if plaintext is None:
+                continue
+            if not needle or needle in plaintext.lower():
+                out.append(plaintext)
+            if len(out) >= max(top_k, 1):
+                break
+        logger.info("memory: search_archival user=%s query=%s hits=%d", user_id, query[:80], len(out))
+        return out
+
+    async def search_recall(self, user_id: str, query: str, top_k: int = 5) -> list[str]:
+        """Search recall memory (episodic summaries) by keyword."""
+        fernet = await self._get_fernet(user_id)
+        if fernet is None:
+            return []
+
+        result = await self._db.execute(
+            select(MemoryEpisodic)
+            .where(MemoryEpisodic.user_id == user_id)
+            .order_by(MemoryEpisodic.created_at.desc())
+            .limit(100)
+        )
+        rows = result.scalars().all()
+        needle = query.strip().lower()
+        out: list[str] = []
+        for row in rows:
+            plaintext = _safe_decrypt(fernet, row.summary_encrypted)
+            if plaintext is None:
+                continue
+            if not needle or needle in plaintext.lower():
+                out.append(plaintext)
+            if len(out) >= max(top_k, 1):
+                break
+        logger.info("memory: search_recall user=%s query=%s hits=%d", user_id, query[:80], len(out))
+        return out
+
+    # ── Private helpers ───────────────────────────────────────────────────────
+
+    async def _get_fernet(self, user_id: str) -> Fernet | None:
+        """Load the user's Fernet key from DB. Returns None if missing."""
+        result = await self._db.execute(select(User).where(User.id == user_id))
+        user = result.scalar_one_or_none()
+        if user is None or not user.encryption_key:
+            logger.warning("memory: no encryption_key for user=%s", user_id)
+            return None
+        return Fernet(user.encryption_key.encode())
+
+    async def _get_user_debug(self, user_id: str) -> dict[str, str | None]:
+        """Load lightweight user debug fields for trace logs."""
+        result = await self._db.execute(select(User).where(User.id == user_id))
+        user = result.scalar_one_or_none()
+        if user is None:
+            return {"tier": None}
+        return {
+            "tier": user.tier,
+        }
+
+    async def _load_core(self, user_id: str, fernet: Fernet) -> dict[str, str]:
+        result = await self._db.execute(
+            select(MemoryCore).where(MemoryCore.user_id == user_id)
+        )
+        rows = result.scalars().all()
+        out: dict[str, str] = {}
+        for row in rows:
+            plaintext = _safe_decrypt(fernet, row.value_encrypted)
+            if plaintext is not None:
+                out[row.key] = plaintext
+        return out
+
+    async def _load_associative(
+        self, user_id: str, message: str, fernet: Fernet
+    ) -> list[str]:
+        """Load top-k associative memories.
+
+        Planned for production: pgvector cosine similarity on the message embedding.
+        Current implementation: recency-based fallback that ignores ``message``
+        (no external embedding call), so tests pass without a live OpenAI key.
+        """
+        result = await self._db.execute(
+            select(MemoryAssociative)
+            .where(MemoryAssociative.user_id == user_id)
+            .order_by(MemoryAssociative.updated_at.desc())
+            .limit(_ASSOCIATIVE_TOP_K)
+        )
+        rows = result.scalars().all()
+        out: list[str] = []
+        for row in rows:
+            plaintext = _safe_decrypt(fernet, row.content_encrypted)
+            if plaintext is not None:
+                out.append(plaintext)
+        return out
+
+    async def _load_episodic(
+        self,
+        user_id: str,
+        fernet: Fernet,
+        session_id: str | None = None,
+    ) -> list[str]:
+        query = select(MemoryEpisodic).where(MemoryEpisodic.user_id == user_id)
+        if session_id:
+            query = query.where(MemoryEpisodic.session_id == session_id)
+        result = await self._db.execute(
+            query
+            .order_by(MemoryEpisodic.created_at.desc())
+            .limit(_EPISODIC_RECENT_N)
+        )
+        rows = result.scalars().all()
+        out: list[str] = []
+        for row in rows:
+            plaintext = _safe_decrypt(fernet, row.summary_encrypted)
+            if plaintext is not None:
+                out.append(plaintext)
+        return out
+
+    async def _load_proactive(self, user_id: str, fernet: Fernet) -> list[str]:
+        result = await self._db.execute(
+            select(MemoryProactive)
+            .where(
+                MemoryProactive.user_id == user_id,
+                MemoryProactive.confidence >= _PROACTIVE_CONFIDENCE_THRESHOLD,
+            )
+            .order_by(MemoryProactive.confidence.desc())
+        )
+        rows = result.scalars().all()
+        out: list[str] = []
+        for row in rows:
+            plaintext = _safe_decrypt(fernet, row.pattern_encrypted)
+            if plaintext is not None:
+                out.append(plaintext)
+        return out
+
+
+# ── Encryption helpers ────────────────────────────────────────────────────────
+
+def _encrypt(fernet: Fernet, plaintext: str) -> str:
+    return fernet.encrypt(plaintext.encode()).decode()
+
+
+def _safe_decrypt(fernet: Fernet, ciphertext: str) -> str | None:
+    """Decrypt and return plaintext, or None on error (corrupted/wrong key)."""
+    try:
+        return fernet.decrypt(ciphertext.encode()).decode()
+    except (InvalidToken, Exception) as exc:
+        logger.warning("memory: decrypt failed: %s", exc)
+        return None
--- a/app/core/orchestrator.py
+++ b/app/core/orchestrator.py
@@ -1,168 +0,0 @@
-"""Orchestrator — LLM-based intent router and agent pipeline."""
-
-from __future__ import annotations
-
-import json
-from typing import Any, AsyncGenerator
-
-from langchain_core.messages import HumanMessage, SystemMessage
-
-from app.core.agent_registry import AgentRegistry
-from app.core.llm import get_router_llm
-from app.core.agent_registry import registry as _default_registry
-from app.schemas import ChatRequest, ChatResponse, ExecutionPlan
-
-_FALLBACK_AGENT = "task_agent"
-
-_CLASSIFY_SYSTEM = (
-    "You are an intent classifier. Given the user message and context, decide "
-    "which agent to route to.\n"
-    "Available agents: {agents}\n"
-    "Respond with just the agent name, nothing else."
-)
-
-_SYNTHESIZE_HUMAN = (
-    "Combine the following agent results into one coherent response.\n\n"
-    "Agent results:\n{results}\n\n"
-    "Original message: {message}"
-)
-
-
-def _make_llm():
-    return get_router_llm()
-
-
-async def classify_intent(
-    message: str,
-    context: dict[str, Any],
-    reg: AgentRegistry,
-) -> str:
-    """Use gpt-4o-mini to classify intent and return the matching agent name.
-
-    Falls back to ``task_agent`` when the registry is empty or the model
-    returns a name that is not registered.
-    """
-    agents = reg.list_agents()
-    if not agents:
-        return _FALLBACK_AGENT
-
-    system = _CLASSIFY_SYSTEM.format(agents=json.dumps(agents))
-    # Truncate context to keep the classification prompt short
-    human = f"Message: {message}\nContext summary: {json.dumps(context)[:500]}"
-
-    llm = _make_llm()
-    response = await llm.ainvoke(
-        [SystemMessage(content=system), HumanMessage(content=human)]
-    )
-
-    agent_name = str(response.content).strip().lower()
-    known = {a["name"] for a in agents}
-    return agent_name if agent_name in known else _FALLBACK_AGENT
-
-
-async def route_single(
-    agent_name: str,
-    message: str,
-    context: dict[str, Any],
-    reg: AgentRegistry,
-) -> ChatResponse:
-    """Route to a single agent and wrap the result in a ``ChatResponse``."""
-    response_text = await reg.call_agent(agent_name, message, context)
-    return ChatResponse(response=response_text)
-
-
-async def route_pipeline(
-    agent_names: list[str],
-    message: str,
-    context: dict[str, Any],
-    reg: AgentRegistry,
-) -> ChatResponse:
-    """Execute agents sequentially; each agent receives previous results in context.
-
-    A final LLM synthesis call merges all results into one coherent response.
-    """
-    previous_results: list[str] = []
-
-    for agent_name in agent_names:
-        ctx = {**context, "previous_results": list(previous_results)}
-        result = await reg.call_agent(agent_name, message, ctx)
-        previous_results.append(result)
-
-    results_str = "\n\n".join(
-        f"[{name}]: {res}" for name, res in zip(agent_names, previous_results)
-    )
-    human = _SYNTHESIZE_HUMAN.format(results=results_str, message=message)
-    llm = _make_llm()
-    synthesis = await llm.ainvoke([HumanMessage(content=human)])
-    return ChatResponse(response=str(synthesis.content))
-
-
-def _build_plan(agent_name: str, message: str) -> ExecutionPlan:
-    """Build an ``ExecutionPlan`` for the resolved agent.
-
-    Uses ``ExecutionPlanBuilder`` with the server-side template registry.
-    If a default template exists for the agent, an LLM step is emitted;
-    otherwise a plain ``handle`` action step is used.
-    """
-    from app.core.execution_plan import ExecutionPlanBuilder, template_registry
-
-    template_id = f"tpl_{agent_name}_default"
-    builder = ExecutionPlanBuilder(agent_name)
-    if template_registry.has(template_id):
-        builder.add_llm_step(template_id, {"message": message})
-    else:
-        builder.add_step("handle", {"message": message})
-    return builder.build()
-
-
-async def orchestrate(
-    request: ChatRequest,
-    reg: AgentRegistry | None = None,
-) -> ChatResponse | ExecutionPlan:
-    """Main orchestration entry point.
-
-    * Classifies the user's intent to select an agent.
-    * ``execution_mode == 'direct'``: routes to the agent and returns a
-      ``ChatResponse``.
-    * ``execution_mode == 'plan'``: returns an ``ExecutionPlan`` with the
-      resolved agent and a template-ID-only step (prompt IP stays server-side).
-    """
-    if reg is None:
-        reg = _default_registry
-
-    context = request.context.model_dump()
-    agent_name = await classify_intent(request.message, context, reg)
-
-    if request.execution_mode == "direct":
-        return await route_single(agent_name, request.message, context, reg)
-
-    # plan mode — return plan, do not execute
-    return _build_plan(agent_name, request.message)
-
-
-async def orchestrate_stream(
-    request: ChatRequest,
-    reg: AgentRegistry | None = None,
-) -> AsyncGenerator[str, None]:
-    """Streaming orchestration — yields text chunks then a final JSON frame.
-
-    The final frame is a JSON object:
-    ``{"done": true, "response": "...", "actions": []}``.
-
-    Agents do not yet support token-level streaming; the full response is
-    fetched first, then emitted in fixed-size chunks.  Token-level streaming
-    will be wired in Step 6 when agents expose ``astream()``.
-    """
-    if reg is None:
-        reg = _default_registry
-
-    context = request.context.model_dump()
-    agent_name = await classify_intent(request.message, context, reg)
-    response_text = await reg.call_agent(agent_name, request.message, context)
-
-    chunk_size = 50
-    for i in range(0, len(response_text), chunk_size):
-        yield response_text[i : i + chunk_size]
-
-    final = ChatResponse(response=response_text)
-    yield json.dumps({"done": True, **final.model_dump()})
--- a/app/core/output_formatter.py
+++ b/app/core/output_formatter.py
@@ -0,0 +1,47 @@
+"""Output formatter for deep-agent stream events."""
+
+from __future__ import annotations
+
+from collections.abc import AsyncGenerator
+from typing import Any
+
+from app.schemas import WsFloatingDomain, WsStreamEnd, WsStreamStart, WsStreamText
+
+WsFrame = WsStreamStart | WsStreamText | WsStreamEnd | WsFloatingDomain
+
+
+class StreamFormatter:
+    """Convert `(event_type, data)` stream events into websocket frame models."""
+
+    def __init__(self, request_id: str) -> None:
+        self.request_id = request_id
+
+    async def format(
+        self,
+        event_stream: AsyncGenerator[tuple[str, Any], None],
+    ) -> AsyncGenerator[WsFrame, None]:
+        started = False
+
+        async for event_type, data in event_stream:
+            if event_type == "floating_domain":
+                if isinstance(data, dict):
+                    yield WsFloatingDomain(
+                        request_id=self.request_id,
+                        domain=data,
+                    )
+                continue
+
+            if event_type != "token":
+                continue
+
+            if not started:
+                yield WsStreamStart(request_id=self.request_id)
+                started = True
+
+            text = str(data or "")
+            if text:
+                yield WsStreamText(request_id=self.request_id, chunk=text)
+
+        if not started:
+            yield WsStreamStart(request_id=self.request_id)
+        yield WsStreamEnd(request_id=self.request_id)
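A minimal sketch of the frame ordering that `StreamFormatter.format()` produces, assuming simplified stand-in frame classes (`Start`, `Text`, `End` are hypothetical and replace the `app.schemas` models) and leaving out the `floating_domain` branch. It illustrates two properties of the added code: empty token chunks are dropped, and an empty event stream still yields a start frame followed by an end frame.

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-ins for the WsStreamStart / WsStreamText / WsStreamEnd models.
@dataclass
class Start:
    request_id: str


@dataclass
class Text:
    request_id: str
    chunk: str


@dataclass
class End:
    request_id: str


async def format_frames(request_id, events):
    """Mirror the start/text/end ordering of StreamFormatter.format()."""
    started = False
    async for event_type, data in events:
        if event_type != "token":
            continue  # non-token events are skipped in this sketch
        if not started:
            yield Start(request_id)
            started = True
        text = str(data or "")
        if text:  # empty or None chunks produce no text frame
            yield Text(request_id, text)
    if not started:
        yield Start(request_id)  # an empty stream still opens before it closes
    yield End(request_id)


async def fake_events(items):
    """Turn a plain list into an async event stream for the demo."""
    for item in items:
        yield item


async def collect(items):
    return [frame async for frame in format_frames("req-1", fake_events(items))]


frames = asyncio.run(
    collect([("token", "Hel"), ("tool", {}), ("token", ""), ("token", "lo")])
)
empty = asyncio.run(collect([]))
```

Because the start frame is emitted lazily, a client can rely on every stream being bracketed by exactly one start and one end frame, whether or not any tokens arrive.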
--- a/app/core/preprocessors/__init__.py
+++ b/app/core/preprocessors/__init__.py
@@ -0,0 +1,104 @@
+"""Preprocessor registry: detect content type and dispatch to handlers.
+
+Public API
+----------
+detect_content_type(filename, raw_content) -> str
+    Heuristic detection based on file extension and content patterns.
+
+preprocess(content_type, raw_content) -> PreprocessResult
+    Dispatch to the appropriate handler.
+"""
+
+from __future__ import annotations
+
+import re
+
+from app.core.preprocessors.base import PreprocessResult
+
+# ── Heuristics ────────────────────────────────────────────────────────
+
+# Patterns that strongly suggest an email HTML file
+_EMAIL_SIGNALS = re.compile(
+    r"(Subject:|From:|To:|Date:|Sent:|MIME-Version:|Content-Type:\s*text/html)",
+    re.IGNORECASE,
+)
+
+# Patterns that suggest a generic HTML page (not an email)
+_GENERIC_HTML_SIGNALS = re.compile(
+    r"<(nav|main|header|footer|article|section)\b",
+    re.IGNORECASE,
+)
+
+
+def detect_content_type(filename: str, raw_content: str) -> str:
+    """Return a content-type string for the given file.
+
+    Supported types: ``"email_html"``, ``"generic_html"``,
+    ``"plain_text"``, ``"unknown"``.
+    """
+    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
+
+    if ext == "txt":
+        return "plain_text"
+
+    if ext in ("html", "htm", "eml", "mhtml", "mht"):
+        # Prefer email detection over generic HTML
+        if _EMAIL_SIGNALS.search(raw_content[:4096]):
+            return "email_html"
+        if _GENERIC_HTML_SIGNALS.search(raw_content[:4096]) or "<html" in raw_content[:200].lower():
+            return "generic_html"
+        # .html without clear signals — check for any email header
+        if re.search(r"^(From|To|Subject|Date):", raw_content[:2048], re.MULTILINE | re.IGNORECASE):
+            return "email_html"
+        return "generic_html"
+
+    # Extensionless files with email headers (".txt" already returned above)
+    if not ext:
+        if _EMAIL_SIGNALS.search(raw_content[:4096]):
+            return "email_html"
+
+    # Detect binary content
+    try:
+        raw_content.encode("utf-8")
+    except (UnicodeEncodeError, AttributeError):
+        return "unknown"
+
+    # Non-text bytes heuristic: high ratio of non-printable chars
+    sample = raw_content[:512]
+    non_printable = sum(1 for c in sample if ord(c) < 32 and c not in "\r\n\t")
+    if len(sample) > 0 and non_printable / len(sample) > 0.1:
+        return "unknown"
+
+    return "unknown"
+
+
+# ── Generic fallback handler ──────────────────────────────────────────
+
+def _preprocess_generic(raw_content: str, content_type: str) -> PreprocessResult:
+    """Strip HTML tags if present and collapse runs of blank lines."""
+    try:
+        from bs4 import BeautifulSoup
+        text = BeautifulSoup(raw_content, "html.parser").get_text(separator="\n")
+    except ImportError:
+        # No BeautifulSoup — strip tags with a simple regex
+        text = re.sub(r"<[^>]+>", "", raw_content)
+
+    text = re.sub(r"\n{3,}", "\n\n", text).strip()
+    return PreprocessResult(content_type=content_type, clean_text=text, metadata={})
+
+
+# ── Dispatch ──────────────────────────────────────────────────────────
+
+def preprocess(content_type: str, raw_content: str) -> PreprocessResult:
+    """Dispatch *raw_content* to the handler registered for *content_type*.
+
+    Falls back to the generic handler for unknown types.
+    """
+    if content_type == "email_html":
+        from app.core.preprocessors.email_html import preprocess_email_html
+        return preprocess_email_html(raw_content)
+
+    return _preprocess_generic(raw_content, content_type)
+
+
+__all__ = ["detect_content_type", "preprocess", "PreprocessResult"]
--- a/app/core/preprocessors/base.py
+++ b/app/core/preprocessors/base.py
@@ -0,0 +1,25 @@
+"""Base types for the preprocessor system."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+
+
+@dataclass
+class PreprocessResult:
+    """Output of a preprocessor handler.
+
+    Attributes
+    ----------
+    content_type:
+        The detected content type (e.g. ``"email_html"``, ``"plain_text"``).
+    clean_text:
+        Human-readable text stripped of markup/binary noise.
+    metadata:
+        Dict of extracted metadata (keys vary by handler).
+        Common keys: ``subject``, ``from``, ``to``, ``date``, ``filename``.
+    """
+
+    content_type: str
+    clean_text: str
+    metadata: dict = field(default_factory=dict)
--- a/app/core/preprocessors/email_html.py
+++ b/app/core/preprocessors/email_html.py
@@ -0,0 +1,111 @@
+"""Preprocessor for email HTML files.
+
+Handles:
+- HTML stripping via BeautifulSoup
+- Metadata extraction (Subject, From, To, Date)
+- Thread splitting — isolates the latest reply
+"""
+
+from __future__ import annotations
+
+import re
+from typing import TYPE_CHECKING
+
+from app.core.preprocessors.base import PreprocessResult
+
+if TYPE_CHECKING:
+    pass
+
+# ── Thread split markers ──────────────────────────────────────────────
+
+# Matches patterns like:
+#   "On Mon, Apr 7, 2026 at 10:00 AM, Alice <alice@co.com> wrote:"
+#   "-----Original Message-----"
+#   "> " (plain-text quote prefix)
+_THREAD_PATTERNS = [
+    re.compile(r"^On\s+.+wrote\s*:", re.IGNORECASE | re.MULTILINE),
+    re.compile(r"^-{3,}\s*(original message|forwarded message)\s*-{3,}", re.IGNORECASE | re.MULTILINE),
+    re.compile(r"^>{1,}\s+\S", re.MULTILINE),
+    re.compile(r"^From:\s+.+\nSent:\s+", re.IGNORECASE | re.MULTILINE),
+]
+
+# ── Metadata patterns (applied on raw HTML / plain fallback) ──────────
+
+_META_PATTERNS: dict[str, list[re.Pattern]] = {
+    "subject": [
+        re.compile(r"<title>(.+?)</title>", re.IGNORECASE | re.DOTALL),
+        re.compile(r"Subject:\s*(.+)", re.IGNORECASE),
+    ],
+    "from": [
+        re.compile(r'<meta[^>]+name=["\']?from["\']?[^>]+content=["\']([^"\']+)["\']', re.IGNORECASE),
+        re.compile(r"From:\s*(.+)", re.IGNORECASE),
+    ],
+    "to": [
+        re.compile(r'<meta[^>]+name=["\']?to["\']?[^>]+content=["\']([^"\']+)["\']', re.IGNORECASE),
+        re.compile(r"To:\s*(.+)", re.IGNORECASE),
+    ],
+    "date": [
+        re.compile(r'<meta[^>]+name=["\']?date["\']?[^>]+content=["\']([^"\']+)["\']', re.IGNORECASE),
+        re.compile(r"Date:\s*(.+)", re.IGNORECASE),
+        re.compile(r"Sent:\s*(.+)", re.IGNORECASE),
+    ],
+}
+
+
+def _extract_metadata(raw_html: str, text: str) -> dict:
+    """Extract Subject/From/To/Date from raw HTML or plain text."""
+    metadata: dict[str, str] = {}
+    for field, patterns in _META_PATTERNS.items():
+        for pat in patterns:
+            m = pat.search(raw_html) or pat.search(text)
+            if m:
+                metadata[field] = m.group(1).strip()
+                break
+    return metadata
+
+
+def _split_thread(text: str) -> str:
+    """Return only the latest message in a threaded email."""
+    earliest_pos: int | None = None
+    for pat in _THREAD_PATTERNS:
+        m = pat.search(text)
+        if m and (earliest_pos is None or m.start() < earliest_pos):
+            earliest_pos = m.start()
+
+    if earliest_pos is not None and earliest_pos > 0:
+        return text[:earliest_pos].strip()
+    return text.strip()
+
+
+def preprocess_email_html(raw_content: str) -> PreprocessResult:
+    """Strip HTML, extract metadata, split thread from an email HTML file."""
+    try:
+        from bs4 import BeautifulSoup  # lazy import — optional dep
+    except ImportError as exc:
+        raise ImportError(
+            "beautifulsoup4 is required for email_html preprocessing. "
+            "Install it with: pip install beautifulsoup4"
+        ) from exc
+
+    # Parse with lxml if available, fall back to html.parser
+    try:
+        soup = BeautifulSoup(raw_content, "lxml")
+    except Exception:
+        soup = BeautifulSoup(raw_content, "html.parser")
+
+    # Remove noise tags
+    for tag in soup(["style", "script", "head", "noscript"]):
+        tag.decompose()
+
+    clean_text = soup.get_text(separator="\n")
+    # Collapse excessive blank lines
+    clean_text = re.sub(r"\n{3,}", "\n\n", clean_text).strip()
+
+    metadata = _extract_metadata(raw_content, clean_text)
+    latest_message = _split_thread(clean_text)
+
+    return PreprocessResult(
+        content_type="email_html",
+        clean_text=latest_message,
+        metadata=metadata,
+    )
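A minimal sketch of the thread-splitting rule used by `_split_thread()`, assuming only two of the four marker patterns and a hypothetical helper name (`latest_reply`). The text is cut at the earliest marker, but only when that marker is not at position 0, so a message that begins with a quote header is returned whole.

```python
import re

# Two of the marker families from _THREAD_PATTERNS, kept short for the sketch.
THREAD_MARKERS = [
    re.compile(r"^On\s+.+wrote\s*:", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^-{3,}\s*original message\s*-{3,}", re.IGNORECASE | re.MULTILINE),
]


def latest_reply(text: str) -> str:
    """Return the text before the earliest thread marker (hypothetical helper)."""
    positions = [m.start() for pat in THREAD_MARKERS if (m := pat.search(text))]
    cut = min(positions, default=None)
    if cut:  # None (no marker) or 0 (marker at the very start) keeps the whole text
        return text[:cut].strip()
    return text.strip()


sample = (
    "Sounds good, see you Friday.\n"
    "\n"
    "On Mon, Apr 7, 2026 at 10:00 AM, Alice <alice@co.com> wrote:\n"
    "> Can we meet this week?\n"
)

reply = latest_reply(sample)
forwarded = latest_reply("-----Original Message-----\nbody")
plain = latest_reply("hello")
```

Here `reply` holds only the first line of `sample`, while `forwarded` and `plain` come back unchanged apart from stripping.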
--- a/app/core/ws_context.py
+++ b/app/core/ws_context.py
@@ -0,0 +1,92 @@
+"""WebSocket client executor context.
+
+Holds a per-request async callback that tools call to execute CRUD
+operations on the Electron client's local SQLite / LanceDB databases.
+The callback sends a `tool_call` WS frame and awaits the `tool_result`.
+"""
+
+from __future__ import annotations
+
+from contextvars import ContextVar
+from typing import Any, Callable, Coroutine
+from uuid import uuid4
+
+# Holds the execute callback for the current WS session.
+# Set by the chat WS handler before the orchestrator runs; cleared after.
+_client_executor: ContextVar[Callable[[dict], Coroutine[Any, Any, dict]]] = ContextVar(
+    "_client_executor"
+)
+
+# Optional collector that captures raw execute_on_client results.
+# Set by _tool_loop / _tool_loop_stream to populate ChatAgent.tool_results.
+_tool_result_collector: ContextVar[list[dict] | None] = ContextVar(
+    "_tool_result_collector", default=None
+)
+
+
+def set_tool_result_collector(lst: list[dict]) -> None:
+    """Register *lst* as the collector for this async context."""
+    _tool_result_collector.set(lst)
+
+
+def clear_tool_result_collector() -> None:
+    """Clear the collector (best-effort)."""
+    _tool_result_collector.set(None)
+
+
+def set_client_executor(fn: Callable[[dict], Coroutine[Any, Any, dict]]) -> None:
+    """Bind *fn* as the executor for the current async context (task/coroutine)."""
+    _client_executor.set(fn)
+
+
+def clear_client_executor() -> None:
+    """Remove the executor binding (best-effort; ContextVar resets on task exit)."""
+    try:
+        _client_executor.set(None)  # type: ignore[arg-type]
+    except Exception:
+        pass
+
+
+async def execute_on_client(
+    action: str,
+    table: str | None = None,
+    data: dict[str, Any] | None = None,
+    filters: dict[str, Any] | None = None,
+    vector: list[float] | None = None,
+    limit: int | None = None,
+) -> dict[str, Any]:
+    """Send a CRUD/vector operation to the Electron client and return the result.
+
+    Builds a ``tool_call`` payload, invokes the per-session WS callback,
+    and returns the ``tool_result`` dict from Electron.
+
+    Raises ``RuntimeError`` if no executor is set (i.e. called outside a WS session).
+    """
+    callback = _client_executor.get(None)
+    if callback is None:
+        raise RuntimeError(
+            "execute_on_client() called outside a WebSocket session — "
+            "no client executor is set."
+        )
+
+    payload: dict[str, Any] = {"id": str(uuid4()), "action": action}
+    if table is not None:
+        payload["table"] = table
+    if data is not None:
+        payload["data"] = data
+    if filters is not None:
+        payload["filters"] = {k: v for k, v in filters.items() if v is not None}
+    if vector is not None:
+        payload["vector"] = vector
+    if limit is not None:
+        payload["limit"] = limit
+
+    result = await callback(payload)
+    collector = _tool_result_collector.get(None)
+    if collector is not None:
+        collector.append({
+            "action": action,
+            "table": table,
+            "data": result,
+        })
+    return result
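A minimal sketch of why a `ContextVar` suits the per-session executor in `ws_context.py`, assuming hypothetical names (`_executor`, `execute`, `session`) and an echoing callback in place of the real WebSocket round trip. Each asyncio task gets its own copy of the context, so two concurrent sessions do not see each other's callback, and code running outside any session sees none.

```python
import asyncio
from contextvars import ContextVar
from typing import Awaitable, Callable, Optional

Executor = Callable[[dict], Awaitable[dict]]

# Hypothetical counterpart of _client_executor, with an explicit None default.
_executor: ContextVar[Optional[Executor]] = ContextVar("_executor", default=None)


async def execute(action: str) -> dict:
    """Look up the executor bound in the current context and call it."""
    callback = _executor.get()
    if callback is None:
        raise RuntimeError("no client executor is set")
    return await callback({"action": action})


async def session(name: str) -> dict:
    async def send(payload: dict) -> dict:
        # Stand-in for the WS round trip: echo which session handled the call.
        return {"session": name, **payload}

    _executor.set(send)
    await asyncio.sleep(0)  # yield so the other session runs in between
    return await execute("list")


async def main():
    # gather() wraps each coroutine in a task with its own context copy.
    results = await asyncio.gather(session("a"), session("b"))
    try:
        await execute("list")
        outside = "ok"
    except RuntimeError:
        outside = "error"
    return results, outside


results, outside = asyncio.run(main())
```

The bindings made inside the two tasks never leak into `main()`, which is why the call made outside a session raises `RuntimeError` in this sketch.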
--- a/app/db.py
+++ b/app/db.py
@@ -24,7 +24,7 @@ from app.config.settings import settings
 engine = create_async_engine(
    settings.DATABASE_URL,
    pool_pre_ping=True,
-    echo=settings.ENV == "dev",
+    echo=False,
 )

 async_session = async_sessionmaker(engine, expire_on_commit=False)
--- a/app/integrations/__init__.py
+++ b/app/integrations/__init__.py
@@ -0,0 +1,164 @@
+"""Cloud provider integration utilities.
+
+Provides:
+  * Shared message dataclasses (``EmailMessage``, ``ChatMessage``) used by
+    both the Gmail and MS Graph clients and consumed by ``agent_runner``.
+  * ``get_provider()`` — factory that returns the correct client given a
+    provider name and decrypted OAuth credentials dict.
+  * ``encrypt_token()`` / ``decrypt_token()`` — Fernet-based at-rest
+    encryption for OAuth tokens stored in ``cloud_agent_configs``.
+
+Encryption rationale
+--------------------
+Unlike user content (which is E2E-encrypted client-side and **never**
+decrypted server-side), OAuth tokens *must* be decrypted server-side
+because the backend makes provider API calls on behalf of the user.
+The Fernet key lives solely in ``OAUTH_ENCRYPTION_KEY`` env var — it
+is never returned to clients.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import TYPE_CHECKING
+
+from cryptography.fernet import Fernet, InvalidToken
+
+from app.config.settings import settings
+
+if TYPE_CHECKING:
+    from app.integrations.gmail import GmailClient
+    from app.integrations.ms_graph import MSGraphClient
+
+logger = logging.getLogger(__name__)
+
+# ── Shared message types ──────────────────────────────────────────────────
+
+
+@dataclass
+class EmailMessage:
+    """A single email message fetched from Gmail or Outlook."""
+
+    id: str
+    subject: str
+    sender: str
+    body_text: str
+    date: datetime
+    labels: list[str] = field(default_factory=list)
+
+    @property
+    def as_text(self) -> str:
+        """Return a human-readable text representation for LLM extraction."""
+        date_str = self.date.strftime("%Y-%m-%d %H:%M")
+        labels_str = f" [{', '.join(self.labels)}]" if self.labels else ""
+        return (
+            f"From: {self.sender}\n"
+            f"Date: {date_str}{labels_str}\n"
+            f"Subject: {self.subject}\n\n"
+            f"{self.body_text}"
+        )
+
+
+@dataclass
+class ChatMessage:
+    """A single Teams chat or channel message fetched from MS Graph."""
+
+    id: str
+    content: str
+    sender: str
+    channel: str | None
+    date: datetime
+
+    @property
+    def as_text(self) -> str:
+        """Return a human-readable text representation for LLM extraction."""
+        date_str = self.date.strftime("%Y-%m-%d %H:%M")
+        channel_str = f" [channel: {self.channel}]" if self.channel else ""
+        return (
+            f"From: {self.sender}\n"
+            f"Date: {date_str}{channel_str}\n\n"
+            f"{self.content}"
+        )
+
+
+# ── Fernet helpers ────────────────────────────────────────────────────────
+
+
+def _get_fernet() -> Fernet:
+    """Return a ``Fernet`` instance using ``settings.OAUTH_ENCRYPTION_KEY``.
+
+    Raises ``RuntimeError`` if ``OAUTH_ENCRYPTION_KEY`` is not set — callers
+    must ensure this is configured before persisting OAuth tokens.
+    """
+    key = settings.OAUTH_ENCRYPTION_KEY
+    if not key:
+        raise RuntimeError(
+            "OAUTH_ENCRYPTION_KEY is not set. "
+            "Generate one with: python -c \"from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())\""
+        )
+    return Fernet(key.encode() if isinstance(key, str) else key)
+
+
+def encrypt_token(token_info: dict) -> str:
+    """Fernet-encrypt an OAuth credential dict and return a base64 string.
+
+    Stores the full ``{access_token, refresh_token, token_uri, client_id,
+    client_secret, scopes, expiry}`` dict (or equivalent MSAL shape).
+
+    Raises:
+        RuntimeError: OAUTH_ENCRYPTION_KEY is not configured.
+        ValueError: ``token_info`` is not a non-empty dict.
+    """
+    if not isinstance(token_info, dict) or not token_info:
+        raise ValueError("token_info must be a non-empty dict")
+    plaintext = json.dumps(token_info).encode("utf-8")
+    return _get_fernet().encrypt(plaintext).decode("utf-8")
+
+
+def decrypt_token(encrypted: str) -> dict:
+    """Decrypt a Fernet-encrypted token string and return the credential dict.
+
+    Raises:
+        RuntimeError: OAUTH_ENCRYPTION_KEY is not configured.
+        ValueError: The encrypted string is invalid or was encrypted with a
+            different key.
+    """
+    try:
+        plaintext = _get_fernet().decrypt(encrypted.encode("utf-8"))
+        return json.loads(plaintext)
+    except (InvalidToken, json.JSONDecodeError) as exc:
+        raise ValueError(f"Failed to decrypt OAuth token: {exc}") from exc
+
+
+# ── Provider factory ──────────────────────────────────────────────────────
+
+
+def get_provider(
+    provider: str,
+    credentials_info: dict,
+) -> "GmailClient | MSGraphClient":
+    """Return the correct provider client for *provider*.
+
+    Parameters
+    ----------
+    provider:
+        One of ``"gmail"``, ``"outlook"``, ``"teams"``.
+    credentials_info:
+        Decrypted OAuth credential dict (Google or Microsoft shape).
+
+    Raises:
+        ValueError: Unknown provider name.
+    """
+    if provider == "gmail":
+        from app.integrations.gmail import GmailClient
+        return GmailClient(credentials_info)
+    if provider in {"outlook", "teams"}:
+        from app.integrations.ms_graph import MSGraphClient
+        return MSGraphClient(credentials_info)
+    raise ValueError(
+        f"Unknown cloud provider {provider!r}. "
+        "Supported: 'gmail', 'outlook', 'teams'."
+    )
--- a/app/integrations/gmail.py
+++ b/app/integrations/gmail.py
@@ -0,0 +1,335 @@
+"""Gmail API client for cloud agent integration.
+
+Wraps the Google Gmail REST API to fetch email messages matching a
+``filter_config`` dict.  Uses the official ``google-api-python-client``
+library (synchronous) wrapped in ``asyncio.to_thread()`` to avoid
+blocking the event loop.
+
+Token refresh is handled transparently: when the stored access token has
+expired, ``google.auth.transport.requests.Request`` will use the refresh
+token to obtain a fresh one.  The caller is responsible for persisting
+any refreshed credentials back to ``CloudAgentConfig.oauth_token_encrypted``
+(see ``agent_runner.run_cloud_agent``).
+
+Credential dict shape (Google OAuth2):
+    {
+        "token": "<access_token>",
+        "refresh_token": "<refresh_token>",
+        "token_uri": "https://oauth2.googleapis.com/token",
+        "client_id": "<client_id>",
+        "client_secret": "<client_secret>",
+        "scopes": ["https://www.googleapis.com/auth/gmail.readonly"],
+        "expiry": "2025-01-01T00:00:00Z"  # optional ISO-8601
+    }
+"""
+
+from __future__ import annotations
+
+import asyncio
+import base64
+import email
+import html
+import logging
+import re
+from datetime import datetime, timezone
+from typing import Any
+
+from app.integrations import EmailMessage
+
+logger = logging.getLogger(__name__)
+
+# Gmail search date format — e.g. "after:2025/01/01"
+_GMAIL_DATE_FMT = "%Y/%m/%d"
+
+# Maximum characters of body text forwarded to the LLM.
+_BODY_TRUNCATE = 8_000
+
+# Maximum messages retrieved per run (prevents runaway quota usage).
+_MAX_MESSAGES = 200
+
+
+def _build_gmail_query(
+    filter_config: dict[str, Any] | None,
+    since: datetime | None,
+) -> str:
+    """Build a Gmail search query string from *filter_config* and *since*.
+
+    Supported ``filter_config`` keys:
+        labels (list[str]):  Gmail label names, e.g. ``["INBOX", "work"]``
+        senders (list[str]): Sender addresses or domains to include
+        date_range (dict):   ``{from: "<YYYY-MM-DD>", to: "<YYYY-MM-DD>"}``
+
+    The effective lower bound is the later of ``since`` (from the last run)
+    and ``date_range.from``.
+    """
+    parts: list[str] = []
+    cfg = filter_config or {}
+
+    # Labels — joined with OR when multiple given.
+    labels: list[str] = cfg.get("labels", [])
+    if labels:
+        if len(labels) == 1:
+            parts.append(f"label:{labels[0]}")
+        else:
+            label_expr = " OR ".join(f"label:{lbl}" for lbl in labels)
+            parts.append(f"({label_expr})")
+
+    # Senders: OR-joined, since space-separated Gmail terms are ANDed.
+    senders: list[str] = cfg.get("senders", [])
+    if senders:
+        parts.append("(" + " OR ".join(f"from:{s}" for s in senders) + ")")
+
+    # Date range.
+    date_range: dict = cfg.get("date_range", {})
+    from_str: str | None = date_range.get("from")
+    to_str: str | None = date_range.get("to")
+
+    # Determine effective "from" date: most recent of filter_config.date_range.from and since.
+    effective_since: datetime | None = since
+    if from_str:
+        try:
+            cfg_since = datetime.fromisoformat(from_str.replace("Z", "+00:00"))
+            if cfg_since.tzinfo is None:
+                cfg_since = cfg_since.replace(tzinfo=timezone.utc)
+            if effective_since is None or cfg_since > effective_since:
+                effective_since = cfg_since
+        except ValueError:
+            logger.warning("gmail: invalid date_range.from %r — ignoring", from_str)
+
+    if effective_since:
+        parts.append(f"after:{effective_since.strftime(_GMAIL_DATE_FMT)}")
+
+    if to_str:
+        try:
+            to_dt = datetime.fromisoformat(to_str.replace("Z", "+00:00"))
+            parts.append(f"before:{to_dt.strftime(_GMAIL_DATE_FMT)}")
+        except ValueError:
+            logger.warning("gmail: invalid date_range.to %r — ignoring", to_str)
+
+    return " ".join(parts)
+
+
+def _strip_html(raw_html: str) -> str:
+    """Remove HTML tags and decode entities to get plain text."""
+    no_tags = re.sub(r"<[^>]+>", " ", raw_html)
+    decoded = html.unescape(no_tags)
+    return re.sub(r"\s+", " ", decoded).strip()
+
+
+def _parse_body(payload: dict[str, Any]) -> str:
+    """Recursively extract the plain-text body from a Gmail message payload.
+
+    Prefers ``text/plain``; falls back to ``text/html`` (stripped of tags).
+    Returns an empty string if no body can be extracted.
+    """
+    mime_type: str = payload.get("mimeType", "")
+    body: dict = payload.get("body", {})
+    parts: list[dict] = payload.get("parts", [])
+
+    if mime_type == "text/plain":
+        data = body.get("data", "")
+        if data:
+            return base64.urlsafe_b64decode(data + "==").decode("utf-8", errors="replace")
+        return ""
+
+    if mime_type == "text/html":
+        data = body.get("data", "")
+        if data:
+            raw = base64.urlsafe_b64decode(data + "==").decode("utf-8", errors="replace")
+            return _strip_html(raw)
+        return ""
+
+    # Multipart: prefer a text/plain part, fall back to the first text/html part.
+    html_fallback = ""
+    for part in parts:
+        part_mime = part.get("mimeType", "")
+        if part_mime == "text/plain":
+            return _parse_body(part)
+        if part_mime == "text/html" and not html_fallback:
+            html_fallback = _parse_body(part)
+        if part_mime.startswith("multipart/"):
+            nested = _parse_body(part)
+            if nested:
+                return nested
+    return html_fallback
+
+
+def _parse_date(raw: str) -> datetime:
+    """Parse an RFC 2822 email date header into a UTC ``datetime``."""
+    try:
+        parsed = email.utils.parsedate_to_datetime(raw)
+        if parsed.tzinfo is None:
+            parsed = parsed.replace(tzinfo=timezone.utc)
+        return parsed.astimezone(timezone.utc)
+    except Exception:
+        return datetime.now(timezone.utc)
+
+
+class GmailClient:
+    """Fetch email messages from a Gmail account via the Gmail REST API.
+
+    Parameters
+    ----------
+    credentials_info:
+        Decrypted OAuth2 credential dict.  Must contain at minimum
+        ``token`` (access token) or ``refresh_token`` + ``token_uri`` +
+        ``client_id`` + ``client_secret``.
+    """
+
+    def __init__(self, credentials_info: dict[str, Any]) -> None:
+        from google.oauth2.credentials import Credentials
+
+        self._credentials_info = credentials_info
+        expiry_str: str | None = credentials_info.get("expiry")
+        expiry: datetime | None = None
+        if expiry_str:
+            try:
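+                parsed = datetime.fromisoformat(expiry_str.replace("Z", "+00:00"))
+                # google-auth compares ``expiry`` against a naive UTC datetime.
+                expiry = parsed.astimezone(timezone.utc).replace(tzinfo=None) if parsed.tzinfo else parsed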
+            except ValueError:
+                pass
+
+        self._credentials = Credentials(
+            token=credentials_info.get("token"),
+            refresh_token=credentials_info.get("refresh_token"),
+            token_uri=credentials_info.get("token_uri", "https://oauth2.googleapis.com/token"),
+            client_id=credentials_info.get("client_id"),
+            client_secret=credentials_info.get("client_secret"),
+            scopes=credentials_info.get("scopes"),
+            expiry=expiry,
+        )
+
+    # ── Public API ─────────────────────────────────────────────────────────
+
+    async def fetch_messages(
+        self,
+        filter_config: dict[str, Any] | None = None,
+        since: datetime | None = None,
+    ) -> list[EmailMessage]:
+        """Return up to ``_MAX_MESSAGES`` emails matching *filter_config*.
+
+        Runs the synchronous Google API calls inside ``asyncio.to_thread()``
+        to avoid blocking the async event loop.
+
+        Token refresh is performed automatically when the access token has
+        expired.  After the call, ``self.refreshed_credentials`` may be
+        consulted to detect whether new credentials should be persisted.
+        """
+        query = _build_gmail_query(filter_config, since)
+        logger.debug("gmail: executing search query %r", query)
+        return await asyncio.to_thread(self._fetch_sync, query)
+
+    @property
+    def refreshed_credentials(self) -> dict[str, Any] | None:
+        """Return updated credential dict if the access token was refreshed.
+
+        If the credentials were refreshed during ``fetch_messages()``, returns
+        a new dict that should be re-encrypted and written back to the DB.
+        Returns ``None`` if no refresh occurred.
+        """
+        creds = self._credentials
+        if not creds.valid and creds.expired:
+            return None
+        # Check whether the token changed from what was stored.
+        if creds.token != self._credentials_info.get("token"):
+            result = {
+                "token": creds.token,
+                "refresh_token": creds.refresh_token,
+                "token_uri": creds.token_uri,
+                "client_id": creds.client_id,
+                "client_secret": creds.client_secret,
+                "scopes": list(creds.scopes or []),
+            }
+            if creds.expiry:
+                result["expiry"] = creds.expiry.isoformat()
+            return result
+        return None
+
+    # ── Internal sync worker ───────────────────────────────────────────────
+
+    def _fetch_sync(self, query: str) -> list[EmailMessage]:
+        """Synchronous worker — called inside ``asyncio.to_thread()``."""
+        import googleapiclient.discovery
+        import googleapiclient.errors
+        from google.auth.transport.requests import Request
+
+        # Refresh token if needed before building the service.
+        if self._credentials.expired and self._credentials.refresh_token:
+            try:
+                self._credentials.refresh(Request())
+            except Exception as exc:
+                raise RuntimeError(f"Gmail token refresh failed: {exc}") from exc
+
+        service = googleapiclient.discovery.build(
+            "gmail", "v1", credentials=self._credentials, cache_discovery=False
+        )
+        user_api = service.users()  # type: ignore[attr-defined]
+
+        # ── List matching message IDs ──────────────────────────────────────
+        ids: list[str] = []
+        page_token: str | None = None
+        while len(ids) < _MAX_MESSAGES:
+            batch_size = min(100, _MAX_MESSAGES - len(ids))
+            kwargs: dict[str, Any] = {
+                "userId": "me",
+                "maxResults": batch_size,
+            }
+            if query:
+                kwargs["q"] = query
+            if page_token:
+                kwargs["pageToken"] = page_token
+
+            try:
+                resp = user_api.messages().list(**kwargs).execute()
+            except googleapiclient.errors.HttpError as exc:
+                raise RuntimeError(f"Gmail messages.list failed: {exc}") from exc
+
+            for msg in resp.get("messages", []):
+                ids.append(msg["id"])
+
+            page_token = resp.get("nextPageToken")
+            if not page_token:
+                break
+
+        if not ids:
+            logger.debug("gmail: no messages matched query %r", query)
+            return []
+
+        logger.info("gmail: fetching %d message(s)", len(ids))
+
+        # ── Fetch individual message details ──────────────────────────────
+        messages: list[EmailMessage] = []
+        for msg_id in ids:
+            try:
+                msg = user_api.messages().get(
+                    userId="me", id=msg_id, format="full"
+                ).execute()
+
+                headers: dict[str, str] = {
+                    h["name"].lower(): h["value"]
+                    for h in msg.get("payload", {}).get("headers", [])
+                }
+                subject = headers.get("subject", "(no subject)")
+                sender = headers.get("from", "unknown")
+                date_raw = headers.get("date", "")
+                date = _parse_date(date_raw) if date_raw else datetime.now(timezone.utc)
+
+                body_text = _parse_body(msg.get("payload", {}))[:_BODY_TRUNCATE]
+                labels = msg.get("labelIds", [])
+
+                messages.append(EmailMessage(
+                    id=msg_id,
+                    subject=subject,
+                    sender=sender,
+                    body_text=body_text,
+                    date=date,
+                    labels=labels,
+                ))
+            except googleapiclient.errors.HttpError as exc:
+                logger.warning("gmail: skipping message %s — HTTP error: %s", msg_id, exc)
+            except Exception as exc:
+                logger.warning("gmail: skipping message %s — unexpected error: %s", msg_id, exc)
+
+        logger.info("gmail: returned %d message(s)", len(messages))
+        return messages
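Reviewer note: a minimal standalone sketch of the date-bound rule that `_build_gmail_query` applies, where the later of `since` and `date_range.from` becomes the `after:` term. The names `effective_lower_bound` and `DATE_FMT` are hypothetical and exist only for illustration, and the `%Y/%m/%d` format is an assumption, since `_GMAIL_DATE_FMT` is defined outside this hunk.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Assumed format; the real _GMAIL_DATE_FMT is not visible in this hunk.
DATE_FMT = "%Y/%m/%d"


def effective_lower_bound(since: datetime | None, from_str: str | None) -> datetime | None:
    """Return the later of `since` and the parsed `from_str` (hypothetical helper)."""
    bound = since
    if from_str:
        try:
            cfg_since = datetime.fromisoformat(from_str.replace("Z", "+00:00"))
        except ValueError:
            # An unparseable date is ignored, mirroring the warning path in the diff.
            return bound
        if cfg_since.tzinfo is None:
            # Naive dates are treated as UTC so they can be compared with `since`.
            cfg_since = cfg_since.replace(tzinfo=timezone.utc)
        if bound is None or cfg_since > bound:
            bound = cfg_since
    return bound


last_run = datetime(2026, 4, 1, tzinfo=timezone.utc)
bound = effective_lower_bound(last_run, "2026-03-15")
print(f"after:{bound.strftime(DATE_FMT)}")  # prints "after:2026/04/01"
```

Here the configured `date_range.from` is older than the last run, so the last-run date wins. A configured date newer than the last run would replace it.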
--- a/app/integrations/ms_graph.py
+++ b/app/integrations/ms_graph.py
@@ -0,0 +1,352 @@
+"""Microsoft Graph API client for Outlook and Teams cloud agent integration.
+
+Handles two data sources:
+
+* **Outlook email** (``provider="outlook"``) — ``fetch_emails()`` calls
+  ``/me/messages`` with an OData ``$filter`` built from ``filter_config``.
+* **Teams messages** (``provider="teams"``) — ``fetch_messages()`` calls
+  ``/me/chats/getAllMessages`` filtered by date.
+
+Authentication uses MSAL ``ConfidentialClientApplication`` to acquire a token
+from a stored refresh token.  The ``httpx.AsyncClient`` (already a project
+dependency) is used for all API calls.
+
+Credential dict shape (Microsoft OAuth2 / MSAL):
+    {
+        "access_token":  "<access_token>",
+        "refresh_token": "<refresh_token>",
+        "token_type":    "Bearer",
+        "scope":         "Mail.Read ChannelMessage.Read.All offline_access",
+        "expires_in":    3600
+    }
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+from datetime import datetime, timedelta, timezone
+from typing import Any
+
+import httpx
+
+from app.config.settings import settings
+from app.integrations import ChatMessage, EmailMessage
+
+logger = logging.getLogger(__name__)
+
+_GRAPH_BASE = "https://graph.microsoft.com/v1.0"
+
+# Max items fetched per run.
+_MAX_EMAILS = 200
+_MAX_MESSAGES = 200
+
+# Max characters of body forwarded to the LLM.
+_BODY_TRUNCATE = 8_000
+
+
+def _strip_html(raw: str) -> str:
+    """Strip HTML tags and collapse whitespace."""
+    no_tags = re.sub(r"<[^>]+>", " ", raw)
+    import html as _html
+    decoded = _html.unescape(no_tags)
+    return re.sub(r"\s+", " ", decoded).strip()
+
+
+def _odata_datetime(dt: datetime) -> str:
+    """Format a datetime as an OData datetime literal (UTC, ISO 8601)."""
+    utc = dt.astimezone(timezone.utc)
+    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
+
+
+def _build_email_filter(
+    filter_config: dict[str, Any] | None,
+    since: datetime | None,
+) -> str:
+    """Build an OData ``$filter`` expression for the ``/me/messages`` endpoint.
+
+    Supported ``filter_config`` keys:
+        senders (list[str]):  Sender email addresses.
+        date_range (dict):    ``{from: "<ISO-8601>", to: "<ISO-8601>"}``
+        folders (list[str]):  Folder display names (not directly filterable
+                              via OData, so ignored here — callers iterate
+                              folder IDs separately if needed; listed for
+                              completeness).
+
+    The effective lower bound is the later of ``since`` and
+    ``date_range.from``; an earlier ``date_range.from`` is ignored.
+    """
+    clauses: list[str] = []
+    cfg = filter_config or {}
+
+    # Senders (embedded single quotes are doubled, as OData string literals require).
+    senders: list[str] = cfg.get("senders", [])
+    if senders:
+        sender_clauses = ["from/emailAddress/address eq '%s'" % s.replace("'", "''") for s in senders]
+        clauses.append("(" + " or ".join(sender_clauses) + ")")
+
+    # Date range.
+    date_range: dict = cfg.get("date_range", {})
+    from_str: str | None = date_range.get("from")
+
+    effective_since: datetime | None = since
+    if from_str:
+        try:
+            cfg_since = datetime.fromisoformat(from_str.replace("Z", "+00:00"))
+            if cfg_since.tzinfo is None:
+                cfg_since = cfg_since.replace(tzinfo=timezone.utc)
+            if effective_since is None or cfg_since > effective_since:
+                effective_since = cfg_since
+        except ValueError:
+            logger.warning("ms_graph: invalid date_range.from %r — ignoring", from_str)
+
+    if effective_since:
+        clauses.append(f"receivedDateTime ge {_odata_datetime(effective_since)}")
+
+    to_str: str | None = date_range.get("to")
+    if to_str:
+        try:
+            to_dt = datetime.fromisoformat(to_str.replace("Z", "+00:00"))
+            if to_dt.tzinfo is None:
+                to_dt = to_dt.replace(tzinfo=timezone.utc)
+            clauses.append(f"receivedDateTime le {_odata_datetime(to_dt)}")
+        except ValueError:
+            logger.warning("ms_graph: invalid date_range.to %r — ignoring", to_str)
+
+    return " and ".join(clauses)
+
+
+class MSGraphClient:
+    """Fetch emails and Teams messages via the Microsoft Graph REST API.
+
+    Parameters
+    ----------
+    credentials_info:
+        Decrypted MSAL credential dict.
+    """
+
+    def __init__(self, credentials_info: dict[str, Any]) -> None:
+        self._credentials_info = credentials_info
+        self._access_token: str = credentials_info.get("access_token", "")
+        self._original_access_token: str = self._access_token
+        self._refresh_token: str | None = credentials_info.get("refresh_token")
+
+    # ── Token management ───────────────────────────────────────────────────
+
+    def _auth_headers(self) -> dict[str, str]:
+        return {"Authorization": f"Bearer {self._access_token}"}
+
+    async def _refresh_access_token(self) -> None:
+        """Use MSAL to exchange the refresh token for a fresh access token.
+
+        Updates ``self._access_token`` and ``self._credentials_info`` in-place.
+
+        Raises:
+            RuntimeError: MSAL reports an auth error.
+        """
+        import msal
+
+        app = msal.ConfidentialClientApplication(
+            client_id=settings.MS_CLIENT_ID,
+            client_credential=settings.MS_CLIENT_SECRET,
+            authority=f"https://login.microsoftonline.com/{settings.MS_TENANT_ID}",
+        )
+        scopes: list[str] = self._credentials_info.get("scope", "").split()
+        if not scopes:
+            scopes = ["https://graph.microsoft.com/.default"]
+
+        result = app.acquire_token_by_refresh_token(
+            self._refresh_token,
+            scopes=scopes,
+        )
+        if "access_token" not in result:
+            error = result.get("error_description", result.get("error", "unknown"))
+            raise RuntimeError(f"MS Graph token refresh failed: {error}")
+
+        self._access_token = result["access_token"]
+        # MSAL may issue a new refresh token.
+        if "refresh_token" in result:
+            self._refresh_token = result["refresh_token"]
+            self._credentials_info["refresh_token"] = result["refresh_token"]
+        self._credentials_info["access_token"] = self._access_token
+
+    @property
+    def refreshed_credentials(self) -> dict[str, Any] | None:
+        """Return updated credential dict if the access token was refreshed.
+
+        Returns ``None`` if no change was made.
+        """
+        if self._access_token != self._original_access_token:
+            return {**self._credentials_info, "access_token": self._access_token}
+        return None
+
+    # ── HTTP helpers ───────────────────────────────────────────────────────
+
+    async def _get(
+        self,
+        client: httpx.AsyncClient,
+        url: str,
+        params: dict[str, Any] | None = None,
+        *,
+        retry_on_401: bool = True,
+    ) -> dict[str, Any]:
+        """GET *url* with auth; refresh token on 401 and retry once."""
+        resp = await client.get(url, params=params, headers=self._auth_headers())
+        if resp.status_code == 401 and retry_on_401 and self._refresh_token:
+            logger.debug("ms_graph: 401 on %s — refreshing token", url)
+            await self._refresh_access_token()
+            resp = await client.get(url, params=params, headers=self._auth_headers())
+        if resp.status_code == 429:
+            raise RuntimeError("MS Graph rate limit hit (429). Try again later.")
+        resp.raise_for_status()
+        return resp.json()
+
+    # ── Public API ─────────────────────────────────────────────────────────
+
+    async def fetch_emails(
+        self,
+        filter_config: dict[str, Any] | None = None,
+        since: datetime | None = None,
+    ) -> list[EmailMessage]:
+        """Return up to ``_MAX_EMAILS`` Outlook messages matching *filter_config*.
+
+        Parameters
+        ----------
+        filter_config:
+            Optional dict with ``senders``, ``date_range``, ``folders`` keys.
+        since:
+            Hard lower-bound on email date (from last agent run).
+        """
+        odata_filter = _build_email_filter(filter_config, since)
+        params: dict[str, Any] = {
+            "$top": 50,
+            "$select": "id,subject,from,receivedDateTime,body,bodyPreview",
+            "$orderby": "receivedDateTime desc",
+        }
+        if odata_filter:
+            params["$filter"] = odata_filter
+
+        emails: list[EmailMessage] = []
+        url = f"{_GRAPH_BASE}/me/messages"
+
+        async with httpx.AsyncClient(timeout=30.0) as client:
+            while url and len(emails) < _MAX_EMAILS:
+                data = await self._get(client, url, params if url.startswith(_GRAPH_BASE) else None)
+                for item in data.get("value", []):
+                    emails.append(self._parse_email(item))
+                    if len(emails) >= _MAX_EMAILS:
+                        break
+                url = data.get("@odata.nextLink", "")
+                params = {}  # nextLink already contains encoded params.
+
+        logger.info("ms_graph: fetched %d Outlook email(s)", len(emails))
+        return emails
+
+    async def fetch_messages(
+        self,
+        filter_config: dict[str, Any] | None = None,
+        since: datetime | None = None,
+    ) -> list[ChatMessage]:
+        """Return up to ``_MAX_MESSAGES`` Teams messages matching *filter_config*.
+
+        Fetches from ``/me/chats/getAllMessages`` (personal + group chats).
+        ``filter_config.channels`` values are matched post-fetch as case-insensitive
+        substrings of the channel ID (``getAllMessages`` has no channel ``$filter``);
+        messages without a channel identity always pass.
+        """
+        cfg = filter_config or {}
+        channel_filter: list[str] = [c.lower() for c in cfg.get("channels", [])]
+        params: dict[str, Any] = {"$top": 50}
+        if since:
+            params["$filter"] = f"createdDateTime ge {_odata_datetime(since)}"
+
+        messages: list[ChatMessage] = []
+        url = f"{_GRAPH_BASE}/me/chats/getAllMessages"
+
+        async with httpx.AsyncClient(timeout=30.0) as client:
+            while url and len(messages) < _MAX_MESSAGES:
+                try:
+                    data = await self._get(client, url, params if url.startswith(_GRAPH_BASE) else None)
+                except httpx.HTTPStatusError as exc:
+                    # getAllMessages requires specific licensing; degrade gracefully.
+                    if exc.response.status_code in (403, 404):
+                        logger.warning(
+                            "ms_graph: /me/chats/getAllMessages not available (%d) — "
+                            "check Teams license or permissions",
+                            exc.response.status_code,
+                        )
+                        break
+                    raise
+
+                for item in data.get("value", []):
+                    msg = self._parse_teams_message(item)
+                    if channel_filter and msg.channel:
+                        if not any(c in msg.channel.lower() for c in channel_filter):
+                            continue
+                    messages.append(msg)
+                    if len(messages) >= _MAX_MESSAGES:
+                        break
+                url = data.get("@odata.nextLink", "")
+                params = {}
+
+        logger.info("ms_graph: fetched %d Teams message(s)", len(messages))
+        return messages
+
+    # ── Parsers ────────────────────────────────────────────────────────────
+
+    @staticmethod
+    def _parse_email(item: dict[str, Any]) -> EmailMessage:
+        subject: str = item.get("subject", "(no subject)") or "(no subject)"
+        sender_block = item.get("from", {}) or {}
+        sender_addr = (
+            (sender_block.get("emailAddress") or {}).get("address", "unknown")
+        )
+        date_str: str = item.get("receivedDateTime", "")
+        try:
+            date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
+        except Exception:
+            date = datetime.now(timezone.utc)
+
+        body_block = item.get("body", {}) or {}
+        content_type: str = body_block.get("contentType", "text")
+        raw_body: str = body_block.get("content", "")
+        if content_type == "html":
+            body_text = _strip_html(raw_body)
+        else:
+            body_text = raw_body or item.get("bodyPreview", "")
+        body_text = body_text[:_BODY_TRUNCATE]
+
+        return EmailMessage(
+            id=item.get("id", ""),
+            subject=subject,
+            sender=sender_addr,
+            body_text=body_text,
+            date=date,
+        )
+
+    @staticmethod
+    def _parse_teams_message(item: dict[str, Any]) -> ChatMessage:
+        msg_id: str = item.get("id", "")
+        sender_block = (item.get("from") or {}).get("user") or {}
+        sender: str = sender_block.get("displayName", "unknown")
+        channel: str | None = (item.get("channelIdentity") or {}).get("channelId")
+
+        date_str: str = item.get("createdDateTime", "")
+        try:
+            date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
+        except Exception:
+            date = datetime.now(timezone.utc)
+
+        body_block = item.get("body", {}) or {}
+        content_type: str = body_block.get("contentType", "text")
+        raw_content: str = body_block.get("content", "")
+        content = _strip_html(raw_content) if content_type == "html" else raw_content
+        content = content[:_BODY_TRUNCATE]
+
+        return ChatMessage(
+            id=msg_id,
+            content=content,
+            sender=sender,
+            channel=channel,
+            date=date,
+        )
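Reviewer note: a small standalone sketch of the kind of `$filter` string that `_build_email_filter` assembles for `/me/messages`. It is not the module's code. `odata_quote` and `build_filter` are hypothetical helpers, and the sketch doubles single quotes inside string literals, which OData requires for values such as `o'neil@example.com`.

```python
from __future__ import annotations

from datetime import datetime, timezone


def odata_datetime(dt: datetime) -> str:
    """Format an aware datetime as a UTC OData literal."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def odata_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_filter(senders: list[str], since: datetime | None) -> str:
    """Join sender and date clauses with `and` (hypothetical helper)."""
    clauses: list[str] = []
    if senders:
        sender_clauses = [
            f"from/emailAddress/address eq {odata_quote(s)}" for s in senders
        ]
        clauses.append("(" + " or ".join(sender_clauses) + ")")
    if since:
        clauses.append(f"receivedDateTime ge {odata_datetime(since)}")
    return " and ".join(clauses)


print(build_filter(["o'neil@example.com"], datetime(2026, 4, 1, tzinfo=timezone.utc)))
```

With no senders and no `since`, the helper returns an empty string, which matches the diff's behavior of omitting `$filter` entirely in that case.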
--- a/app/main.py
+++ b/app/main.py
@@ -1,8 +1,16 @@
 from contextlib import asynccontextmanager
+import logging

 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware

+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
+)
+logging.getLogger("sqlalchemy.engine").setLevel(logging.WARNING)
+logging.getLogger("sqlalchemy.pool").setLevel(logging.WARNING)
+
 from app.api.middleware.rate_limit import TierRateLimitMiddleware
 from app.api.middleware.sanitizer import SanitizerMiddleware
 from app.config.settings import settings
@@ -10,9 +18,8 @@ from app.config.settings import settings

@asynccontextmanager
 async def lifespan(app: FastAPI):
-    # Startup: initialise DB connection pool and agent registry
-    from app.core.agent_registry import registry  # noqa: F401 — triggers module load
-    import app.agents  # noqa: F401 — triggers @registry.register decorators
+    # Startup: ensure agent tool modules are loaded.
+    import app.agents  # noqa: F401

    yield

@@ -43,16 +50,17 @@ def create_app() -> FastAPI:
    app.add_middleware(SanitizerMiddleware)
    app.add_middleware(TierRateLimitMiddleware)

-    from app.api.routes import auth, backup, billing, chat, plans, plugins, storage, vectors
+    from app.api.routes import agents, auth, backup, billing, chat, device_ws, plugins, storage, vectors

-    app.include_router(auth.router,     prefix="/api/v1")
-    app.include_router(chat.router,     prefix="/api/v1")
-    app.include_router(plans.router,    prefix="/api/v1")
-    app.include_router(storage.router,  prefix="/api/v1")
-    app.include_router(vectors.router,  prefix="/api/v1")
-    app.include_router(backup.router,   prefix="/api/v1")
-    app.include_router(plugins.router,  prefix="/api/v1")
-    app.include_router(billing.router,  prefix="/api/v1")
+    app.include_router(auth.router,       prefix="/api/v1")
+    app.include_router(chat.router,       prefix="/api/v1")
+    app.include_router(storage.router,    prefix="/api/v1")
+    app.include_router(vectors.router,    prefix="/api/v1")
+    app.include_router(backup.router,     prefix="/api/v1")
+    app.include_router(plugins.router,    prefix="/api/v1")
+    app.include_router(billing.router,    prefix="/api/v1")
+    app.include_router(agents.router,     prefix="/api/v1")
+    app.include_router(device_ws.router,  prefix="/api/v1")

    @app.get("/api/v1/health", tags=["health"])
    async def health() -> dict:
--- a/app/marketplace/plugin_review.py
+++ b/app/marketplace/plugin_review.py
@@ -29,8 +29,8 @@ ALLOWED_PERMISSIONS: frozenset[str] = frozenset(
        "write:projects",
        "read:notes",
        "write:notes",
-        "read:checkpoints",
-        "write:checkpoints",
+        "read:timelines",
+        "write:timelines",
        "read:calendar",
        "write:calendar",
    }
--- a/app/models.py
+++ b/app/models.py
@@ -14,6 +14,10 @@ Table inventory:
  plugin_installations — per-user install records
  plugin_reviews      — admin review decisions
  revenue_events      — Stripe Connect 70/30 split ledger
+  memory_core         — per-user persistent key/value preferences (encrypted)
+  memory_associative  — per-user semantic memory with embeddings (encrypted)
+  memory_episodic     — per-user session summaries (encrypted)
+  memory_proactive    — per-user behavioral patterns (encrypted)
 """

 from __future__ import annotations
@@ -23,11 +27,13 @@ from datetime import datetime, timezone

 from sqlalchemy import (
    BigInteger,
+    Boolean,
    DateTime,
    Enum,
    Float,
    ForeignKey,
    Integer,
+    JSON,
    String,
    Text,
    UniqueConstraint,
@@ -54,6 +60,9 @@ def _now() -> datetime:
 TierEnum = Enum("free", "pro", "power", "team", name="billing_tier")
 PluginStatusEnum = Enum("pending_review", "approved", "rejected", name="plugin_status")
 ReviewDecisionEnum = Enum("approved", "rejected", name="review_decision")
+AgentTypeEnum = Enum("local", "cloud", name="agent_type")
+AgentStatusEnum = Enum("running", "success", "error", "partial", name="agent_run_status")
+CloudProviderEnum = Enum("gmail", "teams", "outlook", name="cloud_provider")


 # ── Models ────────────────────────────────────────────────────────────────
@@ -66,9 +75,14 @@ class User(Base):
        Uuid(as_uuid=False), primary_key=True, default=_uuid
    )
    email: Mapped[str] = mapped_column(String(255), unique=True, nullable=False, index=True)
+    name: Mapped[str | None] = mapped_column(String(100), nullable=True)
+    surname: Mapped[str | None] = mapped_column(String(100), nullable=True)
    password_hash: Mapped[str] = mapped_column(String(255), nullable=False)
    tier: Mapped[str] = mapped_column(TierEnum, nullable=False, default="free")
    stripe_customer_id: Mapped[str | None] = mapped_column(String(255), nullable=True)
+    # Per-user Fernet key (base64-urlsafe, 44 chars). Generated on registration.
+    # Used to encrypt/decrypt all memory rows for this user.
+    encryption_key: Mapped[str | None] = mapped_column(String(64), nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), nullable=False, server_default=func.now()
    )
@@ -266,3 +280,198 @@ class RevenueEvent(Base):
    )

    plugin: Mapped[Plugin] = relationship(back_populates="revenue_events")
+
+
+class LocalAgentConfig(Base):
+    __tablename__ = "local_agent_configs"
+
+    id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), primary_key=True, default=_uuid
+    )
+    user_id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True
+    )
+    device_id: Mapped[str] = mapped_column(String(255), nullable=False)
+    name: Mapped[str] = mapped_column(String(255), nullable=False)
+    directory_paths: Mapped[list] = mapped_column(JSON, nullable=False, default=list)
+    data_types: Mapped[list] = mapped_column(JSON, nullable=False, default=list)
+    prompt_template: Mapped[str] = mapped_column(Text, nullable=False, default="")
+    agent_config: Mapped[dict | None] = mapped_column(JSON, nullable=True)
+    file_extensions: Mapped[list] = mapped_column(JSON, nullable=False, default=list)
+    schedule_cron: Mapped[str] = mapped_column(String(100), nullable=False, default="0 */6 * * *")
+    enabled: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)
+    last_run_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, server_default=func.now()
+    )
+    updated_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, server_default=func.now(), onupdate=func.now()
+    )
+
+    run_logs: Mapped[list[AgentRunLog]] = relationship(
+        back_populates="local_agent",
+        primaryjoin="and_(AgentRunLog.agent_id == LocalAgentConfig.id, AgentRunLog.agent_type == 'local')",
+        foreign_keys="AgentRunLog.agent_id",
+        cascade="all, delete-orphan",
+        overlaps="run_logs,cloud_agent",
+    )
+
+
+class CloudAgentConfig(Base):
+    __tablename__ = "cloud_agent_configs"
+
+    id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), primary_key=True, default=_uuid
+    )
+    user_id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True
+    )
+    provider: Mapped[str] = mapped_column(CloudProviderEnum, nullable=False)
+    name: Mapped[str] = mapped_column(String(255), nullable=False)
+    data_types: Mapped[list] = mapped_column(JSON, nullable=False, default=list)
+    prompt_template: Mapped[str] = mapped_column(Text, nullable=False, default="")
+    oauth_token_encrypted: Mapped[str | None] = mapped_column(Text, nullable=True)
+    filter_config: Mapped[dict | None] = mapped_column(JSON, nullable=True)
+    schedule_cron: Mapped[str] = mapped_column(String(100), nullable=False, default="0 */6 * * *")
+    enabled: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)
+    last_run_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, server_default=func.now()
+    )
+    updated_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, server_default=func.now(), onupdate=func.now()
+    )
+
+    run_logs: Mapped[list[AgentRunLog]] = relationship(
+        back_populates="cloud_agent",
+        primaryjoin="and_(AgentRunLog.agent_id == CloudAgentConfig.id, AgentRunLog.agent_type == 'cloud')",
+        foreign_keys="AgentRunLog.agent_id",
+        cascade="all, delete-orphan",
+        overlaps="run_logs,local_agent",
+    )
+
+
+class AgentRunLog(Base):
+    __tablename__ = "agent_run_logs"
+
+    id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), primary_key=True, default=_uuid
+    )
+    # Plain string — not a FK because it references either local_agent_configs or cloud_agent_configs
+    # depending on agent_type. Query by (agent_id, agent_type) to locate the source config.
+    agent_id: Mapped[str] = mapped_column(String(255), nullable=False, index=True)
+    agent_type: Mapped[str] = mapped_column(AgentTypeEnum, nullable=False)
+    user_id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True
+    )
+    status: Mapped[str] = mapped_column(AgentStatusEnum, nullable=False, default="running")
+    items_processed: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
+    items_created: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
+    errors: Mapped[list | None] = mapped_column(JSON, nullable=True)
+    started_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, server_default=func.now()
+    )
+    completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+
+    local_agent: Mapped[LocalAgentConfig | None] = relationship(
+        back_populates="run_logs",
+        primaryjoin="and_(AgentRunLog.agent_id == LocalAgentConfig.id, AgentRunLog.agent_type == 'local')",
+        foreign_keys="AgentRunLog.agent_id",
+        overlaps="run_logs,cloud_agent",
+    )
+    cloud_agent: Mapped[CloudAgentConfig | None] = relationship(
+        back_populates="run_logs",
+        primaryjoin="and_(AgentRunLog.agent_id == CloudAgentConfig.id, AgentRunLog.agent_type == 'cloud')",
+        foreign_keys="AgentRunLog.agent_id",
+        overlaps="run_logs,local_agent",
+    )
+
+
+# ── Memory models ─────────────────────────────────────────────────────────────
+
+
+class MemoryCore(Base):
+    """Per-user persistent key/value preferences, encrypted at rest.
+
+    Examples: preferred_language, timezone, work_style.
+    Decrypted in-memory only using User.encryption_key.
+    """
+
+    __tablename__ = "memory_core"
+
+    id: Mapped[str] = mapped_column(Uuid(as_uuid=False), primary_key=True, default=_uuid)
+    user_id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), ForeignKey("users.id", ondelete="CASCADE"),
+        nullable=False, index=True,
+    )
+    key: Mapped[str] = mapped_column(String(255), nullable=False)
+    value_encrypted: Mapped[str] = mapped_column(Text, nullable=False)
+    updated_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, server_default=func.now(), onupdate=func.now()
+    )
+
+
+class MemoryAssociative(Base):
+    """Per-user semantic memory: encrypted content + pgvector embedding for similarity search.
+
+    Production: ``embedding`` column is ``vector(1536)`` via pgvector.
+    Tests (SQLite): stored as JSON list.
+    """
+
+    __tablename__ = "memory_associative"
+
+    id: Mapped[str] = mapped_column(Uuid(as_uuid=False), primary_key=True, default=_uuid)
+    user_id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), ForeignKey("users.id", ondelete="CASCADE"),
+        nullable=False, index=True,
+    )
+    content_encrypted: Mapped[str] = mapped_column(Text, nullable=False)
+    # JSON-encoded float list in SQLite tests; vector(1536) in Postgres via migration.
+    embedding: Mapped[list | None] = mapped_column(JSON, nullable=True)
+    entity_type: Mapped[str | None] = mapped_column(String(100), nullable=True)
+    entity_id: Mapped[str | None] = mapped_column(String(255), nullable=True)
+    updated_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, server_default=func.now(), onupdate=func.now()
+    )
+
+
+class MemoryEpisodic(Base):
+    """Per-user session summaries, encrypted at rest.
+
+    One row per session interaction; used to recall recent conversations.
+    """
+
+    __tablename__ = "memory_episodic"
+
+    id: Mapped[str] = mapped_column(Uuid(as_uuid=False), primary_key=True, default=_uuid)
+    user_id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), ForeignKey("users.id", ondelete="CASCADE"),
+        nullable=False, index=True,
+    )
+    summary_encrypted: Mapped[str] = mapped_column(Text, nullable=False)
+    session_id: Mapped[str] = mapped_column(String(255), nullable=False, index=True)
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, server_default=func.now()
+    )
+
+
+class MemoryProactive(Base):
+    """Per-user inferred behavioral patterns, encrypted at rest.
+
+    Confidence in [0.0, 1.0]; only patterns above threshold are injected.
+    Source: 'inferred' (from episodes) or 'explicit' (user-stated).
+    """
+
+    __tablename__ = "memory_proactive"
+
+    id: Mapped[str] = mapped_column(Uuid(as_uuid=False), primary_key=True, default=_uuid)
+    user_id: Mapped[str] = mapped_column(
+        Uuid(as_uuid=False), ForeignKey("users.id", ondelete="CASCADE"),
+        nullable=False, index=True,
+    )
+    pattern_encrypted: Mapped[str] = mapped_column(Text, nullable=False)
+    confidence: Mapped[float] = mapped_column(Float, nullable=False, default=0.5)
+    source: Mapped[str] = mapped_column(String(50), nullable=False, default="inferred")
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), nullable=False, server_default=func.now()
+    )
--- a/app/schemas.py
+++ b/app/schemas.py
@@ -5,6 +5,7 @@ Mirrors the TypeScript types from the Electron app (src/shared/api-types.ts).

 from __future__ import annotations

+from enum import Enum
 from typing import Any, Literal

 from pydantic import BaseModel, Field
@@ -26,6 +27,8 @@ class AuthTokens(BaseModel):
 class UserProfile(BaseModel):
    id: str
    email: str
+    name: str | None = None
+    surname: str | None = None
    tier: BillingTier


@@ -38,41 +41,13 @@ class ChatContext(BaseModel):
    conversation_history: list[dict[str, Any]] = Field(default_factory=list)


-class PlanAction(BaseModel):
-    type: Literal[
-        "create_record",
-        "update_record",
-        "delete_record",
-        "index_document",
-        "send_notification",
-    ]
-    table: str | None = None
-    data: dict[str, Any] | None = None
-
-
 class ChatRequest(BaseModel):
    message: str
    context: ChatContext = Field(default_factory=ChatContext)
-    execution_mode: Literal["direct", "plan"] = "direct"


 class ChatResponse(BaseModel):
    response: str
-    actions: list[PlanAction] = Field(default_factory=list)
-
-
-# ── Execution Plans ──────────────────────────────────────────────────
-
-class PlanStep(BaseModel):
-    action: str
-    prompt_template: str | None = None
-    variables: dict[str, Any] | None = None
-    data_from_step: int | None = None
-
-
-class ExecutionPlan(BaseModel):
-    agent: str
-    steps: list[PlanStep] = Field(default_factory=list)


 # ── Backup ───────────────────────────────────────────────────────────
@@ -155,3 +130,213 @@ class PluginListResponse(BaseModel):

 class PluginInstallRequest(BaseModel):
    plugin_id: str
+
+
+# ── WebSocket Frame Protocol ──────────────────────────────────────────
+
+class WsFrameType(str, Enum):
+    # ── v2 frame types (kept for backward compat) ──────────────────────
+    chat_request = "chat_request"
+    text_chunk = "text_chunk"
+    tool_call = "tool_call"
+    tool_result = "tool_result"
+    final = "final"
+    ping = "ping"
+    device_hello = "device_hello"
+    # ── v3 frame types ─────────────────────────────────────────────────
+    home_request = "home_request"
+    floating_request = "floating_request"
+    stream_start = "stream_start"
+    stream_text = "stream_text"
+    stream_end = "stream_end"
+    floating_domain = "floating_domain"
+    data_request = "data_request"
+    data_response = "data_response"
+    mutation = "mutation"
+    # ── v4 journey frame types ────────────────────────────────────────
+    journey_start = "journey_start"
+    journey_message = "journey_message"
+    journey_reply = "journey_reply"
+
+
+class WsToolCall(BaseModel):
+    """Server → Client: requests a CRUD/vector operation on the local DB."""
+
+    type: Literal[WsFrameType.tool_call] = WsFrameType.tool_call
+    id: str
+    action: str
+    table: str | None = None
+    data: dict[str, Any] | None = None
+    filters: dict[str, Any] | None = None
+    vector: list[float] | None = None
+    limit: int | None = None
+
+
+class WsToolResult(BaseModel):
+    """Client → Server: result of a CRUD/vector operation."""
+
+    type: Literal[WsFrameType.tool_result] = WsFrameType.tool_result
+    id: str
+    row: dict[str, Any] | None = None
+    rows: list[dict[str, Any]] | None = None
+    results: list[dict[str, Any]] | None = None
+    deleted: bool | None = None
+    ok: bool | None = None
+    error: str | None = None
+
+
+class WsTextChunk(BaseModel):
+    """Server → Client: incremental LLM response text."""
+
+    type: Literal[WsFrameType.text_chunk] = WsFrameType.text_chunk
+    text: str
+
+
+class WsFinal(BaseModel):
+    """Server → Client: signals end of response with the complete text."""
+
+    type: Literal[WsFrameType.final] = WsFrameType.final
+    response: str
+
+
+# ── WebSocket Agent Frame Protocol ────────────────────────────────────
+
+class WsDeviceHello(BaseModel):
+    """Client → Server: device identification on WS connect."""
+
+    type: Literal[WsFrameType.device_hello] = WsFrameType.device_hello
+    device_id: str
+    agent_ids: list[str] = Field(default_factory=list)
+
+
+
+# ── WebSocket v3 Frame Models ─────────────────────────────────────────
+
+class WsFloatingScope(BaseModel):
+    """Scope for a floating request — narrows the agent to a specific entity."""
+
+    type: Literal["task", "project", "note", "timeline"]
+    id: str | None = None
+
+
+class WsHomeRequest(BaseModel):
+    """Client → Server: Home chat message."""
+
+    type: Literal[WsFrameType.home_request] = WsFrameType.home_request
+    message: str
+    conversation_history: list[dict[str, Any]] = Field(default_factory=list)
+
+
+class WsFloatingRequest(BaseModel):
+    """Client → Server: Floating chat message scoped to an entity."""
+
+    type: Literal[WsFrameType.floating_request] = WsFrameType.floating_request
+    message: str
+    scope: WsFloatingScope
+
+
+class WsStreamStart(BaseModel):
+    """Server → Client: signals start of a streaming response."""
+
+    type: Literal[WsFrameType.stream_start] = WsFrameType.stream_start
+    request_id: str
+
+
+class WsStreamText(BaseModel):
+    """Server → Client: streamed text token."""
+
+    type: Literal[WsFrameType.stream_text] = WsFrameType.stream_text
+    request_id: str
+    chunk: str
+
+
+class WsStreamEnd(BaseModel):
+    """Server → Client: signals end of a streaming response."""
+
+    type: Literal[WsFrameType.stream_end] = WsFrameType.stream_end
+    request_id: str
+
+
+class WsDomain(BaseModel):
+    """Structured floating domain payload for UI routing decisions."""
+
+    type: Literal["task", "timeline", "project", "node"]
+    id: str | None = None
+    section: Literal["task", "timeline", "note"] | None = None
+
+
+class WsFloatingDomain(BaseModel):
+    """Server → Client: domain determined for a floating request."""
+
+    type: Literal[WsFrameType.floating_domain] = WsFrameType.floating_domain
+    request_id: str
+    domain: WsDomain
+
+
+# ── Agent Config V2 ───────────────────────────────────────────────────
+
+
+class ContentTypeConfig(BaseModel):
+    """Per-type extraction config produced by the journey chatbot."""
+
+    id: str
+    label: str = ""
+    detection_hint: str = ""
+    preprocessing: str = "generic"  # handler name: "email_html", "plain_text", ...
+    extraction_prompt: str
+
+
+class AgentConfig(BaseModel):
+    """Structured agent configuration (replaces freeform prompt_template)."""
+
+    content_types: list[ContentTypeConfig] = []
+    global_rules: list[str] = []
+    data_types: list[str] = []
+
+
+# ── Agent Catalog ─────────────────────────────────────────────────────
+
+class AgentCatalogItem(BaseModel):
+    type: str
+    name: str
+    description: str
+
+
+class AgentCreationCheckRequest(BaseModel):
+    active_agents: int = Field(ge=0, default=0)
+
+
+class AgentCreationCheckResponse(BaseModel):
+    allowed: bool
+    tier: BillingTier
+    active_agents: int
+    limit: int
+
+
+class AgentTriggerRequest(BaseModel):
+    directory: str = Field(min_length=1)
+    device_id: str = Field(default="")
+    agent_id: str | None = None  # FE stable agent ID (electron-store UUID)
+    what_to_extract: list[str] = Field(min_length=1)
+    actions_by_type: dict[str, list[str]] | None = None
+    batch_interval: str = Field(min_length=1)
+    custom_agent_prompt: str = Field(min_length=1)
+    active_agents: int = Field(ge=0, default=0)
+
+
+# ── Agent Run Log ─────────────────────────────────────────────────────
+
+class AgentRunLogResponse(BaseModel):
+    id: str
+    agent_id: str
+    agent_type: Literal["local", "cloud"]
+    status: Literal["running", "success", "error", "partial"]
+    items_processed: int
+    items_created: int
+    errors: list[str]
+    started_at: int
+    completed_at: int | None
+
+
+# ── Chatbot Journey ───────────────────────────────────────────────────
+
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -8,13 +8,16 @@ services:
        required: false
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      GITHUB_COPILOT_TOKEN_DIR: /root/.config/litellm/github_copilot
+    volumes:
+      - copilot_tokens:/root/.config/litellm/github_copilot
    depends_on:
      db:
        condition: service_healthy
    restart: unless-stopped

  db:
-    image: postgres:16-alpine
+    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
@@ -66,3 +69,4 @@ volumes:
  postgres_data:
  minio_data:
  qdrant_data:
+  copilot_tokens:
--- a/docs/MICROSERVICES_ARCHITECTURE.md
+++ b/docs/MICROSERVICES_ARCHITECTURE.md
@@ -0,0 +1,941 @@
+# Adiuva: Microservices Architecture (MVP)
+
+## Overview
+
+The monolith is split into **4 MVP services** plus an **API Gateway (Traefik)**, orchestrated with Docker Compose on a single VPS reachable through Cloudflare.
+
+> **Out of scope for the MVP**: Storage Service (S3/backup CRUD) and Plugin Service (marketplace). They will be added as independent services in a later phase.
+
+```
+                          ┌──────────────┐
+                          │  Cloudflare  │
+                          │  (DNS + CDN) │
+                          └──────┬───────┘
+                                 │ HTTPS / WSS
+                          ┌──────▼───────┐
+                          │   Traefik    │
+                          │ API Gateway  │
+                          │  (routing,   │
+                          │   TLS, rate  │
+                          │   limiting)  │
+                          └──────┬───────┘
+                                 │
+          ┌──────────┬───────────┼───────────┐
+          │          │           │           │
+    ┌─────▼────┐ ┌───▼───┐ ┌────▼────┐ ┌────▼───┐
+    │  Auth    │ │  Chat │ │  Agent  │ │Billing │
+    │ Service  │ │Service│ │ Service │ │Service │
+    └─────┬────┘ └───┬───┘ └────┬────┘ └────┬───┘
+          │          │          │           │
+    ┌─────▼──────────▼──────────▼───────────▼────┐
+    │              Infrastructure                 │
+    │  PostgreSQL  │  Redis  │  Qdrant            │
+    └─────────────────────────────────────────────┘
+```
+
+---
+
+## 1. Service Breakdown
+
+### 1.1 Auth Service (`auth-service`)
+
+**Responsibilities**: Registration, login, token refresh, user profile, encryption key.
+
+| Original endpoint | Method |
+|---|---|
+| `/api/v1/auth/register` | POST |
+| `/api/v1/auth/login` | POST |
+| `/api/v1/auth/refresh` | POST |
+| `/api/v1/auth/me` | GET / PUT |
+
+**Database**: `users` and `refresh_tokens` tables (shared PostgreSQL, `auth` schema).
+
+**Key change: JWT with RS256**:
+The monolith uses a symmetric `SECRET_KEY` (HS256). With microservices, switch to **RS256** (asymmetric):
+- The Auth Service signs JWTs with the **private key**.
+- All other services verify JWTs with the **public key** without ever contacting the Auth Service.
+- The public key is exposed via `GET /api/v1/auth/.well-known/jwks.json` or mounted as a shared volume.
+
+```python
+# auth-service/app/auth/jwt.py
+from jose import jwt
+
+PRIVATE_KEY = ...  # From env/secret
+PUBLIC_KEY = ...   # Derived from the private key or loaded from env
+
+def create_access_token(user_id: str, tier: str) -> str:
+    return jwt.encode(
+        {"sub": user_id, "tier": tier, "exp": ...},
+        PRIVATE_KEY,
+        algorithm="RS256",
+    )
+```
+
+```python
+# shared/auth.py  (used by all other services)
+from jose import jwt
+
+PUBLIC_KEY = ...  # Mounted volume or fetched from the JWKS endpoint
+
+def verify_token(token: str) -> dict:
+    return jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"])
+```
+
+**Scaling**: 2 replicas are enough; the service is stateless. Dedicated rate limit on `/login` and `/register`.
+
+---
+
+### 1.2 Chat Service (`chat-service`) ⭐ Real-time
+
+**Responsibilities**: WebSocket device connection, home chat, floating chat, memory middleware, streaming LLM responses to the client.
+
+This service manages the **persistent connection** with the Electron app and the user's **real-time** interactions (home chat, floating chat). It owns the WebSocket.
+
+| Endpoint | Type |
+|---|---|
+| `/api/v1/ws/device` | WebSocket (persistent connection) |
+| `/api/v1/chat` | POST (REST fallback) |
+
+**Included modules**: `deep_agent`, `memory_middleware`, `ws_context`, `device_manager` (Redis-backed), `output_formatter`, `llm`, and all agent tools (`task_agent`, `project_agent`, `note_agent`, `timeline_agent`).
+
+**Why it is separate from the Agent Service**: The Chat Service keeps the WebSocket open and responds in real time (streaming). Scaling by adding replicas is straightforward with sticky sessions plus Redis pub/sub for cross-instance routing of `tool_call` frames.
+
+**Scaling**: 2–N replicas. Sticky cookies for the WS plus Redis for cross-instance routing.
+
+---
+
+### 1.3 Agent Service (`agent-service`) ⭐ Batch
+
+**Responsibilities**: Batch agent processing (directory scanning, file classification, entity extraction), agent setup journeys, agent configuration CRUD.
+
+This service handles **long-running**, **CPU-intensive** work: filesystem scanning, LLM-based file classification, and batch entity extraction. It does not own the WebSocket; it communicates with the user's device over **Redis pub/sub**, relayed by the Chat Service.
+
+| Endpoint | Type |
+|---|---|
+| `/api/v1/agents/catalog` | GET |
+| `/api/v1/agents/can-create` | POST |
+| `/api/v1/agents/trigger` | POST |
+| `/api/v1/agents/journey/start` | POST (or WS relay) |
+| `/api/v1/agents/journey/message` | POST (or WS relay) |
+
+**Included modules**: `agent_runner`, `agent_registry`, `filesystem_agent`, `llm`.
+
+**Cross-service tool-call flow** (the Agent Service has no WS):
+
+```
+┌──────────────┐            ┌──────────────┐            ┌──────────┐
+│ Agent Service│            │    Redis     │            │  Chat    │
+│ (batch run)  │            │              │            │ Service  │
+│              │            │              │            │ (owns WS)│
+│ 1. Needs to  │  PUBLISH   │              │ SUBSCRIBE  │          │
+│    read file ├───────────►│tool_call:u123├───────────►│ 2. Sends │
+│    from      │            │              │            │    to    │
+│    device    │            │              │            │    device│
+│              │            │              │            │    via WS│
+│              │  SUBSCRIBE │              │  PUBLISH   │          │
+│ 4. Receives  ◄────────────┤tool_result:id│◄───────────┤ 3. Device│
+│    result    │            │              │            │    reply │
+└──────────────┘            └──────────────┘            └──────────┘
+```
+
+**Scaling**: 1–N replicas. Fully stateless; scales independently of chat. Each replica processes different batch jobs. It can be scaled to 0 when no agents are active, which saves resources.
+
+**Benefit of the split**: If 50 users trigger batch agents at the same time, the Chat Service is unaffected and real-time responses stay fast.
+
+---
+
+### 1.4 Billing Service (`billing-service`)
+
+**Responsibilities**: Stripe checkout, webhook, subscription management.
+
+| Original endpoint | Method |
+|---|---|
+| `/api/v1/billing/checkout` | POST |
+| `/api/v1/billing/webhook` | POST |
+| `/api/v1/billing/subscription` | GET / DELETE |
+
+**Database**: `subscriptions` table (`billing` schema).
+
+**Inter-service communication**: When Stripe sends a webhook and the tier changes, the Billing Service publishes an event on the **Redis pub/sub** channel `tier_changed:{user_id}`. The Auth Service updates the `tier` field in the `users` table. At the next token refresh the JWT will carry the updated tier.
+
+**Scaling**: 1 replica is enough. Low traffic.
+
+---
+
+### 1.5 Services excluded from the MVP
+
+The following services will be added after the MVP as independent services:
+
+| Service | Responsibilities | Notes |
+|---|---|---|
+| **Storage Service** | S3 blobs CRUD, vector ops, backup | Vector/embed features can stay in the Chat Service for the MVP |
+| **Plugin Service** | Marketplace, install, revenue split | Not critical for launch |
+
+---
+
+## 2. Tier Check: Where and How
+
+The user's tier (free/pro/power/team) determines rate limiting, quotas, and feature access. With microservices, **each service checks the tier on its own** without calling the Auth Service.
+
+### Strategy: Tier in the JWT
+
+The Auth Service includes `tier` as a claim in the JWT at login/refresh:
+
+```json
+{
+  "sub": "user_123",
+  "tier": "pro",
+  "exp": 1742515200,
+  "iat": 1742511600
+}
+```
+
+Each service:
+1. Decodes the JWT with the public key (it already does this for auth)
+2. Reads `payload["tier"]` with **zero extra calls**
+3. Applies its own enforcement rules locally
+
+```python
+# shared/auth.py: shared FastAPI dependency
+from fastapi import Depends, HTTPException, Request
+from jose import JWTError, jwt
+
+PUBLIC_KEY = ...
+
+class CurrentUser:
+    def __init__(self, user_id: str, tier: str):
+        self.user_id = user_id
+        self.tier = tier
+
+async def get_current_user(request: Request) -> CurrentUser:
+    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
+    try:
+        payload = jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"])
+    except JWTError as exc:
+        # Missing, malformed, or expired token → 401 instead of an unhandled 500
+        raise HTTPException(401, "Invalid or expired token") from exc
+    return CurrentUser(user_id=payload["sub"], tier=payload["tier"])
+
+def require_tier(*allowed_tiers: str):
+    """Dependency that rejects the request if the tier is not among the allowed ones."""
+    async def check(user: CurrentUser = Depends(get_current_user)):
+        if user.tier not in allowed_tiers:
+            raise HTTPException(403, "Tier insufficient")
+        return user
+    return check
+```
+
+### What happens when the tier changes (upgrade/downgrade)?
+
+```
+┌──────────┐  Stripe webhook   ┌──────────┐  tier_changed   ┌──────────┐
+│  Stripe  │ ─────────────────►│ Billing  │ ───────────────►│   Auth   │
+│          │                    │ Service  │  (Redis pub/sub) │ Service  │
+└──────────┘                    └──────────┘                  └────┬─────┘
+                                                                   │
+                                                          UPDATE users
+                                                          SET tier = 'power'
+                                                                   │
+                                                    On the next /refresh
+                                                    the JWT will contain tier='power'
+```
+
+**Propagation latency**: The tier propagates at the next token refresh (typically 15–30 min, or the client can force an immediate refresh after checkout). For a downgrade triggered by the billing webhook, the change can be forced by invalidating the refresh token in Redis, which obliges the client to re-authenticate.
+
+### Where it is enforced in each service
+
+| Service | Enforcement |
+|---|---|
+| **Auth Service** | None (it is the one that writes the tier) |
+| **Chat Service** | Per-tier rate limit (req/min), message quota |
+| **Agent Service** | Max agent configs, max runs/day, max concurrent batches |
+| **Billing Service** | None (it manages tiers, it does not consume them) |
+
+### Distributed rate limiting via Redis
+
+Because each service runs its own replicas, rate limiting must be **shared** through Redis:
+
+```python
+# shared/middleware/rate_limit.py
+import redis.asyncio as aioredis
+
+class DistributedRateLimiter:
+    def __init__(self, redis: aioredis.Redis):
+        self._redis = redis
+
+    async def check(self, user_id: str, tier: str, service: str) -> bool:
+        limits = {"free": 20, "pro": 60, "power": 120, "team": 200}
+        max_req = limits.get(tier, 20)
+        key = f"rate:{service}:{user_id}"
+
+        count = await self._redis.incr(key)
+        if count == 1:
+            # Start the 60 s window on the first hit only; calling EXPIRE on
+            # every request would keep pushing the window forward.
+            await self._redis.expire(key, 60)
+
+        return count <= max_req
+```
+
+---
+
+## 3. WebSocket with Horizontal Scaling: The Key Problem
+
+`DeviceConnectionManager` is an **in-memory singleton**:
+
+```python
+class DeviceConnectionManager:
+    def __init__(self):
+        self._connections: dict[str, DeviceConnection] = {}  # ← In-memory!
+```
+
+With N Chat Service instances, the device connects to **only one** instance. When another instance needs to send a `tool_call` to that device (e.g. an agent trigger from an API call), it cannot find the connection.
+
+### The solution: Redis Pub/Sub + Registry
+
+```
+┌──────────────────────────────────────────────────────────────┐
+│                     Redis                                    │
+│                                                              │
+│  Hash: ws:connections                                        │
+│    user_123 → instance_A                                     │
+│    user_456 → instance_B                                     │
+│                                                              │
+│  Pub/Sub channels:                                           │
+│    tool_call:{user_id}  → tool call payloads                 │
+│    tool_result:{call_id} → tool result payloads              │
+│    stream:{user_id}     → text_chunk streaming               │
+└──────────────────────────────────────────────────────────────┘
+
+ Instance A (holds user_123's WS)   Instance B (must call a tool on user_123)
+ ┌───────────────────────┐          ┌───────────────────────┐
+ │  1. Subscribes to     │          │  1. Lookup Redis Hash │
+ │     tool_call:user_123│          │     → user_123 is on A│
+ │                       │          │                       │
+ │  2. Receives tool_call│◄─────────│  2. PUBLISH           │
+ │     from Redis channel│          │    tool_call:user_123 │
+ │                       │          │    {id, action, ...}  │
+ │  3. Sends to device   │          │                       │
+ │     via WS            │          │  4. SUBSCRIBE         │
+ │                       │          │    tool_result:{id}   │
+ │  4. Device replies    │          │                       │
+ │     with tool_result  │──────────│► 5. Receives result   │
+ │                       │          │                       │
+ │  5. PUBLISH           │          │                       │
+ │    tool_result:{id}   │          │                       │
+ └───────────────────────┘          └───────────────────────┘
+```
+
+### Implementation: `RedisDeviceManager`
+
+```python
+# chat-service/app/core/device_manager.py
+
+import asyncio
+import json
+import os
+import redis.asyncio as aioredis
+from dataclasses import dataclass, field
+from fastapi import WebSocket
+
+INSTANCE_ID = os.environ.get("INSTANCE_ID", os.urandom(8).hex())
+
+@dataclass
+class LocalConnection:
+    ws: WebSocket
+    device_id: str
+    pending_calls: dict[str, asyncio.Future[dict]] = field(default_factory=dict)
+
+
+class RedisDeviceManager:
+    """Device manager backed by Redis for cross-instance communication."""
+
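+    def __init__(self, redis_url: str = "redis://redis:6379"):
+        self._redis = aioredis.from_url(redis_url)
+        self._pubsub = self._redis.pubsub()
+        self._local: dict[str, LocalConnection] = {}  # Local connections only
+        self._remote_futures: dict[str, asyncio.Future[dict]] = {}
+        self._listener_task: asyncio.Task | None = None
+
+    async def start(self):
+        """Start the Redis listener for incoming tool_call messages."""
+        # PubSub.listen() ends when nothing is subscribed, so keep one
+        # per-instance control channel open (the channel name is illustrative).
+        await self._pubsub.subscribe(f"instance:{INSTANCE_ID}")
+        # Keep a reference so the task is not garbage-collected.
+        self._listener_task = asyncio.create_task(self._listen_tool_calls())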
+
+    # ── Registration ──
+
+    async def register(self, user_id: str, device_id: str, ws: WebSocket):
+        # Register locally
+        self._local[user_id] = LocalConnection(ws=ws, device_id=device_id)
+        # Record in Redis which instance holds the connection
+        await self._redis.hset("ws:connections", user_id, INSTANCE_ID)
+        # Subscribe to tool_call messages for this user
+        await self._pubsub.subscribe(f"tool_call:{user_id}")
+
+    async def unregister(self, user_id: str):
+        conn = self._local.pop(user_id, None)
+        if conn:
+            for fut in conn.pending_calls.values():
+                if not fut.done():
+                    fut.cancel()
+        await self._redis.hdel("ws:connections", user_id)
+        await self._pubsub.unsubscribe(f"tool_call:{user_id}")
+
+    # ── Presence ──
+
+    async def is_online(self, user_id: str) -> bool:
+        return await self._redis.hexists("ws:connections", user_id)
+
+    # ── Tool-call round-trip (cross-instance) ──
+
+    async def execute_tool_call(self, user_id: str, payload: dict) -> dict:
+        """
+        Send a tool_call to the user's device.
+        Works whether the WS is local or held by another instance.
+        """
+        call_id = payload["id"]
+        loop = asyncio.get_running_loop()
+
+        # Case 1: local connection → send directly
+        if user_id in self._local:
+            conn = self._local[user_id]
+            fut: asyncio.Future[dict] = loop.create_future()
+            conn.pending_calls[call_id] = fut
+            try:
+                await conn.ws.send_text(json.dumps({"type": "tool_call", **payload}))
+                return await asyncio.wait_for(fut, timeout=30.0)
+            finally:
+                # Drop the pending entry on success, timeout, or send failure
+                conn.pending_calls.pop(call_id, None)
+
+        # Case 2: remote connection → Redis pub/sub
+        fut = loop.create_future()
+        self._remote_futures[call_id] = fut
+
+        # Subscribe to the reply channel
+        result_channel = f"tool_result:{call_id}"
+        await self._pubsub.subscribe(result_channel)
+
+        # Publish the tool_call
+        await self._redis.publish(
+            f"tool_call:{user_id}",
+            json.dumps(payload),
+        )
+
+        try:
+            return await asyncio.wait_for(fut, timeout=30.0)
+        finally:
+            self._remote_futures.pop(call_id, None)
+            await self._pubsub.unsubscribe(result_channel)
+
+    # ── tool_result resolution (from local WS) ──
+
+    def resolve_local(self, user_id: str, call_id: str, result: dict):
+        conn = self._local.get(user_id)
+        if conn:
+            fut = conn.pending_calls.pop(call_id, None)
+            if fut and not fut.done():
+                fut.set_result(result)
+
+    async def resolve_and_publish(self, user_id: str, call_id: str, result: dict):
+        """Called when the local device sends a tool_result."""
+        self.resolve_local(user_id, call_id, result)
+        # Also publish to Redis for a remote instance that may be waiting
+        await self._redis.publish(
+            f"tool_result:{call_id}",
+            json.dumps(result),
+        )
+
+    # ── Redis listener ──
+
+    async def _listen_tool_calls(self):
+        """Loop that listens for tool_calls arriving from other instances."""
+        async for message in self._pubsub.listen():
+            if message["type"] != "message":
+                continue
+            channel = message["channel"]
+            if isinstance(channel, bytes):
+                channel = channel.decode()
+
+            data = json.loads(message["data"])
+
+            if channel.startswith("tool_call:"):
+                # Another instance wants us to send a tool_call to our device
+                user_id = channel.split(":", 1)[1]
+                conn = self._local.get(user_id)
+                if conn:
+                    await conn.ws.send_text(json.dumps({"type": "tool_call", **data}))
+
+            elif channel.startswith("tool_result:"):
+                # Reply to a tool_call we sent through Redis
+                call_id = channel.split(":", 1)[1]
+                fut = self._remote_futures.pop(call_id, None)
+                if fut and not fut.done():
+                    fut.set_result(data)
+
+    # ── Cross-instance streaming ──
+
+    async def publish_stream_chunk(self, user_id: str, chunk: dict):
+        """Publish a streaming chunk to Redis (for REST→WS relay)."""
+        await self._redis.publish(f"stream:{user_id}", json.dumps(chunk))
+```
+
+---
+
+## 4. Proposed Directory Structure (MVP)
+
+```
+adiuva-api/
+├── docker-compose.yml          # Full orchestration
+├── docker-compose.dev.yml      # Override for local development
+├── shared/                     # Shared code (mounted as a volume)
+│   ├── auth.py                 # JWT verification (public key)
+│   ├── schemas.py              # Shared Pydantic schemas
+│   ├── middleware/
+│   │   ├── rate_limit.py       # DistributedRateLimiter (Redis)
+│   │   └── sanitizer.py
+│   └── models/
+│       └── base.py             # Shared SQLAlchemy base
+│
+├── auth-service/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   └── app/
+│       ├── main.py
+│       ├── config.py
+│       ├── db.py
+│       ├── models.py           # users, refresh_tokens
+│       ├── routes/
+│       │   └── auth.py
+│       └── services/
+│           ├── jwt_service.py  # RS256 signing
+│           └── user_service.py
+│
+├── chat-service/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   └── app/
+│       ├── main.py
+│       ├── config.py
+│       ├── db.py
+│       ├── models.py           # memory_*
+│       ├── routes/
+│       │   ├── device_ws.py    # WS connection owner
+│       │   └── chat.py         # REST fallback
+│       ├── core/
+│       │   ├── device_manager.py   # RedisDeviceManager
+│       │   ├── deep_agent.py       # Home + floating chat
+│       │   ├── memory_middleware.py
+│       │   ├── ws_context.py
+│       │   ├── output_formatter.py
+│       │   └── llm.py
+│       └── agents/                 # Tool definitions (used by deep_agent)
+│           ├── task_agent.py
+│           ├── project_agent.py
+│           ├── note_agent.py
+│           └── timeline_agent.py
+│
+├── agent-service/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   └── app/
+│       ├── main.py
+│       ├── config.py
+│       ├── db.py
+│       ├── models.py           # agent_run_logs, local/cloud_agent_configs
+│       ├── routes/
+│       │   ├── agents.py       # catalog, can-create, trigger
+│       │   └── agent_setup.py  # journey start/message
+│       ├── core/
+│       │   ├── agent_runner.py     # Batch classify → process
+│       │   ├── agent_registry.py
+│       │   ├── redis_executor.py   # execute_on_client via Redis pub/sub
+│       │   └── llm.py
+│       └── agents/
+│           ├── task_agent.py       # Tool definitions (batch context)
+│           ├── project_agent.py
+│           ├── note_agent.py
+│           ├── timeline_agent.py
+│           └── filesystem_agent.py
+│
+├── billing-service/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   └── app/
+│       ├── main.py
+│       ├── config.py
+│       ├── db.py
+│       ├── models.py           # subscriptions
+│       ├── routes/
+│       │   └── billing.py
+│       └── services/
+│           ├── stripe_service.py
+│           └── tier_manager.py
+│
+└── infra/
+    ├── traefik/
+    │   └── traefik.yml
+    ├── keys/
+    │   ├── jwt_private.pem     # auth-service only
+    │   └── jwt_public.pem      # All services
+    └── alembic/                # Shared or per-service migrations
+```
+
+---
+
+## 5. Docker Compose: MVP Configuration
+
+```yaml
+# docker-compose.yml
+
+services:
+
+  # ══════════════════════════════════════════════════════════
+  # API Gateway
+  # ══════════════════════════════════════════════════════════
+  traefik:
+    image: traefik:v3.2
+    command:
+      - "--api.insecure=true"
+      - "--providers.docker=true"
+      - "--providers.docker.exposedbydefault=false"
+      - "--entrypoints.web.address=:80"
+      - "--entrypoints.websecure.address=:443"
+      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
+    ports:
+      - "80:80"
+      - "443:443"
+      - "8080:8080"   # Traefik dashboard (disable in prod)
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock:ro
+      - ./infra/certs:/certs:ro
+    restart: unless-stopped
+
+  # ══════════════════════════════════════════════════════════
+  # Auth Service (2 replicas)
+  # ══════════════════════════════════════════════════════════
+  auth-service:
+    build: ./auth-service
+    deploy:
+      replicas: 2
+    env_file: .env
+    environment:
+      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      REDIS_URL: redis://redis:6379
+      JWT_PRIVATE_KEY_FILE: /run/secrets/jwt_private_key
+      SERVICE_NAME: auth
+    secrets:
+      - jwt_private_key
+      - jwt_public_key
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.auth.rule=PathPrefix(`/api/v1/auth`)"
+      - "traefik.http.services.auth.loadbalancer.server.port=8000"
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+
+  # ══════════════════════════════════════════════════════════
+  # Chat Service: real-time WS + chat (scalable)
+  # ══════════════════════════════════════════════════════════
+  chat-service:
+    build: ./chat-service
+    deploy:
+      replicas: 2
+    env_file: .env
+    environment:
+      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      REDIS_URL: redis://redis:6379
+      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
+      SERVICE_NAME: chat
+    secrets:
+      - jwt_public_key
+    labels:
+      - "traefik.enable=true"
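+      # REST chat endpoint (the router names its service explicitly because
+      # this container defines more than one Traefik service)
+      - "traefik.http.routers.chat.rule=PathPrefix(`/api/v1/chat`)"
+      - "traefik.http.routers.chat.service=chat"
+      - "traefik.http.services.chat.loadbalancer.server.port=8000"
+      # WebSocket route with sticky session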
+      - "traefik.http.routers.ws.rule=PathPrefix(`/api/v1/ws`)"
+      - "traefik.http.routers.ws.service=chat-ws"
+      - "traefik.http.services.chat-ws.loadbalancer.server.port=8000"
+      - "traefik.http.services.chat-ws.loadbalancer.sticky.cookie.name=ws_affinity"
+      - "traefik.http.services.chat-ws.loadbalancer.sticky.cookie.httpOnly=true"
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+
+  # ══════════════════════════════════════════════════════════
+  # Agent Service: batch processing (independently scalable)
+  # ══════════════════════════════════════════════════════════
+  agent-service:
+    build: ./agent-service
+    deploy:
+      replicas: 2
+    env_file: .env
+    environment:
+      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      REDIS_URL: redis://redis:6379
+      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
+      SERVICE_NAME: agent
+    secrets:
+      - jwt_public_key
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.agents.rule=PathPrefix(`/api/v1/agents`)"
+      - "traefik.http.services.agents.loadbalancer.server.port=8000"
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+
+  # ══════════════════════════════════════════════════════════
+  # Billing Service (1 replica)
+  # ══════════════════════════════════════════════════════════
+  billing-service:
+    build: ./billing-service
+    deploy:
+      replicas: 1
+    env_file: .env
+    environment:
+      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      REDIS_URL: redis://redis:6379
+      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
+      SERVICE_NAME: billing
+    secrets:
+      - jwt_public_key
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.billing.rule=PathPrefix(`/api/v1/billing`)"
+      - "traefik.http.services.billing.loadbalancer.server.port=8000"
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+
+  # ══════════════════════════════════════════════════════════
+  # Infrastructure
+  # ══════════════════════════════════════════════════════════
+  db:
+    image: pgvector/pgvector:pg16
+    environment:
+      POSTGRES_USER: postgres
+      POSTGRES_PASSWORD: postgres
+      POSTGRES_DB: adiuva
+    volumes:
+      - postgres_data:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U postgres"]
+      interval: 5s
+      timeout: 5s
+      retries: 5
+    restart: unless-stopped
+
+  redis:
+    image: redis:7-alpine
+    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
+    volumes:
+      - redis_data:/data
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 5s
+      timeout: 3s
+      retries: 5
+    restart: unless-stopped
+
+  qdrant:
+    image: qdrant/qdrant:latest
+    volumes:
+      - qdrant_data:/qdrant/storage
+    restart: unless-stopped
+
+secrets:
+  jwt_private_key:
+    file: ./infra/keys/jwt_private.pem
+  jwt_public_key:
+    file: ./infra/keys/jwt_public.pem
+
+volumes:
+  postgres_data:
+  redis_data:
+  qdrant_data:
+```
+
+---
+
+## 6. Cloudflare + VPS Configuration
+
+### 6.1 DNS
+
+```
+api.yourdomain.com  →  A record  →  VPS IP
+                    →  Proxy: ON (orange cloud)
+```
+
+### 6.2 Cloudflare Settings
+
+| Setting | Value | Reason |
+|---------|-------|--------|
+| SSL/TLS mode | **Full (Strict)** | Cloudflare ↔ VPS with a valid certificate |
+| WebSocket | **ON** | Required for `/api/v1/ws/device` |
+| Proxy timeout | **100s** default (can be raised only on Enterprise) | LLM calls can run 30s+ |
+| Under Attack Mode | Off (enable if needed) | |
+
+### 6.3 TLS on the VPS
+
+Two options:
+- **Option A (recommended)**: Cloudflare Origin Certificate → mounted in Traefik
+- **Option B**: Let's Encrypt via Traefik (with the Cloudflare DNS challenge)
+
+```yaml
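+# traefik.yml with a Cloudflare Origin Certificate
+entryPoints:
+  websecure:
+    address: ":443"
+
+# Note: Traefik reads `tls.certificates` from the dynamic configuration
+# (e.g. a file-provider file), not from the static traefik.yml.
+tls: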
+  certificates:
+    - certFile: /certs/origin.pem
+      keyFile: /certs/origin-key.pem
+```
+
+### 6.4 VPS Network
+
+```bash
+# UFW firewall: only Cloudflare can reach ports 80/443 (rules for 443 shown; repeat for 80 if needed)
+# https://www.cloudflare.com/ips/
+ufw default deny incoming
+ufw allow from 173.245.48.0/20 to any port 443
+ufw allow from 103.21.244.0/22 to any port 443
+# ... (all Cloudflare IP ranges)
+ufw allow ssh
+ufw enable
+```
+
+---
+
+## 7. Inter-Service Communication
+
+### 7.1 Redis Pub/Sub — Event Bus
+
+```
+┌──────────┐  tier_changed:user_123   ┌──────────┐
+│ Billing  │ ────────────────────────► │   Auth   │
+│ Service  │                           │ Service  │
+└──────────┘                           └──────────┘
+
+┌──────────┐  tool_call:user_123      ┌──────────┐
+│  Agent   │ ────────────────────────► │   Chat   │
+│ Service  │                           │ Service  │
+│ (batch)  │ ◄────────────────────────│ (has WS) │
+└──────────┘  tool_result:{call_id}    └──────────┘
+```
+
+### 7.2 Health Checks and Service Discovery
+
+Traefik handles service discovery automatically via Docker labels. Services do not need to know about each other; they communicate only through:
+- **Redis pub/sub** (cross-instance tool calls, tier events)
+- **Redis hash** (shared state: `ws:connections`, rate-limit counters)
+- **PostgreSQL** (shared persistent data)
+
+---
+
+## 8. Incremental Migration Plan (MVP)
+
+### Phase 1: Preparation (in the current monolith)
+1. Add Redis to the current `docker-compose.yml`
+2. Migrate JWT from HS256 → RS256 (backward-compatible: accept both for a transition period)
+3. Implement `RedisDeviceManager` as a drop-in replacement for the in-memory singleton
+4. Extract `shared/` with auth verification, schemas, middleware
+
+### Phase 2: Auth Service (first split)
+1. Extract the `auth.py` routes + models into `auth-service/`
+2. Verify that JWTs signed by `auth-service` are validated by the monolith
+3. Add Traefik and route `/api/v1/auth/*` to the new service
+4. The monolith keeps serving everything else
+
+### Phase 3: Billing Service
+1. Extract the billing routes, Stripe service, and tier manager
+2. Configure Redis pub/sub for `tier_changed` events
+3. Route via Traefik
+
+### Phase 4: Chat + Agent split (the most delicate step)
+1. The remaining monolith contains WS + chat + agents
+2. Split out the Agent Service: extract `agent_runner`, `agent_registry`, `agent_setup`, and the `/agents/*` routes
+3. Implement `redis_executor.py` in the Agent Service for tool calls via Redis
+4. The Chat Service remains the owner of the WS and subscribes to the `tool_call:{user_id}` channels
+5. Test: trigger an agent from the Agent Service → tool_call via Redis → Chat Service → WS → device → response
+
+### Phase 5: Scaling test
+1. Scale the Chat Service to 2 replicas and verify sticky sessions
+2. Scale the Agent Service to 2 replicas and verify distributed batch processing
+3. Monitoring (Prometheus + Grafana) for each service
+
+---
+
+## 9. Monitoring and Logging
+
+```yaml
+# Add to docker-compose.yml
+
+  prometheus:
+    image: prom/prometheus:latest
+    volumes:
+      - ./infra/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
+    restart: unless-stopped
+
+  grafana:
+    image: grafana/grafana:latest
+    ports:
+      - "3000:3000"
+    volumes:
+      - grafana_data:/var/lib/grafana
+    restart: unless-stopped
+
+  loki:
+    image: grafana/loki:latest
+    restart: unless-stopped
+```
+
+Each service exposes `/metrics` (Prometheus) and writes structured (JSON) logs collected by Loki.
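+
+A minimal sketch of what the mounted `infra/prometheus/prometheus.yml` could look like, assuming every service serves `/metrics` on port 8000 (the port used in the Traefik labels) and runs under plain Docker Compose. The job names and the 15s interval are illustrative assumptions, not part of the plan:
+
+```yaml
+# infra/prometheus/prometheus.yml (hypothetical sketch)
+global:
+  scrape_interval: 15s          # assumed value
+
+scrape_configs:
+  - job_name: chat-service      # illustrative job name
+    metrics_path: /metrics
+    dns_sd_configs:
+      # Resolve the Compose service name; on a Compose network it should
+      # return one A record per replica, so each container is scraped.
+      - names: ["chat-service"]
+        type: A
+        port: 8000
+
+  - job_name: agent-service     # illustrative job name
+    metrics_path: /metrics
+    dns_sd_configs:
+      - names: ["agent-service"]
+        type: A
+        port: 8000
+```
+
+DNS-based discovery is sketched here rather than `static_configs` because a single static target would hit only one of the replicas at a time; the remaining services would follow the same pattern.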
+
+---
+
+## 10. Minimum Recommended VPS Sizing (MVP)
+
+| Component | CPU | RAM | Notes |
+|---|---|---|---|
+| Traefik | 0.25 | 128MB | |
+| Auth Service ×2 | 0.25 ×2 | 128MB ×2 | Stateless, lightweight |
+| Chat Service ×2 | 1.0 ×2 | 1GB ×2 | WS + streaming LLM |
+| Agent Service ×2 | 0.75 ×2 | 512MB ×2 | Batch LLM, CPU-bound |
+| Billing Service | 0.25 | 128MB | |
+| PostgreSQL | 1.0 | 1GB | |
+| Redis | 0.25 | 256MB | |
+| Qdrant | 0.5 | 512MB | |
+| **Total MVP** | **~6.25 vCPU** | **~5.25 GB** | |
+
+**Recommendation**: a VPS with **8 vCPU / 16 GB RAM** to leave headroom. Hetzner CPX41 (~€30/month) or equivalent. Without Storage/Plugin you save ~1 vCPU and 512MB compared with the full version.
+
+---
+
+## MVP Architecture Summary
+
+| Service | Replicas | Owner of |
+|---|---|---|
+| **Traefik** | 1 | Routing, TLS, sticky sessions |
+| **Auth Service** | 2 | JWT RS256, registration, login, profile |
+| **Chat Service** | 2–N | WebSocket, home/floating chat, streaming |
+| **Agent Service** | 2–N | Batch processing, directory scan, agent setup |
+| **Billing Service** | 1 | Stripe, subscriptions, tier management |
+
+| Decision | Choice | Rationale |
+|---|---|---|
+| API Gateway | Traefik | Docker-native, WebSocket support, automatic service discovery |
+| JWT | RS256 (asymmetric) | Distributed verification without contacting the Auth Service |
+| Tier check | Claim in the JWT | Each service verifies locally, zero round trips |
+| WebSocket scaling | Redis pub/sub + sticky cookies | Cross-instance tool-call routing |
+| Chat ↔ Agent split | Separate services | CPU-bound batch work does not impact real-time chat |
+| Agent → Device comms | Redis pub/sub via Chat Service | The Agent does not own the WS; it uses a relay |
+| Rate limiting | Distributed Redis counters | Sliding window shared across replicas |
+| Database | Shared PostgreSQL | MVP simplicity; a future DB split is easy |
+| TLS | Cloudflare Origin Certificate | Zero maintenance |
+| Orchestration | Docker Compose | Sufficient for a single VPS |
+| Storage / Plugin | Post-MVP | Not critical for launch |
--- a/logging.conf
+++ b/logging.conf
@@ -0,0 +1,56 @@
+[loggers]
+keys=root,uvicorn,uvicorn.error,uvicorn.access,sqlalchemy,watchfiles
+
+[handlers]
+keys=console,file
+
+[formatters]
+keys=default
+
+[logger_root]
+level=INFO
+handlers=console,file
+
+[logger_uvicorn]
+level=INFO
+handlers=
+qualname=uvicorn
+propagate=1
+
+[logger_uvicorn.error]
+level=INFO
+handlers=
+qualname=uvicorn.error
+propagate=1
+
+[logger_uvicorn.access]
+level=INFO
+handlers=
+qualname=uvicorn.access
+propagate=1
+
+[logger_sqlalchemy]
+level=WARNING
+handlers=
+qualname=sqlalchemy
+propagate=1
+
+[logger_watchfiles]
+level=WARNING
+handlers=
+qualname=watchfiles
+propagate=1
+
+[handler_console]
+class=StreamHandler
+formatter=default
+args=(sys.stderr,)
+
+[handler_file]
+class=logging.handlers.RotatingFileHandler
+formatter=default
+args=('logs/app.log', 'a', 10485760, 5, 'utf-8')
+
+[formatter_default]
+format=%(asctime)s %(levelname)s %(name)s: %(message)s
+datefmt=%Y-%m-%d %H:%M:%S
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,6 +3,7 @@ uvicorn[standard]>=0.34.0
 gunicorn>=22.0.0
 langchain>=0.3.0
 langchain-openai>=0.3.0
+langchain-litellm>=0.1.0
 litellm>=1.50.0
 pydantic>=2.10.0
 pydantic-settings>=2.7.0
@@ -24,4 +25,15 @@ aiosqlite>=0.20.0
 moto[s3]>=5.0.0
 pinecone>=5.0.0
 qdrant-client>=1.7.0
+croniter>=3.0.0
+google-api-python-client>=2.130.0
+google-auth>=2.29.0
+google-auth-oauthlib>=1.2.0
+google-auth-httplib2>=0.2.0
+msal>=1.28.0
+cryptography>=42.0.0
+langfuse>=2.0.0
+beautifulsoup4>=4.12.0
+lxml>=5.0.0
+PyYAML>=6.0.0
 ruff>=0.8.0
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -6,26 +6,21 @@ a per-test session, and a FastAPI ``TestClient`` wired to use it.

 from __future__ import annotations

-import json
-import os
 import time
 import uuid
 from collections.abc import AsyncGenerator, Generator
-from unittest.mock import patch

-import boto3
 import pytest
 import pytest_asyncio
 from fastapi.testclient import TestClient
 from jose import jwt
-from moto import mock_aws
 from sqlalchemy import StaticPool, event
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

 from app.config.settings import settings
 from app.db import Base, get_session
 from app.main import app
-from app.models import Plugin, Subscription, User
+from app.models import Subscription, User

 # ── Fixed test user IDs (one per tier) ───────────────────────────────

@@ -109,79 +104,6 @@ def client(db_session: AsyncSession) -> Generator[TestClient, None, None]:   # n
    app.dependency_overrides.pop(get_session, None)


-# ── Seed data helpers ────────────────────────────────────────────────
-
-_SEED_PLUGINS = [
-    Plugin(
-        id="plugin-github-sync",
-        name="GitHub Sync",
-        description="Sync tasks with GitHub Issues and pull requests.",
-        version="1.0.0",
-        author_name="Adiuva",
-        category="productivity",
-        price_cents=0,
-        permissions=json.dumps(["read:tasks", "write:tasks"]),
-        status="approved",
-        s3_package_key="plugins/plugin-github-sync/1.0.0/package.zip",
-        install_count=0,
-        avg_rating=0.0,
-    ),
-    Plugin(
-        id="plugin-slack-notify",
-        name="Slack Notifier",
-        description="Post task and checkpoint updates to Slack channels.",
-        version="1.2.0",
-        author_name="Adiuva",
-        category="communication",
-        price_cents=499,
-        permissions=json.dumps(["read:tasks", "read:checkpoints"]),
-        status="approved",
-        s3_package_key="plugins/plugin-slack-notify/1.2.0/package.zip",
-        install_count=0,
-        avg_rating=0.0,
-    ),
-    Plugin(
-        id="plugin-time-tracker",
-        name="Time Tracker",
-        description="Track time spent on tasks with automatic reporting.",
-        version="0.9.1",
-        author_name="Third Party",
-        category="productivity",
-        price_cents=999,
-        permissions=json.dumps(["read:tasks", "write:tasks"]),
-        status="approved",
-        s3_package_key="plugins/plugin-time-tracker/0.9.1/package.zip",
-        install_count=0,
-        avg_rating=0.0,
-    ),
-]
-
-
-@pytest_asyncio.fixture
-async def seed_plugins(db_session: AsyncSession) -> list[Plugin]:
-    """Insert the 3 default approved plugins and return them."""
-    plugins = []
-    for template in _SEED_PLUGINS:
-        p = Plugin(
-            id=template.id,
-            name=template.name,
-            description=template.description,
-            version=template.version,
-            author_name=template.author_name,
-            category=template.category,
-            price_cents=template.price_cents,
-            permissions=template.permissions,
-            status=template.status,
-            s3_package_key=template.s3_package_key,
-            install_count=template.install_count,
-            avg_rating=template.avg_rating,
-        )
-        db_session.add(p)
-        plugins.append(p)
-    await db_session.commit()
-    return plugins
-
-
 # ── JWT helpers ──────────────────────────────────────────────────────


@@ -212,24 +134,21 @@ def auth_header(tier: str = "power", user_id: str | None = None) -> dict[str, st
    return {"Authorization": f"Bearer {make_jwt(tier, user_id)}"}


-# ── S3 mock fixture ──────────────────────────────────────────────────
+# ── CLI options ───────────────────────────────────────────────────────

-S3_TEST_BUCKET = "test-bucket"
-S3_TEST_REGION = "us-east-1"
-
-
-@pytest.fixture
-def s3_bucket():
-    """Create a mocked S3 bucket via moto and patch BlobStore settings."""
-    with mock_aws():
-        os.environ.setdefault("AWS_ACCESS_KEY_ID", "testing")
-        os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "testing")
-        os.environ.setdefault("AWS_DEFAULT_REGION", S3_TEST_REGION)
-        client = boto3.client("s3", region_name=S3_TEST_REGION)
-        client.create_bucket(Bucket=S3_TEST_BUCKET)
-        with patch("app.storage.blob_store.settings") as mock_settings:
-            mock_settings.S3_BUCKET = S3_TEST_BUCKET
-            mock_settings.S3_REGION = S3_TEST_REGION
-            mock_settings.AWS_ACCESS_KEY_ID = "testing"
-            mock_settings.AWS_SECRET_ACCESS_KEY = "testing"
-            yield S3_TEST_BUCKET
+def pytest_addoption(parser):
+    parser.addoption(
+        "--preprocess-dir",
+        default=None,
+        help="Override fixture folder for preprocessor tests (must contain cases.yaml + data/)",
+    )
+    parser.addoption(
+        "--runner-dir",
+        default=None,
+        help="Override fixture folder for agent_runner_v2 eval tests (must contain cases.yaml + data/)",
+    )
+    parser.addoption(
+        "--journey-dir",
+        default=None,
+        help="Override fixture folder for journey_v2 eval tests (must contain cases.yaml + data/)",
+    )
--- a/tests/fixtures/agent_runner_v2/cases.yaml
+++ b/tests/fixtures/agent_runner_v2/cases.yaml
@@ -0,0 +1,86 @@
+# Agent Runner V2 — eval test cases (Step 2, requires real LLM)
+#
+# Each case drives one parametrized `test_eval_runner` invocation.
+#
+# Keys
+# ----
+# id: str                     unique identifier shown in pytest output
+# description: str            human-readable label
+# file: str                   filename inside data/
+# file_path: str              path reported to the executor (affects project-matching via filename)
+# projects: [alpha|beta]      symbolic project names resolved by the test helper
+#
+# Optional pre-existing records (dedup tests)
+# existing_tasks:             list of {id, title, status, priority}
+# existing_notes:             list of {id, title, content}
+# existing_timelines:         list of {id, title, date}
+#
+# Assertions (one or more)
+# expect_insert: <table>      at least 1 insert row in this table (tasks|notes|timelines)
+# expect_no_insert: true      zero inserts in any table
+# expect_project_id: <id>     any insert must carry this projectId
+# expect_dedup: true          task inserts == 0 OR task updates >= 1 (dedup check)
+#
+# Langfuse
+# score_name: str             observation score name
+
+- id: "2.1"
+  description: "Action email → create_task"
+  file: email_action.html
+  file_path: /emails/ProjectAlpha_action.html
+  projects: [alpha, beta]
+  expect_insert: tasks
+  score_name: runner.email_to_task
+
+- id: "2.2"
+  description: "Informational email → create_note"
+  file: email_info.html
+  file_path: /emails/ProjectAlpha_info.html
+  projects: [alpha, beta]
+  expect_insert: notes
+  score_name: runner.email_to_note
+
+- id: "2.3"
+  description: "Email with meeting date → create_timeline"
+  file: email_date.html
+  file_path: /emails/ProjectAlpha_kickoff.html
+  projects: [alpha, beta]
+  expect_insert: timelines
+  score_name: runner.email_to_timeline
+
+- id: "2.4"
+  description: "Filename contains project name → correct project assigned"
+  file: email_action.html
+  file_path: /emails/ProjectAlpha_report.html
+  projects: [alpha, beta]
+  expect_project_id: proj-alpha
+  score_name: runner.project_filename
+
+- id: "2.5"
+  description: "Email body mentions project → correct project assigned"
+  file: email_action.html
+  file_path: /emails/email_001.html
+  projects: [alpha, beta]
+  expect_project_id: proj-alpha
+  score_name: runner.project_content
+
+- id: "2.6"
+  description: "Newsletter + global rule no-project → no creates"
+  file: email_no_project.html
+  file_path: /emails/newsletter.html
+  projects: [alpha, beta]
+  expect_no_insert: true
+  score_name: runner.no_project
+
+- id: "2.7"
+  description: "Existing task with same title → dedup (update not create)"
+  file: email_action.html
+  file_path: /emails/ProjectAlpha_followup.html
+  projects: [alpha]
+  existing_tasks:
+    - id: task-existing
+      title: Fix the login bug
+      status: todo
+      priority: medium
+  expect_dedup: true
+  score_name: runner.dedup
--- a/tests/fixtures/agent_runner_v2/data/email_action.html
+++ b/tests/fixtures/agent_runner_v2/data/email_action.html
@@ -0,0 +1,7 @@
+<html><head></head><body>
+<p><b>From:</b> boss@company.com</p>
+<p><b>To:</b> dev@company.com</p>
+<p><b>Subject:</b> Fix the login bug</p>
+<p><b>Date:</b> 2026-04-07</p>
+<p>Hi,<br>Please fix the login bug in Project Alpha by Friday. High priority!</p>
+</body></html>
--- a/tests/fixtures/agent_runner_v2/data/email_date.html
+++ b/tests/fixtures/agent_runner_v2/data/email_date.html
@@ -0,0 +1,5 @@
+<html><head></head><body>
+<p><b>From:</b> pm@company.com</p>
+<p><b>Subject:</b> Project Alpha kick-off meeting</p>
+<p>The kick-off meeting for Project Alpha is scheduled for 2026-04-15 at 10:00.</p>
+</body></html>
--- a/tests/fixtures/agent_runner_v2/data/email_info.html
+++ b/tests/fixtures/agent_runner_v2/data/email_info.html
@@ -0,0 +1,7 @@
+<html><head></head><body>
+<p><b>From:</b> pm@company.com</p>
+<p><b>To:</b> team@company.com</p>
+<p><b>Subject:</b> FYI: New policy for Project Alpha</p>
+<p>Just a heads-up that starting next week all code reviews must be done
+within 24 hours for Project Alpha. No action needed from you now.</p>
+</body></html>
--- a/tests/fixtures/agent_runner_v2/data/email_no_project.html
+++ b/tests/fixtures/agent_runner_v2/data/email_no_project.html
@@ -0,0 +1,5 @@
+<html><head></head><body>
+<p><b>From:</b> newsletter@ads.com</p>
+<p><b>Subject:</b> Weekly newsletter</p>
+<p>Check out our latest deals on electronics!</p>
+</body></html>
--- a/tests/fixtures/journey_v2/cases.yaml
+++ b/tests/fixtures/journey_v2/cases.yaml
@@ -0,0 +1,87 @@
+# Journey V2 eval test cases — Step 4
+#
+# Each case simulates a complete journey session:
+#   1. handle_journey_start is called with directory + data_types
+#   2. handle_journey_message is called for each entry in user_messages
+#   3. Assertions are evaluated on the final reply
+#
+# directory_files: list of {path, content_file} — content_file is relative to data/
+#
+# Assertion keys:
+#   expect_question: true          → first reply must contain "?"
+#   expect_done: true              → final reply must have done=True
+#   expect_valid_config: true      → agent_config must be parseable as AgentConfig with content_types > 0
+#   expect_content_type_id: <str>  → AgentConfig.content_types must contain an entry with this id
+#   expect_extraction_contains: <str> → first content_type extraction_prompt must contain this word
+#   expect_global_rules: true      → AgentConfig.global_rules must be non-empty
+
+- id: "4.1"
+  description: "Journey start explores directory, first reply contains a question"
+  directory: "/test/emails"
+  data_types: ["tasks", "notes", "timelines"]
+  directory_files:
+    - path: "/test/emails/outlook_export_2024.html"
+      content_file: "email_action.html"
+  user_messages: []
+  score_name: "journey.start"
+  expect_question: true
+
+- id: "4.2"
+  description: "Full 3-turn conversation produces a valid AgentConfig JSON"
+  directory: "/test/emails"
+  data_types: ["tasks", "notes", "timelines"]
+  directory_files:
+    - path: "/test/emails/email_backup.html"
+      content_file: "email_action.html"
+  user_messages:
+    - "These are email exports from Outlook in HTML format"
+    - "Create tasks for emails with direct action requests, notes for informational emails"
+    - "Yes, that looks correct. No other rules."
+  score_name: "journey.valid_json"
+  expect_done: true
+  expect_valid_config: true
+
+- id: "4.3"
+  description: "Journey detects email_html content type from directory exploration"
+  directory: "/test/emails"
+  data_types: ["tasks", "notes"]
+  directory_files:
+    - path: "/test/emails/message.html"
+      content_file: "email_action.html"
+  user_messages:
+    - "HTML email backups from my mail client, exported from Outlook"
+    - "Create tasks from emails that contain assignments or direct action items"
+    - "Correct, no other rules needed"
+  score_name: "journey.detect_email"
+  expect_done: true
+  expect_content_type_id: "email_html"
+
+- id: "4.4"
+  description: "Custom user rule (only notes, no tasks) reflected in extraction_prompt"
+  directory: "/test/emails"
+  data_types: ["notes"]
+  directory_files:
+    - path: "/test/emails/email.html"
+      content_file: "email_info.html"
+  user_messages:
+    - "HTML emails from my work inbox"
+    - "Create only notes from all emails — I do not want tasks or timelines to be created"
+    - "Yes, exactly"
+  score_name: "journey.custom_rules"
+  expect_done: true
+  expect_extraction_contains: "note"
+
+- id: "4.5"
+  description: "Global rule (no project = no entity) appears in AgentConfig.global_rules"
+  directory: "/test/emails"
+  data_types: ["tasks", "notes"]
+  directory_files:
+    - path: "/test/emails/email.html"
+      content_file: "email_action.html"
+  user_messages:
+    - "Email backups from Outlook"
+    - "Create tasks from action request emails, notes from informational emails"
+    - "If the email cannot be matched to any project, do not create any entity at all"
+  score_name: "journey.global_rules"
+  expect_done: true
+  expect_global_rules: true
--- a/tests/fixtures/journey_v2/data/email_action.html
+++ b/tests/fixtures/journey_v2/data/email_action.html
@@ -0,0 +1,23 @@
+<!DOCTYPE html>
+<html>
+<head>
+  <meta charset="UTF-8">
+  <title>Email: Fix the login bug</title>
+  <style>body { font-family: Arial; } .header { color: #666; }</style>
+</head>
+<body>
+  <div class="header">
+    <p><strong>From:</strong> boss@company.com</p>
+    <p><strong>To:</strong> dev@company.com</p>
+    <p><strong>Subject:</strong> Fix the login bug</p>
+    <p><strong>Date:</strong> Mon, 7 Apr 2026 09:15:00 +0000</p>
+  </div>
+  <div class="body">
+    <p>Hi,</p>
+    <p>Please fix the login bug in Project Alpha as soon as possible.
+    Users are reporting that they can't log in with their Google accounts.
+    This is blocking the whole team. Please resolve it by Friday.</p>
+    <p>Thanks,<br>Boss</p>
+  </div>
+</body>
+</html>
--- a/tests/fixtures/journey_v2/data/email_info.html
+++ b/tests/fixtures/journey_v2/data/email_info.html
@@ -0,0 +1,23 @@
+<!DOCTYPE html>
+<html>
+<head>
+  <meta charset="UTF-8">
+  <title>Email: New policy update</title>
+  <style>body { font-family: Arial; }</style>
+</head>
+<body>
+  <div class="header">
+    <p><strong>From:</strong> hr@company.com</p>
+    <p><strong>To:</strong> all@company.com</p>
+    <p><strong>Subject:</strong> FYI: New remote work policy effective May 1</p>
+    <p><strong>Date:</strong> Tue, 8 Apr 2026 10:00:00 +0000</p>
+  </div>
+  <div class="body">
+    <p>Hi everyone,</p>
+    <p>Just a heads-up that starting May 1, 2026 the company will be moving to
+    a hybrid work model. You will be expected to come into the office at least
+    two days per week. More details will follow in the employee handbook.</p>
+    <p>Best,<br>HR Team</p>
+  </div>
+</body>
+</html>
--- a/tests/fixtures/preprocessors/cases.yaml
+++ b/tests/fixtures/preprocessors/cases.yaml
@@ -0,0 +1,68 @@
+# Preprocessor test cases
+#
+# detect: <expected_type>   → calls detect_content_type(filename, content)
+# process: <content_type>   → calls preprocess(content_type, content)
+#
+# Source: file: <name in data/>  or  generate: binary_noise
+#
+# Flat assertions (process only):
+#   no_html: true           clean_text contains no HTML tags
+#   min_chars: N            len(clean_text) >= N
+#   ratio_lt: F             len(clean) / len(raw) < F
+#   has_meta: [k, ...]      keys present in metadata
+#   contains: str | [str]   substring(s) present in clean_text
+#   excludes: str | [str]   substring(s) absent from clean_text
+#   content_type: str       result.content_type == this value
+
+- id: "1.1"
+  file: email_action.html
+  detect: email_html
+
+- id: "1.2"
+  file: generic_page.html
+  detect: generic_html
+
+- id: "1.3"
+  file: notes.txt
+  detect: plain_text
+
+- id: "1.4"
+  file: archive.xyz
+  generate: binary_noise
+  detect: unknown
+
+- id: "1.5"
+  file: email_action.html
+  process: email_html
+  no_html: true
+  min_chars: 50
+  ratio_lt: 0.8
+
+- id: "1.6"
+  file: email_action.html
+  process: email_html
+  has_meta: [subject, from]
+
+- id: "1.7"
+  file: email_thread.html
+  process: email_html
+  contains: "Sure, I'll handle the deploy"
+  excludes: "Let's plan the deploy"
+
+- id: "1.8"
+  file: email_single.html
+  process: email_html
+  contains: "deploy is done"
+
+- id: "1.9"
+  file: email_heavy.html
+  process: email_html
+  no_html: true
+  min_chars: 30
+  excludes: [border-collapse, font-size]
+
+- id: "1.10"
+  file: fallback.txt
+  process: unknown
+  min_chars: 1
+  content_type: unknown
--- a/tests/fixtures/preprocessors/data/email_action.html
+++ b/tests/fixtures/preprocessors/data/email_action.html
@@ -0,0 +1,25 @@
+<!DOCTYPE html>
+<html>
+<head>
+  <title>Fix the login bug</title>
+  <style>
+    body { font-family: Arial, sans-serif; color: #333; margin: 0; padding: 20px; }
+    .header { background: #f5f5f5; padding: 10px; border-bottom: 1px solid #ddd; }
+    .body { padding: 20px; }
+  </style>
+</head>
+<body>
+  <div class="header">
+    <p><strong>From:</strong> boss@company.com</p>
+    <p><strong>To:</strong> dev@company.com</p>
+    <p><strong>Subject:</strong> Fix the login bug</p>
+    <p><strong>Date:</strong> Mon, 7 Apr 2026 09:00:00 +0200</p>
+  </div>
+  <div class="body">
+    <p>Hi,</p>
+    <p>Please fix the login bug by Friday. It is blocking the release.</p>
+    <p>Priority: high. Let me know if you need anything.</p>
+    <p>Thanks,<br>Boss</p>
+  </div>
+</body>
+</html>
--- a/tests/fixtures/preprocessors/data/email_heavy.html
+++ b/tests/fixtures/preprocessors/data/email_heavy.html
@@ -0,0 +1,49 @@
+<!DOCTYPE html>
+<html>
+<head>
+<style>
+  table { border-collapse: collapse; width: 100%; max-width: 600px; margin: 0 auto; }
+  td { padding: 8px 12px; border: 1px solid #dddddd; font-size: 12px; color: #444444; }
+  .header-row { background-color: #003366; color: #ffffff; font-weight: bold; }
+  .label-col { background-color: #f0f0f0; width: 80px; font-weight: bold; }
+  .footer-row { font-size: 10px; color: #999999; text-align: center; }
+</style>
+</head>
+<body bgcolor="#eeeeee">
+<center>
+<table cellpadding="0" cellspacing="0">
+  <tr class="header-row">
+    <td colspan="2">Company Internal Update</td>
+  </tr>
+  <tr>
+    <td class="label-col">From:</td>
+    <td>newsletter@corp.com</td>
+  </tr>
+  <tr>
+    <td class="label-col">Subject:</td>
+    <td>Q1 Results Update</td>
+  </tr>
+  <tr>
+    <td class="label-col">Date:</td>
+    <td>Apr 7, 2026</td>
+  </tr>
+  <tr>
+    <td colspan="2">
+      <table width="100%" cellpadding="10">
+        <tr>
+          <td>
+            <p style="font-size:14px; font-weight:bold;">Dear Team,</p>
+            <p>Q1 results are in. Revenue up 15% year-over-year.</p>
+            <p>Please review the attached report and share any feedback by EOW.</p>
+          </td>
+        </tr>
+      </table>
+    </td>
+  </tr>
+  <tr class="footer-row">
+    <td colspan="2">Confidential — do not forward outside the company.</td>
+  </tr>
+</table>
+</center>
+</body>
+</html>
--- a/tests/fixtures/preprocessors/data/email_single.html
+++ b/tests/fixtures/preprocessors/data/email_single.html
@@ -0,0 +1,8 @@
+<!DOCTYPE html>
+<html><body>
+  <p><strong>From:</strong> alice@co.com</p>
+  <p><strong>To:</strong> team@co.com</p>
+  <p><strong>Subject:</strong> Quick update</p>
+  <p><strong>Date:</strong> Tue, 7 Apr 2026 10:30:00 +0200</p>
+  <p>The deploy is done. Everything looks good. No issues so far.</p>
+</body></html>
--- a/tests/fixtures/preprocessors/data/email_thread.html
+++ b/tests/fixtures/preprocessors/data/email_thread.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html>
+<html><body>
+  <div class="message-latest">
+    <p><strong>From:</strong> alice@co.com</p>
+    <p><strong>Subject:</strong> Re: Re: Deploy plan</p>
+    <p>Sure, I'll handle the deploy.</p>
+  </div>
+
+  <p>On Mon, Apr 6, 2026 at 3:00 PM, Bob &lt;bob@co.com&gt; wrote:</p>
+  <blockquote>
+    <p>From: bob@co.com</p>
+    <p>Can you handle the deploy?</p>
+    <p>On Sun, Apr 5, 2026 at 1:00 PM, Alice &lt;alice@co.com&gt; wrote:</p>
+    <blockquote>
+      <p>From: alice@co.com</p>
+      <p>Let's plan the deploy for Monday.</p>
+      <p>On Sat, Apr 4, 2026 at 11:00 AM, Charlie &lt;charlie@co.com&gt; wrote:</p>
+      <blockquote>
+        <p>From: charlie@co.com</p>
+        <p>We need to schedule the deploy. What day works?</p>
+      </blockquote>
+    </blockquote>
+  </blockquote>
+</body></html>
--- a/tests/fixtures/preprocessors/data/fallback.txt
+++ b/tests/fixtures/preprocessors/data/fallback.txt
@@ -0,0 +1,3 @@
+random text content without any structure
+line two with some words
+line three and more content here
--- a/tests/fixtures/preprocessors/data/generic_page.html
+++ b/tests/fixtures/preprocessors/data/generic_page.html
@@ -0,0 +1,35 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <title>My Web App</title>
+  <link rel="stylesheet" href="styles.css">
+</head>
+<body>
+  <nav>
+    <a href="/">Home</a>
+    <a href="/about">About</a>
+    <a href="/contact">Contact</a>
+  </nav>
+  <main>
+    <header>
+      <h1>Welcome to My App</h1>
+    </header>
+    <article>
+      <p>This is a generic web page with no email headers.</p>
+      <p>It has navigation, main content, and a footer.</p>
+    </article>
+    <section>
+      <h2>Features</h2>
+      <ul>
+        <li>Fast</li>
+        <li>Reliable</li>
+        <li>Secure</li>
+      </ul>
+    </section>
+  </main>
+  <footer>
+    <p>&copy; 2026 My App</p>
+  </footer>
+</body>
+</html>
--- a/tests/fixtures/preprocessors/data/notes.txt
+++ b/tests/fixtures/preprocessors/data/notes.txt
@@ -0,0 +1,15 @@
+Meeting notes - April 7, 2026
+
+Attendees: Alice, Bob, Charlie
+
+Discussion points:
+- Deploy scheduled for Friday
+- Bug fix for login must be completed by Thursday
+- Review Q1 numbers before EOW
+
+Action items:
+- Alice: fix login bug
+- Bob: prepare deploy checklist
+- Charlie: send Q1 report
+
+Next meeting: April 14, 2026
--- a/tests/test_agent_registry.py
+++ b/tests/test_agent_registry.py
@@ -1,214 +0,0 @@
-"""Unit tests for the agent registry, base classes, and tool loop."""
-
-from __future__ import annotations
-
-from typing import Any
-from unittest.mock import AsyncMock, MagicMock
-
-import pytest
-
-from app.core.agent_registry import AgentRegistry, ChatAgent
-
-
-# ── Helpers ──────────────────────────────────────────────────────────
-
-class _StubAgent(ChatAgent):
-    """Minimal concrete agent for testing."""
-
-    def get_name(self) -> str:
-        return "stub"
-
-    def get_description(self) -> str:
-        return "A stub agent for tests"
-
-    def get_tools(self) -> list[Any]:
-        return []
-
-    async def handle(self, query: str, context: dict[str, Any]) -> str:
-        return f"echo: {query}"
-
-
-class _AnotherAgent(ChatAgent):
-    def get_name(self) -> str:
-        return "another"
-
-    def get_description(self) -> str:
-        return "Another stub"
-
-    def get_tools(self) -> list[Any]:
-        return []
-
-    async def handle(self, query: str, context: dict[str, Any]) -> str:
-        return "another"
-
-
-# ── Fixtures ─────────────────────────────────────────────────────────
-
-@pytest.fixture(autouse=True)
-def _fresh_registry():
-    """Reset the singleton between tests."""
-    AgentRegistry._instance = None
-    yield
-    AgentRegistry._instance = None
-
-
-@pytest.fixture()
-def reg() -> AgentRegistry:
-    return AgentRegistry()
-
-
-# ── Tests ────────────────────────────────────────────────────────────
-
-class TestRegisterAndGet:
-    def test_register_decorator(self, reg: AgentRegistry) -> None:
-        reg.register(_StubAgent)
-        agent = reg.get("stub")
-        assert isinstance(agent, _StubAgent)
-
-    def test_get_unknown_raises(self, reg: AgentRegistry) -> None:
-        with pytest.raises(KeyError, match="not found"):
-            reg.get("nonexistent")
-
-    def test_register_multiple(self, reg: AgentRegistry) -> None:
-        reg.register(_StubAgent)
-        reg.register(_AnotherAgent)
-        assert reg.get("stub").get_name() == "stub"
-        assert reg.get("another").get_name() == "another"
-
-
-class TestListAgents:
-    def test_empty(self, reg: AgentRegistry) -> None:
-        assert reg.list_agents() == []
-
-    def test_list_after_register(self, reg: AgentRegistry) -> None:
-        reg.register(_StubAgent)
-        agents = reg.list_agents()
-        assert len(agents) == 1
-        assert agents[0] == {"name": "stub", "description": "A stub agent for tests"}
-
-    def test_list_multiple(self, reg: AgentRegistry) -> None:
-        reg.register(_StubAgent)
-        reg.register(_AnotherAgent)
-        names = {a["name"] for a in reg.list_agents()}
-        assert names == {"stub", "another"}
-
-
-class TestCallAgent:
-    @pytest.mark.asyncio
-    async def test_call_agent(self, reg: AgentRegistry) -> None:
-        reg.register(_StubAgent)
-        result = await reg.call_agent("stub", "hello", {})
-        assert result == "echo: hello"
-
-    @pytest.mark.asyncio
-    async def test_call_unknown_raises(self, reg: AgentRegistry) -> None:
-        with pytest.raises(KeyError):
-            await reg.call_agent("nope", "hi", {})
-
-
-class TestSingleton:
-    def test_singleton_identity(self) -> None:
-        a = AgentRegistry()
-        b = AgentRegistry()
-        assert a is b
-
-
-class TestToolLoop:
-    @pytest.mark.asyncio
-    async def test_no_tool_calls(self) -> None:
-        """When the LLM responds without tool calls, return content directly."""
-        agent = _StubAgent()
-
-        ai_msg = MagicMock()
-        ai_msg.content = "final answer"
-        ai_msg.tool_calls = []
-
-        llm = AsyncMock()
-        llm.bind_tools = MagicMock(return_value=llm)
-        llm.ainvoke = AsyncMock(return_value=ai_msg)
-
-        result = await agent._tool_loop(llm, [], [])
-        assert result == "final answer"
-
-    @pytest.mark.asyncio
-    async def test_tool_call_then_answer(self) -> None:
-        """LLM requests one tool call, gets result, then answers."""
-        agent = _StubAgent()
-
-        # First response: tool call
-        tool_call_msg = MagicMock()
-        tool_call_msg.content = ""
-        tool_call_msg.tool_calls = [
-            {"id": "call_1", "name": "my_tool", "args": {"x": 1}}
-        ]
-
-        # Second response: final answer
-        final_msg = MagicMock()
-        final_msg.content = "done"
-        final_msg.tool_calls = []
-
-        llm = AsyncMock()
-        llm.bind_tools = MagicMock(return_value=llm)
-        llm.ainvoke = AsyncMock(side_effect=[tool_call_msg, final_msg])
-
-        # Mock tool
-        tool = AsyncMock()
-        tool.name = "my_tool"
-        tool.ainvoke = AsyncMock(return_value="tool_result")
-
-        result = await agent._tool_loop(llm, [], [tool])
-        assert result == "done"
-        tool.ainvoke.assert_called_once_with({"x": 1})
-
-    @pytest.mark.asyncio
-    async def test_unknown_tool_handled(self) -> None:
-        """Unknown tool names produce an error message instead of crashing."""
-        agent = _StubAgent()
-
-        tool_call_msg = MagicMock()
-        tool_call_msg.content = ""
-        tool_call_msg.tool_calls = [
-            {"id": "call_1", "name": "missing", "args": {}}
-        ]
-
-        final_msg = MagicMock()
-        final_msg.content = "recovered"
-        final_msg.tool_calls = []
-
-        llm = AsyncMock()
-        llm.bind_tools = MagicMock(return_value=llm)
-        llm.ainvoke = AsyncMock(side_effect=[tool_call_msg, final_msg])
-
-        result = await agent._tool_loop(llm, [], [])
-        assert result == "recovered"
-
-    @pytest.mark.asyncio
-    async def test_max_iter_reached(self) -> None:
-        """When max iterations are exhausted, a final no-tools call is made."""
-        agent = _StubAgent()
-
-        # Every response requests a tool call
-        loop_msg = MagicMock()
-        loop_msg.content = ""
-        loop_msg.tool_calls = [
-            {"id": "call_x", "name": "t", "args": {}}
-        ]
-
-        final_msg = MagicMock()
-        final_msg.content = "gave up"
-        final_msg.tool_calls = []
-
-        tool = AsyncMock()
-        tool.name = "t"
-        tool.ainvoke = AsyncMock(return_value="ok")
-
-        llm_with_tools = AsyncMock()
-        llm_with_tools.ainvoke = AsyncMock(return_value=loop_msg)
-
-        llm = AsyncMock()
-        llm.bind_tools = MagicMock(return_value=llm_with_tools)
-        llm.ainvoke = AsyncMock(return_value=final_msg)
-
-        result = await agent._tool_loop(llm, [], [tool], max_iter=2)
-        assert result == "gave up"
-        assert llm_with_tools.ainvoke.call_count == 2
--- a/tests/test_agent_runner.py
+++ b/tests/test_agent_runner.py
@@ -0,0 +1,810 @@
+"""Tests for Step 3.4: agent_runner module.
+
+Coverage:
+  Unit:
+    - _is_overdue      — cron schedule overdue detection
+    - _extract_items_from_content — LLM extraction + JSON parsing + validation
+    - _send_insert_to_client      — tool_call frame construction + timeout
+    - run_local_agent             — end-to-end local agent happy path
+    - run_local_agent             — device offline path
+    - run_local_agent             — file-read timeout path
+    - run_local_agent             — LLM extraction error path
+    - run_cloud_agent             — device offline and missing OAuth token paths
+    - trigger_pending_runs        — skipped when config is client-owned
+    - trigger_pending_runs        — non-overdue skipped
+    - trigger_pending_runs        — device_id filter for local agents
+
+  Integration:
+    - POST /agents/can-create     — billing eligibility check
+    - POST /agents/trigger        — creates run log + dispatches background task
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import uuid
+from datetime import datetime, timezone
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+import pytest_asyncio
+
+from app.core.agent_runner import (
+    _extract_items_from_content,
+    _is_overdue,
+    _send_insert_to_client,
+    run_cloud_agent,
+    run_local_agent,
+    trigger_pending_runs,
+)
+from app.core.device_manager import DeviceConnectionManager
+from app.db import get_session
+from app.main import app
+from app.models import AgentRunLog, CloudAgentConfig, LocalAgentConfig
+from tests.conftest import TEST_USER_IDS, auth_header
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+_FREE_UID = TEST_USER_IDS["free"]
+_PRO_UID = TEST_USER_IDS["pro"]
+
+
+def _make_local_config(user_id: str = _FREE_UID, device_id: str = "dev-001") -> LocalAgentConfig:
+    return LocalAgentConfig(
+        id=str(uuid.uuid4()),
+        user_id=user_id,
+        device_id=device_id,
+        name="Test Local Agent",
+        directory_paths=["/home/user/emails"],
+        data_types=["tasks", "notes"],
+        prompt_template="Extract tasks and notes from this document.",
+        file_extensions=[".txt", ".eml"],
+        schedule_cron="0 */6 * * *",
+        enabled=True,
+        last_run_at=None,
+    )
+
+
+def _make_cloud_config(user_id: str = _FREE_UID) -> CloudAgentConfig:
+    return CloudAgentConfig(
+        id=str(uuid.uuid4()),
+        user_id=user_id,
+        provider="gmail",
+        name="Test Gmail Agent",
+        data_types=["tasks"],
+        prompt_template="Extract tasks from email.",
+        schedule_cron="0 */6 * * *",
+        enabled=True,
+        last_run_at=None,
+    )
+
+
+def _make_run_log(agent_id: str, agent_type: str = "local", user_id: str = _FREE_UID) -> AgentRunLog:
+    return AgentRunLog(
+        id=str(uuid.uuid4()),
+        agent_id=agent_id,
+        agent_type=agent_type,
+        user_id=user_id,
+        status="running",
+        started_at=datetime.now(timezone.utc),
+    )
+
+
+def _make_manager(user_id: str = _FREE_UID, device_id: str = "dev-001") -> DeviceConnectionManager:
+    mgr = DeviceConnectionManager()
+    ws = MagicMock()
+    ws.send_text = AsyncMock()
+    mgr.register(user_id, device_id, ws)
+    return mgr
+
+
+# ---------------------------------------------------------------------------
+# _is_overdue
+# ---------------------------------------------------------------------------
+
+def test_is_overdue_never_run():
+    """An agent that has never run is always overdue."""
+    assert _is_overdue("0 */6 * * *", None) is True
+
+
+def test_is_overdue_very_recently_run():
+    """An agent that just ran is not overdue."""
+    last = datetime.now(timezone.utc)
+    assert _is_overdue("0 */6 * * *", last) is False
+
+
+def test_is_overdue_long_ago():
+    """An agent last run 2 days ago with a 6-hour schedule is overdue."""
+    from datetime import timedelta
+    last = datetime.now(timezone.utc) - timedelta(days=2)
+    assert _is_overdue("0 */6 * * *", last) is True
+
+
+def test_is_overdue_invalid_cron_returns_false():
+    """Unparseable cron must not raise and should return False (fail-safe)."""
+    assert _is_overdue("not a cron", None) is False
+
+
+def test_is_overdue_naive_datetime():
+    """Naive datetime objects are handled without raising."""
+    from datetime import timedelta
+    last = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(days=1)  # naive
+    # Should not raise.
+    result = _is_overdue("0 */6 * * *", last)
+    assert isinstance(result, bool)
+
+
+# ---------------------------------------------------------------------------
+# _extract_items_from_content
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_extract_items_happy_path():
+    """LLM returns valid JSON array; items with allowed tables are returned."""
+    mock_llm = MagicMock()
+    mock_response = MagicMock()
+    mock_response.content = json.dumps([
+        {"table": "tasks", "data": {"title": "Buy milk", "priority": "high"}},
+        {"table": "notes", "data": {"title": "Meeting recap", "content": "Discussed roadmap"}},
+    ])
+    mock_llm.ainvoke = AsyncMock(return_value=mock_response)
+
+    with patch("app.core.agent_runner.get_llm", return_value=mock_llm):
+        items = await _extract_items_from_content(
+            "Extract tasks and notes.",
+            "Email body: Buy milk urgently. Notes from meeting: discussed roadmap.",
+            ["tasks", "notes"],
+        )
+
+    assert len(items) == 2
+    assert items[0]["table"] == "tasks"
+    assert items[0]["data"]["title"] == "Buy milk"
+    assert items[1]["table"] == "notes"
+
+
+@pytest.mark.asyncio
+async def test_extract_items_strips_forbidden_fields():
+    """Fields like id, createdAt, isAiSuggested must be stripped from extracted data."""
+    mock_llm = MagicMock()
+    mock_response = MagicMock()
+    mock_response.content = json.dumps([
+        {
+            "table": "tasks",
+            "data": {
+                "title": "Review PR",
+                "id": "should-be-removed",
+                "createdAt": 99999,
+                "isAiSuggested": 0,
+                "isApproved": 1,
+            },
+        }
+    ])
+    mock_llm.ainvoke = AsyncMock(return_value=mock_response)
+
+    with patch("app.core.agent_runner.get_llm", return_value=mock_llm):
+        items = await _extract_items_from_content("Extract tasks.", "Review the PR.", ["tasks"])
+
+    assert len(items) == 1
+    data = items[0]["data"]
+    assert "id" not in data
+    assert "createdAt" not in data
+    assert "isAiSuggested" not in data
+    assert "isApproved" not in data
+    assert data["title"] == "Review PR"
+
+
+@pytest.mark.asyncio
+async def test_extract_items_invalid_json_returns_empty():
+    """LLM returning invalid JSON must return empty list without raising."""
+    mock_llm = MagicMock()
+    mock_response = MagicMock()
+    mock_response.content = "Sorry, I cannot extract anything."
+    mock_llm.ainvoke = AsyncMock(return_value=mock_response)
+
+    with patch("app.core.agent_runner.get_llm", return_value=mock_llm):
+        items = await _extract_items_from_content("Extract tasks.", "content", ["tasks"])
+
+    assert items == []
+
+
+@pytest.mark.asyncio
+async def test_extract_items_disallowed_table_filtered():
+    """Items whose table is not in data_types are discarded."""
+    mock_llm = MagicMock()
+    mock_response = MagicMock()
+    mock_response.content = json.dumps([
+        {"table": "tasks", "data": {"title": "Valid task"}},
+        {"table": "projects", "data": {"name": "Should be filtered"}},
+    ])
+    mock_llm.ainvoke = AsyncMock(return_value=mock_response)
+
+    with patch("app.core.agent_runner.get_llm", return_value=mock_llm):
+        # Only "tasks" is in data_types — "projects" should be filtered.
+        items = await _extract_items_from_content("Extract.", "content", ["tasks"])
+
+    assert len(items) == 1
+    assert items[0]["table"] == "tasks"
+
+
+@pytest.mark.asyncio
+async def test_extract_items_empty_data_types_returns_empty():
+    """If no allowed data_types match, skip LLM call and return immediately."""
+    mock_llm = MagicMock()
+    mock_llm.ainvoke = AsyncMock()
+
+    with patch("app.core.agent_runner.get_llm", return_value=mock_llm):
+        items = await _extract_items_from_content("Extract.", "content", [])
+
+    mock_llm.ainvoke.assert_not_called()
+    assert items == []
+
+
+@pytest.mark.asyncio
+async def test_extract_items_llm_error_propagates():
+    """LLM API errors propagate so the caller (run_local_agent) can record them."""
+    mock_llm = MagicMock()
+    mock_llm.ainvoke = AsyncMock(side_effect=RuntimeError("API unavailable"))
+
+    with patch("app.core.agent_runner.get_llm", return_value=mock_llm):
+        with pytest.raises(RuntimeError, match="API unavailable"):
+            await _extract_items_from_content("Extract tasks.", "content", ["tasks"])
+
+
+# ---------------------------------------------------------------------------
+# _send_insert_to_client
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_send_insert_to_client_happy_path():
+    """Frame is sent with isAiSuggested/isApproved added; result is returned."""
+    mgr = _make_manager()
+
+    sent_payloads: list[dict] = []
+    original_send = mgr.send_frame
+
+    async def _capture_send(uid: str, frame: dict) -> None:
+        sent_payloads.append(frame)
+        # Immediately resolve the pending call with a success result.
+        call_id = frame["id"]
+        mgr.resolve_pending_call(uid, call_id, {"row": {"id": "new-id", "title": "Buy milk"}})
+
+    mgr.send_frame = _capture_send  # type: ignore[method-assign]
+
+    result = await _send_insert_to_client(
+        _FREE_UID, "tasks", {"title": "Buy milk", "priority": "high"}, mgr
+    )
+
+    assert len(sent_payloads) == 1
+    payload = sent_payloads[0]
+    assert payload["action"] == "insert"
+    assert payload["table"] == "tasks"
+    assert payload["data"]["title"] == "Buy milk"
+    assert payload["data"]["isAiSuggested"] == 1
+    assert payload["data"]["isApproved"] == 0
+    assert result["row"]["title"] == "Buy milk"
+
+
+@pytest.mark.asyncio
+async def test_send_insert_to_client_timeout():
+    """asyncio.TimeoutError is raised when Electron does not respond."""
+    mgr = _make_manager()
+
+    async def _slow_send(uid: str, frame: dict) -> None:
+        # Never resolve the pending call.
+        pass
+
+    mgr.send_frame = _slow_send  # type: ignore[method-assign]
+
+    with patch("app.core.agent_runner._INSERT_TIMEOUT", 0.05):
+        with pytest.raises(asyncio.TimeoutError):
+            await _send_insert_to_client(_FREE_UID, "tasks", {"title": "X"}, mgr)
+
+
+# ---------------------------------------------------------------------------
+# run_local_agent
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_run_local_agent_device_offline():
+    """run_local_agent marks run as error when device is offline."""
+    config = _make_local_config()
+    run_log = _make_run_log(config.id)
+    mgr = DeviceConnectionManager()  # Empty — no device registered.
+
+    with patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_finalize:
+        await run_local_agent(_FREE_UID, config, run_log, mgr)
+
+    mock_finalize.assert_called_once()
+    _args, kwargs = mock_finalize.call_args
+    assert kwargs["status"] == "error"
+    assert any("not connected" in e for e in kwargs["errors"])
+
+
+@pytest.mark.asyncio
+async def test_run_local_agent_happy_path():
+    """End-to-end: files received, LLM extracts one task, insert sent + ack'd."""
+    config = _make_local_config()
+    run_log = _make_run_log(config.id)
+    mgr = _make_manager()
+
+    # Build a fake agent_data frame (will be queued after send).
+    file_frame = {
+        "type": "agent_data",
+        "run_id": run_log.id,
+        "files": [{"path": "/email.eml", "content": "Urgent: fix the bug by Friday."}],
+    }
+    agent_complete_frame = None  # sentinel
+
+    sent_frames: list[dict] = []
+
+    async def _mock_send(uid: str, frame: dict) -> None:
+        sent_frames.append(frame)
+        if frame.get("type") == "agent_run":
+            # Simulate Electron responding with file data then agent_complete.
+            q = mgr.get_agent_data_queue(uid, frame["run_id"])
+            await q.put(file_frame)
+            await q.put(agent_complete_frame)
+        elif frame.get("type") == "tool_call":
+            # Resolve the pending insert immediately.
+            mgr.resolve_pending_call(uid, frame["id"], {"row": {"id": "new-task", "title": "Fix the bug"}})
+
+    mgr.send_frame = _mock_send  # type: ignore[method-assign]
+
+    mock_llm = MagicMock()
+    mock_response = MagicMock()
+    mock_response.content = json.dumps([
+        {"table": "tasks", "data": {"title": "Fix the bug", "priority": "high"}}
+    ])
+    mock_llm.ainvoke = AsyncMock(return_value=mock_response)
+
+    with patch("app.core.agent_runner.get_llm", return_value=mock_llm), \
+         patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_finalize:
+        await run_local_agent(_FREE_UID, config, run_log, mgr)
+
+    mock_finalize.assert_called_once()
+    _args, kwargs = mock_finalize.call_args
+    assert kwargs["status"] == "success"
+    assert kwargs["items_processed"] == 1
+    assert kwargs["items_created"] == 1
+    assert kwargs["errors"] == []
+    assert kwargs["update_config_last_run"] is False
+
+    # Verify agent_run frame was sent.
+    agent_run_frames = [f for f in sent_frames if f.get("type") == "agent_run"]
+    assert len(agent_run_frames) == 1
+    assert agent_run_frames[0]["agent_id"] == config.id
+    assert "paths" in agent_run_frames[0]["config"]
+
+    # Verify insert frame was sent with AI flags.
+    insert_frames = [f for f in sent_frames if f.get("type") == "tool_call"]
+    assert len(insert_frames) == 1
+    assert insert_frames[0]["data"]["isAiSuggested"] == 1
+    assert insert_frames[0]["data"]["isApproved"] == 0
+
+
+@pytest.mark.asyncio
+async def test_run_local_agent_file_read_timeout():
+    """run_local_agent marks the run as error when the device stalls before sending any files."""
+    config = _make_local_config()
+    run_log = _make_run_log(config.id)
+    mgr = _make_manager()
+
+    async def _mock_send(uid: str, frame: dict) -> None:
+        # Don't put anything in the queue — simulate stalled device.
+        pass
+
+    mgr.send_frame = _mock_send  # type: ignore[method-assign]
+
+    with patch("app.core.agent_runner._FILE_READ_TIMEOUT", 0.1), \
+         patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_finalize:
+        await run_local_agent(_FREE_UID, config, run_log, mgr)
+
+    mock_finalize.assert_called_once()
+    _args, kwargs = mock_finalize.call_args
+    assert kwargs["status"] == "error"  # No items created, so error (not partial).
+    assert any("timed out" in e.lower() for e in kwargs["errors"])
+
+
+@pytest.mark.asyncio
+async def test_run_local_agent_llm_extraction_error():
+    """LLM errors per-file are recorded; run continues for remaining files."""
+    config = _make_local_config()
+    run_log = _make_run_log(config.id)
+    mgr = _make_manager()
+
+    file_frame = {
+        "type": "agent_data",
+        "run_id": run_log.id,
+        "files": [
+            {"path": "/file1.eml", "content": "Email one."},
+            {"path": "/file2.eml", "content": "Email two."},
+        ],
+    }
+
+    async def _mock_send(uid: str, frame: dict) -> None:
+        if frame.get("type") == "agent_run":
+            q = mgr.get_agent_data_queue(uid, frame["run_id"])
+            await q.put(file_frame)
+            await q.put(None)  # agent_complete sentinel
+
+    mgr.send_frame = _mock_send  # type: ignore[method-assign]
+
+    mock_llm = MagicMock()
+    mock_llm.ainvoke = AsyncMock(side_effect=RuntimeError("LLM boom"))
+
+    with patch("app.core.agent_runner.get_llm", return_value=mock_llm), \
+         patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_finalize:
+        await run_local_agent(_FREE_UID, config, run_log, mgr)
+
+    _args, kwargs = mock_finalize.call_args
+    assert kwargs["status"] == "error"
+    assert kwargs["items_processed"] == 2  # Both files attempted.
+    assert kwargs["items_created"] == 0
+    assert len(kwargs["errors"]) == 2  # One error per file.
+
+
+# ---------------------------------------------------------------------------
+# run_cloud_agent (stub)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_run_cloud_agent_device_offline():
+    """Cloud agent aborts immediately when no device is connected."""
+    config = _make_cloud_config()
+    run_log = _make_run_log(config.id, agent_type="cloud")
+    mgr = DeviceConnectionManager()  # empty — no devices registered
+
+    with patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_finalize:
+        await run_cloud_agent(_FREE_UID, config, run_log, mgr)
+
+    mock_finalize.assert_called_once()
+    _, kwargs = mock_finalize.call_args
+    assert kwargs["status"] == "error"
+    assert any("device" in e.lower() or "connected" in e.lower() for e in kwargs["errors"])
+
+
+@pytest.mark.asyncio
+async def test_run_cloud_agent_no_oauth_token():
+    """Cloud agent errors when no OAuth token is stored."""
+    config = _make_cloud_config()
+    config.oauth_token_encrypted = None
+    run_log = _make_run_log(config.id, agent_type="cloud")
+    mgr = _make_manager()
+
+    with patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_finalize:
+        await run_cloud_agent(_FREE_UID, config, run_log, mgr)
+
+    _, kwargs = mock_finalize.call_args
+    assert kwargs["status"] == "error"
+    assert any("oauth" in e.lower() or "token" in e.lower() for e in kwargs["errors"])
+
+
+@pytest.mark.asyncio
+async def test_run_cloud_agent_token_decrypt_failure():
+    """Cloud agent errors gracefully when the stored token cannot be decrypted."""
+    config = _make_cloud_config()
+    config.oauth_token_encrypted = "this-is-not-valid-fernet-ciphertext"
+    run_log = _make_run_log(config.id, agent_type="cloud")
+    mgr = _make_manager()
+
+    from cryptography.fernet import Fernet as _Fernet
+    valid_key = _Fernet.generate_key().decode()
+
+    with patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_finalize, \
+         patch("app.integrations.settings") as mock_settings:
+        mock_settings.OAUTH_ENCRYPTION_KEY = valid_key
+        await run_cloud_agent(_FREE_UID, config, run_log, mgr)
+
+    _, kwargs = mock_finalize.call_args
+    assert kwargs["status"] == "error"
+    assert any("decrypt" in e.lower() for e in kwargs["errors"])
+
+
+@pytest.mark.asyncio
+async def test_run_cloud_agent_happy_path_gmail():
+    """Cloud agent happy path: Gmail fetch → LLM extraction → inserts → success."""
+    from app.integrations import EmailMessage, encrypt_token
+    from cryptography.fernet import Fernet as _Fernet
+
+    fernet_key = _Fernet.generate_key().decode()
+    credentials = {
+        "token": "access_abc",
+        "refresh_token": "refresh_xyz",
+        "token_uri": "https://oauth2.googleapis.com/token",
+        "client_id": "cid",
+        "client_secret": "csec",
+    }
+
+    config = _make_cloud_config()
+    config.provider = "gmail"
+    config.prompt_template = "Extract tasks from this email."
+    config.data_types = ["tasks"]
+
+    with patch("app.integrations.settings") as ms:
+        ms.OAUTH_ENCRYPTION_KEY = fernet_key
+        config.oauth_token_encrypted = encrypt_token(credentials)
+
+    run_log = _make_run_log(config.id, agent_type="cloud")
+    mgr = _make_manager()
+
+    sample_email = EmailMessage(
+        id="msg001",
+        subject="Action required",
+        sender="boss@company.com",
+        body_text="Please fix the bug by Friday.",
+        date=datetime(2025, 6, 1, 10, 0, tzinfo=timezone.utc),
+    )
+
+    extracted_items = [{"table": "tasks", "data": {"title": "Fix the bug", "priority": "high"}}]
+
+    with patch("app.integrations.settings") as mock_int_settings, \
+         patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_finalize, \
+         patch("app.core.agent_runner._extract_items_from_content", new_callable=AsyncMock, return_value=extracted_items) as mock_extract, \
+         patch("app.core.agent_runner._send_insert_to_client", new_callable=AsyncMock, return_value={"ok": True}) as mock_insert, \
+         patch("app.core.agent_runner.async_session"):
+        mock_int_settings.OAUTH_ENCRYPTION_KEY = fernet_key
+
+        mock_gmail = AsyncMock()
+        mock_gmail.fetch_messages = AsyncMock(return_value=[sample_email])
+        mock_gmail.refreshed_credentials = None
+
+        with patch("app.integrations.decrypt_token", return_value=credentials), \
+             patch("app.integrations.get_provider", return_value=mock_gmail):
+            await run_cloud_agent(_FREE_UID, config, run_log, mgr)
+
+    mock_extract.assert_called_once()
+    mock_insert.assert_called_once()
+    _, kwargs = mock_finalize.call_args
+    assert kwargs["status"] == "success"
+    assert kwargs["items_processed"] == 1
+    assert kwargs["items_created"] == 1
+    assert kwargs["config_type"] == "cloud"
+
+
+@pytest.mark.asyncio
+async def test_run_cloud_agent_provider_fetch_error():
+    """Cloud agent records error status when provider fetch raises RuntimeError."""
+    credentials = {"token": "abc"}
+    config = _make_cloud_config()
+    config.oauth_token_encrypted = "some_encrypted_value"  # non-empty so decrypt step is reached
+    config.prompt_template = "Extract tasks."
+    config.data_types = ["tasks"]
+    run_log = _make_run_log(config.id, agent_type="cloud")
+    mgr = _make_manager()
+
+    mock_provider = AsyncMock()
+    mock_provider.fetch_messages = AsyncMock(side_effect=RuntimeError("API quota exceeded"))
+    mock_provider.refreshed_credentials = None
+
+    with patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_finalize, \
+         patch("app.integrations.decrypt_token", return_value=credentials), \
+         patch("app.integrations.get_provider", return_value=mock_provider), \
+         patch("app.core.agent_runner.async_session"):
+        await run_cloud_agent(_FREE_UID, config, run_log, mgr)
+
+    _, kwargs = mock_finalize.call_args
+    assert kwargs["status"] == "error"
+    assert any("quota" in e.lower() or "fetch" in e.lower() for e in kwargs["errors"])
+
+
+@pytest.mark.asyncio
+async def test_run_cloud_agent_refreshed_token_persisted():
+    """When the provider refreshes its token, the new ciphertext is written to DB."""
+    from app.integrations import EmailMessage, encrypt_token
+    from cryptography.fernet import Fernet as _Fernet
+
+    fernet_key = _Fernet.generate_key().decode()
+    credentials = {"token": "old_token", "refresh_token": "rt_old"}
+    fresh_credentials = {"token": "new_token", "refresh_token": "rt_new"}
+
+    config = _make_cloud_config()
+    config.prompt_template = "Extract tasks."
+    config.data_types = ["tasks"]
+
+    with patch("app.integrations.settings") as ms:
+        ms.OAUTH_ENCRYPTION_KEY = fernet_key
+        config.oauth_token_encrypted = encrypt_token(credentials)
+
+    run_log = _make_run_log(config.id, agent_type="cloud")
+    mgr = _make_manager()
+
+    mock_provider = AsyncMock()
+    mock_provider.fetch_messages = AsyncMock(return_value=[])
+    mock_provider.refreshed_credentials = fresh_credentials  # token was refreshed
+
+    # Track DB writes via mock async_session.
+    mock_cfg_row = MagicMock()
+    mock_cfg_row.oauth_token_encrypted = None
+
+    mock_db = AsyncMock()
+    mock_db.__aenter__ = AsyncMock(return_value=mock_db)
+    mock_db.__aexit__ = AsyncMock(return_value=False)
+    mock_db.scalar_one_or_none = AsyncMock(return_value=mock_cfg_row)
+    cfg_result = MagicMock()
+    cfg_result.scalar_one_or_none.return_value = mock_cfg_row
+    mock_db.execute = AsyncMock(return_value=cfg_result)
+    mock_db.commit = AsyncMock()
+
+    with patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock), \
+         patch("app.integrations.decrypt_token", return_value=credentials), \
+         patch("app.integrations.get_provider", return_value=mock_provider), \
+         patch("app.integrations.encrypt_token", return_value="new_encrypted") as mock_encrypt, \
+         patch("app.core.agent_runner.async_session", return_value=mock_db), \
+         patch("app.integrations.settings") as mock_int_settings:
+        mock_int_settings.OAUTH_ENCRYPTION_KEY = fernet_key
+        await run_cloud_agent(_FREE_UID, config, run_log, mgr)
+
+    # The new encrypted token should have been written to the config row.
+    mock_encrypt.assert_called_once_with(fresh_credentials)
+    assert mock_cfg_row.oauth_token_encrypted == "new_encrypted"
+
+
+@pytest.mark.asyncio
+async def test_finalize_run_updates_cloud_config_last_run_at():
+    """_finalize_run with config_type='cloud' updates CloudAgentConfig.last_run_at."""
+    from app.core.agent_runner import _finalize_run
+
+    run_log = _make_run_log(str(uuid.uuid4()), agent_type="cloud")
+    run_log.id = str(uuid.uuid4())
+
+    mock_cfg = MagicMock()
+    mock_cfg.last_run_at = None
+
+    cfg_result = MagicMock()
+    cfg_result.scalar_one_or_none.return_value = mock_cfg
+
+    mock_db = AsyncMock()
+    mock_db.__aenter__ = AsyncMock(return_value=mock_db)
+    mock_db.__aexit__ = AsyncMock(return_value=False)
+    mock_db.merge = AsyncMock(return_value=run_log)
+    mock_db.execute = AsyncMock(return_value=cfg_result)
+    mock_db.commit = AsyncMock()
+
+    config_id = str(uuid.uuid4())
+
+    with patch("app.core.agent_runner.async_session", return_value=mock_db):
+        await _finalize_run(
+            run_log,
+            status="success",
+            update_config_last_run=True,
+            config_id=config_id,
+            config_type="cloud",
+        )
+
+    # CloudAgentConfig.last_run_at should have been set.
+    assert mock_cfg.last_run_at is not None
+    mock_db.commit.assert_called()
+
+
+# ---------------------------------------------------------------------------
+# trigger_pending_runs
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_trigger_pending_runs_no_overdue():
+    """Pending-run scan is skipped because agent config is client-owned."""
+
+    mgr = _make_manager()
+
+    with patch("app.core.agent_runner.run_local_agent", new_callable=AsyncMock) as mock_run:
+        await trigger_pending_runs(_FREE_UID, "dev-001", mgr)
+
+    mock_run.assert_not_called()
+
+
+@pytest.mark.asyncio
+async def test_trigger_pending_runs_device_id_filter():
+    """Device filtering is no longer backend-managed in pending runs."""
+
+    mgr = _make_manager(device_id="dev-001")
+
+    with patch("app.core.agent_runner.run_local_agent", new_callable=AsyncMock) as mock_run:
+        await trigger_pending_runs(_FREE_UID, "dev-001", mgr)
+
+    mock_run.assert_not_called()
+
+
+@pytest.mark.asyncio
+async def test_trigger_pending_runs_dispatches_nothing():
+    """The backend no longer dispatches pending runs now that agent config is client-owned."""
+
+    mgr = _make_manager()
+
+    with patch("app.core.agent_runner.run_local_agent", new_callable=AsyncMock) as mock_run:
+        await trigger_pending_runs(_FREE_UID, "dev-001", mgr)
+
+    mock_run.assert_not_called()
+
+
+# ---------------------------------------------------------------------------
+# Integration: POST /agents/can-create and /agents/trigger
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture(autouse=True)
+def _override_db(db_session):
+    """Route all get_session calls to the test SQLite session."""
+
+    async def _gen():
+        yield db_session
+
+    app.dependency_overrides[get_session] = _gen
+    yield
+    app.dependency_overrides.pop(get_session, None)
+
+
+@pytest.mark.asyncio
+async def test_can_create_agent_allows_when_under_limit(client):
+    """POST /agents/can-create returns allowed=True when under tier limit."""
+    resp = client.post(
+        "/api/v1/agents/can-create",
+        json={"active_agents": 0},
+        headers=auth_header("free"),
+    )
+    assert resp.status_code == 200
+    body = resp.json()
+    assert body["allowed"] is True
+    assert body["tier"] == "free"
+    assert body["active_agents"] == 0
+    assert body["limit"] == 2
+
+
+@pytest.mark.asyncio
+async def test_can_create_agent_denies_when_at_limit(client):
+    """POST /agents/can-create returns allowed=False at free-tier limit."""
+    resp = client.post(
+        "/api/v1/agents/can-create",
+        json={"active_agents": 2},
+        headers=auth_header("free"),
+    )
+    assert resp.status_code == 200
+    body = resp.json()
+    assert body["allowed"] is False
+    assert body["limit"] == 2
+
+
+@pytest.mark.asyncio
+async def test_trigger_run_local_agent_creates_run_log(client, db_session):
+    """POST /agents/trigger creates a local run log and dispatches background task."""
+    dispatched: list[tuple[str, str]] = []
+
+    async def _fake_run(user_id, cfg, run_log, device_mgr):
+        dispatched.append((user_id, cfg.id))
+
+    def _fake_create_task(coro):
+        coro.close()
+        return MagicMock()
+
+    with patch("app.api.routes.agents.run_local_agent", new_callable=AsyncMock, side_effect=_fake_run), \
+         patch("asyncio.create_task") as mock_create_task:
+        mock_create_task.side_effect = _fake_create_task
+        resp = client.post(
+            "/api/v1/agents/trigger",
+            json={
+                "directory": "/home/user/docs",
+                "what_to_extract": ["task", "note"],
+                "actions_by_type": {"task": ["add", "update"], "note": ["add"]},
+                "batch_interval": "0 */6 * * *",
+                "custom_agent_prompt": "Extract tasks and notes.",
+                "active_agents": 0,
+            },
+            headers=auth_header("power"),
+        )
+
+    assert resp.status_code == 202
+    data = resp.json()
+    assert isinstance(data["agent_id"], str)
+    assert data["agent_id"]
+    assert data["status"] == "running"
+    assert data["agent_type"] == "local"
+
+    # Verify create_task was called (dispatching background run).
+    mock_create_task.assert_called_once()
--- a/tests/test_agent_runner_v2.py
+++ b/tests/test_agent_runner_v2.py
@@ -0,0 +1,432 @@
+"""Tests for Local Agent V2 runner (Step 2).
+
+Covers the unified per-file flow:
+  Phase A — detect + preprocess (Python, zero LLM)
+  Phase B — single LLM call with tools (classify + extract + create)
+
+Fixture-based eval tests (2.1–2.7)
+-----------------------------------
+Cases are defined in tests/fixtures/agent_runner_v2/cases.yaml.
+Email HTML files live in tests/fixtures/agent_runner_v2/data/.
+Use --runner-dir to point at a custom folder (same structure required).
+
+Unit tests (no LLM)
+--------------------
+  2.8  items_created count   → items_created == N create_* calls
+  2.9  Device offline        → status=error
+  2.10 Empty file            → items_processed=0, status=success
+
+Run:
+    pytest tests/test_agent_runner_v2.py -v
+    pytest tests/test_agent_runner_v2.py -v -k "2_9 or 2_10 or 2_8"   # unit only
+    pytest tests/test_agent_runner_v2.py -v -k "eval"                  # LLM evals only
+    pytest tests/test_agent_runner_v2.py -v --runner-dir /path/to/dir  # custom fixtures
+"""
+
+from __future__ import annotations
+
+import uuid
+from contextlib import nullcontext
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+import yaml
+
+from app.core.agent_runner import (
+    _format_metadata,
+    _format_projects,
+    _get_extraction_rules,
+    _get_no_match_behavior,
+    _is_overdue,
+    run_local_agent,
+)
+from app.core.device_manager import DeviceConnectionManager
+from app.core.langfuse_client import get_langfuse
+from app.models import AgentRunLog, LocalAgentConfig
+from tests.conftest import TEST_USER_IDS
+
+# ── Constants ─────────────────────────────────────────────────────────────
+
+_USER_ID = TEST_USER_IDS["power"]
+
+_DEFAULT_FIXTURE_DIR = Path(__file__).parent / "fixtures" / "agent_runner_v2"
+
+_AGENT_CONFIG = {
+    "content_types": [
+        {
+            "id": "email_html",
+            "label": "Email HTML",
+            "detection_hint": "HTML file with From/To/Subject headers",
+            "preprocessing": "email_html",
+            "extraction_prompt": (
+                "If the email contains a direct action request or task assignment → create a task. "
+                "If the email contains informational content, updates, or FYI → create a note. "
+                "If the email mentions a specific date for a meeting or deadline → create a timeline entry."
+            ),
+        }
+    ],
+    "global_rules": [
+        "Se il file non è riconducibile a nessun progetto, non creare alcuna entità."  # "If the file cannot be traced to any project, do not create any entity."
+    ],
+    "data_types": ["tasks", "notes", "timelines"],
+}
+
+# Canonical project definitions, referenced symbolically in cases.yaml.
+_PROJECTS: dict[str, dict] = {
+    "alpha": {"id": "proj-alpha", "name": "Project Alpha", "status": "active"},
+    "beta":  {"id": "proj-beta",  "name": "Project Beta",  "status": "active"},
+}
+
+
+# ── Fixture loading ───────────────────────────────────────────────────────
+
+
+def _fixtures_dir(config) -> Path:
+    override = config.getoption("--runner-dir")
+    return Path(override) if override else _DEFAULT_FIXTURE_DIR
+
+
+def _load_cases(config) -> list[dict]:
+    return yaml.safe_load(
+        (_fixtures_dir(config) / "cases.yaml").read_text(encoding="utf-8")
+    )
+
+
+def _read_case_file(case: dict, data_dir: Path) -> str:
+    return (data_dir / case["file"]).read_text(encoding="utf-8")
+
+
+def _resolve_projects(entries: list[str | dict]) -> list[dict]:
+    """Resolve the YAML project list: known symbolic names and inline dicts; unknown names are skipped."""
+    result = []
+    for entry in entries:
+        if isinstance(entry, str):
+            if entry in _PROJECTS:
+                result.append(_PROJECTS[entry])
+        elif isinstance(entry, dict):
+            result.append(entry)
+    return result
+
+
+# ── pytest_generate_tests — parametrize eval tests from YAML ─────────────
+
+
+def pytest_generate_tests(metafunc):
+    if "runner_case" not in metafunc.fixturenames:
+        return
+    cases = _load_cases(metafunc.config)
+    metafunc.parametrize("runner_case", cases, ids=[c["id"] for c in cases])
+
+
+# ── Test helpers ──────────────────────────────────────────────────────────
+
+
+def _make_config(
+    agent_config: dict | None = None,
+    directory: str = "/emails",
+    device_id: str = "dev-001",
+) -> LocalAgentConfig:
+    return LocalAgentConfig(
+        id=str(uuid.uuid4()),
+        user_id=_USER_ID,
+        device_id=device_id,
+        name="Test V2 Agent",
+        directory_paths=[directory],
+        data_types=["tasks", "notes", "timelines"],
+        prompt_template="",
+        agent_config=agent_config or _AGENT_CONFIG,
+        file_extensions=[".html", ".eml"],
+        schedule_cron="0 */6 * * *",
+        enabled=True,
+        last_run_at=None,
+    )
+
+
+def _make_run_log(agent_id: str) -> AgentRunLog:
+    return AgentRunLog(
+        id=str(uuid.uuid4()),
+        agent_id=agent_id,
+        agent_type="local",
+        user_id=_USER_ID,
+        status="running",
+        started_at=datetime.now(timezone.utc),
+    )
+
+
+def _make_manager(online: bool = True) -> DeviceConnectionManager:
+    mgr = DeviceConnectionManager()
+    if online:
+        ws = MagicMock()
+        ws.send_text = AsyncMock()
+        mgr.register(_USER_ID, "dev-001", ws)
+    return mgr
+
+
+def _make_executor(
+    file_path: str,
+    file_content: str,
+    projects: list[dict] | None = None,
+    existing_tasks: list[dict] | None = None,
+    existing_notes: list[dict] | None = None,
+    existing_timelines: list[dict] | None = None,
+) -> tuple[Any, list[dict]]:
+    """Return (async_executor, captured_calls).
+
+    The executor handles all ``execute_on_client`` payloads:
+    directory listing, file reading, project/entity fetching, and CRUD.
+    """
+    calls: list[dict] = []
+    _projects = projects if projects is not None else list(_PROJECTS.values())
+
+    async def _executor(payload: dict) -> dict:
+        action = payload.get("action", "")
+        table = payload.get("table", "")
+        data = payload.get("data") or {}
+        calls.append({"action": action, "table": table, "data": data})
+
+        if action == "list_directory":
+            return {"entries": [{"type": "file", "path": file_path}]}
+
+        if action == "get_file_metadata":
+            return {"modifiedAt": None}
+
+        if action == "read_file_content":
+            return {"content": file_content}
+
+        if action == "select":
+            if table == "projects":
+                return {"rows": _projects}
+            if table == "tasks":
+                return {"rows": existing_tasks or []}
+            if table == "notes":
+                return {"rows": existing_notes or []}
+            if table == "timelines":
+                return {"rows": existing_timelines or []}
+            return {"rows": []}
+
+        if action == "insert":
+            return {"row": {"id": str(uuid.uuid4()), **data}}
+
+        if action == "update":
+            return {"success": True}
+
+        return {}
+
+    return _executor, calls
+
+
+# ── Unit: helper functions ────────────────────────────────────────────────
+
+
+def test_format_projects_empty():
+    assert "(no projects" in _format_projects([])
+
+
+def test_format_projects_with_data():
+    result = _format_projects([_PROJECTS["alpha"]])
+    assert "proj-alpha" in result
+    assert "Project Alpha" in result
+
+
+def test_format_metadata_empty():
+    assert _format_metadata({}) == ""
+
+
+def test_format_metadata_email():
+    meta = {"subject": "Fix bug", "from": "boss@co.com", "date": "2026-04-07"}
+    result = _format_metadata(meta)
+    assert "Fix bug" in result
+    assert "boss@co.com" in result
+
+
+def test_get_extraction_rules_match():
+    rules = _get_extraction_rules(_AGENT_CONFIG, "email_html")
+    assert "task" in rules.lower()
+
+
+def test_get_extraction_rules_fallback():
+    rules = _get_extraction_rules(_AGENT_CONFIG, "plain_text")
+    assert "extract" in rules.lower()
+
+
+def test_get_no_match_behavior_from_global_rules():
+    behavior = _get_no_match_behavior(_AGENT_CONFIG)
+    assert behavior  # non-empty
+
+
+def test_get_no_match_behavior_default():
+    behavior = _get_no_match_behavior({})
+    assert "project" in behavior.lower()
+
+
+# ── Unit: 2.9 — device offline ───────────────────────────────────────────
+
+
+@pytest.mark.asyncio
+async def test_2_9_device_offline():
+    """2.9 No device online → status=error, no executor created."""
+    config = _make_config()
+    run_log = _make_run_log(config.id)
+    mgr = _make_manager(online=False)
+
+    with patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_fin:
+        await run_local_agent(_USER_ID, config, run_log, mgr)
+
+    _, kwargs = mock_fin.call_args
+    assert kwargs["status"] == "error"
+    assert any("not connected" in e for e in kwargs.get("errors", []))
+
+
+# ── Unit: 2.10 — empty file ──────────────────────────────────────────────
+
+
+@pytest.mark.asyncio
+async def test_2_10_empty_file():
+    """2.10 File with empty content → skipped, items_processed=0, success."""
+    config = _make_config()
+    run_log = _make_run_log(config.id)
+    mgr = _make_manager()
+
+    executor, calls = _make_executor(
+        file_path="/emails/empty.html",
+        file_content="",
+        projects=[_PROJECTS["alpha"]],
+    )
+
+    with patch("app.core.agent_runner._make_agent_executor", return_value=executor), \
+         patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_fin:
+        await run_local_agent(_USER_ID, config, run_log, mgr)
+
+    _, kwargs = mock_fin.call_args
+    assert kwargs["items_processed"] == 0
+    assert kwargs["status"] == "success"
+    assert kwargs["items_created"] == 0
+
+
+# ── Unit: 2.8 — items_created count ─────────────────────────────────────
+
+
+@pytest.mark.asyncio
+async def test_2_8_items_created_count():
+    """2.8 items_created == number of create_* tool calls per run."""
+    config = _make_config()
+    run_log = _make_run_log(config.id)
+    mgr = _make_manager()
+
+    executor, _calls = _make_executor(
+        file_path="/emails/action.html",
+        file_content="<html><body><p>Fix the login bug in Project Alpha.</p></body></html>",
+        projects=[_PROJECTS["alpha"]],
+    )
+
+    async def mock_run_agent(*, _tool_calls_out=None, **kw) -> str:
+        if _tool_calls_out is not None:
+            _tool_calls_out.extend(["create_task", "create_note", "update_task"])
+        return "Done."
+
+    with patch("app.core.agent_runner._make_agent_executor", return_value=executor), \
+         patch("app.core.agent_runner._run_agent_with_tools", side_effect=mock_run_agent), \
+         patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_fin:
+        await run_local_agent(_USER_ID, config, run_log, mgr)
+
+    _, kwargs = mock_fin.call_args
+    # Only create_task + create_note count (not update_task).
+    assert kwargs["items_created"] == 2
+    assert kwargs["items_processed"] == 1
+
+
+# ── Eval: 2.1–2.7 — fixture-driven, real LLM + Langfuse scoring ──────────
+#
+# Cases loaded from tests/fixtures/agent_runner_v2/cases.yaml.
+# Supported assertions (from YAML):
+#   expect_insert: <table> | [<table>, ...] → at least 1 insert in each listed table
+#   expect_no_insert: true   → zero inserts in any table
+#   expect_project_id: <id>  → any insert carries this projectId
+#   expect_dedup: true       → task inserts == 0 OR task updates >= 1
+# ─────────────────────────────────────────────────────────────────────────
+
+
+@pytest.mark.asyncio
+@pytest.mark.eval
+async def test_eval_runner(runner_case, pytestconfig):
+    """Parametrized eval test — one invocation per YAML case."""
+    case: dict = runner_case
+    data_dir = _fixtures_dir(pytestconfig) / "data"
+    file_content = _read_case_file(case, data_dir)
+    projects = _resolve_projects(case.get("projects", []))
+
+    config = _make_config()
+    run_log = _make_run_log(config.id)
+    mgr = _make_manager()
+
+    executor, calls = _make_executor(
+        file_path=case["file_path"],
+        file_content=file_content,
+        projects=projects,
+        existing_tasks=case.get("existing_tasks"),
+        existing_notes=case.get("existing_notes"),
+        existing_timelines=case.get("existing_timelines"),
+    )
+
+    lf = get_langfuse()
+    obs_ctx = lf.start_as_current_observation(
+        name=f"eval-runner-{case['id']}-{case.get('score_name', 'unknown').replace('.', '-')}",
+        metadata={"step": "2", "case_id": case["id"]},
+    ) if lf else nullcontext()
+
+    with obs_ctx as obs:
+        with patch("app.core.agent_runner._make_agent_executor", return_value=executor), \
+             patch("app.core.agent_runner._finalize_run", new_callable=AsyncMock) as mock_fin:
+            await run_local_agent(_USER_ID, config, run_log, mgr)
+
+        _, kwargs = mock_fin.call_args
+        # Insert filtering and scoring both live in _evaluate_case.
+        score, comment = _evaluate_case(case, calls, kwargs)
+
+        if obs is not None:
+            obs.score(
+                name=case.get("score_name", f"runner.case_{case['id']}"),
+                value=score,
+                comment=comment,
+            )
+
+    if lf:
+        lf.flush()
+
+    assert score == 1.0, f"[{case['id']}] {case.get('description', '')} — {comment}"
+
+
+def _evaluate_case(case: dict, calls: list[dict], finalize_kwargs: dict) -> tuple[float, str]:
+    """Return (score, comment) for a YAML case; only the first matching expect_* key is checked."""
+    inserts = [c for c in calls if c["action"] == "insert"]
+
+    if case.get("expect_no_insert"):
+        score = 1.0 if len(inserts) == 0 else 0.0
+        return score, f"inserts={len(inserts)} (expected 0)"
+
+    if "expect_insert" in case:
+        tables = case["expect_insert"]
+        if isinstance(tables, str):
+            tables = [tables]
+        missing = [t for t in tables if not any(c["table"] == t for c in inserts)]
+        score = 1.0 if not missing else 0.0
+        counts = {t: sum(1 for c in inserts if c["table"] == t) for t in tables}
+        return score, f"inserts={counts}" + (f" missing={missing}" if missing else "")
+
+    if "expect_project_id" in case:
+        expected_pid = case["expect_project_id"]
+        correct = any(c.get("data", {}).get("projectId") == expected_pid for c in inserts)
+        score = 1.0 if correct else 0.0
+        all_pids = [c.get("data", {}).get("projectId") for c in inserts]
+        return score, f"projectIds={all_pids} (expected {expected_pid!r})"
+
+    if case.get("expect_dedup"):
+        task_creates = [c for c in inserts if c["table"] == "tasks"]
+        task_updates = [c for c in calls if c["action"] == "update" and c["table"] == "tasks"]
+        score = 1.0 if len(task_creates) == 0 or len(task_updates) >= 1 else 0.0
+        return score, f"task_creates={len(task_creates)} task_updates={len(task_updates)}"
+
+    return 0.0, "no assertion defined in case"
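The module above reads `--runner-dir` through `config.getoption` and tags the eval test with `@pytest.mark.eval`, so both presumably have to be registered in a `conftest.py` that this part of the diff does not show. A minimal sketch of what that registration could look like, assuming a `tests/conftest.py` location; the help text and marker description are illustrative and not taken from the repository:

```python
"""Hypothetical tests/conftest.py fragment (not part of the diff shown here)."""


def pytest_addoption(parser):
    # Register the option that _fixtures_dir() reads via config.getoption().
    parser.addoption(
        "--runner-dir",
        action="store",
        default=None,
        help="Folder containing cases.yaml and data/ for the runner eval tests.",
    )


def pytest_configure(config):
    # Declare the custom marker so `-m eval` selects the LLM-backed cases
    # without triggering unknown-marker warnings.
    config.addinivalue_line(
        "markers", "eval: LLM-backed eval tests driven by cases.yaml"
    )
```

With a default of `None`, `_fixtures_dir` falls back to `_DEFAULT_FIXTURE_DIR` whenever the flag is omitted, so the option only matters when pointing the evals at a custom fixture folder.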
--- a/tests/test_agent_setup.py
+++ b/tests/test_agent_setup.py
@@ -0,0 +1,243 @@
+"""Tests for the Chatbot Journey endpoints.
+
+Covers:
+  1. Start journey for local agent → session_id + first question, done=False
+  2. Start journey for cloud agent → contextual email-focused question
+  3. Start journey with existing agent_id → session seeded, first question returned
+  4. Start journey with non-existent agent_id → still succeeds (graceful fallback)
+  5. Message: continue conversation → done=False, follow-up question returned
+  6. Message: LLM wraps up → done=True + prompt_template extracted correctly
+  7. Message with max-turns nudge → no crash, returns response
+  8. Invalid session_id → 404
+  9. Expired session → 404
+  10. Session ownership: user B cannot access user A's session
+  11. No JWT on /start → 401
+  12. No JWT on /message → 401
+"""
+
+from __future__ import annotations
+
+import time
+import uuid
+from unittest.mock import AsyncMock, patch
+
+import pytest
+from fastapi.testclient import TestClient
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from app.api.routes.agent_setup import (
+    _SESSION_TTL_SECONDS,
+    _TEMPLATE_END,
+    _TEMPLATE_START,
+    _extract_template,
+    _sessions,
+)
+from app.models import LocalAgentConfig
+from tests.conftest import TEST_USER_IDS, auth_header
+
+# ── Helpers ──────────────────────────────────────────────────────────────
+
+
+def _start(client: TestClient, agent_type: str = "local", agent_id: str | None = None, tier: str = "power"):
+    body: dict = {"agent_type": agent_type}
+    if agent_id:
+        body["agent_id"] = agent_id
+    resp = client.post("/api/v1/agents/journey/start", json=body, headers=auth_header(tier))
+    return resp
+
+
+def _message(client: TestClient, session_id: str, message: str, tier: str = "power"):
+    return client.post(
+        "/api/v1/agents/journey/message",
+        json={"session_id": session_id, "message": message},
+        headers=auth_header(tier),
+    )
+
+
+# ── Unit: _extract_template ───────────────────────────────────────────────
+
+
+def test_extract_template_present():
+    text = f"Some preamble.\n{_TEMPLATE_START}\nExtract tasks from emails.\n{_TEMPLATE_END}\nTrailing text."
+    result = _extract_template(text)
+    assert result == "Extract tasks from emails."
+
+
+def test_extract_template_absent():
+    assert _extract_template("No markers here.") is None
+
+
+def test_extract_template_empty_content():
+    text = f"{_TEMPLATE_START}\n{_TEMPLATE_END}"
+    assert _extract_template(text) is None
+
+
+# ── Start journey ─────────────────────────────────────────────────────────
+
+
+def test_start_journey_local(client: TestClient):
+    resp = _start(client, agent_type="local")
+    assert resp.status_code == 200
+    body = resp.json()
+    assert "session_id" in body
+    assert body["done"] is False
+    assert body["prompt_template"] is None
+    assert len(body["message"]) > 0
+    # Local question should be about files/directories
+    assert any(w in body["message"].lower() for w in ("file", "director", "document", "monitor"))
+
+
+def test_start_journey_cloud(client: TestClient):
+    resp = _start(client, agent_type="cloud")
+    assert resp.status_code == 200
+    body = resp.json()
+    assert body["done"] is False
+    # Cloud question should mention emails or messages
+    assert any(w in body["message"].lower() for w in ("email", "message", "communication"))
+
+
+def test_start_journey_with_agent_id(client: TestClient, db_session: AsyncSession):
+    """When agent_id is provided, session should be created even if agent doesn't exist."""
+    fake_agent_id = str(uuid.uuid4())
+    resp = _start(client, agent_type="local", agent_id=fake_agent_id)
+    # Should succeed gracefully even if the agent_id doesn't exist
+    assert resp.status_code == 200
+    body = resp.json()
+    assert body["done"] is False
+
+
+def test_start_journey_with_existing_agent(client: TestClient, db_session: AsyncSession):
+    """When a real local agent is provided, session is seeded with its prompt_template."""
+    import asyncio
+
+    user_id = TEST_USER_IDS["power"]
+    agent = LocalAgentConfig(
+        id=str(uuid.uuid4()),
+        user_id=user_id,
+        name="Test Agent",
+        device_id="device-1",
+        directory_paths=["/home/user/emails"],
+        data_types=["tasks"],
+        prompt_template="Extract tasks from .eml files.",
+        file_extensions=[".eml"],
+        schedule_cron="0 */6 * * *",
+        enabled=True,
+    )
+
+    async def _seed():
+        db_session.add(agent)
+        await db_session.commit()
+
+    asyncio.get_event_loop().run_until_complete(_seed())
+
+    resp = _start(client, agent_type="local", agent_id=agent.id)
+    assert resp.status_code == 200
+    body = resp.json()
+    assert body["done"] is False
+    # The session should be stored
+    assert body["session_id"] in _sessions
+
+
+def test_start_journey_requires_auth(client: TestClient):
+    resp = client.post("/api/v1/agents/journey/start", json={"agent_type": "local"})
+    assert resp.status_code == 401
+
+
+# ── Message ───────────────────────────────────────────────────────────────
+
+
+def test_message_continues_conversation(client: TestClient):
+    """A mid-journey reply (no template markers) returns done=False."""
+    follow_up = "That looks good. Can you tell me more about priority rules?"
+
+    with patch("app.api.routes.agent_setup._call_llm", new=AsyncMock(return_value=follow_up)):
+        start_resp = _start(client, agent_type="local")
+        assert start_resp.status_code == 200
+        session_id = start_resp.json()["session_id"]
+
+        msg_resp = _message(client, session_id, "I have .eml and .txt files")
+        assert msg_resp.status_code == 200
+        body = msg_resp.json()
+        assert body["done"] is False
+        assert body["prompt_template"] is None
+        assert body["message"] == follow_up
+        assert body["session_id"] == session_id
+
+
+def test_message_produces_template(client: TestClient):
+    """When the LLM includes PROMPT_TEMPLATE markers, done=True and prompt_template is set."""
+    final_template = "Extract tasks from email. Subject → title. 'urgent' → high priority."
+    llm_response = (
+        "Great, I have all the information I need.\n"
+        f"{_TEMPLATE_START}\n{final_template}\n{_TEMPLATE_END}\n"
+    )
+
+    with patch("app.api.routes.agent_setup._call_llm", new=AsyncMock(return_value=llm_response)):
+        start_resp = _start(client, agent_type="cloud")
+        assert start_resp.status_code == 200
+        session_id = start_resp.json()["session_id"]
+
+        msg_resp = _message(client, session_id, "Only invoices from clients")
+        assert msg_resp.status_code == 200
+        body = msg_resp.json()
+        assert body["done"] is True
+        assert body["prompt_template"] == final_template
+        # Session should be cleaned up
+        assert session_id not in _sessions
+
+
+def test_message_invalid_session(client: TestClient):
+    resp = _message(client, "nonexistent-session-id", "hello")
+    assert resp.status_code == 404
+
+
+def test_message_wrong_owner(client: TestClient):
+    """User B cannot access user A's session."""
+    start_resp = _start(client, agent_type="local", tier="power")
+    session_id = start_resp.json()["session_id"]
+
+    # user with "pro" tier (different user_id) tries to send a message
+    resp = client.post(
+        "/api/v1/agents/journey/message",
+        json={"session_id": session_id, "message": "hello"},
+        headers=auth_header("pro"),  # different user
+    )
+    assert resp.status_code == 404
+
+
+def test_message_expired_session(client: TestClient):
+    """Expired sessions return 404."""
+    start_resp = _start(client, agent_type="local")
+    session_id = start_resp.json()["session_id"]
+
+    # Manually expire the session
+    _sessions[session_id].created_at = time.monotonic() - _SESSION_TTL_SECONDS - 1
+
+    resp = _message(client, session_id, "hello")
+    assert resp.status_code == 404
+
+
+def test_message_requires_auth(client: TestClient):
+    resp = client.post(
+        "/api/v1/agents/journey/message",
+        json={"session_id": "any", "message": "hello"},
+    )
+    assert resp.status_code == 401
+
+
+def test_message_max_turns_nudge(client: TestClient):
+    """After _MAX_TURNS user messages, a system nudge is appended but no crash occurs."""
+    from app.api.routes.agent_setup import _MAX_TURNS
+
+    follow_up = "Tell me more about priority rules."
+
+    with patch("app.api.routes.agent_setup._call_llm", new=AsyncMock(return_value=follow_up)):
+        start_resp = _start(client, agent_type="local")
+        session_id = start_resp.json()["session_id"]
+
+        for i in range(_MAX_TURNS):
+            resp = _message(client, session_id, f"Answer {i + 1}")
+            assert resp.status_code == 200
+            # Only the status code is checked per turn; session state is not asserted here.
+            if resp.json()["done"]:
+                break  # an early wrap-up is also acceptable
--- a/tests/test_agents.py
+++ b/tests/test_agents.py
@@ -1,620 +0,0 @@
-"""Unit tests for the four domain-specific chat agents with mocked LLM."""
-
-from __future__ import annotations
-
-import json
-from typing import Any
-from unittest.mock import AsyncMock, MagicMock, patch
-
-import pytest
-
-import app.agents  # noqa: F401 — triggers @registry.register decorators
-from app.agents.checkpoint_agent import CheckpointAgent
-from app.agents.note_agent import NoteAgent
-from app.agents.project_agent import ProjectAgent
-from app.agents.task_agent import TaskAgent
-from app.core.agent_registry import registry
-
-
-# ── Helpers ──────────────────────────────────────────────────────────
-
-
-def _mock_llm(response_text: str) -> MagicMock:
-    """Return a mock LLM that responds with *response_text* (no tool calls)."""
-    msg = MagicMock()
-    msg.content = response_text
-    msg.tool_calls = []
-    llm = MagicMock()
-    bound = MagicMock()
-    bound.ainvoke = AsyncMock(return_value=msg)
-    llm.bind_tools = MagicMock(return_value=bound)
-    llm.ainvoke = AsyncMock(return_value=msg)
-    return llm
-
-
-def _mock_llm_with_tool_call(
-    tool_name: str, tool_args: dict[str, Any], final_text: str
-) -> MagicMock:
-    """Mock LLM that fires one tool call then returns *final_text*."""
-    tool_msg = MagicMock()
-    tool_msg.content = ""
-    tool_msg.tool_calls = [{"id": "call_1", "name": tool_name, "args": tool_args}]
-
-    final_msg = MagicMock()
-    final_msg.content = final_text
-    final_msg.tool_calls = []
-
-    bound = MagicMock()
-    bound.ainvoke = AsyncMock(side_effect=[tool_msg, final_msg])
-
-    llm = MagicMock()
-    llm.bind_tools = MagicMock(return_value=bound)
-    llm.ainvoke = AsyncMock(return_value=final_msg)
-    return llm
-
-
-# ── Registration ──────────────────────────────────────────────────────
-
-
-class TestAgentRegistration:
-    def test_all_agents_registered(self) -> None:
-        names = {a["name"] for a in registry.list_agents()}
-        assert {
-            "task_agent", "checkpoint_agent", "project_agent", "note_agent"
-        }.issubset(names)
-
-    def test_registry_returns_correct_types(self) -> None:
-        assert isinstance(registry.get("task_agent"), TaskAgent)
-        assert isinstance(registry.get("checkpoint_agent"), CheckpointAgent)
-        assert isinstance(registry.get("project_agent"), ProjectAgent)
-        assert isinstance(registry.get("note_agent"), NoteAgent)
-
-    def test_descriptions_present(self) -> None:
-        for agent_info in registry.list_agents():
-            assert agent_info["description"], f"Empty description: {agent_info['name']}"
-
-
-# ── TaskAgent ─────────────────────────────────────────────────────────
-
-
-class TestTaskAgent:
-    def test_name(self) -> None:
-        assert TaskAgent().get_name() == "task_agent"
-
-    def test_description(self) -> None:
-        assert TaskAgent().get_description() == "Manages tasks and comments: list, create, update, delete, due-today, comments"
-
-    def test_get_tools_count(self) -> None:
-        assert len(TaskAgent().get_tools()) == 8
-
-    def test_tool_names(self) -> None:
-        names = {t.name for t in TaskAgent().get_tools()}
-        assert names == {
-            "list_tasks",
-            "create_task",
-            "update_task",
-            "delete_task",
-            "list_tasks_due_today",
-            "list_task_comments",
-            "add_task_comment",
-            "delete_task_comment",
-        }
-
-    @pytest.mark.asyncio
-    async def test_handle_returns_string(self) -> None:
-        with patch("app.agents.task_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("Task created.")
-            result = await TaskAgent().handle("create a task", {})
-        assert isinstance(result, str)
-
-    @pytest.mark.asyncio
-    async def test_handle_no_tool_calls(self) -> None:
-        with patch("app.agents.task_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("Here are your tasks.")
-            result = await TaskAgent().handle("list my tasks", {})
-        assert result == "Here are your tasks."
-
-    @pytest.mark.asyncio
-    async def test_handle_with_create_task_tool_call(self) -> None:
-        with patch("app.agents.task_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm_with_tool_call(
-                "create_task",
-                {"title": "Buy groceries", "priority": "low"},
-                "Task 'Buy groceries' created.",
-            )
-            result = await TaskAgent().handle("add a grocery task", {})
-        assert result == "Task 'Buy groceries' created."
-
-    @pytest.mark.asyncio
-    async def test_handle_accepts_empty_context(self) -> None:
-        with patch("app.agents.task_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("Done.")
-            result = await TaskAgent().handle("help", {})
-        assert isinstance(result, str)
-
-    @pytest.mark.asyncio
-    async def test_handle_accepts_rich_context(self) -> None:
-        context = {
-            "user_profile": {"id": "u1", "tier": "pro"},
-            "recent_tasks": [{"id": "t1", "title": "Old task"}],
-        }
-        with patch("app.agents.task_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("Tasks listed.")
-            result = await TaskAgent().handle("show tasks", context)
-        assert isinstance(result, str)
-
-
-class TestTaskAgentTools:
-    @pytest.mark.asyncio
-    async def test_list_tasks_defaults(self) -> None:
-        from app.agents.task_agent import list_tasks
-        result = await list_tasks.ainvoke({})
-        data = json.loads(result)
-        assert data["action"] == "list"
-        assert data["table"] == "tasks"
-
-    @pytest.mark.asyncio
-    async def test_list_tasks_with_status_filter(self) -> None:
-        from app.agents.task_agent import list_tasks
-        result = await list_tasks.ainvoke({"status": "done"})
-        data = json.loads(result)
-        assert data["filters"]["status"] == "done"
-
-    @pytest.mark.asyncio
-    async def test_create_task_defaults(self) -> None:
-        from app.agents.task_agent import create_task
-        result = await create_task.ainvoke({"title": "Test task"})
-        data = json.loads(result)
-        assert data["action"] == "create_record"
-        assert data["table"] == "tasks"
-        assert data["data"]["title"] == "Test task"
-        assert data["data"]["status"] == "todo"
-        assert data["data"]["priority"] == "medium"
-
-    @pytest.mark.asyncio
-    async def test_create_task_with_all_fields(self) -> None:
-        from app.agents.task_agent import create_task
-        result = await create_task.ainvoke({
-            "title": "Deploy",
-            "priority": "high",
-            "status": "in_progress",
-            "project_id": "p1",
-            "is_ai_suggested": 1,
-        })
-        data = json.loads(result)
-        assert data["data"]["priority"] == "high"
-        assert data["data"]["status"] == "in_progress"
-        assert data["data"]["projectId"] == "p1"
-        assert data["data"]["isAiSuggested"] == 1
-
-    @pytest.mark.asyncio
-    async def test_update_task_with_status(self) -> None:
-        from app.agents.task_agent import update_task
-        result = await update_task.ainvoke({"task_id": "t1", "status": "done"})
-        data = json.loads(result)
-        assert data["action"] == "update_record"
-        assert data["data"]["id"] == "t1"
-        assert data["data"]["updates"]["status"] == "done"
-
-    @pytest.mark.asyncio
-    async def test_update_task_empty_updates(self) -> None:
-        from app.agents.task_agent import update_task
-        result = await update_task.ainvoke({"task_id": "t1"})
-        data = json.loads(result)
-        assert data["data"]["updates"] == {}
-
-    @pytest.mark.asyncio
-    async def test_delete_task(self) -> None:
-        from app.agents.task_agent import delete_task
-        result = await delete_task.ainvoke({"task_id": "t1"})
-        data = json.loads(result)
-        assert data["action"] == "delete_record"
-        assert data["table"] == "tasks"
-        assert data["data"]["id"] == "t1"
-
-    @pytest.mark.asyncio
-    async def test_list_tasks_due_today(self) -> None:
-        from app.agents.task_agent import list_tasks_due_today
-        result = await list_tasks_due_today.ainvoke({})
-        data = json.loads(result)
-        assert data["action"] == "list_due_today"
-        assert data["table"] == "tasks"
-
-    @pytest.mark.asyncio
-    async def test_list_task_comments(self) -> None:
-        from app.agents.task_agent import list_task_comments
-        result = await list_task_comments.ainvoke({"task_id": "t1"})
-        data = json.loads(result)
-        assert data["action"] == "list"
-        assert data["table"] == "taskComments"
-        assert data["filters"]["taskId"] == "t1"
-
-    @pytest.mark.asyncio
-    async def test_add_task_comment(self) -> None:
-        from app.agents.task_agent import add_task_comment
-        result = await add_task_comment.ainvoke({
-            "task_id": "t1",
-            "author": "Alice",
-            "content": "Looks good!",
-        })
-        data = json.loads(result)
-        assert data["action"] == "create_record"
-        assert data["table"] == "taskComments"
-        assert data["data"]["taskId"] == "t1"
-        assert data["data"]["author"] == "Alice"
-        assert data["data"]["content"] == "Looks good!"
-
-    @pytest.mark.asyncio
-    async def test_delete_task_comment(self) -> None:
-        from app.agents.task_agent import delete_task_comment
-        result = await delete_task_comment.ainvoke({"comment_id": "c1"})
-        data = json.loads(result)
-        assert data["action"] == "delete_record"
-        assert data["table"] == "taskComments"
-        assert data["data"]["id"] == "c1"
-
-
-# ── CheckpointAgent ───────────────────────────────────────────────────
-
-
-class TestCheckpointAgent:
-    def test_name(self) -> None:
-        assert CheckpointAgent().get_name() == "checkpoint_agent"
-
-    def test_description(self) -> None:
-        assert CheckpointAgent().get_description() == "Manages project checkpoints (milestones): list, create, update, delete"
-
-    def test_get_tools_count(self) -> None:
-        assert len(CheckpointAgent().get_tools()) == 4
-
-    def test_tool_names(self) -> None:
-        names = {t.name for t in CheckpointAgent().get_tools()}
-        assert names == {"list_checkpoints", "create_checkpoint", "update_checkpoint", "delete_checkpoint"}
-
-    @pytest.mark.asyncio
-    async def test_handle_no_tool_calls(self) -> None:
-        with patch("app.agents.checkpoint_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("No checkpoints found.")
-            result = await CheckpointAgent().handle("list checkpoints", {})
-        assert result == "No checkpoints found."
-
-    @pytest.mark.asyncio
-    async def test_handle_with_create_tool_call(self) -> None:
-        with patch("app.agents.checkpoint_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm_with_tool_call(
-                "create_checkpoint",
-                {"project_id": "p1", "title": "MVP Launch", "date": 1700000000000},
-                "Checkpoint 'MVP Launch' created.",
-            )
-            result = await CheckpointAgent().handle("add MVP checkpoint", {})
-        assert result == "Checkpoint 'MVP Launch' created."
-
-    @pytest.mark.asyncio
-    async def test_handle_accepts_empty_context(self) -> None:
-        with patch("app.agents.checkpoint_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("Done.")
-            result = await CheckpointAgent().handle("show milestones", {})
-        assert isinstance(result, str)
-
-
-class TestCheckpointAgentTools:
-    @pytest.mark.asyncio
-    async def test_list_checkpoints_no_project(self) -> None:
-        from app.agents.checkpoint_agent import list_checkpoints
-        result = await list_checkpoints.ainvoke({})
-        data = json.loads(result)
-        assert data["action"] == "list"
-        assert data["table"] == "checkpoints"
-        assert data["filters"]["projectId"] is None
-
-    @pytest.mark.asyncio
-    async def test_list_checkpoints_with_project(self) -> None:
-        from app.agents.checkpoint_agent import list_checkpoints
-        result = await list_checkpoints.ainvoke({"project_id": "p1"})
-        data = json.loads(result)
-        assert data["filters"]["projectId"] == "p1"
-
-    @pytest.mark.asyncio
-    async def test_create_checkpoint(self) -> None:
-        from app.agents.checkpoint_agent import create_checkpoint
-        result = await create_checkpoint.ainvoke({
-            "project_id": "p1",
-            "title": "Beta release",
-            "date": 1700000000000,
-        })
-        data = json.loads(result)
-        assert data["action"] == "create_record"
-        assert data["table"] == "checkpoints"
-        assert data["data"]["projectId"] == "p1"
-        assert data["data"]["title"] == "Beta release"
-        assert data["data"]["date"] == 1700000000000
-
-    @pytest.mark.asyncio
-    async def test_create_checkpoint_ai_suggested(self) -> None:
-        from app.agents.checkpoint_agent import create_checkpoint
-        result = await create_checkpoint.ainvoke({
-            "project_id": "p1",
-            "title": "Review",
-            "date": 1700000000000,
-            "is_ai_suggested": 1,
-        })
-        data = json.loads(result)
-        assert data["data"]["isAiSuggested"] == 1
-        assert data["data"]["isApproved"] == 0
-
-    @pytest.mark.asyncio
-    async def test_update_checkpoint_approve(self) -> None:
-        from app.agents.checkpoint_agent import update_checkpoint
-        result = await update_checkpoint.ainvoke({
-            "checkpoint_id": "c1",
-            "is_approved": 1,
-        })
-        data = json.loads(result)
-        assert data["action"] == "update_record"
-        assert data["data"]["id"] == "c1"
-        assert data["data"]["updates"]["isApproved"] == 1
-
-    @pytest.mark.asyncio
-    async def test_update_checkpoint_empty_updates(self) -> None:
-        from app.agents.checkpoint_agent import update_checkpoint
-        result = await update_checkpoint.ainvoke({"checkpoint_id": "c1"})
-        data = json.loads(result)
-        assert data["data"]["updates"] == {}
-
-    @pytest.mark.asyncio
-    async def test_delete_checkpoint(self) -> None:
-        from app.agents.checkpoint_agent import delete_checkpoint
-        result = await delete_checkpoint.ainvoke({"checkpoint_id": "c1"})
-        data = json.loads(result)
-        assert data["action"] == "delete_record"
-        assert data["table"] == "checkpoints"
-        assert data["data"]["id"] == "c1"
-
-
-# ── ProjectAgent ──────────────────────────────────────────────────────
-
-
-class TestProjectAgent:
-    def test_name(self) -> None:
-        assert ProjectAgent().get_name() == "project_agent"
-
-    def test_description(self) -> None:
-        assert ProjectAgent().get_description() == "Manages projects: list, get, create, update, archive, delete"
-
-    def test_get_tools_count(self) -> None:
-        assert len(ProjectAgent().get_tools()) == 6
-
-    def test_tool_names(self) -> None:
-        names = {t.name for t in ProjectAgent().get_tools()}
-        assert names == {
-            "list_projects",
-            "list_all_projects",
-            "get_project",
-            "create_project",
-            "update_project",
-            "delete_project",
-        }
-
-    @pytest.mark.asyncio
-    async def test_handle_no_tool_calls(self) -> None:
-        with patch("app.agents.project_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("Project Alpha is active.")
-            result = await ProjectAgent().handle("show my projects", {})
-        assert result == "Project Alpha is active."
-
-    @pytest.mark.asyncio
-    async def test_handle_with_create_project_tool_call(self) -> None:
-        with patch("app.agents.project_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm_with_tool_call(
-                "create_project",
-                {"name": "Pippo"},
-                "Project 'Pippo' created.",
-            )
-            result = await ProjectAgent().handle("create project Pippo", {})
-        assert result == "Project 'Pippo' created."
-
-    @pytest.mark.asyncio
-    async def test_handle_accepts_empty_context(self) -> None:
-        with patch("app.agents.project_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("Done.")
-            result = await ProjectAgent().handle("archive old project", {})
-        assert isinstance(result, str)
-
-
-class TestProjectAgentTools:
-    @pytest.mark.asyncio
-    async def test_list_projects_defaults(self) -> None:
-        from app.agents.project_agent import list_projects
-        result = await list_projects.ainvoke({})
-        data = json.loads(result)
-        assert data["action"] == "list"
-        assert data["table"] == "projects"
-        assert data["filters"]["includeArchived"] is False
-
-    @pytest.mark.asyncio
-    async def test_list_projects_include_archived(self) -> None:
-        from app.agents.project_agent import list_projects
-        result = await list_projects.ainvoke({"include_archived": 1})
-        data = json.loads(result)
-        assert data["filters"]["includeArchived"] is True
-
-    @pytest.mark.asyncio
-    async def test_list_all_projects(self) -> None:
-        from app.agents.project_agent import list_all_projects
-        result = await list_all_projects.ainvoke({})
-        data = json.loads(result)
-        assert data["action"] == "list_all"
-        assert data["table"] == "projects"
-
-    @pytest.mark.asyncio
-    async def test_get_project(self) -> None:
-        from app.agents.project_agent import get_project
-        result = await get_project.ainvoke({"project_id": "p1"})
-        data = json.loads(result)
-        assert data["action"] == "get"
-        assert data["table"] == "projects"
-        assert data["data"]["id"] == "p1"
-
-    @pytest.mark.asyncio
-    async def test_create_project_name_only(self) -> None:
-        from app.agents.project_agent import create_project
-        result = await create_project.ainvoke({"name": "Alpha"})
-        data = json.loads(result)
-        assert data["action"] == "create_record"
-        assert data["data"]["name"] == "Alpha"
-        assert data["data"]["clientId"] is None
-
-    @pytest.mark.asyncio
-    async def test_create_project_with_client(self) -> None:
-        from app.agents.project_agent import create_project
-        result = await create_project.ainvoke({"name": "Beta", "client_id": "cl1"})
-        data = json.loads(result)
-        assert data["data"]["clientId"] == "cl1"
-
-    @pytest.mark.asyncio
-    async def test_update_project_archive(self) -> None:
-        from app.agents.project_agent import update_project
-        result = await update_project.ainvoke({"project_id": "p1", "status": "archived"})
-        data = json.loads(result)
-        assert data["action"] == "update_record"
-        assert data["data"]["id"] == "p1"
-        assert data["data"]["updates"]["status"] == "archived"
-
-    @pytest.mark.asyncio
-    async def test_update_project_empty_updates(self) -> None:
-        from app.agents.project_agent import update_project
-        result = await update_project.ainvoke({"project_id": "p1"})
-        data = json.loads(result)
-        assert data["data"]["updates"] == {}
-
-    @pytest.mark.asyncio
-    async def test_delete_project(self) -> None:
-        from app.agents.project_agent import delete_project
-        result = await delete_project.ainvoke({"project_id": "p1"})
-        data = json.loads(result)
-        assert data["action"] == "delete_record"
-        assert data["data"]["id"] == "p1"
-
-
-# ── NoteAgent ─────────────────────────────────────────────────────────
-
-
-class TestNoteAgent:
-    def test_name(self) -> None:
-        assert NoteAgent().get_name() == "note_agent"
-
-    def test_description(self) -> None:
-        assert NoteAgent().get_description() == "Manages notes: list, get, create, update, delete"
-
-    def test_get_tools_count(self) -> None:
-        assert len(NoteAgent().get_tools()) == 5
-
-    def test_tool_names(self) -> None:
-        names = {t.name for t in NoteAgent().get_tools()}
-        assert names == {"list_notes", "get_note", "create_note", "update_note", "delete_note"}
-
-    @pytest.mark.asyncio
-    async def test_handle_no_tool_calls(self) -> None:
-        with patch("app.agents.note_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("Note created.")
-            result = await NoteAgent().handle("create a note", {})
-        assert result == "Note created."
-
-    @pytest.mark.asyncio
-    async def test_handle_with_create_note_tool_call(self) -> None:
-        with patch("app.agents.note_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm_with_tool_call(
-                "create_note",
-                {"title": "Daily log", "content": "# Today\nAll good."},
-                "Note 'Daily log' created.",
-            )
-            result = await NoteAgent().handle("log today's progress", {})
-        assert result == "Note 'Daily log' created."
-
-    @pytest.mark.asyncio
-    async def test_handle_accepts_empty_context(self) -> None:
-        with patch("app.agents.note_agent.get_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("Done.")
-            result = await NoteAgent().handle("show notes", {})
-        assert isinstance(result, str)
-
-
-class TestNoteAgentTools:
-    @pytest.mark.asyncio
-    async def test_list_notes_no_project(self) -> None:
-        from app.agents.note_agent import list_notes
-        result = await list_notes.ainvoke({})
-        data = json.loads(result)
-        assert data["action"] == "list"
-        assert data["table"] == "notes"
-        assert data["filters"]["projectId"] is None
-
-    @pytest.mark.asyncio
-    async def test_list_notes_with_project(self) -> None:
-        from app.agents.note_agent import list_notes
-        result = await list_notes.ainvoke({"project_id": "p1"})
-        data = json.loads(result)
-        assert data["filters"]["projectId"] == "p1"
-
-    @pytest.mark.asyncio
-    async def test_get_note(self) -> None:
-        from app.agents.note_agent import get_note
-        result = await get_note.ainvoke({"note_id": "n1"})
-        data = json.loads(result)
-        assert data["action"] == "get"
-        assert data["table"] == "notes"
-        assert data["data"]["id"] == "n1"
-
-    @pytest.mark.asyncio
-    async def test_create_note_minimal(self) -> None:
-        from app.agents.note_agent import create_note
-        result = await create_note.ainvoke({
-            "title": "Daily log",
-            "content": "# Today\nAll good.",
-        })
-        data = json.loads(result)
-        assert data["action"] == "create_record"
-        assert data["table"] == "notes"
-        assert data["data"]["title"] == "Daily log"
-        assert data["data"]["content"] == "# Today\nAll good."
-        assert data["data"]["projectId"] is None
-
-    @pytest.mark.asyncio
-    async def test_create_note_with_project(self) -> None:
-        from app.agents.note_agent import create_note
-        result = await create_note.ainvoke({
-            "title": "Sprint notes",
-            "content": "## Sprint 1",
-            "project_id": "p1",
-        })
-        data = json.loads(result)
-        assert data["data"]["projectId"] == "p1"
-
-    @pytest.mark.asyncio
-    async def test_update_note_content_only(self) -> None:
-        from app.agents.note_agent import update_note
-        result = await update_note.ainvoke({
-            "note_id": "n1",
-            "content": "# Updated content",
-        })
-        data = json.loads(result)
-        assert data["action"] == "update_record"
-        assert data["data"]["id"] == "n1"
-        assert data["data"]["updates"]["content"] == "# Updated content"
-        assert "title" not in data["data"]["updates"]
-
-    @pytest.mark.asyncio
-    async def test_update_note_empty_updates(self) -> None:
-        from app.agents.note_agent import update_note
-        result = await update_note.ainvoke({"note_id": "n1"})
-        data = json.loads(result)
-        assert data["data"]["updates"] == {}
-
-    @pytest.mark.asyncio
-    async def test_delete_note(self) -> None:
-        from app.agents.note_agent import delete_note
-        result = await delete_note.ainvoke({"note_id": "n1"})
-        data = json.loads(result)
-        assert data["action"] == "delete_record"
-        assert data["table"] == "notes"
-        assert data["data"]["id"] == "n1"
--- a/tests/test_classify_file.py
+++ b/tests/test_classify_file.py
@@ -0,0 +1,184 @@
+"""Live-LLM tests for Step 1 file classification (_classify_file).
+
+These tests call the real LLM, so they require OPENAI_API_KEY / LLM env vars.
+Run with: pytest tests/test_classify_file.py -v
+
+To run a quick manual check against a real file without the full UI:
+    python -m tests.test_classify_file <path/to/file.txt> [project_name...]
+"""
+
+from __future__ import annotations
+
+import asyncio
+import sys
+
+import pytest
+
+from app.core.agent_runner import _classify_file
+
+
+# ── Fixtures ──────────────────────────────────────────────────────────────
+
+PROJECTS_SAMPLE = [
+    {
+        "id": "aaaa-0001-0000-0000-000000000001",
+        "name": "ARPA Sicilia POC",
+        "status": "active",
+        "aiSummary": "Proof of concept for AI features targeting ARPA Sicilia agency.",
+    },
+    {
+        "id": "bbbb-0002-0000-0000-000000000002",
+        "name": "SNAM AI Meeting Prep",
+        "status": "active",
+        "aiSummary": "AI-assisted preparation of meeting materials for SNAM.",
+    },
+    {
+        "id": "cccc-0003-0000-0000-000000000003",
+        "name": "SFERA+ Wave 2",
+        "status": "active",
+        "aiSummary": "Second wave of the SFERA+ whitelist project.",
+    },
+]
+
+ARPA_EMAIL = """\
+to: roberto.musso@hpe.com; luca.tondin@hpecds.com
+isImportance: normal
+hasAttachment: True
+---
+## Body
+Buongiorno,
+
+In riferimento alla riunione di ieri sul POC ARPA Sicilia, vi invio il riassunto
+dei deliverable concordati:
+- Preparare demo entro il 30 marzo
+- Condividere documentazione tecnica con il team ARPA
+- Fissare call di follow-up la prossima settimana
+
+Cordiali saluti
+Roberto Marchetti
+"""
+
+SNAM_EMAIL = """\
+to: roberto.musso@hpe.com
+isImportance: high
+hasAttachment: False
+---
+## Body
+Ciao,
+ti invio l'agenda per la riunione SNAM di domani.
+Per favore conferma la tua presenza.
+"""
+
+UNRELATED_EMAIL = """\
+to: roberto.musso@hpe.com
+isImportance: normal
+---
+## Body
+Benvenuto nel programma HPE Employee Learning Series.
+Completa la formazione richiesta entro la fine del trimestre.
+"""
+
+
+# ── Tests ─────────────────────────────────────────────────────────────────
+
+
+@pytest.mark.asyncio
+async def test_classify_arpa_matches_existing():
+    project_id, domains, new_name = await _classify_file(
+        file_path="arpa_email.txt",
+        file_content=ARPA_EMAIL,
+        projects=PROJECTS_SAMPLE,
+        config_data_types=["tasks", "notes", "timelines"],
+    )
+    assert project_id == "aaaa-0001-0000-0000-000000000001", (
+        f"Expected ARPA project, got project_id={project_id!r} new_name={new_name!r}"
+    )
+    assert new_name is None
+
+
+@pytest.mark.asyncio
+async def test_classify_snam_matches_existing():
+    project_id, domains, new_name = await _classify_file(
+        file_path="snam_email.txt",
+        file_content=SNAM_EMAIL,
+        projects=PROJECTS_SAMPLE,
+        config_data_types=["tasks", "notes"],
+    )
+    assert project_id == "bbbb-0002-0000-0000-000000000002", (
+        f"Expected SNAM project, got project_id={project_id!r} new_name={new_name!r}"
+    )
+
+
+@pytest.mark.asyncio
+async def test_classify_unrelated_returns_new():
+    project_id, domains, new_name = await _classify_file(
+        file_path="learning_email.txt",
+        file_content=UNRELATED_EMAIL,
+        projects=PROJECTS_SAMPLE,
+        config_data_types=["tasks", "notes"],
+    )
+    assert project_id == "new"
+    assert new_name is not None  # LLM should suggest a name
+
+
+@pytest.mark.asyncio
+async def test_classify_empty_file_returns_new():
+    project_id, domains, new_name = await _classify_file(
+        file_path="empty.txt",
+        file_content="   ",
+        projects=PROJECTS_SAMPLE,
+        config_data_types=["tasks"],
+    )
+    assert project_id == "new"
+
+
+@pytest.mark.asyncio
+async def test_classify_no_projects_returns_new():
+    project_id, domains, new_name = await _classify_file(
+        file_path="arpa_email.txt",
+        file_content=ARPA_EMAIL,
+        projects=[],
+        config_data_types=["tasks", "notes"],
+    )
+    assert project_id == "new"
+    assert new_name is not None
+
+
+# ── CLI quick-test runner ─────────────────────────────────────────────────
+
+
+async def _cli_test(file_path: str, project_names: list[str]) -> None:
+    """Run Step 1 classification against a real file from the CLI."""
+    import json
+    from pathlib import Path
+
+    content = Path(file_path).read_text(encoding="utf-8", errors="replace")
+    projects = [
+        {"id": f"test-id-{i:04d}", "name": name, "status": "active", "aiSummary": ""}
+        for i, name in enumerate(project_names)
+    ]
+
+    print(f"\nClassifying: {file_path}")
+    print(f"Projects in context: {[p['name'] for p in projects]}\n")
+
+    project_id, domains, new_name = await _classify_file(
+        file_path=file_path,
+        file_content=content,
+        projects=projects,
+        config_data_types=["tasks", "notes", "timelines"],
+    )
+
+    result = {
+        "project_id": project_id,
+        "matched_name": next((p["name"] for p in projects if p["id"] == project_id), None),
+        "new_project_name": new_name,
+        "domains": domains,
+    }
+    print(json.dumps(result, indent=2, ensure_ascii=False))
+
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        print("Usage: python -m tests.test_classify_file <file_path> [project_name ...]")
+        sys.exit(1)
+    asyncio.run(_cli_test(sys.argv[1], sys.argv[2:]))
--- a/tests/test_deep_agent.py
+++ b/tests/test_deep_agent.py
@@ -0,0 +1,288 @@
+"""Unit tests for single-agent deep_agent flows with mocked tool results."""
+
+from __future__ import annotations
+
+from datetime import date, timedelta
+from types import SimpleNamespace
+from unittest.mock import patch
+
+import pytest
+from langchain_core.messages import AIMessage, ToolMessage
+
+from app.core.deep_agent import (
+    _infer_floating_domain,
+    _normalize_tagged_list_lines,
+    run_floating,
+    run_floating_stream,
+    run_home,
+)
+
+
+class _FakeTool:
+    name = "list_tasks"
+
+    async def ainvoke(self, args):
+        return {"rows": [{"id": "task-1", "title": "Mock Task"}], "echo": args}
+
+
+class _FakeLLM:
+    def __init__(self) -> None:
+        self.agent_calls = 0
+
+    def bind_tools(self, _tools):
+        return self
+
+    async def ainvoke(self, messages):
+        system_prompt = str(getattr(messages[0], "content", "")) if messages else ""
+        if "strict domain classifier" in system_prompt:
+            return AIMessage(content='{"type":"timeline","id":"tl-1","section":null}')
+
+        self.agent_calls += 1
+        if self.agent_calls == 1:
+            return AIMessage(
+                content="",
+                tool_calls=[
+                    {
+                        "id": "call-1",
+                        "name": "list_tasks",
+                        "args": {"project_id": "proj-1"},
+                    }
+                ],
+            )
+
+        tool_messages = [m for m in messages if isinstance(m, ToolMessage)]
+        assert tool_messages, "Expected at least one tool message"
+        return AIMessage(content=f"Final answer from mocked tool: {tool_messages[-1].content}")
+
+    async def astream(self, _messages):
+        yield SimpleNamespace(content="stream-")
+        yield SimpleNamespace(content="ok")
+
+
+@pytest.mark.asyncio
+async def test_run_home_uses_mocked_tool_result():
+    fake_llm = _FakeLLM()
+
+    with patch("app.core.deep_agent.get_llm", return_value=fake_llm), patch(
+        "app.core.deep_agent._all_tools", return_value=[_FakeTool()]
+    ):
+        out = await run_home("user-1", "list my tasks", {})
+
+    assert "Final answer from mocked tool" in out
+    assert "Mock Task" in out
+
+
+@pytest.mark.asyncio
+async def test_run_floating_stream_emits_domain_then_tokens_with_mocked_tool_result():
+    fake_llm = _FakeLLM()
+
+    with patch("app.core.deep_agent.get_llm", return_value=fake_llm), patch(
+        "app.core.deep_agent._all_tools", return_value=[_FakeTool()]
+    ):
+        events = []
+        async for event in run_floating_stream(
+            "user-1",
+            "show me timeline updates",
+            {"scope": {"type": "timeline", "id": "tl-1"}},
+        ):
+            events.append(event)
+
+    assert events[0] == (
+        "floating_domain",
+        {"type": "timeline", "id": "tl-1", "section": None},
+    )
+    assert ("token", "stream-") in events
+    assert ("token", "ok") in events
+
+
+@pytest.mark.asyncio
+async def test_infer_floating_domain_prefers_message_intent_over_scope_type():
+    class _ClassifierOnlyLLM:
+        async def ainvoke(self, _messages):
+            return AIMessage(
+                content='{"type":"project","id":"213213-312321-312312-421321","section":"task"}'
+            )
+
+    with patch("app.core.deep_agent.get_llm", return_value=_ClassifierOnlyLLM()):
+        domain = await _infer_floating_domain(
+            "Quali sono i miei task per il progetto X",
+            {
+                "scope": {"type": "timeline"},
+                "resolved_project_id": "213213-312321-312312-421321",
+            },
+        )
+
+    assert domain == {
+        "type": "project",
+        "id": "213213-312321-312312-421321",
+        "section": "task",
+    }
+
+
+def test_normalize_tagged_list_lines_rewrites_mixed_task_lines_to_tag_only_lines():
+    raw = (
+        "Certo!\n\n"
+        "1. **Task A** — priorita high <task>[task-1]</task>\n"
+        "2. **Task B** — priorita medium <task>[task-2]</task>\n"
+    )
+
+    out = _normalize_tagged_list_lines(raw, "quali sono le prossime attivita?")
+
+    assert "<task>[task-1]</task>" in out
+    assert "<task>[task-2]</task>" in out
+    assert "Task A" not in out
+    assert "Task B" not in out
+
+
+def test_normalize_tagged_list_lines_filters_upcoming_timeline_query_to_current_month_future_only():
+    today = date.today()
+    tomorrow = today + timedelta(days=1)
+    yesterday = today - timedelta(days=1)
+    next_month = (today.replace(day=28) + timedelta(days=5)).replace(day=1)
+
+    raw = "\n".join(
+        [
+            f"- Milestone old — {yesterday.strftime('%d/%m/%Y')} <timeline>[tl-old]</timeline>",
+            f"- Milestone next — {tomorrow.strftime('%d/%m/%Y')} <timeline>[tl-next]</timeline>",
+            f"- Milestone future — {next_month.strftime('%d/%m/%Y')} <timeline>[tl-future]</timeline>",
+        ]
+    )
+
+    out = _normalize_tagged_list_lines(raw, "invece i miei eventi prossimi?")
+
+    assert "<timeline>[tl-next]</timeline>" in out
+    assert "<timeline>[tl-old]</timeline>" not in out
+    assert "<timeline>[tl-future]</timeline>" not in out
+
+
+@pytest.mark.asyncio
+async def test_run_floating_strips_xml_like_tags_from_final_text():
+    fake_llm = _FakeLLM()
+
+    async def _fake_run_single_agent(**_kwargs):
+        return (
+            "Hai 1 task:\n"
+            "Mail barra in prod <task>[180faff3-507d-4d88-aba8-66f204eb59ef]</task>"
+        )
+
+    with patch("app.core.deep_agent.get_llm", return_value=fake_llm), patch(
+        "app.core.deep_agent._run_single_agent", side_effect=_fake_run_single_agent
+    ):
+        text, _domain = await run_floating(
+            "user-1",
+            "quali task ho?",
+            {"scope": {"type": "task"}},
+        )
+
+    assert "<task>" not in text
+    assert "</task>" not in text
+    assert "[180faff3-507d-4d88-aba8-66f204eb59ef]" not in text
+
+
+@pytest.mark.asyncio
+async def test_run_floating_stream_strips_xml_like_tags_from_streamed_text():
+    fake_llm = _FakeLLM()
+
+    async def _fake_stream(**_kwargs):
+        yield "token", "Hai 1 task:\n"
+        yield "token", "Mail barra in prod <task>[180faff3-507d-4d88-aba8-66f204eb59ef]</task>"
+
+    with patch("app.core.deep_agent.get_llm", return_value=fake_llm), patch(
+        "app.core.deep_agent._run_single_agent_stream", side_effect=_fake_stream
+    ):
+        events = []
+        async for event in run_floating_stream(
+            "user-1",
+            "quali task ho?",
+            {"scope": {"type": "task"}},
+        ):
+            events.append(event)
+
+    token_events = [str(data) for event_type, data in events if event_type == "token"]
+    combined = "".join(token_events)
+    assert "<task>" not in combined
+    assert "</task>" not in combined
+    assert "[180faff3-507d-4d88-aba8-66f204eb59ef]" not in combined
+
+
+@pytest.mark.asyncio
+async def test_run_floating_stream_falls_back_to_final_response_content_when_astream_is_empty():
+    class _NoChunkLLM:
+        def __init__(self) -> None:
+            self.calls = 0
+
+        def bind_tools(self, _tools):
+            return self
+
+        async def ainvoke(self, _messages):
+            self.calls += 1
+            if self.calls == 1:
+                return AIMessage(
+                    content="",
+                    tool_calls=[
+                        {
+                            "id": "call-1",
+                            "name": "list_tasks",
+                            "args": {},
+                        }
+                    ],
+                )
+            return AIMessage(content="No notes found.")
+
+        async def astream(self, _messages):
+            if False:
+                yield None
+
+    with patch("app.core.deep_agent.get_llm", return_value=_NoChunkLLM()), patch(
+        "app.core.deep_agent._all_tools", return_value=[_FakeTool()]
+    ):
+        events = []
+        async for event in run_floating_stream(
+            "user-1",
+            "quali sono le note?",
+            {"scope": {"type": "note"}},
+        ):
+            events.append(event)
+
+    assert events[0][0] == "floating_domain"
+    assert ("token", "No notes found.") in events
+
+
+@pytest.mark.asyncio
+async def test_run_floating_returns_fallback_when_sanitization_would_empty_text():
+    fake_llm = _FakeLLM()
+
+    async def _fake_run_single_agent(**_kwargs):
+        return "<task>[180faff3-507d-4d88-aba8-66f204eb59ef]</task>"
+
+    with patch("app.core.deep_agent.get_llm", return_value=fake_llm), patch(
+        "app.core.deep_agent._run_single_agent", side_effect=_fake_run_single_agent
+    ):
+        text, _domain = await run_floating(
+            "user-1",
+            "quali task ho?",
+            {"scope": {"type": "task"}},
+        )
+
+    assert text == "No results found."
+
+
+@pytest.mark.asyncio
+async def test_run_floating_stream_returns_fallback_when_sanitization_would_empty_text():
+    fake_llm = _FakeLLM()
+
+    async def _fake_stream(**_kwargs):
+        yield "token", "<task>[180faff3-507d-4d88-aba8-66f204eb59ef]</task>"
+
+    with patch("app.core.deep_agent.get_llm", return_value=fake_llm), patch(
+        "app.core.deep_agent._run_single_agent_stream", side_effect=_fake_stream
+    ):
+        events = []
+        async for event in run_floating_stream(
+            "user-1",
+            "quali task ho?",
+            {"scope": {"type": "task"}},
+        ):
+            events.append(event)
+
+    assert ("token", "No results found.") in events
--- a/tests/test_device_ws.py
+++ b/tests/test_device_ws.py
@@ -0,0 +1,362 @@
+"""Tests for Step 3.3: DeviceConnectionManager and device WS endpoint.
+
+Coverage:
+  Unit tests  — DeviceConnectionManager register/unregister/is_online/
+                get_ws/send_frame/pending-call round-trip/agent-data queue
+  Integration — /api/v1/ws/device endpoint via TestClient WebSocket:
+                auth rejection, happy-path connect, tool_result dispatch,
+                agent_data queue routing, agent_complete sentinel, disconnect
+                cleanup (AgentRunLog marked as error)
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import uuid
+from datetime import datetime, timezone
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+import pytest_asyncio
+
+from app.core.device_manager import DeviceConnection, DeviceConnectionManager
+from app.db import get_session
+from app.main import app
+from app.models import AgentRunLog
+from tests.conftest import TEST_USER_IDS, auth_header, make_jwt
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+_FREE_UID = TEST_USER_IDS["free"]
+_PRO_UID = TEST_USER_IDS["pro"]
+
+
+def _device_hello(device_id: str = "dev-001", agent_ids: list[str] | None = None) -> str:
+    return json.dumps(
+        {"type": "device_hello", "device_id": device_id, "agent_ids": agent_ids or []}
+    )
+
+
+# ---------------------------------------------------------------------------
+# DB override (shared across integration tests)
+# ---------------------------------------------------------------------------
+
+@pytest.fixture(autouse=True)
+def _override_db(db_session):
+    """Route all get_session calls to the test SQLite session."""
+
+    async def _gen():
+        yield db_session
+
+    app.dependency_overrides[get_session] = _gen
+    yield
+    app.dependency_overrides.pop(get_session, None)
+
+
+# ---------------------------------------------------------------------------
+# DeviceConnectionManager unit tests
+# ---------------------------------------------------------------------------
+
+@pytest.fixture()
+def manager() -> DeviceConnectionManager:
+    """Fresh manager instance for each test."""
+    return DeviceConnectionManager()
+
+
+@pytest.fixture()
+def mock_ws() -> MagicMock:
+    ws = MagicMock()
+    ws.send_text = AsyncMock()
+    return ws
+
+
+def test_manager_register_and_is_online(manager, mock_ws):
+    assert not manager.is_online("user1")
+    manager.register("user1", "dev-A", mock_ws)
+    assert manager.is_online("user1")
+    assert manager.is_online("user1", "dev-A")
+    assert not manager.is_online("user1", "dev-B")
+
+
+def test_manager_get_ws_returns_none_when_offline(manager):
+    assert manager.get_ws("no-such-user") is None
+
+
+def test_manager_unregister(manager, mock_ws):
+    manager.register("user1", "dev-A", mock_ws)
+    assert manager.is_online("user1")
+    manager.unregister("user1")
+    assert not manager.is_online("user1")
+    assert manager.get_ws("user1") is None
+
+
+def test_manager_unregister_unknown_is_noop(manager):
+    # Must not raise.
+    manager.unregister("ghost")
+
+
+def test_manager_replace_connection_cancels_old_futures(manager):
+    ws_a = MagicMock()
+    ws_a.send_text = AsyncMock()
+    ws_b = MagicMock()
+    ws_b.send_text = AsyncMock()
+
+    # Create event loop context for Future.
+    loop = asyncio.new_event_loop()
+    try:
+        async def _run():
+            manager.register("user1", "dev-A", ws_a)
+            fut = manager.create_pending_call("user1", "call-1")
+            # Replace connection — old future should be cancelled.
+            manager.register("user1", "dev-B", ws_b)
+            assert fut.cancelled()
+
+        loop.run_until_complete(_run())
+    finally:
+        loop.close()
+
+
+@pytest.mark.asyncio
+async def test_manager_send_frame(manager, mock_ws):
+    manager.register("user1", "dev-A", mock_ws)
+    await manager.send_frame("user1", {"type": "ping"})
+    mock_ws.send_text.assert_called_once_with(json.dumps({"type": "ping"}))
+
+
+@pytest.mark.asyncio
+async def test_manager_send_frame_raises_when_offline(manager):
+    with pytest.raises(RuntimeError, match="not connected"):
+        await manager.send_frame("ghost", {"type": "ping"})
+
+
+@pytest.mark.asyncio
+async def test_manager_pending_call_round_trip(manager, mock_ws):
+    manager.register("user1", "dev-A", mock_ws)
+    fut = manager.create_pending_call("user1", "call-42")
+    result = {"type": "tool_result", "id": "call-42", "rows": [{"id": "row1"}]}
+    manager.resolve_pending_call("user1", "call-42", result)
+    assert fut.done()
+    assert await fut == result
+
+
+@pytest.mark.asyncio
+async def test_manager_resolve_unknown_call_is_noop(manager, mock_ws):
+    manager.register("user1", "dev-A", mock_ws)
+    # Should not raise.
+    manager.resolve_pending_call("user1", "no-such-call", {})
+
+
+@pytest.mark.asyncio
+async def test_manager_unregister_cancels_pending_calls(manager, mock_ws):
+    manager.register("user1", "dev-A", mock_ws)
+    fut = manager.create_pending_call("user1", "call-1")
+    manager.unregister("user1")
+    assert fut.cancelled()
+
+
+@pytest.mark.asyncio
+async def test_manager_agent_data_queue(manager, mock_ws):
+    manager.register("user1", "dev-A", mock_ws)
+    q = manager.get_agent_data_queue("user1", "run-xyz")
+    # Put a frame and get it back.
+    frame = {"type": "agent_data", "run_id": "run-xyz", "files": []}
+    await q.put(frame)
+    assert await q.get() == frame
+
+
+@pytest.mark.asyncio
+async def test_manager_agent_data_queue_creates_once(manager, mock_ws):
+    manager.register("user1", "dev-A", mock_ws)
+    q1 = manager.get_agent_data_queue("user1", "run-1")
+    q2 = manager.get_agent_data_queue("user1", "run-1")
+    assert q1 is q2
+
+
+@pytest.mark.asyncio
+async def test_manager_agent_data_queue_raises_when_offline(manager):
+    with pytest.raises(RuntimeError, match="not connected"):
+        manager.get_agent_data_queue("ghost", "run-1")
+
+
+@pytest.mark.asyncio
+async def test_manager_cleanup_agent_data_queue(manager, mock_ws):
+    manager.register("user1", "dev-A", mock_ws)
+    q_old = manager.get_agent_data_queue("user1", "run-1")
+    manager.cleanup_agent_data_queue("user1", "run-1")
+    # After cleanup a new queue is created (not the same object).
+    q_new = manager.get_agent_data_queue("user1", "run-1")
+    assert q_new is not q_old
+
+
+# ---------------------------------------------------------------------------
+# Integration tests — /api/v1/ws/device endpoint
+# ---------------------------------------------------------------------------
+
+def test_ws_device_rejects_without_token(client):
+    with pytest.raises(Exception):
+        # TestClient will raise or close when the server rejects.
+        with client.websocket_connect("/api/v1/ws/device") as ws:
+            ws.receive_text()
+
+
+def test_ws_device_rejects_invalid_token(client):
+    with pytest.raises(Exception):
+        with client.websocket_connect("/api/v1/ws/device?token=badtoken") as ws:
+            ws.receive_text()
+
+
+def test_ws_device_happy_path(client):
+    """Connect, send device_hello, receive ping, then close."""
+    token = make_jwt(tier="free")
+
+    # Patch the heartbeat sleep so the test doesn't block 30 s.
+    with patch("app.api.routes.device_ws._HEARTBEAT_INTERVAL", 0.01):
+        with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+            ws.send_text(_device_hello("dev-001"))
+            # Next message from server should be a heartbeat ping (interval=0.01s).
+            msg = ws.receive_text()
+            data = json.loads(msg)
+            assert data["type"] == "ping"
+            # Close gracefully.
+            ws.close()
+
+
+def test_ws_device_invalid_first_frame_closes(client):
+    """Non-device_hello first frame should close the connection."""
+    token = make_jwt(tier="free")
+    with pytest.raises(Exception):
+        with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+            ws.send_text(json.dumps({"type": "chat_request", "message": "hi"}))
+            ws.receive_text()  # server should close after bad frame
+
+
+def test_ws_device_tool_result_dispatched(client):
+    """tool_result frame is routed to the DeviceConnectionManager."""
+    token = make_jwt(tier="free")
+    user_id = TEST_USER_IDS["free"]
+
+    from app.core.device_manager import device_manager as dm
+
+    captured: list[dict] = []
+
+    original_resolve = dm.resolve_pending_call
+
+    def _spy(uid, call_id, result):
+        captured.append({"uid": uid, "call_id": call_id, "result": result})
+        original_resolve(uid, call_id, result)
+
+    with patch.object(dm, "resolve_pending_call", side_effect=_spy):
+        with patch("app.api.routes.device_ws._HEARTBEAT_INTERVAL", 9999):
+            with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+                ws.send_text(_device_hello("dev-001"))
+                # Send a tool_result frame.
+                ws.send_text(
+                    json.dumps(
+                        {
+                            "type": "tool_result",
+                            "id": "call-123",
+                            "rows": [{"id": "task-1", "title": "Buy milk"}],
+                        }
+                    )
+                )
+                ws.close()
+
+    assert any(c["call_id"] == "call-123" for c in captured)
+
+
+def test_ws_device_agent_data_enqueued(client):
+    """agent_data frame is placed in the per-run queue by the message loop."""
+    from app.core.device_manager import device_manager as dm
+
+    token = make_jwt(tier="free")
+    user_id = TEST_USER_IDS["free"]
+
+    # Capture the queue object the message loop accesses.
+    captured_queue: list[asyncio.Queue] = []
+    original_get_queue = dm.get_agent_data_queue
+
+    def _spy_get_queue(uid, run_id):
+        q = original_get_queue(uid, run_id)
+        if not captured_queue:
+            captured_queue.append(q)
+        return q
+
+    with patch.object(dm, "get_agent_data_queue", side_effect=_spy_get_queue):
+        with patch("app.api.routes.device_ws._HEARTBEAT_INTERVAL", 9999):
+            with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+                ws.send_text(_device_hello("dev-001"))
+                ws.send_text(
+                    json.dumps(
+                        {
+                            "type": "agent_data",
+                            "run_id": "run-XYZ",
+                            "files": [{"path": "/tmp/file.txt", "content": "hello"}],
+                        }
+                    )
+                )
+                ws.close()
+
+    # The queue should have received at least one frame.
+    assert captured_queue, "queue was never accessed"
+    assert not captured_queue[0].empty()
+
+
+def test_ws_device_disconnect_marks_run_logs_as_error(client, db_session):
+    """On disconnect, _mark_runs_disconnected is called with the correct user_id."""
+    from app.api.routes import device_ws as _dws
+
+    token = make_jwt(tier="free")
+    user_id = TEST_USER_IDS["free"]
+
+    cleanup_calls: list[str] = []
+
+    async def _fake_cleanup(uid: str) -> None:
+        cleanup_calls.append(uid)
+
+    with patch.object(_dws, "_mark_runs_disconnected", side_effect=_fake_cleanup):
+        with patch("app.api.routes.device_ws._HEARTBEAT_INTERVAL", 9999):
+            with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+                ws.send_text(_device_hello("dev-001"))
+                ws.close()
+
+    assert user_id in cleanup_calls
+
+
+@pytest.mark.asyncio
+async def test_mark_runs_disconnected_updates_db(db_session):
+    """_mark_runs_disconnected marks in-progress runs as error in the DB."""
+    from sqlalchemy import select
+
+    from app.api.routes.device_ws import _mark_runs_disconnected
+    from tests.conftest import _TestSessionLocal
+
+    user_id = TEST_USER_IDS["free"]
+
+    run_log = AgentRunLog(
+        id=str(uuid.uuid4()),
+        agent_id=str(uuid.uuid4()),
+        agent_type="local",
+        user_id=user_id,
+        status="running",
+        started_at=datetime.now(timezone.utc),
+    )
+    db_session.add(run_log)
+    await db_session.commit()
+
+    # Route the function to the same test-DB session factory.
+    with patch("app.api.routes.device_ws.async_session", _TestSessionLocal):
+        await _mark_runs_disconnected(user_id)
+
+    # Verify through the same session factory.
+    async with _TestSessionLocal() as s:
+        result = await s.execute(
+            select(AgentRunLog).where(AgentRunLog.id == run_log.id)
+        )
+        updated = result.scalar_one_or_none()
+
+    assert updated is not None
+    assert updated.status == "error"
+    assert updated.errors and "device disconnected" in updated.errors
--- a/tests/test_execution_plan.py
+++ b/tests/test_execution_plan.py
@@ -1,286 +0,0 @@
-"""Tests for execution_plan: PromptTemplateRegistry, ExecutionPlanBuilder, PlanCache."""
-
-from __future__ import annotations
-
-import pytest
-
-from app.core.execution_plan import (
-    ExecutionPlanBuilder,
-    PlanCache,
-    PromptTemplateRegistry,
-    plan_cache,
-    template_registry,
-)
-from app.schemas import ExecutionPlan
-
-
-# ── PromptTemplateRegistry ────────────────────────────────────────────
-
-
-class TestPromptTemplateRegistry:
-    def test_register_and_get(self) -> None:
-        reg = PromptTemplateRegistry()
-        reg.register("tpl_foo", "You are a foo agent.")
-        assert reg.get("tpl_foo") == "You are a foo agent."
-
-    def test_get_unknown_raises_key_error(self) -> None:
-        reg = PromptTemplateRegistry()
-        with pytest.raises(KeyError, match="tpl_missing"):
-            reg.get("tpl_missing")
-
-    def test_has_returns_true_for_registered(self) -> None:
-        reg = PromptTemplateRegistry()
-        reg.register("tpl_x", "prompt text")
-        assert reg.has("tpl_x") is True
-
-    def test_has_returns_false_for_unregistered(self) -> None:
-        reg = PromptTemplateRegistry()
-        assert reg.has("tpl_missing") is False
-
-    def test_list_ids_returns_all_registered_ids(self) -> None:
-        reg = PromptTemplateRegistry()
-        reg.register("tpl_a", "a")
-        reg.register("tpl_b", "b")
-        assert set(reg.list_ids()) == {"tpl_a", "tpl_b"}
-
-    def test_list_ids_does_not_return_prompt_text(self) -> None:
-        reg = PromptTemplateRegistry()
-        reg.register("tpl_secret", "top secret prompt")
-        ids = reg.list_ids()
-        assert "top secret prompt" not in ids
-
-    def test_overwrite_existing_template(self) -> None:
-        reg = PromptTemplateRegistry()
-        reg.register("tpl_x", "v1")
-        reg.register("tpl_x", "v2")
-        assert reg.get("tpl_x") == "v2"
-
-    def test_empty_registry_has_no_ids(self) -> None:
-        reg = PromptTemplateRegistry()
-        assert reg.list_ids() == []
-
-
-# ── ExecutionPlanBuilder ──────────────────────────────────────────────
-
-
-class TestExecutionPlanBuilder:
-    def test_builds_empty_plan(self) -> None:
-        plan = ExecutionPlanBuilder("task_agent").build()
-        assert plan.agent == "task_agent"
-        assert plan.steps == []
-
-    def test_add_step_basic(self) -> None:
-        plan = (
-            ExecutionPlanBuilder("task_agent")
-            .add_step("create_task", {"priority": "high"})
-            .build()
-        )
-        assert len(plan.steps) == 1
-        assert plan.steps[0].action == "create_task"
-        assert plan.steps[0].variables == {"priority": "high"}
-        assert plan.steps[0].prompt_template is None
-        assert plan.steps[0].data_from_step is None
-
-    def test_add_step_no_params(self) -> None:
-        plan = ExecutionPlanBuilder("task_agent").add_step("fetch").build()
-        assert plan.steps[0].variables is None
-
-    def test_add_llm_step(self) -> None:
-        plan = (
-            ExecutionPlanBuilder("task_agent")
-            .add_llm_step("tpl_task_default", {"message": "hi"})
-            .build()
-        )
-        assert plan.steps[0].action == "llm"
-        assert plan.steps[0].prompt_template == "tpl_task_default"
-        assert plan.steps[0].variables == {"message": "hi"}
-
-    def test_add_llm_step_no_variables(self) -> None:
-        plan = ExecutionPlanBuilder("task_agent").add_llm_step("tpl_x").build()
-        assert plan.steps[0].variables is None
-
-    def test_add_data_step(self) -> None:
-        plan = (
-            ExecutionPlanBuilder("task_agent")
-            .add_step("fetch_data")
-            .add_data_step("transform", data_from_step=0)
-            .build()
-        )
-        assert plan.steps[1].action == "transform"
-        assert plan.steps[1].data_from_step == 0
-
-    def test_fluent_chaining_returns_builder(self) -> None:
-        builder = ExecutionPlanBuilder("analytics_agent")
-        result = builder.add_step("a")
-        assert result is builder
-
-    def test_fluent_chain_multiple_steps(self) -> None:
-        plan = (
-            ExecutionPlanBuilder("analytics_agent")
-            .add_llm_step("tpl_analytics_default")
-            .add_step("format_output")
-            .add_data_step("store", data_from_step=0)
-            .build()
-        )
-        assert len(plan.steps) == 3
-
-    def test_build_validates_data_from_step_out_of_range(self) -> None:
-        with pytest.raises(ValueError, match="data_from_step"):
-            ExecutionPlanBuilder("task_agent").add_data_step("bad", data_from_step=5).build()
-
-    def test_build_validates_data_from_step_self_reference(self) -> None:
-        """data_from_step=0 on the first step (index 0) is invalid."""
-        with pytest.raises(ValueError, match="data_from_step"):
-            ExecutionPlanBuilder("task_agent").add_data_step("bad", data_from_step=0).build()
-
-    def test_build_validates_data_from_step_negative(self) -> None:
-        with pytest.raises(ValueError, match="data_from_step"):
-            ExecutionPlanBuilder("task_agent").add_data_step("bad", data_from_step=-1).build()
-
-    def test_valid_data_from_step_at_index_two(self) -> None:
-        plan = (
-            ExecutionPlanBuilder("task_agent")
-            .add_step("step0")
-            .add_step("step1")
-            .add_data_step("step2", data_from_step=1)
-            .build()
-        )
-        assert plan.steps[2].data_from_step == 1
-
-    def test_data_from_step_zero_valid_at_index_one(self) -> None:
-        plan = (
-            ExecutionPlanBuilder("task_agent")
-            .add_step("step0")
-            .add_data_step("step1", data_from_step=0)
-            .build()
-        )
-        assert plan.steps[1].data_from_step == 0
-
-    def test_build_returns_new_plan_each_call(self) -> None:
-        builder = ExecutionPlanBuilder("task_agent").add_step("do_thing")
-        plan1 = builder.build()
-        plan2 = builder.build()
-        assert plan1 is not plan2
-        assert plan1.steps == plan2.steps
-
-    def test_plan_is_execution_plan_instance(self) -> None:
-        plan = ExecutionPlanBuilder("task_agent").build()
-        assert isinstance(plan, ExecutionPlan)
-
-
-# ── PlanCache ─────────────────────────────────────────────────────────
-
-
-class TestPlanCache:
-    def _plan(self, agent: str = "a") -> ExecutionPlan:
-        return ExecutionPlanBuilder(agent).build()
-
-    def test_cache_and_get(self) -> None:
-        cache = PlanCache()
-        plan = self._plan()
-        cache.cache_plan("key1", plan)
-        assert cache.get_plan("key1") is plan
-
-    def test_get_missing_returns_none(self) -> None:
-        cache = PlanCache()
-        assert cache.get_plan("nonexistent") is None
-
-    def test_get_all_playbooks_empty(self) -> None:
-        cache = PlanCache()
-        assert cache.get_all_playbooks() == []
-
-    def test_get_all_playbooks_returns_all_stored(self) -> None:
-        cache = PlanCache()
-        p1, p2 = self._plan("a"), self._plan("b")
-        cache.cache_plan("k1", p1)
-        cache.cache_plan("k2", p2)
-        playbooks = cache.get_all_playbooks()
-        assert len(playbooks) == 2
-        assert p1 in playbooks
-        assert p2 in playbooks
-
-    def test_lru_evicts_oldest_entry(self) -> None:
-        cache = PlanCache(maxsize=2)
-        p1, p2, p3 = self._plan("a"), self._plan("b"), self._plan("c")
-        cache.cache_plan("k1", p1)
-        cache.cache_plan("k2", p2)
-        cache.cache_plan("k3", p3)  # k1 should be evicted
-        assert cache.get_plan("k1") is None
-        assert cache.get_plan("k2") is p2
-        assert cache.get_plan("k3") is p3
-
-    def test_lru_access_updates_recency(self) -> None:
-        cache = PlanCache(maxsize=2)
-        p1, p2, p3 = self._plan("a"), self._plan("b"), self._plan("c")
-        cache.cache_plan("k1", p1)
-        cache.cache_plan("k2", p2)
-        cache.get_plan("k1")        # k1 is now most-recently used
-        cache.cache_plan("k3", p3)  # k2 should be evicted (LRU)
-        assert cache.get_plan("k1") is p1
-        assert cache.get_plan("k2") is None
-        assert cache.get_plan("k3") is p3
-
-    def test_overwrite_existing_key(self) -> None:
-        cache = PlanCache()
-        p1, p2 = self._plan("a"), self._plan("b")
-        cache.cache_plan("same_key", p1)
-        cache.cache_plan("same_key", p2)
-        assert cache.get_plan("same_key") is p2
-        assert len(cache.get_all_playbooks()) == 1
-
-    def test_overwrite_does_not_consume_capacity(self) -> None:
-        cache = PlanCache(maxsize=2)
-        p1, p2 = self._plan("a"), self._plan("b")
-        cache.cache_plan("k1", p1)
-        cache.cache_plan("k1", p2)  # overwrite, not a new slot
-        cache.cache_plan("k2", p1)  # should fit without eviction
-        assert cache.get_plan("k1") is p2
-        assert cache.get_plan("k2") is p1
-
-
-# ── Module-level singletons ───────────────────────────────────────────
-
-
-class TestModuleSingletons:
-    def test_template_registry_has_all_agent_defaults(self) -> None:
-        for agent in ("task_agent", "checkpoint_agent", "project_agent", "note_agent"):
-            assert template_registry.has(f"tpl_{agent}_default"), (
-                f"Missing template: tpl_{agent}_default"
-            )
-
-    def test_template_registry_has_operation_templates(self) -> None:
-        assert template_registry.has("tpl_task_extract_from_project")
-        assert template_registry.has("tpl_note_weekly_summary")
-
-    def test_template_registry_get_returns_non_empty_string(self) -> None:
-        text = template_registry.get("tpl_task_agent_default")
-        assert isinstance(text, str)
-        assert len(text) > 0
-
-    def test_plan_cache_has_prebuilt_playbooks(self) -> None:
-        assert len(plan_cache.get_all_playbooks()) >= 2
-
-    def test_playbook_create_tasks_from_project(self) -> None:
-        plan = plan_cache.get_plan("create_tasks_from_project")
-        assert plan is not None
-        assert plan.agent == "project_agent"
-        assert len(plan.steps) == 2
-        assert plan.steps[0].prompt_template == "tpl_task_extract_from_project"
-        assert plan.steps[1].data_from_step == 0
-
-    def test_playbook_generate_weekly_note(self) -> None:
-        plan = plan_cache.get_plan("generate_weekly_note")
-        assert plan is not None
-        assert plan.agent == "note_agent"
-        assert len(plan.steps) == 2
-        assert plan.steps[0].prompt_template == "tpl_note_weekly_summary"
-        assert plan.steps[1].data_from_step == 0
-
-    def test_playbook_steps_have_no_raw_prompt_text(self) -> None:
-        """Plans must not embed prompt text — only template IDs."""
-        for plan in plan_cache.get_all_playbooks():
-            for step in plan.steps:
-                if step.prompt_template is not None:
-                    assert step.prompt_template.startswith("tpl_"), (
-                        f"prompt_template looks like raw text: {step.prompt_template!r}"
-                    )
--- a/tests/test_integrations.py
+++ b/tests/test_integrations.py
@@ -0,0 +1,729 @@
+"""Tests for Step 3.6: cloud provider integration clients.
+
+Coverage:
+  Unit - app/integrations/__init__.py:
+    - encrypt_token / decrypt_token round-trip
+    - decrypt_token raises ValueError on invalid ciphertext
+    - encrypt_token raises ValueError on empty/non-dict input
+    - _get_fernet raises RuntimeError when OAUTH_ENCRYPTION_KEY not set
+    - get_provider returns GmailClient for 'gmail'
+    - get_provider returns MSGraphClient for 'outlook' and 'teams'
+    - get_provider raises ValueError for unknown provider
+
+  Unit - app/integrations/gmail.py:
+    - _build_gmail_query with no filter returns empty string
+    - _build_gmail_query with labels builds label: expr
+    - _build_gmail_query with senders builds from: expr
+    - _build_gmail_query with date_range builds after:/before: exprs
+    - _build_gmail_query since overrides date_range.from when more recent
+    - _build_gmail_query date_range.from overrides since when more recent
+    - _parse_body extracts text/plain part
+    - _parse_body extracts text/html part (stripped)
+    - _parse_body recurses into multipart, prefers text/plain
+    - GmailClient.fetch_messages: happy path with mocked service
+    - GmailClient.fetch_messages: no messages returns empty list
+    - GmailClient.fetch_messages: HTTP error on messages.list raises RuntimeError
+    - GmailClient.refreshed_credentials: None when token unchanged
+    - GmailClient.refreshed_credentials: returns dict when token changes
+
+  Unit - app/integrations/ms_graph.py:
+    - _build_email_filter with no filter returns empty string
+    - _build_email_filter with senders builds OData from clause
+    - _build_email_filter with since builds receivedDateTime ge clause
+    - MSGraphClient.fetch_emails: happy path with mocked httpx
+    - MSGraphClient.fetch_emails: 401 triggers token refresh and retries
+    - MSGraphClient.fetch_messages: happy path with mocked httpx
+    - MSGraphClient.fetch_messages: 403 from getAllMessages degrades gracefully
+    - MSGraphClient.refreshed_credentials: None when token unchanged
+    - MSGraphClient._refresh_access_token: MSAL error raises RuntimeError
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import uuid
+from datetime import datetime, timezone
+from unittest.mock import AsyncMock, MagicMock, Mock, PropertyMock, patch
+
+import pytest
+
+from app.integrations import (
+    ChatMessage,
+    EmailMessage,
+    decrypt_token,
+    encrypt_token,
+    get_provider,
+)
+
+# ───────────────────────────────────────────────────────────────────────────
+# Helpers
+# ───────────────────────────────────────────────────────────────────────────
+
+_FERNET_KEY = "eW91LXNob3VsZC1ub3QtdXNlLXRoaXMta2V5LWluLXByb2Q="
+# ^ Placeholder only: URL-safe base64 that decodes to 35 bytes, not the 32 bytes
+#   a Fernet key requires, so a proper key is generated below.
+
+from cryptography.fernet import Fernet as _Fernet  # noqa: E402
+
+_VALID_KEY = _Fernet.generate_key().decode("utf-8")
+
+_TOKEN_DICT = {
+    "token": "access_abc",
+    "refresh_token": "refresh_xyz",
+    "token_uri": "https://oauth2.googleapis.com/token",
+    "client_id": "client_id_123",
+    "client_secret": "client_secret_456",
+    "scopes": ["https://www.googleapis.com/auth/gmail.readonly"],
+}
+
+_MS_TOKEN_DICT = {
+    "access_token": "ms_access_abc",
+    "refresh_token": "ms_refresh_xyz",
+    "token_type": "Bearer",
+    "scope": "Mail.Read offline_access",
+}
+
+
+# ───────────────────────────────────────────────────────────────────────────
+# encrypt_token / decrypt_token
+# ───────────────────────────────────────────────────────────────────────────
+
+
+class TestTokenEncryption:
+    """encrypt_token / decrypt_token tests, plus EmailMessage / ChatMessage as_text checks."""
+
+    def test_round_trip(self):
+        with patch("app.integrations.settings") as mock_settings:
+            mock_settings.OAUTH_ENCRYPTION_KEY = _VALID_KEY
+            encrypted = encrypt_token(_TOKEN_DICT)
+            assert isinstance(encrypted, str)
+            assert encrypted != json.dumps(_TOKEN_DICT)  # must be ciphertext, not plaintext
+            recovered = decrypt_token(encrypted)
+            assert recovered == _TOKEN_DICT
+
+    def test_decrypt_invalid_ciphertext_raises_value_error(self):
+        with patch("app.integrations.settings") as mock_settings:
+            mock_settings.OAUTH_ENCRYPTION_KEY = _VALID_KEY
+            with pytest.raises(ValueError, match="Failed to decrypt"):
+                decrypt_token("this-is-not-valid-fernet-ciphertext")
+
+    def test_decrypt_wrong_key_raises_value_error(self):
+        """Decrypting with a different key must fail with ValueError."""
+        other_key = _Fernet.generate_key().decode("utf-8")
+        with patch("app.integrations.settings") as mock_settings:
+            mock_settings.OAUTH_ENCRYPTION_KEY = _VALID_KEY
+            encrypted = encrypt_token(_TOKEN_DICT)
+        with patch("app.integrations.settings") as mock_settings2:
+            mock_settings2.OAUTH_ENCRYPTION_KEY = other_key
+            with pytest.raises(ValueError, match="Failed to decrypt"):
+                decrypt_token(encrypted)
+
+    def test_encrypt_empty_dict_raises_value_error(self):
+        with patch("app.integrations.settings") as mock_settings:
+            mock_settings.OAUTH_ENCRYPTION_KEY = _VALID_KEY
+            with pytest.raises(ValueError, match="non-empty dict"):
+                encrypt_token({})
+
+    def test_encrypt_non_dict_raises_value_error(self):
+        with patch("app.integrations.settings") as mock_settings:
+            mock_settings.OAUTH_ENCRYPTION_KEY = _VALID_KEY
+            with pytest.raises(ValueError, match="non-empty dict"):
+                encrypt_token("not-a-dict")  # type: ignore[arg-type]
+
+    def test_missing_key_raises_runtime_error(self):
+        with patch("app.integrations.settings") as mock_settings:
+            mock_settings.OAUTH_ENCRYPTION_KEY = ""
+            with pytest.raises(RuntimeError, match="OAUTH_ENCRYPTION_KEY"):
+                encrypt_token(_TOKEN_DICT)
+
+    def test_email_message_as_text(self):
+        msg = EmailMessage(
+            id="m1",
+            subject="Hello",
+            sender="alice@example.com",
+            body_text="Test body",
+            date=datetime(2025, 6, 1, 10, 0, tzinfo=timezone.utc),
+        )
+        text = msg.as_text
+        assert "From: alice@example.com" in text
+        assert "Subject: Hello" in text
+        assert "Test body" in text
+
+    def test_chat_message_as_text(self):
+        msg = ChatMessage(
+            id="c1",
+            content="Buy milk",
+            sender="bob",
+            channel="general",
+            date=datetime(2025, 6, 1, 10, 0, tzinfo=timezone.utc),
+        )
+        text = msg.as_text
+        assert "From: bob" in text
+        assert "channel: general" in text
+        assert "Buy milk" in text
+
+
+# ───────────────────────────────────────────────────────────────────────────
+# get_provider factory
+# ───────────────────────────────────────────────────────────────────────────
+
+
+class TestGetProvider:
+    def test_gmail_returns_gmail_client(self):
+        from app.integrations.gmail import GmailClient
+
+        client = get_provider("gmail", _TOKEN_DICT)
+        assert isinstance(client, GmailClient)
+
+    def test_outlook_returns_ms_graph_client(self):
+        from app.integrations.ms_graph import MSGraphClient
+
+        client = get_provider("outlook", _MS_TOKEN_DICT)
+        assert isinstance(client, MSGraphClient)
+
+    def test_teams_returns_ms_graph_client(self):
+        from app.integrations.ms_graph import MSGraphClient
+
+        client = get_provider("teams", _MS_TOKEN_DICT)
+        assert isinstance(client, MSGraphClient)
+
+    def test_unknown_provider_raises_value_error(self):
+        with pytest.raises(ValueError, match="Unknown cloud provider"):
+            get_provider("slack", {})
+
+
+# ───────────────────────────────────────────────────────────────────────────
+# Gmail client - query builder
+# ───────────────────────────────────────────────────────────────────────────
+
+
+class TestBuildGmailQuery:
+    """Unit tests for gmail._build_gmail_query."""
+
+    def setup_method(self):
+        from app.integrations.gmail import _build_gmail_query
+        self._fn = _build_gmail_query
+
+    def test_empty_returns_empty_string(self):
+        assert self._fn(None, None) == ""
+
+    def test_single_label(self):
+        q = self._fn({"labels": ["INBOX"]}, None)
+        assert "label:INBOX" in q
+
+    def test_multiple_labels_joined_with_or(self):
+        q = self._fn({"labels": ["INBOX", "work"]}, None)
+        assert "label:INBOX OR label:work" in q
+
+    def test_senders(self):
+        q = self._fn({"senders": ["alice@example.com"]}, None)
+        assert "from:alice@example.com" in q
+
+    def test_date_range_from(self):
+        q = self._fn({"date_range": {"from": "2025-01-15"}}, None)
+        assert "after:2025/01/15" in q
+
+    def test_date_range_to(self):
+        q = self._fn({"date_range": {"to": "2025-03-01"}}, None)
+        assert "before:2025/03/01" in q
+
+    def test_since_overrides_earlier_date_range_from(self):
+        """since=Feb is more recent than date_range.from=Jan, so after: should be Feb."""
+        since = datetime(2025, 2, 1, tzinfo=timezone.utc)
+        q = self._fn({"date_range": {"from": "2025-01-01"}}, since)
+        assert "after:2025/02/01" in q
+        assert "after:2025/01/01" not in q
+
+    def test_date_range_from_overrides_earlier_since(self):
+        """date_range.from=Feb is more recent than since=Jan, so after: should be Feb."""
+        since = datetime(2025, 1, 1, tzinfo=timezone.utc)
+        q = self._fn({"date_range": {"from": "2025-02-01"}}, since)
+        assert "after:2025/02/01" in q
+
+    def test_invalid_date_ignored(self):
+        """An invalid date string in filter_config must not raise, just be skipped."""
+        q = self._fn({"date_range": {"from": "not-a-date"}}, None)
+        assert "after:" not in q
+
+
+# ───────────────────────────────────────────────────────────────────────────
+# Gmail client - body parsing
+# ───────────────────────────────────────────────────────────────────────────
+
+
+class TestParseBody:
+    """Unit tests for gmail._parse_body."""
+
+    def setup_method(self):
+        from app.integrations.gmail import _parse_body
+        self._fn = _parse_body
+
+    def _encode(self, text: str) -> str:
+        import base64
+        return base64.urlsafe_b64encode(text.encode()).decode()
+
+    def test_text_plain_extracted(self):
+        payload = {
+            "mimeType": "text/plain",
+            "body": {"data": self._encode("Hello world")},
+        }
+        assert self._fn(payload) == "Hello world"
+
+    def test_text_html_stripped(self):
+        payload = {
+            "mimeType": "text/html",
+            "body": {"data": self._encode("<p>Hello <b>world</b></p>")},
+        }
+        result = self._fn(payload)
+        assert "Hello" in result
+        assert "<p>" not in result
+
+    def test_multipart_prefers_plain_over_html(self):
+        plain_data = self._encode("Plain text")
+        html_data = self._encode("<p>HTML text</p>")
+        payload = {
+            "mimeType": "multipart/alternative",
+            "body": {},
+            "parts": [
+                {"mimeType": "text/html", "body": {"data": html_data}},
+                {"mimeType": "text/plain", "body": {"data": plain_data}},
+            ],
+        }
+        result = self._fn(payload)
+        assert result == "Plain text"
+
+    def test_empty_payload_returns_empty_string(self):
+        assert self._fn({}) == ""
+
+
+# ───────────────────────────────────────────────────────────────────────────
+# GmailClient.fetch_messages
+# ───────────────────────────────────────────────────────────────────────────
+
+
+def _make_gmail_message(
+    msg_id: str = "msg001",
+    subject: str = "Test email",
+    sender: str = "alice@example.com",
+    body_text: str = "Hello world",
+    date: str = "Wed, 01 Jan 2025 10:00:00 +0000",
+) -> dict:
+    """Build a minimal Gmail API message response dict."""
+    import base64
+    body_data = base64.urlsafe_b64encode(body_text.encode()).decode()
+    return {
+        "id": msg_id,
+        "labelIds": ["INBOX"],
+        "payload": {
+            "mimeType": "text/plain",
+            "headers": [
+                {"name": "Subject", "value": subject},
+                {"name": "From", "value": sender},
+                {"name": "Date", "value": date},
+            ],
+            "body": {"data": body_data},
+        },
+    }
+
+
+class TestGmailClientFetchMessages:
+    """GmailClient.fetch_messages tests with mocked Google API."""
+
+    def _make_client(self) -> "GmailClient":
+        from app.integrations.gmail import GmailClient
+        return GmailClient(_TOKEN_DICT)
+
+    @pytest.mark.asyncio
+    async def test_happy_path_returns_email_messages(self):
+        client = self._make_client()
+        msg = _make_gmail_message()
+
+        mock_service = MagicMock()
+        mock_users = mock_service.users.return_value
+        mock_messages = mock_users.messages.return_value
+        mock_messages.list.return_value.execute.return_value = {
+            "messages": [{"id": "msg001"}]
+        }
+        mock_messages.get.return_value.execute.return_value = msg
+
+        with patch("app.integrations.gmail.asyncio.to_thread") as mock_thread:
+            # Simulate to_thread running the sync function and returning results.
+            async def fake_to_thread(fn, *args, **kwargs):
+                return fn(*args, **kwargs)
+            mock_thread.side_effect = fake_to_thread
+
+            with patch("googleapiclient.discovery.build", return_value=mock_service), \
+                 patch("google.auth.transport.requests.Request"), \
+                 patch.object(type(client._credentials), "expired", new_callable=PropertyMock, return_value=False):
+                results = await client.fetch_messages()
+
+        assert len(results) == 1
+        assert results[0].subject == "Test email"
+        assert results[0].sender == "alice@example.com"
+        assert results[0].body_text == "Hello world"
+
+    @pytest.mark.asyncio
+    async def test_no_messages_returns_empty_list(self):
+        client = self._make_client()
+
+        mock_service = MagicMock()
+        mock_users = mock_service.users.return_value
+        mock_messages = mock_users.messages.return_value
+        mock_messages.list.return_value.execute.return_value = {"messages": []}
+
+        with patch("app.integrations.gmail.asyncio.to_thread") as mock_thread:
+            async def fake_to_thread(fn, *args, **kwargs):
+                return fn(*args, **kwargs)
+            mock_thread.side_effect = fake_to_thread
+
+            with patch("googleapiclient.discovery.build", return_value=mock_service), \
+                 patch("google.auth.transport.requests.Request"), \
+                 patch.object(type(client._credentials), "expired", new_callable=PropertyMock, return_value=False):
+                results = await client.fetch_messages()
+
+        assert results == []
+
+    @pytest.mark.asyncio
+    async def test_list_http_error_raises_runtime_error(self):
+        import googleapiclient.errors
+        client = self._make_client()
+
+        mock_service = MagicMock()
+        mock_users = mock_service.users.return_value
+        mock_messages = mock_users.messages.return_value
+        mock_resp = MagicMock()
+        mock_resp.status = 403
+        mock_resp.reason = "Forbidden"
+        mock_messages.list.return_value.execute.side_effect = (
+            googleapiclient.errors.HttpError(mock_resp, b"Forbidden")
+        )
+
+        with patch("app.integrations.gmail.asyncio.to_thread") as mock_thread:
+            async def fake_to_thread(fn, *args, **kwargs):
+                return fn(*args, **kwargs)
+            mock_thread.side_effect = fake_to_thread
+
+            with patch("googleapiclient.discovery.build", return_value=mock_service), \
+                 patch("google.auth.transport.requests.Request"), \
+                 patch.object(type(client._credentials), "expired", new_callable=PropertyMock, return_value=False):
+                with pytest.raises(RuntimeError, match="Gmail messages.list failed"):
+                    await client.fetch_messages()
+
+    def test_refreshed_credentials_none_when_unchanged(self):
+        client = self._make_client()
+        # Token unchanged — should return None.
+        assert client.refreshed_credentials is None
+
+    def test_refreshed_credentials_returns_dict_when_token_changes(self):
+        client = self._make_client()
+        # Simulate a token refresh by changing the access token on the credentials object.
+        client._credentials.token = "new_access_token_xyz"
+        refreshed = client.refreshed_credentials
+        assert refreshed is not None
+        assert refreshed["token"] == "new_access_token_xyz"
+
+
+# ───────────────────────────────────────────────────────────────────────────
+# MS Graph client - email filter builder
+# ───────────────────────────────────────────────────────────────────────────
+
+
+class TestBuildEmailFilter:
+    """Unit tests for ms_graph._build_email_filter."""
+
+    def setup_method(self):
+        from app.integrations.ms_graph import _build_email_filter
+        self._fn = _build_email_filter
+
+    def test_empty_returns_empty_string(self):
+        assert self._fn(None, None) == ""
+
+    def test_single_sender(self):
+        result = self._fn({"senders": ["alice@example.com"]}, None)
+        assert "from/emailAddress/address eq 'alice@example.com'" in result
+
+    def test_multiple_senders_joined_with_or(self):
+        result = self._fn({"senders": ["a@x.com", "b@x.com"]}, None)
+        assert " or " in result
+        assert "a@x.com" in result
+        assert "b@x.com" in result
+
+    def test_since_adds_received_date_ge_clause(self):
+        since = datetime(2025, 3, 1, tzinfo=timezone.utc)
+        result = self._fn(None, since)
+        assert "receivedDateTime ge 2025-03-01T00:00:00Z" in result
+
+    def test_date_range_to_adds_received_date_le_clause(self):
+        result = self._fn({"date_range": {"to": "2025-06-30"}}, None)
+        assert "receivedDateTime le" in result
+
+    def test_since_overrides_earlier_date_range_from(self):
+        since = datetime(2025, 2, 1, tzinfo=timezone.utc)
+        result = self._fn({"date_range": {"from": "2025-01-01"}}, since)
+        assert "2025-02-01T00:00:00Z" in result
+        assert "2025-01-01" not in result
+
+    def test_invalid_date_ignored(self):
+        result = self._fn({"date_range": {"from": "bad-date"}}, None)
+        assert "receivedDateTime" not in result
+
+
+# ───────────────────────────────────────────────────────────────────────────
+# MSGraphClient.fetch_emails
+# ───────────────────────────────────────────────────────────────────────────
+
+
+def _make_graph_email(
+    msg_id: str = "email001",
+    subject: str = "Meeting tomorrow",
+    sender_address: str = "boss@company.com",
+    body_content: str = "Please prepare the report.",
+    received: str = "2025-06-01T10:00:00Z",
+) -> dict:
+    """Build a minimal MS Graph message item dict."""
+    return {
+        "id": msg_id,
+        "subject": subject,
+        "from": {"emailAddress": {"address": sender_address}},
+        "receivedDateTime": received,
+        "body": {"contentType": "text", "content": body_content},
+        "bodyPreview": body_content[:100],
+    }
+
+
+def _make_graph_teams_message(
+    msg_id: str = "teams001",
+    content: str = "Stand-up at 9am",
+    sender: str = "alice",
+    channel_id: str = "chan001",
+    created: str = "2025-06-01T08:00:00Z",
+) -> dict:
+    return {
+        "id": msg_id,
+        "body": {"contentType": "text", "content": content},
+        "from": {"user": {"displayName": sender}},
+        "channelIdentity": {"channelId": channel_id},
+        "createdDateTime": created,
+    }
+
+
+class TestMSGraphClientFetchEmails:
+    """MSGraphClient.fetch_emails tests with mocked httpx."""
+
+    def _make_client(self) -> "MSGraphClient":
+        from app.integrations.ms_graph import MSGraphClient
+        return MSGraphClient(_MS_TOKEN_DICT)
+
+    @pytest.mark.asyncio
+    async def test_happy_path_returns_email_messages(self):
+        client = self._make_client()
+        graph_email = _make_graph_email()
+
+        mock_response = MagicMock()
+        mock_response.status_code = 200
+        mock_response.json.return_value = {"value": [graph_email]}
+        mock_response.raise_for_status = MagicMock()
+
+        with patch("app.integrations.ms_graph.httpx.AsyncClient") as mock_client_cls:
+            mock_http = AsyncMock()
+            mock_http.get = AsyncMock(return_value=mock_response)
+            mock_client_cls.return_value.__aenter__ = AsyncMock(return_value=mock_http)
+            mock_client_cls.return_value.__aexit__ = AsyncMock(return_value=False)
+
+            results = await client.fetch_emails()
+
+        assert len(results) == 1
+        assert results[0].subject == "Meeting tomorrow"
+        assert results[0].sender == "boss@company.com"
+        assert results[0].body_text == "Please prepare the report."
+
+    @pytest.mark.asyncio
+    async def test_pagination_stops_at_max_emails(self):
+        """With no @odata.nextLink on the first page, only that single batch is returned."""
+        client = self._make_client()
+        emails_batch = [_make_graph_email(msg_id=str(i)) for i in range(3)]
+
+        mock_response = MagicMock()
+        mock_response.status_code = 200
+        mock_response.json.return_value = {"value": emails_batch}  # no @odata.nextLink
+        mock_response.raise_for_status = MagicMock()
+
+        with patch("app.integrations.ms_graph.httpx.AsyncClient") as mock_client_cls:
+            mock_http = AsyncMock()
+            mock_http.get = AsyncMock(return_value=mock_response)
+            mock_client_cls.return_value.__aenter__ = AsyncMock(return_value=mock_http)
+            mock_client_cls.return_value.__aexit__ = AsyncMock(return_value=False)
+
+            results = await client.fetch_emails()
+
+        assert len(results) == 3
+
+    @pytest.mark.asyncio
+    async def test_401_triggers_token_refresh_and_retries(self):
+        """On first 401, token refresh is attempted and the request retried."""
+        from app.integrations.ms_graph import MSGraphClient
+        client = MSGraphClient(_MS_TOKEN_DICT)
+
+        graph_email = _make_graph_email()
+
+        response_401 = MagicMock()
+        response_401.status_code = 401
+
+        response_200 = MagicMock()
+        response_200.status_code = 200
+        response_200.json.return_value = {"value": [graph_email]}
+        response_200.raise_for_status = MagicMock()
+
+        call_count = 0
+
+        async def fake_get(url, params=None, headers=None):
+            nonlocal call_count
+            call_count += 1
+            if call_count == 1:
+                return response_401
+            return response_200
+
+        with patch("app.integrations.ms_graph.httpx.AsyncClient") as mock_client_cls, \
+             patch.object(client, "_refresh_access_token", new_callable=AsyncMock) as mock_refresh:
+            mock_http = AsyncMock()
+            mock_http.get = fake_get
+            mock_client_cls.return_value.__aenter__ = AsyncMock(return_value=mock_http)
+            mock_client_cls.return_value.__aexit__ = AsyncMock(return_value=False)
+
+            results = await client.fetch_emails()
+
+        mock_refresh.assert_called_once()
+        assert len(results) == 1
+
+    def test_refreshed_credentials_none_when_token_unchanged(self):
+        client = self._make_client()
+        assert client.refreshed_credentials is None
+
+    def test_refreshed_credentials_returns_dict_when_token_changes(self):
+        client = self._make_client()
+        client._access_token = "new_token_abc"
+        assert client.refreshed_credentials is not None
+        assert client.refreshed_credentials["access_token"] == "new_token_abc"
+
+
+class TestMSGraphClientFetchMessages:
+    """MSGraphClient.fetch_messages (Teams) tests."""
+
+    def _make_client(self) -> "MSGraphClient":
+        from app.integrations.ms_graph import MSGraphClient
+        return MSGraphClient(_MS_TOKEN_DICT)
+
+    @pytest.mark.asyncio
+    async def test_happy_path_returns_chat_messages(self):
+        client = self._make_client()
+        teams_msg = _make_graph_teams_message()
+
+        mock_response = MagicMock()
+        mock_response.status_code = 200
+        mock_response.json.return_value = {"value": [teams_msg]}
+        mock_response.raise_for_status = MagicMock()
+
+        with patch("app.integrations.ms_graph.httpx.AsyncClient") as mock_client_cls:
+            mock_http = AsyncMock()
+            mock_http.get = AsyncMock(return_value=mock_response)
+            mock_client_cls.return_value.__aenter__ = AsyncMock(return_value=mock_http)
+            mock_client_cls.return_value.__aexit__ = AsyncMock(return_value=False)
+
+            results = await client.fetch_messages()
+
+        assert len(results) == 1
+        assert results[0].content == "Stand-up at 9am"
+        assert results[0].sender == "alice"
+
+    @pytest.mark.asyncio
+    async def test_403_degrades_gracefully(self):
+        """getAllMessages returning 403 (license issue) returns empty list, no exception."""
+        import httpx as _httpx
+
+        client = self._make_client()
+
+        error_response = MagicMock()
+        error_response.status_code = 403
+        http_error = _httpx.HTTPStatusError(
+            "Forbidden", request=MagicMock(), response=error_response
+        )
+
+        with patch("app.integrations.ms_graph.httpx.AsyncClient") as mock_client_cls:
+            mock_http = AsyncMock()
+            mock_http.get = AsyncMock(side_effect=http_error)
+            mock_client_cls.return_value.__aenter__ = AsyncMock(return_value=mock_http)
+            mock_client_cls.return_value.__aexit__ = AsyncMock(return_value=False)
+
+            results = await client.fetch_messages()
+
+        assert results == []
+
+    @pytest.mark.asyncio
+    async def test_channel_filter_applied(self):
+        """Messages from non-matching channels are filtered out."""
+        client = self._make_client()
+        matching = _make_graph_teams_message(channel_id="dev-channel", content="Deploy today")
+        non_matching = _make_graph_teams_message(msg_id="t2", channel_id="random", content="Lunch?")
+
+        mock_response = MagicMock()
+        mock_response.status_code = 200
+        mock_response.json.return_value = {"value": [matching, non_matching]}
+        mock_response.raise_for_status = MagicMock()
+
+        with patch("app.integrations.ms_graph.httpx.AsyncClient") as mock_client_cls:
+            mock_http = AsyncMock()
+            mock_http.get = AsyncMock(return_value=mock_response)
+            mock_client_cls.return_value.__aenter__ = AsyncMock(return_value=mock_http)
+            mock_client_cls.return_value.__aexit__ = AsyncMock(return_value=False)
+
+            results = await client.fetch_messages(
+                filter_config={"channels": ["dev-channel"]}
+            )
+
+        assert len(results) == 1
+        assert results[0].content == "Deploy today"
+
+
+class TestMSGraphClientRefreshToken:
+    """MSGraphClient._refresh_access_token with mocked MSAL."""
+
+    @pytest.mark.asyncio
+    async def test_msal_error_raises_runtime_error(self):
+        from app.integrations.ms_graph import MSGraphClient
+        client = MSGraphClient({**_MS_TOKEN_DICT, "refresh_token": "rt_test"})
+
+        mock_app = MagicMock()
+        mock_app.acquire_token_by_refresh_token.return_value = {
+            "error": "invalid_grant",
+            "error_description": "Refresh token expired",
+        }
+
+        with patch("msal.ConfidentialClientApplication", return_value=mock_app), \
+             patch("app.integrations.ms_graph.settings") as mock_settings:
+            mock_settings.MS_CLIENT_ID = "client_id"
+            mock_settings.MS_CLIENT_SECRET = "secret"
+            mock_settings.MS_TENANT_ID = "common"
+            with pytest.raises(RuntimeError, match="MS Graph token refresh failed"):
+                await client._refresh_access_token()
+
+    @pytest.mark.asyncio
+    async def test_successful_refresh_updates_access_token(self):
+        from app.integrations.ms_graph import MSGraphClient
+        client = MSGraphClient({**_MS_TOKEN_DICT, "refresh_token": "rt_old"})
+
+        mock_app = MagicMock()
+        mock_app.acquire_token_by_refresh_token.return_value = {
+            "access_token": "new_access_token",
+            "refresh_token": "new_refresh_token",
+        }
+
+        with patch("msal.ConfidentialClientApplication", return_value=mock_app), \
+             patch("app.integrations.ms_graph.settings") as mock_settings:
+            mock_settings.MS_CLIENT_ID = "client_id"
+            mock_settings.MS_CLIENT_SECRET = "secret"
+            mock_settings.MS_TENANT_ID = "common"
+            await client._refresh_access_token()
+
+        assert client._access_token == "new_access_token"
+        assert client._refresh_token == "new_refresh_token"
--- a/tests/test_journey_v2.py
+++ b/tests/test_journey_v2.py
@@ -0,0 +1,349 @@
+"""Tests for Local Agent V2 journey setup (Step 4).
+
+Covers the chatbot journey that produces a structured AgentConfig JSON
+instead of a freeform prompt_template string.
+
+Unit tests (no LLM)
+--------------------
+  4.6a  _extract_agent_config: valid JSON → returns serialised config
+  4.6b  _extract_agent_config: invalid JSON → returns None
+  4.6c  _extract_agent_config: markers absent → returns None
+  4.6d  _extract_agent_config: only START marker → returns None
+  4.6e  Session not found → done=True, agent_config=None
+  4.6f  Nudge uses AGENT_CONFIG_START/END markers (not old PROMPT_TEMPLATE)
+
+Eval tests (real LLM + Langfuse scoring)
+-----------------------------------------
+Cases are defined in tests/fixtures/journey_v2/cases.yaml.
+Email HTML files live in tests/fixtures/journey_v2/data/.
+Use --journey-dir to point at a custom folder (same structure required).
+
+Run:
+    pytest tests/test_journey_v2.py -v
+    pytest tests/test_journey_v2.py -v -k "4_6"          # unit only
+    pytest tests/test_journey_v2.py -v -k "eval"          # LLM evals only
+    pytest tests/test_journey_v2.py -v --journey-dir /p   # custom fixtures
+"""
+
+from __future__ import annotations
+
+import uuid
+from contextlib import nullcontext
+from pathlib import Path
+from typing import Any
+from unittest.mock import patch
+
+import pytest
+import yaml
+
+from app.api.routes.agent_setup import (
+    _CONFIG_END,
+    _CONFIG_START,
+    _MAX_TURNS,
+    _extract_agent_config,
+    _sessions,
+    handle_journey_message,
+    handle_journey_start,
+)
+from app.core.langfuse_client import get_langfuse
+from app.core.ws_context import clear_client_executor, set_client_executor
+from app.schemas import AgentConfig
+from tests.conftest import TEST_USER_IDS
+
+# ── Constants ─────────────────────────────────────────────────────────────
+
+_USER_ID = TEST_USER_IDS["power"]
+
+_DEFAULT_FIXTURE_DIR = Path(__file__).parent / "fixtures" / "journey_v2"
+
+# ── Fixture loading ───────────────────────────────────────────────────────
+
+
+def _fixtures_dir(config) -> Path:
+    override = config.getoption("--journey-dir")
+    return Path(override) if override else _DEFAULT_FIXTURE_DIR
+
+
+def _load_cases(config) -> list[dict]:
+    return yaml.safe_load(
+        (_fixtures_dir(config) / "cases.yaml").read_text(encoding="utf-8")
+    )
+
+
+def _read_data_file(filename: str, fixtures_dir: Path) -> str:
+    return (fixtures_dir / "data" / filename).read_text(encoding="utf-8")
+
+
+# ── pytest_generate_tests ─────────────────────────────────────────────────
+
+
+def pytest_generate_tests(metafunc):
+    if "journey_case" not in metafunc.fixturenames:
+        return
+    cases = _load_cases(metafunc.config)
+    metafunc.parametrize("journey_case", cases, ids=[c["id"] for c in cases])
+
+
+# ── Executor builder ──────────────────────────────────────────────────────
+
+
+def _make_fs_executor(directory_files: list[dict], fixtures_dir: Path):
+    """Return an async callback that simulates filesystem tool responses.
+
+    Matches the signature expected by ``set_client_executor`` / ``execute_on_client``:
+    receives the full ``payload`` dict and returns a result dict.
+
+    ``directory_files`` is a list of ``{path, content_file}`` dicts;
+    ``content_file`` is relative to ``fixtures_dir/data/``.
+    """
+    file_map: dict[str, str] = {
+        entry["path"]: _read_data_file(entry["content_file"], fixtures_dir)
+        for entry in directory_files
+    }
+
+    async def _executor(payload: dict) -> dict:
+        action = payload.get("action", "")
+        data = payload.get("data") or {}
+
+        if action == "list_directory":
+            return {"entries": [
+                {"type": "file", "name": p.split("/")[-1], "path": p}
+                for p in file_map
+            ]}
+
+        if action == "read_file_content":
+            path = data.get("path", "")
+            return {"content": file_map.get(path, "")}
+
+        if action == "get_file_metadata":
+            path = data.get("path", "")
+            name = path.split("/")[-1]
+            ext = "." + name.rsplit(".", 1)[-1] if "." in name else ""
+            return {"name": name, "extension": ext, "size": 1024,
+                    "createdAt": None, "modifiedAt": None}
+
+        return {}
+
+    return _executor
+
+
+# ── Journey runner helper ─────────────────────────────────────────────────
+
+
+async def _run_journey(user_id: str, case: dict, executor) -> dict[str, Any]:
+    """Drive start + all user_messages for a case. Returns the final reply dict.
+
+    Mirrors ``device_ws._handle_journey_start/message``: sets the client
+    executor (so filesystem tools work) before each handler call.
+    """
+    session_id = str(uuid.uuid4())
+    try:
+        set_client_executor(executor)
+        reply = await handle_journey_start(user_id, {
+            "agent_type": "local",
+            "directory": case["directory"],
+            "data_types": case["data_types"],
+            "session_id": session_id,
+        })
+
+        for msg in case.get("user_messages", []):
+            if reply.get("done"):
+                break
+            set_client_executor(executor)
+            reply = await handle_journey_message(user_id, {
+                "session_id": reply["session_id"],
+                "message": msg,
+            })
+    finally:
+        clear_client_executor()
+        _sessions.pop(session_id, None)
+
+    return reply
+
+
+# ── Assertion helper ──────────────────────────────────────────────────────
+
+
+def _evaluate_case(case: dict, reply: dict) -> tuple[float, str]:
+    """Return (score, comment) for a journey case given the final reply dict."""
+    if case.get("expect_question"):
+        has_q = "?" in reply.get("message", "")
+        return (1.0 if has_q else 0.0), f"first_reply_has_question={has_q}"
+
+    if case.get("expect_done") and not reply.get("done"):
+        return 0.0, "expected done=True but journey did not complete"
+
+    agent_config_raw = reply.get("agent_config")
+
+    if case.get("expect_valid_config"):
+        if not agent_config_raw:
+            return 0.0, "agent_config is None"
+        try:
+            parsed = AgentConfig.model_validate_json(agent_config_raw)
+            valid = len(parsed.content_types) > 0
+            return (1.0 if valid else 0.0), f"content_types={len(parsed.content_types)}"
+        except Exception as exc:
+            return 0.0, f"parse error: {exc}"
+
+    if case.get("expect_content_type_id"):
+        expected_id = case["expect_content_type_id"]
+        if not agent_config_raw:
+            return 0.0, "agent_config is None"
+        try:
+            parsed = AgentConfig.model_validate_json(agent_config_raw)
+            ids = [ct.id for ct in parsed.content_types]
+            found = expected_id in ids
+            return (1.0 if found else 0.0), f"content_type_ids={ids}, expected={expected_id}"
+        except Exception as exc:
+            return 0.0, f"parse error: {exc}"
+
+    if case.get("expect_extraction_contains"):
+        keyword = case["expect_extraction_contains"].lower()
+        if not agent_config_raw:
+            return 0.0, "agent_config is None"
+        try:
+            parsed = AgentConfig.model_validate_json(agent_config_raw)
+            if not parsed.content_types:
+                return 0.0, "no content_types in config"
+            prompt = parsed.content_types[0].extraction_prompt.lower()
+            found = keyword in prompt
+            return (1.0 if found else 0.0), f"keyword='{keyword}' in extraction_prompt={found}"
+        except Exception as exc:
+            return 0.0, f"parse error: {exc}"
+
+    if case.get("expect_global_rules"):
+        if not agent_config_raw:
+            return 0.0, "agent_config is None"
+        try:
+            parsed = AgentConfig.model_validate_json(agent_config_raw)
+            has_rules = len(parsed.global_rules) > 0
+            return (1.0 if has_rules else 0.0), f"global_rules={parsed.global_rules}"
+        except Exception as exc:
+            return 0.0, f"parse error: {exc}"
+
+    return 1.0, "no specific assertion"
+
+
+# ── Unit tests ────────────────────────────────────────────────────────────
+
+
+def test_4_6a_extract_valid_json():
+    """_extract_agent_config: valid JSON between markers → returns serialised config."""
+    config = AgentConfig(
+        content_types=[],
+        global_rules=["No project = no entity"],
+        data_types=["tasks"],
+    )
+    text = f"Some preamble\n{_CONFIG_START}\n{config.model_dump_json()}\n{_CONFIG_END}\nTrailing"
+    result = _extract_agent_config(text)
+    assert result is not None
+    parsed = AgentConfig.model_validate_json(result)
+    assert parsed.global_rules == ["No project = no entity"]
+
+
+def test_4_6b_extract_invalid_json():
+    """_extract_agent_config: malformed JSON between markers → returns None."""
+    text = f"{_CONFIG_START}\n{{not: valid json\n{_CONFIG_END}"
+    assert _extract_agent_config(text) is None
+
+
+def test_4_6c_extract_markers_absent():
+    """_extract_agent_config: no markers at all → returns None."""
+    assert _extract_agent_config("No markers here at all") is None
+
+
+def test_4_6d_extract_only_start_marker():
+    """_extract_agent_config: START without END → returns None."""
+    assert _extract_agent_config(f"text {_CONFIG_START} no end marker") is None
+
+
+@pytest.mark.asyncio
+async def test_4_6e_session_not_found():
+    """4.6e Session not found → done=True, agent_config=None, informative message."""
+    reply = await handle_journey_message(_USER_ID, {
+        "session_id": "nonexistent-session-id",
+        "message": "Hello",
+    })
+    assert reply["done"] is True
+    assert reply["agent_config"] is None
+    assert "not found" in reply["message"].lower() or "expired" in reply["message"].lower()
+
+
+@pytest.mark.asyncio
+async def test_4_6f_nudge_uses_new_markers():
+    """4.6f Nudge injected after max turns uses AGENT_CONFIG markers, not PROMPT_TEMPLATE."""
+    session_id = str(uuid.uuid4())
+    captured_histories: list[list[dict]] = []
+
+    async def _mock_llm(system_prompt, history, tools, **kwargs) -> str:
+        captured_histories.append(list(history))
+        # Return plain text — no markers — to trigger the nudge path.
+        return "I still need more information from you."
+
+    from app.api.routes.agent_setup import JourneySession
+
+    fake_session = JourneySession(
+        session_id=session_id,
+        user_id=_USER_ID,
+        agent_type="local",
+        directory="/test",
+        data_types=["tasks"],
+        system_prompt="system",
+        langfuse_prompt=None,
+    )
+    # Fill history to the turn limit so the next message triggers the nudge.
+    for i in range(_MAX_TURNS):
+        fake_session.history.append({"role": "user", "content": f"msg {i}"})
+        fake_session.history.append({"role": "assistant", "content": "ok"})
+    _sessions[session_id] = fake_session
+
+    try:
+        with patch("app.api.routes.agent_setup._call_llm_with_tools", side_effect=_mock_llm):
+            await handle_journey_message(_USER_ID, {
+                "session_id": session_id,
+                "message": "one more message to trigger nudge",
+            })
+    finally:
+        _sessions.pop(session_id, None)
+
+    # Second LLM call receives the nudge appended to history.
+    assert len(captured_histories) >= 2, "Expected ≥ 2 LLM calls (main reply + nudge)"
+    nudge_history = captured_histories[1]
+    user_msgs = " ".join(t["content"] for t in nudge_history if t["role"] == "user")
+    assert _CONFIG_START in user_msgs, f"Nudge must reference {_CONFIG_START}"
+    assert _CONFIG_END in user_msgs, f"Nudge must reference {_CONFIG_END}"
+    assert "PROMPT_TEMPLATE" not in user_msgs, "Old PROMPT_TEMPLATE markers must not appear in nudge"
+
+
+# ── Eval tests (real LLM + Langfuse) ─────────────────────────────────────
+
+
+@pytest.mark.asyncio
+@pytest.mark.eval
+async def test_eval_journey(journey_case, pytestconfig):
+    """Parametrized eval test — one invocation per YAML case."""
+    case: dict = journey_case
+    fixtures_dir = _fixtures_dir(pytestconfig)
+    executor = _make_fs_executor(case.get("directory_files", []), fixtures_dir)
+
+    lf = get_langfuse()
+    obs_ctx = lf.start_as_current_observation(
+        name=f"eval-journey-{case['id']}-{case.get('score_name', 'unknown').replace('.', '-')}",
+        metadata={"step": "4", "case_id": case["id"]},
+    ) if lf else nullcontext()
+
+    with obs_ctx as obs:
+        reply = await _run_journey(_USER_ID, case, executor)
+        score, comment = _evaluate_case(case, reply)
+
+        if obs is not None:
+            obs.score(
+                name=case.get("score_name", f"journey.case_{case['id']}"),
+                value=score,
+                comment=comment,
+            )
+
+    if lf:
+        lf.flush()
+
+    assert score == 1.0, f"[{case['id']}] {case.get('description', '')} — {comment}"
--- a/tests/test_memory_middleware.py
+++ b/tests/test_memory_middleware.py
@@ -0,0 +1,343 @@
+"""Tests for Step 7 — MemoryMiddleware.
+
+Coverage:
+  1. enrich_context returns core prefs + associative + episodic + proactive
+  2. store_episode creates an encrypted row decryptable with the user's key
+  3. update_core upserts correctly
+  4. User with no encryption_key returns empty context (no crash)
+  5. End-to-end: home_request WS frame results in an episodic row being stored
+"""
+
+from __future__ import annotations
+
+import json
+import uuid
+from unittest.mock import patch
+
+import pytest
+import pytest_asyncio
+from cryptography.fernet import Fernet
+from sqlalchemy import select
+
+from app.core.memory_middleware import MemoryMiddleware, _PROACTIVE_CONFIDENCE_THRESHOLD
+from app.db import get_session
+from app.main import app
+from app.models import (
+    MemoryAssociative,
+    MemoryCore,
+    MemoryEpisodic,
+    MemoryProactive,
+    User,
+)
+from tests.conftest import TEST_USER_IDS, make_jwt
+
+
+USER_ID = TEST_USER_IDS["power"]
+_FERNET_KEY = Fernet.generate_key().decode()
+
+
+# ── DB override ───────────────────────────────────────────────────────────────
+
+@pytest.fixture(autouse=True)
+def _override_db(db_session):
+    async def _gen():
+        yield db_session
+
+    app.dependency_overrides[get_session] = _gen
+    yield
+    app.dependency_overrides.pop(get_session, None)
+
+
+# ── Fixtures ──────────────────────────────────────────────────────────────────
+
+@pytest_asyncio.fixture
+async def user_with_key(db_session):
+    """Set encryption_key on the seeded power user."""
+    result = await db_session.execute(select(User).where(User.id == USER_ID))
+    user = result.scalar_one()
+    user.encryption_key = _FERNET_KEY
+    await db_session.commit()
+    return user
+
+
+def _fernet():
+    return Fernet(_FERNET_KEY.encode())
+
+
+def _enc(plaintext: str) -> str:
+    return _fernet().encrypt(plaintext.encode()).decode()
+
+
+def _dec(ciphertext: str) -> str:
+    return _fernet().decrypt(ciphertext.encode()).decode()
+
+
+# ── enrich_context ────────────────────────────────────────────────────────────
+
+@pytest.mark.asyncio
+async def test_enrich_context_returns_core_memory(db_session, user_with_key):
+    # Seed a core memory row
+    db_session.add(MemoryCore(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        key="timezone",
+        value_encrypted=_enc("UTC"),
+    ))
+    await db_session.commit()
+
+    middleware = MemoryMiddleware(db_session)
+    ctx = await middleware.enrich_context(USER_ID, "What are my tasks?")
+
+    assert "core_memory" in ctx
+    assert ctx["core_memory"]["timezone"] == "UTC"
+
+
+@pytest.mark.asyncio
+async def test_enrich_context_returns_episodic_memory(db_session, user_with_key):
+    session_id = str(uuid.uuid4())
+    db_session.add(MemoryEpisodic(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        summary_encrypted=_enc("User asked about Q1 tasks"),
+        session_id=session_id,
+    ))
+    await db_session.commit()
+
+    middleware = MemoryMiddleware(db_session)
+    ctx = await middleware.enrich_context(USER_ID, "any message")
+
+    assert "episodic_memory" in ctx
+    assert any("Q1 tasks" in s for s in ctx["episodic_memory"])
+
+
+@pytest.mark.asyncio
+async def test_enrich_context_filters_episodic_by_session_id(db_session, user_with_key):
+    target_session = str(uuid.uuid4())
+    other_session = str(uuid.uuid4())
+    db_session.add(MemoryEpisodic(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        summary_encrypted=_enc("Target session memory"),
+        session_id=target_session,
+    ))
+    db_session.add(MemoryEpisodic(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        summary_encrypted=_enc("Other session memory"),
+        session_id=other_session,
+    ))
+    await db_session.commit()
+
+    middleware = MemoryMiddleware(db_session)
+    ctx = await middleware.enrich_context(USER_ID, "any message", session_id=target_session)
+
+    episodic = ctx.get("episodic_memory", [])
+    assert any("Target session" in s for s in episodic)
+    assert not any("Other session" in s for s in episodic)
+
+
+@pytest.mark.asyncio
+async def test_enrich_context_returns_proactive_hints(db_session, user_with_key):
+    # Add one pattern above threshold and one below
+    db_session.add(MemoryProactive(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        pattern_encrypted=_enc("User prefers short summaries"),
+        confidence=0.9,
+        source="inferred",
+    ))
+    db_session.add(MemoryProactive(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        pattern_encrypted=_enc("User likes dark mode"),
+        confidence=0.1,
+        source="inferred",
+    ))
+    await db_session.commit()
+
+    middleware = MemoryMiddleware(db_session)
+    ctx = await middleware.enrich_context(USER_ID, "any message")
+
+    assert "proactive_hints" in ctx
+    hints = ctx["proactive_hints"]
+    assert any("short summaries" in h for h in hints)
+    assert not any("dark mode" in h for h in hints)
+
+
+@pytest.mark.asyncio
+async def test_enrich_context_returns_associative_memory(db_session, user_with_key):
+    db_session.add(MemoryAssociative(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        content_encrypted=_enc("Related memory about meetings"),
+        embedding=None,
+        entity_type="note",
+    ))
+    await db_session.commit()
+
+    middleware = MemoryMiddleware(db_session)
+    ctx = await middleware.enrich_context(USER_ID, "meetings")
+
+    assert "associative_memory" in ctx
+    assert any("meetings" in m for m in ctx["associative_memory"])
+
+
+@pytest.mark.asyncio
+async def test_enrich_context_empty_for_user_without_key(db_session):
+    """User with no encryption_key → empty context, no crash."""
+    result = await db_session.execute(select(User).where(User.id == USER_ID))
+    user = result.scalar_one()
+    user.encryption_key = None
+    await db_session.commit()
+
+    middleware = MemoryMiddleware(db_session)
+    ctx = await middleware.enrich_context(USER_ID, "hello")
+    assert ctx == {}
+
+
+# ── store_episode ─────────────────────────────────────────────────────────────
+
+@pytest.mark.asyncio
+async def test_store_episode_creates_encrypted_row(db_session, user_with_key):
+    session_id = str(uuid.uuid4())
+    middleware = MemoryMiddleware(db_session)
+    await middleware.store_episode(USER_ID, session_id, "hello", "world")
+
+    result = await db_session.execute(
+        select(MemoryEpisodic).where(MemoryEpisodic.session_id == session_id)
+    )
+    row = result.scalar_one()
+    plaintext = _dec(row.summary_encrypted)
+    assert "hello" in plaintext
+    assert "world" in plaintext
+
+
+@pytest.mark.asyncio
+async def test_store_episode_decryptable(db_session, user_with_key):
+    session_id = str(uuid.uuid4())
+    middleware = MemoryMiddleware(db_session)
+    await middleware.store_episode(USER_ID, session_id, "msg", "resp")
+
+    result = await db_session.execute(
+        select(MemoryEpisodic).where(MemoryEpisodic.session_id == session_id)
+    )
+    row = result.scalar_one()
+    # Decrypt using the same key — must not raise
+    decrypted = _dec(row.summary_encrypted)
+    assert len(decrypted) > 0
+
+
+# ── update_core ───────────────────────────────────────────────────────────────
+
+@pytest.mark.asyncio
+async def test_update_core_insert(db_session, user_with_key):
+    middleware = MemoryMiddleware(db_session)
+    await middleware.update_core(USER_ID, "lang", "en")
+
+    result = await db_session.execute(
+        select(MemoryCore).where(MemoryCore.user_id == USER_ID, MemoryCore.key == "lang")
+    )
+    row = result.scalar_one()
+    assert _dec(row.value_encrypted) == "en"
+
+
+@pytest.mark.asyncio
+async def test_update_core_upsert(db_session, user_with_key):
+    middleware = MemoryMiddleware(db_session)
+    await middleware.update_core(USER_ID, "lang", "en")
+    await middleware.update_core(USER_ID, "lang", "fr")
+
+    result = await db_session.execute(
+        select(MemoryCore).where(MemoryCore.user_id == USER_ID, MemoryCore.key == "lang")
+    )
+    rows = result.scalars().all()
+    assert len(rows) == 1
+    assert _dec(rows[0].value_encrypted) == "fr"
+
+
+@pytest.mark.asyncio
+async def test_core_block_edit_ops(db_session, user_with_key):
+    middleware = MemoryMiddleware(db_session)
+
+    await middleware.update_core(USER_ID, "human", "Name: Roberto")
+    await middleware.append_core(USER_ID, "human", "Timezone: Europe/Rome")
+    replaced = await middleware.replace_core(USER_ID, "human", "Roberto", "Robert")
+
+    blocks = await middleware.list_core_blocks(USER_ID)
+    human = next(b for b in blocks if b["label"] == "human")
+
+    assert replaced is True
+    assert "Name: Robert" in human["value"]
+    assert "Timezone: Europe/Rome" in human["value"]
+
+    deleted = await middleware.delete_core(USER_ID, "human")
+    assert deleted is True
+    assert await middleware.get_core_block(USER_ID, "human") is None
+
+
+@pytest.mark.asyncio
+async def test_archival_and_recall_search_helpers(db_session, user_with_key):
+    middleware = MemoryMiddleware(db_session)
+
+    await middleware.insert_archival(USER_ID, "Project whitelist has release risk", source="assistant")
+    await middleware.store_episode(USER_ID, str(uuid.uuid4()), "How is whitelist?", "Whitelist is delayed")
+
+    arch = await middleware.search_archival(USER_ID, "whitelist", top_k=3)
+    rec = await middleware.search_recall(USER_ID, "delayed", top_k=3)
+
+    assert any("whitelist" in item.lower() for item in arch)
+    assert any("delayed" in item.lower() for item in rec)
+
+
+# ── End-to-end WS: memory middleware is called during home_request ────────────
+
+def test_home_request_calls_memory_middleware(client):
+    """home_request triggers enrich_context before and store_episode after the LLM."""
+    enrich_calls: list[tuple] = []
+    store_calls: list[tuple] = []
+
+    class _MockMiddleware:
+        def __init__(self, db):
+            pass
+
+        async def enrich_context(self, user_id, message, **kwargs):
+            enrich_calls.append((user_id, message))
+            return {"core_memory": {"tz": "UTC"}}
+
+        async def store_episode(self, user_id, session_id, message, response, **kwargs):
+            store_calls.append((user_id, session_id, message, response))
+
+    token = make_jwt("power", user_id=USER_ID)
+    session_id = str(uuid.uuid4())
+
+    async def _mock_stream(user_id, message, context):
+        # Verify memory context was injected
+        assert context.get("core_memory") == {"tz": "UTC"}
+        yield "token", "Done"
+
+    with (
+        patch("app.api.routes.device_ws.MemoryMiddleware", _MockMiddleware),
+        patch("app.api.routes.device_ws.run_home_stream", side_effect=_mock_stream),
+    ):
+        with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+            ws.send_text(json.dumps({
+                "type": "device_hello", "device_id": "dev-mem", "agent_ids": []
+            }))
+            ws.send_text(json.dumps({
+                "type": "home_request",
+                "request_id": "r-mem",
+                "session_id": session_id,
+                "message": "Show tasks",
+            }))
+            for _ in range(20):
+                raw = ws.receive_text()
+                frame = json.loads(raw)
+                if frame.get("type") == "stream_end":
+                    break
+
+    assert len(enrich_calls) == 1
+    assert enrich_calls[0] == (USER_ID, "Show tasks")
+    assert len(store_calls) == 1
+    stored_session_id, stored_message = store_calls[0][1], store_calls[0][2]
+    assert stored_session_id == session_id
+    assert stored_message == "Show tasks"
--- a/tests/test_memory_models.py
+++ b/tests/test_memory_models.py
@@ -0,0 +1,205 @@
+"""Tests for Step 6 — memory ORM models and User.encryption_key.
+
+Uses the SQLite in-memory test DB (from conftest). The pgvector embedding
+column is stored as JSON in tests (SQLite-compatible).
+"""
+
+from __future__ import annotations
+
+import uuid
+from datetime import datetime
+
+import pytest
+import pytest_asyncio
+from cryptography.fernet import Fernet
+from sqlalchemy import select
+
+from app.models import MemoryAssociative, MemoryCore, MemoryEpisodic, MemoryProactive, User
+from tests.conftest import TEST_USER_IDS
+
+
+USER_ID = TEST_USER_IDS["power"]
+
+
+# ── helpers ───────────────────────────────────────────────────────────────────
+
+def _fernet_key() -> str:
+    return Fernet.generate_key().decode()
+
+
+def _encrypt(key: str, plaintext: str) -> str:
+    return Fernet(key.encode()).encrypt(plaintext.encode()).decode()
+
+
+def _decrypt(key: str, ciphertext: str) -> str:
+    return Fernet(key.encode()).decrypt(ciphertext.encode()).decode()
+
+
+# ── User.encryption_key ───────────────────────────────────────────────────────
+
+@pytest.mark.asyncio
+async def test_user_encryption_key_column_exists(db_session):
+    """User model has encryption_key column and it can be set."""
+    result = await db_session.execute(select(User).where(User.id == USER_ID))
+    user = result.scalar_one()
+    # Column exists (may be None for seeded users)
+    assert hasattr(user, "encryption_key")
+
+
+@pytest.mark.asyncio
+async def test_user_encryption_key_can_be_set(db_session):
+    key = _fernet_key()
+    result = await db_session.execute(select(User).where(User.id == USER_ID))
+    user = result.scalar_one()
+    user.encryption_key = key
+    await db_session.commit()
+
+    result2 = await db_session.execute(select(User).where(User.id == USER_ID))
+    user2 = result2.scalar_one()
+    assert user2.encryption_key == key
+
+
+# ── MemoryCore ────────────────────────────────────────────────────────────────
+
+@pytest.mark.asyncio
+async def test_memory_core_create_and_read(db_session):
+    key = _fernet_key()
+    encrypted_val = _encrypt(key, "UTC")
+
+    row = MemoryCore(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        key="timezone",
+        value_encrypted=encrypted_val,
+    )
+    db_session.add(row)
+    await db_session.commit()
+
+    result = await db_session.execute(
+        select(MemoryCore).where(MemoryCore.user_id == USER_ID)
+    )
+    fetched = result.scalar_one()
+    assert fetched.key == "timezone"
+    assert _decrypt(key, fetched.value_encrypted) == "UTC"
+
+
+@pytest.mark.asyncio
+async def test_memory_core_cascade_delete(db_session):
+    """Deleting a user cascades to memory_core."""
+    row = MemoryCore(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        key="lang",
+        value_encrypted="enc",
+    )
+    db_session.add(row)
+    await db_session.commit()
+
+    user = (await db_session.execute(select(User).where(User.id == USER_ID))).scalar_one()
+    await db_session.delete(user)
+    await db_session.commit()
+
+    remaining = (
+        await db_session.execute(select(MemoryCore).where(MemoryCore.user_id == USER_ID))
+    ).scalars().all()
+    assert remaining == []
+
+
+# ── MemoryAssociative ─────────────────────────────────────────────────────────
+
+@pytest.mark.asyncio
+async def test_memory_associative_create_and_read(db_session):
+    key = _fernet_key()
+    content = _encrypt(key, "User prefers morning meetings")
+    embedding = [0.1] * 1536  # fake embedding
+
+    row = MemoryAssociative(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        content_encrypted=content,
+        embedding=embedding,
+        entity_type="preference",
+        entity_id=None,
+    )
+    db_session.add(row)
+    await db_session.commit()
+
+    result = await db_session.execute(
+        select(MemoryAssociative).where(MemoryAssociative.user_id == USER_ID)
+    )
+    fetched = result.scalar_one()
+    assert fetched.entity_type == "preference"
+    assert _decrypt(key, fetched.content_encrypted) == "User prefers morning meetings"
+    assert len(fetched.embedding) == 1536
+
+
+# ── MemoryEpisodic ────────────────────────────────────────────────────────────
+
+@pytest.mark.asyncio
+async def test_memory_episodic_create_and_read(db_session):
+    key = _fernet_key()
+    session_id = str(uuid.uuid4())
+    summary = _encrypt(key, "User asked about Q1 tasks")
+
+    row = MemoryEpisodic(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        summary_encrypted=summary,
+        session_id=session_id,
+    )
+    db_session.add(row)
+    await db_session.commit()
+
+    result = await db_session.execute(
+        select(MemoryEpisodic).where(MemoryEpisodic.session_id == session_id)
+    )
+    fetched = result.scalar_one()
+    assert _decrypt(key, fetched.summary_encrypted) == "User asked about Q1 tasks"
+    assert isinstance(fetched.created_at, datetime)
+
+
+# ── MemoryProactive ───────────────────────────────────────────────────────────
+
+@pytest.mark.asyncio
+async def test_memory_proactive_create_and_read(db_session):
+    key = _fernet_key()
+    pattern = _encrypt(key, "User always assigns tasks to self")
+
+    row = MemoryProactive(
+        id=str(uuid.uuid4()),
+        user_id=USER_ID,
+        pattern_encrypted=pattern,
+        confidence=0.85,
+        source="inferred",
+    )
+    db_session.add(row)
+    await db_session.commit()
+
+    result = await db_session.execute(
+        select(MemoryProactive).where(MemoryProactive.user_id == USER_ID)
+    )
+    fetched = result.scalar_one()
+    assert fetched.confidence == pytest.approx(0.85)
+    assert fetched.source == "inferred"
+    assert _decrypt(key, fetched.pattern_encrypted) == "User always assigns tasks to self"
+
+
+# ── Auth registration generates encryption_key ───────────────────────────────
+
+def test_register_sets_encryption_key(client):
+    """POST /api/v1/auth/register succeeds; Fernet key generation is exercised but not asserted."""
+    resp = client.post(
+        "/api/v1/auth/register",
+        json={"email": "newuser@test.com", "password": "testpassword123"},
+    )
+    assert resp.status_code == 201
+
+    # Fetch the newly created user via the access token
+    token = resp.json()["access_token"]
+    me_resp = client.get(
+        "/api/v1/auth/me",
+        headers={"Authorization": f"Bearer {token}"},
+    )
+    assert me_resp.status_code == 200
+    # We can't see encryption_key in the API response (not in UserProfile),
+    # but we verify registration didn't crash — key generation is implicit.
--- a/tests/test_middleware.py
+++ b/tests/test_middleware.py
@@ -20,7 +20,6 @@ from jose import jwt
 from app.config.settings import settings
 from app.db import get_session
 from app.main import app
-from app.schemas import ChatResponse
 from tests.conftest import TEST_USER_IDS

 # ---------------------------------------------------------------------------
@@ -50,7 +49,6 @@ _CHAT_BODY = {
        "recent_tasks": [],
        "conversation_history": [],
    },
-    "execution_mode": "direct",
 }


@@ -240,7 +238,7 @@ class TestRateLimitMiddleware:


 class TestSanitizerMiddleware:
-    """Mock ``orchestrate`` to inject controlled strings into chat responses."""
+    """Mock ``run_home`` to inject controlled strings into chat responses."""

    _CHAT_PATH = "/api/v1/chat"

@@ -248,11 +246,10 @@ class TestSanitizerMiddleware:
        return _make_jwt(user_id=str(uuid.uuid4()), tier="pro")

    def _post_chat(self, client: TestClient, response_text: str) -> dict:
-        mock_response = ChatResponse(response=response_text, actions=[])
        with patch(
-            "app.api.routes.chat.orchestrate",
+            "app.api.routes.chat.run_home",
            new_callable=AsyncMock,
-            return_value=mock_response,
+            return_value=response_text,
        ):
            resp = client.post(
                self._CHAT_PATH,
--- a/tests/test_orchestrator.py
+++ b/tests/test_orchestrator.py
@@ -1,348 +0,0 @@
-"""Integration tests for the orchestrator module."""
-
-from __future__ import annotations
-
-import json
-from typing import Any
-from unittest.mock import AsyncMock, MagicMock, patch
-
-import pytest
-
-from app.core.agent_registry import AgentRegistry, ChatAgent
-from app.core.orchestrator import (
-    classify_intent,
-    orchestrate,
-    orchestrate_stream,
-    route_pipeline,
-    route_single,
-)
-from app.schemas import ChatRequest, ChatResponse, ExecutionPlan
-
-
-# ── Stub agents ──────────────────────────────────────────────────────
-
-
-class _TaskAgent(ChatAgent):
-    def get_name(self) -> str:
-        return "task_agent"
-
-    def get_description(self) -> str:
-        return "Manages tasks: create, update, list, suggest"
-
-    def get_tools(self) -> list[Any]:
-        return []
-
-    async def handle(self, query: str, context: dict[str, Any]) -> str:
-        return f"task: {query}"
-
-
-class _CalendarAgent(ChatAgent):
-    def get_name(self) -> str:
-        return "calendar_agent"
-
-    def get_description(self) -> str:
-        return "Calendar management: events, conflicts, scheduling"
-
-    def get_tools(self) -> list[Any]:
-        return []
-
-    async def handle(self, query: str, context: dict[str, Any]) -> str:
-        return f"calendar: {query}"
-
-
-# ── Helpers ──────────────────────────────────────────────────────────
-
-
-def _mock_llm(response_text: str) -> MagicMock:
-    """Return a mock LLM that always produces *response_text*."""
-    msg = MagicMock()
-    msg.content = response_text
-    llm = MagicMock()
-    llm.ainvoke = AsyncMock(return_value=msg)
-    return llm
-
-
-# ── Fixtures ─────────────────────────────────────────────────────────
-
-
-@pytest.fixture(autouse=True)
-def _fresh_registry():
-    """Reset the AgentRegistry singleton between tests."""
-    AgentRegistry._instance = None
-    yield
-    AgentRegistry._instance = None
-
-
-@pytest.fixture()
-def reg() -> AgentRegistry:
-    r = AgentRegistry()
-    r.register(_TaskAgent)
-    r.register(_CalendarAgent)
-    return r
-
-
-# ── classify_intent ───────────────────────────────────────────────────
-
-
-class TestClassifyIntent:
-    @pytest.mark.asyncio
-    async def test_routes_to_known_agent(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            result = await classify_intent("add a task", {}, reg)
-        assert result == "task_agent"
-
-    @pytest.mark.asyncio
-    async def test_routes_to_calendar_agent(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("calendar_agent")
-            result = await classify_intent("schedule a meeting", {}, reg)
-        assert result == "calendar_agent"
-
-    @pytest.mark.asyncio
-    async def test_falls_back_on_unknown_name(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("nonexistent_agent")
-            result = await classify_intent("do something", {}, reg)
-        assert result == "task_agent"
-
-    @pytest.mark.asyncio
-    async def test_empty_registry_returns_fallback_without_llm_call(self) -> None:
-        empty_reg = AgentRegistry()
-        # No LLM should be instantiated — early return path
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            result = await classify_intent("anything", {}, empty_reg)
-            mock_cls.assert_not_called()
-        assert result == "task_agent"
-
-    @pytest.mark.asyncio
-    async def test_whitespace_stripped_from_response(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("  task_agent  \n")
-            result = await classify_intent("create task", {}, reg)
-        assert result == "task_agent"
-
-
-# ── route_single ─────────────────────────────────────────────────────
-
-
-class TestRouteSingle:
-    @pytest.mark.asyncio
-    async def test_returns_chat_response(self, reg: AgentRegistry) -> None:
-        result = await route_single("task_agent", "create a task", {}, reg)
-        assert isinstance(result, ChatResponse)
-
-    @pytest.mark.asyncio
-    async def test_response_contains_agent_output(self, reg: AgentRegistry) -> None:
-        result = await route_single("task_agent", "create a task", {}, reg)
-        assert result.response == "task: create a task"
-
-    @pytest.mark.asyncio
-    async def test_unknown_agent_raises_key_error(self, reg: AgentRegistry) -> None:
-        with pytest.raises(KeyError):
-            await route_single("nonexistent", "hello", {}, reg)
-
-    @pytest.mark.asyncio
-    async def test_actions_default_empty(self, reg: AgentRegistry) -> None:
-        result = await route_single("task_agent", "hi", {}, reg)
-        assert result.actions == []
-
-
-# ── route_pipeline ────────────────────────────────────────────────────
-
-
-class TestRoutePipeline:
-    @pytest.mark.asyncio
-    async def test_returns_chat_response(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("synthesized result")
-            result = await route_pipeline(
-                ["task_agent", "calendar_agent"], "plan my week", {}, reg
-            )
-        assert isinstance(result, ChatResponse)
-
-    @pytest.mark.asyncio
-    async def test_response_is_synthesis_output(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("synthesized result")
-            result = await route_pipeline(
-                ["task_agent", "calendar_agent"], "plan my week", {}, reg
-            )
-        assert result.response == "synthesized result"
-
-    @pytest.mark.asyncio
-    async def test_passes_previous_results_to_subsequent_agents(
-        self, reg: AgentRegistry
-    ) -> None:
-        """Each agent after the first should receive prior outputs in context."""
-        received_contexts: list[dict[str, Any]] = []
-
-        class _CapturingAgent(ChatAgent):
-            def get_name(self) -> str:
-                return "capture"
-
-            def get_description(self) -> str:
-                return "captures context for testing"
-
-            def get_tools(self) -> list[Any]:
-                return []
-
-            async def handle(self, query: str, context: dict[str, Any]) -> str:
-                received_contexts.append(dict(context))
-                return "captured"
-
-        reg.register(_CapturingAgent)
-
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("done")
-            await route_pipeline(["task_agent", "capture"], "hi", {}, reg)
-
-        # The second agent (capture) must have received previous results
-        assert len(received_contexts) == 1
-        assert "previous_results" in received_contexts[0]
-        assert received_contexts[0]["previous_results"] == ["task: hi"]
-
-    @pytest.mark.asyncio
-    async def test_single_agent_pipeline(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("single result")
-            result = await route_pipeline(["task_agent"], "one agent", {}, reg)
-        assert result.response == "single result"
-
-
-# ── orchestrate ───────────────────────────────────────────────────────
-
-
-class TestOrchestrate:
-    @pytest.mark.asyncio
-    async def test_direct_mode_returns_chat_response(
-        self, reg: AgentRegistry
-    ) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            request = ChatRequest(message="add a task", execution_mode="direct")
-            result = await orchestrate(request, reg)
-        assert isinstance(result, ChatResponse)
-
-    @pytest.mark.asyncio
-    async def test_direct_mode_response_content(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            request = ChatRequest(message="add a task", execution_mode="direct")
-            result = await orchestrate(request, reg)
-        assert isinstance(result, ChatResponse)
-        assert result.response == "task: add a task"
-
-    @pytest.mark.asyncio
-    async def test_plan_mode_returns_execution_plan(
-        self, reg: AgentRegistry
-    ) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            request = ChatRequest(message="plan my tasks", execution_mode="plan")
-            result = await orchestrate(request, reg)
-        assert isinstance(result, ExecutionPlan)
-
-    @pytest.mark.asyncio
-    async def test_plan_mode_agent_matches_classified(
-        self, reg: AgentRegistry
-    ) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("calendar_agent")
-            request = ChatRequest(
-                message="schedule something", execution_mode="plan"
-            )
-            result = await orchestrate(request, reg)
-        assert isinstance(result, ExecutionPlan)
-        assert result.agent == "calendar_agent"
-
-    @pytest.mark.asyncio
-    async def test_plan_mode_has_steps(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            request = ChatRequest(message="plan tasks", execution_mode="plan")
-            result = await orchestrate(request, reg)
-        assert isinstance(result, ExecutionPlan)
-        assert len(result.steps) >= 1
-
-    @pytest.mark.asyncio
-    async def test_plan_mode_template_id_contains_agent_name(
-        self, reg: AgentRegistry
-    ) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            request = ChatRequest(message="plan tasks", execution_mode="plan")
-            result = await orchestrate(request, reg)
-        assert isinstance(result, ExecutionPlan)
-        assert result.steps[0].prompt_template is not None
-        assert "task_agent" in result.steps[0].prompt_template
-
-    @pytest.mark.asyncio
-    async def test_default_execution_mode_is_direct(
-        self, reg: AgentRegistry
-    ) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            # execution_mode defaults to "direct"
-            request = ChatRequest(message="help me")
-            result = await orchestrate(request, reg)
-        assert isinstance(result, ChatResponse)
-
-
-# ── orchestrate_stream ────────────────────────────────────────────────
-
-
-class TestOrchestrateStream:
-    @pytest.mark.asyncio
-    async def test_yields_at_least_one_chunk(self, reg: AgentRegistry) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            request = ChatRequest(message="add a task", execution_mode="direct")
-            chunks = [chunk async for chunk in orchestrate_stream(request, reg)]
-        assert len(chunks) >= 1
-
-    @pytest.mark.asyncio
-    async def test_last_chunk_is_final_json_frame(
-        self, reg: AgentRegistry
-    ) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            request = ChatRequest(message="add a task", execution_mode="direct")
-            chunks = [chunk async for chunk in orchestrate_stream(request, reg)]
-
-        last = json.loads(chunks[-1])
-        assert last["done"] is True
-        assert "response" in last
-        assert "actions" in last
-
-    @pytest.mark.asyncio
-    async def test_final_frame_response_matches_agent_output(
-        self, reg: AgentRegistry
-    ) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            request = ChatRequest(message="create a task", execution_mode="direct")
-            chunks = [chunk async for chunk in orchestrate_stream(request, reg)]
-
-        final = json.loads(chunks[-1])
-        assert final["response"] == "task: create a task"
-
-    @pytest.mark.asyncio
-    async def test_text_chunks_before_final_frame(
-        self, reg: AgentRegistry
-    ) -> None:
-        with patch("app.core.orchestrator._make_llm") as mock_cls:
-            mock_cls.return_value = _mock_llm("task_agent")
-            request = ChatRequest(
-                message="x" * 200, execution_mode="direct"
-            )  # long enough to produce multiple chunks
-            chunks = [chunk async for chunk in orchestrate_stream(request, reg)]
-
-        # All but the last chunk should be plain text (not valid final JSON)
-        non_final = chunks[:-1]
-        for chunk in non_final:
-            try:
-                parsed = json.loads(chunk)
-                assert parsed.get("done") is not True
-            except json.JSONDecodeError:
-                pass  # plain text chunk — expected
--- a/tests/test_output_formatter.py
+++ b/tests/test_output_formatter.py
@@ -0,0 +1,82 @@
+"""Tests for app.core.output_formatter.StreamFormatter."""
+
+from __future__ import annotations
+
+import pytest
+
+from app.core.output_formatter import StreamFormatter
+from app.schemas import WsFloatingDomain, WsStreamEnd, WsStreamStart, WsStreamText
+
+
+async def _stream(*events: tuple[str, object]):
+    for event in events:
+        yield event
+
+
+async def _collect(formatter: StreamFormatter, event_stream):
+    frames = []
+    async for frame in formatter.format(event_stream):
+        frames.append(frame)
+    return frames
+
+
+@pytest.mark.asyncio
+async def test_stream_formatter_text_stream() -> None:
+    formatter = StreamFormatter(request_id="req-1")
+    frames = await _collect(
+        formatter,
+        _stream(("token", "Hello"), ("token", " world")),
+    )
+
+    assert isinstance(frames[0], WsStreamStart)
+    assert isinstance(frames[1], WsStreamText)
+    assert frames[1].chunk == "Hello"
+    assert isinstance(frames[2], WsStreamText)
+    assert frames[2].chunk == " world"
+    assert isinstance(frames[-1], WsStreamEnd)
+
+
+@pytest.mark.asyncio
+async def test_stream_formatter_floating_domain_first() -> None:
+    formatter = StreamFormatter(request_id="req-2")
+    frames = await _collect(
+        formatter,
+        _stream(
+            (
+                "floating_domain",
+                {"type": "node", "id": "n-1", "section": None},
+            ),
+            ("token", "Summary"),
+        ),
+    )
+
+    assert isinstance(frames[0], WsFloatingDomain)
+    assert frames[0].domain.type == "node"
+    assert frames[0].domain.id == "n-1"
+    assert isinstance(frames[1], WsStreamStart)
+    assert isinstance(frames[2], WsStreamText)
+    assert frames[2].chunk == "Summary"
+    assert isinstance(frames[-1], WsStreamEnd)
+
+
+@pytest.mark.asyncio
+async def test_stream_formatter_ignores_unknown_events() -> None:
+    formatter = StreamFormatter(request_id="req-3")
+    frames = await _collect(
+        formatter,
+        _stream(("tool_end", {"name": "x"}), ("token", "ok")),
+    )
+
+    text_frames = [f for f in frames if isinstance(f, WsStreamText)]
+    assert len(text_frames) == 1
+    assert text_frames[0].chunk == "ok"
+
+
+@pytest.mark.asyncio
+async def test_stream_formatter_empty_stream_still_brackets() -> None:
+    formatter = StreamFormatter(request_id="req-4")
+    frames = await _collect(formatter, _stream())
+
+    assert len(frames) == 2
+    assert isinstance(frames[0], WsStreamStart)
+    assert isinstance(frames[1], WsStreamEnd)
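The tests in tests/test_output_formatter.py pin down a framing contract: a start frame, one text frame per `token` event, unknown event kinds dropped, and an end frame even when the stream is empty. The sketch below illustrates that contract only. It assumes plain dict frames in place of the real `WsStreamStart` / `WsStreamText` / `WsStreamEnd` models, `bracket_stream` is a hypothetical name rather than the actual `StreamFormatter.format` implementation, and the `floating_domain` case (emitted before the start frame in the tests) is left out.

```python
import asyncio
from typing import Any, AsyncIterator


async def bracket_stream(
    request_id: str,
    events: AsyncIterator[tuple[str, Any]],
) -> AsyncIterator[dict[str, Any]]:
    """Wrap (kind, payload) events in start/end frames (hypothetical helper)."""
    yield {"type": "stream_start", "request_id": request_id}
    async for kind, payload in events:
        if kind == "token":
            yield {"type": "stream_text", "request_id": request_id, "chunk": payload}
        # Any other event kind (e.g. "tool_end") is silently dropped.
    yield {"type": "stream_end", "request_id": request_id}


async def _events(*items: tuple[str, Any]) -> AsyncIterator[tuple[str, Any]]:
    # Turn a fixed sequence of events into an async stream.
    for item in items:
        yield item


async def _collect(stream: AsyncIterator[dict[str, Any]]) -> list[dict[str, Any]]:
    return [frame async for frame in stream]


frames = asyncio.run(
    _collect(bracket_stream("req-1", _events(("tool_end", {}), ("token", "ok"))))
)
empty = asyncio.run(_collect(bracket_stream("req-2", _events())))
```

Emitting the end frame unconditionally is what lets a client loop stop on `stream_end` without a timeout, which is how the WebSocket test in tests/test_memory_middleware.py reads frames.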
--- a/tests/test_preprocessors.py
+++ b/tests/test_preprocessors.py
@@ -0,0 +1,98 @@
+"""Tests for the preprocessor system (Step 1 — Local Agent V2).
+
+Run:
+    pytest tests/test_preprocessors.py -v
+    pytest tests/test_preprocessors.py -v --preprocess-dir /path/to/folder
+
+The folder must contain cases.yaml + data/.
+"""
+
+from __future__ import annotations
+
+import re
+from pathlib import Path
+
+import pytest
+import yaml
+
+from app.core.preprocessors import detect_content_type, preprocess
+
+_DEFAULT_DIR = Path(__file__).parent / "fixtures" / "preprocessors"
+
+_GENERATORS = {
+    "binary_noise": "some\x00\x01\x02\x03\x04\x05content" * 20,
+}
+
+
+def _fixtures_dir(config) -> Path:
+    override = config.getoption("--preprocess-dir")
+    return Path(override) if override else _DEFAULT_DIR
+
+
+def _load_cases(config) -> list[dict]:
+    return yaml.safe_load((_fixtures_dir(config) / "cases.yaml").read_text(encoding="utf-8"))
+
+
+def _content(case: dict, data_dir: Path) -> str:
+    if "generate" in case:
+        return _GENERATORS[case["generate"]]
+    return (data_dir / case["file"]).read_text(encoding="utf-8")
+
+
+# ── parametrize at collection time via pytest hook ────────────────────
+
+def pytest_generate_tests(metafunc):
+    if "preprocess_case" not in metafunc.fixturenames:
+        return
+    cases = _load_cases(metafunc.config)
+    test_name = metafunc.function.__name__
+    if test_name == "test_detect":
+        subset = [c for c in cases if "detect" in c]
+    else:
+        subset = [c for c in cases if "process" in c]
+    metafunc.parametrize("preprocess_case", subset, ids=[c["id"] for c in subset])
+
+
+# ── detect ────────────────────────────────────────────────────────────
+
+def test_detect(preprocess_case, pytestconfig) -> None:
+    case = preprocess_case
+    data_dir = _fixtures_dir(pytestconfig) / "data"
+    raw = _content(case, data_dir)
+    filename = case.get("file", "")
+    ct = detect_content_type(filename, raw)
+    expected = case["detect"]
+    assert ct == expected, f"[{case['id']}] expected {expected!r}, got {ct!r}"
+
+
+# ── preprocess ────────────────────────────────────────────────────────
+
+def test_preprocess(preprocess_case, pytestconfig) -> None:
+    case = preprocess_case
+    data_dir = _fixtures_dir(pytestconfig) / "data"
+    raw = _content(case, data_dir)
+    result = preprocess(case["process"], raw)
+
+    if case.get("no_html"):
+        assert not re.search(r"<[^>]+>", result.clean_text), "clean_text contains HTML tags"
+
+    if "min_chars" in case:
+        assert len(result.clean_text) >= case["min_chars"], \
+            f"clean_text too short: {len(result.clean_text)} < {case['min_chars']}"
+
+    if "ratio_lt" in case:
+        ratio = len(result.clean_text) / max(len(raw), 1)  # avoid ZeroDivisionError on empty input
+        assert ratio < case["ratio_lt"], f"compression ratio {ratio:.2f} >= {case['ratio_lt']}"
+
+    for key in case.get("has_meta", []):
+        assert result.metadata.get(key), f"metadata missing {key!r} (got {result.metadata})"
+
+    for item in ([case["contains"]] if isinstance(case.get("contains"), str) else case.get("contains", [])):
+        assert item in result.clean_text, f"clean_text missing {item!r}"
+
+    for item in ([case["excludes"]] if isinstance(case.get("excludes"), str) else case.get("excludes", [])):
+        assert item not in result.clean_text, f"clean_text contains forbidden {item!r}"
+
+    if "content_type" in case:
+        assert result.content_type == case["content_type"], \
+            f"expected content_type {case['content_type']!r}, got {result.content_type!r}"
--- a/tests/test_schemas_v3.py
+++ b/tests/test_schemas_v3.py
@@ -0,0 +1,242 @@
+"""Tests for v3 WebSocket frame protocol schemas."""
+
+import pytest
+from pydantic import ValidationError
+
+from app.schemas import (
+    WsDomain,
+    WsFrameType,
+    WsHomeRequest,
+    WsFloatingDomain,
+    WsFloatingRequest,
+    WsFloatingScope,
+    WsStreamEnd,
+    WsStreamStart,
+    WsStreamText,
+)
+
+
+# ── WsFrameType ───────────────────────────────────────────────────────
+
+
+def test_v3_frame_types_exist():
+    v3_types = [
+        "home_request",
+        "floating_request",
+        "stream_start",
+        "stream_text",
+        "stream_end",
+        "floating_domain",
+        "data_request",
+        "data_response",
+        "mutation",
+    ]
+    for name in v3_types:
+        assert hasattr(WsFrameType, name), f"WsFrameType missing: {name}"
+        assert WsFrameType[name].value == name
+
+
+def test_v2_frame_types_still_exist():
+    """Backward compat: v2 types must remain."""
+    v2_types = [
+        "chat_request",
+        "text_chunk",
+        "tool_call",
+        "tool_result",
+        "final",
+        "ping",
+        "agent_run",
+        "agent_data",
+        "agent_complete",
+        "device_hello",
+    ]
+    for name in v2_types:
+        assert hasattr(WsFrameType, name), f"v2 WsFrameType missing: {name}"
+
+
+# ── WsHomeRequest ─────────────────────────────────────────────────────
+
+
+def test_home_request_defaults():
+    frame = WsHomeRequest(message="Hello")
+    assert frame.type == WsFrameType.home_request
+    assert frame.message == "Hello"
+    assert frame.conversation_history == []
+
+
+def test_home_request_with_history():
+    history = [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]
+    frame = WsHomeRequest(message="Follow up", conversation_history=history)
+    assert frame.conversation_history == history
+
+
+def test_home_request_serializes():
+    frame = WsHomeRequest(message="Test")
+    data = frame.model_dump()
+    assert data["type"] == "home_request"
+    assert data["message"] == "Test"
+    assert data["conversation_history"] == []
+
+
+def test_home_request_deserializes():
+    raw = {"type": "home_request", "message": "Hi there"}
+    frame = WsHomeRequest.model_validate(raw)
+    assert frame.message == "Hi there"
+
+
+def test_home_request_requires_message():
+    with pytest.raises(ValidationError):
+        WsHomeRequest.model_validate({"type": "home_request"})
+
+
+# ── WsFloatingRequest ────────────────────────────────────────────────────
+
+
+def test_floating_request_basic():
+    frame = WsFloatingRequest(
+        message="Summarise",
+        scope=WsFloatingScope(type="task", id="task-123"),
+    )
+    assert frame.type == WsFrameType.floating_request
+    assert frame.scope.type == "task"
+    assert frame.scope.id == "task-123"
+
+
+def test_floating_request_scope_without_id():
+    frame = WsFloatingRequest(
+        message="Show all",
+        scope=WsFloatingScope(type="project"),
+    )
+    assert frame.scope.id is None
+
+
+def test_floating_request_serializes():
+    frame = WsFloatingRequest(
+        message="Test",
+        scope=WsFloatingScope(type="note", id="n-1"),
+    )
+    data = frame.model_dump()
+    assert data["type"] == "floating_request"
+    assert data["scope"]["type"] == "note"
+    assert data["scope"]["id"] == "n-1"
+
+
+def test_floating_request_invalid_scope_type():
+    with pytest.raises(ValidationError):
+        WsFloatingRequest(
+            message="X",
+            scope=WsFloatingScope(type="unknown"),  # type: ignore[arg-type]
+        )
+
+
+def test_floating_request_requires_scope():
+    with pytest.raises(ValidationError):
+        WsFloatingRequest.model_validate({"type": "floating_request", "message": "X"})
+
+
+# ── WsStreamStart ─────────────────────────────────────────────────────
+
+
+def test_stream_start():
+    frame = WsStreamStart(request_id="req-abc")
+    assert frame.type == WsFrameType.stream_start
+    assert frame.request_id == "req-abc"
+
+
+def test_stream_start_serializes():
+    data = WsStreamStart(request_id="r1").model_dump()
+    assert data == {"type": "stream_start", "request_id": "r1"}
+
+
+def test_stream_start_deserializes():
+    frame = WsStreamStart.model_validate({"type": "stream_start", "request_id": "r1"})
+    assert frame.request_id == "r1"
+
+
+# ── WsStreamText ──────────────────────────────────────────────────────
+
+
+def test_stream_text():
+    frame = WsStreamText(request_id="r1", chunk="Hello ")
+    assert frame.type == WsFrameType.stream_text
+    assert frame.chunk == "Hello "
+
+
+def test_stream_text_serializes():
+    data = WsStreamText(request_id="r1", chunk="word").model_dump()
+    assert data == {"type": "stream_text", "request_id": "r1", "chunk": "word"}
+
+
+def test_stream_text_deserializes():
+    raw = {"type": "stream_text", "request_id": "r2", "chunk": "test"}
+    frame = WsStreamText.model_validate(raw)
+    assert frame.chunk == "test"
+
+
+# ── WsStreamEnd ───────────────────────────────────────────────────────
+
+
+def test_stream_end_defaults():
+    frame = WsStreamEnd(request_id="r1")
+    assert frame.type == WsFrameType.stream_end
+
+
+def test_stream_end_serializes():
+    data = WsStreamEnd(request_id="r2").model_dump()
+    assert data == {"type": "stream_end", "request_id": "r2"}
+
+
+def test_stream_end_deserializes():
+    raw = {"type": "stream_end", "request_id": "r3"}
+    frame = WsStreamEnd.model_validate(raw)
+    assert frame.request_id == "r3"
+
+
+# ── WsFloatingDomain ─────────────────────────────────────────────────────
+
+
+def test_floating_domain_tasks():
+    frame = WsFloatingDomain(request_id="r1", domain=WsDomain(type="task"))
+    assert frame.type == WsFrameType.floating_domain
+    assert frame.domain.type == "task"
+
+
+def test_floating_domain_valid_domains():
+    frame = WsFloatingDomain(
+        request_id="r1",
+        domain=WsDomain(type="project", id="213213-312321-312312-421321", section="task"),
+    )
+    assert frame.domain.type == "project"
+    assert frame.domain.id == "213213-312321-312312-421321"
+    assert frame.domain.section == "task"
+
+
+def test_floating_domain_object_valid():
+    frame = WsFloatingDomain(
+        request_id="r1",
+        domain=WsDomain(type="project", id="p1", section="task"),
+    )
+    assert frame.domain.type == "project"
+
+
+def test_floating_domain_serializes():
+    d = WsFloatingDomain(
+        request_id="r1",
+        domain=WsDomain(type="timeline"),
+    ).model_dump()
+    assert d == {
+        "type": "floating_domain",
+        "request_id": "r1",
+        "domain": {"type": "timeline", "id": None, "section": None},
+    }
+
+
+def test_floating_domain_deserializes():
+    raw = {
+        "type": "floating_domain",
+        "request_id": "r1",
+        "domain": {"type": "node", "id": "n-1", "section": None},
+    }
+    frame = WsFloatingDomain.model_validate(raw)
+    assert frame.domain.type == "node"
+    assert frame.domain.id == "n-1"
--- a/tests/test_ws_unified.py
+++ b/tests/test_ws_unified.py
@@ -0,0 +1,155 @@
+"""Integration tests for the unified WebSocket handler (Step 5).
+
+Tests the device WS endpoint with home_request and floating_request frames,
+verifying that the correct v3 frame sequence is returned.
+
+LLM calls are mocked to avoid network dependency.
+"""
+
+from __future__ import annotations
+
+import json
+from unittest.mock import patch
+
+import pytest
+
+from app.db import get_session
+from app.main import app
+from app.schemas import WsFrameType
+from tests.conftest import TEST_USER_IDS, make_jwt
+
+USER_ID = TEST_USER_IDS["power"]
+
+
+# ── helpers ───────────────────────────────────────────────────────────────────
+
+@pytest.fixture(autouse=True)
+def _override_db(db_session):
+    async def _gen():
+        yield db_session
+
+    app.dependency_overrides[get_session] = _gen
+    yield
+    app.dependency_overrides.pop(get_session, None)
+
+
+def _recv_until_end(ws, max_frames: int = 20) -> list[dict]:
+    """Receive frames until a stream_end frame arrives or max_frames is reached."""
+    frames = []
+    for _ in range(max_frames):
+        raw = ws.receive_text()
+        frame = json.loads(raw)
+        frames.append(frame)
+        if frame.get("type") == WsFrameType.stream_end:
+            break
+    return frames
+
+
+async def _mock_home_stream(user_id, message, context):
+    yield "token", "Hello"
+
+
+async def _mock_floating_stream(user_id, message, context):
+    yield "floating_domain", {"type": "task", "id": None, "section": None}
+    yield "token", "Here is a summary"
+
+
+# ── tests ─────────────────────────────────────────────────────────────────────
+
+def test_home_request_produces_stream_frames(client):
+    """home_request → stream_start, stream_text+, stream_end."""
+    token = make_jwt("power", user_id=USER_ID)
+
+    with patch("app.api.routes.device_ws.run_home_stream", side_effect=_mock_home_stream):
+        with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+            ws.send_text(json.dumps({
+                "type": "device_hello", "device_id": "dev-1", "agent_ids": []
+            }))
+            ws.send_text(json.dumps({
+                "type": "home_request",
+                "request_id": "r1",
+                "message": "List my tasks",
+                "conversation_history": [],
+            }))
+            frames = _recv_until_end(ws)
+
+    types = [f["type"] for f in frames]
+    assert WsFrameType.stream_start in types
+    assert WsFrameType.stream_end in types
+    assert types.index(WsFrameType.stream_start) < types.index(WsFrameType.stream_end)
+
+
+def test_floating_request_produces_domain_frame(client):
+    """floating_request → floating_domain first, then stream_text*, stream_end."""
+    token = make_jwt("power", user_id=USER_ID)
+
+    with patch("app.api.routes.device_ws.run_floating_stream", side_effect=_mock_floating_stream):
+        with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+            ws.send_text(json.dumps({
+                "type": "device_hello", "device_id": "dev-2", "agent_ids": []
+            }))
+            ws.send_text(json.dumps({
+                "type": "floating_request",
+                "request_id": "p1",
+                "message": "Summarize this task",
+                "scope": {"type": "task", "id": "task-123"},
+            }))
+            frames = _recv_until_end(ws)
+
+    types = [f["type"] for f in frames]
+    assert WsFrameType.floating_domain in types
+    assert WsFrameType.stream_end in types
+    assert types.index(WsFrameType.floating_domain) < types.index(WsFrameType.stream_end)
+
+    domain_frame = next(f for f in frames if f["type"] == WsFrameType.floating_domain)
+    assert domain_frame["domain"]["type"] == "task"
+    assert domain_frame["request_id"] == "p1"
+
+
+def test_home_request_request_id_propagated(client):
+    """request_id in home_request is echoed in every response frame that carries one."""
+    token = make_jwt("power", user_id=USER_ID)
+    req_id = "my-unique-req-id"
+
+    async def _stream(user_id, message, context):
+        yield "token", "ok"
+
+    with patch("app.api.routes.device_ws.run_home_stream", side_effect=_stream):
+        with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+            ws.send_text(json.dumps({
+                "type": "device_hello", "device_id": "dev-3", "agent_ids": []
+            }))
+            ws.send_text(json.dumps({
+                "type": "home_request",
+                "request_id": req_id,
+                "message": "hello",
+            }))
+            frames = _recv_until_end(ws)
+
+    for f in frames:
+        if "request_id" in f:
+            assert f["request_id"] == req_id
+
+
+def test_tool_result_dispatch_silent_on_unknown_id(client):
+    """tool_result for unknown call_id is silently ignored — no crash."""
+    token = make_jwt("power", user_id=USER_ID)
+
+    with patch("app.api.routes.device_ws._HEARTBEAT_INTERVAL", 0.05):
+        with client.websocket_connect(f"/api/v1/ws/device?token={token}") as ws:
+            ws.send_text(json.dumps({
+                "type": "device_hello", "device_id": "dev-4", "agent_ids": []
+            }))
+            ws.send_text(json.dumps({
+                "type": "tool_result", "id": "no-such-id", "ok": True
+            }))
+            # If the connection is still alive, the next frame is the heartbeat ping
+            msg = json.loads(ws.receive_text())
+            assert msg["type"] == "ping"
+
+
+def test_invalid_jwt_rejected(client):
+    """A connection with an invalid token is rejected, whether before or after accept."""
+    with pytest.raises(Exception):
+        with client.websocket_connect("/api/v1/ws/device?token=badtoken") as ws:
+            ws.receive_text()