From 78b91ecb3d651f78926d3e121cb317550f2980f4 Mon Sep 17 00:00:00 2001 From: roberto Date: Sat, 7 Mar 2026 21:57:29 +0100 Subject: [PATCH] AI plan v1 --- AI_REFACTOR_PLAN.md | 108 ++-------------------------- BACKEND_PLAN.md | 171 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 176 insertions(+), 103 deletions(-) diff --git a/AI_REFACTOR_PLAN.md b/AI_REFACTOR_PLAN.md index 12fe505..a8c691a 100644 --- a/AI_REFACTOR_PLAN.md +++ b/AI_REFACTOR_PLAN.md @@ -512,112 +512,14 @@ Cloud Agent: --- -## Phase 5 — Shared Memory (Agent KV + Chat WS Fix) +## ~~Phase 5 — Shared Memory (Agent KV + Chat WS Fix)~~ SUPERSEDED -> **Objective:** Give chat agents persistent memory via a KV store on the Electron client. Agents can `store_memory()` to remember user preferences, patterns, and corrections, and `recall_memories()` to retrieve them. All data lives in Electron's SQLite `agent_memory` table (local-first, never stored server-side). This also requires fixing the chat WS handler to support bidirectional tool calls — currently a critical gap that blocks all agent tools from working over the `/chat/stream` endpoint. +> **Phase 5 has been replaced by Architecture v2's cloud-side memory middleware.** > -> **Electron Phase 5 plan:** `../adiuva/AI_REFACTOR_PLAN.md` Phase 5 section. +> - Step 5.1 (chat WS bidirectional fix) → moved to `BACKEND_PLAN.md` Step 14 (V2.0.1) +> - Steps 5.2–5.4 (on-device KV memory) → replaced by Steps 18–22 (MemGPT-style memory on PostgreSQL + pgvector) > -> **Why agent KV matters:** Chat agents are currently stateless — they can't remember "User prefers to-do in lowercase" or "Client X billing cycle is the 15th". With KV memory, agents become learning assistants that improve over time. Users feel the AI "knows them" without any data leaving their device. -> -> **Why the chat WS fix is critical:** The existing `/chat/stream` WS handler (`app/api/routes/chat.py`) never calls `set_client_executor()`. 
This means `execute_on_client()` raises `RuntimeError` whenever any agent tool tries to call it during a chat session. All 23 tools are broken over chat WS. This must be fixed before memory tools (or any tools) can work. -> -> **New Electron tables** (managed by Electron, accessed by backend via `execute_on_client`): -> - `chat_messages`: `id`, `scope`, `role`, `content`, `error`, `created_at` -> - `agent_memory`: `id`, `agent_name`, `key`, `value`, `scope`, `created_at`, `updated_at` (unique on `agent_name, key, scope`) - -### Step 5.1 — Fix chat WS for bidirectional tool calls (PREREQUISITE) - -> **This is the highest-priority backend fix.** Without it, zero agent tools work over the chat WS connection. - -- [ ] Rewrite `app/api/routes/chat.py` — `chat_stream()` WS handler: - - After auth + accept, receive first frame as `{"type": "chat_request", ...}` (not raw `ChatRequest`) - - Parse frame, extract `message` and `context` - - Set up a local `pending_calls: dict[str, asyncio.Future]` for tool-call round-trips - - Define executor callback: - ```python - async def execute_callback(payload: dict) -> dict: - call_id = payload["id"] - fut = asyncio.get_event_loop().create_future() - pending_calls[call_id] = fut - await websocket.send_text(json.dumps({"type": "tool_call", **payload})) - return await asyncio.wait_for(fut, timeout=30.0) - ``` - - Call `set_client_executor(execute_callback)` before orchestrating - - Run two concurrent tasks: - 1. **Receive loop**: dispatches incoming frames — `tool_result` resolves pending Futures, `pong` ignored - 2. 
**Orchestration task**: calls `orchestrate_stream()`, wraps chunks in `{"type": "text_chunk", "text": "..."}` frames, sends `{"type": "final", "response": "..."}` on completion - - Call `clear_client_executor()` in finally block - - Keep heartbeat ping every 30s - - 30s timeout on each `tool_result` — tool returns error string to LLM on timeout -- [ ] Update `orchestrate_stream()` in `app/core/orchestrator.py` if needed: - - Ensure it properly yields text chunks (currently chunks by fixed 50-char slices — consider switching to yielding full response as single chunk for now) -- **Files:** `app/api/routes/chat.py`, `app/core/orchestrator.py` -- **Outcome:** Full bidirectional WS. Tool calls, text streaming, and heartbeats happen concurrently. All 23 existing agent tools now work over chat WS. - -### Step 5.2 — Agent memory tools - -- [ ] Create `app/agents/tools/memory_tools.py`: - - `create_memory_tools(agent_name: str) -> list[Tool]` — factory function that returns two LangChain `@tool` functions with `agent_name` bound via closure: - - **`store_memory(key: str, value: str, scope: str = "global")`**: - - Calls `execute_on_client(action="select", table="agentMemory", filters={"agentName": agent_name, "key": key, "scope": scope})` - - If row exists: `execute_on_client(action="update", table="agentMemory", data={"id": row["id"], "updates": {"value": value, "updatedAt": }})` - - If not: `execute_on_client(action="insert", table="agentMemory", data={"agentName": agent_name, "key": key, "value": value, "scope": scope})` - - Returns `"Stored memory: [key] = [value]"` - - **`recall_memories(key_pattern: str = None, scope: str = "global", limit: int = 10)`**: - - Calls `execute_on_client(action="select", table="agentMemory", filters={"agentName": agent_name, "scope": scope, "search": key_pattern})` - - Returns formatted list: `"key1: value1\nkey2: value2\n..."` or `"No memories found."` - - Timestamps are Unix milliseconds (consistent with Electron's `Date.now()`) - - Agent 
name scoping: each agent only sees its own memories (filtered by `agentName`) -- **Files:** `app/agents/tools/memory_tools.py` -- **Outcome:** Two reusable tools any agent can include. Upsert semantics via select-then-insert/update. - -### Step 5.3 — Register memory tools on all agents - -- [ ] Update `app/agents/task_agent.py`: - - Import `create_memory_tools` from `app/agents/tools/memory_tools` - - Add memory tools to `get_tools()`: `return [list_tasks, create_task, ..., *create_memory_tools("task_agent")]` - - Append to `_SYSTEM_PROMPT`: `"\n\nYou can store important facts about user preferences using store_memory and recall past facts using recall_memories. Store corrections, preferences, and patterns the user shares (e.g. 'User prefers short task titles', 'Default priority is medium'). Always check memories before giving advice."` -- [ ] Update `app/agents/project_agent.py` — same pattern with `create_memory_tools("project_agent")` -- [ ] Update `app/agents/note_agent.py` — same pattern with `create_memory_tools("note_agent")` -- [ ] Update `app/agents/checkpoint_agent.py` — same pattern with `create_memory_tools("checkpoint_agent")` -- **Files:** `app/agents/task_agent.py`, `app/agents/project_agent.py`, `app/agents/note_agent.py`, `app/agents/checkpoint_agent.py` -- **Outcome:** All 4 chat agents can store and recall persistent memories. Each agent's memories are scoped by `agentName`. 
- -### Step 5.4 — Extend ChatContext with agent memories - -- [ ] Update `app/schemas.py`: - - Add `agent_memories: list[dict[str, Any]] = Field(default_factory=list)` to `ChatContext` - - These are pre-loaded by Electron (from `agent_memory` table) and included in every request -- [ ] Agent `handle()` methods already receive full `context` dict — memories are visible in `context["agent_memories"]` -- [ ] Agent system prompts reference memories from context: agents see pre-loaded memories AND can call `recall_memories` for targeted lookup -- **Files:** `app/schemas.py` -- **Outcome:** Backend receives pre-loaded memories from Electron. Agents have dual-path access: context injection (passive) + tool call (active). - -### Phase 5 — Verification - -| # | Scenario | Expected | -|---|---|---| -| 1 | **Chat WS bidirectional** | Connect → send `chat_request` → receive `tool_call` → respond `tool_result` → receive `text_chunk` → `final` | -| 2 | **All existing tools work** | "List my tasks" over chat WS → `tool_call(select, tasks)` → Electron returns rows → LLM responds with real task data | -| 3 | **Store memory** | "Remember that I prefer short task titles" → `store_memory("task_title_preference", "short")` → `tool_call(insert, agentMemory)` → Electron persists | -| 4 | **Recall memory** | New chat session → "How should I name tasks?" 
→ agent sees pre-loaded memory in context or calls `recall_memories` → references stored preference | -| 5 | **Upsert semantics** | Store same key twice → only one row exists with updated value | -| 6 | **Agent scope isolation** | `task_agent` stores memory → `note_agent` cannot see it (filtered by `agentName`) | -| 7 | **Project scope** | Store memory with `scope="project:"` → only visible in that project's chat context | -| 8 | **Tool timeout** | Disconnect Electron mid-tool-call → 30s timeout → tool returns error → LLM handles gracefully | -| 9 | **Concurrent tool calls** | Agent calls `list_tasks` then `recall_memories` in sequence → both WS round-trips succeed | -| 10 | **Existing tests pass** | `pytest` — no regressions in agent tools or orchestrator | - -### Phase 5 — Step Dependencies - -``` -Step 5.1 (chat WS fix) ──────────────► Step 5.2 (memory tools) ──► Step 5.3 (register on agents) - ──► Step 5.4 (extend ChatContext) - -Step 5.1 is the BLOCKER — nothing else works until bidirectional tool calls are wired. -Steps 5.3 and 5.4 can run in parallel after 5.2. -``` +> See `BACKEND_PLAN.md` Steps 14–28 for the full Architecture v2 implementation plan. 
--- diff --git a/BACKEND_PLAN.md b/BACKEND_PLAN.md index 8ed7dd8..9818a7a 100644 --- a/BACKEND_PLAN.md +++ b/BACKEND_PLAN.md @@ -516,6 +516,12 @@ adiuva-api/ | GET | `/api/v1/oauth/{provider}/authorize` | JWT | — | `{authorization_url}` | | GET | `/api/v1/oauth/{provider}/callback` | — | OAuth code | `{encrypted_token}` | | WS | `/api/v1/ws/device` | JWT | `device_hello` (first frame) | Agent trigger + tool_call frames | +| GET | `/api/v1/memory/core` | JWT | — | Core memory entries | +| GET | `/api/v1/memory/associative` | JWT | — | Associative memories | +| GET | `/api/v1/memory/episodic` | JWT | — | Episodic summaries | +| GET | `/api/v1/memory/proactive` | JWT | — | Proactive patterns | +| DELETE | `/api/v1/memory/{type}/{id}` | JWT | — | `{ok: true}` | +| POST | `/api/v1/oauth/{provider}/refresh` | JWT | — | `{encrypted_token}` | --- @@ -559,6 +565,171 @@ adiuva-api/ --- +--- + +## Architecture v2 — Integration Phases + +> **Reference:** `architecture-v2.md` — Local-first topology, BYOK LLM keys, MemGPT-style memory middleware, Popup scoping with navigation directives, Batch Agent. +> +> These phases build on top of the completed Steps 1–13 and Phase 3 (3.1–3.6). Phase 5 from `AI_REFACTOR_PLAN.md` (on-device KV memory) is superseded by cloud-side memory middleware below. + +### Step 14 — Fix chat WS for bidirectional tool calls (V2.0.1) +> Blocker: `chat_stream()` never calls `set_client_executor()` — all 23 agent tools fail during chat WS sessions. 
+ +- [ ] Rewrite `app/api/routes/chat.py` `chat_stream()`: + - `pending_calls: dict[str, asyncio.Future]` for tool-call round-trips + - Concurrent receive loop (dispatches `tool_result` → resolves futures) + orchestration task + - `set_client_executor()` before orchestrating, `clear_client_executor()` in finally + - Parse first frame as `{"type": "chat_request", ...}` + - Send `{"type": "text_chunk", "text": "..."}` + `{"type": "final", "response": "..."}` + - Heartbeat ping every 30s, 30s timeout on tool_result +- [ ] Tests: verify all 23 tools work over chat WS +- **Files:** `app/api/routes/chat.py`, `app/core/orchestrator.py` +- **Outcome:** Full bidirectional chat WS. All agent tools now work over `/chat/stream`. + +### Step 15 — Agent scheduler + OAuth endpoints (V2.0.2) +- [ ] `app/core/agent_scheduler.py`: APScheduler, 60s check loop, PostgreSQL advisory locks for multi-instance +- [ ] `app/api/routes/oauth.py`: `GET /oauth/{provider}/authorize`, `GET /oauth/{provider}/callback`, `POST /oauth/{provider}/refresh` + - Gmail: `gmail.readonly` scope + - Outlook/Teams: `Mail.Read`, `ChannelMessage.Read.All` scopes + - Encrypts tokens with Fernet, returns encrypted blob for `CloudAgentConfig.oauth_token_encrypted` +- [ ] Integrate scheduler with FastAPI lifespan (start on startup, shutdown gracefully) +- **Dependencies:** `apscheduler>=4.0` +- **Files:** `app/core/agent_scheduler.py` (new), `app/api/routes/oauth.py` (new), `app/main.py` +- **Outcome:** Agents run on cron schedules. OAuth flow for Gmail/Teams/Outlook. 
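Looking back at Step 14, the `pending_calls` round trip can be sketched as below. This is a minimal illustration, not the planned handler: `FakeSocket`, `on_frame`, `run_round_trip`, and the `result` field on the `tool_result` frame are assumptions made for the example, and a real handler would run the receive loop and the orchestration task concurrently on the live WebSocket.

```python
import asyncio
import json


class FakeSocket:
    """Hypothetical stand-in for the WebSocket; records frames sent to the client."""

    def __init__(self) -> None:
        self.sent: list[dict] = []

    async def send_text(self, text: str) -> None:
        self.sent.append(json.loads(text))


async def run_round_trip() -> dict:
    websocket = FakeSocket()
    pending_calls: dict[str, asyncio.Future] = {}

    async def execute_callback(payload: dict) -> dict:
        # Register a future under the call id, send the tool_call frame, then wait.
        fut = asyncio.get_running_loop().create_future()
        pending_calls[payload["id"]] = fut
        try:
            await websocket.send_text(json.dumps({"type": "tool_call", **payload}))
            return await asyncio.wait_for(fut, timeout=30.0)
        finally:
            # Always drop the entry so timed-out calls do not leak futures.
            pending_calls.pop(payload["id"], None)

    def on_frame(frame: dict) -> None:
        # Receive-loop dispatch: a tool_result resolves the matching future.
        # The "result" field name is an assumption for this sketch.
        if frame.get("type") == "tool_result":
            fut = pending_calls.get(frame["id"])
            if fut is not None and not fut.done():
                fut.set_result(frame["result"])

    call = asyncio.create_task(
        execute_callback({"id": "c1", "action": "select", "table": "tasks"})
    )
    await asyncio.sleep(0)  # let the callback send its frame and start waiting
    on_frame({"type": "tool_result", "id": "c1", "result": {"rows": []}})
    return {"sent": websocket.sent, "result": await call}
```

Removing the entry in `finally` is a design choice of the sketch: a late `tool_result` that arrives after the 30s timeout is then silently ignored instead of resolving a stale future.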
+ +### Step 16 — BYOK: API key passthrough in LLM factory (V2.1.1) +- [ ] Add `api_key: str | None = None` param to `get_llm()`, `get_router_llm()`, `embed()` +- [ ] When provided, use BYOK key instead of `_api_key_for_model()` server fallback +- [ ] Add Cerebras support: `_api_key_for_model()` handles `cerebras/` prefix +- [ ] Key is never persisted, never logged +- **Files:** `app/core/llm.py`, `app/config/settings.py` +- **Outcome:** LLM factory accepts per-request API keys with server-side fallback. + +### Step 17 — BYOK: Thread key through request lifecycle (V2.1.2) +- [ ] `ContextVar`: `_request_api_key: ContextVar[str | None]` in `app/core/llm.py` +- [ ] `get_llm()` reads from ContextVar when no explicit `api_key` param +- [ ] `ChatRequest` schema: add `api_key: str | None = None` +- [ ] WS handlers set ContextVar from incoming frame's `api_key` field +- [ ] Fallback: if no BYOK key → server-side key (backward compat + Batch Agent) +- **Files:** `app/schemas.py`, `app/core/llm.py`, `app/api/routes/chat.py`, `app/core/orchestrator.py` +- **Outcome:** BYOK key flows from request → orchestrator → agent → LLM. Never stored. + +### Step 18 — pgvector + memory DB tables (V2.2.1) +- [ ] Add `pgvector` to `requirements.txt` +- [ ] New SQLAlchemy models in `app/models.py`: + - `CoreMemory`: id, user_id, key, value, created_at, updated_at + - `AssociativeEmbedding`: id, user_id, entity_type, entity_id, label, embedding (pgvector Vector), metadata_json, created_at + - `EpisodicSummary`: id, user_id, session_id, summary, key_entities, created_at + - `ProactivePattern`: id, user_id, pattern_type, description, confidence, last_detected_at, created_at +- [ ] Alembic migration with `CREATE EXTENSION IF NOT EXISTS vector` +- **Files:** `app/models.py`, `requirements.txt`, `alembic/versions/` +- **Outcome:** Memory tables in PostgreSQL with pgvector support. 
+ +### Step 19 — Memory service layer (V2.2.2) +- [ ] Create `app/core/memory.py` — `MemoryService` class: + - `load_core_memory(user_id)`, `write_core_memory(user_id, key, value)` (upsert) + - `search_associative(user_id, query_embedding, top_k=5)` (pgvector similarity) + - `write_associative(user_id, entity_type, entity_id, label, embedding, metadata)` + - `get_recent_episodic(user_id, limit=3)`, `write_episodic(user_id, session_id, summary, key_entities)` + - `get_proactive_patterns(user_id)`, `write_proactive_pattern(user_id, ...)` + - `delete_memory(user_id, memory_type, memory_id)` — user review/delete +- Uses async SQLAlchemy sessions from `app/db.py` +- **Files:** `app/core/memory.py` (new) +- **Outcome:** Complete CRUD + similarity search for all 4 memory types. + +### Step 20 — Memory middleware wrapper (V2.2.3) +- [ ] Create `app/core/memory_middleware.py`: + - `enrich_with_memory(user_id, message, context)`: + 1. Load core memory (always injected) + 2. Embed user message → pgvector similarity search on associative memory + 3. Load recent episodic summaries + 4. Load proactive patterns + 5. Return enriched context + - `post_process_memory(user_id, message, response, context)`: + 1. LLM decides what to remember (semi-autonomous) + 2. Write core memory for preferences + 3. Write associative for entity relationships + 4. Compress session into episodic summary when conversation ends +- **Files:** `app/core/memory_middleware.py` (new) +- **Outcome:** Memory wraps every orchestrator call — enrich before, learn after. + +### Step 21 — Integrate memory into orchestrator (V2.2.4) +- [ ] Modify `orchestrate()` / `orchestrate_stream()`: + - Before `classify_intent`: call `enrich_with_memory()` + - After agent response: call `post_process_memory()` +- [ ] Add `memory_write` tool to Router's system prompt +- **Files:** `app/core/orchestrator.py` +- **Outcome:** All chat interactions are memory-enriched. Router can explicitly write memories. 
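The enrich-before, learn-after flow from Steps 19 to 21 can be sketched with an in-memory stand-in. Everything here is an assumption made for illustration: `InMemoryMemoryService` replaces PostgreSQL + pgvector with a plain list and cosine similarity, `toy_embed` replaces the real embedding call, the extra `service` parameter is added so the sketch is self-contained, and the `remember:` rule replaces the LLM decision described in Step 20. Episodic and proactive memory are omitted for brevity.

```python
from __future__ import annotations

import asyncio
import math


class InMemoryMemoryService:
    """Hypothetical stand-in for MemoryService (no database involved)."""

    def __init__(self) -> None:
        self.core: dict[str, str] = {}
        self.associative: list[tuple[str, list[float]]] = []

    async def load_core_memory(self, user_id: str) -> dict[str, str]:
        return dict(self.core)

    async def write_core_memory(self, user_id: str, key: str, value: str) -> None:
        self.core[key] = value  # upsert semantics

    async def search_associative(
        self, user_id: str, query_embedding: list[float], top_k: int = 5
    ) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(
            self.associative,
            key=lambda item: cosine(query_embedding, item[1]),
            reverse=True,
        )
        return [label for label, _ in ranked[:top_k]]


def toy_embed(text: str) -> list[float]:
    # Placeholder embedding: counts of three keywords, not a real model.
    words = text.lower().split()
    return [float(words.count(w)) for w in ("task", "invoice", "note")]


async def enrich_with_memory(service, user_id: str, message: str, context: dict) -> dict:
    enriched = dict(context)
    enriched["core_memory"] = await service.load_core_memory(user_id)
    enriched["associative"] = await service.search_associative(
        user_id, toy_embed(message), top_k=1
    )
    return enriched


async def post_process_memory(service, user_id, message, response, context) -> None:
    # Stand-in for the LLM decision: only explicit "remember: key = value" is stored.
    if message.lower().startswith("remember:") and "=" in message:
        key, value = message.split(":", 1)[1].split("=", 1)
        await service.write_core_memory(user_id, key.strip(), value.strip())
```

In the orchestrator, the first function would run before `classify_intent` and the second after the agent response, as Step 21 describes.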
+ +### Step 22 — Memory management API (V2.2.5) +- [ ] Create `app/api/routes/memory.py`: + - `GET /api/v1/memory/core` — list core memories + - `GET /api/v1/memory/associative` — list associative memories + - `GET /api/v1/memory/episodic` — list episodic summaries + - `GET /api/v1/memory/proactive` — list proactive patterns + - `DELETE /api/v1/memory/{type}/{id}` — user deletes a memory +- [ ] Register router in `app/main.py` +- **Files:** `app/api/routes/memory.py` (new), `app/main.py` +- **Outcome:** Users can review and delete their memories (semi-autonomous model). + +### Step 23 — Scope context + structured response (V2.3.1) +- [ ] Update `ChatRequest`: add `source: Literal["home", "popup"] = "home"`, `scope: dict | None = None` +- [ ] New response schemas in `app/schemas.py`: + - `AiResponse`: response (text + ui_directive + data), navigation, mutations, context + - `NavigationDirective`: action, target, filter + - `MutationCommand`: action, data + - `ResponseContext`: scope_changed, new_scope +- [ ] Used when `source == "popup"` or navigation needed; `ChatResponse` kept for backward compat +- **Files:** `app/schemas.py` +- **Outcome:** Popup can receive navigation directives and scoped responses. + +### Step 24 — Enhanced Router capabilities (V2.3.2) +- [ ] Update orchestrator system prompt + tool set: + - `ask_user_clarification` — return clarification question, WS handler waits for next user message + - `render_ui_directive` — specify UI rendering (task_card, chart, diagram) + - `cross_entity_resolve` — include navigation directive when scope crosses entities +- **Files:** `app/core/orchestrator.py` +- **Outcome:** Router can clarify, render rich UI, and navigate across entities. 
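The response schemas listed in Step 23 might take a shape like the one below. This is a sketch using stdlib dataclasses so it runs without dependencies; the project's `app/schemas.py` presumably uses its own model classes. Field names follow Step 23, while all field types, the `ResponseBody` class name, and the example values (`"navigate"`, `"project"`, `"task_card"`) are assumptions.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class NavigationDirective:
    action: str
    target: str
    filter: dict[str, Any] | None = None


@dataclass
class MutationCommand:
    action: str
    data: dict[str, Any] = field(default_factory=dict)


@dataclass
class ResponseContext:
    scope_changed: bool = False
    new_scope: dict[str, Any] | None = None


@dataclass
class ResponseBody:
    # Hypothetical name for the "response (text + ui_directive + data)" group.
    text: str
    ui_directive: str | None = None
    data: dict[str, Any] | None = None


@dataclass
class AiResponse:
    response: ResponseBody
    navigation: NavigationDirective | None = None
    mutations: list[MutationCommand] = field(default_factory=list)
    context: ResponseContext = field(default_factory=ResponseContext)


def example_popup_response() -> dict:
    """Build a popup reply that also asks the client to navigate to a project."""
    reply = AiResponse(
        response=ResponseBody(text="Opening project tasks.", ui_directive="task_card"),
        navigation=NavigationDirective(
            action="navigate", target="project", filter={"status": "open"}
        ),
        context=ResponseContext(scope_changed=True, new_scope={"project_id": "p1"}),
    )
    return asdict(reply)
```

Keeping `navigation` optional and `mutations` defaulting to an empty list lets a plain text reply use the same envelope, which fits the backward-compatibility note in Step 23.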
+ +### Step 25 — WS protocol evolution (V2.3.3) +- [ ] Add to `WsFrameType`: `user_request`, `data_request`, `data_response`, `ai_response`, `mutation` +- [ ] `user_request` = enhanced `chat_request` with source, scope, api_key +- [ ] `ai_response` = structured response with navigation + mutations + context +- [ ] Server auto-detects client frame format for backward compat +- **Files:** `app/schemas.py`, `app/api/routes/chat.py` +- **Outcome:** v2 WS protocol with full backward compatibility. + +### Step 26 — Batch agent implementation (V2.4.1) +- [ ] Create `app/agents/batch_agent.py` — background agent (not `ChatAgent`): + - `pattern_detection`: analyze episodic summaries for recurring patterns + - `memory_consolidation`: merge redundant episodic summaries + - `suggestion_generation`: create proactive pattern entries + - `overdue_detection`: request task data from Electron via device WS +- [ ] Uses server-side LLM key (not BYOK — runs without user request) +- [ ] Requires device online for entity data access +- **Files:** `app/agents/batch_agent.py` (new) +- **Outcome:** Background agent that learns patterns and generates proactive suggestions. + +### Step 27 — Batch agent scheduling + proactive surfacing (V2.4.2) +- [ ] Integrate with agent scheduler from Step 15 +- [ ] Default: every 6h per user, only when device online +- [ ] Proactive patterns surfaced via memory middleware in Router context +- **Files:** `app/core/agent_scheduler.py`, `app/core/memory_middleware.py` +- **Outcome:** Batch runs automatically. Suggestions appear in chat responses. 
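The frame auto-detection mentioned in Step 25 could be as small as the sketch below. `normalize_client_frame` and the `protocol` field are hypothetical; the `message` and `context` fields are assumed from the existing `chat_request` frame, and the `"home"` default mirrors the `source` default in Step 23.

```python
def normalize_client_frame(frame: dict) -> dict:
    """Map a v1 chat_request or a v2 user_request onto one internal shape."""
    frame_type = frame.get("type")
    if frame_type == "user_request":
        protocol = "v2"
    elif frame_type == "chat_request":
        protocol = "v1"
    else:
        raise ValueError(f"unsupported frame type: {frame_type!r}")

    return {
        "protocol": protocol,
        "message": frame.get("message", ""),
        "context": frame.get("context") or {},
        # v1 frames carry none of the fields below, so defaults apply.
        "source": frame.get("source", "home"),
        "scope": frame.get("scope"),
        "api_key": frame.get("api_key"),
    }
```

The handler can then branch on `protocol` when choosing between the legacy `text_chunk`/`final` frames and the structured `ai_response` frame.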
+ +### Step 28 — E2E memory encryption + tests (V2.5) +- [ ] Application-level Fernet encryption for all memory table writes +- [ ] Encryption key derived from user passphrase, sent with requests +- [ ] `tests/test_byok.py`: key threading, Cerebras model string, fallback +- [ ] `tests/test_memory.py`: all 4 memory types, pgvector search, middleware +- [ ] `tests/test_popup.py`: scope, navigation directives, cross-entity +- [ ] `tests/test_batch_agent.py`: pattern detection, consolidation +- **Files:** `app/core/memory.py`, `app/storage/encryption.py`, `tests/` +- **Outcome:** Fully tested, encrypted memory system. + +--- + ## Development Rules 1. **NEVER persist user data in plaintext.** The DB stores only auth, billing, storage metadata, and marketplace data. User context arrives in requests and is discarded. Cloud blobs are E2E encrypted client-side — backend only stores opaque bytes.