AI plan v1
171
BACKEND_PLAN.md
@@ -516,6 +516,12 @@ adiuva-api/
| GET | `/api/v1/oauth/{provider}/authorize` | JWT | — | `{authorization_url}` |
| GET | `/api/v1/oauth/{provider}/callback` | — | OAuth code | `{encrypted_token}` |
| WS | `/api/v1/ws/device` | JWT | `device_hello` (first frame) | Agent trigger + tool_call frames |
| GET | `/api/v1/memory/core` | JWT | — | Core memory entries |
| GET | `/api/v1/memory/associative` | JWT | — | Associative memories |
| GET | `/api/v1/memory/episodic` | JWT | — | Episodic summaries |
| GET | `/api/v1/memory/proactive` | JWT | — | Proactive patterns |
| DELETE | `/api/v1/memory/{type}/{id}` | JWT | — | `{ok: true}` |
| POST | `/api/v1/oauth/{provider}/refresh` | JWT | — | `{encrypted_token}` |

---

@@ -559,6 +565,171 @@ adiuva-api/

---

---

## Architecture v2 — Integration Phases

> **Reference:** `architecture-v2.md` — Local-first topology, BYOK LLM keys, MemGPT-style memory middleware, Popup scoping with navigation directives, Batch Agent.
>
> These phases build on top of the completed Steps 1–13 and Phase 3 (3.1–3.6). Phase 5 from `AI_REFACTOR_PLAN.md` (on-device KV memory) is superseded by cloud-side memory middleware below.

### Step 14 — Fix chat WS for bidirectional tool calls (V2.0.1)
> Blocker: `chat_stream()` never calls `set_client_executor()` — all 23 agent tools fail during chat WS sessions.

- [ ] Rewrite `app/api/routes/chat.py` `chat_stream()`:
  - `pending_calls: dict[str, asyncio.Future]` for tool-call round-trips
  - Concurrent receive loop (dispatches `tool_result` → resolves futures) + orchestration task
  - `set_client_executor()` before orchestrating, `clear_client_executor()` in finally
  - Parse first frame as `{"type": "chat_request", ...}`
  - Send `{"type": "text_chunk", "text": "..."}` + `{"type": "final", "response": "..."}`
  - Heartbeat ping every 30s, 30s timeout on `tool_result`
- [ ] Tests: verify all 23 tools work over chat WS
- **Files:** `app/api/routes/chat.py`, `app/core/orchestrator.py`
- **Outcome:** Full bidirectional chat WS. All agent tools now work over `/chat/stream`.

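The `pending_calls` round-trip in Step 14 can be sketched with plain `asyncio`. This is a minimal illustration, not the planned implementation: the `ToolCallBridge` class, the `send` callable (standing in for the WebSocket send), and the frame fields `id`, `name`, `args`, and `result` are all assumptions. Only `pending_calls`, the `tool_result` frame type, and the 30s timeout come from the plan.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable

TOOL_RESULT_TIMEOUT_S = 30.0  # the plan's 30s limit on tool_result


class ToolCallBridge:
    """Matches incoming tool_result frames to the coroutine awaiting them."""

    def __init__(self, send: Callable[[dict], Awaitable[None]]) -> None:
        self._send = send  # hypothetical: would wrap the WebSocket's JSON send
        self.pending_calls: dict[str, asyncio.Future] = {}

    async def call_tool(
        self, name: str, args: dict, timeout: float = TOOL_RESULT_TIMEOUT_S
    ) -> Any:
        """Send a tool_call frame and wait for the matching tool_result."""
        call_id = uuid.uuid4().hex
        future = asyncio.get_running_loop().create_future()
        self.pending_calls[call_id] = future
        try:
            await self._send(
                {"type": "tool_call", "id": call_id, "name": name, "args": args}
            )
            return await asyncio.wait_for(future, timeout)
        finally:
            # Always drop the entry, whether the call resolved or timed out.
            self.pending_calls.pop(call_id, None)

    def on_frame(self, frame: dict) -> None:
        """Called by the receive loop for every incoming client frame."""
        if frame.get("type") != "tool_result":
            return
        future = self.pending_calls.get(frame.get("id"))
        if future is not None and not future.done():
            future.set_result(frame.get("result"))
```

The receive loop and the orchestration task would run concurrently: the orchestrator awaits `call_tool()`, while the receive loop feeds every frame to `on_frame()`. Without that second loop nothing ever resolves the futures and every tool call times out.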
### Step 15 — Agent scheduler + OAuth endpoints (V2.0.2)
- [ ] `app/core/agent_scheduler.py`: APScheduler, 60s check loop, PostgreSQL advisory locks for multi-instance
- [ ] `app/api/routes/oauth.py`: `GET /oauth/{provider}/authorize`, `GET /oauth/{provider}/callback`, `POST /oauth/{provider}/refresh`
  - Gmail: `gmail.readonly` scope
  - Outlook/Teams: `Mail.Read`, `ChannelMessage.Read.All` scopes
  - Encrypts tokens with Fernet, returns encrypted blob for `CloudAgentConfig.oauth_token_encrypted`
- [ ] Integrate scheduler with FastAPI lifespan (start on startup, shutdown gracefully)
- **Dependencies:** `apscheduler>=4.0`
- **Files:** `app/core/agent_scheduler.py` (new), `app/api/routes/oauth.py` (new), `app/main.py`
- **Outcome:** Agents run on cron schedules. OAuth flow for Gmail/Teams/Outlook.

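PostgreSQL advisory locks take a signed 64-bit integer key, so the scheduler needs a stable way to turn an agent identifier into one. The sketch below shows one possible derivation. The function name, the `agent_scheduler` namespace string, and the `:key` bind-parameter style are assumptions. `pg_try_advisory_lock` and `pg_advisory_unlock` are the standard PostgreSQL functions.

```python
import hashlib

# Statements an instance would run before and after executing a due agent.
# pg_try_advisory_lock returns false immediately if another session holds the key.
TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(:key)"
UNLOCK_SQL = "SELECT pg_advisory_unlock(:key)"


def advisory_lock_key(agent_id: str, namespace: str = "agent_scheduler") -> int:
    """Map an agent id to a deterministic signed 64-bit advisory lock key."""
    digest = hashlib.sha256(f"{namespace}:{agent_id}".encode("utf-8")).digest()
    # PostgreSQL's bigint is signed, so interpret the first 8 bytes as signed.
    return int.from_bytes(digest[:8], "big", signed=True)
```

Each instance's 60s check loop would try the lock for every due agent and skip any agent whose lock is already held, so only one instance runs a given agent per tick.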
### Step 16 — BYOK: API key passthrough in LLM factory (V2.1.1)
- [ ] Add `api_key: str | None = None` param to `get_llm()`, `get_router_llm()`, `embed()`
- [ ] When provided, use BYOK key instead of `_api_key_for_model()` server fallback
- [ ] Add Cerebras support: `_api_key_for_model()` handles `cerebras/` prefix
- [ ] Key is never persisted, never logged
- **Files:** `app/core/llm.py`, `app/config/settings.py`
- **Outcome:** LLM factory accepts per-request API keys with server-side fallback.

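The precedence described in Step 16 (explicit BYOK key first, server key second) might look like the following. The prefix-to-environment-variable table, the `resolve_api_key` helper, and the `env` parameter are illustrative assumptions. The real `_api_key_for_model()` presumably reads from `app/config/settings.py` instead.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from model-string prefix to the server-side key variable.
_SERVER_KEY_ENV = {
    "cerebras/": "CEREBRAS_API_KEY",
    "openai/": "OPENAI_API_KEY",
}


def _api_key_for_model(model: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Server-side fallback: pick the key that matches the model's provider prefix."""
    for prefix, variable in _SERVER_KEY_ENV.items():
        if model.startswith(prefix):
            return env.get(variable)
    return None


def resolve_api_key(
    model: str,
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Prefer the caller's BYOK key; otherwise fall back to the server key."""
    return api_key if api_key else _api_key_for_model(model, env)
```

Keeping the decision in one small function means the key only ever exists as an argument or return value, which makes the "never persisted, never logged" rule easier to audit.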
### Step 17 — BYOK: Thread key through request lifecycle (V2.1.2)
- [ ] `ContextVar`: `_request_api_key: ContextVar[str | None]` in `app/core/llm.py`
- [ ] `get_llm()` reads from ContextVar when no explicit `api_key` param
- [ ] `ChatRequest` schema: add `api_key: str | None = None`
- [ ] WS handlers set ContextVar from incoming frame's `api_key` field
- [ ] Fallback: if no BYOK key → server-side key (backward compat + Batch Agent)
- **Files:** `app/schemas.py`, `app/core/llm.py`, `app/api/routes/chat.py`, `app/core/orchestrator.py`
- **Outcome:** BYOK key flows from request → orchestrator → agent → LLM. Never stored.

### Step 18 — pgvector + memory DB tables (V2.2.1)
- [ ] Add `pgvector` to `requirements.txt`
- [ ] New SQLAlchemy models in `app/models.py`:
  - `CoreMemory`: id, user_id, key, value, created_at, updated_at
  - `AssociativeEmbedding`: id, user_id, entity_type, entity_id, label, embedding (pgvector Vector), metadata_json, created_at
  - `EpisodicSummary`: id, user_id, session_id, summary, key_entities, created_at
  - `ProactivePattern`: id, user_id, pattern_type, description, confidence, last_detected_at, created_at
- [ ] Alembic migration with `CREATE EXTENSION IF NOT EXISTS vector`
- **Files:** `app/models.py`, `requirements.txt`, `alembic/versions/`
- **Outcome:** Memory tables in PostgreSQL with pgvector support.

### Step 19 — Memory service layer (V2.2.2)
- [ ] Create `app/core/memory.py` — `MemoryService` class:
  - `load_core_memory(user_id)`, `write_core_memory(user_id, key, value)` (upsert)
  - `search_associative(user_id, query_embedding, top_k=5)` (pgvector similarity)
  - `write_associative(user_id, entity_type, entity_id, label, embedding, metadata)`
  - `get_recent_episodic(user_id, limit=3)`, `write_episodic(user_id, session_id, summary, key_entities)`
  - `get_proactive_patterns(user_id)`, `write_proactive_pattern(user_id, ...)`
  - `delete_memory(user_id, memory_type, memory_id)` — user review/delete
  - Uses async SQLAlchemy sessions from `app/db.py`
- **Files:** `app/core/memory.py` (new)
- **Outcome:** Complete CRUD + similarity search for all 4 memory types.

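For unit tests that run without PostgreSQL, `search_associative` can be mirrored by a pure-Python reference. The sketch below assumes cosine distance is the chosen metric (pgvector's `<=>` operator). The plan does not say which operator will be used, and the row shape (`dict` with an `embedding` list) is hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def search_associative_in_memory(
    rows: Sequence[dict], query_embedding: Sequence[float], top_k: int = 5
) -> list:
    """Return the top_k rows closest to the query, nearest first."""
    ranked = sorted(
        rows, key=lambda row: cosine_distance(row["embedding"], query_embedding)
    )
    return ranked[:top_k]
```

The database version would express the same ordering as an `ORDER BY embedding <=> :query LIMIT :top_k` clause filtered by `user_id`.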
### Step 20 — Memory middleware wrapper (V2.2.3)
- [ ] Create `app/core/memory_middleware.py`:
  - `enrich_with_memory(user_id, message, context)`:
    1. Load core memory (always injected)
    2. Embed user message → pgvector similarity search on associative memory
    3. Load recent episodic summaries
    4. Load proactive patterns
    5. Return enriched context
  - `post_process_memory(user_id, message, response, context)`:
    1. LLM decides what to remember (semi-autonomous)
    2. Write core memory for preferences
    3. Write associative for entity relationships
    4. Compress session into episodic summary when conversation ends
- **Files:** `app/core/memory_middleware.py` (new)
- **Outcome:** Memory wraps every orchestrator call — enrich before, learn after.

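The five enrichment steps translate almost line for line into code. In this sketch the memory service and the embedding function are passed in explicitly so the example runs on its own. The plan's signature is `enrich_with_memory(user_id, message, context)`, so those two extra parameters, the `MemoryReader` protocol, and the `"memory"` key in the returned context are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Protocol


class MemoryReader(Protocol):
    """The read-side subset of the MemoryService from Step 19."""

    async def load_core_memory(self, user_id: str) -> list: ...
    async def search_associative(
        self, user_id: str, query_embedding: list, top_k: int = 5
    ) -> list: ...
    async def get_recent_episodic(self, user_id: str, limit: int = 3) -> list: ...
    async def get_proactive_patterns(self, user_id: str) -> list: ...


async def enrich_with_memory(
    memory: MemoryReader,
    embed: Callable[[str], Awaitable[list]],
    user_id: str,
    message: str,
    context: dict,
) -> dict:
    """Return a copy of context with the four memory types attached."""
    core = await memory.load_core_memory(user_id)                # 1. always injected
    query_embedding = await embed(message)                       # 2. embed, then search
    associative = await memory.search_associative(user_id, query_embedding, top_k=5)
    episodic = await memory.get_recent_episodic(user_id, limit=3)  # 3.
    proactive = await memory.get_proactive_patterns(user_id)       # 4.
    return {                                                     # 5. enriched context
        **context,
        "memory": {
            "core": core,
            "associative": associative,
            "episodic": episodic,
            "proactive": proactive,
        },
    }
```

The lookups are awaited one after another rather than gathered, since they would likely share a single async SQLAlchemy session, which is not safe for concurrent use.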
### Step 21 — Integrate memory into orchestrator (V2.2.4)
- [ ] Modify `orchestrate()` / `orchestrate_stream()`:
  - Before `classify_intent`: call `enrich_with_memory()`
  - After agent response: call `post_process_memory()`
- [ ] Add `memory_write` tool to Router's system prompt
- **Files:** `app/core/orchestrator.py`
- **Outcome:** All chat interactions are memory-enriched. Router can explicitly write memories.

### Step 22 — Memory management API (V2.2.5)
- [ ] Create `app/api/routes/memory.py`:
  - `GET /api/v1/memory/core` — list core memories
  - `GET /api/v1/memory/associative` — list associative memories
  - `GET /api/v1/memory/episodic` — list episodic summaries
  - `GET /api/v1/memory/proactive` — list proactive patterns
  - `DELETE /api/v1/memory/{type}/{id}` — user deletes a memory
- [ ] Register router in `app/main.py`
- **Files:** `app/api/routes/memory.py` (new), `app/main.py`
- **Outcome:** Users can review and delete their memories (semi-autonomous model).

### Step 23 — Scope context + structured response (V2.3.1)
- [ ] Update `ChatRequest`: add `source: Literal["home", "popup"] = "home"`, `scope: dict | None = None`
- [ ] New response schemas in `app/schemas.py`:
  - `AiResponse`: response (text + ui_directive + data), navigation, mutations, context
  - `NavigationDirective`: action, target, filter
  - `MutationCommand`: action, data
  - `ResponseContext`: scope_changed, new_scope
- [ ] `AiResponse` is used when `source == "popup"` or when navigation is needed; `ChatResponse` is kept for backward compat
- **Files:** `app/schemas.py`
- **Outcome:** Popup can receive navigation directives and scoped responses.

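To make the nesting of the Step 23 response schemas concrete, here is a stdlib sketch using dataclasses as a stand-in for whatever schema library `app/schemas.py` actually uses. Field names follow the plan. The field types, the defaults, and the separate `ResponseBody` class for the `response` member are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class NavigationDirective:
    action: str
    target: str
    filter: Optional[dict] = None


@dataclass
class MutationCommand:
    action: str
    data: dict


@dataclass
class ResponseContext:
    scope_changed: bool = False
    new_scope: Optional[dict] = None


@dataclass
class ResponseBody:
    """The `response` member: text + ui_directive + data."""
    text: str
    ui_directive: Optional[str] = None
    data: Optional[dict] = None


@dataclass
class AiResponse:
    response: ResponseBody
    navigation: Optional[NavigationDirective] = None
    mutations: list = field(default_factory=list)
    context: ResponseContext = field(default_factory=ResponseContext)
```

`asdict(AiResponse(...))` yields a plain nested `dict` that can be serialized straight into a WS frame, which is a quick way to check the intended wire shape before committing to the real schemas.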
### Step 24 — Enhanced Router capabilities (V2.3.2)
- [ ] Update orchestrator system prompt + tool set:
  - `ask_user_clarification` — return clarification question, WS handler waits for next user message
  - `render_ui_directive` — specify UI rendering (task_card, chart, diagram)
  - `cross_entity_resolve` — include navigation directive when scope crosses entities
- **Files:** `app/core/orchestrator.py`
- **Outcome:** Router can clarify, render rich UI, and navigate across entities.

### Step 25 — WS protocol evolution (V2.3.3)
- [ ] Add to `WsFrameType`: `user_request`, `data_request`, `data_response`, `ai_response`, `mutation`
- [ ] `user_request` = enhanced `chat_request` with source, scope, api_key
- [ ] `ai_response` = structured response with navigation + mutations + context
- [ ] Server auto-detects client frame format for backward compat
- **Files:** `app/schemas.py`, `app/api/routes/chat.py`
- **Outcome:** v2 WS protocol with full backward compatibility.

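One way to auto-detect the client's frame format is to normalize the first frame into the v2 shape and remember whether the client spoke the old protocol. The function name and the `legacy` flag below are hypothetical. The defaults for `source`, `scope`, and `api_key` mirror the `ChatRequest` defaults from Steps 17 and 23.

```python
def normalize_client_frame(frame: dict) -> dict:
    """Upgrade a v1 chat_request to a v2 user_request; pass v2 frames through."""
    defaults = {"source": "home", "scope": None, "api_key": None}
    frame_type = frame.get("type")
    if frame_type == "user_request":
        return {**defaults, **frame, "legacy": False}
    if frame_type == "chat_request":
        # Same payload, new type; mark it so replies can use the v1 frame types.
        return {**defaults, **frame, "type": "user_request", "legacy": True}
    raise ValueError(f"unsupported first frame type: {frame_type!r}")
```

The handler could then branch on `legacy` only when sending: `text_chunk` and `final` frames for v1 clients, `ai_response` for v2 clients.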
### Step 26 — Batch agent implementation (V2.4.1)
- [ ] Create `app/agents/batch_agent.py` — background agent (not `ChatAgent`):
  - `pattern_detection`: analyze episodic summaries for recurring patterns
  - `memory_consolidation`: merge redundant episodic summaries
  - `suggestion_generation`: create proactive pattern entries
  - `overdue_detection`: request task data from Electron via device WS
- [ ] Uses server-side LLM key (not BYOK — runs without user request)
- [ ] Requires device online for entity data access
- **Files:** `app/agents/batch_agent.py` (new)
- **Outcome:** Background agent that learns patterns and generates proactive suggestions.

### Step 27 — Batch agent scheduling + proactive surfacing (V2.4.2)
- [ ] Integrate with agent scheduler from Step 15
- [ ] Default: every 6h per user, only when device online
- [ ] Proactive patterns surfaced via memory middleware in Router context
- **Files:** `app/core/agent_scheduler.py`, `app/core/memory_middleware.py`
- **Outcome:** Batch runs automatically. Suggestions appear in chat responses.

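The "every 6h per user, only when device online" rule reduces to a small predicate that the scheduler's check loop could evaluate for each user. The function name and its parameters are assumptions, and how `last_run_at` and device presence are tracked is left open by the plan.

```python
from datetime import datetime, timedelta
from typing import Optional

BATCH_INTERVAL = timedelta(hours=6)  # the plan's default cadence


def batch_run_due(
    last_run_at: Optional[datetime],
    now: datetime,
    device_online: bool,
    interval: timedelta = BATCH_INTERVAL,
) -> bool:
    """True when the user's batch agent should run on this scheduler tick."""
    if not device_online:
        return False  # entity data lives on the device, so skip offline users
    if last_run_at is None:
        return True   # never ran before
    return now - last_run_at >= interval
```

Because an offline device simply defers the run, a user who reconnects after a long absence gets a batch pass on the next tick rather than waiting for a fixed wall-clock slot.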
### Step 28 — E2E memory encryption + tests (V2.5)
- [ ] Application-level Fernet encryption for all memory table writes
- [ ] Encryption key derived from user passphrase, sent with requests
- [ ] `tests/test_byok.py`: key threading, Cerebras model string, fallback
- [ ] `tests/test_memory.py`: all 4 memory types, pgvector search, middleware
- [ ] `tests/test_popup.py`: scope, navigation directives, cross-entity
- [ ] `tests/test_batch_agent.py`: pattern detection, consolidation
- **Files:** `app/core/memory.py`, `app/storage/encryption.py`, `tests/`
- **Outcome:** Fully tested, encrypted memory system.

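A Fernet key is 32 random-looking bytes in URL-safe base64, so deriving one from a passphrase needs a key-derivation function first. The sketch below uses PBKDF2-HMAC-SHA256 from the standard library. The choice of KDF, the iteration count, the per-user salt, and the function name are all assumptions, since the plan only states that the key is derived from the user passphrase.

```python
import base64
import hashlib

PBKDF2_ITERATIONS = 600_000  # assumed work factor; tune for the target hardware


def derive_fernet_key(
    passphrase: str, salt: bytes, iterations: int = PBKDF2_ITERATIONS
) -> bytes:
    """Derive a key in the format Fernet expects: 32 bytes, URL-safe base64."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )
    return base64.urlsafe_b64encode(raw)
```

The same passphrase and salt always produce the same key, so the salt must be stored alongside the user record (it is not secret), while the passphrase and derived key stay out of the database.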
---

## Development Rules

1. **NEVER persist user data in plaintext.** The DB stores only auth, billing, storage metadata, and marketplace data. User context arrives in requests and is discarded. Cloud blobs are E2E encrypted client-side — backend only stores opaque bytes.