api/AI_REFACTOR_PLAN.md

# AI Refactor Plan — Adiuva Backend

> **Objective:** Transform backend tools from JSON-action-descriptor-returning functions into real bidirectional executors. Each tool sends structured CRUD operations to the Electron client via WebSocket, receives real data back, and returns meaningful results to the LLM. The LLM reasons about actual user data instead of serialized action payloads.
>
> **Electron app:** Lives at `../adiuva/`. See `../adiuva/AI_REFACTOR_PLAN.md`.
>
> **Protocol:** Execute steps sequentially. Each step is atomic and committable. Mark `[x]` when done.

---

## Architecture — Before vs After

### Before (current)
```
LLM calls list_tasks(status="todo")
  → tool returns: '{"action":"list","table":"tasks","filters":{"status":"todo"}}'
  → _tool_loop feeds that JSON string as ToolMessage to LLM
  → LLM sees a descriptor, NOT real data — cannot reason about tasks
  → Final response: generic "Here are your tasks" (no actual task data)
  → Action descriptors sent in final WS frame for Electron to execute post-response
```

### After (target)
```
LLM calls list_tasks(status="todo")
  → tool calls execute_on_client(action="select", table="tasks", filters={status:"todo"})
    → WS frame sent to Electron: {type:"tool_call", id:"abc", action:"select", table:"tasks", filters:{status:"todo"}}
    → Electron runs: db.select().from(tasks).where(eq(tasks.status, "todo")).all()
    → WS frame back: {type:"tool_result", id:"abc", rows:[{id:"1",title:"Buy milk",...}, ...]}
  → tool returns: "Found 3 tasks: 1. Buy milk (high, due tomorrow) 2. ..."
  → _tool_loop feeds that as ToolMessage to LLM
  → LLM sees REAL data — can reason, count, compare, summarize
```

---

## WS Protocol — Typed Frames

| Direction | `type` | Payload |
|---|---|---|
| Client → Server | `chat_request` | `{ message: str, context: ChatContext }` |
| Server → Client | `text_chunk` | `{ text: str }` |
| Server → Client | `tool_call` | `{ id: str, action: str, table?: str, data?: dict, filters?: dict, vector?: list[float], limit?: int }` |
| Client → Server | `tool_result` | `{ id: str, row?: dict, rows?: list[dict], results?: list[dict], deleted?: bool, ok?: bool, error?: str }` |
| Server → Client | `final` | `{ response: str }` |
| Server → Client | `ping` | `{}` |

**Actions:**

| `action` | What Electron does (Drizzle) | `tool_result` shape |
|---|---|---|
| `select` | `db.select().from(table).where(filters)` | `{ rows: [...] }` |
| `get` | `db.select().from(table).where(id=...).get()` | `{ row: {...} or null }` |
| `insert` | `db.insert(table).values({id: uuid(), ...data}).returning().get()` | `{ row: {...} }` |
| `update` | `db.update(table).set(updates).where(id=...).returning().get()` | `{ row: {...} }` |
| `delete` | `db.delete(table).where(id=...).run()` | `{ deleted: true }` |
| `vector_upsert` | LanceDB upsert with pre-computed vector | `{ ok: true }` |
| `vector_search` | LanceDB search by vector | `{ results: [{id, content, score}...] }` |

**Electron generates IDs + timestamps.** Backend tools never send `id` or `createdAt` in `insert` data — Electron adds `id: uuid()`, `createdAt: Date.now()`, `updatedAt: Date.now()`.

---

## SQLite Schema Reference (Electron's local database)

Tools must use **camelCase** field names (Drizzle maps them to snake_case internally):

| Table | Columns |
|---|---|
| `tasks` | id, projectId, title, description, status (todo\|in_progress\|done), priority (high\|medium\|low), assignee (JSON array string), dueDate (ms), isAiSuggested (0\|1), isApproved (0\|1), createdAt (ms) |
| `projects` | id, clientId, name, status (active\|archived), aiSummary, createdAt (ms) |
| `checkpoints` | id, projectId (required), title, date (ms), isAiSuggested (0\|1), isApproved (0\|1), createdAt (ms) |
| `notes` | id, projectId, title, content (markdown), createdAt (ms), updatedAt (ms) |
| `taskComments` | id, taskId, author, content, createdAt (ms) |
| `clients` | id, parentId, name, industry, createdAt (ms) |

---

## Phase B — Backend Changes

### Step B.1 — WS context + frame types
- [x] Create `app/core/ws_context.py` (~25 lines):
  - `_client_executor: ContextVar[Callable]` — holds the async callback for the current WS session
  - `async def execute_on_client(action, table=None, data=None, filters=None, vector=None, limit=None) -> dict`:
    - Reads callback from ContextVar
    - Builds `tool_call` payload: `{id: str(uuid4()), action, table, data, filters, vector, limit}` (omits None fields)
    - Calls `await callback(payload)` — which sends the WS frame and waits for `tool_result`
    - Returns the result dict
  - `def set_client_executor(fn)` / `def clear_client_executor()` — ContextVar management
- [x] Add to `app/schemas.py`:
  - `WsFrameType(str, Enum)`: `chat_request`, `text_chunk`, `tool_call`, `tool_result`, `final`, `ping`
  - `WsToolCall(BaseModel)`: `type`, `id`, `action`, `table?`, `data?`, `filters?`, `vector?`, `limit?`
  - `WsToolResult(BaseModel)`: `type`, `id`, `row?`, `rows?`, `results?`, `deleted?`, `ok?`, `error?`
  - `WsTextChunk(BaseModel)`: `type`, `text`
  - `WsFinal(BaseModel)`: `type`, `response`
- **Files:** `app/core/ws_context.py`, `app/schemas.py`
- **Outcome:** Any tool can `await execute_on_client(...)` to query/mutate the user's local DB.

### Step B.2 — Rewrite all 23 tools to use `execute_on_client()`
- [x] Each tool: same `@tool` decorator, same parameters, same docstring. Replace `return json.dumps({...})` body with:
  1. Call `result = await execute_on_client(action=..., table=..., data/filters=...)`
  2. Return human-readable string with confirmation + key data from `result`

- [x] **`app/agents/task_agent.py` (8 tools):**
  - `list_tasks(project_id, status, search, order_by)`:
    ```python
    result = await execute_on_client(action="select", table="tasks", filters={
        "projectId": project_id or None,
        "status": status or None,
        "search": search or None,
        "orderBy": order_by or None,
    })
    rows = result.get("rows", [])
    if not rows:
        return "No tasks found matching the given filters."
    lines = [f"- {r['title']} (status: {r['status']}, priority: {r['priority']}, id: {r['id']})" for r in rows]
    return f"Found {len(rows)} task(s):\n" + "\n".join(lines)
    ```
  - `create_task(title, ...)`:
    ```python
    result = await execute_on_client(action="insert", table="tasks", data={
        "title": title, "description": description or None, "status": status,
        "priority": priority, "assignee": assignees, "dueDate": due_date or None,
        "projectId": project_id or None, "isAiSuggested": is_ai_suggested, "isApproved": is_approved,
    })
    row = result["row"]
    return f"Task created: '{row['title']}' (id: {row['id']}, status: {row['status']}, priority: {row['priority']})"
    ```
  - `update_task(task_id, ...)`: build updates dict (same logic as now) → `execute_on_client(action="update", table="tasks", data={"id": task_id, "updates": updates})` → return "Task updated: {title}"
  - `delete_task(task_id)`: `execute_on_client(action="delete", table="tasks", data={"id": task_id})` → return "Task deleted"
  - `list_tasks_due_today()`: calculate today's start/end ms → `execute_on_client(action="select", table="tasks", filters={"dueDateFrom": start, "dueDateTo": end})` → format + return
  - `list_task_comments(task_id)`: `execute_on_client(action="select", table="taskComments", filters={"taskId": task_id})` → format + return
  - `add_task_comment(task_id, author, content)`: `execute_on_client(action="insert", table="taskComments", data={...})` → return confirmation
  - `delete_task_comment(comment_id)`: `execute_on_client(action="delete", table="taskComments", data={"id": comment_id})` → return confirmation

- [x] **`app/agents/project_agent.py` (6 tools):**
  - `list_projects(client_id, include_archived)`: `execute_on_client(action="select", table="projects", filters={clientId, includeArchived})` → format + return
  - `list_all_projects()`: `execute_on_client(action="select", table="projects")` → format + return
  - `get_project(project_id)`: `execute_on_client(action="get", table="projects", data={"id": project_id})` → return project details or "not found"
  - `create_project(name, client_id)`: `execute_on_client(action="insert", table="projects", data={name, clientId})` → return confirmation + id
  - `update_project(project_id, ...)`: build updates → `execute_on_client(action="update", ...)` → return confirmation
  - `delete_project(project_id)`: `execute_on_client(action="delete", ...)` → return confirmation

- [x] **`app/agents/checkpoint_agent.py` (4 tools):**
  - `list_checkpoints(project_id)`: `execute_on_client(action="select", table="checkpoints", filters={projectId})` → format + return
  - `create_checkpoint(project_id, title, date, ...)`: `execute_on_client(action="insert", table="checkpoints", data={...})` → return confirmation + id
  - `update_checkpoint(checkpoint_id, ...)`: build updates → `execute_on_client(action="update", ...)` → return confirmation
  - `delete_checkpoint(checkpoint_id)`: `execute_on_client(action="delete", ...)` → return confirmation

- [x] **`app/agents/note_agent.py` (5 tools):**
  - `list_notes(project_id)`: `execute_on_client(action="select", table="notes", filters={projectId})` → format + return
  - `get_note(note_id)`: `execute_on_client(action="get", table="notes", data={"id": note_id})` → return full content or "not found"
  - `create_note(title, content, project_id)`: `execute_on_client(action="insert", table="notes", data={...})` → then `execute_on_client(action="vector_upsert", data={id, projectId, content}, vector=await embed(content))` → return confirmation
  - `update_note(note_id, ...)`: build updates → `execute_on_client(action="update", ...)` → then vector_upsert for updated content → return confirmation
  - `delete_note(note_id)`: `execute_on_client(action="delete", ...)` → return confirmation

- **Files:** `app/agents/task_agent.py`, `app/agents/project_agent.py`, `app/agents/checkpoint_agent.py`, `app/agents/note_agent.py`
- **Outcome:** All 23 tools query real user data via WS. LLM sees actual rows, not action descriptors.

### Step B.3 — Bidirectional WebSocket handler
- [x] Refactor `app/api/routes/chat.py` WS endpoint:
  - After auth + accept + receive `chat_request`:
    1. Create `execute_on_client` callback closure capturing the websocket:
       ```python
       pending_calls: dict[str, asyncio.Future] = {}

       async def on_client_result(frame: dict):
           """Called when a tool_result frame arrives from Electron."""
           fut = pending_calls.pop(frame["id"], None)
           if fut and not fut.done():
               fut.set_result(frame)

       async def execute_callback(payload: dict) -> dict:
           """Send tool_call to Electron, wait for tool_result."""
           call_id = payload["id"]
           fut = asyncio.get_event_loop().create_future()
           pending_calls[call_id] = fut
           await websocket.send_text(json.dumps({"type": "tool_call", **payload}))
           return await asyncio.wait_for(fut, timeout=30.0)
       ```
    2. Set `client_executor` ContextVar with `execute_callback`
    3. Run orchestrator in a task — it calls agents, agents call tools, tools call `execute_on_client()` which goes through the callback
    4. In parallel, run a message receive loop that dispatches incoming frames:
       - `tool_result` → `on_client_result(frame)`
       - `ping` → ignore
    5. Orchestrator yields `text_chunk` frames → send to client
    6. Send `final` frame when done
    7. Clear ContextVar
  - Keep heartbeat ping every 30s
  - 30s timeout on `tool_result` — if Electron doesn't respond, future raises `TimeoutError`, tool returns error string to LLM
- **Files:** `app/api/routes/chat.py`
- **Outcome:** Full bidirectional WS. Tool calls and text streaming happen concurrently on the same connection.

### Step B.4 — `_tool_loop` — no changes needed
- [x] Verify `app/core/agent_registry.py` works unchanged:
  - `_tool_loop` calls `tool_fn.ainvoke(args)` → tool awaits `execute_on_client()` (WS round-trip) → returns string → `ToolMessage(content=string)` → LLM sees real data
  - The async WS round-trip happens inside each tool. `_tool_loop` just sees an awaited tool returning a string — same as before, different content.
- **No code changes.** Just verify + add a log line for tool execution times if desired.

### Step B.5 — Orchestrator cleanup
- [x] Update `app/core/orchestrator.py`:
  - `orchestrate_stream()`: remove `"actions": []` from final frame. Final becomes: `{"done": true, "response": "..."}`
  - No other changes — `classify_intent` → `call_agent` → chunk response → final frame
- **Files:** `app/core/orchestrator.py`
- **Outcome:** Clean final frame. No more action descriptors in the protocol.

### Step B.6 — Add `/vectors/embed` endpoint
- [x] Add to `app/api/routes/vectors.py`:
  - `POST /api/v1/storage/vectors/embed`:
    - Request: `{ text: str }`
    - Response: `{ vector: list[float] }` (1536-dim from `text-embedding-3-small`)
    - Auth required (JWT)
  - Used by:
    - Backend tools: `note_agent` calls this before `vector_upsert`
    - Electron: `vectordb.ts` calls this for note embedding on create/update
- **Files:** `app/api/routes/vectors.py`
- **Outcome:** Single embedding endpoint. Both backend tools and Electron can generate vectors.

---

## Verification

| What to test | How |
|---|---|
| **Read flow** | "List my tasks" → `list_tasks` → `tool_call{select, tasks}` → Electron returns rows → LLM describes real tasks |
| **Write flow** | "Create a task called Buy milk" → `create_task` → `tool_call{insert, tasks, data:{title:"Buy milk"}}` → Electron inserts + returns row → tool confirms with id |
| **Multi-tool** | "How many todo tasks do I have?" → `list_tasks(status=todo)` → LLM counts actual rows → "You have 3 todo tasks" |
| **Vector search** | "Find notes about deployment" → tool embeds → `tool_call{vector_search, vector:[...]}` → Electron searches LanceDB → returns matching notes |
| **Vector upsert** | "Create a note about..." → insert note → vector_upsert with embedding → both SQLite + LanceDB updated |
| **Tool timeout** | Disconnect Electron mid-conversation → 30s timeout → tool returns error → LLM handles gracefully |
| **Concurrent calls** | Agent calls 2 tools in sequence → each does WS round-trip → both succeed → LLM sees both results |
| **_tool_loop max iter** | Verify 5-iteration limit still works → after 5 tool calls, LLM forced to answer without tools |

---

## Execution Notes

- **Phase 1 is the critical path.** Auth + backend client + drizzle executor + orchestrator refactor must land first.
- **Steps 1.1–1.4 are additive** — existing app keeps working until Step 1.5 swaps the orchestrator.
- **Step 2.1 is the point of no return** — after removing LangChain, there's no local AI fallback.
- **Phase B (backend changes) must land before Phase 1.3–1.5** — Electron needs the bidirectional WS to talk to.
- **Phase 3 and Phase 4 are independent** — can be parallelized after Phase 2.
- **One step at a time.** Mark `[x]` and commit with `step N.N complete: <outcome>`.