Files
api/AI_REFACTOR_PLAN.md

15 KiB
Raw Blame History

AI Refactor Plan — Adiuva Backend

Objective: Transform backend tools from JSON-action-descriptor-returning functions into real bidirectional executors. Each tool sends structured CRUD operations to the Electron client via WebSocket, receives real data back, and returns meaningful results to the LLM. The LLM reasons about actual user data instead of serialized action payloads.

Electron app: Lives at ../adiuva/. See ../adiuva/AI_REFACTOR_PLAN.md.

Protocol: Execute steps sequentially. Each step is atomic and committable. Mark [x] when done.


Architecture — Before vs After

Before (current)

LLM calls list_tasks(status="todo")
  → tool returns: '{"action":"list","table":"tasks","filters":{"status":"todo"}}'
  → _tool_loop feeds that JSON string as ToolMessage to LLM
  → LLM sees a descriptor, NOT real data — cannot reason about tasks
  → Final response: generic "Here are your tasks" (no actual task data)
  → Action descriptors sent in final WS frame for Electron to execute post-response

After (target)

LLM calls list_tasks(status="todo")
  → tool calls execute_on_client(action="select", table="tasks", filters={status:"todo"})
    → WS frame sent to Electron: {type:"tool_call", id:"abc", action:"select", table:"tasks", filters:{status:"todo"}}
    → Electron runs: db.select().from(tasks).where(eq(tasks.status, "todo")).all()
    → WS frame back: {type:"tool_result", id:"abc", rows:[{id:"1",title:"Buy milk",...}, ...]}
  → tool returns: "Found 3 tasks: 1. Buy milk (high, due tomorrow) 2. ..."
  → _tool_loop feeds that as ToolMessage to LLM
  → LLM sees REAL data — can reason, count, compare, summarize

WS Protocol — Typed Frames

Direction type Payload
Client → Server chat_request { message: str, context: ChatContext }
Server → Client text_chunk { text: str }
Server → Client tool_call { id: str, action: str, table?: str, data?: dict, filters?: dict, vector?: list[float], limit?: int }
Client → Server tool_result { id: str, row?: dict, rows?: list[dict], results?: list[dict], deleted?: bool, ok?: bool, error?: str }
Server → Client final { response: str }
Server → Client ping {}

Actions:

action What Electron does (Drizzle) tool_result shape
select db.select().from(table).where(filters) { rows: [...] }
get db.select().from(table).where(id=...).get() { row: {...} or null }
insert db.insert(table).values({id: uuid(), ...data}).returning().get() { row: {...} }
update db.update(table).set(updates).where(id=...).returning().get() { row: {...} }
delete db.delete(table).where(id=...).run() { deleted: true }
vector_upsert LanceDB upsert with pre-computed vector { ok: true }
vector_search LanceDB search by vector { results: [{id, content, score}...] }

Electron generates IDs + timestamps. Backend tools never send id or createdAt in insert data — Electron adds id: uuid(), createdAt: Date.now(), updatedAt: Date.now().


SQLite Schema Reference (Electron's local database)

Tools must use camelCase field names (Drizzle maps them to snake_case internally):

Table Columns
tasks id, projectId, title, description, status (todo|in_progress|done), priority (high|medium|low), assignee (JSON array string), dueDate (ms), isAiSuggested (0|1), isApproved (0|1), createdAt (ms)
projects id, clientId, name, status (active|archived), aiSummary, createdAt (ms)
checkpoints id, projectId (required), title, date (ms), isAiSuggested (0|1), isApproved (0|1), createdAt (ms)
notes id, projectId, title, content (markdown), createdAt (ms), updatedAt (ms)
taskComments id, taskId, author, content, createdAt (ms)
clients id, parentId, name, industry, createdAt (ms)

Phase B — Backend Changes

Step B.1 — WS context + frame types

  • Create app/core/ws_context.py (~25 lines):
    • _client_executor: ContextVar[Callable] — holds the async callback for the current WS session
    • async def execute_on_client(action, table=None, data=None, filters=None, vector=None, limit=None) -> dict:
      • Reads callback from ContextVar
      • Builds tool_call payload: {id: str(uuid4()), action, table, data, filters, vector, limit} (omits None fields)
      • Calls await callback(payload) — which sends the WS frame and waits for tool_result
      • Returns the result dict
    • def set_client_executor(fn) / def clear_client_executor() — ContextVar management
  • Add to app/schemas.py:
    • WsFrameType(str, Enum): chat_request, text_chunk, tool_call, tool_result, final, ping
    • WsToolCall(BaseModel): type, id, action, table?, data?, filters?, vector?, limit?
    • WsToolResult(BaseModel): type, id, row?, rows?, results?, deleted?, ok?, error?
    • WsTextChunk(BaseModel): type, text
    • WsFinal(BaseModel): type, response
  • Files: app/core/ws_context.py, app/schemas.py
  • Outcome: Any tool can await execute_on_client(...) to query/mutate the user's local DB.

Step B.2 — Rewrite all 23 tools to use execute_on_client()

  • Each tool: same @tool decorator, same parameters, same docstring. Replace return json.dumps({...}) body with:

    1. Call result = await execute_on_client(action=..., table=..., data/filters=...)
    2. Return human-readable string with confirmation + key data from result
  • app/agents/task_agent.py (8 tools):

    • list_tasks(project_id, status, search, order_by):
      result = await execute_on_client(action="select", table="tasks", filters={
          "projectId": project_id or None,
          "status": status or None,
          "search": search or None,
          "orderBy": order_by or None,
      })
      rows = result.get("rows", [])
      if not rows:
          return "No tasks found matching the given filters."
      lines = [f"- {r['title']} (status: {r['status']}, priority: {r['priority']}, id: {r['id']})" for r in rows]
      return f"Found {len(rows)} task(s):\n" + "\n".join(lines)
      
    • create_task(title, ...):
      result = await execute_on_client(action="insert", table="tasks", data={
          "title": title, "description": description or None, "status": status,
          "priority": priority, "assignee": assignees, "dueDate": due_date or None,
          "projectId": project_id or None, "isAiSuggested": is_ai_suggested, "isApproved": is_approved,
      })
      row = result["row"]
      return f"Task created: '{row['title']}' (id: {row['id']}, status: {row['status']}, priority: {row['priority']})"
      
    • update_task(task_id, ...): build updates dict (same logic as now) → execute_on_client(action="update", table="tasks", data={"id": task_id, "updates": updates}) → return "Task updated: {title}"
    • delete_task(task_id): execute_on_client(action="delete", table="tasks", data={"id": task_id}) → return "Task deleted"
    • list_tasks_due_today(): calculate today's start/end ms → execute_on_client(action="select", table="tasks", filters={"dueDateFrom": start, "dueDateTo": end}) → format + return
    • list_task_comments(task_id): execute_on_client(action="select", table="taskComments", filters={"taskId": task_id}) → format + return
    • add_task_comment(task_id, author, content): execute_on_client(action="insert", table="taskComments", data={...}) → return confirmation
    • delete_task_comment(comment_id): execute_on_client(action="delete", table="taskComments", data={"id": comment_id}) → return confirmation
  • app/agents/project_agent.py (6 tools):

    • list_projects(client_id, include_archived): execute_on_client(action="select", table="projects", filters={clientId, includeArchived}) → format + return
    • list_all_projects(): execute_on_client(action="select", table="projects") → format + return
    • get_project(project_id): execute_on_client(action="get", table="projects", data={"id": project_id}) → return project details or "not found"
    • create_project(name, client_id): execute_on_client(action="insert", table="projects", data={name, clientId}) → return confirmation + id
    • update_project(project_id, ...): build updates → execute_on_client(action="update", ...) → return confirmation
    • delete_project(project_id): execute_on_client(action="delete", ...) → return confirmation
  • app/agents/checkpoint_agent.py (4 tools):

    • list_checkpoints(project_id): execute_on_client(action="select", table="checkpoints", filters={projectId}) → format + return
    • create_checkpoint(project_id, title, date, ...): execute_on_client(action="insert", table="checkpoints", data={...}) → return confirmation + id
    • update_checkpoint(checkpoint_id, ...): build updates → execute_on_client(action="update", ...) → return confirmation
    • delete_checkpoint(checkpoint_id): execute_on_client(action="delete", ...) → return confirmation
  • app/agents/note_agent.py (5 tools):

    • list_notes(project_id): execute_on_client(action="select", table="notes", filters={projectId}) → format + return
    • get_note(note_id): execute_on_client(action="get", table="notes", data={"id": note_id}) → return full content or "not found"
    • create_note(title, content, project_id): execute_on_client(action="insert", table="notes", data={...}) → then execute_on_client(action="vector_upsert", data={id, projectId, content}, vector=await embed(content)) → return confirmation
    • update_note(note_id, ...): build updates → execute_on_client(action="update", ...) → then vector_upsert for updated content → return confirmation
    • delete_note(note_id): execute_on_client(action="delete", ...) → return confirmation
  • Files: app/agents/task_agent.py, app/agents/project_agent.py, app/agents/checkpoint_agent.py, app/agents/note_agent.py

  • Outcome: All 23 tools query real user data via WS. LLM sees actual rows, not action descriptors.

Step B.3 — Bidirectional WebSocket handler

  • Refactor app/api/routes/chat.py WS endpoint:
    • After auth + accept + receive chat_request:
      1. Create execute_on_client callback closure capturing the websocket:
        pending_calls: dict[str, asyncio.Future] = {}
        
        async def on_client_result(frame: dict):
            """Called when a tool_result frame arrives from Electron."""
            fut = pending_calls.pop(frame["id"], None)
            if fut and not fut.done():
                fut.set_result(frame)
        
        async def execute_callback(payload: dict) -> dict:
            """Send tool_call to Electron, wait for tool_result."""
            call_id = payload["id"]
            fut = asyncio.get_event_loop().create_future()
            pending_calls[call_id] = fut
            await websocket.send_text(json.dumps({"type": "tool_call", **payload}))
            return await asyncio.wait_for(fut, timeout=30.0)
        
      2. Set client_executor ContextVar with execute_callback
      3. Run orchestrator in a task — it calls agents, agents call tools, tools call execute_on_client() which goes through the callback
      4. In parallel, run a message receive loop that dispatches incoming frames:
        • tool_resulton_client_result(frame)
        • ping → ignore
      5. Orchestrator yields text_chunk frames → send to client
      6. Send final frame when done
      7. Clear ContextVar
    • Keep heartbeat ping every 30s
    • 30s timeout on tool_result — if Electron doesn't respond, future raises TimeoutError, tool returns error string to LLM
  • Files: app/api/routes/chat.py
  • Outcome: Full bidirectional WS. Tool calls and text streaming happen concurrently on the same connection.

Step B.4 — _tool_loop — no changes needed

  • Verify app/core/agent_registry.py works unchanged:
    • _tool_loop calls tool_fn.ainvoke(args) → tool awaits execute_on_client() (WS round-trip) → returns string → ToolMessage(content=string) → LLM sees real data
    • The async WS round-trip happens inside each tool. _tool_loop just sees an awaited tool returning a string — same as before, different content.
  • No code changes. Just verify + add a log line for tool execution times if desired.

Step B.5 — Orchestrator cleanup

  • Update app/core/orchestrator.py:
    • orchestrate_stream(): remove "actions": [] from final frame. Final becomes: {"done": true, "response": "..."}
    • No other changes — classify_intentcall_agent → chunk response → final frame
  • Files: app/core/orchestrator.py
  • Outcome: Clean final frame. No more action descriptors in the protocol.

Step B.6 — Add /vectors/embed endpoint

  • Add to app/api/routes/vectors.py:
    • POST /api/v1/storage/vectors/embed:
      • Request: { text: str }
      • Response: { vector: list[float] } (1536-dim from text-embedding-3-small)
      • Auth required (JWT)
    • Used by:
      • Backend tools: note_agent calls this before vector_upsert
      • Electron: vectordb.ts calls this for note embedding on create/update
  • Files: app/api/routes/vectors.py
  • Outcome: Single embedding endpoint. Both backend tools and Electron can generate vectors.

Verification

What to test How
Read flow "List my tasks" → list_taskstool_call{select, tasks} → Electron returns rows → LLM describes real tasks
Write flow "Create a task called Buy milk" → create_tasktool_call{insert, tasks, data:{title:"Buy milk"}} → Electron inserts + returns row → tool confirms with id
Multi-tool "How many todo tasks do I have?" → list_tasks(status=todo) → LLM counts actual rows → "You have 3 todo tasks"
Vector search "Find notes about deployment" → tool embeds → tool_call{vector_search, vector:[...]} → Electron searches LanceDB → returns matching notes
Vector upsert "Create a note about..." → insert note → vector_upsert with embedding → both SQLite + LanceDB updated
Tool timeout Disconnect Electron mid-conversation → 30s timeout → tool returns error → LLM handles gracefully
Concurrent calls Agent calls 2 tools in sequence → each does WS round-trip → both succeed → LLM sees both results
_tool_loop max iter Verify 5-iteration limit still works → after 5 tool calls, LLM forced to answer without tools

Execution Notes

  • Phase 1 is the critical path. Auth + backend client + drizzle executor + orchestrator refactor must land first.
  • Steps 1.11.4 are additive — existing app keeps working until Step 1.5 swaps the orchestrator.
  • Step 2.1 is the point of no return — after removing LangChain, there's no local AI fallback.
  • Phase B (backend changes) must land before Phase 1.31.5 — Electron needs the bidirectional WS to talk to.
  • Phase 3 and Phase 4 are independent — can be parallelized after Phase 2.
  • One step at a time. Mark [x] and commit with step N.N complete: <outcome>.