AI Refactor Plan — Adiuva Backend
Objective: Transform backend tools from JSON-action-descriptor-returning functions into real bidirectional executors. Each tool sends structured CRUD operations to the Electron client via WebSocket, receives real data back, and returns meaningful results to the LLM. The LLM reasons about actual user data instead of serialized action payloads.
Electron app: Lives at `../adiuva/`. See `../adiuva/AI_REFACTOR_PLAN.md`.
Protocol: Execute steps sequentially. Each step is atomic and committable. Mark `[x]` when done.
Architecture — Before vs After
Before (current)
LLM calls list_tasks(status="todo")
→ tool returns: '{"action":"list","table":"tasks","filters":{"status":"todo"}}'
→ _tool_loop feeds that JSON string as ToolMessage to LLM
→ LLM sees a descriptor, NOT real data — cannot reason about tasks
→ Final response: generic "Here are your tasks" (no actual task data)
→ Action descriptors sent in final WS frame for Electron to execute post-response
After (target)
LLM calls list_tasks(status="todo")
→ tool calls execute_on_client(action="select", table="tasks", filters={status:"todo"})
→ WS frame sent to Electron: {type:"tool_call", id:"abc", action:"select", table:"tasks", filters:{status:"todo"}}
→ Electron runs: db.select().from(tasks).where(eq(tasks.status, "todo")).all()
→ WS frame back: {type:"tool_result", id:"abc", rows:[{id:"1",title:"Buy milk",...}, ...]}
→ tool returns: "Found 3 tasks: 1. Buy milk (high, due tomorrow) 2. ..."
→ _tool_loop feeds that as ToolMessage to LLM
→ LLM sees REAL data — can reason, count, compare, summarize
WS Protocol — Typed Frames
| Direction | type | Payload |
|---|---|---|
| Client → Server | `chat_request` | `{ message: str, context: ChatContext }` |
| Server → Client | `text_chunk` | `{ text: str }` |
| Server → Client | `tool_call` | `{ id: str, action: str, table?: str, data?: dict, filters?: dict, vector?: list[float], limit?: int }` |
| Client → Server | `tool_result` | `{ id: str, row?: dict, rows?: list[dict], results?: list[dict], deleted?: bool, ok?: bool, error?: str }` |
| Server → Client | `final` | `{ response: str }` |
| Server → Client | `ping` | `{}` |
Actions:
| action | What Electron does (Drizzle) | tool_result shape |
|---|---|---|
| `select` | `db.select().from(table).where(filters)` | `{ rows: [...] }` |
| `get` | `db.select().from(table).where(id=...).get()` | `{ row: {...} or null }` |
| `insert` | `db.insert(table).values({id: uuid(), ...data}).returning().get()` | `{ row: {...} }` |
| `update` | `db.update(table).set(updates).where(id=...).returning().get()` | `{ row: {...} }` |
| `delete` | `db.delete(table).where(id=...).run()` | `{ deleted: true }` |
| `vector_upsert` | LanceDB upsert with pre-computed vector | `{ ok: true }` |
| `vector_search` | LanceDB search by vector | `{ results: [{id, content, score}...] }` |
Electron generates IDs + timestamps. Backend tools never send id or createdAt in insert data — Electron adds id: uuid(), createdAt: Date.now(), updatedAt: Date.now().
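The Actions table implies that each action has exactly one payload field worth reading in its `tool_result`. A minimal sketch of how a tool might unwrap that field is shown here; `RESULT_KEY_BY_ACTION`, `ToolResultError`, and `unwrap_tool_result` are hypothetical names, not part of the plan, and raising on `error` is an assumption about how failures should surface.

```python
from typing import Any

# Expected payload key per action, copied from the Actions table.
RESULT_KEY_BY_ACTION = {
    "select": "rows",
    "get": "row",
    "insert": "row",
    "update": "row",
    "delete": "deleted",
    "vector_upsert": "ok",
    "vector_search": "results",
}


class ToolResultError(Exception):
    """Hypothetical exception for a tool_result frame that carries `error`."""


def unwrap_tool_result(action: str, frame: dict) -> Any:
    """Return the payload field that matches `action`, or raise on error."""
    if frame.get("error"):
        raise ToolResultError(frame["error"])
    key = RESULT_KEY_BY_ACTION[action]  # KeyError for an unknown action
    if key not in frame:
        raise ToolResultError(f"tool_result for '{action}' is missing '{key}'")
    return frame[key]
```

A `get` that finds nothing still succeeds under this sketch: the frame contains `row: null`, so the helper returns `None` and the tool can answer "not found".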
SQLite Schema Reference (Electron's local database)
Tools must use camelCase field names (Drizzle maps them to snake_case internally):
| Table | Columns |
|---|---|
| `tasks` | id, projectId, title, description, status (todo\|in_progress\|done), priority (high\|medium\|low), assignee (JSON array string), dueDate (ms), isAiSuggested (0\|1), isApproved (0\|1), createdAt (ms) |
| `projects` | id, clientId, name, status (active\|archived), aiSummary, createdAt (ms) |
| `checkpoints` | id, projectId (required), title, date (ms), isAiSuggested (0\|1), isApproved (0\|1), createdAt (ms) |
| `notes` | id, projectId, title, content (markdown), createdAt (ms), updatedAt (ms) |
| `taskComments` | id, taskId, author, content, createdAt (ms) |
| `clients` | id, parentId, name, industry, createdAt (ms) |
Phase B — Backend Changes
Step B.1 — WS context + frame types
- Create `app/core/ws_context.py` (~25 lines):
  - `_client_executor: ContextVar[Callable]`: holds the async callback for the current WS session
  - `async def execute_on_client(action, table=None, data=None, filters=None, vector=None, limit=None) -> dict`:
    - Reads callback from ContextVar
    - Builds `tool_call` payload: `{id: str(uuid4()), action, table, data, filters, vector, limit}` (omits None fields)
    - Calls `await callback(payload)`, which sends the WS frame and waits for `tool_result`
    - Returns the result dict
  - `def set_client_executor(fn)` / `def clear_client_executor()`: ContextVar management
- Add to `app/schemas.py`:
  - `WsFrameType(str, Enum)`: `chat_request`, `text_chunk`, `tool_call`, `tool_result`, `final`, `ping`
  - `WsToolCall(BaseModel)`: `type`, `id`, `action`, `table?`, `data?`, `filters?`, `vector?`, `limit?`
  - `WsToolResult(BaseModel)`: `type`, `id`, `row?`, `rows?`, `results?`, `deleted?`, `ok?`, `error?`
  - `WsTextChunk(BaseModel)`: `type`, `text`
  - `WsFinal(BaseModel)`: `type`, `response`
- Files: `app/core/ws_context.py`, `app/schemas.py`
- Outcome: Any tool can `await execute_on_client(...)` to query/mutate the user's local DB.
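The `ws_context.py` module described in this step could look roughly like the sketch below. It uses only the standard library and follows the names given above. Three details are assumptions rather than anything the plan specifies: the `ClientExecutor` type alias, the `None` default on the ContextVar, and the `RuntimeError` raised when no executor is set.

```python
import asyncio
from contextvars import ContextVar
from typing import Awaitable, Callable, Optional
from uuid import uuid4

# Assumed alias: an async callback that takes a payload and returns a result dict.
ClientExecutor = Callable[[dict], Awaitable[dict]]

_client_executor: ContextVar[Optional[ClientExecutor]] = ContextVar(
    "_client_executor", default=None
)


def set_client_executor(fn: ClientExecutor) -> None:
    _client_executor.set(fn)


def clear_client_executor() -> None:
    _client_executor.set(None)


async def execute_on_client(action, table=None, data=None, filters=None,
                            vector=None, limit=None) -> dict:
    callback = _client_executor.get()
    if callback is None:
        # Assumption: fail loudly when a tool runs outside a WS session.
        raise RuntimeError("execute_on_client() called outside a WS session")
    optional = {"table": table, "data": data, "filters": filters,
                "vector": vector, "limit": limit}
    payload = {"id": str(uuid4()), "action": action}
    # Omit None fields so the frame carries only what the action needs.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return await callback(payload)
```

Only top-level `None` fields are dropped in this sketch; `None` values nested inside `filters` or `data` are passed through unchanged.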
Step B.2 — Rewrite all 23 tools to use execute_on_client()
- Each tool: same `@tool` decorator, same parameters, same docstring. Replace the `return json.dumps({...})` body with:
  - Call `result = await execute_on_client(action=..., table=..., data/filters=...)`
  - Return a human-readable string with confirmation + key data from `result`
- `app/agents/task_agent.py` (8 tools):
  - `list_tasks(project_id, status, search, order_by)`:
    ```python
    result = await execute_on_client(action="select", table="tasks", filters={
        "projectId": project_id or None,
        "status": status or None,
        "search": search or None,
        "orderBy": order_by or None,
    })
    rows = result.get("rows", [])
    if not rows:
        return "No tasks found matching the given filters."
    lines = [
        f"- {r['title']} (status: {r['status']}, priority: {r['priority']}, id: {r['id']})"
        for r in rows
    ]
    return f"Found {len(rows)} task(s):\n" + "\n".join(lines)
    ```
  - `create_task(title, ...)`:
    ```python
    result = await execute_on_client(action="insert", table="tasks", data={
        "title": title,
        "description": description or None,
        "status": status,
        "priority": priority,
        "assignee": assignees,
        "dueDate": due_date or None,
        "projectId": project_id or None,
        "isAiSuggested": is_ai_suggested,
        "isApproved": is_approved,
    })
    row = result["row"]
    return f"Task created: '{row['title']}' (id: {row['id']}, status: {row['status']}, priority: {row['priority']})"
    ```
  - `update_task(task_id, ...)`: build updates dict (same logic as now) → `execute_on_client(action="update", table="tasks", data={"id": task_id, "updates": updates})` → return "Task updated: {title}"
  - `delete_task(task_id)`: `execute_on_client(action="delete", table="tasks", data={"id": task_id})` → return "Task deleted"
  - `list_tasks_due_today()`: calculate today's start/end ms → `execute_on_client(action="select", table="tasks", filters={"dueDateFrom": start, "dueDateTo": end})` → format + return
  - `list_task_comments(task_id)`: `execute_on_client(action="select", table="taskComments", filters={"taskId": task_id})` → format + return
  - `add_task_comment(task_id, author, content)`: `execute_on_client(action="insert", table="taskComments", data={...})` → return confirmation
  - `delete_task_comment(comment_id)`: `execute_on_client(action="delete", table="taskComments", data={"id": comment_id})` → return confirmation
- `app/agents/project_agent.py` (6 tools):
  - `list_projects(client_id, include_archived)`: `execute_on_client(action="select", table="projects", filters={clientId, includeArchived})` → format + return
  - `list_all_projects()`: `execute_on_client(action="select", table="projects")` → format + return
  - `get_project(project_id)`: `execute_on_client(action="get", table="projects", data={"id": project_id})` → return project details or "not found"
  - `create_project(name, client_id)`: `execute_on_client(action="insert", table="projects", data={name, clientId})` → return confirmation + id
  - `update_project(project_id, ...)`: build updates → `execute_on_client(action="update", ...)` → return confirmation
  - `delete_project(project_id)`: `execute_on_client(action="delete", ...)` → return confirmation
- `app/agents/checkpoint_agent.py` (4 tools):
  - `list_checkpoints(project_id)`: `execute_on_client(action="select", table="checkpoints", filters={projectId})` → format + return
  - `create_checkpoint(project_id, title, date, ...)`: `execute_on_client(action="insert", table="checkpoints", data={...})` → return confirmation + id
  - `update_checkpoint(checkpoint_id, ...)`: build updates → `execute_on_client(action="update", ...)` → return confirmation
  - `delete_checkpoint(checkpoint_id)`: `execute_on_client(action="delete", ...)` → return confirmation
- `app/agents/note_agent.py` (5 tools):
  - `list_notes(project_id)`: `execute_on_client(action="select", table="notes", filters={projectId})` → format + return
  - `get_note(note_id)`: `execute_on_client(action="get", table="notes", data={"id": note_id})` → return full content or "not found"
  - `create_note(title, content, project_id)`: `execute_on_client(action="insert", table="notes", data={...})` → then `execute_on_client(action="vector_upsert", data={id, projectId, content}, vector=await embed(content))` → return confirmation
  - `update_note(note_id, ...)`: build updates → `execute_on_client(action="update", ...)` → then `vector_upsert` for updated content → return confirmation
  - `delete_note(note_id)`: `execute_on_client(action="delete", ...)` → return confirmation
- Files: `app/agents/task_agent.py`, `app/agents/project_agent.py`, `app/agents/checkpoint_agent.py`, `app/agents/note_agent.py`
- Outcome: All 23 tools query real user data via WS. LLM sees actual rows, not action descriptors.
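`list_tasks_due_today()` needs today's start and end as epoch milliseconds, matching the `dueDate (ms)` column. One way to compute that window is sketched below. `today_window_ms` and `due_today_filters` are hypothetical helper names. The sketch also assumes an inclusive end bound (last millisecond of the day) and a caller that supplies a timezone-aware "now"; the plan does not say which timezone defines "today".

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_ONE_MS = timedelta(milliseconds=1)


def _to_ms(dt: datetime) -> int:
    # Integer division of timedeltas avoids float rounding on large timestamps.
    return (dt - _EPOCH) // _ONE_MS


def today_window_ms(now: datetime) -> tuple:
    """Return (start_ms, end_ms) for the calendar day that contains `now`."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1) - _ONE_MS  # assumed inclusive upper bound
    return _to_ms(start), _to_ms(end)


def due_today_filters(now: datetime) -> dict:
    """Build the filters dict that list_tasks_due_today() would send."""
    start, end = today_window_ms(now)
    return {"dueDateFrom": start, "dueDateTo": end}
```

For example, `due_today_filters(datetime.now(timezone.utc))` yields the UTC day; passing a datetime in the user's zone instead would shift the window accordingly.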
Step B.3 — Bidirectional WebSocket handler
- Refactor `app/api/routes/chat.py` WS endpoint:
  - After auth + accept + receive `chat_request`:
    - Create `execute_on_client` callback closure capturing the websocket:
      ```python
      pending_calls: dict[str, asyncio.Future] = {}

      async def on_client_result(frame: dict):
          """Called when a tool_result frame arrives from Electron."""
          fut = pending_calls.pop(frame["id"], None)
          if fut and not fut.done():
              fut.set_result(frame)

      async def execute_callback(payload: dict) -> dict:
          """Send tool_call to Electron, wait for tool_result."""
          call_id = payload["id"]
          # get_running_loop() is the supported call inside a coroutine.
          fut = asyncio.get_running_loop().create_future()
          pending_calls[call_id] = fut
          await websocket.send_text(json.dumps({"type": "tool_call", **payload}))
          try:
              return await asyncio.wait_for(fut, timeout=30.0)
          finally:
              # Drop the entry on timeout or cancellation so it cannot leak.
              pending_calls.pop(call_id, None)
      ```
    - Set the `_client_executor` ContextVar to `execute_callback` before starting the orchestrator task (a task copies the current context when it is created)
    - Run orchestrator in a task: it calls agents, agents call tools, tools call `execute_on_client()`, which goes through the callback
    - In parallel, run a message receive loop that dispatches incoming frames:
      - `tool_result` → `on_client_result(frame)`
      - `ping` → ignore
    - Orchestrator yields `text_chunk` frames → send to client
    - Send `final` frame when done
    - Clear ContextVar
  - Keep heartbeat ping every 30s
  - 30s timeout on `tool_result`: if Electron doesn't respond, `asyncio.wait_for` raises `TimeoutError` and the tool returns an error string to the LLM
- Files: `app/api/routes/chat.py`
- Outcome: Full bidirectional WS. Tool calls and text streaming happen concurrently on the same connection.
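The pairing of a send path with a parallel receive loop can be exercised without a real socket. The sketch below is a simulation under stated assumptions: `FakeWebSocket`, `fake_electron`, `run_session`, and `demo` are all hypothetical, the in-memory queues stand in for the WebSocket, the timeout is shortened to 1s, and text streaming and heartbeat are left out.

```python
import asyncio
import json


class FakeWebSocket:
    """Hypothetical in-memory stand-in for the real WebSocket object."""

    def __init__(self):
        self.to_client = asyncio.Queue()
        self.to_server = asyncio.Queue()

    async def send_text(self, text):
        await self.to_client.put(text)

    async def receive_text(self):
        return await self.to_server.get()


async def fake_electron(ws):
    """Answer every tool_call with one canned row, as Electron would."""
    while True:
        frame = json.loads(await ws.to_client.get())
        if frame["type"] == "tool_call":
            reply = {"type": "tool_result", "id": frame["id"],
                     "rows": [{"id": "1", "title": "Buy milk"}]}
            await ws.to_server.put(json.dumps(reply))


async def run_session(ws):
    pending = {}

    async def execute_callback(payload):
        fut = asyncio.get_running_loop().create_future()
        pending[payload["id"]] = fut
        await ws.send_text(json.dumps({"type": "tool_call", **payload}))
        try:
            return await asyncio.wait_for(fut, timeout=1.0)
        finally:
            pending.pop(payload["id"], None)

    async def receive_loop():
        # Runs beside the tool call and resolves the matching future by id.
        while True:
            frame = json.loads(await ws.receive_text())
            if frame.get("type") == "tool_result":
                fut = pending.get(frame["id"])
                if fut and not fut.done():
                    fut.set_result(frame)

    receiver = asyncio.create_task(receive_loop())
    try:
        # Stands in for the orchestrator task making a single tool call.
        return await execute_callback(
            {"id": "abc", "action": "select", "table": "tasks"})
    finally:
        receiver.cancel()


async def demo():
    ws = FakeWebSocket()
    electron = asyncio.create_task(fake_electron(ws))
    try:
        return await run_session(ws)
    finally:
        electron.cancel()
```

Running `asyncio.run(demo())` returns the `tool_result` frame whose `id` matches the outgoing `tool_call`. That id matching is what lets several calls share one connection.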
Step B.4 — _tool_loop — no changes needed
- Verify `app/core/agent_registry.py` works unchanged:
  - `_tool_loop` calls `tool_fn.ainvoke(args)` → tool awaits `execute_on_client()` (WS round-trip) → returns string → `ToolMessage(content=string)` → LLM sees real data
  - The async WS round-trip happens inside each tool. `_tool_loop` just sees an awaited tool returning a string: same as before, different content.
- No code changes. Just verify + add a log line for tool execution times if desired.
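If the optional timing log is wanted, a small wrapper around the `ainvoke` call is one way to add it without touching the loop's logic. This is a sketch only: `timed_ainvoke` and the `tool_timing` logger name are hypothetical, and the wrapper assumes nothing about the tool beyond an async `ainvoke(args)` method and an optional `name` attribute.

```python
import asyncio
import logging
import time

logger = logging.getLogger("tool_timing")  # hypothetical logger name


async def timed_ainvoke(tool_fn, args: dict):
    """Await tool_fn.ainvoke(args) and log how long the call took."""
    started = time.perf_counter()
    try:
        return await tool_fn.ainvoke(args)
    finally:
        # Logged on success and on failure, so slow timeouts show up too.
        elapsed_ms = (time.perf_counter() - started) * 1000
        name = getattr(tool_fn, "name", repr(tool_fn))
        logger.info("tool %s finished in %.1f ms", name, elapsed_ms)
```

Because each tool now includes a WS round-trip, these timings mostly reflect client latency rather than backend work.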
Step B.5 — Orchestrator cleanup
- Update `app/core/orchestrator.py`:
  - `orchestrate_stream()`: remove `"actions": []` from final frame. Final becomes: `{"done": true, "response": "..."}`
  - No other changes: `classify_intent` → `call_agent` → chunk response → final frame
- Files: `app/core/orchestrator.py`
- Outcome: Clean final frame. No more action descriptors in the protocol.
Step B.6 — Add /vectors/embed endpoint
- Add to `app/api/routes/vectors.py`:
  - `POST /api/v1/storage/vectors/embed`:
    - Request: `{ text: str }`
    - Response: `{ vector: list[float] }` (1536-dim from `text-embedding-3-small`)
    - Auth required (JWT)
  - Used by:
    - Backend tools: `note_agent` calls this before `vector_upsert`
    - Electron: `vectordb.ts` calls this for note embedding on create/update
- Files: `app/api/routes/vectors.py`
- Outcome: Single embedding endpoint. Both backend tools and Electron can generate vectors.
Verification
| What to test | How |
|---|---|
| Read flow | "List my tasks" → list_tasks → tool_call{select, tasks} → Electron returns rows → LLM describes real tasks |
| Write flow | "Create a task called Buy milk" → create_task → tool_call{insert, tasks, data:{title:"Buy milk"}} → Electron inserts + returns row → tool confirms with id |
| Multi-tool | "How many todo tasks do I have?" → list_tasks(status=todo) → LLM counts actual rows → "You have 3 todo tasks" |
| Vector search | "Find notes about deployment" → tool embeds → tool_call{vector_search, vector:[...]} → Electron searches LanceDB → returns matching notes |
| Vector upsert | "Create a note about..." → insert note → vector_upsert with embedding → both SQLite + LanceDB updated |
| Tool timeout | Disconnect Electron mid-conversation → 30s timeout → tool returns error → LLM handles gracefully |
| Sequential calls | Agent calls 2 tools in sequence → each does its own WS round-trip → both succeed → LLM sees both results |
| _tool_loop max iter | Verify 5-iteration limit still works → after 5 tool calls, LLM forced to answer without tools |
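The "Tool timeout" row can be checked in isolation with a callback that never answers. The sketch below assumes the tool itself catches the timeout and converts it to an error string; `run_tool_with_timeout` and `silent_client` are hypothetical names, and a short timeout replaces the 30s used in production so the check finishes quickly.

```python
import asyncio


async def run_tool_with_timeout(execute, payload: dict, timeout: float = 30.0) -> str:
    """Await the client round-trip; return an error string if it times out."""
    try:
        result = await asyncio.wait_for(execute(payload), timeout=timeout)
    except asyncio.TimeoutError:
        # The LLM receives this text as the tool output instead of an exception.
        return f"Error: client did not respond within {timeout:g}s"
    return f"Found {len(result.get('rows', []))} task(s)"


async def silent_client(payload: dict) -> dict:
    """Simulates a disconnected Electron client: never returns."""
    await asyncio.Event().wait()
    return {}
```

Calling `asyncio.run(run_tool_with_timeout(silent_client, {"id": "abc"}, timeout=0.05))` exercises the failure path without waiting the full 30 seconds.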
Execution Notes
- Phase 1 is the critical path. Auth + backend client + drizzle executor + orchestrator refactor must land first.
- Steps 1.1–1.4 are additive — existing app keeps working until Step 1.5 swaps the orchestrator.
- Step 2.1 is the point of no return — after removing LangChain, there's no local AI fallback.
- Phase B (backend changes) must land before Phase 1.3–1.5 — Electron needs the bidirectional WS to talk to.
- Phase 3 and Phase 4 are independent — can be parallelized after Phase 2.
- One step at a time. Mark `[x]` and commit with `step N.N complete: <outcome>`.