roberto 27c087d5d8 step B.2 complete: all 23 tools use execute_on_client(); add embed() to llm

2026-03-05 00:03:01 +01:00


AI Refactor Plan — Adiuva Backend

Objective: Transform backend tools from JSON-action-descriptor-returning functions into real bidirectional executors. Each tool sends structured CRUD operations to the Electron client via WebSocket, receives real data back, and returns meaningful results to the LLM. The LLM reasons about actual user data instead of serialized action payloads.

Electron app: Lives at ../adiuva/. See ../adiuva/AI_REFACTOR_PLAN.md.

Protocol: Execute steps sequentially. Each step is atomic and committable. Mark [x] when done.

Architecture — Before vs After

Before (current)

LLM calls list_tasks(status="todo")
  → tool returns: '{"action":"list","table":"tasks","filters":{"status":"todo"}}'
  → _tool_loop feeds that JSON string as ToolMessage to LLM
  → LLM sees a descriptor, NOT real data — cannot reason about tasks
  → Final response: generic "Here are your tasks" (no actual task data)
  → Action descriptors sent in final WS frame for Electron to execute post-response

After (target)

LLM calls list_tasks(status="todo")
  → tool calls execute_on_client(action="select", table="tasks", filters={status:"todo"})
    → WS frame sent to Electron: {type:"tool_call", id:"abc", action:"select", table:"tasks", filters:{status:"todo"}}
    → Electron runs: db.select().from(tasks).where(eq(tasks.status, "todo")).all()
    → WS frame back: {type:"tool_result", id:"abc", rows:[{id:"1",title:"Buy milk",...}, ...]}
  → tool returns: "Found 3 tasks: 1. Buy milk (high, due tomorrow) 2. ..."
  → _tool_loop feeds that as ToolMessage to LLM
  → LLM sees REAL data — can reason, count, compare, summarize

WS Protocol — Typed Frames

| Direction | `type` | Payload |
|---|---|---|
| Client → Server | `chat_request` | `{ message: str, context: ChatContext }` |
| Server → Client | `text_chunk` | `{ text: str }` |
| Server → Client | `tool_call` | `{ id: str, action: str, table?: str, data?: dict, filters?: dict, vector?: list[float], limit?: int }` |
| Client → Server | `tool_result` | `{ id: str, row?: dict, rows?: list[dict], results?: list[dict], deleted?: bool, ok?: bool, error?: str }` |
| Server → Client | `final` | `{ response: str }` |
| Server → Client | `ping` | `{}` |
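
To make the frame table concrete, here is a minimal sketch of encoding and decoding typed frames with the standard library only. The helper names `encode_frame` and `decode_frame` are hypothetical and not part of the plan; only the frame type names come from the table above.

```python
import json

# Frame types from the protocol table; the helper names below are illustrative.
FRAME_TYPES = {"chat_request", "text_chunk", "tool_call", "tool_result", "final", "ping"}


def encode_frame(frame_type: str, **payload) -> str:
    """Serialize a frame, dropping optional fields whose value is None."""
    if frame_type not in FRAME_TYPES:
        raise ValueError(f"unknown frame type: {frame_type}")
    body = {k: v for k, v in payload.items() if v is not None}
    return json.dumps({"type": frame_type, **body})


def decode_frame(raw: str) -> dict:
    """Parse a frame and reject anything without a known `type`."""
    frame = json.loads(raw)
    if not isinstance(frame, dict) or frame.get("type") not in FRAME_TYPES:
        raise ValueError("malformed frame")
    return frame
```

Rejecting unknown types at the boundary keeps the receive loop's dispatch simple, since every frame it sees is guaranteed to carry one of the six types.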

Actions:

| `action` | What Electron does (Drizzle) | `tool_result` shape |
|---|---|---|
| `select` | `db.select().from(table).where(filters)` | `{ rows: [...] }` |
| `get` | `db.select().from(table).where(id=...).get()` | `{ row: {...} or null }` |
| `insert` | `db.insert(table).values({id: uuid(), ...data}).returning().get()` | `{ row: {...} }` |
| `update` | `db.update(table).set(updates).where(id=...).returning().get()` | `{ row: {...} }` |
| `delete` | `db.delete(table).where(id=...).run()` | `{ deleted: true }` |
| `vector_upsert` | LanceDB upsert with pre-computed vector | `{ ok: true }` |
| `vector_search` | LanceDB search by vector | `{ results: [{id, content, score}...] }` |

Electron generates IDs + timestamps. Backend tools never send id or createdAt in insert data — Electron adds id: uuid(), createdAt: Date.now(), updatedAt: Date.now().
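
For backend tests that run without Electron, the CRUD actions above can be approximated by an in-memory stand-in. The following is a sketch under assumptions: `FakeClient` is a hypothetical test helper, it supports exact-match filters only (the real client presumably also handles keys such as `search` or `orderBy`), and it returns just the payload part of a `tool_result`, without `type` and `id`.

```python
import time
import uuid


class FakeClient:
    """In-memory stand-in for the Electron executor (hypothetical test helper)."""

    def __init__(self):
        self.tables: dict[str, dict[str, dict]] = {}

    def handle(self, call: dict) -> dict:
        table = self.tables.setdefault(call.get("table", ""), {})
        action, data = call["action"], call.get("data") or {}
        if action == "select":
            filters = call.get("filters") or {}
            # Exact-match filtering only; an assumption made for this sketch.
            rows = [r for r in table.values()
                    if all(r.get(k) == v for k, v in filters.items())]
            return {"rows": rows}
        if action == "get":
            return {"row": table.get(data["id"])}
        if action == "insert":
            now = int(time.time() * 1000)
            # The client, not the backend, assigns id and timestamps.
            row = {**data, "id": str(uuid.uuid4()), "createdAt": now, "updatedAt": now}
            table[row["id"]] = row
            return {"row": row}
        if action == "update":
            row = table.get(data["id"])
            if row is None:
                return {"error": "not found"}
            row.update(data["updates"], updatedAt=int(time.time() * 1000))
            return {"row": row}
        if action == "delete":
            return {"deleted": table.pop(data["id"], None) is not None}
        return {"error": f"unsupported action: {action}"}
```

A helper like this lets each tool's formatting logic be exercised against realistic result shapes before the WebSocket layer exists.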

SQLite Schema Reference (Electron's local database)

Tools must use camelCase field names (Drizzle maps them to snake_case internally):

| Table | Columns |
|---|---|
| `tasks` | id, projectId, title, description, status (todo\|in_progress\|done), priority (high\|medium\|low), assignee (JSON array string), dueDate (ms), isAiSuggested (0\|1), isApproved (0\|1), createdAt (ms) |
| `projects` | id, clientId, name, status (active\|archived), aiSummary, createdAt (ms) |
| `checkpoints` | id, projectId (required), title, date (ms), isAiSuggested (0\|1), isApproved (0\|1), createdAt (ms) |
| `notes` | id, projectId, title, content (markdown), createdAt (ms), updatedAt (ms) |
| `taskComments` | id, taskId, author, content, createdAt (ms) |
| `clients` | id, parentId, name, industry, createdAt (ms) |

Phase B — Backend Changes

Step B.1 — WS context + frame types

Create app/core/ws_context.py (~25 lines):
- _client_executor: ContextVar[Callable] — holds the async callback for the current WS session
- async def execute_on_client(action, table=None, data=None, filters=None, vector=None, limit=None) -> dict:
  - Reads callback from ContextVar
  - Builds tool_call payload: {id: str(uuid4()), action, table, data, filters, vector, limit} (omits None fields)
  - Calls await callback(payload) — which sends the WS frame and waits for tool_result
  - Returns the result dict
- def set_client_executor(fn) / def clear_client_executor() — ContextVar management
Add to app/schemas.py:
- WsFrameType(str, Enum): chat_request, text_chunk, tool_call, tool_result, final, ping
- WsToolCall(BaseModel): type, id, action, table?, data?, filters?, vector?, limit?
- WsToolResult(BaseModel): type, id, row?, rows?, results?, deleted?, ok?, error?
- WsTextChunk(BaseModel): type, text
- WsFinal(BaseModel): type, response
Files: app/core/ws_context.py, app/schemas.py
Outcome: Any tool can await execute_on_client(...) to query/mutate the user's local DB.
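
The ws_context.py description above can be sketched roughly as follows. This is an illustration of the listed behavior, not the committed file: the `Executor` alias, the `None` default, and the `RuntimeError` for a missing callback are assumptions.

```python
import uuid
from contextvars import ContextVar
from typing import Awaitable, Callable, Optional

# Hypothetical alias: an async callback that performs the WS round-trip.
Executor = Callable[[dict], Awaitable[dict]]

_client_executor: ContextVar[Optional[Executor]] = ContextVar("client_executor", default=None)


def set_client_executor(fn: Executor) -> None:
    _client_executor.set(fn)


def clear_client_executor() -> None:
    _client_executor.set(None)


async def execute_on_client(action, table=None, data=None, filters=None,
                            vector=None, limit=None) -> dict:
    callback = _client_executor.get()
    if callback is None:
        # Assumption: fail loudly when a tool runs outside a WS session.
        raise RuntimeError("no client executor set for this WS session")
    payload = {"id": str(uuid.uuid4()), "action": action, "table": table,
               "data": data, "filters": filters, "vector": vector, "limit": limit}
    # Omit unset optional fields so the frame stays minimal.
    payload = {k: v for k, v in payload.items() if v is not None}
    return await callback(payload)
```

Because asyncio tasks copy the current context when they are created, a callback set before the orchestrator task is spawned is visible to every tool that task awaits, while other sessions keep their own value.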

Step B.2 — Rewrite all 23 tools to use `execute_on_client()`

Each tool: same @tool decorator, same parameters, same docstring. Replace return json.dumps({...}) body with:
1. Call result = await execute_on_client(action=..., table=..., data/filters=...)
2. Return human-readable string with confirmation + key data from result

app/agents/task_agent.py (8 tools):

list_tasks(project_id, status, search, order_by):

result = await execute_on_client(action="select", table="tasks", filters={
    "projectId": project_id or None,
    "status": status or None,
    "search": search or None,
    "orderBy": order_by or None,
})
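# Surface client-side failures instead of reporting an empty list.
if result.get("error"):
    return f"Could not list tasks: {result['error']}"
rows = result.get("rows", [])
if not rows:
    return "No tasks found matching the given filters."
lines = [f"- {r['title']} (status: {r['status']}, priority: {r['priority']}, id: {r['id']})" for r in rows]
return f"Found {len(rows)} task(s):\n" + "\n".join(lines)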

create_task(title, ...):

result = await execute_on_client(action="insert", table="tasks", data={
    "title": title, "description": description or None, "status": status,
    "priority": priority, "assignee": assignees, "dueDate": due_date or None,
    "projectId": project_id or None, "isAiSuggested": is_ai_suggested, "isApproved": is_approved,
})
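# tool_result may carry `error` instead of `row`; report it rather than raising KeyError.
if result.get("error"):
    return f"Could not create task: {result['error']}"
row = result["row"]
return f"Task created: '{row['title']}' (id: {row['id']}, status: {row['status']}, priority: {row['priority']})"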

- update_task(task_id, ...): build updates dict (same logic as now) → execute_on_client(action="update", table="tasks", data={"id": task_id, "updates": updates}) → return "Task updated: {title}"
- delete_task(task_id): execute_on_client(action="delete", table="tasks", data={"id": task_id}) → return "Task deleted"
- list_tasks_due_today(): calculate today's start/end ms → execute_on_client(action="select", table="tasks", filters={"dueDateFrom": start, "dueDateTo": end}) → format + return
- list_task_comments(task_id): execute_on_client(action="select", table="taskComments", filters={"taskId": task_id}) → format + return
- add_task_comment(task_id, author, content): execute_on_client(action="insert", table="taskComments", data={...}) → return confirmation
- delete_task_comment(comment_id): execute_on_client(action="delete", table="taskComments", data={"id": comment_id}) → return confirmation
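
For list_tasks_due_today, the start/end calculation might look like the sketch below. The helper name `day_bounds_ms` is hypothetical. The plan does not say which timezone defines "today" or whether dueDateTo is inclusive, so this version takes a timezone-aware datetime from the caller and returns a half-open range.

```python
from datetime import datetime, timedelta, timezone


def day_bounds_ms(now: datetime) -> tuple[int, int]:
    """Return the [start, end) bounds of `now`'s calendar day in epoch milliseconds."""
    start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


# Usage: the two values map onto the dueDateFrom / dueDateTo filters.
start_ms, end_ms = day_bounds_ms(datetime.now(timezone.utc))
```

Passing the clock in as an argument keeps the function deterministic under test.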

app/agents/project_agent.py (6 tools):
- list_projects(client_id, include_archived): execute_on_client(action="select", table="projects", filters={clientId, includeArchived}) → format + return
- list_all_projects(): execute_on_client(action="select", table="projects") → format + return
- get_project(project_id): execute_on_client(action="get", table="projects", data={"id": project_id}) → return project details or "not found"
- create_project(name, client_id): execute_on_client(action="insert", table="projects", data={name, clientId}) → return confirmation + id
- update_project(project_id, ...): build updates → execute_on_client(action="update", ...) → return confirmation
- delete_project(project_id): execute_on_client(action="delete", ...) → return confirmation
app/agents/checkpoint_agent.py (4 tools):
- list_checkpoints(project_id): execute_on_client(action="select", table="checkpoints", filters={projectId}) → format + return
- create_checkpoint(project_id, title, date, ...): execute_on_client(action="insert", table="checkpoints", data={...}) → return confirmation + id
- update_checkpoint(checkpoint_id, ...): build updates → execute_on_client(action="update", ...) → return confirmation
- delete_checkpoint(checkpoint_id): execute_on_client(action="delete", ...) → return confirmation
app/agents/note_agent.py (5 tools):
- list_notes(project_id): execute_on_client(action="select", table="notes", filters={projectId}) → format + return
- get_note(note_id): execute_on_client(action="get", table="notes", data={"id": note_id}) → return full content or "not found"
- create_note(title, content, project_id): execute_on_client(action="insert", table="notes", data={...}) → then execute_on_client(action="vector_upsert", data={id, projectId, content}, vector=await embed(content)) → return confirmation
- update_note(note_id, ...): build updates → execute_on_client(action="update", ...) → then vector_upsert for updated content → return confirmation
- delete_note(note_id): execute_on_client(action="delete", ...) → return confirmation
Files: app/agents/task_agent.py, app/agents/project_agent.py, app/agents/checkpoint_agent.py, app/agents/note_agent.py
Outcome: All 23 tools query real user data via WS. LLM sees actual rows, not action descriptors.
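
The two-step create_note flow (insert, then vector_upsert) could be sketched as below. To keep the example runnable on its own, `execute` and `embed` are passed in as parameters; in the real tool they would be execute_on_client() and embed(). The function name `create_note_flow` and the wording of the failure messages are assumptions.

```python
async def create_note_flow(execute, embed, title: str, content: str, project_id=None) -> str:
    """Insert the note, then index its content in the vector store."""
    result = await execute(action="insert", table="notes",
                           data={"title": title, "content": content, "projectId": project_id})
    if result.get("error"):
        return f"Could not create note: {result['error']}"
    row = result["row"]

    vector = await embed(content)
    indexed = await execute(action="vector_upsert",
                            data={"id": row["id"], "projectId": row.get("projectId"),
                                  "content": content},
                            vector=vector)
    if not indexed.get("ok"):
        # The note already exists in SQLite at this point, so report a partial success.
        return f"Note created: '{row['title']}' (id: {row['id']}), but indexing failed."
    return f"Note created: '{row['title']}' (id: {row['id']})"
```

The id for the vector record comes from the returned row, consistent with the rule that Electron generates IDs.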

Step B.3 — Bidirectional WebSocket handler

Refactor app/api/routes/chat.py WS endpoint:

After auth + accept + receive chat_request:

Create execute_on_client callback closure capturing the websocket:

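# Requires `import asyncio` and `import json` at module level.
# Futures awaiting a tool_result, keyed by tool_call id.
pending_calls: dict[str, asyncio.Future] = {}

async def on_client_result(frame: dict) -> None:
    """Called when a tool_result frame arrives from Electron."""
    fut = pending_calls.pop(frame.get("id"), None)
    if fut is not None and not fut.done():
        fut.set_result(frame)

async def execute_callback(payload: dict) -> dict:
    """Send tool_call to Electron, wait for tool_result."""
    call_id = payload["id"]
    # get_running_loop() is the supported call from inside a coroutine.
    fut = asyncio.get_running_loop().create_future()
    pending_calls[call_id] = fut
    try:
        await websocket.send_text(json.dumps({"type": "tool_call", **payload}))
        return await asyncio.wait_for(fut, timeout=30.0)
    finally:
        # Drop the entry on timeout or send failure so it cannot leak.
        pending_calls.pop(call_id, None)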

Register the callback with set_client_executor(execute_callback).
Run the orchestrator in a task: it calls agents, agents call tools, and tools call execute_on_client(), which goes through the callback.
In parallel, run a message receive loop that dispatches incoming frames. It must run concurrently with the orchestrator, because the orchestrator task is blocked on the future and nothing else would read the tool_result:
- tool_result → on_client_result(frame)
- ping → ignore
Forward each text_chunk frame the orchestrator yields to the client.
Send the final frame when the orchestrator finishes.
Call clear_client_executor().

Keep the heartbeat ping every 30s.
Each tool_result has a 30s timeout: if Electron doesn't respond, asyncio.wait_for raises TimeoutError and the tool returns an error string to the LLM.

Files: app/api/routes/chat.py
Outcome: Full bidirectional WS. Tool calls and text streaming happen concurrently on the same connection.
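
The request/response correlation in this step can be exercised without a real WebSocket. The sketch below wraps the pending-futures idea in a hypothetical `PendingCalls` class and uses an `asyncio.Queue` in place of the socket; `demo` and `fake_electron` are illustrative names only.

```python
import asyncio
import json


class PendingCalls:
    """Correlates outgoing tool_call frames with incoming tool_result frames by id."""

    def __init__(self, send_text, timeout: float = 30.0):
        self._send_text = send_text  # e.g. websocket.send_text
        self._timeout = timeout
        self._pending: dict[str, asyncio.Future] = {}

    async def execute(self, payload: dict) -> dict:
        fut = asyncio.get_running_loop().create_future()
        self._pending[payload["id"]] = fut
        try:
            await self._send_text(json.dumps({"type": "tool_call", **payload}))
            return await asyncio.wait_for(fut, timeout=self._timeout)
        finally:
            self._pending.pop(payload["id"], None)  # no leak on timeout or error

    def resolve(self, frame: dict) -> None:
        fut = self._pending.get(frame.get("id"))
        if fut is not None and not fut.done():
            fut.set_result(frame)


async def demo() -> dict:
    outbox: asyncio.Queue = asyncio.Queue()  # stands in for the socket
    calls = PendingCalls(outbox.put)

    async def fake_electron() -> None:
        call = json.loads(await outbox.get())
        calls.resolve({"type": "tool_result", "id": call["id"],
                       "rows": [{"title": "Buy milk"}]})

    client = asyncio.create_task(fake_electron())
    result = await calls.execute({"id": "abc", "action": "select", "table": "tasks"})
    await client
    return result
```

Running `asyncio.run(demo())` performs one full round-trip: the call is queued, the fake client answers with the same id, and `execute` returns the result frame.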

Step B.4 — `_tool_loop` — no changes needed

Verify app/core/agent_registry.py works unchanged:
- _tool_loop calls tool_fn.ainvoke(args) → tool awaits execute_on_client() (WS round-trip) → returns string → ToolMessage(content=string) → LLM sees real data
- The async WS round-trip happens inside each tool. _tool_loop just sees an awaited tool returning a string — same as before, different content.
No functional changes are required. Verify the behavior and, if desired, add a log line for tool execution times.
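
If the optional timing log is wanted, one possible shape is a small decorator around async functions. This is a sketch: the `log_duration` name and the "tools" logger name are assumptions, and how it composes with the existing @tool decorator would need to be checked against agent_registry.py.

```python
import functools
import logging
import time

logger = logging.getLogger("tools")  # hypothetical logger name


def log_duration(fn):
    """Wrap an async function and log how long each call takes."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        started = time.perf_counter()
        try:
            return await fn(*args, **kwargs)
        finally:
            # Runs on success and on exceptions, so failed calls are timed too.
            elapsed_ms = (time.perf_counter() - started) * 1000
            logger.info("tool %s took %.1f ms", fn.__name__, elapsed_ms)
    return wrapper
```

Since every tool now includes a WebSocket round-trip, per-call timings make slow client responses visible well before the 30s timeout is reached.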

Step B.5 — Orchestrator cleanup

Update app/core/orchestrator.py:
- orchestrate_stream(): remove "actions": [] from final frame. Final becomes: {"done": true, "response": "..."}
- No other changes — classify_intent → call_agent → chunk response → final frame
Files: app/core/orchestrator.py
Outcome: Clean final frame. No more action descriptors in the protocol.

Step B.6 — Add `/vectors/embed` endpoint

Add to app/api/routes/vectors.py:
- POST /api/v1/storage/vectors/embed:
  - Request: { text: str }
  - Response: { vector: list[float] } (1536-dim from text-embedding-3-small)
  - Auth required (JWT)
- Used by:
  - Backend tools: note_agent calls this before vector_upsert
  - Electron: vectordb.ts calls this for note embedding on create/update
Files: app/api/routes/vectors.py
Outcome: Single embedding endpoint. Both backend tools and Electron can generate vectors.
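
The request and response contract of the endpoint can be sketched independently of the web framework. In the sketch below, `handle_embed` is a hypothetical function that a route handler could delegate to, and `embed` is injected so the example runs without an API key; only the 1536-dimension figure comes from the plan.

```python
EXPECTED_DIM = 1536  # text-embedding-3-small, per the plan


async def handle_embed(body: dict, embed) -> dict:
    """Validate { text } and return { vector }; `embed` is an injected async function."""
    text = body.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("`text` must be a non-empty string")
    vector = await embed(text)
    if len(vector) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dimensions, got {len(vector)}")
    return {"vector": [float(x) for x in vector]}
```

Checking the dimension on the server catches a misconfigured embedding model before vectors of the wrong size reach LanceDB.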

Verification

| What to test | How |
|---|---|
| Read flow | "List my tasks" → `list_tasks` → `tool_call{select, tasks}` → Electron returns rows → LLM describes real tasks |
| Write flow | "Create a task called Buy milk" → `create_task` → `tool_call{insert, tasks, data:{title:"Buy milk"}}` → Electron inserts + returns row → tool confirms with id |
| Multi-tool | "How many todo tasks do I have?" → `list_tasks(status=todo)` → LLM counts actual rows → "You have 3 todo tasks" |
| Vector search | "Find notes about deployment" → tool embeds → `tool_call{vector_search, vector:[...]}` → Electron searches LanceDB → returns matching notes |
| Vector upsert | "Create a note about..." → insert note → vector_upsert with embedding → both SQLite + LanceDB updated |
| Tool timeout | Disconnect Electron mid-conversation → 30s timeout → tool returns error → LLM handles gracefully |
| Sequential calls | Agent calls 2 tools in sequence → each does WS round-trip → both succeed → LLM sees both results |
| `_tool_loop` max iter | Verify 5-iteration limit still works → after 5 iterations, LLM forced to answer without tools |
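
The tool-timeout case can be checked in isolation with a much shorter timeout. The sketch below is an assumption about how a tool might convert a timeout into an error string; `call_with_timeout`, `never_responds`, and the message wording are hypothetical.

```python
import asyncio


async def call_with_timeout(round_trip, timeout: float = 30.0) -> str:
    """Await a client round-trip and turn a timeout into an error string for the LLM."""
    try:
        result = await asyncio.wait_for(round_trip(), timeout=timeout)
    except asyncio.TimeoutError:
        return "Error: the client did not respond in time."
    if result.get("error"):
        return f"Error: {result['error']}"
    return f"Found {len(result.get('rows', []))} row(s)."


async def never_responds() -> dict:
    await asyncio.sleep(3600)  # simulates a disconnected Electron client
    return {}
```

Calling `asyncio.run(call_with_timeout(never_responds, timeout=0.01))` exercises the timeout branch in milliseconds rather than waiting the full 30s.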

Execution Notes

Phase 1 is the critical path. Auth + backend client + drizzle executor + orchestrator refactor must land first.
Steps 1.1–1.4 are additive — existing app keeps working until Step 1.5 swaps the orchestrator.
Step 2.1 is the point of no return — after removing LangChain, there's no local AI fallback.
Phase B (backend changes) must land before Phase 1.3–1.5 — Electron needs the bidirectional WS to talk to.
Phase 3 and Phase 4 are independent — can be parallelized after Phase 2.
One step at a time. Mark [x] and commit with step N.N complete: <outcome>.
