Files
api/AI_REFACTOR_PLAN.md

32 KiB
Raw Blame History

AI Refactor Plan — Adiuva Backend

Objective: Transform backend tools from JSON-action-descriptor-returning functions into real bidirectional executors. Each tool sends structured CRUD operations to the Electron client via WebSocket, receives real data back, and returns meaningful results to the LLM. The LLM reasons about actual user data instead of serialized action payloads.

Electron app: Lives at ../adiuva/. See ../adiuva/AI_REFACTOR_PLAN.md.

Protocol: Execute steps sequentially. Each step is atomic and committable. Mark [x] when done.


Architecture — Before vs After

Before (current)

LLM calls list_tasks(status="todo")
  → tool returns: '{"action":"list","table":"tasks","filters":{"status":"todo"}}'
  → _tool_loop feeds that JSON string as ToolMessage to LLM
  → LLM sees a descriptor, NOT real data — cannot reason about tasks
  → Final response: generic "Here are your tasks" (no actual task data)
  → Action descriptors sent in final WS frame for Electron to execute post-response

After (target)

LLM calls list_tasks(status="todo")
  → tool calls execute_on_client(action="select", table="tasks", filters={status:"todo"})
    → WS frame sent to Electron: {type:"tool_call", id:"abc", action:"select", table:"tasks", filters:{status:"todo"}}
    → Electron runs: db.select().from(tasks).where(eq(tasks.status, "todo")).all()
    → WS frame back: {type:"tool_result", id:"abc", rows:[{id:"1",title:"Buy milk",...}, ...]}
  → tool returns: "Found 3 tasks: 1. Buy milk (high, due tomorrow) 2. ..."
  → _tool_loop feeds that as ToolMessage to LLM
  → LLM sees REAL data — can reason, count, compare, summarize

WS Protocol — Typed Frames

Direction type Payload
Client → Server chat_request { message: str, context: ChatContext }
Server → Client text_chunk { text: str }
Server → Client tool_call { id: str, action: str, table?: str, data?: dict, filters?: dict, vector?: list[float], limit?: int }
Client → Server tool_result { id: str, row?: dict, rows?: list[dict], results?: list[dict], deleted?: bool, ok?: bool, error?: str }
Server → Client final { response: str }
Server → Client ping {}

Actions:

action What Electron does (Drizzle) tool_result shape
select db.select().from(table).where(filters) { rows: [...] }
get db.select().from(table).where(id=...).get() { row: {...} or null }
insert db.insert(table).values({id: uuid(), ...data}).returning().get() { row: {...} }
update db.update(table).set(updates).where(id=...).returning().get() { row: {...} }
delete db.delete(table).where(id=...).run() { deleted: true }
vector_upsert LanceDB upsert with pre-computed vector { ok: true }
vector_search LanceDB search by vector { results: [{id, content, score}...] }

Electron generates IDs + timestamps. Backend tools never send id or createdAt in insert data — Electron adds id: uuid(), createdAt: Date.now(), updatedAt: Date.now().


SQLite Schema Reference (Electron's local database)

Tools must use camelCase field names (Drizzle maps them to snake_case internally):

Table Columns
tasks id, projectId, title, description, status (todo|in_progress|done), priority (high|medium|low), assignee (JSON array string), dueDate (ms), isAiSuggested (0|1), isApproved (0|1), createdAt (ms)
projects id, clientId, name, status (active|archived), aiSummary, createdAt (ms)
timelines id, projectId (required), title, date (ms), isAiSuggested (0|1), isApproved (0|1), createdAt (ms)
notes id, projectId, title, content (markdown), createdAt (ms), updatedAt (ms)
taskComments id, taskId, author, content, createdAt (ms)
clients id, parentId, name, industry, createdAt (ms)

Phase B — Backend Changes

Step B.1 — WS context + frame types

  • Create app/core/ws_context.py (~25 lines):
    • _client_executor: ContextVar[Callable] — holds the async callback for the current WS session
    • async def execute_on_client(action, table=None, data=None, filters=None, vector=None, limit=None) -> dict:
      • Reads callback from ContextVar
      • Builds tool_call payload: {id: str(uuid4()), action, table, data, filters, vector, limit} (omits None fields)
      • Calls await callback(payload) — which sends the WS frame and waits for tool_result
      • Returns the result dict
    • def set_client_executor(fn) / def clear_client_executor() — ContextVar management
  • Add to app/schemas.py:
    • WsFrameType(str, Enum): chat_request, text_chunk, tool_call, tool_result, final, ping
    • WsToolCall(BaseModel): type, id, action, table?, data?, filters?, vector?, limit?
    • WsToolResult(BaseModel): type, id, row?, rows?, results?, deleted?, ok?, error?
    • WsTextChunk(BaseModel): type, text
    • WsFinal(BaseModel): type, response
  • Files: app/core/ws_context.py, app/schemas.py
  • Outcome: Any tool can await execute_on_client(...) to query/mutate the user's local DB.

Step B.2 — Rewrite all 23 tools to use execute_on_client()

  • Each tool: same @tool decorator, same parameters, same docstring. Replace return json.dumps({...}) body with:

    1. Call result = await execute_on_client(action=..., table=..., data/filters=...)
    2. Return human-readable string with confirmation + key data from result
  • app/agents/task_agent.py (8 tools):

    • list_tasks(project_id, status, search, order_by):
      result = await execute_on_client(action="select", table="tasks", filters={
          "projectId": project_id or None,
          "status": status or None,
          "search": search or None,
          "orderBy": order_by or None,
      })
      rows = result.get("rows", [])
      if not rows:
          return "No tasks found matching the given filters."
      lines = [f"- {r['title']} (status: {r['status']}, priority: {r['priority']}, id: {r['id']})" for r in rows]
      return f"Found {len(rows)} task(s):\n" + "\n".join(lines)
      
    • create_task(title, ...):
      result = await execute_on_client(action="insert", table="tasks", data={
          "title": title, "description": description or None, "status": status,
          "priority": priority, "assignee": assignees, "dueDate": due_date or None,
          "projectId": project_id or None, "isAiSuggested": is_ai_suggested, "isApproved": is_approved,
      })
      row = result["row"]
      return f"Task created: '{row['title']}' (id: {row['id']}, status: {row['status']}, priority: {row['priority']})"
      
    • update_task(task_id, ...): build updates dict (same logic as now) → execute_on_client(action="update", table="tasks", data={"id": task_id, "updates": updates}) → return "Task updated: {title}"
    • delete_task(task_id): execute_on_client(action="delete", table="tasks", data={"id": task_id}) → return "Task deleted"
    • list_tasks_due_today(): calculate today's start/end ms → execute_on_client(action="select", table="tasks", filters={"dueDateFrom": start, "dueDateTo": end}) → format + return
    • list_task_comments(task_id): execute_on_client(action="select", table="taskComments", filters={"taskId": task_id}) → format + return
    • add_task_comment(task_id, author, content): execute_on_client(action="insert", table="taskComments", data={...}) → return confirmation
    • delete_task_comment(comment_id): execute_on_client(action="delete", table="taskComments", data={"id": comment_id}) → return confirmation
  • app/agents/project_agent.py (6 tools):

    • list_projects(client_id, include_archived): execute_on_client(action="select", table="projects", filters={clientId, includeArchived}) → format + return
    • list_all_projects(): execute_on_client(action="select", table="projects") → format + return
    • get_project(project_id): execute_on_client(action="get", table="projects", data={"id": project_id}) → return project details or "not found"
    • create_project(name, client_id): execute_on_client(action="insert", table="projects", data={name, clientId}) → return confirmation + id
    • update_project(project_id, ...): build updates → execute_on_client(action="update", ...) → return confirmation
    • delete_project(project_id): execute_on_client(action="delete", ...) → return confirmation
  • app/agents/timeline_agent.py (4 tools):

    • list_timelines(project_id): execute_on_client(action="select", table="timelines", filters={projectId}) → format + return
    • create_timeline(project_id, title, date, ...): execute_on_client(action="insert", table="timelines", data={...}) → return confirmation + id
    • update_timeline(timeline_id, ...): build updates → execute_on_client(action="update", ...) → return confirmation
    • delete_timeline(timeline_id): execute_on_client(action="delete", ...) → return confirmation
  • app/agents/note_agent.py (5 tools):

    • list_notes(project_id): execute_on_client(action="select", table="notes", filters={projectId}) → format + return
    • get_note(note_id): execute_on_client(action="get", table="notes", data={"id": note_id}) → return full content or "not found"
    • create_note(title, content, project_id): execute_on_client(action="insert", table="notes", data={...}) → then execute_on_client(action="vector_upsert", data={id, projectId, content}, vector=await embed(content)) → return confirmation
    • update_note(note_id, ...): build updates → execute_on_client(action="update", ...) → then vector_upsert for updated content → return confirmation
    • delete_note(note_id): execute_on_client(action="delete", ...) → return confirmation
  • Files: app/agents/task_agent.py, app/agents/project_agent.py, app/agents/timeline_agent.py, app/agents/note_agent.py

  • Outcome: All 23 tools query real user data via WS. LLM sees actual rows, not action descriptors.

Step B.3 — Bidirectional WebSocket handler

  • Refactor app/api/routes/chat.py WS endpoint:
    • After auth + accept + receive chat_request:
      1. Create execute_on_client callback closure capturing the websocket:
        pending_calls: dict[str, asyncio.Future] = {}
        
        async def on_client_result(frame: dict):
            """Called when a tool_result frame arrives from Electron."""
            fut = pending_calls.pop(frame["id"], None)
            if fut and not fut.done():
                fut.set_result(frame)
        
        async def execute_callback(payload: dict) -> dict:
            """Send tool_call to Electron, wait for tool_result."""
            call_id = payload["id"]
            fut = asyncio.get_event_loop().create_future()
            pending_calls[call_id] = fut
            await websocket.send_text(json.dumps({"type": "tool_call", **payload}))
            return await asyncio.wait_for(fut, timeout=30.0)
        
      2. Set client_executor ContextVar with execute_callback
      3. Run orchestrator in a task — it calls agents, agents call tools, tools call execute_on_client() which goes through the callback
      4. In parallel, run a message receive loop that dispatches incoming frames:
        • tool_resulton_client_result(frame)
        • ping → ignore
      5. Orchestrator yields text_chunk frames → send to client
      6. Send final frame when done
      7. Clear ContextVar
    • Keep heartbeat ping every 30s
    • 30s timeout on tool_result — if Electron doesn't respond, future raises TimeoutError, tool returns error string to LLM
  • Files: app/api/routes/chat.py
  • Outcome: Full bidirectional WS. Tool calls and text streaming happen concurrently on the same connection.

Step B.4 — _tool_loop — no changes needed

  • Verify app/core/agent_registry.py works unchanged:
    • _tool_loop calls tool_fn.ainvoke(args) → tool awaits execute_on_client() (WS round-trip) → returns string → ToolMessage(content=string) → LLM sees real data
    • The async WS round-trip happens inside each tool. _tool_loop just sees an awaited tool returning a string — same as before, different content.
  • No code changes. Just verify + add a log line for tool execution times if desired.

Step B.5 — Orchestrator cleanup

  • Update app/core/orchestrator.py:
    • orchestrate_stream(): remove "actions": [] from final frame. Final becomes: {"done": true, "response": "..."}
    • No other changes — classify_intentcall_agent → chunk response → final frame
  • Files: app/core/orchestrator.py
  • Outcome: Clean final frame. No more action descriptors in the protocol.

Step B.6 — Add /vectors/embed endpoint

  • Add to app/api/routes/vectors.py:
    • POST /api/v1/storage/vectors/embed:
      • Request: { text: str }
      • Response: { vector: list[float] } (1536-dim from text-embedding-3-small)
      • Auth required (JWT)
    • Used by:
      • Backend tools: note_agent calls this before vector_upsert
      • Electron: vectordb.ts calls this for note embedding on create/update
  • Files: app/api/routes/vectors.py
  • Outcome: Single embedding endpoint. Both backend tools and Electron can generate vectors.

Verification

What to test How
Read flow "List my tasks" → list_taskstool_call{select, tasks} → Electron returns rows → LLM describes real tasks
Write flow "Create a task called Buy milk" → create_tasktool_call{insert, tasks, data:{title:"Buy milk"}} → Electron inserts + returns row → tool confirms with id
Multi-tool "How many todo tasks do I have?" → list_tasks(status=todo) → LLM counts actual rows → "You have 3 todo tasks"
Vector search "Find notes about deployment" → tool embeds → tool_call{vector_search, vector:[...]} → Electron searches LanceDB → returns matching notes
Vector upsert "Create a note about..." → insert note → vector_upsert with embedding → both SQLite + LanceDB updated
Tool timeout Disconnect Electron mid-conversation → 30s timeout → tool returns error → LLM handles gracefully
Concurrent calls Agent calls 2 tools in sequence → each does WS round-trip → both succeed → LLM sees both results
_tool_loop max iter Verify 5-iteration limit still works → after 5 tool calls, LLM forced to answer without tools

Execution Notes

  • Phase 1 is the critical path. Auth + backend client + drizzle executor + orchestrator refactor must land first.
  • Steps 1.11.4 are additive — existing app keeps working until Step 1.5 swaps the orchestrator.
  • Step 2.1 is the point of no return — after removing LangChain, there's no local AI fallback.
  • Phase B (backend changes) must land before Phase 1.31.5 — Electron needs the bidirectional WS to talk to.
  • Phase 3 and Phase 4 are independent — can be parallelized after Phase 2.

Phase 3 — Agent System: Config, Orchestration & Cloud Connectors

Objective: Backend manages all agent configuration, scheduling, orchestration, and cloud data fetching. Two agent types: Local Directory Agent (backend triggers Electron to read files, then AI analyzes) and Cloud Connector Agent (backend fetches Gmail/Teams data directly, AI analyzes, pushes results to Electron via WS tool_call). All extracted items use existing WS tool infrastructure to insert into Electron's local DB with is_ai_suggested=True.

Electron Phase 3 plan: ../adiuva/AI_REFACTOR_PLAN.md Phase 3 section.

Electron UI status (2025): Steps 3.6, 3.7, 3.8 of the Electron plan are complete. Agents are configured inside the Settings page (/settings?section=agents) — not a standalone route. The JourneyDialog (Step 3.8) is embedded inline in the Settings → Agents section. LocalAgentConfigPanel and CloudAgentConfigPanel (Step 3.7) are also inline. This affects the journey API contract (see Step 3.5 below).

Architecture

Local Agent:
  Scheduler/manual trigger ──► check device online ──► WS agent_run → Electron
    ──► Electron reads files ──► WS agent_data → Backend
    ──► Backend AI (prompt_template + file content) ──► WS tool_call(insert) → Electron
    ──► Electron persists with isAiSuggested=1

Cloud Agent:
  Scheduler/manual trigger ──► Backend fetches Gmail/Teams (OAuth) ──► Backend AI analyzes
    ──► check device online ──► WS tool_call(insert) → Electron ──► Electron persists

New WS frame types:

Direction type Payload
Server → Client agent_run { run_id, agent_id, config: { paths, file_extensions, prompt_template, data_types } }
Client → Server agent_data { run_id, files: [{ path, name, content, metadata }] }
Client → Server agent_complete { run_id, files_read, errors }
Client → Server device_hello { device_id, agent_ids }

Step 3.1 — Agent config tables

  • Add to app/models.py:
    • LocalAgentConfig:
      • id UUID PK
      • user_id FK → users
      • device_id str — identifies which Electron install this config belongs to
      • name str
      • directory_paths JSON — list of absolute paths on the device
      • data_types JSON — which tables to extract to: ["tasks", "notes", "timelines", "projects"]
      • prompt_template text — user-configured via Chatbot Journey
      • file_extensions JSON — e.g. [".eml", ".txt", ".pdf", ".md"]
      • schedule_cron str — e.g. "0 */6 * * *" (every 6h)
      • enabled bool (default True)
      • last_run_at datetime nullable
      • created_at, updated_at timestamps
    • CloudAgentConfig:
      • id UUID PK
      • user_id FK → users
      • provider str — enum: gmail, teams, outlook
      • name str
      • data_types JSON — same format as local
      • prompt_template text
      • oauth_token_encrypted text — Fernet-encrypted OAuth2 credentials
      • schedule_cron str
      • enabled bool (default True)
      • last_run_at datetime nullable
      • filter_config JSON — provider-specific: { labels: [], date_range: {from, to}, senders: [] }
      • created_at, updated_at timestamps
    • AgentRunLog:
      • id UUID PK
      • agent_id str — references LocalAgentConfig.id or CloudAgentConfig.id
      • agent_type str — local or cloud
      • user_id FK → users
      • status str — running, success, error, partial
      • items_processed int (default 0)
      • items_created int (default 0)
      • errors JSON — list of error strings
      • started_at datetime
      • completed_at datetime nullable
  • Add Pydantic schemas to app/schemas.py:
    • LocalAgentConfigCreate, LocalAgentConfigUpdate, LocalAgentConfigResponse
    • CloudAgentConfigCreate, CloudAgentConfigUpdate, CloudAgentConfigResponse
    • AgentRunLogResponse
    • AgentCatalogItem{ type, name, description, config_schema }
    • WsAgentRun, WsAgentData, WsAgentComplete, WsDeviceHello
  • Generate Alembic migration
  • Files: app/models.py, app/schemas.py, alembic/versions/
  • Outcome: Agent config and run tracking tables in PostgreSQL.

Step 3.2 — Agent CRUD API routes

  • Create app/api/routes/agents.py:
    • GET /api/v1/agents/catalog — returns hardcoded agent type catalog:
      • local_directory: "Watches local directories, extracts data from files using AI"
      • gmail: "Scans Gmail inbox, extracts tasks/notes from emails"
      • teams: "Monitors Teams messages, extracts action items"
      • outlook: "Scans Outlook inbox, extracts tasks/notes"
    • GET /api/v1/agents/local — list user's local agent configs
    • POST /api/v1/agents/local — create local agent config
      • Body: { name, device_id, directory_paths, data_types, prompt_template, file_extensions, schedule_cron }
      • Tier check: count enabled agents ≤ batch_active limit
    • PUT /api/v1/agents/local/{id} — update config (ownership check)
    • DELETE /api/v1/agents/local/{id} — delete config + associated run logs
    • GET /api/v1/agents/cloud — list user's cloud agent configs
    • POST /api/v1/agents/cloud — create cloud connector config
      • Body: { provider, name, data_types, prompt_template, oauth_token_encrypted, schedule_cron, filter_config }
      • Tier check: same batch_active limit (local + cloud count together)
    • PUT /api/v1/agents/cloud/{id} — update config
    • DELETE /api/v1/agents/cloud/{id} — delete config + run logs
    • GET /api/v1/agents/runs — query params: agent_id, page, limit → paginated run logs
    • POST /api/v1/agents/{id}/run — manual trigger (dispatches to agent runner)
    • All routes require JWT auth; ownership enforced on all mutations
  • Register router in app/main.py
  • Files: app/api/routes/agents.py, app/main.py
  • Outcome: Full CRUD for agent configs with tier-gated creation limits.

Step 3.3 — Device WS endpoint

  • Create app/api/routes/device_ws.py:
    • WebSocket /api/v1/ws/device?token=<jwt> — persistent connection from Electron
    • On connect:
      • Authenticate JWT
      • Receive device_hello frame → extract device_id, agent_ids
      • Store connection in DeviceConnectionManager (in-memory dict: user_id → { ws, device_id })
      • Check for overdue agent runs → trigger them immediately
    • Message loop:
      • agent_data → route to active agent run handler
      • agent_complete → finalize agent run
      • tool_result → route to pending tool call (same pattern as chat WS)
      • pong → heartbeat ack
    • On disconnect:
      • Remove from DeviceConnectionManager
      • Mark any in-progress agent runs as error with "device disconnected"
    • Heartbeat: send ping every 30s, disconnect if no pong within 10s
  • Create app/core/device_manager.py:
    • DeviceConnectionManager (singleton):
      • register(user_id, device_id, ws) — stores active connection
      • unregister(user_id) — removes connection
      • get_ws(user_id) -> WebSocket | None — returns active WS if device is online
      • is_online(user_id, device_id=None) -> bool — optionally checks specific device
      • send_frame(user_id, frame: dict) — sends JSON frame to device
  • Files: app/api/routes/device_ws.py, app/core/device_manager.py, app/main.py
  • Outcome: Backend maintains persistent WS connections to Electron devices for agent triggers.

Step 3.4 — Agent run orchestrator

  • Create app/core/agent_runner.py:
    • async run_local_agent(user_id, config: LocalAgentConfig, device_mgr: DeviceConnectionManager):
      1. Check device is online with matching device_id → abort if offline
      2. Create AgentRunLog with status=running
      3. Send WsAgentRun frame to Electron with config (paths, extensions, prompt)
      4. Await WsAgentData frames — collect file contents
      5. Await WsAgentComplete frame — Electron signals done reading
      6. For each file: call LLM with prompt_template + file content → extract structured items
      7. For each extracted item: send WsToolCall(insert, table, data) to Electron → await WsToolResult
        • All inserts include is_ai_suggested=True, is_approved=False
      8. Update AgentRunLog: status=success, items_processed, items_created
    • async run_cloud_agent(user_id, config: CloudAgentConfig, device_mgr: DeviceConnectionManager):
      1. Check device is online → abort if offline (results must push to Electron)
      2. Create AgentRunLog with status=running
      3. Decrypt OAuth credentials from config.oauth_token_encrypted
      4. Fetch data from cloud provider (Step 3.6):
        • Gmail: google-api-python-client + filter_config label/date filters
        • Teams: msgraph-sdk + channel/date filters
        • Outlook: msgraph-sdk + folder/date filters
      5. For each item: call LLM with prompt_template + email/message content → extract structured items
      6. For each extracted item: send WsToolCall(insert) to Electron → await WsToolResult
      7. Update AgentRunLog
    • async trigger_pending_runs(user_id, device_id, device_mgr):
      • Called when Electron connects (after device_hello)
      • Queries all enabled agent configs where last_run_at + schedule_interval < now()
      • For local agents: only triggers if config.device_id == device_id
      • For cloud agents: triggers regardless of device (any connected device can receive results)
      • Executes runs sequentially (one at a time to avoid overwhelming the WS)
    • Error handling: on any failure, update AgentRunLog with status=error + error details
  • Wire POST /agents/{id}/run endpoint to dispatch background task via asyncio.create_task()
  • Replace _trigger_pending_runs_stub in device_ws.py with real trigger_pending_runs call
  • Add croniter>=3.0.0 to requirements.txt
  • 23 unit + integration tests covering all code paths
  • Files: app/core/agent_runner.py, app/api/routes/agents.py, app/api/routes/device_ws.py, requirements.txt, tests/test_agent_runner.py
  • Outcome: Backend drives all agent execution — both local (via WS file request) and cloud (direct API calls — stub until Step 3.6).

Step 3.5 — Chatbot Journey endpoint

  • Create app/api/routes/agent_setup.py:
    • POST /api/v1/agents/journey/start:
      • Body: { agent_type: "local"|"cloud", agent_id: str | None }
        • agent_type: which kind of agent this journey configures.
        • agent_id: optional — if provided, the session is pre-seeded with the existing agent's prompt_template so the user can refine it. If absent, fresh journey.
        • No data_types field — data types are determined through the conversation itself, not sent upfront.
      • Creates a journey session (in-memory or Redis-backed)
      • Returns first AI message: contextual question based on agent type
        • Local: "What kind of files are in the directories you want to monitor? (emails, documents, logs, etc.)"
        • Cloud: "What kind of emails/messages should I look for? (client communications, invoices, meeting notes, etc.)"
      • Response: { session_id, message, done: false }
      • Electron note: proxyPost auto-converts camelCase keys to snake_case. Electron sends { agentType, agentId } → backend receives { agent_type, agent_id }.
    • POST /api/v1/agents/journey/message:
      • Body: { session_id, message }
      • AI processes user's answer, asks follow-up questions (max 5 turns)
      • System prompt: "You are configuring a data extraction agent for a freelancer. Ask about file format, what data to extract (tasks, notes, timelines), naming conventions, priority rules, and any special mapping. After 3-5 questions, generate a detailed prompt_template."
      • When AI determines enough context: { session_id, message: "Here's your configuration...", done: true, prompt_template: "..." }
      • The prompt_template is a structured instruction for the extraction LLM (e.g. "Extract tasks from email. Subject becomes task title. If body contains 'urgent' or 'ASAP', set priority to 'high'. Extract due dates if mentioned.")
      • Electron note: toCamelCase converts the response → Electron reads promptTemplate from the final message and auto-fills the agent config panel. User clicks "Save & apply" which calls agent.local.update / agent.cloud.update tRPC mutation.
  • Files: app/api/routes/agent_setup.py, app/main.py
  • Outcome: Users configure AI prompts through guided conversation. Journey can refine an existing config when agent_id is provided.

Step 3.6 — Cloud provider integrations

  • Create app/integrations/gmail.py:
    • GmailClient:
      • __init__(oauth_token) — initializes Google API client
      • async fetch_messages(filter_config, since: datetime) -> list[EmailMessage]
      • EmailMessage: { id, subject, sender, body_text, date, labels }
      • Handles token refresh via Google OAuth2 refresh flow
      • Respects filter_config.labels, filter_config.date_range, filter_config.senders
  • Create app/integrations/ms_graph.py:
    • MSGraphClient:
      • __init__(oauth_token) — initializes MS Graph client
      • async fetch_emails(filter_config, since: datetime) -> list[EmailMessage] (Outlook)
      • async fetch_messages(filter_config, since: datetime) -> list[ChatMessage] (Teams)
      • ChatMessage: { id, content, sender, channel, date }
      • Handles token refresh via MSAL
  • Create app/integrations/__init__.py — factory: get_provider(provider_name) -> GmailClient | MSGraphClient
  • Dependencies: google-api-python-client, google-auth-oauthlib, msgraph-sdk, msal
  • Files: app/integrations/gmail.py, app/integrations/ms_graph.py, app/integrations/__init__.py
  • Outcome: Backend can fetch emails/messages from Gmail, Outlook, and Teams.

Step 3.7 — Agent scheduler

  • Create app/core/agent_scheduler.py:
    • Uses APScheduler (or simple asyncio loop) to check agent schedules
    • Every 60s: query enabled agents where last_run_at + cron_interval < now()
    • For each due agent:
      • Check if user's device is online via DeviceConnectionManager
      • If online: dispatch to agent_runner
      • If offline: skip (will trigger on next device_hello)
    • Locks: use PostgreSQL advisory locks to prevent duplicate runs in multi-instance deployments
  • Integrate with FastAPI lifespan (start scheduler on app startup, shutdown gracefully)
  • Dependencies: apscheduler>=4.0
  • Files: app/core/agent_scheduler.py, app/main.py
  • Outcome: Agents run automatically on their configured schedules.

Step 3.8 — OAuth flow endpoints

  • Create app/api/routes/oauth.py:
    • GET /api/v1/oauth/{provider}/authorize — returns OAuth authorization URL
      • Gmail: Google OAuth2 with gmail.readonly scope
      • Outlook/Teams: MS identity platform with Mail.Read, ChannelMessage.Read.All scopes
    • GET /api/v1/oauth/{provider}/callback — handles OAuth redirect
      • Exchanges auth code for access + refresh tokens
      • Encrypts tokens with Fernet (server-side key from settings)
      • Returns encrypted token blob for storage in CloudAgentConfig.oauth_token_encrypted
    • POST /api/v1/oauth/{provider}/refresh — refresh expired OAuth token
  • Files: app/api/routes/oauth.py, app/main.py
  • Outcome: Users can connect Gmail/Teams/Outlook accounts securely.

Phase 3 — Verification

# Scenario Expected
1 Agent CRUD Create/read/update/delete local and cloud configs; tier limits enforced (free=2, pro=10)
2 WS device connect Electron connects → device_hello → backend stores connection → triggers overdue runs
3 Local agent run Backend sends agent_run → Electron reads files → agent_data → backend AI extracts → tool_call(insert) → Electron persists with isAiSuggested=1
4 Cloud agent run Backend fetches Gmail → AI extracts tasks → tool_call(insert) → Electron persists
5 Device binding Local agent config with device_id=A only triggers when device A is connected
6 Chatbot Journey Start journey → 3-5 Q&A turns → produces valid prompt_template
7 Schedule Agent with schedule_cron="0 */6 * * *" runs every 6h when device is online
8 Offline resilience Device offline → runs skipped → device reconnects → overdue runs trigger immediately
9 OAuth flow Gmail authorize → callback → token encrypted → stored in config → fetch emails works

Phase 3 — New Dependencies

Package Purpose
google-api-python-client Gmail API access
google-auth-oauthlib Gmail OAuth2 flow
msgraph-sdk Outlook + Teams API access
msal MS identity platform auth
apscheduler>=4.0 Agent scheduling
cryptography (Fernet) OAuth token encryption at rest

Phase 5 — Shared Memory (SUPERSEDED)

This phase has been fully replaced by V3_MIGRATION_PLAN.md.

  • Chat WS fix → V3 Step 5 (Unified WS Handler — single multiplexed socket)
  • Agent memory → V3 Steps 67 (Cloud-side MemGPT-style memory in PostgreSQL + pgvector, encrypted at rest with per-user Fernet key)

The on-device KV approach (Electron SQLite agent_memory table) is no longer the target architecture. See V3_MIGRATION_PLAN.md for the current plan.