V3 Migration Plan — Multi-Agent AI Productivity App
Incremental migration from current architecture to v3. Each step is self-contained, testable, and backwards-compatible. No BYOK — server manages all LLM keys. Memory encryption: server-side per-user Fernet key (Option A).
General Rules
Code Cleanup: As you implement each step, remove any code that becomes unused or obsolete. This includes:
- Old functions/methods that are superseded by new ones
- Deprecated imports or modules
- Dead code paths
- Old test files no longer needed
This keeps the codebase clean and prevents confusion. When removing code, note it in the commit message if significant.
Decisions Log
| Topic | Decision |
|---|---|
| WS topology | Single multiplexed socket (merge chat into device WS) |
| LLM keys | Server-managed only, no user key passthrough |
| Memory encryption | Per-user server-generated Fernet key, encrypted at rest, decrypted in-memory |
| device_manager | Already multi-user correct (keyed by user_id), no structural change |
Step 1 — WS Frame Protocol (schemas.py)
Goal: Define the v3 frame vocabulary so all subsequent steps can import it.
Changes:
- app/schemas.py: add to the WsFrameType enum:
  - home_request, floating_request
  - stream_start, stream_text, stream_block, stream_end
  - floating_domain
  - data_request, data_response, mutation
- Add Pydantic models:
  - WsHomeRequest(type, message, conversation_history?)
  - WsFloatingRequest(type, message, scope: {type, id?})
  - WsStreamStart(type, request_id)
  - WsStreamText(type, request_id, chunk)
  - WsStreamBlock(type, request_id, block_type, data)
  - WsStreamEnd(type, request_id, mutations?)
  - WsFloatingDomain(type, request_id, domain)
- Keep all existing frame types (backward compat).
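As a rough illustration of the frame vocabulary, the sketch below uses a stdlib Enum and dataclasses as stand-ins for the Pydantic models so it runs without dependencies. Only two of the listed models are shown, and the helper names to_wire and text_from_wire are hypothetical, not part of app/schemas.py.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class WsFrameType(str, Enum):
    # Only the new v3 members are shown; existing members stay as they are.
    HOME_REQUEST = "home_request"
    FLOATING_REQUEST = "floating_request"
    STREAM_START = "stream_start"
    STREAM_TEXT = "stream_text"
    STREAM_BLOCK = "stream_block"
    STREAM_END = "stream_end"
    FLOATING_DOMAIN = "floating_domain"
    DATA_REQUEST = "data_request"
    DATA_RESPONSE = "data_response"
    MUTATION = "mutation"


@dataclass
class WsStreamText:
    request_id: str
    chunk: str
    type: str = WsFrameType.STREAM_TEXT.value


@dataclass
class WsStreamEnd:
    request_id: str
    mutations: Optional[list] = None
    type: str = WsFrameType.STREAM_END.value


def to_wire(frame) -> str:
    """Serialize a frame to the JSON text sent over the socket (hypothetical helper)."""
    return json.dumps(asdict(frame))


def text_from_wire(raw: str) -> WsStreamText:
    """Parse a stream_text frame, rejecting any other frame type."""
    data = json.loads(raw)
    if WsFrameType(data["type"]) is not WsFrameType.STREAM_TEXT:
        raise ValueError(f"expected stream_text, got {data['type']}")
    return WsStreamText(request_id=data["request_id"], chunk=data["chunk"])
```

A round trip such as text_from_wire(to_wire(frame)) == frame is the kind of check the Step 1 unit test would make for each model.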
Files touched: app/schemas.py
Test: Unit test that validates each new model serializes/deserializes correctly.
pytest tests/test_schemas_v3.py
Status:
- Step 1 complete
Commit: After tests pass, commit with:
git commit -m "step-1: add v3 ws frame protocol (schemas.py)"
Step 2 — Agent Streaming + Tool Result Capture (agent_registry.py, agents/)
Goal: Agents can stream LLM tokens and expose structured tool results.
Changes:
- app/core/agent_registry.py:
  - Add _tool_loop_stream() to ChatAgent: same logic as _tool_loop(), but the final LLM call (when no more tool calls) uses llm.astream() and yields tokens.
  - Add self.tool_results: list[dict] attribute to ChatAgent.__init__().
  - In both _tool_loop and _tool_loop_stream, capture raw execute_on_client results when tools run (store in self.tool_results).
- app/agents/*.py: Each agent's tools already return text summaries. No change to tools. The raw data capture happens at the _tool_loop level by intercepting ToolMessage content that comes from execute_on_client.
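One possible shape for the streaming tool loop is sketched below. FakeLLM, its ainvoke method, the message dict format, and the stubbed execute_on_client are all assumptions made so the example runs on its own; the real ChatAgent and LLM client may differ.

```python
import asyncio
from typing import Any, AsyncIterator


class FakeLLM:
    """Stand-in for the real LLM client (hypothetical interface)."""

    def __init__(self) -> None:
        self.calls = 0

    async def ainvoke(self, messages: list) -> dict:
        # The first call requests a tool; later calls request none.
        self.calls += 1
        if self.calls == 1:
            return {"tool_calls": [{"name": "list_tasks", "args": {}}]}
        return {"tool_calls": []}

    async def astream(self, messages: list) -> AsyncIterator[str]:
        for token in ["You ", "have ", "1 ", "task."]:
            yield token


async def execute_on_client(name: str, args: dict) -> dict:
    """Stub for the device round trip; returns raw structured data."""
    return {"tool": name, "items": [{"id": "t1", "title": "Write plan"}]}


class ChatAgent:
    def __init__(self, llm: Any) -> None:
        self.llm = llm
        self.tool_results: list = []

    async def _tool_loop_stream(self, messages: list) -> AsyncIterator[str]:
        while True:
            reply = await self.llm.ainvoke(messages)
            if not reply["tool_calls"]:
                break
            for call in reply["tool_calls"]:
                raw = await execute_on_client(call["name"], call["args"])
                self.tool_results.append(raw)  # keep the structured payload
                messages.append({"role": "tool", "content": str(raw)})
        # No tool calls remain: stream the final answer token by token.
        async for token in self.llm.astream(messages):
            yield token
```

Note that this sketch probes with a non-streaming call to learn that no tool calls remain and then issues the streaming call; whether the real loop can avoid that extra call depends on the LLM client.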
Files touched: app/core/agent_registry.py
Test: Unit test with mocked LLM that verifies _tool_loop_stream() yields tokens and agent.tool_results contains structured data after a tool call.
pytest tests/test_agent_streaming.py
Status:
- Step 2 complete
Commit: After tests pass, commit with:
git commit -m "step-2: add agent streaming and tool result capture (agent_registry.py)"
Step 3 — Router Refactor (orchestrator.py)
Goal: Orchestrator returns agent name alongside execution, supports streaming.
Changes:
- app/core/orchestrator.py:
  - Add orchestrate_v3(user_id, message, context, mode) that:
    - Calls classify_intent() (unchanged) -> agent_name
    - Instantiates agent via registry
    - Returns (agent_name, agent_instance); caller drives execution
  - Add orchestrate_v3_stream(user_id, message, context) -> AsyncGenerator that:
    - Calls classify_intent() -> agent_name
    - Calls agent.handle_stream() (uses _tool_loop_stream)
    - Yields (agent_name, token) tuples; first yield includes agent name for domain detection
  - Keep orchestrate() and orchestrate_stream() unchanged (backward compat for POST /chat).
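The two entry points might look roughly like the sketch below. EchoAgent, the REGISTRY dict, and the keyword-based classify_intent are hypothetical stand-ins for the real registry and classifier, and the handle_stream signature is assumed.

```python
import asyncio
from typing import AsyncIterator, Tuple


class EchoAgent:
    """Hypothetical agent exposing a handle_stream() interface."""

    async def handle_stream(self, message: str, context: dict) -> AsyncIterator[str]:
        for token in message.split():
            yield token


# Hypothetical registry: agent name -> agent factory.
REGISTRY = {"task_agent": EchoAgent, "note_agent": EchoAgent}


async def classify_intent(message: str) -> str:
    """Stand-in for the existing classifier (keyword match for the sketch)."""
    return "note_agent" if "note" in message.lower() else "task_agent"


async def orchestrate_v3(user_id: str, message: str, context: dict, mode: str):
    """Classify, instantiate, and hand the agent back; the caller drives execution."""
    agent_name = await classify_intent(message)
    return agent_name, REGISTRY[agent_name]()


async def orchestrate_v3_stream(
    user_id: str, message: str, context: dict
) -> AsyncIterator[Tuple[str, str]]:
    """Yield (agent_name, token) pairs so the caller knows the domain from the first item."""
    agent_name = await classify_intent(message)
    agent = REGISTRY[agent_name]()
    async for token in agent.handle_stream(message, context):
        yield agent_name, token
```

Because every pair carries the agent name, a consumer can emit the domain as soon as the first pair arrives and ignore the name afterwards.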
Files touched: app/core/orchestrator.py
Test: Unit test with mocked LLM and mocked registry that verifies orchestrate_v3_stream yields (agent_name, token) pairs.
pytest tests/test_orchestrator_v3.py
Status:
- Step 3 complete
Commit: After tests pass, commit with:
git commit -m "step-3: add router refactor with streaming support (orchestrator.py)"
Step 4 — Output Formatting Layer (NEW: output_formatter.py)
Goal: Home and Floating responses diverge at this layer only.
Block Types (from Electron app components)
The LLM outputs a JSON block stream. Each block has a type field that maps to
an Electron renderer component. The server validates and forwards these blocks.
Text block — streamed immediately, word-by-word:
{ "type": "text", "content": "Here's your task summary..." }
Chart blocks — buffered until complete, validated, sent as stream_block.
Chart types match shadcn/ui Recharts wrappers used in the Electron app:
{ "type": "chart", "chartType": "<type>", "title": "...", "data": [...], "config": {...} }
Supported chartType values:
- area: Area chart (shadcn AreaChart)
- bar: Bar chart (shadcn BarChart)
- line: Line chart (shadcn LineChart)
- pie: Pie chart (shadcn PieChart)
- radar: Radar chart (shadcn RadarChart)
- radial: Radial/gauge chart (shadcn RadialChart)
data is an array of objects with keys matching the chart's dataKey config.
config follows the shadcn ChartConfig format: { [dataKey]: { label, color } }.
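A minimal sketch of server-side chart validation under these rules follows. The function name validate_chart_block is hypothetical, and requiring every config key to appear in every data row is one strict reading of "keys matching the chart's dataKey config", not a confirmed rule.

```python
ALLOWED_CHART_TYPES = {"area", "bar", "line", "pie", "radar", "radial"}


def validate_chart_block(block: dict) -> list:
    """Return a list of problems; an empty list means the block is valid."""
    problems = []
    if block.get("type") != "chart":
        problems.append("type must be 'chart'")
    if block.get("chartType") not in ALLOWED_CHART_TYPES:
        problems.append(f"unsupported chartType: {block.get('chartType')!r}")
    data = block.get("data")
    config = block.get("config")
    data_ok = isinstance(data, list) and all(isinstance(row, dict) for row in data)
    if not data_ok:
        problems.append("data must be a list of objects")
    if not isinstance(config, dict):
        problems.append("config must be an object keyed by dataKey")
    elif data_ok:
        # Assumption: each dataKey named in config appears in every data row.
        for key in config:
            if any(key not in row for row in data):
                problems.append(f"config key {key!r} missing from a data row")
    return problems
```

A formatter could log the returned problems and skip the block whenever the list is non-empty.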
Entity blocks — server serializes from agent.tool_results (not LLM-generated data):
{ "type": "entity_ref", "entity": "task" }
The server resolves this by looking up the structured data from the agent's
tool call results and emitting a stream_block with the full entity data.
Supported entity types (matching Electron component types):
- task: TaskRow component (TaskItem: id, title, status, priority, assignee, dueDate, projectId, ...)
- project: Project card (id, name, clientId, status)
- note: Note card (id, title, createdAt, projectId)
- timeline: Timeline card (GanttTimeline: id, title, date, projectId, isAiSuggested, isApproved)
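Resolving an entity_ref against captured tool results could be sketched as below. The shape of each tool_results entry ({"entity": ..., "items": [...]}) and the function name resolve_entity_ref are assumptions for illustration; the plan does not specify how raw results are structured.

```python
from typing import Optional

SUPPORTED_ENTITIES = {"task", "project", "note", "timeline"}


def resolve_entity_ref(block: dict, tool_results: list) -> Optional[dict]:
    """Build a stream_block payload from captured tool results, or return None."""
    entity = block.get("entity")
    if block.get("type") != "entity_ref" or entity not in SUPPORTED_ENTITIES:
        return None
    # Assumed shape: each captured result is {"entity": <type>, "items": [...]}.
    items = [
        item
        for result in tool_results
        if result.get("entity") == entity
        for item in result.get("items", [])
    ]
    if not items:
        return None  # nothing to show; the caller logs and skips the block
    return {"block_type": entity, "data": items}
```

The point of the design is that the LLM only names the entity type, while the data sent to the client comes from the tool call results.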
Table block — buffered, validated:
{ "type": "table", "headers": ["Col1", "Col2"], "rows": [["val1", "val2"]] }
Timeline block — buffered, validated (renders via GanttChart component):
{ "type": "timeline", "timelines": [{ "id": "...", "title": "...", "date": 1234567890 }] }
Changes
- app/core/output_formatter.py (new file):
  - HomeFormatter:
    - Receives token stream from orchestrator
    - Accumulates tokens into a JSON-aware buffer
    - Detects block boundaries by type field:
      - text -> yields WsStreamText immediately (streams content word-by-word)
      - chart -> buffers until JSON complete, validates chartType against allowed set, yields WsStreamBlock
      - entity_ref -> looks up data from agent.tool_results, serializes full entity, yields WsStreamBlock
      - table -> buffers, validates headers/rows structure, yields WsStreamBlock
      - timeline -> buffers, validates timeline objects, yields WsStreamBlock
    - Invalid blocks are logged and skipped (never crash the stream)
  - FloatingFormatter:
    - Receives agent_name from orchestrator
    - Maps agent name to domain (deterministic, by code, no LLM):
      - task_agent -> "tasks"
      - timeline_agent -> "timelines"
      - note_agent -> "notes"
      - project_agent -> "projects"
    - Yields WsFloatingDomain immediately
    - Then yields WsStreamText for all tokens (text-only, no blocks)
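The JSON-aware buffer used by HomeFormatter could be approximated with the standard library's json.JSONDecoder.raw_decode, as in the sketch below. It covers only the buffered block types: it waits for a whole object before emitting it, so streaming the content of a text block word-by-word would need an incremental parser on top. The function name iter_blocks is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_blocks(tokens: Iterable[str]) -> Iterator[dict]:
    """Yield each complete top-level JSON object found in a token stream."""
    decoder = json.JSONDecoder()
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
            if not buffer:
                break
            try:
                block, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # object is still incomplete; wait for more tokens
            buffer = buffer[end:]
            if isinstance(block, dict) and "type" in block:
                yield block
            # Anything else is dropped here; a real formatter would also log it.
```

This sketch assumes the model emits only JSON objects separated by whitespace; stray prose between blocks would stall the buffer and needs separate handling.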
Files touched: app/core/output_formatter.py (new)
Test: Unit test that feeds a mock token stream through each formatter and asserts correct frame output sequence.
pytest tests/test_output_formatter.py
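The frame sequence such a test would assert for the Floating path can be sketched as follows. Frames are plain dicts here rather than the Pydantic models, and the function name floating_frames is hypothetical.

```python
import asyncio
from typing import AsyncIterator

AGENT_DOMAINS = {
    "task_agent": "tasks",
    "timeline_agent": "timelines",
    "note_agent": "notes",
    "project_agent": "projects",
}


async def floating_frames(request_id: str, pairs) -> AsyncIterator[dict]:
    """Turn (agent_name, token) pairs into one floating_domain frame plus stream_text frames."""
    domain_sent = False
    async for agent_name, token in pairs:
        if not domain_sent:
            # Deterministic lookup by code; no LLM involved in picking the domain.
            yield {
                "type": "floating_domain",
                "request_id": request_id,
                "domain": AGENT_DOMAINS[agent_name],
            }
            domain_sent = True
        yield {"type": "stream_text", "request_id": request_id, "chunk": token}
```

An unknown agent name raises KeyError in this sketch; a production formatter would more likely log it and fall back to a default.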
Status:
- Step 4 complete
Commit: After tests pass, commit with:
git commit -m "step-4: add output formatting layer (output_formatter.py)"
Step 5 — Unified WS Handler (device_ws.py, chat.py, main.py)
Goal: Single multiplexed WebSocket handles device frames + Home/Floating chat.
Changes:
- app/api/routes/device_ws.py:
  - Extend _message_loop dispatch to handle home_request and floating_request:
    - On home_request: set ws_context executor, call orchestrate_v3_stream, pipe through HomeFormatter, send frames back on same socket.
    - On floating_request: same, but pipe through FloatingFormatter.
    - Wrap both in try/finally to clear ws_context.
  - Each request gets a request_id (UUID) for frame correlation.
  - Concurrent requests from same client are supported (each runs as an async task).
- app/api/routes/chat.py:
  - Remove chat_stream WS endpoint and any related helper functions that were only used by it.
  - Keep POST /chat endpoint unchanged (REST fallback).
  - Clean up any unused imports.
- app/main.py:
  - No change needed (device_ws router already registered).
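The dispatch pattern (a request_id per request, one task per request, cleanup in finally) might look like the sketch below. The ws_context here is a contextvars stand-in, fake_stream replaces the orchestrator and formatter, and message_loop takes a list of frames instead of reading a socket; all of these are assumptions for illustration.

```python
import asyncio
import contextvars
import uuid

# Hypothetical stand-in for ws_context: holds per-request state for tool execution.
ws_context = contextvars.ContextVar("ws_context", default=None)


async def fake_stream(message: str):
    """Stand-in for orchestrate_v3_stream piped through a formatter."""
    for word in message.split():
        await asyncio.sleep(0)  # yield control so concurrent requests interleave
        yield word


async def handle_chat_request(frame: dict, send) -> None:
    request_id = str(uuid.uuid4())
    token = ws_context.set({"request_id": request_id})
    try:
        await send({"type": "stream_start", "request_id": request_id})
        async for chunk in fake_stream(frame["message"]):
            await send({"type": "stream_text", "request_id": request_id, "chunk": chunk})
        await send({"type": "stream_end", "request_id": request_id})
    finally:
        ws_context.reset(token)  # always clear, even if streaming fails


async def message_loop(frames: list, send) -> None:
    tasks = []
    for frame in frames:
        if frame["type"] in ("home_request", "floating_request"):
            # Each request runs as its own task so the loop can keep reading frames.
            tasks.append(asyncio.create_task(handle_chat_request(frame, send)))
    await asyncio.gather(*tasks)
```

Since frames from concurrent requests interleave on the one socket, the client has to group them by request_id rather than by arrival order.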
Files touched: app/api/routes/device_ws.py, app/api/routes/chat.py (app/main.py is checked but needs no change)
Test: Integration test with a WebSocket test client that:
- Connects to /api/v1/ws/device
- Sends device_hello
- Sends home_request -> receives stream_start, stream_text*, stream_end
- Sends floating_request -> receives floating_domain, stream_text*, stream_end
- Verifies tool_call/tool_result round-trip still works during chat
pytest tests/test_ws_unified.py
Status:
- Step 5 complete
Commit: After tests pass, commit with:
git commit -m "step-5: unify ws handler (device_ws.py, chat.py)"
Step 6 — Memory Models + Migration (models.py, alembic)
Goal: Database tables for 4-tier memory, with per-user encryption key.
Changes:
- app/models.py:
  - Add encryption_key column to User model (Fernet key, generated on registration).
  - Add MemoryCore model: id, user_id, key, value_encrypted, updated_at
  - Add MemoryAssociative model: id, user_id, content_encrypted, embedding (Vector(1536)), entity_type, entity_id, updated_at
  - Add MemoryEpisodic model: id, user_id, summary_encrypted, session_id, created_at
  - Add MemoryProactive model: id, user_id, pattern_encrypted, confidence, source, created_at
- alembic/versions/: New migration adding the 4 memory tables + user encryption_key column.
- app/api/routes/auth.py: On user registration, generate and store a Fernet key.
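Per-user key generation and the encrypt/decrypt round trip behind the *_encrypted columns can be sketched with the cryptography package's Fernet class. The helper names below are hypothetical, and protecting the stored key itself at rest (for example by wrapping it with a server-level key) is not shown.

```python
from cryptography.fernet import Fernet, InvalidToken


def new_user_key() -> bytes:
    """Generate a per-user key at registration (URL-safe base64 of 32 random bytes)."""
    return Fernet.generate_key()


def encrypt_value(user_key: bytes, plaintext: str) -> bytes:
    """Encrypt a memory value before writing it to a *_encrypted column."""
    return Fernet(user_key).encrypt(plaintext.encode("utf-8"))


def decrypt_value(user_key: bytes, token: bytes) -> str:
    """Decrypt a stored value in memory; raises InvalidToken for a wrong key."""
    return Fernet(user_key).decrypt(token).decode("utf-8")
```

Decrypting with another user's key raises InvalidToken, which is the property the Step 6 and Step 7 tests can rely on.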
Files touched: app/models.py, alembic/versions/xxx_add_memory_tables.py, app/api/routes/auth.py
Test: Run migration up/down, verify tables exist with correct columns.
alembic upgrade head && alembic downgrade -1 && alembic upgrade head
pytest tests/test_memory_models.py
Status:
- Step 6 complete
Commit: After tests pass, commit with:
git commit -m "step-6: add memory models and migration (models.py, alembic)"
Step 7 — Memory Middleware (NEW: memory_middleware.py)
Goal: Enrich every Router call with memory context, store interactions after.
Changes:
- app/core/memory_middleware.py (new file):
  - MemoryMiddleware class with:
    - enrich_context(user_id, message) -> dict (pre-LLM):
      - Load core memory (user prefs), always injected
      - Embed message, search MemoryAssociative via pgvector for the top-k relevant entries
      - Fetch recent MemoryEpisodic entries (last N sessions)
      - Fetch active MemoryProactive patterns (above confidence threshold)
      - Return merged context dict
    - store_episode(user_id, session_id, message, response) (post-LLM):
      - Summarize interaction (short LLM call or heuristic)
      - Encrypt and store in MemoryEpisodic
      - Embed interaction, encrypt and upsert in MemoryAssociative
    - update_core(user_id, key, value): explicit preference update
    - All read/write operations encrypt/decrypt using the user's Fernet key from User.encryption_key
- app/api/routes/device_ws.py: Update home_request and floating_request handlers:
  - Before orchestrator: enriched = await memory.enrich_context(user_id, message)
  - After response complete: await memory.store_episode(user_id, ...)
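The read path of enrich_context can be sketched with in-memory rows, as below. Plain lists replace the four tables, a cosine function replaces the pgvector search, encryption is left out, and the embed callable plus the top_k, last_n, and min_conf parameters are hypothetical.

```python
import asyncio
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity; stands in for the pgvector distance operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryMiddleware:
    """In-memory stand-in: lists of dict rows replace the four memory tables."""

    def __init__(self, core, associative, episodic, proactive, embed):
        self.core, self.associative = core, associative
        self.episodic, self.proactive = episodic, proactive
        self.embed = embed  # hypothetical async embedding function

    @staticmethod
    def _mine(rows, user_id):
        return [r for r in rows if r["user_id"] == user_id]

    async def enrich_context(self, user_id, message, top_k=2, last_n=2, min_conf=0.7):
        query = await self.embed(message)
        ranked = sorted(
            self._mine(self.associative, user_id),
            key=lambda r: cosine(query, r["embedding"]),
            reverse=True,
        )
        return {
            # Core preferences are always injected.
            "core": {r["key"]: r["value"] for r in self._mine(self.core, user_id)},
            "associative": [r["content"] for r in ranked[:top_k]],
            "episodic": [r["summary"] for r in self._mine(self.episodic, user_id)[-last_n:]],
            "proactive": [
                r["pattern"]
                for r in self._mine(self.proactive, user_id)
                if r["confidence"] >= min_conf
            ],
        }
```

Every lookup is filtered by user_id first, which mirrors the per-user scoping the real queries would need before any ranking happens.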
Files touched: app/core/memory_middleware.py (new), app/api/routes/device_ws.py
Test: Unit test with seeded memory rows that verifies:
- enrich_context returns core prefs + associative matches + episodic summaries
- store_episode creates encrypted rows that can be decrypted with the user's key
- End-to-end WS test: send home_request, verify memory enrichment is passed to orchestrator
pytest tests/test_memory_middleware.py
Status:
- Step 7 complete
Commit: After tests pass, commit with:
git commit -m "step-7: add memory middleware (memory_middleware.py, device_ws.py)"
Summary
| Step | Component | Effort | Depends On |
|---|---|---|---|
| 1 | WS Frame Protocol | Low | — |
| 2 | Agent Streaming | Medium | Step 1 |
| 3 | Router Refactor | Medium | Step 2 |
| 4 | Output Formatter | High | Steps 1, 3 |
| 5 | Unified WS Handler | High | Steps 1–4 |
| 6 | Memory Models | Medium | — |
| 7 | Memory Middleware | High | Steps 5, 6 |
Steps 1–5 form the streaming pipeline. Steps 6–7 form the memory system. Step 6 has no dependencies, so it can run in parallel with Steps 1–5.
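The dependency column can be checked mechanically. The sketch below feeds the table into the standard library's graphlib.TopologicalSorter and groups steps into waves that could start together; the function name parallel_waves is hypothetical.

```python
from graphlib import TopologicalSorter

# Step -> steps it depends on, copied from the summary table.
DEPENDS_ON = {
    1: set(),
    2: {1},
    3: {2},
    4: {1, 3},
    5: {1, 2, 3, 4},
    6: set(),
    7: {5, 6},
}


def parallel_waves(deps: dict) -> list:
    """Group steps into waves; every step in a wave has all its dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the table contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Step 6 lands in the first wave alongside Step 1, and since nothing before Step 7 depends on it, it can be scheduled at any point before Step 7 begins.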