docs: add project-folder integration design spec

Approved design for linking a local folder to a project: lightweight manifest with per-file LLM summaries, WS-streamed indexing pipeline, pre-injected manifest for Home/Brief/Task-Brief agents, tier-gated token+file-count quota recorded backend-side.
2026-05-11 22:08:44 +02:00
parent 74e2152596
commit 361f89a29d
1 changed files with 386 additions and 0 deletions
--- a/docs/superpowers/specs/2026-05-11-project-folder-integration-design.md
+++ b/docs/superpowers/specs/2026-05-11-project-folder-integration-design.md
@@ -0,0 +1,386 @@
+# Project Folder Integration — Design
+
+**Date:** 2026-05-11
+**Status:** Approved (brainstorming complete)
+**Author:** Roberto + Claude
+
+## Goal
+
+Let users link a local (or shared-PC) folder to an adiuvAI project. adiuvAI scans the folder, generates per-file summaries via LLM, and exposes the resulting manifest to the Home, Brief, and Task-Brief agents so they can answer project questions with awareness of the user's local files.
+
+## Non-goals
+
+- Multi-folder linking per project (deferred — 1 folder per project for now).
+- Full-text RAG over file contents (we use lightweight per-file summaries instead).
+- File editing from inside adiuvAI (read-only).
+- Token-usage display in the project UI (recorded backend-side; dedicated Settings page comes later).
+- Web SPA support (Electron-only — web SPA has no filesystem access).
+
+## Strategy
+
+**Hybrid AI File System (manifest first, optional wiki tier later).**
+
+Phase 1 (this spec): build a lightweight manifest — for each indexable file record `(relativePath, kind, size, mtime, 1-line LLM summary)`. The agent receives the manifest pre-injected into its system prompt and reads full file contents lazily via a scoped tool.
+
+Phase 2 (future, out of scope): for folders above N files or on user opt-in, generate per-folder + per-file wiki summaries written to a structured index. Not built now.
+
+## Architecture
+
+```
+┌─────────────────────── adiuvAI (Electron) ───────────────────────┐
+│ Renderer (React)                                                  │
+│   • Project hero: <FolderChip>      (status glance)              │
+│   • <FilesTab>: link/unlink, browse, rescan, progress, browser   │
+│   • <FolderBrowser>: tree view of manifest                       │
+│                                                                   │
+│ Main (Node)                                                       │
+│   • db/schema.ts: +projectFolderFiles, +projects.folderPath       │
+│   • files/scanner.ts: walk + filter + mtime delta                │
+│   • files/indexer.ts: orchestrates WS index session              │
+│   • files/daily-rescan.ts: 24h-stale check on app start          │
+│   • router/projectFolders.ts: tRPC procedures                     │
+│   • api/backend-client.ts: +sendIndexBatch frame                 │
+│   • api/drizzle-executor.ts: +read_project_folder_manifest,      │
+│                              +read_project_folder_file actions    │
+└───────────────────────────────────────────────────────────────────┘
+                              │ /api/v1/device WS
+                              ▼
+┌────────────────────────── api (FastAPI) ─────────────────────────┐
+│ device_ws.py: +index_file_batch / +index_file_result frames      │
+│ core/folder_indexer.py: summarize text / vision per file          │
+│ core/deep_agent.py: pre-inject manifest when project context set │
+│ agents/folder_agent.py: scoped read_project_folder_file tool     │
+│ billing/tier_manager.py: +folder_max_files, +folder_monthly_tokens│
+│ models.py: +AgentRunLog.tokens_used, +MonthlyTokenUsage table     │
+└───────────────────────────────────────────────────────────────────┘
+```
+
+**Privacy invariant:** file content travels to the backend only transiently — for summarization — and is never persisted there. Summaries and manifest entries live in the local SQLite database. Token usage is recorded backend-side because it gates the user's tier quota.
+
+## Decisions
+
+| Topic | Decision |
+|-------|----------|
+| Retrieval strategy | Hybrid: manifest first, optional wiki tier later (phase 2, out of scope) |
+| File scope | Text whitelist (.md, .txt, .pdf, .docx, .csv, code) + images (.png/.jpg) summarized via gpt-4o-mini vision |
+| Cardinality | One folder per project |
+| Rescan triggers | Manual button + daily auto (24h staleness check on app start) + on-demand mtime delta when manifest is read |
+| Rate-limit metric | Tokens-per-month per user **and** total file-count cap per folder, both tier-gated |
+| Indexing pipeline | WS streaming over existing `/api/v1/device` with new frame types |
+| Agent access | Pre-inject manifest into system prompt; lazy reads via scoped `read_project_folder_file` tool |
+| UI placement | Hero chip + dedicated "Files" tab in `ProjectTabBar` |
+| Platform | Electron-only (web SPA: tab disabled) |
+| Token-usage display | Out of scope (record backend-side, surface in Settings later) |
+
+## Schema Changes
+
+### adiuvAI local SQLite (`src/main/db/schema.ts`)
+
+Extend `projects`:
+
+```typescript
+projects: {
+  // existing columns...
+  folderPath: text('folder_path'),                       // nullable absolute path or UNC
+  folderLastScannedAt: integer('folder_last_scanned_at'),// ms, nullable
+  folderLastScanStatus: text('folder_last_scan_status'), // 'idle' | 'scanning' | 'error'
+  folderTotalFiles: integer('folder_total_files').default(0),
+}
+```
+
+New table `projectFolderFiles`:
+
+```typescript
+projectFolderFiles: {
+  id: text('id').primaryKey(),
+  projectId: text('project_id').notNull(),       // FK projects.id (no DB constraint per convention)
+  relativePath: text('relative_path').notNull(), // path relative to folderPath
+  ext: text('ext').notNull(),                    // '.md', '.png', ...
+  kind: text('kind').notNull(),                  // 'text' | 'image' | 'pdf' | 'docx' | 'skipped' | 'error'
+  sizeBytes: integer('size_bytes').notNull(),
+  mtimeMs: integer('mtime_ms').notNull(),
+  summary: text('summary'),                      // nullable, ≤500 chars
+  summaryUpdatedAt: integer('summary_updated_at'),
+  // Unique index: (projectId, relativePath)
+}
+```
+
+### api Postgres (alembic migration)
+
+```python
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy.dialects.postgresql import UUID
+
+# Per-run token count, so usage can be attributed to individual agent runs.
+op.add_column('agent_run_logs',
+  sa.Column('tokens_used', sa.Integer(), nullable=False, server_default='0'))
+
+# One row per (user, month, feature); the composite key makes the counter upsertable.
+op.create_table('monthly_token_usage',
+  sa.Column('user_id', UUID(as_uuid=False), sa.ForeignKey('users.id', ondelete='CASCADE'), nullable=False),
+  sa.Column('year_month', sa.String(7), nullable=False),  # 'YYYY-MM'
+  sa.Column('feature', sa.String(64), nullable=False),    # 'folder_index'
+  sa.Column('tokens_used', sa.Integer(), nullable=False, server_default='0'),
+  sa.PrimaryKeyConstraint('user_id', 'year_month', 'feature'),
+)
+```
+
+### Tier matrix (`app/billing/tier_manager.py`)
+
+| Feature                 | Free | Pro  | Power | Team |
+|-------------------------|------|------|-------|------|
+| `folder_max_files`      | 200  | 5000 | -1    | -1   |
+| `folder_monthly_tokens` | 100k | 2M   | -1    | -1   |
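+
+A minimal sketch of how a pre-flight check might read this matrix. It assumes that `-1` denotes an unlimited quota (the table does not say so explicitly) and that `100k` / `2M` mean 100,000 and 2,000,000 tokens. `TIER_LIMITS` and `check_folder_quota` are hypothetical names, not the actual `tier_manager.py` API.
+
+```python
+from typing import Optional
+
+UNLIMITED = -1  # assumption: -1 in the matrix means "no cap"
+
+# Hypothetical in-code mirror of the tier matrix above.
+TIER_LIMITS = {
+    "free":  {"folder_max_files": 200,  "folder_monthly_tokens": 100_000},
+    "pro":   {"folder_max_files": 5000, "folder_monthly_tokens": 2_000_000},
+    "power": {"folder_max_files": UNLIMITED, "folder_monthly_tokens": UNLIMITED},
+    "team":  {"folder_max_files": UNLIMITED, "folder_monthly_tokens": UNLIMITED},
+}
+
+
+def check_folder_quota(tier: str, estimated_files: int, tokens_used: int) -> Optional[str]:
+    """Return the name of the exceeded limit, or None if the scan may start."""
+    limits = TIER_LIMITS[tier]
+    max_files = limits["folder_max_files"]
+    if max_files != UNLIMITED and estimated_files > max_files:
+        return "folder_max_files"
+    max_tokens = limits["folder_monthly_tokens"]
+    if max_tokens != UNLIMITED and tokens_used >= max_tokens:
+        return "folder_monthly_tokens"
+    return None
+```
+
+A non-`None` result would correspond to a 402 response from the quota check; for example, `check_folder_quota("free", 201, 0)` returns `"folder_max_files"`.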
+
+## Indexing Pipeline
+
+### New WS frame types on `/api/v1/device`
+
+| Direction | Frame                     | Payload |
+|-----------|---------------------------|---------|
+| C → S     | `index_session_start`     | `{ sessionId, projectId, totalFiles }` |
+| C → S     | `index_file_batch`        | `{ sessionId, files: [{relPath, kind, content/imageB64, sizeBytes, mtimeMs}] }` (batches of 5) |
+| S → C     | `index_file_result`       | `{ sessionId, relPath, summary, tokensUsed, error? }` |
+| S → C     | `index_session_progress`  | `{ sessionId, processed, total }` |
+| C → S     | `index_session_cancel`    | `{ sessionId }` |
+| S → C     | `index_session_done`      | `{ sessionId, status: 'completed' \| 'cancelled' \| 'quota_exceeded' \| 'error' }` |
+
+### Flow (Electron `files/indexer.ts`)
+
+1. tRPC `projectFolders.startScan({ projectId })`.
+2. `scanner.ts` walks `folderPath`:
+   - Filter by whitelist (text exts + .png/.jpg).
+   - Apply size cap (1 MB / file for text; images are capped separately at 5 MB, see File-type fallbacks).
+   - Compute mtime delta vs `projectFolderFiles`.
+   - Returns `{ newFiles[], changedFiles[], deletedFiles[] }`.
+3. Backend pre-flight: `POST /api/v1/billing/quota/check { feature: 'folder_index', estimated_files: N }`:
+   - Rejects 402 if `folder_max_files` exceeded for the user's tier.
+   - Rejects 402 if `folder_monthly_tokens` already exhausted.
+4. Send `index_session_start` over the WS to open the session.
+5. For each batch of 5 files:
+   - Read content (text) or base64-encode (image).
+   - Send `index_file_batch`.
+   - Await one `index_file_result` per file in the batch (5, or fewer for the final batch).
+   - Upsert `projectFolderFiles` row with the returned summary.
+   - Backend atomically increments `MonthlyTokenUsage` and writes a row in `AgentRunLog` with `tokens_used`.
+6. On receiving `index_session_done` from the backend (it is an S → C frame), update `projects.folderLastScannedAt`, `.folderTotalFiles`, `.folderLastScanStatus = 'idle'`.
+7. Delete `projectFolderFiles` rows for `deletedFiles`.
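+
+The mtime delta in step 2 and the batching in step 5 can be sketched as follows. The real scanner lives in the Electron main process (`files/scanner.ts`, TypeScript); Python is used here only for illustration, and `compute_delta` and `batches` are hypothetical helper names. Both inputs are assumed to be `{relativePath: mtimeMs}` maps.
+
+```python
+from typing import Dict, Iterator, List, Tuple
+
+BATCH_SIZE = 5  # batch size stated in the frame table
+
+
+def compute_delta(
+    on_disk: Dict[str, int], tracked: Dict[str, int]
+) -> Tuple[List[str], List[str], List[str]]:
+    """Compare {relativePath: mtimeMs} maps; return (new, changed, deleted)."""
+    new_files = sorted(p for p in on_disk if p not in tracked)
+    changed_files = sorted(
+        p for p, mtime in on_disk.items() if p in tracked and tracked[p] != mtime
+    )
+    deleted_files = sorted(p for p in tracked if p not in on_disk)
+    return new_files, changed_files, deleted_files
+
+
+def batches(paths: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
+    """Yield consecutive slices of at most `size` paths; the last may be shorter."""
+    for start in range(0, len(paths), size):
+        yield paths[start:start + size]
+```
+
+Only `new_files` and `changed_files` would be sent for summarization; `deleted_files` feeds step 7. Comparing with `!=` rather than `>` also catches a file restored from an older copy.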
+
+### Backend (`core/folder_indexer.py`)
+
+- `summarize_text(content, ext) → (summary, tokens)` via `gpt-4o-mini`, Langfuse prompt `folder_file_summary_text`.
+- `summarize_image(b64) → (summary, tokens)` via `gpt-4o-mini` vision, Langfuse prompt `folder_file_summary_image`.
+- After each summarization, atomically increment `MonthlyTokenUsage(user_id, year_month, 'folder_index', +tokens)`. If the increment would exceed cap, the call returns a `quota_exceeded` error in `index_file_result`, and the session sends `index_session_done(status='quota_exceeded')`.
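+
+The check-then-increment step can be sketched with an in-memory ledger. This is an illustration only: `MonthlyTokenLedger` and `try_add` are hypothetical names, the lock stands in for whatever atomic statement the real Postgres-backed implementation uses, and `-1` is assumed to mean "no cap". The spec does not say whether over-cap tokens are still recorded; this sketch leaves the counter unchanged in that case.
+
+```python
+import threading
+from datetime import datetime, timezone
+from typing import Dict, Optional, Tuple
+
+UNLIMITED = -1  # assumption: -1 means "no cap"
+
+
+class MonthlyTokenLedger:
+    """In-memory stand-in for the MonthlyTokenUsage table (illustration only)."""
+
+    def __init__(self) -> None:
+        self._lock = threading.Lock()
+        self._usage: Dict[Tuple[str, str, str], int] = {}
+
+    def try_add(self, user_id: str, feature: str, tokens: int, cap: int,
+                now: Optional[datetime] = None) -> bool:
+        """Add `tokens` unless the new total would exceed `cap`; report success."""
+        now = now or datetime.now(timezone.utc)
+        key = (user_id, now.strftime("%Y-%m"), feature)  # 'YYYY-MM' bucket
+        with self._lock:  # the check and the write happen as one step
+            total = self._usage.get(key, 0) + tokens
+            if cap != UNLIMITED and total > cap:
+                return False  # caller would report quota_exceeded
+            self._usage[key] = total
+            return True
+```
+
+Because the bucket key includes `year_month`, the counter resets implicitly at month rollover: the first call in a new month simply starts a new row.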
+
+### Rescan triggers
+
+- **Manual button** → tRPC `projectFolders.startScan` mutation.
+- **On-demand mtime check** → inside `read_project_folder_manifest` drizzle-executor action: if any tracked mtime is stale, fire-and-forget `startScan` before returning the current manifest.
+- **Daily auto** → `app.on('ready')` iterates user projects; if `folderLastScannedAt < now − 24h` and `folderPath != null`, queue `startScan`.
+
+`projects.folderLastScanStatus === 'scanning'` blocks new scan triggers (manual button disabled, daily auto + mtime on-demand both skip).
+
+## Agent Integration
+
+### Manifest pre-injection
+
+In `core/deep_agent.py`, every agent run that has a resolved `projectId` builds a compact manifest block and prepends it to the system prompt:
+
+```
+<linked_folder>
+path: D:\Clients\Acme\Brand  (214 files, scanned 2h ago)
+files:
+- briefs/kickoff.md       [text]  Project kickoff notes; scope, stakeholders, deadlines
+- logos/logo-v3.png       [image] Final logo, golden-yellow palette on white
+- research/competitor.pdf [pdf]   Competitor brand audit, 12 entries
+...
+</linked_folder>
+```
+
+Format: `relativePath [kind] summary`. If the rendered block exceeds ~3000 tokens, truncate to the top N files by `mtimeMs DESC` and append:
+
+```
+… {M} more files omitted, use read_project_folder_file to access by path
+```
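+
+A minimal sketch of the rendering and truncation rule. It assumes each manifest entry also carries `mtimeMs` (needed for the ordering) and estimates tokens with a crude 4-characters-per-token heuristic; a real implementation would presumably use the model's tokenizer. `render_manifest` and `estimate_tokens` are hypothetical names.
+
+```python
+from typing import Dict, List
+
+TOKEN_BUDGET = 3000  # approximate budget from the spec
+
+
+def estimate_tokens(text: str) -> int:
+    """Crude heuristic (assumption): roughly 4 characters per token."""
+    return len(text) // 4 + 1
+
+
+def render_manifest(header: str, files: List[Dict], budget: int = TOKEN_BUDGET) -> str:
+    """Render a <linked_folder> block, keeping the most recently modified files."""
+    ordered = sorted(files, key=lambda f: f["mtimeMs"], reverse=True)
+    lines = ["<linked_folder>", header, "files:"]
+    used = sum(estimate_tokens(line) for line in lines)
+    kept = 0
+    for f in ordered:
+        line = f"- {f['relPath']} [{f['kind']}] {f['summary']}"
+        cost = estimate_tokens(line)
+        if used + cost > budget:
+            break  # everything older than this entry is omitted
+        lines.append(line)
+        used += cost
+        kept += 1
+    omitted = len(ordered) - kept
+    if omitted:
+        lines.append(
+            f"… {omitted} more files omitted, use read_project_folder_file to access by path"
+        )
+    lines.append("</linked_folder>")
+    return "\n".join(lines)
+```
+
+The omission hint is appended only when at least one file was dropped, so small folders render without it.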
+
+The backend pulls the manifest via the new drizzle-executor action:
+
+```
+action: read_project_folder_manifest
+data: { projectId }
+returns: { folderPath, lastScannedAt, files: [{relPath, kind, summary}] }
+```
+
+### projectId resolution per agent
+
+- `run_task_brief_research_stream` — `task.projectId`.
+- `run_home` — null unless the user message is project-scoped (via `@project` mention or active project context passed from renderer).
+- `run_brief` — backend cannot enumerate projects directly because projects live in the local SQLite. It calls a new `execute_on_client` action `list_projects_with_folder_manifests` that returns `[{ projectId, projectName, folderPath, lastScannedAt, files: [{relPath, kind, summary}] }]` for every project that has a linked folder. The backend then builds a **multi-project compact manifest** (top 5 most-recently-modified files per project).
+
+### New scoped tool (`agents/folder_agent.py`)
+
+```python
+@tool
+async def read_project_folder_file(project_id: str, relative_path: str) -> str:
+    """Read full content of a file inside the project's linked folder."""
+    result = await execute_on_client(
+        action="read_project_folder_file",
+        data={"projectId": project_id, "relativePath": relative_path},
+    )
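+    # Distinguish a missing file from an empty one: "" is valid content.
+    content = result.get("content")
+    return content if content is not None else f"File not found: {relative_path}"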
+```
+
+Backed by a new `drizzle-executor` action that:
+1. Looks up `projects.folderPath` for the projectId.
+2. Resolves `path.join(folderPath, relativePath)` with traversal guard (`..` and absolute paths rejected).
+3. Reads the file via the existing fs helpers. Image → returns base64. Text → returns content (size-capped).
+
+The existing journey-only `FILESYSTEM_TOOLS` are not added to home/brief/task-brief; only the new scoped tool is bound.
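+
+The traversal guard in step 2 can be sketched as below. The actual guard would live in the Electron main process (TypeScript); Python is used here only for illustration, and `is_safe_relative_path` is a hypothetical name. The check is purely lexical, so symlinks inside the folder that point elsewhere are not covered by it.
+
+```python
+from pathlib import PurePosixPath, PureWindowsPath
+
+
+def is_safe_relative_path(relative_path: str) -> bool:
+    """Reject empty paths, absolute paths, and any path with a '..' segment."""
+    if not relative_path:
+        return False
+    # Treat both separators alike so 'a\\..\\b' is caught on any OS.
+    normalized = relative_path.replace("\\", "/")
+    if PurePosixPath(normalized).is_absolute():
+        return False  # '/etc/passwd', '\\foo', UNC paths
+    win = PureWindowsPath(relative_path)
+    if win.is_absolute() or win.drive:
+        return False  # 'C:\\x' and drive-relative 'C:x'
+    return ".." not in normalized.split("/")
+```
+
+A rejected path would map to the `"Access denied"` tool response; names that merely contain dots, such as `a..b/c.md`, still pass because only whole `..` segments are refused.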
+
+## UI
+
+### Hero chip (`ProjectDetail.tsx`)
+
+```tsx
+<FolderChip
+  projectId={project.id}
+  folderPath={project.folderPath}
+  totalFiles={project.folderTotalFiles}
+  lastScannedAt={project.folderLastScannedAt}
+  scanStatus={project.folderLastScanStatus}
+  onClick={() => scrollToTab('files')}
+/>
+```
+
+States:
+- **Unlinked:** dashed pill "📁 Link folder" + Sparkles icon.
+- **Linked idle:** "📁 214 files · 2h ago" with soft golden-yellow background.
+- **Scanning:** "📁 indexing 47/214" + spinner.
+- **Error:** "📁 Scan failed" red-tinted; click → Files tab.
+
+### Files tab
+
+Add `'files'` to `SECTIONS` in `ProjectTabBar.tsx`. The tab body:
+
+```
+┌──────────────────────────────────────────────────┐
+│  Linked folder                                   │
+│  ┌────────────────────────────────────────────┐  │
+│  │ 📁 D:\Clients\Acme\Brand          [⋯ menu] │  │
+│  │ 214 files · last scanned 2h ago            │  │
+│  │ [Rescan]  [Unlink]                         │  │
+│  └────────────────────────────────────────────┘  │
+│                                                  │
+│  Files (filter: [All] [Text] [Images] [PDF])     │
+│  ┌────────────────────────────────────────────┐  │
+│  │ briefs/kickoff.md                          │  │
+│  │   Project kickoff notes; scope, deadlines  │  │
+│  │ logos/logo-v3.png                          │  │
+│  │   Final logo, golden-yellow on white       │  │
+│  │ ...                                        │  │
+│  └────────────────────────────────────────────┘  │
+└──────────────────────────────────────────────────┘
+```
+
+### Empty state (no folder linked)
+
+```
+<Empty>
+  Sparkles
+  Link a project folder
+  Connect a local folder so AI agents can read its files
+  when answering questions about this project.
+  [Choose folder...]   ← opens Electron dialog.showOpenDialog
+</Empty>
+```
+
+### New components (`src/renderer/components/projects/folder/`)
+
+- `FolderChip.tsx`
+- `FilesSection.tsx` (mounts inside `ProjectDetail`)
+- `FolderLinkCard.tsx` (path + actions)
+- `FolderFileList.tsx` (virtualized list of manifest entries)
+- `FolderUnlinkDialog.tsx`
+
+### Platform gating
+
+Feature is **Electron-only**. Wrap entry points in `platform.isElectron`. On the web SPA, the Files tab renders disabled with a tooltip "Folder linking available in desktop app".
+
+### Folder dialog
+
+New tRPC `projectFolders.chooseFolder` mutation invokes `dialog.showOpenDialog({ properties: ['openDirectory'] })` in the main process and returns the selected path.
+
+### i18n
+
+Add `projects.folder.*` keys (title, link CTA, browse, rescan, unlink, status strings, empty state copy, error toasts) to all 5 locale JSON files: en, it, es, fr, de.
+
+## Error Handling
+
+### Quota exhaustion
+
+- Pre-flight 402 → toast `"Folder too big for {tier} plan — max {N} files"` or `"Monthly token budget exhausted (resets {date})"`. Folder not linked.
+- Mid-scan `quota_exceeded` frame → partial manifest kept, scan marked `error`, toast as above, banner in Files tab `"Indexing paused — quota exhausted"`.
+
+### Path errors
+
+- Folder no longer exists at scan start → tRPC throws → toast `"Folder not found: {path}"`. `folderLastScanStatus = 'error'`. User offered Unlink or Re-link.
+- Permission denied on a file during scan → file skipped, logged in `projectFolderFiles` with `kind='skipped'`, no summary. Skipped files appear greyed in the Files tab.
+- Path traversal attempt in `read_project_folder_file` (relativePath contains `..` or is absolute) → tool returns `"Access denied"`; backend logs a warning. Hard fail, no fallback.
+
+### Network / WS failures
+
+- WS drop mid-scan: the in-flight session is abandoned server-side and the local `folderLastScanStatus` is flipped from `'scanning'` to `'error'`. The next trigger (manual rescan, daily auto, or the next on-demand mtime check) starts a **new** session; because the scanner's mtime delta only re-summarizes files whose `mtimeMs` changed (or that have no row yet), already-indexed files are skipped naturally — there is no explicit session-resume protocol.
+- Backend 5xx on summarize → file marked `kind='error'`, retried in the next rescan, not auto-retried inline.
+
+### File-type fallbacks
+
+- PDF parse fails (corrupt) → skipped, `kind='skipped'` with `summary='Could not extract text'`.
+- Image too large (>5 MB) → skipped with reason. Cap is a constant in `files/scanner.ts`.
+- DOCX parse fails, or the file type is unsupported → skipped silently with extension noted.
+
+### Concurrent scan guard
+
+`projects.folderLastScanStatus === 'scanning'` blocks new scan triggers. Manual button shows "Scanning..." disabled; daily auto + mtime on-demand both check the status flag first.
+
+### Manifest size overflow
+
+If the agent's pre-injected `<linked_folder>` block would exceed ~3000 tokens, the backend truncates to the top N files by `mtimeMs DESC` and appends an "M more files omitted" hint.
+
+### Tool call on unlinked project
+
+`read_project_folder_file` when `folderPath === null` returns `"No folder linked to project {projectId}"`. The agent can recover and answer without folder context.
+
+## Testing
+
+### API (`api/tests/`)
+
+| File | Coverage |
+|------|----------|
+| `test_folder_indexer.py` | `summarize_text` / `summarize_image` happy path, token recording, Langfuse prompt linking |
+| `test_folder_quota.py` | Pre-flight 402 rejects (max_files + monthly_tokens), atomic increment + `quota_exceeded` mid-stream, monthly reset at `year_month` rollover |
+| `test_ws_index_session.py` | Session lifecycle, cancel mid-stream, abandoned-on-disconnect (next scan skips already-indexed files via mtime delta), bad batch payload validation |
+| `test_folder_agent_tool.py` | `read_project_folder_file` happy path, unlinked project, traversal guard (`../`, absolute) |
+| `test_manifest_injection.py` | `<linked_folder>` block formatting, truncation past 3k tokens, multi-project brief manifest, null projectId skips injection |
+
+Reuse fixtures in `tests/conftest.py` and WS test helpers (`ws_unified` already covers session lifecycle).
+
+### Electron / adiuvAI
+
+No automated test suite currently. Manual smoke checks during development:
+
+- Link folder → manifest populated → unlink → manifest rows deleted.
+- Scan a synthetic dir with mixed text/image/binary → only whitelisted indexed.
+- mtime delta: change one file, rescan only re-indexes that file.
+- Disconnect WS mid-scan → status flips to `'error'`; next manual rescan re-indexes only the remaining files (mtime delta).
+
+### Eval (Langfuse)
+
+Build a test set of 10 representative folders (mix of markdown, code, PDFs, images). Score summary quality (LLM-as-judge) and token efficiency. Link scores to the prompt version per the existing `LOCAL_AGENT_V2_PLAN.md` pattern.
+
+## Out of scope (this spec)
+
+- Phase-2 wiki tier (per-folder + per-file structured summaries).
+- Multi-folder per project.
+- Web SPA support.
+- Token-usage display UI (Settings page comes later).
+- File editing from inside adiuvAI.
+- Live file watcher (chokidar). Daily + manual + on-demand mtime is enough for now.
+
+## Open questions (none)
+
+All resolved during brainstorming.