workspace/docs/superpowers/specs/2026-05-11-project-folder-integration-design.md
Roberto 361f89a29d docs: add project-folder integration design spec
Approved design for linking a local folder to a project: lightweight
manifest with per-file LLM summaries, WS-streamed indexing pipeline,
pre-injected manifest for Home/Brief/Task-Brief agents, tier-gated
token+file-count quota recorded backend-side.
2026-05-11 22:08:44 +02:00

# Project Folder Integration — Design
**Date:** 2026-05-11
**Status:** Approved (brainstorming complete)
**Author:** Roberto + Claude
## Goal
Let users link a local (or shared-PC) folder to an adiuvAI project. adiuvAI scans the folder, generates per-file summaries via LLM, and exposes the resulting manifest to the Home, Brief, and Task-Brief agents so they can answer project questions with awareness of the user's local files.
## Non-goals
- Multi-folder linking per project (deferred — 1 folder per project for now).
- Full-text RAG over file contents (we use lightweight per-file summaries instead).
- File editing from inside adiuvAI (read-only).
- Token-usage display in the project UI (recorded backend-side; dedicated Settings page comes later).
- Web SPA support (Electron-only — web SPA has no filesystem access).
## Strategy
**Hybrid AI File System (manifest first, optional wiki tier later).**
Phase 1 (this spec): build a lightweight manifest — for each indexable file record `(relativePath, kind, size, mtime, 1-line LLM summary)`. The agent receives the manifest pre-injected into its system prompt and reads full file contents lazily via a scoped tool.
Phase 2 (future, out of scope): for folders above N files or on user opt-in, generate per-folder + per-file wiki summaries written to a structured index. Not built now.
## Architecture
```
┌─────────────────────── adiuvAI (Electron) ───────────────────────┐
│ Renderer (React) │
│ • Project hero: <FolderChip> (status glance) │
│ • <FilesTab>: link/unlink, browse, rescan, progress, browser │
│ • <FolderBrowser>: tree view of manifest │
│ │
│ Main (Node) │
│ • db/schema.ts: +projectFolderFiles, +projects.folderPath │
│ • files/scanner.ts: walk + filter + mtime delta │
│ • files/indexer.ts: orchestrates WS index session │
│ • files/daily-rescan.ts: 24h-stale check on app start │
│ • router/projectFolders.ts: tRPC procedures │
│ • api/backend-client.ts: +sendIndexBatch frame │
│ • api/drizzle-executor.ts: +read_project_folder_manifest, │
│ +read_project_folder_file actions │
└───────────────────────────────────────────────────────────────────┘
│ /api/v1/device WS
┌────────────────────────── api (FastAPI) ─────────────────────────┐
│ device_ws.py: +index_file_batch / +index_file_result frames │
│ core/folder_indexer.py: summarize text / vision per file │
│ core/deep_agent.py: pre-inject manifest when project context set │
│ agents/folder_agent.py: scoped read_project_folder_file tool │
│ billing/tier_manager.py: +folder_max_files, +folder_monthly_tokens│
│ models.py: +AgentRunLog.tokens_used, +MonthlyTokenUsage table │
└───────────────────────────────────────────────────────────────────┘
```
**Privacy invariant:** file content travels to the backend only transiently — for summarization — and is never persisted there. Summaries and manifest entries live in the local SQLite database. Token usage is recorded backend-side because it gates the user's tier quota.
## Decisions
| Topic | Decision |
|-------|----------|
| Retrieval strategy | Hybrid: manifest first, optional wiki tier later (phase 2, out of scope) |
| File scope | Text whitelist (.md, .txt, .pdf, .docx, .csv, code) + images (.png/.jpg) summarized via gpt-4o-mini vision |
| Cardinality | One folder per project |
| Rescan triggers | Manual button + daily auto (24h staleness check on app start) + on-demand mtime delta when manifest is read |
| Rate-limit metric | Tokens-per-month per user **and** total file-count cap per folder, both tier-gated |
| Indexing pipeline | WS streaming over existing `/api/v1/device` with new frame types |
| Agent access | Pre-inject manifest into system prompt; lazy reads via scoped `read_project_folder_file` tool |
| UI placement | Hero chip + dedicated "Files" tab in `ProjectTabBar` |
| Platform | Electron-only (web SPA: tab disabled) |
| Token-usage display | Out of scope (record backend-side, surface in Settings later) |
## Schema Changes
### adiuvAI local SQLite (`src/main/db/schema.ts`)
Extend `projects`:
```typescript
projects: {
  // existing columns...
  folderPath: text('folder_path'),                          // nullable absolute path or UNC
  folderLastScannedAt: integer('folder_last_scanned_at'),   // ms, nullable
  folderLastScanStatus: text('folder_last_scan_status'),    // 'idle' | 'scanning' | 'error'
  folderTotalFiles: integer('folder_total_files').default(0),
}
```
New table `projectFolderFiles`:
```typescript
projectFolderFiles: {
  id: text('id').primaryKey(),
  projectId: text('project_id').notNull(),         // FK projects.id (no DB constraint per convention)
  relativePath: text('relative_path').notNull(),   // path relative to folderPath
  ext: text('ext').notNull(),                      // '.md', '.png', ...
  kind: text('kind').notNull(),                    // 'text' | 'image' | 'pdf' | 'docx' | 'skipped' | 'error'
  sizeBytes: integer('size_bytes').notNull(),
  mtimeMs: integer('mtime_ms').notNull(),
  summary: text('summary'),                        // nullable, ≤500 chars
  summaryUpdatedAt: integer('summary_updated_at'),
  // Unique index: (projectId, relativePath)
}
```
### api Postgres (alembic migration)
```python
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import UUID

op.add_column(
    'agent_run_logs',
    sa.Column('tokens_used', sa.Integer(), nullable=False, server_default='0'),
)
op.create_table(
    'monthly_token_usage',
    sa.Column('user_id', UUID(as_uuid=False), sa.ForeignKey('users.id', ondelete='CASCADE'), nullable=False),
    sa.Column('year_month', sa.String(7), nullable=False),   # 'YYYY-MM'
    sa.Column('feature', sa.String(64), nullable=False),     # 'folder_index'
    sa.Column('tokens_used', sa.Integer(), nullable=False, server_default='0'),
    sa.PrimaryKeyConstraint('user_id', 'year_month', 'feature'),
)
```
### Tier matrix (`app/billing/tier_manager.py`)
| Feature | Free | Pro | Power | Team |
|-------------------------|------|-----|-------|------|
| `folder_max_files` | 200 | 5000| -1 | -1 |
| `folder_monthly_tokens` | 100k | 2M | -1 | -1 |
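A minimal sketch of how the tier matrix might be encoded and consulted by a quota pre-flight check. It assumes that `-1` means "no cap" (the spec does not say so explicitly) and reads `100k` and `2M` as 100,000 and 2,000,000; `check_folder_quota` and `QuotaDecision` are hypothetical names, not part of `tier_manager.py`.

```python
from dataclasses import dataclass
from typing import Optional

UNLIMITED = -1  # assumption: -1 in the tier matrix means "no cap"

# Values copied from the tier matrix above.
TIER_LIMITS = {
    "free": {"folder_max_files": 200, "folder_monthly_tokens": 100_000},
    "pro": {"folder_max_files": 5000, "folder_monthly_tokens": 2_000_000},
    "power": {"folder_max_files": UNLIMITED, "folder_monthly_tokens": UNLIMITED},
    "team": {"folder_max_files": UNLIMITED, "folder_monthly_tokens": UNLIMITED},
}


@dataclass
class QuotaDecision:
    status_code: int               # 200 = proceed, 402 = reject
    reason: Optional[str] = None   # name of the limit that was hit, if any


def check_folder_quota(tier: str, estimated_files: int, tokens_used_this_month: int) -> QuotaDecision:
    """Hypothetical pre-flight check: reject before any file content is sent."""
    limits = TIER_LIMITS[tier]

    max_files = limits["folder_max_files"]
    if max_files != UNLIMITED and estimated_files > max_files:
        return QuotaDecision(402, "folder_max_files")

    max_tokens = limits["folder_monthly_tokens"]
    if max_tokens != UNLIMITED and tokens_used_this_month >= max_tokens:
        return QuotaDecision(402, "folder_monthly_tokens")

    return QuotaDecision(200)
```

Returning the name of the limit that was hit would let the client pick between the "folder too big" and "monthly budget exhausted" messages without parsing free text.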
## Indexing Pipeline
### New WS frame types on `/api/v1/device`
| Direction | Frame | Payload |
|-----------|---------------------------|---------|
| C → S | `index_session_start` | `{ sessionId, projectId, totalFiles }` |
| C → S | `index_file_batch` | `{ sessionId, files: [{relPath, kind, content/imageB64, sizeBytes, mtimeMs}] }` (batches of 5) |
| S → C | `index_file_result` | `{ sessionId, relPath, summary, tokensUsed, error? }` |
| S → C | `index_session_progress` | `{ sessionId, processed, total }` |
| C → S | `index_session_cancel` | `{ sessionId }` |
| S → C | `index_session_done` | `{ sessionId, status: 'completed' \| 'cancelled' \| 'quota_exceeded' \| 'error' }` |
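To illustrate the "batches of 5" rule, the sketch below groups file entries into `index_file_batch` payloads. The real client code lives in the Electron main process; Python is used here only to show the chunking logic. The `type` envelope key and the function name `iter_index_batches` are assumptions, since the spec lists payloads but not the frame envelope.

```python
from itertools import islice
from typing import Iterable, Iterator

BATCH_SIZE = 5  # "batches of 5" per the frame table


def iter_index_batches(
    session_id: str, files: Iterable[dict], batch_size: int = BATCH_SIZE
) -> Iterator[dict]:
    """Yield one index_file_batch frame per group of up to `batch_size` files."""
    iterator = iter(files)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        # "type" is a hypothetical envelope field used to name the frame.
        yield {"type": "index_file_batch", "sessionId": session_id, "files": chunk}
```

The last batch of a session may hold fewer than five files, so a receiver should count results per file rather than assume five per frame.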
### Flow (Electron `files/indexer.ts`)
1. tRPC `projectFolders.startScan({ projectId })`.
2. `scanner.ts` walks `folderPath`:
- Filter by whitelist (text exts + .png/.jpg).
- Apply size cap (1 MB / file).
- Compute mtime delta vs `projectFolderFiles`.
- Returns `{ newFiles[], changedFiles[], deletedFiles[] }`.
3. Backend pre-flight: `POST /api/v1/billing/quota/check { feature: 'folder_index', estimated_files: N }`:
- Rejects 402 if `folder_max_files` exceeded for the user's tier.
- Rejects 402 if `folder_monthly_tokens` already exhausted.
4. Open `index_session_start` over WS.
5. For each batch of 5 files:
- Read content (text) or base64-encode (image).
- Send `index_file_batch`.
- Await one `index_file_result` per file in the batch (5 for a full batch).
- Upsert `projectFolderFiles` row with the returned summary.
- Backend atomically increments `MonthlyTokenUsage` and writes a row in `AgentRunLog` with `tokens_used`.
6. Receive `index_session_done` from the backend (it is an S → C frame). Update `projects.folderLastScannedAt`, `.folderTotalFiles`, `.folderLastScanStatus = 'idle'`.
7. Delete `projectFolderFiles` rows for `deletedFiles`.
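The mtime delta in step 2 can be sketched as a comparison of two path-to-mtime maps. This is an illustration in Python of logic that would live in `files/scanner.ts`; `compute_delta` and its argument shapes are hypothetical.

```python
def compute_delta(on_disk: dict[str, int], tracked: dict[str, int]) -> dict[str, list[str]]:
    """Compare {relativePath: mtimeMs} from the walk against the stored rows.

    on_disk: files found by the scanner after whitelist and size filtering.
    tracked: rows currently in projectFolderFiles.
    """
    new_files = sorted(p for p in on_disk if p not in tracked)
    # Any difference counts as a change, including an older mtime
    # (for example a file restored from backup).
    changed_files = sorted(p for p in on_disk if p in tracked and on_disk[p] != tracked[p])
    deleted_files = sorted(p for p in tracked if p not in on_disk)
    return {"newFiles": new_files, "changedFiles": changed_files, "deletedFiles": deleted_files}
```

Only `newFiles` and `changedFiles` need summarizing, which is why a rescan after an interrupted session skips files that were already indexed.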
### Backend (`core/folder_indexer.py`)
- `summarize_text(content, ext) → (summary, tokens)` via `gpt-4o-mini`, Langfuse prompt `folder_file_summary_text`.
- `summarize_image(b64) → (summary, tokens)` via `gpt-4o-mini` vision, Langfuse prompt `folder_file_summary_image`.
- After each summarization, atomically increment `MonthlyTokenUsage(user_id, year_month, 'folder_index', +tokens)`. If the increment would exceed cap, the call returns a `quota_exceeded` error in `index_file_result`, and the session sends `index_session_done(status='quota_exceeded')`.
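One possible shape for the atomic increment with a cap check is a single conditional `UPDATE`, so that concurrent summarizations cannot both slip past the limit. The sketch uses the standard-library `sqlite3` module only so that it runs standalone; the spec's table lives in Postgres. `try_consume_tokens`, `year_month_key`, and `make_demo_db` are hypothetical helpers, and treating a negative cap as unlimited is an assumption.

```python
import sqlite3
from datetime import datetime


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the monthly_token_usage table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE monthly_token_usage (
               user_id TEXT NOT NULL,
               year_month TEXT NOT NULL,
               feature TEXT NOT NULL,
               tokens_used INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (user_id, year_month, feature))"""
    )
    return conn


def year_month_key(now: datetime) -> str:
    """Format the 'YYYY-MM' bucket; a new month means a new row starting at 0."""
    return now.strftime("%Y-%m")


def try_consume_tokens(conn, user_id, year_month, feature, tokens, cap) -> bool:
    """Add `tokens` to the bucket unless that would exceed `cap`.

    Returns False (caller reports quota_exceeded) when the cap would be passed.
    A negative cap is treated as unlimited (assumption).
    """
    key = (user_id, year_month, feature)
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO monthly_token_usage "
            "(user_id, year_month, feature, tokens_used) VALUES (?, ?, ?, 0)",
            key,
        )
        cur = conn.execute(
            "UPDATE monthly_token_usage SET tokens_used = tokens_used + ? "
            "WHERE user_id = ? AND year_month = ? AND feature = ? "
            "AND (? < 0 OR tokens_used + ? <= ?)",
            (tokens, *key, cap, tokens, cap),
        )
        return cur.rowcount == 1
```

Because the cap test sits in the `WHERE` clause, the check and the increment happen in one statement rather than as a read followed by a write.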
### Rescan triggers
- **Manual button** → tRPC `projectFolders.startScan` mutation.
- **On-demand mtime check** → inside `read_project_folder_manifest` drizzle-executor action: if any tracked mtime is stale, fire-and-forget `startScan` before returning the current manifest.
- **Daily auto** → `app.on('ready')` iterates user projects; if `folderPath != null` and `folderLastScannedAt < now - 24h`, queue `startScan`.
`projects.folderLastScanStatus === 'scanning'` blocks new scan triggers (manual button disabled, daily auto + mtime on-demand both skip).
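The daily-auto condition and the scanning guard combine into one small predicate. The sketch below is in Python for illustration (the real check would sit in `files/daily-rescan.ts`); `should_auto_rescan` is a hypothetical name, and treating a never-scanned folder as stale is an assumption the spec does not spell out.

```python
from typing import Optional

DAY_MS = 24 * 60 * 60 * 1000


def should_auto_rescan(
    folder_path: Optional[str],
    last_scanned_at_ms: Optional[int],
    scan_status: Optional[str],
    now_ms: int,
) -> bool:
    """Decide whether the app-start check should queue startScan for a project."""
    if folder_path is None:
        return False  # no folder linked
    if scan_status == "scanning":
        return False  # concurrent scan guard
    if last_scanned_at_ms is None:
        return True   # assumption: a linked but never-scanned folder is stale
    return last_scanned_at_ms < now_ms - DAY_MS
```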
## Agent Integration
### Manifest pre-injection
In `core/deep_agent.py`, every agent run that has a resolved `projectId` builds a compact manifest block and prepends it to the system prompt:
```
<linked_folder>
path: D:\Clients\Acme\Brand (214 files, scanned 2h ago)
files:
- /briefs/kickoff.md [text] Project kickoff notes; scope, stakeholders, deadlines
- /logos/logo-v3.png [image] Final logo, golden-yellow palette on white
- /research/competitor.pdf [pdf] Competitor brand audit, 12 entries
...
</linked_folder>
```
Format: `relativePath [kind] summary`. If the rendered block exceeds ~3000 tokens, truncate to the top N files by `mtimeMs DESC` and append:
```
… {M} more files omitted, use read_project_folder_file to access by path
```
The backend pulls the manifest via the new drizzle-executor action:
```
action: read_project_folder_manifest
data: { projectId }
returns: { folderPath, lastScannedAt, files: [{relPath, kind, summary}] }
```
### projectId resolution per agent
- `run_task_brief_research_stream` → `task.projectId`.
- `run_home` — null unless the user message is project-scoped (via `@project` mention or active project context passed from renderer).
- `run_brief` — backend cannot enumerate projects directly because projects live in the local SQLite. It calls a new `execute_on_client` action `list_projects_with_folder_manifests` that returns `[{ projectId, projectName, folderPath, lastScannedAt, files: [{relPath, kind, summary}] }]` for every project that has a linked folder. The backend then builds a **multi-project compact manifest** (top 5 most-recently-modified files per project).
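For the `run_brief` case, the compaction step could look like the sketch below: keep only the five most recently modified files per project. It assumes each file entry includes `mtimeMs` (not listed in the action's return shape above), and `compact_brief_manifest` is a hypothetical name.

```python
def compact_brief_manifest(projects: list[dict], per_project: int = 5) -> list[dict]:
    """Trim each project's file list to its most recently modified entries."""
    compact = []
    for project in projects:
        newest = sorted(project["files"], key=lambda f: f["mtimeMs"], reverse=True)
        compact.append(
            {
                "projectId": project["projectId"],
                "projectName": project["projectName"],
                "files": [
                    {"relPath": f["relPath"], "kind": f["kind"], "summary": f.get("summary")}
                    for f in newest[:per_project]
                ],
            }
        )
    return compact
```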
### New scoped tool (`agents/folder_agent.py`)
```python
@tool
async def read_project_folder_file(project_id: str, relative_path: str) -> str:
    """Read full content of a file inside the project's linked folder."""
    result = await execute_on_client(
        action="read_project_folder_file",
        data={"projectId": project_id, "relativePath": relative_path},
    )
    # Test for a missing value rather than falsiness, so that an empty file
    # returns "" instead of being reported as not found.
    content = result.get("content")
    return content if content is not None else f"File not found: {relative_path}"
```
Backed by a new `drizzle-executor` action that:
1. Looks up `projects.folderPath` for the projectId.
2. Resolves `path.join(folderPath, relativePath)` with traversal guard (`..` and absolute paths rejected).
3. Reads the file via the existing fs helpers. Image → returns base64. Text → returns content (size-capped).
The existing journey-only `FILESYSTEM_TOOLS` are not added to home/brief/task-brief; only the new scoped tool is bound.
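The traversal guard in step 2 could be sketched as follows. The real guard would be written in TypeScript inside the drizzle-executor action; this Python version only illustrates the checks. `resolve_in_folder` is a hypothetical name, and the final containment check after resolution is an addition beyond the two rejections the spec names (it also catches symlinks that point outside the folder).

```python
from pathlib import Path, PureWindowsPath


def resolve_in_folder(folder_path: str, relative_path: str) -> Path:
    """Return the resolved path, or raise PermissionError("Access denied")."""
    normalized = relative_path.replace("\\", "/")

    # Reject absolute inputs: POSIX roots, UNC paths, and Windows drive prefixes.
    if normalized.startswith("/") or PureWindowsPath(relative_path).drive:
        raise PermissionError("Access denied")

    # Reject any ".." segment outright, as the spec requires.
    if ".." in normalized.split("/"):
        raise PermissionError("Access denied")

    root = Path(folder_path).resolve()
    candidate = (root / normalized).resolve()

    # Defence in depth: the resolved path must sit strictly inside the root.
    if root not in candidate.parents:
        raise PermissionError("Access denied")
    return candidate
```

Checking the resolved path as well as the raw string keeps the guard correct even if a later change relaxes the string checks.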
## UI
### Hero chip (`ProjectDetail.tsx`)
```tsx
<FolderChip
  projectId={project.id}
  folderPath={project.folderPath}
  totalFiles={project.folderTotalFiles}
  lastScannedAt={project.folderLastScannedAt}
  scanStatus={project.folderLastScanStatus}
  onClick={() => scrollToTab('files')}
/>
```
States:
- **Unlinked:** dashed pill "📁 Link folder" + Sparkles icon.
- **Linked idle:** "📁 214 files · 2h ago" with soft golden-yellow background.
- **Scanning:** "📁 indexing 47/214" + spinner.
- **Error:** "📁 Scan failed" red-tinted; click → Files tab.
### Files tab
Add `'files'` to `SECTIONS` in `ProjectTabBar.tsx`. The tab body:
```
┌──────────────────────────────────────────────────┐
│ Linked folder │
│ ┌────────────────────────────────────────────┐ │
│ │ 📁 D:\Clients\Acme\Brand [⋯ menu] │ │
│ │ 214 files · last scanned 2h ago │ │
│ │ [Rescan] [Unlink] │ │
│ └────────────────────────────────────────────┘ │
│ │
│ Files (filter: [All] [Text] [Images] [PDF]) │
│ ┌────────────────────────────────────────────┐ │
│ │ briefs/kickoff.md │ │
│ │ Project kickoff notes; scope, deadlines │ │
│ │ logos/logo-v3.png │ │
│ │ Final logo, golden-yellow on white │ │
│ │ ... │ │
│ └────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
```
### Empty state (no folder linked)
```
<Empty>
Sparkles
Link a project folder
Connect a local folder so AI agents can read its files
when answering questions about this project.
[Choose folder...] ← opens Electron dialog.showOpenDialog
</Empty>
```
### New components (`src/renderer/components/projects/folder/`)
- `FolderChip.tsx`
- `FilesSection.tsx` (mounts inside `ProjectDetail`)
- `FolderLinkCard.tsx` (path + actions)
- `FolderFileList.tsx` (virtualized list of manifest entries)
- `FolderUnlinkDialog.tsx`
### Platform gating
Feature is **Electron-only**. Wrap entry points in `platform.isElectron`. On the web SPA, the Files tab renders disabled with a tooltip "Folder linking available in desktop app".
### Folder dialog
New tRPC `projectFolders.chooseFolder` mutation invokes `dialog.showOpenDialog({ properties: ['openDirectory'] })` in the main process and returns the selected path.
### i18n
Add `projects.folder.*` keys (title, link CTA, browse, rescan, unlink, status strings, empty state copy, error toasts) to all 5 locale JSON files: en, it, es, fr, de.
## Error Handling
### Quota exhaustion
- Pre-flight 402 → toast `"Folder too big for {tier} plan — max {N} files"` or `"Monthly token budget exhausted (resets {date})"`. Folder not linked.
- Mid-scan `quota_exceeded` frame → partial manifest kept, scan marked `error`, toast as above, banner in Files tab `"Indexing paused — quota exhausted"`.
### Path errors
- Folder no longer exists at scan start → tRPC throws → toast `"Folder not found: {path}"`. `folderLastScanStatus = 'error'`. User offered Unlink or Re-link.
- Permission denied on a file during scan → file skipped, logged in `projectFolderFiles` with `kind='skipped'`, no summary. Skipped files appear greyed in the Files tab.
- Path traversal attempt in `read_project_folder_file` (relativePath contains `..` or is absolute) → tool returns `"Access denied"`; backend logs a warning. Hard fail, no fallback.
### Network / WS failures
- WS drop mid-scan: the in-flight session is abandoned server-side and the local `folderLastScanStatus` is flipped from `'scanning'` to `'error'`. The next trigger (manual rescan, daily auto, or the next on-demand mtime check) starts a **new** session; because the scanner's mtime delta only re-summarizes files whose `mtimeMs` changed (or that have no row yet), already-indexed files are skipped naturally — there is no explicit session-resume protocol.
- Backend 5xx on summarize → file marked `kind='error'`, retried in the next rescan, not auto-retried inline.
### File-type fallbacks
- PDF parse fails (corrupt) → skipped, `kind='skipped'` with `summary='Could not extract text'`.
- Image too large (>5 MB) → skipped with reason. Cap is a constant in `files/scanner.ts`.
- DOCX or other unsupported types → skipped silently with extension noted.
### Concurrent scan guard
`projects.folderLastScanStatus === 'scanning'` blocks new scan triggers. Manual button shows "Scanning..." disabled; daily auto + mtime on-demand both check the status flag first.
### Manifest size overflow
If the agent's pre-injected `<linked_folder>` block would exceed ~3000 tokens, the backend truncates to the top N files by `mtimeMs DESC` and appends an "M more files omitted" hint.
### Tool call on unlinked project
`read_project_folder_file` when `folderPath === null` returns `"No folder linked to project {projectId}"`. The agent can recover and answer without folder context.
## Testing
### API (`api/tests/`)
| File | Coverage |
|------|----------|
| `test_folder_indexer.py` | `summarize_text` / `summarize_image` happy path, token recording, Langfuse prompt linking |
| `test_folder_quota.py` | Pre-flight 402 rejects (max_files + monthly_tokens), atomic increment + `quota_exceeded` mid-stream, monthly reset at `year_month` rollover |
| `test_ws_index_session.py` | Session lifecycle, cancel mid-stream, abandoned-on-disconnect (next scan skips already-indexed files via mtime delta), bad batch payload validation |
| `test_folder_agent_tool.py` | `read_project_folder_file` happy path, unlinked project, traversal guard (`../`, absolute) |
| `test_manifest_injection.py` | `<linked_folder>` block formatting, truncation past 3k tokens, multi-project brief manifest, null projectId skips injection |
Reuse fixtures in `tests/conftest.py` and WS test helpers (`ws_unified` already covers session lifecycle).
### Electron / adiuvAI
No automated test suite currently. Manual smoke checks during development:
- Link folder → manifest populated → unlink → manifest rows deleted.
- Scan a synthetic dir with mixed text/image/binary → only whitelisted indexed.
- mtime delta: change one file, rescan only re-indexes that file.
- Disconnect WS mid-scan → status flips to `'error'`; next manual rescan re-indexes only the remaining files (mtime delta).
### Eval (Langfuse)
Build a test set of 10 representative folders (mix of markdown, code, PDFs, images). Score summary quality (LLM-as-judge) and token efficiency. Link scores to the prompt version per the existing `LOCAL_AGENT_V2_PLAN.md` pattern.
## Out of scope (this spec)
- Phase-2 wiki tier (per-folder + per-file structured summaries).
- Multi-folder per project.
- Web SPA support.
- Token-usage display UI (Settings page comes later).
- File editing from inside adiuvAI.
- Live file watcher (chokidar). Daily + manual + on-demand mtime is enough for now.
## Open questions (none)
All resolved during brainstorming.