Roberto 361f89a29d docs: add project-folder integration design spec

Approved design for linking a local folder to a project: lightweight
manifest with per-file LLM summaries, WS-streamed indexing pipeline,
pre-injected manifest for Home/Brief/Task-Brief agents, tier-gated
token+file-count quota recorded backend-side.

2026-05-11 22:08:44 +02:00


Project Folder Integration — Design

Date: 2026-05-11
Status: Approved (brainstorming complete)
Author: Roberto + Claude

Goal

Let users link a local (or shared-PC) folder to an adiuvAI project. adiuvAI scans the folder, generates per-file summaries via LLM, and exposes the resulting manifest to the Home, Brief, and Task-Brief agents so they can answer project questions with awareness of the user's local files.

Non-goals

Multi-folder linking per project (deferred — 1 folder per project for now).
Full-text RAG over file contents (we use lightweight per-file summaries instead).
File editing from inside adiuvAI (read-only).
Token-usage display in the project UI (recorded backend-side; dedicated Settings page comes later).
Web SPA support (Electron-only — web SPA has no filesystem access).

Strategy

Hybrid AI File System (manifest first, optional wiki tier later).

Phase 1 (this spec): build a lightweight manifest. For each indexable file, record (relativePath, kind, size, mtime, 1-line LLM summary). The agent receives the manifest pre-injected into its system prompt and reads full file contents lazily via a scoped tool.

Phase 2 (future, out of scope): for folders above N files or on user opt-in, generate per-folder + per-file wiki summaries written to a structured index. Not built now.

Architecture

┌─────────────────────── adiuvAI (Electron) ───────────────────────┐
│ Renderer (React)                                                  │
│   • Project hero: <FolderChip>      (status glance)              │
│   • <FilesTab>: link/unlink, browse, rescan, progress, browser   │
│   • <FolderBrowser>: tree view of manifest                       │
│                                                                   │
│ Main (Node)                                                       │
│   • db/schema.ts: +projectFolderFiles, +projects.folderPath       │
│   • files/scanner.ts: walk + filter + mtime delta                │
│   • files/indexer.ts: orchestrates WS index session              │
│   • files/daily-rescan.ts: 24h-stale check on app start          │
│   • router/projectFolders.ts: tRPC procedures                     │
│   • api/backend-client.ts: +sendIndexBatch frame                 │
│   • api/drizzle-executor.ts: +read_project_folder_manifest,      │
│                              +read_project_folder_file actions    │
└───────────────────────────────────────────────────────────────────┘
                              │ /api/v1/device WS
                              ▼
┌────────────────────────── api (FastAPI) ─────────────────────────┐
│ device_ws.py: +index_file_batch / +index_file_result frames      │
│ core/folder_indexer.py: summarize text / vision per file          │
│ core/deep_agent.py: pre-inject manifest when project context set │
│ agents/folder_agent.py: scoped read_project_folder_file tool     │
│ billing/tier_manager.py: +folder_max_files, +folder_monthly_tokens│
│ models.py: +AgentRunLog.tokens_used, +MonthlyTokenUsage table     │
└───────────────────────────────────────────────────────────────────┘

Privacy invariant: file content travels to the backend only transiently — for summarization — and is never persisted there. Summaries and manifest entries live in the local SQLite database. Token usage is recorded backend-side because it gates the user's tier quota.

Decisions

Topic	Decision
Retrieval strategy	Hybrid: manifest first, optional wiki tier later (phase 2, out of scope)
File scope	Text whitelist (.md, .txt, .pdf, .docx, .csv, code) + images (.png/.jpg) summarized via gpt-4o-mini vision
Cardinality	One folder per project
Rescan triggers	Manual button + daily auto (24h staleness check on app start) + on-demand mtime delta when manifest is read
Rate-limit metric	Tokens-per-month per user and total file-count cap per folder, both tier-gated
Indexing pipeline	WS streaming over existing `/api/v1/device` with new frame types
Agent access	Pre-inject manifest into system prompt; lazy reads via scoped `read_project_folder_file` tool
UI placement	Hero chip + dedicated "Files" tab in `ProjectTabBar`
Platform	Electron-only (web SPA: tab disabled)
Token-usage display	Out of scope (record backend-side, surface in Settings later)

Schema Changes

adiuvAI local SQLite (`src/main/db/schema.ts`)

Extend projects:

projects: {
  // existing columns...
  folderPath: text('folder_path'),                       // nullable absolute path or UNC
  folderLastScannedAt: integer('folder_last_scanned_at'),// ms, nullable
  folderLastScanStatus: text('folder_last_scan_status'), // 'idle' | 'scanning' | 'error'
  folderTotalFiles: integer('folder_total_files').default(0),
}

New table projectFolderFiles:

projectFolderFiles: {
  id: text('id').primaryKey(),
  projectId: text('project_id').notNull(),       // FK projects.id (no DB constraint per convention)
  relativePath: text('relative_path').notNull(), // path relative to folderPath
  ext: text('ext').notNull(),                    // '.md', '.png', ...
  kind: text('kind').notNull(),                  // 'text' | 'image' | 'pdf' | 'docx' | 'skipped' | 'error'
  sizeBytes: integer('size_bytes').notNull(),
  mtimeMs: integer('mtime_ms').notNull(),
  summary: text('summary'),                      // nullable, ≤500 chars
  summaryUpdatedAt: integer('summary_updated_at'),
  // Unique index: (projectId, relativePath)
}

api Postgres (alembic migration)

import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import UUID

# Per-run token count; existing rows default to 0.
op.add_column('agent_run_logs',
  sa.Column('tokens_used', sa.Integer(), nullable=False, server_default='0'))

# One counter row per (user, month, feature).
op.create_table('monthly_token_usage',
  sa.Column('user_id', UUID(as_uuid=False), sa.ForeignKey('users.id', ondelete='CASCADE'), nullable=False),
  sa.Column('year_month', sa.String(7), nullable=False),  # 'YYYY-MM'
  sa.Column('feature', sa.String(64), nullable=False),    # 'folder_index'
  sa.Column('tokens_used', sa.Integer(), nullable=False, server_default='0'),
  sa.PrimaryKeyConstraint('user_id', 'year_month', 'feature'),
)

Tier matrix (`app/billing/tier_manager.py`)

Feature	Free	Pro	Power	Team
`folder_max_files`	200	5000	-1	-1
`folder_monthly_tokens`	100k	2M	-1	-1
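
As a rough illustration of how this matrix might be consulted during the pre-flight check, here is a minimal Python sketch. It assumes that -1 means "unlimited" and that 100k and 2M stand for 100,000 and 2,000,000 tokens. `TIER_LIMITS`, `check_folder_quota`, and the reason strings are hypothetical names for illustration, not the actual `tier_manager.py` API.

```python
from typing import Optional

UNLIMITED = -1  # assumption: -1 in the matrix means "no cap"

# Values copied from the tier matrix; the dict layout is illustrative only.
TIER_LIMITS = {
    "free":  {"folder_max_files": 200,       "folder_monthly_tokens": 100_000},
    "pro":   {"folder_max_files": 5000,      "folder_monthly_tokens": 2_000_000},
    "power": {"folder_max_files": UNLIMITED, "folder_monthly_tokens": UNLIMITED},
    "team":  {"folder_max_files": UNLIMITED, "folder_monthly_tokens": UNLIMITED},
}


def check_folder_quota(tier: str, estimated_files: int,
                       tokens_used_this_month: int) -> Optional[str]:
    """Return None if a scan may start, else a reason suitable for a 402 body."""
    limits = TIER_LIMITS[tier]

    max_files = limits["folder_max_files"]
    if max_files != UNLIMITED and estimated_files > max_files:
        return "folder_max_files_exceeded"

    max_tokens = limits["folder_monthly_tokens"]
    if max_tokens != UNLIMITED and tokens_used_this_month >= max_tokens:
        return "folder_monthly_tokens_exhausted"

    return None
```

For example, `check_folder_quota("free", 201, 0)` would reject on file count, while any folder size passes on the Power and Team tiers.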

Indexing Pipeline

New WS frame types on `/api/v1/device`

Direction	Frame	Payload
C → S	`index_session_start`	`{ sessionId, projectId, totalFiles }`
C → S	`index_file_batch`	`{ sessionId, files: [{relPath, kind, content/imageB64, sizeBytes, mtimeMs}] }` (batches of 5)
S → C	`index_file_result`	`{ sessionId, relPath, summary, tokensUsed, error? }`
S → C	`index_session_progress`	`{ sessionId, processed, total }`
C → S	`index_session_cancel`	`{ sessionId }`
S → C	`index_session_done`	`{ sessionId, status: 'completed' | 'cancelled' | 'quota_exceeded' | 'error' }`
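
To make the "batches of 5" rule concrete, the following sketch groups files into `index_file_batch` payloads. It is written in Python for illustration only (the real sender would live in the Electron main process), and the `"type"` discriminator key and the `build_batch_frames` helper are assumptions not defined by this spec.

```python
from typing import Dict, Iterator, List

BATCH_SIZE = 5  # "batches of 5" from the frame table


def build_batch_frames(session_id: str, files: List[Dict],
                       batch_size: int = BATCH_SIZE) -> Iterator[Dict]:
    """Yield one index_file_batch frame per group of up to `batch_size` files."""
    for start in range(0, len(files), batch_size):
        yield {
            "type": "index_file_batch",  # assumption: frames carry a type field
            "sessionId": session_id,
            "files": files[start:start + batch_size],
        }
```

The last frame of a session simply carries fewer files when the total is not a multiple of the batch size.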

Flow (Electron `files/indexer.ts`)

1. tRPC projectFolders.startScan({ projectId }) is invoked.
2. scanner.ts walks folderPath:
- Filter by whitelist (text exts + .png/.jpg).
- Apply size cap (1 MB / file).
- Compute mtime delta vs projectFolderFiles.
- Return { newFiles[], changedFiles[], deletedFiles[] }.
3. Backend pre-flight: POST /api/v1/billing/quota/check { feature: 'folder_index', estimated_files: N }:
- Rejects with 402 if folder_max_files is exceeded for the user's tier.
- Rejects with 402 if folder_monthly_tokens is already exhausted.
4. Send index_session_start over WS.
5. For each batch of up to 5 files:
- Read content (text) or base64-encode (image).
- Send index_file_batch.
- Await one index_file_result per file in the batch.
- Upsert the projectFolderFiles row with the returned summary.
- Backend atomically increments MonthlyTokenUsage and writes a row in AgentRunLog with tokens_used.
6. On index_session_done (sent by the backend), update projects.folderLastScannedAt, .folderTotalFiles, .folderLastScanStatus = 'idle'.
7. Delete projectFolderFiles rows for deletedFiles.
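
The mtime delta in the scanner step can be sketched as a comparison of two path-to-mtime maps. This is a Python illustration of logic that would actually live in `files/scanner.ts`; `ScanDelta` and `compute_delta` are hypothetical names, and using inequality rather than "newer than" is an assumption so that restored backups with older timestamps are also re-indexed.

```python
from typing import Dict, List, NamedTuple


class ScanDelta(NamedTuple):
    new_files: List[str]
    changed_files: List[str]
    deleted_files: List[str]


def compute_delta(on_disk: Dict[str, int], indexed: Dict[str, int]) -> ScanDelta:
    """Compare {relativePath: mtimeMs} from the walk with the stored manifest rows."""
    # On disk but never indexed.
    new_files = sorted(p for p in on_disk if p not in indexed)
    # Indexed, but the stored mtime no longer matches the file system.
    changed_files = sorted(
        p for p in on_disk if p in indexed and on_disk[p] != indexed[p]
    )
    # Indexed, but gone from disk (or no longer whitelisted).
    deleted_files = sorted(p for p in indexed if p not in on_disk)
    return ScanDelta(new_files, changed_files, deleted_files)
```

Only `new_files` and `changed_files` need to be sent for summarization, which is what lets an interrupted scan pick up where it left off without a resume protocol.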

Backend (`core/folder_indexer.py`)

summarize_text(content, ext) → (summary, tokens) via gpt-4o-mini, Langfuse prompt folder_file_summary_text.
summarize_image(b64) → (summary, tokens) via gpt-4o-mini vision, Langfuse prompt folder_file_summary_image.
After each summarization, atomically increment MonthlyTokenUsage(user_id, year_month, 'folder_index', +tokens). If the increment would exceed cap, the call returns a quota_exceeded error in index_file_result, and the session sends index_session_done(status='quota_exceeded').
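
One possible shape for the cap-checked increment is sketched below. It uses an in-memory SQLite database purely as a stand-in for Postgres so the example is self-contained; in Postgres the same effect would more likely come from a single upsert or a row lock. `record_tokens` and `open_demo_db` are hypothetical helpers, -1 is assumed to mean "unlimited", and the sketch assumes that an increment which would exceed the cap is not recorded (the spec does not say either way).

```python
import sqlite3

UNLIMITED = -1  # assumption: -1 means "no cap", as in the tier matrix


def open_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the monthly_token_usage table."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
    conn.execute(
        "CREATE TABLE monthly_token_usage ("
        " user_id TEXT NOT NULL, year_month TEXT NOT NULL, feature TEXT NOT NULL,"
        " tokens_used INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (user_id, year_month, feature))"
    )
    return conn


def record_tokens(conn: sqlite3.Connection, user_id: str, year_month: str,
                  feature: str, tokens: int, cap: int) -> bool:
    """Add `tokens` to the counter; return False if that would exceed `cap`."""
    key = (user_id, year_month, feature)
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT tokens_used FROM monthly_token_usage"
            " WHERE user_id = ? AND year_month = ? AND feature = ?",
            key,
        ).fetchone()
        current = row[0] if row else 0
        allowed = cap == UNLIMITED or current + tokens <= cap
        if allowed and row is None:
            conn.execute(
                "INSERT INTO monthly_token_usage VALUES (?, ?, ?, ?)",
                key + (tokens,),
            )
        elif allowed:
            conn.execute(
                "UPDATE monthly_token_usage SET tokens_used = tokens_used + ?"
                " WHERE user_id = ? AND year_month = ? AND feature = ?",
                (tokens,) + key,
            )
        conn.execute("COMMIT")
        return allowed
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

A `False` return would correspond to the `quota_exceeded` error in `index_file_result`. Because `year_month` is part of the key, a new month starts from an empty counter without any explicit reset job.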

Rescan triggers

Manual button → tRPC projectFolders.startScan mutation.
On-demand mtime check → inside read_project_folder_manifest drizzle-executor action: if any tracked mtime is stale, fire-and-forget startScan before returning the current manifest.
Daily auto → app.on('ready') iterates user projects; if folderLastScannedAt < now − 24h and folderPath != null, queue startScan.

projects.folderLastScanStatus === 'scanning' blocks new scan triggers (manual button disabled, daily auto + mtime on-demand both skip).
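
The daily-auto condition and the scanning guard combine into a small predicate. The sketch below is in Python for illustration (the real check would sit in `files/daily-rescan.ts`); `should_auto_scan` is a hypothetical name, and treating a linked but never-scanned folder as stale is an assumption.

```python
from typing import Optional

DAY_MS = 24 * 60 * 60 * 1000


def should_auto_scan(folder_path: Optional[str], last_scanned_at: Optional[int],
                     scan_status: str, now_ms: int) -> bool:
    """Decide whether the daily auto trigger should queue startScan."""
    if folder_path is None:
        return False  # no folder linked
    if scan_status == "scanning":
        return False  # concurrent scan guard
    if last_scanned_at is None:
        return True   # assumption: never scanned counts as stale
    return last_scanned_at < now_ms - DAY_MS
```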

Agent Integration

Manifest pre-injection

In core/deep_agent.py, every agent run that has a resolved projectId builds a compact manifest block and prepends it to the system prompt:

<linked_folder>
path: D:\Clients\Acme\Brand  (214 files, scanned 2h ago)
files:
- briefs/kickoff.md        [text]  Project kickoff notes; scope, stakeholders, deadlines
- logos/logo-v3.png        [image] Final logo, golden-yellow palette on white
- research/competitor.pdf  [pdf]   Competitor brand audit, 12 entries
...
</linked_folder>

Format: relativePath [kind] summary. If the rendered block exceeds ~3000 tokens, truncate to the top N files by mtimeMs DESC and append:

… {M} more files omitted, use read_project_folder_file to access by path
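
A minimal sketch of the rendering and truncation rule follows. It assumes a crude estimate of about 4 characters per token in place of a real tokenizer, and it assumes each manifest entry carries `mtimeMs` for sorting, even though the `read_project_folder_manifest` return shape in this spec lists only `relPath`, `kind`, and `summary`. `estimate_tokens` and `render_manifest` are hypothetical names, and the "scanned 2h ago" part of the header is omitted for brevity.

```python
from typing import Dict, List

TOKEN_BUDGET = 3000  # "~3000 tokens" from the spec


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: about 4 characters per token (assumption)."""
    return (len(text) + 3) // 4


def render_manifest(path: str, files: List[Dict], budget: int = TOKEN_BUDGET) -> str:
    """Render the <linked_folder> block, keeping the newest files that fit."""
    newest_first = sorted(files, key=lambda f: f["mtimeMs"], reverse=True)
    header = f"<linked_folder>\npath: {path}  ({len(files)} files)\nfiles:"
    footer = "</linked_folder>"

    lines: List[str] = []
    used = estimate_tokens(header) + estimate_tokens(footer)
    for f in newest_first:
        line = f"- {f['relPath']} [{f['kind']}] {f['summary']}"
        cost = estimate_tokens(line)
        if used + cost > budget:
            break  # everything older than this entry is omitted
        lines.append(line)
        used += cost

    omitted = len(files) - len(lines)
    if omitted:
        lines.append(
            f"… {omitted} more files omitted, "
            "use read_project_folder_file to access by path"
        )
    return "\n".join([header, *lines, footer])
```

The hint line itself is not counted against the budget here, which seems acceptable given that the limit is approximate.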

The backend pulls the manifest via the new drizzle-executor action:

action: read_project_folder_manifest
data: { projectId }
returns: { folderPath, lastScannedAt, files: [{relPath, kind, summary}] }

projectId resolution per agent

run_task_brief_research_stream — task.projectId.
run_home — null unless the user message is project-scoped (via @project mention or active project context passed from renderer).
run_brief — backend cannot enumerate projects directly because projects live in the local SQLite. It calls a new execute_on_client action list_projects_with_folder_manifests that returns [{ projectId, projectName, folderPath, lastScannedAt, files: [{relPath, kind, summary}] }] for every project that has a linked folder. The backend then builds a multi-project compact manifest (top 5 most-recently-modified files per project).

New scoped tool (`agents/folder_agent.py`)

@tool
async def read_project_folder_file(project_id: str, relative_path: str) -> str:
    """Read full content of a file inside the project's linked folder."""
    result = await execute_on_client(
        action="read_project_folder_file",
        data={"projectId": project_id, "relativePath": relative_path},
    )
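    content = result.get("content")
    # An empty file ("") is valid content; only a missing "content" key means not found.
    return content if content is not None else f"File not found: {relative_path}"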

Backed by a new drizzle-executor action that:

Looks up projects.folderPath for the projectId.
Resolves path.join(folderPath, relativePath) with traversal guard (.. and absolute paths rejected).
Reads the file via the existing fs helpers. Image → returns base64. Text → returns content (size-capped).
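
The traversal guard in the second step could look roughly like the sketch below. It is written in Python for illustration, whereas the real guard would live in the TypeScript drizzle-executor action; `resolve_inside` is a hypothetical name. Beyond the two rejections the spec names, the final containment check after resolving is an added assumption intended to catch symlinks that point outside the folder.

```python
from pathlib import Path, PureWindowsPath


def resolve_inside(folder_path: str, relative_path: str) -> Path:
    """Resolve relative_path under folder_path, or raise PermissionError."""
    # PureWindowsPath accepts both "/" and "\" separators on any host OS.
    rel = PureWindowsPath(relative_path)

    # Reject absolute, rooted, drive-relative, and UNC paths.
    if rel.drive or rel.root:
        raise PermissionError("Access denied")

    # Reject any ".." segment outright instead of normalizing it away.
    if ".." in rel.parts:
        raise PermissionError("Access denied")

    root = Path(folder_path).resolve()
    target = root.joinpath(*rel.parts).resolve()

    # Assumption beyond the spec: after resolving symlinks, the target
    # must still be the root itself or sit somewhere beneath it.
    if target != root and root not in target.parents:
        raise PermissionError("Access denied")
    return target
```

Raising on the first violation matches the "hard fail, no fallback" behaviour described under Error Handling; the caller would translate the exception into the "Access denied" tool result.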

The existing journey-only FILESYSTEM_TOOLS are not added to home/brief/task-brief; only the new scoped tool is bound.

UI

Hero chip (`ProjectDetail.tsx`)

<FolderChip
  projectId={project.id}
  folderPath={project.folderPath}
  totalFiles={project.folderTotalFiles}
  lastScannedAt={project.folderLastScannedAt}
  scanStatus={project.folderLastScanStatus}
  onClick={() => scrollToTab('files')}
/>

States:

Unlinked: dashed pill "📁 Link folder" + Sparkles icon.
Linked idle: "📁 214 files · 2h ago" with soft golden-yellow background.
Scanning: "📁 indexing 47/214" + spinner.
Error: "📁 Scan failed" red-tinted; click → Files tab.

Files tab

Add 'files' to SECTIONS in ProjectTabBar.tsx. The tab body:

┌──────────────────────────────────────────────────┐
│  Linked folder                                   │
│  ┌────────────────────────────────────────────┐  │
│  │ 📁 D:\Clients\Acme\Brand          [⋯ menu] │  │
│  │ 214 files · last scanned 2h ago            │  │
│  │ [Rescan]  [Unlink]                         │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  Files (filter: [All] [Text] [Images] [PDF])     │
│  ┌────────────────────────────────────────────┐  │
│  │ briefs/kickoff.md                          │  │
│  │   Project kickoff notes; scope, deadlines  │  │
│  │ logos/logo-v3.png                          │  │
│  │   Final logo, golden-yellow on white       │  │
│  │ ...                                        │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘

Empty state (no folder linked)

<Empty>
  Sparkles
  Link a project folder
  Connect a local folder so AI agents can read its files
  when answering questions about this project.
  [Choose folder...]   ← opens Electron dialog.showOpenDialog
</Empty>

New components (`src/renderer/components/projects/folder/`)

FolderChip.tsx
FilesSection.tsx (mounts inside ProjectDetail)
FolderLinkCard.tsx (path + actions)
FolderFileList.tsx (virtualized list of manifest entries)
FolderUnlinkDialog.tsx

Platform gating

Feature is Electron-only. Wrap entry points in platform.isElectron. On the web SPA, the Files tab renders disabled with a tooltip "Folder linking available in desktop app".

Folder dialog

New tRPC projectFolders.chooseFolder mutation invokes dialog.showOpenDialog({ properties: ['openDirectory'] }) in the main process and returns the selected path.

i18n

Add projects.folder.* keys (title, link CTA, browse, rescan, unlink, status strings, empty state copy, error toasts) to all 5 locale JSON files: en, it, es, fr, de.

Error Handling

Quota exhaustion

Pre-flight 402 → toast "Folder too big for {tier} plan — max {N} files" or "Monthly token budget exhausted (resets {date})". Folder not linked.
Mid-scan quota_exceeded frame → partial manifest kept, scan marked error, toast as above, banner in Files tab "Indexing paused — quota exhausted".

Path errors

Folder no longer exists at scan start → tRPC throws → toast "Folder not found: {path}". folderLastScanStatus = 'error'. User offered Unlink or Re-link.
Permission denied on a file during scan → file skipped, logged in projectFolderFiles with kind='skipped', no summary. Skipped files appear greyed in the Files tab.
Path traversal attempt in read_project_folder_file (relativePath contains .. or is absolute) → tool returns "Access denied"; backend logs a warning. Hard fail, no fallback.

Network / WS failures

WS drop mid-scan: the in-flight session is abandoned server-side and the local folderLastScanStatus is flipped from 'scanning' to 'error'. The next trigger (manual rescan, daily auto, or the next on-demand mtime check) starts a new session; because the scanner's mtime delta only re-summarizes files whose mtimeMs changed (or that have no row yet), already-indexed files are skipped naturally — there is no explicit session-resume protocol.
Backend 5xx on summarize → file marked kind='error', retried in the next rescan, not auto-retried inline.

File-type fallbacks

PDF parse fails (corrupt) → skipped, kind='skipped' with summary='Could not extract text'.
Image too large (>5 MB) → skipped with reason. Cap is a constant in files/scanner.ts.
DOCX or other unsupported types → skipped silently with extension noted.

Concurrent scan guard

projects.folderLastScanStatus === 'scanning' blocks new scan triggers. Manual button shows "Scanning..." disabled; daily auto + mtime on-demand both check the status flag first.

Manifest size overflow

If the agent's pre-injected <linked_folder> block would exceed ~3000 tokens, the backend truncates to the top N files by mtimeMs DESC and appends an "M more files omitted" hint.

Tool call on unlinked project

read_project_folder_file when folderPath === null returns "No folder linked to project {projectId}". The agent can recover and answer without folder context.

Testing

API (`api/tests/`)

File	Coverage
`test_folder_indexer.py`	`summarize_text` / `summarize_image` happy path, token recording, Langfuse prompt linking
`test_folder_quota.py`	Pre-flight 402 rejects (max_files + monthly_tokens), atomic increment + `quota_exceeded` mid-stream, monthly reset at `year_month` rollover
`test_ws_index_session.py`	Session lifecycle, cancel mid-stream, abandoned-on-disconnect (next scan skips already-indexed files via mtime delta), bad batch payload validation
`test_folder_agent_tool.py`	`read_project_folder_file` happy path, unlinked project, traversal guard (`../`, absolute)
`test_manifest_injection.py`	`<linked_folder>` block formatting, truncation past 3k tokens, multi-project brief manifest, null projectId skips injection

Reuse fixtures in tests/conftest.py and WS test helpers (ws_unified already covers session lifecycle).

Electron / adiuvAI

No automated test suite currently. Manual smoke checks during development:

Link folder → manifest populated → unlink → manifest rows deleted.
Scan a synthetic dir with mixed text/image/binary → only whitelisted indexed.
mtime delta: change one file, rescan only re-indexes that file.
Disconnect WS mid-scan → status flips to 'error'; next manual rescan re-indexes only the remaining files (mtime delta).

Eval (Langfuse)

Build a test set of 10 representative folders (mix of markdown, code, PDFs, images). Score summary quality (LLM-as-judge) and token efficiency. Link scores to the prompt version per the existing LOCAL_AGENT_V2_PLAN.md pattern.

Out of scope (this spec)

Phase-2 wiki tier (per-folder + per-file structured summaries).
Multi-folder per project.
Web SPA support.
Token-usage display UI (Settings page comes later).
File editing from inside adiuvAI.
Live file watcher (chokidar). Daily + manual + on-demand mtime is enough for now.

Open questions (none)

All resolved during brainstorming.
