Approved design for linking a local folder to a project: lightweight manifest with per-file LLM summaries, WS-streamed indexing pipeline, pre-injected manifest for Home/Brief/Task-Brief agents, tier-gated token+file-count quota recorded backend-side.
Project Folder Integration — Design
Date: 2026-05-11
Status: Approved (brainstorming complete)
Author: Roberto + Claude
Goal
Let users link a local (or shared-PC) folder to an adiuvAI project. adiuvAI scans the folder, generates per-file summaries via LLM, and exposes the resulting manifest to the Home, Brief, and Task-Brief agents so they can answer project questions with awareness of the user's local files.
Non-goals
- Multi-folder linking per project (deferred — 1 folder per project for now).
- Full-text RAG over file contents (we use lightweight per-file summaries instead).
- File editing from inside adiuvAI (read-only).
- Token-usage display in the project UI (recorded backend-side; dedicated Settings page comes later).
- Web SPA support (Electron-only — web SPA has no filesystem access).
Strategy
Hybrid AI File System (manifest first, optional wiki tier later).
Phase 1 (this spec): build a lightweight manifest that records, for each indexable file, (relativePath, kind, size, mtime, 1-line LLM summary). The agent receives the manifest pre-injected into its system prompt and reads full file contents lazily via a scoped tool.
Phase 2 (future, out of scope): for folders above N files or on user opt-in, generate per-folder + per-file wiki summaries written to a structured index. Not built now.
Architecture
┌─────────────────────── adiuvAI (Electron) ───────────────────────┐
│ Renderer (React) │
│ • Project hero: <FolderChip> (status glance) │
│ • <FilesTab>: link/unlink, browse, rescan, progress, browser │
│ • <FolderBrowser>: tree view of manifest │
│ │
│ Main (Node) │
│ • db/schema.ts: +projectFolderFiles, +projects.folderPath │
│ • files/scanner.ts: walk + filter + mtime delta │
│ • files/indexer.ts: orchestrates WS index session │
│ • files/daily-rescan.ts: 24h-stale check on app start │
│ • router/projectFolders.ts: tRPC procedures │
│ • api/backend-client.ts: +sendIndexBatch frame │
│ • api/drizzle-executor.ts: +read_project_folder_manifest, │
│ +read_project_folder_file actions │
└───────────────────────────────────────────────────────────────────┘
│ /api/v1/device WS
▼
┌────────────────────────── api (FastAPI) ─────────────────────────┐
│ device_ws.py: +index_file_batch / +index_file_result frames │
│ core/folder_indexer.py: summarize text / vision per file │
│ core/deep_agent.py: pre-inject manifest when project context set │
│ agents/folder_agent.py: scoped read_project_folder_file tool │
│ billing/tier_manager.py: +folder_max_files, +folder_monthly_tokens│
│ models.py: +AgentRunLog.tokens_used, +MonthlyTokenUsage table │
└───────────────────────────────────────────────────────────────────┘
Privacy invariant: file content travels to the backend only transiently — for summarization — and is never persisted there. Summaries and manifest entries live in the local SQLite database. Token usage is recorded backend-side because it gates the user's tier quota.
Decisions
| Topic | Decision |
|---|---|
| Retrieval strategy | Hybrid: manifest first, optional wiki tier later (phase 2, out of scope) |
| File scope | Text whitelist (.md, .txt, .pdf, .docx, .csv, code) + images (.png/.jpg) summarized via gpt-4o-mini vision |
| Cardinality | One folder per project |
| Rescan triggers | Manual button + daily auto (24h staleness check on app start) + on-demand mtime delta when manifest is read |
| Rate-limit metric | Tokens-per-month per user and total file-count cap per folder, both tier-gated |
| Indexing pipeline | WS streaming over existing /api/v1/device with new frame types |
| Agent access | Pre-inject manifest into system prompt; lazy reads via scoped read_project_folder_file tool |
| UI placement | Hero chip + dedicated "Files" tab in ProjectTabBar |
| Platform | Electron-only (web SPA: tab disabled) |
| Token-usage display | Out of scope (record backend-side, surface in Settings later) |
Schema Changes
adiuvAI local SQLite (src/main/db/schema.ts)
Extend projects:
projects: {
  // existing columns...
  folderPath: text('folder_path'),                         // nullable absolute path or UNC
  folderLastScannedAt: integer('folder_last_scanned_at'),  // ms, nullable
  folderLastScanStatus: text('folder_last_scan_status'),   // 'idle' | 'scanning' | 'error'
  folderTotalFiles: integer('folder_total_files').default(0),
}
New table projectFolderFiles:
projectFolderFiles: {
  id: text('id').primaryKey(),
  projectId: text('project_id').notNull(),        // FK projects.id (no DB constraint per convention)
  relativePath: text('relative_path').notNull(),  // path relative to folderPath
  ext: text('ext').notNull(),                     // '.md', '.png', ...
  kind: text('kind').notNull(),                   // 'text' | 'image' | 'pdf' | 'docx' | 'skipped' | 'error'
  sizeBytes: integer('size_bytes').notNull(),
  mtimeMs: integer('mtime_ms').notNull(),
  summary: text('summary'),                       // nullable, ≤500 chars
  summaryUpdatedAt: integer('summary_updated_at'),
  // Unique index: (projectId, relativePath)
}
api Postgres (alembic migration)
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import UUID

# Per-run token accounting on the existing log table.
op.add_column(
    'agent_run_logs',
    sa.Column('tokens_used', sa.Integer(), nullable=False, server_default='0'),
)
# One counter row per (user, month, feature).
op.create_table(
    'monthly_token_usage',
    sa.Column('user_id', UUID(as_uuid=False), sa.ForeignKey('users.id', ondelete='CASCADE'), nullable=False),
    sa.Column('year_month', sa.String(7), nullable=False),   # 'YYYY-MM'
    sa.Column('feature', sa.String(64), nullable=False),     # 'folder_index'
    sa.Column('tokens_used', sa.Integer(), nullable=False, server_default='0'),
    sa.PrimaryKeyConstraint('user_id', 'year_month', 'feature'),
)
Tier matrix (app/billing/tier_manager.py)
| Feature | Free | Pro | Power | Team |
|---|---|---|---|---|
| folder_max_files | 200 | 5000 | -1 | -1 |
| folder_monthly_tokens | 100k | 2M | -1 | -1 |
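The tier matrix can be read as a lookup table feeding the pre-flight quota check. The sketch below is illustrative only: it assumes that -1 means "no cap", that 100k and 2M mean 100,000 and 2,000,000 tokens, and it uses hypothetical names (TIER_LIMITS, QuotaDecision, check_folder_quota) that do not come from tier_manager.py.

```python
from dataclasses import dataclass
from typing import Optional

UNLIMITED = -1  # assumption: -1 in the tier matrix means "no cap"

# Values copied from the tier matrix; the dict layout is hypothetical.
TIER_LIMITS = {
    "free": {"folder_max_files": 200, "folder_monthly_tokens": 100_000},
    "pro": {"folder_max_files": 5000, "folder_monthly_tokens": 2_000_000},
    "power": {"folder_max_files": UNLIMITED, "folder_monthly_tokens": UNLIMITED},
    "team": {"folder_max_files": UNLIMITED, "folder_monthly_tokens": UNLIMITED},
}


@dataclass
class QuotaDecision:
    allowed: bool
    reason: Optional[str] = None  # name of the limit that blocked the request


def check_folder_quota(tier: str, estimated_files: int, tokens_used_this_month: int) -> QuotaDecision:
    """Decide whether a scan may start, given the tier and current usage."""
    limits = TIER_LIMITS[tier]

    max_files = limits["folder_max_files"]
    if max_files != UNLIMITED and estimated_files > max_files:
        return QuotaDecision(False, "folder_max_files")

    max_tokens = limits["folder_monthly_tokens"]
    if max_tokens != UNLIMITED and tokens_used_this_month >= max_tokens:
        return QuotaDecision(False, "folder_monthly_tokens")

    return QuotaDecision(True)
```

A denied decision would map to the HTTP 402 response of the quota check endpoint, with reason selecting which toast the client shows.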
Indexing Pipeline
New WS frame types on /api/v1/device
| Direction | Frame | Payload |
|---|---|---|
| C → S | index_session_start | { sessionId, projectId, totalFiles } |
| C → S | index_file_batch | { sessionId, files: [{relPath, kind, content/imageB64, sizeBytes, mtimeMs}] } (batches of 5) |
| S → C | index_file_result | { sessionId, relPath, summary, tokensUsed, error? } |
| S → C | index_session_progress | { sessionId, processed, total } |
| C → S | index_session_cancel | { sessionId } |
| S → C | index_session_done | { sessionId, status: 'completed' \| 'cancelled' \| 'quota_exceeded' \| 'error' } |
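To make the batching concrete, here is a small sketch of how a list of file entries could be split into index_file_batch frames of at most 5 files. It is written in Python purely as a language-neutral illustration (the client side is TypeScript). The "type" discriminator field and the function name iter_index_batches are assumptions, not part of the spec.

```python
from typing import Any, Dict, Iterator, List

BATCH_SIZE = 5  # batch size stated in the frame table


def iter_index_batches(
    session_id: str,
    files: List[Dict[str, Any]],
    batch_size: int = BATCH_SIZE,
) -> Iterator[Dict[str, Any]]:
    """Yield index_file_batch frames holding at most batch_size files each."""
    for start in range(0, len(files), batch_size):
        yield {
            "type": "index_file_batch",  # assumption: frames carry a type discriminator
            "sessionId": session_id,
            "files": files[start:start + batch_size],
        }
```

With 12 files this yields three frames carrying 5, 5, and 2 entries, so the last batch may be shorter than the nominal size.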
Flow (Electron files/indexer.ts)
- tRPC projectFolders.startScan({ projectId }).
- scanner.ts walks folderPath:
  - Filter by whitelist (text exts + .png/.jpg).
  - Apply size cap (1 MB / file).
  - Compute mtime delta vs projectFolderFiles.
  - Returns { newFiles[], changedFiles[], deletedFiles[] }.
- Backend pre-flight: POST /api/v1/billing/quota/check { feature: 'folder_index', estimated_files: N }:
  - Rejects 402 if folder_max_files exceeded for the user's tier.
  - Rejects 402 if folder_monthly_tokens already exhausted.
- Open index_session_start over WS.
- For each batch of 5 files:
  - Read content (text) or base64-encode (image).
  - Send index_file_batch.
  - Await index_file_result × 5.
  - Upsert projectFolderFiles row with the returned summary.
  - Backend atomically increments MonthlyTokenUsage and writes a row in AgentRunLog with tokens_used.
- Send index_session_done. Update projects.folderLastScannedAt, .folderTotalFiles, .folderLastScanStatus = 'idle'.
- Delete projectFolderFiles rows for deletedFiles.
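The mtime delta step of the flow above can be sketched as a pure function over two maps of relativePath to mtimeMs: one from the walk on disk, one from the projectFolderFiles rows. This is a Python illustration of the logic only; the real scanner lives in files/scanner.ts, and the names ScanDelta and compute_delta are hypothetical.

```python
from typing import Dict, List, NamedTuple


class ScanDelta(NamedTuple):
    new_files: List[str]
    changed_files: List[str]
    deleted_files: List[str]


def compute_delta(on_disk: Dict[str, int], tracked: Dict[str, int]) -> ScanDelta:
    """Compare relativePath -> mtimeMs maps from disk and from the manifest."""
    # On disk but not yet in the manifest.
    new_files = sorted(p for p in on_disk if p not in tracked)
    # In both places, but the modification time differs.
    changed_files = sorted(
        p for p, mtime in on_disk.items() if p in tracked and tracked[p] != mtime
    )
    # In the manifest but gone from disk.
    deleted_files = sorted(p for p in tracked if p not in on_disk)
    return ScanDelta(new_files, changed_files, deleted_files)
```

Only new_files and changed_files need to be sent for summarization; deleted_files feeds the final row cleanup.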
Backend (core/folder_indexer.py)
- summarize_text(content, ext) → (summary, tokens) via gpt-4o-mini, Langfuse prompt folder_file_summary_text.
- summarize_image(b64) → (summary, tokens) via gpt-4o-mini vision, Langfuse prompt folder_file_summary_image.
- After each summarization, atomically increment MonthlyTokenUsage(user_id, year_month, 'folder_index', +tokens). If the increment would exceed cap, the call returns a quota_exceeded error in index_file_result, and the session sends index_session_done(status='quota_exceeded').
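One way to get an atomic, cap-aware increment is a single conditional UPDATE, so that two concurrent summarizations cannot both slip past the cap. The sketch below uses an in-memory SQLite database only so that it runs standalone; the production table is in Postgres. The function name try_add_tokens, the convention that a negative cap means unlimited, and the choice not to record tokens when the increment is refused are all assumptions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the monthly_token_usage table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE monthly_token_usage (
               user_id TEXT NOT NULL,
               year_month TEXT NOT NULL,
               feature TEXT NOT NULL,
               tokens_used INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (user_id, year_month, feature))"""
    )
    return conn


def try_add_tokens(conn: sqlite3.Connection, user_id: str, year_month: str,
                   feature: str, tokens: int, cap: int) -> bool:
    """Add tokens to the counter unless that would exceed cap (cap < 0: no cap)."""
    with conn:  # one transaction: commit on success, roll back on error
        # Make sure the counter row exists; a new month starts at zero.
        conn.execute(
            "INSERT OR IGNORE INTO monthly_token_usage "
            "(user_id, year_month, feature, tokens_used) VALUES (?, ?, ?, 0)",
            (user_id, year_month, feature),
        )
        # The cap check and the increment happen in the same statement.
        cur = conn.execute(
            """UPDATE monthly_token_usage
               SET tokens_used = tokens_used + ?
               WHERE user_id = ? AND year_month = ? AND feature = ?
                 AND (? < 0 OR tokens_used + ? <= ?)""",
            (tokens, user_id, year_month, feature, cap, tokens, cap),
        )
    return cur.rowcount == 1  # False would map to quota_exceeded
```

Because year_month is part of the primary key, the monthly reset needs no cleanup job: the first write in a new month simply creates a fresh row.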
Rescan triggers
- Manual button → tRPC projectFolders.startScan mutation.
- On-demand mtime check → inside read_project_folder_manifest drizzle-executor action: if any tracked mtime is stale, fire-and-forget startScan before returning the current manifest.
- Daily auto → app.on('ready') iterates user projects; if folderLastScannedAt < now − 24h and folderPath != null, queue startScan.
projects.folderLastScanStatus === 'scanning' blocks new scan triggers (manual button disabled, daily auto + mtime on-demand both skip).
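The daily trigger and the scan guard combine into one small predicate. The sketch below is a Python illustration of that decision (the real check belongs in files/daily-rescan.ts). The function name is hypothetical, and treating a linked folder that has never been scanned as stale is an assumption the spec does not state.

```python
from typing import Optional

DAY_MS = 24 * 60 * 60 * 1000  # 24h staleness window, in milliseconds


def should_queue_daily_rescan(
    folder_path: Optional[str],
    last_scanned_at_ms: Optional[int],
    scan_status: str,
    now_ms: int,
) -> bool:
    """Return True when the app-start check should queue startScan."""
    if folder_path is None:
        return False  # no folder linked
    if scan_status == "scanning":
        return False  # concurrent scan guard
    if last_scanned_at_ms is None:
        return True  # assumption: linked but never scanned counts as stale
    return last_scanned_at_ms < now_ms - DAY_MS
```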
Agent Integration
Manifest pre-injection
In core/deep_agent.py, every agent run that has a resolved projectId builds a compact manifest block and prepends it to the system prompt:
<linked_folder>
path: D:\Clients\Acme\Brand (214 files, scanned 2h ago)
files:
- /briefs/kickoff.md [text] Project kickoff notes; scope, stakeholders, deadlines
- /logos/logo-v3.png [image] Final logo, golden-yellow palette on white
- /research/competitor.pdf [pdf] Competitor brand audit, 12 entries
...
</linked_folder>
Format: relativePath [kind] summary. If the rendered block exceeds ~3000 tokens, truncate to the top N files by mtimeMs DESC and append:
… {M} more files omitted, use read_project_folder_file to access by path
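The rendering and truncation rule can be sketched as follows. Several things here are assumptions rather than spec: the 4-characters-per-token estimate stands in for a real tokenizer, each entry is assumed to carry mtimeMs (the truncation order needs it), all entries are sorted by recency even when nothing is truncated, and the function names are hypothetical.

```python
from typing import Any, Dict, List

TOKEN_BUDGET = 3000  # "~3000 tokens" from the text above
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count by rounding up len(text) / CHARS_PER_TOKEN."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def render_linked_folder(header: str, files: List[Dict[str, Any]],
                         budget: int = TOKEN_BUDGET) -> str:
    """Render the <linked_folder> block, keeping the most recent files that fit."""
    ordered = sorted(files, key=lambda f: f["mtimeMs"], reverse=True)
    lines = [f"- {f['relPath']} [{f['kind']}] {f['summary']}" for f in ordered]

    def build(n: int) -> str:
        body = lines[:n]
        omitted = len(lines) - n
        if omitted:
            body = body + [
                f"… {omitted} more files omitted, "
                "use read_project_folder_file to access by path"
            ]
        return "\n".join(["<linked_folder>", header, "files:", *body, "</linked_folder>"])

    # Drop the oldest entries one at a time until the block fits the budget.
    n = len(lines)
    while n > 0 and estimate_tokens(build(n)) > budget:
        n -= 1
    return build(n)
```

The loop rebuilds the block on every step, which is fine for a sketch; a production version would more likely accumulate line lengths once.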
The backend pulls the manifest via the new drizzle-executor action:
action: read_project_folder_manifest
data: { projectId }
returns: { folderPath, lastScannedAt, files: [{relPath, kind, summary}] }
projectId resolution per agent
- run_task_brief_research_stream: task.projectId.
- run_home: null unless the user message is project-scoped (via @project mention or active project context passed from renderer).
- run_brief: the backend cannot enumerate projects directly because projects live in the local SQLite. It calls a new execute_on_client action list_projects_with_folder_manifests that returns [{ projectId, projectName, folderPath, lastScannedAt, files: [{relPath, kind, summary}] }] for every project that has a linked folder. The backend then builds a multi-project compact manifest (top 5 most-recently-modified files per project).
New scoped tool (agents/folder_agent.py)
@tool
async def read_project_folder_file(project_id: str, relative_path: str) -> str:
    """Read full content of a file inside the project's linked folder."""
    # Delegate the read to the Electron client, which owns the filesystem.
    result = await execute_on_client(
        action="read_project_folder_file",
        data={"projectId": project_id, "relativePath": relative_path},
    )
    # A missing or empty "content" field falls through to the not-found message.
    return result.get("content", "") or f"File not found: {relative_path}"
Backed by a new drizzle-executor action that:
- Looks up projects.folderPath for the projectId.
- Resolves path.join(folderPath, relativePath) with traversal guard (.. and absolute paths rejected).
- Reads the file via the existing fs helpers. Image → returns base64. Text → returns content (size-capped).
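The traversal guard can be sketched as below. This is a Python illustration of the checks only, since the real action runs in the Electron main process with Node's path module. Two points go beyond the spec and are assumptions: the final containment check after resolving (a defensive step against symlinks that point outside the folder), and the expectation that manifest paths are stored without a leading slash, as in the Files tab mock-up. The function names are hypothetical.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def is_safe_relative(relative_path: str) -> bool:
    """Reject empty paths, absolute paths (POSIX, drive-letter, UNC), and '..' segments."""
    if not relative_path:
        return False
    normalized = relative_path.replace("\\", "/")
    if PurePosixPath(normalized).is_absolute():
        return False  # "/etc/passwd", "//server/share/x"
    if PureWindowsPath(relative_path).is_absolute():
        return False  # "C:\\Windows\\x"
    return ".." not in PurePosixPath(normalized).parts


def resolve_inside(folder_path: str, relative_path: str) -> Path:
    """Resolve relative_path under folder_path or raise PermissionError."""
    if not is_safe_relative(relative_path):
        raise PermissionError("Access denied")
    root = Path(folder_path).resolve()
    parts = PurePosixPath(relative_path.replace("\\", "/")).parts
    target = root.joinpath(*parts).resolve()
    # Defensive extra check (assumption): the resolved target must stay under root.
    if root not in target.parents:
        raise PermissionError("Access denied")
    return target
```

Raising instead of returning a fallback path matches the "hard fail, no fallback" stance taken for traversal attempts in the error-handling section.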
The existing journey-only FILESYSTEM_TOOLS are not added to home/brief/task-brief; only the new scoped tool is bound.
UI
Hero chip (ProjectDetail.tsx)
<FolderChip
  projectId={project.id}
  folderPath={project.folderPath}
  totalFiles={project.folderTotalFiles}
  lastScannedAt={project.folderLastScannedAt}
  scanStatus={project.folderLastScanStatus}
  onClick={() => scrollToTab('files')}
/>
States:
- Unlinked: dashed pill "📁 Link folder" + Sparkles icon.
- Linked idle: "📁 214 files · 2h ago" with soft golden-yellow background.
- Scanning: "📁 indexing 47/214" + spinner.
- Error: "📁 Scan failed" red-tinted; click → Files tab.
Files tab
Add 'files' to SECTIONS in ProjectTabBar.tsx. The tab body:
┌──────────────────────────────────────────────────┐
│ Linked folder │
│ ┌────────────────────────────────────────────┐ │
│ │ 📁 D:\Clients\Acme\Brand [⋯ menu] │ │
│ │ 214 files · last scanned 2h ago │ │
│ │ [Rescan] [Unlink] │ │
│ └────────────────────────────────────────────┘ │
│ │
│ Files (filter: [All] [Text] [Images] [PDF]) │
│ ┌────────────────────────────────────────────┐ │
│ │ briefs/kickoff.md │ │
│ │ Project kickoff notes; scope, deadlines │ │
│ │ logos/logo-v3.png │ │
│ │ Final logo, golden-yellow on white │ │
│ │ ... │ │
│ └────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
Empty state (no folder linked)
<Empty>
Sparkles
Link a project folder
Connect a local folder so AI agents can read its files
when answering questions about this project.
[Choose folder...] ← opens Electron dialog.showOpenDialog
</Empty>
New components (src/renderer/components/projects/folder/)
- FolderChip.tsx
- FilesSection.tsx (mounts inside ProjectDetail)
- FolderLinkCard.tsx (path + actions)
- FolderFileList.tsx (virtualized list of manifest entries)
- FolderUnlinkDialog.tsx
Platform gating
Feature is Electron-only. Wrap entry points in platform.isElectron. On the web SPA, the Files tab renders disabled with a tooltip "Folder linking available in desktop app".
Folder dialog
New tRPC projectFolders.chooseFolder mutation invokes dialog.showOpenDialog({ properties: ['openDirectory'] }) in the main process and returns the selected path.
i18n
Add projects.folder.* keys (title, link CTA, browse, rescan, unlink, status strings, empty state copy, error toasts) to all 5 locale JSON files: en, it, es, fr, de.
Error Handling
Quota exhaustion
- Pre-flight 402 → toast "Folder too big for {tier} plan — max {N} files" or "Monthly token budget exhausted (resets {date})". Folder not linked.
- Mid-scan quota_exceeded frame → partial manifest kept, scan marked error, toast as above, banner in Files tab "Indexing paused — quota exhausted".
Path errors
- Folder no longer exists at scan start → tRPC throws → toast "Folder not found: {path}". folderLastScanStatus = 'error'. User offered Unlink or Re-link.
- Permission denied on a file during scan → file skipped, logged in projectFolderFiles with kind='skipped', no summary. Skipped files appear greyed in the Files tab.
- Path traversal attempt in read_project_folder_file (relativePath contains .. or is absolute) → tool returns "Access denied"; backend logs a warning. Hard fail, no fallback.
Network / WS failures
- WS drop mid-scan: the in-flight session is abandoned server-side and the local folderLastScanStatus is flipped from 'scanning' to 'error'. The next trigger (manual rescan, daily auto, or the next on-demand mtime check) starts a new session. Because the scanner's mtime delta only re-summarizes files whose mtimeMs changed (or that have no row yet), already-indexed files are skipped naturally, so no explicit session-resume protocol is needed.
- Backend 5xx on summarize → file marked kind='error', retried in the next rescan, not auto-retried inline.
File-type fallbacks
- PDF parse fails (corrupt) → skipped, kind='skipped' with summary='Could not extract text'.
- Image too large (>5 MB) → skipped with reason. Cap is a constant in files/scanner.ts.
- DOCX or other unsupported types → skipped silently with extension noted.
Concurrent scan guard
projects.folderLastScanStatus === 'scanning' blocks new scan triggers. Manual button shows "Scanning..." disabled; daily auto + mtime on-demand both check the status flag first.
Manifest size overflow
If the agent's pre-injected <linked_folder> block would exceed ~3000 tokens, the backend truncates to the top N files by mtimeMs DESC and appends an "M more files omitted" hint.
Tool call on unlinked project
read_project_folder_file when folderPath === null returns "No folder linked to project {projectId}". The agent can recover and answer without folder context.
Testing
API (api/tests/)
| File | Coverage |
|---|---|
| test_folder_indexer.py | summarize_text / summarize_image happy path, token recording, Langfuse prompt linking |
| test_folder_quota.py | Pre-flight 402 rejects (max_files + monthly_tokens), atomic increment + quota_exceeded mid-stream, monthly reset at year_month rollover |
| test_ws_index_session.py | Session lifecycle, cancel mid-stream, abandoned-on-disconnect (next scan skips already-indexed files via mtime delta), bad batch payload validation |
| test_folder_agent_tool.py | read_project_folder_file happy path, unlinked project, traversal guard (../, absolute) |
| test_manifest_injection.py | <linked_folder> block formatting, truncation past 3k tokens, multi-project brief manifest, null projectId skips injection |
Reuse fixtures in tests/conftest.py and WS test helpers (ws_unified already covers session lifecycle).
Electron / adiuvAI
No automated test suite currently. Manual smoke checks during development:
- Link folder → manifest populated → unlink → manifest rows deleted.
- Scan a synthetic dir with mixed text/image/binary → only whitelisted indexed.
- mtime delta: change one file, rescan only re-indexes that file.
- Disconnect WS mid-scan → status flips to 'error'; next manual rescan re-indexes only the remaining files (mtime delta).
Eval (Langfuse)
Build a test set of 10 representative folders (mix of markdown, code, PDFs, images). Score summary quality (LLM-as-judge) and token efficiency. Link scores to the prompt version per the existing LOCAL_AGENT_V2_PLAN.md pattern.
Out of scope (this spec)
- Phase-2 wiki tier (per-folder + per-file structured summaries).
- Multi-folder per project.
- Web SPA support.
- Token-usage display UI (Settings page comes later).
- File editing from inside adiuvAI.
- Live file watcher (chokidar). Daily + manual + on-demand mtime is enough for now.
Open questions (none)
All resolved during brainstorming.