Scouts Refactor + Gmail Integration — Design
Date: 2026-05-15 Status: Draft, awaiting user review Owner: Roberto
Summary
Rename the existing "Agents" subsystem to "Scouts" across the entire stack (UI, code, Postgres, SQLite, Langfuse), then add the first cloud scout — Gmail — using a two-stage pipeline that respects zero-trust (no email content stored on backend) and human-in-the-loop (no entities created autonomously).
The implementation is split into four phases. Phases 1–3 ship now. Phase 4 (Stage 2 categorization, HITL surface in the brief, conversion-to-entity mutations) is deferred to the planned task-brief rework.
Goals
- Unify the user-facing "data source watchers" concept under one name: Scout.
- Land a SourceConnector abstraction so future cloud scouts (Slack/Teams/Outlook/RSS/...) reuse the same engine, queue, delivery channel, and HITL surface; only the per-source connector is new.
- Ship a Gmail scout end-to-end with: OAuth, push (users.watch) + cron-fallback polling, BE-side spam triage, encrypted token storage, opt-in spam auto-trash.
- Preserve zero-trust: Gmail bodies are fetched transiently for the triage LLM call and discarded; only {message_id, scout_id, verdict, status} is persisted on BE.
- Preserve HITL on the cloud path: scouts never create tasks/projects/events/notes autonomously; they accumulate proposals that the user resolves later from the brief.
Non-Goals (Phase 4, separate spec)
- Stage 2 categorization agent prompt + tool palette.
- HITL UI in the task brief (suggestion cards, approve/reject controls, convert-to-entity mutations, list_pending_scout_suggestions brief tool).
- Local scout behavior change. Local directory monitor keeps current "auto-create" semantics. HITL is opt-in for local scouts in a future migration.
- Schema unification of LocalScoutConfig + CloudScoutConfig. They have different behaviors; keep separate tables.
- Connectors other than Gmail (Slack/Teams/Outlook).
- Stripe/billing changes (existing tier checks suffice).
Constraints
- Pre-1.0 dev: no production users, no backwards-compatibility shims, no Alembic data migrations beyond rename. Drop-and-recreate is acceptable where simpler.
- Zero-trust: BE never persists user content. Gmail bodies are read transiently for the triage LLM call only.
- HITL (cloud path): scouts produce proposals, never entities.
- Spam auto-trash: off by default per scout; opt-in via UI toggle. Action is "move to Trash" (Gmail's 30d recovery), never permanent delete.
- Reusability: cloud-scout pipeline (connector → triage → queue → deliver-on-connect → HITL) is shared infra; Gmail is just the first connector.
Architecture
Two-stage pipeline (cloud scouts only)
[Gmail] --push/cron--> [BE Stage 1: Triage]                             [Electron Stage 2: Categorize]
                                |                                                     |
                                v                                                     v
                      fetch body (transient)                             drain queue on WS reconnect
                                |                                                     |
                                v                                                     v
                        LLM relevance call                               fetch metadata for each msg
                                |                                                     |
                                +-- spam + auto_trash_spam: archive                   v
                                |                                       insert scout_suggestions row
                                +-- relevant: insert queue row          (category='unprocessed' stub
                                                                         until Phase 4)
Stage 1 (BE, always-on): verdict only. Stores {msg_id, verdict, status}. No content.
Stage 2 (Electron, on connect): Phase 3 ships a stub that simply mirrors the queue into a local SQLite table with category='unprocessed'. Phase 4 swaps in the real categorization agent.
Local scouts (unchanged behaviorally)
Local directory monitor keeps current Electron-side scheduling and auto-creation. Only renames apply.
SourceConnector abstraction
A SourceConnector Protocol owns all source-specific I/O. The shared ScoutEngine owns triage, queueing, delivery, and ack handling. To add a new cloud scout: implement one connector class + register it.
# app/scouts/connectors/base.py
from typing import Protocol


class SourceConnector(Protocol):
    # ItemRef, ItemMetadata, ItemContent and CloudScoutConfig are defined elsewhere.
    source_type: str  # "gmail"

    async def list_new(self, scout: CloudScoutConfig) -> list[ItemRef]: ...
    async def fetch_metadata(self, scout: CloudScoutConfig, ref: ItemRef) -> ItemMetadata: ...
    async def fetch_content(self, scout: CloudScoutConfig, ref: ItemRef) -> ItemContent: ...
    async def archive(self, scout: CloudScoutConfig, ref: ItemRef) -> None: ...
    async def setup_watch(self, scout: CloudScoutConfig) -> None: ...
    async def renew_watch(self, scout: CloudScoutConfig) -> None: ...
ItemContent.body_text is in-memory only; never persisted.
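To make the "implement one connector class + register it" step concrete, here is a minimal sketch of a toy connector plus a registry. The dataclasses, InMemoryConnector, REGISTRY, and register are hypothetical stand-ins invented for illustration, and the Protocol is trimmed to two methods; the real types and registry module may look different.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# Hypothetical stand-ins for the real ItemRef / ItemContent / CloudScoutConfig.
@dataclass(frozen=True)
class ItemRef:
    source_msg_ref: str


@dataclass(frozen=True)
class ItemContent:
    subject: str
    body_text: str  # in-memory only; never persisted


@dataclass
class CloudScoutConfig:
    id: str
    source_type: str


@runtime_checkable
class SourceConnector(Protocol):
    # Trimmed to two methods to keep the sketch short.
    source_type: str

    async def list_new(self, scout: CloudScoutConfig) -> list[ItemRef]: ...
    async def fetch_content(self, scout: CloudScoutConfig, ref: ItemRef) -> ItemContent: ...


class InMemoryConnector:
    """Toy connector backed by a dict; it only illustrates the shape."""

    source_type = "memory"

    def __init__(self, items: dict[str, ItemContent]) -> None:
        self._items = items

    async def list_new(self, scout: CloudScoutConfig) -> list[ItemRef]:
        return [ItemRef(key) for key in self._items]

    async def fetch_content(self, scout: CloudScoutConfig, ref: ItemRef) -> ItemContent:
        return self._items[ref.source_msg_ref]


# Hypothetical registry: maps source_type to a connector instance.
REGISTRY: dict[str, SourceConnector] = {}


def register(connector: SourceConnector) -> None:
    REGISTRY[connector.source_type] = connector


register(InMemoryConnector({"m1": ItemContent("Hello", "body")}))
scout = CloudScoutConfig(id="s1", source_type="memory")
refs = asyncio.run(REGISTRY[scout.source_type].list_new(scout))
```

Because SourceConnector is a structural Protocol, InMemoryConnector does not need to inherit from it; the engine can look up a connector by scout.source_type and call it without knowing the concrete class.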
ScoutEngine
class ScoutEngine:
    async def trigger_scout(self, scout_id: UUID) -> None: ...
    async def _process_item(self, scout, connector, ref) -> None: ...
    async def deliver_pending(self, user_id: UUID, ws: DeviceWS) -> None: ...
Both webhook and cron-fallback entry points call trigger_scout.
Data Model
Postgres (BE)
Renames (Phase 1, single Alembic migration)
| Before | After |
|---|---|
| Table local_agent_configs | local_scout_configs |
| Table cloud_agent_configs | cloud_scout_configs |
| Table agent_run_logs | scout_run_logs |
| Column agent_config | scout_config |
| Column agent_id (FKs) | scout_id |
| Column agent_run_id | scout_run_id |
| Class LocalAgentConfig | LocalScoutConfig |
| Class CloudAgentConfig | CloudScoutConfig |
| Class AgentRunLog | ScoutRunLog |
New (Phase 2)
CREATE TABLE scout_triage_queue (
    id              uuid PRIMARY KEY,
    user_id         uuid NOT NULL REFERENCES users(id),
    scout_id        uuid NOT NULL REFERENCES cloud_scout_configs(id),
    source_type     text NOT NULL,                   -- "gmail"
    source_msg_ref  text NOT NULL,                   -- gmail message id
    triage_verdict  text NOT NULL,                   -- "relevant"
    triage_reason   text,                            -- short LLM reason for debug
    status          text NOT NULL DEFAULT 'queued',  -- queued | delivered | acked | expired
    triaged_at      timestamptz NOT NULL DEFAULT now(),
    delivered_at    timestamptz,
    acked_at        timestamptz,
    expires_at      timestamptz NOT NULL,            -- triaged_at + 30d
    UNIQUE (scout_id, source_msg_ref)                -- idempotent webhook retries
);
CREATE INDEX ON scout_triage_queue (user_id, status);
CREATE INDEX ON scout_triage_queue (expires_at) WHERE status != 'acked';
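The UNIQUE (scout_id, source_msg_ref) constraint is what makes webhook retries harmless. The sketch below shows that behavior with an ON CONFLICT ... DO NOTHING insert. It is an illustration under assumptions: an in-memory SQLite database stands in for Postgres, the table is cut down to the relevant columns, and the enqueue helper is hypothetical.

```python
import sqlite3
import uuid
from datetime import datetime, timedelta, timezone

# SQLite stands in for Postgres here; only the columns needed for the demo exist.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE scout_triage_queue (
        id TEXT PRIMARY KEY,
        scout_id TEXT NOT NULL,
        source_msg_ref TEXT NOT NULL,
        triage_verdict TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'queued',
        triaged_at TEXT NOT NULL,
        expires_at TEXT NOT NULL,
        UNIQUE (scout_id, source_msg_ref)
    )
    """
)


def enqueue(scout_id: str, msg_ref: str, now: datetime) -> bool:
    """Insert a queue row; return False when the row already existed."""
    cur = conn.execute(
        """
        INSERT INTO scout_triage_queue
            (id, scout_id, source_msg_ref, triage_verdict, triaged_at, expires_at)
        VALUES (?, ?, ?, 'relevant', ?, ?)
        ON CONFLICT (scout_id, source_msg_ref) DO NOTHING
        """,
        (
            str(uuid.uuid4()),
            scout_id,
            msg_ref,
            now.isoformat(),
            (now + timedelta(days=30)).isoformat(),  # triaged_at + 30d
        ),
    )
    return cur.rowcount == 1


now = datetime(2026, 5, 15, tzinfo=timezone.utc)
first = enqueue("scout-1", "msg-abc", now)
replay = enqueue("scout-1", "msg-abc", now)  # simulated webhook retry
row_count = conn.execute("SELECT COUNT(*) FROM scout_triage_queue").fetchone()[0]
```

Letting the database reject the duplicate avoids a separate "does it exist?" query and stays correct when two workers process the same notification concurrently.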
Alterations to cloud_scout_configs (Phase 2)
ALTER TABLE cloud_scout_configs ADD COLUMN auto_trash_spam boolean NOT NULL DEFAULT false;
ALTER TABLE cloud_scout_configs ADD COLUMN gmail_history_id text;
ALTER TABLE cloud_scout_configs ADD COLUMN gmail_watch_expires_at timestamptz;
ALTER TABLE cloud_scout_configs ADD COLUMN device_inactivity_pause_days int NOT NULL DEFAULT 14;
OAuth tokens continue to live in the existing cloud_scout_configs.oauth_token_encrypted column. Encryption mechanism (key derivation, rotation) is reused unchanged. A pre-implementation investigation step will document the current key-management story so we know the threat model; hardening, if needed, is out of scope.
SQLite (Electron, Drizzle)
Renames (Phase 1)
| Before | After |
|---|---|
| agent_runs | scout_runs |
| agent_run_actions | scout_run_actions |
| Col agent_id | scout_id |
New (Phase 2)
import { integer, sqliteTable, text } from 'drizzle-orm/sqlite-core';

export const scoutSuggestions = sqliteTable('scout_suggestions', {
  id: text().primaryKey(),
  scoutId: text().notNull(),
  sourceType: text().notNull(),     // "gmail"
  sourceMsgRef: text().notNull(),
  category: text().notNull(),       // "unprocessed" until Phase 4
  payload: text(),                  // JSON, populated by Phase 4
  rawSubject: text(),               // populated on delivery
  rawSnippet: text(),               // populated on delivery
  status: text().notNull(),         // pending | approved | rejected | expired
  proposedAt: integer().notNull(),  // ms epoch
  resolvedAt: integer(),
  resolvedEntityType: text(),       // "task" | "project" | ... after Phase 4 approval
  resolvedEntityId: text(),
});
rawSubject + rawSnippet are stored locally to render the HITL card without re-hitting Gmail every render. Body is still NOT stored — fetched on-demand via a tool call when the user explicitly opens the suggestion.
WebSocket Frame Contract
Existing /api/v1/device channel. Two new frame types.
// BE → Electron
{
  type: 'scout_proposal',
  proposal: {
    id: string,
    scoutId: string,
    sourceType: 'gmail',
    sourceMsgRef: string,
    rawSubject: string | null,
    rawSnippet: string | null,
    category: 'unprocessed',
    payload: null
  }
}

// Electron → BE
{ type: 'scout_proposal_ack', proposalId: string }
On WS reconnect, BE's ScoutEngine.deliver_pending(user_id, ws) selects all status='queued' rows for the user, calls connector.fetch_metadata per row (subject + snippet only), sends one scout_proposal frame each, and flips status='delivered' + sets delivered_at upon ack.
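The reconnect flow can be sketched as follows. This is an in-memory illustration, not the engine's actual code: QueueRow, FakeWS, fetch_metadata, and handle_ack are hypothetical names, and the status transition on ack follows the paragraph above literally.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class QueueRow:
    id: str
    scout_id: str
    source_msg_ref: str
    status: str = "queued"
    delivered_at: Optional[datetime] = None


class FakeWS:
    """Hypothetical socket that records frames instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[dict] = []

    async def send_json(self, frame: dict) -> None:
        self.sent.append(frame)


async def fetch_metadata(row: QueueRow) -> dict:
    # Stand-in for connector.fetch_metadata: subject + snippet only, no body.
    return {"subject": f"Subject of {row.source_msg_ref}", "snippet": "..."}


async def deliver_pending(rows: list[QueueRow], ws: FakeWS) -> None:
    for row in rows:
        if row.status != "queued":
            continue  # only queued rows are (re)sent
        meta = await fetch_metadata(row)
        await ws.send_json({
            "type": "scout_proposal",
            "proposal": {
                "id": row.id,
                "scoutId": row.scout_id,
                "sourceType": "gmail",
                "sourceMsgRef": row.source_msg_ref,
                "rawSubject": meta["subject"],
                "rawSnippet": meta["snippet"],
                "category": "unprocessed",
                "payload": None,
            },
        })


def handle_ack(rows: list[QueueRow], frame: dict, now: datetime) -> None:
    # Per the text above, the row flips to 'delivered' once the ack arrives.
    for row in rows:
        if row.id == frame["proposalId"]:
            row.status = "delivered"
            row.delivered_at = now


rows = [QueueRow("p1", "s1", "m1"), QueueRow("p2", "s1", "m2", status="delivered")]
ws = FakeWS()
asyncio.run(deliver_pending(rows, ws))
handle_ack(rows, {"type": "scout_proposal_ack", "proposalId": "p1"}, datetime.now(timezone.utc))
```

A row that is sent but never acked stays queued in this sketch, so it would be sent again on the next reconnect; the client side would then need to tolerate a repeated proposal id.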
Stage 1 Triage Detail
Webhook (Pub/Sub) or cron tick
  -> ScoutEngine.trigger_scout(scout_id)
    -> if device inactive > N days: skip (pause)
    -> connector.list_new(scout) -> [ItemRef]
    -> for each ref:
       - if (scout_id, source_msg_ref) already in queue: skip (idempotent)
       - content = await connector.fetch_content(scout, ref)  # transient
       - verdict = await ScoutEngine._triage_llm(scout, content)  # gpt-4o-mini
       - if verdict == spam:
         - if scout.auto_trash_spam: connector.archive(...)
         - skip to next ref  # not queued
       - INSERT scout_triage_queue row
    -> UPDATE cloud_scout_configs.last_run_at
    -> INSERT scout_run_logs row
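The flow above can be sketched in Python as below. Everything here is a stand-in chosen for illustration: Scout, FakeConnector, and the dict-based queue are hypothetical, and triage_llm is a keyword check that merely takes the place of the real LLM call.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Scout:
    id: str
    auto_trash_spam: bool = False
    device_inactive_days: int = 0
    device_inactivity_pause_days: int = 14


@dataclass
class FakeConnector:
    """Hypothetical connector: message id -> body, plus a log of archived ids."""

    inbox: dict[str, str]
    archived: list[str] = field(default_factory=list)

    async def list_new(self, scout: Scout) -> list[str]:
        return list(self.inbox)

    async def fetch_content(self, scout: Scout, ref: str) -> str:
        return self.inbox[ref]  # transient: the caller never stores it

    async def archive(self, scout: Scout, ref: str) -> None:
        self.archived.append(ref)
        del self.inbox[ref]


async def triage_llm(body: str) -> str:
    # Stand-in for the LLM call; a keyword check is NOT the real triage logic.
    return "spam" if "lottery" in body.lower() else "relevant"


async def trigger_scout(scout: Scout, connector: FakeConnector, queue: dict) -> None:
    if scout.device_inactive_days > scout.device_inactivity_pause_days:
        return  # device inactive too long: pause
    for ref in await connector.list_new(scout):
        if (scout.id, ref) in queue:
            continue  # already triaged: idempotent
        body = await connector.fetch_content(scout, ref)
        verdict = await triage_llm(body)
        if verdict == "spam":
            if scout.auto_trash_spam:
                await connector.archive(scout, ref)
            continue  # spam is never queued
        # Only the verdict and status are kept; the body goes out of scope.
        queue[(scout.id, ref)] = {"verdict": verdict, "status": "queued"}


scout = Scout(id="s1", auto_trash_spam=True)
connector = FakeConnector({"m1": "Invoice for May", "m2": "You won the LOTTERY"})
queue: dict = {}
asyncio.run(trigger_scout(scout, connector, queue))
asyncio.run(trigger_scout(scout, connector, queue))  # replay: nothing new is queued
```

Note that the queue entry holds no subject or body, which mirrors the zero-trust constraint: anything content-like is re-fetched later rather than stored.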
Triage LLM contract
- Prompt name (Langfuse): scout-triage-system (source-agnostic, parameterized by source_type).
- Input: {source_type, scout_name, scout_purpose, item_subject, item_sender, item_body_truncated_2k}.
- Output (structured, Pydantic TriageVerdict): {verdict: "relevant" | "spam", reason: str, confidence: float}.
- Cost guard: body truncated at 2k chars before LLM call.
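A small sketch of the contract's two ends: building the input with the 2k cost guard, and validating the output. To keep the example dependency-free, a stdlib dataclass stands in for the Pydantic TriageVerdict model. The MAX_BODY_CHARS constant, the build_triage_input helper, and the [0, 1] confidence range are assumptions, not part of the spec.

```python
from dataclasses import dataclass

MAX_BODY_CHARS = 2000  # "2k chars" from the contract; the exact constant is assumed


@dataclass(frozen=True)
class TriageVerdict:
    """Stdlib stand-in for the Pydantic model named in the contract."""

    verdict: str
    reason: str
    confidence: float

    def __post_init__(self) -> None:
        if self.verdict not in ("relevant", "spam"):
            raise ValueError(f"unexpected verdict: {self.verdict!r}")
        # Assumption: confidence is a probability-like score in [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")


def build_triage_input(
    source_type: str,
    scout_name: str,
    scout_purpose: str,
    subject: str,
    sender: str,
    body: str,
) -> dict[str, str]:
    """Assemble the prompt variables, truncating the body before the LLM call."""
    return {
        "source_type": source_type,
        "scout_name": scout_name,
        "scout_purpose": scout_purpose,
        "item_subject": subject,
        "item_sender": sender,
        "item_body_truncated_2k": body[:MAX_BODY_CHARS],
    }


payload = build_triage_input(
    "gmail", "Invoices", "Track vendor invoices",
    "May invoice", "billing@example.com", "x" * 5000,
)
verdict = TriageVerdict(verdict="relevant", reason="matches scout purpose", confidence=0.9)
```

Rejecting unknown verdict strings at parse time means a malformed LLM response surfaces as an error and falls into the "LLM call fails" path instead of being silently queued.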
Failure modes
- LLM call fails: log error, leave message unprocessed, retry on next webhook/cron.
- Gmail 401 (refresh exhausted): mark scout enabled=false, surface re-auth prompt to user via WS frame on next device connect.
- Pub/Sub webhook with an unverified JWT: respond 401.
Gmail Push Setup
- On scout enable: GmailConnector.setup_watch(scout) calls users.watch against a single project-wide Pub/Sub topic.
- gmail_watch_expires_at stored. Watches expire after 7 days.
- Weekly cron _scout_watch_renewal_tick re-issues watch for any scout whose expiry is within 24h.
- Webhook route: POST /api/v1/scouts/webhooks/gmail. Verifies Pub/Sub-signed JWT, resolves user via the email address in the payload, enqueues triage job.
- Cron fallback (_scout_cron_tick, runs each scout's schedule_cron): polls users.history.list since gmail_history_id, updates gmail_history_id after.
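Two pieces of the setup above lend themselves to a short sketch: decoding the Pub/Sub push envelope into the email address and history id, and deciding whether a watch is inside the 24h renewal window. The helper names decode_gmail_push and needs_renewal are hypothetical, JWT verification is deliberately left out, and the envelope shape (message.data as base64url-encoded JSON with emailAddress and historyId) follows Gmail's push-notification format as I understand it.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def decode_gmail_push(envelope: dict) -> tuple[str, str]:
    """Return (email_address, history_id) from a Pub/Sub push envelope.

    JWT verification must happen before this is called; it is not shown here.
    """
    data = envelope["message"]["data"]
    padded = data + "=" * (-len(data) % 4)  # tolerate missing base64 padding
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return payload["emailAddress"], str(payload["historyId"])


def needs_renewal(expires_at: datetime, now: datetime) -> bool:
    """True when the watch expires within the next 24 hours (or already has)."""
    return expires_at - now <= timedelta(hours=24)


# Build a sample envelope the way Pub/Sub would deliver it.
raw = json.dumps({"emailAddress": "user@example.com", "historyId": "9876543210"}).encode()
envelope = {
    "message": {"data": base64.urlsafe_b64encode(raw).decode(), "messageId": "1"},
    "subscription": "projects/demo/subscriptions/gmail-push",  # made-up name
}
email, history_id = decode_gmail_push(envelope)

now = datetime(2026, 5, 15, tzinfo=timezone.utc)
due = needs_renewal(now + timedelta(hours=12), now)
not_due = needs_renewal(now + timedelta(days=6), now)
```

The notification carries no message content, only a cursor, which is why the triage job still has to call list_new to discover what actually changed.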
Terminology Refactor (Detail)
Renamed
| Surface | Before | After |
|---|---|---|
| Settings nav | settings.agents "Agents" | settings.scouts "Scouts" |
| Subtitle/desc | settings.agentsSubtitle, agentsDescription | settings.scoutsSubtitle, scoutsDescription |
| agents.* keys | noAgentsYet, createAgent, yourAgents, etc. | scouts.noScoutsYet, createScout, yourScouts |
| toast.agent.* | created, runStarted, etc. | toast.scout.* |
| Components | AgentsSection, AgentRow, LocalAgentConfigPanel, CloudAgentConfigPanel, InlineAgentCreationStepper | ScoutsSection, ScoutRow, LocalScoutConfigPanel, CloudScoutConfigPanel, InlineScoutCreationStepper |
| TS types | LocalAgentConfig, CloudAgentConfig | LocalScoutConfig, CloudScoutConfig |
| tRPC router | agent.local, agent.cloud, agent.journey, agent.runs, agent.runActions | scout.local, scout.cloud, scout.journey, scout.runs, scout.runActions |
| Drizzle tables | agent_runs, agent_run_actions | scout_runs, scout_run_actions |
| Main process | src/main/agents/agent-scheduler.ts | src/main/scouts/scout-scheduler.ts |
| BE routes | /api/v1/agents/*, /api/v1/agent-setup | /api/v1/scouts/*, /api/v1/scout-setup |
| BE modules | routes/agents.py, routes/agent_setup.py, core/agent_runner.py, core/agent_session_buffer.py, core/agent_registry.py | routes/scouts.py, routes/scout_setup.py, core/scout_runner.py, core/scout_session_buffer.py, core/scout_registry.py |
| Postgres tables | local_agent_configs, cloud_agent_configs, agent_run_logs | local_scout_configs, cloud_scout_configs, scout_run_logs |
| Postgres columns | agent_config, agent_id, agent_run_id | scout_config, scout_id, scout_run_id |
| SQLAlchemy models | LocalAgentConfig, CloudAgentConfig, AgentRunLog | LocalScoutConfig, CloudScoutConfig, ScoutRunLog |
| Langfuse prompts | user-facing scout prompts named agent-* | recreate as scout-*; delete old |
| i18n | 5 langs (en/it/es/fr/de) | all updated atomically |
Kept as-is
- app/agents/* Python module: these are LLM helper agents (task_agent, project_agent, note_agent, timeline_agent, filesystem_agent) invoked internally by deep_agent. Different concept from user-facing scouts. Renaming would create semantic clash with LLM-agent terminology.
- /api/v1/device WS endpoint name (already source-neutral).
- All tool_call, run_complete, etc. WS frame types unrelated to scouts.
Phasing
Phase 1 — Rename only
- Single PR. Single Alembic migration. Single Drizzle migration.
- All renames listed above land together. App still works, existing local scout still runs. No new behavior.
Phase 2 — Connector abstraction skeleton
- New module app/scouts/connectors/{base,registry,gmail}.py.
- New module app/scouts/engine.py.
- New table scout_triage_queue + alterations to cloud_scout_configs.
- New SQLite table scout_suggestions (Drizzle).
- New WS frame types scout_proposal + scout_proposal_ack.
- No user-facing change yet.
Phase 3 — Gmail scout end-to-end
- Settings UI: "Add Gmail scout" → OAuth consent (separate scope set: gmail.readonly + gmail.modify) → encrypted token stored in cloud_scout_configs.oauth_token_encrypted → save scout config.
- Pub/Sub topic + webhook route + JWT verify.
- setup_watch on enable; weekly renew_watch cron.
- Cron-fallback _scout_cron_tick per scout.
- Triage LLM (gpt-4o-mini, Langfuse scout-triage-system).
- Spam auto-trash toggle (default off) per scout.
- Device-inactivity pause logic.
- WS deliver-on-reconnect drains queue → scout_proposal frames → ack handler → SQLite scout_suggestions insert with category='unprocessed' (Phase 4 swaps real categorization in).
- "Read full email" tool call: Electron requests body for a suggestion → BE GmailConnector.fetch_content → returns body transiently in tool result.
Phase 4 — Deferred (separate spec, with task-brief rework)
- Stage 2 categorization agent (prompt + tool palette: list_projects, list_tasks, search_notes, memory).
- HITL UI surface in the brief: suggestion cards, approve/reject controls, "convert to task | event | note | project | actionable-only" actions.
- list_pending_scout_suggestions brief tool.
- Convert-to-entity mutations.
- Future connectors (Slack/Teams/Outlook/...).
Testing Surface
- Phase 1: existing pytest suite still green with renamed identifiers (auth, ws_unified, schemas, models, etc.). UI smoke: settings page renders, existing local scout runs.
- Phase 2: unit tests for ScoutEngine w/ mocked SourceConnector. Idempotency test (replay same source_msg_ref).
- Phase 3: integration tests for Gmail webhook → triage → queue insertion (mocked GmailConnector for content fetch and LLM). E2E (manual): connect a real Gmail account on dev, send an email, observe queue row appear, reconnect device, observe scout_suggestions row land with subject/snippet.
Open Questions (none blocking)
- OAuth-token encryption key derivation (app-global vs user-derived) — investigation step in implementation plan; document current state, security hardening is out of scope.
- Pub/Sub topic naming and IAM setup (one topic project-wide vs per-environment) — operational detail to decide during Phase 3.
Risks
- Pub/Sub setup is per-Google-Cloud-project and requires console IAM grants — first-time setup friction.
- Gmail users.watch quota: 1 watch per user. We use one watch per scout, but a user has only one Gmail scout per Gmail account, so this is fine.
- _pending_states dict pattern in existing OAuth flow is in-memory; Pub/Sub webhook can run on any worker, so any cross-request state must be in DB, not in-memory. This design uses no in-memory state; safe.
Acceptance
- All renames land atomically; app boots; existing local scout still operates.
- A user can connect Gmail through the Scouts settings page, see the scout marked enabled, send themselves a test email, and observe a scout_suggestions row appear in their local DB with category='unprocessed', rawSubject, and rawSnippet populated, after the next WS reconnect.
- Spam emails (per LLM triage) are not queued; if auto_trash_spam=true they appear in Gmail Trash.
- BE never persists email bodies. Verified by code review of triage flow + grep for body_text writes.