# Scouts Refactor + Gmail Integration — Design

**Date:** 2026-05-15

**Status:** Draft, awaiting user review

**Owner:** Roberto

## Summary

Rename the existing "Agents" subsystem to "Scouts" across the entire stack (UI, code, Postgres, SQLite, Langfuse), then add the first cloud scout — Gmail — using a two-stage pipeline that respects zero-trust (no email content stored on backend) and human-in-the-loop (no entities created autonomously).

The implementation is split into four phases. Phases 1–3 ship now. Phase 4 (Stage 2 categorization, HITL surface in the brief, conversion-to-entity mutations) is deferred to the planned task-brief rework.

## Goals

- Unify the user-facing "data source watchers" concept under one name: **Scout**.
- Land a `SourceConnector` abstraction so future cloud scouts (Slack/Teams/Outlook/RSS/...) reuse the same engine, queue, delivery channel, and HITL surface — only the per-source connector is new.
- Ship a Gmail scout end-to-end with: OAuth, push (`users.watch`) + cron-fallback polling, BE-side spam triage, encrypted token storage, opt-in spam auto-trash.
- Preserve zero-trust: Gmail bodies are fetched transiently for the triage LLM call and discarded; only `{message_id, scout_id, verdict, status}` is persisted on BE.
- Preserve HITL on the cloud path: scouts never create tasks/projects/events/notes autonomously; they accumulate proposals that the user resolves later from the brief.

## Non-Goals (Phase 4, separate spec)

- Stage 2 categorization agent prompt + tool palette.
- HITL UI in the task brief (suggestion cards, approve/reject controls, convert-to-entity mutations, `list_pending_scout_suggestions` brief tool).
- Local scout behavior change. Local directory monitor keeps current "auto-create" semantics. HITL is opt-in for local scouts in a future migration.
- Schema unification of `LocalScoutConfig` + `CloudScoutConfig`. They have different behaviors; keep separate tables.
- Connectors other than Gmail (Slack/Teams/Outlook).
- Stripe/billing changes (existing tier checks suffice).

## Constraints

- **Pre-1.0 dev**: no production users, no backwards-compatibility shims, no Alembic data migrations beyond rename. Drop-and-recreate is acceptable where simpler.
- **Zero-trust**: BE never persists user content. Gmail bodies are read transiently for the triage LLM call only.
- **HITL (cloud path)**: scouts produce proposals, never entities.
- **Spam auto-trash**: off by default per scout; opt-in via UI toggle. Action is "move to Trash" (Gmail's 30d recovery), never permanent delete.
- **Reusability**: cloud-scout pipeline (connector → triage → queue → deliver-on-connect → HITL) is shared infra; Gmail is just the first connector.

## Architecture

### Two-stage pipeline (cloud scouts only)

```
[Gmail] --push/cron--> [BE Stage 1: Triage]                      [Electron Stage 2: Categorize]
                                 |                                              |
                                 v                                              v
                      fetch body (transient)                       drain queue on WS reconnect
                                 |                                              |
                                 v                                              v
                        LLM relevance call                         fetch metadata for each msg
                                 |                                              |
                                 +-- spam + auto_trash_spam: archive            v
                                 |                                insert scout_suggestions row
                                 +-- relevant: insert queue row     (category='unprocessed'
                                                                     stub until Phase 4)
```

**Stage 1 (BE, always-on):** verdict only. Stores `{msg_id, verdict, status}`. No content.

**Stage 2 (Electron, on connect):** Phase 3 ships a stub that simply mirrors the queue into a local SQLite table with `category='unprocessed'`. Phase 4 swaps in the real categorization agent.

### Local scouts (unchanged behaviorally)

Local directory monitor keeps current Electron-side scheduling and auto-creation. Only renames apply.

### SourceConnector abstraction

A `SourceConnector` Protocol owns all source-specific I/O. The shared `ScoutEngine` owns triage, queueing, delivery, and ack handling. To add a new cloud scout: implement one connector class + register it.

```python
# app/scouts/connectors/base.py
from typing import Protocol


class SourceConnector(Protocol):
    source_type: str  # "gmail"

    async def list_new(self, scout: CloudScoutConfig) -> list[ItemRef]: ...
    async def fetch_metadata(self, scout: CloudScoutConfig, ref: ItemRef) -> ItemMetadata: ...  # subject + snippet only
    async def fetch_content(self, scout: CloudScoutConfig, ref: ItemRef) -> ItemContent: ...  # transient body fetch
    # Source-neutral name; for Gmail this moves the message to Trash (recoverable), never permanent delete.
    async def archive(self, scout: CloudScoutConfig, ref: ItemRef) -> None: ...
    async def setup_watch(self, scout: CloudScoutConfig) -> None: ...
    async def renew_watch(self, scout: CloudScoutConfig) -> None: ...
```

`ItemContent.body_text` is in-memory only; never persisted.

### ScoutEngine

```python
from uuid import UUID


class ScoutEngine:
    async def trigger_scout(self, scout_id: UUID) -> None: ...
    async def _process_item(self, scout, connector, ref) -> None: ...
    async def deliver_pending(self, user_id: UUID, ws: DeviceWS) -> None: ...
```

Both webhook and cron-fallback entry points call `trigger_scout`.

## Data Model

### Postgres (BE)

#### Renames (Phase 1, single Alembic migration)

| Before                      | After                 |
|-----------------------------|-----------------------|
| Table `local_agent_configs` | `local_scout_configs` |
| Table `cloud_agent_configs` | `cloud_scout_configs` |
| Table `agent_run_logs`      | `scout_run_logs`      |
| Column `agent_config`       | `scout_config`        |
| Column `agent_id` (FKs)     | `scout_id`            |
| Column `agent_run_id`       | `scout_run_id`        |
| Class `LocalAgentConfig`    | `LocalScoutConfig`    |
| Class `CloudAgentConfig`    | `CloudScoutConfig`    |
| Class `AgentRunLog`         | `ScoutRunLog`         |

#### New (Phase 2)

```sql
CREATE TABLE scout_triage_queue (
    id              uuid PRIMARY KEY,
    user_id         uuid NOT NULL REFERENCES users(id),
    scout_id        uuid NOT NULL REFERENCES cloud_scout_configs(id),
    source_type     text NOT NULL,                   -- "gmail"
    source_msg_ref  text NOT NULL,                   -- gmail message id
    triage_verdict  text NOT NULL,                   -- "relevant"
    triage_reason   text,                            -- short LLM reason for debug
    status          text NOT NULL DEFAULT 'queued',  -- queued | delivered | acked | expired
    triaged_at      timestamptz NOT NULL DEFAULT now(),
    delivered_at    timestamptz,
    acked_at        timestamptz,
    expires_at      timestamptz NOT NULL,            -- triaged_at + 30d
    UNIQUE (scout_id, source_msg_ref)                -- idempotent webhook retries
);

CREATE INDEX ON scout_triage_queue (user_id, status);
CREATE INDEX ON scout_triage_queue (expires_at) WHERE status != 'acked';
```

#### Alterations to `cloud_scout_configs` (Phase 2)

```sql
ALTER TABLE cloud_scout_configs ADD COLUMN auto_trash_spam boolean NOT NULL DEFAULT false;
ALTER TABLE cloud_scout_configs ADD COLUMN gmail_history_id text;
ALTER TABLE cloud_scout_configs ADD COLUMN gmail_watch_expires_at timestamptz;
ALTER TABLE cloud_scout_configs ADD COLUMN device_inactivity_pause_days int NOT NULL DEFAULT 14;
```

OAuth tokens continue to live in the existing `cloud_scout_configs.oauth_token_encrypted` column. Encryption mechanism (key derivation, rotation) is reused unchanged. A pre-implementation investigation step will document the current key-management story so we know the threat model; hardening, if needed, is out of scope.

### SQLite (Electron, Drizzle)

#### Renames (Phase 1)

| Before              | After               |
|---------------------|---------------------|
| `agent_runs`        | `scout_runs`        |
| `agent_run_actions` | `scout_run_actions` |
| Col `agent_id`      | `scout_id`          |

#### New (Phase 2)

```typescript
import { integer, sqliteTable, text } from 'drizzle-orm/sqlite-core';

export const scoutSuggestions = sqliteTable('scout_suggestions', {
  id: text().primaryKey(),
  scoutId: text().notNull(),
  sourceType: text().notNull(),      // "gmail"
  sourceMsgRef: text().notNull(),
  category: text().notNull(),        // "unprocessed" until Phase 4
  payload: text(),                   // JSON, populated by Phase 4
  rawSubject: text(),                // populated on delivery
  rawSnippet: text(),                // populated on delivery
  status: text().notNull(),          // pending | approved | rejected | expired
  proposedAt: integer().notNull(),   // ms epoch
  resolvedAt: integer(),
  resolvedEntityType: text(),        // "task" | "project" | ... after Phase 4 approval
  resolvedEntityId: text(),
});
```

`rawSubject` + `rawSnippet` are stored locally to render the HITL card without re-hitting Gmail every render. Body is still NOT stored; it is fetched on demand via a tool call when the user explicitly opens the suggestion.
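The `status` column on `scout_triage_queue` implies a small lifecycle. The Python sketch below models that lifecycle under three assumptions:

- A row moves from queued to delivered to acked.
- Any row that is not yet acked expires once `expires_at` (triaged_at + 30d) passes, which is consistent with the partial index on `status != 'acked'`.
- A delivered row whose ack was lost may be sent again.

`TriageQueueRow`, `QueueStatus`, and `transition` are hypothetical names, not part of the spec.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

QUEUE_TTL = timedelta(days=30)  # expires_at = triaged_at + 30d, per the table comment


class QueueStatus(str, Enum):
    QUEUED = "queued"
    DELIVERED = "delivered"
    ACKED = "acked"
    EXPIRED = "expired"


# Assumed legal moves. DELIVERED -> DELIVERED models re-sending a proposal whose ack was lost.
_ALLOWED: dict[QueueStatus, set[QueueStatus]] = {
    QueueStatus.QUEUED: {QueueStatus.DELIVERED, QueueStatus.EXPIRED},
    QueueStatus.DELIVERED: {QueueStatus.DELIVERED, QueueStatus.ACKED, QueueStatus.EXPIRED},
    QueueStatus.ACKED: set(),
    QueueStatus.EXPIRED: set(),
}


@dataclass
class TriageQueueRow:  # hypothetical in-memory mirror of one scout_triage_queue row
    scout_id: str
    source_msg_ref: str
    triaged_at: datetime
    status: QueueStatus = QueueStatus.QUEUED
    delivered_at: Optional[datetime] = None
    acked_at: Optional[datetime] = None

    @property
    def expires_at(self) -> datetime:
        return self.triaged_at + QUEUE_TTL

    def transition(self, new: QueueStatus, now: datetime) -> None:
        """Move to `new`, stamping delivered_at / acked_at; reject illegal moves."""
        if new not in _ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new.value}")
        self.status = new
        if new is QueueStatus.DELIVERED:
            self.delivered_at = now
        elif new is QueueStatus.ACKED:
            self.acked_at = now

    def expire_if_due(self, now: datetime) -> bool:
        """Expire a row that is not yet acked once `now` reaches expires_at."""
        if self.status in (QueueStatus.QUEUED, QueueStatus.DELIVERED) and now >= self.expires_at:
            self.transition(QueueStatus.EXPIRED, now)
            return True
        return False
```

An explicit transition table makes an illegal move, such as delivering an already-expired row, fail loudly instead of silently reviving it.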
## WebSocket Frame Contract

Existing `/api/v1/device` channel. Two new frame types.

```typescript
// BE → Electron
{
  type: 'scout_proposal',
  proposal: {
    id: string,
    scoutId: string,
    sourceType: 'gmail',
    sourceMsgRef: string,
    rawSubject: string | null,
    rawSnippet: string | null,
    category: 'unprocessed',
    payload: null
  }
}

// Electron → BE
{ type: 'scout_proposal_ack', proposalId: string }
```

On WS reconnect, BE's `ScoutEngine.deliver_pending(user_id, ws)` does the following:

1. Selects every row for the user that has not been acked (`status IN ('queued', 'delivered')`), so a proposal whose ack was lost is re-sent.
2. Calls `connector.fetch_metadata` per row (subject + snippet only).
3. Sends one `scout_proposal` frame per row and sets `status='delivered'` + `delivered_at`.
4. When the matching `scout_proposal_ack` arrives, sets `status='acked'` + `acked_at`.

Electron must therefore insert idempotently on `proposal.id`.

## Stage 1 Triage Detail

```
Webhook (Pub/Sub) or cron tick
  -> ScoutEngine.trigger_scout(scout_id)
     -> if device inactive > scout.device_inactivity_pause_days (default 14): skip (pause)
     -> connector.list_new(scout) -> [ItemRef]
     -> for each ref:
          - if (scout_id, source_msg_ref) already in queue: skip (idempotent)
          - content = await connector.fetch_content(scout, ref)        # transient
          - verdict = await ScoutEngine._triage_llm(scout, content)    # gpt-4o-mini
          - if verdict == spam:
              - if scout.auto_trash_spam: await connector.archive(scout, ref)
              - continue to next ref                                   # not queued
          - INSERT scout_triage_queue row
     -> UPDATE cloud_scout_configs.last_run_at
     -> INSERT scout_run_logs row
```

### Triage LLM contract

- **Prompt name (Langfuse):** `scout-triage-system`, source-agnostic and parameterized by `source_type`.
- **Input:** `{source_type, scout_name, scout_purpose, item_subject, item_sender, item_body_truncated_2k}`.
- **Output (structured, Pydantic `TriageVerdict`):** `{verdict: "relevant" | "spam", reason: str, confidence: float}`.
- **Cost guard:** body truncated at 2k chars before the LLM call.

### Failure modes

- LLM call fails: log the error, leave the message unprocessed, retry on the next webhook/cron tick.
- Gmail 401 (refresh exhausted): mark scout `enabled=false`, surface a re-auth prompt to the user via WS frame on next device connect.
- Pub/Sub webhook with an unverified JWT: respond 401.
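The Stage 1 loop above is easier to review as running code. This is a minimal, self-contained sketch with two stand-ins:

- An in-memory dict plays the role of `scout_triage_queue`, keyed like its `UNIQUE (scout_id, source_msg_ref)` constraint.
- A keyword stub plays the role of the `gpt-4o-mini` triage call.

`InMemoryQueue`, `FakeConnector`, `keyword_triage`, the trimmed-down `Scout` type, and `process_item` are hypothetical. The device-inactivity pause, the `last_run_at` update, and run logging are omitted.

```python
import asyncio
from dataclasses import dataclass, field, replace
from typing import Awaitable, Callable, Literal, Protocol

BODY_CHAR_LIMIT = 2_000  # cost guard from the triage contract


@dataclass(frozen=True)
class ItemRef:
    msg_id: str


@dataclass(frozen=True)
class ItemContent:
    subject: str
    sender: str
    body_text: str  # in-memory only; never written to the queue


@dataclass(frozen=True)
class TriageVerdict:
    verdict: Literal["relevant", "spam"]
    reason: str
    confidence: float


@dataclass
class Scout:  # hypothetical, trimmed-down CloudScoutConfig
    id: str
    auto_trash_spam: bool = False


class Connector(Protocol):
    async def list_new(self, scout: Scout) -> list[ItemRef]: ...
    async def fetch_content(self, scout: Scout, ref: ItemRef) -> ItemContent: ...
    async def archive(self, scout: Scout, ref: ItemRef) -> None: ...


TriageFn = Callable[[Scout, ItemContent], Awaitable[TriageVerdict]]


@dataclass
class InMemoryQueue:  # stand-in for scout_triage_queue
    rows: dict[tuple[str, str], dict] = field(default_factory=dict)


async def process_item(scout: Scout, connector: Connector, ref: ItemRef,
                       triage: TriageFn, queue: InMemoryQueue) -> None:
    key = (scout.id, ref.msg_id)
    if key in queue.rows:
        return  # idempotent: webhook retry or overlapping cron tick
    content = await connector.fetch_content(scout, ref)
    content = replace(content, body_text=content.body_text[:BODY_CHAR_LIMIT])
    verdict = await triage(scout, content)
    if verdict.verdict == "spam":
        if scout.auto_trash_spam:
            await connector.archive(scout, ref)
        return  # spam is never queued
    # Persist the verdict only; `content` (and its body) is dropped when this returns.
    queue.rows[key] = {"triage_verdict": verdict.verdict,
                       "triage_reason": verdict.reason,
                       "status": "queued"}


async def trigger_scout(scout: Scout, connector: Connector,
                        triage: TriageFn, queue: InMemoryQueue) -> None:
    for ref in await connector.list_new(scout):
        await process_item(scout, connector, ref, triage, queue)


# --- usage with hypothetical fakes ---
class FakeConnector:
    def __init__(self, items: dict[str, ItemContent]):
        self.items, self.trashed, self.fetches = items, [], 0

    async def list_new(self, scout: Scout) -> list[ItemRef]:
        return [ItemRef(m) for m in self.items]

    async def fetch_content(self, scout: Scout, ref: ItemRef) -> ItemContent:
        self.fetches += 1
        return self.items[ref.msg_id]

    async def archive(self, scout: Scout, ref: ItemRef) -> None:
        self.trashed.append(ref.msg_id)


async def keyword_triage(scout: Scout, content: ItemContent) -> TriageVerdict:
    is_spam = "win a prize" in content.body_text.lower()
    return TriageVerdict("spam" if is_spam else "relevant", "keyword stub", 1.0)
```

In this sketch, a spam message that `list_new` returns twice is fetched and triaged twice. The idempotency check only consults the queue, and only relevant items land there.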
## Gmail Push Setup

- On scout enable: `GmailConnector.setup_watch(scout)` calls `users.watch` against a single project-wide Pub/Sub topic.
- `gmail_watch_expires_at` is stored. Watches expire after 7 days.
- Daily cron `_scout_watch_renewal_tick` re-issues `watch` for any scout whose expiry is within 48h, so a single missed tick cannot let a watch lapse. A weekly tick with a 24h window would miss most watches, because each watch lives only 7 days.
- Webhook route: `POST /api/v1/scouts/webhooks/gmail`. It verifies the Pub/Sub-signed JWT, resolves the user via the email address in the payload, and enqueues a triage job.
- Cron fallback (`_scout_cron_tick`, runs on each scout's `schedule_cron`): polls `users.history.list` since `gmail_history_id`, then updates `gmail_history_id`.

## Terminology Refactor (Detail)

### Renamed

| Surface | Before | After |
|---|---|---|
| Settings nav | `settings.agents` "Agents" | `settings.scouts` "Scouts" |
| Subtitle/desc | `settings.agentsSubtitle`, `agentsDescription` | `settings.scoutsSubtitle`, `scoutsDescription` |
| `agents.*` keys | `noAgentsYet`, `createAgent`, `yourAgents`, etc. | `scouts.noScoutsYet`, `createScout`, `yourScouts` |
| `toast.agent.*` | `created`, `runStarted`, etc. | `toast.scout.*` |
| Components | `AgentsSection`, `AgentRow`, `LocalAgentConfigPanel`, `CloudAgentConfigPanel`, `InlineAgentCreationStepper` | `ScoutsSection`, `ScoutRow`, `LocalScoutConfigPanel`, `CloudScoutConfigPanel`, `InlineScoutCreationStepper` |
| TS types | `LocalAgentConfig`, `CloudAgentConfig` | `LocalScoutConfig`, `CloudScoutConfig` |
| tRPC router | `agent.local`, `agent.cloud`, `agent.journey`, `agent.runs`, `agent.runActions` | `scout.local`, `scout.cloud`, `scout.journey`, `scout.runs`, `scout.runActions` |
| Drizzle tables | `agent_runs`, `agent_run_actions` | `scout_runs`, `scout_run_actions` |
| Main process | `src/main/agents/agent-scheduler.ts` | `src/main/scouts/scout-scheduler.ts` |
| BE routes | `/api/v1/agents/*`, `/api/v1/agent-setup` | `/api/v1/scouts/*`, `/api/v1/scout-setup` |
| BE modules | `routes/agents.py`, `routes/agent_setup.py`, `core/agent_runner.py`, `core/agent_session_buffer.py`, `core/agent_registry.py` | `routes/scouts.py`, `routes/scout_setup.py`, `core/scout_runner.py`, `core/scout_session_buffer.py`, `core/scout_registry.py` |
| Postgres tables | `local_agent_configs`, `cloud_agent_configs`, `agent_run_logs` | `local_scout_configs`, `cloud_scout_configs`, `scout_run_logs` |
| Postgres columns | `agent_config`, `agent_id`, `agent_run_id` | `scout_config`, `scout_id`, `scout_run_id` |
| SQLAlchemy models | `LocalAgentConfig`, `CloudAgentConfig`, `AgentRunLog` | `LocalScoutConfig`, `CloudScoutConfig`, `ScoutRunLog` |
| Langfuse prompts | user-facing scout prompts named `agent-*` | recreate as `scout-*`; delete old |
| i18n | 5 langs (en/it/es/fr/de) | all updated atomically |

### Kept as-is

- `app/agents/*` Python module — these are LLM helper agents (task_agent, project_agent, note_agent, timeline_agent, filesystem_agent) invoked internally by `deep_agent`. Different concept from user-facing scouts. Renaming would create semantic clash with LLM-agent terminology.
- `/api/v1/device` WS endpoint name (already source-neutral).
- All `tool_call`, `run_complete`, etc. WS frame types unrelated to scouts.

## Phasing

### Phase 1 — Rename only

- Single PR. Single Alembic migration. Single Drizzle migration.
- All renames listed above land together. App still works, existing local scout still runs. No new behavior.

### Phase 2 — Connector abstraction skeleton

- New module `app/scouts/connectors/{base,registry,gmail}.py`.
- New module `app/scouts/engine.py`.
- New table `scout_triage_queue` + alterations to `cloud_scout_configs`.
- New SQLite table `scout_suggestions` (Drizzle).
- New WS frame types `scout_proposal` + `scout_proposal_ack`.
- No user-facing change yet.

### Phase 3 — Gmail scout end-to-end

- Settings UI: "Add Gmail scout" → OAuth consent (separate scope set: `gmail.readonly` + `gmail.modify`) → encrypted token stored in `cloud_scout_configs.oauth_token_encrypted` → save scout config.
- Pub/Sub topic + webhook route + JWT verify.
- `setup_watch` on enable; daily `renew_watch` cron.
- Cron-fallback `_scout_cron_tick` per scout.
- Triage LLM (gpt-4o-mini, Langfuse `scout-triage-system`).
- Spam auto-trash toggle (default off) per scout.
- Device-inactivity pause logic.
- WS deliver-on-reconnect drains queue → `scout_proposal` frames → ack handler → SQLite `scout_suggestions` insert with `category='unprocessed'` (Phase 4 swaps real categorization in).
- "Read full email" tool call: Electron requests body for a suggestion → BE `GmailConnector.fetch_content` → returns body transiently in tool result.

### Phase 4 — Deferred (separate spec, with task-brief rework)

- Stage 2 categorization agent (prompt + tool palette: `list_projects`, `list_tasks`, `search_notes`, memory).
- HITL UI surface in the brief: suggestion cards, approve/reject controls, "convert to task | event | note | project | actionable-only" actions.
- `list_pending_scout_suggestions` brief tool.
- Convert-to-entity mutations.
- Future connectors (Slack/Teams/Outlook/...).

## Testing Surface

- **Phase 1:** existing pytest suite still green with renamed identifiers (auth, ws_unified, schemas, models, etc.). UI smoke: settings page renders, existing local scout runs.
- **Phase 2:** unit tests for `ScoutEngine` with a mocked `SourceConnector`. Idempotency test (replay the same `source_msg_ref`).
- **Phase 3:** integration tests for Gmail webhook → triage → queue insertion (mocked `GmailConnector` for content fetch, mocked LLM). E2E (manual): connect a real Gmail account on dev, send an email, observe the queue row appear, reconnect the device, observe the `scout_suggestions` row land with subject/snippet.

## Open Questions (none blocking)

- OAuth-token encryption key derivation (app-global vs user-derived): investigation step in the implementation plan; document the current state, security hardening is out of scope.
- Pub/Sub topic naming and IAM setup (one topic project-wide vs per-environment): operational detail to decide during Phase 3.

## Risks

- Pub/Sub setup is per-Google-Cloud-project and requires console IAM grants, so first-time setup has friction.
- Gmail `users.watch` quota: 1 watch per user. We use one watch per scout, which stays within the quota as long as each Gmail account backs at most one Gmail scout.
- The existing OAuth flow keeps `_pending_states` in an in-memory dict. The Pub/Sub webhook can run on any worker, so any cross-request state must live in the DB, not in memory. This design keeps no in-memory state, so it is safe.

## Acceptance

- All renames land atomically; app boots; existing local scout still operates.
- A user can connect Gmail through the Scouts settings page, see the scout marked enabled, send themselves a test email, and, after the next WS reconnect, observe a `scout_suggestions` row in their local DB with `category='unprocessed'` and `rawSubject` and `rawSnippet` populated.
- Spam emails (per LLM triage) are not queued; if `auto_trash_spam=true` they appear in Gmail Trash.
- BE never persists email bodies. Verified by code review of the triage flow plus a grep for `body_text` writes.