From 732a4c42f87d75e012a5f72376256a045ef906ee Mon Sep 17 00:00:00 2001
From: Roberto <roberto.musso@hpecds.com>
Date: Fri, 15 May 2026 23:15:46 +0200
Subject: [PATCH] docs: add scouts refactor + gmail scout design spec
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phases 1-3 in scope: rename agents → scouts (UI/code/Postgres/SQLite/
Langfuse), Gmail cloud scout w/ two-stage pipeline, SourceConnector
abstraction. Phase 4 (Stage 2 categorization + HITL surface in brief)
deferred to task-brief rework.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 ...s-refactor-and-gmail-integration-design.md | 327 ++++++++++++++++++
 1 file changed, 327 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-15-scouts-refactor-and-gmail-integration-design.md

diff --git a/docs/superpowers/specs/2026-05-15-scouts-refactor-and-gmail-integration-design.md b/docs/superpowers/specs/2026-05-15-scouts-refactor-and-gmail-integration-design.md
new file mode 100644
index 0000000..78ccd65
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-15-scouts-refactor-and-gmail-integration-design.md
@@ -0,0 +1,327 @@
+# Scouts Refactor + Gmail Integration — Design
+
+**Date:** 2026-05-15
+**Status:** Draft, awaiting user review
+**Owner:** Roberto
+
+## Summary
+
+Rename the existing "Agents" subsystem to "Scouts" across the entire stack (UI, code, Postgres, SQLite, Langfuse), then add the first cloud scout — Gmail — using a two-stage pipeline that respects zero-trust (no email content stored on backend) and human-in-the-loop (no entities created autonomously).
+
+The implementation is split into four phases. Phases 1–3 ship now. Phase 4 (Stage 2 categorization, HITL surface in the brief, conversion-to-entity mutations) is deferred to the planned task-brief rework.
+
+## Goals
+
+- Unify the user-facing "data source watchers" concept under one name: **Scout**.
+- Land a `SourceConnector` abstraction so future cloud scouts (Slack/Teams/Outlook/RSS/...) reuse the same engine, queue, delivery channel, and HITL surface — only the per-source connector is new.
+- Ship a Gmail scout end-to-end with: OAuth, push (`users.watch`) + cron-fallback polling, BE-side spam triage, encrypted token storage, opt-in spam auto-trash.
+- Preserve zero-trust: Gmail bodies are fetched transiently for the triage LLM call and discarded; only `{message_id, scout_id, verdict, status}` is persisted on BE.
+- Preserve HITL on the cloud path: scouts never create tasks/projects/events/notes autonomously; they accumulate proposals that the user resolves later from the brief.
+
+## Non-Goals (Phase 4, separate spec)
+
+- Stage 2 categorization agent prompt + tool palette.
+- HITL UI in the task brief (suggestion cards, approve/reject controls, convert-to-entity mutations, `list_pending_scout_suggestions` brief tool).
+- Local scout behavior change. Local directory monitor keeps current "auto-create" semantics. HITL is opt-in for local scouts in a future migration.
+- Schema unification of `LocalScoutConfig` + `CloudScoutConfig`. They have different behaviors; keep separate tables.
+- Connectors other than Gmail (Slack/Teams/Outlook).
+- Stripe/billing changes (existing tier checks suffice).
+
+## Constraints
+
+- **Pre-1.0 dev**: no production users, no backwards-compatibility shims, no Alembic data migrations beyond rename. Drop-and-recreate is acceptable where simpler.
+- **Zero-trust**: BE never persists user content. Gmail bodies are read transiently for the triage LLM call only.
+- **HITL (cloud path)**: scouts produce proposals, never entities.
+- **Spam auto-trash**: off by default per scout; opt-in via UI toggle. Action is "move to Trash" (Gmail's 30d recovery), never permanent delete.
+- **Reusability**: cloud-scout pipeline (connector → triage → queue → deliver-on-connect → HITL) is shared infra; Gmail is just the first connector.
+
+## Architecture
+
+### Two-stage pipeline (cloud scouts only)
+
+```
+[Gmail] --push/cron--> [BE Stage 1: Triage]                  [Electron Stage 2: Categorize]
+                          |                                     |
+                          v                                     v
+                       fetch body (transient)               drain queue on WS reconnect
+                          |                                     |
+                          v                                     v
+                       LLM relevance call                   fetch metadata for each msg
+                          |                                     |
+                          +-- spam + auto_trash_spam: trash     v
+                          |                                  insert scout_suggestions row
+                          +-- relevant: insert queue row     (category='unprocessed' stub
+                                                              until Phase 4)
+```
+
+**Stage 1 (BE, always-on):** verdict only. Stores `{message_id, scout_id, verdict, status}`. No content.
+
+**Stage 2 (Electron, on connect):** Phase 3 ships a stub that simply mirrors the queue into a local SQLite table with `category='unprocessed'`. Phase 4 swaps in the real categorization agent.
+
+### Local scouts (unchanged behaviorally)
+
+Local directory monitor keeps current Electron-side scheduling and auto-creation. Only renames apply.
+
+### SourceConnector abstraction
+
+A `SourceConnector` Protocol owns all source-specific I/O. The shared `ScoutEngine` owns triage, queueing, delivery, and ack handling. To add a new cloud scout: implement one connector class + register it.
+
+```python
+# app/scouts/connectors/base.py
+class SourceConnector(Protocol):
+    source_type: str  # "gmail"
+
+    async def list_new(self, scout: CloudScoutConfig) -> list[ItemRef]: ...
+    async def fetch_metadata(self, scout: CloudScoutConfig, ref: ItemRef) -> ItemMetadata: ...
+    async def fetch_content(self, scout: CloudScoutConfig, ref: ItemRef) -> ItemContent: ...
+    async def archive(self, scout: CloudScoutConfig, ref: ItemRef) -> None: ...  # Gmail: move to Trash (recoverable), never permanent delete
+    async def setup_watch(self, scout: CloudScoutConfig) -> None: ...
+    async def renew_watch(self, scout: CloudScoutConfig) -> None: ...
+```
+
+`ItemContent.body_text` is in-memory only; never persisted.
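+
+A minimal sketch of how a connector might satisfy the Protocol and be looked up by `source_type`, assuming a plain dict registry. `FakeConnector`, `REGISTRY`, `register`, and the simplified `ItemRef` are hypothetical names used only for illustration; the real `registry.py` may be organized differently.
+
+```python
+import asyncio
+from dataclasses import dataclass
+from typing import Protocol
+
+
+@dataclass(frozen=True)
+class ItemRef:  # hypothetical: an opaque per-source item id
+    source_msg_ref: str
+
+
+class SourceConnector(Protocol):  # trimmed to two methods for brevity
+    source_type: str
+
+    async def list_new(self, scout: object) -> list[ItemRef]: ...
+    async def archive(self, scout: object, ref: ItemRef) -> None: ...
+
+
+REGISTRY: dict[str, SourceConnector] = {}  # hypothetical registry
+
+
+def register(connector: SourceConnector) -> None:
+    REGISTRY[connector.source_type] = connector
+
+
+class FakeConnector:
+    """In-memory connector, e.g. for ScoutEngine unit tests."""
+
+    source_type = "fake"
+
+    def __init__(self, refs: list[str]) -> None:
+        self.refs = refs
+        self.trashed: list[str] = []
+
+    async def list_new(self, scout: object) -> list[ItemRef]:
+        return [ItemRef(r) for r in self.refs]
+
+    async def archive(self, scout: object, ref: ItemRef) -> None:
+        self.trashed.append(ref.source_msg_ref)
+
+
+register(FakeConnector(["m1", "m2"]))
+new_refs = asyncio.run(REGISTRY["fake"].list_new(scout=None))
+```
+
+Because `SourceConnector` is a structural Protocol, `FakeConnector` needs no inheritance; matching attribute and method signatures is enough for a type checker.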
+
+### ScoutEngine
+
+```python
+class ScoutEngine:
+    async def trigger_scout(self, scout_id: UUID) -> None: ...
+    async def _process_item(self, scout, connector, ref) -> None: ...
+    async def deliver_pending(self, user_id: UUID, ws: DeviceWS) -> None: ...
+```
+
+Both webhook and cron-fallback entry points call `trigger_scout`.
+
+## Data Model
+
+### Postgres (BE)
+
+#### Renames (Phase 1, single Alembic migration)
+
+| Before                         | After                          |
+|--------------------------------|--------------------------------|
+| Table `local_agent_configs`    | `local_scout_configs`          |
+| Table `cloud_agent_configs`    | `cloud_scout_configs`          |
+| Table `agent_run_logs`         | `scout_run_logs`               |
+| Column `agent_config`          | `scout_config`                 |
+| Column `agent_id` (FKs)        | `scout_id`                     |
+| Column `agent_run_id`          | `scout_run_id`                 |
+| Class `LocalAgentConfig`       | `LocalScoutConfig`             |
+| Class `CloudAgentConfig`       | `CloudScoutConfig`             |
+| Class `AgentRunLog`            | `ScoutRunLog`                  |
+
+#### New (Phase 2)
+
+```sql
+CREATE TABLE scout_triage_queue (
+  id              uuid PRIMARY KEY,
+  user_id         uuid NOT NULL REFERENCES users(id),
+  scout_id        uuid NOT NULL REFERENCES cloud_scout_configs(id),
+  source_type     text NOT NULL,                   -- "gmail"
+  source_msg_ref  text NOT NULL,                   -- gmail message id
+  triage_verdict  text NOT NULL,                   -- "relevant"
+  triage_reason   text,                            -- short LLM reason for debug
+  status          text NOT NULL DEFAULT 'queued',  -- queued | delivered | acked | expired
+  triaged_at      timestamptz NOT NULL DEFAULT now(),
+  delivered_at    timestamptz,
+  acked_at        timestamptz,
+  expires_at      timestamptz NOT NULL,            -- triaged_at + 30d
+  UNIQUE (scout_id, source_msg_ref)                -- idempotent webhook retries
+);
+CREATE INDEX ON scout_triage_queue (user_id, status);
+CREATE INDEX ON scout_triage_queue (expires_at) WHERE status != 'acked';
+```
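+
+The `UNIQUE (scout_id, source_msg_ref)` constraint is what lets a retried webhook become a no-op. A minimal sketch of that behavior, using Python's built-in `sqlite3` as a stand-in for Postgres (an assumption made only so the snippet runs anywhere) and a trimmed version of the table. The `enqueue` helper is hypothetical; the `ON CONFLICT ... DO NOTHING` clause is also valid Postgres syntax.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute(
+    """
+    CREATE TABLE scout_triage_queue (
+        id             TEXT PRIMARY KEY,
+        scout_id       TEXT NOT NULL,
+        source_msg_ref TEXT NOT NULL,
+        status         TEXT NOT NULL DEFAULT 'queued',
+        UNIQUE (scout_id, source_msg_ref)
+    )
+    """
+)
+
+
+def enqueue(row_id: str, scout_id: str, msg_ref: str) -> None:
+    """Insert a queue row; a replay of the same message changes nothing."""
+    conn.execute(
+        "INSERT INTO scout_triage_queue (id, scout_id, source_msg_ref) "
+        "VALUES (?, ?, ?) "
+        "ON CONFLICT (scout_id, source_msg_ref) DO NOTHING",
+        (row_id, scout_id, msg_ref),
+    )
+
+
+def queue_size() -> int:
+    return conn.execute("SELECT COUNT(*) FROM scout_triage_queue").fetchone()[0]
+
+
+enqueue("q1", "scout-a", "msg-123")
+enqueue("q2", "scout-a", "msg-123")  # same message, redelivered by Pub/Sub
+rows_after_replay = queue_size()
+```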
+
+#### Alterations to `cloud_scout_configs` (Phase 2)
+
+```sql
+ALTER TABLE cloud_scout_configs ADD COLUMN auto_trash_spam boolean NOT NULL DEFAULT false;
+ALTER TABLE cloud_scout_configs ADD COLUMN gmail_history_id text;
+ALTER TABLE cloud_scout_configs ADD COLUMN gmail_watch_expires_at timestamptz;
+ALTER TABLE cloud_scout_configs ADD COLUMN device_inactivity_pause_days int NOT NULL DEFAULT 14;
+```
+
+OAuth tokens continue to live in the existing `cloud_scout_configs.oauth_token_encrypted` column. Encryption mechanism (key derivation, rotation) is reused unchanged. A pre-implementation investigation step will document the current key-management story so we know the threat model; hardening, if needed, is out of scope.
+
+### SQLite (Electron, Drizzle)
+
+#### Renames (Phase 1)
+
+| Before              | After               |
+|---------------------|---------------------|
+| `agent_runs`        | `scout_runs`        |
+| `agent_run_actions` | `scout_run_actions` |
+| Col `agent_id`      | `scout_id`          |
+
+#### New (Phase 2)
+
+```typescript
+export const scoutSuggestions = sqliteTable('scout_suggestions', {
+  id:                  text().primaryKey(),
+  scoutId:             text().notNull(),
+  sourceType:          text().notNull(),     // "gmail"
+  sourceMsgRef:        text().notNull(),
+  category:            text().notNull(),     // "unprocessed" until Phase 4
+  payload:             text(),               // JSON, populated by Phase 4
+  rawSubject:          text(),               // populated on delivery
+  rawSnippet:          text(),               // populated on delivery
+  status:              text().notNull(),     // pending | approved | rejected | expired
+  proposedAt:          integer().notNull(),  // ms epoch
+  resolvedAt:          integer(),
+  resolvedEntityType:  text(),               // "task" | "project" | ... after Phase 4 approval
+  resolvedEntityId:    text(),
+});
+```
+
+`rawSubject` + `rawSnippet` are stored locally to render the HITL card without re-hitting Gmail every render. Body is still NOT stored — fetched on-demand via a tool call when the user explicitly opens the suggestion.
+
+## WebSocket Frame Contract
+
+Existing `/api/v1/device` channel. Two new frame types.
+
+```typescript
+// BE → Electron
+{
+  type: 'scout_proposal',
+  proposal: {
+    id: string,
+    scoutId: string,
+    sourceType: 'gmail',
+    sourceMsgRef: string,
+    rawSubject: string | null,
+    rawSnippet: string | null,
+    category: 'unprocessed',
+    payload: null
+  }
+}
+
+// Electron → BE
+{ type: 'scout_proposal_ack', proposalId: string }
+```
+
+On WS reconnect, BE's `ScoutEngine.deliver_pending(user_id, ws)` selects all `status='queued'` rows for the user, calls `connector.fetch_metadata` per row (subject + snippet only), and sends one `scout_proposal` frame each, flipping the row to `status='delivered'` and setting `delivered_at`. When the matching `scout_proposal_ack` arrives, the row moves to `status='acked'` and `acked_at` is set.
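+
+A minimal sketch of the row lifecycle, assuming rows move `queued` → `delivered` when the frame is sent and `delivered` → `acked` when the ack arrives (one reading of the `status` value set in `scout_triage_queue`). `QueueRow`, `mark_delivered`, and `mark_acked` are hypothetical names, and the frame is trimmed to the fields needed here.
+
+```python
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Optional
+
+
+@dataclass
+class QueueRow:  # hypothetical in-memory mirror of a scout_triage_queue row
+    id: str
+    status: str = "queued"
+    delivered_at: Optional[datetime] = None
+    acked_at: Optional[datetime] = None
+
+
+def mark_delivered(row: QueueRow) -> dict:
+    """Build the scout_proposal frame for a row and record the send."""
+    if row.status != "queued":
+        raise ValueError(f"cannot deliver row in status {row.status!r}")
+    row.status = "delivered"
+    row.delivered_at = datetime.now(timezone.utc)
+    return {"type": "scout_proposal", "proposal": {"id": row.id}}
+
+
+def mark_acked(row: QueueRow, frame: dict) -> None:
+    """Apply a scout_proposal_ack frame; stray or repeated acks are ignored."""
+    if frame["proposalId"] != row.id or row.status != "delivered":
+        return
+    row.status = "acked"
+    row.acked_at = datetime.now(timezone.utc)
+
+
+row = QueueRow(id="p1")
+sent = mark_delivered(row)
+mark_acked(row, {"type": "scout_proposal_ack", "proposalId": sent["proposal"]["id"]})
+```
+
+Ignoring acks for rows that are not in `delivered` keeps the handler safe against duplicate frames after a flaky reconnect.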
+
+## Stage 1 Triage Detail
+
+```
+Webhook (Pub/Sub) or cron tick
+  -> ScoutEngine.trigger_scout(scout_id)
+  -> if device inactive > scout.device_inactivity_pause_days: skip (pause)
+  -> connector.list_new(scout) -> [ItemRef]
+  -> for each ref:
+       - if (scout_id, source_msg_ref) already in queue: skip (idempotent)
+       - content = await connector.fetch_content(scout, ref)         # transient
+       - verdict = await ScoutEngine._triage_llm(scout, content)     # gpt-4o-mini
+       - if verdict == spam:
+           - if scout.auto_trash_spam: await connector.archive(scout, ref)   # Gmail: move to Trash
+           - continue                                                # not queued; move to next ref
+       - INSERT scout_triage_queue row
+  -> UPDATE cloud_scout_configs.last_run_at
+  -> INSERT scout_run_logs row
+```
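+
+The flow above can be sketched as a runnable loop. This is an illustration under stated assumptions, not the engine itself: `FakeScout` and `FakeGmail` are hypothetical stand-ins for `CloudScoutConfig` and `GmailConnector`, the queue is a plain dict instead of `scout_triage_queue`, and `fake_triage_llm` replaces the real LLM call with a keyword check.
+
+```python
+import asyncio
+from dataclasses import dataclass, field
+
+
+@dataclass
+class FakeScout:  # hypothetical stand-in for CloudScoutConfig
+    id: str
+    auto_trash_spam: bool = False
+
+
+@dataclass
+class FakeGmail:  # hypothetical connector holding message bodies in memory
+    bodies: dict[str, str]
+    trashed: list[str] = field(default_factory=list)
+
+    async def list_new(self, scout: FakeScout) -> list[str]:
+        return list(self.bodies)
+
+    async def fetch_content(self, scout: FakeScout, ref: str) -> str:
+        return self.bodies[ref]
+
+    async def archive(self, scout: FakeScout, ref: str) -> None:
+        self.trashed.append(ref)
+
+
+async def fake_triage_llm(body: str) -> str:
+    # Placeholder for the real LLM call; a keyword check keeps the sketch runnable.
+    return "spam" if "lottery" in body.lower() else "relevant"
+
+
+async def trigger_scout(scout: FakeScout, connector: FakeGmail, queue: dict) -> None:
+    for ref in await connector.list_new(scout):
+        if (scout.id, ref) in queue:
+            continue  # idempotent: already triaged and queued
+        body = await connector.fetch_content(scout, ref)  # transient
+        verdict = await fake_triage_llm(body)
+        del body  # only the verdict outlives this iteration
+        if verdict == "spam":
+            if scout.auto_trash_spam:
+                await connector.archive(scout, ref)
+            continue  # spam is never queued
+        queue[(scout.id, ref)] = {"verdict": verdict, "status": "queued"}
+
+
+gmail = FakeGmail({"m1": "Invoice attached", "m2": "You won the LOTTERY"})
+queue: dict = {}
+asyncio.run(trigger_scout(FakeScout("s1", auto_trash_spam=True), gmail, queue))
+```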
+
+### Triage LLM contract
+
+- **Prompt name (Langfuse):** `scout-triage-system` — source-agnostic, parameterized by `source_type`.
+- **Input:** `{source_type, scout_name, scout_purpose, item_subject, item_sender, item_body_truncated_2k}`.
+- **Output (structured, Pydantic `TriageVerdict`):** `{verdict: "relevant" | "spam", reason: str, confidence: float}`.
+- **Cost guard:** body truncated at 2k chars before LLM call.
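+
+A minimal sketch of the output contract and the cost guard. The spec calls for a Pydantic model; a standard-library dataclass is used here as a stand-in so the snippet has no third-party dependency. Reading "2k chars" as 2000 and bounding `confidence` to [0, 1] are both assumptions, and `truncate_body` is a hypothetical helper name.
+
+```python
+from dataclasses import dataclass
+
+MAX_BODY_CHARS = 2000  # assumption: "2k chars" read as 2000
+
+
+def truncate_body(body: str, limit: int = MAX_BODY_CHARS) -> str:
+    """Cost guard: cap the body before it reaches the LLM."""
+    return body[:limit]
+
+
+@dataclass(frozen=True)
+class TriageVerdict:  # dataclass stand-in for the Pydantic model
+    verdict: str
+    reason: str
+    confidence: float
+
+    def __post_init__(self) -> None:
+        if self.verdict not in ("relevant", "spam"):
+            raise ValueError(f"unknown verdict: {self.verdict!r}")
+        if not 0.0 <= self.confidence <= 1.0:  # assumed range
+            raise ValueError("confidence must be within [0, 1]")
+
+
+capped = truncate_body("x" * 5000)
+ok = TriageVerdict(verdict="relevant", reason="mentions an invoice", confidence=0.9)
+```
+
+Rejecting unknown verdict strings at construction time means a malformed LLM response fails loudly and falls into the "LLM call fails" path rather than producing a queue row.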
+
+### Failure modes
+
+- LLM call fails: log error, leave message unprocessed, retry on next webhook/cron.
+- Gmail 401 (refresh exhausted): mark scout `enabled=false`, surface re-auth prompt to user via WS frame on next device connect.
+- Pub/Sub webhook JWT fails verification: reject the request with 401.
+
+## Gmail Push Setup
+
+- On scout enable: `GmailConnector.setup_watch(scout)` calls `users.watch` against a single project-wide Pub/Sub topic.
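+- The `expiration` returned by `users.watch` is stored in `gmail_watch_expires_at`. Watches expire after 7 days.
+- Cron `_scout_watch_renewal_tick` re-issues `watch` for any scout whose expiry is within 24h. The tick must run at least daily: with a 24h renewal window, a weekly tick would let a watch lapse between runs.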
+- Webhook route: `POST /api/v1/scouts/webhooks/gmail`. Verifies Pub/Sub-signed JWT, resolves user via the email address in the payload, enqueues triage job.
+- Cron fallback (`_scout_cron_tick`, runs each scout's `schedule_cron`): polls `users.history.list` since `gmail_history_id`, updates `gmail_history_id` after.
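+
+The renewal check can be sketched as a pure function over the stored expiry. `needs_renewal` and `RENEW_WINDOW` are hypothetical names; treating a missing `gmail_watch_expires_at` as "needs a watch" is an assumption.
+
+```python
+from datetime import datetime, timedelta, timezone
+from typing import Optional
+
+RENEW_WINDOW = timedelta(hours=24)  # "expiry is within 24h"
+
+
+def needs_renewal(
+    expires_at: Optional[datetime],
+    now: datetime,
+    window: timedelta = RENEW_WINDOW,
+) -> bool:
+    """True when no watch is recorded, or it lapses within `window` of `now`."""
+    return expires_at is None or expires_at - now <= window
+
+
+now = datetime(2026, 5, 15, 12, 0, tzinfo=timezone.utc)
+just_issued = now + timedelta(days=7)
+closing = now + timedelta(hours=6)
+due = [needs_renewal(t, now) for t in (None, just_issued, closing)]
+```
+
+An already-expired timestamp yields a negative difference, so lapsed watches are selected as well.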
+
+## Terminology Refactor (Detail)
+
+### Renamed
+
+| Surface           | Before                                              | After                                               |
+|-------------------|-----------------------------------------------------|-----------------------------------------------------|
+| Settings nav      | `settings.agents` "Agents"                          | `settings.scouts` "Scouts"                          |
+| Subtitle/desc     | `settings.agentsSubtitle`, `agentsDescription`      | `settings.scoutsSubtitle`, `scoutsDescription`      |
+| `agents.*` keys   | `noAgentsYet`, `createAgent`, `yourAgents`, etc.    | `scouts.noScoutsYet`, `createScout`, `yourScouts`   |
+| `toast.agent.*`   | `created`, `runStarted`, etc.                       | `toast.scout.*`                                     |
+| Components        | `AgentsSection`, `AgentRow`, `LocalAgentConfigPanel`, `CloudAgentConfigPanel`, `InlineAgentCreationStepper` | `ScoutsSection`, `ScoutRow`, `LocalScoutConfigPanel`, `CloudScoutConfigPanel`, `InlineScoutCreationStepper` |
+| TS types          | `LocalAgentConfig`, `CloudAgentConfig`              | `LocalScoutConfig`, `CloudScoutConfig`              |
+| tRPC router       | `agent.local`, `agent.cloud`, `agent.journey`, `agent.runs`, `agent.runActions` | `scout.local`, `scout.cloud`, `scout.journey`, `scout.runs`, `scout.runActions` |
+| Drizzle tables    | `agent_runs`, `agent_run_actions`                   | `scout_runs`, `scout_run_actions`                   |
+| Main process      | `src/main/agents/agent-scheduler.ts`                | `src/main/scouts/scout-scheduler.ts`                |
+| BE routes         | `/api/v1/agents/*`, `/api/v1/agent-setup`           | `/api/v1/scouts/*`, `/api/v1/scout-setup`           |
+| BE modules        | `routes/agents.py`, `routes/agent_setup.py`, `core/agent_runner.py`, `core/agent_session_buffer.py`, `core/agent_registry.py` | `routes/scouts.py`, `routes/scout_setup.py`, `core/scout_runner.py`, `core/scout_session_buffer.py`, `core/scout_registry.py` |
+| Postgres tables   | `local_agent_configs`, `cloud_agent_configs`, `agent_run_logs` | `local_scout_configs`, `cloud_scout_configs`, `scout_run_logs` |
+| Postgres columns  | `agent_config`, `agent_id`, `agent_run_id`          | `scout_config`, `scout_id`, `scout_run_id`          |
+| SQLAlchemy models | `LocalAgentConfig`, `CloudAgentConfig`, `AgentRunLog` | `LocalScoutConfig`, `CloudScoutConfig`, `ScoutRunLog` |
+| Langfuse prompts  | user-facing scout prompts named `agent-*`           | recreate as `scout-*`; delete old                   |
+| i18n              | 5 langs (en/it/es/fr/de)                            | all updated atomically                              |
+
+### Kept as-is
+
+- `app/agents/*` Python module — these are LLM helper agents (task_agent, project_agent, note_agent, timeline_agent, filesystem_agent) invoked internally by `deep_agent`. Different concept from user-facing scouts. Renaming would create semantic clash with LLM-agent terminology.
+- `/api/v1/device` WS endpoint name (already source-neutral).
+- All `tool_call`, `run_complete`, etc. WS frame types unrelated to scouts.
+
+## Phasing
+
+### Phase 1 — Rename only
+- Single PR. Single Alembic migration. Single Drizzle migration.
+- All renames listed above land together. App still works, existing local scout still runs. No new behavior.
+
+### Phase 2 — Connector abstraction skeleton
+- New module `app/scouts/connectors/{base,registry,gmail}.py`.
+- New module `app/scouts/engine.py`.
+- New table `scout_triage_queue` + alterations to `cloud_scout_configs`.
+- New SQLite table `scout_suggestions` (Drizzle).
+- New WS frame types `scout_proposal` + `scout_proposal_ack`.
+- No user-facing change yet.
+
+### Phase 3 — Gmail scout end-to-end
+- Settings UI: "Add Gmail scout" → OAuth consent (separate scope set: `gmail.readonly` + `gmail.modify`) → encrypted token stored in `cloud_scout_configs.oauth_token_encrypted` → save scout config.
+- Pub/Sub topic + webhook route + JWT verify.
+- `setup_watch` on enable; `renew_watch` cron, run at least daily so the 24h renewal window catches every expiring watch.
+- Cron-fallback `_scout_cron_tick` per scout.
+- Triage LLM (gpt-4o-mini, Langfuse `scout-triage-system`).
+- Spam auto-trash toggle (default off) per scout.
+- Device-inactivity pause logic.
+- WS deliver-on-reconnect drains queue → `scout_proposal` frames → ack handler → SQLite `scout_suggestions` insert with `category='unprocessed'` (Phase 4 swaps real categorization in).
+- "Read full email" tool call: Electron requests body for a suggestion → BE `GmailConnector.fetch_content` → returns body transiently in tool result.
+
+### Phase 4 — Deferred (separate spec, with task-brief rework)
+- Stage 2 categorization agent (prompt + tool palette: `list_projects`, `list_tasks`, `search_notes`, memory).
+- HITL UI surface in the brief: suggestion cards, approve/reject controls, "convert to task | event | note | project | actionable-only" actions.
+- `list_pending_scout_suggestions` brief tool.
+- Convert-to-entity mutations.
+- Future connectors (Slack/Teams/Outlook/...).
+
+## Testing Surface
+
+- **Phase 1:** existing pytest suite still green with renamed identifiers (auth, ws_unified, schemas, models, etc.). UI smoke: settings page renders, existing local scout runs.
+- **Phase 2:** unit tests for `ScoutEngine` w/ mocked `SourceConnector`. Idempotency test (replay same `source_msg_ref`).
+- **Phase 3:** integration tests for Gmail webhook → triage → queue insertion, with `GmailConnector` content fetch and the triage LLM both mocked. E2E (manual): connect a real Gmail account on dev, send an email, observe queue row appear, reconnect device, observe `scout_suggestions` row land with subject/snippet.
+
+## Open Questions (none blocking)
+
+- OAuth-token encryption key derivation (app-global vs user-derived): investigation step in implementation plan. Document current state; security hardening is out of scope.
+- Pub/Sub topic naming and IAM setup (one topic project-wide vs per-environment): operational detail to decide during Phase 3.
+
+## Risks
+
+- Pub/Sub setup is per-Google-Cloud-project and requires console IAM grants — first-time setup friction.
+- Gmail `users.watch` quota: 1 watch per user. We use one watch per scout, but a user has only one Gmail scout per Gmail account, so this is fine.
+- `_pending_states` dict pattern in existing OAuth flow is in-memory — Pub/Sub webhook can run on any worker, so any cross-request state must be in DB, not in-memory. This design uses no in-memory state; safe.
+
+## Acceptance
+
+- All renames land atomically; app boots; existing local scout still operates.
+- A user can connect Gmail through the Scouts settings page, see the scout marked enabled, send themselves a test email, and observe a `scout_suggestions` row appear in their local DB with `category='unprocessed'`, `rawSubject`, and `rawSnippet` populated, after the next WS reconnect.
+- Spam emails (per LLM triage) are not queued; if `auto_trash_spam=true` they appear in Gmail Trash.
+- BE never persists email bodies. Verified by code review of triage flow + grep for `body_text` writes.