docs: add cloud scout creation flow design spec
Branches scout creation: Gmail gets slim flow (name + focus + auto-trash + label/sender filter, OAuth during creation) fitting two-stage HITL pipeline. Local-directory flow untouched. Config panel rewritten for edit parity. Adds gmail_address column, label-list + disconnect routes, serializer oauthConnected/filterConfig fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,186 @@
# Cloud Scout Creation Flow (Gmail): Design

**Date:** 2026-05-16
**Status:** Draft, awaiting user review
**Owner:** Roberto
**Predecessor:** [2026-05-15-scouts-refactor-and-gmail-integration-design.md](2026-05-15-scouts-refactor-and-gmail-integration-design.md) (Phases 1–3 shipped)

## Summary

The scout creation stepper (`InlineScoutCreationStepper`) and the cloud config panel (`CloudScoutConfigPanel`) still expose the pre-refactor local-agent config shape (a data-type picker, a batch-interval select, and a user-authored extraction-prompt builder) for **all** scouts, including Gmail. These fields contradict the two-stage HITL pipeline shipped in Phases 1–3:

- **Categorization is automatic and deferred to Phase 4.** The scout categorizes every relevant email itself (task / event / note / project) and proposes the result via the brief. Users do not pick extraction types.
- **The triage prompt is server-side and IP-protected** (Langfuse `scout-triage-system`, zero-trust). Users must not author it.
- **Gmail is push-primary** (Pub/Sub `watch`). The cron schedule is a fallback only; surfacing it as a "batch interval" implies emails arrive every N hours, which is false.

This design branches the creation flow so cloud scouts (Gmail) get a slim flow fitting the new pipeline (name, focus text, spam auto-trash, and a label/sender filter, with OAuth performed during creation), while local-directory scouts keep their current full flow untouched. The cloud config panel is rewritten to match (full edit parity).

## Goals

- Cloud (Gmail) creation collects only fields the new pipeline actually uses.
- OAuth happens during creation; the scout is live immediately on connect.
- A label + sender filter lets the user scope which emails the scout watches.
- A free-text "focus" field steers triage (`scout_purpose`) without exposing the prompt.
- The cloud config panel offers full edit parity (focus, filter, auto-trash, connection management).
- Fix the Phase-3 follow-up: the BE cloud serializer now returns `oauthConnected`, `filterConfig`, and `gmail_address`.

## Non-Goals

- Stage-2 categorization agent and the brief HITL surface (Phase 4, separate spec).
- Teams / Outlook slim flows. They share the cloud branch path but their connectors don't exist yet; their catalog cards are disabled with a "coming soon" marker.
- Changes to the local-directory creation flow (untouched).
- `date_range` filter (a watch is ongoing, not time-bounded).
- New pending-token OAuth machinery: we reuse the existing scout-id-bound OAuth by creating the scout at the connect step.

## Constraints

- Pre-1.0 dev: no production users, no migration shims beyond what Alembic needs.
- Zero-trust + IP-protected triage prompt preserved: the focus field maps to `scout_purpose`, never to the raw prompt.
- Reuse existing OAuth (`startGmailOAuth` / `completeGmailOAuth` + deep-link callback) and existing `GmailClient._build_gmail_query` (already reads `labels` + `senders`).

## Architecture

### Stepper branch

`InlineScoutCreationStepper` becomes a thin router. The template-pick step (Step 1) stays shared. After a template is chosen, the stepper delegates by `selectedTemplate.type`:

- `local_directory` → `LocalScoutCreationFlow` (the current 3-step body, extracted verbatim).
- cloud (Gmail) → `CloudScoutCreationFlow` (new).

This extraction keeps each flow in its own focused component rather than threading `if (cloud)` branches through the existing local logic.

### Cloud (Gmail) flow

```
Step 1 (shared): Choose template → Gmail
Step 2: Connect & Basics
  - Name (required)
  - Focus text (optional)        → prompt_template → scout_purpose
  - Auto-trash spam toggle (off) → auto_trash_spam
  - [Connect Gmail] button:
      1. scout.cloud.create({ name, provider:'gmail', dataTypes:[],
           promptTemplate, autoTrashSpam, filterConfig:{} })
           → returns scout id (dormant, no token yet)
      2. startGmailOAuth({ scoutId }) → browser consent
      3. deep-link callback → completeGmailOAuth({ code, state })
           → token stored, setup_watch fires, gmail_address persisted → scout live
Step 3: Filter (post-connect)
  - scout.cloud.gmailLabels({ scoutId }) → populate label multi-select
  - Labels multi-select + sender/domain allowlist chips
  - [Save] → scout.cloud.update({ id, filterConfig:{ labels, senders } })
  - [Skip] → leaves filter empty (watch all INBOX)
  - [Finish] → closes stepper, invalidates scout lists
```

**Why create at the connect step:** the existing BE OAuth flow binds the token to an existing `scout_id` (`/scouts/oauth/gmail/authorize?scout_id=…`). Creating the dormant scout at connect reuses that flow with zero new machinery. The filter is applied as an `update` after connect; because this happens within the same stepper, it reads as one continuous flow.

### Abandon handling

- Bail **before** connect completes (scout already created, OAuth not finished) → dormant unconnected scout row remains; its row shows the same "Connect Gmail" CTA (identical to today's post-create connect path).
- Bail **after** connect, before filter → live INBOX-wide scout (empty filter). Functional; editable later in the config panel.

Both are acceptable pre-1.0; neither leaves corrupt state.
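
The outcomes above can be summarised as a small state function. This is an illustrative sketch only: `ScoutState` and `derive_scout_state` are hypothetical names that do not appear in the codebase, and the inputs simply mirror the `oauthConnected` and `filter_config` fields described in this spec.

```python
from enum import Enum
from typing import Optional


class ScoutState(Enum):
    """Hypothetical labels for how far the creation flow got."""
    DORMANT = "dormant"                  # created, OAuth never completed
    LIVE_INBOX_WIDE = "live_inbox_wide"  # connected, empty filter
    LIVE_FILTERED = "live_filtered"      # connected, labels and/or senders set


def derive_scout_state(oauth_connected: bool, filter_config: Optional[dict]) -> ScoutState:
    """Classify a cloud scout from its connection flag and filter."""
    if not oauth_connected:
        return ScoutState.DORMANT
    cfg = filter_config or {}
    if cfg.get("labels") or cfg.get("senders"):
        return ScoutState.LIVE_FILTERED
    return ScoutState.LIVE_INBOX_WIDE
```

Every combination maps to a valid state, which is the sense in which neither abandon path leaves corrupt data.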

## Fields & Data Mapping

| UI field | Step | Stored as | Notes |
|----------|------|-----------|-------|
| Name | 2 | `name` | required |
| Focus text | 2 | `prompt_template` | free-text → `scout_purpose` in triage; optional |
| Auto-trash spam | 2 | `auto_trash_spam` | toggle, default **off** |
| Labels | 3 | `filter_config.labels: string[]` | multi-select of fetched Gmail labels; empty = all INBOX |
| Senders | 3 | `filter_config.senders: string[]` | chips (`alice@x.com` or `@client.co`); optional |

**Dropped from cloud:** data-types picker (`dataTypes` sent as `[]`), batch interval (`scheduleCron` omitted → BE default), extraction-prompt builder (`PromptBuilderChat` not rendered for cloud).

`filter_config = { labels?: string[], senders?: string[] }` matches the shape the existing `GmailClient._build_gmail_query` already reads, so no query-builder change is needed; we just populate the config from the UI instead of sending `{}`.
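
To make the mapping concrete, here is a rough sketch of how a `filter_config` could translate into a Gmail search query. The real `GmailClient._build_gmail_query` is not reproduced in this spec, so `build_gmail_query` below is a hypothetical stand-in. It assumes labels go to Gmail's `label:` search operator and senders to `from:`, with alternatives grouped in braces (Gmail's OR grouping).

```python
from typing import Optional


def build_gmail_query(filter_config: Optional[dict]) -> str:
    """Hypothetical stand-in for GmailClient._build_gmail_query."""
    cfg = filter_config or {}
    labels = [name.strip() for name in cfg.get("labels", []) if name.strip()]
    senders = [s.strip().lower() for s in cfg.get("senders", []) if s.strip()]

    parts = []
    if labels:
        # Assumption: label names are written with hyphens instead of spaces.
        terms = " ".join("label:" + name.replace(" ", "-") for name in labels)
        parts.append("{" + terms + "}")
    else:
        # An empty label list means "watch all INBOX".
        parts.append("in:inbox")
    if senders:
        # Sender chips are passed through as-is (full address or @domain).
        parts.append("{" + " ".join("from:" + s for s in senders) + "}")
    return " ".join(parts)
```

With an empty config the sketch yields `in:inbox`, which lines up with the "empty = all INBOX" rule in the table.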

## Backend Changes

### 1. Gmail label listing

- `GmailConnector.list_labels(scout) -> list[dict]`: calls `users().labels().list()` and returns `[{id, name}]` (user + system labels). The blocking API call is wrapped in `asyncio.to_thread`. Returns `[]` if no token.
- Route `GET /api/v1/scouts/cloud/{scout_id}/gmail-labels`: auth-guarded, loads scout (ownership check), calls connector.
- tRPC `scout.cloud.gmailLabels({ scoutId })` → `proxyGet`.
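
A minimal sketch of the connector method, under some assumptions: the function here takes an already-built Gmail API `service` object (or `None` when no token is stored) rather than the scout, and the `MagicMock` stands in for the real client so the example runs without network access.

```python
import asyncio
from typing import Any, Optional
from unittest.mock import MagicMock


async def list_labels(service: Optional[Any]) -> list[dict]:
    """Return [{id, name}] for all labels, or [] when there is no service.

    `service` stands in for an authorised Gmail API client; None models
    "no token stored". The method in the spec takes the scout instead.
    """
    if service is None:
        return []

    def _fetch() -> dict:
        # Blocking HTTP call, so it runs in a worker thread.
        return service.users().labels().list(userId="me").execute()

    response = await asyncio.to_thread(_fetch)
    return [
        {"id": label["id"], "name": label["name"]}
        for label in response.get("labels", [])
    ]


# Fake Gmail service for demonstration only.
fake = MagicMock()
fake.users.return_value.labels.return_value.list.return_value.execute.return_value = {
    "labels": [
        {"id": "INBOX", "name": "INBOX", "type": "system"},
        {"id": "Label_1", "name": "Clients", "type": "user"},
    ]
}
labels = asyncio.run(list_labels(fake))
```

Only `id` and `name` are kept, so the multi-select receives the same shape for system and user labels.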

### 2. Gmail disconnect / stop watch

- `GmailConnector.stop_watch(scout)`: calls `users().stop()`, swallows errors (watch may already be expired).
- Route `POST /api/v1/scouts/cloud/{scout_id}/gmail-disconnect`: calls `stop_watch` first (the call needs the stored token), then clears `oauth_token_encrypted`, nulls `gmail_history_id` + `gmail_watch_expires_at` + `gmail_address`, and sets `enabled=false`.
- tRPC `scout.cloud.disconnectGmail({ scoutId })`.
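
The disconnect route could look roughly like the sketch below. `CloudScoutRow` is a hypothetical in-memory stand-in for the `cloud_scout_configs` row, and `stop_watch` is injected as a callable so the example stays self-contained. The sketch stops the watch before wiping the token, on the assumption that the stop call has to authenticate with that token, and it catches errors at the call site even though the spec places the swallowing inside `stop_watch` itself.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class CloudScoutRow:
    """Hypothetical stand-in for a cloud_scout_configs row."""
    oauth_token_encrypted: Optional[bytes]
    gmail_history_id: Optional[str]
    gmail_watch_expires_at: Optional[datetime]
    gmail_address: Optional[str]
    enabled: bool


def disconnect_gmail(
    scout: CloudScoutRow,
    stop_watch: Callable[[CloudScoutRow], None],
) -> None:
    """Stop the watch (best effort), then wipe the connection state."""
    if scout.oauth_token_encrypted is not None:
        try:
            # Runs while the token is still present on the row.
            stop_watch(scout)
        except Exception:
            # The watch may already be expired; disconnect must still succeed.
            pass
    scout.oauth_token_encrypted = None
    scout.gmail_history_id = None
    scout.gmail_watch_expires_at = None
    scout.gmail_address = None
    scout.enabled = False
```

Because the cleanup runs regardless of the `stop_watch` outcome, a failed stop call cannot leave the row half-connected.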

### 3. `scout.cloud.create` input: loosen + extend

- `scheduleCron`: required → **optional** (BE applies its default when omitted).
- `dataTypes`: stays; cloud sends `[]`.
- Add `autoTrashSpam?: boolean` (default `false`).
- `promptTemplate`: already present (carries focus text).
- `filterConfig`: already present (now populated).
- BE `POST /scouts/cloud` must accept + persist `auto_trash_spam`.
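
As a sketch of the loosened input, the function below fills in whatever the slim cloud flow omits. `normalize_create_input` is a hypothetical helper, and `DEFAULT_SCHEDULE_CRON` is a placeholder value: this spec does not state what the BE default actually is.

```python
# Placeholder only; the real BE default cron is not given in this spec.
DEFAULT_SCHEDULE_CRON = "0 * * * *"


def normalize_create_input(payload: dict) -> dict:
    """Map a camelCase create payload to column values, applying defaults."""
    if not payload.get("name"):
        raise ValueError("name is required")
    return {
        "name": payload["name"],
        "provider": payload["provider"],
        "data_types": payload.get("dataTypes", []),
        # Optional now: fall back to the BE default when omitted.
        "schedule_cron": payload.get("scheduleCron") or DEFAULT_SCHEDULE_CRON,
        "prompt_template": payload.get("promptTemplate"),
        "auto_trash_spam": bool(payload.get("autoTrashSpam", False)),
        "filter_config": payload.get("filterConfig") or {},
    }
```

A Gmail payload with only `name`, `provider`, and `dataTypes: []` therefore still produces a complete row.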

### 4. `scout.cloud.update` input: extend for config-panel parity

- Add `autoTrashSpam?`, `promptTemplate?`, `filterConfig?` (all optional, partial update).
- BE `PUT/PATCH /scouts/cloud/{id}` must apply these columns.
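
Partial-update semantics can be sketched as follows: only keys present in the patch touch their columns, and everything else is left alone. `apply_partial_update` and `UPDATABLE_FIELDS` are hypothetical names; the field-to-column pairs are the ones listed in this spec.

```python
# camelCase tRPC field -> snake_case column (pairs taken from this spec)
UPDATABLE_FIELDS = {
    "autoTrashSpam": "auto_trash_spam",
    "promptTemplate": "prompt_template",
    "filterConfig": "filter_config",
}


def apply_partial_update(row: dict, patch: dict) -> dict:
    """Return a copy of `row` with only the fields present in `patch` changed."""
    updated = dict(row)
    for field, column in UPDATABLE_FIELDS.items():
        if field in patch:
            updated[column] = patch[field]
    return updated
```

Checking `field in patch` rather than truthiness lets a caller explicitly reset a value, for example `filterConfig: {}` to go back to watching all INBOX.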

### 5. Cloud serializer: return the new fields

The BE cloud list/get serializer must return:

- `auto_trash_spam`
- `filter_config`
- `prompt_template`
- `gmail_address`
- computed `oauthConnected = oauth_token_encrypted is not None`

(`oauthConnected` was added to the TS type in Phase 3 but never populated by the BE; fixed here.)
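
A minimal sketch of such a serializer, assuming the row is available as a mapping. `serialize_cloud_scout` is a hypothetical name, and the key casing simply follows the list above; the real serializer may convert casing elsewhere.

```python
from typing import Any, Mapping


def serialize_cloud_scout(row: Mapping[str, Any]) -> dict:
    """Build the list/get payload for one cloud scout row."""
    return {
        "id": row["id"],
        "name": row["name"],
        "auto_trash_spam": bool(row.get("auto_trash_spam", False)),
        "filter_config": row.get("filter_config") or {},
        "prompt_template": row.get("prompt_template"),
        "gmail_address": row.get("gmail_address"),
        # Computed flag: expose whether a token exists, never the token itself.
        "oauthConnected": row.get("oauth_token_encrypted") is not None,
    }
```

The encrypted token stays server-side; the client only learns whether the scout is connected.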

### 6. `gmail_address` column (Alembic 009)

- `ALTER TABLE cloud_scout_configs ADD COLUMN gmail_address VARCHAR(320) NULL`.
- Populated on OAuth callback from the Gmail profile (`users().getProfile().emailAddress` or OIDC `userinfo` email).
- SQLAlchemy model field `gmail_address: Mapped[str | None]`.

### Shared TS type `CloudScoutConfig`

Add: `autoTrashSpam: boolean`, `filterConfig?: { labels?: string[]; senders?: string[] }`, `promptTemplate?: string`, `gmailAddress?: string | null`. (`oauthConnected` already present.)

## Config Panel Parity (`CloudScoutConfigPanel`)

Rewrite the expanded edit view to the slim model:

- **Connection status block:**
  - Not connected → amber "Connect Gmail" CTA (existing `startGmailOAuth`).
  - Connected → "Connected as `<gmailAddress>`" + "Reconnect" (re-run `startGmailOAuth`) + "Disconnect" (`disconnectGmail`).
- **Focus text:** editable textarea bound to `prompt_template`.
- **Filter:** label multi-select (via `scout.cloud.gmailLabels`) + sender chips, bound to `filter_config`.
- **Auto-trash spam:** toggle bound to `auto_trash_spam`.
- **Save changes:** single `scout.cloud.update({ id, promptTemplate, filterConfig, autoTrashSpam })`.

**Removed:** data-types checkboxes, schedule select, "Customize AI prompt" journey button.

## Catalog Gating (Teams / Outlook)

The catalog currently shows Local Directory, Gmail, Teams, and Outlook cards. The cloud branch only implements Gmail. Teams and Outlook cards are rendered **disabled** with a "coming soon" marker until their connectors exist, preventing a user from entering a half-built cloud flow for an unimplemented provider.

## i18n

New keys in all 5 languages (`en/it/es/fr/de`):
`scouts.focusLabel`, `scouts.focusPlaceholder`, `scouts.autoTrashSpam`, `scouts.autoTrashHint`, `scouts.filterLabels`, `scouts.filterSenders`, `scouts.filterSendersPlaceholder`, `scouts.watchAllInbox`, `scouts.connectedAs`, `scouts.reconnect`, `scouts.disconnect`, `scouts.skipFilter`, `scouts.finish`, plus the cloud stepper step headers (currently hardcoded English in the stepper; extracted to keys during the branch).

## Testing

- **BE unit:** `GmailConnector.list_labels` + `stop_watch` (mocked Gmail service). `scout.cloud.create` with omitted `scheduleCron` applies the default and persists `auto_trash_spam`. Cloud serializer returns `oauthConnected` + `filterConfig` + `gmail_address`.
- **BE migration:** Alembic 009 revision-graph check (head = 009, parent = 008).
- **Electron:** no test suite, so rely on `tsc --noEmit` + a manual smoke test: create a Gmail scout end-to-end, connect, pick labels, then confirm a `cloud_scout_configs` row with the focus/filter/auto-trash values and a populated `gmail_address`.

## Acceptance

- Creating a Gmail scout shows only name + focus + auto-trash with a Connect button on the same step, then a label/sender filter step. There is no data-type picker, no batch interval, and no extraction-prompt builder.
- After connect, the scout row shows "Connected as `<email>`".
- The config panel edits focus, filter, and auto-trash, and can disconnect/reconnect.
- Local-directory scout creation is unchanged.
- Teams/Outlook cards are visibly disabled.
- BE cloud list returns `oauthConnected`, `filterConfig`, `gmail_address`.

## Open Questions

None blocking.

## Risks

- **Label fetch latency:** `users().labels().list()` is one extra round-trip after OAuth. Acceptable; show a loading state on the multi-select.
- **Dormant-scout litter:** abandoned flows leave dormant/unfiltered scouts. Pre-1.0 acceptable; a future cleanup job could prune never-connected scouts older than N days.
- **`gmail_address` PII:** stored plaintext (it's the user's own address, already in their JWT identity). Not sensitive beyond existing storage.