workspace/docs/plan-brief-agent.md
2026-04-19 14:49:36 +02:00

# Dedicated Brief Agent (Home + Project)
> Ralph-loop plan. Execute one phase per iteration. Each phase is self-contained:
> its **Files**, **Tasks**, **Acceptance**, and **Verify** blocks are everything
> the agent needs to finish that phase without re-reading earlier phases.
> Mark `- [x]` as you complete tasks. Do not start phase N+1 until phase N's
> **Acceptance** is fully met.
## Environment
All Python commands in this plan (`pytest`, `python`, `ruff`, `alembic`) must
be run inside the `api/` project's virtualenv at `api/.venv`.
- **bash / WSL / macOS / Linux**: `source api/.venv/bin/activate` (or prefix
commands with `api/.venv/bin/python -m ...`)
- **Windows PowerShell**: `api\.venv\Scripts\Activate.ps1`
- **Windows bash shell (this repo's default)**: `source api/.venv/Scripts/activate`
Do **not** use the system Python or a globally installed `pytest`/`ruff`:
dependencies are pinned inside the venv. Every `Verify` step below assumes the
venv is active.
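
One way to confirm that the right interpreter is active before running a `Verify` step is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; the helper names `in_virtualenv` and `venv_name` are hypothetical and not part of the repo.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def venv_name() -> str:
    """Directory name of the active environment (expected: '.venv')."""
    return Path(sys.prefix).name


if __name__ == "__main__":
    if in_virtualenv():
        print(f"venv active: {venv_name()}")
    else:
        print("system Python detected; activate api/.venv first")
```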
---
## Context
Today the daily brief is produced by the `home-agent` with a prompt stuffed into
`sendHomeRequest()` at [adiuvAI/src/main/ai/orchestrator.ts:160](adiuvAI/src/main/ai/orchestrator.ts#L160). This
couples two very different jobs (chat vs summarisation) into one agent and one
prompt. Worse: the home agent is wired to emit XML tag wrappers (`<task>`,
`<timeline>`, `<project>`) for the UI component renderer — wrappers the brief
does not want. We filter them out in post-processing, but the LLM still pays tokens for
them and sometimes leaks malformed tags.
We want a **dedicated brief agent** that:
- Runs in **two modes**: `home` (daily brief) and `project` (per-project status brief).
- Produces **plain text only** — no XML/HTML tag wrappers, no bracketed id lists.
- Uses the **same infra pattern** as the other agents: `get_agent_llm(...)`,
`.env` override, Langfuse prompt via `get_prompt_or_fallback()`,
Langfuse tracing via `langfuse_context` + generation observations.
- Is **memory-aware** — core memory and relational memory are injected into the
system prompt so the brief can say "Client X usually pays late — your invoice
is still out" instead of a generic list.
- Is **read-only** — no create/update/delete tools. Tool surface is the minimum
needed to answer "what needs attention right now?".
---
## Architecture
```
Electron
   │  WsFrame{type:"brief_request", mode:"home"|"project", project_id?}
   ▼
device_ws
   │
   ▼
core/brief_agent.py
   ├── run_home_brief()
   └── run_project_brief()
   │
   │  read-only tool subset
   │  (tasks, projects, notes, timelines, memory get)
   ▼
plain-text stream back to renderer
```
**Key decisions:**
- New WS frame type `brief_request` (*not* a reuse of `home_request`) so the
frame payload stays small and typed, and the server can pick the right agent
without sniffing the prompt.
- Read-only tools only. Give the LLM access to the same data the UI sees
(tasks, projects, notes, timelines, memory **get**). No mutating tools, no
memory-write tools — a brief should never change state.
- `resolved_project_id` is passed explicitly in the request payload for the
`project` mode (no LLM-side resolution). The `home` mode omits it.
- Both modes stream — the UI already has streaming rendering for the home
brief; we reuse the same pattern for the project brief card.
---
## Improved prompts
The current prompt tries to do everything in one paragraph. These split the job
into role, data rules, voice, and output contract — the structure the model
actually follows.
### `home_brief` (Langfuse prompt, label `production`)
```
You are the user's personal assistant producing a short daily brief.
ROLE
Act like a calm, attentive secretary writing a stand-up note for your boss.
Warm and human, never breezy. Never cheerful filler, never emojis, never
"here is your brief" meta-text. The user is opening the app mid-workday and
is probably stressed — your job is to lower cognitive load, not add noise.
TOOLS — always call before writing
Pull fresh data every run. Do not invent counts or titles. Use at minimum:
- list_tasks_due_today — tasks the user owes today
- list_timeline_events_today — events starting or ending today
- list_active_projects — projects currently in progress or at risk
- memory_list_blocks / memory_get — personal context about people, clients,
payment habits, working preferences
If a tool returns nothing, simply omit that topic. Never report zeros.
WHAT TO INCLUDE
1. Tasks due today (title + priority; group the 1-2 most important).
2. Timeline events starting or ending today (and anything that starts/ends
tomorrow if the user has a very light day).
3. Active projects that need a nudge — stalled, blocked, or awaiting input.
4. Memory-aware colour where it sharpens the brief. Examples:
- "Client Rossi tends to pay late — the Acme invoice is 6 days out."
- "You usually dislike meetings before 10:00 — the call at 09:30 is unusual."
Only add a memory line when it changes what the user does. Do not pad.
WHAT TO OMIT
- Zero-counts ("no overdue items", "0 meetings today").
- Statistics ("2 active projects, 3 completed tasks").
- Headers, titles, greetings, sign-offs, dates, emojis, slang.
- Meta-phrases ("here is", "let me know if", "hope this helps").
- XML/HTML tags of any kind. Plain prose only.
LIGHT-DAY CLAUSE
If tasks + events + active-project-nudges together produce fewer than two
sentences of content, also list 1-2 projects in status `on_hold` or `waiting`
and ask a single, specific question about them — e.g. "Is the Bianchi
redesign still paused, or ready to pick back up?" One question max, grounded
in a real project name.
VOICE
- Calm. Concise. Human. Short sentences.
- Use **bold** sparingly for task titles, project names, and people's names.
- No bullet lists. Flow as 2-4 sentences of prose.
LENGTH
2-4 sentences total. Hard cap 4. If the day is truly empty, one sentence.
Respond in the user's language ({{language}}). Today is {{today}}.
```
Variables: `{{language}}` (e.g. "Italian"), `{{today}}` (ISO date).
### `project_brief` (Langfuse prompt, label `production`)
```
You are the project assistant producing a short status brief for ONE project.
ROLE
A senior project manager summarising state-of-play for the owner. Factual,
sharp, forward-looking. Never reassuring filler, never emojis.
SCOPE
Work only with project_id = {{project_id}}. Do not mention or pull data from
other projects. Use tools to fetch fresh data:
- get_project — current status, dates, description
- list_tasks(project_id) — open work, split by status
- list_timeline_events(project_id) — milestones hit, upcoming, overdue
- list_project_notes(project_id) — any recent decisions or blockers
- memory_get — relevant context about the client, collaborators, constraints
STRUCTURE — follow exactly, one short paragraph per section, no headers
1. **State.** One sentence: current phase, health (on track / at risk / blocked),
and why. Cite the concrete signal (overdue milestone, stalled tasks, recent
blocker note).
2. **What's moving.** What was completed or progressed recently. Name specific
tasks or milestones.
3. **Next steps.** The 1-3 most important things the user should do next, in
priority order. Be concrete — task name, who owns it, when due if known.
If waiting on someone else, name them and what the ask is.
4. **Risks / memory-flagged items.** One line max. Only include when there is
a real risk or a relevant memory (e.g. late-paying client, tight deadline,
scope change). Omit the section entirely if nothing to say.
WHAT TO OMIT
- Zero-counts ("no overdue tasks").
- Generic advice ("keep up the good work").
- Greetings, headers, bullet lists, emojis, sign-offs, meta-phrases.
- XML/HTML tags or bracketed id lists. Plain prose only.
VOICE
- Direct. Factual. No fluff.
- Use **bold** sparingly for task titles, milestone names, and the owner's name.
- Short sentences. Prefer verbs over nouns ("Client review is blocking release"
not "There is a blocker which is the client review").
LENGTH
4-8 sentences total across the 3-4 sections. Hard cap 8.
Respond in the user's language ({{language}}). Today is {{today}}.
```
Variables: `{{project_id}}`, `{{language}}`, `{{today}}`.
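
The `{{variable}}` placeholders in both prompts are filled in at compile time. The sketch below is only a stand-in that shows the substitution on the last line of the `home_brief` prompt; in the real flow the Langfuse prompt object's `compile(...)` does this work. The helper names `compile_template` and `missing_variables` are hypothetical.

```python
import re

# Matches {{name}} with optional inner whitespace.
_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_template(template: str, **variables: str) -> str:
    """Replace each {{name}} with its value; unknown names are left untouched."""
    def _sub(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return _VAR.sub(_sub, template)


def missing_variables(template: str, **variables: str) -> set:
    """Names used in the template that the caller did not supply."""
    return {m.group(1) for m in _VAR.finditer(template)} - set(variables)


TAIL = "Respond in the user's language ({{language}}). Today is {{today}}."
print(compile_template(TAIL, language="Italian", today="2026-04-17"))
```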
---
## Phase 1 — Backend config scaffolding
**Goal:** `LLM_MODEL_BRIEF_AGENT` resolvable via `get_agent_llm("brief-agent")`.
**Files**
- `api/app/config/settings.py`
- `api/app/core/llm.py`
- `api/.env.example`
**Tasks**
- [ ] Add field `LLM_MODEL_BRIEF_AGENT: str = ""` to `Settings` after `LLM_MODEL_CLOUD_PROCESSOR`.
- [ ] Add `"brief-agent": lambda: settings.LLM_MODEL_BRIEF_AGENT or settings.LLM_MODEL` entry to `_AGENT_MODEL_SETTINGS` in `llm.py`.
- [ ] Add a commented-out `LLM_MODEL_BRIEF_AGENT=` block in `.env.example`, with a 2-line description mirroring the existing style ("Brief-agent — produces home and project text briefs. A small model (e.g. gpt-4o-mini) is sufficient.").
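
The override-then-default lookup described in these tasks can be sketched as follows. This is a simplified illustration, not the contents of `llm.py`: the `Settings` dataclass is a stand-in for the real settings class, the model names are placeholders, and the body of `model_for_agent` is an assumption about how the registry is consulted.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Settings:
    # Stand-in for the real Settings class; values are placeholders.
    LLM_MODEL: str = "default-model"
    LLM_MODEL_BRIEF_AGENT: str = ""


settings = Settings()

# Each entry resolves lazily, so a changed setting is picked up on the next call.
_AGENT_MODEL_SETTINGS: Dict[str, Callable[[], str]] = {
    "brief-agent": lambda: settings.LLM_MODEL_BRIEF_AGENT or settings.LLM_MODEL,
}


def model_for_agent(agent_name: str) -> str:
    """Return the per-agent override, or the global default when unset."""
    resolver = _AGENT_MODEL_SETTINGS.get(agent_name)
    return resolver() if resolver is not None else settings.LLM_MODEL
```

An empty override string is falsy, which is why `or` falls through to `LLM_MODEL` without a separate `None` check.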
**Acceptance**
- `python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))"` prints the default model (matches `LLM_MODEL`) when the override is empty; prints the override when set.
- `ruff check .` passes.
**Verify**
- `cd api && source .venv/Scripts/activate && python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))"`
- `cd api && source .venv/Scripts/activate && ruff check .`
---
## Phase 2 — Brief-agent module (read-only tool subset)
**Goal:** `run_home_brief()` and `run_project_brief()` callables exist and work
end-to-end against a live backend, producing plain-text streams. No WS wiring
yet — exercised via a `scripts/smoke_brief.py` one-liner.
**Files (new)**
- `api/app/core/brief_agent.py`
**Files (touched)**
- `api/app/agents/task_agent.py` — export a `TASK_READ_TOOLS` list
(`list_tasks`, `list_tasks_due_today`, `list_task_comments`).
- `api/app/agents/project_agent.py` — export a `PROJECT_READ_TOOLS` list
(`list_projects`, `list_all_projects`, `get_project`).
- `api/app/agents/timeline_agent.py` — export a `TIMELINE_READ_TOOLS` list
(`list_timelines`, plus a new `list_timelines_today` that filters by today
— add it alongside the existing tools) and a `list_timeline_events` alias
scoped by `project_id`.
- `api/app/agents/note_agent.py` — export a `NOTE_READ_TOOLS` list
(`list_notes`, `get_note`).
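
The "today (UTC)" filter behind `list_timelines_today` could look roughly like the sketch below. It assumes each timeline is a dict with a `date` field holding an ISO string, a `date`, or a `datetime`; the real row shape and the helper names `to_utc_date` and `timelines_today` are assumptions.

```python
from datetime import date, datetime, timezone
from typing import Any, Dict, List, Optional, Union


def to_utc_date(value: Union[str, date, datetime]) -> date:
    """Normalise an ISO string, date, or datetime to a UTC calendar date."""
    if isinstance(value, str):
        # fromisoformat() rejects a trailing "Z" before Python 3.11.
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        # Naive datetimes are assumed to already be in UTC.
        return value.date()
    return value


def timelines_today(
    timelines: List[Dict[str, Any]], today: Optional[date] = None
) -> List[Dict[str, Any]]:
    """Keep only the timelines whose date falls on today's UTC date."""
    today = today or datetime.now(timezone.utc).date()
    return [t for t in timelines if to_utc_date(t["date"]) == today]
```

Converting to UTC before comparing matters for late-evening events stored with an offset: `23:30-02:00` on one day is already the next day in UTC.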
**Tasks**
- [x] Add the four `*_READ_TOOLS` exports in the agent files. Do not remove the
existing `*_TOOLS` exports — the chat agents still use them.
- [x] Add `list_timelines_today` in `timeline_agent.py`: returns only timelines
whose `date` falls on today (UTC). Mirror the shape of
`list_tasks_due_today`.
- [x] Create `brief_agent.py` with:
- Module-level fallback prompt constants `_HOME_BRIEF_FALLBACK` and
`_PROJECT_BRIEF_FALLBACK` — copy the prompts from the plan above verbatim,
using `{language}` / `{today}` / `{project_id}` (single-brace) so
`.format()` works when Langfuse is unavailable.
- Read-only memory tools subset: reuse `_memory_tools()` from `deep_agent.py`
but filter to `memory_list_blocks`, `memory_get`, `archival_memory_search`,
`conversation_search`. Factor out a small helper `_read_only_memory_tools()`
in `deep_agent.py` (or duplicate locally — keep it simple).
- `async def run_home_brief(user_id, context) -> AsyncGenerator[tuple[str, Any], None]`
- `async def run_project_brief(user_id, project_id, context) -> AsyncGenerator[tuple[str, Any], None]`
- Both reuse `_run_single_agent_stream` from `deep_agent.py` by passing
`agent_name="brief-agent"` and the relevant prompt. Tool list is the
read-only subset.
- Inject `_language_instruction`, `_relational_memory_injection`, and
`_proactive_hints_injection` into the system prompt — same pattern as
`run_home_stream`.
- After rendering the system prompt with `compile_prompt`, append a line
`"\nToday is YYYY-MM-DD."` only if the Langfuse template did not already
include `{{today}}` substitution (safe fallback).
- [x] Do **not** call `_normalize_tagged_list_lines` on the output — the brief
prompt forbids tags, so skipping the post-processor is a deliberate signal
of correctness.
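
Filtering the memory tools down to the read-only subset can be done with a name allowlist. The sketch below assumes each tool object exposes a `name` attribute; `FakeTool` and the write-tool name `memory_append` are hypothetical and exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Names taken from the task list above.
READ_ONLY_MEMORY_TOOL_NAMES = frozenset(
    {"memory_list_blocks", "memory_get", "archival_memory_search", "conversation_search"}
)


@dataclass(frozen=True)
class FakeTool:
    """Hypothetical stand-in for a real tool object."""
    name: str


def read_only_memory_tools(tools: Iterable) -> List:
    """Keep only tools whose name is on the read-only allowlist."""
    return [t for t in tools if getattr(t, "name", None) in READ_ONLY_MEMORY_TOOL_NAMES]


all_tools = [FakeTool("memory_get"), FakeTool("memory_append"), FakeTool("conversation_search")]
print([t.name for t in read_only_memory_tools(all_tools)])
```

An allowlist fails closed: a write tool added to `_memory_tools()` later stays out of the brief agent until someone adds it here on purpose.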
**Acceptance**
- Importing `from app.core.brief_agent import run_home_brief, run_project_brief` succeeds.
- A smoke script `scripts/smoke_brief.py` (create it; git-ignore it; OK to delete afterwards) runs `run_home_brief` against a seeded test user and streams text to stdout. Output contains no `<` or `[uuid]` substrings.
- `ruff check .` passes.
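
The "no `<` or `[uuid]` substrings" check can be automated inside the smoke script. This is a sketch under the assumption that the full brief text has already been collected into one string; `find_leaks` is a hypothetical helper name.

```python
import re
from typing import List

_UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
_BRACKETED_UUID = re.compile(r"\[" + _UUID + r"\]")


def find_leaks(text: str) -> List[str]:
    """Return the kinds of forbidden markup found in a brief (empty if clean)."""
    leaks = []
    if "<" in text:
        leaks.append("angle bracket")
    if _BRACKETED_UUID.search(text):
        leaks.append("bracketed uuid")
    return leaks


print(find_leaks("**Acme invoice** is due today."))
print(find_leaks("<task>Send invoice</task>"))
```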
**Verify**
- `cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py home`
- `cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py project <uuid>`
- `cd api && source .venv/Scripts/activate && ruff check .`
---
## Phase 3 — WS frame + REST fallback
**Goal:** Electron can send `{type:"brief_request", mode, project_id?}` over
the device WS and receive a plain-text stream. REST `POST /chat/brief` exists
as fallback.
**Files**
- `api/app/schemas.py`
- `api/app/api/routes/device_ws.py`
- `api/app/api/routes/chat.py`
**Tasks**
- [ ] In `schemas.py`: add `brief_request = "brief_request"` to `WsFrameType`,
and a `WsBriefRequest` model with fields
`type: Literal[WsFrameType.brief_request]`, `request_id: str | None`,
`session_id: str | None`, `mode: Literal["home", "project"]`,
`project_id: str | None`.
- [ ] In `device_ws.py`: add an `elif frame_type == WsFrameType.brief_request:`
branch that dispatches to a new `_handle_brief_request` task.
- [ ] Implement `_handle_brief_request` by mirroring `_handle_home_request` but:
- Call `run_home_brief(user_id, context)` when `mode == "home"`,
`run_project_brief(user_id, project_id, context)` when `mode == "project"`
(validate `project_id` is a UUID; send `stream_end` with error frame
otherwise).
- **Skip** episode storage — briefs are not conversations.
- Still run `memory.enrich_context(...)` so relational/proactive memory is
injected.
- [ ] In `chat.py`: add `POST /chat/brief` that accepts `{mode, project_id?}`
and returns the full text (collects stream). This is the offline fallback
path used when the WS is not ready.
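
The validation rules for the new frame (known mode, UUID `project_id` only in `project` mode) can be sketched with the standard library alone. The real `WsBriefRequest` is a schema model in `schemas.py`; `parse_brief_request` below is a hypothetical helper that illustrates the checks, not the handler's actual code.

```python
import uuid
from typing import Any, Dict, Optional, Tuple

VALID_MODES = ("home", "project")


def parse_brief_request(frame: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Validate a brief_request frame and return (mode, project_id)."""
    if frame.get("type") != "brief_request":
        raise ValueError("not a brief_request frame")
    mode = frame.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if mode == "home":
        # Home mode carries no project; any project_id in the frame is ignored.
        return mode, None
    try:
        project_id = str(uuid.UUID(str(frame.get("project_id"))))
    except ValueError:
        raise ValueError("project_id must be a valid UUID") from None
    return mode, project_id
```

The handler would catch the `ValueError` and turn its message into the `stream_end` error frame, so a malformed request never reaches the agent.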
**Acceptance**
- Electron smoke client opens the WS, sends a `brief_request` with `mode:"home"`,
  and receives `stream_start` → N × `stream_text` → `stream_end` frames.
- `POST /chat/brief` returns `{response: "..."}`.
- Malformed `project_id` → WS frame `stream_end` with an error message (no server crash).
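
For the REST fallback, collecting the stream into a single `{response: "..."}` body might look like the sketch below. It assumes the generator yields `(kind, payload)` tuples and that text chunks use the kind `"text"`; that event name, the `"usage"` event, and the fake stream are hypothetical.

```python
import asyncio
from typing import Any, AsyncGenerator, Dict, Tuple


async def collect_brief(stream: AsyncGenerator[Tuple[str, Any], None]) -> Dict[str, str]:
    """Drain a brief stream and join its text chunks into one response body."""
    parts = []
    async for kind, payload in stream:
        if kind == "text":  # hypothetical event kind for text chunks
            parts.append(str(payload))
    return {"response": "".join(parts)}


async def _fake_stream() -> AsyncGenerator[Tuple[str, Any], None]:
    """Stand-in for run_home_brief(); yields two text chunks and one other event."""
    yield "text", "Two tasks are due today. "
    yield "usage", {"tokens": 12}
    yield "text", "The Acme call starts at 15:00."


result = asyncio.run(collect_brief(_fake_stream()))
print(result["response"])
```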
**Verify**
- Run the existing pytest suite: `cd api && source .venv/Scripts/activate && pytest -q`.
- Add one unit test `tests/test_brief_agent.py` covering: home mode returns
non-empty text; project mode with bogus UUID returns an error without
crashing; tools called are from the read-only subset (monkeypatch
`run_home_brief` to assert the tool list).
- Then: `cd api && source .venv/Scripts/activate && pytest tests/test_brief_agent.py -v`.
---
## Phase 4 — Langfuse prompts
**Goal:** `home_brief` and `project_brief` prompts exist in Langfuse at label
`production`, matching the content in this plan.
**Files**
- None (external config via MCP).
**Tasks**
- [x] Use `mcp__langfuse-docs__searchLangfuseDocs` to confirm the text-prompt
variable syntax (`{{variable}}`) and that `label="production"` is the label
read by `get_prompt_or_fallback`.
- [x] Use `mcp__langfuse__createTextPrompt` to create `home_brief` with the
content from the "Improved prompts → home_brief" section above. Set label
to `production`. Variables: `language`, `today`.
- [x] Use `mcp__langfuse__createTextPrompt` to create `project_brief` with the
content from the "Improved prompts → project_brief" section above. Set label
to `production`. Variables: `language`, `today`, `project_id`.
- [x] Use `mcp__langfuse__getPrompt` to round-trip both prompts and verify the
raw template matches what was sent.
**Acceptance**
- Both prompts resolve via `get_prompt_or_fallback("home_brief", "")` and
`get_prompt_or_fallback("project_brief", "")` in a Python shell against the
real Langfuse instance — return a non-empty `raw_template` and a non-None
`prompt_obj`.
- `prompt_obj.compile(language="Italian", today="2026-04-17")` returns text
containing the Italian directive and the date.
**Verify**
- `cd api && source .venv/Scripts/activate && python -c "from app.core.langfuse_client import get_prompt_or_fallback; t,p = get_prompt_or_fallback('home_brief', ''); print(len(t), p is not None)"`
---
## Phase 5 — Electron client: home brief uses new agent
**Goal:** The existing home-brief UI flow (toast + full card) calls the new
brief agent over WS, and the `DAILY_BRIEF_PROMPT` constant is deleted.
**Files**
- `adiuvAI/src/shared/api-types.ts` (or wherever WS types live)
- `adiuvAI/src/main/api/backend-client.ts`
- `adiuvAI/src/main/ai/orchestrator.ts`
- `adiuvAI/src/main/router/index.ts`
**Tasks**
- [x] Add `WsBriefRequest` frame shape to shared types, mirroring the API
`WsBriefRequest` schema.
- [x] In `backend-client.ts`, add `sendBriefRequest(mode, projectId?, callbacks, requestId?)`
modeled on `sendHomeRequest`. It sends `{type:"brief_request", mode, project_id}`.
- [x] In `orchestrator.ts`:
- Delete the `DAILY_BRIEF_PROMPT` constant (and the `langSuffix` hack — the
backend now owns language injection).
- `generateAndCacheBrief()` → call `client.sendBriefRequest("home", undefined, {...})`.
- `dailyBrief()` → call `client.sendBriefRequest("home", undefined, {...}, requestId)`.
- [x] `router/index.ts`: no signature change — only the underlying orchestrator
was rewired. Leave `ai.dailyBrief` mutation as-is.
**Acceptance**
- Launch the Electron app, open Home, brief renders within 10s. No
`<task>`/`<timeline>` markers appear in the output. Italian UI user gets
Italian prose.
- Grep confirms `DAILY_BRIEF_PROMPT` no longer exists in the repo.
**Verify**
- `cd adiuvAI && npm run lint`
- Manual: Home page renders a fresh brief. Toggle `isHomePage` nav away/back
to check the cache path still works.
---
## Phase 6 — Project brief UI card
**Goal:** Each project page has a "Brief" card that calls `sendBriefRequest("project", id, ...)`
and renders streaming plain text.
**Files**
- `adiuvAI/src/renderer/components/projects/ProjectDetail.tsx`
- `adiuvAI/src/renderer/components/projects/ProjectBriefCard.tsx` (new)
- `adiuvAI/src/main/router/index.ts` (add `ai.projectBrief` mutation)
- `adiuvAI/src/main/ai/orchestrator.ts` (add `projectBrief(sender, projectId, requestId)`)
- Locale files (all 5).
**Tasks**
- [ ] Add `projectBrief(sender, projectId, requestId)` in `orchestrator.ts`
mirroring `dailyBrief` but with `mode:"project"` and no cache (cheap enough
to regenerate on demand; add a simple in-memory TTL of 5 minutes keyed by
`projectId` only if the UX feels laggy).
- [ ] Add `ai.projectBrief` mutation in the tRPC router, input
`{projectId: z.string().uuid(), requestId: z.string().optional()}`.
- [ ] `ProjectBriefCard`: shadcn Card, Sparkles icon, `text-sm` body. States:
`idle` (button "Generate brief") → `streaming` (skeleton + partial text) →
`ready` (full text + "Refresh" button). Stream via
`window.electronAI.onStreamChunk()` by request id.
- [ ] Mount `ProjectBriefCard` at the top of `ProjectDetail`, above existing
content.
- [ ] Add i18n keys under `projects.brief.*`: `title`, `generate`, `refresh`,
`generating`, `error`. Add to all 5 locale files.
**Acceptance**
- On a project with tasks/timelines, the card streams a coherent 4-8 sentence
brief with state / what's moving / next steps sections, no XML tags.
- On an empty project (no tasks/timelines), the brief is still coherent and
does not hallucinate.
- Refresh button produces a fresh generation (new `request_id`).
**Verify**
- `cd adiuvAI && npm run lint`
- Manual: navigate `/projects?projectId=<uuid>`, click Generate.
---
## Phase 7 — Observability + cleanup
**Goal:** The brief agent is visible in Langfuse as its own generation name,
and the old hard-coded prompt is fully removed.
**Files**
- `api/app/core/brief_agent.py`
- `adiuvAI/.claude/CLAUDE.md` — document the new agent
- `.claude/CLAUDE.md` (root) — add a short line under "api" section
**Tasks**
- [ ] Verify that `_run_single_agent_stream` uses `agent_name="brief-agent"`
so the Langfuse span/generation is named accordingly. Spot-check one trace
in the Langfuse UI.
- [ ] In `adiuvAI/.claude/CLAUDE.md`, add under the "AI Subsystem" section a
new bullet for `brief-agent` in the agents table (Scope: "Daily home brief
and per-project status brief", Tools: read-only subset).
- [ ] In the root `.claude/CLAUDE.md`, under the `api/` architecture section,
add `brief_agent.py` to the "Orchestration" list with a one-line purpose.
- [ ] Delete `scripts/smoke_brief.py` if it was committed.
**Acceptance**
- A Langfuse trace for a home brief shows span `brief-agent-stream` containing
a generation `brief-agent-llm` linked to the `home_brief` prompt version.
- `rg -n "DAILY_BRIEF_PROMPT" adiuvAI/` returns no matches.
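- `rg -n "home_brief|project_brief" api/` shows the prompt names referenced only in `brief_agent.py`; matches on the `run_home_brief` / `run_project_brief` identifiers in callers and tests are expected.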
**Verify**
- Trigger one home brief and one project brief, open Langfuse, confirm traces.
---
## Phase 8 — Regression + doc polish
**Goal:** existing home chat behavior unchanged; brief behavior documented for
future contributors.
**Tasks**
- [ ] Open the home chat, send a normal message ("what are my tasks today?").
Response must still include `<task>` tag lines (the chat agent still uses
the tag contract; only the brief agent does not).
- [ ] Open the floating panel on a task. Response must still be plain text with
no tags (existing contract).
- [ ] Add a short "Daily Brief" paragraph to the user-facing docs (if any
marketing/help doc exists — otherwise skip).
**Acceptance**
- No regressions in home chat or floating chat.
- Plan document (`docs/plan-brief-agent.md`) is marked complete: every phase's
task checklist is `- [x]`.
**Verify**
- Manual QA pass of: home brief, project brief, home chat, floating chat on
task, floating chat on project.
---
## Out of scope (explicitly)
- Push notifications / proactive brief delivery (already a separate plan).
- Weekly / monthly brief variants.
- Brief export to PDF / email.
- Per-client brief (clients are currently a lightweight table, not a UI page).
- Writing tools in the brief agent — it stays read-only. If the user acts on
the brief ("create a task for X"), they send that as a normal home-chat
message, and the home agent handles it.