workspace/docs/plan-brief-agent.md

# Dedicated Brief Agent (Home + Project)

> Ralph-loop plan. Execute one phase per iteration. Each phase is self-contained:
> its **Files**, **Tasks**, **Acceptance**, and **Verify** blocks are everything
> the agent needs to finish that phase without re-reading earlier phases.
> Mark `- [x]` as you complete tasks. Do not start phase N+1 until phase N's
> **Acceptance** is fully met.

## Environment

All Python commands in this plan (`pytest`, `python`, `ruff`, `alembic`) must
be run inside the `api/` project's virtualenv at `api/.venv`.

- **bash / WSL / macOS / Linux**: `source api/.venv/bin/activate` (or prefix
  commands with `api/.venv/bin/python -m ...`)
- **Windows PowerShell**: `api\.venv\Scripts\Activate.ps1`
- **Windows bash shell (this repo's default)**: `source api/.venv/Scripts/activate`

Do **not** use the system Python or a globally installed `pytest`/`ruff` —
dependencies are pinned inside the venv. Every `Verify` step below assumes the
venv is active.

---

## Context

Today the daily brief is produced by the `home-agent` with a prompt stuffed into
`sendHomeRequest()` at [adiuvAI/src/main/ai/orchestrator.ts:160](adiuvAI/src/main/ai/orchestrator.ts#L160). This
couples two very different jobs (chat vs summarisation) into one agent and one
prompt. Worse: the home agent is wired to emit XML tag wrappers (`<task>`,
`<timeline>`, `<project>`) for the UI component renderer — wrappers the brief
does not want. We filter them out in post, but the LLM still pays tokens for
them and sometimes leaks malformed tags.

We want a **dedicated brief agent** that:

- Runs in **two modes** — `home` (daily brief) and `project` (per-project status brief).
- Produces **plain text only** — no XML/HTML tag wrappers, no bracketed id lists.
- Uses the **same infra pattern** as the other agents: `get_agent_llm(...)`,
  `.env` override, Langfuse prompt via `get_prompt_or_fallback()`,
  Langfuse tracing via `langfuse_context` + generation observations.
- Is **memory-aware** — core memory and relational memory are injected into the
  system prompt so the brief can say "Client X usually pays late — your invoice
  is still out" instead of a generic list.
- Is **read-only** — no create/update/delete tools. Tool surface is the minimum
  needed to answer "what needs attention right now?".

---

## Architecture

```
Electron ─ WsFrame{type:"brief_request", mode:"home"|"project", project_id?} ─►  device_ws
                                                                                      │
                                                                                      ▼
                                                                          core/brief_agent.py
                                                                          ├── run_home_brief()
                                                                          └── run_project_brief()
                                                                                      │
                                                             read-only tool subset    │
                                                            (tasks, projects, notes,  │
                                                              timelines, memory get)  │
                                                                                      ▼
                                                                  plain-text stream   ▲
                                                                   back to renderer  ─┘
```

**Key decisions:**

- New WS frame type `brief_request` — *not* a reuse of `home_request` — so the
  frame payload stays small and typed, and the server can pick the right agent
  without sniffing the prompt.
- Read-only tools only. Give the LLM access to the same data the UI sees
  (tasks, projects, notes, timelines, memory **get**). No mutating tools, no
  memory-write tools — a brief should never change state.
- `resolved_project_id` is passed explicitly in the request payload for the
  `project` mode (no LLM-side resolution). The `home` mode omits it.
- Both modes stream — the UI already has streaming rendering for the home
  brief; we reuse the same pattern for the project brief card.

---

## Improved prompts

The current prompt tries to do everything in one paragraph. These split the job
into role, data rules, voice, and output contract — the structure the model
actually follows.

### `home_brief` (Langfuse prompt, label `production`)

```
You are the user's personal assistant producing a short daily brief.

ROLE
Act like a calm, attentive secretary writing a stand-up note for your boss.
Warm and human, never breezy. Never cheerful filler, never emojis, never
"here is your brief" meta-text. The user is opening the app mid-workday and
is probably stressed — your job is to lower cognitive load, not add noise.

TOOLS — always call before writing
Pull fresh data every run. Do not invent counts or titles. Use at minimum:
- list_tasks_due_today — tasks the user owes today
- list_timeline_events_today — events starting or ending today
- list_active_projects — projects currently in progress or at risk
- memory_list_blocks / memory_get — personal context about people, clients,
  payment habits, working preferences
If a tool returns nothing, simply omit that topic. Never report zeros.

WHAT TO INCLUDE
1. Tasks due today (title + priority; group the 1–2 most important).
2. Timeline events starting or ending today (and anything that starts/ends
   tomorrow if the user has a very light day).
3. Active projects that need a nudge — stalled, blocked, or awaiting input.
4. Memory-aware colour where it sharpens the brief. Examples:
   - "Client Rossi tends to pay late — the Acme invoice is 6 days out."
   - "You usually dislike meetings before 10:00 — the call at 09:30 is unusual."
   Only add a memory line when it changes what the user does. Do not pad.

WHAT TO OMIT
- Zero-counts ("no overdue items", "0 meetings today").
- Statistics ("2 active projects, 3 completed tasks").
- Headers, titles, greetings, sign-offs, dates, emojis, slang.
- Meta-phrases ("here is", "let me know if", "hope this helps").
- XML/HTML tags of any kind. Plain prose only.

LIGHT-DAY CLAUSE
If tasks + events + active-project-nudges together produce fewer than two
sentences of content, also list 1–2 projects in status `on_hold` or `waiting`
and ask a single, specific question about them — e.g. "Is the Bianchi
redesign still paused, or ready to pick back up?" One question max, grounded
in a real project name.

VOICE
- Calm. Concise. Human. Short sentences.
- Use **bold** sparingly for task titles, project names, and people's names.
- No bullet lists. Flow as 2–4 sentences of prose.

LENGTH
2–4 sentences total. Hard cap 4. If the day is truly empty, one sentence.

Respond in the user's language ({{language}}). Today is {{today}}.
```

Variables: `{{language}}` (e.g. "Italian"), `{{today}}` (ISO date).

### `project_brief` (Langfuse prompt, label `production`)

```
You are the project assistant producing a short status brief for ONE project.

ROLE
A senior project manager summarising state-of-play for the owner. Factual,
sharp, forward-looking. Never reassuring filler, never emojis.

SCOPE
Work only with project_id = {{project_id}}. Do not mention or pull data from
other projects. Use tools to fetch fresh data:
- get_project — current status, dates, description
- list_tasks(project_id) — open work, split by status
- list_timeline_events(project_id) — milestones hit, upcoming, overdue
- list_project_notes(project_id) — any recent decisions or blockers
- memory_get — relevant context about the client, collaborators, constraints

STRUCTURE — follow exactly, one short paragraph per section, no headers
1. **State.** One sentence: current phase, health (on track / at risk / blocked),
   and why. Cite the concrete signal (overdue milestone, stalled tasks, recent
   blocker note).
2. **What's moving.** What was completed or progressed recently. Name specific
   tasks or milestones.
3. **Next steps.** The 1–3 most important things the user should do next, in
   priority order. Be concrete — task name, who owns it, when due if known.
   If waiting on someone else, name them and what the ask is.
4. **Risks / memory-flagged items.** One line max. Only include when there is
   a real risk or a relevant memory (e.g. late-paying client, tight deadline,
   scope change). Omit the section entirely if nothing to say.

WHAT TO OMIT
- Zero-counts ("no overdue tasks").
- Generic advice ("keep up the good work").
- Greetings, headers, bullet lists, emojis, sign-offs, meta-phrases.
- XML/HTML tags or bracketed id lists. Plain prose only.

VOICE
- Direct. Factual. No fluff.
- Use **bold** sparingly for task titles, milestone names, and the owner's name.
- Short sentences. Prefer verbs over nouns ("Client review is blocking release"
  not "There is a blocker which is the client review").

LENGTH
4–8 sentences total across the 3–4 sections. Hard cap 8.

Respond in the user's language ({{language}}). Today is {{today}}.
```

Variables: `{{project_id}}`, `{{language}}`, `{{today}}`.

---

## Phase 1 — Backend config scaffolding

**Goal:** `LLM_MODEL_BRIEF_AGENT` resolvable via `get_agent_llm("brief-agent")`.

**Files**
- `api/app/config/settings.py`
- `api/app/core/llm.py`
- `api/.env.example`

**Tasks**
- [ ] Add field `LLM_MODEL_BRIEF_AGENT: str = ""` to `Settings` after `LLM_MODEL_CLOUD_PROCESSOR`.
- [ ] Add `"brief-agent": lambda: settings.LLM_MODEL_BRIEF_AGENT or settings.LLM_MODEL` entry to `_AGENT_MODEL_SETTINGS` in `llm.py`.
- [ ] Add a commented-out `LLM_MODEL_BRIEF_AGENT=` block in `.env.example`, with a 2-line description mirroring the existing style ("Brief-agent — produces home and project text briefs. A small model (e.g. gpt-4o-mini) is sufficient.").

**Acceptance**
- `python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))"` prints the default model (matches `LLM_MODEL`) when the override is empty; prints the override when set.
- `ruff check .` passes.

**Verify**
- `cd api && source .venv/Scripts/activate && python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))"`
- `cd api && source .venv/Scripts/activate && ruff check .`

---

## Phase 2 — Brief-agent module (read-only tool subset)

**Goal:** `run_home_brief()` and `run_project_brief()` callables exist and work
end-to-end against a live backend, producing plain-text streams. No WS wiring
yet — exercised via a `scripts/smoke_brief.py` one-liner.

**Files (new)**
- `api/app/core/brief_agent.py`

**Files (touched)**
- `api/app/agents/task_agent.py` — export a `TASK_READ_TOOLS` list
  (`list_tasks`, `list_tasks_due_today`, `list_task_comments`).
- `api/app/agents/project_agent.py` — export a `PROJECT_READ_TOOLS` list
  (`list_projects`, `list_all_projects`, `get_project`).
- `api/app/agents/timeline_agent.py` — export a `TIMELINE_READ_TOOLS` list
  (`list_timelines`, plus a new `list_timelines_today` that filters by today
  — add it alongside the existing tools) and a `list_timeline_events` alias
  scoped by `project_id`.
- `api/app/agents/note_agent.py` — export a `NOTE_READ_TOOLS` list
  (`list_notes`, `get_note`).

**Tasks**
- [x] Add the four `*_READ_TOOLS` exports in the agent files. Do not remove the
  existing `*_TOOLS` exports — the chat agents still use them.
- [x] Add `list_timelines_today` in `timeline_agent.py`: returns only timelines
  whose `date` falls on today (UTC). Mirror the shape of
  `list_tasks_due_today`.
- [x] Create `brief_agent.py` with:
  - Module-level fallback prompt constants `_HOME_BRIEF_FALLBACK` and
    `_PROJECT_BRIEF_FALLBACK` — copy the prompts from the plan above verbatim,
    using `{language}` / `{today}` / `{project_id}` (single-brace) so
    `.format()` works when Langfuse is unavailable.
  - Read-only memory tools subset: reuse `_memory_tools()` from `deep_agent.py`
    but filter to `memory_list_blocks`, `memory_get`, `archival_memory_search`,
    `conversation_search`. Factor out a small helper `_read_only_memory_tools()`
    in `deep_agent.py` (or duplicate locally — keep it simple).
  - `async def run_home_brief(user_id, context) -> AsyncGenerator[tuple[str, Any], None]`
  - `async def run_project_brief(user_id, project_id, context) -> AsyncGenerator[tuple[str, Any], None]`
  - Both reuse `_run_single_agent_stream` from `deep_agent.py` by passing
    `agent_name="brief-agent"` and the relevant prompt. Tool list is the
    read-only subset.
  - Inject `_language_instruction`, `_relational_memory_injection`, and
    `_proactive_hints_injection` into the system prompt — same pattern as
    `run_home_stream`.
  - After rendering the system prompt with `compile_prompt`, append a line
    `"\nToday is YYYY-MM-DD."` only if the Langfuse template did not already
    include `{{today}}` substitution (safe fallback).
- [x] Do **not** call `_normalize_tagged_list_lines` on the output — the brief
  prompt forbids tags, so skipping the post-processor is a deliberate signal
  of correctness.

**Acceptance**
- Importing `from app.core.brief_agent import run_home_brief, run_project_brief` succeeds.
- A smoke script `scripts/smoke_brief.py` (create it; git-ignore it; OK to delete afterwards) runs `run_home_brief` against a seeded test user and streams text to stdout. Output contains no `<` or `[uuid]` substrings.
- `ruff check .` passes.

**Verify**
- `cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py home`
- `cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py project <uuid>`
- `cd api && source .venv/Scripts/activate && ruff check .`

---

## Phase 3 — WS frame + REST fallback

**Goal:** Electron can send `{type:"brief_request", mode, project_id?}` over
the device WS and receive a plain-text stream. REST `POST /chat/brief` exists
as fallback.

**Files**
- `api/app/schemas.py`
- `api/app/api/routes/device_ws.py`
- `api/app/api/routes/chat.py`

**Tasks**
- [ ] In `schemas.py`: add `brief_request = "brief_request"` to `WsFrameType`,
  and a `WsBriefRequest` model with fields
  `type: Literal[WsFrameType.brief_request]`, `request_id: str | None`,
  `session_id: str | None`, `mode: Literal["home", "project"]`,
  `project_id: str | None`.
- [ ] In `device_ws.py`: add an `elif frame_type == WsFrameType.brief_request:`
  branch that dispatches to a new `_handle_brief_request` task.
- [ ] Implement `_handle_brief_request` by mirroring `_handle_home_request` but:
  - Call `run_home_brief(user_id, context)` when `mode == "home"`,
    `run_project_brief(user_id, project_id, context)` when `mode == "project"`
    (validate `project_id` is a UUID; send `stream_end` with error frame
    otherwise).
  - **Skip** episode storage — briefs are not conversations.
  - Still run `memory.enrich_context(...)` so relational/proactive memory is
    injected.
- [ ] In `chat.py`: add `POST /chat/brief` that accepts `{mode, project_id?}`
  and returns the full text (collects stream). This is the offline fallback
  path used when the WS is not ready.

**Acceptance**
- Electron smoke client opens the WS, sends a `brief_request` with `mode:"home"`,
  and receives `stream_start` → N × `stream_text` → `stream_end` frames.
- `POST /chat/brief` returns `{response: "..."}`.
- Malformed `project_id` → WS frame `stream_end` with an error message (no server crash).

**Verify**
- Run pytest existing suite: `cd api && source .venv/Scripts/activate && pytest -q`.
- Add one unit test `tests/test_brief_agent.py` covering: home mode returns
  non-empty text; project mode with bogus UUID returns an error without
  crashing; tools called are from the read-only subset (monkeypatch
  `run_home_brief` to assert the tool list).
- Then: `cd api && source .venv/Scripts/activate && pytest tests/test_brief_agent.py -v`.

---

## Phase 4 — Langfuse prompts

**Goal:** `home_brief` and `project_brief` prompts exist in Langfuse at label
`production`, matching the content in this plan.

**Files**
- None (external config via MCP).

**Tasks**
- [x] Use `mcp__langfuse-docs__searchLangfuseDocs` to confirm the text-prompt
  variable syntax (`{{variable}}`) and that `label="production"` is the label
  read by `get_prompt_or_fallback`.
- [x] Use `mcp__langfuse__createTextPrompt` to create `home_brief` with the
  content from the "Improved prompts → home_brief" section above. Set label
  to `production`. Variables: `language`, `today`.
- [x] Use `mcp__langfuse__createTextPrompt` to create `project_brief` with the
  content from the "Improved prompts → project_brief" section above. Set label
  to `production`. Variables: `language`, `today`, `project_id`.
- [x] Use `mcp__langfuse__getPrompt` to round-trip both prompts and verify the
  raw template matches what was sent.

**Acceptance**
- Both prompts resolve via `get_prompt_or_fallback("home_brief", "")` and
  `get_prompt_or_fallback("project_brief", "")` in a Python shell against the
  real Langfuse instance — return a non-empty `raw_template` and a non-None
  `prompt_obj`.
- `prompt_obj.compile(language="Italian", today="2026-04-17")` returns text
  containing the Italian directive and the date.

**Verify**
- `cd api && source .venv/Scripts/activate && python -c "from app.core.langfuse_client import get_prompt_or_fallback; t,p = get_prompt_or_fallback('home_brief', ''); print(len(t), p is not None)"`

---

## Phase 5 — Electron client: home brief uses new agent

**Goal:** The existing home-brief UI flow (toast + full card) calls the new
brief agent over WS, and the `DAILY_BRIEF_PROMPT` constant is deleted.

**Files**
- `adiuvAI/src/shared/api-types.ts` (or wherever WS types live)
- `adiuvAI/src/main/api/backend-client.ts`
- `adiuvAI/src/main/ai/orchestrator.ts`
- `adiuvAI/src/main/router/index.ts`

**Tasks**
- [x] Add `WsBriefRequest` frame shape to shared types, mirroring the API
  `WsBriefRequest` schema.
- [x] In `backend-client.ts`, add `sendBriefRequest(mode, projectId?, callbacks, requestId?)`
  modeled on `sendHomeRequest`. It sends `{type:"brief_request", mode, project_id}`.
- [x] In `orchestrator.ts`:
  - Delete the `DAILY_BRIEF_PROMPT` constant (and the `langSuffix` hack — the
    backend now owns language injection).
  - `generateAndCacheBrief()` → call `client.sendBriefRequest("home", undefined, {...})`.
  - `dailyBrief()` → call `client.sendBriefRequest("home", undefined, {...}, requestId)`.
- [x] `router/index.ts`: no signature change — only the underlying orchestrator
  was rewired. Leave `ai.dailyBrief` mutation as-is.

**Acceptance**
- Launch the Electron app, open Home, brief renders within 10s. No
  `<task>`/`<timeline>` markers appear in the output. Italian UI user gets
  Italian prose.
- Grep confirms `DAILY_BRIEF_PROMPT` no longer exists in the repo.

**Verify**
- `cd adiuvAI && npm run lint`
- Manual: Home page renders a fresh brief. Toggle `isHomePage` nav away/back
  to check the cache path still works.

---

## Phase 6 — Project brief UI card

**Goal:** Each project page has a "Brief" card that calls `sendBriefRequest("project", id, ...)`
and renders streaming plain text.

**Files**
- `adiuvAI/src/renderer/components/projects/ProjectDetail.tsx`
- `adiuvAI/src/renderer/components/projects/ProjectBriefCard.tsx` (new)
- `adiuvAI/src/main/router/index.ts` (add `ai.projectBrief` mutation)
- `adiuvAI/src/main/ai/orchestrator.ts` (add `projectBrief(sender, projectId, requestId)`)
- Locale files (all 5).

**Tasks**
- [ ] Add `projectBrief(sender, projectId, requestId)` in `orchestrator.ts`
  mirroring `dailyBrief` but with `mode:"project"` and no cache (cheap enough
  to regenerate on demand; add a simple in-memory TTL of 5 minutes keyed by
  `projectId` only if the UX feels laggy).
- [ ] Add `ai.projectBrief` mutation in the tRPC router, input
  `{projectId: z.string().uuid(), requestId: z.string().optional()}`.
- [ ] `ProjectBriefCard`: shadcn Card, Sparkles icon, `text-sm` body. States:
  `idle` (button "Generate brief") → `streaming` (skeleton + partial text) →
  `ready` (full text + "Refresh" button). Stream via
  `window.electronAI.onStreamChunk()` by request id.
- [ ] Mount `ProjectBriefCard` at the top of `ProjectDetail`, above existing
  content.
- [ ] Add i18n keys under `projects.brief.*`: `title`, `generate`, `refresh`,
  `generating`, `error`. Add to all 5 locale files.

**Acceptance**
- On a project with tasks/timelines, the card streams a coherent 4–8 sentence
  brief with state / what's moving / next steps sections, no XML tags.
- On an empty project (no tasks/timelines), the brief is still coherent and
  does not hallucinate.
- Refresh button produces a fresh generation (new `request_id`).

**Verify**
- `cd adiuvAI && npm run lint`
- Manual: navigate `/projects?projectId=<uuid>`, click Generate.

---

## Phase 7 — Observability + cleanup

**Goal:** The brief agent is visible in Langfuse as its own generation name,
and the old hard-coded prompt is fully removed.

**Files**
- `api/app/core/brief_agent.py`
- `adiuvAI/.claude/CLAUDE.md` — document the new agent
- `.claude/CLAUDE.md` (root) — add a short line under "api" section

**Tasks**
- [ ] Verify that `_run_single_agent_stream` uses `agent_name="brief-agent"`
  so the Langfuse span/generation is named accordingly. Spot-check one trace
  in the Langfuse UI.
- [ ] In `adiuvAI/.claude/CLAUDE.md`, add under the "AI Subsystem" section a
  new bullet for `brief-agent` in the agents table (Scope: "Daily home brief
  and per-project status brief", Tools: read-only subset).
- [ ] In the root `.claude/CLAUDE.md`, under the `api/` architecture section,
  add `brief_agent.py` to the "Orchestration" list with a one-line purpose.
- [ ] Delete `scripts/smoke_brief.py` if it was committed.

**Acceptance**
- A Langfuse trace for a home brief shows span `brief-agent-stream` containing
  a generation `brief-agent-llm` linked to the `home_brief` prompt version.
- `rg -n "DAILY_BRIEF_PROMPT" adiuvAI/` returns no matches.
- `rg -n "home_brief|project_brief" api/` shows usages only in `brief_agent.py`.

**Verify**
- Trigger one home brief and one project brief, open Langfuse, confirm traces.

---

## Phase 8 — Regression + doc polish

**Goal:** existing home chat behavior unchanged; brief behavior documented for
future contributors.

**Tasks**
- [ ] Open the home chat, send a normal message ("what are my tasks today?").
  Response must still include `<task>` tag lines (the chat agent still uses
  the tag contract; only the brief agent does not).
- [ ] Open the floating panel on a task. Response must still be plain text with
  no tags (existing contract).
- [ ] Add a short "Daily Brief" paragraph to the user-facing docs (if any
  marketing/help doc exists — otherwise skip).

**Acceptance**
- No regressions in home chat or floating chat.
- Plan document (`docs/plan-brief-agent.md`) is marked complete: every phase's
  task checklist is `- [x]`.

**Verify**
- Manual QA pass of: home brief, project brief, home chat, floating chat on
  task, floating chat on project.

---

## Out of scope (explicitly)

- Push notifications / proactive brief delivery (already a separate plan).
- Weekly / monthly brief variants.
- Brief export to PDF / email.
- Per-client brief (clients are currently a lightweight table, not a UI page).
- Writing tools in the brief agent — it stays read-only. If the user acts on
  the brief ("create a task for X"), they send that as a normal home-chat
  message, and the home agent handles it.