brief agent

This commit is contained in:
Roberto Musso
2026-04-18 22:20:03 +02:00
parent 3538050e75
commit 0ac2ce924d
4 changed files with 534 additions and 1 deletions

1
.gitignore vendored
View File

@@ -97,3 +97,4 @@ unused_skills/
.claude/skills/webapp-testing/examples/element_discovery.py
.claude/skills/webapp-testing/examples/static_html_automation.py
.claude/skills/webapp-testing/scripts/with_server.py
.mcp.json

View File

@@ -3,6 +3,28 @@
"langfuse-docs": {
"type": "http",
"url": "https://langfuse.com/api/mcp"
},
"langfuse": {
"type": "http",
"url": "https://langfuse.muticolturano.com/api/public/mcp",
"headers": {
"Authorization": "Basic cGstbGYtMGU2MmE5ZWItMDk3OC00ZTJlLWIzYWQtYmIzNjE5NDcwMWI4OnNrLWxmLTI4NmMxNjVmLTFjODQtNGEzNi1iMGIwLWNmZTViNjgwODk3ZA=="
}
},
"postgres": {
"command": "docker",
"args": [
"run",
"-i",
"--rm",
"-e",
"DATABASE_URI",
"crystaldba/postgres-mcp",
"--access-mode=restricted"
],
"env": {
"DATABASE_URI": "postgresql://postgres:XVTsmNqsMJX5Cd%2FNrAG4%2F4KFoaVDEy2CXsFMDqi8m58%3D@10.0.0.123:5432/adiuvai"
}
}
}
}

2
api

Submodule api updated: 0b5ef48463...d5fea95561

510
docs/plan-brief-agent.md Normal file
View File

@@ -0,0 +1,510 @@
# Dedicated Brief Agent (Home + Project)
> Ralph-loop plan. Execute one phase per iteration. Each phase is self-contained:
> its **Files**, **Tasks**, **Acceptance**, and **Verify** blocks are everything
> the agent needs to finish that phase without re-reading earlier phases.
> Mark `- [x]` as you complete tasks. Do not start phase N+1 until phase N's
> **Acceptance** is fully met.
## Environment
All Python commands in this plan (`pytest`, `python`, `ruff`, `alembic`) must
be run inside the `api/` project's virtualenv at `api/.venv`.
- **bash / WSL / macOS / Linux**: `source api/.venv/bin/activate` (or prefix
commands with `api/.venv/bin/python -m ...`)
- **Windows PowerShell**: `api\.venv\Scripts\Activate.ps1`
- **Windows bash shell (this repo's default)**: `source api/.venv/Scripts/activate`
Do **not** use the system Python or a globally installed `pytest`/`ruff`
dependencies are pinned inside the venv. Every `Verify` step below assumes the
venv is active.
---
## Context
Today the daily brief is produced by the `home-agent` with a prompt stuffed into
`sendHomeRequest()` at [adiuvAI/src/main/ai/orchestrator.ts:160](adiuvAI/src/main/ai/orchestrator.ts#L160). This
couples two very different jobs (chat vs summarisation) into one agent and one
prompt. Worse: the home agent is wired to emit XML tag wrappers (`<task>`,
`<timeline>`, `<project>`) for the UI component renderer — wrappers the brief
does not want. We filter them out in post, but the LLM still pays tokens for
them and sometimes leaks malformed tags.
We want a **dedicated brief agent** that:
- Runs in **two modes**`home` (daily brief) and `project` (per-project status brief).
- Produces **plain text only** — no XML/HTML tag wrappers, no bracketed id lists.
- Uses the **same infra pattern** as the other agents: `get_agent_llm(...)`,
`.env` override, Langfuse prompt via `get_prompt_or_fallback()`,
Langfuse tracing via `langfuse_context` + generation observations.
- Is **memory-aware** — core memory and relational memory are injected into the
system prompt so the brief can say "Client X usually pays late — your invoice
is still out" instead of a generic list.
- Is **read-only** — no create/update/delete tools. Tool surface is the minimum
needed to answer "what needs attention right now?".
---
## Architecture
```
Electron ─ WsFrame{type:"brief_request", mode:"home"|"project", project_id?} ─► device_ws
core/brief_agent.py
├── run_home_brief()
└── run_project_brief()
read-only tool subset │
(tasks, projects, notes, │
timelines, memory get) │
plain-text stream ▲
back to renderer ─┘
```
**Key decisions:**
- New WS frame type `brief_request`*not* a reuse of `home_request` — so the
frame payload stays small and typed, and the server can pick the right agent
without sniffing the prompt.
- Read-only tools only. Give the LLM access to the same data the UI sees
(tasks, projects, notes, timelines, memory **get**). No mutating tools, no
memory-write tools — a brief should never change state.
- `resolved_project_id` is passed explicitly in the request payload for the
`project` mode (no LLM-side resolution). The `home` mode omits it.
- Both modes stream — the UI already has streaming rendering for the home
brief; we reuse the same pattern for the project brief card.
---
## Improved prompts
The current prompt tries to do everything in one paragraph. These split the job
into role, data rules, voice, and output contract — the structure the model
actually follows.
### `home_brief` (Langfuse prompt, label `production`)
```
You are the user's personal assistant producing a short daily brief.
ROLE
Act like a calm, attentive secretary writing a stand-up note for your boss.
Warm and human, never breezy. Never cheerful filler, never emojis, never
"here is your brief" meta-text. The user is opening the app mid-workday and
is probably stressed — your job is to lower cognitive load, not add noise.
TOOLS — always call before writing
Pull fresh data every run. Do not invent counts or titles. Use at minimum:
- list_tasks_due_today — tasks the user owes today
- list_timeline_events_today — events starting or ending today
- list_active_projects — projects currently in progress or at risk
- memory_list_blocks / memory_get — personal context about people, clients,
payment habits, working preferences
If a tool returns nothing, simply omit that topic. Never report zeros.
WHAT TO INCLUDE
1. Tasks due today (title + priority; group the 12 most important).
2. Timeline events starting or ending today (and anything that starts/ends
tomorrow if the user has a very light day).
3. Active projects that need a nudge — stalled, blocked, or awaiting input.
4. Memory-aware colour where it sharpens the brief. Examples:
- "Client Rossi tends to pay late — the Acme invoice is 6 days out."
- "You usually dislike meetings before 10:00 — the call at 09:30 is unusual."
Only add a memory line when it changes what the user does. Do not pad.
WHAT TO OMIT
- Zero-counts ("no overdue items", "0 meetings today").
- Statistics ("2 active projects, 3 completed tasks").
- Headers, titles, greetings, sign-offs, dates, emojis, slang.
- Meta-phrases ("here is", "let me know if", "hope this helps").
- XML/HTML tags of any kind. Plain prose only.
LIGHT-DAY CLAUSE
If tasks + events + active-project-nudges together produce fewer than two
sentences of content, also list 12 projects in status `on_hold` or `waiting`
and ask a single, specific question about them — e.g. "Is the Bianchi
redesign still paused, or ready to pick back up?" One question max, grounded
in a real project name.
VOICE
- Calm. Concise. Human. Short sentences.
- Use **bold** sparingly for task titles, project names, and people's names.
- No bullet lists. Flow as 24 sentences of prose.
LENGTH
24 sentences total. Hard cap 4. If the day is truly empty, one sentence.
Respond in the user's language ({{language}}). Today is {{today}}.
```
Variables: `{{language}}` (e.g. "Italian"), `{{today}}` (ISO date).
### `project_brief` (Langfuse prompt, label `production`)
```
You are the project assistant producing a short status brief for ONE project.
ROLE
A senior project manager summarising state-of-play for the owner. Factual,
sharp, forward-looking. Never reassuring filler, never emojis.
SCOPE
Work only with project_id = {{project_id}}. Do not mention or pull data from
other projects. Use tools to fetch fresh data:
- get_project — current status, dates, description
- list_tasks(project_id) — open work, split by status
- list_timeline_events(project_id) — milestones hit, upcoming, overdue
- list_project_notes(project_id) — any recent decisions or blockers
- memory_get — relevant context about the client, collaborators, constraints
STRUCTURE — follow exactly, one short paragraph per section, no headers
1. **State.** One sentence: current phase, health (on track / at risk / blocked),
and why. Cite the concrete signal (overdue milestone, stalled tasks, recent
blocker note).
2. **What's moving.** What was completed or progressed recently. Name specific
tasks or milestones.
3. **Next steps.** The 13 most important things the user should do next, in
priority order. Be concrete — task name, who owns it, when due if known.
If waiting on someone else, name them and what the ask is.
4. **Risks / memory-flagged items.** One line max. Only include when there is
a real risk or a relevant memory (e.g. late-paying client, tight deadline,
scope change). Omit the section entirely if nothing to say.
WHAT TO OMIT
- Zero-counts ("no overdue tasks").
- Generic advice ("keep up the good work").
- Greetings, headers, bullet lists, emojis, sign-offs, meta-phrases.
- XML/HTML tags or bracketed id lists. Plain prose only.
VOICE
- Direct. Factual. No fluff.
- Use **bold** sparingly for task titles, milestone names, and the owner's name.
- Short sentences. Prefer verbs over nouns ("Client review is blocking release"
not "There is a blocker which is the client review").
LENGTH
48 sentences total across the 34 sections. Hard cap 8.
Respond in the user's language ({{language}}). Today is {{today}}.
```
Variables: `{{project_id}}`, `{{language}}`, `{{today}}`.
---
## Phase 1 — Backend config scaffolding
**Goal:** `LLM_MODEL_BRIEF_AGENT` resolvable via `get_agent_llm("brief-agent")`.
**Files**
- `api/app/config/settings.py`
- `api/app/core/llm.py`
- `api/.env.example`
**Tasks**
- [ ] Add field `LLM_MODEL_BRIEF_AGENT: str = ""` to `Settings` after `LLM_MODEL_CLOUD_PROCESSOR`.
- [ ] Add `"brief-agent": lambda: settings.LLM_MODEL_BRIEF_AGENT or settings.LLM_MODEL` entry to `_AGENT_MODEL_SETTINGS` in `llm.py`.
- [ ] Add a commented-out `LLM_MODEL_BRIEF_AGENT=` block in `.env.example`, with a 2-line description mirroring the existing style ("Brief-agent — produces home and project text briefs. A small model (e.g. gpt-4o-mini) is sufficient.").
**Acceptance**
- `python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))"` prints the default model (matches `LLM_MODEL`) when the override is empty; prints the override when set.
- `ruff check .` passes.
**Verify**
- `cd api && source .venv/Scripts/activate && python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))"`
- `cd api && source .venv/Scripts/activate && ruff check .`
---
## Phase 2 — Brief-agent module (read-only tool subset)
**Goal:** `run_home_brief()` and `run_project_brief()` callables exist and work
end-to-end against a live backend, producing plain-text streams. No WS wiring
yet — exercised via a `scripts/smoke_brief.py` one-liner.
**Files (new)**
- `api/app/core/brief_agent.py`
**Files (touched)**
- `api/app/agents/task_agent.py` — export a `TASK_READ_TOOLS` list
(`list_tasks`, `list_tasks_due_today`, `list_task_comments`).
- `api/app/agents/project_agent.py` — export a `PROJECT_READ_TOOLS` list
(`list_projects`, `list_all_projects`, `get_project`).
- `api/app/agents/timeline_agent.py` — export a `TIMELINE_READ_TOOLS` list
(`list_timelines`, plus a new `list_timelines_today` that filters by today
— add it alongside the existing tools) and a `list_timeline_events` alias
scoped by `project_id`.
- `api/app/agents/note_agent.py` — export a `NOTE_READ_TOOLS` list
(`list_notes`, `get_note`).
**Tasks**
- [x] Add the four `*_READ_TOOLS` exports in the agent files. Do not remove the
existing `*_TOOLS` exports — the chat agents still use them.
- [x] Add `list_timelines_today` in `timeline_agent.py`: returns only timelines
whose `date` falls on today (UTC). Mirror the shape of
`list_tasks_due_today`.
- [x] Create `brief_agent.py` with:
- Module-level fallback prompt constants `_HOME_BRIEF_FALLBACK` and
`_PROJECT_BRIEF_FALLBACK` — copy the prompts from the plan above verbatim,
using `{language}` / `{today}` / `{project_id}` (single-brace) so
`.format()` works when Langfuse is unavailable.
- Read-only memory tools subset: reuse `_memory_tools()` from `deep_agent.py`
but filter to `memory_list_blocks`, `memory_get`, `archival_memory_search`,
`conversation_search`. Factor out a small helper `_read_only_memory_tools()`
in `deep_agent.py` (or duplicate locally — keep it simple).
- `async def run_home_brief(user_id, context) -> AsyncGenerator[tuple[str, Any], None]`
- `async def run_project_brief(user_id, project_id, context) -> AsyncGenerator[tuple[str, Any], None]`
- Both reuse `_run_single_agent_stream` from `deep_agent.py` by passing
`agent_name="brief-agent"` and the relevant prompt. Tool list is the
read-only subset.
- Inject `_language_instruction`, `_relational_memory_injection`, and
`_proactive_hints_injection` into the system prompt — same pattern as
`run_home_stream`.
- After rendering the system prompt with `compile_prompt`, append a line
`"\nToday is YYYY-MM-DD."` only if the Langfuse template did not already
include `{{today}}` substitution (safe fallback).
- [x] Do **not** call `_normalize_tagged_list_lines` on the output — the brief
prompt forbids tags, so skipping the post-processor is a deliberate signal
of correctness.
**Acceptance**
- Importing `from app.core.brief_agent import run_home_brief, run_project_brief` succeeds.
- A smoke script `scripts/smoke_brief.py` (create it; git-ignore it; OK to delete afterwards) runs `run_home_brief` against a seeded test user and streams text to stdout. Output contains no `<` or `[uuid]` substrings.
- `ruff check .` passes.
**Verify**
- `cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py home`
- `cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py project <uuid>`
- `cd api && source .venv/Scripts/activate && ruff check .`
---
## Phase 3 — WS frame + REST fallback
**Goal:** Electron can send `{type:"brief_request", mode, project_id?}` over
the device WS and receive a plain-text stream. REST `POST /chat/brief` exists
as fallback.
**Files**
- `api/app/schemas.py`
- `api/app/api/routes/device_ws.py`
- `api/app/api/routes/chat.py`
**Tasks**
- [ ] In `schemas.py`: add `brief_request = "brief_request"` to `WsFrameType`,
and a `WsBriefRequest` model with fields
`type: Literal[WsFrameType.brief_request]`, `request_id: str | None`,
`session_id: str | None`, `mode: Literal["home", "project"]`,
`project_id: str | None`.
- [ ] In `device_ws.py`: add an `elif frame_type == WsFrameType.brief_request:`
branch that dispatches to a new `_handle_brief_request` task.
- [ ] Implement `_handle_brief_request` by mirroring `_handle_home_request` but:
- Call `run_home_brief(user_id, context)` when `mode == "home"`,
`run_project_brief(user_id, project_id, context)` when `mode == "project"`
(validate `project_id` is a UUID; send `stream_end` with error frame
otherwise).
- **Skip** episode storage — briefs are not conversations.
- Still run `memory.enrich_context(...)` so relational/proactive memory is
injected.
- [ ] In `chat.py`: add `POST /chat/brief` that accepts `{mode, project_id?}`
and returns the full text (collects stream). This is the offline fallback
path used when the WS is not ready.
**Acceptance**
- Electron smoke client opens the WS, sends a `brief_request` with `mode:"home"`,
and receives `stream_start` → N × `stream_text``stream_end` frames.
- `POST /chat/brief` returns `{response: "..."}`.
- Malformed `project_id` → WS frame `stream_end` with an error message (no server crash).
**Verify**
- Run pytest existing suite: `cd api && source .venv/Scripts/activate && pytest -q`.
- Add one unit test `tests/test_brief_agent.py` covering: home mode returns
non-empty text; project mode with bogus UUID returns an error without
crashing; tools called are from the read-only subset (monkeypatch
`run_home_brief` to assert the tool list).
- Then: `cd api && source .venv/Scripts/activate && pytest tests/test_brief_agent.py -v`.
---
## Phase 4 — Langfuse prompts
**Goal:** `home_brief` and `project_brief` prompts exist in Langfuse at label
`production`, matching the content in this plan.
**Files**
- None (external config via MCP).
**Tasks**
- [ ] Use `mcp__langfuse-docs__searchLangfuseDocs` to confirm the text-prompt
variable syntax (`{{variable}}`) and that `label="production"` is the label
read by `get_prompt_or_fallback`.
- [ ] Use `mcp__langfuse__createTextPrompt` to create `home_brief` with the
content from the "Improved prompts → home_brief" section above. Set label
to `production`. Variables: `language`, `today`.
- [ ] Use `mcp__langfuse__createTextPrompt` to create `project_brief` with the
content from the "Improved prompts → project_brief" section above. Set label
to `production`. Variables: `language`, `today`, `project_id`.
- [ ] Use `mcp__langfuse__getPrompt` to round-trip both prompts and verify the
raw template matches what was sent.
**Acceptance**
- Both prompts resolve via `get_prompt_or_fallback("home_brief", "")` and
`get_prompt_or_fallback("project_brief", "")` in a Python shell against the
real Langfuse instance — return a non-empty `raw_template` and a non-None
`prompt_obj`.
- `prompt_obj.compile(language="Italian", today="2026-04-17")` returns text
containing the Italian directive and the date.
**Verify**
- `cd api && source .venv/Scripts/activate && python -c "from app.core.langfuse_client import get_prompt_or_fallback; t,p = get_prompt_or_fallback('home_brief', ''); print(len(t), p is not None)"`
---
## Phase 5 — Electron client: home brief uses new agent
**Goal:** The existing home-brief UI flow (toast + full card) calls the new
brief agent over WS, and the `DAILY_BRIEF_PROMPT` constant is deleted.
**Files**
- `adiuvAI/src/shared/api-types.ts` (or wherever WS types live)
- `adiuvAI/src/main/api/backend-client.ts`
- `adiuvAI/src/main/ai/orchestrator.ts`
- `adiuvAI/src/main/router/index.ts`
**Tasks**
- [ ] Add `WsBriefRequest` frame shape to shared types, mirroring the API
`WsBriefRequest` schema.
- [ ] In `backend-client.ts`, add `sendBriefRequest(mode, projectId?, callbacks, requestId?)`
modeled on `sendHomeRequest`. It sends `{type:"brief_request", mode, project_id}`.
- [ ] In `orchestrator.ts`:
- Delete the `DAILY_BRIEF_PROMPT` constant (and the `langSuffix` hack — the
backend now owns language injection).
- `generateAndCacheBrief()` → call `client.sendBriefRequest("home", undefined, {...})`.
- `dailyBrief()` → call `client.sendBriefRequest("home", undefined, {...}, requestId)`.
- [ ] `router/index.ts`: no signature change — only the underlying orchestrator
was rewired. Leave `ai.dailyBrief` mutation as-is.
**Acceptance**
- Launch the Electron app, open Home, brief renders within 10s. No
`<task>`/`<timeline>` markers appear in the output. Italian UI user gets
Italian prose.
- Grep confirms `DAILY_BRIEF_PROMPT` no longer exists in the repo.
**Verify**
- `cd adiuvAI && npm run lint`
- Manual: Home page renders a fresh brief. Toggle `isHomePage` nav away/back
to check the cache path still works.
---
## Phase 6 — Project brief UI card
**Goal:** Each project page has a "Brief" card that calls `sendBriefRequest("project", id, ...)`
and renders streaming plain text.
**Files**
- `adiuvAI/src/renderer/components/projects/ProjectDetail.tsx`
- `adiuvAI/src/renderer/components/projects/ProjectBriefCard.tsx` (new)
- `adiuvAI/src/main/router/index.ts` (add `ai.projectBrief` mutation)
- `adiuvAI/src/main/ai/orchestrator.ts` (add `projectBrief(sender, projectId, requestId)`)
- Locale files (all 5).
**Tasks**
- [ ] Add `projectBrief(sender, projectId, requestId)` in `orchestrator.ts`
mirroring `dailyBrief` but with `mode:"project"` and no cache (cheap enough
to regenerate on demand; add a simple in-memory TTL of 5 minutes keyed by
`projectId` only if the UX feels laggy).
- [ ] Add `ai.projectBrief` mutation in the tRPC router, input
`{projectId: z.string().uuid(), requestId: z.string().optional()}`.
- [ ] `ProjectBriefCard`: shadcn Card, Sparkles icon, `text-sm` body. States:
`idle` (button "Generate brief") → `streaming` (skeleton + partial text) →
`ready` (full text + "Refresh" button). Stream via
`window.electronAI.onStreamChunk()` by request id.
- [ ] Mount `ProjectBriefCard` at the top of `ProjectDetail`, above existing
content.
- [ ] Add i18n keys under `projects.brief.*`: `title`, `generate`, `refresh`,
`generating`, `error`. Add to all 5 locale files.
**Acceptance**
- On a project with tasks/timelines, the card streams a coherent 48 sentence
brief with state / what's moving / next steps sections, no XML tags.
- On an empty project (no tasks/timelines), the brief is still coherent and
does not hallucinate.
- Refresh button produces a fresh generation (new `request_id`).
**Verify**
- `cd adiuvAI && npm run lint`
- Manual: navigate `/projects?projectId=<uuid>`, click Generate.
---
## Phase 7 — Observability + cleanup
**Goal:** The brief agent is visible in Langfuse as its own generation name,
and the old hard-coded prompt is fully removed.
**Files**
- `api/app/core/brief_agent.py`
- `adiuvAI/.claude/CLAUDE.md` — document the new agent
- `.claude/CLAUDE.md` (root) — add a short line under "api" section
**Tasks**
- [ ] Verify that `_run_single_agent_stream` uses `agent_name="brief-agent"`
so the Langfuse span/generation is named accordingly. Spot-check one trace
in the Langfuse UI.
- [ ] In `adiuvAI/.claude/CLAUDE.md`, add under the "AI Subsystem" section a
new bullet for `brief-agent` in the agents table (Scope: "Daily home brief
and per-project status brief", Tools: read-only subset).
- [ ] In the root `.claude/CLAUDE.md`, under the `api/` architecture section,
add `brief_agent.py` to the "Orchestration" list with a one-line purpose.
- [ ] Delete `scripts/smoke_brief.py` if it was committed.
**Acceptance**
- A Langfuse trace for a home brief shows span `brief-agent-stream` containing
a generation `brief-agent-llm` linked to the `home_brief` prompt version.
- `rg -n "DAILY_BRIEF_PROMPT" adiuvAI/` returns no matches.
- `rg -n "home_brief|project_brief" api/` shows usages only in `brief_agent.py`.
**Verify**
- Trigger one home brief and one project brief, open Langfuse, confirm traces.
---
## Phase 8 — Regression + doc polish
**Goal:** existing home chat behavior unchanged; brief behavior documented for
future contributors.
**Tasks**
- [ ] Open the home chat, send a normal message ("what are my tasks today?").
Response must still include `<task>` tag lines (the chat agent still uses
the tag contract; only the brief agent does not).
- [ ] Open the floating panel on a task. Response must still be plain text with
no tags (existing contract).
- [ ] Add a short "Daily Brief" paragraph to the user-facing docs (if any
marketing/help doc exists — otherwise skip).
**Acceptance**
- No regressions in home chat or floating chat.
- Plan document (`docs/plan-brief-agent.md`) is marked complete: every phase's
task checklist is `- [x]`.
**Verify**
- Manual QA pass of: home brief, project brief, home chat, floating chat on
task, floating chat on project.
---
## Out of scope (explicitly)
- Push notifications / proactive brief delivery (already a separate plan).
- Weekly / monthly brief variants.
- Brief export to PDF / email.
- Per-client brief (clients are currently a lightweight table, not a UI page).
- Writing tools in the brief agent — it stays read-only. If the user acts on
the brief ("create a task for X"), they send that as a normal home-chat
message, and the home agent handles it.