workspace/docs/plan-brief-agent.md
Roberto Musso 0ac2ce924d brief agent
2026-04-18 22:20:03 +02:00


Dedicated Brief Agent (Home + Project)

Ralph-loop plan. Execute one phase per iteration. Each phase is self-contained: its Files, Tasks, Acceptance, and Verify blocks are everything the agent needs to finish that phase without re-reading earlier phases. Mark - [x] as you complete tasks. Do not start phase N+1 until phase N's Acceptance is fully met.

Environment

All Python commands in this plan (pytest, python, ruff, alembic) must be run inside the api/ project's virtualenv at api/.venv.

  • bash / WSL / macOS / Linux: source api/.venv/bin/activate (or prefix commands with api/.venv/bin/python -m ...)
  • Windows PowerShell: api\.venv\Scripts\Activate.ps1
  • Windows bash shell (this repo's default): source api/.venv/Scripts/activate

Do not use the system Python or a globally installed pytest/ruff — dependencies are pinned inside the venv. Every Verify step below assumes the venv is active.
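A script or test can fail fast when it is not running from the pinned venv. A minimal sketch, assuming standard PEP 405 behaviour (inside a venv, sys.prefix differs from sys.base_prefix); the helper names are hypothetical and not part of the repo.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv (PEP 405)."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def expected_venv_dir(repo_root: Path) -> Path:
    # The plan pins every dependency inside api/.venv.
    return repo_root / "api" / ".venv"


def assert_api_venv(repo_root: Path) -> None:
    """Raise unless the interpreter is the one from api/.venv (hypothetical guard)."""
    if not in_virtualenv():
        raise RuntimeError("Activate api/.venv before running this command.")
    venv = expected_venv_dir(repo_root).resolve()
    if Path(sys.prefix).resolve() != venv:
        raise RuntimeError(f"Running from {sys.prefix}, expected {venv}.")
```

A smoke script could call assert_api_venv(Path(__file__).resolve().parents[2]) at startup, adjusting the parent depth to wherever the script lives.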


Context

Today the daily brief is produced by the home agent, using a prompt (DAILY_BRIEF_PROMPT) passed to sendHomeRequest() at adiuvAI/src/main/ai/orchestrator.ts:160. This couples two very different jobs (chat and summarisation) into one agent and one prompt. Worse, the home agent is wired to emit XML tag wrappers (<task>, <timeline>, <project>) for the UI component renderer, and the brief does not want them. We strip them in post-processing, but the LLM still spends tokens on them and sometimes leaks malformed tags.

We want a dedicated brief agent that:

  • Runs in two modes: home (daily brief) and project (per-project status brief).
  • Produces plain text only — no XML/HTML tag wrappers, no bracketed id lists.
  • Uses the same infra pattern as the other agents: get_agent_llm(...), .env override, Langfuse prompt via get_prompt_or_fallback(), Langfuse tracing via langfuse_context + generation observations.
  • Is memory-aware — core memory and relational memory are injected into the system prompt so the brief can say "Client X usually pays late — your invoice is still out" instead of a generic list.
  • Is read-only — no create/update/delete tools. Tool surface is the minimum needed to answer "what needs attention right now?".

Architecture

Electron ─ WsFrame{type:"brief_request", mode:"home"|"project", project_id?} ─►  device_ws
                                                                                      │
                                                                                      ▼
                                                                          core/brief_agent.py
                                                                          ├── run_home_brief()
                                                                          └── run_project_brief()
                                                                                      │
                                                             read-only tool subset    │
                                                            (tasks, projects, notes,  │
                                                              timelines, memory get)  │
                                                                                      ▼
                                                                  plain-text stream   ▲
                                                                   back to renderer  ─┘

Key decisions:

  • New WS frame type brief_request, not a reuse of home_request, so the frame payload stays small and typed and the server can pick the right agent without sniffing the prompt.
  • Read-only tools only. Give the LLM access to the same data the UI sees (tasks, projects, notes, timelines, memory get). No mutating tools, no memory-write tools — a brief should never change state.
  • The resolved project id is passed explicitly as project_id in the request payload for project mode (no LLM-side resolution). Home mode omits it.
  • Both modes stream — the UI already has streaming rendering for the home brief; we reuse the same pattern for the project brief card.
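The key decisions above imply a small validation step on the server before any agent runs. A minimal stdlib sketch, assuming the frame arrives as a decoded JSON dict with the field names from Phase 3 (type, mode, project_id, request_id, session_id); parse_brief_request and BriefRequest are hypothetical stand-ins for the Pydantic WsBriefRequest model, not the real schema.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Any, Literal, Optional

BriefMode = Literal["home", "project"]


@dataclass(frozen=True)
class BriefRequest:
    mode: BriefMode
    project_id: Optional[str] = None
    request_id: Optional[str] = None
    session_id: Optional[str] = None


def parse_brief_request(frame: dict[str, Any]) -> BriefRequest:
    """Validate a raw {type:"brief_request", ...} frame (hypothetical helper).

    The agent is chosen from `mode` alone; no prompt text is inspected.
    """
    if frame.get("type") != "brief_request":
        raise ValueError("not a brief_request frame")
    mode = frame.get("mode")
    if mode not in ("home", "project"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "project":
        # Project mode needs an explicit, already-resolved id; no LLM lookup.
        try:
            project_id: Optional[str] = str(uuid.UUID(str(frame.get("project_id"))))
        except ValueError:
            raise ValueError("project mode requires a UUID project_id") from None
    else:
        # Home mode omits project_id; anything the client sent is ignored.
        project_id = None
    return BriefRequest(
        mode=mode,
        project_id=project_id,
        request_id=frame.get("request_id"),
        session_id=frame.get("session_id"),
    )
```

Normalising the id through uuid.UUID also canonicalises casing, so downstream tools always see the same lowercase form.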

Improved prompts

The current prompt tries to do everything in one paragraph. The prompts below split the job into role, data rules, voice, and output contract, a structure that is easier for the model to follow than one dense paragraph.

home_brief (Langfuse prompt, label production)

You are the user's personal assistant producing a short daily brief.

ROLE
Act like a calm, attentive secretary writing a stand-up note for your boss.
Warm and human, never breezy. Never cheerful filler, never emojis, never
"here is your brief" meta-text. The user is opening the app mid-workday and
is probably stressed — your job is to lower cognitive load, not add noise.

TOOLS — always call before writing
Pull fresh data every run. Do not invent counts or titles. Use at minimum:
- list_tasks_due_today — tasks the user owes today
- list_timelines_today: events starting or ending today
- list_active_projects — projects currently in progress or at risk
- memory_list_blocks / memory_get — personal context about people, clients,
  payment habits, working preferences
If a tool returns nothing, simply omit that topic. Never report zeros.

WHAT TO INCLUDE
1. Tasks due today (title + priority; highlight the 1-2 most important).
2. Timeline events starting or ending today (and anything that starts/ends
   tomorrow if the user has a very light day).
3. Active projects that need a nudge — stalled, blocked, or awaiting input.
4. Memory-aware colour where it sharpens the brief. Examples:
   - "Client Rossi tends to pay late — the Acme invoice is 6 days out."
   - "You usually dislike meetings before 10:00 — the call at 09:30 is unusual."
   Only add a memory line when it changes what the user does. Do not pad.

WHAT TO OMIT
- Zero-counts ("no overdue items", "0 meetings today").
- Statistics ("2 active projects, 3 completed tasks").
- Headers, titles, greetings, sign-offs, dates, emojis, slang.
- Meta-phrases ("here is", "let me know if", "hope this helps").
- XML/HTML tags of any kind. Plain prose only.

LIGHT-DAY CLAUSE
If tasks + events + active-project-nudges together produce fewer than two
sentences of content, also list 1-2 projects in status `on_hold` or `waiting`
and ask a single, specific question about them — e.g. "Is the Bianchi
redesign still paused, or ready to pick back up?" One question max, grounded
in a real project name.

VOICE
- Calm. Concise. Human. Short sentences.
- Use **bold** sparingly for task titles, project names, and people's names.
- No bullet lists. Flow as 2-4 sentences of prose.

LENGTH
2-4 sentences total. Hard cap 4. If the day is truly empty, one sentence.

Respond in the user's language ({{language}}). Today is {{today}}.

Variables: {{language}} (e.g. "Italian"), {{today}} (ISO date).
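The Langfuse templates use {{variable}} placeholders, while the Phase 2 fallback constants use single braces for str.format(). The sketch below derives the fallback form from the same source string so the two cannot drift. compile_double_brace is an offline stand-in written for illustration, not the Langfuse SDK's compile(); to_format_template is likewise hypothetical.

```python
import re

# {{ name }} placeholders, as used in the Langfuse text prompts above.
_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")
# After brace-escaping, {{name}} has become {{{{name}}}}.
_ESCAPED_VAR = re.compile(r"\{\{\{\{\s*(\w+)\s*\}\}\}\}")


def compile_double_brace(template: str, **variables: str) -> str:
    """Offline stand-in for compiling a {{variable}} template (not the SDK)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing prompt variable: {name}")
        return variables[name]

    return _VAR.sub(substitute, template)


def to_format_template(template: str) -> str:
    """Rewrite {{name}} as {name} so str.format() can render the fallback.

    Every other literal brace is doubled first, so .format() leaves it alone.
    """
    escaped = template.replace("{", "{{").replace("}", "}}")
    return _ESCAPED_VAR.sub(r"{\1}", escaped)
```

With this, _HOME_BRIEF_FALLBACK could be to_format_template(HOME_BRIEF_SOURCE), where HOME_BRIEF_SOURCE is a hypothetical constant holding the exact text pushed to Langfuse in Phase 4.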

project_brief (Langfuse prompt, label production)

You are the project assistant producing a short status brief for ONE project.

ROLE
A senior project manager summarising state-of-play for the owner. Factual,
sharp, forward-looking. Never reassuring filler, never emojis.

SCOPE
Work only with project_id = {{project_id}}. Do not mention or pull data from
other projects. Use tools to fetch fresh data:
- get_project — current status, dates, description
- list_tasks(project_id) — open work, split by status
- list_timeline_events(project_id) — milestones hit, upcoming, overdue
- list_project_notes(project_id) — any recent decisions or blockers
- memory_get — relevant context about the client, collaborators, constraints

STRUCTURE — follow exactly, one short paragraph per section, no headers
1. **State.** One sentence: current phase, health (on track / at risk / blocked),
   and why. Cite the concrete signal (overdue milestone, stalled tasks, recent
   blocker note).
2. **What's moving.** What was completed or progressed recently. Name specific
   tasks or milestones.
3. **Next steps.** The 1-3 most important things the user should do next, in
   priority order. Be concrete — task name, who owns it, when due if known.
   If waiting on someone else, name them and what the ask is.
4. **Risks / memory-flagged items.** One line max. Only include when there is
   a real risk or a relevant memory (e.g. late-paying client, tight deadline,
   scope change). Omit the section entirely if nothing to say.

WHAT TO OMIT
- Zero-counts ("no overdue tasks").
- Generic advice ("keep up the good work").
- Greetings, headers, bullet lists, emojis, sign-offs, meta-phrases.
- XML/HTML tags or bracketed id lists. Plain prose only.

VOICE
- Direct. Factual. No fluff.
- Use **bold** sparingly for task titles, milestone names, and the owner's name.
- Short sentences. Prefer verbs over nouns ("Client review is blocking release"
  not "There is a blocker which is the client review").

LENGTH
4-8 sentences total across the 3-4 sections. Hard cap 8.

Respond in the user's language ({{language}}). Today is {{today}}.

Variables: {{project_id}}, {{language}}, {{today}}.


Phase 1 — Backend config scaffolding

Goal: LLM_MODEL_BRIEF_AGENT resolvable via get_agent_llm("brief-agent").

Files

  • api/app/config/settings.py
  • api/app/core/llm.py
  • api/.env.example

Tasks

  • Add field LLM_MODEL_BRIEF_AGENT: str = "" to Settings after LLM_MODEL_CLOUD_PROCESSOR.
  • Add "brief-agent": lambda: settings.LLM_MODEL_BRIEF_AGENT or settings.LLM_MODEL entry to _AGENT_MODEL_SETTINGS in llm.py.
  • Add a commented-out LLM_MODEL_BRIEF_AGENT= block in .env.example, with a 2-line description mirroring the existing style ("Brief-agent — produces home and project text briefs. A small model (e.g. gpt-4o-mini) is sufficient.").
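The lookup this phase wires up can be sketched in isolation. This assumes model_for_agent reads from _AGENT_MODEL_SETTINGS as described above; the Settings dataclass only mirrors the two relevant fields, "default-model" is a placeholder, and falling back to LLM_MODEL for unknown agent names is an assumption (the real function may raise instead).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Settings:
    # Only the two fields this phase touches; the real Settings has many more.
    LLM_MODEL: str = "default-model"  # placeholder value
    LLM_MODEL_BRIEF_AGENT: str = ""  # empty string means "no override"


settings = Settings()

# Lambdas defer the read, so a changed setting is picked up at call time.
_AGENT_MODEL_SETTINGS: dict[str, Callable[[], str]] = {
    "brief-agent": lambda: settings.LLM_MODEL_BRIEF_AGENT or settings.LLM_MODEL,
}


def model_for_agent(agent_name: str) -> str:
    """Resolve an agent's model, falling back to the global default."""
    resolver = _AGENT_MODEL_SETTINGS.get(agent_name)
    return resolver() if resolver else settings.LLM_MODEL
```

The `or` in the lambda is what makes an empty .env value behave like an unset one, which is exactly the Acceptance condition below.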

Acceptance

  • python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))" prints the default model (matches LLM_MODEL) when the override is empty; prints the override when set.
  • ruff check . passes.

Verify

  • cd api && source .venv/Scripts/activate && python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))"
  • cd api && source .venv/Scripts/activate && ruff check .

Phase 2 — Brief-agent module (read-only tool subset)

Goal: run_home_brief() and run_project_brief() callables exist and work end-to-end against a live backend, producing plain-text streams. No WS wiring yet; they are exercised via a small scripts/smoke_brief.py script.

Files (new)

  • api/app/core/brief_agent.py

Files (touched)

  • api/app/agents/task_agent.py — export a TASK_READ_TOOLS list (list_tasks, list_tasks_due_today, list_task_comments).
  • api/app/agents/project_agent.py — export a PROJECT_READ_TOOLS list (list_projects, list_all_projects, get_project).
  • api/app/agents/timeline_agent.py: export a TIMELINE_READ_TOOLS list containing list_timelines, a new list_timelines_today that filters to today (add it alongside the existing tools), and a list_timeline_events alias scoped by project_id.
  • api/app/agents/note_agent.py — export a NOTE_READ_TOOLS list (list_notes, get_note).
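A sketch of the list_timelines_today filter, assuming each timeline exposes a datetime (named scheduled_at here, a hypothetical field) and that naive datetimes are stored as UTC. The real tool will query the database rather than filter an in-memory list; only the "same UTC calendar day" rule is shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass
class Timeline:
    id: str
    title: str
    scheduled_at: datetime  # hypothetical field; naive values are treated as UTC


def _utc_day(value: datetime) -> date:
    """Calendar day of `value` in UTC."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).date()


def list_timelines_today(
    timelines: list[Timeline], now: datetime | None = None
) -> list[Timeline]:
    """Keep only timelines whose date falls on today's UTC calendar day."""
    reference = now if now is not None else datetime.now(timezone.utc)
    today = _utc_day(reference)
    return [t for t in timelines if _utc_day(t.scheduled_at) == today]
```

Accepting an injectable `now` keeps the function deterministic in tests; note that a user in UTC+2 sees "today" roll over two hours late, which is a consequence of the plan's UTC rule.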

Tasks

  • Add the four *_READ_TOOLS exports in the agent files. Do not remove the existing *_TOOLS exports — the chat agents still use them.
  • Add list_timelines_today in timeline_agent.py: returns only timelines whose date falls on today (UTC). Mirror the shape of list_tasks_due_today.
  • Create brief_agent.py with:
    • Module-level fallback prompt constants _HOME_BRIEF_FALLBACK and _PROJECT_BRIEF_FALLBACK. Copy the prompts from the section above verbatim, except that variables use single braces ({language}, {today}, {project_id}) so str.format() works when Langfuse is unavailable.
    • Read-only memory tools subset: reuse _memory_tools() from deep_agent.py but filter to memory_list_blocks, memory_get, archival_memory_search, conversation_search. Factor out a small helper _read_only_memory_tools() in deep_agent.py (or duplicate locally — keep it simple).
    • async def run_home_brief(user_id, context) -> AsyncGenerator[tuple[str, Any], None]
    • async def run_project_brief(user_id, project_id, context) -> AsyncGenerator[tuple[str, Any], None]
    • Both reuse _run_single_agent_stream from deep_agent.py by passing agent_name="brief-agent" and the relevant prompt. Tool list is the read-only subset.
    • Inject _language_instruction, _relational_memory_injection, and _proactive_hints_injection into the system prompt — same pattern as run_home_stream.
    • After rendering the system prompt with compile_prompt, append a line "\nToday is YYYY-MM-DD." only if the Langfuse template did not already include {{today}} substitution (safe fallback).
  • Do not call _normalize_tagged_list_lines on the output — the brief prompt forbids tags, so skipping the post-processor is a deliberate signal of correctness.
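Two small pieces of brief_agent.py can be sketched without the rest of the stack: the read-only memory tool filter and the "Today is" safety line. This assumes tools expose a .name attribute; Tool is a stand-in for whatever _memory_tools() actually returns, and both helper names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Iterable

# The read-only memory tools named in this phase; everything else is dropped.
READ_ONLY_MEMORY_TOOLS = frozenset(
    {"memory_list_blocks", "memory_get", "archival_memory_search", "conversation_search"}
)

# Matches {{today}}, {{ today }} or {today} in a raw template.
_TODAY_PLACEHOLDER = re.compile(r"\{\{?\s*today\s*\}?\}")


@dataclass(frozen=True)
class Tool:
    # Stand-in for the real tool object; only .name matters here.
    name: str
    fn: Callable[..., object] | None = None


def read_only_memory_tools(tools: Iterable[Tool]) -> list[Tool]:
    """Allow-list filter: keep only the read-only memory tools, in input order."""
    return [t for t in tools if t.name in READ_ONLY_MEMORY_TOOLS]


def append_today_if_missing(raw_template: str, rendered: str, today: str) -> str:
    """Append 'Today is YYYY-MM-DD.' unless the template already substituted it."""
    if _TODAY_PLACEHOLDER.search(raw_template):
        return rendered
    return f"{rendered}\nToday is {today}."
```

An allow-list is used rather than a deny-list so that any memory-write tool added to _memory_tools() later stays out of the brief agent by default.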

Acceptance

  • Importing from app.core.brief_agent import run_home_brief, run_project_brief succeeds.
  • A smoke script scripts/smoke_brief.py (create it; git-ignore it; OK to delete afterwards) runs run_home_brief against a seeded test user and streams text to stdout. Output contains no < or [uuid] substrings.
  • ruff check . passes.
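The smoke script's output check can be automated rather than eyeballed. A minimal sketch, assuming "[uuid] substrings" means any square-bracketed span containing a canonical UUID (a single id or a list); brief_output_problems is a hypothetical helper.

```python
import re

# Canonical 8-4-4-4-12 hex UUID.
_UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
# A [...] span with at least one UUID inside, e.g. "[id]" or "[id1, id2]".
_BRACKETED_UUID = re.compile(r"\[[^\]]*" + _UUID + r"[^\]]*\]")


def brief_output_problems(text: str) -> list:
    """Return violations of the plain-text contract; an empty list means OK."""
    problems = []
    if not text.strip():
        problems.append("empty output")
    if "<" in text:
        problems.append("contains '<' (possible XML/HTML tag)")
    if _BRACKETED_UUID.search(text):
        problems.append("contains a bracketed UUID")
    return problems
```

scripts/smoke_brief.py could join the streamed chunks and exit non-zero when brief_output_problems(full_text) is non-empty.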

Verify

  • cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py home
  • cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py project <uuid>
  • cd api && source .venv/Scripts/activate && ruff check .

Phase 3 — WS frame + REST fallback

Goal: Electron can send {type:"brief_request", mode, project_id?} over the device WS and receive a plain-text stream. REST POST /chat/brief exists as fallback.

Files

  • api/app/schemas.py
  • api/app/api/routes/device_ws.py
  • api/app/api/routes/chat.py

Tasks

  • In schemas.py: add brief_request = "brief_request" to WsFrameType, and a WsBriefRequest model with fields type: Literal[WsFrameType.brief_request], request_id: str | None, session_id: str | None, mode: Literal["home", "project"], project_id: str | None.
  • In device_ws.py: add an elif frame_type == WsFrameType.brief_request: branch that dispatches to a new _handle_brief_request task.
  • Implement _handle_brief_request by mirroring _handle_home_request but:
    • Call run_home_brief(user_id, context) when mode == "home" and run_project_brief(user_id, project_id, context) when mode == "project". Validate that project_id is a UUID; otherwise send a stream_end frame carrying an error message.
    • Skip episode storage — briefs are not conversations.
    • Still run memory.enrich_context(...) so relational/proactive memory is injected.
  • In chat.py: add POST /chat/brief that accepts {mode, project_id?}, collects the stream, and returns the full text. This is the offline fallback path used when the WS is not ready.

Acceptance

  • Electron smoke client opens the WS, sends a brief_request with mode:"home", and receives stream_start → N × stream_text → stream_end frames.
  • POST /chat/brief returns {response: "..."}.
  • Malformed project_id → WS frame stream_end with an error message (no server crash).

Verify

  • Run the existing pytest suite: cd api && source .venv/Scripts/activate && pytest -q
  • Add one unit test file, tests/test_brief_agent.py, covering: home mode returns non-empty text; project mode with a bogus UUID returns an error without crashing; the tools handed to the agent come from the read-only subset (monkeypatch _run_single_agent_stream as referenced from app.core.brief_agent to capture and assert the tool list; patching run_home_brief itself would bypass the code under test).
  • Then: cd api && source .venv/Scripts/activate && pytest tests/test_brief_agent.py -v.

Phase 4 — Langfuse prompts

Goal: home_brief and project_brief prompts exist in Langfuse at label production, matching the content in this plan.

Files

  • None (external config via MCP).

Tasks

  • Use mcp__langfuse-docs__searchLangfuseDocs to confirm the text-prompt variable syntax ({{variable}}) and that label="production" is the label read by get_prompt_or_fallback.
  • Use mcp__langfuse__createTextPrompt to create home_brief with the content from the "Improved prompts → home_brief" section above. Set label to production. Variables: language, today.
  • Use mcp__langfuse__createTextPrompt to create project_brief with the content from the "Improved prompts → project_brief" section above. Set label to production. Variables: language, today, project_id.
  • Use mcp__langfuse__getPrompt to round-trip both prompts and verify the raw template matches what was sent.

Acceptance

  • Both prompts resolve via get_prompt_or_fallback("home_brief", "") and get_prompt_or_fallback("project_brief", "") in a Python shell against the real Langfuse instance — return a non-empty raw_template and a non-None prompt_obj.
  • prompt_obj.compile(language="Italian", today="2026-04-17") returns text that contains "Italian" (the language directive) and the date.

Verify

  • cd api && source .venv/Scripts/activate && python -c "from app.core.langfuse_client import get_prompt_or_fallback; t,p = get_prompt_or_fallback('home_brief', ''); print(len(t), p is not None)"

Phase 5 — Electron client: home brief uses new agent

Goal: The existing home-brief UI flow (toast + full card) calls the new brief agent over WS, and the DAILY_BRIEF_PROMPT constant is deleted.

Files

  • adiuvAI/src/shared/api-types.ts (or wherever WS types live)
  • adiuvAI/src/main/api/backend-client.ts
  • adiuvAI/src/main/ai/orchestrator.ts
  • adiuvAI/src/main/router/index.ts

Tasks

  • Add WsBriefRequest frame shape to shared types, mirroring the API WsBriefRequest schema.
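  • In backend-client.ts, add sendBriefRequest(mode, projectId: string | undefined, callbacks, requestId?), modeled on sendHomeRequest. projectId is typed string | undefined rather than optional because a required parameter (callbacks) follows it. It sends {type:"brief_request", mode, project_id}.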
  • In orchestrator.ts:
    • Delete the DAILY_BRIEF_PROMPT constant (and the langSuffix hack — the backend now owns language injection).
    • generateAndCacheBrief() → call client.sendBriefRequest("home", undefined, {...}).
    • dailyBrief() → call client.sendBriefRequest("home", undefined, {...}, requestId).
  • router/index.ts: no signature change — only the underlying orchestrator was rewired. Leave ai.dailyBrief mutation as-is.

Acceptance

  • Launch the Electron app, open Home, brief renders within 10s. No <task>/<timeline> markers appear in the output. Italian UI user gets Italian prose.
  • Grep confirms DAILY_BRIEF_PROMPT no longer exists in the repo.

Verify

  • cd adiuvAI && npm run lint
  • Manual: the Home page renders a fresh brief. Navigate away from Home and back (toggling isHomePage) to check that the cache path still works.

Phase 6 — Project brief UI card

Goal: Each project page has a "Brief" card that calls sendBriefRequest("project", id, ...) and renders streaming plain text.

Files

  • adiuvAI/src/renderer/components/projects/ProjectDetail.tsx
  • adiuvAI/src/renderer/components/projects/ProjectBriefCard.tsx (new)
  • adiuvAI/src/main/router/index.ts (add ai.projectBrief mutation)
  • adiuvAI/src/main/ai/orchestrator.ts (add projectBrief(sender, projectId, requestId))
  • Locale files (all 5).

Tasks

  • Add projectBrief(sender, projectId, requestId) in orchestrator.ts mirroring dailyBrief but with mode:"project" and no cache (cheap enough to regenerate on demand; add a simple in-memory TTL of 5 minutes keyed by projectId only if the UX feels laggy).
  • Add ai.projectBrief mutation in the tRPC router, input {projectId: z.string().uuid(), requestId: z.string().optional()}.
  • ProjectBriefCard: shadcn Card, Sparkles icon, text-sm body. States: idle (button "Generate brief") → streaming (skeleton + partial text) → ready (full text + "Refresh" button). Stream via window.electronAI.onStreamChunk() by request id.
  • Mount ProjectBriefCard at the top of ProjectDetail, above existing content.
  • Add i18n keys under projects.brief.*: title, generate, refresh, generating, error. Add to all 5 locale files.
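If the project brief does end up needing the optional 5-minute cache, the policy is small. The plan places it in orchestrator.ts (TypeScript); it is sketched in Python here only to keep this document's examples in one language, and TtlCache is a hypothetical name. The logic ports directly to a Map<string, {storedAt, text}>.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny in-memory cache: entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can fake time
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock(), value)

    def invalidate(self, key: str) -> None:
        # The "Refresh" button should bypass the cache and force a new generation.
        self._entries.pop(key, None)
```

A monotonic clock is used so wall-clock adjustments cannot make entries live longer or shorter than intended.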

Acceptance

  • On a project with tasks/timelines, the card streams a coherent 4-8 sentence brief covering state, what's moving, and next steps, with no XML tags.
  • On an empty project (no tasks/timelines), the brief is still coherent and does not hallucinate.
  • Refresh button produces a fresh generation (new request_id).
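Parts of this acceptance can be checked mechanically on captured output. A rough sketch: the sentence counter is a heuristic (terminal punctuation followed by whitespace or end of text, so abbreviations such as "e.g." inflate the count), and check_project_brief with its 4-8 sentence and 4-paragraph limits is a hypothetical helper derived from the project_brief prompt.

```python
import re

# A sentence ends at . ! or ? followed by whitespace or the end of the text.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")
_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def count_sentences(text: str) -> int:
    """Rough sentence count; Markdown bold markers are ignored."""
    return len(_SENTENCE_END.findall(text.replace("**", "")))


def check_project_brief(text: str, low: int = 4, high: int = 8) -> list:
    """Return violations of the project_brief length/shape rules (empty = OK)."""
    problems = []
    n = count_sentences(text)
    if not low <= n <= high:
        problems.append(f"{n} sentences, expected {low}-{high}")
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    if len(paragraphs) > 4:
        problems.append(f"{len(paragraphs)} paragraphs, expected at most 4")
    if _TAG.search(text):
        problems.append("contains an XML/HTML tag")
    return problems
```

For the empty-project case the lower bound may need relaxing (low=1), since a truthful brief about an empty project can be short.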

Verify

  • cd adiuvAI && npm run lint
  • Manual: navigate to /projects?projectId=<uuid> and click Generate.

Phase 7 — Observability + cleanup

Goal: The brief agent is visible in Langfuse as its own generation name, and the old hard-coded prompt is fully removed.

Files

  • api/app/core/brief_agent.py
  • adiuvAI/.claude/CLAUDE.md — document the new agent
  • .claude/CLAUDE.md (root) — add a short line under "api" section

Tasks

  • Verify that brief_agent.py passes agent_name="brief-agent" to _run_single_agent_stream so the Langfuse span/generation is named accordingly. Spot-check one trace in the Langfuse UI.
  • In adiuvAI/.claude/CLAUDE.md, under the "AI Subsystem" section, add a brief-agent row to the agents table (Scope: "Daily home brief and per-project status brief", Tools: read-only subset).
  • In the root .claude/CLAUDE.md, under the api/ architecture section, add brief_agent.py to the "Orchestration" list with a one-line purpose.
  • Delete scripts/smoke_brief.py if it was committed.

Acceptance

  • A Langfuse trace for a home brief shows span brief-agent-stream containing a generation brief-agent-llm linked to the home_brief prompt version.
  • rg -n "DAILY_BRIEF_PROMPT" adiuvAI/ returns no matches.
  • rg -n "home_brief|project_brief" api/ shows usages only in brief_agent.py.

Verify

  • Trigger one home brief and one project brief, open Langfuse, confirm traces.

Phase 8 — Regression + doc polish

Goal: existing home chat behavior unchanged; brief behavior documented for future contributors.

Tasks

  • Open the home chat, send a normal message ("what are my tasks today?"). Response must still include <task> tag lines (the chat agent still uses the tag contract; only the brief agent does not).
  • Open the floating panel on a task. Response must still be plain text with no tags (existing contract).
  • Add a short "Daily Brief" paragraph to the user-facing docs (if any marketing/help doc exists — otherwise skip).

Acceptance

  • No regressions in home chat or floating chat.
  • Plan document (docs/plan-brief-agent.md) is marked complete: every phase's task checklist is - [x].

Verify

  • Manual QA pass of: home brief, project brief, home chat, floating chat on task, floating chat on project.

Out of scope (explicitly)

  • Push notifications / proactive brief delivery (already a separate plan).
  • Weekly / monthly brief variants.
  • Brief export to PDF / email.
  • Per-client brief (clients are currently a lightweight table, not a UI page).
  • Writing tools in the brief agent — it stays read-only. If the user acts on the brief ("create a task for X"), they send that as a normal home-chat message, and the home agent handles it.