diff --git a/.gitignore b/.gitignore index aece012..c747de4 100644 --- a/.gitignore +++ b/.gitignore @@ -97,3 +97,4 @@ unused_skills/ .claude/skills/webapp-testing/examples/element_discovery.py .claude/skills/webapp-testing/examples/static_html_automation.py .claude/skills/webapp-testing/scripts/with_server.py +.mcp.json diff --git a/.mcp.json b/.mcp.json index ce060cb..4e13957 100644 --- a/.mcp.json +++ b/.mcp.json @@ -3,6 +3,28 @@ "langfuse-docs": { "type": "http", "url": "https://langfuse.com/api/mcp" + }, + "langfuse": { + "type": "http", + "url": "https://langfuse.muticolturano.com/api/public/mcp", + "headers": { + "Authorization": "Basic cGstbGYtMGU2MmE5ZWItMDk3OC00ZTJlLWIzYWQtYmIzNjE5NDcwMWI4OnNrLWxmLTI4NmMxNjVmLTFjODQtNGEzNi1iMGIwLWNmZTViNjgwODk3ZA==" + } + }, + "postgres": { + "command": "docker", + "args": [ + "run", + "-i", + "--rm", + "-e", + "DATABASE_URI", + "crystaldba/postgres-mcp", + "--access-mode=restricted" + ], + "env": { + "DATABASE_URI": "postgresql://postgres:XVTsmNqsMJX5Cd%2FNrAG4%2F4KFoaVDEy2CXsFMDqi8m58%3D@10.0.0.123:5432/adiuvai" + } } } } \ No newline at end of file diff --git a/api b/api index 0b5ef48..d5fea95 160000 --- a/api +++ b/api @@ -1 +1 @@ -Subproject commit 0b5ef484630d5836e36520abb2b9518dbf59a195 +Subproject commit d5fea955611969869a6ae754107c4500aa78edfe diff --git a/docs/plan-brief-agent.md b/docs/plan-brief-agent.md new file mode 100644 index 0000000..4cfa17a --- /dev/null +++ b/docs/plan-brief-agent.md @@ -0,0 +1,510 @@ +# Dedicated Brief Agent (Home + Project) + +> Ralph-loop plan. Execute one phase per iteration. Each phase is self-contained: +> its **Files**, **Tasks**, **Acceptance**, and **Verify** blocks are everything +> the agent needs to finish that phase without re-reading earlier phases. +> Mark `- [x]` as you complete tasks. Do not start phase N+1 until phase N's +> **Acceptance** is fully met. + +## Environment + +All Python commands in this plan (`pytest`, `python`, `ruff`, `alembic`) must +be run inside the `api/` project's virtualenv at `api/.venv`. + +- **bash / WSL / macOS / Linux**: `source api/.venv/bin/activate` (or prefix + commands with `api/.venv/bin/python -m ...`) +- **Windows PowerShell**: `api\.venv\Scripts\Activate.ps1` +- **Windows bash shell (this repo's default)**: `source api/.venv/Scripts/activate` + +Do **not** use the system Python or a globally installed `pytest`/`ruff` — +dependencies are pinned inside the venv. Every `Verify` step below assumes the +venv is active. + +--- + +## Context + +Today the daily brief is produced by the `home-agent` with a prompt stuffed into +`sendHomeRequest()` at [adiuvAI/src/main/ai/orchestrator.ts:160](adiuvAI/src/main/ai/orchestrator.ts#L160). This +couples two very different jobs (chat vs summarisation) into one agent and one +prompt. Worse: the home agent is wired to emit XML tag wrappers (``, +``, ``) for the UI component renderer — wrappers the brief +does not want. We filter them out in post, but the LLM still pays tokens for +them and sometimes leaks malformed tags. + +We want a **dedicated brief agent** that: + +- Runs in **two modes** — `home` (daily brief) and `project` (per-project status brief). +- Produces **plain text only** — no XML/HTML tag wrappers, no bracketed id lists. +- Uses the **same infra pattern** as the other agents: `get_agent_llm(...)`, + `.env` override, Langfuse prompt via `get_prompt_or_fallback()`, + Langfuse tracing via `langfuse_context` + generation observations. +- Is **memory-aware** — core memory and relational memory are injected into the + system prompt so the brief can say "Client X usually pays late — your invoice + is still out" instead of a generic list. +- Is **read-only** — no create/update/delete tools. Tool surface is the minimum + needed to answer "what needs attention right now?". + +--- + +## Architecture + +``` +Electron ─ WsFrame{type:"brief_request", mode:"home"|"project", project_id?} ─► device_ws + │ + ▼ + core/brief_agent.py + ├── run_home_brief() + └── run_project_brief() + │ + read-only tool subset │ + (tasks, projects, notes, │ + timelines, memory get) │ + ▼ + plain-text stream ▲ + back to renderer ─┘ +``` + +**Key decisions:** + +- New WS frame type `brief_request` — *not* a reuse of `home_request` — so the + frame payload stays small and typed, and the server can pick the right agent + without sniffing the prompt. +- Read-only tools only. Give the LLM access to the same data the UI sees + (tasks, projects, notes, timelines, memory **get**). No mutating tools, no + memory-write tools — a brief should never change state. +- `resolved_project_id` is passed explicitly in the request payload for the + `project` mode (no LLM-side resolution). The `home` mode omits it. +- Both modes stream — the UI already has streaming rendering for the home + brief; we reuse the same pattern for the project brief card. + +--- + +## Improved prompts + +The current prompt tries to do everything in one paragraph. These split the job +into role, data rules, voice, and output contract — the structure the model +actually follows. + +### `home_brief` (Langfuse prompt, label `production`) + +``` +You are the user's personal assistant producing a short daily brief. + +ROLE +Act like a calm, attentive secretary writing a stand-up note for your boss. +Warm and human, never breezy. Never cheerful filler, never emojis, never +"here is your brief" meta-text. The user is opening the app mid-workday and +is probably stressed — your job is to lower cognitive load, not add noise. + +TOOLS — always call before writing +Pull fresh data every run. Do not invent counts or titles. Use at minimum: +- list_tasks_due_today — tasks the user owes today +- list_timeline_events_today — events starting or ending today +- list_active_projects — projects currently in progress or at risk +- memory_list_blocks / memory_get — personal context about people, clients, + payment habits, working preferences +If a tool returns nothing, simply omit that topic. Never report zeros. + +WHAT TO INCLUDE +1. Tasks due today (title + priority; group the 1–2 most important). +2. Timeline events starting or ending today (and anything that starts/ends + tomorrow if the user has a very light day). +3. Active projects that need a nudge — stalled, blocked, or awaiting input. +4. Memory-aware colour where it sharpens the brief. Examples: + - "Client Rossi tends to pay late — the Acme invoice is 6 days out." + - "You usually dislike meetings before 10:00 — the call at 09:30 is unusual." + Only add a memory line when it changes what the user does. Do not pad. + +WHAT TO OMIT +- Zero-counts ("no overdue items", "0 meetings today"). +- Statistics ("2 active projects, 3 completed tasks"). +- Headers, titles, greetings, sign-offs, dates, emojis, slang. +- Meta-phrases ("here is", "let me know if", "hope this helps"). +- XML/HTML tags of any kind. Plain prose only. + +LIGHT-DAY CLAUSE +If tasks + events + active-project-nudges together produce fewer than two +sentences of content, also list 1–2 projects in status `on_hold` or `waiting` +and ask a single, specific question about them — e.g. "Is the Bianchi +redesign still paused, or ready to pick back up?" One question max, grounded +in a real project name. + +VOICE +- Calm. Concise. Human. Short sentences. +- Use **bold** sparingly for task titles, project names, and people's names. +- No bullet lists. Flow as 2–4 sentences of prose. + +LENGTH +2–4 sentences total. Hard cap 4. If the day is truly empty, one sentence. + +Respond in the user's language ({{language}}). Today is {{today}}. +``` + +Variables: `{{language}}` (e.g. "Italian"), `{{today}}` (ISO date). + +### `project_brief` (Langfuse prompt, label `production`) + +``` +You are the project assistant producing a short status brief for ONE project. + +ROLE +A senior project manager summarising state-of-play for the owner. Factual, +sharp, forward-looking. Never reassuring filler, never emojis. + +SCOPE +Work only with project_id = {{project_id}}. Do not mention or pull data from +other projects. Use tools to fetch fresh data: +- get_project — current status, dates, description +- list_tasks(project_id) — open work, split by status +- list_timeline_events(project_id) — milestones hit, upcoming, overdue +- list_project_notes(project_id) — any recent decisions or blockers +- memory_get — relevant context about the client, collaborators, constraints + +STRUCTURE — follow exactly, one short paragraph per section, no headers +1. **State.** One sentence: current phase, health (on track / at risk / blocked), + and why. Cite the concrete signal (overdue milestone, stalled tasks, recent + blocker note). +2. **What's moving.** What was completed or progressed recently. Name specific + tasks or milestones. +3. **Next steps.** The 1–3 most important things the user should do next, in + priority order. Be concrete — task name, who owns it, when due if known. + If waiting on someone else, name them and what the ask is. +4. **Risks / memory-flagged items.** One line max. Only include when there is + a real risk or a relevant memory (e.g. late-paying client, tight deadline, + scope change). Omit the section entirely if nothing to say. + +WHAT TO OMIT +- Zero-counts ("no overdue tasks"). +- Generic advice ("keep up the good work"). +- Greetings, headers, bullet lists, emojis, sign-offs, meta-phrases. +- XML/HTML tags or bracketed id lists. Plain prose only. + +VOICE +- Direct. Factual. No fluff. +- Use **bold** sparingly for task titles, milestone names, and the owner's name. +- Short sentences. Prefer verbs over nouns ("Client review is blocking release" + not "There is a blocker which is the client review"). + +LENGTH +4–8 sentences total across the 3–4 sections. Hard cap 8. + +Respond in the user's language ({{language}}). Today is {{today}}. +``` + +Variables: `{{project_id}}`, `{{language}}`, `{{today}}`. + +--- + +## Phase 1 — Backend config scaffolding + +**Goal:** `LLM_MODEL_BRIEF_AGENT` resolvable via `get_agent_llm("brief-agent")`. + +**Files** +- `api/app/config/settings.py` +- `api/app/core/llm.py` +- `api/.env.example` + +**Tasks** +- [ ] Add field `LLM_MODEL_BRIEF_AGENT: str = ""` to `Settings` after `LLM_MODEL_CLOUD_PROCESSOR`. +- [ ] Add `"brief-agent": lambda: settings.LLM_MODEL_BRIEF_AGENT or settings.LLM_MODEL` entry to `_AGENT_MODEL_SETTINGS` in `llm.py`. +- [ ] Add a commented-out `LLM_MODEL_BRIEF_AGENT=` block in `.env.example`, with a 2-line description mirroring the existing style ("Brief-agent — produces home and project text briefs. A small model (e.g. gpt-4o-mini) is sufficient."). + +**Acceptance** +- `python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))"` prints the default model (matches `LLM_MODEL`) when the override is empty; prints the override when set. +- `ruff check .` passes. + +**Verify** +- `cd api && source .venv/Scripts/activate && python -c "from app.core.llm import model_for_agent; print(model_for_agent('brief-agent'))"` +- `cd api && source .venv/Scripts/activate && ruff check .` + +--- + +## Phase 2 — Brief-agent module (read-only tool subset) + +**Goal:** `run_home_brief()` and `run_project_brief()` callables exist and work +end-to-end against a live backend, producing plain-text streams. No WS wiring +yet — exercised via a `scripts/smoke_brief.py` one-liner. + +**Files (new)** +- `api/app/core/brief_agent.py` + +**Files (touched)** +- `api/app/agents/task_agent.py` — export a `TASK_READ_TOOLS` list + (`list_tasks`, `list_tasks_due_today`, `list_task_comments`). +- `api/app/agents/project_agent.py` — export a `PROJECT_READ_TOOLS` list + (`list_projects`, `list_all_projects`, `get_project`). +- `api/app/agents/timeline_agent.py` — export a `TIMELINE_READ_TOOLS` list + (`list_timelines`, plus a new `list_timelines_today` that filters by today + — add it alongside the existing tools) and a `list_timeline_events` alias + scoped by `project_id`. +- `api/app/agents/note_agent.py` — export a `NOTE_READ_TOOLS` list + (`list_notes`, `get_note`). + +**Tasks** +- [x] Add the four `*_READ_TOOLS` exports in the agent files. Do not remove the + existing `*_TOOLS` exports — the chat agents still use them. +- [x] Add `list_timelines_today` in `timeline_agent.py`: returns only timelines + whose `date` falls on today (UTC). Mirror the shape of + `list_tasks_due_today`. +- [x] Create `brief_agent.py` with: + - Module-level fallback prompt constants `_HOME_BRIEF_FALLBACK` and + `_PROJECT_BRIEF_FALLBACK` — copy the prompts from the plan above verbatim, + using `{language}` / `{today}` / `{project_id}` (single-brace) so + `.format()` works when Langfuse is unavailable. + - Read-only memory tools subset: reuse `_memory_tools()` from `deep_agent.py` + but filter to `memory_list_blocks`, `memory_get`, `archival_memory_search`, + `conversation_search`. Factor out a small helper `_read_only_memory_tools()` + in `deep_agent.py` (or duplicate locally — keep it simple). + - `async def run_home_brief(user_id, context) -> AsyncGenerator[tuple[str, Any], None]` + - `async def run_project_brief(user_id, project_id, context) -> AsyncGenerator[tuple[str, Any], None]` + - Both reuse `_run_single_agent_stream` from `deep_agent.py` by passing + `agent_name="brief-agent"` and the relevant prompt. Tool list is the + read-only subset. + - Inject `_language_instruction`, `_relational_memory_injection`, and + `_proactive_hints_injection` into the system prompt — same pattern as + `run_home_stream`. + - After rendering the system prompt with `compile_prompt`, append a line + `"\nToday is YYYY-MM-DD."` only if the Langfuse template did not already + include `{{today}}` substitution (safe fallback). +- [x] Do **not** call `_normalize_tagged_list_lines` on the output — the brief + prompt forbids tags, so skipping the post-processor is a deliberate signal + of correctness. + +**Acceptance** +- Importing `from app.core.brief_agent import run_home_brief, run_project_brief` succeeds. +- A smoke script `scripts/smoke_brief.py` (create it; git-ignore it; OK to delete afterwards) runs `run_home_brief` against a seeded test user and streams text to stdout. Output contains no `<` or `[uuid]` substrings. +- `ruff check .` passes. + +**Verify** +- `cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py home` +- `cd api && source .venv/Scripts/activate && python scripts/smoke_brief.py project ` +- `cd api && source .venv/Scripts/activate && ruff check .` + +--- + +## Phase 3 — WS frame + REST fallback + +**Goal:** Electron can send `{type:"brief_request", mode, project_id?}` over +the device WS and receive a plain-text stream. REST `POST /chat/brief` exists +as fallback. + +**Files** +- `api/app/schemas.py` +- `api/app/api/routes/device_ws.py` +- `api/app/api/routes/chat.py` + +**Tasks** +- [ ] In `schemas.py`: add `brief_request = "brief_request"` to `WsFrameType`, + and a `WsBriefRequest` model with fields + `type: Literal[WsFrameType.brief_request]`, `request_id: str | None`, + `session_id: str | None`, `mode: Literal["home", "project"]`, + `project_id: str | None`. +- [ ] In `device_ws.py`: add an `elif frame_type == WsFrameType.brief_request:` + branch that dispatches to a new `_handle_brief_request` task. +- [ ] Implement `_handle_brief_request` by mirroring `_handle_home_request` but: + - Call `run_home_brief(user_id, context)` when `mode == "home"`, + `run_project_brief(user_id, project_id, context)` when `mode == "project"` + (validate `project_id` is a UUID; send `stream_end` with error frame + otherwise). + - **Skip** episode storage — briefs are not conversations. + - Still run `memory.enrich_context(...)` so relational/proactive memory is + injected. +- [ ] In `chat.py`: add `POST /chat/brief` that accepts `{mode, project_id?}` + and returns the full text (collects stream). This is the offline fallback + path used when the WS is not ready. + +**Acceptance** +- Electron smoke client opens the WS, sends a `brief_request` with `mode:"home"`, + and receives `stream_start` → N × `stream_text` → `stream_end` frames. +- `POST /chat/brief` returns `{response: "..."}`. +- Malformed `project_id` → WS frame `stream_end` with an error message (no server crash). + +**Verify** +- Run pytest existing suite: `cd api && source .venv/Scripts/activate && pytest -q`. +- Add one unit test `tests/test_brief_agent.py` covering: home mode returns + non-empty text; project mode with bogus UUID returns an error without + crashing; tools called are from the read-only subset (monkeypatch + `run_home_brief` to assert the tool list). +- Then: `cd api && source .venv/Scripts/activate && pytest tests/test_brief_agent.py -v`. + +--- + +## Phase 4 — Langfuse prompts + +**Goal:** `home_brief` and `project_brief` prompts exist in Langfuse at label +`production`, matching the content in this plan. + +**Files** +- None (external config via MCP). + +**Tasks** +- [ ] Use `mcp__langfuse-docs__searchLangfuseDocs` to confirm the text-prompt + variable syntax (`{{variable}}`) and that `label="production"` is the label + read by `get_prompt_or_fallback`. +- [ ] Use `mcp__langfuse__createTextPrompt` to create `home_brief` with the + content from the "Improved prompts → home_brief" section above. Set label + to `production`. Variables: `language`, `today`. +- [ ] Use `mcp__langfuse__createTextPrompt` to create `project_brief` with the + content from the "Improved prompts → project_brief" section above. Set label + to `production`. Variables: `language`, `today`, `project_id`. +- [ ] Use `mcp__langfuse__getPrompt` to round-trip both prompts and verify the + raw template matches what was sent. + +**Acceptance** +- Both prompts resolve via `get_prompt_or_fallback("home_brief", "")` and + `get_prompt_or_fallback("project_brief", "")` in a Python shell against the + real Langfuse instance — return a non-empty `raw_template` and a non-None + `prompt_obj`. +- `prompt_obj.compile(language="Italian", today="2026-04-17")` returns text + containing the Italian directive and the date. + +**Verify** +- `cd api && source .venv/Scripts/activate && python -c "from app.core.langfuse_client import get_prompt_or_fallback; t,p = get_prompt_or_fallback('home_brief', ''); print(len(t), p is not None)"` + +--- + +## Phase 5 — Electron client: home brief uses new agent + +**Goal:** The existing home-brief UI flow (toast + full card) calls the new +brief agent over WS, and the `DAILY_BRIEF_PROMPT` constant is deleted. + +**Files** +- `adiuvAI/src/shared/api-types.ts` (or wherever WS types live) +- `adiuvAI/src/main/api/backend-client.ts` +- `adiuvAI/src/main/ai/orchestrator.ts` +- `adiuvAI/src/main/router/index.ts` + +**Tasks** +- [ ] Add `WsBriefRequest` frame shape to shared types, mirroring the API + `WsBriefRequest` schema. +- [ ] In `backend-client.ts`, add `sendBriefRequest(mode, projectId?, callbacks, requestId?)` + modeled on `sendHomeRequest`. It sends `{type:"brief_request", mode, project_id}`. +- [ ] In `orchestrator.ts`: + - Delete the `DAILY_BRIEF_PROMPT` constant (and the `langSuffix` hack — the + backend now owns language injection). + - `generateAndCacheBrief()` → call `client.sendBriefRequest("home", undefined, {...})`. + - `dailyBrief()` → call `client.sendBriefRequest("home", undefined, {...}, requestId)`. +- [ ] `router/index.ts`: no signature change — only the underlying orchestrator + was rewired. Leave `ai.dailyBrief` mutation as-is. + +**Acceptance** +- Launch the Electron app, open Home, brief renders within 10s. No + ``/`` markers appear in the output. Italian UI user gets + Italian prose. +- Grep confirms `DAILY_BRIEF_PROMPT` no longer exists in the repo. + +**Verify** +- `cd adiuvAI && npm run lint` +- Manual: Home page renders a fresh brief. Toggle `isHomePage` nav away/back + to check the cache path still works. + +--- + +## Phase 6 — Project brief UI card + +**Goal:** Each project page has a "Brief" card that calls `sendBriefRequest("project", id, ...)` +and renders streaming plain text. + +**Files** +- `adiuvAI/src/renderer/components/projects/ProjectDetail.tsx` +- `adiuvAI/src/renderer/components/projects/ProjectBriefCard.tsx` (new) +- `adiuvAI/src/main/router/index.ts` (add `ai.projectBrief` mutation) +- `adiuvAI/src/main/ai/orchestrator.ts` (add `projectBrief(sender, projectId, requestId)`) +- Locale files (all 5). + +**Tasks** +- [ ] Add `projectBrief(sender, projectId, requestId)` in `orchestrator.ts` + mirroring `dailyBrief` but with `mode:"project"` and no cache (cheap enough + to regenerate on demand; add a simple in-memory TTL of 5 minutes keyed by + `projectId` only if the UX feels laggy). +- [ ] Add `ai.projectBrief` mutation in the tRPC router, input + `{projectId: z.string().uuid(), requestId: z.string().optional()}`. +- [ ] `ProjectBriefCard`: shadcn Card, Sparkles icon, `text-sm` body. States: + `idle` (button "Generate brief") → `streaming` (skeleton + partial text) → + `ready` (full text + "Refresh" button). Stream via + `window.electronAI.onStreamChunk()` by request id. +- [ ] Mount `ProjectBriefCard` at the top of `ProjectDetail`, above existing + content. +- [ ] Add i18n keys under `projects.brief.*`: `title`, `generate`, `refresh`, + `generating`, `error`. Add to all 5 locale files. + +**Acceptance** +- On a project with tasks/timelines, the card streams a coherent 4–8 sentence + brief with state / what's moving / next steps sections, no XML tags. +- On an empty project (no tasks/timelines), the brief is still coherent and + does not hallucinate. +- Refresh button produces a fresh generation (new `request_id`). + +**Verify** +- `cd adiuvAI && npm run lint` +- Manual: navigate `/projects?projectId=`, click Generate. + +--- + +## Phase 7 — Observability + cleanup + +**Goal:** The brief agent is visible in Langfuse as its own generation name, +and the old hard-coded prompt is fully removed. + +**Files** +- `api/app/core/brief_agent.py` +- `adiuvAI/.claude/CLAUDE.md` — document the new agent +- `.claude/CLAUDE.md` (root) — add a short line under "api" section + +**Tasks** +- [ ] Verify that `_run_single_agent_stream` uses `agent_name="brief-agent"` + so the Langfuse span/generation is named accordingly. Spot-check one trace + in the Langfuse UI. +- [ ] In `adiuvAI/.claude/CLAUDE.md`, add under the "AI Subsystem" section a + new bullet for `brief-agent` in the agents table (Scope: "Daily home brief + and per-project status brief", Tools: read-only subset). +- [ ] In the root `.claude/CLAUDE.md`, under the `api/` architecture section, + add `brief_agent.py` to the "Orchestration" list with a one-line purpose. +- [ ] Delete `scripts/smoke_brief.py` if it was committed. + +**Acceptance** +- A Langfuse trace for a home brief shows span `brief-agent-stream` containing + a generation `brief-agent-llm` linked to the `home_brief` prompt version. +- `rg -n "DAILY_BRIEF_PROMPT" adiuvAI/` returns no matches. +- `rg -n "home_brief|project_brief" api/` shows usages only in `brief_agent.py`. + +**Verify** +- Trigger one home brief and one project brief, open Langfuse, confirm traces. + +--- + +## Phase 8 — Regression + doc polish + +**Goal:** existing home chat behavior unchanged; brief behavior documented for +future contributors. + +**Tasks** +- [ ] Open the home chat, send a normal message ("what are my tasks today?"). + Response must still include `` tag lines (the chat agent still uses + the tag contract; only the brief agent does not). +- [ ] Open the floating panel on a task. Response must still be plain text with + no tags (existing contract). +- [ ] Add a short "Daily Brief" paragraph to the user-facing docs (if any + marketing/help doc exists — otherwise skip). + +**Acceptance** +- No regressions in home chat or floating chat. +- Plan document (`docs/plan-brief-agent.md`) is marked complete: every phase's + task checklist is `- [x]`. + +**Verify** +- Manual QA pass of: home brief, project brief, home chat, floating chat on + task, floating chat on project. + +--- + +## Out of scope (explicitly) + +- Push notifications / proactive brief delivery (already a separate plan). +- Weekly / monthly brief variants. +- Brief export to PDF / email. +- Per-client brief (clients are currently a lightweight table, not a UI page). +- Writing tools in the brief agent — it stays read-only. If the user acts on + the brief ("create a task for X"), they send that as a normal home-chat + message, and the home agent handles it.