first commit

2026-04-08 22:55:08 +02:00
commit 1f1ce7d40e
20 changed files with 2531 additions and 0 deletions
--- a/docs/local_agent_v2_mem.md
+++ b/docs/local_agent_v2_mem.md
@@ -0,0 +1,303 @@
+# Local Agent V2 — Working Memory
+
+## Decisioni confermate
+
+- **Breaking change**: nessuna backward compatibility con prompt_template
+- **Preprocessing**: lato backend Python, approccio (c): handler predefiniti + fallback LLM futuro
+- **Primo handler**: email HTML. Altri tipi in futuro.
+- **Journey**: produce agent_config strutturato (JSON), non prompt monolitico
+- **L'utente vuole personalizzazione**: es. "summarize documenti nelle note per progetto"
+- **File types**: qualsiasi tipo, anche mischiati nella stessa directory
+- **Progetti**: numero variabile, deve scalare
+
+---
+
+## Architettura V2 — Flusso per file
+
+### [A] Detect + Preprocess (Python puro, zero LLM)
+
+```
+File raw da Electron
+    ↓
+detect_content_type(filename, raw_content)
+    → heuristic: extension + content patterns
+    → match a un content_type dal agent_config
+    ↓
+preprocess(content_type, raw_content)
+    → handler specifico (es. email_html → BeautifulSoup)
+    → Output: { content_type, clean_text, metadata: {subject, from, date, ...} }
+```
+
+Handlers predefiniti (MVP: solo email_html):
+- `email_html`: strip tags, estrai subject/from/to/date, splitta thread → ultimo msg
+- `generic_html`: estrai main content, strip nav/footer (futuro)
+- `plain_text`: pass-through (futuro)
+- `csv`: parse + summary (futuro)
+- `pdf`: estrai testo (futuro)
+- Fallback: raw text con limit
+
+### [B] Single LLM call — classify + extract + create
+
+Una sola call LLM con tool calling che fa tutto:
+
+**System prompt costruito da:**
+1. Istruzioni base (update-first, isAiSuggested=1, ecc.)
+2. Regole di estrazione del content_type (dal agent_config) ← posizione PROMINENTE
+3. Global rules (dal agent_config)
+4. Lista progetti compatta
+5. Istruzioni procedurali: identifica progetto → query entità → estrai → crea/aggiorna
+
+**User message:**
+- Filename + metadata
+- Testo pulito
+
+**Tools disponibili:**
+- list_tasks, list_notes, list_timelines (query)
+- create_task, create_note, create_timeline
+- update_task, update_note, update_timeline
+
+**Max steps:** 12 (loop tool calling)
+
+### Journey → agent_config (JSON strutturato)
+
+```json
+{
+  "content_types": [
+    {
+      "id": "email_html",
+      "label": "Email HTML",
+      "detection_hint": "HTML con struttura email (From/To/Subject)",
+      "preprocessing": "email_html",
+      "extraction_prompt": "Per ogni email: azione diretta → task..."
+    }
+  ],
+  "global_rules": [
+    "Se il file non è riconducibile a nessun progetto, non creare entità."
+  ],
+  "data_types": ["tasks", "notes", "timelines"]
+}
+```
+
+---
+
+## Problemi V1 e come V2 li risolve
+
+| # | Problema V1 | Soluzione V2 |
+|---|---|---|
+| P1 | HTML raw all'LLM | Preprocessing Python → testo pulito |
+| P2 | Troncamento 4000 char | Testo preprocessato, molto più denso |
+| P3 | Nessuna gestione thread | Handler email splitta thread, ultimo msg |
+| P4 | Project matching debole | Filename come segnale primario + testo pulito |
+| P5 | custom_prompt in coda | Extraction rules in posizione prominente |
+| P6 | Nessun preprocessing | Handler predefiniti per tipo |
+| P7 | items_created sempre 0 | Fix nel runner (contare tool call results) |
+
+---
+
+## Modifiche al codice necessarie
+
+### Backend (adiuva-api)
+
+1. **Nuovo modulo**: `app/core/preprocessors/` con handler per tipo
+   - `__init__.py` — registry + detect + dispatch
+   - `email_html.py` — BeautifulSoup: strip, metadata, thread split
+   - `base.py` — interfaccia base + fallback
+
+2. **`agent_setup.py`**: Journey produce agent_config JSON, non prompt_template
+   - System prompt aggiornato per generare JSON strutturato
+   - Validazione output con schema Pydantic
+
+3. **`agent_runner.py`**: Flusso rivisto
+   - Rimuovere `_classify_file()` (Step 1 separato)
+   - Aggiungere preprocess step prima della call LLM
+   - Single LLM call con prompt tipo-specifico
+   - Contare items_created dai tool call results
+
+4. **`models.py`**: `prompt_template: Text` → `agent_config: JSON`
+
+### Frontend (adiuva)
+
+5. **`store.ts`**: Campo `promptTemplate` → `agentConfig`
+6. **`JourneyDialog.tsx`**: Parsing JSON da journey reply
+7. **`agent-scheduler.ts`**: Passa `agentConfig` al trigger
+8. **Schema Pydantic/Zod**: Aggiornare per nuovo formato
+
+---
+
+---
+
+## Stato implementazione
+
+| Step | Stato | Branch |
+|------|-------|--------|
+| Step 1 — Preprocessors | ✅ DONE | `feature/batch-agent-v2` |
+| Step 2 — agent_runner.py refactor | ✅ DONE | `feature/batch-agent-v2` |
+| Step 3 — Model/schema agent_config | ✅ DONE | `feature/batch-agent-v2` |
+| Step 4 — Journey setup output strutturato | ✅ DONE | `feature/batch-agent-v2` |
+| Step 5 — Frontend | ✅ DONE | main |
+| Step 6 — E2E con file reali | ⏳ TODO | — |
+
+---
+
+## Convenzioni test (aggiornate dopo implementazione step 1–2)
+
+### Struttura fixture
+
+```
+tests/fixtures/<step_name>/
+  cases.yaml        ← definizioni dei casi
+  data/             ← file di input (HTML, txt, ...)
+```
+
+Opzione CLI per sovrascrivere la cartella:
+```bash
+pytest tests/test_<step>.py -v --<step>-dir /path/to/folder
+```
+Registrata in `conftest.py` via `pytest_addoption`. La cartella custom deve avere la stessa struttura (`cases.yaml` + `data/`).
+
+Opzioni registrate finora:
+- `--preprocess-dir` → step 1
+- `--runner-dir` → step 2 (aggiungere `--journey-dir` per step 4, `--e2e-dir` per step 6)
+
+### Schema YAML — principi (step 1 vs step 2)
+
+**Step 1 (preprocessors) — test deterministici, no LLM:**
+- Chiavi piatte: `detect:`, `process:`, `no_html:`, `min_chars:`, ecc.
+- Nessun `description` né `score_name` (Langfuse non usato)
+- `file:` serve sia come nome su disco che come filename passato alla funzione
+- `generate: binary_noise` per contenuto sintetico
+
+**Step 2+ (runner, journey, e2e) — test LLM eval:**
+- `file:` = nome su disco in `data/`
+- `file_path:` = path vista dall'agent (separato perché più casi riusano lo stesso file con path diversi, es. per testare project matching da filename vs content)
+- `description:` presente nel YAML (utile nel report pytest)
+- `score_name:` presente nel YAML (il nome con cui lo score viene inviato a Langfuse)
+- `projects:` lista di nomi simbolici (`alpha`, `beta`) o dict inline `{id, name, status}` — risolta da `_resolve_projects()`
+- Assertion keys piatte: `expect_insert`, `expect_no_insert`, `expect_project_id`, `expect_dedup`
+
+### Parametrize da YAML
+
+Usare `pytest_generate_tests` per accedere all'opzione CLI custom:
+
+```python
+def pytest_generate_tests(metafunc):
+    if "runner_case" not in metafunc.fixturenames:
+        return
+    cases = _load_cases(metafunc.config)
+    metafunc.parametrize("runner_case", cases, ids=[c["id"] for c in cases])
+```
+
+I test accedono alla dir via `pytestconfig`:
+```python
+async def test_eval_runner(runner_case, pytestconfig):
+    data_dir = _fixtures_dir(pytestconfig) / "data"
+```
+
+### Langfuse V3 — pattern corretto
+
+**Problemi riscontrati con V2 API (non usare):**
+- `lf.trace()` → non esiste in V3
+- `lf.score(trace_id=...)` → non esiste in V3
+- `lf.start_as_current_observation(user_id=..., session_id=...)` → kwargs non accettati
+
+**Pattern V3 corretto nei test eval:**
+```python
+from contextlib import nullcontext
+lf = get_langfuse()
+obs_ctx = lf.start_as_current_observation(
+    name="eval-runner-2.1",
+    metadata={"step": "2", "case_id": "2.1"},
+) if lf else nullcontext()
+
+with obs_ctx as obs:
+    # ... esegui il codice ...
+    if obs is not None:
+        obs.score(name="runner.email_to_task", value=1.0, comment="...")
+
+if lf:
+    lf.flush()
+```
+
+**Pattern V3 corretto nel codice produzione (`agent_runner.py`, `deep_agent.py`, `agent_setup.py`):**
+```python
+# user_id e session_id vanno in metadata, NON come kwarg diretti
+lf.start_as_current_observation(
+    as_type="span",
+    name="my-span",
+    metadata={"user_id": user_id, "session_id": session_id},
+    input=...,
+)
+```
+
+### compile_prompt — non usare template.format() direttamente
+
+`get_prompt_or_fallback()` ritorna il template grezzo. Langfuse usa `{{variable}}`, il fallback usa `{variable}`. Usare sempre `compile_prompt()` che dispatcha correttamente:
+
+```python
+from app.core.langfuse_client import compile_prompt, get_prompt_or_fallback
+
+template, prompt_obj = get_prompt_or_fallback("my_prompt", FALLBACK_PROMPT)
+compiled = compile_prompt(template, prompt_obj, var1=val1, var2=val2)
+# ↑ usa prompt_obj.compile() per Langfuse, template.format() per fallback
+```
+
+**Non fare mai:**
+```python
+compiled = template.format(var1=val1)  # ❌ rompe con Langfuse (usa {{var1}})
+```
+
+### Struttura test file per step LLM eval
+
+Pattern consolidato da `test_agent_runner_v2.py`:
+
+```
+tests/test_<step>.py
+  ├── Costanti (_USER_ID, _DEFAULT_FIXTURE_DIR, _AGENT_CONFIG, simboli progetto)
+  ├── _fixtures_dir(config) + _load_cases(config) + _read_case_file(case, data_dir)
+  ├── _resolve_projects(entries) — gestisce sia stringhe simboliche che dict inline
+  ├── pytest_generate_tests — parametrize eval tests da YAML
+  ├── Helper builders (_make_config, _make_run_log, _make_manager, _make_executor)
+  ├── Unit tests statici (no YAML, no LLM)
+  └── test_eval_<step>(runner_case, pytestconfig) — unica funzione parametrizzata
+        ↓ legge file, risolve progetti, crea executor, chiama runner
+        ↓ _evaluate_case(case, calls, kwargs) → (score, comment)
+        ↓ obs.score(...) se Langfuse attivo
+```
+
+`_evaluate_case()` centralizza tutta la logica di assertion mappata dalle chiavi YAML — nessuna logica di assert sparsa nel test.
+
+### Step 4 — Journey V2: pattern specifici
+
+**Sentinelle:** `AGENT_CONFIG_START` / `AGENT_CONFIG_END` (rimpiazzano `PROMPT_TEMPLATE_START/END`)
+
+**Langfuse prompt:** `journey_system_v2` (non `journey_system` della V1)
+
+**Frame key:** `existing_config` (JSON string, rimpiazza `existing_template` stringa in prosa)
+
+**Ritorno handler:** chiave `agent_config` (JSON string validato da Pydantic) invece di `prompt_template`
+
+**Executor per test journey:** usa `set_client_executor(executor)` / `clear_client_executor()` direttamente nel test helper `_run_journey`, mimando `device_ws._handle_journey_start`. Re-imposta prima di ogni chiamata (start + ogni message).
+
+**Fixture YAML journey:** `directory_files: [{path, content_file}]` + `user_messages: [...]` + assertion keys flat (`expect_question`, `expect_done`, `expect_valid_config`, `expect_content_type_id`, `expect_extraction_contains`, `expect_global_rules`)
+
+**Test nudge (unit):** popola `_sessions` con una `JourneySession` fake con `_MAX_TURNS` turni, patcha `_call_llm_with_tools`, verifica che il secondo call riceva il nudge con i nuovi marker nelle `history`.
+
+**JSON nel system prompt:** i literal `{` e `}` nel JSON di esempio devono essere `{{` e `}}` per il fallback `str.format()`. Le variabili template usano `{var}` (singolo). `compile_prompt()` gestisce il dispatch corretto per Langfuse vs fallback.
+
+### Step 5 — Frontend V2: pattern specifici
+
+**Store (`LocalAgentLocalConfig`):** campo `agentConfig: Record<string, unknown> | null` sostituisce `promptTemplate: string`. Stored nell'electron-store come oggetto JSON.
+
+**Trigger body:** lo scheduler e `runNow` mandano `agentConfig` (oggetto, non `customAgentPrompt` stringa).
+
+**WS frame `journey_start`:** campo `existingConfig` (JSON string) rimpiazza `existingTemplate` stringa. Backend si aspetta `existing_config` (snake_case via `toSnakeCase()`).
+
+**WS frame `journey_reply`:** campo `agentConfig` (JSON string) rimpiazza `promptTemplate`. Il FE lo riceve come stringa, lo parsa con `JSON.parse()` → oggetto.
+
+**tRPC journey router:** ritorna `{ ..., agentConfig: string | undefined }`. I componenti React lo parsano localmente.
+
+**Cloud agents:** non migrati — mantengono `promptTemplate: string` in `CloudAgentConfigSchema`, `agentCloudRouter`, `PromptBuilderChat.onPromptUpdate`. Il `PromptBuilderChat` ora ha anche `onConfigUpdate` per il path local.
+
+**`JourneyDialog`:** props `currentConfig: Record<string, unknown> | null` + `onSaved(agentConfig: Record<string, unknown>)`. Mostra un summary human-readable (`AgentConfigSummary`) invece del raw prompt string.
+
+**`InlineAgentCreationStepper`:** mantiene `promptTemplate` state per cloud; aggiunge `agentConfig` state per local. `PromptBuilderChat` richiama `onConfigUpdate` per local e `onPromptUpdate` per cloud (backward-compat).