workspace/docs/local_agent_v2_mem.md

# Local Agent V2 — Working Memory

## Confirmed decisions

- **Breaking change**: no backward compatibility with `prompt_template`
- **Preprocessing**: done in the Python backend, approach (c): predefined handlers now, LLM fallback later
- **First handler**: HTML email. Other types will follow.
- **Journey**: produces a structured `agent_config` (JSON), not a monolithic prompt
- **The user wants customization**: e.g. "summarize documents into notes, per project"
- **File types**: any type, possibly mixed within the same directory
- **Projects**: variable number, must scale

---

## V2 Architecture: per-file flow

### [A] Detect + Preprocess (pure Python, zero LLM)

```
Raw file from Electron
    ↓
detect_content_type(filename, raw_content)
    → heuristic: extension + content patterns
    → match against a content_type from agent_config
    ↓
preprocess(content_type, raw_content)
    → type-specific handler (e.g. email_html → BeautifulSoup)
    → Output: { content_type, clean_text, metadata: {subject, from, date, ...} }
```

Predefined handlers (MVP: email_html only):
- `email_html`: strip tags, extract subject/from/to/date, split the thread and keep the last message
- `generic_html`: extract main content, strip nav/footer (future)
- `plain_text`: pass-through (future)
- `csv`: parse + summary (future)
- `pdf`: extract text (future)
- Fallback: raw text, truncated to a length limit
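
The detect → preprocess flow above could be wired up roughly as follows. This is a sketch under stated assumptions, not the project's code: it swaps in the standard-library `html.parser` for BeautifulSoup so that it runs without dependencies, it hardcodes the detection heuristic instead of reading content types from `agent_config`, it omits the thread split, and `MAX_FALLBACK_CHARS`, `HANDLERS` and the `unknown` type name are hypothetical.

```python
import re
from html.parser import HTMLParser

MAX_FALLBACK_CHARS = 4000  # assumed limit for the raw-text fallback


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def _email_html(raw_content):
    """Strip tags and pull header-like lines into metadata (no thread split)."""
    parser = _TextExtractor()
    parser.feed(raw_content)
    parser.close()
    text = "\n".join(parser.parts)
    metadata = {}
    for key in ("subject", "from", "to", "date"):
        match = re.search(rf"^{key}:\s*(.+)$", text, re.IGNORECASE | re.MULTILINE)
        if match:
            metadata[key] = match.group(1).strip()
    return {"content_type": "email_html", "clean_text": text, "metadata": metadata}


def _fallback(raw_content):
    """Fallback handler: raw text truncated to the assumed limit."""
    return {
        "content_type": "unknown",
        "clean_text": raw_content[:MAX_FALLBACK_CHARS],
        "metadata": {},
    }


HANDLERS = {"email_html": _email_html}  # hypothetical registry


def detect_content_type(filename, raw_content):
    """Heuristic: file extension plus content patterns."""
    looks_html = filename.lower().endswith((".html", ".htm")) or "<html" in raw_content.lower()
    has_headers = re.search(r"\b(from|subject):", raw_content, re.IGNORECASE)
    return "email_html" if looks_html and has_headers else "unknown"


def preprocess(content_type, raw_content):
    """Dispatch to the registered handler, or to the fallback."""
    return HANDLERS.get(content_type, _fallback)(raw_content)
```

Keeping the registry as a plain dict means new handlers (`generic_html`, `csv`, ...) only need one extra entry; unknown types fall through to the truncating fallback.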

### [B] Single LLM call: classify + extract + create

A single LLM call with tool calling does everything:

**System prompt built from:**
1. Base instructions (update-first, isAiSuggested=1, etc.)
2. Extraction rules for the content_type (from agent_config) ← PROMINENT position
3. Global rules (from agent_config)
4. Compact project list
5. Procedural instructions: identify project → query entities → extract → create/update

**User message:**
- Filename + metadata
- Clean text

**Available tools:**
- list_tasks, list_notes, list_timelines (query)
- create_task, create_note, create_timeline
- update_task, update_note, update_timeline

**Max steps:** 12 (tool-calling loop)
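
The five-part system prompt could be assembled along these lines. A minimal sketch only: `build_system_prompt`, the section headings and the project dict shape (`id`, `name`) are assumptions made for illustration; the real prompt text lives in the runner and is not shown in these notes.

```python
def build_system_prompt(base_instructions, agent_config, content_type, projects):
    """Assemble the system prompt in the order listed above (hypothetical helper)."""
    ctype = next(
        (c for c in agent_config.get("content_types", []) if c["id"] == content_type),
        None,
    )
    sections = [base_instructions]

    # Type-specific extraction rules go right after the base instructions,
    # so they sit in a prominent position rather than at the end.
    if ctype:
        sections.append(
            "## Extraction rules (" + ctype["label"] + ")\n" + ctype["extraction_prompt"]
        )

    rules = agent_config.get("global_rules", [])
    if rules:
        sections.append("## Global rules\n" + "\n".join(f"- {r}" for r in rules))

    # Compact project list: one line per project.
    sections.append(
        "## Projects\n" + "\n".join(f"- {p['id']}: {p['name']}" for p in projects)
    )

    sections.append(
        "## Procedure\n"
        "1. Identify the project\n"
        "2. Query existing entities\n"
        "3. Extract\n"
        "4. Create or update"
    )
    return "\n\n".join(sections)
```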

### Journey → agent_config (structured JSON)

```json
{
  "content_types": [
    {
      "id": "email_html",
      "label": "Email HTML",
      "detection_hint": "HTML with email structure (From/To/Subject)",
      "preprocessing": "email_html",
      "extraction_prompt": "For each email: direct action → task..."
    }
  ],
  "global_rules": [
    "If the file cannot be traced back to any project, do not create entities."
  ],
  "data_types": ["tasks", "notes", "timelines"]
}
```
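
A config of this shape can be checked before it is stored. The notes call for a Pydantic schema; the sketch below is a plain-Python stand-in (so it runs without dependencies) and it assumes that all five content-type fields are required strings and that `data_types` is limited to the three values in the example. `validate_agent_config` is a hypothetical name.

```python
import json

# Assumed to be required on every content type, based on the example above.
REQUIRED_CONTENT_TYPE_KEYS = (
    "id", "label", "detection_hint", "preprocessing", "extraction_prompt",
)
ALLOWED_DATA_TYPES = {"tasks", "notes", "timelines"}  # assumption


def validate_agent_config(raw):
    """Parse a JSON string and raise ValueError if it does not look like an agent_config."""
    config = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(config, dict):
        raise ValueError("agent_config must be a JSON object")

    content_types = config.get("content_types")
    if not isinstance(content_types, list) or not content_types:
        raise ValueError("content_types must be a non-empty list")
    for entry in content_types:
        if not isinstance(entry, dict):
            raise ValueError("each content type must be an object")
        missing = [k for k in REQUIRED_CONTENT_TYPE_KEYS if not isinstance(entry.get(k), str)]
        if missing:
            raise ValueError(f"content type is missing string fields: {missing}")

    rules = config.get("global_rules", [])
    if not isinstance(rules, list) or not all(isinstance(r, str) for r in rules):
        raise ValueError("global_rules must be a list of strings")

    data_types = config.get("data_types", [])
    if not isinstance(data_types, list):
        raise ValueError("data_types must be a list")
    unknown = [d for d in data_types if d not in ALLOWED_DATA_TYPES]
    if unknown:
        raise ValueError(f"unknown data_types: {unknown}")
    return config
```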

---

## V1 problems and how V2 solves them

| # | V1 problem | V2 solution |
|---|---|---|
| P1 | Raw HTML sent to the LLM | Python preprocessing → clean text |
| P2 | Truncation at 4000 chars | Preprocessed text is much denser |
| P3 | No thread handling | Email handler splits the thread and keeps the last message |
| P4 | Weak project matching | Filename as primary signal + clean text |
| P5 | custom_prompt placed at the end | Extraction rules in a prominent position |
| P6 | No preprocessing | Predefined handlers per type |
| P7 | items_created always 0 | Fix in the runner (count tool call results) |
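
For P7, counting from tool call results might look like the sketch below. The result shape (`{"tool": ..., "ok": ...}`) and the helper name are assumptions; the runner's actual record format is not shown in these notes.

```python
CREATE_TOOLS = {"create_task", "create_note", "create_timeline"}


def count_items_created(tool_results):
    """Count successful create_* calls in a list of tool call results.

    Each result is assumed to be a dict with a "tool" name and an "ok" flag.
    """
    return sum(
        1
        for result in tool_results
        if result.get("tool") in CREATE_TOOLS and result.get("ok")
    )
```

Queries (`list_*`), updates and failed creates are deliberately not counted, so the number reflects entities that were actually created.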

---

## Required code changes

### Backend (adiuva-api)

1. **New module**: `app/core/preprocessors/` with one handler per type
   - `__init__.py`: registry + detect + dispatch
   - `email_html.py`: BeautifulSoup (strip, metadata, thread split)
   - `base.py`: base interface + fallback

2. **`agent_setup.py`**: the Journey produces an agent_config JSON, not a prompt_template
   - System prompt updated to generate structured JSON
   - Output validated with a Pydantic schema

3. **`agent_runner.py`**: revised flow
   - Remove `_classify_file()` (the separate Step 1)
   - Add a preprocess step before the LLM call
   - Single LLM call with a type-specific prompt
   - Count items_created from the tool call results

4. **`models.py`**: `prompt_template: Text` → `agent_config: JSON`

### Frontend (adiuva)

5. **`store.ts`**: field `promptTemplate` → `agentConfig`
6. **`JourneyDialog.tsx`**: parse JSON from the journey reply
7. **`agent-scheduler.ts`**: pass `agentConfig` to the trigger
8. **Pydantic/Zod schemas**: update for the new format

---

## Implementation status

| Step | Status | Branch |
|------|--------|--------|
| Step 1: Preprocessors | ✅ DONE | `feature/batch-agent-v2` |
| Step 2: agent_runner.py refactor | ✅ DONE | `feature/batch-agent-v2` |
| Step 3: Model/schema agent_config | ✅ DONE | `feature/batch-agent-v2` |
| Step 4: Journey setup, structured output | ✅ DONE | `feature/batch-agent-v2` |
| Step 5: Frontend | ✅ DONE | main |
| Step 6: E2E with real files | ⏳ TODO | - |

---

## Test conventions (updated after implementing steps 1–2)

### Fixture structure

```
tests/fixtures/<step_name>/
  cases.yaml        ← case definitions
  data/             ← input files (HTML, txt, ...)
```

CLI option to override the folder:
```bash
pytest tests/test_<step>.py -v --<step>-dir /path/to/folder
```
Registered in `conftest.py` via `pytest_addoption`. A custom folder must have the same structure (`cases.yaml` + `data/`).

Options registered so far:
- `--preprocess-dir` → step 1
- `--runner-dir` → step 2 (add `--journey-dir` for step 4, `--e2e-dir` for step 6)

### YAML schema: principles (step 1 vs step 2)

**Step 1 (preprocessors): deterministic tests, no LLM:**
- Flat keys: `detect:`, `process:`, `no_html:`, `min_chars:`, etc.
- No `description` or `score_name` (Langfuse is not used)
- `file:` serves both as the on-disk name and as the filename passed to the function
- `generate: binary_noise` for synthetic content

**Step 2+ (runner, journey, e2e): LLM eval tests:**
- `file:` = on-disk name in `data/`
- `file_path:` = path as seen by the agent (kept separate because several cases reuse the same file with different paths, e.g. to test project matching from filename vs content)
- `description:` present in the YAML (useful in the pytest report)
- `score_name:` present in the YAML (the name under which the score is sent to Langfuse)
- `projects:` list of symbolic names (`alpha`, `beta`) or inline dicts `{id, name, status}`, resolved by `_resolve_projects()`
- Flat assertion keys: `expect_insert`, `expect_no_insert`, `expect_project_id`, `expect_dedup`

### Parametrizing from YAML

Use `pytest_generate_tests` so the parametrization can read the custom CLI option:

```python
def pytest_generate_tests(metafunc):
    if "runner_case" not in metafunc.fixturenames:
        return
    cases = _load_cases(metafunc.config)
    metafunc.parametrize("runner_case", cases, ids=[c["id"] for c in cases])
```

Tests access the directory via `pytestconfig`:
```python
async def test_eval_runner(runner_case, pytestconfig):
    data_dir = _fixtures_dir(pytestconfig) / "data"
```

### Langfuse V3: correct pattern

**Problems hit with the V2 API (do not use):**
- `lf.trace()` → does not exist in V3
- `lf.score(trace_id=...)` → does not exist in V3
- `lf.start_as_current_observation(user_id=..., session_id=...)` → kwargs not accepted

**Correct V3 pattern in eval tests:**
```python
from contextlib import nullcontext
lf = get_langfuse()
obs_ctx = lf.start_as_current_observation(
    name="eval-runner-2.1",
    metadata={"step": "2", "case_id": "2.1"},
) if lf else nullcontext()

with obs_ctx as obs:
    # ... run the code under test ...
    if obs is not None:
        obs.score(name="runner.email_to_task", value=1.0, comment="...")

if lf:
    lf.flush()
```

**Correct V3 pattern in production code (`agent_runner.py`, `deep_agent.py`, `agent_setup.py`):**
```python
# user_id and session_id go in metadata, NOT as direct kwargs
lf.start_as_current_observation(
    as_type="span",
    name="my-span",
    metadata={"user_id": user_id, "session_id": session_id},
    input=...,
)
```

### compile_prompt: never call template.format() directly

`get_prompt_or_fallback()` returns the raw template. Langfuse uses `{{variable}}`, the fallback uses `{variable}`. Always use `compile_prompt()`, which dispatches correctly:

```python
from app.core.langfuse_client import compile_prompt, get_prompt_or_fallback

template, prompt_obj = get_prompt_or_fallback("my_prompt", FALLBACK_PROMPT)
compiled = compile_prompt(template, prompt_obj, var1=val1, var2=val2)
# ↑ uses prompt_obj.compile() for Langfuse, template.format() for the fallback
```

**Never do this:**
```python
compiled = template.format(var1=val1)  # ❌ breaks with Langfuse templates (they use {{var1}})
```
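
The dispatch described above can be pictured as follows. This is a guess at the shape of `compile_prompt()`, not its actual source: it assumes `get_prompt_or_fallback()` returns `None` as `prompt_obj` when the fallback is in use, and the function is named `compile_prompt_sketch` to make clear that it is illustrative.

```python
def compile_prompt_sketch(template, prompt_obj, **variables):
    """Use the Langfuse prompt object when present, else str.format() on the fallback."""
    if prompt_obj is not None:
        # Langfuse path: the prompt object substitutes {{variable}} placeholders.
        return prompt_obj.compile(**variables)
    # Fallback path: plain Python template with {variable} placeholders.
    return template.format(**variables)


# Why template.format() is wrong for a Langfuse template: doubled braces are
# escape sequences in str.format(), so the variable is never substituted.
broken = "Hi {{name}}".format(name="Ada")  # gives "Hi {name}", not "Hi Ada"
```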

### Test file structure for LLM eval steps

Pattern consolidated from `test_agent_runner_v2.py`:

```
tests/test_<step>.py
  ├── Constants (_USER_ID, _DEFAULT_FIXTURE_DIR, _AGENT_CONFIG, project symbols)
  ├── _fixtures_dir(config) + _load_cases(config) + _read_case_file(case, data_dir)
  ├── _resolve_projects(entries): handles both symbolic strings and inline dicts
  ├── pytest_generate_tests: parametrizes eval tests from YAML
  ├── Helper builders (_make_config, _make_run_log, _make_manager, _make_executor)
  ├── Static unit tests (no YAML, no LLM)
  └── test_eval_<step>(runner_case, pytestconfig): the only parametrized function
        ↓ reads the file, resolves projects, creates the executor, calls the runner
        ↓ _evaluate_case(case, calls, kwargs) → (score, comment)
        ↓ obs.score(...) if Langfuse is active
```

`_evaluate_case()` centralizes all the assertion logic mapped from the YAML keys; no assert logic is scattered through the test.
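
One way to centralize key-driven assertions is a table of checks, as sketched below. The semantics given to each key, the shape of `calls` (dicts with `tool` and `args`) and the name `evaluate_case_sketch` are all assumptions; `expect_dedup` and the `kwargs` argument are left out because the notes do not define them.

```python
def _created(calls):
    """Tool calls that create an entity."""
    return [c for c in calls if c["tool"].startswith("create_")]


# Each YAML key maps to a predicate (expected_value, calls) -> bool.
CHECKS = {
    # e.g. expect_insert: task  → at least one create_task call
    "expect_insert": lambda expected, calls: any(
        c["tool"] == f"create_{expected}" for c in calls
    ),
    # e.g. expect_no_insert: true  → no create_* call at all
    "expect_no_insert": lambda expected, calls: bool(_created(calls)) != bool(expected),
    # every created entity must target the expected project
    "expect_project_id": lambda expected, calls: bool(_created(calls)) and all(
        c["args"].get("project_id") == expected for c in _created(calls)
    ),
}


def evaluate_case_sketch(case, calls):
    """Run every check whose key appears in the case; return (score, comment)."""
    failures = [
        key for key, check in CHECKS.items()
        if key in case and not check(case[key], calls)
    ]
    score = 0.0 if failures else 1.0
    comment = "ok" if not failures else "failed: " + ", ".join(failures)
    return score, comment
```

Adding a new assertion key then means adding one entry to `CHECKS`, with no change to the test function itself.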

### Step 4: Journey V2 specifics

**Sentinels:** `AGENT_CONFIG_START` / `AGENT_CONFIG_END` (replace `PROMPT_TEMPLATE_START/END`)

**Langfuse prompt:** `journey_system_v2` (not V1's `journey_system`)

**Frame key:** `existing_config` (JSON string; replaces `existing_template`, a prose string)

**Handler return value:** key `agent_config` (JSON string validated by Pydantic) instead of `prompt_template`
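
Pulling the config out of an LLM reply with the two sentinels might look like this. A sketch only: `extract_agent_config` is a hypothetical helper, it returns a parsed dict rather than the validated JSON string the handler returns, and it assumes the sentinels appear at most once per reply.

```python
import json

START, END = "AGENT_CONFIG_START", "AGENT_CONFIG_END"


def extract_agent_config(reply):
    """Return the JSON object found between the sentinels, or None if absent."""
    start = reply.find(START)
    if start == -1:
        return None
    end = reply.find(END, start + len(START))
    if end == -1:
        return None
    payload = reply[start + len(START):end].strip()
    return json.loads(payload)  # raises ValueError on malformed JSON
```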

**Executor for journey tests:** the `_run_journey` test helper calls `set_client_executor(executor)` / `clear_client_executor()` directly, mimicking `device_ws._handle_journey_start`. It re-sets the executor before every call (start + each message).

**Journey YAML fixtures:** `directory_files: [{path, content_file}]` + `user_messages: [...]` + flat assertion keys (`expect_question`, `expect_done`, `expect_valid_config`, `expect_content_type_id`, `expect_extraction_contains`, `expect_global_rules`)

**Nudge test (unit):** populate `_sessions` with a fake `JourneySession` that has `_MAX_TURNS` turns, patch `_call_llm_with_tools`, and verify that the second call receives the nudge with the new markers in `history`.

**JSON in the system prompt:** literal `{` and `}` in the example JSON must be written `{{` and `}}` for the `str.format()` fallback. Template variables use single braces (`{var}`). `compile_prompt()` handles the correct dispatch for Langfuse vs fallback.
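
The brace rule for the fallback template is plain `str.format()` behavior and can be checked in isolation. The template text below is made up for the example; only `existing_config` comes from these notes.

```python
# Fallback template: JSON braces doubled, template variable in single braces.
FALLBACK_PROMPT = (
    "Existing config: {existing_config}\n"
    'Reply with JSON shaped like {{"content_types": [{{"id": "email_html"}}]}}'
)

compiled = FALLBACK_PROMPT.format(existing_config="null")
# The doubled braces come out as single literal braces in the compiled prompt.

# With unescaped JSON braces, str.format() treats the JSON as a replacement
# field and raises instead of producing the prompt.
error_name = None
try:
    '{"id": "{ctype}"}'.format(ctype="email_html")
except (KeyError, ValueError, IndexError) as exc:
    error_name = type(exc).__name__
```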

### Step 5: Frontend V2 specifics

**Store (`LocalAgentLocalConfig`):** field `agentConfig: Record<string, unknown> | null` replaces `promptTemplate: string`. Stored in electron-store as a JSON object.

**Trigger body:** the scheduler and `runNow` send `agentConfig` (an object, not the `customAgentPrompt` string).

**WS frame `journey_start`:** field `existingConfig` (JSON string) replaces the `existingTemplate` string. The backend expects `existing_config` (snake_case via `toSnakeCase()`).

**WS frame `journey_reply`:** field `agentConfig` (JSON string) replaces `promptTemplate`. The FE receives it as a string and parses it into an object with `JSON.parse()`.

**tRPC journey router:** returns `{ ..., agentConfig: string | undefined }`. React components parse it locally.

**Cloud agents:** not migrated. They keep `promptTemplate: string` in `CloudAgentConfigSchema`, `agentCloudRouter`, and `PromptBuilderChat.onPromptUpdate`. `PromptBuilderChat` now also has `onConfigUpdate` for the local path.

**`JourneyDialog`:** props `currentConfig: Record<string, unknown> | null` + `onSaved(agentConfig: Record<string, unknown>)`. Shows a human-readable summary (`AgentConfigSummary`) instead of the raw prompt string.

**`InlineAgentCreationStepper`:** keeps `promptTemplate` state for cloud and adds `agentConfig` state for local. `PromptBuilderChat` calls `onConfigUpdate` for local and `onPromptUpdate` for cloud (backward-compat).