Roberto Musso 54eb863c52 feat: Update local agent documentation to reflect changes in user_id and session_id handling; add marketing strategy document; update skills-lock.json with new Remotion best practices; update website subproject commit.

2026-04-11 02:15:59 +02:00

Local Agent V2 — Working Memory

Confirmed decisions

Breaking change: no backward compatibility with prompt_template
Preprocessing: on the Python backend, approach (c): predefined handlers + a future LLM fallback
First handler: HTML email. Other types will follow.
Journey: produces a structured agent_config (JSON), not a monolithic prompt
The user wants customization: e.g. "summarize documents into notes per project"
File types: any type, even mixed within the same directory
Projects: variable number, must scale

V2 architecture: per-file flow

[A] Detect + Preprocess (pure Python, zero LLM)

Raw file from Electron
    ↓
detect_content_type(filename, raw_content)
    → heuristic: extension + content patterns
    → match to a content_type from agent_config
    ↓
preprocess(content_type, raw_content)
    → type-specific handler (e.g. email_html → BeautifulSoup)
    → Output: { content_type, clean_text, metadata: {subject, from, date, ...} }

Predefined handlers (MVP: email_html only):

email_html: strip tags, extract subject/from/to/date, split the thread → last message
generic_html: extract main content, strip nav/footer (future)
plain_text: pass-through (future)
csv: parse + summary (future)
pdf: extract text (future)
Fallback: raw text with a length limit
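The detect and preprocess steps above could be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses the standard-library `html.parser` as a stand-in for BeautifulSoup, omits thread splitting, and `FALLBACK_LIMIT`, `HANDLERS` and the `"fallback"` type name are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def detect_content_type(filename, raw_content):
    """Heuristic: file extension plus content patterns."""
    is_html = filename.lower().endswith((".html", ".htm")) or "<html" in raw_content.lower()
    has_headers = re.search(r"\b(From|Subject):", raw_content) is not None
    return "email_html" if is_html and has_headers else "fallback"


def _preprocess_email_html(raw_content):
    """Strip tags and pull header-like lines into metadata."""
    extractor = _TextExtractor()
    extractor.feed(raw_content)
    text = "\n".join(extractor.parts)
    metadata = {}
    for key in ("subject", "from", "to", "date"):
        match = re.search(rf"^{key}:\s*(.+)$", text, re.IGNORECASE | re.MULTILINE)
        if match:
            metadata[key] = match.group(1).strip()
    return text, metadata


FALLBACK_LIMIT = 4000  # hypothetical cap for the raw-text fallback
HANDLERS = {"email_html": _preprocess_email_html}  # hypothetical registry


def preprocess(content_type, raw_content):
    """Dispatch to the handler for content_type, or fall back to truncated raw text."""
    handler = HANDLERS.get(content_type)
    if handler is None:
        return {"content_type": "fallback", "clean_text": raw_content[:FALLBACK_LIMIT], "metadata": {}}
    clean_text, metadata = handler(raw_content)
    return {"content_type": content_type, "clean_text": clean_text, "metadata": metadata}
```

Adding a new type in this sketch means writing one handler function and registering it in `HANDLERS`; the dispatch code stays unchanged.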

[B] Single LLM call — classify + extract + create

A single LLM call with tool calling does everything:

System prompt built from:

Base instructions (update-first, isAiSuggested=1, etc.)
Extraction rules for the content_type (from agent_config) ← PROMINENT position
Global rules (from agent_config)
Compact project list
Procedural instructions: identify project → query entities → extract → create/update

User message:

Filename + metadata
Clean text

Available tools:

list_tasks, list_notes, list_timelines (query)
create_task, create_note, create_timeline
update_task, update_note, update_timeline

Max steps: 12 (tool-calling loop)
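As an illustration of the prompt layout described above, the sketch below assembles the system prompt in the stated order (base instructions, type-specific extraction rules, global rules, project list, procedure) and builds the user message from filename, metadata and clean text. The function names, section headings and the wording of `BASE_INSTRUCTIONS` are assumptions made for the example.

```python
import json

# Placeholder wording; the real base instructions are not shown in these notes.
BASE_INSTRUCTIONS = "Prefer updating existing entities over creating new ones. Set isAiSuggested=1."


def build_system_prompt(agent_config, content_type, projects):
    """Assemble the system prompt with extraction rules placed near the top."""
    rules = next(
        (ct["extraction_prompt"] for ct in agent_config["content_types"] if ct["id"] == content_type),
        "",
    )
    global_rules = "\n".join(f"- {rule}" for rule in agent_config.get("global_rules", []))
    project_lines = "\n".join(f"- {p['id']}: {p['name']}" for p in projects)
    sections = [
        BASE_INSTRUCTIONS,
        f"## Extraction rules for {content_type}\n{rules}",
        f"## Global rules\n{global_rules}",
        f"## Projects\n{project_lines}",
        "## Procedure\nIdentify the project, query existing entities, extract, then create or update.",
    ]
    return "\n\n".join(sections)


def build_user_message(filename, preprocessed):
    """Filename and metadata first, then the clean text."""
    return (
        f"Filename: {filename}\n"
        f"Metadata: {json.dumps(preprocessed['metadata'], ensure_ascii=False)}\n\n"
        f"{preprocessed['clean_text']}"
    )
```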

Journey → agent_config (structured JSON)

{
  "content_types": [
    {
      "id": "email_html",
      "label": "Email HTML",
      "detection_hint": "HTML with email structure (From/To/Subject)",
      "preprocessing": "email_html",
      "extraction_prompt": "For each email: direct action → task..."
    }
  ],
  "global_rules": [
    "If the file cannot be traced back to any project, do not create entities."
  ],
  "data_types": ["tasks", "notes", "timelines"]
}
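The notes say the Journey output is validated with a Pydantic schema. The sketch below is a dependency-free stand-in built on `dataclasses` that checks the same shape as the example JSON. The class names, the "at least one content type" rule and the allowed `data_types` set are assumptions inferred from the example, not confirmed constraints.

```python
import json
from dataclasses import dataclass, field

ALLOWED_DATA_TYPES = {"tasks", "notes", "timelines"}  # assumed from the example above


@dataclass
class ContentType:
    id: str
    label: str
    detection_hint: str
    preprocessing: str
    extraction_prompt: str


@dataclass
class AgentConfig:
    content_types: list
    global_rules: list = field(default_factory=list)
    data_types: list = field(default_factory=lambda: ["tasks", "notes", "timelines"])


def parse_agent_config(raw_json):
    """Parse and validate an agent_config JSON string."""
    data = json.loads(raw_json)
    # ContentType(**ct) raises TypeError on missing or unexpected keys.
    content_types = [ContentType(**ct) for ct in data["content_types"]]
    if not content_types:
        raise ValueError("agent_config needs at least one content_type")
    data_types = data.get("data_types", ["tasks", "notes", "timelines"])
    unknown = set(data_types) - ALLOWED_DATA_TYPES
    if unknown:
        raise ValueError(f"unknown data_types: {sorted(unknown)}")
    return AgentConfig(content_types, data.get("global_rules", []), data_types)
```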

V1 problems and how V2 solves them

#	V1 problem	V2 solution
P1	Raw HTML sent to the LLM	Python preprocessing → clean text
P2	Truncation at 4000 chars	Preprocessed text, much denser
P3	No thread handling	Email handler splits the thread, keeps the last message
P4	Weak project matching	Filename as primary signal + clean text
P5	custom_prompt placed at the end	Extraction rules in a prominent position
P6	No preprocessing	Predefined handlers per type
P7	items_created always 0	Fix in the runner (count tool call results)
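For P7, counting created items from tool call results might look like the sketch below. The record shape (a list of dicts with `name` and an optional `error`) is an assumption; the runner's actual data structure is not shown in these notes.

```python
def count_items_created(tool_calls):
    """Count create_* tool calls that completed without an error."""
    return sum(
        1
        for call in tool_calls
        if call["name"].startswith("create_") and not call.get("error")
    )


# Hypothetical tool-call log from one run.
calls = [
    {"name": "list_tasks"},
    {"name": "create_task"},
    {"name": "update_note"},
    {"name": "create_note", "error": "validation failed"},
]
items_created = count_items_created(calls)
```

Only the successful `create_task` call counts here: queries, updates and failed creations are ignored.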

Required code changes

Backend (adiuvai-api)

New module: app/core/preprocessors/ with one handler per type
- __init__.py: registry + detect + dispatch
- email_html.py: BeautifulSoup (strip, metadata, thread split)
- base.py: base interface + fallback
agent_setup.py: Journey produces agent_config JSON, not prompt_template
- System prompt updated to generate structured JSON
- Output validated with a Pydantic schema
agent_runner.py: revised flow
- Remove _classify_file() (the separate Step 1)
- Add a preprocess step before the LLM call
- Single LLM call with a type-specific prompt
- Count items_created from tool call results
models.py: prompt_template: Text → agent_config: JSON

Frontend (adiuvai)

store.ts: field promptTemplate → agentConfig
JourneyDialog.tsx: parse JSON from the journey reply
agent-scheduler.ts: pass agentConfig to the trigger
Pydantic/Zod schema: update for the new format

Implementation status

Step	Status	Branch
Step 1: Preprocessors	✅ DONE	`feature/batch-agent-v2`
Step 2: agent_runner.py refactor	✅ DONE	`feature/batch-agent-v2`
Step 3: Model/schema agent_config	✅ DONE	`feature/batch-agent-v2`
Step 4: Journey setup structured output	✅ DONE	`feature/batch-agent-v2`
Step 5: Frontend	✅ DONE	main
Step 6: E2E with real files	⏳ TODO	-

Test conventions (updated after implementing steps 1–2)

Fixture structure

tests/fixtures/<step_name>/
  cases.yaml        ← case definitions
  data/             ← input files (HTML, txt, ...)

CLI option to override the folder:

pytest tests/test_<step>.py -v --<step>-dir /path/to/folder

Registered in conftest.py via pytest_addoption. The custom folder must have the same structure (cases.yaml + data/).

Options registered so far:

--preprocess-dir → step 1
--runner-dir → step 2 (add --journey-dir for step 4, --e2e-dir for step 6)
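A conftest.py registering these options might look like the sketch below. `pytest_addoption` and `parser.addoption` are standard pytest hooks; the helper `fixtures_dir`, its signature and the help text are hypothetical.

```python
from pathlib import Path


def pytest_addoption(parser):
    """Register one folder-override option per test step."""
    for step in ("preprocess", "runner"):
        parser.addoption(
            f"--{step}-dir",
            action="store",
            default=None,
            help=f"Folder containing cases.yaml and data/ for the {step} tests",
        )


def fixtures_dir(config, step, default_dir):
    """Return the override folder if one was passed, else the default fixture folder."""
    override = config.getoption(f"--{step}-dir")
    return Path(override) if override else Path(default_dir)
```

Supporting a later step (for example `--journey-dir`) would then only require extending the tuple of step names.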

YAML schema: principles (step 1 vs step 2)

Step 1 (preprocessors): deterministic tests, no LLM:

Flat keys: detect:, process:, no_html:, min_chars:, etc.
No description or score_name (Langfuse is not used)
file: serves both as the on-disk name and as the filename passed to the function
generate: binary_noise for synthetic content

Step 2+ (runner, journey, e2e): LLM eval tests:

file: = on-disk name in data/
file_path: = path as seen by the agent (kept separate because several cases reuse the same file with different paths, e.g. to test project matching from filename vs content)
description: present in the YAML (useful in the pytest report)
score_name: present in the YAML (the name under which the score is sent to Langfuse)
projects: list of symbolic names (alpha, beta) or inline dicts {id, name, status}, resolved by _resolve_projects()
Flat assertion keys: expect_insert, expect_no_insert, expect_project_id, expect_dedup
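The `projects:` key accepts either symbolic names or inline dicts. One way such a resolver could work is sketched below. The symbol table values and the function name are hypothetical, and the real `_resolve_projects()` may differ.

```python
# Hypothetical symbol table; the real ids and names live in the test module.
SYMBOLIC_PROJECTS = {
    "alpha": {"id": "proj-alpha", "name": "Alpha", "status": "active"},
    "beta": {"id": "proj-beta", "name": "Beta", "status": "active"},
}


def resolve_projects_sketch(entries):
    """Expand symbolic names and pass inline dicts through unchanged."""
    resolved = []
    for entry in entries:
        if isinstance(entry, str):
            # Copy so test cases cannot mutate the shared table; KeyError on unknown symbols.
            resolved.append(dict(SYMBOLIC_PROJECTS[entry]))
        elif isinstance(entry, dict):
            resolved.append(entry)
        else:
            raise TypeError(f"unsupported project entry: {entry!r}")
    return resolved


projects = resolve_projects_sketch(["alpha", {"id": "p9", "name": "Inline", "status": "archived"}])
```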

Parametrizing from YAML

Use pytest_generate_tests to access the custom CLI option:

def pytest_generate_tests(metafunc):
    if "runner_case" not in metafunc.fixturenames:
        return
    cases = _load_cases(metafunc.config)
    metafunc.parametrize("runner_case", cases, ids=[c["id"] for c in cases])

Tests access the directory via pytestconfig:

async def test_eval_runner(runner_case, pytestconfig):
    data_dir = _fixtures_dir(pytestconfig) / "data"

Langfuse V3: correct pattern

Problems encountered with the V2 API (do not use):

lf.trace() → does not exist in V3
lf.score(trace_id=...) → does not exist in V3
lf.start_as_current_observation(user_id=..., session_id=...) → kwargs not accepted

Correct V3 pattern in eval tests:

from contextlib import nullcontext
lf = get_langfuse()
obs_ctx = lf.start_as_current_observation(
    name="eval-runner-2.1",
    metadata={"step": "2", "case_id": "2.1"},
) if lf else nullcontext()

with obs_ctx as obs:
    # ... run the code under test ...
    if obs is not None:
        obs.score(name="runner.email_to_task", value=1.0, comment="...")

if lf:
    lf.flush()

Correct V3 pattern in production code (agent_runner.py, deep_agent.py, agent_setup.py):

# user_id and session_id are propagated as first-class Langfuse attributes
# via langfuse_context(), which wraps propagate_attributes()
from app.core.langfuse_client import langfuse_context

_lf_ctx = langfuse_context(user_id=user_id, session_id=session_id)
_lf_ctx.__enter__()

# user_id is hashed with SHA-256 before being sent to Langfuse
# session_id comes from the renderer (home/floating) or from the run_id (batch)

_span_ctx = lf.start_as_current_observation(
    as_type="span",
    name="my-span",
    input=...,
)
# Do NOT put user_id/session_id in metadata: propagate_attributes handles them
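The SHA-256 hashing of user_id mentioned in the comments could be as simple as the sketch below. The function name and the choice of UTF-8 encoding with a hex digest are assumptions; the notes only state that SHA-256 is used.

```python
import hashlib


def hash_user_id(user_id: str) -> str:
    """Return the SHA-256 hex digest of user_id, so the raw id never leaves the backend."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()
```

The digest is deterministic, so the same user still groups under one identifier in Langfuse without exposing the original id.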

compile_prompt: do not call template.format() directly

get_prompt_or_fallback() returns the raw template. Langfuse uses {{variable}}, the fallback uses {variable}. Always use compile_prompt(), which dispatches correctly:

from app.core.langfuse_client import compile_prompt, get_prompt_or_fallback

template, prompt_obj = get_prompt_or_fallback("my_prompt", FALLBACK_PROMPT)
compiled = compile_prompt(template, prompt_obj, var1=val1, var2=val2)
# ↑ uses prompt_obj.compile() for Langfuse, template.format() for the fallback

Never do this:

compiled = template.format(var1=val1)  # ❌ breaks with Langfuse (which uses {{var1}})
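The dispatch described above can be illustrated with a small sketch. It assumes that `prompt_obj` is `None` when the fallback template is in use, which these notes do not confirm. `FakeLangfusePrompt` is a test stand-in that mimics `{{variable}}` substitution, not the Langfuse client class.

```python
def compile_prompt_sketch(template, prompt_obj, **variables):
    """Use the Langfuse prompt's own compile() when available, else str.format()."""
    if prompt_obj is not None:
        return prompt_obj.compile(**variables)  # Langfuse syntax: {{variable}}
    return template.format(**variables)  # fallback syntax: {variable}


class FakeLangfusePrompt:
    """Stand-in for a Langfuse prompt object, for illustration only."""

    def __init__(self, text):
        self.text = text

    def compile(self, **variables):
        out = self.text
        for key, value in variables.items():
            out = out.replace("{{" + key + "}}", str(value))
        return out


remote = FakeLangfusePrompt("Summarize {{filename}}")
from_langfuse = compile_prompt_sketch(remote.text, remote, filename="a.html")
from_fallback = compile_prompt_sketch("Summarize {filename}", None, filename="a.html")
# Calling str.format() on the Langfuse template leaves literal braces behind.
broken = remote.text.format(filename="a.html")
```

In this sketch both correct paths produce the same string, while `broken` still contains `{filename}` because `str.format()` treats `{{` and `}}` as escaped literal braces.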

Test file structure for LLM eval steps

Pattern consolidated from test_agent_runner_v2.py:

tests/test_<step>.py
  ├── Constants (_USER_ID, _DEFAULT_FIXTURE_DIR, _AGENT_CONFIG, project symbols)
  ├── _fixtures_dir(config) + _load_cases(config) + _read_case_file(case, data_dir)
  ├── _resolve_projects(entries): handles both symbolic strings and inline dicts
  ├── pytest_generate_tests: parametrizes eval tests from YAML
  ├── Helper builders (_make_config, _make_run_log, _make_manager, _make_executor)
  ├── Static unit tests (no YAML, no LLM)
  └── test_eval_<step>(runner_case, pytestconfig): the single parametrized function
        ↓ reads the file, resolves projects, creates the executor, calls the runner
        ↓ _evaluate_case(case, calls, kwargs) → (score, comment)
        ↓ obs.score(...) if Langfuse is active

_evaluate_case() centralizes all assertion logic mapped from the YAML keys, so no assert logic is scattered through the test.

Step 4 (Journey V2): specific patterns

Sentinels: AGENT_CONFIG_START / AGENT_CONFIG_END (replace PROMPT_TEMPLATE_START/END)

Langfuse prompt: journey_system_v2 (not the V1 journey_system)

Frame key: existing_config (JSON string, replaces existing_template, which was a prose string)

Handler return value: key agent_config (JSON string validated by Pydantic) instead of prompt_template

Executor for journey tests: call set_client_executor(executor) / clear_client_executor() directly in the test helper _run_journey, mimicking device_ws._handle_journey_start. Set it again before every call (start + each message).

Journey YAML fixture: directory_files: [{path, content_file}] + user_messages: [...] + flat assertion keys (expect_question, expect_done, expect_valid_config, expect_content_type_id, expect_extraction_contains, expect_global_rules)

Nudge test (unit): populate _sessions with a fake JourneySession holding _MAX_TURNS turns, patch _call_llm_with_tools, and verify that the second call receives the nudge with the new markers in the history.

JSON in the system prompt: literal { and } in the example JSON must be written as {{ and }} for the str.format() fallback. Template variables use {var} (single braces). compile_prompt() handles the correct dispatch for Langfuse vs fallback.
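Two of the points above, brace escaping in the fallback template and the AGENT_CONFIG sentinels, can be shown together in a short sketch. The template text, the `file_list` variable and the `extract_agent_config` helper are hypothetical; only the sentinel names and the escaping rule come from these notes.

```python
import json
import re

# Fallback template for str.format(): literal JSON braces are doubled,
# template variables use single braces.
FALLBACK_JOURNEY_PROMPT = (
    "Files found: {file_list}\n"
    "When done, reply with:\n"
    "AGENT_CONFIG_START\n"
    '{{"content_types": [], "global_rules": [], "data_types": ["tasks"]}}\n'
    "AGENT_CONFIG_END"
)

_CONFIG_RE = re.compile(r"AGENT_CONFIG_START\s*(.*?)\s*AGENT_CONFIG_END", re.DOTALL)


def extract_agent_config(reply):
    """Return the parsed JSON between the sentinels, or None if they are absent."""
    match = _CONFIG_RE.search(reply)
    if match is None:
        return None  # the journey is still asking questions
    return json.loads(match.group(1))


# After formatting, the doubled braces collapse into valid JSON.
rendered = FALLBACK_JOURNEY_PROMPT.format(file_list="a.html, b.html")
config = extract_agent_config(rendered)
```

If the example JSON were written with single braces, `str.format()` would try to interpret `"content_types"` as a replacement field and raise an error instead of rendering the prompt.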

Step 5 (Frontend V2): specific patterns

Store (LocalAgentLocalConfig): the field agentConfig: Record<string, unknown> | null replaces promptTemplate: string. It is stored in electron-store as a JSON object.

Trigger body: the scheduler and runNow send agentConfig (an object, not the customAgentPrompt string).

WS frame journey_start: the field existingConfig (JSON string) replaces the existingTemplate string. The backend expects existing_config (snake_case via toSnakeCase()).

WS frame journey_reply: the field agentConfig (JSON string) replaces promptTemplate. The FE receives it as a string and parses it with JSON.parse() into an object.

tRPC journey router: returns { ..., agentConfig: string | undefined }. React components parse it locally.

Cloud agents: not migrated. They keep promptTemplate: string in CloudAgentConfigSchema, agentCloudRouter, PromptBuilderChat.onPromptUpdate. PromptBuilderChat now also has onConfigUpdate for the local path.

JourneyDialog: props currentConfig: Record<string, unknown> | null + onSaved(agentConfig: Record<string, unknown>). Shows a human-readable summary (AgentConfigSummary) instead of the raw prompt string.

InlineAgentCreationStepper: keeps promptTemplate state for cloud and adds agentConfig state for local. PromptBuilderChat calls onConfigUpdate for local and onPromptUpdate for cloud (backward-compat).
