- Rewrite eval config with EvalMode (step1, step2, full) replacing prompt_variants
- Rewrite runner with _run_step1, _run_step2, _run_full dispatch
- CLI: replace --variants with --mode flag
- Add 3 fixture YAMLs: classify_invoices (step1), process_invoices (step2), full_invoices (full)
- Remove old freelance_invoices fixture
- Langfuse: mode-aware dataset items (classifications for step1, extraction for step2, both for full)
- Langfuse: link both prompts (batch_file_classifier + batch_processing) in full mode
- Langfuse: post separate classification_precision/recall/f1 scores for full mode
- Langfuse: skip misleading field_accuracy=0 when field_scores is empty (step1)
- Langfuse: include step1_results in trace output
- MockExecutor: mock async_session to bypass DB in full mode
- Journey fixture: remove user_messages (only interactive test kept)
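The mode-based dispatch described in the commit message could look roughly like the sketch below. Only the names EvalMode, _run_step1, _run_step2 and _run_full come from the commit message; the function bodies, the `_DISPATCH` table, the `run_fixture` helper and the return shape are hypothetical stand-ins, not the repository's actual runner.

```python
from enum import Enum
from typing import Callable, Dict


class EvalMode(str, Enum):
    """The three evaluation modes named in the commit message."""
    STEP1 = "step1"
    STEP2 = "step2"
    FULL = "full"


# Hypothetical placeholders: the real runners would call the LLM prompts.
def _run_step1(fixture: dict) -> dict:
    return {"fixture": fixture["name"], "ran": ["classification"]}


def _run_step2(fixture: dict) -> dict:
    return {"fixture": fixture["name"], "ran": ["extraction"]}


def _run_full(fixture: dict) -> dict:
    return {"fixture": fixture["name"], "ran": ["classification", "extraction"]}


# Assumed dispatch table; the actual runner may branch differently.
_DISPATCH: Dict[EvalMode, Callable[[dict], dict]] = {
    EvalMode.STEP1: _run_step1,
    EvalMode.STEP2: _run_step2,
    EvalMode.FULL: _run_full,
}


def run_fixture(fixture: dict) -> dict:
    """Pick the runner from the fixture's `mode` key (hypothetical helper)."""
    mode = EvalMode(fixture["mode"])  # raises ValueError for an unknown mode
    return _DISPATCH[mode](fixture)


result = run_fixture({"name": "full-invoices", "mode": "full"})
```

Parsing the mode through the enum means a fixture with a misspelled `mode` value fails immediately instead of silently running the wrong step.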
109 lines
3.8 KiB
YAML
# Fixture: full-invoices (full)
# Tests both _STEP1_SYSTEM_PROMPT and _PROCESSING_SYSTEM_PROMPT in sequence
# via run_local_agent(). Verifies end-to-end classification + extraction.

name: full-invoices
mode: full
description: >
  End-to-end test: classify Italian invoices/meeting notes into the
  correct project, then extract tasks, notes, and timeline events.

directory: sample_files/invoices
data_types: [tasks, notes, timelines]
file_extensions: [txt, md]

# ── Step-1 prompt variables ──────────────────────────────────────
domain_definitions: |
  - tasks: Action items, deliverables, things to do — anything that someone needs to complete.
  - notes: Meeting summaries, decisions, reference information — permanent knowledge entries.
  - timelines: Project milestones, deadlines, scheduled events — specific dates that mark a point in the progress of a project.

projects_list:
  - id: "proj-web-redesign"
    name: "Redesign Sito Web Corporate"
    status: "active"
    aiSummary: "Corporate website redesign for Studio Architettura Bianchi"
  - id: "proj-ecommerce"
    name: "E-Commerce FashionStore"
    status: "active"
    aiSummary: "Next.js e-commerce platform for FashionStore srl"

# ── Step-2 prompt variables ──────────────────────────────────────
existing_context: |
  Existing tasks:
  (none)

  Existing notes:
  (none)

  Existing timelines:
  (none)

project_context: ""

custom_prompt_section: |
  User instructions:
  Estrai i dati dai file come segue:
  - TASK: ogni azione da fare, deliverable, o item con scadenza.
    Mappa "URGENTE" o "ALTA PRIORITÀ" → priority: high.
    Mappa "media priorità" → priority: medium.
    Mappa "bassa priorità" → priority: low.
    Se un item è marcato come "completato" o [x], impostalo status: done.
    Altrimenti status: todo.
  - NOTE: riassunti di meeting, decisioni prese, note tecniche.
  - TIMELINE: date di scadenza, milestone, meeting futuri.
  Imposta sempre isAiSuggested=1.

# ── Seed records (pre-existing DB state) ─────────────────────────
seed_records:
  projects:
    - id: "proj-web-redesign"
      name: "Redesign Sito Web Corporate"
      status: "active"
      aiSummary: "Corporate website redesign for Studio Architettura Bianchi"
    - id: "proj-ecommerce"
      name: "E-Commerce FashionStore"
      status: "active"
      aiSummary: "Next.js e-commerce platform for FashionStore srl"
  tasks: []
  notes: []
  timelines: []

# ── Expected classification (step 1) ─────────────────────────────
expected_classification:
  - file: "sample_files/invoices/fattura_042.txt"
    project_id: "proj-web-redesign"
    domains: [tasks, notes, timelines]

  - file: "sample_files/invoices/meeting_ecommerce.md"
    project_id: "proj-ecommerce"
    domains: [tasks, notes, timelines]

# ── Expected extractions (step 2) ────────────────────────────────
expected:
  tasks:
    - title: "Sviluppo frontend React"
      priority: "high"
      status: "todo"
    - title: "Integrazione API backend"
      priority: "medium"
      status: "todo"
    - title: "Testing cross-browser e fix bug responsive"
      status: "todo"
    - title: "Preparare wireframe homepage"
      priority: "high"
      status: "todo"
    - title: "Setup progetto Next.js e configurare CI/CD"
      priority: "medium"
      status: "todo"
    - title: "Ricerca plugin Stripe per gestione abbonamenti"
      priority: "low"
      status: "todo"

  notes:
    - title: "Meeting Kickoff Progetto E-Commerce"

  timelines:
    - title: "MVP E-Commerce pronto"
    - title: "Meeting di revisione"
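The commit message mentions separate classification_precision/recall/f1 scores for full mode. As a rough illustration of how the `expected_classification` block above might be scored, the Python sketch below treats every (file, project_id, domain) triple as one label and compares set overlap. That labelling scheme, the `to_labels` and `classification_scores` helpers, and the sample prediction are all assumptions for illustration; the repository's actual scoring may differ.

```python
from typing import Dict, List, Set, Tuple

Label = Tuple[str, str, str]  # (file, project_id, domain)


def to_labels(items: List[dict]) -> Set[Label]:
    """Flatten classification entries into one label per domain (assumed scheme)."""
    return {
        (item["file"], item["project_id"], domain)
        for item in items
        for domain in item["domains"]
    }


def classification_scores(expected: List[dict], predicted: List[dict]) -> Dict[str, float]:
    """Set-based precision/recall/F1 over the flattened labels."""
    exp, pred = to_labels(expected), to_labels(predicted)
    true_positives = len(exp & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(exp) if exp else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "classification_precision": precision,
        "classification_recall": recall,
        "classification_f1": f1,
    }


# Expected values copied from the fixture above.
expected = [
    {"file": "sample_files/invoices/fattura_042.txt",
     "project_id": "proj-web-redesign",
     "domains": ["tasks", "notes", "timelines"]},
    {"file": "sample_files/invoices/meeting_ecommerce.md",
     "project_id": "proj-ecommerce",
     "domains": ["tasks", "notes", "timelines"]},
]

# Hypothetical model output that misses one domain on the second file.
predicted = [
    {"file": "sample_files/invoices/fattura_042.txt",
     "project_id": "proj-web-redesign",
     "domains": ["tasks", "notes", "timelines"]},
    {"file": "sample_files/invoices/meeting_ecommerce.md",
     "project_id": "proj-ecommerce",
     "domains": ["tasks", "notes"]},
]

scores = classification_scores(expected, predicted)
```

With this hypothetical prediction, five of the six expected labels are found and none are spurious, so precision is 1.0 and recall is 5/6. A wrong `project_id` would count against both precision and recall under this scheme, because the whole triple has to match.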