Roberto Musso
63fa119543
feat(batch-agent): add journey eval to E2E harness

- journey_runner.py: orchestrates journey start → simulated user
  messages → template extraction → LLM judge scoring
- config.py: JourneyFixture dataclass with user_messages and
  expected_template_criteria, discover_journey_fixtures()
- langfuse_eval.py: sync_journey_fixture_to_dataset()
- cli.py: new 'journey' subcommand (python -m eval journey)
  with --fixture, --models, --judge-model flags
- fixtures/journey_invoice_setup.yaml: example journey fixture
  with 4 user messages and 8 quality criteria
2026-03-23 23:16:41 +01:00
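
For orientation, the pieces named in the commit message could fit together roughly as sketched below. This is a minimal illustration, not the committed code: only the names JourneyFixture, user_messages, expected_template_criteria, discover_journey_fixtures, the 'journey' subcommand, and the three flags come from the message. The `name` field, the function signature, the `journey_*.yaml` glob pattern, and the argument types and defaults are assumptions.

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class JourneyFixture:
    """One scripted journey: simulated user turns plus judge criteria."""

    name: str  # assumption: fixtures are identified by a name
    user_messages: list[str] = field(default_factory=list)
    expected_template_criteria: list[str] = field(default_factory=list)


def discover_journey_fixtures(fixtures_dir: Path) -> list[Path]:
    """Return fixture files in a stable order.

    Assumption: journey fixtures follow a journey_*.yaml naming scheme,
    as the example file journey_invoice_setup.yaml suggests.
    """
    return sorted(fixtures_dir.glob("journey_*.yaml"))


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the 'journey' subcommand and its flags."""
    parser = argparse.ArgumentParser(prog="eval")
    subparsers = parser.add_subparsers(dest="command", required=True)

    journey = subparsers.add_parser("journey", help="run journey evals")
    # Assumption: all three flags are optional and take string values.
    journey.add_argument("--fixture", help="name of a single fixture to run")
    journey.add_argument("--models", nargs="+", default=[],
                         help="one or more models under test")
    journey.add_argument("--judge-model", help="model used as the LLM judge")
    return parser
```

With this sketch, `build_parser().parse_args(["journey", "--fixture", "journey_invoice_setup", "--models", "model-a", "model-b", "--judge-model", "judge-x"])` yields a namespace whose `models` attribute is a list of two placeholder model names and whose `judge_model` attribute holds the judge name.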