Roberto Musso 75a826c9d8 feat(batch-agent): add E2E evaluation harness with Langfuse integration

- eval/mock_executor.py: intercepts execute_on_client, serves fixture
  files from disk, records all mutations (insert/update/delete)
- eval/config.py: YAML fixture loader with prompt variants, expected
  results, seed records, model overrides
- eval/scorer.py: FieldMatchScorer (fuzzy title match, per-field
  accuracy, precision/recall/F1) + LLMJudgeScorer (semantic eval)
- eval/langfuse_eval.py: sync fixtures to Langfuse datasets, create
  dataset runs, post scores, link traces to runs
- eval/runner.py: orchestrates fixture → mock → agent pipeline →
  scoring → Langfuse reporting
- eval/cli.py: CLI (python -m eval run/list/sync) with --models,
  --variants, --fixture, --no-judge flags
- eval/fixtures/: example Italian freelance scenario with 3 prompt
  variants (baseline, detailed_italian, minimal)
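
The scoring step described for eval/scorer.py (fuzzy title match, per-field accuracy, precision/recall/F1) could look roughly like the sketch below. This is not the actual FieldMatchScorer. The function names, the `title` key, the use of `difflib`, and the 0.8 similarity threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def titles_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy title comparison (the 0.8 threshold is an assumed value)."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def score_records(expected, actual, fields, threshold=0.8):
    """Pair records by fuzzy title, then compute precision/recall/F1
    over the pairs and exact-match accuracy for each listed field."""
    unmatched = list(actual)
    pairs = []
    for exp in expected:
        # Take the first not-yet-used actual record whose title is close enough.
        hit = next(
            (a for a in unmatched if titles_match(exp["title"], a["title"], threshold)),
            None,
        )
        if hit is not None:
            unmatched.remove(hit)
            pairs.append((exp, hit))

    tp = len(pairs)
    precision = tp / len(actual) if actual else 0.0
    recall = tp / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    field_accuracy = {
        f: (sum(e.get(f) == a.get(f) for e, a in pairs) / tp if tp else 0.0)
        for f in fields
    }
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "field_accuracy": field_accuracy,
    }
```

Here `expected` would come from a fixture's expected results and `actual` from the mutations recorded by the mock executor. Greedy first-match pairing is the simplest choice; a real scorer might prefer best-match pairing when several titles are similar.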

2026-03-23 08:54:19 +01:00

| Name | Last commit | Date |
| --- | --- | --- |
| app | feat(batch-agent): integrate Langfuse tracing | 2026-03-23 08:43:15 +01:00 |
| eval | feat(batch-agent): add E2E evaluation harness with Langfuse integration | 2026-03-23 08:54:19 +01:00 |
| Dockerfile | feat(batch-agent): extract Batch Agent Service (Step 3) | 2026-03-23 07:19:02 +01:00 |
| README.md | docs: add Langfuse integration TODO for batch-agent service | 2026-03-23 00:25:42 +01:00 |
| requirements.txt | feat(batch-agent): integrate Langfuse tracing | 2026-03-23 08:43:15 +01:00 |

README.md

Batch Agent Service

Owns: agent_runner, journey builder, filesystem_agent, integrations (Gmail, MS Graph).

Tables owned

local_agent_configs
cloud_agent_configs
agent_run_logs

Endpoints

GET /agents/catalog
POST /agents/can-create
POST /agents/trigger
GET /agents/{id}/history

Redis channels

Subscribe: batch:request:{user_id}
Publish: ws:out:{user_id} (journey replies + tool calls)
BRPOP: tool:result:{call_id} (30s timeout)
SET+EX: journey:{user_id} (session state, TTL 1800s)
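
A minimal sketch of how the key patterns above might be used. The key names, the 30s BRPOP timeout, and the 1800s TTL come from this README; the helper names and the JSON message shape are assumptions. `client` is any object exposing `publish`, `brpop`, and `set` the way redis-py's `redis.Redis` does.

```python
import json

TOOL_RESULT_TIMEOUT_S = 30  # BRPOP timeout listed above
JOURNEY_TTL_S = 1800        # session state TTL listed above


def request_channel(user_id):
    return f"batch:request:{user_id}"


def outbound_channel(user_id):
    return f"ws:out:{user_id}"


def tool_result_key(call_id):
    return f"tool:result:{call_id}"


def journey_key(user_id):
    return f"journey:{user_id}"


def call_tool(client, user_id, call_id, payload):
    """Publish a tool call, then block until its result is pushed
    to the per-call list or the timeout expires."""
    message = {"call_id": call_id, **payload}  # message shape is an assumption
    client.publish(outbound_channel(user_id), json.dumps(message))
    item = client.brpop(tool_result_key(call_id), timeout=TOOL_RESULT_TIMEOUT_S)
    if item is None:
        return None  # no result arrived within the timeout
    _key, raw = item
    return json.loads(raw)


def save_journey(client, user_id, state):
    """Store session state with a TTL so abandoned journeys expire."""
    client.set(journey_key(user_id), json.dumps(state), ex=JOURNEY_TTL_S)
```

Using one list per `call_id` means each result is consumed exactly once by the waiting caller, and a `None` return lets the agent treat a missing reply as a tool failure rather than blocking indefinitely.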

TODO

Integrate Langfuse tracing, reusing the pattern in services/chat/app/tracing.py: trace_span(), get_langfuse_callback(), and prompt management. Each batch agent run should create a trace with its input and output, link the prompts it used, and pass the LangChain CallbackHandler to LLM calls.