Langfuse V3 does not accept user_id/session_id on observation-level calls.
Both values are now passed in the metadata dict in agent_runner, deep_agent, and agent_setup.
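
A minimal sketch of the change described above, written as plain Python with no Langfuse calls. The helper name `observation_kwargs`, the `name` argument, and the metadata key names are assumptions made for illustration; the commit does not show the actual code.

```python
def observation_kwargs(name, user_id=None, session_id=None, extra_metadata=None):
    """Build keyword arguments for an observation-level call (hypothetical helper).

    user_id and session_id are never emitted as top-level keys;
    they are folded into the metadata dict instead.
    """
    metadata = dict(extra_metadata or {})  # copy so the caller's dict is not mutated
    if user_id is not None:
        metadata["user_id"] = user_id  # key name is an assumption
    if session_id is not None:
        metadata["session_id"] = session_id  # key name is an assumption
    return {"name": name, "metadata": metadata}


kwargs = observation_kwargs("agent_step", user_id="u1", session_id="s1")
```

The returned dict could then be unpacked into whichever observation call the module uses.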
refactor(tests): fixture-based pattern for agent_runner_v2 eval tests
- cases.yaml + data/ fixtures under tests/fixtures/agent_runner_v2/
- pytest_generate_tests parametrizes test_eval_runner from YAML
- _resolve_projects() handles symbolic names and inline dicts
- _evaluate_case() centralizes all assertion logic
- --runner-dir CLI option for custom fixture folders
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
file: is used both as the path to read and as the name passed to detect_content_type.
There is no reason to keep the two separate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
YAML: removed op/description/score_name and the assertions block; detect/process
is now a direct key, with flat assertions at the same level as the case.
Runner: removed the _run_assertions engine; assertions are inline in test_preprocess.
Reduced from ~170 to ~75 total lines across YAML + test.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scoring is only meaningful for LLM-backed steps. Preprocess tests are
deterministic Python, so scores add no value. Kept only for detect tests.
- test_preprocess: drop _lf_score call, simplify _run_assertions return type
- cases.yaml: remove score_name from all op=preprocess entries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>