adiuvAI/api

Author	SHA1	Message	Date
Roberto Musso	d3f7099d93	refactor(eval): 3-mode eval harness (step1/step2/full) with Langfuse fixes - Rewrite eval config with EvalMode (step1, step2, full) replacing prompt_variants - Rewrite runner with _run_step1, _run_step2, _run_full dispatch - CLI: replace --variants with --mode flag - Add 3 fixture YAMLs: classify_invoices (step1), process_invoices (step2), full_invoices (full) - Remove old freelance_invoices fixture - Langfuse: mode-aware dataset items (classifications for step1, extraction for step2, both for full) - Langfuse: link both prompts (batch_file_classifier + batch_processing) in full mode - Langfuse: post separate classification_precision/recall/f1 scores for full mode - Langfuse: skip misleading field_accuracy=0 when field_scores is empty (step1) - Langfuse: include step1_results in trace output - MockExecutor: mock async_session to bypass DB in full mode - Journey fixture: remove user_messages (only interactive test kept)	2026-03-24 16:18:51 +01:00
Roberto Musso	63fa119543	feat(batch-agent): add journey eval to E2E harness - journey_runner.py: orchestrates journey start → simulated user messages → template extraction → LLM judge scoring - config.py: JourneyFixture dataclass with user_messages and expected_template_criteria, discover_journey_fixtures() - langfuse_eval.py: sync_journey_fixture_to_dataset() - cli.py: new 'journey' subcommand (python -m eval journey) with --fixture, --models, --judge-model flags - fixtures/journey_invoice_setup.yaml: example journey fixture with 4 user messages and 8 quality criteria	2026-03-23 23:16:41 +01:00
Roberto Musso	d856dfd28c	refactor: deduplicate shared code into shared/ module Move duplicated files from chat + batch-agent into shared/: - shared/ws_context.py — Redis-based tool call round-trip - shared/llm.py — LiteLLM factory (get_llm, embed) - shared/agents/ — 4 domain agents (task, note, project, timeline) Update all service imports to use shared.* instead of app.*. Delete 12 duplicated files across both services.	2026-03-23 23:01:45 +01:00
Roberto Musso	ccba54ac24	fix(tracing): use Langfuse compile_prompt with {{variable}} syntax - tracing.py: add compile_prompt() that uses Langfuse .compile(**vars) for {{variable}} substitution, falls back to Python .format() for hardcoded {variable} templates - agent_runner.py: replace _get_system_prompt().format() with tracing.compile_prompt() for batch_file_classifier, batch_processing, batch_cloud_processing prompts - journey.py: replace get_prompt + .format() with compile_prompt() for journey_system prompt - chat tracing.py: add compile_prompt() for parity (chat prompts currently have no variables, but ready for future use) - Remove unused _get_system_prompt helper	2026-03-23 22:39:27 +01:00
Roberto Musso	55500cc818	feat(batch-agent): add Langfuse prompt management - _get_system_prompt helper: fetches managed prompts from Langfuse with hardcoded fallback (same pattern as chat service) - journey.py: journey_system prompt manageable via Langfuse - agent_runner.py: batch_file_classifier, batch_processing, batch_cloud_processing prompts all manageable via Langfuse - redis_consumer.py: link_prompt_to_trace for all three handlers	2026-03-23 22:30:36 +01:00
Roberto Musso	75a826c9d8	feat(batch-agent): add E2E evaluation harness with Langfuse integration - eval/mock_executor.py: intercepts execute_on_client, serves fixture files from disk, records all mutations (insert/update/delete) - eval/config.py: YAML fixture loader with prompt variants, expected results, seed records, model overrides - eval/scorer.py: FieldMatchScorer (fuzzy title match, per-field accuracy, precision/recall/F1) + LLMJudgeScorer (semantic eval) - eval/langfuse_eval.py: sync fixtures to Langfuse datasets, create dataset runs, post scores, link traces to runs - eval/runner.py: orchestrates fixture → mock → agent pipeline → scoring → Langfuse reporting - eval/cli.py: CLI (python -m eval run/list/sync) with --models, --variants, --fixture, --no-judge flags - eval/fixtures/: example Italian freelance scenario with 3 prompt variants (baseline, detailed_italian, minimal)	2026-03-23 08:54:19 +01:00
Roberto Musso	971f1dd84f	feat(batch-agent): integrate Langfuse tracing - tracing.py: init/shutdown, trace_span, get_langfuse_callback, prompt mgmt - main.py: init_langfuse at startup, shutdown on teardown - redis_consumer.py: trace_span around journey_start/message/agent_trigger - agent_runner.py: thread langfuse_handler through classify + processing LLM - journey.py: thread langfuse_handler through _call_llm_with_tools - llm.py: accept callbacks param, forward to LLM constructors - requirements.txt: add langfuse>=3.0.0	2026-03-23 08:43:15 +01:00
Roberto Musso	333bba6fdd	feat(batch-agent): extract Batch Agent Service (Step 3) - agent_runner: local directory + cloud agent orchestration via Redis - 5 domain agents: filesystem, task, note, project, timeline - integrations: Gmail, MS Graph (Outlook + Teams) - journey: guided chatbot conversation to build prompt_template - routes: REST endpoints (catalog, can-create, trigger) - redis_consumer: subscribes to batch:request:* pattern - ws_context: Redis-based execute_on_client for tool round-trip - Dockerfile with 300s timeout for long-running batch jobs	2026-03-23 07:19:02 +01:00
Roberto Musso	229e20d073	docs: add Langfuse integration TODO for batch-agent service	2026-03-23 00:25:42 +01:00
Roberto Musso	aa219a4d08	feat: microservices scaffold + Auth Service (Step 1) - Add shared/ module: config, db, models, schemas, redis utilities - Add Auth Service (services/auth/): register, login, refresh, me, ForwardAuth /verify endpoint for Traefik - Add Traefik config: ACME/Cloudflare DNS-01, dynamic routing, ForwardAuth middleware, sticky sessions for WS Gateway - Add service scaffolds: ws-gateway, chat, batch-agent, billing (READMEs) - Add redis>=5.0.0 to requirements.txt - Monolith app/ is untouched — strangler fig migration	2026-03-22 00:29:51 +01:00
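
Commit 75a826c9d8 describes FieldMatchScorer as combining a fuzzy title match with precision/recall/F1. The sketch below illustrates that idea only and is not the repository's code: the function names, the use of difflib, and the 0.8 similarity threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def titles_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy comparison: case-insensitive similarity ratio against a threshold.

    The 0.8 default is an assumed value, not taken from the repository.
    """
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def precision_recall_f1(
    expected: List[str], produced: List[str], threshold: float = 0.8
) -> Tuple[float, float, float]:
    """Match each produced title to at most one expected title, then score."""
    unmatched = list(expected)  # copy so the caller's list is not mutated
    true_positives = 0
    for title in produced:
        for candidate in unmatched:
            if titles_match(title, candidate, threshold):
                unmatched.remove(candidate)  # each expected item matches once
                true_positives += 1
                break
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    expected = ["Invoice 001", "Invoice 002"]
    produced = ["invoice 001", "Receipt ABC"]
    print(precision_recall_f1(expected, produced))
```

Removing an expected title once it has matched keeps a duplicated output from being counted as two true positives.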

10 Commits
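
Commit ccba54ac24 describes compile_prompt() as using the Langfuse prompt's .compile(**vars) for {{variable}} templates and falling back to Python .format() for hardcoded {variable} templates. The sketch below shows one way that fallback could look. The compile_prompt signature is assumed, and FakeManagedPrompt is a hypothetical stand-in for a managed prompt object so the example runs without a Langfuse connection.

```python
import re
from typing import Any, Optional, Protocol


class ManagedPrompt(Protocol):
    """Anything exposing compile(**variables), as a managed prompt would."""

    def compile(self, **variables: Any) -> str: ...


class FakeManagedPrompt:
    """Hypothetical stand-in: substitutes {{name}} placeholders in a template."""

    def __init__(self, template: str) -> None:
        self.template = template

    def compile(self, **variables: Any) -> str:
        def replace(match: "re.Match[str]") -> str:
            key = match.group(1)
            # Leave unknown placeholders untouched instead of raising.
            return str(variables[key]) if key in variables else match.group(0)

        return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, self.template)


def compile_prompt(
    managed: Optional[ManagedPrompt], fallback_template: str, **variables: Any
) -> str:
    """Prefer the managed prompt; otherwise format the hardcoded template."""
    if managed is not None:
        return managed.compile(**variables)  # {{variable}} syntax
    return fallback_template.format(**variables)  # {variable} syntax


if __name__ == "__main__":
    managed = FakeManagedPrompt("Classify files in {{folder}}")
    print(compile_prompt(managed, "Classify files in {folder}", folder="invoices"))
    print(compile_prompt(None, "Classify files in {folder}", folder="invoices"))
```

Keeping both syntaxes behind one function means callers pass the same keyword arguments whether or not a managed prompt was fetched.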