Langfuse V3 does not accept user_id/session_id on observation-level calls.
Moved to metadata dict in agent_runner, deep_agent, and agent_setup.
refactor(tests): fixture-based pattern for agent_runner_v2 eval tests
- cases.yaml + data/ fixtures under tests/fixtures/agent_runner_v2/
- pytest_generate_tests parametrizes test_eval_runner from YAML
- _resolve_projects() handles symbolic names and inline dicts
- _evaluate_case() centralizes all assertion logic
- --runner-dir CLI option for custom fixture folders
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Langfuse uses {{variable}} syntax in its prompt management UI, while the
hardcoded fallbacks use {variable} (Python str.format). The previous code
always called .format() which silently failed/errored when a real Langfuse
prompt was fetched.
- langfuse_client.py: add compile_prompt(template, prompt_obj, **vars)
→ uses prompt_obj.compile(**vars) when Langfuse is available
→ falls back to template.format(**vars) when using the hardcoded fallback
- agent_runner.py: replace .format() with compile_prompt() for
unified_processing (V2 local) and batch_cloud_processing (cloud agent)
- agent_setup.py: replace .format() with compile_prompt() for journey_system
deep_agent.py prompts have no variables, so no change needed there.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New app/core/langfuse_client.py: lazy singleton client, get_prompt_or_fallback()
helper (returns raw template + prompt obj for linking), extract_usage() for token
counts. No-ops when LANGFUSE_* env vars are not set.
- deep_agent.py: home-agent and floating-agent runs wrapped in spans; each ainvoke
wrapped in a generation with model/input/output/usage; prompts fetched from
Langfuse (adiuva-home-agent, adiuva-floating-agent, adiuva-floating-classifier)
with hardcoded fallback.
- agent_runner.py: step1-classifier and step2-processor LLM calls traced; batch
agent _run_agent_with_tools spans + generations; cloud-processor included.
Prompts: adiuva-step1-classifier, adiuva-step2-processor, adiuva-cloud-processor.
- agent_setup.py: journey-setup span + generation per ainvoke; prompt_obj stored
on JourneySession and reused across turns. Prompt: journey_system.
- settings.py: LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_HOST added.
- .env.example: Langfuse section with EU/US/self-hosted host comments.
- requirements.txt: langfuse>=2.0.0.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root causes fixed:
1. PROJECT_TOOLS removed from Step 2 tool set — project assignment is now
exclusively handled by the runner in code, never by the LLM.
2. When Step 1 returns "new", runner calls execute_on_client insert/projects
directly (before Step 2), gets the created id, and passes it as context.
3. Newly created projects are appended to the local `projects` list so that
subsequent files in the same run can match to them via Step 1 — prevents
one project per file when multiple files share the same topic.
Also add tests/test_classify_file.py with pytest cases for _classify_file
and a CLI runner: python -m tests.test_classify_file <file> [project...]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rewrite _STEP1_SYSTEM_PROMPT: lower matching threshold (no longer requires
"clear" match), strongly prefer existing projects over creating new ones,
use structured id=|name=|status= format with aiSummary for richer context
- Add code-level UUID validation: reject hallucinated ids not in the fetched
projects list, fall back to "new" instead of creating a bad link
- Rewrite _PROCESSING_SYSTEM_PROMPT: enforce explicit scan-before-create
process (read existing → search → update if found → create only if not)
with hard rule against calling create_* without checking existing records
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace LLM-driven triage with code-based directory scan and project fetch
- Two-step LLM approach: Step 1 classifies file→project+domains, Step 2 processes with tools
- Add domain descriptions to Step 1 prompt for better extraction accuracy
- Add _running_agents set for per-agent concurrency guard (one running instance per agent)
- Return 409 from route before DB write when agent already running
- Remove is_approved from task_agent create/update tools and system prompt
- Remove is_approved from timeline_agent create/update tools and system prompt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AgentTriggerRequest accepts optional agent_id (FE's stable electron-store UUID)
- _make_agent_executor injects run_context into every tool_call frame
so Electron can attribute actions to the correct agent run
- run_local_agent accepts run_context and sends a run_complete WS frame
when the run finishes so the FE can close the run record
- trigger_agent_run builds run_context with run_id=run_log.id and the
stable agent_id, passes it through to run_local_agent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove unused config_schema from AgentCatalogItem (schema + route)
- Fix agent_setup system prompt: add extraction agent base behaviour
context so journey LLM knows what is already handled and focuses on
field mappings only; remove redundant data-types question (already
known from user selection); derive data types list dynamically
- Rewrite processing base prompt to use actual tool names
(list_tasks, update_task, add_task_comment, list_notes, update_note,
list_timelines, update_timeline, list_all_projects, create_project)
and enforce update-first strategy before falling back to creation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the single-pass FE-driven agent_run/agent_data flow with a
BE-orchestrated two-phase execution using LangChain tool-calling:
- Phase 1 (Triage): explores directory via new filesystem tools, matches
files to existing projects using PROJECT_TOOLS
- Phase 2 (Processing): reads files and performs CRUD per project group
with clean LLM context windows
Key changes:
- Add filesystem_agent.py with list_directory, read_file_content,
get_file_metadata tools using execute_on_client()
- Move setup journey from REST to WebSocket (journey_start/message frames)
- Add batch_runs_per_day billing limit and enforce in /trigger
- Remove deprecated agent_data/agent_complete frame handlers and queues
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- orchestrate_v3(user_id, message, context): classifies intent, returns
(agent_name, agent_instance) — caller drives execution
- orchestrate_v3_stream(user_id, message, context): yields (agent_name, token)
pairs; first yield is always (agent_name, "") as a domain-detection signal
- ChatAgent.handle_stream(): default implementation yields handle() result as
one chunk; subclasses override for true token-level streaming
- Fix stale test_orchestrator.py assertions that expected a JSON final frame
that orchestrate_stream never emitted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ChatAgent.__init__: adds tool_results: list[dict] = []
- _tool_loop: wraps execution in a result collector; populates
self.tool_results with raw execute_on_client dicts after each run
- _tool_loop_stream: streaming variant — uses ainvoke for tool-call
iterations, llm.astream() for the final answer; same result capture
- ws_context.py: adds _tool_result_collector ContextVar +
set/clear helpers; execute_on_client appends to collector when set
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Introduced new API keys for Anthropic and Google in .env.example and settings.py
- Updated llm.py to retrieve API keys directly from settings
- Modified deploy.yaml to streamline code checkout and improve deployment process
- Replaced direct instantiation of ChatOpenAI with a centralized get_llm function in CheckpointAgent, NoteAgent, ProjectAgent, and TaskAgent.
- Introduced a new llm.py module to handle LLM model instantiation and API key management.
- Updated settings.py to include LLM_MODEL and LLM_ROUTER_MODEL configurations.
- Modified orchestrator.py to use get_router_llm for intent classification.
- Updated requirements.txt to include litellm for LLM management.
- Adjusted tests to mock get_llm instead of ChatOpenAI directly.
- Updated `TestModuleSingletons` in `test_execution_plan.py` to reflect new agent templates and playbook names.
- Changed assertions in playbook tests to match updated templates and agents.
- Introduced `test_storage.py` to cover the storage layer, including encryption, BlobStore, and VectorStore functionalities.
- Added tests for S3 interactions, ensuring upload, download, delete, and list operations work as expected.
- Implemented mock tests for Pinecone and Qdrant vector stores to validate upsert, search, and delete operations.