- tracing.py: add compile_prompt() that uses Langfuse .compile(**vars)
for {{variable}} substitution, falls back to Python .format() for
hardcoded {variable} templates
- agent_runner.py: replace _get_system_prompt().format() with
tracing.compile_prompt() for batch_file_classifier, batch_processing,
batch_cloud_processing prompts
- journey.py: replace get_prompt + .format() with compile_prompt()
for journey_system prompt
- chat tracing.py: add compile_prompt() for parity (chat prompts
currently have no variables, but ready for future use)
- Remove unused _get_system_prompt helper
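
A minimal sketch of the compile-or-format fallback described above, for illustration only. It assumes a managed prompt is any object exposing a compile(**vars) method and a hardcoded prompt is a plain str; FakeManagedPrompt is a hypothetical stand-in for a Langfuse prompt object, not the real class, and the actual compile_prompt() signature may differ.

```python
from typing import Any


def compile_prompt(prompt: Any, **variables: Any) -> str:
    """Render a prompt, preferring the prompt object's own compile() method."""
    compile_fn = getattr(prompt, "compile", None)
    if callable(compile_fn):
        # Managed prompt: delegate {{variable}} substitution to the object.
        return compile_fn(**variables)
    # Hardcoded template: plain str using Python's {variable} syntax.
    return str(prompt).format(**variables)


class FakeManagedPrompt:
    """Hypothetical stand-in for a managed prompt, used only in this sketch."""

    def __init__(self, template: str) -> None:
        self.template = template

    def compile(self, **variables: Any) -> str:
        text = self.template
        for key, value in variables.items():
            text = text.replace("{{" + key + "}}", str(value))
        return text
```

Both template styles then go through one call site: compile_prompt(FakeManagedPrompt("Hi {{name}}"), name="Ada") and compile_prompt("Hi {name}", name="Ada") render the same text.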
- _get_system_prompt helper: fetches managed prompts from Langfuse
with hardcoded fallback (same pattern as chat service)
- journey.py: journey_system prompt manageable via Langfuse
- agent_runner.py: batch_file_classifier, batch_processing,
batch_cloud_processing prompts all manageable via Langfuse
- redis_consumer.py: link_prompt_to_trace for all three handlers
- shared/config.py: add LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_HOST
- services/chat/app/tracing.py: new module — Langfuse client singleton,
create_trace(), get_langfuse_callback(), get_prompt(), link_prompt_to_trace(),
score_trace(), flush/shutdown helpers. Gracefully no-ops when keys are missing.
- services/chat/app/llm.py: add callbacks param to get_llm() for LangChain
callback handler injection
- services/chat/app/deep_agent.py: accept langfuse_handler in all run_* and
_run_single_agent* functions, pipe callbacks to LLM calls, fetch managed
prompts from Langfuse with fallback to hardcoded system prompts
- services/chat/app/redis_consumer.py: create Langfuse trace per request
(home_request/floating_request), pass callback handler to deep_agent,
link prompt name to trace, attach output preview, flush after each request
- services/chat/app/main.py: shutdown Langfuse client in lifespan teardown
- services/chat/requirements.txt: add langfuse>=2.0.0
Langfuse prompt names: 'home_system', 'floating_system' — create these in
the Langfuse dashboard to manage prompts. Without them, hardcoded defaults
are used transparently.
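
A rough sketch of the managed-prompt lookup with hardcoded fallback, under stated assumptions: client is either None (keys missing) or any object with a get_prompt(name) method, and the default prompt texts below are placeholders rather than the service's real system prompts.

```python
from typing import Any, Optional

# Placeholder defaults; the real hardcoded system prompts live in the service.
_DEFAULT_PROMPTS = {
    "home_system": "You are the home assistant.",
    "floating_system": "You are the floating assistant.",
}


def get_prompt(client: Optional[Any], name: str) -> Any:
    """Return the managed prompt if available, else the hardcoded default."""
    if client is None:
        # Tracing is disabled (no keys configured): act as a no-op.
        return _DEFAULT_PROMPTS[name]
    try:
        return client.get_prompt(name)
    except Exception:
        # Prompt not created in the dashboard yet, or the lookup failed.
        return _DEFAULT_PROMPTS[name]
```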
- Add field_validator to expand literal \n in PEM keys (auth config + shared config)
- Set extra='ignore' on shared Settings so service-specific .env vars don't cause ValidationError
- Add *.pem to .gitignore
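
For illustration, the PEM newline expansion can be sketched as a plain function. In the commit it is wired in through a pydantic field_validator; the helper name expand_pem_newlines is hypothetical, and this sketch shows only the normalisation step.

```python
def expand_pem_newlines(value: str) -> str:
    """Turn literal backslash-n sequences from a .env file into real newlines."""
    # .env files usually store a PEM key on one line with "\n" escapes;
    # key parsers expect actual line breaks between the PEM sections.
    return value.replace("\\n", "\n")
```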
- Add services/auth/app/config.py with JWT_PRIVATE_KEY and JWT_PUBLIC_KEY
(Auth Service local config - private key never leaves this service)
- Update routes.py: sign tokens with RS256 private key
- Update deps.py + verify.py: verify tokens with RS256 public key
- Update shared/config.py: replace JWT_SECRET/JWT_ALGORITHM with
JWT_PUBLIC_KEY (for optional local verification by other services)
- Add sys.path fix in main.py for local dev without PYTHONPATH
Root causes fixed:
1. PROJECT_TOOLS removed from Step 2 tool set — project assignment is now
exclusively handled by the runner in code, never by the LLM.
2. When Step 1 returns "new", runner calls execute_on_client insert/projects
directly (before Step 2), gets the created id, and passes it as context.
3. Newly created projects are appended to the local `projects` list so that
subsequent files in the same run can match to them via Step 1 — prevents
one project per file when multiple files share the same topic.
Also add tests/test_classify_file.py with pytest cases for _classify_file
and a CLI runner: python -m tests.test_classify_file <file> [project...]
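
The flow in root causes 2 and 3 can be sketched roughly as follows. The function name assign_projects and the classify/create_project callables are hypothetical stand-ins for the Step 1 LLM call and the execute_on_client insert; only the control flow is meant to match.

```python
from typing import Callable, Dict, List

Project = Dict[str, str]


def assign_projects(
    files: List[str],
    projects: List[Project],
    classify: Callable[[str, List[Project]], str],
    create_project: Callable[[str], Project],
) -> Dict[str, str]:
    """Map each file to a project id, creating projects in code when needed."""
    assignments: Dict[str, str] = {}
    for path in files:
        decision = classify(path, projects)  # existing project id, or "new"
        if decision == "new":
            created = create_project(path)
            # Append so later files in the same run can match this project.
            projects.append(created)
            decision = created["id"]
        assignments[path] = decision
    return assignments
```

Because the local list grows during the run, two files on the same topic end up in one project instead of each creating their own.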
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add explicit MUST NOT instruction: never ask about projects, projectId,
or how to link records; project assignment is handled by the agent runner
- Remove projectId from template field list; remove projects from entity types
- Remove stale isApproved=0 reference (already removed from the data model)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove fixed _MAX_TURNS=5 instruction from system prompt; LLM now decides
when to stop based on self-assessed confidence (>= 90%)
- Add _MIN_TURNS_BEFORE_NUDGE=3 and raise safety cap to _MAX_TURNS=15
- Nudge message and hard cap still act as a safety net for infinite loops
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rewrite _STEP1_SYSTEM_PROMPT: lower matching threshold (no longer requires
"clear" match), strongly prefer existing projects over creating new ones,
use structured id=|name=|status= format with aiSummary for richer context
- Add code-level UUID validation: reject hallucinated ids not in the fetched
projects list, fall back to "new" instead of creating a bad link
- Rewrite _PROCESSING_SYSTEM_PROMPT: enforce explicit scan-before-create
process (read existing → search → update if found → create only if not)
with hard rule against calling create_* without checking existing records
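
A minimal sketch of the code-level id check described above, assuming each fetched project is a dict with an "id" key; the helper name resolve_project_id is hypothetical.

```python
from typing import Dict, List


def resolve_project_id(candidate: str, projects: List[Dict[str, str]]) -> str:
    """Accept the LLM's project id only if it was in the fetched list."""
    known_ids = {project["id"] for project in projects}
    # A hallucinated id falls back to "new" rather than creating a bad link.
    return candidate if candidate in known_ids else "new"
```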
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace LLM-driven triage with code-based directory scan and project fetch
- Two-step LLM approach: Step 1 classifies file→project+domains, Step 2 processes with tools
- Add domain descriptions to Step 1 prompt for better extraction accuracy
- Add _running_agents set for per-agent concurrency guard (one running instance per agent)
- Return 409 from route before DB write when agent already running
- Remove is_approved from task_agent create/update tools and system prompt
- Remove is_approved from timeline_agent create/update tools and system prompt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The tier is resolved live from the subscriptions table in get_current_user.
Previously, a user with no subscription row fell back to 'free'
unconditionally and hit the 5 runs/day limit immediately in dev. The
fallback is now 'power' (unlimited) when ENV=dev and no subscription row
exists.
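
As a sketch, the fallback rule reduces to the function below. The name resolve_tier and its parameters are hypothetical; subscription_tier is None when no subscription row exists.

```python
from typing import Optional


def resolve_tier(subscription_tier: Optional[str], env: str) -> str:
    """Pick the billing tier, with an environment-dependent fallback."""
    if subscription_tier is not None:
        # A real subscription row always wins.
        return subscription_tier
    # No row: unlimited in dev, free tier everywhere else.
    return "power" if env == "dev" else "free"
```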
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Users without a subscription row in dev get power tier so rate limits
and quota checks don't block local development. In prod the fallback
remains free tier as before.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AgentTriggerRequest accepts optional agent_id (FE's stable electron-store UUID)
- _make_agent_executor injects run_context into every tool_call frame
so Electron can attribute actions to the correct agent run
- run_local_agent accepts run_context and sends a run_complete WS frame
when the run finishes so the FE can close the run record
- trigger_agent_run builds run_context with run_id=run_log.id and the
stable agent_id, passes it through to run_local_agent
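
One way to picture the run_context plumbing, as a sketch only: the frame layout beyond the names mentioned above (tool_call, run_complete, run_context, run_id, agent_id) is assumed, and both helper names are hypothetical.

```python
from typing import Any, Dict

Frame = Dict[str, Any]


def attach_run_context(frame: Frame, run_context: Dict[str, Any]) -> Frame:
    """Return a copy of a tool_call frame tagged with the current run."""
    return {**frame, "run_context": dict(run_context)}


def build_run_complete(run_context: Dict[str, Any]) -> Frame:
    """Build the frame that tells the FE it can close the run record."""
    return {"type": "run_complete", "run_context": dict(run_context)}
```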
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AgentTriggerRequest.what_to_extract now accepts list[str] instead of
strict Literal values. _to_data_types normalises all FE variants
(tasks/task, notes/note, timelines/timeline/timelineEvents,
projects/project) to the canonical plural form the runner expects,
with deduplication.
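
A minimal sketch of the normalisation, assuming unknown values are silently dropped and first-seen order is kept (neither is stated above). The alias table mirrors the variants listed in this message.

```python
from typing import Iterable, List

_ALIASES = {
    "tasks": "tasks", "task": "tasks",
    "notes": "notes", "note": "notes",
    "timelines": "timelines", "timeline": "timelines",
    "timelineEvents": "timelines",
    "projects": "projects", "project": "projects",
}


def to_data_types(values: Iterable[str]) -> List[str]:
    """Map FE variants to canonical plural names, without duplicates."""
    result: List[str] = []
    for value in values:
        canonical = _ALIASES.get(value)
        # Skip unknown values (assumption) and anything already collected.
        if canonical is not None and canonical not in result:
            result.append(canonical)
    return result
```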
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove unused config_schema from AgentCatalogItem (schema + route)
- Fix agent_setup system prompt: add extraction agent base behaviour
context so journey LLM knows what is already handled and focuses on
field mappings only; remove redundant data-types question (already
known from user selection); derive data types list dynamically
- Rewrite processing base prompt to use actual tool names
(list_tasks, update_task, add_task_comment, list_notes, update_note,
list_timelines, update_timeline, list_all_projects, create_project)
and enforce update-first strategy before falling back to creation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use session_id from the FE frame so replies match the listener key
- Seed conversation with a user message for LLM provider compatibility
- On max turns, nudge the LLM and immediately re-invoke to force
prompt_template generation instead of deferring to next message
- Fix display_message extraction to safely check for template markers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the single-pass FE-driven agent_run/agent_data flow with a
BE-orchestrated two-phase execution using LangChain tool-calling:
- Phase 1 (Triage): explores directory via new filesystem tools, matches
files to existing projects using PROJECT_TOOLS
- Phase 2 (Processing): reads files and performs CRUD per project group
with clean LLM context windows
Key changes:
- Add filesystem_agent.py with list_directory, read_file_content,
get_file_metadata tools using execute_on_client()
- Move setup journey from REST to WebSocket (journey_start/message frames)
- Add batch_runs_per_day billing limit and enforce in /trigger
- Remove deprecated agent_data/agent_complete frame handlers and queues
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code bugs fixed:
- checkpoint_agent.py, project_agent.py, note_agent.py: add missing
'import json' (used in handle() for context serialization)
Test fixes:
- test_agents.py: add autouse ws_executor fixture that sets a fake
execute_on_client so tools can run in unit tests without a WS session
- Rewrite all TestXxxAgentTools tests: patch execute_on_client per-test,
assert on call_args (what payload was sent to the client) and on the
formatted string return value — matching actual tool behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>