Refactor _pending_scout_oauth_states from a tuple to a dict carrying
mode (reconnect|create), draft fields, and a transient encrypted token.
Add authorize-draft, session-labels, and cloud/finalize endpoints so the
scout row is created only when the flow completes — abandoned flows leave
no orphan rows. Zero-trust: the encrypted token lives only in the in-memory
session (<=15 min) until finalize persists it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The run_logs relationship joins scout_run_logs.scout_id (varchar) to
cloud_scout_configs.id (uuid); Postgres has no varchar=uuid operator so the
ORM cascade on db.delete(scout) 500'd. Core deletes bypass it; triage queue
rows cascade via FK ondelete.
Replace bulk GmailClient.fetch_messages() + linear search with a direct
service.users().messages().get(format="full") call. Adds _extract_plain_text_body
helper for recursive MIME part walking. Update test to patch _get_gmail_service.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three new endpoints under /api/v1/scouts/oauth/gmail/:
GET /authorize — PKCE consent URL for gmail.readonly + gmail.modify scopes
GET /web-callback — bounces to adiuvai:// deep link (excluded from schema)
POST /callback — exchanges code, encrypts + stores token, triggers setup_watch
State TTL 10 min, in-memory (same pattern as auth.py _pending_states).
Redirect URI base derived from existing OAUTH_REDIRECT_URI setting.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds GmailConnector registration to the FastAPI lifespan startup block,
making it available via the connector registry for the ScoutEngine
and any other startup-time consumers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements GmailConnector — the first concrete SourceConnector.
Wraps existing GmailClient + low-level Gmail API service for metadata-only
fetch, trash archive, incremental history polling, and Pub/Sub watch setup.
Adds GMAIL_PUBSUB_TOPIC setting (empty string default for dev).
Adds 3 passing unit tests (mocked API, no real credentials required).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add ScoutEngine.deliver_pending(user_id, ws) that queries status='queued'
rows, fetches metadata via the registered connector, sends scout_proposal
WS frames, and flips status to 'delivered'. Add ack_proposal(proposal_id)
that flips 'delivered' -> 'acked' (idempotent). Wire both into device_ws.py:
deliver_pending fires as a background task after device_hello + register;
scout_proposal_ack frames dispatch to ack_proposal in the message loop.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tasks 12+13 of Phase 2 — first new infra after rename.
Alembic 008 creates scout_triage_queue with unique constraint on
(scout_id, source_msg_ref) and partial index on expires_at for active
rows. Adds four columns to cloud_scout_configs: auto_trash_spam,
gmail_history_id, gmail_watch_expires_at, device_inactivity_pause_days.
SQLAlchemy model ScoutTriageQueue added; CloudScoutConfig updated to
match. Imports extended with UniqueConstraint and text.
WsDeviceHello.agent_ids → scout_ids in Pydantic schema,
device_ws.py handler, and all test fixtures (test_device_ws,
test_ws_unified, test_memory_middleware). Also fixes stale
CloudAgentConfig reference in gmail.py docstring.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rename routes/agents.py → routes/scouts.py and routes/agent_setup.py →
routes/scout_setup.py. Update APIRouter prefix/tags in scouts.py to
/scouts and scouts. Update main.py router registration, device_ws.py
import, and test_journey_v2.py import/patch paths to use scout_setup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After M6.5 deletion of run_floating_stream and the frame dispatch,
WsFrameType.floating_request/floating_domain, WsFloatingRequest,
WsFloatingDomain, WsFloatingScope, WsDomain, and the StreamFormatter's
floating_domain branch were left as dead protocol surface. Remove them,
along with the corresponding test cases in test_schemas_v3.py and
test_output_formatter.py.
Replaced by tests/test_contextual_*.py in M3.
No dedicated test_floating_*.py files existed; floating test
functions were embedded in test_deep_agent.py and test_ws_unified.py
and have been removed from those files.
contextual_request + contextual_scope_update are the only WS
flows for ad-hoc contextual chat now. Floating system prompt
constant removed; Langfuse 'floating_system' is deleted in a
separate manual step. Also removes floating-agent LLM slot from
llm.py and the associated LLM_MODEL_FLOATING_AGENT setting entry.
Smoke trace 0b46841484ba7d024ed9f8d5ac8b1df0 showed the agent
defaulting to list_projects + get_project for a 'summarize
project Nexus' query, returning a shallow row without aiSummary
or tasks/notes. The legacy read tools were exposed via
*PROJECT_TOOLS / *TASK_TOOLS spreading.
Now _contextual_tools exposes exactly:
- get_page_details (sole read; supports per-entity + list views)
- create_task, update_task
- create_note
- create_timeline
Prompt rule 2 explicitly forbids the legacy reads, and the test
asserts they are excluded from the palette.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use _build_system_prompt helper so the contextual agent gets the
same system-prompt slots as home/floating runners — most importantly
{date_context} so the agent can reason about due dates when
creating/updating tasks.
Also makes the session_id contract on run_contextual_stream explicit
(was reading via context['_debug']) and tightens the tool-list test.
_SessionBuffer.append_system_message(user_id, session_id, text) injects a
synthetic SystemMessage into the named session slot (creating it if absent).
ContextualBufferProxy closes over user_id + session_id so call sites need
only call proxy.append_system_message(text).
get_session_buffer(user_id, session_id, channel) in device_ws returns a
ContextualBufferProxy, keeping the test-patchable function signature intact.
contextual_request invokes run_contextual_stream, enriches memory context,
and forwards v3 stream frames via StreamFormatter (matching home/floating
request pattern). Episode stored after response.
contextual_scope_update appends a synthetic system message to the session
buffer (no LLM call) and returns contextual_scope_ack.
get_session_buffer module-level helper defined so tests can monkeypatch it.
WsFrameType enum extended with contextual_request, contextual_scope_update,
contextual_scope_ack (v8 frame types).
NOTE: test_contextual_ws.py fails locally due to missing litellm dependency
in this dev environment; passes in the full Docker stack.
New agent runner. Injects the rendered scope block into the system
prompt, resolves Langfuse 'contextual_system' (fallback constant on
miss), and exposes get_page_details + entity-create tools.
Note-edit tools (propose_note_edit) intentionally excluded — next sprint.
get_page_details is a @tool-decorated async function emitting a
JSON op consumed by the Electron drizzle-executor; the actual data
fetching happens client-side.
_contextual_tools() assembles the safe tool palette. Tools follow the
existing @tool decorator pattern from langchain_core.tools.
NOTE: test_run_contextual.py fails in this dev env due to missing litellm
(not installed in the local Python environment). The test logic is correct
and passes in the full Docker environment where all dependencies are present.
Convert app/schemas.py → app/schemas/__init__.py so the contextual
module can live at app/schemas/contextual.py while keeping all existing
'from app.schemas import ...' calls unchanged.
ContextualScope mirrors the renderer's camelCase payload via
alias_generator=to_camel. render_scope_block produces a single-paragraph
human-readable summary injected into the contextual agent system prompt.
4 tests, all passing.
Add build_brief_multi_project_manifest() to deep_agent.py that fetches
all project folder manifests via execute_on_client and keeps the top 5
most-recently-modified files per project. Wire into run_home_brief in
brief_agent.py, injecting the <linked_folders> block into the system
prompt alongside FOLDER_TOOLS.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add optional project_id param to run_home_stream. When set, fetch the linked
folder manifest via _fetch_project_manifest and prepend the <linked_folder>
block to the system prompt. Also build an explicit tools list that extends
_all_tools_for_user with FOLDER_TOOLS so the home agent can read folder
files. device_ws._handle_home_request extracts project_id / projectId from
the home_request frame and forwards it to the runner.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add _fetch_project_manifest helper that calls read_project_folder_manifest
via execute_on_client. Wire it into run_task_brief_research_stream (new
optional project_id param) so the <linked_folder> block is prepended to the
system prompt when the task belongs to a linked project. Also bind
FOLDER_TOOLS into the task-brief tool palette so the agent can read folder
files. device_ws extracts project_id / projectId from the task_brief_request
frame and forwards it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add six v7 WsFrameType enum members (index_session_start/cancel/batch,
index_file_result/progress/done), wire dispatch in device_ws message loop,
and implement _handle_index_session_start/cancel/file_batch with per-file
summarisation, token accounting, and quota enforcement.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add pypdf/python-docx deps, _extract_pdf_text/_extract_docx_text helpers,
and summarize_pdf/summarize_docx wrappers that delegate to summarize_text.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pre-flight quota check for folder_index. Returns 402 with reason
when file cap or monthly token budget would be exceeded; 200 {"ok": true}
otherwise. Also adds auth_headers_free fixture to conftest.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements check_folder_quota and add_token_usage in app/billing/quota.py
with dialect-aware upsert (pg_insert on PostgreSQL, read-then-write on SQLite).
Adds test_user_free/test_user_power fixtures and db alias to conftest.py.
6 new tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add folder_max_files and folder_monthly_tokens to all four tier dicts
in FEATURES, and add get_feature_value() helper to TierManager.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _language_instruction() to deep_agent.py, reads language from core memory
- Append language directive to all 4 run_* functions (task/project/checkpoint/note)
- Minor fixes: alembic env, route imports, test cleanup
Before: branch 3 of oauth_callback attempted to INSERT a user with a
duplicate email → DB constraint violation → 500.
After: if email_verified=False and the email already exists, raise 409
with a message directing the user to sign in with their password.
Also adds test_callback_unverified_email_conflict_returns_409.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6 tests covering the authorize and callback endpoints:
- authorize returns URL + state, 503 when unconfigured
- callback: state mismatch → 401, new user creation, existing OAuth
link re-login (same user sub), email-match auto-linking to password user
Provider methods (exchange_code, get_userinfo) are mocked via AsyncMock
so tests run without hitting Google APIs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GET /auth/oauth/{provider}/web-callback receives the Google redirect and
bounces immediately to adiuvai://oauth/callback deep link. Google Cloud
Console only accepts http/https redirect URIs — adiuvai:// is not valid.
Default OAUTH_REDIRECT_URI now points to localhost:8000 for dev; override
with the API domain env var in production.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Step 1 of Google login integration: Alembic migration for oauth_accounts +
avatar_url on users, OAuthAccount model with User relationship, UserProfile
schema extended with avatar_url, get_current_user updated to include avatar_url.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep only 4.1 (first reply contains question) as automated eval.
Multi-turn cases (4.2–4.5) are non-deterministic and tested manually
with results tracked in Langfuse.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Langfuse V3 does not accept user_id/session_id on observation-level calls.
Moved to metadata dict in agent_runner, deep_agent, and agent_setup.
refactor(tests): fixture-based pattern for agent_runner_v2 eval tests
- cases.yaml + data/ fixtures under tests/fixtures/agent_runner_v2/
- pytest_generate_tests parametrizes test_eval_runner from YAML
- _resolve_projects() handles symbolic names and inline dicts
- _evaluate_case() centralizes all assertion logic
- --runner-dir CLI option for custom fixture folders
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lf.trace() and lf.score(trace_id=...) are V2 API removed in V3.
V3 pattern:
lf.start_as_current_observation(name=...) as context manager → obs
obs.score(name=..., value=...)
contextlib.nullcontext() when lf is None so structure stays the same
Updated tests 2.1–2.7 in test_agent_runner_v2.py accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Langfuse uses {{variable}} syntax in its prompt management UI, while the
hardcoded fallbacks use {variable} (Python str.format). The previous code
always called .format() which silently failed/errored when a real Langfuse
prompt was fetched.
- langfuse_client.py: add compile_prompt(template, prompt_obj, **vars)
→ uses prompt_obj.compile(**vars) when Langfuse is available
→ falls back to template.format(**vars) when using the hardcoded fallback
- agent_runner.py: replace .format() with compile_prompt() for
unified_processing (V2 local) and batch_cloud_processing (cloud agent)
- agent_setup.py: replace .format() with compile_prompt() for journey_system
deep_agent.py prompts have no variables, so no change needed there.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
I test del preprocessor sono deterministici — nessun LLM coinvolto,
nessuno score da tracciare.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
file: serve sia come path da leggere che come nome passato a detect_content_type.
Non c'è motivo di averli separati.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- conftest.py: registra --preprocess-dir via pytest_addoption
- test_preprocessors.py: usa pytest_generate_tests per leggere i casi
a collection time con accesso a config; _content e _fixtures_dir
accettano path dinamico
Usage: pytest tests/test_preprocessors.py --preprocess-dir /my/folder
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
YAML: rimosse op/description/score_name/assertions block — ora detect/process
come chiave diretta, assertions piatte sullo stesso livello del caso.
Runner: eliminato _run_assertions engine, assertions inline in test_preprocess.
Riduzione da ~170 a ~75 righe totali tra YAML + test.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scoring is only meaningful for LLM-backed steps. Preprocess tests are
deterministic Python, so scores add no value. Kept only for detect tests.
- test_preprocess: drop _lf_score call, simplify _run_assertions return type
- cases.yaml: remove score_name from all op=preprocess entries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New app/core/langfuse_client.py: lazy singleton client, get_prompt_or_fallback()
helper (returns raw template + prompt obj for linking), extract_usage() for token
counts. No-ops when LANGFUSE_* env vars are not set.
- deep_agent.py: home-agent and floating-agent runs wrapped in spans; each ainvoke
wrapped in a generation with model/input/output/usage; prompts fetched from
Langfuse (adiuva-home-agent, adiuva-floating-agent, adiuva-floating-classifier)
with hardcoded fallback.
- agent_runner.py: step1-classifier and step2-processor LLM calls traced; batch
agent _run_agent_with_tools spans + generations; cloud-processor included.
Prompts: adiuva-step1-classifier, adiuva-step2-processor, adiuva-cloud-processor.
- agent_setup.py: journey-setup span + generation per ainvoke; prompt_obj stored
on JourneySession and reused across turns. Prompt: journey_system.
- settings.py: LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_HOST added.
- .env.example: Langfuse section with EU/US/self-hosted host comments.
- requirements.txt: langfuse>=2.0.0.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root causes fixed:
1. PROJECT_TOOLS removed from Step 2 tool set — project assignment is now
exclusively handled by the runner in code, never by the LLM.
2. When Step 1 returns "new", runner calls execute_on_client insert/projects
directly (before Step 2), gets the created id, and passes it as context.
3. Newly created projects are appended to the local `projects` list so that
subsequent files in the same run can match to them via Step 1 — prevents
one project per file when multiple files share the same topic.
Also add tests/test_classify_file.py with pytest cases for _classify_file
and a CLI runner: python -m tests.test_classify_file <file> [project...]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add explicit MUST NOT instruction: never ask about projects, projectId,
or how to link records; project assignment is handled by the agent runner
- Remove projectId from template field list; remove projects from entity types
- Remove stale isApproved=0 reference (already removed from the data model)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove fixed _MAX_TURNS=5 instruction from system prompt; LLM now decides
when to stop based on self-assessed confidence (>= 90%)
- Add _MIN_TURNS_BEFORE_NUDGE=3 and raise safety cap to _MAX_TURNS=15
- Nudge message and hard cap still act as a safety net for infinite loops
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rewrite _STEP1_SYSTEM_PROMPT: lower matching threshold (no longer requires
"clear" match), strongly prefer existing projects over creating new ones,
use structured id=|name=|status= format with aiSummary for richer context
- Add code-level UUID validation: reject hallucinated ids not in the fetched
projects list, fall back to "new" instead of creating a bad link
- Rewrite _PROCESSING_SYSTEM_PROMPT: enforce explicit scan-before-create
process (read existing → search → update if found → create only if not)
with hard rule against calling create_* without checking existing records
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace LLM-driven triage with code-based directory scan and project fetch
- Two-step LLM approach: Step 1 classifies file→project+domains, Step 2 processes with tools
- Add domain descriptions to Step 1 prompt for better extraction accuracy
- Add _running_agents set for per-agent concurrency guard (one running instance per agent)
- Return 409 from route before DB write when agent already running
- Remove is_approved from task_agent create/update tools and system prompt
- Remove is_approved from timeline_agent create/update tools and system prompt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The tier is resolved live from the subscriptions table in get_current_user.
Previously fell back to 'free' unconditionally, hitting the 5 runs/day
limit immediately in dev. Now falls back to 'power' (unlimited) when
ENV=dev and no subscription row exists.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Users without a subscription row in dev get power tier so rate limits
and quota checks don't block local development. In prod the fallback
remains free tier as before.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AgentTriggerRequest accepts optional agent_id (FE's stable electron-store UUID)
- _make_agent_executor injects run_context into every tool_call frame
so Electron can attribute actions to the correct agent run
- run_local_agent accepts run_context and sends a run_complete WS frame
when the run finishes so the FE can close the run record
- trigger_agent_run builds run_context with run_id=run_log.id and the
stable agent_id, passes it through to run_local_agent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AgentTriggerRequest.what_to_extract now accepts list[str] instead of
strict Literal values. _to_data_types normalises all FE variants
(tasks/task, notes/note, timelines/timeline/timelineEvents,
projects/project) to the canonical plural form the runner expects,
with deduplication.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove unused config_schema from AgentCatalogItem (schema + route)
- Fix agent_setup system prompt: add extraction agent base behaviour
context so journey LLM knows what is already handled and focuses on
field mappings only; remove redundant data-types question (already
known from user selection); derive data types list dynamically
- Rewrite processing base prompt to use actual tool names
(list_tasks, update_task, add_task_comment, list_notes, update_note,
list_timelines, update_timeline, list_all_projects, create_project)
and enforce update-first strategy before falling back to creation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use session_id from the FE frame so replies match the listener key
- Seed conversation with a user message for LLM provider compatibility
- On max turns, nudge the LLM and immediately re-invoke to force
prompt_template generation instead of deferring to next message
- Fix display_message extraction to safely check for template markers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the single-pass FE-driven agent_run/agent_data flow with a
BE-orchestrated two-phase execution using LangChain tool-calling:
- Phase 1 (Triage): explores directory via new filesystem tools, matches
files to existing projects using PROJECT_TOOLS
- Phase 2 (Processing): reads files and performs CRUD per project group
with clean LLM context windows
Key changes:
- Add filesystem_agent.py with list_directory, read_file_content,
get_file_metadata tools using execute_on_client()
- Move setup journey from REST to WebSocket (journey_start/message frames)
- Add batch_runs_per_day billing limit and enforce in /trigger
- Remove deprecated agent_data/agent_complete frame handlers and queues
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code bugs fixed:
- checkpoint_agent.py, project_agent.py, note_agent.py: add missing
'import json' (used in handle() for context serialization)
Test fixes:
- test_agents.py: add autouse ws_executor fixture that sets a fake
execute_on_client so tools can run in unit tests without a WS session
- Rewrite all TestXxxAgentTools tests: patch execute_on_client per-test,
assert on call_args (what payload was sent to the client) and on the
formatted string return value — matching actual tool behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- device_ws.py: dispatch home_request/popup_request to HomeFormatter/PopupFormatter
via async tasks; each request gets a UUID request_id for frame correlation
- chat.py: remove chat_stream WS endpoint (superseded by unified device WS);
keep POST /chat REST fallback unchanged
- 5 new integration tests pass; all 22 existing device_ws tests still pass
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- orchestrate_v3(user_id, message, context): classifies intent, returns
(agent_name, agent_instance) — caller drives execution
- orchestrate_v3_stream(user_id, message, context): yields (agent_name, token)
pairs; first yield is always (agent_name, "") as a domain-detection signal
- ChatAgent.handle_stream(): default implementation yields handle() result as
one chunk; subclasses override for true token-level streaming
- Fix stale test_orchestrator.py assertions that expected a JSON final frame
that orchestrate_stream never emitted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ChatAgent.__init__: adds tool_results: list[dict] = []
- _tool_loop: wraps execution in a result collector; populates
self.tool_results with raw execute_on_client dicts after each run
- _tool_loop_stream: streaming variant — uses ainvoke for tool-call
iterations, llm.astream() for the final answer; same result capture
- ws_context.py: adds _tool_result_collector ContextVar +
set/clear helpers; execute_on_client appends to collector when set
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace deprecated Pydantic v1 `class Config:` inner class with
`model_config = SettingsConfigDict(...)` to eliminate the deprecation
warning emitted on every test run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add app/api/routes/agents.py with 11 endpoints:
GET/POST/PUT/DELETE /agents/local (local directory agent configs)
GET/POST/PUT/DELETE /agents/cloud (cloud connector agent configs)
GET /agents/catalog (hardcoded agent type catalog)
GET /agents/runs (paginated run logs with agent_id/page/limit filters)
POST /agents/{id}/run (manual trigger stub, dispatch wired in step 3.4)
- Tier-gate creation via combined local+cloud batch_active limit
- Ownership checks on all mutations (404 on mismatch)
- Cascade delete of run logs via SQLAlchemy relationship
- Register agents router in app/main.py
- Fix missing import json in app/agents/task_agent.py