Add six v7 WsFrameType enum members (index_session_start/cancel/batch,
index_file_result/progress/done), wire dispatch in device_ws message loop,
and implement _handle_index_session_start/cancel/file_batch with per-file
summarisation, token accounting, and quota enforcement.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add pypdf/python-docx deps, _extract_pdf_text/_extract_docx_text helpers,
and summarize_pdf/summarize_docx wrappers that delegate to summarize_text.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pre-flight quota check for folder_index. Returns 402 with reason
when file cap or monthly token budget would be exceeded; 200 {"ok": true}
otherwise. Also adds auth_headers_free fixture to conftest.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements check_folder_quota and add_token_usage in app/billing/quota.py
with dialect-aware upsert (pg_insert on PostgreSQL, read-then-write on SQLite).
Adds test_user_free/test_user_power fixtures and db alias to conftest.py.
6 new tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _language_instruction() to deep_agent.py, reads language from core memory
- Append language directive to all 4 run_* functions (task/project/checkpoint/note)
- Minor fixes: alembic env, route imports, test cleanup
Before: branch 3 of oauth_callback attempted to INSERT a user with a
duplicate email → DB constraint violation → 500.
After: if email_verified=False and the email already exists, raise 409
with a message directing the user to sign in with their password.
Also adds test_callback_unverified_email_conflict_returns_409.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6 tests covering the authorize and callback endpoints:
- authorize returns URL + state, 503 when unconfigured
- callback: state mismatch → 401, new user creation, existing OAuth
link re-login (same user sub), email-match auto-linking to password user
Provider methods (exchange_code, get_userinfo) are mocked via AsyncMock
so tests run without hitting Google APIs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep only 4.1 (first reply contains question) as automated eval.
Multi-turn cases (4.2–4.5) are non-deterministic and tested manually
with results tracked in Langfuse.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Langfuse V3 does not accept user_id/session_id on observation-level calls.
Moved to metadata dict in agent_runner, deep_agent, and agent_setup.
refactor(tests): fixture-based pattern for agent_runner_v2 eval tests
- cases.yaml + data/ fixtures under tests/fixtures/agent_runner_v2/
- pytest_generate_tests parametrizes test_eval_runner from YAML
- _resolve_projects() handles symbolic names and inline dicts
- _evaluate_case() centralizes all assertion logic
- --runner-dir CLI option for custom fixture folders
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lf.trace() and lf.score(trace_id=...) are V2 API removed in V3.
V3 pattern:
lf.start_as_current_observation(name=...) as context manager → obs
obs.score(name=..., value=...)
contextlib.nullcontext() when lf is None so structure stays the same
Updated tests 2.1–2.7 in test_agent_runner_v2.py accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
I test del preprocessor sono deterministici — nessun LLM coinvolto,
nessuno score da tracciare.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
file: serve sia come path da leggere che come nome passato a detect_content_type.
Non c'è motivo di averli separati.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- conftest.py: registra --preprocess-dir via pytest_addoption
- test_preprocessors.py: usa pytest_generate_tests per leggere i casi
a collection time con accesso a config; _content e _fixtures_dir
accettano path dinamico
Usage: pytest tests/test_preprocessors.py --preprocess-dir /my/folder
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
YAML: rimosse op/description/score_name/assertions block — ora detect/process
come chiave diretta, assertions piatte sullo stesso livello del caso.
Runner: eliminato _run_assertions engine, assertions inline in test_preprocess.
Riduzione da ~170 a ~75 righe totali tra YAML + test.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scoring is only meaningful for LLM-backed steps. Preprocess tests are
deterministic Python, so scores add no value. Kept only for detect tests.
- test_preprocess: drop _lf_score call, simplify _run_assertions return type
- cases.yaml: remove score_name from all op=preprocess entries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root causes fixed:
1. PROJECT_TOOLS removed from Step 2 tool set — project assignment is now
exclusively handled by the runner in code, never by the LLM.
2. When Step 1 returns "new", runner calls execute_on_client insert/projects
directly (before Step 2), gets the created id, and passes it as context.
3. Newly created projects are appended to the local `projects` list so that
subsequent files in the same run can match to them via Step 1 — prevents
one project per file when multiple files share the same topic.
Also add tests/test_classify_file.py with pytest cases for _classify_file
and a CLI runner: python -m tests.test_classify_file <file> [project...]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code bugs fixed:
- checkpoint_agent.py, project_agent.py, note_agent.py: add missing
'import json' (used in handle() for context serialization)
Test fixes:
- test_agents.py: add autouse ws_executor fixture that sets a fake
execute_on_client so tools can run in unit tests without a WS session
- Rewrite all TestXxxAgentTools tests: patch execute_on_client per-test,
assert on call_args (what payload was sent to the client) and on the
formatted string return value — matching actual tool behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- device_ws.py: dispatch home_request/popup_request to HomeFormatter/PopupFormatter
via async tasks; each request gets a UUID request_id for frame correlation
- chat.py: remove chat_stream WS endpoint (superseded by unified device WS);
keep POST /chat REST fallback unchanged
- 5 new integration tests pass; all 22 existing device_ws tests still pass
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>