feat(local-agent-v2): step 4 — journey produces structured AgentConfig JSON

Replace freeform prompt_template output with validated AgentConfig JSON:
- agent_setup.py: new system prompt (journey_system_v2), AGENT_CONFIG_START/END
  markers, _extract_agent_config() with Pydantic validation, and updated handlers
  that return an agent_config key; import AgentConfig from schemas
- tests/test_journey_v2.py: 6 unit tests + 5 parametrized LLM eval cases,
  following the test_agent_runner_v2.py pattern; _run_journey uses
  set_client_executor/clear_client_executor, mirroring device_ws
- tests/fixtures/journey_v2/: cases.yaml + email_action.html + email_info.html
- tests/conftest.py: add --journey-dir CLI option; remove S3/plugin fixtures
  (cleanup from the microservices migration, already present in the working tree)
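The marker-based extraction in agent_setup.py can be pictured roughly as follows. This is a minimal sketch, not the committed code: `extract_agent_config` is a hypothetical stand-in for `_extract_agent_config()`, the literal marker strings are assumed from the `AGENT_CONFIG_START/END` shorthand, the `ContentType`/`AgentConfig` fields (`id`, `extraction_prompt`, `content_types`, `global_rules`) are inferred from the assertions in cases.yaml, and Pydantic validation is swapped for hand-written dataclass checks so the snippet has no third-party dependency.

```python
import json
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed literal marker strings; the commit only names them as AGENT_CONFIG_START/END.
START_MARKER = "AGENT_CONFIG_START"
END_MARKER = "AGENT_CONFIG_END"

# Non-greedy match of everything between the two markers, across newlines.
_CONFIG_RE = re.compile(
    re.escape(START_MARKER) + r"(.*?)" + re.escape(END_MARKER), re.DOTALL
)


@dataclass
class ContentType:
    # Hypothetical fields, inferred from the cases.yaml assertions.
    id: str
    extraction_prompt: str


@dataclass
class AgentConfig:
    # Hypothetical stand-in for the Pydantic AgentConfig imported from schemas.
    content_types: List[ContentType]
    global_rules: List[str] = field(default_factory=list)


def extract_agent_config(reply: str) -> Optional[AgentConfig]:
    """Return the AgentConfig embedded between the markers, or None if missing or invalid."""
    match = _CONFIG_RE.search(reply)
    if match is None:
        return None
    try:
        raw = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    # Hand-written structural checks stand in for Pydantic validation here.
    if not isinstance(raw, dict) or not isinstance(raw.get("content_types"), list):
        return None
    content_types = []
    for item in raw["content_types"]:
        if not isinstance(item, dict):
            return None
        ct_id, prompt = item.get("id"), item.get("extraction_prompt")
        if not isinstance(ct_id, str) or not isinstance(prompt, str):
            return None
        content_types.append(ContentType(id=ct_id, extraction_prompt=prompt))
    rules = raw.get("global_rules", [])
    if not isinstance(rules, list) or not all(isinstance(r, str) for r in rules):
        return None
    return AgentConfig(content_types=content_types, global_rules=rules)
```

Returning `None` instead of raising lets a caller keep the conversation going when the model emits malformed JSON; whether the real handler behaves this way is not stated in the commit.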

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: Roberto Musso
Date: 2026-04-08 00:23:58 +02:00
Parent: c6c4578f9a
Commit: d8add7e8cb
6 changed files with 607 additions and 190 deletions

tests/fixtures/journey_v2/cases.yaml (new file)

@@ -0,0 +1,87 @@
# Journey V2 eval test cases (Step 4)
#
# Each case simulates a complete journey session:
# 1. handle_journey_start is called with directory + data_types
# 2. handle_journey_message is called for each entry in user_messages
# 3. Assertions are evaluated on the final reply
#
# directory_files: list of {path, content_file}; content_file is relative to data/
#
# Assertion keys:
#   expect_question: true              → first reply must contain "?"
#   expect_done: true                  → final reply must have done=True
#   expect_valid_config: true          → agent_config must be parseable as AgentConfig with content_types > 0
#   expect_content_type_id: <str>      → AgentConfig.content_types must contain an entry with this id
#   expect_extraction_contains: <str>  → first content_type extraction_prompt must contain this word
#   expect_global_rules: true          → AgentConfig.global_rules must be non-empty

- id: "4.1"
  description: "Journey start explores directory, first reply contains a question"
  directory: "/test/emails"
  data_types: ["tasks", "notes", "timelines"]
  directory_files:
    - path: "/test/emails/outlook_export_2024.html"
      content_file: "email_action.html"
  user_messages: []
  score_name: "journey.start"
  expect_question: true

- id: "4.2"
  description: "Full 3-turn conversation produces a valid AgentConfig JSON"
  directory: "/test/emails"
  data_types: ["tasks", "notes", "timelines"]
  directory_files:
    - path: "/test/emails/email_backup.html"
      content_file: "email_action.html"
  user_messages:
    - "These are email exports from Outlook in HTML format"
    - "Create tasks for emails with direct action requests, notes for informational emails"
    - "Yes, that looks correct. No other rules."
  score_name: "journey.valid_json"
  expect_done: true
  expect_valid_config: true

- id: "4.3"
  description: "Journey detects email_html content type from directory exploration"
  directory: "/test/emails"
  data_types: ["tasks", "notes"]
  directory_files:
    - path: "/test/emails/message.html"
      content_file: "email_action.html"
  user_messages:
    - "HTML email backups from my mail client, exported from Outlook"
    - "Create tasks from emails that contain assignments or direct action items"
    - "Correct, no other rules needed"
  score_name: "journey.detect_email"
  expect_done: true
  expect_content_type_id: "email_html"

- id: "4.4"
  description: "Custom user rule (only notes, no tasks) reflected in extraction_prompt"
  directory: "/test/emails"
  data_types: ["notes"]
  directory_files:
    - path: "/test/emails/email.html"
      content_file: "email_info.html"
  user_messages:
    - "HTML emails from my work inbox"
    - "Create only notes from all emails — I do not want tasks or timelines to be created"
    - "Yes, exactly"
  score_name: "journey.custom_rules"
  expect_done: true
  expect_extraction_contains: "note"

- id: "4.5"
  description: "Global rule (no project = no entity) appears in AgentConfig.global_rules"
  directory: "/test/emails"
  data_types: ["tasks", "notes"]
  directory_files:
    - path: "/test/emails/email.html"
      content_file: "email_action.html"
  user_messages:
    - "Email backups from Outlook"
    - "Create tasks from action request emails, notes from informational emails"
    - "If the email cannot be matched to any project, do not create any entity at all"
  score_name: "journey.global_rules"
  expect_done: true
  expect_global_rules: true
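One way the assertion keys documented at the top of cases.yaml could be checked against a finished session is sketched below. It is an illustration only: `check_case` is a hypothetical helper, and the reply shape (a `"message"` text field, a `"done"` flag and an `"agent_config"` dict) is an assumption inferred from the key descriptions, not taken from tests/test_journey_v2.py. The case is written as a plain dict so the snippet does not need PyYAML.

```python
from typing import Any, Dict, List


def check_case(case: Dict[str, Any], replies: List[Dict[str, Any]]) -> List[str]:
    """Return descriptions of failed assertions; an empty list means the case passed.

    replies[0] is the handle_journey_start reply and replies[-1] the final reply.
    Assumed reply keys: "message" (text), "done" (bool), "agent_config" (dict or None).
    """
    failures = []
    first, final = replies[0], replies[-1]
    config = final.get("agent_config") or {}
    content_types = config.get("content_types") or []

    if case.get("expect_question") and "?" not in first.get("message", ""):
        failures.append("first reply contains no question")
    if case.get("expect_done") and final.get("done") is not True:
        failures.append("final reply is not done")
    # Only checks content_types > 0; full schema validation is left to AgentConfig.
    if case.get("expect_valid_config") and not content_types:
        failures.append("agent_config has no content_types")
    expected_id = case.get("expect_content_type_id")
    if expected_id and expected_id not in [ct.get("id") for ct in content_types]:
        failures.append(f"no content type with id {expected_id!r}")
    word = case.get("expect_extraction_contains")
    if word:
        prompt = content_types[0].get("extraction_prompt", "") if content_types else ""
        if word not in prompt:
            failures.append(f"extraction_prompt lacks {word!r}")
    if case.get("expect_global_rules") and not config.get("global_rules"):
        failures.append("global_rules is empty")
    return failures


# Example: case 4.5 checked against two hypothetical replies.
case_4_5 = {"id": "4.5", "expect_done": True, "expect_global_rules": True}
replies = [
    {"message": "What kind of files are these?", "done": False, "agent_config": None},
    {
        "message": "Configuration ready.",
        "done": True,
        "agent_config": {
            "content_types": [
                {"id": "email_html", "extraction_prompt": "Create tasks from action requests"}
            ],
            "global_rules": ["Create nothing if no project matches"],
        },
    },
]
print(check_case(case_4_5, replies))  # → []
```

Collecting every failure instead of stopping at the first one makes a single eval run report all unmet expectations for a case.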


@@ -0,0 +1,23 @@
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Email: Fix the login bug</title>
  <style>body { font-family: Arial; } .header { color: #666; }</style>
</head>
<body>
  <div class="header">
    <p><strong>From:</strong> boss@company.com</p>
    <p><strong>To:</strong> dev@company.com</p>
    <p><strong>Subject:</strong> Fix the login bug</p>
    <p><strong>Date:</strong> Mon, 7 Apr 2026 09:15:00 +0000</p>
  </div>
  <div class="body">
    <p>Hi,</p>
    <p>Please fix the login bug in Project Alpha as soon as possible.
    Users are reporting that they can't log in with their Google accounts.
    This is blocking the whole team. Please resolve it by Friday.</p>
    <p>Thanks,<br>Boss</p>
  </div>
</body>
</html>


@@ -0,0 +1,23 @@
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Email: New policy update</title>
  <style>body { font-family: Arial; }</style>
</head>
<body>
  <div class="header">
    <p><strong>From:</strong> hr@company.com</p>
    <p><strong>To:</strong> all@company.com</p>
    <p><strong>Subject:</strong> FYI: New remote work policy effective May 1</p>
    <p><strong>Date:</strong> Tue, 8 Apr 2026 10:00:00 +0000</p>
  </div>
  <div class="body">
    <p>Hi everyone,</p>
    <p>Just a heads-up that starting May 1, 2026 the company will be moving to
    a hybrid work model. You will be expected to come into the office at least
    two days per week. More details will follow in the employee handbook.</p>
    <p>Best,<br>HR Team</p>
  </div>
</body>
</html>