feat: add preprocessor system (Step 1 — Local Agent V2)

- app/core/preprocessors/__init__.py: detect_content_type + preprocess dispatcher
- app/core/preprocessors/base.py: PreprocessResult dataclass
- app/core/preprocessors/email_html.py: BeautifulSoup HTML stripping, metadata extraction, thread splitting
- requirements.txt: add beautifulsoup4 and lxml
- tests/test_preprocessors.py: 10 tests with Langfuse scoring (preprocess.* scores)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Roberto Musso
2026-04-07 10:19:02 +02:00
parent aa8bcbf0d8
commit a2d6d689e4
5 changed files with 463 additions and 0 deletions

View File

@@ -33,4 +33,6 @@ google-auth-httplib2>=0.2.0
msal>=1.28.0
cryptography>=42.0.0
langfuse>=2.0.0
beautifulsoup4>=4.12.0
lxml>=5.0.0
ruff>=0.8.0