YAML: removed the op/description/score_name fields and the assertions block; detect/process are now direct keys, and assertions are flat, at the same level as the case. Runner: removed the _run_assertions engine; assertions are now inline in test_preprocess. Total size of YAML + test reduced from ~170 to ~75 lines. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
72 lines
1.7 KiB
YAML
# Preprocessor test cases
#
# detect: <expected_type>  → calls detect_content_type(filename, content)
# process: <content_type>  → calls preprocess(content_type, content)
#
# Source: file: <name in data/> or generate: binary_noise
# filename: overrides the file name passed to detect (default: the value of file:)
#
# Flat assertions (process cases only):
# no_html: true            clean_text contains no HTML tags
# min_chars: N             len(clean_text) >= N
# ratio_lt: F              len(clean) / len(raw) < F
# has_meta: [k, ...]       keys present in metadata
# contains: str | [str]    substring(s) present in clean_text
# excludes: str | [str]    substring(s) absent from clean_text
# content_type: str        result.content_type == this value

- id: "1.1"
  file: email_action.html
  filename: email_export.html
  detect: email_html

- id: "1.2"
  file: generic_page.html
  filename: index.html
  detect: generic_html

- id: "1.3"
  file: notes.txt
  detect: plain_text

- id: "1.4"
  generate: binary_noise
  filename: archive.xyz
  detect: unknown

- id: "1.5"
  file: email_action.html
  process: email_html
  no_html: true
  min_chars: 50
  ratio_lt: 0.8

- id: "1.6"
  file: email_action.html
  process: email_html
  has_meta: [subject, from]

- id: "1.7"
  file: email_thread.html
  process: email_html
  contains: "Sure, I'll handle the deploy"
  excludes: "Let's plan the deploy"

- id: "1.8"
  file: email_single.html
  process: email_html
  contains: "deploy is done"

- id: "1.9"
  file: email_heavy.html
  process: email_html
  no_html: true
  min_chars: 30
  excludes: [border-collapse, font-size]

- id: "1.10"
  file: fallback.txt
  process: unknown
  min_chars: 1
  content_type: unknown