fix(zopk): Raise minimum scraped content threshold from 100 to 500 chars

Articles of only 100-458 characters passed validation even though they
contained metadata or teasers rather than the full article text, so
knowledge extraction failed on all of them ("Treść za krótka do
ekstrakcji", i.e. "content too short for extraction"). The new 500-character
minimum sits closer to the 200-token chunking requirement (~800 characters).
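The ~800-character figure implies roughly 4 characters per token. The sketch below assumes that ratio (a rule of thumb, not something measured in this commit) and compares the old and new minimums against one 200-token chunk. All names are hypothetical.

```python
# Assumption: ~4 characters per token, the ratio implied by "200 tokens ~ 800 chars".
CHARS_PER_TOKEN = 4
CHUNK_TOKENS = 200     # chunking requirement cited in the commit message
OLD_MIN_CHARS = 100    # previous threshold
NEW_MIN_CHARS = 500    # threshold introduced by this commit


def estimated_tokens(num_chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from a character count (floor division)."""
    return num_chars // chars_per_token


# Characters needed to fill one chunk under the assumed ratio.
chunk_chars = CHUNK_TOKENS * CHARS_PER_TOKEN

for label, min_chars in (("old", OLD_MIN_CHARS), ("new", NEW_MIN_CHARS)):
    share = min_chars / chunk_chars
    print(
        f"{label} minimum: {min_chars} chars ~ {estimated_tokens(min_chars)} tokens "
        f"({share:.1%} of one {CHUNK_TOKENS}-token chunk)"
    )
```

Under this heuristic the old minimum covers only about 25 tokens, while the new one reaches about 125 tokens, still below a full chunk but far less likely to be a bare teaser.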

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Maciej Pienczyn 2026-02-09 15:50:21 +01:00
parent 3c1f920675
commit 18f9f98f5d

@@ -524,8 +524,8 @@ class ZOPKContentScraper:
 # Extract text
 text = self._extract_text(content_element)
-if not text or len(text) < 100:
-    return None, "Treść artykułu za krótka"
+if not text or len(text) < 500:
+    return None, f"Treść artykułu za krótka ({len(text) if text else 0} znaków, min. 500)"
 # Truncate if too long
 if len(text) > MAX_CONTENT_LENGTH:
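For readers who want to exercise the new check in isolation, here is a minimal standalone sketch of it. It is not the scraper's actual code: `validate_article_text` and `MIN_CONTENT_LENGTH` are hypothetical names, the `MAX_CONTENT_LENGTH` value is a placeholder (the diff does not show it), and both the truncation by slicing and the `(text, None)` success return are assumptions. Only the failure path mirrors the diff.

```python
from typing import Optional, Tuple

MIN_CONTENT_LENGTH = 500      # hypothetical named constant; the commit inlines 500
MAX_CONTENT_LENGTH = 50_000   # hypothetical placeholder; the real value is not in the diff


def validate_article_text(text: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, None) if usable, else (None, error_message).

    The failure path mirrors the diff; the success shape is an assumption.
    """
    # Treat None and "" as length 0, matching `len(text) if text else 0` in the diff.
    length = len(text) if text else 0
    if length < MIN_CONTENT_LENGTH:
        # Polish message from the diff: "Article content too short (N characters, min. 500)".
        return None, f"Treść artykułu za krótka ({length} znaków, min. {MIN_CONTENT_LENGTH})"
    if length > MAX_CONTENT_LENGTH:
        # Assumption: "Truncate if too long" means a plain slice.
        text = text[:MAX_CONTENT_LENGTH]
    return text, None
```

A 458-character teaser, the upper end of the range described in the commit message, is now rejected with a message that reports its actual length, which makes borderline scrapes easier to spot in logs.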