fix(zopk): Raise minimum scraped content threshold from 100 to 500 chars
Articles with only 100-458 chars were passing validation but contained
metadata/teasers instead of full article text, so knowledge extraction
failed for every one of them with "Treść za krótka do ekstrakcji"
("Content too short for extraction"). The 500-char minimum better
aligns with the 200-token chunking requirement (~800 chars).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 3c1f920675
commit 18f9f98f5d
@@ -524,8 +524,8 @@ class ZOPKContentScraper:
         # Extract text
         text = self._extract_text(content_element)

-        if not text or len(text) < 100:
-            return None, "Treść artykułu za krótka"
+        if not text or len(text) < 500:
+            return None, f"Treść artykułu za krótka ({len(text) if text else 0} znaków, min. 500)"

         # Truncate if too long
         if len(text) > MAX_CONTENT_LENGTH:
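For reference, the check changed above can be sketched as a standalone function. This is a minimal illustration, not the project's code: the names `validate_length` and `MIN_CONTENT_LENGTH` are hypothetical (the commit hard-codes 500 inline), and the `(value, error)` return shape is assumed from the `return None, "..."` lines in the diff. The Polish message ("Article content too short (N characters, min. 500)") is kept verbatim.

```python
from typing import Optional, Tuple

# Hypothetical constant; the commit itself writes the literal 500 inline.
MIN_CONTENT_LENGTH = 500


def validate_length(
    text: Optional[str], minimum: int = MIN_CONTENT_LENGTH
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, None) when text is long enough, else (None, error)."""
    # Treat None and "" as zero-length so the message always has a count.
    length = len(text) if text else 0
    if length < minimum:
        return None, f"Treść artykułu za krótka ({length} znaków, min. {minimum})"
    return text, None


# A 458-char teaser, accepted under the old 100-char limit, is now rejected.
teaser_text, teaser_error = validate_length("a" * 458)
```

Keeping the threshold in one named constant would let the error message and the comparison stay in sync if the limit changes again.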