fix(zopk): Raise minimum scraped content threshold from 100 to 500 chars

Articles with only 100-458 chars were passing validation but contained metadata/teasers instead of full article text, causing all knowledge extraction to fail ("Treść za krótka do ekstrakcji", i.e. "Content too short for extraction"). The 500-char minimum better aligns with the 200-token chunking requirement (~800 chars).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 15:50:21 +01:00
commit 18f9f98f5d
parent 3c1f920675
1 changed file with 2 additions and 2 deletions
--- a/zopk_content_scraper.py
+++ b/zopk_content_scraper.py
@@ -524,8 +524,8 @@ class ZOPKContentScraper:
            # Extract text
            text = self._extract_text(content_element)

-            if not text or len(text) < 100:
-                return None, "Treść artykułu za krótka"
+            if not text or len(text) < 500:
+                return None, f"Treść artykułu za krótka ({len(text) if text else 0} znaków, min. 500)"

            # Truncate if too long
            if len(text) > MAX_CONTENT_LENGTH:
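
For readers who want to try the new check in isolation, here is a minimal standalone sketch of the length validation this commit introduces. It is not taken from `zopk_content_scraper.py`: the function name `check_min_length` and the constant `MIN_CONTENT_LENGTH` are hypothetical. Only the 500-char threshold, the `(text, error)` return shape, and the message format come from the diff.

```python
from typing import Optional, Tuple

# Threshold from this commit; the constant name itself is an assumption.
MIN_CONTENT_LENGTH = 500


def check_min_length(
    text: Optional[str], min_chars: int = MIN_CONTENT_LENGTH
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, None) if the text is long enough, else (None, error)."""
    # Treat None and the empty string as zero-length content.
    length = len(text) if text else 0
    if length < min_chars:
        # Same message format as the diff:
        # "Article content too short (N chars, min. 500)"
        return None, f"Treść artykułu za krótka ({length} znaków, min. {min_chars})"
    return text, None


if __name__ == "__main__":
    # A 458-char teaser passed the old 100-char check but is rejected now.
    print(check_min_length("x" * 458))
```

Including the measured length in the error message, as the commit does, makes it easier to tell from logs whether a rejected article was a short teaser or an empty extraction.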