- Add Polish city name declensions to local keyword matcher
- Add openingHours string format alongside openingHoursSpecification
- Add Wejherowo to page title for city_in_title signal
- Add service+city keyword phrases in visible text (serwis, transport,
  szkolenia, sklep, remonty, instalacje + Wejherowo/Rumia/Reda/Gdynia)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parsedate_to_datetime returns offset-aware datetime from Last-Modified
header, but datetime.now() is naive. Strip tzinfo before subtraction.
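A minimal sketch of the fix described above, assuming the age of a page is computed in whole days. The function name age_in_days and its now parameter are illustrative, not taken from seo_audit.py. Stripping tzinfo treats the GMT timestamp as local time, which is only acceptable when an error of a few hours does not matter; comparing against datetime.now(timezone.utc) would avoid that.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def age_in_days(last_modified, now=None):
    """Return whole days elapsed since a Last-Modified header value."""
    # For a header such as "Wed, 01 Jan 2025 00:00:00 GMT" this is offset-aware.
    modified = parsedate_to_datetime(last_modified)
    # Drop tzinfo so the value can be subtracted from a naive datetime.
    modified = modified.replace(tzinfo=None)
    # datetime.now() is naive; mixing it with an aware value raises TypeError.
    now = now or datetime.now()
    return (now - modified).days
```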
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DATABASE_URL and PAGESPEED_API_KEY are read at module level (import
time), so load_dotenv must run before third-party imports that
reference these variables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add load_dotenv() to seo_audit.py so it reads GOOGLE_PAGESPEED_API_KEY
and DATABASE_URL from .env without requiring manual env var passing.
Fixes PageSpeed 429 errors when running audit via SSH.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
seo_audit.py was missing SSL columns (has_ssl, ssl_expires_at,
ssl_issuer) in its INSERT/UPDATE query, causing all SEO-audited
companies to show has_ssl=false regardless of actual certificate status.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix ~190 hardcoded Polish strings missing diacritical characters
  across seo_audit.html, gbp_audit.html, social_audit.html
- Fix encoding issue in SEO scraper: requests falls back to ISO-8859-1
  for text responses when the server omits charset, causing mojibake
  for UTF-8 pages. Now uses apparent_encoding detection.
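A sketch of the encoding fallback, assuming the scraper holds a requests.Response. The helper name decode_body is hypothetical; headers, encoding, apparent_encoding, and text are the standard Response attributes. The final fallback to UTF-8 is an assumption of this sketch.

```python
def decode_body(response):
    """Return response text, re-detecting the encoding when no charset was sent."""
    content_type = response.headers.get("Content-Type", "")
    if "charset=" not in content_type.lower():
        # No declared charset: requests would assume ISO-8859-1 for text/*,
        # so use the encoding detected from the body bytes instead.
        response.encoding = response.apparent_encoding or "utf-8"
    return response.text


# Usage (requires the third-party requests package):
#   response = requests.get(url, timeout=10)
#   html = decode_body(response)
```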
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Google replaced First Input Delay (FID) with Interaction to Next Paint
(INP) as a Core Web Vital in March 2024. This renames the DB column
from first_input_delay_ms to interaction_to_next_paint_ms, updates the
PageSpeed client to prefer the INP audit key, and fixes all references
across routes, services, scripts, and report generators. Updated INP
thresholds: good ≤200 ms, needs improvement ≤500 ms, poor >500 ms.
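The thresholds above can be sketched as a small classifier. The function name and the returned labels are illustrative, not the identifiers used in the report generators.

```python
def classify_inp(inp_ms):
    """Map an Interaction to Next Paint value in milliseconds to a rating."""
    if inp_ms is None:
        return "unknown"            # no INP measurement available
    if inp_ms <= 200:
        return "good"
    if inp_ms <= 500:
        return "needs_improvement"
    return "poor"
```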
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bug: When page fetch fails (SSL error), result['onpage'] is None.
Using dict.get('key', {}) returns None when key exists with None value.
Fix: Use 'or {}' pattern to handle both missing keys and None values.
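A short illustration of the difference, using a hypothetical helper named section. Only the 'onpage' key and the 'or {}' pattern come from the fix itself.

```python
def section(result, key):
    """Return result[key] as a dict, treating both a missing key and None as empty."""
    return result.get(key) or {}


result = {"onpage": None}                 # page fetch failed, key exists with None

broken = result.get("onpage", {})         # default is ignored because the key exists
safe = section(result, "onpage")          # falls back to {} for None as well

title = safe.get("meta_title")            # safe: no AttributeError on None
```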
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed PostgreSQL-specific ANY(:ids) to use IN clause with
dynamic placeholders for SQLite/PostgreSQL compatibility
- Verified SEO audit dry-run extracts all metrics correctly:
  - HTTP status, load time, final URL
  - Meta title, H1 count, image analysis
  - Structured data detection
  - robots.txt, sitemap.xml, indexability
  - Overall SEO score calculation (95 for pixlab.pl)
Note: Company ID 26 has no website configured, tested with ID 1 instead.
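The IN-clause change can be sketched as follows, shown with the standard-library sqlite3 driver. The companies table, its columns, and the function name are assumptions for illustration; the real script may build the query through a different database layer.

```python
import sqlite3


def fetch_companies(conn, ids):
    """Select rows by ID with one named placeholder per value (no ANY(:ids))."""
    if not ids:
        return []                               # "IN ()" is not valid SQL
    params = {f"id_{i}": value for i, value in enumerate(ids)}
    placeholders = ", ".join(f":{name}" for name in params)
    query = f"SELECT id, name FROM companies WHERE id IN ({placeholders}) ORDER BY id"
    # Only placeholder names are interpolated; values stay bound parameters.
    return conn.execute(query, params).fetchall()
```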
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Enhanced save_audit_result method with complete column coverage
- Added missing columns to idempotent upsert query:
  - broken_links_count (for future link checking)
  - viewport_configured (derived from meta viewport tag)
  - is_mobile_friendly (derived from viewport content)
  - has_hreflang (for international SEO detection)
- All 45+ SEO columns now properly mapped for database upserts
- ON CONFLICT (company_id) DO UPDATE ensures idempotent operations
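A reduced sketch of the idempotent upsert, assuming only the four columns listed above and using sqlite3 (SQLite 3.24 or newer) so it runs without a PostgreSQL server. The real query covers many more columns, and the helper make_demo_db is purely illustrative.

```python
import sqlite3

# Hypothetical subset of the SEO columns, for illustration only.
COLUMNS = ("broken_links_count", "viewport_configured",
           "is_mobile_friendly", "has_hreflang")


def make_demo_db():
    """Create an in-memory table keyed by company_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE company_website_analysis ("
        "company_id INTEGER PRIMARY KEY, broken_links_count INTEGER, "
        "viewport_configured INTEGER, is_mobile_friendly INTEGER, "
        "has_hreflang INTEGER)"
    )
    return conn


def save_audit_result(conn, company_id, values):
    """Insert the audit row, or update it if company_id already exists."""
    names = ("company_id",) + COLUMNS
    placeholders = ", ".join(f":{name}" for name in names)
    updates = ", ".join(f"{col} = excluded.{col}" for col in COLUMNS)
    sql = (
        f"INSERT INTO company_website_analysis ({', '.join(names)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT (company_id) DO UPDATE SET {updates}"
    )
    params = {"company_id": company_id}
    params.update({col: values.get(col) for col in COLUMNS})
    conn.execute(sql, params)
    conn.commit()
```

Running save_audit_result twice for the same company leaves a single row holding the latest values.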
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enhanced scripts/seo_audit.py with comprehensive CLI improvements:
CLI Arguments:
- --company-id: Audit single company by ID
- --company-ids: Audit multiple companies (comma-separated)
- --batch: Audit range of companies (e.g., 1-10)
- --all: Audit all companies
- --dry-run: Print results without database writes
- --verbose/-v: Debug output
- --quiet/-q: Suppress progress output
- --json: JSON output for scripting
- --database-url: Override DATABASE_URL env var
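The argument list above could be declared with argparse roughly as follows. The flag names come from the list; the help strings, types, and the parse_id_list helper are assumptions of this sketch rather than the script's actual code.

```python
import argparse


def parse_id_list(value):
    """Turn '1,2,3' into [1, 2, 3]; argparse reports a ValueError as a usage error."""
    return [int(part) for part in value.split(",")]


def build_parser():
    parser = argparse.ArgumentParser(description="SEO audit (illustrative CLI sketch)")
    parser.add_argument("--company-id", type=int, help="audit a single company by ID")
    parser.add_argument("--company-ids", type=parse_id_list,
                        help="comma-separated company IDs")
    parser.add_argument("--batch", help="range of company IDs, e.g. 1-10")
    parser.add_argument("--all", action="store_true", help="audit all companies")
    parser.add_argument("--dry-run", action="store_true",
                        help="print results without database writes")
    parser.add_argument("--verbose", "-v", action="store_true", help="debug output")
    parser.add_argument("--quiet", "-q", action="store_true",
                        help="suppress progress output")
    parser.add_argument("--json", action="store_true", help="JSON output for scripting")
    parser.add_argument("--database-url", help="override DATABASE_URL env var")
    return parser
```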
Progress Logging:
- ETA calculation based on average time per company
- Progress counter [X/Y] for each company
- Status indicators (SUCCESS/SKIPPED/FAILED/TIMEOUT)
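A sketch of the ETA calculation from the average time per company. Function names and the exact output format are assumptions; only the [X/Y] counter and the averaging approach come from the description above.

```python
def eta_seconds(elapsed_seconds, completed, total):
    """Estimate remaining seconds from the average time per completed company."""
    if completed <= 0:
        return None                       # nothing finished yet, no average available
    average = elapsed_seconds / completed
    return average * (total - completed)


def progress_line(elapsed_seconds, completed, total):
    """Format a '[X/Y] ETA ...' progress message."""
    eta = eta_seconds(elapsed_seconds, completed, total)
    if eta is None:
        return f"[{completed}/{total}] ETA unknown"
    minutes, seconds = divmod(int(eta), 60)
    return f"[{completed}/{total}] ETA {minutes}m{seconds:02d}s"
```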
Summary Reporting:
- Detailed breakdown by result category
- Edge case counts (no_website, unavailable, timeout, ssl_errors)
- PageSpeed API quota tracking (start/used/remaining)
- Visual score distribution with bar charts
- Failed audits listing with error messages
Error Handling:
- Proper exit codes (0-5) for different scenarios
- Categorization of errors (timeout, connection, SSL, unavailable)
- Database connection error handling
- Quota exceeded handling
- Batch argument validation with helpful error messages
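Batch argument validation might look like the sketch below. It assumes the range is inclusive and that IDs start at 1; the function name and the error wording are hypothetical.

```python
def parse_batch(value):
    """Parse a 'START-END' batch argument into an inclusive (start, end) tuple."""
    parts = value.split("-")
    if len(parts) != 2:
        raise ValueError(f"Invalid batch '{value}': expected START-END, e.g. 1-10")
    try:
        start, end = int(parts[0]), int(parts[1])
    except ValueError:
        raise ValueError(f"Invalid batch '{value}': START and END must be integers") from None
    if start < 1 or end < start:
        raise ValueError(f"Invalid batch '{value}': need 1 <= START <= END")
    return start, end
```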
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements SEOAuditor class following social_media_audit.py pattern:
- __init__: Initialize database connection and analysis components
- get_companies: Fetch companies by ID, batch, or all
- audit_company: Full SEO audit (PageSpeed, on-page, technical)
- save_audit_result: Upsert to company_website_analysis table
- run_audit: Orchestration with progress logging and summary
Features:
- Integrates GooglePageSpeedClient for Lighthouse scores
- Uses OnPageSEOAnalyzer for meta tags, headings, images, links
- Uses TechnicalSEOChecker for robots.txt, sitemap, canonical
- Calculates overall SEO score from weighted components
- CLI support: --company-id, --batch, --all, --dry-run, --json
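The weighted overall score could be computed along these lines. The component names and weights are placeholders chosen for illustration (the commit does not state them), and skipping missing components while renormalizing the remaining weights is an assumption of this sketch.

```python
# Hypothetical weights; the actual values used by SEOAuditor are not given.
WEIGHTS = {"pagespeed": 0.4, "onpage": 0.35, "technical": 0.25}


def overall_seo_score(components):
    """Combine 0-100 component scores into one weighted score, skipping missing ones."""
    weighted_sum = 0.0
    total_weight = 0.0
    for name, weight in WEIGHTS.items():
        score = components.get(name)
        if score is None:
            continue                      # e.g. PageSpeed quota exhausted
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        return None                       # nothing to score
    return round(weighted_sum / total_weight)
```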
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>