nordabiz/docs/architecture/flows/04-seo-audit-flow.md
Maciej Pienczyn 110d971dca
feat: migrate prod docs to OVH VPS + UTC→Warsaw timezone in all templates
Production moved from on-prem VM 249 (10.22.68.249) to OVH VPS
(57.128.200.27, inpi-vps-waw01). Updated ALL documentation, slash
commands, memory files, architecture docs, and deploy procedures.

Added |local_time Jinja filter (UTC→Europe/Warsaw) and converted
155 .strftime() calls across 71 templates so timestamps display
in Polish timezone regardless of server timezone.

Also includes: created_by_id tracking, abort import fix, ICS
calendar fix for missing end times, Pros Poland data cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:41:53 +02:00


SEO Audit Flow

Document Version: 1.0
Last Updated: 2026-01-10
Status: Production LIVE
Flow Type: Admin-Triggered Website SEO Analysis


Overview

This document describes the complete SEO audit flow for the Norda Biznes Partner application, covering:

  • Admin Dashboard (/admin/seo route)
  • Single Company Audit (admin-triggered via UI/API)
  • Batch Audit (script-based for all companies)
  • PageSpeed Insights API Integration for performance metrics
  • On-Page SEO Analysis (meta tags, headings, images, links)
  • Technical SEO Checks (robots.txt, sitemap, canonical URLs)
  • Database Storage in company_website_analysis table
  • Results Display on admin dashboard and company profiles

Key Technology:

  • PageSpeed API: Google PageSpeed Insights (Lighthouse)
  • Analysis Engine: SEOAuditor (scripts/seo_audit.py)
  • On-Page Analyzer: OnPageSEOAnalyzer (scripts/seo_analyzer.py)
  • Technical Checker: TechnicalSEOChecker (scripts/seo_analyzer.py)
  • Database: PostgreSQL (company_website_analysis table)

Key Features:

  • Full website analysis (PageSpeed + On-Page + Technical SEO)
  • Admin dashboard with sortable table and score distribution
  • Color-coded score badges (green 90-100, yellow 50-89, red 0-49)
  • Filtering by category, score range, and company name
  • Single company audit trigger from admin UI
  • Batch audit script for all companies (scripts/seo_audit.py)
  • API quota tracking (25,000 requests/day free tier)

API Costs & Performance:

  • API: Google PageSpeed Insights (Free tier: 25,000 queries/day)
  • Pricing: Free for up to 25,000 requests/day, $5/1000 queries after
  • Typical Audit Time: 5-15 seconds per company
  • Actual Cost: $0.00 (free tier, 80 companies = 80 audits << 25,000 limit)

1. High-Level SEO Audit Flow

1.1 Complete SEO Audit Flow Diagram

flowchart TD
    Admin[Admin User] -->|1. Navigate to /admin/seo| Browser[Browser]
    Browser -->|2. GET /admin/seo| Flask[Flask App<br/>app.py]
    Flask -->|3. Check permissions| AuthCheck{Is Admin?}

    AuthCheck -->|No| Deny[Redirect to dashboard<br/>with error flash]
    AuthCheck -->|Yes| Dashboard[Admin SEO Dashboard<br/>admin_seo_dashboard.html]

    Dashboard -->|4. Render dashboard| Browser
    Browser -->|5. Display stats & table| AdminUI[Admin UI]

    AdminUI -->|6. Click 'Uruchom audyt'<br/>for single company| TriggerSingle[Trigger Single Audit]
    TriggerSingle -->|7. POST /api/seo/audit| Flask

    AdminUI -->|8. Click 'Uruchom audyt'<br/>for batch| TriggerBatch[Trigger Batch Audit]
    TriggerBatch -->|9. Run script| Script[scripts/seo_audit.py]

    Flask -->|10. Verify admin| PermCheck{Is Admin?}
    PermCheck -->|No| Error403[403 Error]
    PermCheck -->|Yes| CreateAuditor[Create SEOAuditor]

    CreateAuditor -->|11. Initialize| Auditor[SEOAuditor<br/>seo_audit.py]
    Auditor -->|12. Fetch page| Website[Company Website]

    Website -->|13. HTML + HTTP status| Auditor
    Auditor -->|14. Analyze HTML| OnPageAnalyzer[OnPageSEOAnalyzer]

    OnPageAnalyzer -->|15. Extract meta tags<br/>headings, images| OnPageResult[On-Page Results]

    Auditor -->|16. Technical checks| TechnicalChecker[TechnicalSEOChecker]
    TechnicalChecker -->|17. Check robots.txt<br/>sitemap, canonical| TechResult[Technical Results]

    Auditor -->|18. Check quota| QuotaCheck{Quota > 0?}
    QuotaCheck -->|No| SkipPageSpeed[Skip PageSpeed]
    QuotaCheck -->|Yes| PageSpeedClient[GooglePageSpeedClient]

    PageSpeedClient -->|19. API call| PageSpeedAPI[Google PageSpeed Insights<br/>Lighthouse]
    PageSpeedAPI -->|20. Scores + CWV| PageSpeedResult[PageSpeed Results]

    PageSpeedResult -->|21. Combine results| Auditor
    OnPageResult -->|22. Combine results| Auditor
    TechResult -->|23. Combine results| Auditor
    SkipPageSpeed -->|24. Combine results| Auditor

    Auditor -->|25. Calculate overall score| ScoreCalc[Score Calculator]
    ScoreCalc -->|26. Overall SEO score| AuditResult[Complete Audit Result]

    AuditResult -->|27. Save to DB| DB[(company_website_analysis)]
    DB -->|28. Saved| Auditor

    Auditor -->|29. Return results| Flask
    Flask -->|30. JSON response| Browser
    Browser -->|31. Reload dashboard| AdminUI

    Script -->|32. Batch process| Auditor
    Script -->|33. For each company| Auditor

    style Auditor fill:#4CAF50
    style PageSpeedClient fill:#2196F3
    style OnPageAnalyzer fill:#FF9800
    style TechnicalChecker fill:#9C27B0
    style DB fill:#E91E63

1.2 Admin Dashboard View Flow

sequenceDiagram
    participant Admin as Admin User
    participant Browser as Browser
    participant Flask as Flask App
    participant DB as PostgreSQL

    Admin->>Browser: Navigate to /admin/seo
    Browser->>Flask: GET /admin/seo
    Flask->>Flask: Check is_admin permission

    alt Not Admin
        Flask-->>Browser: Redirect to dashboard
    else Is Admin
        Flask->>DB: Query companies + SEO analysis
        DB-->>Flask: Companies with scores
        Flask->>Flask: Calculate stats (avg, distribution)
        Flask-->>Browser: Render admin_seo_dashboard.html
        Browser-->>Admin: Display dashboard with stats & table
    end

    Admin->>Browser: Click filter/sort
    Browser->>Browser: Client-side filtering (JavaScript)
    Browser-->>Admin: Updated table view

    Admin->>Browser: Click "Uruchom audyt" for company
    Browser->>Browser: Show confirmation modal
    Admin->>Browser: Confirm audit
    Browser->>Flask: POST /api/seo/audit {slug: "company-slug"}

    Flask->>Flask: Verify admin + rate limit (10/hour)
    Flask->>DB: Find company by slug
    DB-->>Flask: Company record

    Note over Flask,DB: SEO audit process (see next diagram)

    Flask-->>Browser: JSON {success: true, scores: {...}}
    Browser->>Browser: Show success modal
    Browser->>Browser: Reload page after 1.5s
    Browser->>Flask: GET /admin/seo (refresh)
    Flask->>DB: Query companies + updated scores
    DB-->>Flask: Companies with new scores
    Flask-->>Browser: Updated dashboard
    Browser-->>Admin: Display updated scores

2. SEO Audit Process Details

2.1 Single Company Audit Flow

sequenceDiagram
    participant Flask as Flask App
    participant Auditor as SEOAuditor
    participant Web as Company Website
    participant OnPage as OnPageSEOAnalyzer
    participant Tech as TechnicalSEOChecker
    participant PageSpeed as GooglePageSpeedClient
    participant API as PageSpeed Insights API
    participant DB as PostgreSQL

    Flask->>Auditor: audit_company(company_dict)

    Note over Auditor: 1. FETCH PAGE
    Auditor->>Web: HTTP GET website_url
    Web-->>Auditor: HTML content + status (200/404/500)

    Note over Auditor,OnPage: 2. ON-PAGE ANALYSIS
    Auditor->>OnPage: analyze_html(html, base_url)
    OnPage->>OnPage: Extract meta tags (title, description, keywords)
    OnPage->>OnPage: Count headings (H1, H2, H3)
    OnPage->>OnPage: Analyze images (total, alt text)
    OnPage->>OnPage: Count links (internal, external)
    OnPage->>OnPage: Detect structured data (JSON-LD, Schema.org)
    OnPage->>OnPage: Extract Open Graph tags
    OnPage->>OnPage: Extract Twitter Card tags
    OnPage->>OnPage: Count words on homepage
    OnPage-->>Auditor: OnPageSEOResult

    Note over Auditor,Tech: 3. TECHNICAL CHECKS
    Auditor->>Tech: check_url(final_url)
    Tech->>Web: GET /robots.txt
    Web-->>Tech: robots.txt content or 404
    Tech->>Tech: Parse robots.txt (exists, blocks Googlebot)

    Tech->>Web: GET /sitemap.xml
    Web-->>Tech: sitemap.xml content or 404
    Tech->>Tech: Validate XML sitemap

    Tech->>Tech: Check meta robots tags
    Tech->>Tech: Check canonical URL
    Tech->>Tech: Detect redirect chains
    Tech-->>Auditor: TechnicalSEOResult

    Note over Auditor,API: 4. PAGESPEED INSIGHTS
    Auditor->>PageSpeed: Check remaining quota
    PageSpeed-->>Auditor: quota_remaining (e.g., 24,950/25,000)

    alt Quota Available
        Auditor->>PageSpeed: analyze_url(url, strategy=MOBILE)
        PageSpeed->>API: GET runPagespeed?url=...&strategy=mobile
        API->>API: Run Lighthouse audit (5-15 seconds)
        API-->>PageSpeed: Lighthouse results JSON
        PageSpeed->>PageSpeed: Extract scores (0-100)
        PageSpeed->>PageSpeed: Extract Core Web Vitals (LCP, FID, CLS)
        PageSpeed->>PageSpeed: Extract audits (failed checks)
        PageSpeed-->>Auditor: PageSpeedResult
    else No Quota
        Auditor->>Auditor: Skip PageSpeed (save quota)
    end

    Note over Auditor: 5. CALCULATE SCORES
    Auditor->>Auditor: _calculate_onpage_score(onpage)
    Auditor->>Auditor: _calculate_technical_score(technical)
    Auditor->>Auditor: _calculate_overall_score(all_results)

    Note over Auditor: Score weights:
    Note over Auditor: PageSpeed SEO: 3x
    Note over Auditor: PageSpeed Perf: 2x
    Note over Auditor: On-Page: 2x
    Note over Auditor: Technical: 2x

    Note over Auditor,DB: 6. SAVE TO DATABASE
    Auditor->>DB: UPSERT company_website_analysis
    Note over DB: ON CONFLICT (company_id) DO UPDATE
    DB-->>Auditor: Saved successfully

    Auditor-->>Flask: Complete audit result dict
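The TechnicalSEOChecker source is not reproduced here. As a rough sketch, the robots.txt and sitemap.xml checks from step 3 can be run on already-fetched text with the standard library's `urllib.robotparser` and `xml.etree.ElementTree`. The function names are hypothetical. Treating "valid sitemap" as "well-formed XML whose root is `<urlset>` or `<sitemapindex>`" is an assumption, not necessarily the checker's exact rule.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser


def robots_blocks_googlebot(robots_txt: str, path: str = "/") -> bool:
    """True if the given robots.txt content disallows Googlebot from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already fetched; no network I/O
    return not parser.can_fetch("Googlebot", path)


def is_valid_sitemap(xml_text: str) -> bool:
    """True if the text is well-formed XML with a <urlset> or <sitemapindex> root."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    # Root tags carry the sitemap namespace, e.g. "{http://www.sitemaps.org/...}urlset"
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in ("urlset", "sitemapindex")
```

A Googlebot-specific group is honoured even when the `*` group allows everything. In that case `robots_blocks_googlebot("User-agent: Googlebot\nDisallow: /\n\nUser-agent: *\nAllow: /")` returns `True`.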

2.2 Batch Audit Script Flow

flowchart TD
    Start[Start: python seo_audit.py --all] --> Init[Initialize SEOAuditor]
    Init --> GetCompanies[Get companies from DB<br/>ORDER BY id]

    GetCompanies --> Loop{For each company}
    Loop -->|Next company| CheckWebsite{Has website?}

    CheckWebsite -->|No| Skip[Skip: No website]
    Skip --> Loop

    CheckWebsite -->|Yes| CheckQuota{Quota > 0?}
    CheckQuota -->|No| QuotaWarn[Warn: Quota exceeded<br/>Skip PageSpeed]
    QuotaWarn --> AuditPartial[Audit without PageSpeed]

    CheckQuota -->|Yes| AuditFull[Full Audit<br/>PageSpeed + OnPage + Technical]

    AuditPartial --> SaveResult[Save to database]
    AuditFull --> SaveResult

    SaveResult --> UpdateStats[Update summary stats]
    UpdateStats --> Sleep[Sleep 1s<br/>Rate limiting]
    Sleep --> Loop

    Loop -->|Done| PrintSummary[Print Summary Report]

    PrintSummary --> ShowStats[Show score distribution<br/>Failed audits<br/>Quota usage]
    ShowStats --> End[Exit with code]

    style AuditFull fill:#4CAF50
    style AuditPartial fill:#FF9800
    style QuotaWarn fill:#F44336
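The loop in the flowchart can be sketched in Python as follows. This is a minimal sketch, not the code of scripts/seo_audit.py. `run_batch`, `BatchStats`, and the injected `audit_fn` / `quota_fn` / `sleep_fn` callables are hypothetical. They are injected so the control flow (skip without website, partial audit when quota is 0, 1 s pause) can be exercised without the network or the database.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class BatchStats:
    successful: int = 0
    failed: int = 0
    skipped: int = 0            # companies without a website
    pagespeed_skipped: int = 0  # audited without PageSpeed because quota ran out
    errors: List[Tuple[int, str]] = field(default_factory=list)


def run_batch(companies: Iterable[dict],
              audit_fn: Callable[[dict, bool], None],
              quota_fn: Callable[[], int],
              sleep_fn: Callable[[float], None] = time.sleep,
              delay_s: float = 1.0) -> BatchStats:
    stats = BatchStats()
    for company in sorted(companies, key=lambda c: c["id"]):  # ORDER BY id
        if not company.get("website"):
            stats.skipped += 1
            continue
        use_pagespeed = quota_fn() > 0
        if not use_pagespeed:
            stats.pagespeed_skipped += 1  # partial audit: on-page + technical only
        try:
            audit_fn(company, use_pagespeed)  # audits and saves one company
            stats.successful += 1
        except Exception as exc:  # one broken site must not stop the batch
            stats.failed += 1
            stats.errors.append((company["id"], str(exc)))
        sleep_fn(delay_s)  # pause between audits (rate limiting)
    return stats
```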

3. Score Calculation

3.1 Overall SEO Score Formula

The overall SEO score is a weighted average of four components:

Overall Score = (
    (PageSpeed SEO × 3) +
    (PageSpeed Performance × 2) +
    (On-Page Score × 2) +
    (Technical Score × 2)
) / Total Weight    (Total Weight = 3 + 2 + 2 + 2 = 9 when all four scores are available)

Weights:

  • PageSpeed SEO: 3x (most important for search rankings)
  • PageSpeed Performance: 2x (user experience)
  • On-Page Score: 2x (content optimization)
  • Technical Score: 2x (crawlability and indexability)

Score Ranges:

  • 90-100 (Green): Excellent SEO
  • 50-89 (Yellow): Needs improvement
  • 0-49 (Red): Poor SEO
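A minimal Python sketch of the formula and the badge ranges. It assumes that a missing component (for example PageSpeed skipped because the quota ran out) is dropped from both the numerator and the Total Weight. The dictionary keys and function names are illustrative and are not the actual `SEOAuditor._calculate_overall_score`.

```python
from typing import Mapping, Optional

# Weights from the formula above
WEIGHTS = {
    "pagespeed_seo": 3,
    "pagespeed_performance": 2,
    "on_page": 2,
    "technical": 2,
}


def overall_score(scores: Mapping[str, Optional[int]]) -> Optional[int]:
    """Weighted average over the components that actually have a score."""
    available = {k: v for k, v in scores.items() if k in WEIGHTS and v is not None}
    if not available:
        return None  # nothing was measured
    total_weight = sum(WEIGHTS[k] for k in available)
    weighted_sum = sum(v * WEIGHTS[k] for k, v in available.items())
    return round(weighted_sum / total_weight)  # Python round(): ties go to even


def badge_color(score: int) -> str:
    """Map a 0-100 score to the dashboard badge colour."""
    if score >= 90:
        return "green"
    if score >= 50:
        return "yellow"
    return "red"
```

With all four scores 92 / 78 / 85 / 85, the weighted sum is 772 and the result is 772 / 9 ≈ 85.8, which rounds to 86 (yellow). With PageSpeed skipped, the Total Weight drops to 4 and the result is the plain mean of the on-page and technical scores.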

3.2 On-Page Score Calculation

Starting Score: 100 (perfect)

Deductions:

| Issue | Deduction | Check |
|---|---|---|
| Missing meta title | -15 | meta_tags['title'] is empty |
| Title too short/long | -5 | Length < 30 or > 70 characters |
| Missing meta description | -10 | meta_tags['description'] is empty |
| Description too short/long | -5 | Length < 120 or > 160 characters |
| No canonical URL | -5 | meta_tags['canonical_url'] is empty |
| No H1 heading | -10 | headings['h1_count'] == 0 |
| Multiple H1 headings | -5 | headings['h1_count'] > 1 |
| Improper heading hierarchy | -5 | H3 without H2, etc. |
| >50% images missing alt | -10 | images_without_alt / total_images > 0.5 |
| >20% images missing alt | -5 | images_without_alt / total_images > 0.2 |
| No structured data | -5 | No JSON-LD or Schema.org |
| No Open Graph tags | -3 | No og:title |

Example:

# Start from a perfect score
score = 100
# Missing meta description (-10)
# 1 image without alt out of 10 (10% < 20%: no deduction)
# No structured data (-5)
final_score = 100 - 10 - 5  # = 85 (50-89: needs improvement)
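The deduction table can be expressed as a short Python sketch. It assumes the paired rows are exclusive: a title or description is penalised either for being missing or for its length, and the 50% and 20% alt-text rows do not stack. "Improper hierarchy" is simplified here to "H3 present without any H2". The function name and input shapes are hypothetical and loosely follow the Check column.

```python
def onpage_score(meta_tags: dict, headings: dict, images: dict,
                 has_structured_data: bool, has_og_title: bool) -> int:
    score = 100  # start from a perfect score

    title = meta_tags.get("title") or ""
    if not title:
        score -= 15
    elif not 30 <= len(title) <= 70:
        score -= 5

    description = meta_tags.get("description") or ""
    if not description:
        score -= 10
    elif not 120 <= len(description) <= 160:
        score -= 5

    if not meta_tags.get("canonical_url"):
        score -= 5

    h1_count = headings.get("h1_count", 0)
    if h1_count == 0:
        score -= 10
    elif h1_count > 1:
        score -= 5
    # Simplified hierarchy check: H3 headings used without any H2
    if headings.get("h3_count", 0) > 0 and headings.get("h2_count", 0) == 0:
        score -= 5

    total_images = images.get("total_images", 0)
    if total_images:
        missing_ratio = images.get("images_without_alt", 0) / total_images
        if missing_ratio > 0.5:
            score -= 10
        elif missing_ratio > 0.2:
            score -= 5

    if not has_structured_data:
        score -= 5
    if not has_og_title:
        score -= 3
    return max(0, score)  # never report a negative score
```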

3.3 Technical Score Calculation

Starting Score: 100 (perfect)

Deductions:

| Issue | Deduction | Check |
|---|---|---|
| No robots.txt | -10 | robots_txt['exists'] == False |
| Robots blocks Googlebot | -20 | robots_txt['blocks_googlebot'] == True |
| No sitemap.xml | -10 | sitemap['exists'] == False |
| Invalid sitemap XML | -5 | sitemap['is_valid_xml'] == False |
| >3 redirects in chain | -10 | redirect_chain['chain_length'] > 3 |
| >1 redirect | -5 | redirect_chain['chain_length'] > 1 |
| Redirect loop detected | -20 | redirect_chain['has_redirect_loop'] == True |
| Not indexable | -15 | indexability['is_indexable'] == False |
| Canonical to different domain | -10 | Points to external site |

Example:

# Typical site
score = 100
# No robots.txt (-10)
# Has sitemap.xml (no deduction)
# 2 redirects, chain_length = 2 (-5)
# Indexable (no deduction)
final_score = 100 - 10 - 5  # = 85 (50-89: needs improvement)
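A matching Python sketch for the technical table. The following are assumptions, not confirmed behaviour of TechnicalSEOChecker:

- The two redirect-length rows do not stack.
- An invalid sitemap is only penalised when the sitemap exists.
- "Different domain" means a different host name (so `www.example.com` vs `example.com` counts as different).

The function name and arguments are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def technical_score(robots_txt: dict, sitemap: dict, redirect_chain: dict,
                    indexability: dict, canonical_url: Optional[str] = None,
                    final_url: Optional[str] = None) -> int:
    score = 100

    if not robots_txt.get("exists"):
        score -= 10
    if robots_txt.get("blocks_googlebot"):
        score -= 20

    if not sitemap.get("exists"):
        score -= 10
    elif sitemap.get("is_valid_xml") is False:
        score -= 5

    chain_length = redirect_chain.get("chain_length", 0)
    if chain_length > 3:
        score -= 10
    elif chain_length > 1:
        score -= 5
    if redirect_chain.get("has_redirect_loop"):
        score -= 20

    if indexability.get("is_indexable") is False:
        score -= 15

    # Canonical pointing to another host than the page we ended up on
    if canonical_url and final_url:
        if urlparse(canonical_url).netloc.lower() != urlparse(final_url).netloc.lower():
            score -= 10
    return max(0, score)
```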

4. Database Schema

4.1 CompanyWebsiteAnalysis Table

The company_website_analysis table stores comprehensive SEO audit results.

Location: database.py (lines ~429-520)

Key Fields:

CREATE TABLE company_website_analysis (
    -- Identity
    id SERIAL PRIMARY KEY,
    company_id INTEGER REFERENCES companies(id) UNIQUE,
    analyzed_at TIMESTAMP DEFAULT NOW(),

    -- Basic Info
    website_url VARCHAR(500),
    final_url VARCHAR(500),  -- After redirects
    http_status_code INTEGER,
    load_time_ms INTEGER,

    -- PageSpeed Scores (0-100)
    pagespeed_seo_score INTEGER,
    pagespeed_performance_score INTEGER,
    pagespeed_accessibility_score INTEGER,
    pagespeed_best_practices_score INTEGER,
    pagespeed_audits JSONB,  -- Failed Lighthouse audits

    -- On-Page SEO
    meta_title VARCHAR(500),
    meta_description TEXT,
    meta_keywords TEXT,
    h1_count INTEGER,
    h2_count INTEGER,
    h3_count INTEGER,
    h1_text VARCHAR(500),
    total_images INTEGER,
    images_without_alt INTEGER,
    images_with_alt INTEGER,
    internal_links_count INTEGER,
    external_links_count INTEGER,
    broken_links_count INTEGER,
    has_structured_data BOOLEAN,
    structured_data_types TEXT[],  -- ['Organization', 'LocalBusiness']
    structured_data_json JSONB,

    -- Technical SEO
    has_canonical BOOLEAN,
    canonical_url VARCHAR(500),
    is_indexable BOOLEAN,
    noindex_reason VARCHAR(100),
    has_sitemap BOOLEAN,
    has_robots_txt BOOLEAN,
    viewport_configured BOOLEAN,
    is_mobile_friendly BOOLEAN,

    -- Core Web Vitals
    largest_contentful_paint_ms INTEGER,  -- LCP (Good: <2500ms)
    first_input_delay_ms INTEGER,         -- FID (Good: <100ms)
    cumulative_layout_shift NUMERIC(4,2), -- CLS (Good: <0.1)

    -- Open Graph
    has_og_tags BOOLEAN,
    og_title VARCHAR(500),
    og_description TEXT,
    og_image VARCHAR(500),
    has_twitter_cards BOOLEAN,

    -- Language & International
    html_lang VARCHAR(10),
    has_hreflang BOOLEAN,

    -- Word Count
    word_count_homepage INTEGER,

    -- Audit Metadata
    seo_audit_version VARCHAR(20),
    seo_audited_at TIMESTAMP,
    seo_audit_errors TEXT[],
    seo_overall_score INTEGER,
    seo_health_score INTEGER,
    seo_issues JSONB
);

-- Indexes
CREATE INDEX idx_cwa_company_id ON company_website_analysis(company_id);
CREATE INDEX idx_cwa_analyzed_at ON company_website_analysis(analyzed_at);
CREATE INDEX idx_cwa_seo_audited_at ON company_website_analysis(seo_audited_at);

4.2 Upsert Pattern

The audit uses ON CONFLICT DO UPDATE for idempotent saves:

INSERT INTO company_website_analysis (
    company_id, analyzed_at, website_url, ...
) VALUES (
    :company_id, :analyzed_at, :website_url, ...
)
ON CONFLICT (company_id) DO UPDATE SET
    analyzed_at = EXCLUDED.analyzed_at,
    website_url = EXCLUDED.website_url,
    pagespeed_seo_score = EXCLUDED.pagespeed_seo_score,
    -- ... all fields updated
    seo_audited_at = EXCLUDED.seo_audited_at;

Benefits:

  • Safe to run multiple times (idempotent)
  • Always keeps latest audit results
  • No duplicate records
  • Atomic operation (transaction-safe)
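To see the idempotency claim in action without a PostgreSQL server, the sketch below runs the same `ON CONFLICT (company_id) DO UPDATE ... EXCLUDED` statement against an in-memory SQLite database. SQLite 3.24+ accepts this syntax for the subset used here. The table is trimmed to four columns and the function name is hypothetical. Production still uses PostgreSQL with the full column list.

```python
import sqlite3

# Same ON CONFLICT ... DO UPDATE shape as the PostgreSQL statement above,
# trimmed to four columns for the demo.
UPSERT_SQL = """
INSERT INTO company_website_analysis
    (company_id, analyzed_at, website_url, pagespeed_seo_score)
VALUES (:company_id, :analyzed_at, :website_url, :pagespeed_seo_score)
ON CONFLICT (company_id) DO UPDATE SET
    analyzed_at = EXCLUDED.analyzed_at,
    website_url = EXCLUDED.website_url,
    pagespeed_seo_score = EXCLUDED.pagespeed_seo_score
"""


def upsert_demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE company_website_analysis (
               id INTEGER PRIMARY KEY,
               company_id INTEGER UNIQUE,
               analyzed_at TEXT,
               website_url TEXT,
               pagespeed_seo_score INTEGER)"""
    )
    # Two audits of the same company: the second one overwrites the first.
    for analyzed_at, score in [("2026-01-09T10:00:00", 80), ("2026-01-10T10:30:00", 92)]:
        with conn:  # each upsert commits in its own transaction
            conn.execute(UPSERT_SQL, {
                "company_id": 26,
                "analyzed_at": analyzed_at,
                "website_url": "https://pixlab.pl",
                "pagespeed_seo_score": score,
            })
    (row_count,) = conn.execute("SELECT COUNT(*) FROM company_website_analysis").fetchone()
    latest = conn.execute(
        "SELECT analyzed_at, pagespeed_seo_score FROM company_website_analysis"
    ).fetchone()
    conn.close()
    return (row_count, *latest)
```

`upsert_demo()` ends with one row holding the second audit's timestamp and score. Re-running an audit replaces the row instead of duplicating it.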

5. API Endpoints

5.1 Admin SEO Dashboard

Route: GET /admin/seo
Authentication: Required (Admin only)
Location: app.py lines 4093-4192

Purpose: Display SEO metrics dashboard for all companies

Query Parameters:

  • company (optional): Company slug to highlight/filter

Response: HTML (admin_seo_dashboard.html template)

Dashboard Features:

  • Summary stats (score distribution, average, not audited count)
  • Sortable table by name, category, scores, date
  • Filters by category, score range, company name
  • Color-coded score badges
  • Last audit date with staleness indicator
  • Actions: view profile, trigger single audit

Access Control:

if not current_user.is_admin:
    flash('Brak uprawnień do tej strony.', 'error')
    return redirect(url_for('dashboard'))

5.2 Get SEO Audit Results (Read)

Route: GET /api/seo/audit
Authentication: Not required (public API)
Location: app.py lines 3870-3914

Purpose: Retrieve existing SEO audit results for a company

Query Parameters:

  • company_id (integer): Company ID
  • slug (string): Company slug

Response:

{
  "company_id": 26,
  "company_name": "PIXLAB Sp. z o.o.",
  "company_slug": "pixlab-sp-z-o-o",
  "website": "https://pixlab.pl",
  "pagespeed": {
    "seo_score": 92,
    "performance_score": 78,
    "accessibility_score": 95,
    "best_practices_score": 88,
    "audits": {...}
  },
  "on_page": {
    "meta_title": "PIXLAB - Oprogramowanie na miarę",
    "meta_description": "Tworzymy dedykowane oprogramowanie...",
    "h1_count": 1,
    "total_images": 12,
    "images_without_alt": 0,
    "has_structured_data": true
  },
  "technical": {
    "has_robots_txt": true,
    "has_sitemap": true,
    "is_indexable": true,
    "is_mobile_friendly": true
  },
  "overall_score": 88,
  "audited_at": "2026-01-10T10:30:00"
}

5.3 Trigger SEO Audit (Write)

Route: POST /api/seo/audit
Authentication: Required (Admin only)
Rate Limit: 10 requests per hour per user
Location: app.py lines 3943-4086

Purpose: Trigger a new SEO audit for a company

Request Body:

{
  "company_id": 26,
  "slug": "pixlab-sp-z-o-o"
}

Response (Success):

{
  "success": true,
  "message": "Audyt SEO dla firmy \"PIXLAB Sp. z o.o.\" został zakończony pomyślnie.",
  "audit_version": "1.0.0",
  "triggered_by": "admin@nordabiznes.pl",
  "triggered_at": "2026-01-10T10:35:00",
  "company_id": 26,
  "company_name": "PIXLAB Sp. z o.o.",
  "pagespeed": {...},
  "on_page": {...},
  "technical": {...},
  "overall_score": 88
}

Response (Error - No Website):

{
  "success": false,
  "error": "Firma \"PIXLAB Sp. z o.o.\" nie ma zdefiniowanej strony internetowej.",
  "company_id": 26,
  "company_name": "PIXLAB Sp. z o.o."
}

Response (Error - Quota Exceeded):

{
  "success": false,
  "error": "PageSpeed API quota exceeded. Try again tomorrow.",
  "company_id": 26
}

Access Control:

if not current_user.is_admin:
    return jsonify({
        'success': False,
        'error': 'Brak uprawnień. Tylko administrator może uruchamiać audyty SEO.'
    }), 403

Rate Limiting:

@limiter.limit("10 per hour")

6. PageSpeed Insights API Integration

6.1 API Configuration

Service File: scripts/pagespeed_client.py

Endpoint: https://www.googleapis.com/pagespeedonline/v5/runPagespeed

Authentication: API Key (GOOGLE_PAGESPEED_API_KEY)

Free Tier:

  • 25,000 queries per day
  • $5 per 1,000 queries after free tier

API Key:

  • Name in Google Cloud: "Page SPEED SEO Audit v2"
  • Project: NORDABIZNES (gen-lang-client-0540794446)
  • Storage: .env file (GOOGLE_PAGESPEED_API_KEY)

6.2 API Request

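import os
import requests

# Key stored in .env; assumed to be loaded into the process environment
GOOGLE_PAGESPEED_API_KEY = os.environ['GOOGLE_PAGESPEED_API_KEY']

params = {
    'url': 'https://example.com',
    'key': GOOGLE_PAGESPEED_API_KEY,
    'strategy': 'mobile',  # or 'desktop'
    # requests sends a list as repeated parameters: category=performance&category=...
    'category': ['performance', 'accessibility', 'best-practices', 'seo']
}

response = requests.get(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed',
    params=params,
    timeout=30
)
response.raise_for_status()  # raise on HTTP errors (e.g. invalid key, exhausted quota)
data = response.json()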

6.3 API Response Structure

{
  "lighthouseResult": {
    "categories": {
      "performance": {"score": 0.78},
      "accessibility": {"score": 0.95},
      "best-practices": {"score": 0.88},
      "seo": {"score": 0.92}
    },
    "audits": {
      "largest-contentful-paint": {"numericValue": 2300},
      "first-input-delay": {"numericValue": 85},
      "cumulative-layout-shift": {"numericValue": 0.05},
      "meta-description": {"score": 1.0},
      "robots-txt": {"score": 1.0},
      "is-crawlable": {"score": 1.0}
    }
  },
  "loadingExperience": {
    "metrics": {
      "LARGEST_CONTENTFUL_PAINT_MS": {"category": "FAST"},
      "FIRST_INPUT_DELAY_MS": {"category": "FAST"},
      "CUMULATIVE_LAYOUT_SHIFT_SCORE": {"category": "FAST"}
    }
  }
}
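Category scores arrive as 0-1 floats, while the database stores 0-100 integers. A minimal parsing sketch, assuming the response shape shown above, is below. The function name is hypothetical, and the output keys are chosen to match the company_website_analysis column names. Missing categories or audits (or a `null` score) become `None` instead of raising. The `first-input-delay` key simply follows the sample above and may be absent in real responses.

```python
from typing import Optional


def parse_pagespeed_response(data: dict) -> dict:
    lighthouse = data.get("lighthouseResult", {})
    categories = lighthouse.get("categories", {})
    audits = lighthouse.get("audits", {})

    def category_score(key: str) -> Optional[int]:
        raw = categories.get(key, {}).get("score")  # 0.0-1.0 or None
        return None if raw is None else round(raw * 100)

    def audit_value(key: str) -> Optional[float]:
        return audits.get(key, {}).get("numericValue")

    lcp = audit_value("largest-contentful-paint")
    fid = audit_value("first-input-delay")
    cls = audit_value("cumulative-layout-shift")
    return {
        "pagespeed_performance_score": category_score("performance"),
        "pagespeed_accessibility_score": category_score("accessibility"),
        "pagespeed_best_practices_score": category_score("best-practices"),
        "pagespeed_seo_score": category_score("seo"),
        "largest_contentful_paint_ms": None if lcp is None else round(lcp),
        "first_input_delay_ms": None if fid is None else round(fid),
        # Column is NUMERIC(4,2), so keep two decimals
        "cumulative_layout_shift": None if cls is None else round(cls, 2),
    }
```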

6.4 Quota Management

Quota Tracking:

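from __future__ import annotations

from datetime import date


class QuotaExceededError(Exception):
    """Raised when the local daily PageSpeed counter is exhausted."""


class GooglePageSpeedClient:
    def __init__(self):
        self.daily_quota = 25000
        self.used_today = 0  # in-memory: every new process starts from 0
        self._quota_day = date.today()  # day the counter belongs to

    def get_remaining_quota(self) -> int:
        """Returns remaining API quota for today."""
        if date.today() != self._quota_day:
            # New local day: reset the counter. Google's own daily quota
            # resets at midnight Pacific Time, so this is an approximation.
            self._quota_day = date.today()
            self.used_today = 0
        return max(0, self.daily_quota - self.used_today)

    def analyze_url(self, url: str) -> PageSpeedResult:
        if self.get_remaining_quota() <= 0:
            raise QuotaExceededError("Daily quota exceeded")

        response = self._call_api(url)  # HTTP call helper (omitted here)
        self.used_today += 1

        return self._parse_response(response)  # parsing helper (omitted here)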

Quota Exceeded Handling:

  1. Check quota before audit: if quota > 0
  2. If exceeded, skip PageSpeed but continue on-page/technical
  3. Log warning: "PageSpeed quota exceeded, skipping"
  4. Return partial audit result (no PageSpeed scores)

7. SEO Audit Script Usage

7.1 Command Line Interface

Script Location: scripts/seo_audit.py

Basic Usage:

# Audit single company by ID
python seo_audit.py --company-id 26

# Audit single company by slug
python seo_audit.py --company-slug pixlab-sp-z-o-o

# Audit batch of companies (rows 1-10)
python seo_audit.py --batch 1-10

# Audit all companies
python seo_audit.py --all

# Dry run (no database writes)
python seo_audit.py --company-id 26 --dry-run

# Export results to JSON
python seo_audit.py --all --json > seo_report.json

Options:

  • --company-id ID: Audit single company by ID
  • --company-slug SLUG: Audit single company by slug
  • --company-ids IDS: Audit multiple companies (comma-separated: 1,5,10)
  • --batch RANGE: Audit batch by row offset (e.g., 1-10)
  • --all: Audit all companies
  • --dry-run: Print results without saving to database
  • --verbose, -v: Enable verbose/debug output
  • --quiet, -q: Suppress progress output (only summary)
  • --json: Output results as JSON
  • --database-url URL: Override DATABASE_URL env var

7.2 Exit Codes

| Code | Meaning |
|---|---|
| 0 | All audits completed successfully |
| 1 | Argument error or invalid input |
| 2 | Partial failures (some audits failed) |
| 3 | All audits failed |
| 4 | Database connection error |
| 5 | API quota exceeded |
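A sketch of how the codes could be represented and how a batch's outcome maps to 0, 2 or 3. It assumes that companies skipped for lack of a website do not count as failures; the sample summary in 7.3 lists them separately as "Skipped". Codes 1, 4 and 5 come from argument parsing, the database connection and the quota check, so they are only enumerated here. `ExitCode` and `batch_exit_code` are hypothetical names.

```python
import sys
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    ARGUMENT_ERROR = 1
    PARTIAL_FAILURE = 2
    ALL_FAILED = 3
    DATABASE_ERROR = 4
    QUOTA_EXCEEDED = 5


def batch_exit_code(successful: int, failed: int) -> ExitCode:
    """Exit code for a batch that connected to the DB and ran its audits."""
    if failed == 0:
        return ExitCode.SUCCESS  # includes "everything skipped"
    if successful == 0:
        return ExitCode.ALL_FAILED
    return ExitCode.PARTIAL_FAILURE


if __name__ == "__main__":
    # e.g. the sample run in 7.3: 72 successful, 5 failed -> exit status 2
    sys.exit(batch_exit_code(successful=72, failed=5))
```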

7.3 Batch Audit Output

============================================================
                    SEO AUDIT STARTING
============================================================
Companies to audit: 80
Mode: LIVE
PageSpeed API quota remaining: 24,950
============================================================

[1/80] PIXLAB Sp. z o.o. (ID: 26) - ETA: calculating...
  Fetching page: https://pixlab.pl
  Page fetched successfully (850ms)
  Running on-page SEO analysis...
  On-page analysis complete
  Running technical SEO checks...
  Technical checks complete
  Running PageSpeed Insights (quota: 24,949)...
  PageSpeed complete - SEO: 92, Perf: 78
  Saved SEO audit for company 26
  → SUCCESS: Overall SEO score: 88

[2/80] Hotel SPA Wieniawa (ID: 15) - ETA: 00:15:30
  Fetching page: https://wieniawa.pl
  ...

======================================================================
                        SEO AUDIT COMPLETE
======================================================================

  Mode:                  LIVE
  Duration:              00:18:45

----------------------------------------------------------------------
  RESULTS BREAKDOWN
----------------------------------------------------------------------
  Total companies:       80
  ✓ Successful:          72
  ✗ Failed:              5
  ○ Skipped:             3

    - No website:        3
    - Unavailable:       2
    - Timeout:           2
    - SSL errors:        1

----------------------------------------------------------------------
  PAGESPEED API QUOTA
----------------------------------------------------------------------
  Quota at start:        24,950
  Quota used:            72
  Quota remaining:       24,878

----------------------------------------------------------------------
  SEO SCORE DISTRIBUTION
----------------------------------------------------------------------
  Companies with scores: 72
  Average SEO score:     76.3
  Highest score:         95
  Lowest score:          42

  Excellent (90-100): 18  ██████████████░░░░░░░░░░░░░░░░░░
  Good      (70-89):  38  ████████████████████████████████
  Fair      (50-69):  12  ████████░░░░░░░░░░░░░░░░░░░░░░░░
  Poor      (<50):    4   ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

----------------------------------------------------------------------
  FAILED AUDITS
----------------------------------------------------------------------
  🔴 Firma ABC - HTTP 404
  ⏱ Firma XYZ - Timeout after 30s
  🔌 Firma DEF - Connection refused

======================================================================

7.4 Production Deployment

On the production server (OVH VPS inpi-vps-waw01, 57.128.200.27):

# Connect to server
ssh maciejpi@57.128.200.27

# Navigate to application directory
cd /var/www/nordabiznes

# Activate virtual environment
source venv/bin/activate

# Run audit for all companies (production database)
cd scripts
python seo_audit.py --all

# Run audit for specific company
python seo_audit.py --company-id 26

# Dry run to test without saving
python seo_audit.py --all --dry-run

# Export results to JSON
python seo_audit.py --all --json > ~/seo_audit_$(date +%Y%m%d).json

IMPORTANT - Database Connection: Scripts in scripts/ must connect to PostgreSQL via localhost (127.0.0.1). The password is omitted below; take it from the DATABASE_URL environment variable on the server.

# CORRECT:
DATABASE_URL = 'postgresql://nordabiz_app:<password>@127.0.0.1:5432/nordabiz'

# WRONG (PostgreSQL doesn't accept external connections):
DATABASE_URL = 'postgresql://nordabiz_app:<password>@57.128.200.27:5432/nordabiz'

7.5 Cron Job (Automated Audits)

Schedule weekly audit:

# Edit crontab
crontab -e

# Add weekly audit (Sundays at 02:00 server time)
0 2 * * 0 cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/seo_audit.py --all >> /var/log/nordabiznes/seo_audit.log 2>&1

Benefits:

  • Automatic SEO monitoring
  • Detect score degradation
  • Track improvements over time
  • Email alerts on failures (future)

8. Security & Performance

8.1 Security Features

1. Admin-Only Access:

if not current_user.is_admin:
    return jsonify({'error': 'Brak uprawnień'}), 403

2. Rate Limiting:

@limiter.limit("10 per hour")
  • Prevents API abuse
  • Protects PageSpeed quota
  • Per-user rate limit

3. CSRF Protection:

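fetch('/api/seo/audit', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        'X-CSRFToken': csrfToken  // CSRF token embedded in the page
    },
    body: JSON.stringify({ slug: companySlug })
})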

4. Input Validation:

if not company_id and not slug:
    return jsonify({'error': 'Podaj company_id lub slug'}), 400

5. Database Permissions:

GRANT ALL ON TABLE company_website_analysis TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE company_website_analysis_id_seq TO nordabiz_app;

8.2 Performance Optimizations

1. Upsert Instead of Insert:

  • ON CONFLICT DO UPDATE (idempotent)
  • No duplicate records
  • Safe to re-run audits

2. Database Indexing:

CREATE INDEX idx_cwa_company_id ON company_website_analysis(company_id);
CREATE INDEX idx_cwa_seo_audited_at ON company_website_analysis(seo_audited_at);

3. Batch Processing:

  • Process companies sequentially
  • Sleep 1s between audits (rate limiting)
  • Skip companies without websites

4. API Quota Management:

  • Check quota before calling PageSpeed
  • Skip PageSpeed if quota low
  • Continue with on-page/technical only

5. Timeout Handling:

response = requests.get(url, timeout=30)
  • Prevents hanging requests
  • Falls back gracefully

6. Caching (Future):

  • Cache PageSpeed results for 7 days
  • Skip re-audit if recent (<7 days old)
  • Force refresh option for admins
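The planned cache rule boils down to a freshness check on the last audit timestamp. The sketch below is a hypothetical helper for that future feature, not existing code. Both datetimes must be either naive or timezone-aware, because Python raises `TypeError` when mixing them (`seo_audited_at` is a plain TIMESTAMP in the schema).

```python
from datetime import datetime, timedelta
from typing import Optional


def should_reaudit(last_audited_at: Optional[datetime], now: datetime,
                   max_age: timedelta = timedelta(days=7),
                   force: bool = False) -> bool:
    """True if a company needs a new audit under the planned cache rule."""
    if force or last_audited_at is None:
        return True  # admin-forced refresh, or never audited
    return now - last_audited_at >= max_age
```

Passing `max_age=timedelta(days=30)` gives the ">30 days" stale-data rule used for re-audits in section 10.2.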

9. Error Handling

9.1 Common Errors

1. No Website URL:

{
  "success": false,
  "error": "Firma \"ABC\" nie ma zdefiniowanej strony internetowej.",
  "company_id": 15
}

2. Website Unreachable:

{
  "success": false,
  "error": "Audyt nie powiódł się: HTTP 404, Timeout after 30s",
  "company_id": 26
}

3. SSL Certificate Error:

⚠ SSL error for https://example.com
Trying HTTP fallback: http://example.com
✓ Fallback successful

4. PageSpeed API Quota Exceeded:

{
  "success": false,
  "error": "PageSpeed API quota exceeded. Try again tomorrow."
}

5. Database Connection Error:

❌ Error: Database connection failed: connection refused
Exit code: 4

9.2 Error Recovery

1. SSL Errors → HTTP Fallback:

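import requests

try:
    response = requests.get(https_url, timeout=30)
except requests.exceptions.SSLError:
    # Certificate problem: retry the same address over plain HTTP
    http_url = https_url.replace('https://', 'http://', 1)
    response = requests.get(http_url, timeout=30)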

2. Timeout → Skip Company:

try:
    response = requests.get(url, timeout=30)
except requests.exceptions.Timeout:
    result['errors'].append('Timeout after 30s')
    # Continue to next company

3. Quota Exceeded → Skip PageSpeed:

if quota_remaining > 0:
    run_pagespeed_audit()
else:
    logger.warning("Quota exceeded, skipping PageSpeed")
    # Continue with on-page/technical only

4. Database Error → Rollback:

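from sqlalchemy.exc import SQLAlchemyError

try:
    db.execute(query)
    db.commit()
except SQLAlchemyError as e:
    db.rollback()  # undo the failed transaction so the session stays usable
    logger.error(f"Database error: {e}")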

10. Monitoring & Maintenance

10.1 Health Checks

Check SEO Audit Status:

# Check latest audit dates
psql -U nordabiz_app -d nordabiz -c "
SELECT
    c.name,
    cwa.seo_audited_at,
    cwa.pagespeed_seo_score,
    cwa.seo_overall_score
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
ORDER BY cwa.seo_audited_at DESC NULLS LAST
LIMIT 10;
"

Check Quota Usage:

# Companies audited today (approximates PageSpeed calls: one row per company, so a company re-audited the same day counts once)
psql -U nordabiz_app -d nordabiz -c "
SELECT COUNT(*) AS audits_today
FROM company_website_analysis
WHERE seo_audited_at >= CURRENT_DATE;
"

Check Failed Audits:

# Companies with no SEO data
psql -U nordabiz_app -d nordabiz -c "
SELECT c.id, c.name, c.website
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
  AND c.website IS NOT NULL
  AND cwa.id IS NULL;
"

10.2 Maintenance Tasks

1. Re-audit Stale Data (>30 days):

python seo_audit.py --all --filter-stale 30

2. Audit New Companies:

# Companies added in last 7 days
python seo_audit.py --filter-new 7

3. Fix Failed Audits:

# Re-audit companies with errors
python seo_audit.py --retry-failed

4. Clean Old Data:

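-- Delete audit results older than 90 days (keeps the latest row per company).
-- Only has an effect if audit history is stored: with UNIQUE(company_id)
-- there is exactly one row per company, so this currently deletes nothing.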
DELETE FROM company_website_analysis
WHERE analyzed_at < NOW() - INTERVAL '90 days'
  AND id NOT IN (
      SELECT DISTINCT ON (company_id) id
      FROM company_website_analysis
      ORDER BY company_id, analyzed_at DESC
  );

10.3 Monitoring Queries

Score Distribution:

SELECT
    CASE
        WHEN pagespeed_seo_score >= 90 THEN 'Excellent (90-100)'
        WHEN pagespeed_seo_score >= 50 THEN 'Needs improvement (50-89)'
        WHEN pagespeed_seo_score >= 0 THEN 'Poor (0-49)'
        ELSE 'Not Audited'
    END AS score_range,
    COUNT(*) AS companies
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
GROUP BY score_range
ORDER BY MIN(cwa.pagespeed_seo_score) DESC NULLS LAST;  -- best band first, 'Not Audited' last
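The CASE expression is evaluated top to bottom, so each score falls into the first band whose lower bound it meets. A NULL score (no PageSpeed result) falls through to 'Not Audited'. The sketch below mirrors the same banding in Python, for example for a reporting script. `score_band` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional


def score_band(score: Optional[int]) -> str:
    """Return the same band label as the SQL CASE for pagespeed_seo_score."""
    if score is not None:
        if score >= 90:
            return "Excellent (90-100)"
        if score >= 50:
            return "Good (50-89)"
        if score >= 0:
            return "Poor (0-49)"
    return "Not Audited"  # NULL (or out-of-range) scores, like the CASE's ELSE


counts = Counter(score_band(s) for s in [95, 72, 30, None, 90])
```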

Top/Bottom Performers:

-- Top 10 SEO scores
SELECT c.name, cwa.pagespeed_seo_score, cwa.seo_overall_score
FROM companies c
JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active' AND cwa.seo_overall_score IS NOT NULL
ORDER BY cwa.seo_overall_score DESC
LIMIT 10;

-- Bottom 10 SEO scores
SELECT c.name, cwa.pagespeed_seo_score, cwa.seo_overall_score
FROM companies c
JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active' AND cwa.seo_overall_score IS NOT NULL
ORDER BY cwa.seo_overall_score ASC
LIMIT 10;

Audit Coverage:

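-- DISTINCT keeps the ratio correct if a company has several analysis rows; NULLIF avoids division by zero
SELECT
    COUNT(DISTINCT c.id) AS total_companies,
    COUNT(DISTINCT cwa.company_id) AS audited_companies,
    ROUND(COUNT(DISTINCT cwa.company_id)::NUMERIC / NULLIF(COUNT(DISTINCT c.id), 0) * 100, 1) AS coverage_percent
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active' AND c.website IS NOT NULL;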

11. Future Enhancements

11.1 Planned Features

1. Automated Re-Audit Scheduling:

  • Weekly cron job for all companies
  • Priority queue for low-scoring sites
  • Email alerts for score drops
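One possible shape for the "priority queue for low-scoring sites" is a min-heap keyed on the last SEO score, with never-audited companies first. This is a sketch of a planned feature, not existing code. `audit_order` and its input shape are hypothetical.

```python
import heapq
from typing import Optional


def audit_order(scores: dict[int, Optional[int]]) -> list[int]:
    """Return company IDs in re-audit order: never audited first, then lowest score."""
    heap: list[tuple[int, int]] = []
    for company_id, score in scores.items():
        # -1 sorts never-audited companies ahead of any real score (0-100);
        # ties are broken by company_id, the second tuple element
        priority = -1 if score is None else score
        heapq.heappush(heap, (priority, company_id))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


print(audit_order({26: 92, 27: 41, 28: None, 29: 67}))  # → [28, 27, 29, 26]
```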

2. Historical Trend Tracking:

  • Store audit history (not just latest)
  • Chart score changes over time
  • Identify improving/declining sites
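Once history is stored, a site can be classed as improving or declining by comparing its latest score with its earlier ones. The sketch below assumes one simple rule: the latest score against the mean of the previous scores, with a 5-point threshold to ignore noise. The rule, the threshold, and `trend` itself are hypothetical choices.

```python
from statistics import mean


def trend(history: list[int], threshold: float = 5.0) -> str:
    """Classify a chronological list of SEO scores (oldest first)."""
    if len(history) < 2:
        return "insufficient data"
    delta = history[-1] - mean(history[:-1])
    if delta >= threshold:
        return "improving"
    if delta <= -threshold:
        return "declining"
    return "stable"


print(trend([60, 65, 80]))  # → improving
```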

3. Competitor Benchmarking:

  • Compare scores within categories
  • Identify SEO leaders
  • Best practice recommendations

4. SEO Report Generation:

  • PDF reports for company owners
  • Actionable recommendations
  • Step-by-step fix guides

5. Integration with Company Profiles:

  • Display SEO badge on company page
  • Show top SEO issues
  • Link to audit details

6. Mobile vs Desktop Audits:

  • Separate scores for mobile/desktop
  • Mobile-first optimization tracking
  • Device-specific recommendations
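The PageSpeed Insights API v5 already accepts the device as a `strategy` query parameter (`mobile` or `desktop`). Unless other categories are requested with repeated `category` parameters, it runs only the Performance category. The sketch below builds one request URL per strategy. The `psi_urls` helper is hypothetical, and the API key is a placeholder.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def psi_urls(site_url: str, api_key: str) -> dict[str, str]:
    """Build one PageSpeed Insights request URL per device strategy."""
    urls = {}
    for strategy in ("mobile", "desktop"):
        params = [
            ("url", site_url),
            ("key", api_key),
            ("strategy", strategy),
            # Without explicit categories the API runs only PERFORMANCE
            ("category", "PERFORMANCE"),
            ("category", "SEO"),
        ]
        urls[strategy] = f"{PSI_ENDPOINT}?{urlencode(params)}"
    return urls


for strategy, url in psi_urls("https://pixlab.pl", "YOUR_API_KEY").items():
    print(strategy, url)
```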

11.2 Technical Improvements

1. Async Batch Processing:

  • Celery background tasks
  • Parallel audits (5 concurrent)
  • Real-time progress updates
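Before Celery is introduced, the "5 concurrent audits" limit can be illustrated with a standard-library thread pool. Threads suit this work because an audit mostly waits on HTTP. This swaps in `concurrent.futures` for Celery, and it is not the planned implementation. `run_batch` is hypothetical, the `audit_company` argument stands in for the real per-company audit, and the `on_progress` callback stands in for real-time updates.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional


def run_batch(company_ids: list[int], audit_company: Callable[[int], object],
              max_workers: int = 5,
              on_progress: Optional[Callable[[int, int], None]] = None
              ) -> dict[int, object]:
    """Audit companies with at most `max_workers` audits running at once.

    Each result is the audit's return value, or the exception it raised.
    """
    results: dict[int, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(audit_company, cid): cid for cid in company_ids}
        for done, future in enumerate(as_completed(futures), start=1):
            cid = futures[future]
            try:
                results[cid] = future.result()
            except Exception as e:  # one failing site must not stop the batch
                results[cid] = e
            if on_progress is not None:
                on_progress(done, len(company_ids))
    return results
```

For example, `run_batch(ids, audit_fn, on_progress=lambda d, t: print(f"{d}/{t}"))` prints a progress line as each audit finishes.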

2. API Webhook Notifications:

  • Notify company owners of audit results
  • Integration with Slack/Discord
  • Email summaries

3. Advanced Caching:

  • Cache PageSpeed results for 7 days
  • Skip re-audit if recent
  • Force refresh button for admins
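The three caching rules (7-day reuse, skip if recent, admin force refresh) reduce to a single decision. The sketch below is written against an assumed "last fetched" timestamp per company. `needs_pagespeed_call` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

CACHE_TTL = timedelta(days=7)


def needs_pagespeed_call(last_fetched: Optional[datetime], now: datetime,
                         force_refresh: bool = False,
                         ttl: timedelta = CACHE_TTL) -> bool:
    """Decide whether to call PageSpeed again or reuse the cached result."""
    if force_refresh:         # admin "force refresh" button
        return True
    if last_fetched is None:  # nothing cached yet
        return True
    return now - last_fetched >= ttl
```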

4. Audit Scheduling:

  • Per-company audit frequency
  • High-priority companies daily
  • Low-priority weekly
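Per-company frequency could be modelled as a priority default that an individual company can override. The sketch below computes when a company is next due. The priority names, the `DEFAULT_INTERVALS` mapping, and `next_audit_at` are hypothetical; the interval values follow the plan above.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical priority levels with the daily/weekly intervals from the plan
DEFAULT_INTERVALS = {"high": timedelta(days=1), "low": timedelta(weeks=1)}


def next_audit_at(last_audit: Optional[datetime], priority: str,
                  custom_interval: Optional[timedelta] = None,
                  now: Optional[datetime] = None) -> datetime:
    """Return when a company should next be audited.

    A per-company custom_interval overrides the priority default;
    a company that was never audited is due immediately.
    """
    if last_audit is None:
        return now if now is not None else datetime.now()
    interval = custom_interval if custom_interval is not None else DEFAULT_INTERVALS[priority]
    return last_audit + interval
```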

12. Troubleshooting

12.1 Common Issues

Issue: "PageSpeed API quota exceeded"
Solution: Wait for the daily quota reset (Google API daily quotas reset at midnight Pacific Time), or request a higher quota for the PageSpeed Insights API in Google Cloud Console.

Issue: "Database connection failed"
Solution: Check that PostgreSQL is running: systemctl status postgresql

Issue: "SSL certificate verify failed"
Solution: The script automatically falls back to HTTP.

Issue: "Company has no website URL"
Solution: Add the website in the company edit form, or skip the company.

Issue: "Timeout after 30s"
Solution: The website is slow or down; skip it or retry later.
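The HTTPS-to-HTTP fallback for certificate errors and the 30-second timeout listed above could look roughly like this with the standard library. This is an illustrative sketch, not the SEOAuditor implementation. `fetch_with_http_fallback`, the host-only input, and the injectable `fetch` parameter (which lets the logic be tested offline) are assumptions.

```python
import ssl
import urllib.error
import urllib.request
from typing import Callable

TIMEOUT_S = 30


def _default_fetch(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
        return resp.read()


def _is_cert_error(exc: BaseException) -> bool:
    # urlopen wraps certificate failures in URLError(reason=SSLCertVerificationError)
    if isinstance(exc, ssl.SSLCertVerificationError):
        return True
    return isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, ssl.SSLCertVerificationError)


def fetch_with_http_fallback(host: str,
                             fetch: Callable[[str], bytes] = _default_fetch
                             ) -> tuple[str, bytes]:
    """Try HTTPS first; retry over plain HTTP only on a certificate error."""
    https_url = f"https://{host}"
    try:
        return https_url, fetch(https_url)
    except OSError as exc:  # URLError, SSL errors and timeouts are all OSError
        if not _is_cert_error(exc):
            raise  # timeouts and other failures propagate: skip or retry later
    http_url = f"http://{host}"
    return http_url, fetch(http_url)
```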

12.2 Debugging

Enable Verbose Logging:

python seo_audit.py --all --verbose

Check API Key:

test -n "$GOOGLE_PAGESPEED_API_KEY" && echo "API key is set" || echo "API key is NOT set"
# Checks the variable without printing the secret itself

Test Single Company:

python seo_audit.py --company-id 26 --dry-run
# See full audit output without saving

Check Database Connection:

psql -U nordabiz_app -d nordabiz -h 127.0.0.1 -c "SELECT COUNT(*) FROM companies;"

Test PageSpeed API:

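curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://pixlab.pl&key=YOUR_API_KEY&strategy=mobile&category=PERFORMANCE&category=SEO"
# Without category parameters the API runs only the Performance category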
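In the JSON this call returns, each Lighthouse category appears under `lighthouseResult.categories`. Its `score` is a fraction from 0 to 1 and can be null if the category could not be scored. Assuming the 0-100 `pagespeed_seo_score` values come from scaling that fraction, the sketch below parses a saved response. `category_scores` is a hypothetical helper.

```python
import json
from typing import Optional


def category_scores(psi_response: dict) -> dict[str, Optional[int]]:
    """Convert Lighthouse 0-1 category scores to the 0-100 scale."""
    categories = psi_response.get("lighthouseResult", {}).get("categories", {})
    scores: dict[str, Optional[int]] = {}
    for name, category in categories.items():
        raw = category.get("score")
        # A null score means Lighthouse could not score this category
        scores[name] = None if raw is None else round(raw * 100)
    return scores


sample = json.loads(
    '{"lighthouseResult": {"categories": '
    '{"performance": {"score": 0.58}, "seo": {"score": 0.92}}}}'
)
print(category_scores(sample))  # → {'performance': 58, 'seo': 92}
```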


14. Glossary

| Term | Definition |
|------|------------|
| SEO | Search Engine Optimization: improving website visibility in search results |
| PageSpeed Insights | Google tool for measuring website performance and SEO quality |
| Lighthouse | Automated audit tool by Google (powers PageSpeed Insights) |
| Core Web Vitals | Google's UX metrics: LCP (Largest Contentful Paint), INP (Interaction to Next Paint, which replaced FID, First Input Delay, in March 2024), CLS (Cumulative Layout Shift) |
| On-Page SEO | SEO factors on the page itself (meta tags, headings, content) |
| Technical SEO | SEO factors related to crawlability (robots.txt, sitemap, indexability) |
| Meta Tags | HTML tags providing metadata about the page (title, description, keywords) |
| Structured Data | Machine-readable format (JSON-LD, Schema.org) for search engines |
| Canonical URL | Preferred version of a page (prevents duplicate content issues) |
| Robots.txt | File telling search engines which pages they may or may not crawl |
| Sitemap.xml | XML file listing a website's pages for search engines |
| Open Graph | Meta tags for social media sharing (og:title, og:image, etc.) |
| Twitter Card | Meta tags for Twitter sharing |
| Upsert | Database operation: INSERT, or UPDATE if the row already exists |
| Quota | API usage limit (25,000 requests/day for PageSpeed) |

Document End