# SEO Audit Flow
**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Production LIVE
**Flow Type:** Admin-Triggered Website SEO Analysis
---
## Overview
This document describes the **complete SEO audit flow** for the Norda Biznes Partner application, covering:
- **Admin Dashboard** (`/admin/seo` route)
- **Single Company Audit** (admin-triggered via UI/API)
- **Batch Audit** (script-based for all companies)
- **PageSpeed Insights API Integration** for performance metrics
- **On-Page SEO Analysis** (meta tags, headings, images, links)
- **Technical SEO Checks** (robots.txt, sitemap, canonical URLs)
- **Database Storage** in `company_website_analysis` table
- **Results Display** on admin dashboard and company profiles
**Key Technology:**
- **PageSpeed API:** Google PageSpeed Insights (Lighthouse)
- **Analysis Engine:** SEOAuditor (scripts/seo_audit.py)
- **On-Page Analyzer:** OnPageSEOAnalyzer (scripts/seo_analyzer.py)
- **Technical Checker:** TechnicalSEOChecker (scripts/seo_analyzer.py)
- **Database:** PostgreSQL (company_website_analysis table)
**Key Features:**
- Full website analysis (PageSpeed + On-Page + Technical SEO)
- Admin dashboard with sortable table and score distribution
- Color-coded score badges (green 90-100, yellow 50-89, red 0-49)
- Filtering by category, score range, and company name
- Single company audit trigger from admin UI
- Batch audit script for all companies (`scripts/seo_audit.py`)
- API quota tracking (25,000 requests/day free tier)
**API Costs & Performance:**
- **API:** Google PageSpeed Insights (Free tier: 25,000 queries/day)
- **Pricing:** Free for up to 25,000 requests/day, $5/1000 queries after
- **Typical Audit Time:** 5-15 seconds per company
- **Actual Cost:** $0.00 (free tier, 80 companies = 80 audits << 25,000 limit)
---
## 1. High-Level SEO Audit Flow
### 1.1 Complete SEO Audit Flow Diagram
```mermaid
flowchart TD
Admin[Admin User] -->|1. Navigate to /admin/seo| Browser[Browser]
Browser -->|2. GET /admin/seo| Flask[Flask App<br/>app.py]
Flask -->|3. Check permissions| AuthCheck{Is Admin?}
AuthCheck -->|No| Deny[Redirect to dashboard]
AuthCheck -->|Yes| Dashboard[Admin SEO Dashboard<br/>admin_seo_dashboard.html]
Dashboard -->|4. Render dashboard| Browser
Browser -->|5. Display stats & table| AdminUI[Admin UI]
AdminUI -->|6. Click 'Uruchom audyt'<br/>for single company| TriggerSingle[Trigger Single Audit]
TriggerSingle -->|7. POST /api/seo/audit| Flask
AdminUI -->|8. Click 'Uruchom audyt'<br/>for batch| TriggerBatch[Trigger Batch Audit]
TriggerBatch -->|9. Run script| Script[scripts/seo_audit.py]
Flask -->|10. Verify admin| PermCheck{Is Admin?}
PermCheck -->|No| Error403[403 Error]
PermCheck -->|Yes| CreateAuditor[Create SEOAuditor]
CreateAuditor -->|11. Initialize| Auditor[SEOAuditor<br/>seo_audit.py]
Auditor -->|12. Fetch page| Website[Company Website]
Website -->|13. HTML + HTTP status| Auditor
Auditor -->|14. Analyze HTML| OnPageAnalyzer[OnPageSEOAnalyzer]
OnPageAnalyzer -->|15. Extract meta tags<br/>headings, images| OnPageResult[On-Page Results]
Auditor -->|16. Technical checks| TechnicalChecker[TechnicalSEOChecker]
TechnicalChecker -->|17. Check robots.txt<br/>sitemap, canonical| TechResult[Technical Results]
Auditor -->|18. Check quota| QuotaCheck{Quota > 0?}
QuotaCheck -->|No| SkipPageSpeed[Skip PageSpeed]
QuotaCheck -->|Yes| PageSpeedClient[GooglePageSpeedClient]
PageSpeedClient -->|19. API call| PageSpeedAPI[Google PageSpeed Insights<br/>Lighthouse]
PageSpeedAPI -->|20. Scores + CWV| PageSpeedResult[PageSpeed Results]
PageSpeedResult -->|21. Combine results| Auditor
OnPageResult -->|22. Combine results| Auditor
TechResult -->|23. Combine results| Auditor
SkipPageSpeed -->|24. Combine results| Auditor
Auditor -->|25. Calculate overall score| ScoreCalc[Score Calculator]
ScoreCalc -->|26. Overall SEO score| AuditResult[Complete Audit Result]
AuditResult -->|27. Save to DB| DB[(company_website_analysis)]
DB -->|28. Saved| Auditor
Auditor -->|29. Return results| Flask
Flask -->|30. JSON response| Browser
Browser -->|31. Reload dashboard| AdminUI
Script -->|32. Batch process| Auditor
Script -->|33. For each company| Auditor
style Auditor fill:#4CAF50
style PageSpeedClient fill:#2196F3
style OnPageAnalyzer fill:#FF9800
style TechnicalChecker fill:#9C27B0
style DB fill:#E91E63
```
### 1.2 Admin Dashboard View Flow
```mermaid
sequenceDiagram
participant Admin as Admin User
participant Browser as Browser
participant Flask as Flask App
participant DB as PostgreSQL
Admin->>Browser: Navigate to /admin/seo
Browser->>Flask: GET /admin/seo
Flask->>Flask: Check is_admin permission
alt Not Admin
Flask-->>Browser: Redirect to dashboard
else Is Admin
Flask->>DB: Query companies + SEO analysis
DB-->>Flask: Companies with scores
Flask->>Flask: Calculate stats (avg, distribution)
Flask-->>Browser: Render admin_seo_dashboard.html
Browser-->>Admin: Display dashboard with stats & table
end
Admin->>Browser: Click filter/sort
Browser->>Browser: Client-side filtering (JavaScript)
Browser-->>Admin: Updated table view
Admin->>Browser: Click "Uruchom audyt" for company
Browser->>Browser: Show confirmation modal
Admin->>Browser: Confirm audit
Browser->>Flask: POST /api/seo/audit {slug: "company-slug"}
Flask->>Flask: Verify admin + rate limit (10/hour)
Flask->>DB: Find company by slug
DB-->>Flask: Company record
Note over Flask,DB: SEO audit process (see next diagram)
Flask-->>Browser: JSON {success: true, scores: {...}}
Browser->>Browser: Show success modal
Browser->>Browser: Reload page after 1.5s
Browser->>Flask: GET /admin/seo (refresh)
Flask->>DB: Query companies + updated scores
DB-->>Flask: Companies with new scores
Flask-->>Browser: Updated dashboard
Browser-->>Admin: Display updated scores
```
---
## 2. SEO Audit Process Details
### 2.1 Single Company Audit Flow
```mermaid
sequenceDiagram
participant Flask as Flask App
participant Auditor as SEOAuditor
participant Web as Company Website
participant OnPage as OnPageSEOAnalyzer
participant Tech as TechnicalSEOChecker
participant PageSpeed as GooglePageSpeedClient
participant API as PageSpeed Insights API
participant DB as PostgreSQL
Flask->>Auditor: audit_company(company_dict)
Note over Auditor: 1. FETCH PAGE
Auditor->>Web: HTTP GET website_url
Web-->>Auditor: HTML content + status (200/404/500)
Note over Auditor,OnPage: 2. ON-PAGE ANALYSIS
Auditor->>OnPage: analyze_html(html, base_url)
OnPage->>OnPage: Extract meta tags (title, description, keywords)
OnPage->>OnPage: Count headings (H1, H2, H3)
OnPage->>OnPage: Analyze images (total, alt text)
OnPage->>OnPage: Count links (internal, external)
OnPage->>OnPage: Detect structured data (JSON-LD, Schema.org)
OnPage->>OnPage: Extract Open Graph tags
OnPage->>OnPage: Extract Twitter Card tags
OnPage->>OnPage: Count words on homepage
OnPage-->>Auditor: OnPageSEOResult
Note over Auditor,Tech: 3. TECHNICAL CHECKS
Auditor->>Tech: check_url(final_url)
Tech->>Web: GET /robots.txt
Web-->>Tech: robots.txt content or 404
Tech->>Tech: Parse robots.txt (exists, blocks Googlebot)
Tech->>Web: GET /sitemap.xml
Web-->>Tech: sitemap.xml content or 404
Tech->>Tech: Validate XML sitemap
Tech->>Tech: Check meta robots tags
Tech->>Tech: Check canonical URL
Tech->>Tech: Detect redirect chains
Tech-->>Auditor: TechnicalSEOResult
Note over Auditor,API: 4. PAGESPEED INSIGHTS
Auditor->>PageSpeed: Check remaining quota
PageSpeed-->>Auditor: quota_remaining (e.g., 24,950/25,000)
alt Quota Available
Auditor->>PageSpeed: analyze_url(url, strategy=MOBILE)
PageSpeed->>API: GET runPagespeed?url=...&strategy=mobile
API->>API: Run Lighthouse audit (5-15 seconds)
API-->>PageSpeed: Lighthouse results JSON
PageSpeed->>PageSpeed: Extract scores (0-100)
PageSpeed->>PageSpeed: Extract Core Web Vitals (LCP, FID, CLS)
PageSpeed->>PageSpeed: Extract audits (failed checks)
PageSpeed-->>Auditor: PageSpeedResult
else No Quota
Auditor->>Auditor: Skip PageSpeed (quota exhausted)
end
Note over Auditor: 5. CALCULATE SCORES
Auditor->>Auditor: _calculate_onpage_score(onpage)
Auditor->>Auditor: _calculate_technical_score(technical)
Auditor->>Auditor: _calculate_overall_score(all_results)
Note over Auditor: Score weights:
Note over Auditor: PageSpeed SEO: 3x
Note over Auditor: PageSpeed Perf: 2x
Note over Auditor: On-Page: 2x
Note over Auditor: Technical: 2x
Note over Auditor,DB: 6. SAVE TO DATABASE
Auditor->>DB: UPSERT company_website_analysis
Note over DB: ON CONFLICT (company_id) DO UPDATE
DB-->>Auditor: Saved successfully
Auditor-->>Flask: Complete audit result dict
```
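Step 2 of the diagram (on-page analysis) can be illustrated with the standard library alone. The sketch below is not the `OnPageSEOAnalyzer` from `scripts/seo_analyzer.py`; the class name `OnPageSketch` and its attribute names are hypothetical, and it covers only the title, meta description, heading counts, and image alt checks.

```python
from html.parser import HTMLParser


class OnPageSketch(HTMLParser):
    """Collects a few of the on-page signals named in step 2 (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.heading_counts = {"h1": 0, "h2": 0, "h3": 0}
        self.total_images = 0
        self.images_without_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.meta_description = attr.get("content") or ""
        elif tag in self.heading_counts:
            self.heading_counts[tag] += 1
        elif tag == "img":
            self.total_images += 1
            # Simplification: an empty alt="" is counted as missing.
            if not (attr.get("alt") or "").strip():
                self.images_without_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


html = """
<html><head><title>Example Co.</title>
<meta name="description" content="We build things."></head>
<body><h1>Welcome</h1><h2>About</h2>
<img src="a.png" alt="Logo"><img src="b.png"></body></html>
"""
parser = OnPageSketch()
parser.feed(html)
print(parser.title, parser.heading_counts, parser.images_without_alt)
```

A production analyzer would also need link counting, structured data detection, and Open Graph extraction, which are left out here to keep the example short.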
### 2.2 Batch Audit Script Flow
```mermaid
flowchart TD
Start[Start: python seo_audit.py --all] --> Init[Initialize SEOAuditor]
Init --> GetCompanies[Get companies from DB<br/>ORDER BY id]
GetCompanies --> Loop{For each company}
Loop -->|Next company| CheckWebsite{Has website?}
CheckWebsite -->|No| Skip[Skip: No website]
Skip --> Loop
CheckWebsite -->|Yes| CheckQuota{Quota > 0?}
CheckQuota -->|No| QuotaWarn[Warn: Quota exceeded<br/>Skip PageSpeed]
QuotaWarn --> AuditPartial[Audit without PageSpeed]
CheckQuota -->|Yes| AuditFull[Full Audit<br/>PageSpeed + OnPage + Technical]
AuditPartial --> SaveResult[Save to database]
AuditFull --> SaveResult
SaveResult --> UpdateStats[Update summary stats]
UpdateStats --> Sleep[Sleep 1s<br/>Rate limiting]
Sleep --> Loop
Loop -->|Done| PrintSummary[Print Summary Report]
PrintSummary --> ShowStats[Show score distribution<br/>Failed audits<br/>Quota usage]
ShowStats --> End[Exit with code]
style AuditFull fill:#4CAF50
style AuditPartial fill:#FF9800
style QuotaWarn fill:#F44336
```
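The batch loop in the flowchart can be sketched as below. This is an illustration under assumptions, not the loop in `scripts/seo_audit.py`: `run_batch`, its parameters, and the summary keys are hypothetical names, and the audit function, quota lookup, and sleep are passed in so the loop can be exercised without network access.

```python
import time


def run_batch(companies, audit, quota_remaining, sleep=time.sleep, delay_seconds=1.0):
    """Audit companies one by one, following the flowchart above (hypothetical helper).

    audit(company, use_pagespeed) is expected to raise on failure.
    quota_remaining() returns the PageSpeed quota left for today.
    """
    summary = {"successful": 0, "failed": 0, "skipped": 0}
    for company in companies:
        if not company.get("website"):
            summary["skipped"] += 1  # Skip: no website
            continue
        # Full audit while quota is left, otherwise on-page + technical only.
        use_pagespeed = quota_remaining() > 0
        try:
            audit(company, use_pagespeed)
            summary["successful"] += 1
        except Exception:  # broad on purpose: one bad site must not stop the batch
            summary["failed"] += 1
        sleep(delay_seconds)  # 1s pause between audits (rate limiting)
    return summary


if __name__ == "__main__":
    demo = [{"id": 1, "website": "https://example.com"}, {"id": 2, "website": None}]
    print(run_batch(demo, audit=lambda c, p: None, quota_remaining=lambda: 10,
                    sleep=lambda s: None))
```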
---
## 3. Score Calculation
### 3.1 Overall SEO Score Formula
The overall SEO score is a **weighted average** of four components:
```
Overall Score = (
(PageSpeed SEO × 3) +
(PageSpeed Performance × 2) +
(On-Page Score × 2) +
(Technical Score × 2)
) / Total Weight
```
**Weights:**
- PageSpeed SEO: **3x** (most important for search rankings)
- PageSpeed Performance: **2x** (user experience)
- On-Page Score: **2x** (content optimization)
- Technical Score: **2x** (crawlability and indexability)
**Score Ranges:**
- **90-100 (Green):** Excellent SEO
- **50-89 (Yellow):** Needs improvement
- **0-49 (Red):** Poor SEO
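
A minimal sketch of the weighted average and the badge ranges above. It is not the `_calculate_overall_score` implementation: the names `overall_score`, `badge_color`, and the dictionary keys are hypothetical, and dropping missing components from both the sum and the total weight (for example when PageSpeed is skipped) is an assumption about how "Total Weight" is meant.

```python
from typing import Dict, Optional

# Weights from the list above; the key names are hypothetical.
WEIGHTS = {
    "pagespeed_seo": 3,
    "pagespeed_performance": 2,
    "on_page": 2,
    "technical": 2,
}


def overall_score(components: Dict[str, Optional[int]]) -> Optional[int]:
    """Weighted average of the available 0-100 component scores.

    Components that are missing or None are left out of both the weighted sum
    and the total weight (an assumption, not confirmed behavior).
    """
    weighted_sum = 0
    total_weight = 0
    for name, weight in WEIGHTS.items():
        value = components.get(name)
        if value is None:
            continue
        weighted_sum += value * weight
        total_weight += weight
    if total_weight == 0:
        return None  # nothing to average
    return round(weighted_sum / total_weight)


def badge_color(score: int) -> str:
    """Map a score to the badge colors listed under Score Ranges."""
    if score >= 90:
        return "green"
    if score >= 50:
        return "yellow"
    return "red"
```

With all four components present the total weight is 3 + 2 + 2 + 2 = 9. For scores of 92, 78, 85, and 85 the sum is 276 + 156 + 170 + 170 = 772, and 772 / 9 rounds to 86, which falls in the yellow band.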
### 3.2 On-Page Score Calculation
**Starting Score:** 100 (perfect)
**Deductions:**
| Issue | Deduction | Check |
|-------|-----------|-------|
| Missing meta title | -15 | `meta_tags['title']` is empty |
| Title too short/long | -5 | Length < 30 or > 70 characters |
| Missing meta description | -10 | `meta_tags['description']` is empty |
| Description too short/long | -5 | Length < 120 or > 160 characters |
| No canonical URL | -5 | `meta_tags['canonical_url']` is empty |
| No H1 heading | -10 | `headings['h1_count']` == 0 |
| Multiple H1 headings | -5 | `headings['h1_count']` > 1 |
| Improper heading hierarchy | -5 | H3 without H2, etc. |
| >50% images missing alt | -10 | `images_without_alt / total_images` > 0.5 |
| >20% images missing alt | -5 | `images_without_alt / total_images` > 0.2 |
| No structured data | -5 | No JSON-LD or Schema.org |
| No Open Graph tags | -3 | No `og:title` |
**Example:**
```python
# Start from a perfect page
score = 100
score -= 10  # Missing meta description
score -= 0   # 1 image without alt out of 10 (10%, below the 20% threshold)
score -= 5   # No structured data
final_score = score  # 100 - 10 - 5 = 85 (Good)
```
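The deduction table can be turned into a function along these lines. This is a sketch, not the `_calculate_onpage_score` implementation: the function signature and the `hierarchy_ok` key are hypothetical, and treating the paired rows (missing vs. wrong length, >50% vs. >20%) as mutually exclusive is an assumption.

```python
def onpage_score(meta_tags: dict, headings: dict, images: dict,
                 has_structured_data: bool, has_og_title: bool) -> int:
    """Apply the on-page deductions from the table above (hypothetical helper)."""
    score = 100

    title = meta_tags.get("title") or ""
    if not title:
        score -= 15
    elif len(title) < 30 or len(title) > 70:
        score -= 5

    description = meta_tags.get("description") or ""
    if not description:
        score -= 10
    elif len(description) < 120 or len(description) > 160:
        score -= 5

    if not meta_tags.get("canonical_url"):
        score -= 5

    h1_count = headings.get("h1_count", 0)
    if h1_count == 0:
        score -= 10
    elif h1_count > 1:
        score -= 5
    if not headings.get("hierarchy_ok", True):  # hypothetical key
        score -= 5

    total = images.get("total_images", 0)
    if total > 0:
        missing_ratio = images.get("images_without_alt", 0) / total
        if missing_ratio > 0.5:
            score -= 10
        elif missing_ratio > 0.2:
            score -= 5

    if not has_structured_data:
        score -= 5
    if not has_og_title:
        score -= 3
    return max(0, score)
```

Called with a 40-character title, an empty description, a canonical URL, one H1, one of ten images missing alt text, no structured data, and an `og:title`, the function reproduces the 85 from the example above.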
### 3.3 Technical Score Calculation
**Starting Score:** 100 (perfect)
**Deductions:**
| Issue | Deduction | Check |
|-------|-----------|-------|
| No robots.txt | -10 | `robots_txt['exists']` == False |
| Robots blocks Googlebot | -20 | `robots_txt['blocks_googlebot']` == True |
| No sitemap.xml | -10 | `sitemap['exists']` == False |
| Invalid sitemap XML | -5 | `sitemap['is_valid_xml']` == False |
| >3 redirects in chain | -10 | `redirect_chain['chain_length']` > 3 |
| >1 redirect | -5 | `redirect_chain['chain_length']` > 1 |
| Redirect loop detected | -20 | `redirect_chain['has_redirect_loop']` == True |
| Not indexable | -15 | `indexability['is_indexable']` == False |
| Canonical to different domain | -10 | Points to external site |
**Example:**
```python
# Typical site
score = 100
score -= 10  # No robots.txt
score -= 0   # Has sitemap.xml
score -= 5   # 1 redirect
score -= 0   # Indexable
final_score = score  # 100 - 10 - 5 = 85 (Good)
```
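The technical deductions can be sketched the same way. This is an illustration, not the `_calculate_technical_score` implementation: the signature and the `canonical_is_external` parameter are hypothetical, and it assumes that the two redirect rows are mutually exclusive and that the invalid-XML deduction only applies when a sitemap exists. How `chain_length` counts hops is not specified in the table, so the usage line simply picks a value greater than 1.

```python
def technical_score(robots_txt: dict, sitemap: dict, redirect_chain: dict,
                    indexability: dict, canonical_is_external: bool = False) -> int:
    """Apply the technical deductions from the table above (hypothetical helper)."""
    score = 100

    if not robots_txt.get("exists", False):
        score -= 10
    if robots_txt.get("blocks_googlebot", False):
        score -= 20

    if not sitemap.get("exists", False):
        score -= 10
    elif not sitemap.get("is_valid_xml", True):
        score -= 5

    chain_length = redirect_chain.get("chain_length", 0)
    if chain_length > 3:
        score -= 10
    elif chain_length > 1:
        score -= 5
    if redirect_chain.get("has_redirect_loop", False):
        score -= 20

    if not indexability.get("is_indexable", True):
        score -= 15
    if canonical_is_external:
        score -= 10
    return max(0, score)


# Usage: no robots.txt (-10) and a short redirect chain (-5).
example = technical_score(
    {"exists": False},
    {"exists": True, "is_valid_xml": True},
    {"chain_length": 2},
    {"is_indexable": True},
)
print(example)
```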
---
## 4. Database Schema
### 4.1 CompanyWebsiteAnalysis Table
The `company_website_analysis` table stores comprehensive SEO audit results.
**Location:** `database.py` (lines ~429-520)
**Key Fields:**
```sql
CREATE TABLE company_website_analysis (
-- Identity
id SERIAL PRIMARY KEY,
company_id INTEGER REFERENCES companies(id) UNIQUE,
analyzed_at TIMESTAMP DEFAULT NOW(),
-- Basic Info
website_url VARCHAR(500),
final_url VARCHAR(500), -- After redirects
http_status_code INTEGER,
load_time_ms INTEGER,
-- PageSpeed Scores (0-100)
pagespeed_seo_score INTEGER,
pagespeed_performance_score INTEGER,
pagespeed_accessibility_score INTEGER,
pagespeed_best_practices_score INTEGER,
pagespeed_audits JSONB, -- Failed Lighthouse audits
-- On-Page SEO
meta_title VARCHAR(500),
meta_description TEXT,
meta_keywords TEXT,
h1_count INTEGER,
h2_count INTEGER,
h3_count INTEGER,
h1_text VARCHAR(500),
total_images INTEGER,
images_without_alt INTEGER,
images_with_alt INTEGER,
internal_links_count INTEGER,
external_links_count INTEGER,
broken_links_count INTEGER,
has_structured_data BOOLEAN,
structured_data_types TEXT[], -- ['Organization', 'LocalBusiness']
structured_data_json JSONB,
-- Technical SEO
has_canonical BOOLEAN,
canonical_url VARCHAR(500),
is_indexable BOOLEAN,
noindex_reason VARCHAR(100),
has_sitemap BOOLEAN,
has_robots_txt BOOLEAN,
viewport_configured BOOLEAN,
is_mobile_friendly BOOLEAN,
-- Core Web Vitals
largest_contentful_paint_ms INTEGER, -- LCP (Good: <2500ms)
first_input_delay_ms INTEGER, -- FID (Good: <100ms)
cumulative_layout_shift NUMERIC(4,2), -- CLS (Good: <0.1)
-- Open Graph
has_og_tags BOOLEAN,
og_title VARCHAR(500),
og_description TEXT,
og_image VARCHAR(500),
has_twitter_cards BOOLEAN,
-- Language & International
html_lang VARCHAR(10),
has_hreflang BOOLEAN,
-- Word Count
word_count_homepage INTEGER,
-- Audit Metadata
seo_audit_version VARCHAR(20),
seo_audited_at TIMESTAMP,
seo_audit_errors TEXT[],
seo_overall_score INTEGER,
seo_health_score INTEGER,
seo_issues JSONB
);
-- Indexes
CREATE INDEX idx_cwa_company_id ON company_website_analysis(company_id);
CREATE INDEX idx_cwa_analyzed_at ON company_website_analysis(analyzed_at);
CREATE INDEX idx_cwa_seo_audited_at ON company_website_analysis(seo_audited_at);
```
### 4.2 Upsert Pattern
The audit uses **ON CONFLICT DO UPDATE** for idempotent saves:
```sql
INSERT INTO company_website_analysis (
company_id, analyzed_at, website_url, ...
) VALUES (
:company_id, :analyzed_at, :website_url, ...
)
ON CONFLICT (company_id) DO UPDATE SET
analyzed_at = EXCLUDED.analyzed_at,
website_url = EXCLUDED.website_url,
pagespeed_seo_score = EXCLUDED.pagespeed_seo_score,
-- ... all fields updated
seo_audited_at = EXCLUDED.seo_audited_at;
```
**Benefits:**
- Safe to run multiple times (idempotent)
- Always keeps latest audit results
- No duplicate records
- Atomic operation (transaction-safe)
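
The idempotent behavior can be demonstrated end to end with an in-memory database. The sketch below uses SQLite (which supports the same `ON CONFLICT ... DO UPDATE` clause since version 3.24) purely as a stand-in for PostgreSQL; the reduced column set and the `save_audit` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE company_website_analysis (
        id INTEGER PRIMARY KEY,
        company_id INTEGER UNIQUE,
        pagespeed_seo_score INTEGER,
        seo_audited_at TEXT
    )
""")

UPSERT = """
    INSERT INTO company_website_analysis (company_id, pagespeed_seo_score, seo_audited_at)
    VALUES (:company_id, :pagespeed_seo_score, :seo_audited_at)
    ON CONFLICT (company_id) DO UPDATE SET
        pagespeed_seo_score = excluded.pagespeed_seo_score,
        seo_audited_at = excluded.seo_audited_at
"""


def save_audit(company_id: int, score: int, audited_at: str) -> None:
    """Insert or update the single row for a company (hypothetical helper)."""
    with conn:  # commits on success, rolls back if execute raises
        conn.execute(UPSERT, {
            "company_id": company_id,
            "pagespeed_seo_score": score,
            "seo_audited_at": audited_at,
        })


save_audit(26, 88, "2026-01-03T10:00:00")
save_audit(26, 92, "2026-01-10T10:30:00")  # same company: updated in place

rows = conn.execute(
    "SELECT company_id, pagespeed_seo_score, seo_audited_at FROM company_website_analysis"
).fetchall()
print(rows)  # one row for company 26, holding the latest audit
```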
---
## 5. API Endpoints
### 5.1 Admin SEO Dashboard
**Route:** `GET /admin/seo`
**Authentication:** Required (Admin only)
**Location:** `app.py` lines 4093-4192
**Purpose:** Display SEO metrics dashboard for all companies
**Query Parameters:**
- `company` (optional): Company slug to highlight/filter
**Response:** HTML (admin_seo_dashboard.html template)
**Dashboard Features:**
- Summary stats (score distribution, average, not audited count)
- Sortable table by name, category, scores, date
- Filters by category, score range, company name
- Color-coded score badges
- Last audit date with staleness indicator
- Actions: view profile, trigger single audit
**Access Control:**
```python
if not current_user.is_admin:
    # Flash message (Polish): "You do not have permission to access this page."
    flash('Brak uprawnień do tej strony.', 'error')
    return redirect(url_for('dashboard'))
```
### 5.2 Get SEO Audit Results (Read)
**Route:** `GET /api/seo/audit`
**Authentication:** Not required (public API)
**Location:** `app.py` lines 3870-3914
**Purpose:** Retrieve existing SEO audit results for a company
**Query Parameters** (provide at least one):
- `company_id` (integer): Company ID
- `slug` (string): Company slug
**Response:**
```json
{
"company_id": 26,
"company_name": "PIXLAB Sp. z o.o.",
"company_slug": "pixlab-sp-z-o-o",
"website": "https://pixlab.pl",
"pagespeed": {
"seo_score": 92,
"performance_score": 78,
"accessibility_score": 95,
"best_practices_score": 88,
"audits": {...}
},
"on_page": {
"meta_title": "PIXLAB - Oprogramowanie na miarę",
"meta_description": "Tworzymy dedykowane oprogramowanie...",
"h1_count": 1,
"total_images": 12,
"images_without_alt": 0,
"has_structured_data": true
},
"technical": {
"has_robots_txt": true,
"has_sitemap": true,
"is_indexable": true,
"is_mobile_friendly": true
},
"overall_score": 88,
"audited_at": "2026-01-10T10:30:00"
}
```
### 5.3 Trigger SEO Audit (Write)
**Route:** `POST /api/seo/audit`
**Authentication:** Required (Admin only)
**Rate Limit:** 10 requests per hour per user
**Location:** `app.py` lines 3943-4086
**Purpose:** Trigger a new SEO audit for a company
**Request Body** (`company_id` or `slug`; either one identifies the company):
```json
{
"company_id": 26,
"slug": "pixlab-sp-z-o-o"
}
```
**Response (Success):**
```json
{
"success": true,
"message": "Audyt SEO dla firmy \"PIXLAB Sp. z o.o.\" został zakończony pomyślnie.",
"audit_version": "1.0.0",
"triggered_by": "admin@nordabiznes.pl",
"triggered_at": "2026-01-10T10:35:00",
"company_id": 26,
"company_name": "PIXLAB Sp. z o.o.",
"pagespeed": {...},
"on_page": {...},
"technical": {...},
"overall_score": 88
}
```
**Response (Error - No Website):**
```json
{
"success": false,
"error": "Firma \"PIXLAB Sp. z o.o.\" nie ma zdefiniowanej strony internetowej.",
"company_id": 26,
"company_name": "PIXLAB Sp. z o.o."
}
```
**Response (Error - Quota Exceeded):**
```json
{
"success": false,
"error": "PageSpeed API quota exceeded. Try again tomorrow.",
"company_id": 26
}
```
**Access Control:**
```python
if not current_user.is_admin:
    # Error (Polish): "No permission. Only an administrator can run SEO audits."
    return jsonify({
        'success': False,
        'error': 'Brak uprawnień. Tylko administrator może uruchamiać audyty SEO.'
    }), 403
```
**Rate Limiting:**
```python
@limiter.limit("10 per hour")
```
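To make the "10 per hour per user" rule concrete, here is a standard-library sketch of a sliding-window limiter. It only illustrates the effect of the decorator above; it is not how the `limiter` object used in `app.py` works internally, and `SlidingWindowLimiter` and its methods are hypothetical names.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per user within window_seconds (illustrative only)."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 3600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # user id -> timestamps of accepted requests

    def allow(self, user_id, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the endpoint would reject this request
        hits.append(now)
        return True


limiter_sketch = SlidingWindowLimiter()
accepted = [limiter_sketch.allow("admin", now=second) for second in range(11)]
print(accepted.count(True))  # requests accepted out of 11 made within the same hour
```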
---
## 6. PageSpeed Insights API Integration
### 6.1 API Configuration
**Service File:** `scripts/pagespeed_client.py`
**Endpoint:** `https://www.googleapis.com/pagespeedonline/v5/runPagespeed`
**Authentication:** API Key (GOOGLE_PAGESPEED_API_KEY)
**Free Tier:**
- 25,000 queries per day
- $5 per 1,000 queries after free tier
**API Key:**
- **Name in Google Cloud:** "Page SPEED SEO Audit v2"
- **Project:** NORDABIZNES (gen-lang-client-0540794446)
- **Storage:** `.env` file (GOOGLE_PAGESPEED_API_KEY)
### 6.2 API Request
```python
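import os

import requests

# Assumes the .env file has already been loaded into the environment
GOOGLE_PAGESPEED_API_KEY = os.environ['GOOGLE_PAGESPEED_API_KEY']

params = {
    'url': 'https://example.com',
    'key': GOOGLE_PAGESPEED_API_KEY,
    'strategy': 'mobile',  # or 'desktop'
    # A list is sent as repeated ?category=... query parameters
    'category': ['performance', 'accessibility', 'best-practices', 'seo'],
}
response = requests.get(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed',
    params=params,
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body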
```
### 6.3 API Response Structure
```json
{
"lighthouseResult": {
"categories": {
"performance": {"score": 0.78},
"accessibility": {"score": 0.95},
"best-practices": {"score": 0.88},
"seo": {"score": 0.92}
},
"audits": {
"largest-contentful-paint": {"numericValue": 2300},
"first-input-delay": {"numericValue": 85},
"cumulative-layout-shift": {"numericValue": 0.05},
"meta-description": {"score": 1.0},
"robots-txt": {"score": 1.0},
"is-crawlable": {"score": 1.0}
}
},
"loadingExperience": {
"metrics": {
"LARGEST_CONTENTFUL_PAINT_MS": {"category": "FAST"},
"FIRST_INPUT_DELAY_MS": {"category": "FAST"},
"CUMULATIVE_LAYOUT_SHIFT_SCORE": {"category": "FAST"}
}
}
}
```
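A sketch of how the fields above could be mapped onto the `company_website_analysis` columns. It is not the parser in `scripts/pagespeed_client.py`: the function name `parse_pagespeed` is hypothetical, the audit keys are taken from the sample response above, and treating any audit with a score below 1 as failed is an assumption.

```python
from typing import Optional


def parse_pagespeed(data: dict) -> dict:
    """Extract 0-100 scores, two Core Web Vitals, and failed audits (hypothetical helper)."""
    lighthouse = data.get("lighthouseResult", {})
    categories = lighthouse.get("categories", {})
    audits = lighthouse.get("audits", {})

    def category_score(name: str) -> Optional[int]:
        raw = categories.get(name, {}).get("score")  # 0.0-1.0, or missing
        return None if raw is None else round(raw * 100)

    def audit_value(name: str):
        return audits.get(name, {}).get("numericValue")

    failed = sorted(
        key for key, audit in audits.items()
        if audit.get("score") is not None and audit["score"] < 1
    )
    return {
        "pagespeed_seo_score": category_score("seo"),
        "pagespeed_performance_score": category_score("performance"),
        "pagespeed_accessibility_score": category_score("accessibility"),
        "pagespeed_best_practices_score": category_score("best-practices"),
        "largest_contentful_paint_ms": audit_value("largest-contentful-paint"),
        "cumulative_layout_shift": audit_value("cumulative-layout-shift"),
        "failed_audits": failed,
    }


sample = {
    "lighthouseResult": {
        "categories": {"performance": {"score": 0.78}, "seo": {"score": 0.92}},
        "audits": {
            "largest-contentful-paint": {"numericValue": 2300},
            "cumulative-layout-shift": {"numericValue": 0.05},
            "meta-description": {"score": 0.0},
            "is-crawlable": {"score": 1.0},
        },
    }
}
parsed = parse_pagespeed(sample)
print(parsed["pagespeed_seo_score"], parsed["failed_audits"])
```

Categories that are absent from the response come back as `None` rather than 0, so a skipped category is not mistaken for a failing one.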
### 6.4 Quota Management
**Quota Tracking:**
```python
class GooglePageSpeedClient:
    def __init__(self):
        self.daily_quota = 25000
        self.used_today = 0  # Reset daily at midnight

    def get_remaining_quota(self) -> int:
        """Returns remaining API quota for today."""
        return max(0, self.daily_quota - self.used_today)

    def analyze_url(self, url: str) -> PageSpeedResult:
        if self.get_remaining_quota() <= 0:
            raise QuotaExceededError("Daily quota exceeded")
        # Make API call
        response = self._call_api(url)
        self.used_today += 1
        return self._parse_response(response)
```
**Quota Exceeded Handling:**
1. Check quota before audit: `if quota > 0`
2. If exceeded, skip PageSpeed but continue on-page/technical
3. Log warning: "PageSpeed quota exceeded, skipping"
4. Return partial audit result (no PageSpeed scores)
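
The excerpt above mentions a daily reset without showing it. One way to get that behavior is to compare dates on every access, as sketched below. `DailyQuota` and its methods are hypothetical names, using the local calendar date as the reset boundary is an assumption, and an in-memory counter like this starts from zero again whenever the process restarts.

```python
from datetime import date


class DailyQuota:
    """In-memory quota counter that resets when the calendar date changes (sketch)."""

    def __init__(self, daily_quota: int = 25000, today=date.today):
        self.daily_quota = daily_quota
        self._today = today  # injectable clock, which makes the reset testable
        self._day = today()
        self.used_today = 0

    def _roll_over(self) -> None:
        current = self._today()
        if current != self._day:
            self._day = current
            self.used_today = 0

    def remaining(self) -> int:
        self._roll_over()
        return max(0, self.daily_quota - self.used_today)

    def consume(self) -> bool:
        """Reserve one request; False means PageSpeed should be skipped."""
        if self.remaining() <= 0:
            return False
        self.used_today += 1
        return True


quota = DailyQuota()
if quota.consume():
    print("run PageSpeed, remaining:", quota.remaining())
else:
    print("PageSpeed quota exceeded, skipping")
```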
---
## 7. SEO Audit Script Usage
### 7.1 Command Line Interface
**Script Location:** `scripts/seo_audit.py`
**Basic Usage:**
```bash
# Audit single company by ID
python seo_audit.py --company-id 26
# Audit single company by slug
python seo_audit.py --company-slug pixlab-sp-z-o-o
# Audit batch of companies (rows 1-10)
python seo_audit.py --batch 1-10
# Audit all companies
python seo_audit.py --all
# Dry run (no database writes)
python seo_audit.py --company-id 26 --dry-run
# Export results to JSON
python seo_audit.py --all --json > seo_report.json
```
**Options:**
- `--company-id ID`: Audit single company by ID
- `--company-slug SLUG`: Audit single company by slug
- `--company-ids IDS`: Audit multiple companies (comma-separated: 1,5,10)
- `--batch RANGE`: Audit batch by row offset (e.g., 1-10)
- `--all`: Audit all companies
- `--dry-run`: Print results without saving to database
- `--verbose, -v`: Enable verbose/debug output
- `--quiet, -q`: Suppress progress output (only summary)
- `--json`: Output results as JSON
- `--database-url URL`: Override DATABASE_URL env var
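
The `--batch` and `--company-ids` value formats can be parsed with `argparse` type callbacks. The sketch below is not the argument parser in `scripts/seo_audit.py`: the helper names are hypothetical, only a subset of the options is included, and making the target options mutually exclusive is an assumption.

```python
import argparse


def parse_batch_range(value: str):
    """Turn '1-10' into (1, 10); reject anything else."""
    start_text, sep, end_text = value.partition("-")
    try:
        if not sep:
            raise ValueError
        start, end = int(start_text), int(end_text)
    except ValueError:
        raise argparse.ArgumentTypeError("expected START-END, e.g. 1-10") from None
    if start < 1 or end < start:
        raise argparse.ArgumentTypeError("range must satisfy 1 <= START <= END")
    return start, end


def parse_id_list(value: str):
    """Turn '1,5,10' into [1, 5, 10]."""
    try:
        return [int(part) for part in value.split(",") if part.strip()]
    except ValueError:
        raise argparse.ArgumentTypeError("expected comma-separated IDs, e.g. 1,5,10") from None


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="seo_audit.py")
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--company-id", type=int)
    target.add_argument("--company-ids", type=parse_id_list)
    target.add_argument("--batch", type=parse_batch_range)
    target.add_argument("--all", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--json", action="store_true")
    return parser


args = build_parser().parse_args(["--batch", "1-10", "--dry-run"])
print(args.batch, args.dry_run)
```

By default `argparse` exits with status 2 on invalid arguments, so a script that wants to report argument errors with a different exit code has to override `ArgumentParser.error`.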
### 7.2 Exit Codes
| Code | Meaning |
|------|---------|
| 0 | All audits completed successfully |
| 1 | Argument error or invalid input |
| 2 | Partial failures (some audits failed) |
| 3 | All audits failed |
| 4 | Database connection error |
| 5 | API quota exceeded |
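
A sketch of how a batch summary could be mapped to the codes in the table. The enum member names and the `exit_code_for` helper are hypothetical, and the precedence (quota exhaustion reported before partial failures) is an assumption; codes 1 and 4 would be raised earlier, before any audit runs.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    ARGUMENT_ERROR = 1
    PARTIAL_FAILURE = 2
    ALL_FAILED = 3
    DATABASE_ERROR = 4
    QUOTA_EXCEEDED = 5


def exit_code_for(successful: int, failed: int, quota_exceeded: bool = False) -> ExitCode:
    """Pick an exit code from the batch counters (hypothetical helper)."""
    if quota_exceeded:
        return ExitCode.QUOTA_EXCEEDED
    if failed == 0:
        return ExitCode.SUCCESS
    if successful == 0:
        return ExitCode.ALL_FAILED
    return ExitCode.PARTIAL_FAILURE


code = exit_code_for(successful=72, failed=5)
print(int(code), code.name)
```

In a script the result would typically be passed to `sys.exit(int(code))` so that cron jobs and shell wrappers can branch on it.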
### 7.3 Batch Audit Output
```
============================================================
SEO AUDIT STARTING
============================================================
Companies to audit: 80
Mode: LIVE
PageSpeed API quota remaining: 24,950
============================================================
[1/80] PIXLAB Sp. z o.o. (ID: 26) - ETA: calculating...
Fetching page: https://pixlab.pl
Page fetched successfully (850ms)
Running on-page SEO analysis...
On-page analysis complete
Running technical SEO checks...
Technical checks complete
Running PageSpeed Insights (quota: 24,949)...
PageSpeed complete - SEO: 92, Perf: 78
Saved SEO audit for company 26
→ SUCCESS: Overall SEO score: 88
[2/80] Hotel SPA Wieniawa (ID: 15) - ETA: 00:15:30
Fetching page: https://wieniawa.pl
...
======================================================================
SEO AUDIT COMPLETE
======================================================================
Mode: LIVE
Duration: 00:18:45
----------------------------------------------------------------------
RESULTS BREAKDOWN
----------------------------------------------------------------------
Total companies: 80
✓ Successful: 72
✗ Failed: 5
○ Skipped: 3
- No website: 3
- Unavailable: 2
- Timeout: 2
- SSL errors: 1
----------------------------------------------------------------------
PAGESPEED API QUOTA
----------------------------------------------------------------------
Quota at start: 24,950
Quota used: 72
Quota remaining: 24,878
----------------------------------------------------------------------
SEO SCORE DISTRIBUTION
----------------------------------------------------------------------
Companies with scores: 72
Average SEO score: 76.3
Highest score: 95
Lowest score: 42
Excellent (90-100): 18 ██████████████░░░░░░░░░░░░░░░░░░
Good (70-89): 38 ████████████████████████████████
Fair (50-69): 12 ████████░░░░░░░░░░░░░░░░░░░░░░░░
Poor (<50): 4 ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
----------------------------------------------------------------------
FAILED AUDITS
----------------------------------------------------------------------
🔴 Firma ABC - HTTP 404
⏱ Firma XYZ - Timeout after 30s
🔌 Firma DEF - Connection refused
======================================================================
```
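The score distribution section of the report can be produced with a small bucketing helper. The bucket labels and ranges are taken from the sample output above; the function names and the bar scaling (proportional to the largest bucket, 32 characters wide) are assumptions and are not guaranteed to reproduce the exact bars shown.

```python
def score_distribution(scores):
    """Count scores per bucket, using the ranges from the sample report."""
    buckets = {
        "Excellent (90-100)": 0,
        "Good (70-89)": 0,
        "Fair (50-69)": 0,
        "Poor (<50)": 0,
    }
    for score in scores:
        if score >= 90:
            buckets["Excellent (90-100)"] += 1
        elif score >= 70:
            buckets["Good (70-89)"] += 1
        elif score >= 50:
            buckets["Fair (50-69)"] += 1
        else:
            buckets["Poor (<50)"] += 1
    return buckets


def render_bar(count: int, max_count: int, width: int = 32) -> str:
    """Fixed-width bar; the fill is proportional to count / max_count."""
    filled = round(width * count / max_count) if max_count else 0
    return "█" * filled + "░" * (width - filled)


distribution = score_distribution([95, 90, 75, 60, 42])
largest = max(distribution.values())
for label, count in distribution.items():
    print(f"{label:<20} {count:>3} {render_bar(count, largest)}")
```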
### 7.4 Production Deployment
**On NORDABIZ-01 Server:**
```bash
# Connect to server
ssh maciejpi@10.22.68.249
# Navigate to application directory
cd /var/www/nordabiznes
# Activate virtual environment
source venv/bin/activate
# Run audit for all companies (production database)
cd scripts
python seo_audit.py --all
# Run audit for specific company
python seo_audit.py --company-id 26
# Dry run to test without saving
python seo_audit.py --all --dry-run
# Export results to JSON
python seo_audit.py --all --json > ~/seo_audit_$(date +%Y%m%d).json
```
**IMPORTANT - Database Connection:**
Scripts in `scripts/` must use **localhost (127.0.0.1)** for PostgreSQL:
```python
# CORRECT:
DATABASE_URL = 'postgresql://nordabiz_app:NordaBiz2025Secure@127.0.0.1:5432/nordabiz'
# WRONG (PostgreSQL doesn't accept external connections):
DATABASE_URL = 'postgresql://nordabiz_app:NordaBiz2025Secure@10.22.68.249:5432/nordabiz'
```
### 7.5 Cron Job (Automated Audits)
**Schedule weekly audit:**
```bash
# Edit crontab
crontab -e
# Add weekly audit (Sundays at 2 AM)
0 2 * * 0 cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/seo_audit.py --all >> /var/log/nordabiznes/seo_audit.log 2>&1
```
**Benefits:**
- Automatic SEO monitoring
- Detect score degradation
- Track improvements over time
- Email alerts on failures (future)
---
## 8. Security & Performance
### 8.1 Security Features
**1. Admin-Only Access:**
```python
if not current_user.is_admin:
    return jsonify({'error': 'Brak uprawnień'}), 403  # "No permission"
```
**2. Rate Limiting:**
```python
@limiter.limit("10 per hour")
```
- Prevents API abuse
- Protects PageSpeed quota
- Per-user rate limit
**3. CSRF Protection:**
```javascript
fetch('/api/seo/audit', {
    method: 'POST',
    headers: {
        'X-CSRFToken': csrfToken
    }
})
```
**4. Input Validation:**
```python
if not company_id and not slug:
    # Error (Polish): "Provide company_id or slug"
    return jsonify({'error': 'Podaj company_id lub slug'}), 400
```
**5. Database Permissions:**
```sql
GRANT ALL ON TABLE company_website_analysis TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE company_website_analysis_id_seq TO nordabiz_app;
```
### 8.2 Performance Optimizations
**1. Upsert Instead of Insert:**
- ON CONFLICT DO UPDATE (idempotent)
- No duplicate records
- Safe to re-run audits
**2. Database Indexing:**
```sql
CREATE INDEX idx_cwa_company_id ON company_website_analysis(company_id);
CREATE INDEX idx_cwa_seo_audited_at ON company_website_analysis(seo_audited_at);
```
**3. Batch Processing:**
- Process companies sequentially
- Sleep 1s between audits (rate limiting)
- Skip companies without websites
**4. API Quota Management:**
- Check quota before calling PageSpeed
- Skip PageSpeed when the remaining quota is 0
- Continue with on-page/technical only
**5. Timeout Handling:**
```python
response = requests.get(url, timeout=30)
```
- Prevents hanging requests
- A timed-out audit is recorded as an error and the batch moves on to the next company
**6. Caching (Future):**
- Cache PageSpeed results for 7 days
- Skip re-audit if recent (<7 days old)
- Force refresh option for admins
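
The planned freshness check could look like the sketch below. Since caching is listed as future work, everything here is hypothetical: the function name `needs_audit`, its parameters, and the choice to treat a result exactly 7 days old as stale.

```python
from datetime import datetime, timedelta
from typing import Optional


def needs_audit(last_audited_at: Optional[datetime], now: datetime,
                max_age_days: int = 7, force: bool = False) -> bool:
    """Return True when a company should be re-audited (hypothetical helper).

    force models the admin "force refresh" option; a company that has never
    been audited (last_audited_at is None) always needs an audit.
    """
    if force or last_audited_at is None:
        return True
    return now - last_audited_at >= timedelta(days=max_age_days)


now = datetime(2026, 1, 10, 10, 30)
print(needs_audit(datetime(2026, 1, 8, 9, 0), now))   # audited 2 days ago
print(needs_audit(datetime(2025, 12, 30, 9, 0), now))  # audited 11 days ago
```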
---
## 9. Error Handling
### 9.1 Common Errors
**1. No Website URL:**
```json
{
"success": false,
"error": "Firma \"ABC\" nie ma zdefiniowanej strony internetowej.",
"company_id": 15
}
```
**2. Website Unreachable:**
```json
{
"success": false,
"error": "Audyt nie powiódł się: HTTP 404, Timeout after 30s",
"company_id": 26
}
```
**3. SSL Certificate Error:**
```
⚠ SSL error for https://example.com
Trying HTTP fallback: http://example.com
✓ Fallback successful
```
**4. PageSpeed API Quota Exceeded:**
```json
{
"success": false,
"error": "PageSpeed API quota exceeded. Try again tomorrow."
}
```
**5. Database Connection Error:**
```
❌ Error: Database connection failed: connection refused
Exit code: 4
```
### 9.2 Error Recovery
**1. SSL Errors → HTTP Fallback:**
```python
try:
    response = requests.get(https_url, timeout=30)
except requests.exceptions.SSLError:
    # Retry over plain HTTP when the certificate cannot be verified
    http_url = https_url.replace('https://', 'http://', 1)
    response = requests.get(http_url, timeout=30)
```
**2. Timeout → Skip Company:**
```python
try:
    response = requests.get(url, timeout=30)
except requests.exceptions.Timeout:
    result['errors'].append('Timeout after 30s')
    # Continue to next company
```
**3. Quota Exceeded → Skip PageSpeed:**
```python
if quota_remaining > 0:
    run_pagespeed_audit()
else:
    logger.warning("Quota exceeded, skipping PageSpeed")
    # Continue with on-page/technical only
```
**4. Database Error → Rollback:**
```python
try:
    db.execute(query)
    db.commit()
except SQLAlchemyError as e:
    db.rollback()
    logger.error(f"Database error: {e}")
```
---
## 10. Monitoring & Maintenance
### 10.1 Health Checks
**Check SEO Audit Status:**
```bash
# Check latest audit dates
psql -U nordabiz_app -d nordabiz -c "
SELECT
c.name,
cwa.seo_audited_at,
cwa.pagespeed_seo_score,
cwa.seo_overall_score
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
ORDER BY cwa.seo_audited_at DESC NULLS LAST
LIMIT 10;
"
```
**Check Quota Usage:**
```bash
# Check how many audits today
psql -U nordabiz_app -d nordabiz -c "
SELECT COUNT(*) AS audits_today
FROM company_website_analysis
WHERE seo_audited_at >= CURRENT_DATE;
"
```
**Check Failed Audits:**
```bash
# Companies with no SEO data
psql -U nordabiz_app -d nordabiz -c "
SELECT c.id, c.name, c.website
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
AND c.website IS NOT NULL
AND cwa.id IS NULL;
"
```
### 10.2 Maintenance Tasks
**1. Re-audit Stale Data (>30 days):**
```bash
python seo_audit.py --all --filter-stale 30
```
**2. Audit New Companies:**
```bash
# Companies added in last 7 days
python seo_audit.py --filter-new 7
```
**3. Fix Failed Audits:**
```bash
# Re-audit companies with errors
python seo_audit.py --retry-failed
```
**4. Clean Old Data:**
```sql
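-- Delete audit results older than 90 days (keep latest per company).
-- Note: while company_id is UNIQUE there is only one row per company, so every
-- row is the latest one and nothing is deleted. The query becomes useful once
-- audit history is stored (see 11.1, Historical Trend Tracking).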
DELETE FROM company_website_analysis
WHERE analyzed_at < NOW() - INTERVAL '90 days'
AND id NOT IN (
SELECT DISTINCT ON (company_id) id
FROM company_website_analysis
ORDER BY company_id, analyzed_at DESC
);
```
### 10.3 Monitoring Queries
**Score Distribution:**
```sql
SELECT
CASE
WHEN pagespeed_seo_score >= 90 THEN 'Excellent (90-100)'
WHEN pagespeed_seo_score >= 50 THEN 'Needs improvement (50-89)'
WHEN pagespeed_seo_score >= 0 THEN 'Poor (0-49)'
ELSE 'Not Audited'
END AS score_range,
COUNT(*) AS companies
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
GROUP BY score_range
ORDER BY score_range;
```
**Top/Bottom Performers:**
```sql
-- Top 10 SEO scores
SELECT c.name, cwa.pagespeed_seo_score, cwa.seo_overall_score
FROM companies c
JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active'
ORDER BY cwa.seo_overall_score DESC
LIMIT 10;
-- Bottom 10 SEO scores
SELECT c.name, cwa.pagespeed_seo_score, cwa.seo_overall_score
FROM companies c
JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active' AND cwa.seo_overall_score IS NOT NULL
ORDER BY cwa.seo_overall_score ASC
LIMIT 10;
```
**Audit Coverage:**
```sql
SELECT
COUNT(*) AS total_companies,
COUNT(cwa.id) AS audited_companies,
ROUND(COUNT(cwa.id)::NUMERIC / COUNT(*)::NUMERIC * 100, 1) AS coverage_percent
FROM companies c
LEFT JOIN company_website_analysis cwa ON c.id = cwa.company_id
WHERE c.status = 'active' AND c.website IS NOT NULL;
```
---
## 11. Future Enhancements
### 11.1 Planned Features
**1. Automated Re-Audit Scheduling:**
- Weekly cron job for all companies
- Priority queue for low-scoring sites
- Email alerts for score drops
**2. Historical Trend Tracking:**
- Store audit history (not just latest)
- Chart score changes over time
- Identify improving/declining sites
**3. Competitor Benchmarking:**
- Compare scores within categories
- Identify SEO leaders
- Best practice recommendations
**4. SEO Report Generation:**
- PDF reports for company owners
- Actionable recommendations
- Step-by-step fix guides
**5. Integration with Company Profiles:**
- Display SEO badge on company page
- Show top SEO issues
- Link to audit details
**6. Mobile vs Desktop Audits:**
- Separate scores for mobile/desktop
- Mobile-first optimization tracking
- Device-specific recommendations
### 11.2 Technical Improvements
**1. Async Batch Processing:**
- Celery background tasks
- Parallel audits (5 concurrent)
- Real-time progress updates
**2. API Webhook Notifications:**
- Notify company owners of audit results
- Integration with Slack/Discord
- Email summaries
**3. Advanced Caching:**
- Cache PageSpeed results for 7 days
- Skip re-audit if recent
- Force refresh button for admins
**4. Audit Scheduling:**
- Per-company audit frequency
- High-priority companies daily
- Low-priority weekly
---
## 12. Troubleshooting
### 12.1 Common Issues
**Issue:** "PageSpeed API quota exceeded"
**Solution:** Wait 24 hours for quota reset or upgrade to paid tier
**Issue:** "Database connection failed"
**Solution:** Check PostgreSQL is running: `systemctl status postgresql`
**Issue:** "SSL certificate verify failed"
**Solution:** Script automatically tries HTTP fallback
**Issue:** "Company has no website URL"
**Solution:** Add website in company edit form or skip
**Issue:** "Timeout after 30s"
**Solution:** Website is slow/down, skip or retry later
### 12.2 Debugging
**Enable Verbose Logging:**
```bash
python seo_audit.py --all --verbose
```
**Check API Key:**
```bash
echo $GOOGLE_PAGESPEED_API_KEY
# Should print API key, not empty
```
**Test Single Company:**
```bash
python seo_audit.py --company-id 26 --dry-run
# See full audit output without saving
```
**Check Database Connection:**
```bash
psql -U nordabiz_app -d nordabiz -h 127.0.0.1 -c "SELECT COUNT(*) FROM companies;"
```
**Test PageSpeed API:**
```bash
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://pixlab.pl&key=YOUR_API_KEY&strategy=mobile"
```
---
## 13. Related Documentation
- **Google PageSpeed API:** [docs/architecture/06-external-integrations.md#3-google-pagespeed-insights-api](../06-external-integrations.md#3-google-pagespeed-insights-api)
- **Database Schema:** [docs/architecture/05-database-schema.md](../05-database-schema.md)
- **Flask Components:** [docs/architecture/04-flask-components.md](../04-flask-components.md)
- **Admin Panel:** [CLAUDE.md#audyt-seo-panel-adminseo](../../CLAUDE.md#audyt-seo-panel-adminseo)
---
## 14. Glossary
| Term | Definition |
|------|------------|
| **SEO** | Search Engine Optimization - improving website visibility in search results |
| **PageSpeed Insights** | Google tool for measuring website performance and SEO quality |
| **Lighthouse** | Automated audit tool by Google (powers PageSpeed Insights) |
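| **Core Web Vitals** | Google's UX metrics: LCP (Largest Contentful Paint), FID (First Input Delay), CLS (Cumulative Layout Shift). Google replaced FID with INP (Interaction to Next Paint) in March 2024; this audit still stores FID. |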
| **On-Page SEO** | SEO factors on the page itself (meta tags, headings, content) |
| **Technical SEO** | SEO factors related to crawlability (robots.txt, sitemap, indexability) |
| **Meta Tags** | HTML tags providing metadata about the page (title, description, keywords) |
| **Structured Data** | Machine-readable format (JSON-LD, Schema.org) for search engines |
| **Canonical URL** | Preferred version of a page (prevents duplicate content issues) |
| **Robots.txt** | File telling search engines which pages to crawl/not crawl |
| **Sitemap.xml** | XML file listing all pages on a website for search engines |
| **Open Graph** | Meta tags for social media sharing (og:title, og:image, etc.) |
| **Twitter Card** | Meta tags for Twitter sharing |
| **Upsert** | Database operation: INSERT or UPDATE if exists |
| **Quota** | API usage limit (25,000 requests/day for PageSpeed) |
---
**Document End**