Maciej Pienczyn fa4fb92390 docs: Add complete architecture documentation with C4 diagrams

- System Context diagram (C4 Level 1)
- Container diagram (C4 Level 2)
- Flask component diagram (C4 Level 3)
- Deployment architecture with NPM proxy
- Database schema (PostgreSQL)
- External integrations (Gemini AI, Brave Search, PageSpeed)
- Network topology (INPI infrastructure)
- Security architecture
- API endpoints reference
- Troubleshooting guide
- Data flow diagrams (auth, search, AI chat, SEO audit, news monitoring)

All diagrams use Mermaid.js and render automatically on GitHub.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-10 12:40:52 +01:00


News Monitoring Flow

Document Version: 1.0
Last Updated: 2026-01-10
Status: Planned (database schema ready, implementation pending)
Flow Type: Automated News Discovery and Moderation

Overview

This document describes the complete news monitoring flow for the Norda Biznes Hub application, covering:

News Discovery via Brave Search API
AI-Powered Filtering using Google Gemini AI
Manual Moderation workflow for admins
Company Profile Display of approved news
User Notifications for new news items
Database Storage in company_news and user_notifications tables

Key Technology:

Search API: Brave Search News API (free tier: 2,000 req/month)
AI Filter: Google Gemini 2.0 Flash (relevance scoring and classification)
Database: PostgreSQL (company_news and user_notifications tables)
Scheduler: Planned cron job (6-hour intervals)

Key Features:

Automated discovery of company mentions in news media
AI-powered relevance scoring (0.0-1.0 scale)
Automatic classification (news_mention, press_release, award, etc.)
Admin moderation dashboard (/admin/news)
Display on company profiles (approved news only)
User notification system
Deduplication by URL

API Costs & Performance:

API: Brave Search News API
Pricing: Free for the first 2,000 searches per month
Typical Search Time: 2-5 seconds per company
Monthly Capacity: 2,000 searches ÷ 80 companies = 25 searches per company per month
Actual Cost: $0.00 as long as usage stays within the free tier

Planned Schedule:

Run every 6 hours (4 times/day)
80 companies × 4 runs = 320 searches/day
320 × 30 days = 9,600 searches/month
⚠️ EXCEEDS FREE TIER (9,600 vs. 2,000 searches/month, 4.8× the quota) - reduce the run frequency, rate-limit searches per company, or move to a paid tier
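The arithmetic above generalizes to any run interval. The following is a minimal sketch, assuming 30-day months and exactly one search per company per run; the function names are illustrative and not part of the codebase.

```python
import math

FREE_TIER_SEARCHES = 2_000   # Brave free tier, searches per month (from this document)
COMPANIES = 80               # companies monitored (from this document)
DAYS_PER_MONTH = 30          # same simplification as the estimate above


def monthly_searches(interval_hours: float, companies: int = COMPANIES) -> int:
    """Searches per month when every company is searched once per run."""
    runs_per_month = DAYS_PER_MONTH * 24 / interval_hours
    return math.ceil(runs_per_month * companies)


def min_interval_hours(quota: int = FREE_TIER_SEARCHES, companies: int = COMPANIES) -> float:
    """Shortest run interval (in hours) that keeps monthly usage within the quota."""
    max_runs = quota // companies  # whole runs that fit into the quota
    return DAYS_PER_MONTH * 24 / max_runs


print(monthly_searches(6))    # 9600
print(monthly_searches(24))   # 2400
print(min_interval_hours())   # 28.8
```

Under these assumptions even one run per day (2,400 searches) exceeds the free tier; a run roughly every 29 hours or longer would fit.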

1. High-Level News Monitoring Flow

1.1 Complete News Monitoring Flow Diagram

flowchart TD
    Cron[Cron Job<br/>Every 6 hours] -->|Trigger| Script[scripts/fetch_company_news.py]

    Script -->|1. Fetch companies| DB[(PostgreSQL<br/>companies table)]
    DB -->|Company list| Script

    Script -->|2. For each company| Loop{More companies?}
    Loop -->|Yes| BraveAPI[Brave Search API]
    Loop -->|No| Complete[Complete]

    BraveAPI -->|3. Search query<br/>"company_name" OR "NIP"| BraveSearch[Brave Search<br/>News Endpoint]
    BraveSearch -->|4. News results<br/>JSON response| BraveAPI

    BraveAPI -->|5. News articles| Filter{Has results?}
    Filter -->|No| Loop
    Filter -->|Yes| AIFilter[AI Filtering Pipeline]

    AIFilter -->|6. For each article| Gemini[Google Gemini AI]
    Gemini -->|7. Analyze relevance| RelevanceScore[Calculate<br/>relevance_score<br/>0.0-1.0]

    RelevanceScore -->|8. Score + classification| Decision{Score >= 0.3?}
    Decision -->|No - Irrelevant| Discard[Discard article]
    Decision -->|Yes| SaveNews[Save to DB]

    SaveNews -->|9. INSERT ON CONFLICT| NewsDB[(company_news<br/>table)]
    NewsDB -->|10. Check duplicates<br/>by URL| DupeCheck{Duplicate?}

    DupeCheck -->|Yes| Skip[Skip - Already exists]
    DupeCheck -->|No| CreateRecord[Create news record<br/>status='pending']

    CreateRecord -->|11. News saved| NotifyCheck{Notify users?}
    NotifyCheck -->|Yes| CreateNotif[Create notifications]
    CreateNotif -->|12. INSERT| NotifDB[(user_notifications)]

    NotifyCheck -->|No| Loop
    CreateNotif -->|13. Done| Loop

    Discard --> Loop
    Skip --> Loop

    style BraveSearch fill:#FFD700
    style Gemini fill:#4285F4
    style NewsDB fill:#90EE90
    style NotifDB fill:#90EE90
    style AIFilter fill:#FFB6C1

1.2 Admin Moderation Flow

sequenceDiagram
    participant Admin as Admin User
    participant Browser
    participant Flask as Flask App
    participant DB as PostgreSQL

    Admin->>Browser: Navigate to /admin/news
    Browser->>Flask: GET /admin/news
    Flask->>Flask: Check permissions (is_admin?)

    alt Not Admin
        Flask-->>Browser: 403 Forbidden
    else Is Admin
        Flask->>DB: SELECT * FROM company_news<br/>WHERE moderation_status='pending'
        DB-->>Flask: Pending news list
        Flask-->>Browser: Render admin_news_moderation.html
        Browser-->>Admin: Display pending news
    end

    Admin->>Browser: Review article #42<br/>Click "Approve"
    Browser->>Flask: POST /api/news/moderate<br/>{news_id: 42, action: 'approve'}

    Flask->>Flask: Verify admin permissions
    Flask->>DB: UPDATE company_news<br/>SET moderation_status='approved',<br/>is_approved=TRUE,<br/>moderated_by=admin_id,<br/>moderated_at=NOW()<br/>WHERE id=42

    DB-->>Flask: Updated
    Flask->>DB: INSERT INTO user_notifications<br/>(notification_type='news',<br/>related_type='company_news', related_id=42)
    DB-->>Flask: Notification created

    Flask-->>Browser: JSON: {success: true}
    Browser-->>Admin: Show success message

    Note over Admin,DB: Article now visible on company profile

1.3 User View Flow (Company Profile)

sequenceDiagram
    participant User as Visitor/Member
    participant Browser
    participant Flask as Flask App
    participant DB as PostgreSQL

    User->>Browser: Visit /company/pixlab-sp-z-o-o
    Browser->>Flask: GET /company/pixlab-sp-z-o-o

    Flask->>DB: SELECT * FROM companies<br/>WHERE slug='pixlab-sp-z-o-o'
    DB-->>Flask: Company data

    Flask->>DB: SELECT * FROM company_news<br/>WHERE company_id=26<br/>AND is_approved=TRUE<br/>AND is_visible=TRUE<br/>ORDER BY published_date DESC<br/>LIMIT 5

    DB-->>Flask: Approved news (0-5 items)

    Flask-->>Browser: Render company_detail.html<br/>with news section
    Browser-->>User: Display company profile<br/>with "Aktualności" section

    alt Has approved news
        Browser-->>User: Show news cards<br/>(title, date, source, summary)
        User->>Browser: Click "Czytaj więcej"
        Browser->>User: Open source_url in new tab
    else No news
        Browser-->>User: "Brak aktualności"
    end

2. News Discovery Pipeline

2.1 Brave Search API Integration

Endpoint: https://api.search.brave.com/res/v1/news/search

Authentication:

API Key in .env: BRAVE_SEARCH_API_KEY
Header: X-Subscription-Token: {API_KEY}

Search Parameters:

params = {
    "q": f'"{company_name}" OR "{nip}"',  # Quoted for exact match
    "count": 10,                           # Max results per query
    "freshness": "pw",                     # Past week (pw), month (pm), year (py)
    "country": "pl",                       # Poland
    "search_lang": "pl",                   # Polish language
    "offset": 0                            # Pagination (unused)
}
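The query string is worth building defensively, because a company without a stored NIP would otherwise produce the literal search term "None". A minimal sketch, assuming the NIP may be missing or empty; `build_news_query` is a hypothetical helper and the NIP value in the usage line is a placeholder.

```python
from typing import Optional


def build_news_query(company_name: str, nip: Optional[str] = None) -> str:
    """Build the quoted OR query; omit the NIP clause when no NIP is stored."""
    # Remove double quotes from the name so they cannot break the quoted phrase.
    name = company_name.replace('"', "").strip()
    terms = ['"{}"'.format(name)]
    nip = (nip or "").strip()
    if nip:
        terms.append('"{}"'.format(nip))
    return " OR ".join(terms)


print(build_news_query("PIXLAB", "0000000000"))
print(build_news_query("PIXLAB"))
```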

Rate Limits:

Free Tier: 2,000 searches/month
Paid Tier: $5/1000 additional searches
Throttling: 1 request/second (built into script)

Response Format:

{
  "type": "news",
  "news": {
    "results": [
      {
        "title": "PIXLAB otwiera nową siedzibę w Wejherowie",
        "url": "https://example.com/article",
        "description": "Firma PIXLAB, specjalizująca się...",
        "age": "2 days ago",
        "meta_url": {
          "netloc": "example.com",
          "hostname": "example.com"
        },
        "thumbnail": {
          "src": "https://example.com/image.jpg"
        }
      }
    ]
  }
}
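Before results reach the AI filter, it can help to flatten each item into the fields the rest of the pipeline uses. The sketch below assumes the response layout shown in the sample above, which should be checked against the live API; `extract_articles` is a hypothetical helper.

```python
from typing import Any


def extract_articles(data: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a news response into simple article dicts, skipping incomplete items."""
    results = (data.get("news") or {}).get("results") or []
    articles = []
    for item in results:
        url = item.get("url")
        title = item.get("title")
        if not url or not title:
            continue  # title and URL are required to store a news record
        meta = item.get("meta_url") or {}
        articles.append({
            "title": title,
            "url": url,
            "description": item.get("description") or "",
            "age": item.get("age"),
            "source_name": meta.get("hostname") or meta.get("netloc") or "",
            "thumbnail": (item.get("thumbnail") or {}).get("src"),
        })
    return articles
```

Optional fields fall back to empty values instead of raising `KeyError`, so a single malformed result does not abort the whole company run.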

Error Handling:

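import logging
import time
import requests

logger = logging.getLogger(__name__)
MAX_RETRIES = 3

def fetch_with_retry(url, headers, params, company_name):
    """Return the parsed JSON response, or None if the company should be skipped."""
    for retry_count in range(MAX_RETRIES):
        try:
            response = requests.get(url, headers=headers, params=params, timeout=10)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.Timeout:
            # Retry with exponential backoff (1s, 2s, 4s)
            time.sleep(2 ** retry_count)
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:  # Rate limit exceeded
                # Wait and retry
                time.sleep(60)
            elif e.response.status_code == 401:  # Invalid API key
                logger.error("Invalid Brave API key")
                return None
            else:
                logger.error(f"HTTP {e.response.status_code} for company {company_name}")
                return None
        except requests.exceptions.RequestException:
            # Network error - skip company
            logger.error(f"Network error for company {company_name}")
            return None
    logger.error(f"Giving up on {company_name} after {MAX_RETRIES} attempts")
    return None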

2.2 News Discovery Script

File: scripts/fetch_company_news.py (planned)

Usage:

# Fetch news for all companies
python scripts/fetch_company_news.py --all

# Fetch for specific company
python scripts/fetch_company_news.py --company pixlab-sp-z-o-o

# Dry run (no database writes)
python scripts/fetch_company_news.py --all --dry-run

# Fetch only high-priority companies
python scripts/fetch_company_news.py --priority

Implementation Outline:

#!/usr/bin/env python3
"""
Fetch company news from Brave Search API and store in database.
"""

import argparse
import os
import sys
import time
import requests
from datetime import datetime
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from database import Company, CompanyNews
from gemini_service import GeminiService

# Configuration
BRAVE_API_KEY = os.getenv('BRAVE_SEARCH_API_KEY')
BRAVE_NEWS_ENDPOINT = 'https://api.search.brave.com/res/v1/news/search'
DATABASE_URL = os.getenv('DATABASE_URL')

def fetch_news_for_company(company: Company, db: Session) -> int:
    """
    Fetch news for a single company.
    Returns: Number of new articles found.
    """
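    # Build search query; skip the NIP clause when the company has no NIP on file
    terms = [f'"{company.name}"']
    if company.nip:
        terms.append(f'"{company.nip}"')
    query = ' OR '.join(terms)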

    # Call Brave API
    headers = {'X-Subscription-Token': BRAVE_API_KEY}
    params = {
        'q': query,
        'count': 10,
        'freshness': 'pw',  # Past week
        'country': 'pl',
        'search_lang': 'pl'
    }

    response = requests.get(BRAVE_NEWS_ENDPOINT, headers=headers, params=params, timeout=10)
    response.raise_for_status()
    data = response.json()

    articles = data.get('news', {}).get('results', [])
    new_count = 0

    for article in articles:
        # Check if already exists
        existing = db.query(CompanyNews).filter_by(
            company_id=company.id,
            source_url=article['url']
        ).first()

        if existing:
            continue  # Skip duplicate

        # AI filtering
        relevance = filter_with_ai(company, article)

        if relevance['score'] < 0.3:
            continue  # Too irrelevant

        # Create news record
        news = CompanyNews(
            company_id=company.id,
            title=article['title'],
            summary=article['description'],
            source_url=article['url'],
            source_name=article['meta_url']['hostname'],
            source_type='web',
            news_type=relevance['type'],
            published_date=parse_date(article.get('age')),
            discovered_at=datetime.utcnow(),
            relevance_score=relevance['score'],
            ai_summary=relevance['summary'],
            ai_tags=relevance['tags'],
            moderation_status='pending',
            is_approved=False,
            is_visible=True
        )

        db.add(news)
        new_count += 1

    db.commit()
    return new_count

def filter_with_ai(company: Company, article: dict) -> dict:
    """
    Use Gemini AI to filter and classify news article.
    Returns: {score: float, type: str, summary: str, tags: list}
    """
    gemini = GeminiService()

    prompt = f"""
    Oceń czy poniższy artykuł jest istotny dla firmy "{company.name}".

    Firma: {company.name}
    NIP: {company.nip}
    Branża: {company.category}
    Opis: {company.description}

    Artykuł:
    Tytuł: {article['title']}
    Treść: {article['description']}

    Zwróć JSON:
    {{
        "relevance": 0.0-1.0,  // 1.0 = bardzo istotny, 0.0 = całkowicie nieistotny
        "type": "news_mention|press_release|award|social_post|event|financial|partnership",
        "reason": "Krótkie uzasadnienie oceny",
        "summary": "Krótkie streszczenie artykułu (max 200 znaków)",
        "tags": ["tag1", "tag2", "tag3"]  // Maksymalnie 5 tagów
    }}
    """

    response = gemini.generate_content(prompt)
    result = parse_json(response.text)

    return {
        'score': result['relevance'],
        'type': result['type'],
        'summary': result['summary'],
        'tags': result['tags']
    }

def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(description='Fetch company news from Brave API')
    parser.add_argument('--all', action='store_true', help='Fetch for all companies')
    parser.add_argument('--company', type=str, help='Fetch for specific company (slug)')
    parser.add_argument('--dry-run', action='store_true', help='Dry run (no DB writes)')
    parser.add_argument('--priority', action='store_true', help='Fetch only high-priority')
    args = parser.parse_args()

    engine = create_engine(DATABASE_URL)

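    with Session(engine) as db:
        # NOTE: --priority and --dry-run are parsed above but not yet handled
        # in this outline; fetch_news_for_company() always commits.
        if args.all:
            companies = db.query(Company).filter_by(is_active=True).all()
        elif args.company:
            company = db.query(Company).filter_by(slug=args.company).first()
            if company is None:
                print(f"Error: Company not found: {args.company}")
                sys.exit(1)
            companies = [company]
        else:
            print("Error: Must specify --all or --company")
            sys.exit(1)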

        total_new = 0
        for company in companies:
            print(f"Fetching news for {company.name}...")
            new_count = fetch_news_for_company(company, db)
            total_new += new_count
            print(f"  → Found {new_count} new articles")
            time.sleep(1)  # Rate limiting

        print(f"\nTotal: {total_new} new articles")

if __name__ == '__main__':
    main()
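The outline calls `parse_date()` and `parse_json()` without defining them. The following is one possible sketch of both helpers, not the project's implementation. It assumes the `age` field uses relative English phrases such as "2 days ago", as in the sample response, and that the model reply may wrap its JSON in extra text or code fences. Months and years are approximated as 30 and 365 days.

```python
import json
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_AGE_RE = re.compile(
    r"^\s*(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago\s*$", re.IGNORECASE
)
_UNIT = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),   # approximation
    "year": timedelta(days=365),   # approximation
}


def parse_date(age: Optional[str], now: Optional[datetime] = None) -> Optional[datetime]:
    """Convert a relative age such as '2 days ago' to an approximate naive UTC datetime."""
    if not age:
        return None
    match = _AGE_RE.match(age)
    if not match:
        return None  # unknown format: store NULL rather than guess
    if now is None:
        # Naive UTC, matching the TIMESTAMP columns used in the schema
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    return now - int(match.group(1)) * _UNIT[match.group(2).lower()]


def parse_json(text: str) -> dict:
    """Extract the outermost JSON object from a model reply, ignoring surrounding text."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    return json.loads(text[start:end + 1])
```

Returning `None` for unrecognized age strings leaves `published_date` empty instead of recording a wrong date; callers should still be prepared for `json.JSONDecodeError` when the model returns malformed JSON.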

Cron Job Setup (planned):

# Add to crontab (every 6 hours)
0 */6 * * * cd /var/www/nordabiznes && \
  /var/www/nordabiznes/venv/bin/python3 scripts/fetch_company_news.py --all \
  >> /var/log/nordabiznes/news_fetch.log 2>&1

3. AI Filtering and Classification

3.1 Gemini AI Integration

Purpose:

Filter out irrelevant articles (false positives)
Calculate relevance score (0.0-1.0)
Classify news type (news_mention, press_release, award, etc.)
Generate AI summary
Extract tags for categorization

Relevance Scoring Criteria:

Score Range	Description	Action
0.90 - 1.00	Highly relevant - direct mention, official communication	Auto-approve candidate (see section 5.3), otherwise pending moderation
0.70 - 0.89	Very relevant - significant mention or related news	Pending moderation
0.50 - 0.69	Moderately relevant - indirect mention	Pending moderation
0.30 - 0.49	Low relevance - tangential mention	Pending moderation
0.00 - 0.29	Irrelevant - false positive, unrelated	Auto-reject (discard)
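The score bands reduce to two thresholds: 0.3 (discard below) and 0.9 (eligible for auto-approval). A minimal sketch of that mapping; the function name and the returned labels are illustrative only.

```python
DISCARD_BELOW = 0.3       # threshold used in the discovery flow
AUTO_APPROVE_FROM = 0.9   # lower bound of the "highly relevant" band


def action_for_score(score: float) -> str:
    """Map a relevance score to the pipeline action described in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"relevance score out of range: {score}")
    if score < DISCARD_BELOW:
        return "discard"
    if score >= AUTO_APPROVE_FROM:
        return "auto_approve_candidate"
    return "pending"


for value in (0.25, 0.3, 0.85, 0.95):
    print(value, action_for_score(value))
```

A score of 0.9 or higher only makes an article a candidate: the auto-approval rules in section 5.3 add further conditions.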

AI Prompt Template:

RELEVANCE_PROMPT = """
Jesteś ekspertem od analizy newsów firmowych. Oceń czy poniższy artykuł jest istotny dla firmy.

INFORMACJE O FIRMIE:
Nazwa: {company_name}
NIP: {nip}
Branża: {category}
Opis: {description}
Lokalizacja: {city}

ARTYKUŁ DO OCENY:
Tytuł: {article_title}
Źródło: {source_name}
Data: {published_date}
Treść: {article_content}

KRYTERIA OCENY:
1. Czy artykuł bezpośrednio wspomina o firmie (nazwa lub NIP)?
2. Czy dotyczy działalności firmy, produktów lub usług?
3. Czy jest to oficjalny komunikat prasowy firmy?
4. Czy informacje są istotne dla klientów lub partnerów firmy?
5. Czy artykuł dotyczy nagród, wyróżnień lub osiągnięć firmy?

INSTRUKCJE:
- Zwróć ocenę w formacie JSON
- relevance: 0.0 (całkowicie nieistotny) do 1.0 (bardzo istotny)
- type: klasyfikacja artykułu
- reason: krótkie uzasadnienie (max 100 znaków)
- summary: streszczenie artykułu (max 200 znaków)
- tags: maksymalnie 5 tagów opisujących temat

FORMAT ODPOWIEDZI (tylko JSON, bez dodatkowego tekstu):
{{
    "relevance": 0.85,
    "type": "news_mention",
    "reason": "Artykuł wspomina o nowym projekcie firmy",
    "summary": "Firma PIXLAB rozpoczyna realizację projektu XYZ...",
    "tags": ["projekty", "IT", "wejherowo", "innowacje"]
}}

DOSTĘPNE TYPY:
- news_mention: Wzmianka w mediach
- press_release: Oficjalny komunikat prasowy
- award: Nagroda lub wyróżnienie
- social_post: Post w mediach społecznościowych
- event: Wydarzenie lub konferencja
- financial: Informacje finansowe (wyniki, inwestycje)
- partnership: Partnerstwo lub współpraca
"""

AI Cost Tracking:

# Track AI API costs in ai_api_costs table
def track_ai_cost(prompt_tokens: int, completion_tokens: int, model: str):
    """
    Track AI API usage and cost.
    Gemini 2.0 Flash: Free tier 1,500 req/day
    """
    cost_per_1k_input = 0.0  # Free tier
    cost_per_1k_output = 0.0  # Free tier

    input_cost = (prompt_tokens / 1000) * cost_per_1k_input
    output_cost = (completion_tokens / 1000) * cost_per_1k_output
    total_cost = input_cost + output_cost

    # Save to database
    cost_record = AIAPICost(
        service='gemini',
        model=model,
        operation='news_filtering',
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
        cost=total_cost,
        created_at=datetime.utcnow()
    )
    db.add(cost_record)
    db.commit()

3.2 Classification Types

News Types:

news_mention - General media mention
- Company mentioned in news article
- Industry news involving the company
- Local or regional news coverage
press_release - Official company press release
- Official statements from company
- Product launches
- Company announcements
award - Award or recognition
- Industry awards won
- Certifications achieved
- Recognition or rankings
social_post - Social media post
- Facebook posts
- LinkedIn updates
- Instagram stories (future)
event - Event announcement
- Company hosting or participating in event
- Conference appearances
- Webinars or workshops
financial - Financial news
- Revenue reports
- Investment announcements
- Funding rounds
partnership - Partnership or collaboration
- New partnerships announced
- Joint ventures
- Strategic collaborations

Source Types:

web - Web news article (Brave Search)
facebook - Facebook post (future)
linkedin - LinkedIn post (future)
instagram - Instagram post (future)
press - Press release portal
award - Award announcement

4. Database Schema

4.1 company_news Table

Purpose: Store news and mentions for companies from various sources.

Schema:

CREATE TABLE company_news (
    id SERIAL PRIMARY KEY,

    -- Company reference
    company_id INTEGER NOT NULL REFERENCES companies(id) ON DELETE CASCADE,

    -- News content
    title VARCHAR(500) NOT NULL,
    summary TEXT,
    content TEXT,

    -- Source information
    source_url VARCHAR(1000),
    source_name VARCHAR(255),
    source_type VARCHAR(50),

    -- Classification
    news_type VARCHAR(50) DEFAULT 'news_mention',

    -- Dates
    published_date TIMESTAMP,
    discovered_at TIMESTAMP DEFAULT NOW(),

    -- AI filtering
    is_approved BOOLEAN DEFAULT FALSE,
    is_visible BOOLEAN DEFAULT TRUE,
    relevance_score NUMERIC(3,2),
    ai_summary TEXT,
    ai_tags TEXT[],

    -- Moderation
    moderation_status VARCHAR(20) DEFAULT 'pending',
    moderated_by INTEGER REFERENCES users(id),
    moderated_at TIMESTAMP,
    rejection_reason VARCHAR(255),

    -- Engagement
    view_count INTEGER DEFAULT 0,

    -- Timestamps
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),

    -- Unique constraint
    CONSTRAINT uq_company_news_url UNIQUE (company_id, source_url)
);
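The unique constraint on (company_id, source_url) is what makes URL deduplication safe even if two runs overlap. The sketch below demonstrates the idea with the standard-library `sqlite3` module as a stand-in for PostgreSQL and a reduced table; the `ON CONFLICT ... DO NOTHING` clause has the same form in both databases. `insert_news` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE company_news (
        id INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL,
        title TEXT NOT NULL,
        source_url TEXT,
        UNIQUE (company_id, source_url)
    )
""")


def insert_news(db: sqlite3.Connection, company_id: int, title: str, source_url: str) -> bool:
    """Insert an article; return False when the URL already exists for this company."""
    cur = db.execute(
        "INSERT INTO company_news (company_id, title, source_url) VALUES (?, ?, ?) "
        "ON CONFLICT (company_id, source_url) DO NOTHING",
        (company_id, title, source_url),
    )
    return cur.rowcount == 1


print(insert_news(conn, 26, "First sighting", "https://example.com/article"))   # True
print(insert_news(conn, 26, "Same URL again", "https://example.com/article"))   # False
```

The same URL may still be stored for a different company, since the constraint covers the pair rather than the URL alone.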

Indexes:

-- Performance indexes
CREATE INDEX idx_company_news_company_id ON company_news(company_id);
CREATE INDEX idx_company_news_source_type ON company_news(source_type);
CREATE INDEX idx_company_news_news_type ON company_news(news_type);
CREATE INDEX idx_company_news_is_approved ON company_news(is_approved);
CREATE INDEX idx_company_news_published_date ON company_news(published_date DESC);
CREATE INDEX idx_company_news_discovered_at ON company_news(discovered_at DESC);
CREATE INDEX idx_company_news_moderation ON company_news(moderation_status);

-- Composite index for efficient querying
CREATE INDEX idx_company_news_approved_visible
    ON company_news(company_id, is_approved, is_visible)
    WHERE is_approved = TRUE AND is_visible = TRUE;

Field Descriptions:

Field	Type	Description
`id`	SERIAL	Primary key
`company_id`	INTEGER	Foreign key to companies table
`title`	VARCHAR(500)	News headline
`summary`	TEXT	Short excerpt or description
`content`	TEXT	Full article content (if scraped)
`source_url`	VARCHAR(1000)	Original URL of news article
`source_name`	VARCHAR(255)	Name of source (e.g., "Gazeta Wyborcza")
`source_type`	VARCHAR(50)	Type: web, facebook, linkedin, instagram, press, award
`news_type`	VARCHAR(50)	Classification (see section 3.2)
`published_date`	TIMESTAMP	Original publication date
`discovered_at`	TIMESTAMP	When our system found it
`is_approved`	BOOLEAN	Passed AI filter and approved for display
`is_visible`	BOOLEAN	Visible on company profile
`relevance_score`	NUMERIC(3,2)	AI-calculated relevance (0.00-1.00)
`ai_summary`	TEXT	Gemini-generated summary
`ai_tags`	TEXT[]	Array of AI-extracted tags
`moderation_status`	VARCHAR(20)	Status: pending, approved, rejected
`moderated_by`	INTEGER	Admin user ID who moderated
`moderated_at`	TIMESTAMP	When moderation happened
`rejection_reason`	VARCHAR(255)	Reason if rejected
`view_count`	INTEGER	Number of views on platform
`created_at`	TIMESTAMP	Record creation time
`updated_at`	TIMESTAMP	Last update time

4.2 user_notifications Table

Purpose: In-app notifications for users with read/unread tracking.

Schema:

CREATE TABLE user_notifications (
    id SERIAL PRIMARY KEY,

    -- User reference
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,

    -- Notification content
    title VARCHAR(255) NOT NULL,
    message TEXT,
    notification_type VARCHAR(50) DEFAULT 'info',

    -- Related entity (polymorphic reference)
    related_type VARCHAR(50),
    related_id INTEGER,

    -- Status
    is_read BOOLEAN DEFAULT FALSE,
    read_at TIMESTAMP,

    -- Action
    action_url VARCHAR(500),

    -- Timestamps
    created_at TIMESTAMP DEFAULT NOW()
);

Indexes:

CREATE INDEX idx_user_notifications_user_id ON user_notifications(user_id);
CREATE INDEX idx_user_notifications_type ON user_notifications(notification_type);
CREATE INDEX idx_user_notifications_is_read ON user_notifications(is_read);
CREATE INDEX idx_user_notifications_created_at ON user_notifications(created_at DESC);

-- Composite index for unread notifications badge
CREATE INDEX idx_user_notifications_unread
    ON user_notifications(user_id, is_read, created_at DESC)
    WHERE is_read = FALSE;

Field Descriptions:

Field	Type	Description
`id`	SERIAL	Primary key
`user_id`	INTEGER	Foreign key to users table
`title`	VARCHAR(255)	Notification title
`message`	TEXT	Full notification message
`notification_type`	VARCHAR(50)	Type: news, system, message, event, alert
`related_type`	VARCHAR(50)	Type of related entity (company_news, event, message)
`related_id`	INTEGER	ID of related entity
`is_read`	BOOLEAN	Has user read the notification?
`read_at`	TIMESTAMP	When was it read?
`action_url`	VARCHAR(500)	URL to navigate when clicked
`created_at`	TIMESTAMP	Notification creation time

Notification Types:

news - New company news
system - System announcements
message - Private message notification
event - Event reminder/update
alert - Important alert

5. Admin Moderation Workflow

5.1 Admin Dashboard (`/admin/news`)

Purpose: Allow admins to review, approve, or reject pending news items.

URL: /admin/news
Authentication: Requires is_admin=True

Features:

Pending News List
- Display all news with moderation_status='pending'
- Sort by discovered_at DESC (newest first)
- Show: title, company, source, published_date, relevance_score
Filtering Options
- By company (dropdown)
- By source_type (web, facebook, linkedin, etc.)
- By news_type (news_mention, press_release, award, etc.)
- By relevance_score range (0.0-1.0)
- By date range (last 7 days, last 30 days, custom)
Moderation Actions
- Approve: Set moderation_status='approved', is_approved=TRUE
- Reject: Set moderation_status='rejected', is_approved=FALSE
- Edit: Modify title, summary, news_type before approving
- Preview: View full article (open source_url in new tab)
Bulk Actions
- Approve all with relevance_score >= 0.8
- Reject all with relevance_score < 0.4
- Select multiple items for batch approval/rejection

UI Layout:

┌─────────────────────────────────────────────────────────────┐
│  NEWS MODERATION DASHBOARD                                  │
├─────────────────────────────────────────────────────────────┤
│  Filters: [Company ▼] [Type ▼] [Score ▼] [Date ▼]         │
│  Bulk: [Approve Score>=0.8] [Reject Score<0.4]             │
├─────────────────────────────────────────────────────────────┤
│  Pending: 42 | Approved: 128 | Rejected: 15                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ☐ PIXLAB otwiera nową siedzibę                            │
│     Company: PIXLAB | Type: news_mention | Score: 0.85     │
│     Source: trojmiasto.pl | Published: 2026-01-08          │
│     [Preview] [Approve] [Reject] [Edit]                    │
│                                                             │
│  ☐ Graal zwycięzcą konkursu SME Leader                     │
│     Company: GRAAL | Type: award | Score: 0.95             │
│     Source: forbes.pl | Published: 2026-01-07              │
│     [Preview] [Approve] [Reject] [Edit]                    │
│                                                             │
│  ☐ Losowe firmę wspomniała                                 │
│     Company: ABC Sp. z o.o. | Type: news_mention | Score: 0.25 │
│     Source: random-blog.com | Published: 2026-01-05        │
│     [Preview] [Approve] [Reject] [Edit]                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

5.2 Moderation API Endpoints

Endpoint: POST /api/news/moderate

Authentication: Admin only

Request Body:

{
  "news_id": 42,
  "action": "approve",  // or "reject"
  "rejection_reason": "Spam / Nieistotne / Duplikat"  // Required if rejecting
}

Response:

{
  "success": true,
  "message": "News approved successfully",
  "news_id": 42,
  "moderation_status": "approved"
}
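The request contract can be checked before any database access. A minimal sketch of such a validator, kept framework-independent so it can be unit-tested; the function name, the error strings other than "Rejection reason required", and the returned structure are assumptions.

```python
from typing import Any, Optional, Tuple

ACTION_TO_STATUS = {"approve": "approved", "reject": "rejected"}
MAX_REASON_LENGTH = 255  # rejection_reason is VARCHAR(255) in the schema


def validate_moderation_payload(data: Any) -> Tuple[Optional[dict], Optional[str]]:
    """Return (clean_payload, None) on success or (None, error_message) on failure."""
    if not isinstance(data, dict):
        return None, "JSON object required"
    news_id = data.get("news_id")
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(news_id, int) or isinstance(news_id, bool) or news_id <= 0:
        return None, "news_id must be a positive integer"
    action = data.get("action")
    if not isinstance(action, str) or action not in ACTION_TO_STATUS:
        return None, "action must be 'approve' or 'reject'"
    raw_reason = data.get("rejection_reason")
    reason = raw_reason.strip() if isinstance(raw_reason, str) else ""
    if action == "reject" and not reason:
        return None, "Rejection reason required"
    if len(reason) > MAX_REASON_LENGTH:
        return None, "rejection_reason exceeds 255 characters"
    clean = {
        "news_id": news_id,
        "action": action,
        "moderation_status": ACTION_TO_STATUS[action],
        "rejection_reason": reason or None,
    }
    return clean, None
```

Mapping the action to its status in one table also avoids deriving status text from the action string.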

Implementation:

@app.route('/api/news/moderate', methods=['POST'])
@login_required
def api_news_moderate():
    """Moderate a news item (admin only)."""
    if not current_user.is_admin:
        return jsonify({'error': 'Unauthorized'}), 403

    data = request.get_json(silent=True) or {}  # Tolerate a missing or invalid JSON body
    news_id = data.get('news_id')
    action = data.get('action')  # 'approve' or 'reject'
    rejection_reason = data.get('rejection_reason')

    news = db.session.get(CompanyNews, news_id)
    if not news:
        return jsonify({'error': 'News not found'}), 404

    if action == 'approve':
        news.moderation_status = 'approved'
        news.is_approved = True
        news.moderated_by = current_user.id
        news.moderated_at = datetime.utcnow()

        # Create notification for company owner (if exists)
        company_user = db.session.query(User).filter_by(company_id=news.company_id).first()
        if company_user:
            notification = UserNotification(
                user_id=company_user.id,
                title=f"Nowa aktualność o {news.company.name}",
                message=f"Artykuł '{news.title}' został zatwierdzony i jest widoczny na profilu firmy.",
                notification_type='news',
                related_type='company_news',
                related_id=news.id,
                action_url=f"/company/{news.company.slug}#news"
            )
            db.session.add(notification)

    elif action == 'reject':
        if not rejection_reason:
            return jsonify({'error': 'Rejection reason required'}), 400

        news.moderation_status = 'rejected'
        news.is_approved = False
        news.is_visible = False
        news.moderated_by = current_user.id
        news.moderated_at = datetime.utcnow()
        news.rejection_reason = rejection_reason

    else:
        return jsonify({'error': 'Invalid action'}), 400

    db.session.commit()

    return jsonify({
        'success': True,
        'message': f"News {news.moderation_status} successfully",
        'news_id': news.id,
        'moderation_status': news.moderation_status
    })

5.3 Auto-Approval Rules

High-Confidence Auto-Approval:

Automatically approve news if ALL conditions are met:

relevance_score >= 0.9
source_type in ('press', 'award')
Company name appears in title
Source is a trusted domain (whitelist)

Trusted Sources Whitelist:

TRUSTED_NEWS_SOURCES = [
    'trojmiasto.pl',
    'gdansk.pl',
    'bizneswkaszubach.pl',
    'pomorska.pl',
    'forbes.pl',
    'pulshr.pl',
    'rp.pl',  # Rzeczpospolita
    'pb.pl',  # Puls Biznesu
    'gp24.pl'  # Głos Pomorza
]

Implementation:

def should_auto_approve(news: CompanyNews) -> bool:
    """
    Determine if news should be auto-approved.
    Returns True if news meets high-confidence criteria.
    """
    if news.relevance_score is None or news.relevance_score < 0.9:
        return False

    if news.source_type not in ('press', 'award'):
        return False

    # Check if company name in title
    if news.company.name.lower() not in news.title.lower():
        return False

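    # Check if source is trusted (normalize case and a leading "www.")
    from urllib.parse import urlparse
    domain = (urlparse(news.source_url or '').hostname or '').lower()
    if domain.startswith('www.'):
        domain = domain[4:]
    if domain not in TRUSTED_NEWS_SOURCES:
        return False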

    return True

6. Display on Company Profiles

6.1 News Section in Company Profile

Location: templates/company_detail.html

Placement: After "Social Media" section, before "Strona WWW" section

Visibility Rules:

Only show news with is_approved=TRUE AND is_visible=TRUE
Sort by published_date DESC
Limit to 5 most recent items
If no approved news, don't show section

HTML Structure:

<!-- Company News Section -->
{% if company_news %}
<section class="company-section">
    <h2 class="section-title">
        <i class="fas fa-newspaper"></i> Aktualności
    </h2>

    <div class="news-grid">
        {% for news in company_news %}
        <div class="news-card">
            <div class="news-header">
                <span class="news-type-badge {{ news.news_type }}">
                    {{ news_type_labels[news.news_type] }}
                </span>
                <span class="news-date">{{ news.published_date|format_date }}</span>
            </div>

            <h3 class="news-title">{{ news.title }}</h3>

            <p class="news-summary">
                {{ news.ai_summary or news.summary }}
            </p>

            <div class="news-meta">
                <span class="news-source">
                    <i class="fas fa-external-link-alt"></i> {{ news.source_name }}
                </span>
                <span class="news-relevance" title="Relevance: {{ news.relevance_score }}">
                    <i class="fas fa-star"></i> {{ (news.relevance_score * 100)|int }}%
                </span>
            </div>

            {% if news.ai_tags %}
            <div class="news-tags">
                {% for tag in news.ai_tags[:5] %}
                <span class="tag">{{ tag }}</span>
                {% endfor %}
            </div>
            {% endif %}

            <a href="{{ news.source_url }}" target="_blank" rel="noopener noreferrer" class="news-link">
                Czytaj więcej <i class="fas fa-arrow-right"></i>
            </a>
        </div>
        {% endfor %}
    </div>

    {% if company_news|length >= 5 %}
    <div class="news-load-more">
        <button class="btn-secondary" onclick="loadMoreNews({{ company.id }})">
            Załaduj więcej aktualności
        </button>
    </div>
    {% endif %}
</section>
{% endif %}

CSS Styling:

.news-grid {
    display: grid;
    grid-template-columns: repeat(auto-fill, minmax(300px, 1fr));
    gap: 20px;
    margin-top: 20px;
}

.news-card {
    background: white;
    border: 1px solid #e0e0e0;
    border-radius: 8px;
    padding: 20px;
    transition: box-shadow 0.2s;
}

.news-card:hover {
    box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);
}

.news-header {
    display: flex;
    justify-content: space-between;
    align-items: center;
    margin-bottom: 12px;
}

.news-type-badge {
    padding: 4px 12px;
    border-radius: 4px;
    font-size: 12px;
    font-weight: 600;
    text-transform: uppercase;
}

.news-type-badge.news_mention {
    background: #e3f2fd;
    color: #1976d2;
}

.news-type-badge.press_release {
    background: #f3e5f5;
    color: #7b1fa2;
}

.news-type-badge.award {
    background: #fff3e0;
    color: #f57c00;
}

.news-date {
    color: #666;
    font-size: 13px;
}

.news-title {
    font-size: 18px;
    font-weight: 600;
    margin-bottom: 12px;
    color: #333;
    line-height: 1.4;
}

.news-summary {
    color: #555;
    font-size: 14px;
    line-height: 1.6;
    margin-bottom: 12px;
}

.news-meta {
    display: flex;
    justify-content: space-between;
    align-items: center;
    margin-bottom: 12px;
    font-size: 13px;
    color: #666;
}

.news-tags {
    display: flex;
    flex-wrap: wrap;
    gap: 8px;
    margin-bottom: 12px;
}

.news-tags .tag {
    background: #f5f5f5;
    padding: 4px 10px;
    border-radius: 12px;
    font-size: 12px;
    color: #555;
}

.news-link {
    display: inline-flex;
    align-items: center;
    gap: 6px;
    color: #1976d2;
    font-weight: 500;
    text-decoration: none;
    font-size: 14px;
}

.news-link:hover {
    text-decoration: underline;
}

6.2 Load More News (Pagination)

Endpoint: GET /api/company/<company_id>/news

Parameters:

offset - Number of items to skip (default: 0)
limit - Number of items to return (default: 5)

Response:

{
  "news": [
    {
      "id": 42,
      "title": "PIXLAB otwiera nową siedzibę",
      "summary": "Firma PIXLAB, specjalizująca się...",
      "source_name": "Trojmiasto.pl",
      "source_url": "https://example.com/article",
      "news_type": "news_mention",
      "published_date": "2026-01-08T10:30:00",
      "relevance_score": 0.85,
      "ai_tags": ["projekty", "IT", "wejherowo"]
    }
  ],
  "total": 12,
  "offset": 5,
  "limit": 5,
  "has_more": true
}
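
The pagination fields in the response above can be derived from the offset, the limit, and the total number of approved items. The following is a minimal sketch, not the project's implementation: `paginate_news` and the upper bound of 50 items per page are assumptions introduced for illustration, and the input is assumed to be an already-sorted list of news dictionaries.

```python
from typing import Any


def paginate_news(items: list[dict[str, Any]], offset: int = 0, limit: int = 5) -> dict[str, Any]:
    """Build a paginated payload from an already-sorted list of news dicts.

    Hypothetical helper: the name and the max page size (50) are assumptions.
    """
    offset = max(offset, 0)            # negative offsets are treated as 0
    limit = max(1, min(limit, 50))     # clamp limit to the range 1..50
    page = items[offset:offset + limit]
    total = len(items)
    return {
        "news": page,
        "total": total,
        "offset": offset,
        "limit": limit,
        # More items exist if the end of this page is before the end of the list
        "has_more": offset + len(page) < total,
    }
```

Clamping `limit` on the server keeps a single request from pulling an unbounded number of rows, whatever the client sends.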

JavaScript Implementation:

async function loadMoreNews(companyId) {
    const currentCount = document.querySelectorAll('.news-card').length;

    try {
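        const response = await fetch(`/api/company/${companyId}/news?offset=${currentCount}&limit=5`);
        // fetch() resolves on HTTP errors too, so check the status explicitly
        if (!response.ok) {
            throw new Error(`HTTP ${response.status}`);
        }
        const data = await response.json();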

        if (data.news.length === 0) {
            document.querySelector('.news-load-more').innerHTML =
                '<p>Brak więcej aktualności</p>';
            return;
        }

        const newsGrid = document.querySelector('.news-grid');

        data.news.forEach(news => {
            const newsCard = createNewsCard(news);
            newsGrid.appendChild(newsCard);
        });

        if (!data.has_more) {
            document.querySelector('.news-load-more').style.display = 'none';
        }

    } catch (error) {
        console.error('Error loading more news:', error);
        alert('Błąd podczas ładowania aktualności');
    }
}

function createNewsCard(news) {
    const card = document.createElement('div');
    card.className = 'news-card';

    card.innerHTML = `
        <div class="news-header">
            <span class="news-type-badge ${escapeHtml(news.news_type)}">
                ${escapeHtml(newsTypeLabels[news.news_type] || news.news_type)}
            </span>
            <span class="news-date">${formatDate(news.published_date)}</span>
        </div>

        <h3 class="news-title">${escapeHtml(news.title)}</h3>

        <p class="news-summary">${escapeHtml(news.summary)}</p>

        <div class="news-meta">
            <span class="news-source">
                <i class="fas fa-external-link-alt"></i> ${escapeHtml(news.source_name)}
            </span>
            <span class="news-relevance">
                <i class="fas fa-star"></i> ${Math.round(news.relevance_score * 100)}%
            </span>
        </div>

        ${news.ai_tags ? `
        <div class="news-tags">
            ${news.ai_tags.slice(0, 5).map(tag =>
                `<span class="tag">${escapeHtml(tag)}</span>`
            ).join('')}
        </div>
        ` : ''}

        <a href="${escapeHtml(news.source_url)}" target="_blank" rel="noopener noreferrer" class="news-link">
            Czytaj więcej <i class="fas fa-arrow-right"></i>
        </a>
    `;

    return card;
}

7. User Notification System

7.1 Notification Creation

When to Create Notifications:

New News Approved - Notify company owner when their news is approved
News Rejected - Notify company owner when their news is rejected (optional)
High-Priority News - Notify NORDA members when high-relevance news appears for companies they follow (future feature)

Implementation:

def create_news_approval_notification(news: CompanyNews, db: Session):
    """
    Create notification when news is approved.
    Notify company owner (if user account exists).
    """
    # Find company owner
    company_user = db.query(User).filter_by(company_id=news.company_id).first()

    if not company_user:
        return  # No user account for this company

    notification = UserNotification(
        user_id=company_user.id,
        title=f"Nowa aktualność o {news.company.name}",
        message=f"Artykuł '{news.title}' został zatwierdzony i jest widoczny na profilu firmy.",
        notification_type='news',
        related_type='company_news',
        related_id=news.id,
        action_url=f"/company/{news.company.slug}#news",
        is_read=False
    )

    db.add(notification)
    db.commit()

7.2 Notification API

Endpoint: GET /api/notifications

Authentication: Requires logged-in user

Response:

{
  "notifications": [
    {
      "id": 123,
      "title": "Nowa aktualność o PIXLAB",
      "message": "Artykuł 'PIXLAB otwiera nową siedzibę' został zatwierdzony...",
      "notification_type": "news",
      "related_type": "company_news",
      "related_id": 42,
      "action_url": "/company/pixlab-sp-z-o-o#news",
      "is_read": false,
      "created_at": "2026-01-10T14:30:00"
    }
  ],
  "unread_count": 3,
  "total": 15,
  "has_more": false
}

Implementation:

@app.route('/api/notifications', methods=['GET'])
@login_required
def api_notifications():
    """Get user notifications."""
    limit = request.args.get('limit', 20, type=int)
    offset = request.args.get('offset', 0, type=int)
    unread_only = request.args.get('unread_only', 'false') == 'true'

    query = db.session.query(UserNotification).filter_by(user_id=current_user.id)

    if unread_only:
        query = query.filter_by(is_read=False)

    total = query.count()
    unread_count = db.session.query(UserNotification).filter_by(
        user_id=current_user.id,
        is_read=False
    ).count()

    notifications = query.order_by(UserNotification.created_at.desc()) \
        .limit(limit) \
        .offset(offset) \
        .all()

    return jsonify({
        'notifications': [n.to_dict() for n in notifications],
        'unread_count': unread_count,
        'total': total,
        'has_more': (offset + limit) < total
    })

7.3 Mark as Read

Endpoint: POST /api/notifications/<notification_id>/read

Authentication: Requires logged-in user

Response:

{
  "success": true,
  "notification_id": 123,
  "is_read": true
}

Implementation:

@app.route('/api/notifications/<int:notification_id>/read', methods=['POST'])
@login_required
def api_notification_mark_read(notification_id):
    """Mark notification as read."""
    notification = db.session.get(UserNotification, notification_id)

    if not notification:
        return jsonify({'error': 'Notification not found'}), 404

    if notification.user_id != current_user.id:
        return jsonify({'error': 'Unauthorized'}), 403

    notification.is_read = True
    notification.read_at = datetime.utcnow()
    db.session.commit()

    return jsonify({
        'success': True,
        'notification_id': notification.id,
        'is_read': notification.is_read
    })

7.4 Notification Badge (UI)

Location: Navigation bar (next to user avatar)

Implementation:

<!-- Notification Badge -->
<div class="notification-badge" id="notificationBadge">
    <i class="fas fa-bell"></i>
    <span class="badge" id="unreadCount">0</span>
</div>

<script>
// Fetch unread count on page load
async function fetchUnreadCount() {
    try {
        const response = await fetch('/api/notifications?unread_only=true&limit=1');
        const data = await response.json();

        const badge = document.getElementById('unreadCount');
        if (data.unread_count > 0) {
            badge.textContent = data.unread_count;
            badge.style.display = 'inline-block';
        } else {
            badge.style.display = 'none';
        }
    } catch (error) {
        console.error('Error fetching notifications:', error);
    }
}

// Refresh every 60 seconds
setInterval(fetchUnreadCount, 60000);
fetchUnreadCount();
</script>

8. Performance and Optimization

8.1 Rate Limiting

Brave Search API:

Free Tier: 2,000 searches/month
Rate Limit: 1 request/second (implemented in script)
Monthly Quota Tracking: Store in database
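
The Brave Search limits above (one request per second, 2,000 searches per month) could be enforced with a small guard object around each API call. This is a sketch under stated assumptions: `BraveQuotaGuard` is a hypothetical name, the `used` counter is assumed to be loaded from and saved to the database by the caller, and the clock and sleep functions are injectable only so the logic can be tested without waiting.

```python
import time


class BraveQuotaGuard:
    """Hypothetical helper: spaces out calls and stops at the monthly quota."""

    def __init__(self, monthly_quota=2000, min_interval=1.0, used=0,
                 clock=time.monotonic, sleep=time.sleep):
        self.monthly_quota = monthly_quota
        self.min_interval = min_interval   # seconds between requests
        self.used = used                   # searches already spent this month
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    @property
    def remaining(self) -> int:
        return max(self.monthly_quota - self.used, 0)

    def acquire(self) -> bool:
        """Return False if the quota is exhausted; otherwise wait if needed and count the call."""
        if self.remaining == 0:
            return False
        if self._last_call is not None:
            wait = self.min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        self.used += 1
        return True
```

A discovery script would call `guard.acquire()` before each search and skip the remaining companies once it returns `False`.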

Gemini AI:

Free Tier: 1,500 requests/day
Cost per request: $0.00 (free tier)
Track usage in ai_api_costs table

Database Query Optimization:

Use composite indexes for approved + visible news
Cache company news list (5 min TTL)
Paginate results (5 items per page)

8.2 Caching Strategy

News List Caching:

from functools import lru_cache
from datetime import datetime

@lru_cache(maxsize=128)
def get_company_news_cached(company_id: int, cache_key: str) -> list:
    """
    Cache company news for 5 minutes.
    cache_key format: "news_{company_id}_{timestamp_5min}"
    """
    news = db.session.query(CompanyNews).filter(
        CompanyNews.company_id == company_id,
        CompanyNews.is_approved == True,
        CompanyNews.is_visible == True
    ).order_by(CompanyNews.published_date.desc()).limit(5).all()

    return [n.to_dict() for n in news]

def get_company_news(company_id: int) -> list:
    """Get company news with 5-minute cache."""
    # Generate cache key (changes every 5 minutes)
    now = datetime.utcnow()
    cache_timestamp = now.replace(minute=(now.minute // 5) * 5, second=0, microsecond=0)
    cache_key = f"news_{company_id}_{cache_timestamp.isoformat()}"

    return get_company_news_cached(company_id, cache_key)
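
The cache key above changes only when the clock crosses a 5-minute boundary, which is what gives `lru_cache` its time-based expiry. The bucketing can be checked in isolation with the sketch below; `five_minute_bucket` and `news_cache_key` are hypothetical names, and the current time is passed in explicitly so the behaviour is reproducible.

```python
from datetime import datetime, timezone


def five_minute_bucket(now: datetime) -> str:
    """Floor a timestamp to the start of its 5-minute window."""
    floored = now.replace(minute=(now.minute // 5) * 5, second=0, microsecond=0)
    return floored.isoformat()


def news_cache_key(company_id: int, now: datetime) -> str:
    """Build a key in the "news_{company_id}_{timestamp_5min}" format."""
    return f"news_{company_id}_{five_minute_bucket(now)}"


if __name__ == "__main__":
    a = datetime(2026, 1, 10, 14, 31, 12, tzinfo=timezone.utc)
    b = datetime(2026, 1, 10, 14, 34, 59, tzinfo=timezone.utc)
    c = datetime(2026, 1, 10, 14, 35, 0, tzinfo=timezone.utc)
    print(news_cache_key(26, a) == news_cache_key(26, b))  # same window
    print(news_cache_key(26, a) == news_cache_key(26, c))  # next window
```

One consequence of this design is that newly approved news can take up to five minutes to appear, and entries from old windows stay in memory until `lru_cache` evicts them.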

8.3 Monitoring Queries

Check Quota Usage:

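-- Approximate Brave API usage (last 30 days).
-- Note: this counts stored articles, not API calls, so searches that
-- returned no new articles are not reflected in the estimate.
SELECT
    DATE(created_at) as search_date,
    COUNT(*) as articles_stored,
    2000 - SUM(COUNT(*)) OVER () as remaining_quota_estimate
FROM company_news
WHERE created_at >= NOW() - INTERVAL '30 days'
  AND source_type = 'web'
GROUP BY DATE(created_at)
ORDER BY search_date DESC;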

News Statistics:

-- News by status
SELECT
    moderation_status,
    COUNT(*) as count,
    AVG(relevance_score) as avg_relevance
FROM company_news
GROUP BY moderation_status;

-- News by type
SELECT
    news_type,
    COUNT(*) as count
FROM company_news
WHERE is_approved = TRUE
GROUP BY news_type
ORDER BY count DESC;

-- Top sources
SELECT
    source_name,
    COUNT(*) as article_count,
    AVG(relevance_score) as avg_relevance
FROM company_news
WHERE is_approved = TRUE
GROUP BY source_name
ORDER BY article_count DESC
LIMIT 10;

9. Security Considerations

9.1 Input Validation

URL Validation:

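import ipaddress
from urllib.parse import urlparse

def is_valid_news_url(url: str) -> bool:
    """Validate news URL before storing."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False

    # Only allow HTTP/HTTPS
    if parsed.scheme not in ('http', 'https'):
        return False

    # Must have a hostname (parsed.hostname strips port and credentials)
    host = parsed.hostname
    if not host:
        return False

    # Block localhost by name
    if host == 'localhost' or host.endswith('.localhost'):
        return False

    # Block loopback, private, and link-local IP literals
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # Not an IP literal; treat as a regular hostname

    return not (ip.is_loopback or ip.is_private or ip.is_link_local)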

Content Sanitization:

from markupsafe import escape

def sanitize_news_content(text: str, max_length: int = 5000) -> str:
    """Sanitize externally sourced news text before storing it."""
    # Remove excessive whitespace
    text = ' '.join(text.split())

    # Limit length before escaping, so an HTML entity is never cut in half
    if len(text) > max_length:
        text = text[:max_length] + '...'

    # Escape HTML last
    return str(escape(text))

9.2 Rate Limiting (Flask-Limiter)

API Endpoints:

from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

limiter = Limiter(
    app=app,
    key_func=get_remote_address,
    default_limits=["200 per day", "50 per hour"]
)

@app.route('/api/company/<int:company_id>/news')
@limiter.limit("30 per minute")  # Prevent abuse
def api_company_news(company_id):
    """Get company news (rate limited)."""
    pass

@app.route('/api/news/moderate', methods=['POST'])
@login_required
@limiter.limit("100 per hour")  # Admin moderation limit
def api_news_moderate():
    """Moderate news (rate limited)."""
    pass

9.3 CSRF Protection

All POST endpoints:

from flask_wtf.csrf import CSRFProtect

csrf = CSRFProtect(app)

# Automatically protects all POST/PUT/PATCH/DELETE requests
# Forms must include the CSRF token:
# <input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
# fetch()/AJAX requests must send the token in the X-CSRFToken header

10. Testing and Validation

10.1 Manual Testing Checklist

News Discovery:

Brave API returns results for company name search
Brave API returns results for NIP search
Script handles companies with no results
Script handles API rate limits (429 error)
Script handles network errors gracefully
Deduplication works (same URL not inserted twice)
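
The deduplication check in the list above depends on how two URLs are judged to be "the same". The sketch below shows one possible approach and is not the project's implementation: `normalize_news_url`, `dedupe_by_url`, and the choice to strip `utm_*` parameters, fragments, and trailing slashes are all assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters with these prefixes are tracking noise
TRACKING_PREFIXES = ("utm_",)


def normalize_news_url(url: str) -> str:
    """Return a canonical form of the URL for duplicate detection."""
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; lowercase the scheme and host
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


def dedupe_by_url(articles, seen=None):
    """Keep the first article for each normalized source_url."""
    seen = set(seen or ())
    unique = []
    for article in articles:
        key = normalize_news_url(article["source_url"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```

Passing the URLs already stored for a company as `seen` would filter a fresh batch of search results before insertion; a unique constraint in the database remains the stronger guarantee.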

AI Filtering:

Gemini AI returns valid JSON response
Relevance scores are between 0.0 and 1.0
News types are correctly classified
AI summaries are generated correctly
Tags are relevant and limited to 5
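
Several of the AI filtering checks above can be enforced in code before a result is stored, rather than only verified by hand. The following sketch assumes the parsed Gemini response is a dictionary with `relevance`, `type`, `summary`, and `tags` keys. `normalize_ai_result` is a hypothetical name, and `ALLOWED_TYPES` lists only the types named in this document, so the real set is likely larger.

```python
import math

# Assumption: only the news types named in this document are listed here
ALLOWED_TYPES = {"news_mention", "press_release", "award"}


def normalize_ai_result(raw: dict) -> dict:
    """Coerce an AI response into the value ranges the checklist expects."""
    try:
        relevance = float(raw.get("relevance", 0.0))
    except (TypeError, ValueError):
        relevance = 0.0
    if math.isnan(relevance):
        relevance = 0.0
    relevance = min(max(relevance, 0.0), 1.0)   # clamp to 0.0..1.0

    news_type = raw.get("type")
    if news_type not in ALLOWED_TYPES:
        news_type = "news_mention"              # fall back to the generic type

    tags_raw = raw.get("tags")
    if not isinstance(tags_raw, list):
        tags_raw = []
    tags = [str(tag).strip() for tag in tags_raw if str(tag).strip()]

    return {
        "relevance": relevance,
        "type": news_type,
        "summary": str(raw.get("summary") or ""),
        "tags": tags[:5],                       # at most 5 tags
    }
```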

Admin Moderation:

Only admins can access /admin/news
Pending news list displays correctly
Approve action updates database
Reject action updates database
Bulk actions work correctly
Filtering works (by company, type, score, date)

Company Profile Display:

News section appears on company profile
Only approved news is shown
News sorted by published_date DESC
"Load more" pagination works
News cards display correctly

Notifications:

Notification created when news approved
Notification badge shows unread count
Mark as read works correctly
Notification links to correct company profile

10.2 Database Integrity Tests

Run these queries:

-- Check for duplicate URLs (should return 0 rows)
SELECT company_id, source_url, COUNT(*)
FROM company_news
GROUP BY company_id, source_url
HAVING COUNT(*) > 1;

-- Check for invalid relevance scores (should return 0 rows)
SELECT id, relevance_score
FROM company_news
WHERE relevance_score < 0.0 OR relevance_score > 1.0;

-- Check for orphaned news (company deleted)
SELECT cn.id, cn.company_id
FROM company_news cn
LEFT JOIN companies c ON cn.company_id = c.id
WHERE c.id IS NULL;

-- Check for orphaned notifications (user deleted)
SELECT un.id, un.user_id
FROM user_notifications un
LEFT JOIN users u ON un.user_id = u.id
WHERE u.id IS NULL;

10.3 Performance Tests

Load Testing:

# Test company profile with 50 news items
ab -n 1000 -c 10 https://nordabiznes.pl/company/pixlab-sp-z-o-o

# Test news API pagination
ab -n 500 -c 5 "https://nordabiznes.pl/api/company/26/news?offset=0&limit=5"

# Test notification API
ab -n 500 -c 5 -H "Cookie: session=..." https://nordabiznes.pl/api/notifications

Expected Performance:

Company profile load: < 500ms
News API (5 items): < 200ms
Notification API: < 150ms

11. Troubleshooting Guide

11.1 Common Issues

Issue: No news discovered for company

Possible Causes:

Company name too generic (e.g., "ABC")
No recent news published
Brave API rate limit exceeded
Network connectivity issues

Solution:

# Manual test for specific company
python scripts/fetch_company_news.py --company pixlab-sp-z-o-o --dry-run

# Check Brave API quota
curl -H "X-Subscription-Token: $BRAVE_API_KEY" \
  "https://api.search.brave.com/res/v1/news/search?q=test"

Issue: News not appearing on company profile

Possible Causes:

News not approved (is_approved=FALSE)
News not visible (is_visible=FALSE)
Moderation status is pending or rejected
Cache not cleared

Solution:

-- Check news status
SELECT id, title, is_approved, is_visible, moderation_status
FROM company_news
WHERE company_id = 26
ORDER BY created_at DESC;

-- Force approve (admin only)
UPDATE company_news
SET is_approved = TRUE,
    is_visible = TRUE,
    moderation_status = 'approved'
WHERE id = 42;

Issue: AI filtering returns invalid JSON

Possible Causes:

Gemini response includes markdown formatting
Response truncated (token limit)
Response contains invalid JSON characters

Solution:

import json
import logging
import re

logger = logging.getLogger(__name__)

def parse_gemini_json_response(response_text: str) -> dict:
    """Parse Gemini JSON response with error handling."""

    # Remove markdown code blocks
    text = re.sub(r'```json\s*|\s*```', '', response_text)

    # Remove leading/trailing whitespace
    text = text.strip()

    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON from Gemini: {e}")
        logger.error(f"Response: {text}")

        # Return default values
        return {
            'relevance': 0.5,
            'type': 'news_mention',
            'reason': 'AI parsing error',
            'summary': '',
            'tags': []
        }

Issue: Notification not received

Possible Causes:

Company has no user account
Notification creation failed (database error)
User has notifications disabled (future feature)

Solution:

-- Check if company has user account
SELECT u.id, u.email, u.company_id
FROM users u
WHERE u.company_id = 26;

-- Manually create notification
INSERT INTO user_notifications (
    user_id, title, message, notification_type,
    related_type, related_id, action_url
) VALUES (
    42,  -- user_id
    'Test notification',
    'This is a test message',
    'news',
    'company_news',
    123,  -- news_id
    '/company/pixlab-sp-z-o-o#news'
);

11.2 Diagnostic Queries

News Discovery Stats:

-- News discovered per day (last 30 days)
SELECT
    DATE(discovered_at) as discovery_date,
    COUNT(*) as news_count,
    AVG(relevance_score) as avg_relevance
FROM company_news
WHERE discovered_at >= NOW() - INTERVAL '30 days'
GROUP BY DATE(discovered_at)
ORDER BY discovery_date DESC;

Moderation Backlog:

-- Pending news count by company
SELECT
    c.name,
    COUNT(*) as pending_count,
    MIN(cn.discovered_at) as oldest_pending
FROM company_news cn
JOIN companies c ON cn.company_id = c.id
WHERE cn.moderation_status = 'pending'
GROUP BY c.name
ORDER BY pending_count DESC;

AI Filtering Performance:

-- Relevance score distribution
SELECT
    CASE
        WHEN relevance_score >= 0.8 THEN 'High (>= 0.8)'
        WHEN relevance_score >= 0.5 THEN 'Medium (0.5 to < 0.8)'
        WHEN relevance_score >= 0.3 THEN 'Low (0.3 to < 0.5)'
        ELSE 'Very Low (< 0.3)'
    END as score_range,
    COUNT(*) as count,
    ROUND(AVG(relevance_score)::numeric, 2) as avg_score
FROM company_news
GROUP BY score_range
ORDER BY avg_score DESC;

12. Future Enhancements

12.1 Planned Features

Social Media Integration:

Fetch posts from Facebook Pages
Fetch posts from LinkedIn Company Pages
Fetch posts from Instagram Business accounts
Unified news feed (web + social)

Advanced Filtering:

Sentiment analysis (positive, neutral, negative)
Entity extraction (people, places, organizations)
Topic clustering (group similar news)
Trend detection (identify trending topics)

User Features:

Follow companies to receive notifications
Save news items to favorites
Share news on social media
Comment on news articles (internal discussion)

Analytics:

News engagement metrics (views, clicks, shares)
Company visibility score based on news coverage
Trending companies dashboard
News heatmap (by region, industry, time)

Automation:

Auto-approve high-confidence news (score >= 0.95)
Auto-reject spam/irrelevant (score < 0.2)
Scheduled email digests for admins
RSS feed for approved news
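
The two score thresholds in the automation list above could be expressed as a single decision function. This is a sketch of the planned behaviour only: `auto_moderation_status` is a hypothetical name, and the returned strings reuse the moderation status values used elsewhere in this document (pending, approved, rejected).

```python
def auto_moderation_status(relevance_score: float,
                           approve_at: float = 0.95,
                           reject_below: float = 0.2) -> str:
    """Map a relevance score to a moderation status.

    Scores between the two thresholds stay 'pending' for manual review.
    """
    if not 0.0 <= relevance_score <= 1.0:
        raise ValueError(f"relevance_score out of range: {relevance_score}")
    if relevance_score >= approve_at:
        return "approved"
    if relevance_score < reject_below:
        return "rejected"
    return "pending"
```

Keeping the thresholds as parameters makes it easy to tighten or relax them once the accuracy of the AI scoring has been measured against manual decisions.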

12.2 Technical Improvements

Performance:

Implement Redis caching for news lists
Add full-text search for news content
Optimize database queries with materialized views
Add CDN for news thumbnails

Reliability:

Add retry mechanism for Brave API
Implement circuit breaker pattern
Add health check endpoint for news system
Monitor API quota usage with alerts
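
A retry mechanism for the Brave API, as listed above, is commonly built as exponential backoff around the request function. The sketch below is one way to do it and is not taken from the project: `RateLimitedError` and `call_with_retry` are hypothetical names, and the caller's fetch function is assumed to raise `RateLimitedError` when the API answers with HTTP 429.

```python
import time


class RateLimitedError(Exception):
    """Hypothetical: raised by the caller's fetch function on HTTP 429."""


def call_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on rate limits and connection errors.

    Delays double after each failure: base_delay, 2x, 4x, ...
    The last failure is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except (RateLimitedError, ConnectionError):
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Each retry still consumes a search from the monthly quota if the request reaches the API, so `max_attempts` should stay small.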

Security:

Add content moderation for user comments (future)
Implement rate limiting per user (not just IP)
Add CAPTCHA for public API endpoints
Scan URLs for malware/phishing

13. Glossary

Terms:

Brave Search API - News search API by Brave (alternative to Google News API)
Company News - News articles and mentions about companies in the Norda Biznes directory
AI Filtering - Automated relevance scoring using Google Gemini AI
Moderation - Manual review and approval process by admins
Relevance Score - AI-calculated score (0.0-1.0) indicating how relevant an article is to a company
News Type - Classification of news (news_mention, press_release, award, etc.)
Source Type - Origin of news (web, facebook, linkedin, etc.)
Moderation Status - Workflow state (pending, approved, rejected)
User Notification - In-app notification for users about new news

Database Tables:

company_news - Stores news articles and mentions
user_notifications - Stores user notifications
companies - Company directory
users - User accounts

14. Related Documentation

External Integrations Architecture - Brave Search and Gemini AI integration details
Database Schema - Complete database schema documentation
Flask Components - Flask application structure and routes
AI Chat Flow - Gemini AI integration patterns
CLAUDE.md - Main project documentation (News Monitoring section)
database/migrate_news_tables.sql - Database migration script

15. Maintenance Guidelines

15.1 Regular Maintenance Tasks

Daily:

Monitor Brave API quota usage
Check moderation backlog (pending news)
Review AI filtering accuracy (sample check)

Weekly:

Analyze news discovery statistics
Review rejected news for false negatives
Update trusted sources whitelist

Monthly:

Review Brave API costs (if exceeding free tier)
Analyze news engagement metrics
Update AI filtering prompt (if needed)
Clean up old rejected news (> 90 days)

15.2 When to Update This Document

Update this document when:

New news sources are added (e.g., social media)
AI filtering algorithm changes
Database schema changes (new fields, indexes)
Admin dashboard UI changes significantly
New API endpoints are added
Rate limits or quotas change

Update Process:

Edit this markdown file
Verify Mermaid diagrams render correctly
Update "Last Updated" date at top
Commit with descriptive message: docs: Update news monitoring flow - [what changed]
Notify team via Slack/email

Document Status: ✅ Complete - Ready for implementation
Implementation Status: 🚧 Planned (Database schema ready, scripts pending)
Next Steps: Implement scripts/fetch_company_news.py and /admin/news dashboard
