# External Integrations Architecture

**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Production LIVE
**Diagram Type:** External Systems Integration Architecture

---

## Overview

This diagram shows the **external APIs and data sources** integrated with Norda Biznes Partner. It illustrates:

- **6 major API integrations** (Google Gemini, Brave Search, PageSpeed, Places, KRS, MS Graph)
- **2 web scraping sources** (ALEO.com, rejestr.io)
- **Authentication methods** for each integration
- **Data flows** and usage patterns
- **Rate limits** and quota management
- **Cost tracking** and optimization

**Abstraction Level:** External Integration Architecture
**Audience:** Developers, DevOps, System Architects
**Purpose:** Understanding external dependencies, API usage, and integration patterns

---

## Integration Architecture Diagram

```mermaid
graph TB
    %% Main system
    subgraph "Norda Biznes Partner System"
        WebApp["🌐 Flask Web Application<br/>app.py"]

        subgraph "Service Layer"
            GeminiSvc["🤖 Gemini Service<br/>gemini_service.py"]
            ChatSvc["💬 Chat Service<br/>nordabiz_chat.py"]
            EmailSvc["📧 Email Service<br/>email_service.py"]
            KRSSvc["🏛️ KRS Service<br/>krs_api_service.py"]
            GBPSvc["📊 GBP Audit Service<br/>gbp_audit_service.py"]
        end

        subgraph "Background Scripts"
            SEOScript["📊 SEO Audit<br/>scripts/seo_audit.py"]
            SocialScript["📱 Social Media Audit<br/>scripts/social_media_audit.py"]
            ImportScript["📥 Data Import<br/>scripts/import_*.py"]
        end

        Database["💾 PostgreSQL Database<br/>localhost:5432"]
    end

    %% External integrations
    subgraph "AI & ML Services"
        Gemini["🤖 Google Gemini API<br/>gemini-2.5-flash<br/><br/>Free tier: 200 req/day<br/>Auth: API Key<br/>Cost: $0.075-$5.00/1M tokens"]
    end

    subgraph "SEO & Analytics"
        PageSpeed["📊 Google PageSpeed Insights<br/>v5 API<br/><br/>Free tier: 25,000 req/day<br/>Auth: API Key<br/>Cost: Free"]
        Places["📍 Google Places API<br/>Maps Platform<br/><br/>Pay-per-use<br/>Auth: API Key<br/>Cost: $0.032/request"]
    end

    subgraph "Search & Discovery"
        BraveAPI["🔍 Brave Search API<br/>Web & News Search<br/><br/>Free tier: 2,000 req/month<br/>Auth: API Key<br/>Cost: Free"]
    end

    subgraph "Data Sources"
        KRS["🏛️ KRS Open API<br/>Ministry of Justice Poland<br/><br/>No documented limits (public API)<br/>Auth: None<br/>Cost: Free"]
        ALEO["🌐 ALEO.com<br/>NIP Verification Service<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
        Rejestr["🔗 rejestr.io<br/>Company Connections<br/><br/>Web scraping (Playwright)<br/>Auth: None<br/>Cost: Free"]
    end

    subgraph "Communication"
        MSGraph["📧 Microsoft Graph API<br/>Email & Notifications<br/><br/>10,000 req/10min<br/>Auth: OAuth 2.0 Client Credentials<br/>Cost: Included in M365"]
    end

    %% Service layer connections to external APIs
    GeminiSvc -->|"HTTPS POST<br/>generateContent<br/>API Key: GOOGLE_GEMINI_API_KEY"| Gemini
    ChatSvc --> GeminiSvc
    GBPSvc -->|"Generate AI recommendations"| GeminiSvc
    KRSSvc -->|"HTTPS GET<br/>OdpisAktualny/{krs}<br/>Public API (no auth)"| KRS
    EmailSvc -->|"HTTPS POST<br/>users/{id}/sendMail<br/>OAuth 2.0 Client Credentials"| MSGraph
    GBPSvc -->|"HTTPS GET<br/>findplacefromtext<br/>placedetails<br/>API Key: GOOGLE_PLACES_API_KEY"| Places

    %% Script connections to external APIs
    SEOScript -->|"HTTPS GET<br/>runPagespeed<br/>API Key: GOOGLE_PAGESPEED_API_KEY<br/>Quota tracking: 25K/day"| PageSpeed
    SocialScript -->|"HTTPS GET<br/>web/search<br/>news/search<br/>API Key: BRAVE_SEARCH_API_KEY"| BraveAPI
    SocialScript -->|"Fallback for reviews"| Places
    ImportScript -->|"NIP verification<br/>Playwright browser automation"| ALEO
    ImportScript -->|"Company connections<br/>Playwright browser automation"| Rejestr
    ImportScript --> KRSSvc

    %% Data flows back to database
    GeminiSvc -->|"Log costs<br/>ai_api_costs table"| Database
    SEOScript -->|"Store metrics<br/>company_website_analysis"| Database
    SocialScript -->|"Store profiles<br/>company_social_media"| Database
    GBPSvc -->|"Store audit results<br/>gbp_audits"| Database
    ImportScript -->|"Import companies<br/>companies table"| Database

    %% Web app connections
    WebApp --> ChatSvc
    WebApp --> EmailSvc
    WebApp --> KRSSvc
    WebApp --> GBPSvc
    WebApp --> GeminiSvc
    WebApp --> Database

    %% Styling
    classDef serviceStyle fill:#85bbf0,stroke:#5d92c7,color:#000000,stroke-width:2px
    classDef scriptStyle fill:#ffd93d,stroke:#ccae31,color:#000000,stroke-width:2px
    classDef aiStyle fill:#ff6b9d,stroke:#cc5579,color:#ffffff,stroke-width:3px
    classDef seoStyle fill:#c44569,stroke:#9d3754,color:#ffffff,stroke-width:3px
    classDef searchStyle fill:#6a89cc,stroke:#5570a3,color:#ffffff,stroke-width:3px
    classDef dataStyle fill:#4a69bd,stroke:#3b5497,color:#ffffff,stroke-width:3px
    classDef commStyle fill:#20bf6b,stroke:#1a9956,color:#ffffff,stroke-width:3px
    classDef dbStyle fill:#438dd5,stroke:#2e6295,color:#ffffff,stroke-width:3px
    classDef appStyle fill:#1168bd,stroke:#0b4884,color:#ffffff,stroke-width:3px

    class GeminiSvc,ChatSvc,EmailSvc,KRSSvc,GBPSvc serviceStyle
    class SEOScript,SocialScript,ImportScript scriptStyle
    class Gemini aiStyle
    class PageSpeed,Places seoStyle
    class BraveAPI searchStyle
    class KRS,ALEO,Rejestr dataStyle
    class MSGraph commStyle
    class Database dbStyle
    class WebApp appStyle
```

---

## External Integration Details

### 🤖 Google Gemini API

**Purpose:** AI-powered text generation, chat, and image analysis
**Service Files:** `gemini_service.py`, `nordabiz_chat.py`, `gbp_audit_service.py`
**Status:** ✅ Production (Free Tier)

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Google Generative AI |
| **Endpoint** | https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent |
| **Authentication** | API Key |
| **Environment Variable** | `GOOGLE_GEMINI_API_KEY` |
| **Default Model** | gemini-2.5-flash |
| **Timeout** | None (default) |

#### Available Models

```python
GEMINI_MODELS = {
    'flash': 'gemini-2.5-flash',            # Best for general use
    'flash-lite': 'gemini-2.5-flash-lite',  # Ultra cheap
    'pro': 'gemini-2.5-pro',                # High quality
    'flash-2.0': 'gemini-2.0-flash',        # 1M context window
}
```

#### Pricing (per 1M tokens)

| Model | Input Cost | Output Cost |
|-------|-----------|-------------|
| gemini-2.5-flash | $0.075 | $0.30 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| gemini-2.5-pro | $1.25 | $5.00 |
| gemini-2.0-flash | $0.075 | $0.30 |

#### Rate Limits

- **Free Tier:** 200 requests/day, 50 requests/hour
- **Token Limits:** Model-dependent (1M for flash-2.0)

#### Integration Points

| Feature | File | Function |
|---------|------|----------|
| AI Chat | `nordabiz_chat.py` | `NordaBizChatEngine.chat()` |
| GBP Recommendations | `gbp_audit_service.py` | `generate_ai_recommendations()` |
| Text Generation | `gemini_service.py` | `generate_text()` |
| Image Analysis | `gemini_service.py` | `analyze_image()` |

#### Cost Tracking

All API calls are logged to the `ai_api_costs` table:

```sql
ai_api_costs (
    id, timestamp, api_provider, model_name, feature, user_id,
    input_tokens, output_tokens, total_tokens,
    input_cost, output_cost, total_cost,
    success, error_message, latency_ms, prompt_hash
)
```

#### Data Flow

```
User Message → ChatService → GeminiService
                                   ↓
                           Google Gemini API
                                   ↓
                        Response + Token Count
                                   ↓
                   Cost Calculation → ai_api_costs
                                   ↓
                           Response to User
```

---

### 🏛️ KRS Open API

**Purpose:** Official company data from the Polish National Court Register (KRS)
**Service Files:** `krs_api_service.py`
**Status:** ✅ Production

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Ministry of Justice Poland |
| **Endpoint** | https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/{krs} |
| **Authentication** | None (public API) |
| **Timeout** | 15 seconds |
| **Response Format** | JSON |

#### Rate Limits

- **Official Limit:** None documented
- **Best Practice:** 1-2 second delays between requests
- **Timeout:** 15 seconds configured

#### Data Retrieved

- Basic identifiers (KRS, NIP, REGON)
- Company name (full and shortened)
- Legal form (Sp. z o.o., S.A., etc.)
- Full address (street, city, voivodeship)
- Share capital and currency
- Registration dates
- Management board (anonymized in Open API)
- Shareholders (anonymized in Open API)
- Business activities
- OPP status (Organizacja Pożytku Publicznego, public benefit organization)

#### Integration Points

| Feature | File | Usage |
|---------|------|-------|
| Data Import | `import_*.py` scripts | Company verification |
| Manual Verification | `verify_all_companies_data.py` | Batch verification |
| API Endpoint | `app.py` | `/api/verify-krs` |

#### Data Flow

```
Import Script → KRSService.get_company_from_krs()
                          ↓
                    KRS Open API
                          ↓
               KRSCompanyData (dataclass)
                          ↓
              Verification & Validation
                          ↓
               Update companies table
```

---

### 📊 Google PageSpeed Insights API

**Purpose:** SEO, performance, accessibility, and best-practices analysis
**Service Files:** `scripts/seo_audit.py`, `scripts/pagespeed_client.py`
**Status:** ✅ Production

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Google PageSpeed Insights |
| **Endpoint** | https://www.googleapis.com/pagespeedonline/v5/runPagespeed |
| **Authentication** | API Key |
| **Environment Variable** | `GOOGLE_PAGESPEED_API_KEY` |
| **Google Cloud Project** | NORDABIZNES (gen-lang-client-0540794446) |
| **Timeout** | 30 seconds |
| **Strategy** | Mobile (default), Desktop (optional) |

#### Rate Limits

- **Free Tier:** 25,000 queries/day
- **Per-Second:** Recommended 1 query/second
- **Quota Tracking:** In-memory counter in `pagespeed_client.py`

#### Metrics Returned

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSpeedScores:
    seo: int                  # 0-100 SEO score
    performance: int          # 0-100 Performance score
    accessibility: int        # 0-100 Accessibility score
    best_practices: int       # 0-100 Best Practices score
    pwa: Optional[int]        # 0-100 PWA score (None if not reported)


@dataclass
class CoreWebVitals:
    lcp_ms: Optional[int]     # Largest Contentful Paint, in milliseconds
    fid_ms: Optional[int]     # First Input Delay, in milliseconds
    cls: Optional[float]      # Cumulative Layout Shift (unitless)
```

#### Database Storage

Results are saved to `company_website_analysis`:

```sql
company_website_analysis (
    company_id PRIMARY KEY,
    analyzed_at,
    pagespeed_seo_score, pagespeed_performance_score,
    pagespeed_accessibility_score, pagespeed_best_practices_score,
    pagespeed_audits JSONB,
    largest_contentful_paint_ms, first_input_delay_ms, cumulative_layout_shift,
    seo_overall_score, seo_health_score, seo_issues JSONB
)
```

#### Integration Points

| Feature | File | Endpoint/Function |
|---------|------|-------------------|
| Admin Dashboard | `app.py` | `/admin/seo` |
| Audit Script | `scripts/seo_audit.py` | CLI tool |
| Batch Audits | `scripts/seo_audit.py` | `SEOAuditor.run_audit()` |

#### Data Flow

```
Admin Trigger → SEO Audit Script
                       ↓
                 PageSpeed API
                       ↓
       Scores + Core Web Vitals + Audits
                       ↓
         company_website_analysis table
                       ↓
            Admin Dashboard Display
```

---

### 📍 Google Places API

**Purpose:** Business profiles, ratings, reviews, and opening hours
**Service Files:** `gbp_audit_service.py`, `scripts/social_media_audit.py`
**Status:** ✅ Production

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Google Maps Platform |
| **Endpoints** | Find Place from Text, Place Details |
| **Authentication** | API Key |
| **Environment Variable** | `GOOGLE_PLACES_API_KEY` |
| **Timeout** | 15 seconds |
| **Language** | Polish (pl) |

#### Cost

- **Pricing Model:** Pay-per-use
- **Cost per Request:** ~$0.032 per Place Details call
- **Optimization:** 24-hour cache in the database

#### Endpoints Used

**1. Find Place from Text**

```
https://maps.googleapis.com/maps/api/place/findplacefromtext/json
```

**2. Place Details**

```
https://maps.googleapis.com/maps/api/place/details/json
```

#### Data Retrieved

```python
{
    'google_place_id': str,          # Unique Place ID
    'google_name': str,              # Business name
    'google_address': str,           # Formatted address
    'google_phone': str,             # Phone number
    'google_website': str,           # Website URL
    'google_types': List[str],       # Business categories
    'google_maps_url': str,          # Google Maps link
    'google_rating': Decimal,        # Rating (1.0-5.0)
    'google_reviews_count': int,     # Number of reviews
    'google_photos_count': int,      # Number of photos
    'google_opening_hours': dict,    # Opening hours
    'google_business_status': str,   # OPERATIONAL, CLOSED, etc.
}
```

#### Cache Strategy

- **Cache Duration:** 24 hours
- **Storage:** `company_website_analysis` table (cache age checked via `analyzed_at`)
- **Force Refresh:** `force_refresh=True` parameter

#### Integration Points

| Feature | File | Function |
|---------|------|----------|
| GBP Audit | `gbp_audit_service.py` | `fetch_google_business_data()` |
| Social Media Audit | `scripts/social_media_audit.py` | `GooglePlacesSearcher` |
| Admin Dashboard | `app.py` | `/admin/gbp` |

#### Data Flow

```
Admin/Script Trigger → GBPService.fetch_google_business_data()
                                  ↓
                        Check cache (< 24h old?)
                                  ↓
                     [Cache miss] → Places API
                                  ↓
                    Business Profile Data (JSON)
                                  ↓
                   company_website_analysis table
                                  ↓
                       Display in Admin Panel
```

---

### 🔍 Brave Search API

**Purpose:** News monitoring, social media discovery, web search
**Service Files:** `scripts/social_media_audit.py`
**Status:** ✅ Production (Social Media), 📋 Planned (News Monitoring)

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Brave Search |
| **Endpoint (Web)** | https://api.search.brave.com/res/v1/web/search |
| **Endpoint (News)** | https://api.search.brave.com/res/v1/news/search |
| **Authentication** | API Key (sent in the `X-Subscription-Token` header) |
| **Environment Variable** | `BRAVE_SEARCH_API_KEY` or `BRAVE_API_KEY` |
| **Timeout** | 15 seconds |

#### Rate Limits

- **Free Tier:** 2,000 requests/month
- **Per-Second:** No official limit
- **Recommended:** 0.5-1 second delay between requests

#### Current Usage: Social Media Discovery

```python
# Search for social media profiles
params = {
    "q": f'"{company_name}" {city} facebook OR instagram',
    "count": 10,
    "country": "pl",
    "search_lang": "pl",
}
```

#### Planned Usage: News Monitoring

```python
# News search (from CLAUDE.md)
params = {
    "q": f'"{company_name}" OR "{nip}"',
    "count": 10,
    "freshness": "pw",  # past week
    "country": "pl",
    "search_lang": "pl",
}
```

#### Pattern Extraction

**Social Media URLs:** regex patterns for:

- Facebook: `facebook.com/[username]`
- Instagram: `instagram.com/[username]`
- LinkedIn: `linkedin.com/company/[name]`
- YouTube: `youtube.com/@[channel]`
- Twitter/X: `twitter.com/[username]` or `x.com/[username]`
- TikTok: `tiktok.com/@[username]`

**Google Reviews:** patterns such as:

- `"4,5 (123 opinii)"`
- `"Rating: 4.5 · 123 reviews"`

#### Integration Points

| Feature | File | Status |
|---------|------|--------|
| Social Media Discovery | `scripts/social_media_audit.py` | ✅ Implemented |
| Google Reviews Fallback | `scripts/social_media_audit.py` | ✅ Implemented |
| News Monitoring | (Planned) | 📋 Pending |

#### Data Flow

```
Social Media Audit Script → Brave Search API
                                    ↓
                        Web Search Results (JSON)
                                    ↓
                        Pattern Extraction (regex)
                                    ↓
               Social Media URLs (Facebook, Instagram, etc.)
                                    ↓
                         company_social_media table
```

---

### 📧 Microsoft Graph API

**Purpose:** Email notifications via Microsoft 365
**Service Files:** `email_service.py`
**Status:** ✅ Production

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | Microsoft Graph API |
| **Endpoint** | https://graph.microsoft.com/v1.0 |
| **Authentication** | OAuth 2.0 Client Credentials Flow |
| **Authority** | https://login.microsoftonline.com/{tenant_id} |
| **Scope** | https://graph.microsoft.com/.default |

#### Environment Variables

```bash
MICROSOFT_TENANT_ID=
MICROSOFT_CLIENT_ID=
MICROSOFT_CLIENT_SECRET=
MICROSOFT_MAIL_FROM=noreply@nordabiznes.pl
```

#### Authentication Flow

1. **Client Credentials Flow** (application permissions)
   - No user interaction required
   - Service-to-service authentication
   - Uses client ID + client secret

2. **Token Acquisition**

   ```python
   import os

   import msal

   tenant_id = os.environ["MICROSOFT_TENANT_ID"]

   app = msal.ConfidentialClientApplication(
       os.environ["MICROSOFT_CLIENT_ID"],
       authority=f"https://login.microsoftonline.com/{tenant_id}",
       client_credential=os.environ["MICROSOFT_CLIENT_SECRET"],
   )
   result = app.acquire_token_for_client(
       scopes=["https://graph.microsoft.com/.default"]
   )
   # On success the token is in result["access_token"]; on failure the
   # dict contains "error" and "error_description" instead.
   ```

3. **Token Caching**
   - The MSAL library handles caching (as long as the same `ConfidentialClientApplication` instance is reused)
   - Tokens are cached for ~1 hour
   - A new token is acquired automatically once the cached one expires

#### Required Azure AD Permissions

**Application Permissions** (requires admin consent):

- `Mail.Send` - Send mail as any user

#### Rate Limits

- **Mail.Send:** 10,000 requests per 10 minutes per app
- **Throttling:** 429 Too Many Requests (retry after the interval in the `Retry-After` header, or with backoff)

#### Integration Points

| Feature | File | Usage |
|---------|------|-------|
| User Registration | `app.py` | Send welcome email |
| Password Reset | `app.py` | Send reset link |
| Notifications | `app.py` | News approval notifications |

#### Data Flow

```
App Trigger → EmailService.send_mail()
                      ↓
        MSAL Token Acquisition (cached)
                      ↓
             Microsoft Graph API
                      ↓
          POST /users/{id}/sendMail
                      ↓
             Email Sent via M365
                      ↓
          Success/Failure Response
```

---

### 🌐 ALEO.com (Web Scraping)

**Purpose:** NIP verification and company data enrichment
**Service Files:** `scripts/import_*.py` (Playwright integration)
**Status:** ✅ Production (Limited Use)

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | ALEO.com (Polish business directory) |
| **Endpoint** | https://www.aleo.com/ |
| **Authentication** | None (public website) |
| **Method** | Web scraping (Playwright browser automation) |
| **Rate Limiting** | Self-imposed delays (1-2 seconds) |

#### Data Retrieved

- Company NIP verification
- Company name
- Address
- Business category
- Basic contact information

#### Best Practices

- **Rate Limiting:** 1-2 second delays between requests
- **User Agent:** Standard browser user agent
- **Error Handling:** Handle missing elements gracefully
- **Caching:** Cache results to minimize requests

#### Integration Points

| Feature | File | Usage |
|---------|------|-------|
| Data Import | `import_*.py` scripts | NIP verification |

#### Data Flow

```
Import Script → Playwright Browser
                       ↓
              ALEO.com Search Page
                       ↓
            Company Search (by NIP)
                       ↓
              Parse HTML Results
                       ↓
             Extract Company Data
                       ↓
            Verify against KRS API
                       ↓
            Save to companies table
```

---

### 🔗 rejestr.io (Web Scraping)

**Purpose:** Company connections, shareholders, management
**Service Files:** `analyze_connections.py` (Playwright integration)
**Status:** 📋 Planned Enhancement

#### Configuration

| Parameter | Value |
|-----------|-------|
| **Provider** | rejestr.io (KRS registry browser) |
| **Endpoint** | https://rejestr.io/ |
| **Authentication** | None (public website) |
| **Method** | Web scraping (Playwright browser automation) |
| **Rate Limiting** | Self-imposed delays (1-2 seconds) |

#### Data to Retrieve (Planned)

- Management board members
- Shareholders with ownership percentages
- Beneficial owners
- Prokurents (proxies)
- Links between companies (shared owners/managers)

#### Planned Database Table

```sql
company_people (
    id SERIAL PRIMARY KEY,
    company_id INTEGER REFERENCES companies(id),
    name VARCHAR(255),
    role VARCHAR(100),           -- Prezes (CEO), Członek Zarządu (board member), Wspólnik (shareholder)
    shares_percent NUMERIC(5,2),
    person_url VARCHAR(500),     -- Link to rejestr.io person page
    created_at TIMESTAMP,
    updated_at TIMESTAMP
)
```

#### Integration Points (Planned)

| Feature | File | Status |
|---------|------|--------|
| Connection Analysis | `analyze_connections.py` | 📋 Basic implementation exists |
| Company Profile Display | `templates/company_detail.html` | 📋 Planned |
| Network Visualization | (Future) | 📋 Planned |

---

## Authentication Summary

### API Key Authentication

| API | Environment Variable | Key Location |
|-----|---------------------|--------------|
| Google Gemini | `GOOGLE_GEMINI_API_KEY` | Google AI Studio |
| Google PageSpeed | `GOOGLE_PAGESPEED_API_KEY` | Google Cloud Console |
| Google Places | `GOOGLE_PLACES_API_KEY` | Google Cloud Console |
| Brave Search | `BRAVE_SEARCH_API_KEY` | Brave Search API Portal |

### OAuth 2.0 Authentication

| API | Flow Type | Environment Variables |
|-----|-----------|----------------------|
| Microsoft Graph | Client Credentials | `MICROSOFT_TENANT_ID`<br/>`MICROSOFT_CLIENT_ID`<br/>`MICROSOFT_CLIENT_SECRET` |

### No Authentication

| API | Access Type |
|-----|------------|
| KRS Open API | Public API |
| ALEO.com | Web scraping (public) |
| rejestr.io | Web scraping (public) |

---

## Rate Limits & Quota Management

### Summary Table

| API | Free Tier Quota | Rate Limit | Cost | Tracking |
|-----|----------------|------------|------|----------|
| **Google Gemini** | 200 req/day<br/>50 req/hour | Built-in | $0.075-$5.00/1M tokens | `ai_api_costs` table |
| **Google PageSpeed** | 25,000 req/day | ~1 req/sec | Free | In-memory counter |
| **Google Places** | Pay-per-use | No official limit | $0.032/request | 24-hour cache |
| **Brave Search** | 2,000 req/month | No official limit | Free | None |
| **KRS Open API** | None documented | No official limit | Free | None |
| **Microsoft Graph** | 10,000 req/10min | Built-in throttling | Included in M365 | None |
| **ALEO.com** | N/A (scraping) | Self-imposed (1-2s) | Free | None |
| **rejestr.io** | N/A (scraping) | Self-imposed (1-2s) | Free | None |

### Quota Monitoring

**Gemini AI - Daily Cost Report:**

```sql
SELECT
    feature,
    COUNT(*) as calls,
    SUM(total_tokens) as total_tokens,
    SUM(total_cost) as total_cost,
    AVG(latency_ms) as avg_latency_ms
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature
ORDER BY total_cost DESC;
```

**PageSpeed - Remaining Quota:**

```python
from scripts.pagespeed_client import GooglePageSpeedClient

client = GooglePageSpeedClient()
remaining = client.get_remaining_quota()
print(f"Remaining quota: {remaining}/25000")
```

---

## Error Handling Patterns

### Common Error Types

**1. Authentication Errors**

- Invalid API key
- Expired credentials
- Missing environment variables

**2. Rate Limiting**

- Quota exceeded (daily/hourly)
- Too many requests per second
- Throttling (429 status code)

**3. Network Errors**

- Connection timeout
- DNS resolution failure
- SSL certificate errors

**4. API Errors**

- 400 Bad Request (invalid parameters)
- 404 Not Found (resource doesn't exist)
- 500 Internal Server Error (API issue)

### Retry Strategy

**Exponential Backoff:**

```python
import time

# `api_client` and `TransientError` are placeholders for the integration's
# client object and the exception(s) worth retrying (timeouts, 429, 5xx).
max_retries = 3

for attempt in range(max_retries):
    try:
        result = api_client.call()
        break
    except TransientError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # waits 1s, then 2s; the 3rd failure is re-raised
            time.sleep(wait_time)
        else:
            raise
```

### Error Handling Example

```python
import logging

import requests

logger = logging.getLogger(__name__)

# api_client, params, QuotaExceededError, APIError, queue_for_retry and
# log_api_call are placeholders for the integration-specific code.
result = None  # bound up front so the `finally` block never hits a NameError
try:
    result = api_client.call_api(params)
except requests.exceptions.Timeout:
    logger.error("API timeout")
except requests.exceptions.ConnectionError as e:
    logger.error(f"Connection error: {e}")
except QuotaExceededError:
    logger.warning("Quota exceeded, queuing for retry")
    queue_for_retry(params)
except APIError as e:
    logger.error(f"API error: {e.status_code} - {e.message}")
finally:
    log_api_call(success=result is not None)
```

---

## Security Considerations

### API Key Storage

✅ **Best Practices:**

- Store in environment variables
- Use `.env` file (NOT committed to git)
- Rotate keys regularly
- Use separate keys for dev/prod

❌ **Never:**

- Hardcode keys in source code
- Commit keys to version control
- Share keys in chat/email
- Use production keys in development

### HTTPS/TLS

All APIs use HTTPS:

- Google APIs: TLS 1.2+
- Microsoft Graph: TLS 1.2+
- Brave Search: TLS 1.2+
- KRS Open API: TLS 1.2+

### Secrets Management

**Production:**

- Environment variables set in systemd service
- Restricted file permissions on `.env` files
- No secrets in logs or error messages

**Development:**

- `.env` file with restricted permissions (600)
- Local `.env` not synced to cloud storage
- Use test API keys when available

---

## Cost Optimization Strategies

### 1. Caching

- **Google Places:** 24-hour cache in `company_website_analysis`
- **PageSpeed:** Cache results, re-audit only when needed
- **Gemini:** Cache common responses (FAQ, greetings)

### 2. Batch Processing

- **SEO Audits:** Run during off-peak hours
- **Social Media Discovery:** Process in batches of 10-20
- **News Monitoring:** Schedule daily/weekly runs

### 3. Model Selection

- **Gemini:** Use cheaper models where appropriate
  - `gemini-2.5-flash-lite` for simple tasks
  - `gemini-2.5-flash` for general use
  - `gemini-2.5-pro` only for complex reasoning

### 4. Result Reuse

- Don't re-analyze unchanged content
- Check the last analysis timestamp before making API calls
- Use the `force_refresh` parameter sparingly

### 5. Quota Monitoring

- Daily reports on API usage and costs
- Alerts when >80% of quota is used
- Automatic throttling when approaching the limit

---

## Monitoring & Troubleshooting

### Health Checks

**Test External API Connectivity:**

```bash
# Gemini API
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=${GOOGLE_GEMINI_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'

# PageSpeed API
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://nordabiznes.pl&key=${GOOGLE_PAGESPEED_API_KEY}"

# KRS API
curl "https://api-krs.ms.gov.pl/api/krs/OdpisAktualny/0000817317?rejestr=P&format=json"

# Brave Search API
curl -H "X-Subscription-Token: ${BRAVE_SEARCH_API_KEY}" \
  "https://api.search.brave.com/res/v1/web/search?q=test&count=1"
```

### Common Issues

**1. Gemini Quota Exceeded**

```
Error: 429 Resource has been exhausted
Solution: Wait for quota reset (hourly/daily) or upgrade to paid tier
```

**2. PageSpeed Timeout**

```
Error: Timeout waiting for PageSpeed response
Solution: Increase timeout, retry later, or skip slow websites
```

**3. Places API 403 Forbidden**

```
Error: This API project is not authorized to use this API
Solution: Enable Places API in Google Cloud Console
```

**4. MS Graph Authentication Failed**

```
Error: AADSTS700016: Application not found in directory
Solution: Verify MICROSOFT_TENANT_ID and MICROSOFT_CLIENT_ID
```

### Diagnostic Commands

**Check API Key Configuration:**

```bash
# Development (print variable names only, not secret values)
grep -E "GOOGLE|BRAVE|MICROSOFT" .env | cut -d= -f1

# Production (print variable names only, not secret values)
sudo -u www-data printenv | grep -E "GOOGLE|BRAVE|MICROSOFT" | cut -d= -f1
```

**Check Database API Cost Tracking:**

```sql
-- Gemini API calls today
SELECT feature, COUNT(*) as calls, SUM(total_cost) as cost
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY feature;

-- Failed API calls
SELECT timestamp, feature, error_message
FROM ai_api_costs
WHERE success = FALSE
ORDER BY timestamp DESC
LIMIT 10;
```

---

## Related Documentation

- **[System Context](./01-system-context.md)** - High-level system overview
- **[Container Diagram](./02-container-diagram.md)** - Container architecture
- **[Flask Components](./04-flask-components.md)** - Application components
- **[Database Schema](./05-database-schema.md)** - Database design
- **[External API Integration Analysis](./.auto-claude/specs/003-.../analysis/external-api-integrations.md)** - Detailed API analysis

---

## Maintenance Guidelines

### When to Update This Document

- ✅ Adding new external API integration
- ✅ Changing API authentication method
- ✅ Updating rate limits or quotas
- ✅ Modifying data flow patterns
- ✅ Adding new database tables for API data
- ✅ Changing cost tracking or optimization strategies

### Update Checklist

- [ ] Update Mermaid diagram with new integration
- [ ] Add detailed section for new API
- [ ] Update authentication summary table
- [ ] Update rate limits & quota table
- [ ] Add integration points
- [ ] Document data flow
- [ ] Add health check commands
- [ ] Update cost optimization strategies

---

## Glossary

| Term | Definition |
|------|------------|
| **API Key** | Secret token for authenticating API requests |
| **OAuth 2.0** | Industry-standard protocol for authorization |
| **Client Credentials Flow** | OAuth flow for service-to-service authentication |
| **Rate Limit** | Maximum number of API requests allowed per time period |
| **Quota** | Total allowance for API usage (daily/monthly) |
| **Web Scraping** | Automated extraction of data from websites |
| **Playwright** | Browser automation framework for web scraping |
| **Exponential Backoff** | Retry strategy with increasing delays |
| **HTTPS/TLS** | Secure protocol for encrypted communication |
| **Free Tier** | No-cost API usage level with limits |
| **Pay-per-use** | Pricing model charging per API request |

---

**Document End**