# AI Chat Flow

**Document Version:** 1.0
**Last Updated:** 2026-01-10
**Status:** Production LIVE
**Flow Type:** AI-Powered Company Discovery & Chat

---

## Overview

This document describes the **complete AI chat flow** for the Norda Biznes Partner application, covering:

- **Chat Interface** (`/chat` route)
- **Conversation Management** (start, message, history)
- **Context Building** with full company database
- **Gemini API Integration** for intelligent responses
- **Cost Tracking** and performance metrics
- **Search Integration** for company discovery

**Key Technology:**

- **AI Model:** Google Gemini 2.5 Flash (gemini-2.5-flash)
- **Chat Engine:** NordaBizChatEngine (nordabiz_chat.py)
- **Gemini Service:** Centralized GeminiService (gemini_service.py)
- **Search Integration:** Unified SearchService (search_service.py)
- **Database:** PostgreSQL (conversations, messages, companies)

**Key Features:**

- Full company database context (all 80 companies available to AI)
- Multi-turn conversation with history (last 10 messages)
- Intelligent company selection by AI (no pre-filtering)
- Real-time cost tracking (tokens, latency, theoretical cost)
- Free tier usage monitoring (1,500 requests/day limit)
- Compact data format to minimize token usage

**Cost & Performance:**

- **Model:** Gemini 2.5 Flash
- **Pricing:** $0.075/$0.30 per 1M tokens (input/output)
- **Free Tier:** 1,500 requests/day, unlimited tokens
- **Typical Response:** 200-400ms latency, 5,000-15,000 tokens
- **Actual Cost:** $0.00 (free tier)
- **Theoretical Cost:** $0.003-0.006 per message

---

## 1. High-Level Chat Flow

### 1.1 Complete Chat Flow Diagram

```mermaid
flowchart TD
    User[User] -->|1. Navigate to /chat| Browser[Browser]
    Browser -->|2. GET /chat| Flask[Flask App
app.py]
    Flask -->|3. Require login| AuthCheck{Authenticated?}
    AuthCheck -->|No| Login[Redirect to /login]
    AuthCheck -->|Yes| ChatUI[Render chat.html]
    ChatUI -->|4. Load UI| Browser
    Browser -->|5. POST /api/chat/start| Flask
    Flask -->|6. Create conversation| ChatEngine[NordaBizChatEngine<br/>nordabiz_chat.py]
    ChatEngine -->|7. INSERT| ConvDB[(ai_chat_conversations)]
    ConvDB -->|8. conversation_id| ChatEngine
    ChatEngine -->|9. Return conversation| Flask
    Flask -->|10. JSON response| Browser
    Browser -->|11. User types message| UserInput[User Message]
    UserInput -->|12. POST /api/chat/:id/message| Flask
    Flask -->|13. Verify ownership| DB[(PostgreSQL)]
    Flask -->|14. send_message| ChatEngine
    ChatEngine -->|15. Save user message| MsgDB[(ai_chat_messages)]
    ChatEngine -->|16. Build context| ContextBuilder[Context Builder<br/>_build_conversation_context]
    ContextBuilder -->|17. Load ALL companies| DB
    ContextBuilder -->|18. Load last 10 messages| MsgDB
    ContextBuilder -->|19. Compact format| Context[Full Context<br/>JSON]
    Context -->|20. Query AI| GeminiService[Gemini Service<br/>gemini_service.py]
    GeminiService -->|21. API call| GeminiAPI[Google Gemini API<br/>gemini-2.5-flash]
    GeminiAPI -->|22. AI response| GeminiService
    GeminiService -->|23. Track cost| CostDB[(ai_api_costs)]
    GeminiService -->|24. Response text| ChatEngine
    ChatEngine -->|25. Count tokens| TokenCounter[Tokenizer]
    TokenCounter -->|26. tokens_input, tokens_output| ChatEngine
    ChatEngine -->|27. Save AI message| MsgDB
    ChatEngine -->|28. Update conversation| ConvDB
    ChatEngine -->|29. Return response| Flask
    Flask -->|30. JSON + tech_info| Browser
    Browser -->|31. Display message| User

    style ChatEngine fill:#4CAF50
    style GeminiService fill:#2196F3
    style ContextBuilder fill:#FF9800
    style DB fill:#9C27B0
```

---

## 2. Chat Initialization Flow

### 2.1 Start Conversation

**Route:** `POST /api/chat/start`
**File:** `app.py` (lines 3511-3533)
**Authentication:** Required (`@login_required`)

```mermaid
sequenceDiagram
    actor User
    participant Browser
    participant Flask as Flask App
(app.py)
    participant Engine as NordaBizChatEngine<br/>(nordabiz_chat.py)
    participant DB as PostgreSQL<br/>(ai_chat_conversations)

    User->>Browser: Click "Start Chat"
    Browser->>Flask: POST /api/chat/start<br/>{title: "Rozmowa..."}
    Note over Flask: @login_required
    Flask->>Flask: Get current_user.id
    Flask->>Engine: start_conversation(<br/>user_id=current_user.id,<br/>title="Rozmowa - 2026-01-10 10:30"<br/>)
    Engine->>Engine: Auto-generate title if not provided
    Engine->>DB: INSERT INTO ai_chat_conversations<br/>(user_id, started_at, title,<br/>conversation_type, is_active,<br/>message_count, model_name)
    DB->>Engine: conversation.id = 123
    Engine->>Flask: Return AIChatConversation object
    Flask->>Browser: JSON {<br/>success: true,<br/>conversation_id: 123,<br/>title: "Rozmowa - 2026-01-10 10:30"<br/>}
    Browser->>User: Chat session ready
```

**Database Operation:**

```sql
INSERT INTO ai_chat_conversations (
    user_id, started_at, conversation_type, title,
    is_active, message_count, model_name, created_at
) VALUES (
    ?, NOW(), 'general', ?, TRUE, 0, 'gemini-2.5-flash', NOW()
);
```

**Response:**

```json
{
  "success": true,
  "conversation_id": 123,
  "title": "Rozmowa - 2026-01-10 10:30"
}
```

---

## 3. Message Flow (Core Chat Logic)

### 3.1 Send Message Sequence

**Route:** `POST /api/chat/<conversation_id>/message`
**File:** `app.py` (lines 3536-3603)
**Authentication:** Required (`@login_required`)

```mermaid
sequenceDiagram
    actor User
    participant Browser
    participant Flask as Flask App
    participant Engine as NordaBizChatEngine
    participant DB as PostgreSQL
    participant Context as Context Builder
    participant Search as SearchService
    participant Gemini as GeminiService
    participant API as Gemini API
    participant CostDB as ai_api_costs

    User->>Browser: Type: "Kto robi strony www?"
    Browser->>Flask: POST /api/chat/123/message
{message: "Kto robi strony www?"}
    Note over Flask: Verify conversation ownership
    Flask->>DB: SELECT * FROM ai_chat_conversations<br/>WHERE id = 123 AND user_id = ?
    DB->>Flask: Conversation found
    Flask->>Engine: send_message(<br/>conversation_id=123,<br/>user_message="Kto robi strony www?",<br/>user_id=current_user.id<br/>)

    Note over Engine: 1. Save user message
    Engine->>DB: INSERT INTO ai_chat_messages<br/>(conversation_id, role='user',<br/>content="Kto robi strony www?")
    DB->>Engine: Message saved

    Note over Engine: 2. Build context with ALL companies
    Engine->>Context: _build_conversation_context(<br/>db, conversation, message<br/>)
    Context->>DB: SELECT * FROM companies<br/>WHERE status = 'active'
    DB->>Context: 80 companies
    Context->>DB: SELECT * FROM ai_chat_messages<br/>WHERE conversation_id = 123<br/>ORDER BY created_at DESC<br/>LIMIT 10
    DB->>Context: Last 10 messages
    Context->>Context: Build compact JSON format<br/>(minimize tokens)
    Context->>Engine: Return full context dict

    Note over Engine: 3. Query AI with full context
    Engine->>Gemini: generate_text(<br/>prompt=system_prompt + context + history,<br/>feature='ai_chat',<br/>user_id=current_user.id,<br/>temperature=0.7<br/>)
    Gemini->>API: POST /v1/models/gemini-2.5-flash:generateContent
    API->>Gemini: AI response text

    Note over Gemini: Track API cost to database
    Gemini->>Gemini: Count tokens (input, output)
    Gemini->>Gemini: Calculate cost<br/>($0.075/$0.30 per 1M tokens)
    Gemini->>CostDB: INSERT INTO ai_api_costs<br/>(api_provider, model_name, feature,<br/>tokens, cost, latency_ms)
    Gemini->>Engine: Return response text

    Note over Engine: 4. Calculate per-message metrics
    Engine->>Engine: tokenizer.count_tokens(user_message)
    Engine->>Engine: tokenizer.count_tokens(response)
    Engine->>Engine: Calculate latency_ms, cost_usd

    Note over Engine: 5. Save AI response
    Engine->>DB: INSERT INTO ai_chat_messages<br/>(conversation_id, role='assistant',<br/>content=response, tokens_input,<br/>tokens_output, cost_usd, latency_ms)

    Note over Engine: 6. Update conversation stats
    Engine->>DB: UPDATE ai_chat_conversations<br/>SET message_count = message_count + 2,<br/>updated_at = NOW()<br/>WHERE id = 123

    Engine->>Flask: Return AIChatMessage object

    Note over Flask: Get free tier usage stats
    Flask->>CostDB: SELECT COUNT(*), SUM(tokens)<br/>FROM ai_api_costs<br/>WHERE DATE(timestamp) = TODAY()
    CostDB->>Flask: requests_today, tokens_today

    Flask->>Browser: JSON {<br/>success: true,<br/>message: "PIXLAB, WebStorm...",<br/>tech_info: {...}<br/>}
    Browser->>User: Display AI response
```

### 3.2 Message Implementation Details

**Input Validation:**

- Message cannot be empty (`.strip()` check)
- Conversation ownership verified (user_id match)
- Conversation must exist and be active

**Database Operations:**

```sql
-- Save user message
INSERT INTO ai_chat_messages (
    conversation_id, created_at, role, content, edited, regenerated
) VALUES (?, NOW(), 'user', ?, FALSE, FALSE);

-- Save AI response with metrics
INSERT INTO ai_chat_messages (
    conversation_id, created_at, role, content,
    tokens_input, tokens_output, cost_usd, latency_ms,
    edited, regenerated
) VALUES (?, NOW(), 'assistant', ?, ?, ?, ?, ?, FALSE, FALSE);

-- Update conversation
UPDATE ai_chat_conversations
SET message_count = message_count + 2, updated_at = NOW()
WHERE id = ?;
```

**Response Format:**

```json
{
  "success": true,
  "message": "Znalazłem kilka firm zajmujących się stronami www: PIXLAB (www.pixlab.pl, tel: 509 509 689), WebStorm Agencja Interaktywna...",
  "message_id": 456,
  "created_at": "2026-01-10T10:35:22.123456",
  "tech_info": {
    "model": "gemini-2.5-flash",
    "data_source": "PostgreSQL (80 firm Norda Biznes)",
    "architecture": "Full DB Context (wszystkie firmy w kontekście AI)",
    "tokens_input": 8543,
    "tokens_output": 234,
    "tokens_total": 8777,
    "latency_ms": 342,
    "theoretical_cost_usd": 0.00128,
    "actual_cost_usd": 0.0,
    "free_tier": {
      "is_free": true,
      "daily_limit": 1500,
      "requests_today": 47,
      "tokens_today": 423891,
      "remaining": 1453
    }
  }
}
```

---

## 4. Context Building (Core Intelligence)

### 4.1 Context Building Flow

**Method:** `_build_conversation_context(db, conversation, current_message)`
**File:** `nordabiz_chat.py` (lines 254-310)
**Strategy:** Full database context (AI does intelligent filtering)

```mermaid
flowchart TD
    Start([User Message:
"Kto robi strony www?"]) --> LoadCompanies[Load ALL active companies<br/>FROM companies WHERE status='active']
    LoadCompanies --> Count[Total: 80 companies]
    Count --> LoadCategories[Load all categories with counts]
    LoadCategories --> LoadHistory[Load last 10 conversation messages<br/>ORDER BY created_at DESC]
    LoadHistory --> BuildContext[Build context dict]
    BuildContext --> CompactFormat[Convert ALL companies<br/>to compact format]
    CompactFormat --> CompactLoop{For each<br/>company}
    CompactLoop -->|Process| CompactFields[Include only non-empty fields:<br/>- name, cat (category)<br/>- desc (description_short)<br/>- history (founding_history)<br/>- svc (services)<br/>- comp (competencies)<br/>- web, tel, mail<br/>- city, year<br/>- cert (top 3 certifications)]
    CompactFields --> SaveTokens[Save tokens by:<br/>- Short field names<br/>- Omit empty fields<br/>- Limit certs to 3]
    SaveTokens --> NextCompany{More<br/>companies?}
    NextCompany -->|Yes| CompactLoop
    NextCompany -->|No| ContextReady[Context ready]
    ContextReady --> ContextDict{Context Dictionary}
    ContextDict --> Field1[conversation_type: 'general']
    ContextDict --> Field2[total_companies: 80]
    ContextDict --> Field3[categories: Array]
    ContextDict --> Field4[all_companies: Array<br/>~8,000-12,000 tokens]
    ContextDict --> Field5[recent_messages: Array<br/>Last 10 messages]
    Field1 & Field2 & Field3 & Field4 & Field5 --> Return[Return to _query_ai]

    style BuildContext fill:#4CAF50
    style CompactFormat fill:#FF9800
    style ContextDict fill:#2196F3
```

### 4.2 Compact Company Format

**Purpose:** Minimize token usage while preserving all important data

**Example Company Object:**

```json
{
  "name": "PIXLAB Sp. z o.o.",
  "cat": "IT i Technologie",
  "desc": "Agencja interaktywna - strony www, sklepy online, aplikacje",
  "history": "Założona przez Macieja Pieńczyńskiego w 2015 roku",
  "svc": ["Strony WWW", "E-commerce", "Aplikacje webowe", "SEO"],
  "comp": ["WordPress", "Shopify", "React", "Node.js"],
  "web": "https://pixlab.pl",
  "tel": "509 509 689",
  "mail": "kontakt@pixlab.pl",
  "city": "Wejherowo",
  "year": 2015,
  "cert": ["ISO 9001", "Google Partner"]
}
```

**Token Savings:**

- Short field names: `svc` instead of `services` (-40%)
- Omit empty fields: Only include if data exists (-30%)
- Limit certifications: Top 3 instead of all (-20%)
- Compact JSON: No extra whitespace (-10%)

**Typical Token Usage:**

- Single company: ~100-150 tokens (compact)
- All 80 companies: ~8,000-12,000 tokens
- System prompt: ~500 tokens
- Conversation history (10 msgs): ~1,000-2,000 tokens
- **Total input:** ~10,000-15,000 tokens

---

## 5. AI Query & Prompt Engineering

### 5.1 AI Query Flow

**Method:** `_query_ai(context, user_message, user_id)`
**File:** `nordabiz_chat.py` (lines 406-481)

```mermaid
flowchart TD
    Start([Context + User Message]) --> BuildPrompt[Build system prompt]
    BuildPrompt --> SystemPrompt[SYSTEM PROMPT:<br/>- Role definition<br/>- Database stats<br/>- Instructions<br/>- Data format guide]
    SystemPrompt --> AddCompanies[Add ALL companies JSON<br/>~8,000-12,000 tokens]
    AddCompanies --> AddHistory[Add conversation history<br/>Last 10 messages]
    AddHistory --> AddUserMsg[Add current user message]
    AddUserMsg --> FullPrompt[Complete prompt ready<br/>~10,000-15,000 tokens]
    FullPrompt --> UseGlobal{use_global_service?}
    UseGlobal -->|Yes (default)| GeminiSvc[gemini_service.generate_text]
    UseGlobal -->|No (legacy)| DirectAPI[model.generate_content]
    GeminiSvc --> AutoCost[Automatic cost tracking<br/>to ai_api_costs table]
    DirectAPI --> NoCost[No cost tracking]
    AutoCost --> APICall[Gemini API Call<br/>gemini-2.5-flash]
    NoCost --> APICall
    APICall --> Response[AI Response<br/>~200-400 tokens]
    Response --> Return[Return response text]

    style SystemPrompt fill:#4CAF50
    style GeminiSvc fill:#2196F3
    style AutoCost fill:#FF9800
```

### 5.2 System Prompt Structure

**File:** `nordabiz_chat.py` (lines 426-458)

```
Jesteś pomocnym asystentem portalu Norda Biznes - katalogu firm zrzeszonych
w stowarzyszeniu Norda Biznes z Wejherowa.

📊 MASZ DOSTĘP DO PEŁNEJ BAZY DANYCH:
- Liczba firm: 80
- Kategorie: IT i Technologie (25), Budownictwo (18), Usługi (15), ...

🎯 TWOJA ROLA:
- Analizujesz CAŁĄ bazę firm i wybierasz najlepsze dopasowania do pytania
- Odpowiadasz zwięźle (2-3 zdania), chyba że użytkownik prosi o szczegóły
- Podajesz konkretne nazwy firm z kontaktem
- Możesz wyszukiwać po: nazwie, usługach, kompetencjach, właścicielach, mieście

📋 FORMAT DANYCH (skróty):
- name: nazwa firmy
- cat: kategoria
- desc: krótki opis
- history: historia firmy, właściciele, założyciele
- svc: usługi
- comp: kompetencje
- web/tel/mail: kontakt
- city: miasto
- cert: certyfikaty

⚠️ WAŻNE:
- ZAWSZE podawaj nazwę firmy i kontakt (tel/web/mail jeśli dostępne)
- Jeśli pytanie o osobę (np. "kto to Roszman") - szukaj w polu "history"
- Odpowiadaj PO POLSKU

🏢 PEŁNA BAZA FIRM (wybierz najlepsze):
[JSON array with all 80 companies in compact format]

# HISTORIA ROZMOWY:
Użytkownik: [previous message 1]
Ty: [previous response 1]
Użytkownik: [previous message 2]
Ty: [previous response 2]
...

Użytkownik: Kto robi strony www?
Ty:
```

**Prompt Engineering Principles:**

1. **Clear role definition:** "Jesteś pomocnym asystentem..."
2. **Database context:** Total companies, category distribution
3. **Response guidelines:** Concise (2-3 sentences), specific contacts
4. **Data format guide:** Field name abbreviations explained
5. **Search capabilities:** What AI can search by
6. **Important notes:** Always include contact, search in "history" for people
7. **Language:** Always respond in Polish
8. **Full context:** ALL companies provided (AI does filtering)
9. 
**Conversation history:** Last 10 messages for context continuity

---

## 6. Cost Tracking & Performance

### 6.1 Dual Cost Tracking System

The application uses **TWO levels** of cost tracking:

**Level 1: Global API Cost Tracking** (ai_api_costs table)

- Managed by `gemini_service.py`
- Tracks ALL Gemini API calls (chat, image analysis, etc.)
- Automatic via `_log_api_cost()` method

**Level 2: Per-Message Chat Metrics** (ai_chat_messages table)

- Managed by `nordabiz_chat.py`
- Tracks tokens, cost, latency per chat message
- User-facing metrics for transparency

### 6.2 Cost Tracking Flow

```mermaid
sequenceDiagram
    participant Engine as NordaBizChatEngine
    participant Gemini as GeminiService
    participant API as Gemini API
    participant GlobalDB as ai_api_costs
    participant ChatDB as ai_chat_messages

    Engine->>Gemini: generate_text(<br/>prompt, feature='ai_chat',<br/>user_id=123<br/>)
    Note over Gemini: Start timer
    Gemini->>API: POST /generateContent
    API->>Gemini: Response text
    Note over Gemini: Stop timer (latency_ms)

    Note over Gemini: Count tokens
    Gemini->>Gemini: input_tokens = count_tokens(prompt)
    Gemini->>Gemini: output_tokens = count_tokens(response)

    Note over Gemini: Calculate cost
    Gemini->>Gemini: input_cost = (input/1M) * $0.075
    Gemini->>Gemini: output_cost = (output/1M) * $0.30
    Gemini->>Gemini: total_cost = input + output

    Note over Gemini: Global cost tracking
    Gemini->>GlobalDB: INSERT INTO ai_api_costs<br/>(api_provider='gemini',<br/>model='gemini-2.5-flash',<br/>feature='ai_chat',<br/>user_id=123,<br/>tokens, cost, latency)

    Gemini->>Engine: Return response text

    Note over Engine: Per-message tracking
    Engine->>Engine: tokenizer.count_tokens(user_msg)
    Engine->>Engine: tokenizer.count_tokens(response)
    Engine->>Engine: Calculate cost again (for message record)
    Engine->>ChatDB: INSERT INTO ai_chat_messages<br/>(role='assistant',<br/>content, tokens_input,<br/>tokens_output, cost_usd,<br/>latency_ms)
```

### 6.3 Cost Calculation

**Gemini 2.5 Flash Pricing:**

- **Input:** $0.075 per 1M tokens
- **Output:** $0.30 per 1M tokens
- **Free Tier:** 1,500 requests/day (unlimited tokens)

**Typical Chat Message:**

```
Input:  10,000 tokens (system prompt + companies + history) = $0.00075
Output:    300 tokens (AI response)                         = $0.00009
Total:                                                      = $0.00084
```

**Daily Usage Estimate:**

- 100 chat messages/day
- Average 10,000 input + 300 output tokens
- Theoretical cost: $0.084/day ($2.52/month)
- **Actual cost: $0.00** (free tier covers all usage)

### 6.4 Free Tier Monitoring

**Function:** `get_free_tier_usage()`
**File:** `app.py`

```python
def get_free_tier_usage():
    """Get free tier usage stats for today"""
    db = SessionLocal()
    try:
        today_start = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
        stats = db.query(
            func.count(AIAPICostLog.id).label('requests'),
            func.sum(AIAPICostLog.total_tokens).label('tokens')
        ).filter(
            AIAPICostLog.timestamp >= today_start,
            AIAPICostLog.api_provider == 'gemini',
            AIAPICostLog.success == True
        ).first()
        return {
            'requests_today': stats.requests or 0,
            'tokens_today': stats.tokens or 0,
            'daily_limit': 1500,
            'remaining': max(0, 1500 - (stats.requests or 0))
        }
    finally:
        db.close()
```

**Response in `/api/chat/:id/message`:**

```json
{
  "tech_info": {
    "free_tier": {
      "is_free": true,
      "daily_limit": 1500,
      "requests_today": 47,
      "tokens_today": 423891,
      "remaining": 1453
    }
  }
}
```

---

## 7. Conversation History

### 7.1 Get History Flow

**Route:** `GET /api/chat/<conversation_id>/history`
**File:** `app.py` (lines 3606-3634)
**Authentication:** Required (`@login_required`)

```mermaid
sequenceDiagram
    actor User
    participant Browser
    participant Flask as Flask App
    participant Engine as NordaBizChatEngine
    participant DB as ai_chat_messages

    User->>Browser: Load chat history
    Browser->>Flask: GET /api/chat/123/history
    Note over Flask: Verify ownership
    Flask->>DB: SELECT * FROM ai_chat_conversations
WHERE id = 123 AND user_id = ?
    DB->>Flask: Conversation found
    Flask->>Engine: get_conversation_history(123)
    Engine->>DB: SELECT * FROM ai_chat_messages<br/>WHERE conversation_id = 123<br/>ORDER BY created_at ASC
    DB->>Engine: All messages in conversation
    Engine->>Engine: Format messages as dicts
    Engine->>Flask: Return messages array
    Flask->>Browser: JSON {<br/>success: true,<br/>messages: [...]<br/>}
    Browser->>User: Display conversation history
```

**Response Format:**

```json
{
  "success": true,
  "messages": [
    {
      "id": 789,
      "role": "user",
      "content": "Kto robi strony www?",
      "created_at": "2026-01-10T10:35:00.123456",
      "tokens_input": 0,
      "tokens_output": 0,
      "cost_usd": 0.0,
      "latency_ms": 0
    },
    {
      "id": 790,
      "role": "assistant",
      "content": "Znalazłem kilka firm zajmujących się stronami www...",
      "created_at": "2026-01-10T10:35:02.456789",
      "tokens_input": 8543,
      "tokens_output": 234,
      "cost_usd": 0.00128,
      "latency_ms": 342
    }
  ]
}
```

---

## 8. Database Schema

### 8.1 Conversation Tables

**ai_chat_conversations** (conversation metadata)

```sql
CREATE TABLE ai_chat_conversations (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    started_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP,
    conversation_type VARCHAR(50) DEFAULT 'general',
    title VARCHAR(500),
    is_active BOOLEAN DEFAULT TRUE,
    message_count INTEGER DEFAULT 0,
    model_name VARCHAR(100)
);

CREATE INDEX idx_chat_conv_user_id ON ai_chat_conversations(user_id);
CREATE INDEX idx_chat_conv_started_at ON ai_chat_conversations(started_at DESC);
```

**ai_chat_messages** (individual messages)

```sql
CREATE TABLE ai_chat_messages (
    id SERIAL PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES ai_chat_conversations(id) ON DELETE CASCADE,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    role VARCHAR(20) NOT NULL, -- 'user' or 'assistant'
    content TEXT NOT NULL,
    tokens_input INTEGER,
    tokens_output INTEGER,
    cost_usd DECIMAL(10,6),
    latency_ms INTEGER,
    edited BOOLEAN DEFAULT FALSE,
    regenerated BOOLEAN DEFAULT FALSE
);

CREATE INDEX idx_chat_msg_conv_id ON ai_chat_messages(conversation_id);
CREATE INDEX idx_chat_msg_created_at ON ai_chat_messages(created_at);
```

**ai_api_costs** (global API cost tracking)

```sql
CREATE TABLE ai_api_costs (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
    api_provider VARCHAR(50) NOT NULL, -- 'gemini'
    model_name VARCHAR(100),
    -- 'gemini-2.5-flash'
    feature VARCHAR(100), -- 'ai_chat', 'image_analysis', etc.
    user_id INTEGER REFERENCES users(id),
    input_tokens INTEGER,
    output_tokens INTEGER,
    total_tokens INTEGER,
    input_cost DECIMAL(10,6),
    output_cost DECIMAL(10,6),
    total_cost DECIMAL(10,6),
    success BOOLEAN DEFAULT TRUE,
    error_message TEXT,
    latency_ms INTEGER,
    prompt_hash VARCHAR(64)
);

CREATE INDEX idx_api_costs_timestamp ON ai_api_costs(timestamp DESC);
CREATE INDEX idx_api_costs_provider ON ai_api_costs(api_provider);
CREATE INDEX idx_api_costs_feature ON ai_api_costs(feature);
CREATE INDEX idx_api_costs_user_id ON ai_api_costs(user_id);
```

### 8.2 Entity Relationships

```mermaid
erDiagram
    users ||--o{ ai_chat_conversations : "has many"
    ai_chat_conversations ||--o{ ai_chat_messages : "contains"
    users ||--o{ ai_api_costs : "generates"

    users {
        int id PK
        varchar email
        varchar name
        boolean is_admin
    }
    ai_chat_conversations {
        int id PK
        int user_id FK
        timestamp started_at
        varchar conversation_type
        varchar title
        boolean is_active
        int message_count
        varchar model_name
    }
    ai_chat_messages {
        int id PK
        int conversation_id FK
        timestamp created_at
        varchar role
        text content
        int tokens_input
        int tokens_output
        decimal cost_usd
        int latency_ms
    }
    ai_api_costs {
        int id PK
        timestamp timestamp
        varchar api_provider
        varchar model_name
        varchar feature
        int user_id FK
        int total_tokens
        decimal total_cost
        int latency_ms
    }
```

---

## 9. Error Handling

### 9.1 Common Error Scenarios

**1. Conversation Not Found**

```python
# app.py
conversation = db.query(AIChatConversation).filter_by(
    id=conversation_id,
    user_id=current_user.id
).first()

if not conversation:
    return jsonify({
        'success': False,
        'error': 'Conversation not found'
    }), 404
```

**2. Empty Message**

```python
message = data.get('message', '').strip()
if not message:
    return jsonify({
        'success': False,
        'error': 'Wiadomość nie może być pusta'
    }), 400
```

**3. 
Gemini API Error**

```python
# gemini_service.py
try:
    response = self.model.generate_content(prompt)

    # Check safety filters
    if not response.candidates:
        raise Exception("Response blocked by safety filters")

    # Check finish reason
    candidate = response.candidates[0]
    if candidate.finish_reason not in [1, 0]:  # STOP or UNSPECIFIED
        raise Exception(f"Response incomplete: {candidate.finish_reason}")
except Exception as e:
    logger.error(f"Gemini API error: {e}")
    # Log failed request to database
    self._log_api_cost(
        prompt=prompt,
        response_text='',
        input_tokens=self.count_tokens(prompt),
        output_tokens=0,
        success=False,
        error_message=str(e)
    )
    raise Exception(f"Gemini API call failed: {str(e)}")
```

**4. Database Connection Error**

```python
# nordabiz_chat.py
db = SessionLocal()
try:
    # Database operations
    conversation = db.query(AIChatConversation).filter_by(id=conversation_id).first()
    # ...
finally:
    db.close()  # Always close connection
```

### 9.2 Error Response Format

```json
{
  "success": false,
  "error": "Conversation not found"
}
```

**HTTP Status Codes:**

- `400` - Bad Request (empty message, invalid input)
- `404` - Not Found (conversation doesn't exist)
- `500` - Internal Server Error (Gemini API failure, database error)

---

## 10. Search Integration

### 10.1 Search Service Integration

**Method:** `_find_relevant_companies(db, message)`
**File:** `nordabiz_chat.py` (lines 383-404)
**Status:** DEPRECATED (kept for reference, not used in production)

**Historical Context:** The chat engine originally used SearchService to **pre-filter** companies before sending to AI:

```python
# OLD APPROACH (deprecated):
def _find_relevant_companies(self, db, message):
    """Find companies relevant to user's message"""
    results = search_companies(db, message, limit=10)
    return [result.company for result in results]

# In _build_conversation_context:
relevant_companies = self._find_relevant_companies(db, current_message)
context['companies'] = [self._company_to_compact_dict(c) for c in relevant_companies]
```

**Current Approach:** Send **ALL companies** to AI and let it do intelligent filtering:

```python
# NEW APPROACH (current production):
def _build_conversation_context(self, db, conversation, current_message):
    """Build context with ALL companies (not pre-filtered)"""
    all_companies = db.query(Company).filter_by(status='active').all()
    context['all_companies'] = [
        self._company_to_compact_dict(c) for c in all_companies
    ]
    return context
```

**Why the Change?**

| Aspect | Old (Pre-filtered) | New (Full Context) |
|--------|-------------------|-------------------|
| **Companies sent** | 8-10 (search filtered) | 80 (all active) |
| **Token usage** | ~1,500 tokens | ~10,000 tokens |
| **Search quality** | Keyword-based, limited | AI-powered, intelligent |
| **Multi-criteria** | Difficult | Excellent |
| **Owner searches** | Impossible | Works perfectly |
| **Cost** | $0.0001/msg | $0.0008/msg |
| **User experience** | Sometimes misses results | Always comprehensive |

**Example:**

- User: "Kto to Roszman?" (Who is Roszman?)
- Old approach: Search for "roszman" in services/competencies → 0 results ❌
- New approach: AI searches `founding_history` field → Finds company owner ✅

---

## 11. Performance & Optimization

### 11.1 Performance Metrics

**Typical Chat Message:**

- **Latency:** 200-400ms
- **Input tokens:** 8,000-15,000 (system prompt + 80 companies + history)
- **Output tokens:** 200-500 (AI response)
- **Total tokens:** 8,500-15,500
- **Theoretical cost:** $0.0008-0.0015
- **Actual cost:** $0.00 (free tier)

**Database Queries:**

- Conversation lookup: ~5ms (indexed on user_id, id)
- All companies query: ~50ms (80 rows, no complex joins)
- Last 10 messages: ~10ms (indexed on conversation_id, created_at)
- **Total DB time:** ~65ms

**Gemini API:**

- Network latency: ~100-200ms
- Processing time: ~100-200ms
- **Total API time:** ~250-350ms

### 11.2 Token Optimization Strategies

**1. Compact Field Names**

```python
# GOOD (saves ~40% tokens):
{"name": "PIXLAB", "svc": ["WWW", "SEO"], "comp": ["WordPress"]}

# BAD (wasteful):
{"company_name": "PIXLAB", "services": ["WWW", "SEO"], "competencies": ["WordPress"]}
```

**2. Omit Empty Fields**

```python
# GOOD:
compact = {"name": c.name}
if c.description_short:
    compact['desc'] = c.description_short  # Only adds field if data exists

# BAD:
compact = {
    "name": c.name,
    "desc": c.description_short or "",  # Wastes tokens on ""
}
```

**3. Limit Arrays**

```python
# GOOD (top 3 certifications):
if c.certifications:
    compact['cert'] = [cert.name for cert in c.certifications[:3]]

# BAD (all certifications):
compact['cert'] = [cert.name for cert in c.certifications]  # May be 10+
```

**4. Compact JSON (no whitespace)**

```python
# GOOD:
json.dumps(data, ensure_ascii=False, indent=None)
# {"name":"PIXLAB","svc":["WWW"]}

# BAD:
json.dumps(data, ensure_ascii=False, indent=2)
# {
#   "name": "PIXLAB",
#   "svc": ["WWW"]
# }
```

**Token Savings:**

- Single company: 200 tokens → 100 tokens (50% reduction)
- 80 companies: 16,000 tokens → 8,000 tokens (50% reduction)
- Cost savings: $0.0016 → $0.0008 per message (50% reduction)

### 11.3 Caching Opportunities (Future)

**Not Currently Implemented** (all companies loaded per message)

**Potential Optimizations:**

1. **Company data caching** (Redis)
   - Cache all companies JSON for 5 minutes
   - Invalidate on company data changes
   - Reduce DB query time: 50ms → 5ms
2. **Prompt template caching**
   - Cache system prompt template
   - Only rebuild when companies change
3. **Conversation context caching**
   - Cache last 10 messages per conversation
   - Invalidate on new message
   - Reduce DB query time: 10ms → 1ms

**Why Not Implemented Yet:**

- Current performance is acceptable (250-350ms total)
- Free tier has no rate limits on DB queries
- Premature optimization (80 companies is small dataset)
- Complexity vs. benefit tradeoff

---

## 12. Security & Access Control

### 12.1 Authentication & Authorization

**All chat routes require authentication:**

```python
@app.route('/chat')
@login_required
def chat():
    """AI Chat interface"""
    return render_template('chat.html')

@app.route('/api/chat/start', methods=['POST'])
@login_required
def chat_start():
    # Only logged-in users can start conversations
    ...

@app.route('/api/chat/<int:conversation_id>/message', methods=['POST'])
@login_required
def chat_send_message(conversation_id):
    # Verify conversation ownership
    conversation = db.query(AIChatConversation).filter_by(
        id=conversation_id,
        user_id=current_user.id  # IMPORTANT: Ownership check
    ).first()

    if not conversation:
        return jsonify({'error': 'Conversation not found'}), 404
    ...
```

### 12.2 Input Sanitization

**User message sanitization:**

```python
# app.py
message = data.get('message', '').strip()

# No HTML/JavaScript injection possible
# Gemini API treats all input as plain text
# Database stores as TEXT (no code execution)
```

**No SQL Injection:**

```python
# Safe (parameterized query):
conversation = db.query(AIChatConversation).filter_by(
    id=conversation_id,
    user_id=current_user.id
).first()
# PostgreSQL parameters prevent SQL injection
```

### 12.3 Rate Limiting

**Gemini API Free Tier Limits:**

- 1,500 requests/day
- No per-minute limit
- No token limit

**Application-Level Limits:**

- No specific rate limiting on chat endpoints (yet)
- User must be logged in (reduces abuse)
- Flask-Limiter can be added if needed

**Future Rate Limiting:**

```python
from flask_limiter import Limiter

limiter = Limiter(app, key_func=lambda: current_user.id)

@app.route('/api/chat/<int:conversation_id>/message', methods=['POST'])
@login_required
@limiter.limit("60 per hour")  # 60 messages per hour per user
def chat_send_message(conversation_id):
    ...
```

---

## 13. Monitoring & Debugging

### 13.1 Cost Tracking Queries

**Daily API usage:**

```sql
SELECT
    DATE(timestamp) as date,
    COUNT(*) as requests,
    SUM(total_tokens) as tokens,
    SUM(total_cost) as cost_usd
FROM ai_api_costs
WHERE api_provider = 'gemini' AND feature = 'ai_chat'
GROUP BY DATE(timestamp)
ORDER BY date DESC;
```

**Top users by API usage:**

```sql
SELECT
    u.name, u.email,
    COUNT(*) as chat_messages,
    SUM(c.total_tokens) as total_tokens,
    SUM(c.total_cost) as total_cost_usd
FROM ai_api_costs c
JOIN users u ON c.user_id = u.id
WHERE c.api_provider = 'gemini' AND c.feature = 'ai_chat'
GROUP BY u.id, u.name, u.email
ORDER BY total_cost_usd DESC
LIMIT 10;
```

**Free tier usage today:**

```sql
SELECT
    COUNT(*) as requests_today,
    SUM(total_tokens) as tokens_today,
    1500 - COUNT(*) as remaining_requests
FROM ai_api_costs
WHERE DATE(timestamp) = CURRENT_DATE
  AND api_provider = 'gemini'
  AND success = TRUE;
```

### 13.2 Chat Analytics

**Most active conversations:**

```sql
SELECT
    c.id, c.title,
    u.name as user_name,
    c.message_count,
    c.started_at, c.updated_at
FROM ai_chat_conversations c
JOIN users u ON c.user_id = u.id
WHERE c.is_active = TRUE
ORDER BY c.message_count DESC
LIMIT 20;
```

**Average response metrics:**

```sql
SELECT
    AVG(tokens_input) as avg_input_tokens,
    AVG(tokens_output) as avg_output_tokens,
    AVG(latency_ms) as avg_latency_ms,
    AVG(cost_usd) as avg_cost_usd
FROM ai_chat_messages
WHERE role = 'assistant'
  AND created_at > NOW() - INTERVAL '7 days';
```

### 13.3 Error Monitoring

**Failed API requests:**

```sql
SELECT
    timestamp, model_name, feature,
    error_message, latency_ms
FROM ai_api_costs
WHERE success = FALSE
  AND api_provider = 'gemini'
ORDER BY timestamp DESC
LIMIT 20;
```

**Conversations with errors:**

```sql
-- Conversations where last message is from user (AI didn't respond)
SELECT
    c.id, c.title, c.message_count, c.updated_at,
    (SELECT content FROM ai_chat_messages
     WHERE conversation_id = c.id
     ORDER BY created_at DESC LIMIT 1) as last_message
FROM
---

## 14. Future Enhancements

### 14.1 Planned Features

**1. Conversation Context Memory**
- Remember user preferences across sessions
- "Remember that I'm looking for IT services"
- Personalized recommendations

**2. Conversation Sharing**
- Share a conversation URL with other users
- Public vs. private conversations
- Embed a chat widget on company profiles

**3. Voice Input/Output**
- Web Speech API for voice input
- Text-to-speech for AI responses
- Hands-free interaction

**4. Multi-Modal Input**
- Upload images (company logo, product photos)
- Gemini Vision API for image analysis
- "Find companies similar to this logo"

**5. Conversation Search**
- Full-text search across all user conversations
- Filter by date, company mentioned, topic
- Export conversation history

**6. Advanced Analytics**
- Which companies are most recommended by the AI?
- What services are users asking about most?
- Conversation funnel (browse → chat → contact)

### 14.2 Optimization Opportunities

**1. Redis Caching**

```python
# Cache the full companies JSON, keyed by a version hash
redis_key = f"companies:all:{version_hash}"
cached = redis.get(redis_key)
if cached:
    all_companies = json.loads(cached)
else:
    all_companies = load_from_db()
    redis.setex(redis_key, 300, json.dumps(all_companies))  # 5 min TTL
```

**2. Prompt Compression**
- Use Gemini's context caching feature (when available)
- Cache the system prompt + company database
- Only send the new user message (saves up to ~90% of input tokens)
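Until context caching lands, part of the same saving is available by capping the history included in each prompt. A sketch in that spirit, reusing the "last 10 messages" window described earlier — the message shape and function names are assumptions, not the engine's actual API:

```python
# History-trimming sketch (hypothetical helpers): keep the prompt small by
# sending only the most recent messages alongside the static company context.
def trim_history(messages, max_messages=10):
    """Keep only the most recent messages for the next prompt."""
    return messages[-max_messages:]

def history_as_prompt(messages):
    """Render trimmed history as compact role-prefixed lines."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)
```

The window size is a trade-off: a larger window preserves more context for multi-turn answers, a smaller one keeps input-token counts (and latency) down.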
**3. Streaming Responses**

```python
from flask import Response
import json

@app.route('/api/chat/<int:conversation_id>/message', methods=['POST'])
def chat_send_message(conversation_id):
    # Enable streaming
    response = gemini_service.generate_text(
        prompt=full_prompt,
        stream=True  # Return a generator
    )

    # Server-Sent Events (SSE)
    def generate():
        for chunk in response:
            yield f"data: {json.dumps({'text': chunk.text})}\n\n"

    return Response(generate(), mimetype='text/event-stream')
```

**4. Conversation Summarization**
- Auto-summarize conversations with > 20 messages
- Include the summary instead of the full history
- Reduces token usage by roughly 50%

---

## 15. Troubleshooting Guide

### 15.1 Common Issues

**Issue: "Conversation not found" error**

```
Cause: User trying to access someone else's conversation
Fix:   Verify that conversation_id belongs to current_user.id
SQL Debug:
  SELECT id, user_id FROM ai_chat_conversations WHERE id = 123;
```

**Issue: Empty AI responses**

```
Cause: Gemini safety filters blocking the response
Fix:   Check ai_api_costs for error_message
SQL Debug:
  SELECT error_message, prompt_hash FROM ai_api_costs
  WHERE success = FALSE ORDER BY timestamp DESC LIMIT 10;
```

**Issue: Slow response times (> 1 second)**

```
Cause: Large context (many companies, long history)
Fix:   Check token counts, consider summarization
SQL Debug:
  SELECT tokens_input, tokens_output, latency_ms FROM ai_chat_messages
  WHERE latency_ms > 1000 ORDER BY created_at DESC LIMIT 20;
```

**Issue: "Free tier limit exceeded"**

```
Cause: > 1,500 requests in 24 hours
Fix:   Wait for the quota reset (midnight Pacific Time)
SQL Debug:
  SELECT COUNT(*) FROM ai_api_costs
  WHERE DATE(timestamp) = CURRENT_DATE AND api_provider = 'gemini';
```

### 15.2 Diagnostic Commands

**Check Gemini API connectivity:**

```bash
python3 -c "
from gemini_service import GeminiService
svc = GeminiService()
response = svc.generate_text('Hello', feature='test')
print(response)
"
```

**Verify database connection:**

```bash
psql -U nordabiz_app -d nordabiz -c "
SELECT COUNT(*) as conversations FROM ai_chat_conversations;
SELECT COUNT(*) as messages FROM ai_chat_messages;
SELECT COUNT(*) as api_calls FROM ai_api_costs WHERE api_provider = 'gemini';
"
```
**Test chat flow:**

```python
from nordabiz_chat import NordaBizChatEngine

engine = NordaBizChatEngine()
conv = engine.start_conversation(user_id=1, title="Test")
response = engine.send_message(conv.id, "Test message", user_id=1)
print(f"Response: {response.content}")
```

---

## 16. Related Documentation

- **[Search Flow](./02-search-flow.md)** - Company search integration
- **[Authentication Flow](./01-authentication-flow.md)** - User authentication
- **[Flask Components](../04-flask-components.md)** - Application architecture
- **[External Integrations](../06-external-integrations.md)** - Gemini API details
- **[Database Schema](../05-database-schema.md)** - Database structure

---

## 17. Glossary

| Term | Definition |
|------|------------|
| **NordaBizChatEngine** | Main chat engine class in `nordabiz_chat.py` |
| **GeminiService** | Centralized Gemini API wrapper in `gemini_service.py` |
| **Conversation** | Chat session with multiple messages |
| **Context** | Full company database + history sent to the AI |
| **Compact Format** | Token-optimized company data format |
| **Free Tier** | Google Gemini free tier (1,500 req/day) |
| **Token** | Unit of text (~4 characters) for AI models |
| **Latency** | Response time in milliseconds |
| **Cost Tracking** | Dual-level system (global + per-message) |
| **System Prompt** | Instructions sent to the AI with each query |

---

## 18. Maintenance

**When to Update This Document:**
- ✅ Gemini model version change (e.g., 2.5 → 3.0)
- ✅ Pricing changes
- ✅ New chat features (voice, images, etc.)
- ✅ Context building algorithm changes
- ✅ Database schema changes
- ✅ Performance optimization implementations

**Document Owner:** Development Team
**Review Frequency:** Quarterly or after major changes
**Last Review:** 2026-01-10

---

**END OF DOCUMENT**