NordaGPT Identity, Memory & Performance — Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Transform NordaGPT from an anonymous chatbot into a personalized assistant with user identity, persistent memory, smart routing, and streaming responses.

Architecture: Four-phase rollout: (1) inject user identity into AI prompt, (2) smart router + selective context loading, (3) streaming SSE responses, (4) persistent user memory with async extraction. Each phase is independently deployable and testable.

Tech Stack: Flask 3.0, SQLAlchemy 2.0, PostgreSQL, Google Gemini API (3-Flash, 3.1-Flash-Lite), Server-Sent Events, Jinja2 inline JS.

Spec: docs/superpowers/specs/2026-03-28-nordagpt-identity-memory-design.md


File Structure

New files

| File | Responsibility |
| --- | --- |
| smart_router.py | Classifies query complexity, selects data categories and model |
| memory_service.py | CRUD for user memory facts and conversation summaries, extraction prompt |
| context_builder.py | Loads selective data from the DB based on the router decision |
| database/migrations/092_ai_user_memory.sql | User memory facts table |
| database/migrations/093_ai_conversation_summary.sql | Conversation summary table |

Modified files

| File | Changes |
| --- | --- |
| database.py | Add AIUserMemory and AIConversationSummary models (before line 5954) |
| nordabiz_chat.py | Accept user_context; integrate router, selective context, memory injection |
| gemini_service.py | Token counting for streamed responses |
| blueprints/chat/routes.py | Build user_context, add streaming endpoint, memory CRUD routes |
| templates/chat.html | Streaming UI, thinking animation, memory settings panel |

Phase 1: User Identity (Tasks 1-3)

Task 1: Pass user context from route to chat engine

Files:

  • Modify: blueprints/chat/routes.py:234-309

  • Modify: nordabiz_chat.py:163-180

  • Step 1: Build user_context dict in chat route

In blueprints/chat/routes.py, modify chat_send_message(). After line 262 (where current_user.id and current_user.email are used for limit check), add user_context construction:

# After line 262, before line 268
# Build user context for AI personalization
user_context = {
    'user_id': current_user.id,
    'user_name': current_user.name,
    'user_email': current_user.email,
    'company_name': current_user.company.name if current_user.company else None,
    'company_id': current_user.company.id if current_user.company else None,
    'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
    'company_role': current_user.company_role or 'MEMBER',
    'is_norda_member': current_user.is_norda_member,
    'chamber_role': current_user.chamber_role,
    'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
}
  • Step 2: Pass user_context to send_message()

In the same function, modify the chat_engine.send_message() call (around line 282):

# Before:
ai_response = chat_engine.send_message(
    conversation_id,
    user_message=message,
    user_id=current_user.id,
    thinking_level=thinking_level
)

# After:
ai_response = chat_engine.send_message(
    conversation_id,
    user_message=message,
    user_id=current_user.id,
    thinking_level=thinking_level,
    user_context=user_context
)
  • Step 3: Update send_message() signature in nordabiz_chat.py

In nordabiz_chat.py, modify send_message() at line 163:

# Before:
def send_message(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    thinking_level: str = 'high'
) -> AIChatMessage:

# After:
def send_message(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    thinking_level: str = 'high',
    user_context: Optional[Dict[str, Any]] = None
) -> AIChatMessage:

Add from typing import Optional, Dict, Any to imports if not already present.

  • Step 4: Thread user_context through to _query_ai()

In send_message(), find the call to _query_ai() (around line 239) and add user_context:

# Before:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level)

# After:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)
  • Step 5: Update _query_ai() signature

In nordabiz_chat.py, modify _query_ai() at line 890:

# Before:
def _query_ai(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_id: Optional[int] = None,
    thinking_level: str = 'high'
) -> str:

# After:
def _query_ai(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_id: Optional[int] = None,
    thinking_level: str = 'high',
    user_context: Optional[Dict[str, Any]] = None
) -> str:
  • Step 6: Commit
git add blueprints/chat/routes.py nordabiz_chat.py
git commit -m "refactor(chat): thread user_context from route through to _query_ai"

Task 2: Inject user identity into system prompt

Files:

  • Modify: nordabiz_chat.py:920-930

  • Step 1: Add user identity block to system prompt

In nordabiz_chat.py, inside _query_ai(), find line ~922 where system_prompt starts. Insert the user identity block BEFORE the main system prompt string (after line 921, before line 922):

        # Build user identity section
        user_identity = ""
        if user_context:
            user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name') or 'Nieznany'}
Firma: {user_context.get('company_name') or 'brak'} — kategoria: {user_context.get('company_category') or 'brak'}
Rola w firmie: {user_context.get('company_role') or 'MEMBER'}
Członek Izby Norda Biznes: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since') or 'nieznana data'}

ZASADY PERSONALIZACJI:
- Zwracaj się do użytkownika po imieniu (pierwsze słowo z imienia i nazwiska)
- W pierwszej wiadomości konwersacji przywitaj się: "Cześć [imię], w czym mogę pomóc?"
- Na pytania "co wiesz o mnie?" / "kim jestem?" — wypisz powyższe dane + powiązania firmowe z bazy
- Uwzględniaj kontekst firmy użytkownika w odpowiedziach (np. sugeruj partnerów z komplementarnych branż)
- NIE ujawniaj danych technicznych (user_id, company_id, rola systemowa)
"""
  • Step 2: Prepend user_identity to system_prompt

Find where system_prompt is first assigned (line 922) and prepend:

        # Line 922 area - the system_prompt f-string starts here
        system_prompt = user_identity + f"""Jesteś pomocnym asystentem portalu Norda Biznes...

This is a minimal change — just concatenate user_identity (an empty string when no user context is provided) before the existing prompt.
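Before wiring the block in, the f-string can be rendered standalone against sample data. One subtlety: dict.get's second argument does not apply when a key exists with a None value (a user without a company has company_name set to None), so `or` fallbacks are the safer pattern here. A minimal sketch with illustrative data and an abbreviated field set:

```python
# Illustrative test data, not real user data; models a user without a company.
user_context = {
    'user_name': 'Jan Kowalski',
    'company_name': None,
    'is_norda_member': True,
    'member_since': None,
}

# Abbreviated version of the identity block; `or` (rather than .get
# defaults) handles keys that are present but None.
user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name') or 'Nieznany'}
Firma: {user_context.get('company_name') or 'brak'}
Członek Izby Norda Biznes: {'tak' if user_context.get('is_norda_member') else 'nie'}
Na portalu od: {user_context.get('member_since') or 'nieznana data'}
"""

assert 'Firma: brak' in user_identity
assert 'Na portalu od: nieznana data' in user_identity
```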

  • Step 3: Verify syntax compiles
python3 -m py_compile nordabiz_chat.py && echo "OK"
  • Step 4: Test locally

Start local dev server and send a chat message. Verify in logs that the prompt now contains the user identity block. Check that the AI greets by name.

python3 app.py
# In another terminal:
curl -X POST http://localhost:5000/api/chat/1/message \
  -H "Content-Type: application/json" \
  -d '{"message": "Kim jestem?"}'

(Note: requires auth cookie — easier to test via browser)

  • Step 5: Commit
git add nordabiz_chat.py
git commit -m "feat(nordagpt): inject user identity into AI system prompt — personalized greetings and context"

Task 3: Deploy Phase 1 and verify

Files: None (deployment only)

  • Step 1: Push to remotes
git push origin master && git push inpi master
  • Step 2: Deploy to staging
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
  • Step 3: Test on staging — verify AI greets by name

Open https://staging.nordabiznes.pl/chat, start new conversation, type "Cześć". Verify AI responds with your name.

Type "Co wiesz o mnie?" — verify AI lists your profile data.

  • Step 4: Deploy to production
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
  • Step 5: Commit deployment notes (update release_notes in routes.py)

Add new release entry in blueprints/public/routes.py _get_releases() function.


Phase 2: Smart Router + Context Builder (Tasks 4-7)

Task 4: Create context_builder.py — selective data loading

Files:

  • Create: context_builder.py

  • Step 1: Create context_builder.py with selective loading functions

"""
Context Builder for NordaGPT Smart Router
==========================================
Loads only the data categories requested by the Smart Router,
instead of loading everything for every query.
"""

import json
import logging
from typing import Dict, Any, List, Optional
from datetime import datetime, timedelta

from database import (
    SessionLocal, Company, Category, CompanyRecommendation,
    NordaEvent, Classified, ForumTopic, ForumReply,
    CompanyPerson, Person, User, CompanySocialMedia,
    GBPAudit, CompanyWebsiteAnalysis, ZOPKNews,
    UserCompanyPermissions
)
from sqlalchemy import func, desc

logger = logging.getLogger(__name__)


def _company_to_compact_dict(company) -> Dict:
    """Convert company to compact dict for AI context. Mirrors nordabiz_chat.py format."""
    return {
        'name': company.name,
        'cat': company.category.name if company.category else None,
        'profile': f'/firma/{company.slug}',
        'desc': company.description_short,
        'about': company.description_full[:500] if company.description_full else None,
        'svc': company.services,
        'comp': company.competencies,
        'web': company.website,
        'tel': company.phone,
        'mail': company.email,
        'city': company.city,
    }


def build_selective_context(
    data_needed: List[str],
    conversation_id: int,
    current_message: str,
    user_context: Optional[Dict] = None
) -> Dict[str, Any]:
    """
    Build AI context with only the requested data categories.

    Args:
        data_needed: List of category strings from Smart Router, e.g.:
            ["companies_all", "companies_filtered:IT", "companies_single:termo",
             "events", "news", "classifieds", "forum", "company_people",
             "registered_users", "social_media", "audits"]
        conversation_id: Current conversation ID for history
        current_message: User's message text
        user_context: User identity dict

    Returns:
        Context dict compatible with nordabiz_chat.py _query_ai()
    """
    db = SessionLocal()
    context = {}

    try:
        # Always load: basic stats and conversation history
        active_companies = db.query(Company).filter_by(status='active').all()
        context['total_companies'] = len(active_companies)

        categories = db.query(Category).all()
        context['categories'] = [
            {'name': c.name, 'slug': c.slug, 'company_count': len([co for co in active_companies if co.category_id == c.id])}
            for c in categories
        ]

        # Conversation history (always loaded)
        from database import AIChatMessage, AIChatConversation
        messages = db.query(AIChatMessage).filter_by(
            conversation_id=conversation_id
        ).order_by(AIChatMessage.created_at.desc()).limit(10).all()
        context['recent_messages'] = [
            {'role': msg.role, 'content': msg.content}
            for msg in reversed(messages)
        ]

        # Selective data loading based on router decision
        for category in data_needed:
            if category == 'companies_all':
                context['all_companies'] = [_company_to_compact_dict(c) for c in active_companies]

            elif category.startswith('companies_filtered:'):
                filter_cat = category.split(':', 1)[1]
                filtered = [c for c in active_companies
                           if c.category and c.category.name.lower() == filter_cat.lower()]
                context['all_companies'] = [_company_to_compact_dict(c) for c in filtered]

            elif category.startswith('companies_single:'):
                search = category.split(':', 1)[1].lower()
                matched = [c for c in active_companies
                          if search in c.name.lower() or search in (c.slug or '').lower()]
                context['all_companies'] = [_company_to_compact_dict(c) for c in matched[:5]]

            elif category == 'events':
                events = db.query(NordaEvent).filter(
                    NordaEvent.event_date >= datetime.now(),
                    NordaEvent.event_date <= datetime.now() + timedelta(days=60)
                ).order_by(NordaEvent.event_date).all()
                context['upcoming_events'] = [
                    {'title': e.title, 'date': str(e.event_date), 'type': e.event_type,
                     'location': e.location, 'url': f'/kalendarz/{e.id}'}
                    for e in events
                ]

            elif category == 'news':
                news = db.query(ZOPKNews).filter(
                    ZOPKNews.published_at >= datetime.now() - timedelta(days=30),
                    ZOPKNews.status == 'approved'
                ).order_by(ZOPKNews.published_at.desc()).limit(10).all()
                context['recent_news'] = [
                    {'title': n.title, 'summary': n.ai_summary, 'date': str(n.published_at),
                     'source': n.source_name, 'url': n.source_url}
                    for n in news
                ]

            elif category == 'classifieds':
                classifieds = db.query(Classified).filter(
                    Classified.status == 'active',
                    Classified.is_test == False
                ).order_by(Classified.created_at.desc()).limit(20).all()
                context['classifieds'] = [
                    {'type': c.listing_type, 'title': c.title, 'description': c.description,
                     'company': c.company.name if c.company else None,
                     'budget': c.budget_text, 'url': f'/b2b/{c.id}'}
                    for c in classifieds
                ]

            elif category == 'forum':
                topics = db.query(ForumTopic).filter(
                    ForumTopic.is_test == False
                ).order_by(ForumTopic.created_at.desc()).limit(15).all()
                context['forum_topics'] = [
                    {'title': t.title, 'content': t.content[:300],
                     'author': t.author.name if t.author else None,
                     'replies': t.reply_count, 'url': f'/forum/{t.slug}'}
                    for t in topics
                ]

            elif category == 'company_people':
                people_query = db.query(CompanyPerson).join(Person).join(Company).filter(
                    Company.status == 'active'
                ).all()
                grouped = {}
                for cp in people_query:
                    cname = cp.company.name
                    if cname not in grouped:
                        grouped[cname] = []
                    grouped[cname].append({
                        'name': cp.person.name,
                        'role': cp.role_description,
                        'shares': cp.shares_value
                    })
                context['company_people'] = grouped

            elif category == 'registered_users':
                users = db.query(User).filter(
                    User.is_active == True,
                    User.company_id.isnot(None)
                ).all()
                grouped = {}
                for u in users:
                    cname = u.company.name if u.company else 'Brak firmy'
                    if cname not in grouped:
                        grouped[cname] = []
                    grouped[cname].append({
                        'name': u.name, 'email': u.email,
                        'role': u.company_role, 'member': u.is_norda_member
                    })
                context['registered_users'] = grouped

            elif category == 'social_media':
                socials = db.query(CompanySocialMedia).filter_by(is_valid=True).all()
                grouped = {}
                for s in socials:
                    cname = s.company.name if s.company else 'Unknown'
                    if cname not in grouped:
                        grouped[cname] = []
                    grouped[cname].append({
                        'platform': s.platform, 'url': s.url,
                        'followers': s.followers_count
                    })
                context['company_social_media'] = grouped

            elif category == 'audits':
                # GBP audits
                gbp = db.query(GBPAudit).order_by(GBPAudit.created_at.desc()).all()
                seen = set()
                gbp_unique = []
                for g in gbp:
                    if g.company_id not in seen:
                        seen.add(g.company_id)
                        gbp_unique.append({
                            'company': g.company.name if g.company else None,
                            'score': g.overall_score, 'reviews': g.total_reviews,
                            'rating': g.average_rating
                        })
                context['gbp_audits'] = gbp_unique

                # SEO audits
                seo = db.query(CompanyWebsiteAnalysis).all()
                context['seo_audits'] = [
                    {'company': s.company.name if s.company else None,
                     'seo': s.seo_score, 'performance': s.performance_score}
                    for s in seo
                ]

        # If no category loaded companies, default to an empty list so
        # downstream prompt-building does not KeyError on 'all_companies'
        if 'all_companies' not in context:
            context['all_companies'] = []

    finally:
        db.close()

    return context
  • Step 2: Verify syntax
python3 -m py_compile context_builder.py && echo "OK"
  • Step 3: Commit
git add context_builder.py
git commit -m "feat(nordagpt): add context_builder.py — selective data loading for smart router"
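The router communicates parameters by suffixing them after a colon (companies_filtered:IT, companies_single:termo). The dispatch convention used in build_selective_context() can be sketched in isolation; the helper name below is illustrative:

```python
def parse_category(category: str):
    """Split a router category string into (name, argument).

    Mirrors the split(':', 1) convention in build_selective_context():
    only the first colon separates name from argument, so arguments
    containing colons survive intact.
    """
    if ':' in category:
        name, arg = category.split(':', 1)
        return name, arg
    return category, None

print(parse_category('companies_filtered:IT'))   # ('companies_filtered', 'IT')
print(parse_category('companies_single:termo'))  # ('companies_single', 'termo')
print(parse_category('events'))                  # ('events', None)
```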

Task 5: Create smart_router.py — query classification

Files:

  • Create: smart_router.py

  • Step 1: Create smart_router.py

"""
Smart Router for NordaGPT
==========================
Classifies query complexity and selects which data categories to load.
Uses Gemini 3.1 Flash-Lite for fast, cheap classification (~1-2s).
"""

import json
import logging
import time
from typing import Dict, Any, List, Optional

logger = logging.getLogger(__name__)

# Keyword-based fast routing (no API call needed)
FAST_ROUTES = {
    'companies_all': ['wszystkie firmy', 'ile firm', 'lista firm', 'katalog', 'porównaj firmy'],
    'events': ['wydarzenie', 'spotkanie', 'kalendarz', 'konferencja', 'szkolenie', 'kiedy'],
    'news': ['aktualności', 'nowości', 'wiadomości', 'pej', 'atom', 'elektrownia', 'zopk'],
    'classifieds': ['ogłoszenie', 'b2b', 'zlecenie', 'oferta', 'szukam', 'oferuję'],
    'forum': ['forum', 'dyskusja', 'temat', 'wątek', 'post'],
    'company_people': ['zarząd', 'krs', 'właściciel', 'prezes', 'udziały', 'wspólnik'],
    'registered_users': ['użytkownik', 'kto jest', 'profil', 'zarejestrowany', 'członek'],
    'social_media': ['facebook', 'instagram', 'linkedin', 'social media', 'media społeczn'],
    'audits': ['seo', 'google', 'gbp', 'opinie', 'ocena', 'pagespeed'],
}

# Model selection by complexity
MODEL_MAP = {
    'simple': {'model': '3.1-flash-lite', 'thinking': 'minimal'},
    'medium': {'model': '3-flash', 'thinking': 'low'},
    'complex': {'model': '3-flash', 'thinking': 'high'},
}

ROUTER_PROMPT = """Jesteś routerem zapytań. Przeanalizuj pytanie i zdecyduj jakie dane potrzebne.

Użytkownik: {user_name} z firmy {company_name}
Pytanie: {message}

Zwróć TYLKO JSON (bez markdown):
{{
  "complexity": "simple|medium|complex",
  "data_needed": ["lista kategorii z poniższych"]
}}

Kategorie:
- companies_all — wszystkie firmy (porównania, przeglądy, "ile firm")
- companies_filtered:KATEGORIA — firmy z kategorii (np. companies_filtered:IT)
- companies_single:NAZWA — jedna firma (np. companies_single:termo)
- events — nadchodzące wydarzenia
- news — aktualności, PEJ, ZOPK
- classifieds — ogłoszenia B2B
- forum — tematy forum
- company_people — zarząd, KRS, udziałowcy
- registered_users — użytkownicy portalu
- social_media — profile social media firm
- audits — wyniki SEO/GBP

Zasady:
- "simple" = jedno pytanie o konkretną rzecz (telefon, adres, link)
- "medium" = porównanie, lista, filtrowanie
- "complex" = analiza, strategia, rekomendacje
- Wybierz MINIMUM kategorii. Nie ładuj niepotrzebnych danych.
- Jeśli pytanie dotyczy konkretnej firmy, użyj companies_single:nazwa
- Pytania ogólne o użytkownika (kim jestem, co wiesz) = [] (dane z profilu wystarczą)
"""


def route_query_fast(message: str, user_context: Optional[Dict] = None) -> Optional[Dict[str, Any]]:
    """
    Fast keyword-based routing. No API call.
    Returns routing decision or None if uncertain (needs AI router).
    """
    msg_lower = message.lower()

    # Check for personal questions — no data needed
    personal_patterns = ['kim jestem', 'co wiesz o mnie', 'mój profil', 'moje dane']
    if any(p in msg_lower for p in personal_patterns):
        return {
            'complexity': 'simple',
            'data_needed': [],
            'model': '3.1-flash-lite',
            'thinking': 'minimal',
            'routed_by': 'fast'
        }

    # Check for greetings — no data needed
    greeting_patterns = ['cześć', 'hej', 'witam', 'dzień dobry', 'siema', 'hello']
    if any(msg_lower.strip().startswith(p) for p in greeting_patterns) and len(message) < 30:
        return {
            'complexity': 'simple',
            'data_needed': [],
            'model': '3.1-flash-lite',
            'thinking': 'minimal',
            'routed_by': 'fast'
        }

    # Check keyword matches
    matched_categories = []
    for category, keywords in FAST_ROUTES.items():
        if any(kw in msg_lower for kw in keywords):
            matched_categories.append(category)

    # No keyword match: the fast path cannot classify this query
    # (e.g. a bare company name with no routing keyword), so hand
    # off to the AI router.
    if not matched_categories:
        return None

    # Determine complexity
    if len(matched_categories) <= 1 and len(message) < 80:
        complexity = 'simple'
    elif len(matched_categories) <= 2:
        complexity = 'medium'
    else:
        complexity = 'complex'

    model_config = MODEL_MAP[complexity]
    return {
        'complexity': complexity,
        'data_needed': matched_categories,
        'model': model_config['model'],
        'thinking': model_config['thinking'],
        'routed_by': 'fast'
    }


def route_query_ai(
    message: str,
    user_context: Optional[Dict] = None,
    gemini_service=None
) -> Dict[str, Any]:
    """
    AI-powered routing using Flash-Lite. Called when fast routing is uncertain.
    """
    if not gemini_service:
        # Fallback: load everything
        return _fallback_route()

    user_name = (user_context.get('user_name') or 'Nieznany') if user_context else 'Nieznany'
    company_name = (user_context.get('company_name') or 'brak') if user_context else 'brak'

    prompt = ROUTER_PROMPT.format(
        user_name=user_name,
        company_name=company_name,
        message=message
    )

    try:
        start = time.time()
        response = gemini_service.generate_text(
            prompt=prompt,
            temperature=0.1,
            max_tokens=200,
            model='gemini-3.1-flash-lite-preview',
            thinking_level='minimal',
            feature='smart_router'
        )
        latency = int((time.time() - start) * 1000)
        logger.info(f"Smart Router AI response in {latency}ms: {response[:200]}")

        # Parse JSON from response
        # Handle potential markdown wrapping
        text = response.strip()
        if text.startswith('```'):
            text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()

        result = json.loads(text)
        complexity = result.get('complexity', 'medium')
        model_config = MODEL_MAP.get(complexity, MODEL_MAP['medium'])

        return {
            'complexity': complexity,
            'data_needed': result.get('data_needed', []),
            'model': model_config['model'],
            'thinking': model_config['thinking'],
            'routed_by': 'ai',
            'router_latency_ms': latency
        }

    except Exception as e:
        logger.warning(f"Smart Router AI failed: {e}, falling back to full context")
        return _fallback_route()


def route_query(
    message: str,
    user_context: Optional[Dict] = None,
    gemini_service=None
) -> Dict[str, Any]:
    """
    Main entry point. Tries fast routing first, falls back to AI routing.
    """
    # Try fast keyword-based routing
    result = route_query_fast(message, user_context)
    if result is not None:
        logger.info(f"Smart Router FAST: complexity={result['complexity']}, data={result['data_needed']}")
        return result

    # Fall back to AI routing
    result = route_query_ai(message, user_context, gemini_service)
    logger.info(f"Smart Router AI: complexity={result['complexity']}, data={result['data_needed']}")
    return result


def _fallback_route() -> Dict[str, Any]:
    """Fallback: load everything, use default model. Safe but slow."""
    return {
        'complexity': 'medium',
        'data_needed': [
            'companies_all', 'events', 'news', 'classifieds',
            'forum', 'company_people', 'registered_users'
        ],
        'model': '3-flash',
        'thinking': 'low',
        'routed_by': 'fallback'
    }
  • Step 2: Verify syntax
python3 -m py_compile smart_router.py && echo "OK"
  • Step 3: Commit
git add smart_router.py
git commit -m "feat(nordagpt): add smart_router.py — fast keyword routing + AI fallback"
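The fast-path rules can be exercised without the app's imports. A standalone sketch restating just the greeting and keyword logic (the keyword lists here are a small illustrative subset, not the full FAST_ROUTES):

```python
# Minimal, self-contained re-statement of route_query_fast()'s rules,
# for exercising the classification logic outside the app.
FAST_ROUTES = {
    'events': ['wydarzenie', 'kalendarz', 'kiedy'],
    'classifieds': ['ogłoszenie', 'b2b', 'szukam'],
}
GREETINGS = ['cześć', 'hej', 'witam', 'dzień dobry']

def fast_route(message: str):
    msg = message.lower()
    # Short greetings need no portal data at all
    if any(msg.strip().startswith(g) for g in GREETINGS) and len(message) < 30:
        return {'complexity': 'simple', 'data_needed': []}
    matched = [cat for cat, kws in FAST_ROUTES.items() if any(k in msg for k in kws)]
    if not matched:
        return None  # undecidable: hand off to the AI router
    complexity = 'simple' if len(matched) <= 1 and len(message) < 80 else 'medium'
    return {'complexity': complexity, 'data_needed': matched}

print(fast_route('Cześć!'))                           # simple, no data
print(fast_route('Kiedy jest następne wydarzenie?'))  # routes to events
print(fast_route('Analiza rynku energetycznego'))     # None -> AI router
```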

Task 6: Integrate Smart Router into nordabiz_chat.py

Files:

  • Modify: nordabiz_chat.py:163-282, 347-643, 890-1365

  • Step 1: Add imports at top of nordabiz_chat.py

After existing imports (around line 30), add:

from smart_router import route_query
from context_builder import build_selective_context
  • Step 2: Modify send_message() to use Smart Router

In send_message(), replace the call to _build_conversation_context() and _query_ai() (around lines 236-239). The key change: use the router to decide model and data, then use context_builder for selective loading.

Find the section where context is built and AI is queried (around lines 236-241):

# Before (approximately lines 236-241):
# context = self._build_conversation_context(db, conversation, original_message)
# ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)

# After:
# Smart Router — classify query and select data + model
route_decision = route_query(
    message=original_message,
    user_context=user_context,
    gemini_service=self.gemini_service
)

# Override model and thinking based on router decision
effective_model = route_decision.get('model', '3-flash')
effective_thinking = route_decision.get('thinking', thinking_level)

# Build selective context (only requested data categories)
context = build_selective_context(
    data_needed=route_decision.get('data_needed', []),
    conversation_id=conversation.id,
    current_message=original_message,
    user_context=user_context
)

# Use the original _query_ai but with router-selected parameters
ai_response_text = self._query_ai(
    context, original_message,
    user_id=user_id,
    thinking_level=effective_thinking,
    user_context=user_context
)

Note: Keep _build_conversation_context() and full _query_ai() intact as fallback. The router's _fallback_route() loads all data, so it's safe.

  • Step 3: Log routing decisions

After the route_query call, add logging:

logger.info(
    f"NordaGPT Router: user={user_context.get('user_name') if user_context else '?'}, "
    f"complexity={route_decision['complexity']}, model={effective_model}, "
    f"thinking={effective_thinking}, data={route_decision['data_needed']}, "
    f"routed_by={route_decision.get('routed_by')}"
)
  • Step 4: Update the GeminiService call in _query_ai() to use effective model

Currently _query_ai() uses self.gemini_service, which has a fixed model. The router-selected model must reach the generate_text() call (around line 1352), but route_decision is a local variable in send_message() and is not in scope inside _query_ai(), so pass it through the context dict:

In send_message(), add to context before calling _query_ai():

context['_route_decision'] = route_decision

In _query_ai(), read it at the generate_text call:

route = context.get('_route_decision', {})
effective_model_id = None
model_alias = route.get('model')
if model_alias:
    from gemini_service import GEMINI_MODELS
    effective_model_id = GEMINI_MODELS.get(model_alias)

response = self.gemini_service.generate_text(
    prompt=full_prompt,
    temperature=0.7,
    thinking_level=thinking_level,
    user_id=user_id,
    feature='chat',
    model=effective_model_id
)
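The alias-to-model-id lookup degrades gracefully: an unknown or missing alias resolves to None, and gemini_service is assumed to fall back to its default model in that case. A sketch of the resolution logic (the mapping contents below are illustrative, not the real GEMINI_MODELS):

```python
# Illustrative alias -> API model id mapping; the real values live
# in gemini_service.GEMINI_MODELS.
GEMINI_MODELS = {
    '3-flash': 'gemini-3-flash-preview',
    '3.1-flash-lite': 'gemini-3.1-flash-lite-preview',
}

def resolve_model(route_decision: dict):
    """Resolve a router alias; None means 'use the service default'."""
    alias = route_decision.get('model')
    return GEMINI_MODELS.get(alias) if alias else None

print(resolve_model({'model': '3-flash'}))  # gemini-3-flash-preview
print(resolve_model({}))                    # None -> service default
print(resolve_model({'model': 'unknown'})) # None -> service default
```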
  • Step 5: Verify syntax
python3 -m py_compile nordabiz_chat.py && echo "OK"
  • Step 6: Commit
git add nordabiz_chat.py
git commit -m "feat(nordagpt): integrate smart router — selective context loading + adaptive model selection"

Task 7: Deploy Phase 2 and verify

  • Step 1: Push and deploy to staging
git push origin master && git push inpi master
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
  • Step 2: Test on staging — verify routing works

  • Simple query: "Jaki jest telefon do TERMO?" — should be fast (2-3 s) on the Flash-Lite model.
  • Medium query: "Porównaj firmy budowlane w Izbie" — should load companies_all at medium speed.
  • Complex query: "Jakie firmy mogłyby współpracować przy projekcie PEJ?" — should use the full context.

Check logs for routing decisions:

ssh maciejpi@10.22.68.248 "journalctl -u nordabiznes -n 30 --no-pager | grep 'Router'"
  • Step 3: Deploy to production
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3

Phase 3: Streaming Responses (Tasks 8-10)
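On the wire, SSE responses are plain text: each event is a data: line (here carrying a JSON payload) terminated by a blank line. A framing sketch independent of Flask, showing the format the streaming endpoint will emit; the chunk/done payload shapes are an illustrative assumption, not the app's spec:

```python
import json

def sse_event(payload: dict) -> str:
    """Frame one Server-Sent Event: 'data: <json>' plus a blank line."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"

# What a streamed reply might look like on the wire (payload shapes
# are illustrative):
frames = [
    sse_event({'type': 'chunk', 'text': 'Cześć '}),
    sse_event({'type': 'chunk', 'text': 'Maciej!'}),
    sse_event({'type': 'done', 'tokens': 12}),
]
wire = ''.join(frames)
print(wire.count('data: '))  # 3 events, each blank-line terminated
```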

Task 8: Add streaming endpoint in Flask

Files:

  • Modify: blueprints/chat/routes.py

  • Modify: nordabiz_chat.py

  • Step 1: Add SSE streaming endpoint

In blueprints/chat/routes.py, add a new route after chat_send_message() (after line ~309):

@bp.route('/api/chat/<int:conversation_id>/message/stream', methods=['POST'])
@login_required
@member_required
def chat_send_message_stream(conversation_id):
    """Send message to AI chat with streaming response (SSE)"""
    from flask import Response, stream_with_context
    import json as json_module

    data = request.get_json()
    if not data or not data.get('message', '').strip():
        return jsonify({'error': 'Wiadomość nie może być pusta'}), 400

    message = data['message'].strip()

    # Check limits
    from nordabiz_chat import check_user_limits
    limit_result = check_user_limits(current_user.id, current_user.email)
    if limit_result.get('limited'):
        return jsonify({'error': 'Przekroczono limit', 'limit_info': limit_result}), 429

    # Build user context
    user_context = {
        'user_id': current_user.id,
        'user_name': current_user.name,
        'user_email': current_user.email,
        'company_name': current_user.company.name if current_user.company else None,
        'company_id': current_user.company.id if current_user.company else None,
        'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
        'company_role': current_user.company_role or 'MEMBER',
        'is_norda_member': current_user.is_norda_member,
        'chamber_role': current_user.chamber_role,
        'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
    }

    model_choice = data.get('model') or session.get('chat_model', 'flash')
    model_key = '3-flash' if model_choice == 'flash' else '3-pro'

    def generate():
        try:
            chat_engine = NordaBizChatEngine(model=model_key)
            for chunk in chat_engine.send_message_stream(
                conversation_id=conversation_id,
                user_message=message,
                user_id=current_user.id,
                user_context=user_context
            ):
                yield f"data: {json_module.dumps(chunk, ensure_ascii=False)}\n\n"
        except PermissionError:
            yield f"data: {json_module.dumps({'type': 'error', 'content': 'Brak dostępu do tej konwersacji'})}\n\n"
        except Exception as e:
            logger.error(f"Streaming error: {e}")
            yield f"data: {json_module.dumps({'type': 'error', 'content': 'Wystąpił błąd'})}\n\n"

    return Response(
        stream_with_context(generate()),
        mimetype='text/event-stream',
        headers={
            'Cache-Control': 'no-cache',
            'X-Accel-Buffering': 'no',  # Disable Nginx buffering
        }
    )
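Each yielded dict is framed as one SSE event: a data: prefix, the JSON payload, and a blank line. The framing is worth pinning down, because a missing blank line silently breaks client-side event parsing. A minimal helper with the exact format:

```python
import json

def sse_event(chunk: dict) -> str:
    """Frame one chunk as an SSE event: 'data: <json>' terminated by a blank line."""
    return f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n"

frame = sse_event({'type': 'token', 'content': 'Hej'})
assert frame == 'data: {"type": "token", "content": "Hej"}\n\n'
```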
  • Step 2: Add send_message_stream() to NordaBizChatEngine

In nordabiz_chat.py, add a new method after send_message() (after line ~282):

def send_message_stream(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    user_context: Optional[Dict[str, Any]] = None
):
    """
    Generator that yields streaming chunks for SSE.
    Yields dicts: {'type': 'thinking'|'token'|'done'|'error', 'content': '...'}
    """
    import time

    db = SessionLocal()
    try:
        conversation = db.query(AIChatConversation).filter_by(
            id=conversation_id, user_id=user_id
        ).first()
        if not conversation:
            yield {'type': 'error', 'content': 'Konwersacja nie znaleziona'}
            return

        # Save user message
        original_message = user_message
        sanitized = self._sanitize_message(user_message)
        user_msg = AIChatMessage(
            conversation_id=conversation_id,
            role='user',
            content=sanitized
        )
        db.add(user_msg)
        db.commit()

        # Smart Router
        route_decision = route_query(
            message=original_message,
            user_context=user_context,
            gemini_service=self.gemini_service
        )

        yield {'type': 'thinking', 'content': 'Analizuję pytanie...'}

        # Build selective context
        context = build_selective_context(
            data_needed=route_decision.get('data_needed', []),
            conversation_id=conversation.id,
            current_message=original_message,
            user_context=user_context
        )
        context['_route_decision'] = route_decision

        # Build prompt (reuse _query_ai logic for prompt building)
        full_prompt = self._build_prompt(context, original_message, user_context, route_decision.get('thinking', 'low'))

        # Get effective model
        from gemini_service import GEMINI_MODELS
        model_alias = route_decision.get('model', '3-flash')
        effective_model = GEMINI_MODELS.get(model_alias, self.model_name)

        # Stream from Gemini
        start_time = time.time()
        stream_response = self.gemini_service.generate_text(
            prompt=full_prompt,
            temperature=0.7,
            stream=True,
            thinking_level=route_decision.get('thinking', 'low'),
            user_id=user_id,
            feature='chat_stream',
            model=effective_model
        )

        full_text = ""
        for chunk in stream_response:
            if hasattr(chunk, 'text') and chunk.text:
                full_text += chunk.text
                yield {'type': 'token', 'content': chunk.text}

        latency_ms = int((time.time() - start_time) * 1000)

        # Save AI response to DB
        ai_msg = AIChatMessage(
            conversation_id=conversation_id,
            role='assistant',
            content=full_text,
            latency_ms=latency_ms
        )
        db.add(ai_msg)
        conversation.updated_at = datetime.now()
        conversation.message_count = (conversation.message_count or 0) + 2
        db.commit()

        yield {
            'type': 'done',
            'message_id': ai_msg.id,
            'latency_ms': latency_ms,
            'model': model_alias,
            'complexity': route_decision.get('complexity')
        }

    except Exception as e:
        logger.error(f"Stream error: {e}", exc_info=True)
        yield {'type': 'error', 'content': 'Wystąpił błąd podczas generowania odpowiedzi'}
    finally:
        db.close()
  • Step 3: Extract prompt building into reusable method

Add a _build_prompt() method to NordaBizChatEngine that extracts prompt construction from _query_ai(). This method builds the full prompt string without calling Gemini:

def _build_prompt(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_context: Optional[Dict[str, Any]] = None,
    thinking_level: str = 'low'
) -> str:
    """Build the full prompt string. Extracted from _query_ai() for reuse in streaming."""
    # Build user identity section
    user_identity = ""
    if user_context:
        user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}
Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')}
Rola w firmie: {user_context.get('company_role', 'MEMBER')}
Członek Izby: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since', 'nieznana data')}
"""

    # Reuse the static system prompt from _query_ai() (lines 922-1134).
    # In implementation, extract that prompt and the data-section formatting
    # into shared helpers so this method returns exactly the string that
    # _query_ai() would build. The helper names below are placeholders for
    # that refactor:
    system_prompt = self._get_system_prompt()              # extracted static prompt
    context_text = self._format_context_sections(context)  # existing data-section formatting

    full_prompt = user_identity + system_prompt + context_text + f"\n\nPytanie użytkownika: {user_message}"
    return full_prompt

Implementation note: The actual implementation should refactor _query_ai() to call _build_prompt() internally, then the streaming method also calls _build_prompt(). This avoids prompt duplication.
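The intended call graph can be sketched as follows. Names and prompt text are placeholders; the point is that the real system prompt lives in one place and both the blocking and streaming paths consume it:

```python
class EngineSketch:
    """Illustrates the refactor: one prompt builder shared by both entry points."""
    SYSTEM_PROMPT = "Jesteś NordaGPT, asystentem Izby."  # stand-in for the real static prompt

    def _build_prompt(self, user_identity: str, user_message: str) -> str:
        # Single source of truth for prompt construction
        return f"{user_identity}\n{self.SYSTEM_PROMPT}\n\nUżytkownik: {user_message}"

    def query_ai(self, user_identity: str, user_message: str) -> str:
        return self._build_prompt(user_identity, user_message)  # then call Gemini (elided)

    def query_stream(self, user_identity: str, user_message: str) -> str:
        return self._build_prompt(user_identity, user_message)  # then stream from Gemini (elided)

engine = EngineSketch()
assert engine.query_ai("Jan (TERMO)", "Cześć") == engine.query_stream("Jan (TERMO)", "Cześć")
```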

  • Step 4: Verify syntax
python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK"
  • Step 5: Commit
git add nordabiz_chat.py blueprints/chat/routes.py
git commit -m "feat(nordagpt): add streaming SSE endpoint + send_message_stream method"

Task 9: Frontend streaming UI

Files:

  • Modify: templates/chat.html

  • Step 1: Add streaming sendMessage function

In templates/chat.html, replace the existing sendMessage() function (lines 2373-2454) with a streaming version:

async function sendMessage() {
    const input = document.getElementById('messageInput');
    const message = input.value.trim();
    if (!message || isSending) return;

    isSending = true;
    document.getElementById('sendBtn').disabled = true;
    input.value = '';
    autoResizeTextarea();

    // Add user message to chat
    addMessage('user', message);

    // Create conversation if needed
    if (!currentConversationId) {
        try {
            const startRes = await fetch('/api/chat/start', {
                method: 'POST',
                headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
                body: JSON.stringify({title: message.substring(0, 50)})
            });
            const startData = await startRes.json();
            currentConversationId = startData.conversation_id;
        } catch (e) {
            addMessage('assistant', 'Błąd tworzenia konwersacji.');
            isSending = false;
            document.getElementById('sendBtn').disabled = false;
            return;
        }
    }

    // Add empty assistant bubble with thinking animation
    const msgDiv = document.createElement('div');
    msgDiv.className = 'message assistant';
    msgDiv.innerHTML = `
        <div class="message-avatar">AI</div>
        <div class="message-content">
            <div class="thinking-dots"><span>.</span><span>.</span><span>.</span></div>
        </div>
    `;
    document.getElementById('chatMessages').appendChild(msgDiv);
    scrollToBottom();

    const contentDiv = msgDiv.querySelector('.message-content');

    try {
        const response = await fetch(`/api/chat/${currentConversationId}/message/stream`, {
            method: 'POST',
            headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
            body: JSON.stringify({message: message, model: currentModel})
        });

        if (response.status === 429) {
            contentDiv.innerHTML = '';
            contentDiv.textContent = 'Przekroczono limit zapytań.';
            showLimitBanner();
            isSending = false;
            document.getElementById('sendBtn').disabled = false;
            return;
        }

        const reader = response.body.getReader();
        const decoder = new TextDecoder();
        let fullText = '';
        let thinkingRemoved = false;
        let buffer = '';  // a "data:" line may be split across read() chunks

        while (true) {
            const {done, value} = await reader.read();
            if (done) break;

            buffer += decoder.decode(value, {stream: true});
            const lines = buffer.split('\n');
            buffer = lines.pop();  // keep any incomplete trailing line for the next chunk

            for (const line of lines) {
                if (!line.startsWith('data: ')) continue;
                try {
                    const chunk = JSON.parse(line.slice(6));

                    if (chunk.type === 'thinking') {
                        // Keep thinking dots visible
                        continue;
                    }

                    if (chunk.type === 'token') {
                        if (!thinkingRemoved) {
                            contentDiv.innerHTML = '';
                            thinkingRemoved = true;
                        }
                        fullText += chunk.content;
                        contentDiv.innerHTML = formatMessage(fullText);
                        scrollToBottom();
                    }

                    if (chunk.type === 'done') {
                        // Add tech info badge
                        if (chunk.latency_ms) {
                            const badge = document.createElement('div');
                            badge.className = 'thinking-info-badge';
                            badge.textContent = `${chunk.model || 'AI'} · ${(chunk.latency_ms/1000).toFixed(1)}s`;
                            msgDiv.appendChild(badge);
                        }
                        loadConversations();
                    }

                    if (chunk.type === 'error') {
                        contentDiv.innerHTML = '';
                        contentDiv.textContent = chunk.content || 'Wystąpił błąd';
                    }
                } catch (e) {
                    // Skip malformed chunks
                }
            }
        }
    } catch (e) {
        contentDiv.innerHTML = '';
        contentDiv.textContent = 'Błąd połączenia z serwerem.';
    }

    isSending = false;
    document.getElementById('sendBtn').disabled = false;
}
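One subtlety in the reader loop: a single data: line can arrive split across two read() chunks, so robust clients keep the trailing partial line in a buffer instead of parsing it immediately. The pattern, isolated in Python for clarity:

```python
def feed_sse(buffer: str, chunk: str):
    """Append a network chunk; return (complete_lines, remaining_buffer)."""
    buffer += chunk
    lines = buffer.split('\n')
    return lines[:-1], lines[-1]  # the last element may be an incomplete line

# A JSON payload split mid-token is held back until its newline arrives
lines, buf = feed_sse('', 'data: {"type": "tok')
assert lines == [] and buf == 'data: {"type": "tok'
lines, buf = feed_sse(buf, 'en"}\n')
assert lines == ['data: {"type": "token"}'] and buf == ''
```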
  • Step 2: Add CSS for thinking animation

In templates/chat.html, in the {% block extra_css %} section, add:

.thinking-dots {
    display: flex;
    gap: 4px;
    padding: 8px 0;
}

.thinking-dots span {
    animation: thinkBounce 1.4s infinite ease-in-out both;
    font-size: 1.5rem;
    color: var(--text-secondary);
}

.thinking-dots span:nth-child(1) { animation-delay: -0.32s; }
.thinking-dots span:nth-child(2) { animation-delay: -0.16s; }
.thinking-dots span:nth-child(3) { animation-delay: 0s; }

@keyframes thinkBounce {
    0%, 80%, 100% { transform: scale(0); }
    40% { transform: scale(1); }
}
  • Step 3: Verify locally and commit
python3 -m py_compile app.py && echo "OK"
git add templates/chat.html
git commit -m "feat(nordagpt): streaming UI — word-by-word response with thinking animation"

Task 10: Deploy Phase 3 and verify streaming

  • Step 1: Check Nginx/NPM config for SSE support

SSE requires Nginx to NOT buffer the response. The streaming endpoint sets X-Accel-Buffering: no header. Verify NPM custom config allows this:

ssh maciejpi@57.128.200.27 "cat /etc/nginx/sites-enabled/nordabiznes.conf 2>/dev/null || echo 'Using NPM proxy'"

If using NPM, the X-Accel-Buffering: no header should be sufficient. If not, add to NPM custom Nginx config for nordabiznes.pl:

proxy_buffering off;
proxy_cache off;
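If the header alone does not disable buffering (for example when NPM's advanced config overrides it), a fuller custom-location block would look like this. The upstream address is an assumption to adjust:

```nginx
location /api/chat/ {
    proxy_pass http://127.0.0.1:8000;   # adjust to the actual app upstream
    proxy_http_version 1.1;
    proxy_set_header Connection '';     # keep the connection open for SSE
    proxy_buffering off;
    proxy_cache off;
    proxy_read_timeout 300s;            # allow long-lived streaming responses
}
```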
  • Step 2: Push, deploy to staging, test streaming
git push origin master && git push inpi master
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"

Test on staging: open chat, send message, verify text appears word-by-word.

  • Step 3: Deploy to production
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3

Phase 4: Persistent User Memory (Tasks 11-15)

Task 11: Database migration — memory tables

Files:

  • Create: database/migrations/092_ai_user_memory.sql

  • Create: database/migrations/093_ai_conversation_summary.sql

  • Step 1: Create migration 092

-- 092_ai_user_memory.sql
-- Persistent memory for NordaGPT — per-user facts extracted from conversations

CREATE TABLE IF NOT EXISTS ai_user_memory (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    fact TEXT NOT NULL,
    category VARCHAR(50) DEFAULT 'general',
    source_conversation_id INTEGER REFERENCES ai_chat_conversations(id) ON DELETE SET NULL,
    confidence FLOAT DEFAULT 1.0,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '12 months'),
    is_active BOOLEAN DEFAULT TRUE
);

CREATE INDEX IF NOT EXISTS idx_ai_user_memory_user_active ON ai_user_memory(user_id, is_active, confidence DESC);
CREATE INDEX IF NOT EXISTS idx_ai_user_memory_expires ON ai_user_memory(expires_at) WHERE is_active = TRUE;

GRANT ALL ON TABLE ai_user_memory TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE ai_user_memory_id_seq TO nordabiz_app;
  • Step 2: Create migration 093
-- 093_ai_conversation_summary.sql
-- Auto-generated summaries of AI conversations for memory context

CREATE TABLE IF NOT EXISTS ai_conversation_summary (
    id SERIAL PRIMARY KEY,
    conversation_id INTEGER NOT NULL UNIQUE REFERENCES ai_chat_conversations(id) ON DELETE CASCADE,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    summary TEXT NOT NULL,
    key_topics JSONB DEFAULT '[]',
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_ai_conv_summary_user ON ai_conversation_summary(user_id, created_at DESC);

GRANT ALL ON TABLE ai_conversation_summary TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE ai_conversation_summary_id_seq TO nordabiz_app;
  • Step 3: Commit migrations
git add database/migrations/092_ai_user_memory.sql database/migrations/093_ai_conversation_summary.sql
git commit -m "feat(nordagpt): add migrations for user memory and conversation summary tables"

Task 12: Add SQLAlchemy models

Files:

  • Modify: database.py (insert before line 5954)

  • Step 1: Add AIUserMemory model

Insert before the # DATABASE INITIALIZATION comment (line 5954):

class AIUserMemory(Base):
    __tablename__ = 'ai_user_memory'

    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False)
    fact = Column(Text, nullable=False)
    category = Column(String(50), default='general')
    source_conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='SET NULL'), nullable=True)
    confidence = Column(Float, default=1.0)
    created_at = Column(DateTime, default=datetime.utcnow)
    expires_at = Column(DateTime)
    is_active = Column(Boolean, default=True)

    user = relationship('User')
    source_conversation = relationship('AIChatConversation')


class AIConversationSummary(Base):
    __tablename__ = 'ai_conversation_summary'

    id = Column(Integer, primary_key=True)
    conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='CASCADE'), nullable=False, unique=True)
    user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False)
    summary = Column(Text, nullable=False)
    key_topics = Column(JSON, default=list)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow)

    user = relationship('User')
    conversation = relationship('AIChatConversation')
  • Step 2: Verify syntax
python3 -m py_compile database.py && echo "OK"
  • Step 3: Commit
git add database.py
git commit -m "feat(nordagpt): add AIUserMemory and AIConversationSummary ORM models"

Task 13: Create memory_service.py

Files:

  • Create: memory_service.py

  • Step 1: Create memory_service.py

"""
Memory Service for NordaGPT
=============================
Manages persistent per-user memory: fact extraction, storage, retrieval, cleanup.
"""

import json
import logging
from datetime import datetime, timedelta
from typing import Dict, Any, List, Optional

from database import SessionLocal, AIUserMemory, AIConversationSummary, AIChatMessage

logger = logging.getLogger(__name__)

EXTRACT_FACTS_PROMPT = """Na podstawie tej rozmowy wyciągnij kluczowe fakty o użytkowniku {user_name} ({company_name}).

Rozmowa:
{conversation_text}

Istniejące fakty (NIE DUPLIKUJ):
{existing_facts}

Zwróć TYLKO JSON array (bez markdown):
[{{"fact": "...", "category": "interests|needs|contacts|insights"}}]

Zasady:
- Tylko nowe, nietrywialne fakty przydatne w przyszłych rozmowach
- Nie zapisuj: "zapytał o firmę X" (to za mało)
- Zapisuj: "szuka podwykonawców do projektu PEJ w branży elektrycznej"
- Max 3 fakty. Jeśli nie ma nowych faktów, zwróć []
- Kategorie: interests (zainteresowania), needs (potrzeby biznesowe), contacts (kontakty), insights (wnioski/preferencje)
"""

SUMMARIZE_PROMPT = """Podsumuj tę rozmowę w 1-3 zdaniach. Skup się na tym, czego użytkownik szukał i co ustalono.

Rozmowa:
{conversation_text}

Zwróć TYLKO JSON (bez markdown):
{{"summary": "...", "key_topics": ["temat1", "temat2"]}}
"""


def get_user_memory(user_id: int, limit: int = 10) -> List[Dict]:
    """Get active memory facts for a user, sorted by recency and confidence."""
    db = SessionLocal()
    try:
        facts = db.query(AIUserMemory).filter(
            AIUserMemory.user_id == user_id,
            AIUserMemory.is_active == True,
            AIUserMemory.expires_at > datetime.now()
        ).order_by(
            AIUserMemory.confidence.desc(),
            AIUserMemory.created_at.desc()
        ).limit(limit).all()

        return [
            {
                'id': f.id,
                'fact': f.fact,
                'category': f.category,
                'confidence': f.confidence,
                'created_at': f.created_at.isoformat()
            }
            for f in facts
        ]
    finally:
        db.close()


def get_conversation_summaries(user_id: int, limit: int = 5) -> List[Dict]:
    """Get recent conversation summaries for a user."""
    db = SessionLocal()
    try:
        summaries = db.query(AIConversationSummary).filter(
            AIConversationSummary.user_id == user_id
        ).order_by(
            AIConversationSummary.created_at.desc()
        ).limit(limit).all()

        return [
            {
                'summary': s.summary,
                'topics': s.key_topics or [],
                'date': s.created_at.strftime('%Y-%m-%d')
            }
            for s in summaries
        ]
    finally:
        db.close()


def format_memory_for_prompt(user_id: int) -> str:
    """Format user memory and summaries for injection into AI prompt."""
    facts = get_user_memory(user_id)
    summaries = get_conversation_summaries(user_id)

    if not facts and not summaries:
        return ""

    parts = ["\n# PAMIĘĆ O UŻYTKOWNIKU"]

    if facts:
        parts.append("Znane fakty:")
        for f in facts:
            parts.append(f"- [{f['category']}] {f['fact']}")

    if summaries:
        parts.append("\nOstatnie rozmowy:")
        for s in summaries:
            topics = ", ".join(s['topics'][:3]) if s['topics'] else ""
            parts.append(f"- {s['date']}: {s['summary']}" + (f" (tematy: {topics})" if topics else ""))

    parts.append("\nWykorzystuj tę wiedzę do personalizacji odpowiedzi. Nawiązuj do wcześniejszych rozmów gdy to naturalne.")

    return "\n".join(parts)


def extract_facts_async(
    conversation_id: int,
    user_id: int,
    user_context: Dict,
    gemini_service
):
    """
    Extract memory facts from a conversation. Run async after response is sent.
    Uses Flash-Lite for minimal cost.
    """
    db = SessionLocal()
    try:
        # Get conversation messages
        messages = db.query(AIChatMessage).filter_by(
            conversation_id=conversation_id
        ).order_by(AIChatMessage.created_at).all()

        if len(messages) < 2:
            return  # Too short to extract

        conversation_text = "\n".join([
            f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content}"
            for m in messages[-10:]  # Last 10 messages
        ])

        # Get existing facts to avoid duplicates
        existing = db.query(AIUserMemory).filter(
            AIUserMemory.user_id == user_id,
            AIUserMemory.is_active == True
        ).all()
        existing_text = "\n".join([f"- {f.fact}" for f in existing]) or "Brak"

        prompt = EXTRACT_FACTS_PROMPT.format(
            user_name=user_context.get('user_name', 'Nieznany'),
            company_name=user_context.get('company_name', 'brak'),
            conversation_text=conversation_text,
            existing_facts=existing_text
        )

        response = gemini_service.generate_text(
            prompt=prompt,
            temperature=0.1,
            max_tokens=300,
            model='gemini-3.1-flash-lite-preview',
            thinking_level='minimal',
            feature='memory_extraction'
        )

        # Parse response
        text = response.strip()
        if text.startswith('```'):
            text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()

        facts = json.loads(text)
        if not isinstance(facts, list):
            return

        saved = 0
        for fact_data in facts[:3]:
            if not fact_data.get('fact'):
                continue
            memory = AIUserMemory(
                user_id=user_id,
                fact=fact_data['fact'],
                category=fact_data.get('category', 'general'),
                source_conversation_id=conversation_id,
                expires_at=datetime.now() + timedelta(days=365)
            )
            db.add(memory)
            saved += 1

        db.commit()
        logger.info(f"Extracted {saved} memory facts for user {user_id}")

    except Exception as e:
        logger.warning(f"Memory extraction failed for conversation {conversation_id}: {e}")
        db.rollback()
    finally:
        db.close()


def summarize_conversation_async(
    conversation_id: int,
    user_id: int,
    gemini_service
):
    """Generate or update conversation summary. Run async."""
    db = SessionLocal()
    try:
        messages = db.query(AIChatMessage).filter_by(
            conversation_id=conversation_id
        ).order_by(AIChatMessage.created_at).all()

        if len(messages) < 2:
            return

        conversation_text = "\n".join([
            f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content[:200]}"
            for m in messages[-10:]
        ])

        prompt = SUMMARIZE_PROMPT.format(conversation_text=conversation_text)

        response = gemini_service.generate_text(
            prompt=prompt,
            temperature=0.1,
            max_tokens=200,
            model='gemini-3.1-flash-lite-preview',
            thinking_level='minimal',
            feature='conversation_summary'
        )

        text = response.strip()
        if text.startswith('```'):
            text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()

        result = json.loads(text)

        existing = db.query(AIConversationSummary).filter_by(
            conversation_id=conversation_id
        ).first()

        if existing:
            existing.summary = result.get('summary', existing.summary)
            existing.key_topics = result.get('key_topics', existing.key_topics)
            existing.updated_at = datetime.now()
        else:
            summary = AIConversationSummary(
                conversation_id=conversation_id,
                user_id=user_id,
                summary=result.get('summary', ''),
                key_topics=result.get('key_topics', [])
            )
            db.add(summary)

        db.commit()
        logger.info(f"Summarized conversation {conversation_id}")

    except Exception as e:
        logger.warning(f"Conversation summary failed for {conversation_id}: {e}")
        db.rollback()
    finally:
        db.close()


def delete_user_fact(user_id: int, fact_id: int) -> bool:
    """Soft-delete a memory fact. Returns True if deleted."""
    db = SessionLocal()
    try:
        fact = db.query(AIUserMemory).filter_by(id=fact_id, user_id=user_id).first()
        if fact:
            fact.is_active = False
            db.commit()
            return True
        return False
    finally:
        db.close()
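Both extraction functions assume Gemini wraps its JSON in a well-formed markdown code fence. A slightly more defensive parser (illustrative, not part of the planned file) tolerates a missing trailing fence and malformed JSON instead of raising:

```python
import json

def parse_json_response(text):
    """Strip an optional markdown code fence and parse JSON; return None on failure."""
    text = text.strip()
    if text.startswith('```'):
        parts = text.split('\n', 1)              # drop the opening fence line (e.g. ```json)
        text = parts[1] if len(parts) > 1 else ''
        text = text.rsplit('```', 1)[0].strip()  # drop a trailing fence if present
    try:
        return json.loads(text)
    except ValueError:                           # json.JSONDecodeError subclasses ValueError
        return None

assert parse_json_response('```json\n[{"fact": "x"}]\n```') == [{'fact': 'x'}]
assert parse_json_response('not json') is None
```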
  • Step 2: Verify syntax
python3 -m py_compile memory_service.py && echo "OK"
  • Step 3: Commit
git add memory_service.py
git commit -m "feat(nordagpt): add memory_service.py — fact extraction, summaries, CRUD"

Task 14: Integrate memory into chat flow

Files:

  • Modify: nordabiz_chat.py

  • Modify: blueprints/chat/routes.py

  • Step 1: Inject memory into system prompt

In nordabiz_chat.py, in the _build_prompt() or _query_ai() method, after the user identity block and before the data sections, add memory:

from memory_service import format_memory_for_prompt

# After user_identity block, before data injection:
user_memory_text = ""
if user_context and user_context.get('user_id'):
    user_memory_text = format_memory_for_prompt(user_context['user_id'])

# Prepend to system prompt:
system_prompt = user_identity + user_memory_text + f"""Jesteś pomocnym asystentem..."""
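To see what actually lands in the prompt, the formatting logic can be exercised as a pure function. This is a simplified mirror of format_memory_for_prompt() with inline data instead of DB rows:

```python
def format_memory(facts, summaries):
    """Simplified mirror of format_memory_for_prompt() for illustration."""
    if not facts and not summaries:
        return ""
    parts = ["\n# PAMIĘĆ O UŻYTKOWNIKU"]
    if facts:
        parts.append("Znane fakty:")
        parts += [f"- [{f['category']}] {f['fact']}" for f in facts]
    if summaries:
        parts.append("\nOstatnie rozmowy:")
        parts += [f"- {s['date']}: {s['summary']}" for s in summaries]
    return "\n".join(parts)

block = format_memory(
    [{'category': 'needs', 'fact': 'szuka podwykonawców elektrycznych do projektu PEJ'}],
    [{'date': '2026-03-20', 'summary': 'Szukał firm IT w Izbie.'}],
)
assert block.startswith('\n# PAMIĘĆ O UŻYTKOWNIKU')
assert '- [needs] szuka podwykonawców elektrycznych do projektu PEJ' in block
```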
  • Step 2: Trigger async memory extraction after response

In send_message() and send_message_stream(), after saving the AI response, trigger async extraction using threading:

import threading
from memory_service import extract_facts_async, summarize_conversation_async

# After saving the AI response to DB (end of send_message/send_message_stream).
# Capture plain values first — the background thread must not touch ORM objects
# from a session that may be closed (and its instances detached) by the time it runs:
msg_count = conversation.message_count or 0

def _extract_memory():
    extract_facts_async(conversation_id, user_id, user_context, self.gemini_service)
    # Summarize every 5 messages
    if msg_count % 5 == 0:
        summarize_conversation_async(conversation_id, user_id, self.gemini_service)

threading.Thread(target=_extract_memory, daemon=True).start()
  • Step 3: Add memory CRUD API routes

In blueprints/chat/routes.py, add routes for viewing and deleting memory:

@bp.route('/api/chat/memory', methods=['GET'])
@login_required
@member_required
def get_user_memory_api():
    """Get current user's NordaGPT memory facts and summaries"""
    from memory_service import get_user_memory, get_conversation_summaries
    return jsonify({
        'facts': get_user_memory(current_user.id, limit=20),
        'summaries': get_conversation_summaries(current_user.id, limit=10)
    })


@bp.route('/api/chat/memory/<int:fact_id>', methods=['DELETE'])
@login_required
@member_required
def delete_memory_fact(fact_id):
    """Delete a memory fact"""
    from memory_service import delete_user_fact
    if delete_user_fact(current_user.id, fact_id):
        return jsonify({'status': 'ok'})
    return jsonify({'error': 'Nie znaleziono'}), 404
  • Step 4: Verify syntax
python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK"
  • Step 5: Commit
git add nordabiz_chat.py blueprints/chat/routes.py
git commit -m "feat(nordagpt): integrate memory into chat — injection, async extraction, CRUD API"

Task 15: Deploy Phase 4 — migrations + code

  • Step 1: Push to remotes
git push origin master && git push inpi master
  • Step 2: Deploy to staging with migrations
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull"
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql"
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql"
ssh maciejpi@10.22.68.248 "sudo systemctl restart nordabiznes"
  • Step 3: Test on staging
  1. Open chat, have a conversation about looking for IT companies
  2. Open another chat, ask "o czym rozmawialiśmy?" — verify AI mentions previous topics
  3. Check memory API: curl https://staging.nordabiznes.pl/api/chat/memory (with auth)
  4. Verify facts are extracted
  • Step 4: Deploy to production
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull"
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep '^DATABASE_URL' .env | cut -d'=' -f2-) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql"
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep '^DATABASE_URL' .env | cut -d'=' -f2-) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql"
ssh maciejpi@57.128.200.27 "sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
  • Step 5: Update release notes

Add entry in blueprints/public/routes.py _get_releases().


Post-Implementation Checklist

  • Verify AI greets users by name
  • Verify Smart Router logs show correct classification
  • Verify streaming works on mobile (Android + iOS)
  • Verify memory facts are extracted after conversations
  • Verify memory is private (user A cannot see user B's facts)
  • Verify response times: simple <3s, medium <6s, complex <12s
  • Monitor costs for first week — compare with estimates
  • Send message to Jakub Pornowski confirming speed improvements