NordaGPT Identity, Memory & Performance — Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Transform NordaGPT from an anonymous chatbot into a personalized assistant with user identity, persistent memory, smart routing, and streaming responses.
Architecture: Four-phase rollout: (1) inject user identity into AI prompt, (2) smart router + selective context loading, (3) streaming SSE responses, (4) persistent user memory with async extraction. Each phase is independently deployable and testable.
Tech Stack: Flask 3.0, SQLAlchemy 2.0, PostgreSQL, Google Gemini API (3-Flash, 3.1-Flash-Lite), Server-Sent Events, Jinja2 inline JS.
Spec: docs/superpowers/specs/2026-03-28-nordagpt-identity-memory-design.md
File Structure
New files
| File | Responsibility |
|---|---|
| smart_router.py | Classifies query complexity, selects data categories and model |
| memory_service.py | CRUD for user memory facts + conversation summaries, extraction prompt |
| context_builder.py | Loads selective data from DB based on router decision |
| database/migrations/092_ai_user_memory.sql | Memory + summary tables |
| database/migrations/093_ai_conversation_summary.sql | Summary table |
Modified files
| File | Changes |
|---|---|
| database.py | Add AIUserMemory, AIConversationSummary models (before line 5954) |
| nordabiz_chat.py | Accept user_context, integrate router, selective context, memory injection |
| gemini_service.py | Token counting for streamed responses |
| blueprints/chat/routes.py | Build user_context, add streaming endpoint, memory CRUD routes |
| templates/chat.html | Streaming UI, thinking animation, memory settings panel |
Phase 1: User Identity (Tasks 1-3)
Task 1: Pass user context from route to chat engine
Files:
- Modify: blueprints/chat/routes.py:234-309
- Modify: nordabiz_chat.py:163-180

- Step 1: Build user_context dict in chat route
In blueprints/chat/routes.py, modify chat_send_message(). After line 262 (where current_user.id and current_user.email are used for limit check), add user_context construction:
# After line 262, before line 268
# Build user context for AI personalization
user_context = {
'user_id': current_user.id,
'user_name': current_user.name,
'user_email': current_user.email,
'company_name': current_user.company.name if current_user.company else None,
'company_id': current_user.company.id if current_user.company else None,
'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
'company_role': current_user.company_role or 'MEMBER',
'is_norda_member': current_user.is_norda_member,
'chamber_role': current_user.chamber_role,
'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
}
- Step 2: Pass user_context to send_message()
In the same function, modify the chat_engine.send_message() call (around line 282):
# Before:
ai_response = chat_engine.send_message(
conversation_id,
user_message=message,
user_id=current_user.id,
thinking_level=thinking_level
)
# After:
ai_response = chat_engine.send_message(
conversation_id,
user_message=message,
user_id=current_user.id,
thinking_level=thinking_level,
user_context=user_context
)
- Step 3: Update send_message() signature in nordabiz_chat.py
In nordabiz_chat.py, modify send_message() at line 163:
# Before:
def send_message(
self,
conversation_id: int,
user_message: str,
user_id: int,
thinking_level: str = 'high'
) -> AIChatMessage:
# After:
def send_message(
self,
conversation_id: int,
user_message: str,
user_id: int,
thinking_level: str = 'high',
user_context: Optional[Dict[str, Any]] = None
) -> AIChatMessage:
Add from typing import Optional, Dict, Any to imports if not already present.
- Step 4: Thread user_context through to _query_ai()
In send_message(), find the call to _query_ai() (around line 239) and add user_context:
# Before:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level)
# After:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)
- Step 5: Update _query_ai() signature
In nordabiz_chat.py, modify _query_ai() at line 890:
# Before:
def _query_ai(
self,
context: Dict[str, Any],
user_message: str,
user_id: Optional[int] = None,
thinking_level: str = 'high'
) -> str:
# After:
def _query_ai(
self,
context: Dict[str, Any],
user_message: str,
user_id: Optional[int] = None,
thinking_level: str = 'high',
user_context: Optional[Dict[str, Any]] = None
) -> str:
- Step 6: Commit
git add blueprints/chat/routes.py nordabiz_chat.py
git commit -m "refactor(chat): thread user_context from route through to _query_ai"
Task 2: Inject user identity into system prompt
Files:
- Modify: nordabiz_chat.py:920-930

- Step 1: Add user identity block to system prompt
In nordabiz_chat.py, inside _query_ai(), find line ~922 where system_prompt starts. Insert the user identity block BEFORE the main system prompt string (after line 921, before line 922):
# Build user identity section
user_identity = ""
if user_context:
user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}
Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')}
Rola w firmie: {user_context.get('company_role', 'MEMBER')}
Członek Izby Norda Biznes: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since', 'nieznana data')}
ZASADY PERSONALIZACJI:
- Zwracaj się do użytkownika po imieniu (pierwsze słowo z imienia i nazwiska)
- W pierwszej wiadomości konwersacji przywitaj się: "Cześć [imię], w czym mogę pomóc?"
- Na pytania "co wiesz o mnie?" / "kim jestem?" — wypisz powyższe dane + powiązania firmowe z bazy
- Uwzględniaj kontekst firmy użytkownika w odpowiedziach (np. sugeruj partnerów z komplementarnych branż)
- NIE ujawniaj danych technicznych (user_id, company_id, rola systemowa)
"""
- Step 2: Prepend user_identity to system_prompt
Find where system_prompt is first assigned (line 922) and prepend:
# Line 922 area - the system_prompt f-string starts here
system_prompt = user_identity + f"""Jesteś pomocnym asystentem portalu Norda Biznes...
This is a minimal change — just concatenate user_identity (which is empty string if no context) before the existing prompt.
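The prepend behaviour is easy to verify in isolation. A minimal sketch (the `build_prompt` helper and prompt text are illustrative, not the real nordabiz_chat.py code):

```python
def build_prompt(user_context, base_prompt):
    """Prepend a user identity block when context is available."""
    user_identity = ""
    if user_context:
        user_identity = (
            "# AKTUALNY UZYTKOWNIK\n"
            f"Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}\n\n"
        )
    return user_identity + base_prompt

base = "Jestes pomocnym asystentem portalu Norda Biznes..."
# No context: the prompt is returned unchanged (empty-string prepend)
assert build_prompt(None, base) == base
# With context: the identity block is prepended, base prompt intact
personalized = build_prompt({'user_name': 'Jan Kowalski'}, base)
assert personalized.endswith(base)
assert 'Jan Kowalski' in personalized
```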
- Step 3: Verify syntax compiles
python3 -m py_compile nordabiz_chat.py && echo "OK"
- Step 4: Test locally
Start local dev server and send a chat message. Verify in logs that the prompt now contains the user identity block. Check that the AI greets by name.
python3 app.py
# In another terminal:
curl -X POST http://localhost:5000/api/chat/1/message \
-H "Content-Type: application/json" \
-d '{"message": "Kim jestem?"}'
(Note: requires auth cookie — easier to test via browser)
- Step 5: Commit
git add nordabiz_chat.py
git commit -m "feat(nordagpt): inject user identity into AI system prompt — personalized greetings and context"
Task 3: Deploy Phase 1 and verify
Files: None (deployment only)
- Step 1: Push to remotes
git push origin master && git push inpi master
- Step 2: Deploy to staging
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
- Step 3: Test on staging — verify AI greets by name
Open https://staging.nordabiznes.pl/chat, start new conversation, type "Cześć". Verify AI responds with your name.
Type "Co wiesz o mnie?" — verify AI lists your profile data.
- Step 4: Deploy to production
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
- Step 5: Commit deployment notes (update release_notes in routes.py)
Add new release entry in blueprints/public/routes.py _get_releases() function.
Phase 2: Smart Router + Context Builder (Tasks 4-7)
Task 4: Create context_builder.py — selective data loading
Files:
- Create: context_builder.py

- Step 1: Create context_builder.py with selective loading functions
"""
Context Builder for NordaGPT Smart Router
==========================================
Loads only the data categories requested by the Smart Router,
instead of loading everything for every query.
"""
import json
import logging
from typing import Dict, Any, List, Optional
from datetime import datetime, timedelta
from database import (
SessionLocal, Company, Category, CompanyRecommendation,
NordaEvent, Classified, ForumTopic, ForumReply,
CompanyPerson, Person, User, CompanySocialMedia,
GBPAudit, CompanyWebsiteAnalysis, ZOPKNews,
UserCompanyPermissions
)
from sqlalchemy import func, desc
logger = logging.getLogger(__name__)
def _company_to_compact_dict(company) -> Dict:
"""Convert company to compact dict for AI context. Mirrors nordabiz_chat.py format."""
return {
'name': company.name,
'cat': company.category.name if company.category else None,
'profile': f'/firma/{company.slug}',
'desc': company.description_short,
'about': company.description_full[:500] if company.description_full else None,
'svc': company.services,
'comp': company.competencies,
'web': company.website,
'tel': company.phone,
'mail': company.email,
'city': company.city,
}
def build_selective_context(
data_needed: List[str],
conversation_id: int,
current_message: str,
user_context: Optional[Dict] = None
) -> Dict[str, Any]:
"""
Build AI context with only the requested data categories.
Args:
data_needed: List of category strings from Smart Router, e.g.:
["companies_all", "companies_filtered:IT", "companies_single:termo",
"events", "news", "classifieds", "forum", "company_people",
"registered_users", "social_media", "audits"]
conversation_id: Current conversation ID for history
current_message: User's message text
user_context: User identity dict
Returns:
Context dict compatible with nordabiz_chat.py _query_ai()
"""
db = SessionLocal()
context = {}
try:
# Always load: basic stats and conversation history
active_companies = db.query(Company).filter_by(status='active').all()
context['total_companies'] = len(active_companies)
categories = db.query(Category).all()
context['categories'] = [
{'name': c.name, 'slug': c.slug, 'company_count': len([co for co in active_companies if co.category_id == c.id])}
for c in categories
]
# Conversation history (always loaded)
from database import AIChatMessage, AIChatConversation
messages = db.query(AIChatMessage).filter_by(
conversation_id=conversation_id
).order_by(AIChatMessage.created_at.desc()).limit(10).all()
context['recent_messages'] = [
{'role': msg.role, 'content': msg.content}
for msg in reversed(messages)
]
# Selective data loading based on router decision
for category in data_needed:
if category == 'companies_all':
context['all_companies'] = [_company_to_compact_dict(c) for c in active_companies]
elif category.startswith('companies_filtered:'):
filter_cat = category.split(':', 1)[1]
filtered = [c for c in active_companies
if c.category and c.category.name.lower() == filter_cat.lower()]
context['all_companies'] = [_company_to_compact_dict(c) for c in filtered]
elif category.startswith('companies_single:'):
search = category.split(':', 1)[1].lower()
matched = [c for c in active_companies
if search in c.name.lower() or search in (c.slug or '')]
context['all_companies'] = [_company_to_compact_dict(c) for c in matched[:5]]
elif category == 'events':
events = db.query(NordaEvent).filter(
NordaEvent.event_date >= datetime.now(),
NordaEvent.event_date <= datetime.now() + timedelta(days=60)
).order_by(NordaEvent.event_date).all()
context['upcoming_events'] = [
{'title': e.title, 'date': str(e.event_date), 'type': e.event_type,
'location': e.location, 'url': f'/kalendarz/{e.id}'}
for e in events
]
elif category == 'news':
news = db.query(ZOPKNews).filter(
ZOPKNews.published_at >= datetime.now() - timedelta(days=30),
ZOPKNews.status == 'approved'
).order_by(ZOPKNews.published_at.desc()).limit(10).all()
context['recent_news'] = [
{'title': n.title, 'summary': n.ai_summary, 'date': str(n.published_at),
'source': n.source_name, 'url': n.source_url}
for n in news
]
elif category == 'classifieds':
classifieds = db.query(Classified).filter(
Classified.status == 'active',
Classified.is_test == False
).order_by(Classified.created_at.desc()).limit(20).all()
context['classifieds'] = [
{'type': c.listing_type, 'title': c.title, 'description': c.description,
'company': c.company.name if c.company else None,
'budget': c.budget_text, 'url': f'/b2b/{c.id}'}
for c in classifieds
]
elif category == 'forum':
topics = db.query(ForumTopic).filter(
ForumTopic.is_test == False
).order_by(ForumTopic.created_at.desc()).limit(15).all()
context['forum_topics'] = [
{'title': t.title, 'content': t.content[:300],
'author': t.author.name if t.author else None,
'replies': t.reply_count, 'url': f'/forum/{t.slug}'}
for t in topics
]
elif category == 'company_people':
people_query = db.query(CompanyPerson).join(Person).join(Company).filter(
Company.status == 'active'
).all()
grouped = {}
for cp in people_query:
cname = cp.company.name
if cname not in grouped:
grouped[cname] = []
grouped[cname].append({
'name': cp.person.name,
'role': cp.role_description,
'shares': cp.shares_value
})
context['company_people'] = grouped
elif category == 'registered_users':
users = db.query(User).filter(
User.is_active == True,
User.company_id.isnot(None)
).all()
grouped = {}
for u in users:
cname = u.company.name if u.company else 'Brak firmy'
if cname not in grouped:
grouped[cname] = []
grouped[cname].append({
'name': u.name, 'email': u.email,
'role': u.company_role, 'member': u.is_norda_member
})
context['registered_users'] = grouped
elif category == 'social_media':
socials = db.query(CompanySocialMedia).filter_by(is_valid=True).all()
grouped = {}
for s in socials:
cname = s.company.name if s.company else 'Unknown'
if cname not in grouped:
grouped[cname] = []
grouped[cname].append({
'platform': s.platform, 'url': s.url,
'followers': s.followers_count
})
context['company_social_media'] = grouped
elif category == 'audits':
# GBP audits
gbp = db.query(GBPAudit).order_by(GBPAudit.created_at.desc()).all()
seen = set()
gbp_unique = []
for g in gbp:
if g.company_id not in seen:
seen.add(g.company_id)
gbp_unique.append({
'company': g.company.name if g.company else None,
'score': g.overall_score, 'reviews': g.total_reviews,
'rating': g.average_rating
})
context['gbp_audits'] = gbp_unique
# SEO audits
seo = db.query(CompanyWebsiteAnalysis).all()
context['seo_audits'] = [
{'company': s.company.name if s.company else None,
'seo': s.seo_score, 'performance': s.performance_score}
for s in seo
]
# If no companies were loaded by any category, load a minimal summary
if 'all_companies' not in context:
context['all_companies'] = []
finally:
db.close()
return context
- Step 2: Verify syntax
python3 -m py_compile context_builder.py && echo "OK"
- Step 3: Commit
git add context_builder.py
git commit -m "feat(nordagpt): add context_builder.py — selective data loading for smart router"
Task 5: Create smart_router.py — query classification
Files:
- Create: smart_router.py

- Step 1: Create smart_router.py
"""
Smart Router for NordaGPT
==========================
Classifies query complexity and selects which data categories to load.
Uses Gemini 3.1 Flash-Lite for fast, cheap classification (~1-2s).
"""
import json
import logging
import time
from typing import Dict, Any, List, Optional
logger = logging.getLogger(__name__)
# Keyword-based fast routing (no API call needed)
FAST_ROUTES = {
'companies_all': ['wszystkie firmy', 'ile firm', 'lista firm', 'katalog', 'porównaj firmy'],
'events': ['wydarzenie', 'spotkanie', 'kalendarz', 'konferencja', 'szkolenie', 'kiedy'],
'news': ['aktualności', 'nowości', 'wiadomości', 'pej', 'atom', 'elektrownia', 'zopk'],
'classifieds': ['ogłoszenie', 'b2b', 'zlecenie', 'oferta', 'szukam', 'oferuję'],
'forum': ['forum', 'dyskusja', 'temat', 'wątek', 'post'],
'company_people': ['zarząd', 'krs', 'właściciel', 'prezes', 'udziały', 'wspólnik'],
'registered_users': ['użytkownik', 'kto jest', 'profil', 'zarejestrowany', 'członek'],
'social_media': ['facebook', 'instagram', 'linkedin', 'social media', 'media społeczn'],
'audits': ['seo', 'google', 'gbp', 'opinie', 'ocena', 'pagespeed'],
}
# Model selection by complexity
MODEL_MAP = {
'simple': {'model': '3.1-flash-lite', 'thinking': 'minimal'},
'medium': {'model': '3-flash', 'thinking': 'low'},
'complex': {'model': '3-flash', 'thinking': 'high'},
}
ROUTER_PROMPT = """Jesteś routerem zapytań. Przeanalizuj pytanie i zdecyduj jakie dane potrzebne.
Użytkownik: {user_name} z firmy {company_name}
Pytanie: {message}
Zwróć TYLKO JSON (bez markdown):
{{
"complexity": "simple|medium|complex",
"data_needed": ["lista kategorii z poniższych"]
}}
Kategorie:
- companies_all — wszystkie firmy (porównania, przeglądy, "ile firm")
- companies_filtered:KATEGORIA — firmy z kategorii (np. companies_filtered:IT)
- companies_single:NAZWA — jedna firma (np. companies_single:termo)
- events — nadchodzące wydarzenia
- news — aktualności, PEJ, ZOPK
- classifieds — ogłoszenia B2B
- forum — tematy forum
- company_people — zarząd, KRS, udziałowcy
- registered_users — użytkownicy portalu
- social_media — profile social media firm
- audits — wyniki SEO/GBP
Zasady:
- "simple" = jedno pytanie o konkretną rzecz (telefon, adres, link)
- "medium" = porównanie, lista, filtrowanie
- "complex" = analiza, strategia, rekomendacje
- Wybierz MINIMUM kategorii. Nie ładuj niepotrzebnych danych.
- Jeśli pytanie dotyczy konkretnej firmy, użyj companies_single:nazwa
- Pytania ogólne o użytkownika (kim jestem, co wiesz) = [] (dane z profilu wystarczą)
"""
def route_query_fast(message: str, user_context: Optional[Dict] = None) -> Optional[Dict[str, Any]]:
"""
Fast keyword-based routing. No API call.
Returns routing decision or None if uncertain (needs AI router).
"""
msg_lower = message.lower()
# Check for personal questions — no data needed
personal_patterns = ['kim jestem', 'co wiesz o mnie', 'mój profil', 'moje dane']
if any(p in msg_lower for p in personal_patterns):
return {
'complexity': 'simple',
'data_needed': [],
'model': '3.1-flash-lite',
'thinking': 'minimal',
'routed_by': 'fast'
}
# Check for greetings — no data needed
greeting_patterns = ['cześć', 'hej', 'witam', 'dzień dobry', 'siema', 'hello']
if any(msg_lower.strip().startswith(p) for p in greeting_patterns) and len(message) < 30:
return {
'complexity': 'simple',
'data_needed': [],
'model': '3.1-flash-lite',
'thinking': 'minimal',
'routed_by': 'fast'
}
# Check keyword matches
matched_categories = []
for category, keywords in FAST_ROUTES.items():
if any(kw in msg_lower for kw in keywords):
matched_categories.append(category)
# No keyword match found. Specific company-name mentions and other
# ambiguous queries fall through to the AI router.
if not matched_categories:
return None
# Determine complexity
if len(matched_categories) <= 1 and len(message) < 80:
complexity = 'simple'
elif len(matched_categories) <= 2:
complexity = 'medium'
else:
complexity = 'complex'
model_config = MODEL_MAP[complexity]
return {
'complexity': complexity,
'data_needed': matched_categories,
'model': model_config['model'],
'thinking': model_config['thinking'],
'routed_by': 'fast'
}
def route_query_ai(
message: str,
user_context: Optional[Dict] = None,
gemini_service=None
) -> Dict[str, Any]:
"""
AI-powered routing using Flash-Lite. Called when fast routing is uncertain.
"""
if not gemini_service:
# Fallback: load everything
return _fallback_route()
user_name = user_context.get('user_name', 'Nieznany') if user_context else 'Nieznany'
company_name = user_context.get('company_name', 'brak') if user_context else 'brak'
prompt = ROUTER_PROMPT.format(
user_name=user_name,
company_name=company_name,
message=message
)
try:
start = time.time()
response = gemini_service.generate_text(
prompt=prompt,
temperature=0.1,
max_tokens=200,
model='gemini-3.1-flash-lite-preview',
thinking_level='minimal',
feature='smart_router'
)
latency = int((time.time() - start) * 1000)
logger.info(f"Smart Router AI response in {latency}ms: {response[:200]}")
# Parse JSON from response
# Handle potential markdown wrapping
text = response.strip()
if text.startswith('```'):
text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()
result = json.loads(text)
complexity = result.get('complexity', 'medium')
model_config = MODEL_MAP.get(complexity, MODEL_MAP['medium'])
return {
'complexity': complexity,
'data_needed': result.get('data_needed', []),
'model': model_config['model'],
'thinking': model_config['thinking'],
'routed_by': 'ai',
'router_latency_ms': latency
}
except Exception as e:  # covers json.JSONDecodeError and KeyError as well
logger.warning(f"Smart Router AI failed: {e}, falling back to full context")
return _fallback_route()
def route_query(
message: str,
user_context: Optional[Dict] = None,
gemini_service=None
) -> Dict[str, Any]:
"""
Main entry point. Tries fast routing first, falls back to AI routing.
"""
# Try fast keyword-based routing
result = route_query_fast(message, user_context)
if result is not None:
logger.info(f"Smart Router FAST: complexity={result['complexity']}, data={result['data_needed']}")
return result
# Fall back to AI routing
result = route_query_ai(message, user_context, gemini_service)
logger.info(f"Smart Router AI: complexity={result['complexity']}, data={result['data_needed']}")
return result
def _fallback_route() -> Dict[str, Any]:
"""Fallback: load everything, use default model. Safe but slow."""
return {
'complexity': 'medium',
'data_needed': [
'companies_all', 'events', 'news', 'classifieds',
'forum', 'company_people', 'registered_users'
],
'model': '3-flash',
'thinking': 'low',
'routed_by': 'fallback'
}
- Step 2: Verify syntax
python3 -m py_compile smart_router.py && echo "OK"
- Step 3: Commit
git add smart_router.py
git commit -m "feat(nordagpt): add smart_router.py — fast keyword routing + AI fallback"
Task 6: Integrate Smart Router into nordabiz_chat.py
Files:
- Modify: nordabiz_chat.py:163-282, 347-643, 890-1365

- Step 1: Add imports at top of nordabiz_chat.py
After existing imports (around line 30), add:
from smart_router import route_query
from context_builder import build_selective_context
- Step 2: Modify send_message() to use Smart Router
In send_message(), replace the call to _build_conversation_context() and _query_ai() (around lines 236-239). The key change: use the router to decide model and data, then use context_builder for selective loading.
Find the section where context is built and AI is queried (around lines 236-241):
# Before (approximately lines 236-241):
# context = self._build_conversation_context(db, conversation, original_message)
# ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)
# After:
# Smart Router — classify query and select data + model
route_decision = route_query(
message=original_message,
user_context=user_context,
gemini_service=self.gemini_service
)
# Override model and thinking based on router decision
effective_model = route_decision.get('model', '3-flash')
effective_thinking = route_decision.get('thinking', thinking_level)
# Build selective context (only requested data categories)
context = build_selective_context(
data_needed=route_decision.get('data_needed', []),
conversation_id=conversation.id,
current_message=original_message,
user_context=user_context
)
# Use the original _query_ai but with router-selected parameters
ai_response_text = self._query_ai(
context, original_message,
user_id=user_id,
thinking_level=effective_thinking,
user_context=user_context
)
Note: Keep _build_conversation_context() and full _query_ai() intact as fallback. The router's _fallback_route() loads all data, so it's safe.
- Step 3: Log routing decisions
After the route_query call, add logging:
logger.info(
f"NordaGPT Router: user={user_context.get('user_name') if user_context else '?'}, "
f"complexity={route_decision['complexity']}, model={effective_model}, "
f"thinking={effective_thinking}, data={route_decision['data_needed']}, "
f"routed_by={route_decision.get('routed_by')}"
)
- Step 4: Update the GeminiService call in _query_ai() to use effective model
Currently _query_ai() uses self.gemini_service which has a fixed model. We need to pass the router-selected model to the generate_text call. In _query_ai(), around line 1352, modify:
# Before:
response = self.gemini_service.generate_text(
prompt=full_prompt,
temperature=0.7,
thinking_level=thinking_level,
user_id=user_id,
feature='chat'
)
# After: pass the router decision through the context dict, which is
# cleaner than adding another parameter or an instance attribute.
In send_message(), add to context before calling _query_ai():
context['_route_decision'] = route_decision
In _query_ai(), read it at the generate_text call:
route = context.get('_route_decision', {})
effective_model_id = None
model_alias = route.get('model')
if model_alias:
from gemini_service import GEMINI_MODELS
effective_model_id = GEMINI_MODELS.get(model_alias)
response = self.gemini_service.generate_text(
prompt=full_prompt,
temperature=0.7,
thinking_level=thinking_level,
user_id=user_id,
feature='chat',
model=effective_model_id
)
- Step 5: Verify syntax
python3 -m py_compile nordabiz_chat.py && echo "OK"
- Step 6: Commit
git add nordabiz_chat.py
git commit -m "feat(nordagpt): integrate smart router — selective context loading + adaptive model selection"
Task 7: Deploy Phase 2 and verify
- Step 1: Push and deploy to staging
git push origin master && git push inpi master
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
- Step 2: Test on staging — verify routing works
- Simple query: "Jaki jest telefon do TERMO?" — should be fast (2-3s), Flash-Lite model.
- Medium query: "Porównaj firmy budowlane w Izbie" — should load companies_all, medium speed.
- Complex query: "Jakie firmy mogłyby współpracować przy projekcie PEJ?" — should use full context.
Check logs for routing decisions:
ssh maciejpi@10.22.68.248 "journalctl -u nordabiznes -n 30 --no-pager | grep 'Router'"
- Step 3: Deploy to production
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
Phase 3: Streaming Responses (Tasks 8-10)
Task 8: Add streaming endpoint in Flask
Files:
- Modify: blueprints/chat/routes.py
- Modify: nordabiz_chat.py

- Step 1: Add SSE streaming endpoint
In blueprints/chat/routes.py, add a new route after chat_send_message() (after line ~309):
@bp.route('/api/chat/<int:conversation_id>/message/stream', methods=['POST'])
@login_required
@member_required
def chat_send_message_stream(conversation_id):
"""Send message to AI chat with streaming response (SSE)"""
from flask import Response, stream_with_context
import json as json_module
data = request.get_json()
if not data or not data.get('message', '').strip():
return jsonify({'error': 'Wiadomość nie może być pusta'}), 400
message = data['message'].strip()
# Check limits
from nordabiz_chat import check_user_limits
limit_result = check_user_limits(current_user.id, current_user.email)
if limit_result.get('limited'):
return jsonify({'error': 'Przekroczono limit', 'limit_info': limit_result}), 429
# Build user context
user_context = {
'user_id': current_user.id,
'user_name': current_user.name,
'user_email': current_user.email,
'company_name': current_user.company.name if current_user.company else None,
'company_id': current_user.company.id if current_user.company else None,
'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
'company_role': current_user.company_role or 'MEMBER',
'is_norda_member': current_user.is_norda_member,
'chamber_role': current_user.chamber_role,
'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
}
model_choice = data.get('model') or session.get('chat_model', 'flash')
model_key = '3-flash' if model_choice == 'flash' else '3-pro'
def generate():
try:
chat_engine = NordaBizChatEngine(model=model_key)
for chunk in chat_engine.send_message_stream(
conversation_id=conversation_id,
user_message=message,
user_id=current_user.id,
user_context=user_context
):
yield f"data: {json_module.dumps(chunk, ensure_ascii=False)}\n\n"
except PermissionError:
yield f"data: {json_module.dumps({'type': 'error', 'content': 'Brak dostępu do tej konwersacji'})}\n\n"
except Exception as e:
logger.error(f"Streaming error: {e}")
yield f"data: {json_module.dumps({'type': 'error', 'content': 'Wystąpił błąd'})}\n\n"
return Response(
stream_with_context(generate()),
mimetype='text/event-stream',
headers={
'Cache-Control': 'no-cache',
'X-Accel-Buffering': 'no', # Disable Nginx buffering
}
)
- Step 2: Add send_message_stream() to NordaBizChatEngine
In nordabiz_chat.py, add a new method after send_message() (after line ~282):
def send_message_stream(
self,
conversation_id: int,
user_message: str,
user_id: int,
user_context: Optional[Dict[str, Any]] = None
):
"""
Generator that yields streaming chunks for SSE.
Yields dicts: {'type': 'thinking'|'token'|'done'|'error', 'content': '...'}
"""
import time
db = SessionLocal()
try:
conversation = db.query(AIChatConversation).filter_by(
id=conversation_id, user_id=user_id
).first()
if not conversation:
yield {'type': 'error', 'content': 'Konwersacja nie znaleziona'}
return
# Save user message
original_message = user_message
sanitized = self._sanitize_message(user_message)
user_msg = AIChatMessage(
conversation_id=conversation_id,
role='user',
content=sanitized
)
db.add(user_msg)
db.commit()
# Smart Router
route_decision = route_query(
message=original_message,
user_context=user_context,
gemini_service=self.gemini_service
)
yield {'type': 'thinking', 'content': 'Analizuję pytanie...'}
# Build selective context
context = build_selective_context(
data_needed=route_decision.get('data_needed', []),
conversation_id=conversation.id,
current_message=original_message,
user_context=user_context
)
context['_route_decision'] = route_decision
# Build prompt (reuse _query_ai logic for prompt building)
full_prompt = self._build_prompt(context, original_message, user_context, route_decision.get('thinking', 'low'))
# Get effective model
from gemini_service import GEMINI_MODELS
model_alias = route_decision.get('model', '3-flash')
effective_model = GEMINI_MODELS.get(model_alias, self.model_name)
# Stream from Gemini
start_time = time.time()
stream_response = self.gemini_service.generate_text(
prompt=full_prompt,
temperature=0.7,
stream=True,
thinking_level=route_decision.get('thinking', 'low'),
user_id=user_id,
feature='chat_stream',
model=effective_model
)
full_text = ""
for chunk in stream_response:
if hasattr(chunk, 'text') and chunk.text:
full_text += chunk.text
yield {'type': 'token', 'content': chunk.text}
latency_ms = int((time.time() - start_time) * 1000)
# Save AI response to DB
ai_msg = AIChatMessage(
conversation_id=conversation_id,
role='assistant',
content=full_text,
latency_ms=latency_ms
)
db.add(ai_msg)
conversation.updated_at = datetime.now()
conversation.message_count = (conversation.message_count or 0) + 2
db.commit()
yield {
'type': 'done',
'message_id': ai_msg.id,
'latency_ms': latency_ms,
'model': model_alias,
'complexity': route_decision.get('complexity')
}
except Exception as e:
db.rollback()
logger.error(f"Stream error: {e}", exc_info=True)
yield {'type': 'error', 'content': 'Wystąpił błąd podczas generowania odpowiedzi'}
finally:
db.close()
- Step 3: Extract prompt building into reusable method
Add a _build_prompt() method to NordaBizChatEngine that extracts prompt construction from _query_ai(). This method builds the full prompt string without calling Gemini:
def _build_prompt(
self,
context: Dict[str, Any],
user_message: str,
user_context: Optional[Dict[str, Any]] = None,
thinking_level: str = 'low'
) -> str:
"""Build the full prompt string. Extracted from _query_ai() for reuse in streaming."""
# Build user identity section
user_identity = ""
if user_context:
user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}
Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')}
Rola w firmie: {user_context.get('company_role', 'MEMBER')}
Członek Izby: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since', 'nieznana data')}
"""
# Reuse the existing static system_prompt from _query_ai() (lines 922-1134).
# NOTE: In implementation, extract that static prompt into a shared method or
# class attribute so _query_ai() and this method build from a single copy.
# The key point: _build_prompt() must return the same prompt string that
# _query_ai() would build, with user_identity prepended.
system_prompt = ""  # placeholder — filled by the extracted shared prompt
full_prompt = user_identity + system_prompt
return full_prompt
Implementation note: The actual implementation should refactor _query_ai() to call _build_prompt() internally, then the streaming method also calls _build_prompt(). This avoids prompt duplication.
- Step 4: Verify syntax
python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK"
- Step 5: Commit
git add nordabiz_chat.py blueprints/chat/routes.py
git commit -m "feat(nordagpt): add streaming SSE endpoint + send_message_stream method"
Task 9: Frontend streaming UI
Files:
- Modify: templates/chat.html
- Step 1: Add streaming sendMessage function
In templates/chat.html, replace the existing sendMessage() function (lines 2373-2454) with a streaming version:
async function sendMessage() {
const input = document.getElementById('messageInput');
const message = input.value.trim();
if (!message || isSending) return;
isSending = true;
document.getElementById('sendBtn').disabled = true;
input.value = '';
autoResizeTextarea();
// Add user message to chat
addMessage('user', message);
// Create conversation if needed
if (!currentConversationId) {
try {
const startRes = await fetch('/api/chat/start', {
method: 'POST',
headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
body: JSON.stringify({title: message.substring(0, 50)})
});
const startData = await startRes.json();
currentConversationId = startData.conversation_id;
} catch (e) {
addMessage('assistant', 'Błąd tworzenia konwersacji.');
isSending = false;
document.getElementById('sendBtn').disabled = false;
return;
}
}
// Add empty assistant bubble with thinking animation
const msgDiv = document.createElement('div');
msgDiv.className = 'message assistant';
msgDiv.innerHTML = `
<div class="message-avatar">AI</div>
<div class="message-content">
<div class="thinking-dots"><span>.</span><span>.</span><span>.</span></div>
</div>
`;
document.getElementById('chatMessages').appendChild(msgDiv);
scrollToBottom();
const contentDiv = msgDiv.querySelector('.message-content');
try {
const response = await fetch(`/api/chat/${currentConversationId}/message/stream`, {
method: 'POST',
headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
body: JSON.stringify({message: message, model: currentModel})
});
if (response.status === 429) {
contentDiv.innerHTML = '';
contentDiv.textContent = 'Przekroczono limit zapytań.';
showLimitBanner();
isSending = false;
document.getElementById('sendBtn').disabled = false;
return;
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
let fullText = '';
let thinkingRemoved = false;
let buffer = '';  // SSE frames can be split across network reads — buffer partial lines
while (true) {
const {done, value} = await reader.read();
if (done) break;
buffer += decoder.decode(value, {stream: true});
const lines = buffer.split('\n');
buffer = lines.pop();  // keep the trailing (possibly incomplete) line for the next read
for (const line of lines) {
if (!line.startsWith('data: ')) continue;
try {
const chunk = JSON.parse(line.slice(6));
if (chunk.type === 'thinking') {
// Keep thinking dots visible
continue;
}
if (chunk.type === 'token') {
if (!thinkingRemoved) {
contentDiv.innerHTML = '';
thinkingRemoved = true;
}
fullText += chunk.content;
contentDiv.innerHTML = formatMessage(fullText);
scrollToBottom();
}
if (chunk.type === 'done') {
// Add tech info badge
if (chunk.latency_ms) {
const badge = document.createElement('div');
badge.className = 'thinking-info-badge';
badge.textContent = `${chunk.model || 'AI'} · ${(chunk.latency_ms/1000).toFixed(1)}s`;
msgDiv.appendChild(badge);
}
loadConversations();
}
if (chunk.type === 'error') {
contentDiv.innerHTML = '';
contentDiv.textContent = chunk.content || 'Wystąpił błąd';
}
} catch (e) {
// Skip malformed chunks
}
}
}
} catch (e) {
contentDiv.innerHTML = '';
contentDiv.textContent = 'Błąd połączenia z serwerem.';
}
isSending = false;
document.getElementById('sendBtn').disabled = false;
}
- Step 2: Add CSS for thinking animation
In templates/chat.html, in the {% block extra_css %} section, add:
.thinking-dots {
display: flex;
gap: 4px;
padding: 8px 0;
}
.thinking-dots span {
animation: thinkBounce 1.4s infinite ease-in-out both;
font-size: 1.5rem;
color: var(--text-secondary);
}
.thinking-dots span:nth-child(1) { animation-delay: -0.32s; }
.thinking-dots span:nth-child(2) { animation-delay: -0.16s; }
.thinking-dots span:nth-child(3) { animation-delay: 0s; }
@keyframes thinkBounce {
0%, 80%, 100% { transform: scale(0); }
40% { transform: scale(1); }
}
- Step 3: Verify locally and commit
python3 -m py_compile app.py && echo "OK"
git add templates/chat.html
git commit -m "feat(nordagpt): streaming UI — word-by-word response with thinking animation"
Task 10: Deploy Phase 3 and verify streaming
- Step 1: Check Nginx/NPM config for SSE support
SSE requires Nginx to NOT buffer the response. The streaming endpoint sets X-Accel-Buffering: no header. Verify NPM custom config allows this:
ssh maciejpi@57.128.200.27 "cat /etc/nginx/sites-enabled/nordabiznes.conf 2>/dev/null || echo 'Using NPM proxy'"
If using NPM, the X-Accel-Buffering: no header should be sufficient. If not, add to NPM custom Nginx config for nordabiznes.pl:
proxy_buffering off;
proxy_cache off;
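As a reminder of what "sets X-Accel-Buffering: no" means on the application side, the streaming route's response should carry headers along these lines (a sketch; the actual route lives in blueprints/chat/routes.py):

```python
# Headers an SSE response needs so browsers and Nginx treat it as a live stream.
SSE_HEADERS = {
    'Content-Type': 'text/event-stream',   # SSE MIME type
    'Cache-Control': 'no-cache',           # never cache the stream
    'X-Accel-Buffering': 'no',             # tell Nginx not to buffer this response
}
```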
- Step 2: Push, deploy to staging, test streaming
git push origin master && git push inpi master
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
Test on staging: open chat, send message, verify text appears word-by-word.
- Step 3: Deploy to production
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
Phase 4: Persistent User Memory (Tasks 11-15)
Task 11: Database migration — memory tables
Files:
- Create: database/migrations/092_ai_user_memory.sql
- Create: database/migrations/093_ai_conversation_summary.sql
- Step 1: Create migration 092
-- 092_ai_user_memory.sql
-- Persistent memory for NordaGPT — per-user facts extracted from conversations
CREATE TABLE IF NOT EXISTS ai_user_memory (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
fact TEXT NOT NULL,
category VARCHAR(50) DEFAULT 'general',
source_conversation_id INTEGER REFERENCES ai_chat_conversations(id) ON DELETE SET NULL,
confidence FLOAT DEFAULT 1.0,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '12 months'),
is_active BOOLEAN DEFAULT TRUE
);
CREATE INDEX IF NOT EXISTS idx_ai_user_memory_user_active ON ai_user_memory(user_id, is_active, confidence DESC);
CREATE INDEX IF NOT EXISTS idx_ai_user_memory_expires ON ai_user_memory(expires_at) WHERE is_active = TRUE;
GRANT ALL ON TABLE ai_user_memory TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE ai_user_memory_id_seq TO nordabiz_app;
- Step 2: Create migration 093
-- 093_ai_conversation_summary.sql
-- Auto-generated summaries of AI conversations for memory context
CREATE TABLE IF NOT EXISTS ai_conversation_summary (
id SERIAL PRIMARY KEY,
conversation_id INTEGER NOT NULL UNIQUE REFERENCES ai_chat_conversations(id) ON DELETE CASCADE,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
summary TEXT NOT NULL,
key_topics JSONB DEFAULT '[]',
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_ai_conv_summary_user ON ai_conversation_summary(user_id, created_at DESC);
GRANT ALL ON TABLE ai_conversation_summary TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE ai_conversation_summary_id_seq TO nordabiz_app;
- Step 3: Commit migrations
git add database/migrations/092_ai_user_memory.sql database/migrations/093_ai_conversation_summary.sql
git commit -m "feat(nordagpt): add migrations for user memory and conversation summary tables"
Task 12: Add SQLAlchemy models
Files:
- Modify: database.py (insert before line 5954)
- Step 1: Add AIUserMemory model
Insert before the # DATABASE INITIALIZATION comment (line 5954):
class AIUserMemory(Base):
__tablename__ = 'ai_user_memory'
id = Column(Integer, primary_key=True)
user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False)
fact = Column(Text, nullable=False)
category = Column(String(50), default='general')
source_conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='SET NULL'), nullable=True)
confidence = Column(Float, default=1.0)
created_at = Column(DateTime, default=datetime.utcnow)
expires_at = Column(DateTime)
is_active = Column(Boolean, default=True)
user = relationship('User')
source_conversation = relationship('AIChatConversation')
class AIConversationSummary(Base):
__tablename__ = 'ai_conversation_summary'
id = Column(Integer, primary_key=True)
conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='CASCADE'), nullable=False, unique=True)
user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False)
summary = Column(Text, nullable=False)
key_topics = Column(JSON, default=list)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow)
user = relationship('User')
conversation = relationship('AIChatConversation')
- Step 2: Verify syntax
python3 -m py_compile database.py && echo "OK"
- Step 3: Commit
git add database.py
git commit -m "feat(nordagpt): add AIUserMemory and AIConversationSummary ORM models"
Task 13: Create memory_service.py
Files:
- Create: memory_service.py
- Step 1: Create memory_service.py
"""
Memory Service for NordaGPT
=============================
Manages persistent per-user memory: fact extraction, storage, retrieval, cleanup.
"""
import json
import logging
from datetime import datetime, timedelta
from typing import Dict, Any, List, Optional
from database import SessionLocal, AIUserMemory, AIConversationSummary, AIChatMessage
logger = logging.getLogger(__name__)
EXTRACT_FACTS_PROMPT = """Na podstawie tej rozmowy wyciągnij kluczowe fakty o użytkowniku {user_name} ({company_name}).
Rozmowa:
{conversation_text}
Istniejące fakty (NIE DUPLIKUJ):
{existing_facts}
Zwróć TYLKO JSON array (bez markdown):
[{{"fact": "...", "category": "interests|needs|contacts|insights"}}]
Zasady:
- Tylko nowe, nietrywialne fakty przydatne w przyszłych rozmowach
- Nie zapisuj: "zapytał o firmę X" (to za mało)
- Zapisuj: "szuka podwykonawców do projektu PEJ w branży elektrycznej"
- Max 3 fakty. Jeśli nie ma nowych faktów, zwróć []
- Kategorie: interests (zainteresowania), needs (potrzeby biznesowe), contacts (kontakty), insights (wnioski/preferencje)
"""
SUMMARIZE_PROMPT = """Podsumuj tę rozmowę w 1-3 zdaniach. Skup się na tym, czego użytkownik szukał i co ustalono.
Rozmowa:
{conversation_text}
Zwróć TYLKO JSON (bez markdown):
{{"summary": "...", "key_topics": ["temat1", "temat2"]}}
"""
def get_user_memory(user_id: int, limit: int = 10) -> List[Dict]:
"""Get active memory facts for a user, sorted by recency and confidence."""
db = SessionLocal()
try:
facts = db.query(AIUserMemory).filter(
AIUserMemory.user_id == user_id,
AIUserMemory.is_active == True,
AIUserMemory.expires_at > datetime.now()
).order_by(
AIUserMemory.confidence.desc(),
AIUserMemory.created_at.desc()
).limit(limit).all()
return [
{
'id': f.id,
'fact': f.fact,
'category': f.category,
'confidence': f.confidence,
'created_at': f.created_at.isoformat()
}
for f in facts
]
finally:
db.close()
def get_conversation_summaries(user_id: int, limit: int = 5) -> List[Dict]:
"""Get recent conversation summaries for a user."""
db = SessionLocal()
try:
summaries = db.query(AIConversationSummary).filter(
AIConversationSummary.user_id == user_id
).order_by(
AIConversationSummary.created_at.desc()
).limit(limit).all()
return [
{
'summary': s.summary,
'topics': s.key_topics or [],
'date': s.created_at.strftime('%Y-%m-%d')
}
for s in summaries
]
finally:
db.close()
def format_memory_for_prompt(user_id: int) -> str:
"""Format user memory and summaries for injection into AI prompt."""
facts = get_user_memory(user_id)
summaries = get_conversation_summaries(user_id)
if not facts and not summaries:
return ""
parts = ["\n# PAMIĘĆ O UŻYTKOWNIKU"]
if facts:
parts.append("Znane fakty:")
for f in facts:
parts.append(f"- [{f['category']}] {f['fact']}")
if summaries:
parts.append("\nOstatnie rozmowy:")
for s in summaries:
topics = ", ".join(s['topics'][:3]) if s['topics'] else ""
parts.append(f"- {s['date']}: {s['summary']}" + (f" (tematy: {topics})" if topics else ""))
parts.append("\nWykorzystuj tę wiedzę do personalizacji odpowiedzi. Nawiązuj do wcześniejszych rozmów gdy to naturalne.")
return "\n".join(parts)
def extract_facts_async(
conversation_id: int,
user_id: int,
user_context: Dict,
gemini_service
):
"""
Extract memory facts from a conversation. Run async after response is sent.
Uses Flash-Lite for minimal cost.
"""
db = SessionLocal()
try:
# Get conversation messages
messages = db.query(AIChatMessage).filter_by(
conversation_id=conversation_id
).order_by(AIChatMessage.created_at).all()
if len(messages) < 2:
return # Too short to extract
conversation_text = "\n".join([
f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content}"
for m in messages[-10:] # Last 10 messages
])
# Get existing facts to avoid duplicates
existing = db.query(AIUserMemory).filter(
AIUserMemory.user_id == user_id,
AIUserMemory.is_active == True
).all()
existing_text = "\n".join([f"- {f.fact}" for f in existing]) or "Brak"
prompt = EXTRACT_FACTS_PROMPT.format(
user_name=user_context.get('user_name', 'Nieznany'),
company_name=user_context.get('company_name', 'brak'),
conversation_text=conversation_text,
existing_facts=existing_text
)
response = gemini_service.generate_text(
prompt=prompt,
temperature=0.1,
max_tokens=300,
model='gemini-3.1-flash-lite-preview',
thinking_level='minimal',
feature='memory_extraction'
)
# Parse response
text = response.strip()
if text.startswith('```'):
text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()
facts = json.loads(text)
if not isinstance(facts, list):
return
saved = 0
for fact_data in facts[:3]:
if not fact_data.get('fact'):
continue
memory = AIUserMemory(
user_id=user_id,
fact=fact_data['fact'],
category=fact_data.get('category', 'general'),
source_conversation_id=conversation_id,
expires_at=datetime.now() + timedelta(days=365)
)
db.add(memory)
saved += 1
db.commit()
logger.info(f"Extracted {saved} memory facts for user {user_id}")
except Exception as e:
logger.warning(f"Memory extraction failed for conversation {conversation_id}: {e}")
db.rollback()
finally:
db.close()
def summarize_conversation_async(
conversation_id: int,
user_id: int,
gemini_service
):
"""Generate or update conversation summary. Run async."""
db = SessionLocal()
try:
messages = db.query(AIChatMessage).filter_by(
conversation_id=conversation_id
).order_by(AIChatMessage.created_at).all()
if len(messages) < 2:
return
conversation_text = "\n".join([
f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content[:200]}"
for m in messages[-10:]
])
prompt = SUMMARIZE_PROMPT.format(conversation_text=conversation_text)
response = gemini_service.generate_text(
prompt=prompt,
temperature=0.1,
max_tokens=200,
model='gemini-3.1-flash-lite-preview',
thinking_level='minimal',
feature='conversation_summary'
)
text = response.strip()
if text.startswith('```'):
text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()
result = json.loads(text)
existing = db.query(AIConversationSummary).filter_by(
conversation_id=conversation_id
).first()
if existing:
existing.summary = result.get('summary', existing.summary)
existing.key_topics = result.get('key_topics', existing.key_topics)
existing.updated_at = datetime.now()
else:
summary = AIConversationSummary(
conversation_id=conversation_id,
user_id=user_id,
summary=result.get('summary', ''),
key_topics=result.get('key_topics', [])
)
db.add(summary)
db.commit()
logger.info(f"Summarized conversation {conversation_id}")
except Exception as e:
logger.warning(f"Conversation summary failed for {conversation_id}: {e}")
db.rollback()
finally:
db.close()
def delete_user_fact(user_id: int, fact_id: int) -> bool:
"""Soft-delete a memory fact. Returns True if deleted."""
db = SessionLocal()
try:
fact = db.query(AIUserMemory).filter_by(id=fact_id, user_id=user_id).first()
if fact:
fact.is_active = False
db.commit()
return True
return False
finally:
db.close()
- Step 2: Verify syntax
python3 -m py_compile memory_service.py && echo "OK"
- Step 3: Commit
git add memory_service.py
git commit -m "feat(nordagpt): add memory_service.py — fact extraction, summaries, CRUD"
Task 14: Integrate memory into chat flow
Files:
- Modify: nordabiz_chat.py
- Modify: blueprints/chat/routes.py
- Step 1: Inject memory into system prompt
In nordabiz_chat.py, in the _build_prompt() or _query_ai() method, after the user identity block and before the data sections, add memory:
from memory_service import format_memory_for_prompt
# After user_identity block, before data injection:
user_memory_text = ""
if user_context and user_context.get('user_id'):
user_memory_text = format_memory_for_prompt(user_context['user_id'])
# Prepend to system prompt:
system_prompt = user_identity + user_memory_text + f"""Jesteś pomocnym asystentem..."""
- Step 2: Trigger async memory extraction after response
In send_message() and send_message_stream(), after saving the AI response, trigger async extraction using threading:
import threading
from memory_service import extract_facts_async, summarize_conversation_async
# After saving AI response to DB (end of send_message/send_message_stream):
# Async memory extraction — don't block the response.
# Capture a plain int BEFORE the session closes: the ORM `conversation`
# object becomes detached and must not be accessed from the worker thread.
message_count = conversation.message_count or 0
def _extract_memory():
extract_facts_async(conversation_id, user_id, user_context, self.gemini_service)
# Summarize every 5 messages
if message_count % 5 == 0:
summarize_conversation_async(conversation_id, user_id, self.gemini_service)
threading.Thread(target=_extract_memory, daemon=True).start()
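One caveat with a bare daemon thread: an exception raised inside it vanishes silently (extract_facts_async catches its own errors, but anything failing before it runs would go unlogged). A small guard wrapper worth considering — the helper name is hypothetical, not existing code:

```python
import logging
import threading

logger = logging.getLogger(__name__)

def run_in_background(fn, *args):
    """Run fn(*args) on a daemon thread; log (never raise) any failure."""
    def _wrapped():
        try:
            fn(*args)
        except Exception:
            # logger.exception records the full traceback at ERROR level
            logger.exception("Background memory task failed")
    t = threading.Thread(target=_wrapped, daemon=True)
    t.start()
    return t
```

Returning the Thread object also makes the pattern testable (join in tests, fire-and-forget in production).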
- Step 3: Add memory CRUD API routes
In blueprints/chat/routes.py, add routes for viewing and deleting memory:
@bp.route('/api/chat/memory', methods=['GET'])
@login_required
@member_required
def get_user_memory_api():
"""Get current user's NordaGPT memory facts and summaries"""
from memory_service import get_user_memory, get_conversation_summaries
return jsonify({
'facts': get_user_memory(current_user.id, limit=20),
'summaries': get_conversation_summaries(current_user.id, limit=10)
})
@bp.route('/api/chat/memory/<int:fact_id>', methods=['DELETE'])
@login_required
@member_required
def delete_memory_fact(fact_id):
"""Delete a memory fact"""
from memory_service import delete_user_fact
if delete_user_fact(current_user.id, fact_id):
return jsonify({'status': 'ok'})
return jsonify({'error': 'Nie znaleziono'}), 404
- Step 4: Verify syntax
python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK"
- Step 5: Commit
git add nordabiz_chat.py blueprints/chat/routes.py
git commit -m "feat(nordagpt): integrate memory into chat — injection, async extraction, CRUD API"
Task 15: Deploy Phase 4 — migrations + code
- Step 1: Push to remotes
git push origin master && git push inpi master
- Step 2: Deploy to staging with migrations
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull"
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql"
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql"
ssh maciejpi@10.22.68.248 "sudo systemctl restart nordabiznes"
- Step 3: Test on staging
- Open chat, have a conversation about looking for IT companies
- Open another chat, ask "o czym rozmawialiśmy?" — verify AI mentions previous topics
- Check memory API: curl https://staging.nordabiznes.pl/api/chat/memory (with auth)
- Verify facts are extracted
- Step 4: Deploy to production
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull"
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql"
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql"
ssh maciejpi@57.128.200.27 "sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
- Step 5: Update release notes
Add entry in blueprints/public/routes.py _get_releases().
Post-Implementation Checklist
- Verify AI greets users by name
- Verify Smart Router logs show correct classification
- Verify streaming works on mobile (Android + iOS)
- Verify memory facts are extracted after conversations
- Verify memory is private (user A cannot see user B's facts)
- Verify response times: simple <3s, medium <6s, complex <12s
- Monitor costs for first week — compare with estimates
- Send message to Jakub Pornowski confirming speed improvements