

# NordaGPT Identity, Memory & Performance — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Transform NordaGPT from an anonymous chatbot into a personalized assistant with user identity, persistent memory, smart routing, and streaming responses.
**Architecture:** Four-phase rollout: (1) inject user identity into AI prompt, (2) smart router + selective context loading, (3) streaming SSE responses, (4) persistent user memory with async extraction. Each phase is independently deployable and testable.
**Tech Stack:** Flask 3.0, SQLAlchemy 2.0, PostgreSQL, Google Gemini API (3-Flash, 3.1-Flash-Lite), Server-Sent Events, Jinja2 inline JS.
**Spec:** `docs/superpowers/specs/2026-03-28-nordagpt-identity-memory-design.md`
---
## File Structure
### New files
| File | Responsibility |
|------|---------------|
| `smart_router.py` | Classifies query complexity, selects data categories and model |
| `memory_service.py` | CRUD for user memory facts + conversation summaries, extraction prompt |
| `context_builder.py` | Loads selective data from DB based on router decision |
| `database/migrations/092_ai_user_memory.sql` | Memory + summary tables |
| `database/migrations/093_ai_conversation_summary.sql` | Summary table |
### Modified files
| File | Changes |
|------|---------|
| `database.py` | Add AIUserMemory, AIConversationSummary models (before line 5954) |
| `nordabiz_chat.py` | Accept user_context, integrate router, selective context, memory injection |
| `gemini_service.py` | Token counting for streamed responses |
| `blueprints/chat/routes.py` | Build user_context, add streaming endpoint, memory CRUD routes |
| `templates/chat.html` | Streaming UI, thinking animation, memory settings panel |
---
## Phase 1: User Identity (Tasks 1-3)
### Task 1: Pass user context from route to chat engine
**Files:**
- Modify: `blueprints/chat/routes.py:234-309`
- Modify: `nordabiz_chat.py:163-180`
- [ ] **Step 1: Build user_context dict in chat route**
In `blueprints/chat/routes.py`, modify `chat_send_message()`. After line 262 (where `current_user.id` and `current_user.email` are used for limit check), add user_context construction:
```python
# After line 262, before line 268
# Build user context for AI personalization
user_context = {
    'user_id': current_user.id,
    'user_name': current_user.name,
    'user_email': current_user.email,
    'company_name': current_user.company.name if current_user.company else None,
    'company_id': current_user.company.id if current_user.company else None,
    'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
    'company_role': current_user.company_role or 'MEMBER',
    'is_norda_member': current_user.is_norda_member,
    'chamber_role': current_user.chamber_role,
    'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
}
```
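The `if … else None` guards above let the context degrade gracefully for users without a linked company or creation date. A minimal standalone sketch (the `build_user_context` helper and the stub objects are illustrative, not part of the plan):

```python
from types import SimpleNamespace
from datetime import datetime

def build_user_context(user):
    # Hypothetical mirror of the route code above, with the same None-guards.
    return {
        'user_name': user.name,
        'company_name': user.company.name if user.company else None,
        'member_since': user.created_at.strftime('%Y-%m-%d') if user.created_at else None,
    }

# A user with no linked company and no created_at degrades to Nones:
u1 = SimpleNamespace(name='Jan Kowalski', company=None, created_at=None)
ctx1 = build_user_context(u1)

# A fully populated user yields the formatted values:
u2 = SimpleNamespace(name='Anna', company=SimpleNamespace(name='TERMO'),
                     created_at=datetime(2024, 5, 1))
ctx2 = build_user_context(u2)
```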
- [ ] **Step 2: Pass user_context to send_message()**
In the same function, modify the `chat_engine.send_message()` call (around line 282):
```python
# Before:
ai_response = chat_engine.send_message(
    conversation_id,
    user_message=message,
    user_id=current_user.id,
    thinking_level=thinking_level
)

# After:
ai_response = chat_engine.send_message(
    conversation_id,
    user_message=message,
    user_id=current_user.id,
    thinking_level=thinking_level,
    user_context=user_context
)
```
- [ ] **Step 3: Update send_message() signature in nordabiz_chat.py**
In `nordabiz_chat.py`, modify `send_message()` at line 163:
```python
# Before:
def send_message(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    thinking_level: str = 'high'
) -> AIChatMessage:

# After:
def send_message(
    self,
    conversation_id: int,
    user_message: str,
    user_id: int,
    thinking_level: str = 'high',
    user_context: Optional[Dict[str, Any]] = None
) -> AIChatMessage:
```
Add `from typing import Optional, Dict, Any` to imports if not already present.
- [ ] **Step 4: Thread user_context through to _query_ai()**
In `send_message()`, find the call to `_query_ai()` (around line 239) and add user_context:
```python
# Before:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level)
# After:
ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)
```
- [ ] **Step 5: Update _query_ai() signature**
In `nordabiz_chat.py`, modify `_query_ai()` at line 890:
```python
# Before:
def _query_ai(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_id: Optional[int] = None,
    thinking_level: str = 'high'
) -> str:

# After:
def _query_ai(
    self,
    context: Dict[str, Any],
    user_message: str,
    user_id: Optional[int] = None,
    thinking_level: str = 'high',
    user_context: Optional[Dict[str, Any]] = None
) -> str:
```
- [ ] **Step 6: Commit**
```bash
git add blueprints/chat/routes.py nordabiz_chat.py
git commit -m "refactor(chat): thread user_context from route through to _query_ai"
```
---
### Task 2: Inject user identity into system prompt
**Files:**
- Modify: `nordabiz_chat.py:920-930`
- [ ] **Step 1: Add user identity block to system prompt**
In `nordabiz_chat.py`, inside `_query_ai()`, find line ~922 where `system_prompt` starts. Insert the user identity block BEFORE the main system prompt string (after line 921, before line 922):
```python
# Build user identity section
user_identity = ""
if user_context:
    user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}
Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')}
Rola w firmie: {user_context.get('company_role', 'MEMBER')}
Członek Izby Norda Biznes: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since', 'nieznana data')}

ZASADY PERSONALIZACJI:
- Zwracaj się do użytkownika po imieniu (pierwsze słowo z imienia i nazwiska)
- W pierwszej wiadomości konwersacji przywitaj się: "Cześć [imię], w czym mogę pomóc?"
- Na pytania "co wiesz o mnie?" / "kim jestem?" — wypisz powyższe dane + powiązania firmowe z bazy
- Uwzględniaj kontekst firmy użytkownika w odpowiedziach (np. sugeruj partnerów z komplementarnych branż)
- NIE ujawniaj danych technicznych (user_id, company_id, rola systemowa)
"""
```
- [ ] **Step 2: Prepend user_identity to system_prompt**
Find where `system_prompt` is first assigned (line 922) and prepend:
```python
# Line 922 area - the system_prompt f-string starts here
system_prompt = user_identity + f"""Jesteś pomocnym asystentem portalu Norda Biznes...
```
This is a minimal change — just concatenate `user_identity` (which is empty string if no context) before the existing prompt.
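Because `user_identity` defaults to the empty string, the concatenation is a no-op for anonymous calls. A trivial standalone check (the `build_prompt` helper is illustrative only):

```python
def build_prompt(user_identity: str, base_prompt: str) -> str:
    # Prepending an empty identity block leaves the base prompt byte-identical.
    return user_identity + base_prompt

base = "Jesteś pomocnym asystentem portalu Norda Biznes..."
assert build_prompt("", base) == base  # no user_context → prompt unchanged
personalized = build_prompt("# AKTUALNY UŻYTKOWNIK\nRozmawiasz z: Jan\n", base)
assert personalized.startswith("# AKTUALNY")
```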
- [ ] **Step 3: Verify syntax compiles**
```bash
python3 -m py_compile nordabiz_chat.py && echo "OK"
```
- [ ] **Step 4: Test locally**
Start local dev server and send a chat message. Verify in logs that the prompt now contains the user identity block. Check that the AI greets by name.
```bash
python3 app.py
# In another terminal:
curl -X POST http://localhost:5000/api/chat/1/message \
-H "Content-Type: application/json" \
-d '{"message": "Kim jestem?"}'
```
(Note: requires auth cookie — easier to test via browser)
- [ ] **Step 5: Commit**
```bash
git add nordabiz_chat.py
git commit -m "feat(nordagpt): inject user identity into AI system prompt — personalized greetings and context"
```
---
### Task 3: Deploy Phase 1 and verify
**Files:** None (deployment only)
- [ ] **Step 1: Push to remotes**
```bash
git push origin master && git push inpi master
```
- [ ] **Step 2: Deploy to staging**
```bash
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
```
- [ ] **Step 3: Test on staging — verify AI greets by name**
Open https://staging.nordabiznes.pl/chat, start new conversation, type "Cześć". Verify AI responds with your name.
Type "Co wiesz o mnie?" — verify AI lists your profile data.
- [ ] **Step 4: Deploy to production**
```bash
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
```
- [ ] **Step 5: Commit deployment notes (update release_notes in routes.py)**
Add new release entry in `blueprints/public/routes.py` `_get_releases()` function.
---
## Phase 2: Smart Router + Context Builder (Tasks 4-7)
### Task 4: Create context_builder.py — selective data loading
**Files:**
- Create: `context_builder.py`
- [ ] **Step 1: Create context_builder.py with selective loading functions**
```python
"""
Context Builder for NordaGPT Smart Router
==========================================
Loads only the data categories requested by the Smart Router,
instead of loading everything for every query.
"""
import json
import logging
from typing import Dict, Any, List, Optional
from datetime import datetime, timedelta
from database import (
SessionLocal, Company, Category, CompanyRecommendation,
NordaEvent, Classified, ForumTopic, ForumReply,
CompanyPerson, Person, User, CompanySocialMedia,
GBPAudit, CompanyWebsiteAnalysis, ZOPKNews,
UserCompanyPermissions
)
from sqlalchemy import func, desc
logger = logging.getLogger(__name__)
def _company_to_compact_dict(company) -> Dict:
"""Convert company to compact dict for AI context. Mirrors nordabiz_chat.py format."""
return {
'name': company.name,
'cat': company.category.name if company.category else None,
'profile': f'/firma/{company.slug}',
'desc': company.description_short,
'about': company.description_full[:500] if company.description_full else None,
'svc': company.services,
'comp': company.competencies,
'web': company.website,
'tel': company.phone,
'mail': company.email,
'city': company.city,
}
def build_selective_context(
data_needed: List[str],
conversation_id: int,
current_message: str,
user_context: Optional[Dict] = None
) -> Dict[str, Any]:
"""
Build AI context with only the requested data categories.
Args:
data_needed: List of category strings from Smart Router, e.g.:
["companies_all", "companies_filtered:IT", "companies_single:termo",
"events", "news", "classifieds", "forum", "company_people",
"registered_users", "social_media", "audits"]
conversation_id: Current conversation ID for history
current_message: User's message text
user_context: User identity dict
Returns:
Context dict compatible with nordabiz_chat.py _query_ai()
"""
db = SessionLocal()
context = {}
try:
# Always load: basic stats and conversation history
active_companies = db.query(Company).filter_by(status='active').all()
context['total_companies'] = len(active_companies)
categories = db.query(Category).all()
context['categories'] = [
{'name': c.name, 'slug': c.slug, 'company_count': len([co for co in active_companies if co.category_id == c.id])}
for c in categories
]
# Conversation history (always loaded)
from database import AIChatMessage, AIChatConversation
messages = db.query(AIChatMessage).filter_by(
conversation_id=conversation_id
).order_by(AIChatMessage.created_at.desc()).limit(10).all()
context['recent_messages'] = [
{'role': msg.role, 'content': msg.content}
for msg in reversed(messages)
]
# Selective data loading based on router decision
for category in data_needed:
if category == 'companies_all':
context['all_companies'] = [_company_to_compact_dict(c) for c in active_companies]
elif category.startswith('companies_filtered:'):
filter_cat = category.split(':', 1)[1]
filtered = [c for c in active_companies
if c.category and c.category.name.lower() == filter_cat.lower()]
context['all_companies'] = [_company_to_compact_dict(c) for c in filtered]
elif category.startswith('companies_single:'):
search = category.split(':', 1)[1].lower()
matched = [c for c in active_companies
if search in c.name.lower() or search in (c.slug or '')]
context['all_companies'] = [_company_to_compact_dict(c) for c in matched[:5]]
elif category == 'events':
events = db.query(NordaEvent).filter(
NordaEvent.event_date >= datetime.now(),
NordaEvent.event_date <= datetime.now() + timedelta(days=60)
).order_by(NordaEvent.event_date).all()
context['upcoming_events'] = [
{'title': e.title, 'date': str(e.event_date), 'type': e.event_type,
'location': e.location, 'url': f'/kalendarz/{e.id}'}
for e in events
]
elif category == 'news':
news = db.query(ZOPKNews).filter(
ZOPKNews.published_at >= datetime.now() - timedelta(days=30),
ZOPKNews.status == 'approved'
).order_by(ZOPKNews.published_at.desc()).limit(10).all()
context['recent_news'] = [
{'title': n.title, 'summary': n.ai_summary, 'date': str(n.published_at),
'source': n.source_name, 'url': n.source_url}
for n in news
]
elif category == 'classifieds':
classifieds = db.query(Classified).filter(
Classified.status == 'active',
Classified.is_test == False
).order_by(Classified.created_at.desc()).limit(20).all()
context['classifieds'] = [
{'type': c.listing_type, 'title': c.title, 'description': c.description,
'company': c.company.name if c.company else None,
'budget': c.budget_text, 'url': f'/b2b/{c.id}'}
for c in classifieds
]
elif category == 'forum':
topics = db.query(ForumTopic).filter(
ForumTopic.is_test == False
).order_by(ForumTopic.created_at.desc()).limit(15).all()
context['forum_topics'] = [
{'title': t.title, 'content': t.content[:300],
'author': t.author.name if t.author else None,
'replies': t.reply_count, 'url': f'/forum/{t.slug}'}
for t in topics
]
elif category == 'company_people':
people_query = db.query(CompanyPerson).join(Person).join(Company).filter(
Company.status == 'active'
).all()
grouped = {}
for cp in people_query:
cname = cp.company.name
if cname not in grouped:
grouped[cname] = []
grouped[cname].append({
'name': cp.person.name,
'role': cp.role_description,
'shares': cp.shares_value
})
context['company_people'] = grouped
elif category == 'registered_users':
users = db.query(User).filter(
User.is_active == True,
User.company_id.isnot(None)
).all()
grouped = {}
for u in users:
cname = u.company.name if u.company else 'Brak firmy'
if cname not in grouped:
grouped[cname] = []
grouped[cname].append({
'name': u.name, 'email': u.email,
'role': u.company_role, 'member': u.is_norda_member
})
context['registered_users'] = grouped
elif category == 'social_media':
socials = db.query(CompanySocialMedia).filter_by(is_valid=True).all()
grouped = {}
for s in socials:
cname = s.company.name if s.company else 'Unknown'
if cname not in grouped:
grouped[cname] = []
grouped[cname].append({
'platform': s.platform, 'url': s.url,
'followers': s.followers_count
})
context['company_social_media'] = grouped
elif category == 'audits':
# GBP audits
gbp = db.query(GBPAudit).order_by(GBPAudit.created_at.desc()).all()
seen = set()
gbp_unique = []
for g in gbp:
if g.company_id not in seen:
seen.add(g.company_id)
gbp_unique.append({
'company': g.company.name if g.company else None,
'score': g.overall_score, 'reviews': g.total_reviews,
'rating': g.average_rating
})
context['gbp_audits'] = gbp_unique
# SEO audits
seo = db.query(CompanyWebsiteAnalysis).all()
context['seo_audits'] = [
{'company': s.company.name if s.company else None,
'seo': s.seo_score, 'performance': s.performance_score}
for s in seo
]
# If no companies were loaded by any category, load a minimal summary
if 'all_companies' not in context:
context['all_companies'] = []
finally:
db.close()
return context
```
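The `name` vs `name:argument` convention used by `build_selective_context()` can be illustrated with a small standalone parser (`parse_category` is a hypothetical helper, not part of the planned module):

```python
def parse_category(token: str):
    """Split a router category token into (kind, argument).

    Mirrors the split(':', 1) convention in build_selective_context():
    'events' has no argument; 'companies_filtered:IT' carries one.
    """
    if ':' in token:
        kind, arg = token.split(':', 1)
        return kind, arg
    return token, None

assert parse_category('events') == ('events', None)
assert parse_category('companies_filtered:IT') == ('companies_filtered', 'IT')
# Only the first colon is significant; the argument may contain more:
assert parse_category('companies_single:a:b') == ('companies_single', 'a:b')
```

Using `split(':', 1)` rather than `split(':')` is what keeps arguments with embedded colons intact.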
- [ ] **Step 2: Verify syntax**
```bash
python3 -m py_compile context_builder.py && echo "OK"
```
- [ ] **Step 3: Commit**
```bash
git add context_builder.py
git commit -m "feat(nordagpt): add context_builder.py — selective data loading for smart router"
```
---
### Task 5: Create smart_router.py — query classification
**Files:**
- Create: `smart_router.py`
- [ ] **Step 1: Create smart_router.py**
```python
"""
Smart Router for NordaGPT
==========================
Classifies query complexity and selects which data categories to load.
Uses Gemini 3.1 Flash-Lite for fast, cheap classification (~1-2s).
"""
import json
import logging
import time
from typing import Dict, Any, List, Optional
logger = logging.getLogger(__name__)
# Keyword-based fast routing (no API call needed)
FAST_ROUTES = {
'companies_all': ['wszystkie firmy', 'ile firm', 'lista firm', 'katalog', 'porównaj firmy'],
'events': ['wydarzenie', 'spotkanie', 'kalendarz', 'konferencja', 'szkolenie', 'kiedy'],
'news': ['aktualności', 'nowości', 'wiadomości', 'pej', 'atom', 'elektrownia', 'zopk'],
'classifieds': ['ogłoszenie', 'b2b', 'zlecenie', 'oferta', 'szukam', 'oferuję'],
'forum': ['forum', 'dyskusja', 'temat', 'wątek', 'post'],
'company_people': ['zarząd', 'krs', 'właściciel', 'prezes', 'udziały', 'wspólnik'],
'registered_users': ['użytkownik', 'kto jest', 'profil', 'zarejestrowany', 'członek'],
'social_media': ['facebook', 'instagram', 'linkedin', 'social media', 'media społeczn'],
'audits': ['seo', 'google', 'gbp', 'opinie', 'ocena', 'pageSpeed'],
}
# Model selection by complexity
MODEL_MAP = {
'simple': {'model': '3.1-flash-lite', 'thinking': 'minimal'},
'medium': {'model': '3-flash', 'thinking': 'low'},
'complex': {'model': '3-flash', 'thinking': 'high'},
}
ROUTER_PROMPT = """Jesteś routerem zapytań. Przeanalizuj pytanie i zdecyduj jakie dane potrzebne.
Użytkownik: {user_name} z firmy {company_name}
Pytanie: {message}
Zwróć TYLKO JSON (bez markdown):
{{
"complexity": "simple|medium|complex",
"data_needed": ["lista kategorii z poniższych"]
}}
Kategorie:
- companies_all — wszystkie firmy (porównania, przeglądy, "ile firm")
- companies_filtered:KATEGORIA — firmy z kategorii (np. companies_filtered:IT)
- companies_single:NAZWA — jedna firma (np. companies_single:termo)
- events — nadchodzące wydarzenia
- news — aktualności, PEJ, ZOPK
- classifieds — ogłoszenia B2B
- forum — tematy forum
- company_people — zarząd, KRS, udziałowcy
- registered_users — użytkownicy portalu
- social_media — profile social media firm
- audits — wyniki SEO/GBP
Zasady:
- "simple" = jedno pytanie o konkretną rzecz (telefon, adres, link)
- "medium" = porównanie, lista, filtrowanie
- "complex" = analiza, strategia, rekomendacje
- Wybierz MINIMUM kategorii. Nie ładuj niepotrzebnych danych.
- Jeśli pytanie dotyczy konkretnej firmy, użyj companies_single:nazwa
- Pytania ogólne o użytkownika (kim jestem, co wiesz) = [] (dane z profilu wystarczą)
"""
def route_query_fast(message: str, user_context: Optional[Dict] = None) -> Dict[str, Any]:
"""
Fast keyword-based routing. No API call.
Returns routing decision or None if uncertain (needs AI router).
"""
msg_lower = message.lower()
# Check for personal questions — no data needed
personal_patterns = ['kim jestem', 'co wiesz o mnie', 'mój profil', 'moje dane']
if any(p in msg_lower for p in personal_patterns):
return {
'complexity': 'simple',
'data_needed': [],
'model': '3.1-flash-lite',
'thinking': 'minimal',
'routed_by': 'fast'
}
# Check for greetings — no data needed
greeting_patterns = ['cześć', 'hej', 'witam', 'dzień dobry', 'siema', 'hello']
if any(msg_lower.strip().startswith(p) for p in greeting_patterns) and len(message) < 30:
return {
'complexity': 'simple',
'data_needed': [],
'model': '3.1-flash-lite',
'thinking': 'minimal',
'routed_by': 'fast'
}
# Check keyword matches
matched_categories = []
for category, keywords in FAST_ROUTES.items():
if any(kw in msg_lower for kw in keywords):
matched_categories.append(category)
# Check for specific company name mention
# Simple heuristic: if message has quotes or specific capitalized words
if not matched_categories:
# Can't determine — return None to trigger AI router
return None
# Determine complexity
if len(matched_categories) <= 1 and len(message) < 80:
complexity = 'simple'
elif len(matched_categories) <= 2:
complexity = 'medium'
else:
complexity = 'complex'
model_config = MODEL_MAP[complexity]
return {
'complexity': complexity,
'data_needed': matched_categories,
'model': model_config['model'],
'thinking': model_config['thinking'],
'routed_by': 'fast'
}
def route_query_ai(
message: str,
user_context: Optional[Dict] = None,
gemini_service=None
) -> Dict[str, Any]:
"""
AI-powered routing using Flash-Lite. Called when fast routing is uncertain.
"""
if not gemini_service:
# Fallback: load everything
return _fallback_route()
user_name = user_context.get('user_name', 'Nieznany') if user_context else 'Nieznany'
company_name = user_context.get('company_name', 'brak') if user_context else 'brak'
prompt = ROUTER_PROMPT.format(
user_name=user_name,
company_name=company_name,
message=message
)
try:
start = time.time()
response = gemini_service.generate_text(
prompt=prompt,
temperature=0.1,
max_tokens=200,
model='gemini-3.1-flash-lite-preview',
thinking_level='minimal',
feature='smart_router'
)
latency = int((time.time() - start) * 1000)
logger.info(f"Smart Router AI response in {latency}ms: {response[:200]}")
# Parse JSON from response
# Handle potential markdown wrapping
text = response.strip()
if text.startswith('```'):
text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()
result = json.loads(text)
complexity = result.get('complexity', 'medium')
model_config = MODEL_MAP.get(complexity, MODEL_MAP['medium'])
return {
'complexity': complexity,
'data_needed': result.get('data_needed', []),
'model': model_config['model'],
'thinking': model_config['thinking'],
'routed_by': 'ai',
'router_latency_ms': latency
}
except (json.JSONDecodeError, KeyError, Exception) as e:
logger.warning(f"Smart Router AI failed: {e}, falling back to full context")
return _fallback_route()
def route_query(
message: str,
user_context: Optional[Dict] = None,
gemini_service=None
) -> Dict[str, Any]:
"""
Main entry point. Tries fast routing first, falls back to AI routing.
"""
# Try fast keyword-based routing
result = route_query_fast(message, user_context)
if result is not None:
logger.info(f"Smart Router FAST: complexity={result['complexity']}, data={result['data_needed']}")
return result
# Fall back to AI routing
result = route_query_ai(message, user_context, gemini_service)
logger.info(f"Smart Router AI: complexity={result['complexity']}, data={result['data_needed']}")
return result
def _fallback_route() -> Dict[str, Any]:
"""Fallback: load everything, use default model. Safe but slow."""
return {
'complexity': 'medium',
'data_needed': [
'companies_all', 'events', 'news', 'classifieds',
'forum', 'company_people', 'registered_users'
],
'model': '3-flash',
'thinking': 'low',
'routed_by': 'fallback'
}
```
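The markdown-stripping JSON parse in `route_query_ai()` is worth exercising in isolation, since model output may or may not arrive fenced. A self-contained sketch of that logic (`parse_router_json` is an illustrative stand-in, not a planned function):

```python
import json

def parse_router_json(raw: str) -> dict:
    """Strip an optional ``` fence before parsing, as route_query_ai() does."""
    text = raw.strip()
    if text.startswith('```'):
        # Drop the opening fence line and the trailing fence, keep the body
        text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()
    return json.loads(text)

plain = '{"complexity": "simple", "data_needed": []}'
fenced = '```json\n{"complexity": "medium", "data_needed": ["events"]}\n```'
assert parse_router_json(plain)['complexity'] == 'simple'
assert parse_router_json(fenced)['data_needed'] == ['events']
```

Both shapes parse to the same dict, which is why the router can treat fenced and bare JSON responses uniformly.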
- [ ] **Step 2: Verify syntax**
```bash
python3 -m py_compile smart_router.py && echo "OK"
```
- [ ] **Step 3: Commit**
```bash
git add smart_router.py
git commit -m "feat(nordagpt): add smart_router.py — fast keyword routing + AI fallback"
```
---
### Task 6: Integrate Smart Router into nordabiz_chat.py
**Files:**
- Modify: `nordabiz_chat.py:163-282, 347-643, 890-1365`
- [ ] **Step 1: Add imports at top of nordabiz_chat.py**
After existing imports (around line 30), add:
```python
from smart_router import route_query
from context_builder import build_selective_context
```
- [ ] **Step 2: Modify send_message() to use Smart Router**
In `send_message()`, replace the call to `_build_conversation_context()` and `_query_ai()` (around lines 236-239). The key change: use the router to decide model and data, then use context_builder for selective loading.
Find the section where context is built and AI is queried (around lines 236-241):
```python
# Before (approximately lines 236-241):
# context = self._build_conversation_context(db, conversation, original_message)
# ai_response_text = self._query_ai(context, original_message, user_id=user_id, thinking_level=thinking_level, user_context=user_context)

# After:
# Smart Router — classify query and select data + model
route_decision = route_query(
    message=original_message,
    user_context=user_context,
    gemini_service=self.gemini_service
)

# Override model and thinking based on router decision
effective_model = route_decision.get('model', '3-flash')
effective_thinking = route_decision.get('thinking', thinking_level)

# Build selective context (only requested data categories)
context = build_selective_context(
    data_needed=route_decision.get('data_needed', []),
    conversation_id=conversation.id,
    current_message=original_message,
    user_context=user_context
)

# Use the original _query_ai but with router-selected parameters
ai_response_text = self._query_ai(
    context, original_message,
    user_id=user_id,
    thinking_level=effective_thinking,
    user_context=user_context
)
```
Note: Keep `_build_conversation_context()` and full `_query_ai()` intact as fallback. The router's `_fallback_route()` loads all data, so it's safe.
- [ ] **Step 3: Log routing decisions**
After the route_query call, add logging:
```python
logger.info(
    f"NordaGPT Router: user={user_context.get('user_name') if user_context else '?'}, "
    f"complexity={route_decision['complexity']}, model={effective_model}, "
    f"thinking={effective_thinking}, data={route_decision['data_needed']}, "
    f"routed_by={route_decision.get('routed_by')}"
)
```
- [ ] **Step 4: Update the GeminiService call in _query_ai() to use effective model**
Currently `_query_ai()` uses `self.gemini_service`, which is bound to a fixed model. Pass the router-selected model through the context dict so `_query_ai()` can forward it to `generate_text()` without another signature change.
In `send_message()`, attach the decision to the context before calling `_query_ai()`:
```python
context['_route_decision'] = route_decision
```
In `_query_ai()`, read it at the `generate_text()` call (around line 1352):
```python
# Before:
response = self.gemini_service.generate_text(
    prompt=full_prompt,
    temperature=0.7,
    thinking_level=thinking_level,
    user_id=user_id,
    feature='chat'
)

# After:
route = context.get('_route_decision', {})
effective_model_id = None
model_alias = route.get('model')
if model_alias:
    from gemini_service import GEMINI_MODELS
    effective_model_id = GEMINI_MODELS.get(model_alias)
response = self.gemini_service.generate_text(
    prompt=full_prompt,
    temperature=0.7,
    thinking_level=thinking_level,
    user_id=user_id,
    feature='chat',
    model=effective_model_id
)
```
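The alias lookup degrades safely: a missing or unknown alias resolves to `None`, which leaves the service's default model in effect. A standalone sketch (the alias table contents here are assumptions; the real mapping lives in `gemini_service.GEMINI_MODELS`):

```python
# Hypothetical alias table — illustrative values only.
GEMINI_MODELS = {
    '3-flash': 'gemini-3-flash',
    '3.1-flash-lite': 'gemini-3.1-flash-lite-preview',
}

def resolve_model(route: dict):
    """Return a concrete model ID, or None to keep the service default."""
    alias = route.get('model')
    return GEMINI_MODELS.get(alias) if alias else None

assert resolve_model({}) is None                                  # no decision → default
assert resolve_model({'model': '3-flash'}) == 'gemini-3-flash'    # known alias
assert resolve_model({'model': 'unknown'}) is None                # unknown alias → default
```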
- [ ] **Step 5: Verify syntax**
```bash
python3 -m py_compile nordabiz_chat.py && echo "OK"
```
- [ ] **Step 6: Commit**
```bash
git add nordabiz_chat.py
git commit -m "feat(nordagpt): integrate smart router — selective context loading + adaptive model selection"
```
---
### Task 7: Deploy Phase 2 and verify
- [ ] **Step 1: Push and deploy to staging**
```bash
git push origin master && git push inpi master
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
```
- [ ] **Step 2: Test on staging — verify routing works**
Test simple query: "Jaki jest telefon do TERMO?" — should be fast (2-3s), Flash-Lite model.
Test medium query: "Porównaj firmy budowlane w Izbie" — should load companies_all, medium speed.
Test complex query: "Jakie firmy mogłyby współpracować przy projekcie PEJ?" — should use full context.
Check logs for routing decisions:
```bash
ssh maciejpi@10.22.68.248 "journalctl -u nordabiznes -n 30 --no-pager | grep 'Router'"
```
- [ ] **Step 3: Deploy to production**
```bash
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
```
---
## Phase 3: Streaming Responses (Tasks 8-10)
### Task 8: Add streaming endpoint in Flask
**Files:**
- Modify: `blueprints/chat/routes.py`
- Modify: `nordabiz_chat.py`
- [ ] **Step 1: Add SSE streaming endpoint**
In `blueprints/chat/routes.py`, add a new route after `chat_send_message()` (after line ~309):
```python
@bp.route('/api/chat/<int:conversation_id>/message/stream', methods=['POST'])
@login_required
@member_required
def chat_send_message_stream(conversation_id):
    """Send message to AI chat with streaming response (SSE)"""
    from flask import Response, stream_with_context
    import json as json_module

    data = request.get_json()
    if not data or not data.get('message', '').strip():
        return jsonify({'error': 'Wiadomość nie może być pusta'}), 400
    message = data['message'].strip()

    # Check limits
    from nordabiz_chat import check_user_limits
    limit_result = check_user_limits(current_user.id, current_user.email)
    if limit_result.get('limited'):
        return jsonify({'error': 'Przekroczono limit', 'limit_info': limit_result}), 429

    # Build user context
    user_context = {
        'user_id': current_user.id,
        'user_name': current_user.name,
        'user_email': current_user.email,
        'company_name': current_user.company.name if current_user.company else None,
        'company_id': current_user.company.id if current_user.company else None,
        'company_category': current_user.company.category.name if current_user.company and current_user.company.category else None,
        'company_role': current_user.company_role or 'MEMBER',
        'is_norda_member': current_user.is_norda_member,
        'chamber_role': current_user.chamber_role,
        'member_since': current_user.created_at.strftime('%Y-%m-%d') if current_user.created_at else None,
    }

    model_choice = data.get('model') or session.get('chat_model', 'flash')
    model_key = '3-flash' if model_choice == 'flash' else '3-pro'

    def generate():
        try:
            chat_engine = NordaBizChatEngine(model=model_key)
            for chunk in chat_engine.send_message_stream(
                conversation_id=conversation_id,
                user_message=message,
                user_id=current_user.id,
                user_context=user_context
            ):
                yield f"data: {json_module.dumps(chunk, ensure_ascii=False)}\n\n"
        except PermissionError:
            yield f"data: {json_module.dumps({'type': 'error', 'content': 'Brak dostępu do tej konwersacji'})}\n\n"
        except Exception as e:
            logger.error(f"Streaming error: {e}")
            yield f"data: {json_module.dumps({'type': 'error', 'content': 'Wystąpił błąd'})}\n\n"

    return Response(
        stream_with_context(generate()),
        mimetype='text/event-stream',
        headers={
            'Cache-Control': 'no-cache',
            'X-Accel-Buffering': 'no',  # Disable Nginx buffering
        }
    )
```
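The wire format the endpoint emits is one `data: <json>` line per chunk, terminated by a blank line. A self-contained round-trip of that framing (the `sse_frame`/`parse_sse` helpers are illustrative, not part of the plan):

```python
import json

def sse_frame(chunk: dict) -> str:
    """Serialize one chunk exactly as the endpoint does: 'data: <json>\\n\\n'."""
    return f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n"

def parse_sse(stream: str):
    """Minimal client-side parse of a concatenated SSE stream."""
    events = []
    for block in stream.split('\n\n'):
        if block.startswith('data: '):
            events.append(json.loads(block[len('data: '):]))
    return events

raw = sse_frame({'type': 'token', 'content': 'Cześć'}) + sse_frame({'type': 'done'})
events = parse_sse(raw)
assert [e['type'] for e in events] == ['token', 'done']
assert events[0]['content'] == 'Cześć'
```

`ensure_ascii=False` keeps Polish characters readable on the wire; the double newline is what delimits events for the browser's `EventSource` parser.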
- [ ] **Step 2: Add send_message_stream() to NordaBizChatEngine**
In `nordabiz_chat.py`, add a new method after `send_message()` (after line ~282):
```python
def send_message_stream(
self,
conversation_id: int,
user_message: str,
user_id: int,
user_context: Optional[Dict[str, Any]] = None
):
"""
Generator that yields streaming chunks for SSE.
Yields dicts: {'type': 'thinking'|'token'|'done'|'error', 'content': '...'}
"""
import time
db = SessionLocal()
try:
conversation = db.query(AIChatConversation).filter_by(
id=conversation_id, user_id=user_id
).first()
if not conversation:
yield {'type': 'error', 'content': 'Konwersacja nie znaleziona'}
return
# Save user message
original_message = user_message
sanitized = self._sanitize_message(user_message)
user_msg = AIChatMessage(
conversation_id=conversation_id,
role='user',
content=sanitized
)
db.add(user_msg)
db.commit()
            # Emit a thinking event before routing so the UI shows feedback
            # while the Smart Router classifies the query
            yield {'type': 'thinking', 'content': 'Analizuję pytanie...'}
            route_decision = route_query(
                message=original_message,
                user_context=user_context,
                gemini_service=self.gemini_service
            )
# Build selective context
context = build_selective_context(
data_needed=route_decision.get('data_needed', []),
conversation_id=conversation.id,
current_message=original_message,
user_context=user_context
)
context['_route_decision'] = route_decision
# Build prompt (reuse _query_ai logic for prompt building)
full_prompt = self._build_prompt(context, original_message, user_context, route_decision.get('thinking', 'low'))
# Get effective model
from gemini_service import GEMINI_MODELS
model_alias = route_decision.get('model', '3-flash')
effective_model = GEMINI_MODELS.get(model_alias, self.model_name)
# Stream from Gemini
start_time = time.time()
stream_response = self.gemini_service.generate_text(
prompt=full_prompt,
temperature=0.7,
stream=True,
thinking_level=route_decision.get('thinking', 'low'),
user_id=user_id,
feature='chat_stream',
model=effective_model
)
full_text = ""
for chunk in stream_response:
if hasattr(chunk, 'text') and chunk.text:
full_text += chunk.text
yield {'type': 'token', 'content': chunk.text}
latency_ms = int((time.time() - start_time) * 1000)
# Save AI response to DB
ai_msg = AIChatMessage(
conversation_id=conversation_id,
role='assistant',
content=full_text,
latency_ms=latency_ms
)
db.add(ai_msg)
conversation.updated_at = datetime.now()
conversation.message_count = (conversation.message_count or 0) + 2
db.commit()
yield {
'type': 'done',
'message_id': ai_msg.id,
'latency_ms': latency_ms,
'model': model_alias,
'complexity': route_decision.get('complexity')
}
except Exception as e:
logger.error(f"Stream error: {e}", exc_info=True)
yield {'type': 'error', 'content': 'Wystąpił błąd podczas generowania odpowiedzi'}
finally:
db.close()
```
- [ ] **Step 3: Extract prompt building into reusable method**
Add a `_build_prompt()` method to `NordaBizChatEngine` that extracts prompt construction from `_query_ai()`. This method builds the full prompt string without calling Gemini:
```python
def _build_prompt(
self,
context: Dict[str, Any],
user_message: str,
user_context: Optional[Dict[str, Any]] = None,
thinking_level: str = 'low'
) -> str:
"""Build the full prompt string. Extracted from _query_ai() for reuse in streaming."""
# Build user identity section
user_identity = ""
if user_context:
user_identity = f"""
# AKTUALNY UŻYTKOWNIK
Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}
Firma: {user_context.get('company_name', 'brak')} — kategoria: {user_context.get('company_category', 'brak')}
Rola w firmie: {user_context.get('company_role', 'MEMBER')}
Członek Izby: {'tak' if user_context.get('is_norda_member') else 'nie'}
Rola w Izbie: {user_context.get('chamber_role') or '—'}
Na portalu od: {user_context.get('member_since', 'nieznana data')}
"""
        # Reuse the existing static system prompt from _query_ai() (lines ~922-1134).
        # In implementation, extract it into a shared helper (the name
        # _static_system_prompt below is illustrative) so that _build_prompt()
        # returns exactly the string _query_ai() would build: identity block,
        # system prompt, injected data, and the user message.
        full_prompt = user_identity + self._static_system_prompt(context, user_message)
        return full_prompt
```
**Implementation note:** The actual implementation should refactor `_query_ai()` to call `_build_prompt()` internally, then the streaming method also calls `_build_prompt()`. This avoids prompt duplication.
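The delegation described above can be sketched as follows (the class name, prompt contents, and method names other than `_build_prompt` are illustrative, not the production engine API):

```python
# Minimal sketch of the refactor: the streaming and non-streaming paths both
# obtain their prompt from _build_prompt(), giving one source of truth.
class EngineSketch:
    def _build_prompt(self, context, user_message, user_context=None):
        identity = ""
        if user_context:
            identity = (
                "# AKTUALNY UŻYTKOWNIK\n"
                f"Rozmawiasz z: {user_context.get('user_name', 'Nieznany')}\n\n"
            )
        return identity + f"PYTANIE: {user_message}"

    def query_prompt(self, context, user_message, user_context=None):
        # non-streaming path (_query_ai): build the prompt, then call Gemini
        return self._build_prompt(context, user_message, user_context)

    def stream_prompt(self, context, user_message, user_context=None):
        # streaming path (send_message_stream) reuses the very same builder
        return self._build_prompt(context, user_message, user_context)
```

Because both paths call the same builder, any prompt change automatically applies to streaming and non-streaming responses alike.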
- [ ] **Step 4: Verify syntax**
```bash
python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK"
```
- [ ] **Step 5: Commit**
```bash
git add nordabiz_chat.py blueprints/chat/routes.py
git commit -m "feat(nordagpt): add streaming SSE endpoint + send_message_stream method"
```
---
### Task 9: Frontend streaming UI
**Files:**
- Modify: `templates/chat.html`
- [ ] **Step 1: Add streaming sendMessage function**
In `templates/chat.html`, replace the existing `sendMessage()` function (lines 2373-2454) with a streaming version:
```javascript
async function sendMessage() {
const input = document.getElementById('messageInput');
const message = input.value.trim();
if (!message || isSending) return;
isSending = true;
document.getElementById('sendBtn').disabled = true;
input.value = '';
autoResizeTextarea();
// Add user message to chat
addMessage('user', message);
// Create conversation if needed
if (!currentConversationId) {
try {
const startRes = await fetch('/api/chat/start', {
method: 'POST',
headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
body: JSON.stringify({title: message.substring(0, 50)})
});
const startData = await startRes.json();
currentConversationId = startData.conversation_id;
} catch (e) {
addMessage('assistant', 'Błąd tworzenia konwersacji.');
isSending = false;
document.getElementById('sendBtn').disabled = false;
return;
}
}
// Add empty assistant bubble with thinking animation
const msgDiv = document.createElement('div');
msgDiv.className = 'message assistant';
msgDiv.innerHTML = `
<div class="message-avatar">AI</div>
<div class="message-content">
<div class="thinking-dots"><span>.</span><span>.</span><span>.</span></div>
</div>
`;
document.getElementById('chatMessages').appendChild(msgDiv);
scrollToBottom();
const contentDiv = msgDiv.querySelector('.message-content');
try {
const response = await fetch(`/api/chat/${currentConversationId}/message/stream`, {
method: 'POST',
headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
body: JSON.stringify({message: message, model: currentModel})
});
if (response.status === 429) {
contentDiv.innerHTML = '';
contentDiv.textContent = 'Przekroczono limit zapytań.';
showLimitBanner();
isSending = false;
document.getElementById('sendBtn').disabled = false;
return;
}
const reader = response.body.getReader();
        const decoder = new TextDecoder();
        let buffer = '';
        let fullText = '';
        let thinkingRemoved = false;
        while (true) {
            const {done, value} = await reader.read();
            if (done) break;
            buffer += decoder.decode(value, {stream: true});
            const lines = buffer.split('\n');
            buffer = lines.pop(); // keep an incomplete SSE line for the next read
            for (const line of lines) {
if (!line.startsWith('data: ')) continue;
try {
const chunk = JSON.parse(line.slice(6));
if (chunk.type === 'thinking') {
// Keep thinking dots visible
continue;
}
if (chunk.type === 'token') {
if (!thinkingRemoved) {
contentDiv.innerHTML = '';
thinkingRemoved = true;
}
fullText += chunk.content;
contentDiv.innerHTML = formatMessage(fullText);
scrollToBottom();
}
if (chunk.type === 'done') {
// Add tech info badge
if (chunk.latency_ms) {
const badge = document.createElement('div');
badge.className = 'thinking-info-badge';
badge.textContent = `${chunk.model || 'AI'} · ${(chunk.latency_ms/1000).toFixed(1)}s`;
msgDiv.appendChild(badge);
}
loadConversations();
}
if (chunk.type === 'error') {
contentDiv.innerHTML = '';
contentDiv.textContent = chunk.content || 'Wystąpił błąd';
}
} catch (e) {
// Skip malformed chunks
}
}
}
} catch (e) {
contentDiv.innerHTML = '';
contentDiv.textContent = 'Błąd połączenia z serwerem.';
}
isSending = false;
document.getElementById('sendBtn').disabled = false;
}
```
- [ ] **Step 2: Add CSS for thinking animation**
In `templates/chat.html`, in the `{% block extra_css %}` section, add:
```css
.thinking-dots {
display: flex;
gap: 4px;
padding: 8px 0;
}
.thinking-dots span {
animation: thinkBounce 1.4s infinite ease-in-out both;
font-size: 1.5rem;
color: var(--text-secondary);
}
.thinking-dots span:nth-child(1) { animation-delay: -0.32s; }
.thinking-dots span:nth-child(2) { animation-delay: -0.16s; }
.thinking-dots span:nth-child(3) { animation-delay: 0s; }
@keyframes thinkBounce {
0%, 80%, 100% { transform: scale(0); }
40% { transform: scale(1); }
}
```
- [ ] **Step 3: Verify locally and commit**
```bash
python3 -m py_compile app.py && echo "OK"
git add templates/chat.html
git commit -m "feat(nordagpt): streaming UI — word-by-word response with thinking animation"
```
---
### Task 10: Deploy Phase 3 and verify streaming
- [ ] **Step 1: Check Nginx/NPM config for SSE support**
SSE requires Nginx to NOT buffer the response. The streaming endpoint sets `X-Accel-Buffering: no` header. Verify NPM custom config allows this:
```bash
ssh maciejpi@57.128.200.27 "cat /etc/nginx/sites-enabled/nordabiznes.conf 2>/dev/null || echo 'Using NPM proxy'"
```
If using NPM, the `X-Accel-Buffering: no` header should be sufficient. If not, add to NPM custom Nginx config for nordabiznes.pl:
```
proxy_buffering off;
proxy_cache off;
```
- [ ] **Step 2: Push, deploy to staging, test streaming**
```bash
git push origin master && git push inpi master
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
```
Test on staging: open chat, send message, verify text appears word-by-word.
- [ ] **Step 3: Deploy to production**
```bash
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull && sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
```
---
## Phase 4: Persistent User Memory (Tasks 11-15)
### Task 11: Database migration — memory tables
**Files:**
- Create: `database/migrations/092_ai_user_memory.sql`
- Create: `database/migrations/093_ai_conversation_summary.sql`
- [ ] **Step 1: Create migration 092**
```sql
-- 092_ai_user_memory.sql
-- Persistent memory for NordaGPT — per-user facts extracted from conversations
CREATE TABLE IF NOT EXISTS ai_user_memory (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
fact TEXT NOT NULL,
category VARCHAR(50) DEFAULT 'general',
source_conversation_id INTEGER REFERENCES ai_chat_conversations(id) ON DELETE SET NULL,
confidence FLOAT DEFAULT 1.0,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '12 months'),
is_active BOOLEAN DEFAULT TRUE
);
CREATE INDEX IF NOT EXISTS idx_ai_user_memory_user_active ON ai_user_memory(user_id, is_active, confidence DESC);
CREATE INDEX IF NOT EXISTS idx_ai_user_memory_expires ON ai_user_memory(expires_at) WHERE is_active = TRUE;
GRANT ALL ON TABLE ai_user_memory TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE ai_user_memory_id_seq TO nordabiz_app;
```
- [ ] **Step 2: Create migration 093**
```sql
-- 093_ai_conversation_summary.sql
-- Auto-generated summaries of AI conversations for memory context
CREATE TABLE IF NOT EXISTS ai_conversation_summary (
id SERIAL PRIMARY KEY,
conversation_id INTEGER NOT NULL UNIQUE REFERENCES ai_chat_conversations(id) ON DELETE CASCADE,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
summary TEXT NOT NULL,
key_topics JSONB DEFAULT '[]',
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_ai_conv_summary_user ON ai_conversation_summary(user_id, created_at DESC);
GRANT ALL ON TABLE ai_conversation_summary TO nordabiz_app;
GRANT USAGE, SELECT ON SEQUENCE ai_conversation_summary_id_seq TO nordabiz_app;
```
- [ ] **Step 3: Commit migrations**
```bash
git add database/migrations/092_ai_user_memory.sql database/migrations/093_ai_conversation_summary.sql
git commit -m "feat(nordagpt): add migrations for user memory and conversation summary tables"
```
---
### Task 12: Add SQLAlchemy models
**Files:**
- Modify: `database.py` (insert before line 5954)
- [ ] **Step 1: Add AIUserMemory model**
Insert before the `# DATABASE INITIALIZATION` comment (line 5954):
```python
class AIUserMemory(Base):
__tablename__ = 'ai_user_memory'
id = Column(Integer, primary_key=True)
user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False)
fact = Column(Text, nullable=False)
category = Column(String(50), default='general')
source_conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='SET NULL'), nullable=True)
confidence = Column(Float, default=1.0)
created_at = Column(DateTime, default=datetime.utcnow)
expires_at = Column(DateTime)
is_active = Column(Boolean, default=True)
user = relationship('User')
source_conversation = relationship('AIChatConversation')
class AIConversationSummary(Base):
__tablename__ = 'ai_conversation_summary'
id = Column(Integer, primary_key=True)
conversation_id = Column(Integer, ForeignKey('ai_chat_conversations.id', ondelete='CASCADE'), nullable=False, unique=True)
user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False)
summary = Column(Text, nullable=False)
key_topics = Column(JSON, default=list)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow)
user = relationship('User')
conversation = relationship('AIChatConversation')
```
- [ ] **Step 2: Verify syntax**
```bash
python3 -m py_compile database.py && echo "OK"
```
- [ ] **Step 3: Commit**
```bash
git add database.py
git commit -m "feat(nordagpt): add AIUserMemory and AIConversationSummary ORM models"
```
---
### Task 13: Create memory_service.py
**Files:**
- Create: `memory_service.py`
- [ ] **Step 1: Create memory_service.py**
```python
"""
Memory Service for NordaGPT
=============================
Manages persistent per-user memory: fact extraction, storage, retrieval, cleanup.
"""
import json
import logging
from datetime import datetime, timedelta
from typing import Dict, Any, List, Optional
from database import SessionLocal, AIUserMemory, AIConversationSummary, AIChatMessage
logger = logging.getLogger(__name__)
EXTRACT_FACTS_PROMPT = """Na podstawie tej rozmowy wyciągnij kluczowe fakty o użytkowniku {user_name} ({company_name}).
Rozmowa:
{conversation_text}
Istniejące fakty (NIE DUPLIKUJ):
{existing_facts}
Zwróć TYLKO JSON array (bez markdown):
[{{"fact": "...", "category": "interests|needs|contacts|insights"}}]
Zasady:
- Tylko nowe, nietrywialne fakty przydatne w przyszłych rozmowach
- Nie zapisuj: "zapytał o firmę X" (to za mało)
- Zapisuj: "szuka podwykonawców do projektu PEJ w branży elektrycznej"
- Max 3 fakty. Jeśli nie ma nowych faktów, zwróć []
- Kategorie: interests (zainteresowania), needs (potrzeby biznesowe), contacts (kontakty), insights (wnioski/preferencje)
"""
SUMMARIZE_PROMPT = """Podsumuj tę rozmowę w 1-3 zdaniach. Skup się na tym, czego użytkownik szukał i co ustalono.
Rozmowa:
{conversation_text}
Zwróć TYLKO JSON (bez markdown):
{{"summary": "...", "key_topics": ["temat1", "temat2"]}}
"""
def get_user_memory(user_id: int, limit: int = 10) -> List[Dict]:
"""Get active memory facts for a user, sorted by recency and confidence."""
db = SessionLocal()
try:
facts = db.query(AIUserMemory).filter(
AIUserMemory.user_id == user_id,
AIUserMemory.is_active == True,
AIUserMemory.expires_at > datetime.now()
).order_by(
AIUserMemory.confidence.desc(),
AIUserMemory.created_at.desc()
).limit(limit).all()
return [
{
'id': f.id,
'fact': f.fact,
'category': f.category,
'confidence': f.confidence,
'created_at': f.created_at.isoformat()
}
for f in facts
]
finally:
db.close()
def get_conversation_summaries(user_id: int, limit: int = 5) -> List[Dict]:
"""Get recent conversation summaries for a user."""
db = SessionLocal()
try:
summaries = db.query(AIConversationSummary).filter(
AIConversationSummary.user_id == user_id
).order_by(
AIConversationSummary.created_at.desc()
).limit(limit).all()
return [
{
'summary': s.summary,
'topics': s.key_topics or [],
'date': s.created_at.strftime('%Y-%m-%d')
}
for s in summaries
]
finally:
db.close()
def format_memory_for_prompt(user_id: int) -> str:
"""Format user memory and summaries for injection into AI prompt."""
facts = get_user_memory(user_id)
summaries = get_conversation_summaries(user_id)
if not facts and not summaries:
return ""
parts = ["\n# PAMIĘĆ O UŻYTKOWNIKU"]
if facts:
parts.append("Znane fakty:")
for f in facts:
parts.append(f"- [{f['category']}] {f['fact']}")
if summaries:
parts.append("\nOstatnie rozmowy:")
for s in summaries:
topics = ", ".join(s['topics'][:3]) if s['topics'] else ""
parts.append(f"- {s['date']}: {s['summary']}" + (f" (tematy: {topics})" if topics else ""))
parts.append("\nWykorzystuj tę wiedzę do personalizacji odpowiedzi. Nawiązuj do wcześniejszych rozmów gdy to naturalne.")
return "\n".join(parts)
def extract_facts_async(
conversation_id: int,
user_id: int,
user_context: Dict,
gemini_service
):
"""
Extract memory facts from a conversation. Run async after response is sent.
Uses Flash-Lite for minimal cost.
"""
db = SessionLocal()
try:
# Get conversation messages
messages = db.query(AIChatMessage).filter_by(
conversation_id=conversation_id
).order_by(AIChatMessage.created_at).all()
if len(messages) < 2:
return # Too short to extract
conversation_text = "\n".join([
f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content}"
for m in messages[-10:] # Last 10 messages
])
# Get existing facts to avoid duplicates
existing = db.query(AIUserMemory).filter(
AIUserMemory.user_id == user_id,
AIUserMemory.is_active == True
).all()
existing_text = "\n".join([f"- {f.fact}" for f in existing]) or "Brak"
prompt = EXTRACT_FACTS_PROMPT.format(
user_name=user_context.get('user_name', 'Nieznany'),
company_name=user_context.get('company_name', 'brak'),
conversation_text=conversation_text,
existing_facts=existing_text
)
response = gemini_service.generate_text(
prompt=prompt,
temperature=0.1,
max_tokens=300,
model='gemini-3.1-flash-lite-preview',
thinking_level='minimal',
feature='memory_extraction'
)
# Parse response
text = response.strip()
if text.startswith('```'):
text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()
facts = json.loads(text)
if not isinstance(facts, list):
return
        saved = 0
        for fact_data in facts[:3]:
            if not fact_data.get('fact'):
                continue
            memory = AIUserMemory(
                user_id=user_id,
                fact=fact_data['fact'],
                category=fact_data.get('category', 'general'),
                source_conversation_id=conversation_id,
                expires_at=datetime.now() + timedelta(days=365)
            )
            db.add(memory)
            saved += 1
        db.commit()
        logger.info(f"Saved {saved} new memory facts for user {user_id}")
except Exception as e:
logger.warning(f"Memory extraction failed for conversation {conversation_id}: {e}")
db.rollback()
finally:
db.close()
def summarize_conversation_async(
conversation_id: int,
user_id: int,
gemini_service
):
"""Generate or update conversation summary. Run async."""
db = SessionLocal()
try:
messages = db.query(AIChatMessage).filter_by(
conversation_id=conversation_id
).order_by(AIChatMessage.created_at).all()
if len(messages) < 2:
return
conversation_text = "\n".join([
f"{'Użytkownik' if m.role == 'user' else 'NordaGPT'}: {m.content[:200]}"
for m in messages[-10:]
])
prompt = SUMMARIZE_PROMPT.format(conversation_text=conversation_text)
response = gemini_service.generate_text(
prompt=prompt,
temperature=0.1,
max_tokens=200,
model='gemini-3.1-flash-lite-preview',
thinking_level='minimal',
feature='conversation_summary'
)
text = response.strip()
if text.startswith('```'):
text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()
result = json.loads(text)
existing = db.query(AIConversationSummary).filter_by(
conversation_id=conversation_id
).first()
if existing:
existing.summary = result.get('summary', existing.summary)
existing.key_topics = result.get('key_topics', existing.key_topics)
existing.updated_at = datetime.now()
else:
summary = AIConversationSummary(
conversation_id=conversation_id,
user_id=user_id,
summary=result.get('summary', ''),
key_topics=result.get('key_topics', [])
)
db.add(summary)
db.commit()
logger.info(f"Summarized conversation {conversation_id}")
except Exception as e:
logger.warning(f"Conversation summary failed for {conversation_id}: {e}")
db.rollback()
finally:
db.close()
def delete_user_fact(user_id: int, fact_id: int) -> bool:
"""Soft-delete a memory fact. Returns True if deleted."""
db = SessionLocal()
try:
fact = db.query(AIUserMemory).filter_by(id=fact_id, user_id=user_id).first()
if fact:
fact.is_active = False
db.commit()
return True
return False
finally:
db.close()
```
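Both extractors strip an optional markdown fence before `json.loads`; that logic could be factored into one shared helper (a sketch; the plan above keeps it inline):

```python
import json

def parse_model_json(text: str):
    """Strip an optional markdown code fence from model output, then parse JSON."""
    text = text.strip()
    if text.startswith('```'):
        # drop the opening fence line, then everything after the closing fence
        text = text.split('\n', 1)[1].rsplit('```', 1)[0].strip()
    return json.loads(text)
```

A `json.JSONDecodeError` from this helper is caught by the callers' existing `except` blocks, so malformed model output degrades to a logged warning rather than a crash.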
- [ ] **Step 2: Verify syntax**
```bash
python3 -m py_compile memory_service.py && echo "OK"
```
- [ ] **Step 3: Commit**
```bash
git add memory_service.py
git commit -m "feat(nordagpt): add memory_service.py — fact extraction, summaries, CRUD"
```
---
### Task 14: Integrate memory into chat flow
**Files:**
- Modify: `nordabiz_chat.py`
- Modify: `blueprints/chat/routes.py`
- [ ] **Step 1: Inject memory into system prompt**
In `nordabiz_chat.py`, in the `_build_prompt()` or `_query_ai()` method, after the user identity block and before the data sections, add memory:
```python
from memory_service import format_memory_for_prompt
# After user_identity block, before data injection:
user_memory_text = ""
if user_context and user_context.get('user_id'):
user_memory_text = format_memory_for_prompt(user_context['user_id'])
# Prepend to system prompt:
system_prompt = user_identity + user_memory_text + f"""Jesteś pomocnym asystentem..."""
```
- [ ] **Step 2: Trigger async memory extraction after response**
In `send_message()` and `send_message_stream()`, after saving the AI response, trigger async extraction using threading:
```python
import threading
from memory_service import extract_facts_async, summarize_conversation_async
# After saving AI response to DB (end of send_message/send_message_stream):
# Read message_count before the session closes; the ORM instance may be
# expired or detached by the time the thread runs.
msg_count = conversation.message_count or 0
# Async memory extraction so the response is not blocked
def _extract_memory():
    extract_facts_async(conversation_id, user_id, user_context, self.gemini_service)
    # Summarize every 5 messages
    if msg_count % 5 == 0:
        summarize_conversation_async(conversation_id, user_id, self.gemini_service)
threading.Thread(target=_extract_memory, daemon=True).start()
```
- [ ] **Step 3: Add memory CRUD API routes**
In `blueprints/chat/routes.py`, add routes for viewing and deleting memory:
```python
@bp.route('/api/chat/memory', methods=['GET'])
@login_required
@member_required
def get_user_memory_api():
"""Get current user's NordaGPT memory facts and summaries"""
from memory_service import get_user_memory, get_conversation_summaries
return jsonify({
'facts': get_user_memory(current_user.id, limit=20),
'summaries': get_conversation_summaries(current_user.id, limit=10)
})
@bp.route('/api/chat/memory/<int:fact_id>', methods=['DELETE'])
@login_required
@member_required
def delete_memory_fact(fact_id):
"""Delete a memory fact"""
from memory_service import delete_user_fact
if delete_user_fact(current_user.id, fact_id):
return jsonify({'status': 'ok'})
return jsonify({'error': 'Nie znaleziono'}), 404
```
- [ ] **Step 4: Verify syntax**
```bash
python3 -m py_compile nordabiz_chat.py && python3 -m py_compile blueprints/chat/routes.py && echo "OK"
```
- [ ] **Step 5: Commit**
```bash
git add nordabiz_chat.py blueprints/chat/routes.py
git commit -m "feat(nordagpt): integrate memory into chat — injection, async extraction, CRUD API"
```
---
### Task 15: Deploy Phase 4 — migrations + code
- [ ] **Step 1: Push to remotes**
```bash
git push origin master && git push inpi master
```
- [ ] **Step 2: Deploy to staging with migrations**
```bash
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && sudo -u www-data git pull"
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql"
ssh maciejpi@10.22.68.248 "cd /var/www/nordabiznes && /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql"
ssh maciejpi@10.22.68.248 "sudo systemctl restart nordabiznes"
```
- [ ] **Step 3: Test on staging**
1. Open chat, have a conversation about looking for IT companies
2. Open another chat, ask "o czym rozmawialiśmy?" — verify AI mentions previous topics
3. Check memory API: `curl https://staging.nordabiznes.pl/api/chat/memory` (with auth)
4. Verify facts are extracted
- [ ] **Step 4: Deploy to production**
```bash
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && sudo -u www-data git pull"
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/092_ai_user_memory.sql"
ssh maciejpi@57.128.200.27 "cd /var/www/nordabiznes && DATABASE_URL=\$(grep DATABASE_URL .env | cut -d'=' -f2) /var/www/nordabiznes/venv/bin/python3 scripts/run_migration.py database/migrations/093_ai_conversation_summary.sql"
ssh maciejpi@57.128.200.27 "sudo systemctl restart nordabiznes"
curl -sI https://nordabiznes.pl/health | head -3
```
- [ ] **Step 5: Update release notes**
Add entry in `blueprints/public/routes.py` `_get_releases()`.
---
## Post-Implementation Checklist
- [ ] Verify AI greets users by name
- [ ] Verify Smart Router logs show correct classification
- [ ] Verify streaming works on mobile (Android + iOS)
- [ ] Verify memory facts are extracted after conversations
- [ ] Verify memory is private (user A cannot see user B's facts)
- [ ] Verify response times: simple <3s, medium <6s, complex <12s
- [ ] Monitor costs for the first week and compare with estimates
- [ ] Send message to Jakub Pornowski confirming speed improvements