PR #904 callback-based HKDF hack only fired for the first frame cryptor
(audio), leaving video frame cryptors with PBKDF2 - DEC_FAILED oscillation.
PR #921 integrates HKDF natively at the WebRTC C++ level, applying uniformly
to all frame cryptors (audio + video).
Also removes aggressive video re-keying workaround and adds 5s cooldown
to DEC_FAILED re-keying handler to prevent tight loops.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Video frame cryptors may not be fully initialized when set_key() is
first called during on_track_subscribed. Audio works immediately but
video oscillates OK↔DEC_FAILED with the same key.
Add staggered re-keying at 0.3s, 0.8s, 2s, 5s after video track
subscription to ensure the key is applied after the frame cryptor
is fully ready.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
KDF_PBKDF2=0 does NOT mean raw mode — libwebrtc applies its built-in
PBKDF2 on top of pre-derived keys, causing DEC_FAILED for audio too.
Revert to KDF_HKDF=1 (Rust applies HKDF, we pass raw base keys).
Keep diagnostic improvements:
- _derive_and_set_key() wrapper with logging
- Per-track type logging (audio vs video) in on_track_subscribed
- Frame size check in look_at_screen (detect E2EE failure)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch from Rust-side HKDF (KDF_HKDF=1) to Python-side HKDF derivation
with raw key mode (KDF_RAW=0). This eliminates potential HKDF implementation
mismatches between Rust FFI and Element Call JS that caused video frame
decryption failures (audio worked, video showed 8x8 garbage frames).
Changes:
- Add _derive_and_set_key() helper that pre-derives HKDF then calls set_key()
- Set key_derivation_function=KDF_RAW (proto 0 = no Rust-side derivation)
- Replace all direct set_key() calls with _derive_and_set_key()
- Add per-track diagnostic logging (audio vs video)
- Add frame size check in look_at_screen (detect E2EE failure early)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8x8 frames are encrypted garbage from E2EE video decryption failure.
Skip frames < 64x64 to avoid sending black/noise images to the LLM.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Text bot can now capture video frames from active call when user
types vision-related queries ("siehst du meinen bildschirm", etc.)
2. Voice transcript injected into text bot context during active calls
3. Text messages injected into voice transcript with [typed in chat] prefix
4. Bot text replies injected back into voice transcript
This enables seamless context sharing between voice calls and text chat.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Video tracks (camera + screen share) were never getting E2EE keys set
via set_key() because the condition on track_subscribed only matched
audio tracks (kind==1). This caused DEC_FAILED for all video frames,
making look_at_screen return encrypted garbage or fail entirely.
Also added track source logging to distinguish camera vs screen share.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add HTTPS instruction to system prompt so LLM never generates http:// links.
Fix bare matrixhost.eu/settings references to use full https:// URLs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add scheduled messages/reminders system:
- New scheduled_messages table in memory-service with CRUD endpoints
- schedule_message, list_reminders, cancel_reminder tools for the bot
- Background scheduler loop (30s) sends due reminders automatically
- Supports one-time, daily, weekly, weekdays, monthly repeat patterns
Make article URL handling non-blocking:
- Show 3 options (discuss, text summary, audio) instead of forcing audio wizard
- Default to passing article context to AI if user just keeps chatting
- New AWAITING_LANGUAGE state for cleaner audio flow FSM
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrates _ensure_cross_signing() into Bot.start() flow. On first run, generates
and uploads cross-signing keys, then signs the bot device. On subsequent restarts,
detects existing cross-signatures and skips. Seeds persisted for device recovery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows the bot to paginate back up to 500 messages in a room
to find specific content, beyond the default 10-message context window.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Route ~90% of simple chat to claude-haiku (4x cheaper), escalate to
claude-sonnet for code blocks, long messages, technical keywords,
multimodal, and explicit requests. Sentry tags track model_used,
escalation_reason, and token usage breadcrumbs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Make user_id required on all request models with field validators
- Always include user_id in WHERE clause for chunk queries (prevents cross-user data leak)
- Add bearer token auth on all endpoints except /health
- Add composite index on (user_id, room_id) for conversation_chunks
- Bot: guard query_chunks with sender check, pass room_id, send auth token
- Docker: pass MEMORY_SERVICE_TOKEN to both bot and memory-service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Per-user Fernet encryption for fact/chunk_text/summary fields
- Postgres RLS with memory_app restricted role
- SSL for memory-db connections
- Data migration script (migrate_encrypt.py)
- DB migration (migrate_rls.sql)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scanned passport PDFs have completely garbled OCR text that makes
the LLM think they're not passports, even though the AI-generated
title and summary correctly identify them. Added explicit instruction
to trust title/summary fields.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When doc_context is available, limit history to just 4 messages (2 exchanges)
to prevent stale answer patterns from overriding fresh document search results.
Without RAG results, keep 10 messages for normal conversation context.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The AI reply often contains full document content (passport details, etc.)
which the memory extraction LLM incorrectly stores as user facts. Limiting
to 200 chars avoids including document content while keeping the gist.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
30 messages of "only one passport" history overwhelmed fresh RAG results.
Reducing to 10 messages (5 exchanges) provides enough conversation context
without letting stale patterns dominate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two changes:
1. Reorder messages: doc_context now placed RIGHT BEFORE the user message
(after chat history), so fresh search results override historical patterns
where the bot repeatedly said "only one passport"
2. Strengthen doc_context instructions: explicitly tell LLM that fresh search
results override chat history, and to list ALL matching documents
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The memory extraction prompt was extracting facts from RAG search results
(e.g., passport holder names) and storing them as if they were facts about
the user. Added explicit instruction to only extract facts the user directly
states about themselves.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With only 3 results, passport queries often miss family members since
all passport files have similar low relevance scores. Increasing to 10
ensures all related documents are included in LLM context.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Raise VAD thresholds (activation 0.65→0.75, min speech 0.4→0.6s,
min silence 0.55→0.65s) to reduce false triggers from background noise
- Add "focus on latest message" instruction to all prompts (voice + text)
- Add "greet and wait" behavior for new conversations instead of auto-continuing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add rag_key_manager.py: stores encryption key in private E2EE room
- Bot loads key from Matrix on startup, injects into RAG via portal proxy
- No plaintext key on disk (removed RAG_ENCRYPTION_KEY from .env)
- Pass owner_id (matrix_user_id) to RAG search for user isolation
- Stronger format_context instructions for source link rendering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DocumentRAG class now prefers local RAG endpoint (RAG_ENDPOINT env var)
over central portal API. When RAG_ENDPOINT is set, searches go to the
customer VM encrypted RAG service on localhost:8765. Falls back to
portal API for unmigrated customers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When v2 API returns 401 (scope mismatch with classic OAuth tokens),
fall back to v1 REST API which accepts classic scopes. Also provides
clear error message asking user to re-authorize if both fail.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>