Commit Graph

44 Commits

Author SHA1 Message Date
Christian Gick
52f8cb569c feat(voice): add cross-call memory and Brave Search tool
- Query user memories at call start and inject into agent system prompt
- Extract new facts after each exchange using claude-haiku via LiteLLM
- Add Brave Search tool (@function_tool) for current data queries
- Pass memory client and caller_user_id through VoiceSession constructor
- Pre-compute 8 HMAC-ratcheted EC keys for reliable E2EE decryption

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 15:27:59 +02:00
Christian Gick
2b8744de6e fix(voice): full E2EE bidirectional audio pipeline working
- bot.py: track active callers per room; only stop session when last
  caller leaves (fixes premature cancellation when Playwright browser
  hangs up while real app is still in call)

- voice.py: pre-compute 8 HMAC-ratcheted keys from EC's base key so
  decryption works immediately without waiting ~30s for Matrix to
  deliver EC's key-rotation event (root cause of user→bot silence)

- voice.py: fix set_key() argument order (identity, key, index) at all
  call sites — was (identity, index, key) causing TypeError

- voice.py: add audio frame monitor (AUDIO_FLOW) and mute/unmute event
  handlers for diagnostics

- voice.py: update livekit-agents 1.4.2 event names: user_state_changed,
  user_input_transcribed, conversation_item_added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 15:17:35 +02:00
Christian Gick
b65d04389b fix: Switch E2EE to per-participant keys instead of shared key
Element Call uses per-participant keys, not shared key mode.
Bot now generates its own key, publishes it, and sets both
keys via key_provider.set_key() after connecting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 06:41:20 +02:00
Christian Gick
4a93827de3 revert: Restore voice.py and bot.py to last known working state (9aef846)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 20:47:51 +02:00
Christian Gick
533847c952 fix: Switch E2EE from shared key to per-participant key mode
Element Call uses per-participant keys via MatrixKeyProvider.onSetEncryptionKey(),
not shared key mode. This was causing silence with E2EE enabled.

- Set bot's own key and caller's key separately via e2ee_manager.key_provider.set_key()
- Live-update caller key when received after connect
- Fallback to set_shared_key if per-participant API unavailable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 18:50:19 +02:00
Christian Gick
1bc044eaae fix: republish caller E2EE key as shared key, fallback to no-E2EE
Bot now publishes the same key as the caller so both sides can decrypt.
Falls back to no-encryption if no caller key received.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 17:35:31 +02:00
Christian Gick
08f4e115b9 fix: filter own key events, fix RoomOptions None, wait for participant
- Skip bot own encryption_keys events in on_unknown handler
- Always pass valid RoomOptions to AgentSession.start()
- Wait up to 10s for remote participant to connect before starting pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 17:33:12 +02:00
Christian Gick
6e1e9839cc fix: use timeline events for E2EE key exchange (not state events)
Element Call distributes encryption keys as timeline events, not room
state events. Changed bot to publish keys via room_send and fetch from
/messages endpoint instead of /state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 17:28:56 +02:00
Christian Gick
85df4b295f fix: E2EE key timing + verbose logging + shorter greeting
- Reorder: send call member event BEFORE creating VoiceSession
- Store VoiceSession BEFORE start so sync handler can forward keys
- Increase E2EE key wait from 3s to 10s
- Add INFO-level logging for key lookup + room state scan via HTTP API
- Tighten voice system prompt to prevent long rambling greetings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 14:55:52 +02:00
Christian Gick
80582860b9 fix: E2EE key lookup for Element Call voice sessions
- Fix state_key format: try @user:domain:DEVICE_ID (Element Call format),
  then @user:domain, then scan all room state as fallback
- Publish bot E2EE key to room so Element shows encrypted status
- Extract caller device_id from call member event content
- Also fix pipecat-poc pipeline with context aggregators (CF-1579)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 14:51:26 +02:00
Christian Gick
85f8df5690 fix: VoiceSession cleanup on call leave + CXXABI compat + proactive E2EE key read
- Stop VoiceSession when call leave event received
- Copy libstdc++ from rust build stage to fix CXXABI_1.3.15 mismatch
- Read caller encryption key from room state before starting VoiceSession

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 17:21:51 +02:00
Christian Gick
fc3d915939 feat(e2ee): Add HKDF E2EE support for Element Call compatibility
Element Call uses HKDF-SHA256 + AES-128-GCM for frame encryption,
while the LiveKit Rust SDK defaults to PBKDF2 + AES-256-GCM.

- Multi-stage Dockerfile builds patched Rust FFI from EC-compat fork
- Generates Python protobuf bindings with new fields
- patch_sdk.py modifies installed livekit-rtc for new proto fields
- agent.py passes E2EE options with HKDF to ctx.connect()
- bot.py exchanges encryption keys via Matrix state events
- Separate Dockerfile.bot for bot service (no Rust build needed)

Ref: livekit/rust-sdks#904, livekit/python-sdks#570

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 16:29:06 +02:00
Christian Gick
578b6bb56f fix: compute correct LiveKit room name hash for Element Call
Element Call uses SHA256(room_id + "|m.call#ROOM") encoded as unpadded
base64 for LiveKit room names (via lk-jwt-service). The bot was using
the raw Matrix room ID, causing agent and user to join different rooms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 07:16:35 +00:00
Christian Gick
4cd7a0262e feat: Replace JSON memory with pgvector semantic search (MAT-11)
Add memory-service (FastAPI + pgvector) for semantic memory storage.
Bot now queries relevant memories per conversation instead of dumping all 50.
Includes migration script for existing JSON files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 06:25:50 +02:00
Christian Gick
bf81f7d0b9 fix: Remove !ai memory command — natural conversation only
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 10:11:09 +02:00
Christian Gick
b5c33f4701 fix: Fix memory system persistence and consolidate language prefs
- Replace separate bot-crypto/bot-memories volumes with single bot-data:/data
  volume so user_keys.json and language_prefs.json persist across restarts
- Remove redundant language_prefs.json infrastructure (constant, load/save,
  dict) — language preference now read from memories (last match wins)
- Add robust JSON extraction in _extract_memories (regex fallback for
  markdown fences, embedded arrays, non-array responses)
- Add info-level logging throughout memory extraction pipeline
- Add asyncio.wait_for timeout (15s) on memory extraction to prevent hangs
- Add !ai memory <fact> command for explicit, reliable memory storage
- Update _get_preferred_language to return last match (most recent wins)
- Update !ai forget to clear in-memory caches (pending translate/reply)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 09:49:05 +02:00
Christian Gick
94bf621490 fix: memory persistence + language auto-detection for translation workflow
- Upgrade memory/translation debug logs from debug to warning level
- Auto-detect language preference from extracted memory facts
- Persist language prefs to separate JSON file for reliability
- Add translation detection logging
- Use single linebreaks in translation menu

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 09:06:07 +02:00
Christian Gick
d6c30abca3 feat: DM translation workflow for forwarded foreign messages
Detect when a DM message is in a foreign language and offer an
interactive menu: translate, compose reply in that language, or
respond normally. Supports forwarded WhatsApp messages via Element.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 08:56:49 +02:00
Christian Gick
d7e32acfcb feat: Add persistent user memory system
- Extract and store memorable facts (name, language, preferences) per user
- Inject memories into system prompt for personalized responses
- LLM-based extraction after each response, deduplication against existing
- JSON files on Docker volume (/data/memories), capped at 50 per user
- System prompt updated: respond in users language, use memories
- Commands: !ai memories (view), !ai forget (delete all)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 08:19:12 +02:00
Christian Gick
eef850f7ac fix: Handle encrypted images + link text to recent images
- Add RoomEncryptedImage callback with decrypt_attachment for E2E rooms
- Cache recent images per room (60s TTL) so follow-up text messages
  like "was ist das" get the image context instead of hallucinating
- Treat filenames (containing dots) as no-caption, default to
  "What's in this image?"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 07:11:07 +02:00
Christian Gick
2199de47f9 fix: Handle encrypted image upload for Matrix rooms (MAT-9)
Upload with encrypt=True and filesize param. Handle UploadError
gracefully. Use m.file encrypted format when encryption keys returned.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 22:35:21 +02:00
Christian Gick
5c5f442a74 feat: Add PDF reading support to Matrix AI bot (MAT-10)
- Register RoomMessageFile callback, filter for application/pdf
- Extract text from PDFs using pymupdf (fitz)
- Send extracted text as context to LLM for summarization/Q&A
- Truncate at 50k chars to avoid token limits
- Add pymupdf to requirements.txt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 22:09:24 +02:00
Christian Gick
8b08056e0a feat: Add image reading and generation to Matrix AI bot (MAT-9)
- Register RoomMessageImage callback to handle incoming images
- Download and base64-encode images, send as multimodal content to LLM
- Add LLM tool calling with generate_image tool for natural image generation
- Upload generated images back to Matrix via m.image events
- Update system prompt to inform LLM about vision and image gen capabilities

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:54:45 +02:00
Christian Gick
f51e8d95e0 fix: Rename connect/disconnect to wildfiles connect/disconnect (WF-90)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 15:01:45 +02:00
Christian Gick
37757379ff feat: Add SSO device auth flow for !ai connect (WF-90)
!ai connect (no args) now starts a browser-based device authorization
flow instead of requiring a raw API key. Direct key input preserved
as fallback. Bot polls WildFiles for approval with 5s interval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 14:33:35 +02:00
Christian Gick
8f82f22698 feat: Add per-user WildFiles auth via !ai connect/disconnect
- !ai connect <key>: validates key against WildFiles, stores per-user mapping, redacts message
- !ai disconnect: removes stored key
- RAG searches use per-user API key when available, fall back to WILDFILES_ORG
- Keys stored in /data/user_keys.json (Docker volume)

Implements WF-90

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 14:05:28 +02:00
Christian Gick
61d3e524e9 fix: render markdown links as clickable HTML in Matrix messages
_md_to_html now converts [text](url) to <a> tags and auto-links bare URLs.
Also instructs LLM to use markdown links instead of raw URLs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:51:22 +02:00
Christian Gick
d392cda64d fix: include document content in RAG context + disable DM auto-rename
format_context() was only passing metadata (title, date, link) to the LLM,
so the bot could not answer content questions. Now passes full OCR text.
Also removes auto-rename for DMs (Element reuses single DM room per user pair).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:43:04 +02:00
Christian Gick
abfc6ee34a feat: Re-rename DM rooms after 5min gap for new conversation topics
Matrix reuses a single DM room per user pair, so 'new' DMs jump
back to the old thread. Now the bot re-renames the room if >5min
has passed since the last rename, reflecting the new topic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:37:37 +02:00
Christian Gick
8402f0e36a fix: Increase WildFiles RAG timeout from 5s to 15s
Embedding generation on CPU (bge-m3) takes ~3s, plus network latency
can exceed 5s causing silent failures and empty results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:36:22 +02:00
Christian Gick
2e4090aff8 feat: Auto-rename DM rooms by default with improved title prompt
- DM rooms (1:1 with bot) now auto-rename after first response
  without needing !ai auto-rename on
- Group rooms still require explicit opt-in
- Skip rename if room already has a meaningful name
- Improved prompt: emoji prefix, 3-5 words, same language as chat
  (inspired by Open WebUI title generation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:25:40 +02:00
Christian Gick
a0367f32e3 feat: Add RAG query rewriting for contextual follow-up questions
When a user asks "Wer ist Mieter in diesem Haus?" after discussing
a specific house, the raw message lacks context for RAG search.
Now uses a quick LLM call to resolve pronouns/references before
searching WildFiles (e.g. "diesem Haus" -> "Mieter Haus Coburg").

Also moved history fetch before RAG search so context is available.

Refs: WF-90

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:06:26 +02:00
Christian Gick
a80eb6f5b7 fix: Read source_url from top-level field in RAG response
source_url is now a top-level field on DocumentChunk, not nested
in metadata. Fall back to metadata for backwards compatibility.

Refs: WF-90

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:02:43 +02:00
Christian Gick
1e317bbd6d feat(CF-1189): Add auto-rename, fix system prompt, load room settings
- Add !ai auto-rename on/off command to auto-name rooms based on conversation topic
- Persist auto-rename setting via room state event (ai.agiliton.auto_rename)
- Generate short title via LLM after first AI response, set as m.room.name
- Load persisted model and auto-rename settings lazily from room state
- Strengthen system prompt: prohibit asking about document storage, file locations
- Fix bot suggesting !ai commands and admin contact to users

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 08:48:44 +02:00
Christian Gick
0c82047ba8 fix(CF-1189): Fix room.timeline crash, add markdown rendering, improve verification routing
- Replace room.timeline (non-existent in nio) with client.room_messages() API
- Add markdown-to-HTML conversion for formatted Matrix messages
- Route in-room verification events from both UnknownEvent and RoomMessageUnknown

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 06:53:00 +02:00
Christian Gick
21635bb3ab feat(CF-1189): Add in-room SAS verification for E2E key sharing
Element withholds megolm keys from unverified devices. Implements
the full in-room m.key.verification.* protocol so Element Verify works.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 19:22:32 +02:00
Christian Gick
ac26c71709 feat: respond to all messages in DMs without requiring @mention
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 19:02:36 +02:00
Christian Gick
2c60a1562c feat(CF-1189): Add AI text bot + WildFiles RAG integration
Extends bot.py with text message handling:
- RoomMessageText callback with @mention detection
- LLM responses via LiteLLM (OpenAI-compatible)
- WildFiles document search (DocumentRAG class)
- Per-room model selection via room state events
- Commands: !ai help/models/set-model/search
- Typing indicators during AI response generation
- 30s staleness check to avoid replaying history

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 18:10:28 +02:00
Christian Gick
d5af90c7c7 fix(CF-1170): Fix STT by correcting agent dispatch flow
Three fixes for voice agent not responding to speech:
1. Agent name: add --agent-name matrix-ai to CLI (was empty, dispatch couldnt match)
2. Move dispatch from on_invite to on_unknown call handler (dispatch when call starts, not on room join)
3. Use LiveKit room name from foci_preferred instead of raw Matrix room ID

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 17:21:32 +02:00
Christian Gick
ee4efd01ef fix: agent cleanup on disconnect + targeted audio input
- Agent disconnects custom room when all real participants leave
  (prevents zombie participants blocking auto-dispatch)
- Bot sends m.call.member state event on call detection
  (Element Call shows bot as joined)
- Use RoomInputOptions(participant_identity=...) to target real user
  audio input (framework agent-AJ_xxx participant was confusing RoomIO)
- Removed incorrect bot dispatch (Matrix room ID != LiveKit room name)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 13:41:53 +02:00
Christian Gick
a7b55a1696 fix: Persist login credentials for stable device ID
- Save user_id/device_id/access_token to crypto store on first login
- restore_login() on subsequent starts (no new device each restart)
- Enables proper Olm session persistence across restarts

CF-1147

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 08:03:51 +02:00
Christian Gick
d7044b613c fix: Auto-trust all devices for E2E decryption
Bot now trusts all room member devices on each sync, enabling
Megolm key exchange. Logs undecryptable events for debugging.

CF-1147

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 08:01:14 +02:00
Christian Gick
cbc61f1646 feat: Add E2E encryption support to Matrix bot
- matrix-nio[e2e] with libolm for Megolm encryption
- Persistent crypto store volume for key persistence
- Auto-accept key verification (SAS)
- Upload device keys on first login

CF-1147

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 07:56:46 +02:00
Christian Gick
fa65fbeb3d feat: Matrix AI voice agent (LiveKit + LiteLLM)
Bot @ai:agiliton.eu accepts room invites, dispatches LiveKit agent.
Agent joins call with STT (Groq Whisper) → LLM (Sonnet) → TTS (ElevenLabs)
pipeline, all routed through LiteLLM.

CF-1147

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 07:31:52 +02:00