Bot now publishes the same key as the caller so both sides can decrypt.
Falls back to no encryption if no caller key is received.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Skip the bot's own encryption_keys events in the on_unknown handler
- Always pass valid RoomOptions to AgentSession.start()
- Wait up to 10s for remote participant to connect before starting pipeline
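
A minimal sketch of the bounded wait for a remote participant, assuming a
hypothetical list_participants callable that returns the identities currently
in the room (the real code would read this from the LiveKit room object):

```python
import asyncio
import time
from typing import Callable, Optional, Sequence


async def wait_for_remote_participant(
    list_participants: Callable[[], Sequence[str]],  # hypothetical accessor
    timeout: float = 10.0,
    poll_interval: float = 0.1,
) -> Optional[str]:
    """Poll until a remote participant appears or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        participants = list_participants()
        if participants:
            # First remote identity found; the pipeline can start now.
            return participants[0]
        if time.monotonic() >= deadline:
            # Nobody joined in time; the caller decides how to proceed.
            return None
        await asyncio.sleep(poll_interval)
```

Returning None instead of raising leaves the decision (abort or start anyway)
to the caller.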
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Element Call distributes encryption keys as timeline events, not room
state events. Changed bot to publish keys via room_send and fetch from
/messages endpoint instead of /state.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All participants must use the SAME shared key. The bot was generating
its own key, which could not decrypt the user's audio. Now:
1. Fetch caller's key from room state via HTTP API
2. Fall back to waiting for key via sync handler
3. Publish the SAME key back (not a new one)
4. Only connect with E2EE if key available
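
The selection logic in the steps above could look roughly like this. The
function name and the two inputs are hypothetical; they stand for the key
found via the HTTP API and the key received later by the sync handler:

```python
from typing import Optional, Tuple


def select_shared_key(
    key_from_http: Optional[bytes],
    key_from_sync: Optional[bytes],
) -> Tuple[Optional[bytes], bool]:
    """Return (key to reuse and publish back, whether to enable E2EE)."""
    # Prefer the key fetched via HTTP, else the one delivered by sync.
    key = key_from_http if key_from_http is not None else key_from_sync
    # Never generate a fresh key here: without the caller's key, E2EE stays off.
    return key, key is not None
```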
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Element Call encrypts media by default. Bot must:
1. Generate its own 32-byte E2EE key
2. Publish it to room state (io.element.call.encryption_keys)
3. Connect to LiveKit with HKDF E2EE enabled
4. Use caller's key when received, own key as fallback
This fixes: the "Not encrypted" ("Nicht verschlüsselt" in the German UI)
warning and silent audio (encrypted frames couldn't be decoded by VAD/STT)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Pass participant_identity via RoomOptions so AgentSession knows
which audio track to consume (was silently ignoring user audio)
- Add USER_SPEECH and AGENT_SPEECH event handlers for debugging
- Simplify greeting to exact text to prevent hallucination
- Use httpx for room state scan (nio API was unreliable)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reorder: send call member event BEFORE creating VoiceSession
- Store VoiceSession BEFORE start so sync handler can forward keys
- Increase E2EE key wait from 3s to 10s
- Add INFO-level logging for key lookup + room state scan via HTTP API
- Tighten voice system prompt to prevent long rambling greetings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix state_key format: try @user:domain:DEVICE_ID (Element Call format),
then @user:domain, then scan all room state as fallback
- Publish bot E2EE key to room so Element shows encrypted status
- Extract caller device_id from call member event content
- Also fix pipecat-poc pipeline with context aggregators (CF-1579)
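
Sketch of the state_key lookup order described above. The room state is
modelled as a plain dict from state_key to event content, and the final scan
matches on a user-ID prefix; both are assumptions made for illustration:

```python
from typing import Any, Dict, List, Optional


def candidate_state_keys(user_id: str, device_id: Optional[str]) -> List[str]:
    """State keys to try, most specific first."""
    keys = []
    if device_id:
        keys.append(f"{user_id}:{device_id}")  # Element Call format
    keys.append(user_id)
    return keys


def find_key_event(
    state: Dict[str, Any], user_id: str, device_id: Optional[str]
) -> Optional[Any]:
    """Try the exact state keys, then fall back to scanning all entries."""
    for key in candidate_state_keys(user_id, device_id):
        if key in state:
            return state[key]
    # Fallback scan (assumed criterion: state_key starts with the user ID).
    for state_key, content in state.items():
        if state_key.startswith(user_id):
            return content
    return None
```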
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rust:latest produces FFI needing CXXABI_1.3.15 (GCC 14 libstdc++).
GCC 14 libstdc++ needs GLIBC 2.38. Bookworm only has 2.36.
Trixie has GLIBC 2.38+ — fixes the CXXABI_1.3.15 runtime error.
Also reverts to rust:latest since bookworm's GCC 12 can't compile webrtc C++20.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rust:latest links against GLIBC_2.38 libstdc++ which is incompatible with bookworm.
rust:bookworm (1.93.1) produces FFI binary compatible with bookworm libstdc++.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Stop VoiceSession when call leave event received
- Copy libstdc++ from rust build stage to fix CXXABI_1.3.15 mismatch
- Read caller encryption key from room state before starting VoiceSession
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
voice.py runs in bot container, not agent container.
- Wait 3s for encryption key before connecting
- Build E2EE options with HKDF when key received
- Bot container now uses patched Dockerfile (needs FFI)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Element Call uses HKDF-SHA256 + AES-128-GCM for frame encryption,
while the LiveKit Rust SDK defaults to PBKDF2 + AES-256-GCM.
- Multi-stage Dockerfile builds patched Rust FFI from EC-compat fork
- Generates Python protobuf bindings with new fields
- patch_sdk.py modifies installed livekit-rtc for new proto fields
- agent.py passes E2EE options with HKDF to ctx.connect()
- bot.py exchanges encryption keys via Matrix state events
- Separate Dockerfile.bot for bot service (no Rust build needed)
Ref: livekit/rust-sdks#904, livekit/python-sdks#570
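
For reference, a standard-library sketch of HKDF-SHA256 (RFC 5869) deriving a
16-byte AES-128 key. The salt and info values in the usage note are
placeholders, not the values Element Call or the LiveKit SDK actually use:

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF extract-then-expand with SHA-256, as specified in RFC 5869."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-SHA256")
    if not salt:
        salt = b"\x00" * hash_len  # RFC 5869 default when no salt is given
    # Extract: condense the input keying material into a pseudorandom key.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output bytes are available.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]
```

Usage, with placeholder salt and info:
hkdf_sha256(shared_key, b"example-salt", b"example-info", 16) yields the
16 bytes an AES-128-GCM cipher expects.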
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Element Call uses SHA256(room_id + "|m.call#ROOM") encoded as unpadded
base64 for LiveKit room names (via lk-jwt-service). The bot was using
the raw Matrix room ID, causing agent and user to join different rooms.
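
The room name derivation described above, as a short sketch. It assumes the
standard base64 alphabet and UTF-8 encoding of the room ID; the function name
is made up for this example:

```python
import base64
import hashlib


def livekit_room_name(matrix_room_id: str) -> str:
    """SHA256(room_id + "|m.call#ROOM"), encoded as unpadded base64."""
    digest = hashlib.sha256(f"{matrix_room_id}|m.call#ROOM".encode("utf-8")).digest()
    # A 32-byte digest encodes to 44 characters with one "=" pad; strip it.
    return base64.b64encode(digest).decode("ascii").rstrip("=")
```

Bot and agent must both call this with the same Matrix room ID so they land
in the room that lk-jwt-service assigns to the user.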
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add memory-service (FastAPI + pgvector) for semantic memory storage.
Bot now queries relevant memories per conversation instead of dumping all 50.
Includes migration script for existing JSON files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace separate bot-crypto/bot-memories volumes with single bot-data:/data
volume so user_keys.json and language_prefs.json persist across restarts
- Remove redundant language_prefs.json infrastructure (constant, load/save,
dict) — language preference now read from memories (last match wins)
- Add robust JSON extraction in _extract_memories (regex fallback for
markdown fences, embedded arrays, non-array responses)
- Add info-level logging throughout memory extraction pipeline
- Add asyncio.wait_for timeout (15s) on memory extraction to prevent hangs
- Add !ai memory <fact> command for explicit, reliable memory storage
- Update _get_preferred_language to return last match (most recent wins)
- Update !ai forget to clear in-memory caches (pending translate/reply)
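
A sketch of the kind of fallback parsing described for _extract_memories
above. This is not the actual implementation; the function name and the
order of the fallbacks are assumptions:

```python
import json
import re
from typing import Any, List

FENCE = "`" * 3  # markdown code fence, built here to avoid a literal fence
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)
_ARRAY_RE = re.compile(r"\[.*\]", re.DOTALL)


def extract_json_array(raw: str) -> List[Any]:
    """Best-effort extraction of a JSON array from an LLM response."""
    candidates = [raw.strip()]
    fenced = _FENCE_RE.search(raw)
    if fenced:
        candidates.append(fenced.group(1).strip())  # content of a fenced block
    embedded = _ARRAY_RE.search(raw)
    if embedded:
        candidates.append(embedded.group(0))  # array embedded in prose
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        # Wrap non-array responses so callers always receive a list.
        return parsed if isinstance(parsed, list) else [parsed]
    return []
```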
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Upgrade memory/translation debug logs from debug to warning level
- Auto-detect language preference from extracted memory facts
- Persist language prefs to separate JSON file for reliability
- Add translation detection logging
- Use single linebreaks in translation menu
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detect when a DM message is in a foreign language and offer an
interactive menu: translate, compose reply in that language, or
respond normally. Supports forwarded WhatsApp messages via Element.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract and store memorable facts (name, language, preferences) per user
- Inject memories into system prompt for personalized responses
- LLM-based extraction after each response, deduplication against existing
- JSON files on Docker volume (/data/memories), capped at 50 per user
- System prompt updated: respond in the user's language, use memories
- Commands: !ai memories (view), !ai forget (delete all)
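
Illustration of the per-user cap. The commit deduplicates with the LLM; the
case-insensitive exact match below is only a stand-in, and dropping the
oldest entries once the cap is reached is an assumption:

```python
from typing import List


def add_memory(memories: List[str], fact: str, cap: int = 50) -> List[str]:
    """Return a new list with the fact appended, deduplicated and capped."""
    cleaned = fact.strip()
    if not cleaned:
        return memories
    # Stand-in dedup: skip facts that already exist, ignoring case.
    if any(existing.strip().lower() == cleaned.lower() for existing in memories):
        return memories
    # Keep only the newest `cap` entries (assumed eviction policy).
    return (memories + [cleaned])[-cap:]
```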
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add RoomEncryptedImage callback with decrypt_attachment for E2E rooms
- Cache recent images per room (60s TTL) so follow-up text messages
like "was ist das" ("what is that") get the image context instead of
hallucinating
- Treat filenames (containing dots) as no-caption, default to
"What's in this image?"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
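
A rough sketch of the per-room image cache and the caption heuristic from the
commit above. The class and function names are hypothetical, and the clock is
injectable only so that expiry can be tested:

```python
import time
from typing import Callable, Dict, Optional, Tuple

DEFAULT_PROMPT = "What's in this image?"


class RecentImageCache:
    """Keeps the latest image per room for a limited time."""

    def __init__(self, ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._items: Dict[str, Tuple[float, bytes]] = {}

    def put(self, room_id: str, image: bytes) -> None:
        self._items[room_id] = (self._clock(), image)

    def get(self, room_id: str) -> Optional[bytes]:
        entry = self._items.get(room_id)
        if entry is None:
            return None
        stored_at, image = entry
        if self._clock() - stored_at > self._ttl:
            del self._items[room_id]  # expired, drop it
            return None
        return image


def caption_or_default(body: str) -> str:
    """Treat empty bodies and filename-like bodies (with a dot) as no caption."""
    text = body.strip()
    if not text or "." in text:
        return DEFAULT_PROMPT
    return text
```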
Upload with encrypt=True and filesize param. Handle UploadError
gracefully. Use m.file encrypted format when encryption keys returned.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Register RoomMessageFile callback, filter for application/pdf
- Extract text from PDFs using pymupdf (fitz)
- Send extracted text as context to LLM for summarization/Q&A
- Truncate at 50k chars to avoid token limits
- Add pymupdf to requirements.txt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>