matrix-ai-agent

Author	SHA1	Message	Date
Christian Gick	bfc717372c	fix(voice): add MSC4143 call.member encryption key support Element Call v0.17+ embeds encryption_keys in call.member state events instead of separate timeline events. In E2EE rooms, timeline events are encrypted and the bot HTTP fetch cannot decrypt them, causing DEC_FAILED. - Extract caller keys from call.member state event on join - Embed bot key in call.member state event - Check call.member state in key fetch (before timeline fallback) - Handle key updates in call.member during active calls - Update voice.py key poller to check call.member state first - Add debug logging for UnknownEvent types in call rooms Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 08:12:58 +02:00
Christian Gick	c29c2170f3	fix(e2ee): fix screen share key rotation failures (MAT-164) - Route HTTP-fetched keys through on_encryption_key() for proper rotation detection - Replace boolean refetch gate with 500ms timestamp throttle for faster recovery - Reduce DEC_FAILED cooldown from 2s to 0.5s - Extend proactive key poll from 3s to 10s window - Add continuous background key poller (3s interval) during active calls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 12:35:50 +02:00
Christian Gick	f27d545012	fix(MAT-164): proactive key poll on screen share + faster DEC_FAILED recovery When a video track is subscribed (screen share starts), Element Call rotates the E2EE key. Instead of waiting for DEC_FAILED, proactively poll the timeline for the new key (6x @ 500ms = 3s window). Also reduce DEC_FAILED threshold from 3→1 and cooldown from 5s→2s for faster recovery when the proactive poll misses the rotation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 07:44:51 +02:00
Christian Gick	1a0a2ec305	fix: E2EE key re-fetch now triggers on DEC_FAILED before cooldown The re-fetch check was placed after the 5s cooldown return, so it never executed. Now it triggers after 3+ DEC_FAILED regardless of cooldown. Also relaxed stale key age filter from 60s to 300s to handle key rotation during ongoing calls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 13:51:10 +02:00
Christian Gick	488e50e73c	fix: handle Element Call same-index key rotation on screen share Element Call rotates E2EE keys by re-sending index 0 with a new value when screen share starts. The LiveKit frame cryptor caches derived AES keys per index, so overwriting index 0 does not force re-derivation. Fix: detect when index 0 value changes and map to incrementing internal index so the frame cryptor gets a fresh key slot. Sets all accumulated keys on late arrival so cryptor can try both during transition. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 13:43:56 +02:00
Christian Gick	3706f568b6	fix: skip stale E2EE keys and re-fetch on persistent DEC_FAILED - Timeline key fetch now filters by sent_ts (max 60s age) to avoid using keys from a previous call session - After 3+ consecutive DEC_FAILED events, automatically re-fetches key from timeline in case rotation happened - Tracks DEC_FAILED count per participant, resets on OK This should fix the issue where the bot picks up stale encryption keys from previous calls and can't decrypt the current caller's audio. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 13:38:10 +02:00
Christian Gick	a155f39ede	feat: instant "Einen Moment" filler when look_at_screen is invoked Plays immediate spoken feedback so the user knows the bot is processing their screen share / camera before the vision API responds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 13:29:21 +02:00
Christian Gick	5521819358	fix: add missing time import in voice.py E2EE handler The on_e2ee_state callback crashed with NameError on time.monotonic() when video tracks (screen share) arrived, preventing E2EE key re-derivation and causing the bot to miss screen-share related questions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 13:22:04 +02:00
Christian Gick	0c7070ebc4	fix(e2ee): remove diagnostic logging, video E2EE confirmed working (MAT-144) Root cause: aggressive video re-keying (set_key at 0.3/0.8/2/5s intervals) briefly cleared encryption_key between SetKey and HKDF callback, causing DEC_FAILED oscillation. Single set_key per track subscription is sufficient. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 11:16:37 +02:00
Christian Gick	4ae65524ac	fix(e2ee): revert to PR #904 branch, add MAT-144 diagnostics PR #921 requires custom WebRTC build not yet available. Added diagnostic logging: encryption_type per track, frame_cryptors count, and DEC_FAILED re-keying cooldown (5s) to reduce log spam. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 10:12:51 +02:00
Christian Gick	f85562ed28	fix(e2ee): switch to PR #921 Rust FFI branch for native HKDF (MAT-144) PR #904 callback-based HKDF hack only fired for the first frame cryptor (audio), leaving video frame cryptors on PBKDF2 and causing DEC_FAILED oscillation. PR #921 integrates HKDF natively at the WebRTC C++ level, applying uniformly to all frame cryptors (audio + video). Also removes aggressive video re-keying workaround and adds 5s cooldown to DEC_FAILED re-keying handler to prevent tight loops. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 09:52:32 +02:00
Christian Gick	1118ab5060	fix(e2ee): aggressive video re-keying after track subscription (MAT-144) Video frame cryptors may not be fully initialized when set_key() is first called during on_track_subscribed. Audio works immediately but video oscillates OK↔DEC_FAILED with the same key. Add staggered re-keying at 0.3s, 0.8s, 2s, 5s after video track subscription to ensure the key is applied after the frame cryptor is fully ready. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 09:38:17 +02:00
Christian Gick	61531d9913	fix(voice): disable activity video animation — causing lag (MAT-149) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 08:34:09 +02:00
Christian Gick	5ad1d1d60c	fix(e2ee): correct misleading log messages after KDF revert (MAT-144) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 08:29:34 +02:00
Christian Gick	c61bcffec2	revert(e2ee): restore KDF_HKDF=1, KDF_RAW=0 causes PBKDF2 double-derivation (MAT-144) KDF_PBKDF2=0 does NOT mean raw mode — libwebrtc applies its built-in PBKDF2 on top of pre-derived keys, causing DEC_FAILED for audio too. Revert to KDF_HKDF=1 (Rust applies HKDF, we pass raw base keys). Keep diagnostic improvements: - _derive_and_set_key() wrapper with logging - Per-track type logging (audio vs video) in on_track_subscribed - Frame size check in look_at_screen (detect E2EE failure) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 08:17:46 +02:00
Christian Gick	d586ddfa6d	fix(e2ee): pre-derive HKDF keys in Python instead of Rust FFI (MAT-144) Switch from Rust-side HKDF (KDF_HKDF=1) to Python-side HKDF derivation with raw key mode (KDF_RAW=0). This eliminates potential HKDF implementation mismatches between Rust FFI and Element Call JS that caused video frame decryption failures (audio worked, video showed 8x8 garbage frames). Changes: - Add _derive_and_set_key() helper that pre-derives HKDF then calls set_key() - Set key_derivation_function=KDF_RAW (proto 0 = no Rust-side derivation) - Replace all direct set_key() calls with _derive_and_set_key() - Add per-track diagnostic logging (audio vs video) - Add frame size check in look_at_screen (detect E2EE failure early) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 17:05:59 +02:00
Christian Gick	f5e08257eb	fix(MAT-140): Set E2EE decryption keys for video tracks, not just audio Video tracks (camera + screen share) were never getting E2EE keys set via set_key() because the condition on track_subscribed only matched audio tracks (kind==1). This caused DEC_FAILED for all video frames, making look_at_screen return encrypted garbage or fail entirely. Also added track source logging to distinguish camera vs screen share. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 16:06:52 +02:00
Christian Gick	efb976a27c	feat: activity video track (pulsing orb) for voice sessions - ActivityVideoPublisher renders animated orb on 160x120 canvas - Integrated into both agent.py and voice.py - Updates confluence-collab submodule	2026-03-06 15:58:51 +00:00
Christian Gick	1000891a97	fix: Improve voice noise tolerance and focus on latest message - Raise VAD thresholds (activation 0.65→0.75, min speech 0.4→0.6s, min silence 0.55→0.65s) to reduce false triggers from background noise - Add "focus on latest message" instruction to all prompts (voice + text) - Add "greet and wait" behavior for new conversations instead of auto-continuing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 13:30:14 +02:00
Christian Gick	b0f84670f2	fix: video track kind detection and Confluence page creation - Video track kind is 2 (not 0) in LiveKit Python SDK — camera was never captured - Replace broken confluence_collab.create_page import with direct REST API call Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 13:30:48 +02:00
Christian Gick	9d2e2ddcf7	fix(MAT-13): Add DNS fallback via web search for browse_url When browse_url fails with DNS resolution error (common with STT-misrecognized domain names like "klicksports" instead of "clicksports"), automatically try a web search to find the correct domain and retry. Applied to both text and voice bot. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 16:41:37 +02:00
Christian Gick	6fe9607fb1	feat: Add web page browsing tool (browse_url) to voice and text bot Both bots can now fetch and read web pages via browse_url tool. Uses httpx + BeautifulSoup to extract clean text from HTML. Complements existing web_search (Brave) with full page reading. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 16:26:17 +02:00
Christian Gick	7791a5ba8e	feat: add Confluence recent pages + Sentry error tracking (MAT-58, MAT-59) MAT-58: Add recent_confluence_pages tool to both voice and text chat. Shows last 5 recently modified pages so users can pick directly instead of having to search every time. MAT-59: Integrate sentry-sdk in all three entry points (agent.py, bot.py, voice.py). SENTRY_DSN env var, traces at 10% sample rate. Requires creating project in Sentry UI and setting DSN. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 08:44:57 +02:00
Christian Gick	10762a53da	feat(MAT-57): Add Confluence write & create tools to voice and text chat - Add create_confluence_page tool to voice mode (basic auth) - Add confluence_update_page and confluence_create_page tools to text chat (OAuth) - Fix update tool: wrap each paragraph in <p> tags instead of single wrapper - Update system prompt to mention create capability Previously only search/read were available. User reported bot couldn't write to or create Confluence pages — because the tools didn't exist. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 08:04:01 +02:00
Christian Gick	3bf9229ae4	fix(MAT-56): Prevent bot silence from STT noise leak + LLM timeout Three fixes for the bot going silent after ~10 messages: 1. STT artifact handler now returns early — previously detected noise leaks ("Vielen Dank.", etc.) but still appended them to transcript, inflating context until LLM timed out after 4 retries. 2. Context truncation — caps LLM chat context at 40 items and internal transcript at 80 entries to prevent unbounded growth in long sessions. 3. LLM timeout recovery — watchdog detects when agent has been silent for >60s despite user activity, sends a recovery reply asking user to repeat their question instead of staying permanently silent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 07:58:11 +02:00
Christian Gick	b19300d3ce	feat: Add confluence_search tool to voice bot Voice bot could read/update Confluence pages but could not search. Users asking to search Confluence got a refusal. Now the voice bot has search_confluence using CQL queries via the service account. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 12:48:50 +02:00
Christian Gick	9e146da3b0	feat(CF-1812): Use confluence-collab for section-based page editing Replace inline regex section parser in voice.py with confluence_collab library (BS4 parsing, 409 conflict retry). Bot now loads section outline into LLM context when Confluence links are detected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 11:37:37 +02:00
Christian Gick	326a874aa7	feat: Add on-demand camera/screen vision via look_at_screen tool Voice bot can now see the user's camera or screen share when asked. Captures a single frame, encodes as JPEG, sends to Sonnet vision with full context (transcript + document). Triggered by phrases like "schau mal", "siehst du das", "can you see this". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 06:36:52 +02:00
Christian Gick	cfb26fb351	feat: Add doubt triggers to think_deeper tool "bist du dir sicher" / "are you sure" / "stimmt das wirklich" now also trigger Opus escalation for fact-checking the previous answer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 06:23:51 +02:00
Christian Gick	6081f9a7ec	feat(MAT-46): Add think_deeper tool for Opus escalation in voice calls Sonnet can now escalate complex questions to Opus via a function tool, same pattern as search_web and read_confluence_page. Full context (transcript + document) is passed automatically. Triggered by user phrases like "denk genauer nach" / "think harder" or when Sonnet is unsure about complex analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 06:13:44 +02:00
Christian Gick	de66ba5eea	feat(MAT-46): Extract and post document annotations after voice calls When a voice call ends and a document was loaded in the room, the bot now analyzes the transcript for document-specific changes/corrections and posts them as a structured "Dokument-Aenderungen" message. Returns nothing if no document changes were discussed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 20:18:00 +02:00
Christian Gick	6a6f9ef1c4	fix(voice): auto-use active Confluence page ID, allow roleplay on docs - Confluence tools default to active page from room context — no more asking user for page_id - Prompt allows roleplay/mock interviews when document context present - Explicit instruction not to ask for page_id Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 14:31:49 +02:00
Christian Gick	c5e1c79e1b	fix(voice): reduce phantom speech responses from ambient noise - Raise VAD activation_threshold 0.50→0.65, min_speech_duration 0.2→0.4s - Add ghost phrase filter: suppress 1-2 word hallucinations (Danke, Ja, etc) - Strengthen prompt: stay silent unless clearly addressed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 13:48:14 +02:00
Christian Gick	b275e7cb88	feat(voice): add Confluence read/write tools for voice sessions Enable realtime Confluence page editing during Element Call voice sessions. - Add read_confluence_page and update_confluence_page function tools - Detect Confluence URLs shared in Matrix rooms, store page ID for voice context - Section-level updates via heading match + version-incremented PUT Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 13:09:34 +02:00
Christian Gick	e81aa79396	fix: increase voice PDF context to 40k chars, fix language detection sanity - Voice context per-document limit 10k→40k chars (was cutting off at page 6) - Language detection: reject results >30 chars (LLM returning sentences) - Voice.py: generalize "PDF" label to "Dokumente" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 12:40:13 +02:00
Christian Gick	90e662be96	feat(voice): PDF context in voice calls + call transcript summary (MAT-10) Pass PDF document context from room to voice session so the voice LLM can answer questions about uploaded PDFs. Persist call transcripts and post an LLM-generated summary to the room when the call ends. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 11:21:31 +02:00
Christian Gick	1ec63b93f2	feat(voice): per-user timezone via memory preferences - Store user timezone as [PREF:timezone] in memory service - Query timezone preference on session start, override default - Add set_user_timezone tool so bot learns timezone from conversation - On time-relevant questions, bot asks if user is still at stored location - Seeded Europe/Nicosia for @christian.gick:agiliton.eu Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 11:02:25 +02:00
Christian Gick	e84260f839	feat(prompt): add user timezone and LLM model to voice prompt Bot now knows the user's timezone (Europe/Berlin default) and which LLM model it's running on, so it can answer questions about both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:56:40 +02:00
Christian Gick	277d6b5fe4	fix(e2ee): restore 3s key rotation wait, fix mute callback arg order Removing the blocking wait entirely caused DEC_FAILED - the rotated key had not arrived via nio sync before the pipeline started. Restore a short 3s wait (down from 10s) which is enough for nio to deliver the rotated key. Also fix on_mute/on_unmute arg order (participant, publication - not reversed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:43:38 +02:00
Christian Gick	a11cafc1d6	feat(memory): store full conversation exchanges instead of LLM-extracted facts - Replace _extract_voice_memories with _store_voice_exchange - Store raw "User: ... / Assistant: ..." pairs directly - No LLM call needed — faster, cheaper, no lost context - Load as "Frühere Gespräche" with full thread context Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:40:59 +02:00
Christian Gick	150df19be1	fix(tts): revert to multilingual_v2 for better quality, keep speed 1.15x flash_v2_5 had audible compression artifacts. multilingual_v2 has higher fidelity while speed=1.15 via VoiceSettings still gives snappier delivery. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:38:46 +02:00
Christian Gick	294fbac913	feat(tts): switch to flash model + speed 1.15x for snappier voice - Model: eleven_multilingual_v2 → eleven_flash_v2_5 (lower latency) - Speed: 1.15x via VoiceSettings - Stability/similarity tuned for natural German speech Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:33:27 +02:00
Christian Gick	c532f4678d	fix(e2ee): consolidate key timing + noise filtering (MAT-40, MAT-41) - set_key() only called after frame cryptor exists (on_track_subscribed / late arrival) - Remove 10s blocking key rotation wait; keys applied asynchronously - Add DEC_FAILED (state 3) to e2ee_state recovery triggers - VAD watchdog re-applies all E2EE keys on >30s stuck as recovery - Expand STT artifact patterns (English variants, double-asterisk) - Add NOISE_LEAK diagnostic logging at STT level Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 08:33:40 +02:00
Christian Gick	4b4a150fbf	fix(e2ee): extend key rotation wait to 10s, debug late key events EC rotates encryption key when bot joins LiveKit room. The rotated key arrives via Matrix sync 3-5s later. Previous 2s wait was too short - DEC_FAILED before new key arrived. Extended wait to 10s. Added logging to bot.py to trace why late key events were not being processed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:54:27 +02:00
Christian Gick	230c083b7b	fix(e2ee): revert incorrect HKDF patch, remove pre-ratcheting The HKDF sed patch in Dockerfile was wrong — it swapped salt/info based on incorrect analysis of minified JS. The original Rust FFI parameters are correct: salt="LKFrameEncryptionKey", info=[0;128]. Also removed Python-side HMAC pre-ratcheting of keys. Element Call uses explicit key rotation via Matrix events, not HMAC ratcheting. Added diagnostic logging to trace exact key bytes during E2EE setup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:44:11 +02:00
Christian Gick	ea52236880	feat(e2ee): make E2EE configurable via E2EE_ENABLED env var Allows disabling E2EE for diagnostic purposes. When disabled, bot connects to LiveKit without frame encryption. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:14:06 +02:00
Christian Gick	e3be4512d9	fix(e2ee): use correct Element Call E2EE parameters Inline E2EE options had 3 wrong values vs Element Call JS SDK: - failure_tolerance=-1 (infinite, hid all DEC_FAILED) → 10 - key_ring_size=16 (too small, keys overflow) → 256 - ratchet_window_size=16 (wrong) → 10 Now uses _build_e2ee_options() which was already correct but never called. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:00:55 +02:00
Christian Gick	7b7079352f	fix(noise): expand STT artifact filter to catch subtitle metadata leaks ElevenLabs scribe_v2_realtime also produces non-asterisk artifacts like "Untertitel: ARD Text im Auftrag von Funk (2017)" from TV/radio audio. Add pattern matching for subtitle metadata, copyright notices, and parenthetical/bracketed annotations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:43:22 +02:00
Christian Gick	c38ab96054	chore(voice): switch to Robert Ranger voice Replace Jack Marlowe (slow/raw) with Robert Ranger (deep/natural) for a more pleasant conversational voice assistant experience. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:34:54 +02:00
Christian Gick	fa9e95b250	fix(noise): filter STT noise annotations via on_user_turn_completed Replace broken _VoiceAgent stt_node override with _NoiseFilterAgent that uses on_user_turn_completed() + StopResponse. This operates downstream of VAD+STT so no backpressure risk to the audio pipeline. When ElevenLabs scribe_v2_realtime produces Störgeräusche etc., the agent now silently suppresses them before the LLM responds. The prompt-based filter is kept as defense-in-depth. Fixes: MAT-41 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:07:31 +02:00
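
Commit 488e50e73c above describes mapping a re-sent index 0 key to an incrementing internal index so the frame cryptor gets a fresh slot. The following is a minimal sketch of that idea only, not the repository's code: `KeyIndexRemapper`, `observe`, and `all_keys` are hypothetical names, and the ring size of 256 is borrowed from the `key_ring_size` value mentioned in commit e3be4512d9.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class KeyIndexRemapper:
    """Hypothetical helper: assigns a fresh internal slot whenever a sender
    re-sends the same wire index with a different key value."""

    key_ring_size: int = 256  # assumption: matches the cryptor's key ring size
    _last_seen: Dict[Tuple[str, int], bytes] = field(init=False, default_factory=dict)
    _next_slot: Dict[str, int] = field(init=False, default_factory=dict)
    _slots: Dict[str, Dict[int, bytes]] = field(init=False, default_factory=dict)

    def observe(self, sender: str, wire_index: int, key: bytes) -> Optional[int]:
        """Return the internal slot for a new key, or None if it is a repeat."""
        ident = (sender, wire_index)
        if self._last_seen.get(ident) == key:
            return None  # same value re-delivered, nothing to rotate
        self._last_seen[ident] = key
        slot = self._next_slot.get(sender, 0)
        # Wrap around so the slot never exceeds the ring size.
        self._next_slot[sender] = (slot + 1) % self.key_ring_size
        self._slots.setdefault(sender, {})[slot] = key
        return slot

    def all_keys(self, sender: str) -> Dict[int, bytes]:
        """All accumulated slot -> key pairs, e.g. to set on a late-arriving cryptor."""
        return dict(self._slots.get(sender, {}))
```

A caller would pass the returned slot, rather than the wire index, to whatever sets the key on the frame cryptor, and skip the call when `observe` returns `None`.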

121 Commits
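
Commit c29c2170f3 in the log replaces a boolean refetch gate with a 500ms timestamp throttle. As a rough illustration of that pattern, assuming a monotonic clock and a hypothetical `RefetchThrottle` class (not the name used in the repository), it could look like this:

```python
import time
from typing import Callable, Optional


class RefetchThrottle:
    """Hypothetical sketch: allow an action at most once per min_interval_s.

    Unlike a boolean gate, this reopens by itself once the interval has
    elapsed, so nothing has to remember to reset a flag.
    """

    def __init__(self, min_interval_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so the throttle can be tested
        self._last: Optional[float] = None

    def allow(self) -> bool:
        """Return True and record the time if enough time has passed."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval_s:
            return False
        self._last = now
        return True
```

In a DEC_FAILED handler, the key re-fetch would run only when `allow()` returns `True`.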