matrix-ai-agent

Author	SHA1	Message	Date
Christian Gick	f5e08257eb	fix(MAT-140): Set E2EE decryption keys for video tracks, not just audio Video tracks (camera + screen share) were never getting E2EE keys set via set_key() because the condition on track_subscribed only matched audio tracks (kind==1). This caused DEC_FAILED for all video frames, making look_at_screen return encrypted garbage or fail entirely. Also added track source logging to distinguish camera vs screen share. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 16:06:52 +02:00
Christian Gick	efb976a27c	feat: activity video track (pulsing orb) for voice sessions - ActivityVideoPublisher renders animated orb on 160x120 canvas - Integrated into both agent.py and voice.py - Updates confluence-collab submodule	2026-03-06 15:58:51 +00:00
Christian Gick	1000891a97	fix: Improve voice noise tolerance and focus on latest message - Raise VAD thresholds (activation 0.65→0.75, min speech 0.4→0.6s, min silence 0.55→0.65s) to reduce false triggers from background noise - Add "focus on latest message" instruction to all prompts (voice + text) - Add "greet and wait" behavior for new conversations instead of auto-continuing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 13:30:14 +02:00
Christian Gick	b0f84670f2	fix: video track kind detection and Confluence page creation - Video track kind is 2 (not 0) in LiveKit Python SDK — camera was never captured - Replace broken confluence_collab.create_page import with direct REST API call Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 13:30:48 +02:00
Christian Gick	9d2e2ddcf7	fix(MAT-13): Add DNS fallback via web search for browse_url When browse_url fails with DNS resolution error (common with STT-misrecognized domain names like "klicksports" instead of "clicksports"), automatically try a web search to find the correct domain and retry. Applied to both text and voice bot. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 16:41:37 +02:00
Christian Gick	6fe9607fb1	feat: Add web page browsing tool (browse_url) to voice and text bot Both bots can now fetch and read web pages via browse_url tool. Uses httpx + BeautifulSoup to extract clean text from HTML. Complements existing web_search (Brave) with full page reading. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 16:26:17 +02:00
Christian Gick	7791a5ba8e	feat: add Confluence recent pages + Sentry error tracking (MAT-58, MAT-59) MAT-58: Add recent_confluence_pages tool to both voice and text chat. Shows last 5 recently modified pages so users can pick directly instead of having to search every time. MAT-59: Integrate sentry-sdk in all three entry points (agent.py, bot.py, voice.py). SENTRY_DSN env var, traces at 10% sample rate. Requires creating project in Sentry UI and setting DSN. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 08:44:57 +02:00
Christian Gick	10762a53da	feat(MAT-57): Add Confluence write & create tools to voice and text chat - Add create_confluence_page tool to voice mode (basic auth) - Add confluence_update_page and confluence_create_page tools to text chat (OAuth) - Fix update tool: wrap each paragraph in <p> tags instead of single wrapper - Update system prompt to mention create capability Previously only search/read were available. User reported bot couldn't write to or create Confluence pages — because the tools didn't exist. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 08:04:01 +02:00
Christian Gick	3bf9229ae4	fix(MAT-56): Prevent bot silence from STT noise leak + LLM timeout Three fixes for the bot going silent after ~10 messages: 1. STT artifact handler now returns early — previously detected noise leaks ("Vielen Dank.", etc.) but still appended them to transcript, inflating context until LLM timed out after 4 retries. 2. Context truncation — caps LLM chat context at 40 items and internal transcript at 80 entries to prevent unbounded growth in long sessions. 3. LLM timeout recovery — watchdog detects when agent has been silent for >60s despite user activity, sends a recovery reply asking user to repeat their question instead of staying permanently silent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 07:58:11 +02:00
Christian Gick	b19300d3ce	feat: Add confluence_search tool to voice bot Voice bot could read/update Confluence pages but could not search. Users asking to search Confluence got a refusal. Now the voice bot has search_confluence using CQL queries via the service account. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 12:48:50 +02:00
Christian Gick	9e146da3b0	feat(CF-1812): Use confluence-collab for section-based page editing Replace inline regex section parser in voice.py with confluence_collab library (BS4 parsing, 409 conflict retry). Bot now loads section outline into LLM context when Confluence links are detected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 11:37:37 +02:00
Christian Gick	326a874aa7	feat: Add on-demand camera/screen vision via look_at_screen tool Voice bot can now see the user's camera or screen share when asked. Captures a single frame, encodes as JPEG, sends to Sonnet vision with full context (transcript + document). Triggered by phrases like "schau mal", "siehst du das", "can you see this". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 06:36:52 +02:00
Christian Gick	cfb26fb351	feat: Add doubt triggers to think_deeper tool "bist du dir sicher" / "are you sure" / "stimmt das wirklich" now also trigger Opus escalation for fact-checking the previous answer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 06:23:51 +02:00
Christian Gick	6081f9a7ec	feat(MAT-46): Add think_deeper tool for Opus escalation in voice calls Sonnet can now escalate complex questions to Opus via a function tool, same pattern as search_web and read_confluence_page. Full context (transcript + document) is passed automatically. Triggered by user phrases like "denk genauer nach" / "think harder" or when Sonnet is unsure about complex analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 06:13:44 +02:00
Christian Gick	de66ba5eea	feat(MAT-46): Extract and post document annotations after voice calls When a voice call ends and a document was loaded in the room, the bot now analyzes the transcript for document-specific changes/corrections and posts them as a structured "Dokument-Aenderungen" message. Returns nothing if no document changes were discussed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 20:18:00 +02:00
Christian Gick	6a6f9ef1c4	fix(voice): auto-use active Confluence page ID, allow roleplay on docs - Confluence tools default to active page from room context — no more asking user for page_id - Prompt allows roleplay/mock interviews when document context present - Explicit instruction not to ask for page_id Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 14:31:49 +02:00
Christian Gick	c5e1c79e1b	fix(voice): reduce phantom speech responses from ambient noise - Raise VAD activation_threshold 0.50→0.65, min_speech_duration 0.2→0.4s - Add ghost phrase filter: suppress 1-2 word hallucinations (Danke, Ja, etc) - Strengthen prompt: stay silent unless clearly addressed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 13:48:14 +02:00
Christian Gick	b275e7cb88	feat(voice): add Confluence read/write tools for voice sessions Enable realtime Confluence page editing during Element Call voice sessions. - Add read_confluence_page and update_confluence_page function tools - Detect Confluence URLs shared in Matrix rooms, store page ID for voice context - Section-level updates via heading match + version-incremented PUT Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 13:09:34 +02:00
Christian Gick	e81aa79396	fix: increase voice PDF context to 40k chars, fix language detection sanity - Voice context per-document limit 10k→40k chars (was cutting off at page 6) - Language detection: reject results >30 chars (LLM returning sentences) - Voice.py: generalize "PDF" label to "Dokumente" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 12:40:13 +02:00
Christian Gick	90e662be96	feat(voice): PDF context in voice calls + call transcript summary (MAT-10) Pass PDF document context from room to voice session so the voice LLM can answer questions about uploaded PDFs. Persist call transcripts and post an LLM-generated summary to the room when the call ends. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 11:21:31 +02:00
Christian Gick	1ec63b93f2	feat(voice): per-user timezone via memory preferences - Store user timezone as [PREF:timezone] in memory service - Query timezone preference on session start, override default - Add set_user_timezone tool so bot learns timezone from conversation - On time-relevant questions, bot asks if user is still at stored location - Seeded Europe/Nicosia for @christian.gick:agiliton.eu Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 11:02:25 +02:00
Christian Gick	e84260f839	feat(prompt): add user timezone and LLM model to voice prompt Bot now knows the user's timezone (Europe/Berlin default) and which LLM model it's running on, so it can answer questions about both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:56:40 +02:00
Christian Gick	277d6b5fe4	fix(e2ee): restore 3s key rotation wait, fix mute callback arg order Removing the blocking wait entirely caused DEC_FAILED - the rotated key had not arrived via nio sync before the pipeline started. Restore a short 3s wait (down from 10s) which is enough for nio to deliver the rotated key. Also fix on_mute/on_unmute arg order (participant, publication - not reversed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:43:38 +02:00
Christian Gick	a11cafc1d6	feat(memory): store full conversation exchanges instead of LLM-extracted facts - Replace _extract_voice_memories with _store_voice_exchange - Store raw "User: ... / Assistant: ..." pairs directly - No LLM call needed — faster, cheaper, no lost context - Load as "Frühere Gespräche" with full thread context Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:40:59 +02:00
Christian Gick	150df19be1	fix(tts): revert to multilingual_v2 for better quality, keep speed 1.15x flash_v2_5 had audible compression artifacts. multilingual_v2 has higher fidelity while speed=1.15 via VoiceSettings still gives snappier delivery. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:38:46 +02:00
Christian Gick	294fbac913	feat(tts): switch to flash model + speed 1.15x for snappier voice - Model: eleven_multilingual_v2 → eleven_flash_v2_5 (lower latency) - Speed: 1.15x via VoiceSettings - Stability/similarity tuned for natural German speech Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:33:27 +02:00
Christian Gick	c532f4678d	fix(e2ee): consolidate key timing + noise filtering (MAT-40, MAT-41) - set_key() only called after frame cryptor exists (on_track_subscribed / late arrival) - Remove 10s blocking key rotation wait; keys applied asynchronously - Add DEC_FAILED (state 3) to e2ee_state recovery triggers - VAD watchdog re-applies all E2EE keys on >30s stuck as recovery - Expand STT artifact patterns (English variants, double-asterisk) - Add NOISE_LEAK diagnostic logging at STT level Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 08:33:40 +02:00
Christian Gick	4b4a150fbf	fix(e2ee): extend key rotation wait to 10s, debug late key events EC rotates encryption key when bot joins LiveKit room. The rotated key arrives via Matrix sync 3-5s later. Previous 2s wait was too short - DEC_FAILED before new key arrived. Extended wait to 10s. Added logging to bot.py to trace why late key events were not being processed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:54:27 +02:00
Christian Gick	230c083b7b	fix(e2ee): revert incorrect HKDF patch, remove pre-ratcheting The HKDF sed patch in Dockerfile was wrong — it swapped salt/info based on incorrect analysis of minified JS. The original Rust FFI parameters are correct: salt="LKFrameEncryptionKey", info=[0;128]. Also removed Python-side HMAC pre-ratcheting of keys. Element Call uses explicit key rotation via Matrix events, not HMAC ratcheting. Added diagnostic logging to trace exact key bytes during E2EE setup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:44:11 +02:00
Christian Gick	ea52236880	feat(e2ee): make E2EE configurable via E2EE_ENABLED env var Allows disabling E2EE for diagnostic purposes. When disabled, bot connects to LiveKit without frame encryption. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:14:06 +02:00
Christian Gick	e3be4512d9	fix(e2ee): use correct Element Call E2EE parameters Inline E2EE options had 3 wrong values vs Element Call JS SDK: - failure_tolerance=-1 (infinite, hid all DEC_FAILED) → 10 - key_ring_size=16 (too small, keys overflow) → 256 - ratchet_window_size=16 (wrong) → 10 Now uses _build_e2ee_options() which was already correct but never called. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:00:55 +02:00
Christian Gick	7b7079352f	fix(noise): expand STT artifact filter to catch subtitle metadata leaks ElevenLabs scribe_v2_realtime also produces non-asterisk artifacts like "Untertitel: ARD Text im Auftrag von Funk (2017)" from TV/radio audio. Add pattern matching for subtitle metadata, copyright notices, and parenthetical/bracketed annotations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:43:22 +02:00
Christian Gick	c38ab96054	chore(voice): switch to Robert Ranger voice Replace Jack Marlowe (slow/raw) with Robert Ranger (deep/natural) for a more pleasant conversational voice assistant experience. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:34:54 +02:00
Christian Gick	fa9e95b250	fix(noise): filter STT noise annotations via on_user_turn_completed Replace broken _VoiceAgent stt_node override with _NoiseFilterAgent that uses on_user_turn_completed() + StopResponse. This operates downstream of VAD+STT so no backpressure risk to the audio pipeline. When ElevenLabs scribe_v2_realtime produces Störgeräusche etc., the agent now silently suppresses them before the LLM responds. The prompt-based filter is kept as defense-in-depth. Fixes: MAT-41 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:07:31 +02:00
Christian Gick	6c1073e79d	fix(vad): remove competing AudioStream that caused intermittent VAD failures The _count_frames coroutine created a second rtc.AudioStream on the caller's audio track, competing with AgentSession's internal pipeline for event loop time. Under load, this caused VAD to miss speech → user_state stuck on "away". - Remove _count_frames AudioStream (debugging artifact) - Add VAD state diagnostics (speaking count, away duration) - Add VAD watchdog: warns if user_state=away >30s (MAT-40 detection) Fixes: MAT-40 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:02:39 +02:00
Christian Gick	a8d4663f10	fix(tts): revert to Jack Marlowe voice, vmVmHDKBkkCgbLVIOJRb not accessible Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:52:06 +02:00
Christian Gick	06b588f313	fix(voice): add noise annotation filter to prompt + switch voice - Add LLM prompt rule to ignore Störgeräusche etc. annotations instead of overriding stt_node (which broke VAD pipeline) - Switch voice to vmVmHDKBkkCgbLVIOJRb per user preference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:49:31 +02:00
Christian Gick	e926908af7	test: revert to base Agent to check if stt_node override breaks VAD Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:45:56 +02:00
Christian Gick	fb09808a8c	fix(vad): lower activation threshold 0.60→0.50 Threshold 0.60 too strict, user speech consistently not detected. Back to default 0.50 with min_speech_duration=0.2 as noise guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:42:21 +02:00
Christian Gick	8f80e7d543	fix(tts): switch to Jack Marlowe - native German voice Replace George (British EN) with Jack Marlowe (Gng1FdSGZlhs6jKgzAxL), the only native German voice in the library. Fixes garbled number/date pronunciation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:37:05 +02:00
Christian Gick	125b0f5d2e	fix(tts): spell out numbers in words for German TTS George (British) voice mangles German digit strings. Force LLM to write all numbers as German words so TTS pronounces them correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:35:52 +02:00
Christian Gick	1b08683c17	fix(vad): lower activation threshold 0.75→0.60 0.75 too strict, user voice not detected. 0.60 with min_speech_duration=0.2 should balance noise rejection vs speech detection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:15:06 +02:00
Christian Gick	8445c9325c	revert(tts): remove pcm_24000 encoding, keep language=de pcm_24000 caused silent playback through livekit. Reverting to plugin default encoding which is known working. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:12:35 +02:00
Christian Gick	e090c60c19	feat(tts): upgrade to pcm_24000 encoding + language=de Switch from mp3_22050_32 (default) to lossless PCM 24kHz for cleaner voice output. Add language=de for German text normalization. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:08:23 +02:00
Christian Gick	1e1911995f	fix(stt): filter ElevenLabs noise annotations before LLM scribe_v2_realtime annotates background audio as Störgeräusche, Fernsehgeräusche etc. Override stt_node to drop these so the LLM only receives actual speech transcripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:59:17 +02:00
Christian Gick	02a7c91eaf	fix(vad): raise activation threshold to reduce noise triggers activation_threshold 0.5→0.75, min_speech_duration 0.05→0.2s Prevents ambient noise from triggering STT and producing 'Schlechte Qualität' transcripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:52:38 +02:00
Christian Gick	39ef4e0054	fix(stt): pass http_session to ElevenLabs STT plugin Plugin requires explicit aiohttp session; livekit http_context not available in this job setup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:45:42 +02:00
Christian Gick	2dce8419d4	fix(stt): set scribe_v2_realtime model with language_code for streaming STT - Add model_id="scribe_v2_realtime" (already set) + language_code from STT_LANGUAGE env (default "de") - Remove _stt_session from cleanup loop (plugin uses livekit http_context) - Remove _stt_session stub from __init__ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:26:51 +02:00
Christian Gick	4012950197	fix: Use scribe_v2_realtime model for ElevenLabs STT (streaming mode) scribe_v1 (REST) sets streaming=False, incompatible with livekit-agents 1.4 AgentSession. scribe_v2_realtime uses WebSocket streaming (confirmed working with Starter plan). Removes separate _stt_session aiohttp client. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:24:16 +02:00
Christian Gick	52f8cb569c	feat(voice): add cross-call memory and Brave Search tool - Query user memories at call start and inject into agent system prompt - Extract new facts after each exchange using claude-haiku via LiteLLM - Add Brave Search tool (@function_tool) for current data queries - Pass memory client and caller_user_id through VoiceSession constructor - Pre-compute 8 HMAC-ratcheted EC keys for reliable E2EE decryption Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 15:27:59 +02:00
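
Several of the commits above (3bf9229ae4, c5e1c79e1b, 7b7079352f, fa9e95b250) describe the same two ideas: drop STT noise artifacts before they reach the LLM, and cap the transcript so context cannot grow without bound. The sketch below illustrates both in isolation. It is not the repository's code: the function names, the regular expressions, and the ghost-phrase set are assumptions modelled on the examples quoted in the commit messages. Only the limits of 40 and 80 come from the log.

```python
import re

# Hypothetical patterns, modelled on examples quoted in the commit messages.
# Text fully wrapped in parentheses, brackets, or asterisks,
# e.g. "(Störgeräusche)" or "**Fernsehgeräusche**".
_ANNOTATION = re.compile(r"^[\(\[\*].*[\)\]\*]$")
# Subtitle or copyright metadata such as "Untertitel: ARD Text ...".
_SUBTITLE = re.compile(r"^(untertitel|copyright|©)", re.IGNORECASE)
# Short hallucinated phrases (assumed set; the real list is not shown in the log).
_GHOST_PHRASES = {"danke", "ja", "vielen dank"}

# Limits quoted in commit 3bf9229ae4.
MAX_CHAT_ITEMS = 40
MAX_TRANSCRIPT_ENTRIES = 80


def is_stt_artifact(text: str) -> bool:
    """Return True if the transcript looks like noise rather than speech."""
    stripped = text.strip()
    if not stripped:
        return True
    if _ANNOTATION.match(stripped) or _SUBTITLE.match(stripped):
        return True
    # Compare without trailing punctuation and case.
    return stripped.rstrip(".!?").lower() in _GHOST_PHRASES


def cap_context(items: list, limit: int) -> list:
    """Keep only the most recent `limit` items."""
    return items[-limit:] if limit > 0 else []


def handle_transcript(transcript: list, text: str) -> bool:
    """Append real speech to the transcript; return False for artifacts.

    Returning early for artifacts mirrors the fix described in 3bf9229ae4,
    where detected noise was previously still appended.
    """
    if is_stt_artifact(text):
        return False
    transcript.append(text)
    transcript[:] = cap_context(transcript, MAX_TRANSCRIPT_ENTRIES)
    return True
```

In a LiveKit agent, a check like `is_stt_artifact` would presumably sit in the `on_user_turn_completed` hook mentioned in fa9e95b250. That wiring is left out here so the sketch stays free of SDK dependencies.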

105 Commits