matrix-ai-agent

Author	SHA1	Message	Date
Christian Gick	ea52236880	feat(e2ee): make E2EE configurable via E2EE_ENABLED env var Allows disabling E2EE for diagnostic purposes. When disabled, bot connects to LiveKit without frame encryption. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:14:06 +02:00
Christian Gick	5bfe0d0188	chore: Trigger rebuild	2026-02-22 20:01:14 +02:00
Christian Gick	e3be4512d9	fix(e2ee): use correct Element Call E2EE parameters Inline E2EE options had 3 wrong values vs Element Call JS SDK: - failure_tolerance=-1 (infinite, hid all DEC_FAILED) → 10 - key_ring_size=16 (too small, keys overflow) → 256 - ratchet_window_size=16 (wrong) → 10 Now uses _build_e2ee_options() which was already correct but never called. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:00:55 +02:00
Christian Gick	c2338fca46	chore: Trigger rebuild	2026-02-22 19:45:13 +02:00
Christian Gick	7b7079352f	fix(noise): expand STT artifact filter to catch subtitle metadata leaks ElevenLabs scribe_v2_realtime also produces non-asterisk artifacts like "Untertitel: ARD Text im Auftrag von Funk (2017)" from TV/radio audio. Add pattern matching for subtitle metadata, copyright notices, and parenthetical/bracketed annotations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:43:22 +02:00
Christian Gick	5984132f60	chore: Trigger rebuild	2026-02-22 19:38:06 +02:00
Christian Gick	9e0f2a15b6	chore: Trigger rebuild	2026-02-22 19:35:11 +02:00
Christian Gick	c38ab96054	chore(voice): switch to Robert Ranger voice Replace Jack Marlowe (slow/raw) with Robert Ranger (deep/natural) for a more pleasant conversational voice assistant experience. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:34:54 +02:00
Christian Gick	38c3d93adf	chore: Trigger rebuild	2026-02-22 19:07:45 +02:00
Christian Gick	fa9e95b250	fix(noise): filter STT noise annotations via on_user_turn_completed Replace broken _VoiceAgent stt_node override with _NoiseFilterAgent that uses on_user_turn_completed() + StopResponse. This operates downstream of VAD+STT so no backpressure risk to the audio pipeline. When ElevenLabs scribe_v2_realtime produces Störgeräusche etc., the agent now silently suppresses them before the LLM responds. The prompt-based filter is kept as defense-in-depth. Fixes: MAT-41 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:07:31 +02:00
Christian Gick	7f03cc1f37	chore: Trigger rebuild	2026-02-22 19:02:56 +02:00
Christian Gick	6c1073e79d	fix(vad): remove competing AudioStream that caused intermittent VAD failures The _count_frames coroutine created a second rtc.AudioStream on the caller's audio track, competing with AgentSession's internal pipeline for event loop time. Under load, this caused VAD to miss speech → user_state stuck on "away". - Remove _count_frames AudioStream (debugging artifact) - Add VAD state diagnostics (speaking count, away duration) - Add VAD watchdog: warns if user_state=away >30s (MAT-40 detection) Fixes: MAT-40 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:02:39 +02:00
Christian Gick	a8d4663f10	fix(tts): revert to Jack Marlowe voice, vmVmHDKBkkCgbLVIOJRb not accessible Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:52:06 +02:00
Christian Gick	06b588f313	fix(voice): add noise annotation filter to prompt + switch voice - Add LLM prompt rule to ignore Störgeräusche etc. annotations instead of overriding stt_node (which broke VAD pipeline) - Switch voice to vmVmHDKBkkCgbLVIOJRb per user preference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 18:49:31 +02:00
Christian Gick	e926908af7	test: revert to base Agent to check if stt_node override breaks VAD Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:45:56 +02:00
Christian Gick	fb09808a8c	fix(vad): lower activation threshold 0.60→0.50 Threshold 0.60 too strict, user speech consistently not detected. Back to default 0.50 with min_speech_duration=0.2 as noise guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:42:21 +02:00
Christian Gick	8f80e7d543	fix(tts): switch to Jack Marlowe - native German voice Replace George (British EN) with Jack Marlowe (Gng1FdSGZlhs6jKgzAxL), the only native German voice in the library. Fixes garbled number/date pronunciation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:37:05 +02:00
Christian Gick	125b0f5d2e	fix(tts): spell out numbers in words for German TTS George (British) voice mangles German digit strings. Force LLM to write all numbers as German words so TTS pronounces them correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:35:52 +02:00
Christian Gick	1b08683c17	fix(vad): lower activation threshold 0.75→0.60 0.75 too strict, user voice not detected. 0.60 with min_speech_duration=0.2 should balance noise rejection vs speech detection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:15:06 +02:00
Christian Gick	8445c9325c	revert(tts): remove pcm_24000 encoding, keep language=de pcm_24000 caused silent playback through livekit. Reverting to plugin default encoding which is known working. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:12:35 +02:00
Christian Gick	e090c60c19	feat(tts): upgrade to pcm_24000 encoding + language=de Switch from mp3_22050_32 (default) to lossless PCM 24kHz for cleaner voice output. Add language=de for German text normalization. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 18:08:23 +02:00
Christian Gick	1e1911995f	fix(stt): filter ElevenLabs noise annotations before LLM scribe_v2_realtime annotates background audio as Störgeräusche, Fernsehgeräusche etc. Override stt_node to drop these so the LLM only receives actual speech transcripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:59:17 +02:00
Christian Gick	02a7c91eaf	fix(vad): raise activation threshold to reduce noise triggers activation_threshold 0.5→0.75, min_speech_duration 0.05→0.2s Prevents ambient noise from triggering STT and producing 'Schlechte Qualität' transcripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:52:38 +02:00
Christian Gick	39ef4e0054	fix(stt): pass http_session to ElevenLabs STT plugin Plugin requires explicit aiohttp session; livekit http_context not available in this job setup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:45:42 +02:00
Christian Gick	2dce8419d4	fix(stt): set scribe_v2_realtime model with language_code for streaming STT - Add model_id="scribe_v2_realtime" (already set) + language_code from STT_LANGUAGE env (default "de") - Remove _stt_session from cleanup loop (plugin uses livekit http_context) - Remove _stt_session stub from __init__ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:26:51 +02:00
Christian Gick	382a98dd09	chore: Trigger rebuild	2026-02-22 17:26:03 +02:00
Christian Gick	9bd7f27a84	fix: Use LITELLM_MASTER_KEY for memory service Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:25:16 +02:00
Christian Gick	4012950197	fix: Use scribe_v2_realtime model for ElevenLabs STT (streaming mode) scribe_v1 (REST) sets streaming=False, incompatible with livekit-agents 1.4 AgentSession. scribe_v2_realtime uses WebSocket streaming (confirmed working with Starter plan). Removes separate _stt_session aiohttp client. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:24:16 +02:00
Christian Gick	045e5831a6	chore: Trigger rebuild	2026-02-22 16:24:47 +02:00
Christian Gick	52f8cb569c	feat(voice): add cross-call memory and Brave Search tool - Query user memories at call start and inject into agent system prompt - Extract new facts after each exchange using claude-haiku via LiteLLM - Add Brave Search tool (@function_tool) for current data queries - Pass memory client and caller_user_id through VoiceSession constructor - Pre-compute 8 HMAC-ratcheted EC keys for reliable E2EE decryption Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 15:27:59 +02:00
Christian Gick	2b8744de6e	fix(voice): full E2EE bidirectional audio pipeline working - bot.py: track active callers per room; only stop session when last caller leaves (fixes premature cancellation when Playwright browser hangs up while real app is still in call) - voice.py: pre-compute 8 HMAC-ratcheted keys from EC's base key so decryption works immediately without waiting ~30s for Matrix to deliver EC's key-rotation event (root cause of user→bot silence) - voice.py: fix set_key() argument order (identity, key, index) at all call sites — was (identity, index, key) causing TypeError - voice.py: add audio frame monitor (AUDIO_FLOW) and mute/unmute event handlers for diagnostics - voice.py: update livekit-agents 1.4.2 event names: user_state_changed, user_input_transcribed, conversation_item_added Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 15:17:35 +02:00
Christian Gick	c379064f80	fix(voice): set caller key in on_track_subscribed — frame cryptor must exist for HKDF to apply Root cause: C++ set_key() only applies HKDF when impl_->GetKey(pid) returns a valid handler, which requires the frame cryptor for that participant to be initialized. Frame cryptors are created at track subscription time, not at connect time. Calling set_key(caller_identity, key) immediately after connect() skips HKDF derivation (impl_->GetKey returns null) → raw key stored → DEC_FAILED. Fix: move caller key setting to on_track_subscribed where frame cryptor definitely exists. Also update on_encryption_key to use set_key() for key rotation updates.	2026-02-22 14:05:54 +02:00
Christian Gick	190b35945c	fix(voice): guard e2ee_manager access when E2EE disabled (diagnostic mode)	2026-02-22 13:46:51 +02:00
Christian Gick	c188a2daf6	test(voice): disable E2EE entirely — check if EC sends plaintext vs encrypted If VAD triggers → EC audio reaches pipeline without decryption (plaintext or format issue). If VAD silent → E2EE encryption on EC side but key/format mismatch on our side. Note: bot greeting will be unencrypted so EC may not hear it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 13:34:26 +02:00
Christian Gick	3d05b503c6	test(voice): pre-derive HKDF in Python, use set_shared_key to bypass Rust FFI HKDF Diagnostic: if Rust FFI HKDF produces different result than EC JS HKDF, set_key(caller) would always fail (DEC_FAILED). Test: pre-derive AES key in Python matching livekit-client-sdk-js params (SHA-256, salt=LKFrameEncryptionKey, info=128-zeros, 16-byte output), pass to set_shared_key() which stores raw (no KDF). If user→bot decryption now works, root cause = Rust HKDF mismatch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 12:24:57 +02:00
Christian Gick	7adeebfe05	fix(voice): restore set_shared_key fallback + failure_tolerance=10 from working commit `e3ede3f` The confirmed-working Feb 21 commit (`e3ede3f`) used: - kp.set_shared_key(caller_key) as fallback for incoming audio decryption - failure_tolerance=10 (not -1) so DEC_FAILED state changes are visible Per-participant kp.set_key() alone is insufficient — the patched Rust FFI appears to fall back to shared_key for incoming track decryption. failure_tolerance=-1 was masking the DEC_FAILED state making diagnosis hard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 11:46:32 +02:00
Christian Gick	2a799f5760	fix(voice): set caller E2EE key on participant_connected + for all remote LK identities Two race conditions when bot joins first (remote=0): 1. Key arrives before participant joins LK → on_participant_connected now applies stored keys 2. Key arrives after session start → on_encryption_key now sets key for all remote_participants by LK identity Fixes identity mismatch between Matrix device_id (from key event) and LK participant identity. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 11:30:23 +02:00
Christian Gick	5d31886192	debug(voice): add VAD start/stop events to trace where audio pipeline breaks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 11:18:51 +02:00
Christian Gick	f74a11fde8	fix(voice): separate aiohttp sessions for STT and TTS Sharing one session between ElevenLabs STT (WebSocket) and TTS (HTTP) can cause connection conflicts. Use dedicated sessions for each. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 11:15:46 +02:00
Christian Gick	475ab38f6f	chore: Trigger rebuild	2026-02-22 11:09:17 +02:00
Christian Gick	e3c1ded328	feat(voice): inject datetime into prompt, respond in DE/EN - Add VOICE_TIMEZONE env var (default: Europe/Berlin) for local time - Bot knows exact date/time at call start via _build_voice_prompt() - Respond in user language (DE or EN) instead of always German Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 11:02:56 +02:00
Christian Gick	92ab906a21	chore(voice): switch default voice to George (multilingual DE/EN) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 10:59:44 +02:00
Christian Gick	7696ca68ee	chore: Remove debug logging and pipecat-poc after E2EE fix confirmed working - Remove setLevel(DEBUG) for livekit.agents/plugins (added for diagnostics) - Remove periodic E2EE cryptor/participant state poll loop (no longer needed) - Remove pipecat-poc/pipeline.py (POC never deployed, LiveKit approach confirmed) E2EE bidirectional voice confirmed working in MAT-36. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 10:55:47 +02:00
Christian Gick	ac8a8a177c	chore: Trigger rebuild	2026-02-22 10:35:28 +02:00
Christian Gick	63545f032e	fix(voice): set E2EE keys immediately after connect, before rotation wait Root cause: caller track subscribed during 2s rotation wait creates a frame cryptor with no key → DEC_FAILED state → all incoming frames dropped. Setting the key after the wait doesn't recover the cryptor. Fix: set bot + caller keys immediately after lk_room.connect(), using the Matrix-provided caller identity. The post-rotation and post-find-remote key updates remain as belt+suspenders. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 10:34:20 +02:00
Christian Gick	4ab5486b5c	fix(voice): log remote participant identity and track count in E2EE poll Adds REMOTE_PARTICIPANT log every 10s to confirm caller is present and tracks are subscribed during E2EE decryption diagnosis. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 10:30:38 +02:00
Christian Gick	c4581c2917	fix(voice): reduce key rotation wait to 2s, increase E2EE poll to every 10s Phase 2 diagnostics: caller audio arrives immediately; setting the key earlier (2s vs 10s) avoids dropping initial frames. E2EE_CRYPTOR log now fires every 10s (was 30s) to confirm decryption state for incoming caller audio. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 10:27:19 +02:00
Christian Gick	5973ed1db3	fix(voice): revert to KDF_HKDF=1 with raw keys — proto value 0 is PBKDF2 not raw e2ee_patch.py shows KDF_PBKDF2=0, KDF_HKDF=1. Our KDF_NONE=0 was actually PBKDF2, double-deriving keys and causing silence. Removed Python HKDF pre-derivation — let Rust FFI apply HKDF internally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 09:26:44 +02:00
Christian Gick	6b457a2aef	fix(voice): use correct HKDF info=128zeros, length=16 matching LiveKit JS SDK LiveKit JS SDK deriveKeys(): info=new ArrayBuffer(128) (128 zero bytes, NOT identity), output=16 bytes AES-128. Previous code used identity as info and 32-byte output - both wrong, caused silence in both directions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 09:09:34 +02:00
Christian Gick	4f8bfbe479	fix(voice): pre-derive HKDF in Python, use KDF_NONE to bypass Rust FFI HKDF Rust FFI's KDF_HKDF path for incoming decryption may use wrong parameters. Pre-derive HKDF(base_key, salt="LKFrameEncryptionKey", info=identity) in Python and pass derived key with KDF_NONE so Rust FFI uses it directly as frame key. Matches EC's MatrixKeyProvider: ratchetWindowSize=10, keyringSize=256. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 08:47:41 +02:00
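
Several of the E2EE commits above (6b457a2aef, 3d05b503c6) describe the frame-key derivation as HKDF with SHA-256, salt "LKFrameEncryptionKey", 128 zero bytes as info, and a 16-byte output. The following is a minimal sketch of that derivation using only the Python standard library. The names hkdf_sha256 and derive_frame_key are hypothetical and do not come from the repository, and the parameters are the ones the commit messages state, not values checked against the LiveKit SDKs.

```python
import hashlib
import hmac

# Parameters as stated in the commit messages (assumed, not verified here).
SALT = b"LKFrameEncryptionKey"
INFO = bytes(128)   # 128 zero bytes, not the participant identity
KEY_LENGTH = 16     # 16-byte output for AES-128


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) over HMAC-SHA-256: extract step, then expand step."""
    if length > 255 * hashlib.sha256().digest_size:
        raise ValueError("requested HKDF output is too long")
    # Extract: compress the input key material into a pseudorandom key.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output bytes exist.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_frame_key(base_key: bytes) -> bytes:
    """Hypothetical helper: derive a 16-byte frame key from a base key."""
    return hkdf_sha256(base_key, SALT, INFO, KEY_LENGTH)
```

Note that commit 5973ed1db3 removed the Python pre-derivation again and lets the Rust FFI apply HKDF internally, so a helper like this is only useful for diagnostics such as the comparison described in 3d05b503c6.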


166 Commits
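
Commits fa9e95b250 and 7b7079352f describe filtering STT artifacts (asterisk-wrapped noise annotations, subtitle metadata, copyright notices, and parenthetical or bracketed annotations) before the LLM responds. A minimal sketch of such a pattern check might look like the following. The function name is_stt_artifact and the exact regular expressions are assumptions made for illustration and are not the repository's actual patterns. The wiring into on_user_turn_completed() is not shown.

```python
import re

# Illustrative patterns only; the real filter in the repository may differ.
_ARTIFACT_PATTERNS = [
    re.compile(r"^\*[^*]*\*$"),                      # whole turn wrapped in asterisks
    re.compile(r"^\([^()]*\)$"),                     # whole turn in parentheses
    re.compile(r"^\[[^\[\]]*\]$"),                   # whole turn in square brackets
    re.compile(r"^untertitel\b", re.IGNORECASE),     # subtitle metadata
    re.compile(r"^(©|copyright\b)", re.IGNORECASE),  # copyright notices
]


def is_stt_artifact(transcript: str) -> bool:
    """Return True if the transcript looks like an STT artifact, not speech."""
    text = transcript.strip()
    if not text:
        # Assumption: an empty transcript carries nothing to respond to.
        return True
    return any(pattern.search(text) for pattern in _ARTIFACT_PATTERNS)
```

The bracket and parenthesis patterns only match when the whole turn is wrapped, so a sentence that merely contains a parenthetical remark still reaches the LLM.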