Allow disabling E2EE for diagnostic purposes. When disabled, the bot
connects to LiveKit without frame encryption.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The inline E2EE options had three values that differed from the Element Call JS SDK:
- failure_tolerance=-1 (infinite, hid all DEC_FAILED) → 10
- key_ring_size=16 (too small, keys overflow) → 256
- ratchet_window_size=16 (wrong) → 10
Now uses _build_e2ee_options(), which was already correct but never called.
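A minimal sketch of keeping the three values in one builder so they cannot drift again. The body of the real _build_e2ee_options() is not shown in this log; KeyProviderParams and build_key_provider_params() are hypothetical stand-ins, not SDK types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyProviderParams:
    """Hypothetical stand-in for the SDK's key provider options."""

    failure_tolerance: int
    key_ring_size: int
    ratchet_window_size: int


def build_key_provider_params() -> KeyProviderParams:
    # Values are the corrected ones listed in this commit message.
    return KeyProviderParams(
        failure_tolerance=10,    # finite, so DEC_FAILED becomes visible
        key_ring_size=256,       # large enough that keys do not overflow
        ratchet_window_size=10,
    )
```

Every call site would then take its options from this one function instead of repeating literals inline.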
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ElevenLabs scribe_v2_realtime also produces non-asterisk artifacts like
"Untertitel: ARD Text im Auftrag von Funk (2017)" from TV/radio audio.
Add pattern matching for subtitle metadata, copyright notices, and
parenthetical/bracketed annotations.
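A rough sketch of what such pattern matching could look like. The patterns and the is_stt_artifact() name are illustrative assumptions, not the list actually shipped in the bot.

```python
import re

# Hypothetical patterns; each one must match the whole (stripped) transcript
# or its start, so ordinary sentences that merely contain brackets pass through.
_ARTIFACT_PATTERNS = [
    re.compile(r"^\*[^*]+\*$"),                                   # *Störgeräusche*
    re.compile(r"^[\(\[][^\)\]]*[\)\]]$"),                        # (Musik), [Applaus]
    re.compile(r"^(untertitel|subtitles?)\s*:", re.IGNORECASE),   # subtitle credits
    re.compile(r"^(©|copyright\s*(©|\(c\)|\d{4}))", re.IGNORECASE),  # copyright notices
]


def is_stt_artifact(text: str) -> bool:
    """Return True if the transcript looks like an annotation, not speech."""
    stripped = text.strip()
    return any(p.search(stripped) for p in _ARTIFACT_PATTERNS)
```

A transcript is dropped only when the whole utterance is an annotation or begins with a credit line; a sentence such as "Ich war 2017 (glaube ich) in Berlin" is left alone.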
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Jack Marlowe (slow/raw) with Robert Ranger (deep/natural) for
a more pleasant conversational voice assistant experience.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace broken _VoiceAgent stt_node override with _NoiseFilterAgent that uses
on_user_turn_completed() + StopResponse. This operates downstream of VAD+STT
so no backpressure risk to the audio pipeline.
When ElevenLabs scribe_v2_realtime produces *Störgeräusche* etc., the agent
now silently suppresses them before the LLM responds. The prompt-based filter
is kept as defense-in-depth.
Fixes: MAT-41
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The _count_frames coroutine created a second rtc.AudioStream on the caller's
audio track, competing with AgentSession's internal pipeline for event loop
time. Under load, this caused VAD to miss speech → user_state stuck on "away".
- Remove _count_frames AudioStream (debugging artifact)
- Add VAD state diagnostics (speaking count, away duration)
- Add VAD watchdog: warns if user_state=away >30s (MAT-40 detection)
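The watchdog idea can be sketched independently of the agent framework. AwayWatchdog is a hypothetical helper; it assumes the session reports user_state as strings such as "speaking", "listening" and "away", and that the caller feeds it every state change.

```python
import time
from typing import Callable, Optional


class AwayWatchdog:
    """Tracks how long user_state has been "away" and counts speaking events."""

    def __init__(self, threshold_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold_s = threshold_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._away_since: Optional[float] = None
        self.speaking_count = 0

    def on_state(self, new_state: str) -> None:
        if new_state == "away":
            if self._away_since is None:
                self._away_since = self._clock()
        else:
            self._away_since = None
            if new_state == "speaking":
                self.speaking_count += 1

    def away_duration(self) -> float:
        if self._away_since is None:
            return 0.0
        return self._clock() - self._away_since

    def should_warn(self) -> bool:
        # True once the user has been "away" for longer than the threshold.
        return self.away_duration() > self._threshold_s
```

A periodic task would call should_warn() and log a warning; the watchdog only observes state and never touches the audio track, so it cannot compete with the pipeline the way the removed AudioStream did.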
Fixes: MAT-40
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add LLM prompt rule to ignore *Störgeräusche* etc. annotations
instead of overriding stt_node (which broke VAD pipeline)
- Switch voice to vmVmHDKBkkCgbLVIOJRb per user preference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
VAD threshold 0.60 was too strict: user speech was consistently not detected.
Revert to the default 0.50 and keep min_speech_duration=0.2 as a noise guard.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace George (British EN) with Jack Marlowe (Gng1FdSGZlhs6jKgzAxL),
the only native German voice in the library. Fixes garbled number/date
pronunciation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The George (British) voice mangles German digit strings. Force the LLM to
write all numbers as German words so TTS pronounces them correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
VAD threshold 0.75 was too strict: the user's voice was not detected. 0.60 with
min_speech_duration=0.2 should balance noise rejection against speech detection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pcm_24000 caused silent playback through LiveKit. Revert to the
plugin's default encoding, which is known to work.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch from mp3_22050_32 (default) to lossless PCM 24kHz for cleaner
voice output. Add language=de for German text normalization.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
scribe_v2_realtime annotates background audio as *Störgeräusche* (noise),
*Fernsehgeräusche* (TV sounds) etc. Override stt_node to drop these so the LLM
only receives actual speech transcripts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The plugin requires an explicit aiohttp session; LiveKit's http_context is not
available in this job setup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Query user memories at call start and inject into agent system prompt
- Extract new facts after each exchange using claude-haiku via LiteLLM
- Add Brave Search tool (@function_tool) for current data queries
- Pass memory client and caller_user_id through VoiceSession constructor
- Pre-compute 8 HMAC-ratcheted EC keys for reliable E2EE decryption
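For illustration, pre-computing a chain of ratcheted keys might look like the sketch below. The exact ratchet function used by Element Call is not stated in this log; the HMAC-SHA256 step, the salt value and the choice to keep the base key at index 0 are all assumptions.

```python
import hashlib
import hmac
from typing import List


def ratchet_chain(base_key: bytes, count: int = 8,
                  salt: bytes = b"LKFrameEncryptionKey") -> List[bytes]:
    """Return [base_key, r1, ..., r_count], each derived from the previous key.

    ASSUMPTION: one ratchet step is HMAC-SHA256(previous_key, salt). The real
    step must match whatever the sending client does, or decryption will fail.
    """
    keys = [base_key]
    for _ in range(count):
        keys.append(hmac.new(keys[-1], salt, hashlib.sha256).digest())
    return keys
```

Each entry would then be installed under its own key index, so that a frame encrypted with a later ratchet step can be decrypted without waiting for the rotation event.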
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bot.py: track active callers per room; only stop session when last
caller leaves (fixes premature cancellation when Playwright browser
hangs up while real app is still in call)
- voice.py: pre-compute 8 HMAC-ratcheted keys from EC's base key so
decryption works immediately without waiting ~30s for Matrix to
deliver EC's key-rotation event (root cause of user→bot silence)
- voice.py: fix set_key() argument order to (identity, key, index) at all
call sites; the previous order (identity, index, key) raised TypeError
- voice.py: add audio frame monitor (AUDIO_FLOW) and mute/unmute event
handlers for diagnostics
- voice.py: update livekit-agents 1.4.2 event names: user_state_changed,
user_input_transcribed, conversation_item_added
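The per-room caller tracking from the bot.py item above can be sketched as follows. CallerTracker and its method names are hypothetical; the real bot.py is not shown here.

```python
from collections import defaultdict
from typing import Dict, Set


class CallerTracker:
    """Remembers who is in each call so the session stops only when empty."""

    def __init__(self) -> None:
        self._callers: Dict[str, Set[str]] = defaultdict(set)

    def join(self, room_id: str, user_id: str) -> None:
        self._callers[room_id].add(user_id)

    def leave(self, room_id: str, user_id: str) -> bool:
        """Return True only when the last tracked caller has left the room."""
        callers = self._callers.get(room_id)
        if not callers:
            return False  # nothing tracked for this room, nothing to stop
        callers.discard(user_id)
        if callers:
            return False  # someone else is still in the call
        del self._callers[room_id]
        return True
```

With this shape, a Playwright browser hanging up while the real app is still connected makes leave() return False, so the voice session keeps running.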
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: C++ set_key() only applies HKDF when impl_->GetKey(pid) returns a valid
handler, which requires the frame cryptor for that participant to be initialized.
Frame cryptors are created at track subscription time, not at connect time.
Calling set_key(caller_identity, key) immediately after connect() skips HKDF
derivation (impl_->GetKey returns null) → raw key stored → DEC_FAILED.
Fix: move caller key setting to on_track_subscribed where frame cryptor definitely exists.
Also update on_encryption_key to use set_key() for key rotation updates.
If VAD triggers → EC audio reaches pipeline without decryption (plaintext or format issue).
If VAD silent → E2EE encryption on EC side but key/format mismatch on our side.
Note: bot greeting will be unencrypted so EC may not hear it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Diagnostic: if the Rust FFI HKDF produces a different result from the EC JS HKDF,
set_key(caller) would always fail (DEC_FAILED). Test: pre-derive the AES key
in Python with the livekit-client-sdk-js params (SHA-256, salt=LKFrameEncryptionKey,
info=128 zero bytes, 16-byte output) and pass it to set_shared_key(), which stores
it raw (no KDF). If user→bot decryption now works, root cause = Rust HKDF mismatch.
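The pre-derivation can be reproduced with the standard library alone. This is a sketch under the parameters quoted above; hkdf_sha256() follows RFC 5869, and derive_frame_key() is a hypothetical helper name.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-SHA256")
    if not salt:
        salt = bytes(hash_len)  # RFC 5869: an empty salt means hash_len zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_frame_key(base_key: bytes) -> bytes:
    # Parameters as stated in this commit message: 128 zero bytes of info,
    # 16-byte (AES-128) output.
    return hkdf_sha256(base_key, salt=b"LKFrameEncryptionKey",
                       info=bytes(128), length=16)
```

The derived 16 bytes are what would be handed to set_shared_key() for the experiment, so that no further KDF is applied on the Rust side.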
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The confirmed-working Feb 21 commit (e3ede3f) used:
- kp.set_shared_key(caller_key) as fallback for incoming audio decryption
- failure_tolerance=10 (not -1) so DEC_FAILED state changes are visible
Per-participant kp.set_key() alone is insufficient: the patched Rust FFI
appears to fall back to shared_key for incoming track decryption.
failure_tolerance=-1 was masking the DEC_FAILED state, which made diagnosis hard.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two race conditions when bot joins first (remote=0):
1. Key arrives before participant joins LK → on_participant_connected now applies stored keys
2. Key arrives after session start → on_encryption_key now sets key for all remote_participants by LK identity
Fixes identity mismatch between Matrix device_id (from key event) and LK participant identity.
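Both orderings can be handled by one small router that remembers keys and participants. This is a sketch, not the code in voice.py: KeyRouter is hypothetical, it assumes a single caller per call (keys are not tracked per sender), and set_key stands for any callable taking (identity, key, index).

```python
from typing import Callable, Dict, Set


class KeyRouter:
    """Applies caller keys to LiveKit identities regardless of arrival order."""

    def __init__(self, set_key: Callable[[str, bytes, int], None]) -> None:
        self._set_key = set_key
        self._stored: Dict[int, bytes] = {}   # key index -> key material
        self._remote: Set[str] = set()        # LiveKit participant identities

    def on_encryption_key(self, key: bytes, index: int) -> None:
        # Case 2: key arrives after session start. Apply it to everyone already
        # present, keyed by LiveKit identity rather than Matrix device_id.
        self._stored[index] = key
        for identity in self._remote:
            self._set_key(identity, key, index)

    def on_participant_connected(self, identity: str) -> None:
        # Case 1: key arrived before the participant joined. Replay stored keys.
        self._remote.add(identity)
        for index, key in self._stored.items():
            self._set_key(identity, key, index)
```

Because the router never uses the Matrix device_id as the target identity, the mismatch between the two identity schemes does not matter.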
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sharing one session between ElevenLabs STT (WebSocket) and TTS (HTTP)
can cause connection conflicts. Use dedicated sessions for each.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add VOICE_TIMEZONE env var (default: Europe/Berlin) for local time
- Bot knows exact date/time at call start via _build_voice_prompt()
- Respond in user language (DE or EN) instead of always German
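A minimal sketch of how the prompt could pick up the local time. The prompt wording and the optional `now` parameter are assumptions; only the VOICE_TIMEZONE variable and its default come from this commit.

```python
import os
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def build_voice_prompt(now: Optional[datetime] = None) -> str:
    """Build a system prompt fragment containing the local date and time."""
    tz = ZoneInfo(os.environ.get("VOICE_TIMEZONE", "Europe/Berlin"))
    now = now or datetime.now(timezone.utc)  # `now` must be timezone-aware
    local = now.astimezone(tz)
    return (
        f"Current local date and time: {local:%Y-%m-%d %H:%M} ({tz.key}). "
        "Reply in the language the user speaks (German or English)."
    )
```

Passing `now` explicitly keeps the function deterministic in tests; in the bot it would be called once at call start.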
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove setLevel(DEBUG) for livekit.agents/plugins (added for diagnostics)
- Remove periodic E2EE cryptor/participant state poll loop (no longer needed)
- Remove pipecat-poc/pipeline.py (POC never deployed, LiveKit approach confirmed)
E2EE bidirectional voice confirmed working in MAT-36.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: caller track subscribed during 2s rotation wait creates a
frame cryptor with no key → DEC_FAILED state → all incoming frames dropped.
Setting the key after the wait doesn't recover the cryptor.
Fix: set bot + caller keys immediately after lk_room.connect(), using
the Matrix-provided caller identity. The post-rotation and post-find-remote
key updates remain as belt+suspenders.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds REMOTE_PARTICIPANT log every 10s to confirm caller is present
and tracks are subscribed during E2EE decryption diagnosis.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 2 diagnostics: caller audio arrives immediately; setting the key
earlier (2s vs 10s) avoids dropping initial frames. E2EE_CRYPTOR log
now fires every 10s (was 30s) to confirm decryption state for incoming
caller audio.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LiveKit JS SDK deriveKeys() uses info=new ArrayBuffer(128) (128 zero bytes, NOT the
identity) and a 16-byte AES-128 output. The previous code used the identity as info
and a 32-byte output; both were wrong and caused silence in both directions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rust FFI's KDF_HKDF path for incoming decryption may use wrong parameters.
Pre-derive HKDF(base_key, salt="LKFrameEncryptionKey", info=identity) in Python
and pass derived key with KDF_NONE so Rust FFI uses it directly as frame key.
Matches EC's MatrixKeyProvider: ratchetWindowSize=10, keyringSize=256.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>