Commit Graph

119 Commits

Author SHA1 Message Date
Christian Gick
06b588f313 fix(voice): add noise annotation filter to prompt + switch voice
- Add LLM prompt rule to ignore *Störgeräusche* etc. annotations
  instead of overriding stt_node (which broke VAD pipeline)
- Switch voice to vmVmHDKBkkCgbLVIOJRb per user preference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:49:31 +02:00
Christian Gick
e926908af7 test: revert to base Agent to check if stt_node override breaks VAD
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 18:45:56 +02:00
Christian Gick
fb09808a8c fix(vad): lower activation threshold 0.60→0.50
Threshold 0.60 too strict, user speech consistently not detected.
Back to default 0.50 with min_speech_duration=0.2 as noise guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 18:42:21 +02:00
Christian Gick
8f80e7d543 fix(tts): switch to Jack Marlowe - native German voice
Replace George (British EN) with Jack Marlowe (Gng1FdSGZlhs6jKgzAxL),
the only native German voice in the library. Fixes garbled number/date
pronunciation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 18:37:05 +02:00
Christian Gick
125b0f5d2e fix(tts): spell out numbers in words for German TTS
George (British) voice mangles German digit strings. Force LLM to
write all numbers as German words so TTS pronounces them correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 18:35:52 +02:00
Christian Gick
1b08683c17 fix(vad): lower activation threshold 0.75→0.60
0.75 too strict, user voice not detected. 0.60 with min_speech_duration=0.2
should balance noise rejection vs speech detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 18:15:06 +02:00
Christian Gick
8445c9325c revert(tts): remove pcm_24000 encoding, keep language=de
pcm_24000 caused silent playback through livekit. Reverting to
plugin default encoding which is known working.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 18:12:35 +02:00
Christian Gick
e090c60c19 feat(tts): upgrade to pcm_24000 encoding + language=de
Switch from mp3_22050_32 (default) to lossless PCM 24kHz for cleaner
voice output. Add language=de for German text normalization.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 18:08:23 +02:00
Christian Gick
1e1911995f fix(stt): filter ElevenLabs noise annotations before LLM
scribe_v2_realtime annotates background audio as *Störgeräusche*,
*Fernsehgeräusche* etc. Override stt_node to drop these so the LLM
only receives actual speech transcripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 17:59:17 +02:00
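
The annotation filter described in 1e1911995f can be sketched as a plain text pass. This is a minimal illustration only, assuming annotations are always wrapped in single asterisks as in the examples above; `strip_noise_annotations` and `is_speech` are hypothetical helper names, not the project's actual stt_node override.

```python
import re

# Assumption: noise annotations look like *Störgeräusche*, i.e. a short
# asterisk-delimited span with no asterisks or line breaks inside.
_ANNOTATION = re.compile(r"\*[^*\n]+\*")


def strip_noise_annotations(transcript: str) -> str:
    """Return the transcript with asterisk-wrapped annotations removed."""
    cleaned = _ANNOTATION.sub(" ", transcript)
    # Collapse the whitespace left behind by removed spans.
    return " ".join(cleaned.split())


def is_speech(transcript: str) -> bool:
    """True if anything other than annotations remains."""
    return bool(strip_noise_annotations(transcript))
```

A transcript for which `is_speech` returns False would simply not be forwarded to the LLM. Text that legitimately contains paired asterisks would also be stripped, which is one reason a prompt-level rule (as in 06b588f313) may be preferable.
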
Christian Gick
02a7c91eaf fix(vad): raise activation threshold to reduce noise triggers
activation_threshold 0.5→0.75, min_speech_duration 0.05→0.2s
Prevents ambient noise from triggering STT and producing
'Schlechte Qualität' transcripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 17:52:38 +02:00
Christian Gick
39ef4e0054 fix(stt): pass http_session to ElevenLabs STT plugin
Plugin requires explicit aiohttp session; livekit http_context not available
in this job setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 17:45:42 +02:00
Christian Gick
2dce8419d4 fix(stt): set scribe_v2_realtime model with language_code for streaming STT
- Add model_id="scribe_v2_realtime" (already set) + language_code from STT_LANGUAGE env (default "de")
- Remove _stt_session from cleanup loop (plugin uses livekit http_context)
- Remove _stt_session stub from __init__

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 17:26:51 +02:00
Christian Gick
4012950197 fix: Use scribe_v2_realtime model for ElevenLabs STT (streaming mode)
scribe_v1 (REST) sets streaming=False, incompatible with livekit-agents 1.4 AgentSession.
scribe_v2_realtime uses WebSocket streaming (confirmed working with Starter plan).
Removes separate _stt_session aiohttp client.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 17:24:16 +02:00
Christian Gick
52f8cb569c feat(voice): add cross-call memory and Brave Search tool
- Query user memories at call start and inject into agent system prompt
- Extract new facts after each exchange using claude-haiku via LiteLLM
- Add Brave Search tool (@function_tool) for current data queries
- Pass memory client and caller_user_id through VoiceSession constructor
- Pre-compute 8 HMAC-ratcheted EC keys for reliable E2EE decryption

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 15:27:59 +02:00
Christian Gick
2b8744de6e fix(voice): full E2EE bidirectional audio pipeline working
- bot.py: track active callers per room; only stop session when last
  caller leaves (fixes premature cancellation when Playwright browser
  hangs up while real app is still in call)

- voice.py: pre-compute 8 HMAC-ratcheted keys from EC's base key so
  decryption works immediately without waiting ~30s for Matrix to
  deliver EC's key-rotation event (root cause of user→bot silence)

- voice.py: fix set_key() argument order (identity, key, index) at all
  call sites — was (identity, index, key) causing TypeError

- voice.py: add audio frame monitor (AUDIO_FLOW) and mute/unmute event
  handlers for diagnostics

- voice.py: update livekit-agents 1.4.2 event names: user_state_changed,
  user_input_transcribed, conversation_item_added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 15:17:35 +02:00
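
The "only stop the session when the last caller leaves" rule from the bot.py change in 2b8744de6e can be illustrated with a small per-room set. This is a sketch under assumptions: `CallerTracker`, its method names, and the room and caller identifiers are all hypothetical and do not come from bot.py.

```python
from collections import defaultdict


class CallerTracker:
    """Tracks which callers are still present in each room (hypothetical helper)."""

    def __init__(self) -> None:
        self._callers: dict[str, set[str]] = defaultdict(set)

    def joined(self, room_id: str, caller_id: str) -> None:
        self._callers[room_id].add(caller_id)

    def left(self, room_id: str, caller_id: str) -> bool:
        """Record a hangup; return True only when the room is now empty."""
        callers = self._callers[room_id]
        callers.discard(caller_id)
        if callers:
            return False
        # Last caller gone: forget the room so the session can be stopped.
        del self._callers[room_id]
        return True
```

With two callers in a room (for example a test browser and the real app), the first hangup returns False and the session keeps running; only the second hangup returns True.
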
Christian Gick
c379064f80 fix(voice): set caller key in on_track_subscribed — frame cryptor must exist for HKDF to apply
Root cause: C++ set_key() only applies HKDF when impl_->GetKey(pid) returns a valid
handler, which requires the frame cryptor for that participant to be initialized.
Frame cryptors are created at track subscription time, not at connect time.

Calling set_key(caller_identity, key) immediately after connect() skips HKDF
derivation (impl_->GetKey returns null) → raw key stored → DEC_FAILED.

Fix: move caller key setting to on_track_subscribed where frame cryptor definitely exists.
Also update on_encryption_key to use set_key() for key rotation updates.
2026-02-22 14:05:54 +02:00
Christian Gick
190b35945c fix(voice): guard e2ee_manager access when E2EE disabled (diagnostic mode)
2026-02-22 13:46:51 +02:00
Christian Gick
c188a2daf6 test(voice): disable E2EE entirely — check if EC sends plaintext vs encrypted
If VAD triggers → EC audio reaches pipeline without decryption (plaintext or format issue).
If VAD silent → E2EE encryption on EC side but key/format mismatch on our side.
Note: bot greeting will be unencrypted so EC may not hear it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 13:34:26 +02:00
Christian Gick
3d05b503c6 test(voice): pre-derive HKDF in Python, use set_shared_key to bypass Rust FFI HKDF
Diagnostic: if Rust FFI HKDF produces different result than EC JS HKDF,
set_key(caller) would always fail (DEC_FAILED). Test: pre-derive AES key
in Python matching livekit-client-sdk-js params (SHA-256, salt=LKFrameEncryptionKey,
info=128-zeros, 16-byte output), pass to set_shared_key() which stores raw (no KDF).
If user→bot decryption now works, root cause = Rust HKDF mismatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 12:24:57 +02:00
Christian Gick
7adeebfe05 fix(voice): restore set_shared_key fallback + failure_tolerance=10 from working commit e3ede3f
The confirmed-working Feb 21 commit (e3ede3f) used:
- kp.set_shared_key(caller_key) as fallback for incoming audio decryption
- failure_tolerance=10 (not -1) so DEC_FAILED state changes are visible

Per-participant kp.set_key() alone is insufficient — the patched Rust FFI
appears to fall back to shared_key for incoming track decryption.
failure_tolerance=-1 was masking the DEC_FAILED state making diagnosis hard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:46:32 +02:00
Christian Gick
2a799f5760 fix(voice): set caller E2EE key on participant_connected + for all remote LK identities
Two race conditions when bot joins first (remote=0):
1. Key arrives before participant joins LK → on_participant_connected now applies stored keys
2. Key arrives after session start → on_encryption_key now sets key for all remote_participants by LK identity

Fixes identity mismatch between Matrix device_id (from key event) and LK participant identity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:30:23 +02:00
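
The two race conditions in 2a799f5760 both come down to "key and participant can arrive in either order". A minimal sketch of a buffer that handles both orders follows. `CallerKeyBuffer` is a hypothetical name, and the injected `set_key` callable is assumed to take `(identity, key, index)` in the argument order noted in 2b8744de6e; it stands in for the real key provider.

```python
from typing import Callable

# Assumed signature: (identity, key, index), per the order noted in 2b8744de6e.
SetKey = Callable[[str, bytes, int], None]


class CallerKeyBuffer:
    """Applies caller keys regardless of whether key or participant arrives first."""

    def __init__(self, set_key: SetKey) -> None:
        self._set_key = set_key
        self._keys: dict[int, bytes] = {}     # key index -> key material
        self._participants: set[str] = set()  # LiveKit identities seen so far

    def on_encryption_key(self, index: int, key: bytes) -> None:
        """Key event arrived: store it, then apply to everyone already connected."""
        self._keys[index] = key
        for identity in sorted(self._participants):
            self._set_key(identity, key, index)

    def on_participant_connected(self, identity: str) -> None:
        """Participant joined: replay every key received before the join."""
        self._participants.add(identity)
        for index, key in sorted(self._keys.items()):
            self._set_key(identity, key, index)
```

Keying the participant set by LiveKit identity rather than by the Matrix device_id from the key event mirrors the identity-mismatch fix described above.
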
Christian Gick
5d31886192 debug(voice): add VAD start/stop events to trace where audio pipeline breaks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:18:51 +02:00
Christian Gick
f74a11fde8 fix(voice): separate aiohttp sessions for STT and TTS
Sharing one session between ElevenLabs STT (WebSocket) and TTS (HTTP)
can cause connection conflicts. Use dedicated sessions for each.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:15:46 +02:00
Christian Gick
e3c1ded328 feat(voice): inject datetime into prompt, respond in DE/EN
- Add VOICE_TIMEZONE env var (default: Europe/Berlin) for local time
- Bot knows exact date/time at call start via _build_voice_prompt()
- Respond in user language (DE or EN) instead of always German

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:02:56 +02:00
Christian Gick
92ab906a21 chore(voice): switch default voice to George (multilingual DE/EN)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 10:59:44 +02:00
Christian Gick
7696ca68ee chore: Remove debug logging and pipecat-poc after E2EE fix confirmed working
- Remove setLevel(DEBUG) for livekit.agents/plugins (added for diagnostics)
- Remove periodic E2EE cryptor/participant state poll loop (no longer needed)
- Remove pipecat-poc/pipeline.py (POC never deployed, LiveKit approach confirmed)

E2EE bidirectional voice confirmed working in MAT-36.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 10:55:47 +02:00
Christian Gick
63545f032e fix(voice): set E2EE keys immediately after connect, before rotation wait
Root cause: caller track subscribed during 2s rotation wait creates a
frame cryptor with no key → DEC_FAILED state → all incoming frames dropped.
Setting the key after the wait doesn't recover the cryptor.

Fix: set bot + caller keys immediately after lk_room.connect(), using
the Matrix-provided caller identity. The post-rotation and post-find-remote
key updates remain as belt+suspenders.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 10:34:20 +02:00
Christian Gick
4ab5486b5c fix(voice): log remote participant identity and track count in E2EE poll
Adds REMOTE_PARTICIPANT log every 10s to confirm caller is present
and tracks are subscribed during E2EE decryption diagnosis.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 10:30:38 +02:00
Christian Gick
c4581c2917 fix(voice): reduce key rotation wait to 2s, increase E2EE poll to every 10s
Phase 2 diagnostics: caller audio arrives immediately; setting the key
earlier (2s vs 10s) avoids dropping initial frames. E2EE_CRYPTOR log
now fires every 10s (was 30s) to confirm decryption state for incoming
caller audio.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 10:27:19 +02:00
Christian Gick
5973ed1db3 fix(voice): revert to KDF_HKDF=1 with raw keys — proto value 0 is PBKDF2 not raw
e2ee_patch.py shows KDF_PBKDF2=0, KDF_HKDF=1.
Our KDF_NONE=0 was actually PBKDF2, double-deriving keys and causing silence.
Removed Python HKDF pre-derivation — let Rust FFI apply HKDF internally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 09:26:44 +02:00
Christian Gick
6b457a2aef fix(voice): use correct HKDF info=128zeros, length=16 matching LiveKit JS SDK
LiveKit JS SDK deriveKeys(): info=new ArrayBuffer(128) (128 zero bytes, NOT identity), output=16 bytes AES-128.
Previous code used identity as info and 32-byte output - both wrong, caused silence in both directions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 09:09:34 +02:00
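
For reference, the derivation parameters named in 6b457a2aef (SHA-256, salt "LKFrameEncryptionKey", info of 128 zero bytes, 16-byte output) can be reproduced with the standard library alone. This is a sketch of generic HKDF (RFC 5869) with those parameters plugged in, not the project's code; the function names are hypothetical.

```python
import hashlib
import hmac

SALT = b"LKFrameEncryptionKey"
INFO = bytes(128)   # 128 zero bytes, not the participant identity
KEY_LENGTH = 16     # AES-128


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if length > 255 * hashlib.sha256().digest_size:
        raise ValueError("requested output too long")
    # Extract: condense the input key material into a pseudorandom key.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output bytes exist.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_frame_key(base_key: bytes) -> bytes:
    """Derive a 16-byte frame key from a base key using the parameters above."""
    return hkdf_sha256(base_key, SALT, INFO, KEY_LENGTH)
```

Using the identity as `info` or asking for 32 bytes yields a different key, which is consistent with the silence described in the commit message. Note that 5973ed1db3 later removed the Python pre-derivation in favour of letting the Rust FFI apply HKDF internally.
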
Christian Gick
4f8bfbe479 fix(voice): pre-derive HKDF in Python, use KDF_NONE to bypass Rust FFI HKDF
Rust FFI's KDF_HKDF path for incoming decryption may use wrong parameters.
Pre-derive HKDF(base_key, salt="LKFrameEncryptionKey", info=identity) in Python
and pass derived key with KDF_NONE so Rust FFI uses it directly as frame key.

Matches EC's MatrixKeyProvider: ratchetWindowSize=10, keyringSize=256.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 08:47:41 +02:00
Christian Gick
c330900a3a fix(voice): wait for key rotation via nio sync, not HTTP fetch
io.element.call.encryption_keys events are Megolm-encrypted in this room
(appear as m.room.encrypted). The HTTP fetch cannot decrypt them — only
the nio sync client can via Olm/Megolm decryption.

Change the post-connect rotation poll to check self._caller_all_keys
directly (updated by on_encryption_key() via nio sync) instead of calling
_fetch_encryption_key_http() which always returns nothing in encrypted rooms.

Also extends wait to 10s and adds progress logging every 2s.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 08:23:17 +02:00
Christian Gick
cf519595d6 fix(voice): poll for EC key rotation post-connect, set all key indices
Element Call rotates its encryption key when a new participant joins the
LiveKit room. Previously the bot fetched only the pre-join key and set it
at index 0, while EC was already encrypting with the rotated key (index 1).

Changes:
- After connecting to LiveKit, poll the Matrix timeline up to 5s (10×0.5s)
  to detect the post-join key rotation
- Set ALL known caller key indices (not just 0) so the Rust FFI cryptor
  has the correct key regardless of which index EC is currently using
- Also set via caller_identity (belt+suspenders) if different from LK identity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 08:20:44 +02:00
Christian Gick
8b143a2ac4 debug(e2ee): poll frame_cryptors() every 30s for state diagnosis
2026-02-22 08:14:19 +02:00
Christian Gick
630a0de970 fix(e2ee): revert to per-participant mode with proper rotation handling
The shared-key mode uses HKDF with empty info, but Element Call JS uses
participant identity as HKDF info. Per-participant mode (set_key with
identity) matches EC's derivation.

Previous per-participant attempt (b65d043) failed because key rotation
(index 0→1 when bot joins) wasn't handled. Now on_encryption_key calls
set_key(caller_id, key, index) on rotation, so the bot stays in sync.

Changes:
- _build_e2ee_options(): remove caller_key param, shared_key=b"" (per-participant mode)
- _run(): set_key(remote_identity, caller_key, 0) for incoming decryption
- on_encryption_key: only set_key() on rotation (no set_shared_key)
2026-02-22 08:10:27 +02:00
Christian Gick
295c0ed5cb debug(e2ee): decode encryption state to human-readable names
2026-02-22 08:00:28 +02:00
Christian Gick
a6236a3817 debug(e2ee): update both shared+per-participant keys on rotation
2026-02-22 07:53:40 +02:00
Christian Gick
b22c4d48e9 debug(e2ee): add e2ee_state_changed event listener for diagnostics
Log DECRYPTION_FAILED / MISSING_KEY / OK states per participant
to pinpoint exactly what the Rust FFI reports about key setup.
2026-02-22 07:50:12 +02:00
Christian Gick
a8b30418c8 debug(e2ee): verify shared key + belt-suspenders per-participant key
Add export_shared_key() verification after connect to confirm key
is stored. Also set per-participant key for caller (belt+suspenders)
so both shared-key and per-participant decryption paths are active.
2026-02-22 07:47:09 +02:00
Christian Gick
65340bf0ee fix(e2ee): use set_shared_key for live key rotation updates
When Element Call sees the bot join, it rotates its encryption key
(index 0 → 1). The on_encryption_key callback was calling set_key()
(per-participant) which has no effect in shared-key mode. Switch to
set_shared_key() so the shared-key decryption path stays current when
the caller rotates keys.
2026-02-22 07:38:06 +02:00
Christian Gick
9cf4afc928 fix(e2ee): pass caller_key as shared_key at connect time
Per-participant set_key() for remote identities doesn't work for
incoming decryption in this Rust FFI build (set_shared_key() after
connect is also ignored in per-participant mode).

Solution: initialize with caller_key as shared_key (true shared-key
mode) so the Rust FFI uses it for incoming decryption. Then override
outgoing encryption via set_key(bot_identity, bot_key) after connect.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 07:31:14 +02:00
Christian Gick
4875a7dc9b fix(e2ee): add set_shared_key fallback for incoming audio decryption
Rust FFI may not use per-participant key for remote participant
decryption in all code paths. Set the caller key as both per-participant
AND shared key so either path works for incoming frame decryption.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 07:25:26 +02:00
Christian Gick
893e07a543 fix(e2ee): set caller keys at correct indices from timeline
Element Call may rotate encryption keys to index > 0. Previously we
always called set_key(identity, key, 0) regardless of the actual index,
causing decryption to fail when the active key was at a non-zero index.

- _fetch_encryption_key_http: collect all {index->key} pairs from event
- _run: set each caller key at its correct index
- on_encryption_key: handle multiple indices, remove first-key-only gate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 07:19:28 +02:00
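
Collecting every {index -> key} pair from a key event, as 893e07a543 describes, might look like the sketch below. The event layout is an assumption: a `keys` list whose entries carry an integer `index` and a base64 `key` string, possibly unpadded. `collect_indexed_keys` is a hypothetical name and is not `_fetch_encryption_key_http` itself.

```python
import base64


def collect_indexed_keys(content: dict) -> dict[int, bytes]:
    """Map each key index in an event's content to its decoded key bytes.

    Assumed content shape: {"keys": [{"index": 0, "key": "<base64>"}, ...]}
    """
    keys: dict[int, bytes] = {}
    for entry in content.get("keys", []):
        raw = entry["key"]
        # Restore any stripped padding so the decoder accepts the string.
        padded = raw + "=" * (-len(raw) % 4)
        keys[int(entry["index"])] = base64.urlsafe_b64decode(padded)
    return keys
```

Each resulting pair would then be applied at its own index instead of always at index 0.
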
Christian Gick
685218247a fix: Use empty bytes instead of None for shared_key
NoneType causes TypeError in patched room.py proto assignment.
Empty bytes is falsy so shared_key is not set in proto,
initializing key provider in per-participant mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 07:02:33 +02:00
Christian Gick
9ebf90c8bb fix: Use per-participant E2EE mode (no shared_key)
shared_key locks provider in shared-key mode, making set_key()
ineffective for per-participant decryption. Remove shared_key so
SDK initializes in per-participant mode. Also: failure_tolerance=-1
to prevent premature track closure on decrypt failures,
ratchet_window_size=16 to match Element Call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 07:00:22 +02:00
Christian Gick
c290332a1e fix: Disable close_on_disconnect to keep session alive
E2EE key setup may briefly appear as participant disconnect.
Keep session alive to allow audio to flow once keys are settled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 06:49:44 +02:00
Christian Gick
b65d04389b fix: Switch E2EE to per-participant keys instead of shared key
Element Call uses per-participant keys, not shared key mode.
Bot now generates its own key, publishes it, and sets both
keys via key_provider.set_key() after connecting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 06:41:20 +02:00
Christian Gick
ced2783a09 fix: Enable E2EE with caller's key as shared key
Element Call now rejects unencrypted audio. Use caller's key
as shared_key so both sides encrypt/decrypt with the same key.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 20:51:43 +02:00
Christian Gick
4a93827de3 revert: Restore voice.py and bot.py to last known working state (9aef846)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 20:47:51 +02:00