The custom HKDF=1 path in the Rust FFI fork (EC-compat-changes)
produces different derived keys than the JS SDK's libwebrtc C++.
Switching to PBKDF2=0 lets libwebrtc's built-in C++ FrameCryptor
handle key derivation identically to how the JS SDK does it.
Also aligned ratchet_window_size=0 and key_ring_size=16 to match
Element Call JS SDK defaults (were 10 and 256).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Element X (26.03.3+) sends io.element.call.encryption_keys as
to-device messages, not room timeline events. Added
UnknownToDeviceEvent callback to catch these and deliver keys
to active voice sessions.
Also added m.room.encrypted decryption attempt in timeline scan
as fallback for older Element versions that send encrypted timeline
events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Element X embeds E2EE keys inside memberships[].encryption_keys,
not at the top level of the call.member state event content.
Bot was only checking content.encryption_keys, so it never found
the caller's key — causing 'Warten auf Medien' (waiting for media)
because encrypted audio couldn't be decrypted.
- Added _extract_enc_keys_from_content() helper handling both formats
- Updated on_unknown handler, VoiceSession creation, and key fetch
- Bot now publishes keys in both formats for compatibility
- Updated voice.py state fetch to check memberships[] fallback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Element Call v0.17+ embeds encryption_keys in call.member state events
instead of separate timeline events. In E2EE rooms, timeline events are
encrypted and the bot HTTP fetch cannot decrypt them, causing DEC_FAILED.
- Extract caller keys from call.member state event on join
- Embed bot key in call.member state event
- Check call.member state in key fetch (before timeline fallback)
- Handle key updates in call.member during active calls
- Update voice.py key poller to check call.member state first
- Add debug logging for UnknownEvent types in call rooms
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Route HTTP-fetched keys through on_encryption_key() for proper rotation detection
- Replace boolean refetch gate with 500ms timestamp throttle for faster recovery
- Reduce DEC_FAILED cooldown from 2s to 0.5s
- Extend proactive key poll from 3s to 10s window
- Add continuous background key poller (3s interval) during active calls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a video track is subscribed (screen share starts), Element Call
rotates the E2EE key. Instead of waiting for DEC_FAILED, proactively
poll the timeline for the new key (6x @ 500ms = 3s window).
Also reduce DEC_FAILED threshold from 3→1 and cooldown from 5s→2s
for faster recovery when the proactive poll misses the rotation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The re-fetch check was placed after the 5s cooldown return, so it never
executed. Now it triggers after 3+ DEC_FAILED regardless of cooldown.
Also relaxed stale key age filter from 60s to 300s to handle key
rotation during ongoing calls.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Element Call rotates E2EE keys by re-sending index 0 with a new value
when screen share starts. The LiveKit frame cryptor caches derived AES
keys per index, so overwriting index 0 does not force re-derivation.
Fix: detect when index 0 value changes and map to incrementing internal
index so the frame cryptor gets a fresh key slot. Sets all accumulated
keys on late arrival so cryptor can try both during transition.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Timeline key fetch now filters by sent_ts (max 60s age) to avoid
using keys from a previous call session
- After 3+ consecutive DEC_FAILED events, automatically re-fetches
key from timeline in case rotation happened
- Tracks DEC_FAILED count per participant, resets on OK
This should fix the issue where the bot picks up stale encryption keys
from previous calls and can't decrypt the current caller's audio.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Plays immediate spoken feedback so the user knows the bot is processing
their screen share / camera before the vision API responds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The on_e2ee_state callback crashed with NameError on time.monotonic()
when video tracks (screen share) arrived, preventing E2EE key re-derivation
and causing the bot to miss screen-share related questions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: aggressive video re-keying (set_key at 0.3/0.8/2/5s intervals)
briefly cleared encryption_key between SetKey and HKDF callback, causing
DEC_FAILED oscillation. Single set_key per track subscription is sufficient.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR #904 callback-based HKDF hack only fired for the first frame cryptor
(audio), leaving video frame cryptors with PBKDF2 - DEC_FAILED oscillation.
PR #921 integrates HKDF natively at the WebRTC C++ level, applying uniformly
to all frame cryptors (audio + video).
Also removes aggressive video re-keying workaround and adds 5s cooldown
to DEC_FAILED re-keying handler to prevent tight loops.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Video frame cryptors may not be fully initialized when set_key() is
first called during on_track_subscribed. Audio works immediately but
video oscillates OK↔DEC_FAILED with the same key.
Add staggered re-keying at 0.3s, 0.8s, 2s, 5s after video track
subscription to ensure the key is applied after the frame cryptor
is fully ready.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
KDF_PBKDF2=0 does NOT mean raw mode — libwebrtc applies its built-in
PBKDF2 on top of pre-derived keys, causing DEC_FAILED for audio too.
Revert to KDF_HKDF=1 (Rust applies HKDF, we pass raw base keys).
Keep diagnostic improvements:
- _derive_and_set_key() wrapper with logging
- Per-track type logging (audio vs video) in on_track_subscribed
- Frame size check in look_at_screen (detect E2EE failure)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch from Rust-side HKDF (KDF_HKDF=1) to Python-side HKDF derivation
with raw key mode (KDF_RAW=0). This eliminates potential HKDF implementation
mismatches between Rust FFI and Element Call JS that caused video frame
decryption failures (audio worked, video showed 8x8 garbage frames).
Changes:
- Add _derive_and_set_key() helper that pre-derives HKDF then calls set_key()
- Set key_derivation_function=KDF_RAW (proto 0 = no Rust-side derivation)
- Replace all direct set_key() calls with _derive_and_set_key()
- Add per-track diagnostic logging (audio vs video)
- Add frame size check in look_at_screen (detect E2EE failure early)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Video tracks (camera + screen share) were never getting E2EE keys set
via set_key() because the condition on track_subscribed only matched
audio tracks (kind==1). This caused DEC_FAILED for all video frames,
making look_at_screen return encrypted garbage or fail entirely.
Also added track source logging to distinguish camera vs screen share.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Raise VAD thresholds (activation 0.65→0.75, min speech 0.4→0.6s,
min silence 0.55→0.65s) to reduce false triggers from background noise
- Add "focus on latest message" instruction to all prompts (voice + text)
- Add "greet and wait" behavior for new conversations instead of auto-continuing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Video track kind is 2 (not 0) in LiveKit Python SDK — camera was never captured
- Replace broken confluence_collab.create_page import with direct REST API call
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When browse_url fails with DNS resolution error (common with STT-misrecognized
domain names like "klicksports" instead of "clicksports"), automatically try a
web search to find the correct domain and retry. Applied to both text and voice bot.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both bots can now fetch and read web pages via browse_url tool.
Uses httpx + BeautifulSoup to extract clean text from HTML.
Complements existing web_search (Brave) with full page reading.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MAT-58: Add recent_confluence_pages tool to both voice and text chat.
Shows last 5 recently modified pages so users can pick directly
instead of having to search every time.
MAT-59: Integrate sentry-sdk in all three entry points (agent.py,
bot.py, voice.py). SENTRY_DSN env var, traces at 10% sample rate.
Requires creating project in Sentry UI and setting DSN.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add create_confluence_page tool to voice mode (basic auth)
- Add confluence_update_page and confluence_create_page tools to text chat (OAuth)
- Fix update tool: wrap each paragraph in <p> tags instead of single wrapper
- Update system prompt to mention create capability
Previously only search/read were available. User reported bot couldn't
write to or create Confluence pages — because the tools didn't exist.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes for the bot going silent after ~10 messages:
1. STT artifact handler now returns early — previously detected noise
leaks ("Vielen Dank.", etc.) but still appended them to transcript,
inflating context until LLM timed out after 4 retries.
2. Context truncation — caps LLM chat context at 40 items and internal
transcript at 80 entries to prevent unbounded growth in long sessions.
3. LLM timeout recovery — watchdog detects when agent has been silent
for >60s despite user activity, sends a recovery reply asking user
to repeat their question instead of staying permanently silent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Voice bot could read/update Confluence pages but could not search.
Users asking to search Confluence got a refusal. Now the voice bot
has search_confluence using CQL queries via the service account.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace inline regex section parser in voice.py with confluence_collab
library (BS4 parsing, 409 conflict retry). Bot now loads section outline
into LLM context when Confluence links are detected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Voice bot can now see the users camera or screen share when asked.
Captures a single frame, encodes as JPEG, sends to Sonnet vision
with full context (transcript + document). Triggered by phrases like
schau mal, siehst du das, can you see this.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
"bist du dir sicher" / "are you sure" / "stimmt das wirklich" now also
trigger Opus escalation for fact-checking the previous answer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sonnet can now escalate complex questions to Opus via a function tool,
same pattern as search_web and read_confluence_page. Full context
(transcript + document) is passed automatically. Triggered by user
phrases like "denk genauer nach" / "think harder" or when Sonnet is
unsure about complex analysis.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a voice call ends and a document was loaded in the room, the bot
now analyzes the transcript for document-specific changes/corrections
and posts them as a structured "Dokument-Aenderungen" message. Returns
nothing if no document changes were discussed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Confluence tools default to active page from room context — no more
asking user for page_id
- Prompt allows roleplay/mock interviews when document context present
- Explicit instruction not to ask for page_id
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable realtime Confluence page editing during Element Call voice sessions.
- Add read_confluence_page and update_confluence_page function tools
- Detect Confluence URLs shared in Matrix rooms, store page ID for voice context
- Section-level updates via heading match + version-incremented PUT
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pass PDF document context from room to voice session so the voice LLM
can answer questions about uploaded PDFs. Persist call transcripts and
post an LLM-generated summary to the room when the call ends.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Store user timezone as [PREF:timezone] in memory service
- Query timezone preference on session start, override default
- Add set_user_timezone tool so bot learns timezone from conversation
- On time-relevant questions, bot asks if user is still at stored location
- Seeded Europe/Nicosia for @christian.gick:agiliton.eu
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bot now knows the user's timezone (Europe/Berlin default) and which
LLM model it's running on, so it can answer questions about both.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removing the blocking wait entirely caused DEC_FAILED - the rotated key
had not arrived via nio sync before the pipeline started. Restore a short
3s wait (down from 10s) which is enough for nio to deliver the rotated key.
Also fix on_mute/on_unmute arg order (participant, publication - not reversed).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace _extract_voice_memories with _store_voice_exchange
- Store raw "User: ... / Assistant: ..." pairs directly
- No LLM call needed — faster, cheaper, no lost context
- Load as "Frühere Gespräche" with full thread context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
flash_v2_5 had audible compression artifacts. multilingual_v2 has higher
fidelity while speed=1.15 via VoiceSettings still gives snappier delivery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Model: eleven_multilingual_v2 → eleven_flash_v2_5 (lower latency)
- Speed: 1.15x via VoiceSettings
- Stability/similarity tuned for natural German speech
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EC rotates encryption key when bot joins LiveKit room. The rotated
key arrives via Matrix sync 3-5s later. Previous 2s wait was too
short - DEC_FAILED before new key arrived.
Extended wait to 10s. Added logging to bot.py to trace why late
key events were not being processed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The HKDF sed patch in Dockerfile was wrong — it swapped salt/info
based on incorrect analysis of minified JS. The original Rust FFI
parameters are correct: salt="LKFrameEncryptionKey", info=[0;128].
Also removed Python-side HMAC pre-ratcheting of keys. Element Call
uses explicit key rotation via Matrix events, not HMAC ratcheting.
Added diagnostic logging to trace exact key bytes during E2EE setup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows disabling E2EE for diagnostic purposes. When disabled, bot
connects to LiveKit without frame encryption.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>