Video tracks (camera + screen share) were never getting E2EE keys set
via set_key() because the condition on track_subscribed only matched
audio tracks (kind==1). This caused DEC_FAILED for all video frames,
making look_at_screen return encrypted garbage or fail entirely.
Also added track source logging to distinguish camera vs screen share.
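A minimal sketch of the corrected condition, assuming video tracks report kind 2 as noted elsewhere in this log. The function names are hypothetical and are not taken from the bot's code.

```python
# Track kind values as reported in these commit messages for the
# LiveKit Python SDK: audio is 1, video is 2.
KIND_AUDIO = 1
KIND_VIDEO = 2


def needs_e2ee_key_old(track_kind: int) -> bool:
    """The faulty condition: only audio tracks ever received a key."""
    return track_kind == KIND_AUDIO


def needs_e2ee_key(track_kind: int) -> bool:
    """Corrected condition: audio and video tracks both receive a key."""
    return track_kind in (KIND_AUDIO, KIND_VIDEO)
```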
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Raise VAD thresholds (activation 0.65→0.75, min speech 0.4→0.6s,
min silence 0.55→0.65s) to reduce false triggers from background noise
- Add "focus on latest message" instruction to all prompts (voice + text)
- Add "greet and wait" behavior for new conversations instead of auto-continuing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Video track kind is 2 (not 0) in LiveKit Python SDK — camera was never captured
- Replace broken confluence_collab.create_page import with direct REST API call
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When browse_url fails with DNS resolution error (common with STT-misrecognized
domain names like "klicksports" instead of "clicksports"), automatically try a
web search to find the correct domain and retry. Applied to both text and voice bot.
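The retry flow can be sketched roughly as below. DnsResolutionError, fetch and search_domain are hypothetical stand-ins for the real browse and search helpers, which are not shown in this message.

```python
from typing import Callable, Optional


class DnsResolutionError(Exception):
    """Hypothetical error type standing in for a failed DNS lookup."""


def browse_with_fallback(
    url: str,
    fetch: Callable[[str], str],
    search_domain: Callable[[str], Optional[str]],
) -> str:
    """Fetch url; on a DNS failure, look up a corrected URL and retry once."""
    try:
        return fetch(url)
    except DnsResolutionError:
        corrected = search_domain(url)
        # Nothing better found: surface the original DNS error.
        if corrected is None or corrected == url:
            raise
        return fetch(corrected)
```

The retry happens only once, so a wrong search result cannot cause a loop.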
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both bots can now fetch and read web pages via browse_url tool.
Uses httpx + BeautifulSoup to extract clean text from HTML.
Complements existing web_search (Brave) with full page reading.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MAT-58: Add recent_confluence_pages tool to both voice and text chat.
Shows last 5 recently modified pages so users can pick directly
instead of having to search every time.
MAT-59: Integrate sentry-sdk in all three entry points (agent.py,
bot.py, voice.py). SENTRY_DSN env var, traces at 10% sample rate.
Requires creating project in Sentry UI and setting DSN.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add create_confluence_page tool to voice mode (basic auth)
- Add confluence_update_page and confluence_create_page tools to text chat (OAuth)
- Fix update tool: wrap each paragraph in <p> tags instead of single wrapper
- Update system prompt to mention create capability
Previously only search/read were available. User reported bot couldn't
write to or create Confluence pages — because the tools didn't exist.
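A rough sketch of the per-paragraph wrapping, assuming paragraphs are separated by blank lines and that the text should be HTML-escaped. paragraphs_to_storage is a hypothetical name, not the tool's actual helper.

```python
from html import escape


def paragraphs_to_storage(text: str) -> str:
    """Wrap each blank-line-separated paragraph in its own <p> element."""
    paragraphs = [p.strip() for p in text.split("\n\n")]
    # Skip empty chunks so extra blank lines do not produce empty <p></p>.
    return "".join(f"<p>{escape(p)}</p>" for p in paragraphs if p)
```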
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes for the bot going silent after ~10 messages:
1. STT artifact handler now returns early. Previously it detected noise
leaks ("Vielen Dank.", etc.) but still appended them to the transcript,
inflating context until the LLM timed out after 4 retries.
2. Context truncation — caps LLM chat context at 40 items and internal
transcript at 80 entries to prevent unbounded growth in long sessions.
3. LLM timeout recovery — watchdog detects when agent has been silent
for >60s despite user activity, sends a recovery reply asking user
to repeat their question instead of staying permanently silent.
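The truncation in fix 2 amounts to keeping only the newest entries. A minimal sketch, with a hypothetical helper name and the caps taken from this message:

```python
MAX_CHAT_ITEMS = 40          # cap for the LLM chat context
MAX_TRANSCRIPT_ENTRIES = 80  # cap for the internal transcript


def keep_latest(items: list, limit: int) -> list:
    """Return at most `limit` of the most recent items, oldest first."""
    if limit <= 0:
        return []
    return items[-limit:]
```

Usage would be along the lines of `transcript = keep_latest(transcript, MAX_TRANSCRIPT_ENTRIES)` after each append.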
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Voice bot could read/update Confluence pages but could not search.
Users asking to search Confluence got a refusal. Now the voice bot
has search_confluence using CQL queries via the service account.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace inline regex section parser in voice.py with confluence_collab
library (BS4 parsing, 409 conflict retry). Bot now loads section outline
into LLM context when Confluence links are detected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Voice bot can now see the user's camera or screen share when asked.
Captures a single frame, encodes it as JPEG, and sends it to Sonnet vision
with full context (transcript + document). Triggered by phrases like
"schau mal", "siehst du das", "can you see this".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
"bist du dir sicher" / "are you sure" / "stimmt das wirklich" now also
trigger Opus escalation for fact-checking the previous answer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sonnet can now escalate complex questions to Opus via a function tool,
same pattern as search_web and read_confluence_page. Full context
(transcript + document) is passed automatically. Triggered by user
phrases like "denk genauer nach" / "think harder" or when Sonnet is
unsure about complex analysis.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a voice call ends and a document was loaded in the room, the bot
now analyzes the transcript for document-specific changes/corrections
and posts them as a structured "Dokument-Aenderungen" message. Returns
nothing if no document changes were discussed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Confluence tools default to active page from room context — no more
asking user for page_id
- Prompt allows roleplay/mock interviews when document context present
- Explicit instruction not to ask for page_id
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable realtime Confluence page editing during Element Call voice sessions.
- Add read_confluence_page and update_confluence_page function tools
- Detect Confluence URLs shared in Matrix rooms, store page ID for voice context
- Section-level updates via heading match + version-incremented PUT
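A sketch of the version-incremented PUT body, assuming the standard Confluence REST content shape and a page dict fetched with its version expanded. build_update_payload is a hypothetical helper; the heading match itself is not shown.

```python
def build_update_payload(page: dict, new_storage_html: str) -> dict:
    """Build a PUT body that bumps the page version by one."""
    return {
        "id": page["id"],
        "type": "page",
        "title": page["title"],
        # Confluence rejects updates whose version is not current + 1.
        "version": {"number": page["version"]["number"] + 1},
        "body": {
            "storage": {
                "value": new_storage_html,
                "representation": "storage",
            }
        },
    }
```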
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pass PDF document context from room to voice session so the voice LLM
can answer questions about uploaded PDFs. Persist call transcripts and
post an LLM-generated summary to the room when the call ends.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Store user timezone as [PREF:timezone] in memory service
- Query timezone preference on session start, override default
- Add set_user_timezone tool so bot learns timezone from conversation
- On time-relevant questions, bot asks if user is still at stored location
- Seeded Europe/Nicosia for @christian.gick:agiliton.eu
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bot now knows the user's timezone (Europe/Berlin default) and which
LLM model it's running on, so it can answer questions about both.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removing the blocking wait entirely caused DEC_FAILED - the rotated key
had not arrived via nio sync before the pipeline started. Restore a short
3s wait (down from 10s) which is enough for nio to deliver the rotated key.
Also fix on_mute/on_unmute arg order (participant, publication - not reversed).
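The bounded wait can be sketched as follows, assuming the sync handler sets an asyncio.Event when the rotated key arrives. The event and function name are hypothetical.

```python
import asyncio


async def wait_for_rotated_key(key_arrived: asyncio.Event, timeout: float = 3.0) -> bool:
    """Wait up to `timeout` seconds for the key event; report whether it arrived."""
    try:
        await asyncio.wait_for(key_arrived.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Key did not arrive in time; the caller decides how to proceed.
        return False
```

If the key is already there, this returns immediately, so the 3s is an upper bound rather than a fixed delay.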
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace _extract_voice_memories with _store_voice_exchange
- Store raw "User: ... / Assistant: ..." pairs directly
- No LLM call needed — faster, cheaper, no lost context
- Load as "Frühere Gespräche" with full thread context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
flash_v2_5 had audible compression artifacts. multilingual_v2 has higher
fidelity while speed=1.15 via VoiceSettings still gives snappier delivery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Model: eleven_multilingual_v2 → eleven_flash_v2_5 (lower latency)
- Speed: 1.15x via VoiceSettings
- Stability/similarity tuned for natural German speech
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Element Call (EC) rotates the encryption key when the bot joins the
LiveKit room. The rotated key arrives via Matrix sync 3-5s later. The
previous 2s wait was too short - DEC_FAILED before the new key arrived.
Extended wait to 10s. Added logging to bot.py to trace why late
key events were not being processed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The HKDF sed patch in Dockerfile was wrong — it swapped salt/info
based on incorrect analysis of minified JS. The original Rust FFI
parameters are correct: salt="LKFrameEncryptionKey", info=[0;128].
Also removed Python-side HMAC pre-ratcheting of keys. Element Call
uses explicit key rotation via Matrix events, not HMAC ratcheting.
Added diagnostic logging to trace exact key bytes during E2EE setup.
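For reference, a stdlib-only sketch of HKDF (RFC 5869) with the parameters above. The SHA-256 hash and the 16-byte output length are assumptions that are not stated in this message, and derive_frame_key is a hypothetical name.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) extract-then-expand using HMAC-SHA256."""
    if length > 255 * hashlib.sha256().digest_size:
        raise ValueError("requested length too large for HKDF-SHA256")
    # Extract: PRK = HMAC(salt, IKM)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


SALT = b"LKFrameEncryptionKey"  # salt from this commit message
INFO = bytes(128)               # 128 zero bytes, i.e. [0; 128]


def derive_frame_key(shared_key: bytes) -> bytes:
    # Assumption: SHA-256 and a 16-byte key; neither is stated in the commit.
    return hkdf_sha256(shared_key, SALT, INFO, 16)
```

Swapping salt and info, as the reverted patch did, yields a different key from the same input, which would explain the decryption failures.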
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows disabling E2EE for diagnostic purposes. When disabled, bot
connects to LiveKit without frame encryption.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Inline E2EE options had 3 wrong values vs Element Call JS SDK:
- failure_tolerance=-1 (infinite, hid all DEC_FAILED) → 10
- key_ring_size=16 (too small, keys overflow) → 256
- ratchet_window_size=16 (wrong) → 10
Now uses _build_e2ee_options() which was already correct but never called.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ElevenLabs scribe_v2_realtime also produces non-asterisk artifacts like
"Untertitel: ARD Text im Auftrag von Funk (2017)" from TV/radio audio.
Add pattern matching for subtitle metadata, copyright notices, and
parenthetical/bracketed annotations.
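An illustrative version of such a filter. The regular expressions are assumptions chosen to match the examples in this message, not the patterns actually committed.

```python
import re

# Illustrative patterns only; the real filter's expressions are not
# shown in the commit message.
_ARTIFACT_PATTERNS = [
    re.compile(r"^\*[^*]+\*$"),                      # *Störgeräusche*
    re.compile(r"^untertitel\b", re.IGNORECASE),     # subtitle metadata
    re.compile(r"^(copyright\b|©)", re.IGNORECASE),  # copyright notices
    re.compile(r"^\([^()]*\)$"),                     # whole line in parentheses
    re.compile(r"^\[[^\[\]]*\]$"),                   # whole line in brackets
]


def is_stt_artifact(text: str) -> bool:
    """Return True if the transcript looks like an STT annotation, not speech."""
    stripped = text.strip()
    return any(p.search(stripped) for p in _ARTIFACT_PATTERNS)
```

Only transcripts that are annotations in their entirety are dropped; a sentence that merely contains parentheses passes through.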
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Jack Marlowe (slow/raw) with Robert Ranger (deep/natural) for
a more pleasant conversational voice assistant experience.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace broken _VoiceAgent stt_node override with _NoiseFilterAgent that uses
on_user_turn_completed() + StopResponse. This operates downstream of VAD+STT
so no backpressure risk to the audio pipeline.
When ElevenLabs scribe_v2_realtime produces *Störgeräusche* etc., the agent
now silently suppresses them before the LLM responds. The prompt-based filter
is kept as defense-in-depth.
Fixes: MAT-41
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The _count_frames coroutine created a second rtc.AudioStream on the caller's
audio track, competing with AgentSession's internal pipeline for event loop
time. Under load, this caused VAD to miss speech → user_state stuck on "away".
- Remove _count_frames AudioStream (debugging artifact)
- Add VAD state diagnostics (speaking count, away duration)
- Add VAD watchdog: warns if user_state=away >30s (MAT-40 detection)
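The watchdog check reduces to a timestamp comparison. A minimal sketch with a hypothetical helper name, assuming timestamps in seconds:

```python
from typing import Optional

AWAY_WARN_SECONDS = 30.0  # threshold from this commit message


def away_warning(user_state: str, away_since: Optional[float], now: float) -> Optional[str]:
    """Return a warning string if the user has been 'away' too long, else None."""
    if user_state != "away" or away_since is None:
        return None
    elapsed = now - away_since
    if elapsed > AWAY_WARN_SECONDS:
        return f"user_state=away for {elapsed:.0f}s (possible MAT-40)"
    return None
```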
Fixes: MAT-40
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add LLM prompt rule to ignore *Störgeräusche* etc. annotations
instead of overriding stt_node (which broke VAD pipeline)
- Switch voice to vmVmHDKBkkCgbLVIOJRb per user preference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Threshold 0.60 too strict, user speech consistently not detected.
Back to default 0.50 with min_speech_duration=0.2 as noise guard.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace George (British EN) with Jack Marlowe (Gng1FdSGZlhs6jKgzAxL),
the only native German voice in the library. Fixes garbled number/date
pronunciation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
George (British) voice mangles German digit strings. Force LLM to
write all numbers as German words so TTS pronounces them correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0.75 too strict, user voice not detected. 0.60 with min_speech_duration=0.2
should balance noise rejection vs speech detection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pcm_24000 caused silent playback through livekit. Reverting to
plugin default encoding which is known working.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch from mp3_22050_32 (default) to lossless PCM 24kHz for cleaner
voice output. Add language=de for German text normalization.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
scribe_v2_realtime annotates background audio as *Störgeräusche*,
*Fernsehgeräusche* etc. Override stt_node to drop these so the LLM
only receives actual speech transcripts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Plugin requires explicit aiohttp session; livekit http_context not available
in this job setup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Query user memories at call start and inject into agent system prompt
- Extract new facts after each exchange using claude-haiku via LiteLLM
- Add Brave Search tool (@function_tool) for current data queries
- Pass memory client and caller_user_id through VoiceSession constructor
- Pre-compute 8 HMAC-ratcheted EC keys for reliable E2EE decryption
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>