Latency was dominated by the LLM call chain, not the 10-message context window.
Three fixes land together in the chat pipeline in bot.py:
1. Stream the main LLM call (new _stream_chat_completion helper) and
progressively edit the Matrix message via m.replace. Suppress visible
streaming during tool-calling iterations so the user never sees rolled-back
text. Final send is an authoritative edit that guarantees the full reply.
2. Gate _rewrite_query behind a pronoun/deictic heuristic (EN/DE/FR). When a
message has no references needing resolution we skip the extra Haiku
round-trip entirely and feed the original message to RAG directly.
3. Fire-and-forget the post-reply memory + chunk persistence with asyncio
   background tasks so a slow extraction no longer blocks the next inbound
   message. The 20s timeout is preserved inside the background task;
   exceptions are logged.
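The streaming flow in fix 1 can be sketched roughly as follows. The chunk source, `send`, and `edit` callables are hypothetical stand-ins for the real Matrix client calls, but the edit payload follows the Matrix spec's `m.replace` relation:

```python
import asyncio


def replace_content(original_event_id: str, body: str) -> dict:
    """Build an m.replace edit payload per the Matrix event-replacement spec."""
    return {
        "msgtype": "m.text",
        "body": f"* {body}",  # fallback body for clients without edit support
        "m.new_content": {"msgtype": "m.text", "body": body},
        "m.relates_to": {"rel_type": "m.replace", "event_id": original_event_id},
    }


async def stream_reply(chunks, send, edit, min_interval: float = 1.0) -> str:
    """Send one message, then throttle progressive edits as tokens stream in.

    The final edit is always sent unconditionally, so the complete reply is
    guaranteed even if the last incremental edit was skipped by throttling.
    """
    buffer = ""
    event_id = None
    loop = asyncio.get_running_loop()
    last_edit = loop.time()
    async for delta in chunks:
        buffer += delta
        if event_id is None:
            event_id = await send(buffer)  # first chunk creates the message
        elif loop.time() - last_edit >= min_interval:
            await edit(replace_content(event_id, buffer))
            last_edit = loop.time()
    if event_id is not None:
        await edit(replace_content(event_id, buffer))  # authoritative final edit
    return buffer
```

Throttling matters here because each edit is a full round-trip to the homeserver; one edit per second is usually plenty for a readable "typing" effect.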
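The fire-and-forget pattern in fix 3 can be sketched like this (a minimal version; the real persistence coroutine in bot.py is assumed, not shown). Note the strong-reference set: `asyncio` only keeps weak references to tasks, so without it a background task can be garbage-collected mid-flight:

```python
import asyncio
import logging

log = logging.getLogger(__name__)

# Strong references so pending tasks are not garbage-collected mid-flight.
_background_tasks: set[asyncio.Task] = set()


def fire_and_forget(coro, timeout: float = 20.0) -> asyncio.Task:
    """Run coro in the background with a timeout; log failures instead of raising."""
    async def runner():
        try:
            await asyncio.wait_for(coro, timeout)
        except Exception:
            log.exception("background persistence failed")

    task = asyncio.create_task(runner())
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task
```

The caller returns immediately, so the next inbound message is handled while persistence is still running; a hung extraction is cut off by the timeout instead of wedging the pipeline.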
Added unit tests for the pronoun heuristic (EN/DE/FR positive and negative
cases, plus short/empty messages).
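The gating heuristic from fix 2 might look roughly like the following. The marker lists are illustrative, not the actual ones in bot.py; the idea is just a cheap word-boundary regex over common EN/DE/FR pronouns and deictics:

```python
import re

# Hypothetical marker list; the real set in bot.py may differ. French
# articles (le/la/les) are deliberately excluded to limit false positives.
_REFERENTIAL = re.compile(
    r"\b(it|this|that|these|those|he|she|they|him|her|them"       # EN
    r"|er|sie|es|das|dies|diese|dieser|dieses|ihn|ihm|ihr"        # DE
    r"|il|elle|ils|elles|ça|cela|ceci|celui|celle|lui|leur"       # FR
    r")\b",
    re.IGNORECASE,
)


def needs_rewrite(message: str) -> bool:
    """True if the message likely contains references a query rewrite must resolve."""
    if len(message.split()) < 2:
        return False  # too short to carry a resolvable reference
    return bool(_REFERENTIAL.search(message))
```

A false positive here only costs one extra Haiku round-trip; a false negative degrades RAG retrieval for that turn, so erring slightly toward matching is the safer trade-off.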
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Unverified devices (lacking cross-signing) caused OlmUnverifiedDeviceError
in _send_text(), silently breaking all message delivery. Now on_sync()
blacklists non-cross-signed devices instead of skipping them, and
_send_text() catches E2EE errors gracefully.
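The trust policy and the guarded send can be sketched as below. This is a stand-alone sketch: the `Device` view and `OlmUnverifiedDeviceError` here are local stand-ins (the real error class lives in matrix-nio's exceptions), and the actual blacklisting goes through the nio client rather than a plain function:

```python
import asyncio
import logging
from dataclasses import dataclass
from enum import Enum

log = logging.getLogger(__name__)


class OlmUnverifiedDeviceError(Exception):
    """Stand-in for the matrix-nio E2EE error of the same name."""


class Action(Enum):
    TRUST = "trust"
    BLACKLIST = "blacklist"


@dataclass
class Device:
    # Hypothetical view of a device after sync; field names are illustrative.
    device_id: str
    cross_signed: bool


def trust_action(device: Device) -> Action:
    """Blacklist non-cross-signed devices so encrypted sends don't raise later."""
    return Action.TRUST if device.cross_signed else Action.BLACKLIST


async def send_text(room_send, room_id: str, body: str) -> bool:
    """Wrap the encrypted send; surface E2EE failures instead of dropping them."""
    try:
        await room_send(room_id, {"msgtype": "m.text", "body": body})
        return True
    except OlmUnverifiedDeviceError as err:
        log.warning("E2EE send to %s blocked by unverified device: %s", room_id, err)
        return False
```

Blacklisting (rather than skipping) matters because a merely skipped device is re-encountered on every send attempt, while a blacklisted one is excluded from session sharing outright.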
Adds 12 unit tests for device trust policy and send error handling.
CI test job now gates deployment in deploy.yml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Brave Search results are passed through LiteLLM (claude-haiku) when the
job config includes a `criteria` field. The LLM returns the indices of
matching results, filtering out noise before they are posted to Matrix.
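The index-based filtering can be sketched as follows, assuming the model is prompted to answer with a JSON array such as `[0, 2]`. The parsing is deliberately defensive, since LLM output may wrap the array in prose:

```python
import json
import re


def filter_by_indices(results: list[dict], llm_reply: str) -> list[dict]:
    """Keep only the results whose indices the model returned.

    On any parse failure we fail open and return everything: posting a noisy
    result is better than silently dropping a scheduled job's output.
    """
    match = re.search(r"\[[\d,\s]*\]", llm_reply)
    if not match:
        return results
    try:
        indices = json.loads(match.group(0))
    except json.JSONDecodeError:
        return results
    # Drop out-of-range or non-integer entries the model may hallucinate.
    return [results[i] for i in indices if isinstance(i, int) and 0 <= i < len(results)]
```

Asking for indices instead of rewritten results keeps the haiku call cheap and makes the output trivially verifiable against the original list.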
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a cron package that syncs jobs from the matrixhost portal API, schedules
execution with timezone-aware timing, and posts results to Matrix rooms.
Includes Brave Search, reminder, and browser-scrape (placeholder) executors
plus a result formatter. 31 pytest tests.
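The timezone-aware scheduling piece can be sketched with the stdlib alone; `next_run` is a hypothetical helper, not the package's actual API:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def next_run(now_utc: datetime, hour: int, minute: int, tz_name: str) -> datetime:
    """Next occurrence of hour:minute in the job's timezone, returned in UTC.

    Aware-datetime arithmetic in Python is wall-clock arithmetic, so adding a
    day keeps the job at the same local time across DST transitions.
    """
    tz = ZoneInfo(tz_name)
    local_now = now_utc.astimezone(tz)
    candidate = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)  # today's slot already passed
    return candidate.astimezone(ZoneInfo("UTC"))
```

Computing in the job's local zone and only converting to UTC at the end is what keeps a "09:30 Berlin" job at 09:30 local year-round, rather than drifting an hour at each DST change.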
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>