matrix-ai-agent

christian/matrix-ai-agent


Commit Graph

Author	SHA1	Message	Date


Christian Gick

f4bdae7a1e

perf(MAT): cut bot reply latency — stream, skip redundant rewrite, non-blocking persist

Build & Deploy / test (push) Failing after 1m10s

Build & Deploy / build-and-deploy (push) Has been skipped

Tests / test (push) Failing after 9s

Latency was dominated by the LLM call chain, not the 10-message context window.
Three fixes land together in the chat pipeline in bot.py:

1. Stream the main LLM call (new _stream_chat_completion helper) and
   progressively edit the Matrix message via m.replace. Suppress visible
   streaming during tool-calling iterations so the user never sees rolled-back
   text. Final send is an authoritative edit that guarantees the full reply.

2. Gate _rewrite_query behind a pronoun/deictic heuristic (EN/DE/FR). When a
   message has no references needing resolution we skip the extra Haiku
   round-trip entirely and feed the original message to RAG directly.

3. Fire-and-forget the post-reply memory + chunk persistence with asyncio
   background tasks so a slow extraction no longer blocks the next inbound
   message. The 20s timeout is preserved inside the background task, and
   exceptions are logged.

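The progressive edit in item 1 relies on Matrix message edits (rel_type m.replace). A minimal sketch of the event content and of a rate limit for intermediate edits follows. build_edit_content and EditThrottle are hypothetical names, and the one-second interval is an assumption; the commit message does not say how often the bot edits. Suppressing edits during tool-calling iterations is left out.

```python
import time


def build_edit_content(original_event_id: str, new_body: str) -> dict:
    """Build m.room.message content that replaces an earlier event."""
    return {
        "msgtype": "m.text",
        "body": f"* {new_body}",  # fallback for clients without edit support
        "m.new_content": {"msgtype": "m.text", "body": new_body},
        "m.relates_to": {"rel_type": "m.replace", "event_id": original_event_id},
    }


class EditThrottle:
    """Decide when accumulated streamed text is worth sending as an edit."""

    def __init__(self, min_interval: float = 1.0, clock=time.monotonic):
        self._min_interval = min_interval  # assumed value, not from the commit
        self._clock = clock
        self._last_sent_at = None
        self._last_sent_text = ""

    def should_send(self, text: str, final: bool = False) -> bool:
        now = self._clock()
        if not final:
            # Skip intermediate edits that add nothing or arrive too soon.
            if text == self._last_sent_text:
                return False
            if (self._last_sent_at is not None
                    and now - self._last_sent_at < self._min_interval):
                return False
        # The final edit is always sent so the full reply is guaranteed.
        self._last_sent_at = now
        self._last_sent_text = text
        return True
```

The caller would append each streamed chunk to a buffer, ask should_send(buffer) after every chunk, and call should_send(buffer, final=True) once the stream ends.
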
Added unit test for the pronoun heuristic (EN/DE/FR positive + negative cases,
short/empty messages).
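
For illustration, the gate in item 2 might be a word-boundary regex over a small list of referring words. The sketch below is an assumption throughout: the function name needs_query_rewrite, the word lists, and the choice to return False for empty input are not taken from bot.py.

```python
import re

# Illustrative, deliberately incomplete word lists (assumed, not from bot.py).
_REFERENCE_WORDS = {
    "en": ["it", "this", "that", "these", "those", "he", "she", "they", "them", "there"],
    "de": ["es", "das", "dies", "diese", "dieser", "dieses", "er", "sie", "dort", "damit"],
    "fr": ["il", "elle", "ils", "elles", "ce", "cela", "ça", "celui", "celle", "là"],
}

# One alternation over all languages, matched on whole words only.
_REFERENCE_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(w) for words in _REFERENCE_WORDS.values() for w in words)
    + r")\b",
    re.IGNORECASE,
)


def needs_query_rewrite(message: str) -> bool:
    """Return True if the message seems to refer back to earlier context."""
    text = message.strip()
    if not text:
        return False  # nothing to resolve in an empty message (assumption)
    return _REFERENCE_RE.search(text) is not None
```

A false positive only costs the extra rewrite round-trip that was previously always paid, while a false negative sends an unresolved reference to RAG, so a list like this can afford to err on the inclusive side.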

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-15 18:48:48 +03:00

1 Commits