perf(MAT): cut bot reply latency — stream, skip redundant rewrite, non-blocking persist
Latency was dominated by the LLM call chain, not the 10-message context window. Three fixes land together in the chat pipeline in bot.py:

1. Stream the main LLM call (new _stream_chat_completion helper) and progressively edit the Matrix message via m.replace. Suppress visible streaming during tool-calling iterations so the user never sees rolled-back text. Final send is an authoritative edit that guarantees the full reply.

2. Gate _rewrite_query behind a pronoun/deictic heuristic (EN/DE/FR). When a message has no references needing resolution we skip the extra Haiku round-trip entirely and feed the original message to RAG directly.

3. Fire-and-forget the post-reply memory + chunk persistence with asyncio background tasks so a slow extraction no longer blocks the next inbound message. 20s timeout preserved inside the bg task; exceptions logged.

Added unit test for the pronoun heuristic (EN/DE/FR positive + negative cases, short/empty messages).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
41
tests/test_needs_query_rewrite.py
Normal file
@@ -0,0 +1,41 @@
"""Heuristic gate for `_rewrite_query` (bot.py). Skips the LLM round-trip when
the message has no pronouns or deictic references that would need context."""

from bot import Bot


def _needs(msg: str) -> bool:
    return Bot._needs_query_rewrite(msg)


def test_short_message_skipped():
    assert _needs("hi") is False
    assert _needs("ok") is False


def test_self_contained_no_pronouns_skipped():
    assert _needs("What is the capital of France?") is False
    assert _needs("Summarize the Q3 earnings report") is False
    assert _needs("Wie ist das Wetter in Berlin morgen") is False


def test_english_pronouns_trigger():
    assert _needs("What does it mean?") is True
    assert _needs("Can you fix that?") is True
    assert _needs("Tell me more about them") is True


def test_german_pronouns_trigger():
    assert _needs("Was bedeutet das?") is True
    assert _needs("Kannst du es noch einmal erklären") is True
    assert _needs("Wer sind sie?") is True


def test_french_pronouns_trigger():
    assert _needs("Qu'est-ce que ça veut dire?") is True
    assert _needs("Parle-moi de lui") is True


def test_empty_or_whitespace():
    assert _needs("") is False
    assert _needs(" ") is False
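The tests pin down the gate's observable behavior, but `Bot._needs_query_rewrite` itself is not part of this diff view. As a rough illustration only, a heuristic along the following lines would satisfy the cases listed. Everything here is an assumption: the function name `needs_query_rewrite_sketch`, the word lists, the `MIN_CHARS` cutoff, and the rule that German "das" counts as an article when a capitalized word follows it.

```python
import re

MIN_CHARS = 4  # assumed cutoff for "too short to need a rewrite"

# Hypothetical word lists; the real lists in bot.py are not shown here.
_REFERENCE_WORDS = {
    # English
    "it", "this", "that", "these", "those", "they", "them", "he", "she", "him", "her",
    # German
    "es", "sie", "er", "ihn", "ihm", "ihnen", "dies", "diese", "dieser", "dieses",
    # French
    "ça", "cela", "ceci", "lui", "elle", "il", "ils", "elles", "eux",
}

# Words that are pronouns only when not followed by a noun. German nouns are
# capitalized, so "das Wetter" is read as article + noun, while a trailing
# "das" ("Was bedeutet das?") is read as a pronoun.
_ARTICLE_OR_PRONOUN = {"das"}

_TOKEN_RE = re.compile(r"\w+")  # Unicode-aware, keeps "ça" and "erklären" whole


def needs_query_rewrite_sketch(message: str) -> bool:
    """Return True if the message likely refers back to earlier context."""
    text = message.strip()
    if len(text) < MIN_CHARS:
        return False
    tokens = _TOKEN_RE.findall(text)
    for i, token in enumerate(tokens):
        word = token.lower()
        if word in _ARTICLE_OR_PRONOUN:
            following = tokens[i + 1] if i + 1 < len(tokens) else ""
            if following[:1].isupper():
                continue  # article before a capitalized noun, not a reference
            return True
        if word in _REFERENCE_WORDS:
            return True
    return False
```

A word-list gate like this will produce false positives (for example the German formal "Sie", or French "tu es"). That bias is probably acceptable: a false positive only costs the rewrite round-trip that was always paid before, whereas a false negative would send an unresolved reference to RAG.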