fix: Improve voice noise tolerance and focus on latest message

- Raise VAD thresholds (activation 0.65→0.75, min speech 0.4→0.6s, min silence 0.55→0.65s) to reduce false triggers from background noise
- Add "focus on latest message" instruction to all prompts (voice + text)
- Add "greet and wait" behavior for new conversations instead of auto-continuing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 13:30:14 +02:00
parent 90cdc7b812
commit 1000891a97
3 changed files with 11 additions and 4 deletions
--- a/agent.py
+++ b/agent.py
@@ -28,7 +28,10 @@ Rules:
 - Keep answers SHORT — 1-3 sentences max
 - Be direct, no filler words
 - If the user wants more detail, they will ask
-- Speak naturally as in a conversation"""
+- Speak naturally as in a conversation
+- Always focus on the user's most recent message. Do not continue or summarize previous conversations
+- If a voice message contains only noise, silence, or filler sounds, ignore it completely
+- When a user greets you or starts a new conversation, greet briefly and wait for instructions"""
 server = AgentServer()
--- a/bot.py
+++ b/bot.py
@@ -94,6 +94,8 @@ IMPORTANT RULES — FOLLOW THESE STRICTLY:
 - If no relevant documents were found, simply say you don't have information on that topic and ask if you can help with something else. Do NOT speculate about why or suggest the user look elsewhere.
 - You can see and analyze images that users send. Describe what you see when asked about an image.
 - You can read and analyze PDF documents that users send. Summarize content and answer questions about them.
+- Always focus on the user's most recent message — whether it was text or voice. Do not automatically continue or summarize previous conversations.
+- When a user greets you or starts a new conversation after a pause, respond with a brief greeting and wait for their instructions.
 - You can generate images when asked — use the generate_image tool for any image creation, drawing, or illustration requests.
 - You can search the web using the web_search tool. Use it when users ask about current events, facts, or anything that needs up-to-date information.
 - You can open and read web pages using browse_url. Use it when a user shares a link, or when you need more detail from a search result. Summarize the key content concisely.
--- a/voice.py
+++ b/voice.py
@@ -54,6 +54,8 @@ STRIKTE Regeln:
 - Erfinde NICHTS ausser der Nutzer bittet explizit um Rollenspiel, Probegespraech oder Simulation basierend auf dem Dokumentinhalt. In dem Fall spiele die Rolle ueberzeugend und nutze den Dokumentinhalt als Grundlage
 - Beantworte nur was gefragt wird
 - Wenn niemand etwas fragt oder du dir nicht sicher bist ob jemand mit dir spricht, SCHWEIGE. Antworte NUR auf klare, direkte Fragen oder Anweisungen. Kein Smalltalk, kein "Danke", kein "Wie kann ich helfen" von dir aus
+- Fokussiere dich IMMER auf die letzte Nachricht des Nutzers — egal ob Text oder Sprache. Fuehre nicht automatisch fruehere Gespraeche fort und fasse sie nicht zusammen
+- Wenn ein Nutzer dich gruesst oder ein neues Gespraech nach einer Pause beginnt, antworte mit einer kurzen Begruessung und warte auf Anweisungen
 - Schreibe Zahlen und Jahreszahlen IMMER als Woerter aus (z.B. "zweitausendundzwanzig" statt "2026", "zweiundzwanzigsten Februar" statt "22. Februar")
 - Bei zeitrelevanten Fragen (Uhrzeit, Termine, Geschaeftszeiten): frage kurz nach ob der Nutzer noch in seiner gespeicherten Zeitzone ist, bevor du antwortest. Nutze set_user_timezone wenn sich der Standort geaendert hat.
 - Wenn der Nutzer seinen Standort oder seine Stadt erwaehnt, nutze set_user_timezone um die Zeitzone zu speichern.
@@ -153,9 +155,9 @@ def _get_vad():
    global _vad
    if _vad is None:
        _vad = silero.VAD.load(
-            activation_threshold=0.65,
+            activation_threshold=0.75,
-            min_speech_duration=0.4,
+            min_speech_duration=0.6,
-            min_silence_duration=0.55,
+            min_silence_duration=0.65,
        )
    return _vad
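
Note on the VAD change: the sketch below is a rough illustration of why raising the three values should reduce noise-triggered turns. It is not Silero's or the plugin's actual algorithm. `GateConfig`, `detect_segments`, the 50 ms frame size, and the per-frame probabilities are all hypothetical and exist only for this example. It models a simple gate: a segment opens once the speech probability stays at or above the activation threshold for the minimum speech duration, and closes once it stays below the threshold for the minimum silence duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    """Hypothetical container mirroring the three parameters tuned in this commit."""
    activation_threshold: float
    min_speech_duration: float   # seconds
    min_silence_duration: float  # seconds


OLD = GateConfig(0.65, 0.4, 0.55)  # values before this commit
NEW = GateConfig(0.75, 0.6, 0.65)  # values after this commit


def detect_segments(probs, cfg, frame_ms=50):
    """Return (start_s, end_s) speech segments from per-frame speech probabilities.

    Simplified model for illustration only; real VADs differ in detail.
    """
    # Convert durations to whole frame counts to avoid float accumulation.
    need_speech = max(1, round(cfg.min_speech_duration * 1000 / frame_ms))
    need_silence = max(1, round(cfg.min_silence_duration * 1000 / frame_ms))

    segments = []
    in_speech = False
    run_start = None  # first frame of the current above-threshold run while idle
    seg_start = 0
    silence = 0       # consecutive below-threshold frames while in a segment

    for i, p in enumerate(probs):
        active = p >= cfg.activation_threshold
        if not in_speech:
            if active:
                if run_start is None:
                    run_start = i
                if i - run_start + 1 >= need_speech:
                    in_speech, seg_start, silence = True, run_start, 0
            else:
                run_start = None  # run interrupted before reaching min speech
        elif active:
            silence = 0
        else:
            silence += 1
            if silence >= need_silence:
                # Segment ends at the first frame of the trailing silence.
                segments.append((seg_start, i - silence + 1))
                in_speech, run_start = False, None

    if in_speech:  # input ended while a segment was still open
        segments.append((seg_start, len(probs) - silence))

    return [(s * frame_ms / 1000, e * frame_ms / 1000) for s, e in segments]


# Invented input: 0.5 s of moderate-confidence noise (p=0.7), then 1 s of clear speech (p=0.9).
probs = [0.1] * 4 + [0.7] * 10 + [0.1] * 20 + [0.9] * 20 + [0.1] * 20

print("old:", detect_segments(probs, OLD))
print("new:", detect_segments(probs, NEW))
```

On this invented input, the old settings report the noise burst as a segment of its own, while the new settings report only the clear speech. The trade-off is that genuine utterances shorter than 0.6 s, or spoken with low confidence, would also be dropped under the same model.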