Commit Graph

247 Commits

Author SHA1 Message Date
Christian Gick
6fb8c33057 fix: Truncate AI reply to 200 chars in memory extraction to prevent doc pollution
The AI reply often contains full document content (passport details, etc.)
which the memory extraction LLM incorrectly stores as user facts. Limiting
to 200 chars avoids including document content while keeping the gist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 15:47:23 +02:00
Christian Gick
f1529013ca fix: Limit chat history to 10 messages to prevent stale pattern override
30 messages of "only one passport" history overwhelmed fresh RAG results.
Reducing to 10 messages (5 exchanges) provides enough conversation context
without letting stale patterns dominate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 15:45:05 +02:00
Christian Gick
b925786867 fix: Move doc_context after history to prevent history pattern override
Two changes:
1. Reorder messages: doc_context now placed RIGHT BEFORE the user message
   (after chat history), so fresh search results override historical patterns
   where the bot repeatedly said "only one passport"
2. Strengthen doc_context instructions: explicitly tell LLM that fresh search
   results override chat history, and to list ALL matching documents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 15:42:05 +02:00
Christian Gick
aa175b8fb9 fix: Prevent memory extraction from storing document facts as user facts
The memory extraction prompt was extracting facts from RAG search results
(e.g., passport holder names) and storing them as if they were facts about
the user. Added explicit instruction to only extract facts the user directly
states about themselves.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 15:27:47 +02:00
Christian Gick
e2bac92959 fix: increase RAG search top_k from 3 to 10
With only 3 results, passport queries often miss family members since
all passport files have similar low relevance scores. Increasing to 10
ensures all related documents are included in LLM context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 15:06:49 +02:00
Christian Gick
4ec4054db4 feat: Blinkist-style audio summary bot (MAT-74)
Add interactive article summary feature: user pastes URL → bot asks
language/duration/topics → generates audio summary via LLM + ElevenLabs
TTS → posts MP3 inline with transcript and follow-up Q&A.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 17:39:09 +02:00
Christian Gick
1000891a97 fix: Improve voice noise tolerance and focus on latest message
- Raise VAD thresholds (activation 0.65→0.75, min speech 0.4→0.6s,
  min silence 0.55→0.65s) to reduce false triggers from background noise
- Add "focus on latest message" instruction to all prompts (voice + text)
- Add "greet and wait" behavior for new conversations instead of auto-continuing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 13:30:14 +02:00
Christian Gick
90cdc7b812 chore: Trigger rebuild 2026-03-04 13:30:06 +02:00
Christian Gick
9578e0406b feat: Matrix E2EE key management + multi-user isolation
- Add rag_key_manager.py: stores encryption key in private E2EE room
- Bot loads key from Matrix on startup, injects into RAG via portal proxy
- No plaintext key on disk (removed RAG_ENCRYPTION_KEY from .env)
- Pass owner_id (matrix_user_id) to RAG search for user isolation
- Stronger format_context instructions for source link rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 11:19:02 +00:00
Christian Gick
5d3a6c8c79 chore: Trigger rebuild 2026-03-02 16:30:35 +02:00
Christian Gick
df9eaa99ec feat: Support customer-VM encrypted RAG service (MAT-68)
DocumentRAG class now prefers local RAG endpoint (RAG_ENDPOINT env var)
over central portal API. When RAG_ENDPOINT is set, searches go to the
customer VM encrypted RAG service on localhost:8765. Falls back to
portal API for unmigrated customers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 16:17:23 +02:00
Christian Gick
d9d2c0a849 fix: add v1 API fallback for Confluence page creation
When v2 API returns 401 (scope mismatch with classic OAuth tokens),
fall back to v1 REST API which accepts classic scopes. Also provides
clear error message asking user to re-authorize if both fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 14:25:47 +02:00
Christian Gick
f3db53798d fix: change default Confluence space from AG to AI
AG space does not exist. AI Collaboration (AI) is the correct default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 14:17:33 +02:00
Christian Gick
100f85e990 fix: use Confluence v2 API for page creation (v1 returns 410 Gone)
Switch from /wiki/rest/api/content to /wiki/api/v2/pages.
V2 requires space ID instead of key, so resolve via /api/v2/spaces first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 14:13:06 +02:00
Christian Gick
b0f84670f2 fix: video track kind detection and Confluence page creation
- Video track kind is 2 (not 0) in LiveKit Python SDK — camera was never captured
- Replace broken confluence_collab.create_page import with direct REST API call

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 13:30:48 +02:00
Christian Gick
3c3eb196e1 refactor: Remove !ai command prefix, natural language only
- Remove all !ai command handling (help, models, set-model, search, etc)
- Remove legacy user_keys system (WildFiles API key storage)
- Remove docs connect/disconnect commands
- Bot now responds to all DM messages and @mentions naturally
- Settings managed exclusively via matrixhost.eu portal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 12:54:37 +02:00
Christian Gick
4bed67ac7f chore: remove all WildFiles references, use documents provider
- Remove WILDFILES_BASE_URL and WILDFILES_ORG env vars
- Rename _wildfiles_org_cache to _documents_cache
- Update _has_documents() to use provider=documents
- Remove "wildfiles connect" command alias (keep "docs connect")
- Remove WILDFILES env vars from docker-compose.yml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 12:06:01 +02:00
Christian Gick
c2d611ace8 chore: Trigger rebuild 2026-03-02 11:13:19 +02:00
Christian Gick
4d6cba1f0c feat: switch DocumentRAG to MatrixHost API, remove WildFiles dependency
DocumentRAG now calls MatrixHost /api/bot/documents/search instead of
the WildFiles API. Removes device auth flow and legacy org provisioning.
Bot authenticates via existing BOT_API_KEY pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 10:06:12 +02:00
Christian Gick
a4f01ca177 chore: Trigger rebuild 2026-03-02 06:48:19 +02:00
Christian Gick
d905f6ca6f feat: Auto-connect Documents via MatrixHost portal, rebrand WildFiles
Connect the Matrix AI bot to customer WildFiles orgs via the MatrixHost
portal API instead of requiring manual !ai wildfiles connect. The bot
now auto-resolves the user document org on every message, enabling
seamless RAG document search for all MatrixHost customers.

- Add _get_wildfiles_org() with portal API lookup and session cache
- Update DocumentRAG.search() to accept org_slug (no API key needed)
- Add DocumentRAG.get_org_stats() for org-based stats
- Update context building to use portal org lookup with legacy fallback
- Add !ai docs connect/disconnect aliases
- Rebrand all user-facing messages from WildFiles to Documents
- !ai wildfiles connect now checks portal first, shows auto-connect msg

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 06:41:09 +02:00
Christian Gick
fecf99ef60 chore(MAT-13): Switch chunk summarization from claude-haiku to gemini-flash
Reduces cost for conversation chunk summarization in live indexing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:27:43 +02:00
Christian Gick
9d2e2ddcf7 fix(MAT-13): Add DNS fallback via web search for browse_url
When browse_url fails with DNS resolution error (common with STT-misrecognized
domain names like "klicksports" instead of "clicksports"), automatically try a
web search to find the correct domain and retry. Applied to both text and voice bot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 16:41:37 +02:00
Christian Gick
fb54ac2bea feat(MAT-13): Add conversation chunk RAG for Matrix chat history
Add semantic search over past conversations alongside existing memory facts.
New conversation_chunks table stores user-assistant exchanges with LLM-generated
summaries embedded for retrieval. Bot queries chunks on each message and injects
relevant past conversations into the system prompt. New exchanges are indexed
automatically after each bot response.

Memory-service: /chunks/store, /chunks/query, /chunks/bulk-store endpoints
Bot: chunk query + formatting, live indexing via asyncio.gather with memory extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 07:48:19 +02:00
Christian Gick
6fe9607fb1 feat: Add web page browsing tool (browse_url) to voice and text bot
Both bots can now fetch and read web pages via browse_url tool.
Uses httpx + BeautifulSoup to extract clean text from HTML.
Complements existing web_search (Brave) with full page reading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 16:26:17 +02:00
Christian Gick
34f403a066 feat(MAT-65): Remove WildFiles org-level fallback, require per-user key
No more shared org-level document search for unauthenticated users.
DocumentRAG.search() now returns empty if no API key provided.
Explicit !ai search command tells users to connect first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 16:21:01 +02:00
Christian Gick
18607e39b5 fix(MAT-64): Convert --- to proper <hr/> in markdown-to-HTML
The _md_to_html method was missing horizontal rule conversion, so ---
rendered as literal dashes. Now converts to <hr/> and strips adjacent
<br/> tags for clean spacing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 13:54:24 +02:00
Christian Gick
7915d11463 fix(MAT-64): Ban headings and horizontal rules for compact output
System prompt now strictly forbids #/##/### headings and --- rules.
Uses **bold** for section titles instead, with no blank lines between
title and content, to eliminate excessive whitespace in Element.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 13:48:57 +02:00
Christian Gick
490822f3c3 fix(MAT-64): Inline source links and compact formatting
- System prompt now requires inline source links next to each claim
  instead of a separate "Quellen:" section at the bottom
- Use bold for sub-headings instead of ## to reduce padding/whitespace
- Limit horizontal rules for tighter message layout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 13:45:49 +02:00
Christian Gick
1db4f1f3bd fix(MAT-64): Improve web search formatting and require source links
- Format search results as markdown links: [Title](URL)
- System prompt now requires a "Quellen:/Sources:" section with
  clickable links whenever web_search is used

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 13:38:54 +02:00
Christian Gick
2826455036 feat(MAT-64): Add web search tool to text bot
The text bot had no websearch capability while the voice agent did.
Added Brave Search integration as a web_search tool so the bot can
answer questions about current events and look up information.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 13:30:36 +02:00
Christian Gick
e880376fdb chore: Trigger rebuild 2026-02-28 08:50:54 +02:00
Christian Gick
40a99c73f7 fix: Remove translation detection workflow from DM handler
The auto-detect language + translation menu was misidentifying regular
German messages and blocking normal responses. Bot now simply responds
in whatever language the user writes in, per updated system prompt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:47:33 +02:00
Christian Gick
5d730739b8 chore: Trigger rebuild 2026-02-27 08:52:18 +02:00
Christian Gick
2716f1946a fix: Remove bare SENTRY_DSN from environment sections
Bare variable references in environment: override env_file values
with the host shell value (empty). SENTRY_DSN is already loaded
via env_file: .env, so the explicit references were zeroing it out.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:51:57 +02:00
Christian Gick
7493df3b2c chore: Trigger rebuild 2026-02-27 08:47:17 +02:00
Christian Gick
7791a5ba8e feat: add Confluence recent pages + Sentry error tracking (MAT-58, MAT-59)
MAT-58: Add recent_confluence_pages tool to both voice and text chat.
Shows last 5 recently modified pages so users can pick directly
instead of having to search every time.

MAT-59: Integrate sentry-sdk in all three entry points (agent.py,
bot.py, voice.py). SENTRY_DSN env var, traces at 10% sample rate.
Requires creating project in Sentry UI and setting DSN.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:44:57 +02:00
Christian Gick
db10e435bc chore: Trigger rebuild 2026-02-27 08:04:20 +02:00
Christian Gick
10762a53da feat(MAT-57): Add Confluence write & create tools to voice and text chat
- Add create_confluence_page tool to voice mode (basic auth)
- Add confluence_update_page and confluence_create_page tools to text chat (OAuth)
- Fix update tool: wrap each paragraph in <p> tags instead of single wrapper
- Update system prompt to mention create capability

Previously only search/read were available. User reported bot couldn't
write to or create Confluence pages — because the tools didn't exist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:04:01 +02:00
Christian Gick
9833c89aa6 chore: Trigger rebuild 2026-02-27 07:58:37 +02:00
Christian Gick
3bf9229ae4 fix(MAT-56): Prevent bot silence from STT noise leak + LLM timeout
Three fixes for the bot going silent after ~10 messages:

1. STT artifact handler now returns early — previously detected noise
   leaks ("Vielen Dank.", etc.) but still appended them to transcript,
   inflating context until LLM timed out after 4 retries.

2. Context truncation — caps LLM chat context at 40 items and internal
   transcript at 80 entries to prevent unbounded growth in long sessions.

3. LLM timeout recovery — watchdog detects when agent has been silent
   for >60s despite user activity, sends a recovery reply asking user
   to repeat their question instead of staying permanently silent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 07:58:11 +02:00
Christian Gick
b19300d3ce feat: Add confluence_search tool to voice bot
Voice bot could read/update Confluence pages but could not search.
Users asking to search Confluence got a refusal. Now the voice bot
has search_confluence using CQL queries via the service account.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 12:48:50 +02:00
Christian Gick
a3365626ae chore: Trigger rebuild 2026-02-26 12:39:20 +02:00
Christian Gick
11b80f07c6 chore: Trigger rebuild 2026-02-26 11:08:53 +02:00
Christian Gick
9a879f566d fix: Use Confluence v2 API for page reads
The v1 /wiki/rest/api/content/{id} endpoint returns 410 Gone.
Switch to /wiki/api/v2/pages/{id} with body-format=storage parameter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:08:29 +02:00
Christian Gick
3a5d37fac2 chore: Trigger rebuild 2026-02-26 10:25:07 +02:00
Christian Gick
f3b6f3f2f0 chore: Trigger rebuild 2026-02-26 10:21:02 +02:00
Christian Gick
48f6e7dd17 feat: Add Atlassian tools and agentic tool-calling loop
- Add AtlassianClient class: fetches per-user OAuth tokens from portal,
  calls Jira and Confluence REST APIs on behalf of users
- Add 7 Atlassian tools: confluence_search, confluence_read_page,
  jira_search, jira_get_issue, jira_create_issue, jira_add_comment,
  jira_transition
- Replace single LLM call with agentic loop (max 5 iterations)
  that feeds tool results back to the model
- Add PORTAL_URL and BOT_API_KEY env vars to docker-compose
- Update system prompt with Atlassian tool guidance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:15:15 +02:00
Christian Gick
08a3c4a9cc refactor(CF-1812): Replace inline confluence-collab copy with git submodule
Single source of truth at christian/confluence-collab.git — eliminates stale copy drift.
Dockerfile COPY unchanged, works identically with submodule.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 12:30:31 +02:00
Christian Gick
9958fb9b6b fix: Update confluence-collab proxy with proper async lifecycle (CF-1812)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 11:51:29 +02:00