feat: Haiku-default model routing with Sonnet escalation + Sentry observability

Route simple chat (~90% of messages) to claude-haiku (4x cheaper) and
escalate to claude-sonnet for code blocks, long messages, technical
keywords, multimodal input, and explicit user requests. Sentry tags
record model_used and escalation_reason; token usage is logged as
breadcrumbs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Christian Gick
2026-03-08 17:11:24 +02:00
parent c8e5cd84bf
commit d6dae1da8e
2 changed files with 75 additions and 5 deletions


@@ -19,6 +19,8 @@ services:
 - LITELLM_BASE_URL
 - LITELLM_API_KEY
 - DEFAULT_MODEL
+- BASE_MODEL=${BASE_MODEL:-claude-haiku}
+- ESCALATION_MODEL=${ESCALATION_MODEL:-claude-sonnet}
 - MEMORY_SERVICE_URL=http://memory-service:8090
 - MEMORY_SERVICE_TOKEN
 - PORTAL_URL
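The escalation rules listed in the commit message can be sketched as a small router. This is a minimal Python sketch under stated assumptions, not the code from this commit: the `choose_model` function, the `Route` class, the 500-character "long message" threshold, the keyword list, and the explicit trigger phrases are all hypothetical. Only the `BASE_MODEL` / `ESCALATION_MODEL` variables and their defaults come from the diff.

```python
import os
import re
from dataclasses import dataclass
from typing import Optional

# Defaults mirror the compose file entries BASE_MODEL and ESCALATION_MODEL.
BASE_MODEL = os.environ.get("BASE_MODEL", "claude-haiku")
ESCALATION_MODEL = os.environ.get("ESCALATION_MODEL", "claude-sonnet")

# Hypothetical tuning values; the commit does not state the real ones.
LONG_MESSAGE_CHARS = 500
TECH_KEYWORDS = {"debug", "stack trace", "algorithm", "refactor", "kubernetes", "sql"}
EXPLICIT_TRIGGERS = ("use sonnet", "think harder")
CODE_FENCE = "`" * 3  # a Markdown code fence marks a code block in the message


@dataclass
class Route:
    model: str
    escalation_reason: Optional[str]  # None means the base model was kept


def choose_model(text: str, has_attachments: bool = False) -> Route:
    """Return BASE_MODEL unless one escalation heuristic fires.

    Checks run in a fixed order and the first match becomes the
    escalation_reason (e.g. a value one might attach as a Sentry tag).
    """
    lowered = text.lower()
    if any(trigger in lowered for trigger in EXPLICIT_TRIGGERS):
        return Route(ESCALATION_MODEL, "explicit_request")
    if has_attachments:  # images or other non-text input
        return Route(ESCALATION_MODEL, "multimodal")
    if CODE_FENCE in text:
        return Route(ESCALATION_MODEL, "code_block")
    if len(text) > LONG_MESSAGE_CHARS:
        return Route(ESCALATION_MODEL, "long_message")
    # Whole-word match so e.g. "sql" does not fire inside unrelated words.
    if any(re.search(rf"\b{re.escape(kw)}\b", lowered) for kw in TECH_KEYWORDS):
        return Route(ESCALATION_MODEL, "technical_keyword")
    return Route(BASE_MODEL, None)


if __name__ == "__main__":
    print(choose_model("hi there"))
    print(choose_model("please debug this"))
```

Because every rule only ever escalates, the cheap model stays the default and a misfiring heuristic costs money rather than answer quality; the fixed check order keeps `escalation_reason` deterministic when several rules match.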