feat: Haiku-default model routing with Sonnet escalation + Sentry observability
Route ~90% of simple chat to claude-haiku (4x cheaper) and escalate to claude-sonnet for code blocks, long messages, technical keywords, multimodal input, and explicit requests. Sentry tags track model_used and escalation_reason; token usage is recorded as breadcrumbs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
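The escalation rules in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the function names, the 1500-character threshold, the keyword list, the order of the checks, and the reason strings are all assumptions. Only the two model names, the five escalation triggers, and the `model_used` / `escalation_reason` tag names come from the message.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

BASE_MODEL = "claude-haiku"
ESCALATION_MODEL = "claude-sonnet"

# Hypothetical values: the commit message does not state thresholds or keywords.
LONG_MESSAGE_CHARS = 1500
TECHNICAL_KEYWORDS = ("traceback", "stack trace", "refactor", "sql", "regex")
EXPLICIT_REQUEST = re.compile(r"\b(use|switch to)\s+sonnet\b", re.IGNORECASE)
CODE_FENCE = "`" * 3  # a Markdown code fence marker


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    escalation_reason: Optional[str]  # None when the base model handles the message


def choose_model(text: str, has_attachments: bool = False) -> RoutingDecision:
    """Pick the base model unless one of the escalation triggers fires."""
    lowered = text.lower()
    if EXPLICIT_REQUEST.search(text):
        reason = "explicit_request"
    elif has_attachments:
        reason = "multimodal"
    elif CODE_FENCE in text:
        reason = "code_block"
    elif len(text) > LONG_MESSAGE_CHARS:
        reason = "long_message"
    elif any(keyword in lowered for keyword in TECHNICAL_KEYWORDS):
        reason = "technical_keyword"
    else:
        return RoutingDecision(BASE_MODEL, None)
    return RoutingDecision(ESCALATION_MODEL, reason)


def sentry_tags(decision: RoutingDecision) -> Dict[str, str]:
    """Build the tag dict; each pair could be passed to sentry_sdk.set_tag."""
    tags = {"model_used": decision.model}
    if decision.escalation_reason is not None:
        tags["escalation_reason"] = decision.escalation_reason
    return tags
```

Returning the reason together with the model keeps the routing decision and its observability data in one place, so the tags cannot drift from the model that was actually chosen.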
@@ -19,6 +19,8 @@ services:
   - LITELLM_BASE_URL
   - LITELLM_API_KEY
   - DEFAULT_MODEL
+  - BASE_MODEL=${BASE_MODEL:-claude-haiku}
+  - ESCALATION_MODEL=${ESCALATION_MODEL:-claude-sonnet}
   - MEMORY_SERVICE_URL=http://memory-service:8090
   - MEMORY_SERVICE_TOKEN
   - PORTAL_URL
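The `${BASE_MODEL:-claude-haiku}` form in the hunk above is Compose interpolation: the default applies when the host variable is unset or empty. A service that may also run outside Compose could mirror that behaviour when reading its environment. The sketch below assumes a hypothetical helper named `resolve_models`; it is not taken from this commit.

```python
import os
from typing import Mapping, Tuple


def resolve_models(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (base_model, escalation_model), falling back to the Compose defaults.

    `or` treats an empty string like a missing variable, which matches
    the ${VAR:-default} semantics used in the compose file.
    """
    base = env.get("BASE_MODEL") or "claude-haiku"
    escalation = env.get("ESCALATION_MODEL") or "claude-sonnet"
    return base, escalation
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dict instead of mutating `os.environ`.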