bug: LLM rate limit errors surface raw error to user instead of falling back #235

Open
opened 2026-03-09 06:23:09 +00:00 by David · 0 comments
Contributor

## Description

When the primary LLM provider is rate limited (HTTP 429) or temporarily unavailable (503), the agent surfaces a raw/unhelpful error message to the user. In Telegram this shows as:

> ⚠️ API rate limit reached. Please try again later.

This is a poor user experience across all channels (Telegram, Matrix, CLI, etc.) and breaks conversational flow.

## Root Cause

The PPQ plugin (`cobot/plugins/ppq/plugin.py`) has no specific handling for rate-limit (429) or service-unavailable (503) responses. All HTTP errors are caught as a generic `LLMError` and bubbled up through the loop plugin, which returns a generic error message with an error reference ID. There is no retry logic, backoff, or fallback mechanism.

## Expected Behavior

When the primary LLM is rate limited or unavailable, the agent should automatically retry with a configurable fallback LLM provider/model so the user experiences no interruption. The fallback should be transparent to the user.

## Proposed Solution

1. **Add `fallback` LLM config** to `cobot.yml`:

   ```yaml
   ppq:
     api_base: https://openrouter.ai/api/v1
     model: anthropic/claude-sonnet-4
     api_key: sk-...
     fallback:
       api_base: https://openrouter.ai/api/v1  # can be same or different provider
       model: openai/gpt-4.1-mini
       api_key: sk-...  # optional, defaults to primary api_key
   ```
2. **Default fallback model:** `openai/gpt-4.1-mini` — strong reasoning, fast, and significantly cheaper than Claude Sonnet 4 (~$0.40/1M input vs ~$3/1M input).

3. **Implementation in the LLM provider layer** (channel-agnostic):

   - Detect 429 / 503 / timeout responses specifically
   - On rate limit: retry once with the fallback model
   - On fallback failure: return a user-friendly error (not the raw API message)
   - Emit an `llm.fallback_triggered` extension point event for observability
   - Log which model was used for each response
4. **Affected files:**

   - `cobot/plugins/ppq/plugin.py` — add fallback logic and rate-limit detection
   - `cobot/plugins/config/plugin.py` — parse new `fallback` config block
   - `cobot/plugins/interfaces.py` — possibly extend `LLMError` with an error type (rate_limit, unavailable, etc.)
   - `cobot/plugins/loop/plugin.py` — update error handling if needed
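The retry/fallback flow from step 3 could be sketched as follows. This is a minimal, channel-agnostic sketch: `complete_with_fallback`, the callable providers, the `emit_event` hook, and the `LLMError` shape shown here are illustrative assumptions, not the actual cobot plugin interfaces.

```python
# Minimal sketch of the proposed fallback flow. All names are illustrative
# assumptions; the real cobot plugin interfaces may differ.
from dataclasses import dataclass
from typing import Callable, Optional

RETRYABLE_STATUS = {429, 503}  # rate limited / temporarily unavailable


@dataclass
class LLMError(Exception):
    status: int
    message: str


def complete_with_fallback(
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]],
    prompt: str,
    emit_event: Callable[[str, dict], None] = lambda name, data: None,
) -> str:
    """Call the primary model; on 429/503, retry once with the fallback."""
    try:
        return primary(prompt)
    except LLMError as err:
        if err.status not in RETRYABLE_STATUS or fallback is None:
            raise  # non-retryable error, or no fallback configured
        # Observability: tell interested plugins that a fallback happened.
        emit_event("llm.fallback_triggered", {"status": err.status})
        try:
            return fallback(prompt)
        except LLMError as fb_err:
            # Both providers failed: surface a friendly message, never the raw API error.
            raise LLMError(
                status=fb_err.status,
                message="The assistant is temporarily overloaded. Please try again shortly.",
            ) from fb_err
```

Keeping this logic in the provider layer (rather than per channel) is what makes the fallback transparent to Telegram, Matrix, and CLI alike.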

## Acceptance Criteria

- [ ] Rate limit (429) and unavailable (503) errors trigger automatic fallback
- [ ] Fallback LLM is configurable via `cobot.yml`
- [ ] Default fallback is `openai/gpt-4.1-mini` when no fallback is configured
- [ ] Fallback is channel-agnostic (works for Telegram, Matrix, CLI, etc.)
- [ ] `llm.fallback_triggered` event emitted for the observability plugin
- [ ] If both primary and fallback fail, a user-friendly error message is returned
- [ ] Token usage tracking works correctly for fallback model responses
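The config-defaulting criteria above (fallback inherits the primary `api_base`/`api_key`, and `openai/gpt-4.1-mini` is used when no fallback block is present) could be handled by a small helper in the config plugin. The key names mirror the `cobot.yml` example; the helper itself is a hypothetical sketch.

```python
# Hypothetical helper for the config plugin: fill in fallback defaults.
# Key names follow the cobot.yml example; the function itself is a sketch.
DEFAULT_FALLBACK_MODEL = "openai/gpt-4.1-mini"


def resolve_fallback(ppq_cfg: dict) -> dict:
    """Return a complete fallback config, inheriting unset keys from the primary."""
    fb = dict(ppq_cfg.get("fallback") or {})
    fb.setdefault("api_base", ppq_cfg["api_base"])  # same provider unless overridden
    fb.setdefault("api_key", ppq_cfg["api_key"])    # optional; defaults to primary key
    fb.setdefault("model", DEFAULT_FALLBACK_MODEL)  # default when no fallback configured
    return fb
```

With this shape, an empty or missing `fallback` block still yields a usable fallback config, which keeps the default-fallback criterion testable in isolation.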
Reference
ultanio/cobot#235