Models and routing
Teams that route every task through one frontier model often see cost spikes and outages. A practical approach uses tiers: strong models for planning and hard reasoning, faster models for long tool chains, balanced models for everyday work, and small or local models for volume — summaries, classification, and routing decisions.
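The tier idea can be made concrete as a small lookup from task type to tier. Everything here is a hypothetical sketch: the task-type keys and tier names are placeholders, not a recommendation of specific models.

```python
# Hypothetical tier table: task type -> model tier.
# Names are placeholders; map tiers to whatever models your providers expose.
TIERS = {
    "planning": "strong",      # hard reasoning, multi-step plans
    "tool_chain": "fast",      # long tool-call loops where latency dominates
    "chat": "balanced",        # everyday work
    "summarize": "small",      # high-volume, low-stakes
    "classify": "small",
    "route": "small",          # the routing decision itself stays cheap
}

def tier_for(task_type: str) -> str:
    """Default unknown tasks to the balanced tier, not the strongest one."""
    return TIERS.get(task_type, "balanced")
```

Defaulting unknowns to the balanced tier rather than the strongest is a deliberate choice: it keeps an unclassified task from silently landing on the most expensive model.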
How routing usually works
A router picks a model (or provider) from task type, tool-call depth, latency budget, and context size. Calls are wrapped with timeouts and fallbacks so one vendor glitch does not take down the workflow. Long sessions are summarized with a cheap model so the context stays bounded; important state is checkpointed so a failure resumes from the last good point instead of starting cold.
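The timeout-and-fallback wrapper can be sketched in a few lines. This is a minimal illustration, not a production client: the provider callables are stand-ins for real SDK calls, and a real router would also classify errors (retry rate limits, skip auth failures) rather than treating them all alike.

```python
import concurrent.futures as cf

def call_with_timeout(fn, prompt, timeout_s=10.0):
    """Run one provider call with a hard timeout."""
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn, prompt).result(timeout=timeout_s)

def route(prompt, providers, timeout_s=10.0):
    """Try providers in order; one vendor glitch falls through to the next."""
    errors = []
    for name, fn in providers:
        try:
            return name, call_with_timeout(fn, prompt, timeout_s)
        except Exception as exc:  # timeout, rate limit, 5xx, ...
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```

For example, `route("hi", [("primary", flaky_fn), ("backup", ok_fn)])` returns the backup's answer when the primary raises, which is exactly the behavior that keeps a single API error from stopping the run.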
Aggregators such as OpenRouter (and similar) offer one HTTP endpoint over many providers — useful when you want redundancy without building it all yourself.
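With an OpenAI-compatible aggregator, redundancy can be expressed in the request itself. The sketch below only builds the JSON body and does not send it; the `models` fallback list is an OpenRouter-specific field and the model IDs are illustrative, so check your provider's current documentation before relying on either.

```python
import json

# Request body for an OpenAI-compatible chat completions endpoint.
# "models" (a fallback list) is OpenRouter-specific, not part of the base
# OpenAI API; the model IDs below are illustrative placeholders.
payload = {
    "model": "vendor-a/strong-model",
    "models": ["vendor-b/balanced-model", "vendor-c/fast-model"],  # fallbacks
    "messages": [{"role": "user", "content": "Summarize the incident report."}],
}
body = json.dumps(payload)
```

Keeping fallbacks in the request, rather than in client code, moves the retry logic to the aggregator, at the cost of tying you to that aggregator's extensions.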
Failure modes to expect
- Everything on the biggest model — slow, expensive, often unnecessary.
- No fallbacks — one API error stops the run.
- Unbounded context — costs and errors rise as chats grow; summarize and trim.
- Subscription hacks as architecture — consumer plan terms change; design for portable API access when production matters.
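The unbounded-context failure above has a cheap partial fix: bound the history before every call. This sketch uses a crude characters-per-token estimate and simply drops old turns; a fuller version would use a real tokenizer and summarize the dropped turns with a small model instead of discarding them.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token. Use a real tokenizer in production."""
    return max(1, len(text) // 4)

def trim_history(messages, budget_tokens):
    """Keep the system message plus the most recent turns that fit the budget.

    Dropped turns are discarded here; a fuller version would summarize them
    with a cheap model and prepend that summary.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept = []
    used = sum(estimate_tokens(m["content"]) for m in system)
    for m in reversed(rest):  # walk newest-first
        cost = estimate_tokens(m["content"])
        if used + cost > budget_tokens:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

Trimming newest-first means the model always sees the latest turns intact, while costs stay flat no matter how long the session runs.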
Related
- What it costs — token math and hosting buckets
- How you know it’s working — cost per successful task as a health signal
- The market right now — multi-provider access as a hedge