Skip to Content
ExplainersModels and routing

Models and routing

Teams that route every task through one frontier model often see cost spikes and outages. A practical approach uses tiers: strong models for planning and hard reasoning, faster models for long tool chains, balanced models for everyday work, and small or local models for volume — summaries, classification, and routing decisions.

How routing usually works

A router picks a model (or provider) from task type, tool-call depth, latency budget, and context size. Calls are wrapped with timeouts and fallbacks so one vendor glitch does not take down the workflow. Long sessions are summarized with a cheap model to limit context drift; important state is checkpointed so failures can resume instead of starting cold.

Aggregators such as OpenRouter (and similar) offer one HTTP endpoint over many providers — useful when you want redundancy without building it all yourself.

Failure modes to expect

  • Everything on the biggest model — slow, expensive, often unnecessary.
  • No fallbacks — one API error stops the run.
  • Unbounded context — costs and errors rise as chats grow; summarize and trim.
  • Subscription hacks as architecture — consumer plan terms change; design for portable API access when production matters.
Last updated on