Models and routing
Teams that route every task through one frontier model often see cost spikes and outages. A practical approach uses tiers: strong models for planning and hard reasoning, faster models for long tool chains, balanced models for everyday work, and small or local models for volume — summaries, classification, and routing decisions.
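The tier idea can be made concrete as a small lookup from task type to tier. Everything here is a hypothetical sketch: the task-type keys and tier names are placeholders, not a recommendation of specific models.

```python
# Hypothetical tier table: task type -> model tier.
# Names are placeholders; map tiers to whatever models your providers expose.
TIERS = {
    "planning": "strong",      # hard reasoning, multi-step plans
    "tool_chain": "fast",      # long tool-call loops where latency dominates
    "chat": "balanced",        # everyday work
    "summarize": "small",      # high-volume, low-stakes
    "classify": "small",
    "route": "small",          # the routing decision itself stays cheap
}

def tier_for(task_type: str) -> str:
    """Default unknown tasks to the balanced tier, not the strongest one."""
    return TIERS.get(task_type, "balanced")
```

Defaulting unknowns to the balanced tier rather than the strongest is a deliberate choice: it keeps an unclassified task from silently landing on the most expensive model.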
How routing usually works
A router picks a model (or provider) from task type, tool-call depth, latency budget, and context size. Calls are wrapped with timeouts and fallbacks so one vendor glitch does not take down the workflow. Long sessions are summarized with a cheap model so the context stays bounded; important state is checkpointed so a failure resumes from the last good point instead of starting cold.
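The timeout-and-fallback wrapper can be sketched in a few lines. This is a minimal illustration, not a production client: the provider callables are stand-ins for real SDK calls, and a real router would also classify errors (retry rate limits, skip auth failures) rather than treating them all alike.

```python
import concurrent.futures as cf

def call_with_timeout(fn, prompt, timeout_s=10.0):
    """Run one provider call with a hard timeout."""
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn, prompt).result(timeout=timeout_s)

def route(prompt, providers, timeout_s=10.0):
    """Try providers in order; one vendor glitch falls through to the next."""
    errors = []
    for name, fn in providers:
        try:
            return name, call_with_timeout(fn, prompt, timeout_s)
        except Exception as exc:  # timeout, rate limit, 5xx, ...
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```

For example, `route("hi", [("primary", flaky_fn), ("backup", ok_fn)])` returns the backup's answer when the primary raises, which is exactly the behavior that keeps a single API error from stopping the run.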
Aggregators such as OpenRouter (and similar) offer one HTTP endpoint over many providers — useful when you want redundancy without building it all yourself.
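With an OpenAI-compatible aggregator, redundancy can be expressed in the request itself. The sketch below only builds the JSON body and does not send it; the `models` fallback list is an OpenRouter-specific field and the model IDs are illustrative, so check your provider's current documentation before relying on either.

```python
import json

# Request body for an OpenAI-compatible chat completions endpoint.
# "models" (a fallback list) is OpenRouter-specific, not part of the base
# OpenAI API; the model IDs below are illustrative placeholders.
payload = {
    "model": "vendor-a/strong-model",
    "models": ["vendor-b/balanced-model", "vendor-c/fast-model"],  # fallbacks
    "messages": [{"role": "user", "content": "Summarize the incident report."}],
}
body = json.dumps(payload)
```

Keeping fallbacks in the request, rather than in client code, moves the retry logic to the aggregator, at the cost of tying you to that aggregator's extensions.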
Failure modes to expect
- Everything on the biggest model — slow, expensive, often unnecessary.
- No fallbacks — one API error stops the run.
- Unbounded context — costs and errors rise as chats grow; summarize and trim.
- Subscription hacks as architecture — consumer plan terms change; design for portable API access when production matters.
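The unbounded-context failure above has a cheap partial fix: bound the history before every call. This sketch uses a crude characters-per-token estimate and simply drops old turns; a fuller version would use a real tokenizer and summarize the dropped turns with a small model instead of discarding them.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token. Use a real tokenizer in production."""
    return max(1, len(text) // 4)

def trim_history(messages, budget_tokens):
    """Keep the system message plus the most recent turns that fit the budget.

    Dropped turns are discarded here; a fuller version would summarize them
    with a cheap model and prepend that summary.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept = []
    used = sum(estimate_tokens(m["content"]) for m in system)
    for m in reversed(rest):  # walk newest-first
        cost = estimate_tokens(m["content"])
        if used + cost > budget_tokens:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

Trimming newest-first means the model always sees the latest turns intact, while costs stay flat no matter how long the session runs.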
Related
- What it costs — token math and hosting buckets
- How you know it’s working — cost per successful task as a health signal
- The market right now — multi-provider access as a hedge