What it costs

Cost panic is normal. The move from “it worked in the demo” to “it runs all month” is mostly arithmetic, not mysticism.

The three buckets

Inference — paying a provider per token (input + output). This is usually the biggest variable line.
Fixed hosting — where the app and agent runtime live (Vercel, Fly, AWS, a small VPS). Often tens of dollars a month at small scale.
Data & integrations — vector DB, Postgres, observability, email/SMS, CRM APIs. Often predictable once you know traffic.
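
As a minimal sketch, the three buckets add up to one monthly figure. The share that scales with usage (inference) is the part that moves when traffic grows. The `MonthlyCost` class and the dollar amounts are hypothetical placeholders, not quotes from any provider.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    """One month's bill split into the three buckets, in USD (hypothetical helper)."""

    inference: float          # variable: per-token provider charges
    fixed_hosting: float      # app and agent runtime (PaaS, VPS, cloud)
    data_integrations: float  # vector DB, Postgres, observability, email/SMS, CRM APIs

    def total(self) -> float:
        return self.inference + self.fixed_hosting + self.data_integrations

    def inference_share(self) -> float:
        """Fraction of the bill that scales with usage."""
        total = self.total()
        return self.inference / total if total else 0.0


# Placeholder figures for illustration only.
bill = MonthlyCost(inference=900.0, fixed_hosting=40.0, data_integrations=60.0)
print(f"total ${bill.total():,.2f}, inference share {bill.inference_share():.0%}")
# total $1,000.00, inference share 90%
```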

A five-minute estimate (back-of-napkin)

Pick a unit of work: e.g. one “support ticket resolved” or one “weekly report generated.”
Count tokens: rough is fine. Per unit, input tokens ≈ prompt size × steps per unit, and output tokens ≈ average reply size × steps per unit. Keep the two counts separate, because providers price input and output differently.
Apply the price list: your provider’s $/1M input tokens and $/1M output tokens (or subscription math if you’re routing through a bundled plan).
Multiply by volume: units per day × 30.
Add 20–40% for retries on failure and for reruns such as “the model ran twice because the user asked a follow-up.”

That’s enough to say whether you’re in the hundreds or thousands per month — which is all most stakeholders need at first.
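
The five steps fit in one small function. This is a back-of-napkin sketch under stated assumptions. The function name, the parameter names, and the example prices and volumes are hypothetical. It assumes each step sends a similar-sized prompt; in multi-step agents the context often grows from step to step, so plug in an average. It uses a 30% buffer, the middle of the 20–40% range.

```python
def monthly_inference_cost(
    input_tokens_per_step: int,
    output_tokens_per_step: int,
    steps_per_unit: int,
    units_per_day: float,
    usd_per_1m_input: float,
    usd_per_1m_output: float,
    overhead: float = 0.3,  # buffer for retries and follow-ups (20-40% range)
    days: int = 30,
) -> float:
    """Back-of-napkin monthly inference spend in USD (hypothetical helper)."""
    # Step 2: tokens per unit of work, input and output kept separate.
    input_tokens = input_tokens_per_step * steps_per_unit
    output_tokens = output_tokens_per_step * steps_per_unit
    # Step 3: apply per-million-token prices.
    cost_per_unit = (
        input_tokens / 1_000_000 * usd_per_1m_input
        + output_tokens / 1_000_000 * usd_per_1m_output
    )
    # Steps 4 and 5: scale by monthly volume, then add the buffer.
    return cost_per_unit * units_per_day * days * (1 + overhead)


# Placeholder numbers: 5-step ticket flow, 200 tickets/day, made-up prices.
cost = monthly_inference_cost(
    input_tokens_per_step=4_000,
    output_tokens_per_step=500,
    steps_per_unit=5,
    units_per_day=200,
    usd_per_1m_input=3.00,
    usd_per_1m_output=15.00,
)
print(f"${cost:,.2f} per month")  # $760.50 per month
```

With these placeholder numbers the answer lands in the hundreds per month. Changing volume or prices is a one-argument change, which makes it easy to show stakeholders a range rather than a single guess.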

Why routing matters

If every task hits a frontier model, your bill looks like a rocket. Models and routing is how teams keep frontier quality where it matters and cheap throughput everywhere else, sending work such as summaries, classification, and routing decisions to smaller, cheaper models.
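
A minimal sketch of why routing moves the bill: send the cheap-throughput task types to a small model and everything else to a frontier model, then compare against sending everything to the frontier model. The per-task costs, the task labels (including `"reasoning"`), and the 70/20/10 task mix are hypothetical; substitute your own measured costs.

```python
# Hypothetical per-task costs in USD; replace with your provider's real numbers.
COST_PER_TASK = {"frontier": 0.05, "small": 0.002}

# Task types treated as cheap throughput go to the small model.
CHEAP_TASKS = {"summary", "classification", "routing"}


def pick_model(task_type: str) -> str:
    """Naive rule-based router: small model for cheap task types, frontier otherwise."""
    return "small" if task_type in CHEAP_TASKS else "frontier"


def batch_cost(task_types: list[str], route: bool = True) -> float:
    """Total cost of a batch, either routed or sent entirely to the frontier model."""
    total = 0.0
    for task_type in task_types:
        model = pick_model(task_type) if route else "frontier"
        total += COST_PER_TASK[model]
    return total


tasks = ["classification"] * 700 + ["summary"] * 200 + ["reasoning"] * 100
print(f"all frontier: ${batch_cost(tasks, route=False):.2f}")  # all frontier: $50.00
print(f"routed:       ${batch_cost(tasks):.2f}")               # routed:       $6.80
```

Real routers often use a classifier or a confidence check instead of a fixed set of labels, but the arithmetic is the same: the bill is dominated by however much traffic still reaches the frontier model.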

When many agents run at once

Multiple agents in parallel multiply API calls. Budgets, caps, and spend visibility exist so a “research swarm” does not become a surprise invoice.
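
One way to enforce a cap across agents running at once is a shared, thread-safe spend tracker that each agent checks before every model call. This is a hypothetical sketch: `SpendCap` and `run_agent` are illustrative names, charges are pre-call estimates tracked in integer cents to avoid float drift, and the model call itself is simulated.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SpendCap:
    """Shared budget across agents (hypothetical helper, not a provider API)."""

    def __init__(self, budget_cents: int):
        self.budget_cents = budget_cents
        self.spent_cents = 0
        self._lock = threading.Lock()  # agents call try_charge concurrently

    def try_charge(self, cents: int) -> bool:
        """Reserve `cents` if it fits under the cap; refuse otherwise."""
        with self._lock:
            if self.spent_cents + cents > self.budget_cents:
                return False
            self.spent_cents += cents
            return True


def run_agent(cap: SpendCap, calls: int, cents_per_call: int) -> int:
    """Simulated agent: makes up to `calls` model calls, stops when the cap refuses."""
    made = 0
    for _ in range(calls):
        if not cap.try_charge(cents_per_call):
            break  # a real agent might queue, fall back to a cheaper model, or alert
        made += 1  # the actual model call would happen here
    return made


cap = SpendCap(budget_cents=500)  # $5.00 cap for the whole swarm
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: run_agent(cap, calls=10, cents_per_call=10), range(8)))
print(sum(results), "calls allowed;", cap.spent_cents, "cents spent")
# 50 calls allowed; 500 cents spent
```

Eight agents each want ten calls at 10¢ ($8.00 of work), but the $5.00 cap stops the swarm after 50 calls. Checking and reserving under one lock is the design choice that matters: a separate check followed by a separate add would let two agents both pass the check and overshoot the cap.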
