What it costs

Cost panic is normal. The move from “it worked in the demo” to “it runs all month” is mostly arithmetic, not mysticism.

The three buckets

  1. Inference — paying a provider per token (input + output). This is usually the biggest variable line.
  2. Fixed hosting — where the app and agent runtime live (Vercel, Fly, AWS, a small VPS). Often tens of dollars a month at small scale.
  3. Data & integrations — vector DB, Postgres, observability, email/SMS, CRM APIs. Often predictable once you know traffic.

A five-minute estimate (back-of-napkin)

  1. Pick a unit of work: e.g. one “support ticket resolved” or one “weekly report generated.”
  2. Count tokens: rough is fine. Multiply both prompt size and average reply size by steps per unit, and keep the two totals separate because input and output are priced differently.
  3. Apply the price list: your provider’s $/1M input tokens and $/1M output tokens (or subscription math if you’re routing through a bundled plan).
  4. Multiply by volume: units per day × 30.
  5. Add 20–40% for retries on failure and “the model ran twice because the user asked a follow-up.”

That’s enough to say whether you’re in the hundreds or thousands of dollars per month, which is all most stakeholders need at first.
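The five steps above can be sketched as a few lines of Python. This is a minimal sketch, not a billing tool: the class name `UnitEstimate`, the token counts, the volume, and the prices ($3 and $15 per 1M tokens) are all made-up placeholders, so substitute your own provider's price list.

```python
from dataclasses import dataclass


@dataclass
class UnitEstimate:
    """Back-of-napkin inputs for one unit of work (all values are placeholders)."""
    prompt_tokens: int        # input tokens per model call
    reply_tokens: int         # output tokens per model call
    steps_per_unit: int       # model calls needed to finish one unit
    units_per_day: float      # expected volume
    input_usd_per_m: float    # provider price per 1M input tokens
    output_usd_per_m: float   # provider price per 1M output tokens
    overhead: float = 0.30    # 20-40% buffer for retries and follow-ups

    def cost_per_unit(self) -> float:
        # Input and output are priced separately, so total them separately.
        input_tokens = self.prompt_tokens * self.steps_per_unit
        output_tokens = self.reply_tokens * self.steps_per_unit
        return (input_tokens * self.input_usd_per_m
                + output_tokens * self.output_usd_per_m) / 1_000_000

    def monthly_cost(self) -> float:
        # Units per day x 30, then the retry/follow-up buffer on top.
        base = self.cost_per_unit() * self.units_per_day * 30
        return base * (1 + self.overhead)


# Hypothetical example: one resolved support ticket, 200 tickets a day.
ticket = UnitEstimate(prompt_tokens=3_000, reply_tokens=500, steps_per_unit=4,
                      units_per_day=200, input_usd_per_m=3.0, output_usd_per_m=15.0)
print(f"per unit:  ${ticket.cost_per_unit():.3f}")
print(f"per month: ${ticket.monthly_cost():,.0f}")
```

With these placeholder numbers the estimate comes to about $0.066 per ticket and roughly $515 a month, which lands in the "hundreds" bucket.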

Why routing matters

If every task hits a frontier model, your bill climbs fast. Model routing is how teams keep quality where it matters and get cheap throughput everywhere else: summaries, classification, routing decisions.
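To see why routing moves the bill, it can help to compare an all-frontier month with a blended one. The sketch below is illustrative only: `pick_tier`, `blended_monthly_cost`, the task kinds, and the per-task costs ($0.066 frontier, $0.004 cheap) are hypothetical placeholders, not figures from any provider.

```python
# Hypothetical task kinds; the cheap set mirrors the examples in the text.
CHEAP_TASK_KINDS = {"summary", "classification", "routing"}


def pick_tier(task_kind: str) -> str:
    """Toy router: send known low-stakes task kinds to the cheap model."""
    return "cheap" if task_kind in CHEAP_TASK_KINDS else "frontier"


def blended_monthly_cost(tasks_per_month: int,
                         frontier_usd_per_task: float,
                         cheap_usd_per_task: float,
                         frontier_share: float) -> float:
    """Monthly inference cost when only `frontier_share` of tasks use the frontier model."""
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    frontier_tasks = tasks_per_month * frontier_share
    cheap_tasks = tasks_per_month - frontier_tasks
    return (frontier_tasks * frontier_usd_per_task
            + cheap_tasks * cheap_usd_per_task)


# Placeholder numbers: 6,000 tasks a month.
all_frontier = blended_monthly_cost(6_000, 0.066, 0.004, frontier_share=1.0)
routed = blended_monthly_cost(6_000, 0.066, 0.004, frontier_share=0.2)
print(f"all frontier: ${all_frontier:,.2f}")
print(f"20% frontier: ${routed:,.2f}")
```

Under these assumptions the all-frontier month costs about $396 and the routed month about $98. The real ratio depends entirely on your own price gap and on how many tasks can tolerate the cheaper model.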

When many agents run at once

Multiple agents in parallel multiply API calls. Budgets, caps, and spend visibility exist so a “research swarm” does not become a surprise invoice.
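One way to picture a cap is a shared counter that every agent must reserve against before it makes a call. The following is a minimal sketch under assumptions: `SpendCap`, `BudgetExceeded`, and `run_agent` are hypothetical names, the $0.25 per call and $5 limit are placeholders, and a real system would record actual provider charges rather than estimates.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BudgetExceeded(RuntimeError):
    """Raised when a reservation would push spend past the cap."""


class SpendCap:
    """Thread-safe running total shared by all agents (hypothetical helper)."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0
        self._lock = threading.Lock()

    def reserve(self, estimated_usd: float) -> None:
        # Check and update under one lock so parallel agents cannot overshoot.
        with self._lock:
            if self.spent_usd + estimated_usd > self.limit_usd:
                raise BudgetExceeded(f"cap of ${self.limit_usd:.2f} reached")
            self.spent_usd += estimated_usd


def run_agent(cap: SpendCap, calls: int, usd_per_call: float) -> int:
    """Attempt `calls` model calls; stop as soon as the cap refuses one."""
    completed = 0
    for _ in range(calls):
        try:
            cap.reserve(usd_per_call)
        except BudgetExceeded:
            break
        completed += 1  # a real agent would call the model here
    return completed


# Ten agents each want 5 calls at $0.25, but the shared cap is $5.00.
cap = SpendCap(limit_usd=5.00)
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(lambda _: run_agent(cap, 5, 0.25), range(10)))
print(f"calls completed: {sum(results)} of 50, spent ${cap.spent_usd:.2f}")
```

In this run the ten agents ask for 50 calls in total, and the cap lets exactly 20 through before refusing the rest. Which agents get those calls varies from run to run, but the total spend does not.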
