What it costs
Cost panic is normal. The move from “it worked in the demo” to “it runs all month” is mostly arithmetic, not mysticism.
The three buckets
- Inference — paying a provider per token (input + output). This is usually the biggest variable line.
- Fixed hosting — where the app and agent runtime live (Vercel, Fly, AWS, a small VPS). Often tens of dollars a month at small scale.
- Data & integrations — vector DB, Postgres, observability, email/SMS, CRM APIs. Often predictable once you know traffic.
A five-minute estimate (back-of-napkin)
- Pick a unit of work — e.g. one “support ticket resolved” or one “weekly report generated.”
- Count tokens (rough is fine): (prompt size + average reply size) × steps per unit.
- Apply the price list — your provider’s $/1M input tokens and $/1M output tokens (or subscription math if you’re routing through a bundled plan).
- Multiply by volume — units per day × 30.
- Add 20–40% for retries on failure and “the model ran twice because the user asked a follow-up.”
That’s enough to say whether you’re in the hundreds or the thousands of dollars per month, which is all most stakeholders need at first.
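The five steps above can be sketched as a few lines of Python. This is a rough illustration, not a pricing tool: the token counts, the daily volume, the 30% overhead, and the prices ($3 per 1M input tokens, $15 per 1M output tokens) are all made-up assumptions, and the names `UnitOfWork` and `monthly_inference_cost` are hypothetical. Substitute your own provider's price list.

```python
from dataclasses import dataclass


@dataclass
class UnitOfWork:
    input_tokens: int   # prompt tokens per model call (assumed average)
    output_tokens: int  # reply tokens per model call (assumed average)
    steps: int          # model calls needed to finish one unit of work


def monthly_inference_cost(
    unit: UnitOfWork,
    units_per_day: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
    overhead: float = 0.3,  # 20-40% buffer for retries and follow-ups
    days: int = 30,
) -> float:
    """Back-of-napkin monthly inference cost in USD."""
    input_cost = unit.input_tokens * unit.steps * usd_per_m_input / 1_000_000
    output_cost = unit.output_tokens * unit.steps * usd_per_m_output / 1_000_000
    per_unit = input_cost + output_cost
    return per_unit * units_per_day * days * (1 + overhead)


# Hypothetical example: one "support ticket resolved" takes 4 model calls.
ticket = UnitOfWork(input_tokens=2_000, output_tokens=500, steps=4)
estimate = monthly_inference_cost(
    ticket,
    units_per_day=200,
    usd_per_m_input=3.0,    # assumed price, not a real quote
    usd_per_m_output=15.0,  # assumed price, not a real quote
)
print(f"${estimate:,.2f} per month")
```

With these invented inputs the estimate lands in the hundreds of dollars per month. Fixed hosting and data costs from the other two buckets are added on top.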
Why routing matters
If every task hits a frontier model, your bill looks like a rocket. Models and routing is how teams keep quality where it matters and cheap throughput everywhere else — summaries, classification, routing decisions.
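To see why routing changes the bill, compare the average cost per unit when every task hits a frontier model with the cost when only a fraction does. The sketch below uses invented per-unit costs and an assumed 20% frontier share. The function name `blended_cost_per_unit` is hypothetical and is not part of any routing product.

```python
def blended_cost_per_unit(
    frontier_cost: float,
    cheap_cost: float,
    frontier_share: float,
) -> float:
    """Average cost per unit when only `frontier_share` of units use the frontier model."""
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    return frontier_share * frontier_cost + (1 - frontier_share) * cheap_cost


# Assumed per-unit costs in USD (illustrative only).
FRONTIER_COST = 0.054
CHEAP_COST = 0.004

all_frontier = blended_cost_per_unit(FRONTIER_COST, CHEAP_COST, frontier_share=1.0)
routed = blended_cost_per_unit(FRONTIER_COST, CHEAP_COST, frontier_share=0.2)
savings = 1 - routed / all_frontier

print(f"all frontier: ${all_frontier:.4f} per unit")
print(f"routed:       ${routed:.4f} per unit")
print(f"savings:      {savings:.0%}")
```

The savings depend entirely on how much of the workload can tolerate the cheaper model, so measure that share on real traffic before you budget around it.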
When many agents run at once
Multiple agents in parallel multiply API calls. Budgets, caps, and spend visibility exist so a “research swarm” does not become a surprise invoice.
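As an illustration of what a cap does, here is a minimal in-process sketch: a shared, thread-safe counter that refuses a call once the next charge would exceed the limit. This is not how any particular platform implements budgets. `SpendCap`, `BudgetExceeded`, and `run_agent` are hypothetical names, and the costs are invented. Amounts are in integer cents to avoid floating-point drift.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the cap."""


class SpendCap:
    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0
        self._lock = threading.Lock()

    def charge(self, cost_cents: int) -> None:
        # Check and update under one lock so parallel agents cannot overshoot.
        with self._lock:
            if self.spent_cents + cost_cents > self.limit_cents:
                raise BudgetExceeded(f"cap of {self.limit_cents} cents reached")
            self.spent_cents += cost_cents


def run_agent(cap: SpendCap, call_cost_cents: int) -> bool:
    """Reserve budget before the (omitted) model call; return False if refused."""
    try:
        cap.charge(call_cost_cents)
    except BudgetExceeded:
        return False
    # A real agent would make its model call here.
    return True


# Ten agents in parallel, each wanting one 30-cent call, under a 200-cent cap.
cap = SpendCap(limit_cents=200)
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(lambda _: run_agent(cap, 30), range(10)))

allowed = sum(results)
print(f"{allowed} calls allowed, {len(results) - allowed} refused, "
      f"{cap.spent_cents} cents spent")
```

The sketch charges an estimated cost up front. In practice the true cost is only known after the call returns, so a real guard would reconcile the estimate against actual usage.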