
Cost Modeling

Most diagrams answer “what does this system do?” The Cost tab answers “what does it cost to run?” Every block can carry a cost, and Chinilla rolls those costs up across the canvas, the playback bar, and the overview panel so you can see where the money actually goes.

Open a block, switch to the Cost tab, and pick one of the three pills at the top.

Mode           | When to use it                                   | What you fill in
Token-based    | LLM calls priced per million tokens              | Preset (or custom rates) + token ranges in/out
Flat per-call  | Anything priced per execution at a fixed amount  | A single dollar amount per call
Monthly only   | Fixed cost regardless of traffic                 | A single monthly amount

A block can also combine flat per-call + monthly (e.g. “the API costs $0.0002 per call, plus a $70/mo platform fee”). Token mode is mutually exclusive with flat per-call: pick token OR flat for the per-execution side.

In token mode, pick an LLM from the Pricing preset dropdown to auto-fill input/output rates. The preset library covers the common 2026-era endpoints (OpenAI, Anthropic, Google, xAI, OpenRouter). If your model isn’t listed, leave the dropdown blank and type the rates by hand.

Then enter token ranges:

  • Input tokens (min/max): the prompt size you actually send
  • Output tokens (min/max): what the model writes back

Each call samples uniformly between min and max. Set min equal to max for a fixed token count. The estimate line below shows the expected per-call cost so you can sanity-check before running the simulation.
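The estimate line’s math is simple enough to sketch directly. A minimal sketch in Python, assuming rates are quoted in dollars per million tokens (function names here are illustrative, not Chinilla internals):

```python
import random

def token_call_cost(tokens_in, tokens_out, rate_in, rate_out):
    """Cost of one call, with rates in dollars per million tokens."""
    return tokens_in / 1e6 * rate_in + tokens_out / 1e6 * rate_out

def sample_call_cost(in_min, in_max, out_min, out_max, rate_in, rate_out):
    """One simulated call: token counts drawn uniformly from [min, max]."""
    tokens_in = random.uniform(in_min, in_max)
    tokens_out = random.uniform(out_min, out_max)
    return token_call_cost(tokens_in, tokens_out, rate_in, rate_out)

def expected_call_cost(in_min, in_max, out_min, out_max, rate_in, rate_out):
    """Expected cost uses each range's midpoint, since a uniform
    draw averages to (min + max) / 2."""
    return token_call_cost((in_min + in_max) / 2, (out_min + out_max) / 2,
                           rate_in, rate_out)

# e.g. 1k-3k input tokens at $3/M, 200-600 output tokens at $15/M
print(round(expected_call_cost(1000, 3000, 200, 600, 3.0, 15.0), 6))  # 0.012
```

Setting min equal to max collapses the uniform draw to a constant, which is why that is the way to model a fixed token count.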

In flat per-call mode, pick a preset from the dropdown or type a dollar amount per call. The preset library is grouped into:

  • Per-task labor: human-touch costs computed as hourly_rate × minutes / 60 (e.g. CS agent · 5 min · $35/hr → $2.92)
  • Transaction fees: Stripe, PayPal, ACH, wires
  • Messaging: Twilio SMS/voice, SendGrid email
  • Infrastructure: per-invocation Lambda, DB query, S3 PUT, CDN request
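The per-task labor formula is worth checking by hand. A tiny sketch (the function name is illustrative):

```python
def per_task_labor_cost(hourly_rate, minutes):
    """Per-task cost of a human touch: hourly_rate x minutes / 60."""
    return hourly_rate * minutes / 60

# a CS agent at $35/hr spending 5 minutes per ticket
print(round(per_task_labor_cost(35, 5), 2))  # 2.92
```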

Monthly mode is for things that cost the same whether you process 10 or 10,000 events: hosted servers, SaaS seats, salaried staff. The preset library covers:

  • Loaded staff: US-loaded salaries (base + benefits + overhead) for CS agents, engineers, PMs, clinicians, warehouse FTEs
  • Hosting / infra: AWS EC2/RDS, Pinecone, Cloudflare, Datadog, S3 storage
  • SaaS subscriptions: Notion, Slack, GitHub, Linear, Figma, Zendesk
  • Personal subscriptions: Netflix, Spotify, iCloud, gym, home internet

Loaded staff numbers reflect roughly 1.4× base salary, which is the right number to use in a sim. Raw salary alone understates true cost.
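As a worked example of the 1.4× loading, with an assumed (illustrative) $120k base salary:

```python
def loaded_monthly_cost(base_salary, load_factor=1.4):
    """Monthly loaded cost of one FTE (base + benefits + overhead)."""
    return base_salary * load_factor / 12

# a $120k engineer actually costs about $14k/month loaded
print(round(loaded_monthly_cost(120_000)))  # 14000
```

Using the $10k/month raw salary instead would understate the block’s true cost by roughly 29%.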

Once a block has a cost, it surfaces everywhere relevant:

  • Above the block: a small badge next to the in/out/time/queue indicators
  • Float-up indicators: during simulation, cost ticks float up alongside throughput
  • Playback bar: cumulative cost rolls up next to the other live metrics
  • Overview panel: total cost, top contributors, and cost-per-output for the system
  • Tooltip: hover any block to see its cost line

If a block has no cost set, none of these surfaces shows anything for it. Cost is opt-in per block; Chinilla never invents a number for you.

Per-call cost (token or flat) is multiplied by actual throughput in the running simulation, so a 100/sec component costs 100× more than a 1/sec component at the same per-call rate. Monthly cost is added on top as a flat overhead, prorated to the simulation window when shown alongside per-call totals.
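That roll-up can be sketched as follows. This is a sketch under assumptions (per-second throughput, a ~730-hour month for proration), not Chinilla’s exact accounting:

```python
SECONDS_PER_MONTH = 730 * 3600  # ~2.63M seconds in an average month

def block_cost(per_call, calls_per_sec, monthly, window_sec):
    """Total cost of one block over a simulation window:
    per-call cost scaled by actual throughput, plus the
    monthly overhead prorated to the window."""
    per_call_total = per_call * calls_per_sec * window_sec
    monthly_prorated = monthly * window_sec / SECONDS_PER_MONTH
    return per_call_total + monthly_prorated

# 100/sec vs 1/sec at the same per-call rate over a 60 s window:
fast = block_cost(0.0002, 100, 0, 60)
slow = block_cost(0.0002, 1, 0, 60)
print(round(fast / slow))  # 100
```

The proration term is why a large monthly cost barely registers over a short simulation window but dominates once the window approaches a full month.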

The eye-opener in mixed AI/human pipelines is consistent: LLM API costs are typically a small slice of total spend, while human labor (monthly mode) dominates. The Customer Support AI template under Cost Modeling demonstrates this directly.

The Cost Modeling category in the templates picker has worked examples covering both shapes:

  • LLM Eval Pipeline: all-token, multi-stage validator + judge
  • RAG Pipeline: mixed modes (token for LLM stages, flat per-call for embedding/reranker, monthly for the vector DB)
  • Multi-Model Router: token routing across cheap/frontier/specialist models
  • Cascading Classifier: tiered token cost, dropping the cheap model for hard cases
  • Customer Support AI: mixed AI + human, showing how labor dominates the real-world cost mix

Open any of them to see how the modes get used together, then adapt the numbers to your own context (geography, model, vendor tier).

Presets are starting points, not prescriptions. After picking a preset you can edit the rate fields directly to override. Custom values stay with the block and don’t reset if you switch presets later. If you find yourself typing the same custom value across many components, that’s worth a feature request.