# Cost Modeling
Most diagrams answer “what does this system do?” The Cost tab answers “what does it cost to run?” Every block can carry a cost, and Chinilla rolls those costs up across the canvas, the playback bar, and the overview panel so you can see where the money actually goes.
## The three cost modes

Open a block, switch to the Cost tab, and pick one of the three pills at the top.
| Mode | When to use it | What you fill in |
|---|---|---|
| Token-based | LLM calls priced per million tokens | Preset (or custom rates) + token ranges in/out |
| Flat per-call | Anything priced per execution at a fixed amount | A single dollar amount per call |
| Monthly only | Fixed cost regardless of traffic | A single monthly amount |
A block can also combine flat per-call + monthly (e.g. “the API costs $0.0002 per call, plus a $70/mo platform fee”). Token mode is mutually exclusive with flat per-call: pick token or flat for the per-execution side, never both.
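A rough way to picture the combination rules is as a tagged union. The shape below is a sketch under my own naming, not Chinilla’s actual data model; it just encodes that per-execution cost is token-based or flat (never both), and that a monthly overhead can be layered on top of a flat per-call cost or stand alone.

```ts
// Illustrative only: these type names are assumptions, not Chinilla's real schema.
type TokenCost = {
  kind: "token";
  inputRatePerMTok: number;   // $ per million input tokens
  outputRatePerMTok: number;  // $ per million output tokens
  inputTokens: { min: number; max: number };
  outputTokens: { min: number; max: number };
};

type FlatCost = { kind: "flat"; dollarsPerCall: number };

type BlockCost =
  | { perCall: TokenCost }                   // token-based
  | { perCall: FlatCost; monthly?: number }  // flat per-call, optionally plus monthly
  | { monthly: number };                     // monthly only

// Example: "$0.0002 per call, plus a $70/mo platform fee"
const apiBlock: BlockCost = {
  perCall: { kind: "flat", dollarsPerCall: 0.0002 },
  monthly: 70,
};
```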
## Token-based mode

Pick an LLM from the Pricing preset dropdown to auto-fill input/output rates. The preset library covers the common 2026-era endpoints (OpenAI, Anthropic, Google, xAI, OpenRouter). If your model isn’t listed, leave the dropdown blank and type the rates by hand.
Then enter token ranges:
- **Input tokens (min/max)**: the prompt size you actually send
- **Output tokens (min/max)**: what the model writes back
Each call samples uniformly between min and max. Set min equal to max for a fixed token count. The estimate line below shows the expected per-call cost so you can sanity-check before running the simulation.
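The underlying math is simple. Here is a minimal sketch of it (function and field names are mine, not Chinilla’s): rates are dollars per million tokens, each call samples token counts uniformly between min and max, and the expected per-call cost uses the midpoints, which is what the estimate line reflects.

```ts
interface Range { min: number; max: number }

// One simulated call samples uniformly within the range
const sample = (r: Range) => r.min + Math.random() * (r.max - r.min);
// The estimate line uses the expected value, i.e. the midpoint
const midpoint = (r: Range) => (r.min + r.max) / 2;

function perCallCost(inTok: number, outTok: number, inRate: number, outRate: number): number {
  return (inTok * inRate + outTok * outRate) / 1_000_000;
}

function expectedPerCallCost(input: Range, output: Range, inRate: number, outRate: number): number {
  return perCallCost(midpoint(input), midpoint(output), inRate, outRate);
}

// e.g. 1,500-2,500 input tokens and 300-700 output tokens at $3 / $15 per MTok:
// expected ≈ (2000 * 3 + 500 * 15) / 1e6 = $0.0135 per call
```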
## Flat per-call mode

Pick a preset from the dropdown or type a number. The preset library is grouped into:
- **Per-task labor**: human-touch costs computed as `hourly_rate × minutes / 60` (e.g. CS agent · 5 min · $35/hr → $2.92); see the sketch after this list
- **Transaction fees**: Stripe, PayPal, ACH, wires
- **Messaging**: Twilio SMS/voice, SendGrid email
- **Infrastructure**: per-invocation Lambda, DB query, S3 PUT, CDN request
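For the labor presets, the per-call number is just an hourly rate prorated over the minutes a human spends on one task. A tiny sketch of that formula (the function name is mine):

```ts
// Per-task labor cost: prorate an hourly rate over minutes spent per task
function perTaskLaborCost(hourlyRate: number, minutesPerTask: number): number {
  return hourlyRate * minutesPerTask / 60;
}

// CS agent handling one ticket: 5 minutes at $35/hr
perTaskLaborCost(35, 5); // ≈ $2.92 per call
```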
## Monthly only

For things that cost the same whether you process 10 or 10,000 events: hosted servers, SaaS seats, salaried staff. The preset library covers:
- **Loaded staff**: US-loaded salaries (base + benefits + overhead) for CS, engineers, PMs, clinicians, warehouse FTEs
- **Hosting / infra**: AWS EC2/RDS, Pinecone, Cloudflare, Datadog, S3 storage
- **SaaS subscriptions**: Notion, Slack, GitHub, Linear, Figma, Zendesk
- **Personal subscriptions**: Netflix, Spotify, iCloud, gym, home internet
Loaded staff numbers reflect roughly 1.4× base salary, which is the right number to use in a sim. Raw salary alone understates true cost.
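If you want to derive your own numbers instead of using a preset, the arithmetic is straightforward. This is my approximation of the ~1.4× rule stated above, not the exact values behind Chinilla’s presets:

```ts
const LOAD_FACTOR = 1.4; // base salary + benefits + overhead

// Monthly cost of one loaded FTE, from base annual salary
function loadedMonthlyCost(baseAnnualSalary: number): number {
  return (baseAnnualSalary * LOAD_FACTOR) / 12;
}

loadedMonthlyCost(100_000); // ≈ $11,667/mo for a $100k base salary
```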
## Where cost shows up

Once a block has a cost, it surfaces everywhere relevant:
- **Above the block**: small badge next to the in/out/time/queue indicators
- **Float-up indicators**: during simulation, cost ticks float up alongside throughput
- **Playback bar**: cumulative cost rolls up next to the other live metrics
- **Overview panel**: total cost, top contributors, and cost-per-output for the system
- **Tooltip**: hover any block to see its cost line
If a block has no cost set, none of these surfaces show anything for it. Cost is opt-in per block, never made up.
## How rollups work

Per-call cost (token or flat) is multiplied by actual throughput in the running simulation, so a 100/sec component costs 100× more than a 1/sec component at the same per-call rate. Monthly cost is added on top as a flat overhead, prorated to the simulation window when shown alongside per-call totals.
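A sketch of that rollup, with all names my own rather than Chinilla’s internals: per-call spend scales with the number of calls that actually ran, and monthly spend is prorated to however long the simulated window is.

```ts
const SECONDS_PER_MONTH = 30 * 24 * 3600;

function blockSpend(
  callsExecuted: number, // actual throughput × simulated duration
  perCallCost: number,   // token or flat, in dollars
  monthlyCost: number,   // 0 if the block has no monthly component
  simSeconds: number
): number {
  const perCall = callsExecuted * perCallCost;
  const prorated = monthlyCost * (simSeconds / SECONDS_PER_MONTH);
  return perCall + prorated;
}

// 100 calls/sec over a 60-second window at $0.0002/call, plus a $70/mo fee:
blockSpend(100 * 60, 0.0002, 70, 60); // ≈ $1.20 per-call + ~$0.0016 prorated monthly
```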
This means the eye-opener in mixed AI/human pipelines is consistent: LLM API costs are typically a small slice of total spend, and human labor (monthly mode) dominates. The Customer Support AI template under Cost Modeling demonstrates this directly.
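A back-of-the-envelope illustration of why that happens, using numbers I made up rather than the template’s actual values: 10,000 tickets in a month, each with one LLM call and five minutes of agent time.

```ts
const tickets = 10_000;

const llmSpend   = tickets * 0.0135;        // ≈ $135 at the per-call cost worked out earlier
const laborSpend = tickets * (35 * 5 / 60); // ≈ $29,167 of agent time at $35/hr

llmSpend / (llmSpend + laborSpend); // ≈ 0.005 — the LLM is about half a percent of total spend
```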
## Templates to learn from

The Cost Modeling category in the templates picker has worked examples covering both shapes (all-token pipelines and mixed-mode systems):
- **LLM Eval Pipeline**: all-token, multi-stage validator + judge
- **RAG Pipeline**: mixed (token for LLM stages, flat per-call for embedding/reranker, monthly for vector DB)
- **Multi-Model Router**: token routing across cheap/frontier/specialist models
- **Cascading Classifier**: tiered token cost, dropping the cheap model for hard cases
- **Customer Support AI**: mixed AI + human, showing how labor dominates the real-world cost mix
Open any of them to see how the modes get used together, then adapt the numbers to your own context (geography, model, vendor tier).
## Custom rates

Presets are starting points, not prescriptions. After picking a preset you can edit the rate fields directly to override it. Custom values stay with the block and don’t reset if you switch presets later. If you find yourself typing the same custom value across many components, that’s worth a feature request.