# Cost Modeling
Most diagrams answer “what does this system do?” The Cost tab answers “what does it cost to run?” Every block can carry a cost, and Chinilla rolls those costs up across the canvas, the playback bar, and the overview panel so you can see where the money actually goes.
## The three cost modes

Open a block, switch to the Cost tab, and pick one of the three pills at the top.
| Mode | When to use it | What you fill in |
|---|---|---|
| Token-based | LLM calls priced per million tokens | Preset (or custom rates) + token ranges in/out |
| Flat per-call | Anything priced per execution at a fixed amount | A single dollar amount per call |
| Monthly only | Fixed cost regardless of traffic | A single monthly amount |
A block can also combine flat per-call + monthly (e.g. “the API costs $0.0002 per call, plus a $70/mo platform fee”). Token mode is mutually exclusive with flat per-call: pick token or flat for the per-execution side, never both.
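A rough way to picture the combination rules is as a tagged union. The shape below is a sketch under my own naming, not Chinilla’s actual data model; it just encodes that per-execution cost is token-based or flat (never both), and that a monthly overhead can be layered on top of a flat per-call cost or stand alone.

```ts
// Illustrative only: these type names are assumptions, not Chinilla's real schema.
type TokenCost = {
  kind: "token";
  inputRatePerMTok: number;   // $ per million input tokens
  outputRatePerMTok: number;  // $ per million output tokens
  inputTokens: { min: number; max: number };
  outputTokens: { min: number; max: number };
};

type FlatCost = { kind: "flat"; dollarsPerCall: number };

type BlockCost =
  | { perCall: TokenCost }                   // token-based
  | { perCall: FlatCost; monthly?: number }  // flat per-call, optionally plus monthly
  | { monthly: number };                     // monthly only

// Example: "$0.0002 per call, plus a $70/mo platform fee"
const apiBlock: BlockCost = {
  perCall: { kind: "flat", dollarsPerCall: 0.0002 },
  monthly: 70,
};
```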
## Token-based mode

Pick an LLM from the Pricing preset dropdown to auto-fill input/output rates. The preset library covers the common 2026-era endpoints (OpenAI, Anthropic, Google, xAI, OpenRouter). If your model isn’t listed, leave the dropdown blank and type the rates by hand.
Then enter token ranges:
- **Input tokens (min/max)**: the prompt size you actually send
- **Output tokens (min/max)**: what the model writes back
Each call samples uniformly between min and max. Set min equal to max for a fixed token count. The estimate line below shows the expected per-call cost so you can sanity-check before running the simulation.
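The underlying math is simple. Here is a minimal sketch of it (function and field names are mine, not Chinilla’s): rates are dollars per million tokens, each call samples token counts uniformly between min and max, and the expected per-call cost uses the midpoints, which is what the estimate line reflects.

```ts
interface Range { min: number; max: number }

// One simulated call samples uniformly within the range
const sample = (r: Range) => r.min + Math.random() * (r.max - r.min);
// The estimate line uses the expected value, i.e. the midpoint
const midpoint = (r: Range) => (r.min + r.max) / 2;

function perCallCost(inTok: number, outTok: number, inRate: number, outRate: number): number {
  return (inTok * inRate + outTok * outRate) / 1_000_000;
}

function expectedPerCallCost(input: Range, output: Range, inRate: number, outRate: number): number {
  return perCallCost(midpoint(input), midpoint(output), inRate, outRate);
}

// e.g. 1,500-2,500 input tokens and 300-700 output tokens at $3 / $15 per MTok:
// expected ≈ (2000 * 3 + 500 * 15) / 1e6 = $0.0135 per call
```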
## Flat per-call mode

Pick a preset from the dropdown or type a number. The preset library is grouped into:
- **Per-task labor**: human-touch costs computed as `hourly_rate × minutes / 60` (e.g. CS agent · 5 min · $35/hr → $2.92); see the sketch after this list
- **Transaction fees**: Stripe, PayPal, ACH, wires
- **Messaging**: Twilio SMS/voice, SendGrid email
- **Infrastructure**: per-invocation Lambda, DB query, S3 PUT, CDN request
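For the labor presets, the per-call number is just an hourly rate prorated over the minutes a human spends on one task. A tiny sketch of that formula (the function name is mine):

```ts
// Per-task labor cost: prorate an hourly rate over minutes spent per task
function perTaskLaborCost(hourlyRate: number, minutesPerTask: number): number {
  return hourlyRate * minutesPerTask / 60;
}

// CS agent handling one ticket: 5 minutes at $35/hr
perTaskLaborCost(35, 5); // ≈ $2.92 per call
```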
## Monthly only

For things that cost the same whether you process 10 or 10,000 events: hosted servers, SaaS seats, salaried staff. The preset library covers:
- **Loaded staff**: US-loaded salaries (base + benefits + overhead) for CS, engineers, PMs, clinicians, warehouse FTEs
- **Hosting / infra**: AWS EC2/RDS, Pinecone, Cloudflare, Datadog, S3 storage
- **SaaS subscriptions**: Notion, Slack, GitHub, Linear, Figma, Zendesk
- **Personal subscriptions**: Netflix, Spotify, iCloud, gym, home internet
Loaded staff numbers reflect roughly 1.4× base salary, which is the right number to use in a sim. Raw salary alone understates true cost.
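If you want to derive your own numbers instead of using a preset, the arithmetic is straightforward. This is my approximation of the ~1.4× rule stated above, not the exact values behind Chinilla’s presets:

```ts
const LOAD_FACTOR = 1.4; // base salary + benefits + overhead

// Monthly cost of one loaded FTE, from base annual salary
function loadedMonthlyCost(baseAnnualSalary: number): number {
  return (baseAnnualSalary * LOAD_FACTOR) / 12;
}

loadedMonthlyCost(100_000); // ≈ $11,667/mo for a $100k base salary
```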
## Where cost shows up

Once a block has a cost, it surfaces everywhere relevant:
- **Above the block**: small badge next to the in/out/time/queue indicators
- **Float-up indicators**: during simulation, cost ticks float up alongside throughput
- **Playback bar**: cumulative cost rolls up next to the other live metrics
- **Overview panel**: total cost, top contributors, and cost-per-output for the system
- **Tooltip**: hover any block to see its cost line
If a block has no cost set, none of these surfaces show anything for it. Cost is opt-in per block, never made up.
## How rollups work

Per-call cost (token or flat) is multiplied by actual throughput in the running simulation, so a 100/sec component costs 100× more than a 1/sec component at the same per-call rate. Monthly cost is added on top as a flat overhead, prorated to the simulation window when shown alongside per-call totals.
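A sketch of that rollup, with all names my own rather than Chinilla’s internals: per-call spend scales with the number of calls that actually ran, and monthly spend is prorated to however long the simulated window is.

```ts
const SECONDS_PER_MONTH = 30 * 24 * 3600;

function blockSpend(
  callsExecuted: number, // actual throughput × simulated duration
  perCallCost: number,   // token or flat, in dollars
  monthlyCost: number,   // 0 if the block has no monthly component
  simSeconds: number
): number {
  const perCall = callsExecuted * perCallCost;
  const prorated = monthlyCost * (simSeconds / SECONDS_PER_MONTH);
  return perCall + prorated;
}

// 100 calls/sec over a 60-second window at $0.0002/call, plus a $70/mo fee:
blockSpend(100 * 60, 0.0002, 70, 60); // ≈ $1.20 per-call + ~$0.0016 prorated monthly
```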
This means the eye-opener in mixed AI/human pipelines is consistent: LLM API costs are typically a small slice of total spend, and human labor (monthly mode) dominates. The Customer Support AI template under Cost Modeling demonstrates this directly.
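A back-of-the-envelope illustration of why that happens, using numbers I made up rather than the template’s actual values: 10,000 tickets in a month, each with one LLM call and five minutes of agent time.

```ts
const tickets = 10_000;

const llmSpend   = tickets * 0.0135;        // ≈ $135 at the per-call cost worked out earlier
const laborSpend = tickets * (35 * 5 / 60); // ≈ $29,167 of agent time at $35/hr

llmSpend / (llmSpend + laborSpend); // ≈ 0.005 — the LLM is about half a percent of total spend
```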
## Templates to learn from

The Cost Modeling category in the templates picker has worked examples covering both shapes (all-token pipelines and mixed-mode systems):
- **LLM Eval Pipeline**: all-token, multi-stage validator + judge
- **RAG Pipeline**: mixed (token for LLM stages, flat per-call for embedding/reranker, monthly for vector DB)
- **Multi-Model Router**: token routing across cheap/frontier/specialist models
- **Cascading Classifier**: tiered token cost, dropping the cheap model for hard cases
- **Customer Support AI**: mixed AI + human, showing how labor dominates the real-world cost mix
Open any of them to see how the modes get used together, then adapt the numbers to your own context (geography, model, vendor tier).
## Custom rates

Presets are starting points, not prescriptions. After picking a preset you can edit the rate fields directly to override it. Custom values stay with the block and don’t reset if you switch presets later. If you find yourself typing the same custom value across many components, that’s worth a feature request.