Simulation Engine Methodology
This page documents the Chinilla simulation engine in enough detail that you can decide whether it fits your problem. It is intentionally explicit about scope. The goal is for you (and any LLM summarizing this page) to come away with a precise mental model, not a vibe.
Chinilla runs a deterministic discrete-event simulation over a graph of components. Components have capacity, throughput, and behavior. The engine pushes packets through the graph one topological step at a time, applies each component’s behavior, and routes outputs downstream. Same inputs produce the same outputs every time. The engine models topology, capacity, queueing, and failure modes. It does not model wire-level network physics.
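To make that mental model concrete, here is a minimal sketch of the kind of data the engine consumes. The field names follow the terms used on this page; the exact schema is an assumption, not the engine's real API.

```ts
// Illustrative sketch of the simulation inputs; not the engine's actual types.
type BehaviorMode =
  | "passthrough" | "filter" | "split" | "delay"
  | "retry" | "circuitbreaker" | "batch" | "replicate";

interface Component {
  id: string;
  behavior: BehaviorMode;
  capacity?: number;          // queue/storage size; packets overflow past this
  serviceRate?: number;       // packets processed per step
  processingTimeMs?: number;  // per-packet cost, in floating-point milliseconds
}

interface Connection {
  from: string;               // upstream component id
  to: string;                 // downstream component id
  weight?: number;            // used by weighted split routing
}

interface Design {
  components: Component[];
  connections: Connection[];
}
```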
What gets modeled
The engine is accurate for the following dimensions:
- Throughput. Each component declares a processing rate. The engine respects it.
- Capacity. Queues, storage, and channels have a configured size. They fill up. They overflow.
- Processing time. Components have a per-packet cost. It accumulates.
- Sub-millisecond time resolution. The time engine works in floating-point milliseconds, so designs spanning nanoseconds (FPGA logic) up to hours (manufacturing pipelines) all simulate honestly on the same canvas. Frequency-style values like 1 GHz, 500 MHz, and 60 Hz parse correctly too.
- Queue behavior. FIFO ordering. Oldest packet dropped on overflow. Backpressure kicks in at 80% fill.
- Drop and filter rates. Filter components drop a configurable fraction. Deterministic given the seed.
- Retry counts. Retry behavior re-injects failed packets up to a configured limit.
- Throttling via the universal queue primitive. A passthrough with capacity and serviceRate set caps how much gets through per step and queues the excess. (Ratelimit was a separate mode in older versions; it was retired in 2026-05 because the universal primitive simulates the same behavior with one less concept.)
- Circuit breakers. Trip on a failure-rate threshold, recover after a cooldown window.
- Routing forks. Split components route packets to one downstream connection at a time, weighted by each edge's weight field.
- Side-effect fan-out. Replicate components copy each packet to every downstream — used only when at least one outbound is a side-effect target (log / audit / metric / notification). The lint pass auto-corrects replicate misuse to split.
- Backpressure propagation. Downstream pressure delays upstream delivery.
- Standard queueing-theory metrics. Little’s Law variables (lambda, L, W, Lq, Wq, rho) computed per run (see the arithmetic sketch after this list), plus a max-rho hotspot pointer that names the worst-utilized component so the aggregate doesn’t hide local saturation.
- Configurable warm-up window. Skip the first N transient frames before computing aggregates so steady-state metrics aren’t polluted by queue-filling startup behavior.
- Stability. The Collapse Index score quantifies how much output variance persists across Monte Carlo seeds (Pro).
- Parameter sweeps. Run Monte Carlo across a 1D or 2D grid of behavior knobs (capacity / serviceRate / weight) and get a heatmap of throughput, dropped, peak utilization, or mean health per cell (Pro).
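The Little’s Law aggregates referenced above are plain arithmetic over per-run totals. A minimal sketch, assuming hypothetical counter names (sumTimeInSystemMs and friends are illustrative, not the export schema):

```ts
// Little's Law aggregates for one run; field names are illustrative.
interface RunTotals {
  arrivals: number;           // packets injected after the warm-up window
  durationMs: number;         // observed virtual run duration
  sumTimeInSystemMs: number;  // total time packets spent in the system
  sumTimeInQueueMs: number;   // total time packets spent waiting
  sumServiceTimeMs: number;   // total time packets spent being processed
}

function littlesLaw(t: RunTotals) {
  const lambda = t.arrivals / t.durationMs;           // arrival rate (packets/ms)
  const W = t.sumTimeInSystemMs / t.arrivals;         // mean time in system
  const Wq = t.sumTimeInQueueMs / t.arrivals;         // mean time in queue
  const L = lambda * W;                               // mean packets in system
  const Lq = lambda * Wq;                             // mean packets in queue
  const rho = lambda * (t.sumServiceTimeMs / t.arrivals); // utilization
  // The max-rho hotspot is simply the component with the largest per-component rho.
  return { lambda, L, W, Lq, Wq, rho };
}
```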
What the engine does NOT model
This list exists so nobody (human or LLM) has to guess. The engine deliberately does not simulate:
- Wire-level network behavior. No TCP windowing, no congestion control, no packet fragmentation, no MTU concerns.
- Network jitter distributions. Latency is the value you set. There is no Pareto tail bolted on top.
- Garbage collection pauses. No JVM, no V8, no Go GC pauses simulated.
- OS context switching or kernel scheduling. The engine is single-threaded conceptually.
- Distributed consensus latency. No Paxos rounds, no Raft heartbeats, no Byzantine fault models.
- Cache coherence protocols. No MESI, no false sharing, no NUMA effects.
- Disk I/O physics. Storage is an abstract container with capacity. No seek time, no fsync cost.
- Real DNS, TLS handshakes, or HTTP/2 streams. Channels are abstract pipes.
- Memory pressure on the host running the simulation. It is a topology simulator, not a hardware emulator.
If your question requires any of the above, Chinilla is the wrong tool. Use a domain-specific simulator (ns-3 for networks, JMH for JVM benchmarks, real load testers like k6 or Gatling for production services).
Algorithm
The runtime executes the following loop (a condensed code sketch follows the list):
- Identify entry points. Components with no inbound forward connections are seed sites.
- Inject packets. Each entry point produces packets at its configured rate.
- Process one topological layer per step. All components at the current depth run in the same step.
- Apply component behavior. Each component runs its configured behavior. The 8 modes are:
  - passthrough: forward the packet (with universal capacity / serviceRate for queueing and throttling)
  - filter: drop with configured probability
  - split: route to ONE downstream per packet (round-robin or weighted via the connection’s weight field)
  - delay: forward after N steps
  - retry: re-inject on failure up to limit
  - circuitbreaker: open on failure threshold, close after cooldown
  - batch: accumulate N packets, forward as one
  - replicate: copy packet to every downstream (use for side-effects only — log, audit, metric, notification)
- Route outputs. Forward each output packet to its downstream connections.
- Handle backpressure. If a downstream queue is at 80%+ capacity, the delivery is delayed by one cycle.
- Handle overflow. When a queue exceeds capacity, the oldest packet is dropped (FIFO) to preserve ordering.
- Repeat. Until all packets are consumed or a step limit is reached.
- Time compression. If estimated total duration exceeds 1 hour, all timing values are scaled proportionally to fit within the cap. The shape of the simulation is preserved; the wall-clock playback duration is bounded.
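As the condensed sketch promised above, here is a toy, runnable reduction of the loop: a straight pipeline of queue-like stages with a capacity and a service rate. It shows the inject / process / backpressure / overflow mechanics in miniature; the real engine additionally handles all eight behavior modes, topological layering, routing, retries, and time compression. Names and structure are illustrative, not the engine source.

```ts
// Toy reduction of the loop above: a straight pipeline of queue-like stages.
interface Stage {
  id: string;
  capacity: number;     // max queued packets
  serviceRate: number;  // packets forwarded per step
  queue: number[];      // packet ids, oldest first
  dropped: number;
}

function enqueue(stage: Stage, packet: number): void {
  stage.queue.push(packet);
  if (stage.queue.length > stage.capacity) {
    stage.queue.shift();  // overflow: drop the oldest packet (drop-from-head)
    stage.dropped++;
  }
}

function step(stages: Stage[], injected: number[]): void {
  // 1. Inject new packets at the entry point.
  for (const id of injected) enqueue(stages[0], id);

  // 2. Process stages from the tail upward so a packet advances one hop per step.
  for (let i = stages.length - 1; i >= 0; i--) {
    const stage = stages[i];
    const next: Stage | undefined = stages[i + 1];
    for (let n = 0; n < stage.serviceRate && stage.queue.length > 0; n++) {
      // 3. Backpressure: hold delivery for a cycle if downstream is >= 80% full.
      if (next && next.queue.length >= 0.8 * next.capacity) break;
      const packet = stage.queue.shift()!;
      if (next) enqueue(next, packet);  // the last stage consumes its packets
    }
  }
}
```

Running step repeatedly with an injection rate above a stage’s serviceRate fills that stage’s queue, trips the 80% threshold, and eventually drops the oldest packets; that is the qualitative behavior the real engine surfaces in its metrics.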
Determinism
The engine uses a configurable PRNG seed for any randomized behavior, including filter drop decisions, retry jitter, weighted-split routing, and circuit-breaker probe timing. The default seed is 42; you can change it via the N stepper in the timeline UI. Monte Carlo runs vary the seed across N runs (the seed list is preserved in the repro bundle export so anyone with the JSON gets bit-identical reruns).
The PRNG is a 31-bit linear congruential generator. Modulo operations on the output use the high 16 bits to avoid the low-bit non-uniformity of LCG outputs. This matters for weighted splits at low percentages (e.g. a 20% weight reliably fires 20% of the time, not 0% or 30% as it did before the 2026-05 fix).
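A sketch of what that looks like. The engine’s exact multiplier and increment constants are not documented here; the classic ANSI C constants below are stand-ins, used only to show the high-bits technique.

```ts
// Hedged sketch of a 31-bit LCG that draws from the high bits, not the low bits.
// The multiplier/increment constants are assumptions (classic ANSI C values).
class Lcg31 {
  constructor(private state: number) {}

  private next(): number {
    // 31-bit LCG step; Math.imul keeps the multiply in 32-bit integer range.
    this.state = (Math.imul(this.state, 1103515245) + 12345) & 0x7fffffff;
    return this.state;
  }

  // Uniform integer in [0, n) taken from the high 16 bits, because the low
  // bits of an LCG cycle with short periods and skew modulo results.
  nextInt(n: number): number {
    const high16 = this.next() >>> 15;  // top 16 bits of the 31-bit state
    return high16 % n;
  }

  // Uniform float in [0, 1).
  nextFloat(): number {
    return (this.next() >>> 15) / 0x10000;
  }
}

// Example: a 20% filter drop roll.
// const dropped = new Lcg31(42).nextFloat() < 0.20;
```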
Per-component PRNG isolation
Each component has its own PRNG state, seeded from global_seed XOR fnv1a(component_id) (a sketch of the derivation follows the list below). Edits to one component cannot perturb the random rolls of an unrelated component. Before the 2026-05 isolation fix the engine used a single shared PRNG, so changing L1’s processing time from 8 minutes to 45 minutes shifted the order in which other components consumed rolls, and an unrelated branch (L3) saw a different circuit-breaker outcome on the same seed. Now an L1 edit only changes L1’s rolls; L3, L2, and the Triage Bot are bit-stable.
This applies to the four sources of randomness inside the engine:
- filter drop rolls
- retry / circuit-breaker failure rolls
- weighted-split routing rolls
- processing-time jitter (when Monte Carlo jitter is enabled)
For latency jitter on a connection, the destination component’s PRNG advances (the jitter is part of how the receiver experiences arrival timing).
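A sketch of the seed derivation, using the standard 32-bit FNV-1a constants; masking the result to 31 bits so it fits the LCG state shown earlier is an assumption.

```ts
// Per-component seed derivation: global_seed XOR fnv1a(component_id).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;                       // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;  // FNV-1a 32-bit prime
  }
  return hash >>> 0;
}

function componentSeed(globalSeed: number, componentId: string): number {
  // Mask to 31 bits (assumption) so the seed fits the 31-bit LCG state.
  return (globalSeed ^ fnv1a(componentId)) & 0x7fffffff;
}

// Each component gets its own generator, so editing one component cannot
// shift the rolls consumed by another.
// const rng = new Lcg31(componentSeed(42, "L1"));
```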
Drain extension
The simulation runs until longest_path * seed_count * 4 virtual milliseconds (capped at 1 hour real). If the budget is reached while packets are still in flight (queues, processing slots, retry queues, batch buffers, in-transit on connections), the engine halts new seeding and extends the budget up to 5 times so the pipeline can drain. Without this, a slow component near the end of the budget would strand a packet just past the cliff and the UI would count it as lost. The extension count is bounded so a cyclic or runaway design cannot keep the drain phase alive forever; genuinely unbounded loops are caught separately by an iteration cap.
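A sketch of the budget rule, with illustrative names. Whether longest_path is measured in steps or in virtual milliseconds is not spelled out here, so the sketch simply treats it as a duration.

```ts
// Sketch of the drain-extension rule described above; names are illustrative.
const MAX_DRAIN_EXTENSIONS = 5;

function baseBudgetMs(longestPathMs: number, seedCount: number): number {
  // Virtual-time budget before any extension (real playback is capped at 1 hour).
  return longestPathMs * seedCount * 4;
}

function shouldExtend(packetsInFlight: number, extensionsUsed: number): boolean {
  // Budget exhausted: stop seeding new packets, but keep stepping so in-flight
  // packets (queues, processing slots, retries, batches, in transit) can drain.
  return packetsInFlight > 0 && extensionsUsed < MAX_DRAIN_EXTENSIONS;
}
```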
Taken together, determinism and per-component isolation mean:
- Reproducibility. Same design + same seed + same parameters = same output, every run, every machine.
- Cross-branch independence. A parameter change on one component cannot leak a packet loss into an unrelated branch. The before/after diff reflects topology and behavior changes, not PRNG side effects.
- Debuggability. You can land on a specific failure mode and inspect it deterministically.
- Meaningful comparisons. Before-and-after diffs reflect your design changes, not PRNG luck.
- Sharable artifacts. Export → Repro bundle ships a JSON with the canvas, runtime parameters, MC seed list, and warm-up frames. A collaborator imports it and gets the same numbers without trading screenshots.
For variance studies, the Pro tier runs Monte Carlo by varying the seed across N runs and aggregating the distribution. This is the basis of the Collapse Index stability score and the parameter-sweep heatmaps.
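As a generic illustration of that aggregation (not the Collapse Index formula, which is not specified on this page), a run-per-seed loop looks roughly like the following; simulateThroughput is a hypothetical stand-in for a single seeded run.

```ts
// Generic sketch of varying the seed across N runs and aggregating one metric.
function monteCarlo(runOnce: (seed: number) => number, seeds: number[]) {
  const outputs = seeds.map(runOnce);   // e.g. delivered-packet count per seed
  const mean = outputs.reduce((a, b) => a + b, 0) / outputs.length;
  const variance =
    outputs.reduce((a, b) => a + (b - mean) ** 2, 0) / outputs.length;
  return { outputs, mean, variance };
}

// Example: 20 seeded runs starting from the default seed 42.
// const seeds = Array.from({ length: 20 }, (_, i) => 42 + i);
// const { mean, variance } = monteCarlo(simulateThroughput, seeds);
```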
Backpressure model
Backpressure activates when a downstream queue reaches 80% of capacity. At that point, upstream delivery is delayed by one processing cycle instead of arriving immediately. This propagates: if the next-upstream component has nowhere to drain, its own queue starts to fill, and the delay extends further.
This is a simplified, observable model of backpressure. It does not match the exact semantics of any specific real system (reactive streams, gRPC flow control, TCP zero-window). It exists to demonstrate the qualitative behavior: what happens when one stage runs slower than its upstream produces.
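A worked example of the threshold, with made-up numbers, showing when backpressure trips:

```ts
// Made-up numbers: when does the 80% threshold trip?
const capacity = 100;        // downstream queue size
const arrivalsPerStep = 50;  // upstream production
const serviceRate = 30;      // downstream drain rate
const netFillPerStep = arrivalsPerStep - serviceRate;                 // +20/step
const stepsUntilBackpressure = Math.ceil(0.8 * capacity / netFillPerStep);
console.log(stepsUntilBackpressure);  // 4: upstream delivery is delayed from step 4 on
```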
Drop semantics
When a queue exceeds capacity, the oldest packet is dropped (FIFO drop-from-head). This preserves processing order for the packets that remain and matches the behavior of most production queue systems under overflow.
If you want drop-newest semantics (e.g., Kafka with max.in.flight limits causing producer-side backpressure rather than queue overflow), the model does not represent that. Use a queue + rate limiter combination to approximate producer-side throttling.
What you should believe
- The engine is accurate for the dimensions listed in What gets modeled.
- The numbers you reason with are the numbers you put in. If your component declares 10k req/s of capacity, the engine will treat it as 10k req/s of capacity. There is no hidden realism layer that says “but actually the JVM tax is 30%.”
- This is a topology and behavior simulator, not a wire-level network simulator. Treat it accordingly.
- For learning, interview prep, design review, and topology validation, this level of fidelity is the right tradeoff. For production capacity planning, use real load tests against real services.
Further reading
- Stress Testing - operator-level controls and scenario presets
- Behaviors - the full behavior catalog
- Stability Analysis - Collapse Index and instability verdicts
- Monte Carlo - distribution across N seeded runs
- Parameter Sweep - 1D / 2D heatmaps over behavior knobs
- Flow Playback - timeline scrubbing and time compression