Chat Assistant

The chat assistant (open it with the Chinilla AI button in the top-right) lets you talk to your design instead of clicking through everything. The AI panel has two tabs: Chat and GitHub. (Earlier versions had separate Spec and Map tabs. Both have moved: Spec is now an export format (Export → System Spec), and Map’s code-to-canvas workflow is part of chat. Paste code into a chat message and ask “turn this into a diagram”.)

Action | Example
Build | “Design a cafe with a register, kitchen, and order queue.”
Ask | “What’s the weakest part of this design?”
Change | “Add a cache between the API and the database.”
Analyze | “Which blocks would take the whole thing down if they failed?”
Explain | “Why is the queue backing up?”
Zoom in | “Zoom into the API gateway.”
Rename | “Translate all the labels to Korean.” / “Rename in DevOps terms.”

The AI can rewrite every label on the canvas while leaving topology, behaviors, capacities, and wiring exactly as they were. Useful for:

  • Translation. “Translate all the labels to Korean” / “to Japanese” / “to Spanish”. The runtime stats follow because the simulator keys off component IDs, not labels — your bottleneck readout, sim metrics, and Monte Carlo results stay identical.
  • Domain swaps. “Rename these in hospital triage terms” / “in DevOps terms” / “as if this were a CI/CD pipeline”. Same architecture, different vocabulary for a different audience.
  • Style passes. “Make all labels lowercase” / “use formal business terms” / “use abbreviations only”.

This is essentially free localization: draft the design once in English, then ship it in any language or domain register without redrawing.
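To see why relabeling cannot change simulation results, here is a minimal sketch. It assumes a canvas stored as a dictionary keyed by block ID. The `Block` class, the `relabel` helper, and the capacity figures are hypothetical and only illustrate the idea; they are not Chinilla's actual data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    block_id: str   # stable identifier; anything numeric keys off this
    label: str      # display text only
    capacity: int   # illustrative runtime setting


def relabel(blocks: dict[str, Block], new_labels: dict[str, str]) -> dict[str, Block]:
    """Return a copy of the canvas with new labels; IDs and settings are untouched."""
    return {
        block_id: replace(block, label=new_labels.get(block_id, block.label))
        for block_id, block in blocks.items()
    }


def total_capacity(blocks: dict[str, Block]) -> int:
    """A stand-in for any metric that reads settings by ID and never looks at labels."""
    return sum(block.capacity for block in blocks.values())


canvas = {
    "b1": Block("b1", "Register", 10),
    "b2": Block("b2", "Kitchen", 4),
}
korean = relabel(canvas, {"b1": "계산대", "b2": "주방"})
```

Because the metric never reads `label`, `total_capacity(canvas)` and `total_capacity(korean)` return the same value.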

Type your message in the chat box and hit Send. The AI uses your canvas plus the chat so far to write a response.

Tell the AI which level of abstraction you want

The single biggest cause of “the AI didn’t draw what I had in mind” is leaving the level of abstraction implicit. Most domains can be modeled at very different scales, and a vague prompt produces a confused middle-ground design. Be explicit:

Vague prompt | High-level prompt | Low-level prompt
“Design a SpaceX rocket” | “Design SpaceX launch control: authorization, telemetry, abort, recovery.” | “Design a rocket engine: turbopump, combustion chamber, nozzle, sensors.”
“Design a coffee shop” | “Design the floor plan: doors, seating area, counter, drive-thru lane.” | “Design the operations: order placed, drink prep, pickup, payment, restock.”
“Design a hospital” | “Design the building flow: ER, ICU, OR, pharmacy, admin.” | “Design the triage process: intake, ESI scoring, room assign, diagnostics.”
“Design Netflix” | “Design the user journey: signup, browse, watch, recommendations.” | “Design the backend: API gateway, encoder, CDN, recommendation service, billing.”

Both levels are valid. The high-level version answers “what’s connected to what”. The low-level version answers “where does it break under load”. Mixing levels in one canvas usually produces a design that doesn’t simulate honestly at either scale.

Quick mental test before you prompt: what question do you want the simulation to answer? If it’s “do the rooms / paths / lanes flow correctly”, that’s high-level. If it’s “what’s the bottleneck under 10x traffic”, that’s low-level.

The real power isn’t one-shot generation. It’s the loop. Sketch something rough, run it, see what breaks, ask for a fix. Each round teaches the AI more about your system and what you care about.

A typical loop:

  1. Draft — “Design a checkout with cart, payment, and inventory.”
  2. Run — hit play, watch where items pile up or get lost.
  3. Ask — “Why is the payment service dropping items?”
  4. Fix — “Add a retry on the payment service with 3 attempts.”
  5. Stress test — “What happens if traffic doubles?”
  6. Analyze — “What’s the most traffic this can handle before the database breaks?”
  7. Change — “Add a cache between the API and the database.”
  8. Repeat.

Every follow-up builds on the chat so far. The AI remembers what you decided and can explain tradeoffs: “Adding a cache cuts read time from 200ms to 5ms but you’ll see stale data. Want me to add a time-to-live?”
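The cache tradeoff mentioned above (faster reads, possibly stale data, bounded by a time-to-live) can be sketched as follows. This is an illustrative read-through cache, not what Chinilla generates; the `TTLCache` class and its `load` callback are assumptions made for the example.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Read-through cache: serve a stored value until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the behavior can be tested
        self._store: dict[Any, tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: Any, load: Callable[[Any], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]          # fast path: may be stale by up to ttl_seconds
        value = load(key)            # slow path: go to the database
        self._store[key] = (value, now + self._ttl)
        return value
```

A shorter time-to-live means fresher data and more database reads; a longer one means the opposite. That is the knob the AI is offering to add.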

The more context you give, the better the AI’s answers:

  • Constraints — “Budget is $200/month max” or “The team only knows Python.”
  • Priorities — “Speed matters more than cost” or “We need 99.9% uptime.”
  • Real-world references — “This should work like Stripe’s webhook retries.”
  • Challenge it — “Why did you pick Kafka instead of a simple queue?”

The AI can reason about tradeoffs. Try:

  • “What’s the tradeoff between polling and WebSockets for these notifications?”
  • “One database or split reads and writes?”
  • “Cache at the API or at the database?”
  • “What if we remove the queue? Is it worth the simpler design?”
  • “Is this overkill for 100 users a day?”

Design:

  • “Design a food delivery backend.”
  • “Add login and signup to this.”
  • “How would Netflix handle this differently?”

Debugging:

  • “Why is the queue backing up?”
  • “Which block is the bottleneck?”
  • “What happens if this service goes down?”

Numbers and scaling:

  • “What rate do I need to handle 10,000 users a day?”
  • “How many copies of the worker should I run?”
  • “Set realistic time numbers on each block.”
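For the rate question, the underlying arithmetic is simple enough to check by hand. The sketch below assumes traffic is spread evenly over 24 hours unless you pass a peak factor. `requests_per_user` and `peak_factor` are hypothetical parameters you would need to estimate for your own system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_rate(users_per_day: float,
                  requests_per_user: float = 1.0,
                  peak_factor: float = 1.0) -> float:
    """Requests per second needed to serve the given daily load."""
    return users_per_day * requests_per_user * peak_factor / SECONDS_PER_DAY


# 10,000 users a day, one request each, evenly spread: about 0.12 requests/second.
average = required_rate(10_000)

# Same users, 20 requests each, with peaks at 5x the average: about 11.6 requests/second.
peak = required_rate(10_000, requests_per_user=20, peak_factor=5)
```

The gap between the two numbers is why it helps to tell the AI how many requests a typical user makes and how spiky your traffic is.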

Code:

  • “What libraries would I need to build this?”
  • “Write the API routes for the gateway block.”

Files give the AI deeper context:

  • Requirements docs (.md, .txt) — product specs, PRDs, user stories.
  • Existing code (.py, .js, .ts) — current implementation to learn from or improve.
  • API schemas (.json) — OpenAPI specs, database schemas, config files.
  • Meeting notes (.md, .txt) — stakeholder feedback, design review notes.
  • Architecture docs (.md) — existing system docs to extend or migrate.

Every AI change goes through a review step. The chat shows you what’s about to change. Click Apply to accept or Reject to dismiss. This applies to first drafts, layout changes, and small tweaks alike.

Select a block first, then ask. The AI focuses on just that block — setting its behavior, its numbers, its requirements.

Ask the AI to set behaviors:

“Make the queue filter out items with priority < 3.”

The AI picks the right mode and fills in the settings. All 8 modes are supported: passthrough, filter, split, delay, retry, circuit breaker, batch, replicate.
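As a rough illustration of what the filter request above means, the sketch below lists the eight modes and applies a priority filter to a few items. The `Mode` enum, the settings dictionary, and the item shape are assumptions made for the example, not Chinilla's internal format.

```python
from enum import Enum


class Mode(Enum):
    PASSTHROUGH = "passthrough"
    FILTER = "filter"
    SPLIT = "split"
    DELAY = "delay"
    RETRY = "retry"
    CIRCUIT_BREAKER = "circuit breaker"
    BATCH = "batch"
    REPLICATE = "replicate"


def apply_filter(items: list[dict], min_priority: int = 3) -> list[dict]:
    """Filter out items with priority < min_priority; keep everything else."""
    return [item for item in items if item["priority"] >= min_priority]


# Hypothetical settings the AI might fill in for the queue block.
behavior = {"mode": Mode.FILTER, "min_priority": 3}

queue = [
    {"id": 1, "priority": 1},
    {"id": 2, "priority": 3},
    {"id": 3, "priority": 5},
]
kept = apply_filter(queue, behavior["min_priority"])  # items 2 and 3 remain
```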

Attach files (.py, .txt, .md, .js, .ts, .json, etc.) for context. Good for pulling in existing specs, requirements, or code snippets to base your design on.

  • Up to 5 files per message.
  • 100KB per file max.
  • 100K characters total (message + files).
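If you prepare attachments with a script, a pre-flight check along these lines can catch oversized messages before you send them. It is a sketch under two assumptions: that "100KB" means 100 × 1024 bytes of UTF-8 text, and that the character total counts the message plus the file contents. The function name and return shape are hypothetical.

```python
MAX_FILES = 5
MAX_FILE_BYTES = 100 * 1024   # assumption: 1 KB = 1024 bytes
MAX_TOTAL_CHARS = 100_000     # message + files


def check_message(message: str, files: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the message fits the limits."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, text in files.items():
        size = len(text.encode("utf-8"))
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is too large: {size} bytes")
    total_chars = len(message) + sum(len(text) for text in files.values())
    if total_chars > MAX_TOTAL_CHARS:
        problems.append(f"total too long: {total_chars} characters")
    return problems
```

Note that two files can each pass the per-file limit and still fail together on the character total.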

After each response, the AI shows clickable suggestions to push your design forward: draft, run, stress test, analyze, change.

The last 6 messages stay in full. Older messages get summarized down to 200 characters to save tokens while keeping the thread of the conversation.
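The shape of that history window can be sketched like this. Plain truncation stands in for the real summarization step, which is not described here, so treat `summarize` as a placeholder and the function names as hypothetical.

```python
KEEP_FULL = 6        # most recent messages kept verbatim
SUMMARY_CHARS = 200  # size each older message is reduced to


def summarize(text: str, limit: int = SUMMARY_CHARS) -> str:
    """Placeholder for summarization: cut the text to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]


def compact_history(messages: list[str]) -> list[str]:
    """Keep the last KEEP_FULL messages intact and shrink everything older."""
    older = messages[:-KEEP_FULL]
    recent = messages[-KEEP_FULL:]
    return [summarize(message) for message in older] + recent
```

In practice this means details from early in a long conversation may be lost, so restate any constraint that still matters.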

  • 100K characters per message (including attachments).
  • 300 AI credits per month (Pro). Each generated design costs 5 credits, each chat iteration 1 credit.
  • Chat is a Pro feature. Free tier gets 1 AI-generated design per month from the launcher.
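To plan a month of Pro usage, the credit arithmetic can be worked out directly from the numbers above. The helper names are hypothetical; the costs and the monthly allowance come from the list.

```python
MONTHLY_CREDITS = 300
DESIGN_COST = 5   # credits per generated design
CHAT_COST = 1     # credits per chat iteration


def credits_left(designs: int, chat_iterations: int) -> int:
    """Credits remaining after the given usage (negative means over budget)."""
    return MONTHLY_CREDITS - (designs * DESIGN_COST + chat_iterations * CHAT_COST)


def max_chat_iterations(designs: int) -> int:
    """Chat iterations still affordable after generating `designs` designs."""
    return max(0, (MONTHLY_CREDITS - designs * DESIGN_COST) // CHAT_COST)


remaining = credits_left(designs=10, chat_iterations=100)  # 300 - 50 - 100 = 150
chats_after_ten_designs = max_chat_iterations(10)          # 250
```

At the extremes, 300 credits cover 60 generated designs with no chat, or 300 chat iterations with no new designs.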

Earlier versions of Chinilla had separate Spec and Map tabs in the AI panel. Both have moved:

  • Spec is now an export format. Open the Export modal and pick System Spec — a deterministic architecture document built directly from your canvas (no AI tokens, no hallucination risk). See Export & Share for the full breakdown.
  • Map (code-to-canvas) folds into chat. Paste source code or attach a file in any chat message and ask “turn this into a diagram”. The AI reads the code structure (classes, functions, imports) and produces blocks and lines.
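To give a feel for what "reading the code structure" can mean, the sketch below uses Python's standard `ast` module to list the top-level classes, functions, and imports in a source file. It is an illustration only: how the AI actually maps code to blocks and lines is not specified here, and the `outline` function and its output shape are assumptions.

```python
import ast


def outline(source: str) -> dict[str, list[str]]:
    """List top-level classes/functions (candidate blocks) and imports (candidate lines)."""
    tree = ast.parse(source)
    blocks: list[str] = []
    imports: list[str] = []
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            blocks.append(node.name)
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or ".")  # "." marks a bare relative import
    return {"blocks": blocks, "imports": imports}


sample = '''
import json
from queue import Queue

class Gateway:
    pass

def handle(request):
    return json.dumps(request)
'''
result = outline(sample)
```

For the sample above, `result` is `{"blocks": ["Gateway", "handle"], "imports": ["json", "queue"]}`.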