# Cereby Mini: Why We Moved to a Two-Intent Model
From many named intents to one routing decision: mutate the page through reviewable edits, or answer in the thread without touching the doc.
## The problem with naming everything
Cereby Mini sits on a document. Users ask for grammar fixes, rewrites, tone tweaks, explanations, and odd one-off requests that read nothing like our internal feature names. Our early routing tried to name every kind of ask: write, grammar, replace, extract, and more. That made demos and tests easy, but live language kept slipping between the labels.
When someone typed "make this punchier" or "friendlier" or "merge these sections," the classifier had to pick the nearest enum. Sometimes it picked right. Sometimes it picked an adjacent intent with high confidence and routed to the wrong tool entirely. The user got the right energy back, but the wrong surface. That is the failure mode that is hardest to explain: not a blank response, but a response that clearly misread what the user wanted for this page.
The issue was not vague users. It was routing that optimized for tidy labels instead of outcomes on the canvas.
## What the old flow looked like
Grammar versus rewrite versus replace are genuinely close in meaning; the classifier had to pick one with high confidence even when the answer was ambiguous. Each new behavior added pressure to add a top-level label, and the catalog kept growing. Intent misfires felt personal to users: the system had clearly understood something, just not what they wanted for this page.
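To make the failure mode concrete, here is a deliberately simplified sketch of a fine-grained router. The labels come from the post; the cue sets and keyword scoring are illustrative stand-ins (the real classifier was presumably model-based), but the structural flaw is the same: the router must always return *some* label, so an off-catalog ask gets forced into the nearest enum.

```python
# Illustrative reconstruction of the old fine-grained routing shape.
# Labels (write, grammar, replace, extract) are from the post; the
# keyword scoring is a hypothetical stand-in for a learned classifier.
INTENTS = {
    "write":   {"write", "draft", "compose"},
    "grammar": {"grammar", "typo", "fix"},
    "replace": {"replace", "swap", "change"},
    "extract": {"extract", "pull", "list"},
}

def classify(request: str) -> str:
    """Pick the nearest enum label, even when none genuinely fits."""
    words = set(request.lower().split())
    scores = {label: len(words & cues) for label, cues in INTENTS.items()}
    # max() always returns *some* label. An ask like "make this punchier"
    # scores zero everywhere and still gets forced into a bucket.
    return max(scores, key=scores.get)
```

The structural problem is visible in the return type: there is no way to say "none of these," so novelty is always misrouted with apparent confidence.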
## The shape we landed on
We collapsed the first routing step to two outcomes. Edit the document, or respond in text. Everything specific still happens after that fork.
The key insight was to align the top level with how users actually think. The question "should this change what is on the page, or is a reply in chat enough?" is one that users have already answered before they type. The routing layer should surface that decision, not override it with a product taxonomy.
Everything else (the right tools, templates, grammar pipelines, safety checks) belongs below the fork, where we have more context (selection state, full document, attachments) and are no longer fighting a wrong umbrella label.
There is no third bucket at this layer. If a request is ambiguous, we route conservatively and let the review UX be the resolution mechanism: the user sees a proposed edit and decides whether it lands on the page.
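The fork described above can be sketched as follows. The two outcomes and the conservative handling of ambiguity come from the post; the function names, cue lists, and heuristics are hypothetical, and the real first-pass decision would be made by a model rather than keywords.

```python
# Hedged sketch of the two-intent fork. "edit" vs "chat" and the
# review-as-resolution idea are from the post; everything else here
# (names, cues, heuristics) is illustrative.
from dataclasses import dataclass

@dataclass
class RouteDecision:
    intent: str         # "edit" (change the page) or "chat" (reply in thread)
    needs_review: bool  # ambiguous asks surface as reviewable proposed edits

EDIT_CUES = ("fix", "rewrite", "make", "merge", "replace", "shorten")

def route(request: str) -> RouteDecision:
    """First-pass routing: should this change the doc, or is chat enough?"""
    text = request.lower().strip()
    if text.endswith("?") or text.startswith(("what", "why", "how", "explain")):
        return RouteDecision(intent="chat", needs_review=False)
    if any(cue in text for cue in EDIT_CUES):
        return RouteDecision(intent="edit", needs_review=False)
    # Ambiguous: route conservatively into the edit branch. There is no
    # third bucket; the accept/reject preview is the resolution mechanism.
    return RouteDecision(intent="edit", needs_review=True)
```

Note there is no `unknown` intent in the return type: ambiguity is carried as a flag on a reviewable edit, not as a separate routing outcome.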
## What changed and what did not
| Area | Fine-grained intents | Two-intent fork |
|---|---|---|
| Unlabeled asks | Forced into wrong bucket or weak fallback | Edit if it should show on the page, else chat |
| Near-neighbor bugs | Grammar vs rewrite vs replace collisions | Single axis: page vs thread |
| New behaviors | Pressure to add taxonomy entries | Second-pass and tools absorb novelty |
| Mental model | "Which internal label?" | "Does this change the doc?" |
The maintenance load shifted from catalog growth to improving post-fork logic, where we have more context and the interesting work actually lives.
## What this taught us
Outcomes beat enums at the top. Users think in terms of what should change, not our intent spreadsheet. A routing system that matches that mental model fails less often and fails more gracefully when it does.
Preview is part of routing policy. When the first step is broader, the review UX (accept or reject) is how you maintain user trust without needing perfect classification accuracy. The user resolves ambiguity; the classifier does not have to.
Next, we are sharpening second-pass routing inside the edit branch, using selection state and document context to choose between templates, grammar-only passes, and full rewrites. We are also running evaluations on real transcripts to tune the edit-versus-chat decision where it is genuinely close.
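The second pass inside the edit branch might look something like this. The signals (selection state, document context) and the candidate behaviors (templates, grammar-only passes, full rewrites) are from the post; the specific rules, thresholds, and names are illustrative assumptions.

```python
# Hypothetical sketch of second-pass routing inside the edit branch.
# Signals and candidate behaviors come from the post; the rules,
# threshold, and return labels are illustrative.
def second_pass(request: str, has_selection: bool, doc_length: int) -> str:
    """Choose a behavior below the fork, where we have full context."""
    text = request.lower()
    if "grammar" in text or "typo" in text:
        return "grammar_pass"       # scoped, low-risk mechanical fix
    if has_selection:
        return "selection_rewrite"  # rewrite only the highlighted span
    if doc_length > 2000:
        return "template_rewrite"   # long docs get structured templates
    return "full_rewrite"
```

Because this step runs after the fork, a wrong choice here still lands as a reviewable edit rather than a misrouted reply, which is what makes the conservative first pass safe.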
