Introducing Cereby AI: Your Personal Learning Assistant
How a context-first design turned a generic chat interface into something that remembers who you are.
The problem with most learning chat
Picture a student who bombed three straight quizzes on cellular respiration, has a midterm in four days, and just typed "can you quiz me on this?" The stateless model answers like it always does: it has no idea about the failed quizzes, no idea about the midterm, and no memory of anything said two messages ago. The answer it produces is not wrong. It is just generic, and generic is not what anyone needed.
That is the problem Cereby AI is designed to fix. The fix is not a better base model or a longer system prompt that says "you are a helpful tutor." It is making sure the server sends a deliberate bundle of learner state on every single request, so the model's answer can be shaped by what the student has actually been doing.
What we built instead
The key shift is treating learning data as a first-class input rather than an afterthought appended to the bottom of a prompt. Before the model sees the latest message, it already knows who is asking, what topics they have been failing, what artifacts exist in their notebooks, and what deadline pressure looks like on the calendar. Personalization comes from the data shape, not from the system prompt's wording.
Context assembly
We aggregate signals the product already owns: quiz outcomes, study paths, saved notes, calendar events, and the recent conversation thread. No special tutor persona needed.
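As a rough sketch of what that aggregation might look like, the bundle below pulls from the stores the product already owns. All names here (`LearnerContext`, `assemble_context`, the store shapes, the 0.6 failing threshold) are hypothetical, not Cereby's actual schema:

```python
from dataclasses import dataclass

@dataclass
class LearnerContext:
    """Illustrative shape of the learner-state bundle sent with every request."""
    user_id: str
    weak_topics: list       # topics with recent failing quiz scores
    upcoming_deadlines: list  # calendar events inside the planning window
    recent_artifacts: list  # notes/quizzes produced in earlier turns
    thread_tail: list       # the last few conversation messages

def assemble_context(user_id, quiz_store, calendar, notebooks, thread):
    """Aggregate signals the product already owns into one bundle."""
    # Hypothetical rule: a score under 0.6 counts as a weak topic.
    failing = [q["topic"] for q in quiz_store.get(user_id, []) if q["score"] < 0.6]
    return LearnerContext(
        user_id=user_id,
        weak_topics=sorted(set(failing)),
        upcoming_deadlines=calendar.get(user_id, []),
        recent_artifacts=notebooks.get(user_id, []),
        thread_tail=thread[-6:],  # keep only the recent turns
    )
```

The point is the data shape: by the time the model sees the latest message, the weak topics and deadlines are already in the bundle, with no tutor persona required.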
Intent routing
Not every message should produce a note or a quiz. Routing intent to the right tool keeps latency reasonable and avoids burning tokens on artifacts nobody asked for. It is a cost control, not just a product nicety.
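A minimal router can be sketched as a fall-through classifier with a plain-chat default. This keyword version is purely illustrative (a production router would likely use a trained classifier, and the tool names here are invented):

```python
def route_intent(message: str) -> str:
    """Map a message to a tool, or fall back to plain chat.

    Defaulting to "chat" is the cost control: no artifact is generated
    unless something in the message actually asked for one.
    """
    text = message.lower()
    if "quiz" in text or "test me" in text:
        return "generate_quiz"
    if "note" in text or "summarize" in text:
        return "generate_note"
    if "plan" in text or "schedule" in text:
        return "generate_path"
    return "chat"  # no tool call, no wasted latency or tokens
```

The design choice that matters is the default branch: when in doubt, answer in chat rather than manufacture an artifact.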
Artifacts and persistence
Outputs are not disposable chat bubbles. Notes land in notebooks. Quizzes attach to study materials. Study paths sync with scheduling surfaces. This matters because the next turn's context includes what shipped last time. The loop closes: the assistant that helped you yesterday knows what it said.
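The closed loop can be sketched as a store that is written at the end of one turn and read during context assembly for the next. The `ArtifactStore` class and its methods are hypothetical, assumed only for illustration:

```python
class ArtifactStore:
    """Persist outputs so the next turn's context includes what shipped last time."""

    def __init__(self):
        self._by_user = {}

    def save(self, user_id, kind, payload):
        """Called when a note, quiz, or path ships at the end of a turn."""
        self._by_user.setdefault(user_id, []).append({"kind": kind, "payload": payload})

    def recent(self, user_id, limit=5):
        """Called during context assembly on the following request."""
        return self._by_user.get(user_id, [])[-limit:]
```

Because `recent()` feeds straight back into the context bundle, the assistant that helped you yesterday can refer to the exact quiz it produced.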
Adaptive depth
Length and difficulty are shaped by observed performance history and explicit follow-ups ("simpler," "more examples"). The same underlying path can expand or contract without forking the stack.
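One way to express that shaping is a small policy where explicit follow-ups override observed history. The function, thresholds, and depth labels below are assumptions for the sketch, not the production policy:

```python
def choose_depth(avg_recent_score, follow_up=None):
    """Pick response depth from performance history plus explicit follow-ups."""
    # Explicit requests always win over inferred history.
    if follow_up == "simpler":
        return "basic"
    if follow_up == "more examples":
        return "expanded"
    # Otherwise shape depth by observed quiz performance (illustrative cutoffs).
    if avg_recent_score < 0.5:
        return "basic"      # struggling: shorter steps, easier questions
    if avg_recent_score < 0.8:
        return "standard"
    return "advanced"       # strong history: denser, harder material
```

Because the decision is a parameter rather than a branch in the content pipeline, the same underlying path expands or contracts without forking the stack.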
The hard parts
A few things turned out to be harder in practice than they looked in the design.
Sparse early data is the trickiest one. New users have thin quiz history, which means personalization is weak right when first impressions matter most. The system has to produce useful artifacts before it has anything to personalize from, and the cold-start behavior has to be explicitly designed rather than left to drift.
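One explicit cold-start shape, sketched under the assumption that enrollment data exists even when quiz history does not (all names here are hypothetical):

```python
def cold_start_context(user_id, enrolled_topics):
    """Fallback bundle for users with thin history: seed from enrollment, not quizzes."""
    return {
        "user_id": user_id,
        "weak_topics": [],                    # nothing observed yet
        "suggested_topics": enrolled_topics,  # lean on coarse signals instead
        "mode": "cold_start",                 # downstream prompts know to stay general
    }
```

Tagging the bundle with an explicit mode is the designed-not-drifting part: downstream prompting can choose sensible defaults instead of pretending it has personalization data it does not have.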
Reference resolution is where bugs feel the worst. Follow-ups like "quiz on this" depend on stable thread IDs and trim-safe transcript handling under token limits. When that breaks, it feels like the assistant just forgot who you are. The technical cause is a clipped transcript or a missing thread reference; the user experience is something much worse.
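Trim-safe handling can be sketched as a budgeted walk that drops the oldest messages first but never drops a pinned reference (such as the message that established what "this" refers to). The pinning convention and word-count tokenizer here are simplifying assumptions:

```python
def trim_transcript(messages, max_tokens,
                    count_tokens=lambda m: len(m["text"].split())):
    """Trim oldest messages first, but never drop a pinned thread reference.

    `messages` is ordered oldest-first; a message with pinned=True survives
    any trim, so "quiz on this" still resolves after the transcript is clipped.
    """
    pinned = [m for m in messages if m.get("pinned")]
    budget = max_tokens - sum(count_tokens(m) for m in pinned)
    kept = []
    for m in reversed(messages):      # walk newest-first, spending the budget
        if m.get("pinned"):
            continue
        cost = count_tokens(m)
        if cost <= budget:
            kept.append(m)
            budget -= cost
    # Reassemble in original order: pinned messages plus surviving recents.
    return [m for m in messages if m.get("pinned") or m in kept]
```

The invariant worth testing is exactly the failure mode described above: under any budget, the pinned reference stays in the transcript the model sees.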
Tool misfire is subtle but measurable. Over-triggering note or quiz generation wastes latency and burns token budget. Under-triggering means the assistant is just a chat window. Getting the routing guardrails right is an ongoing calibration, not a one-time setting.
What changed
| Dimension | Stateless chat | Context-first Cereby AI |
|---|---|---|
| Weak-point targeting | None | Quiz- and path-aware questions |
| Thread continuity | Per-message only | Thread plus session artifacts |
| Planning integration | Disconnected | Calendar- and quiz-aware planning |
| Response depth | Fixed for everyone | Shaped by history and follow-up |
The qualitative version: learners get durable study objects and coherent follow-ups instead of one-off answers. On the operator side, "it forgot me" reports drop sharply when context assembly stays healthy.
What's next
Tighter telemetry on which context slices drive quality versus cost. An explicit cold-start mode so thin-data users get useful defaults instead of silence. Deeper hooks into spaced repetition and file chat without blowing token limits.
