Updates, tips, and stories.
Notes from the team building Cereby — what we're shipping, why, and how to get more out of your studying.
Performance Analysis Without the Pin
Our detailed performance breakdown only worked when users pinned a quiz first. We rebuilt it so the system understands subjects on its own, with a Gemini 2.5 Flash-Lite classifier replacing brittle keyword matching.
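A minimal sketch of the classifier call, assuming the @google/genai SDK; the prompt, subject list, and fallback bucket are illustrative, not the ones we ship:

```typescript
import { GoogleGenAI } from '@google/genai';

// Hypothetical subject taxonomy; the real list lives in the product.
const SUBJECTS = ['math', 'biology', 'history', 'language', 'other'];

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function classifySubject(quizTitle: string, sampleQuestions: string): Promise<string> {
  const res = await ai.models.generateContent({
    model: 'gemini-2.5-flash-lite',
    contents:
      `Classify this quiz into exactly one of: ${SUBJECTS.join(', ')}.\n` +
      `Title: ${quizTitle}\nSample questions: ${sampleQuestions}\n` +
      `Reply with the subject only.`,
  });
  const subject = res.text?.trim().toLowerCase() ?? '';
  // Unlike keyword matching, unseen phrasings still land in a valid bucket.
  return SUBJECTS.includes(subject) ? subject : 'other';
}
```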
Seeing Page Freezes in Our Error Dashboard
Chrome's "Page Unresponsive" dialog was happening in production and we had no signal for it. Here is how a tiny rAF heartbeat turned invisible freezes into rows in the same error dashboard everything else lands in, grouped per route so a 7-second freeze and a 17-second freeze on the same page collapse into one bug.
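The core trick fits in a few lines. A minimal sketch, assuming a hypothetical reportFreeze() that posts to the dashboard; the threshold is illustrative:

```typescript
// Stand-in for whatever ships the event to the error dashboard.
declare function reportFreeze(e: { route: string; durationMs: number }): void;

const FREEZE_THRESHOLD_MS = 1000;
let last = performance.now();

function heartbeat(now: number) {
  const gap = now - last;
  last = now;
  // rAF stops firing while the main thread is blocked, so the gap between
  // two frames is the freeze duration. Backgrounded tabs also pause rAF,
  // so real code has to filter those out (e.g. via document.hidden).
  if (gap > FREEZE_THRESHOLD_MS && !document.hidden) {
    reportFreeze({
      route: location.pathname, // grouping key: one bug per route
      durationMs: Math.round(gap),
    });
  }
  requestAnimationFrame(heartbeat);
}

requestAnimationFrame(heartbeat);
```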
From 10MB to 500MB: How Direct-to-Bucket Uploads Scaled With Us
Next.js API routes cap request bodies at roughly 10MB, so we rerouted file uploads to go straight from the browser to Supabase Storage and gave the server only a small JSON pointer, which let us support up to 500MB without giving up validation, quota enforcement, or auditability.
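A sketch of the three-step flow, assuming supabase-js v2's signed-upload helpers; the /api/uploads/* routes and bucket name are placeholders, not our real endpoints:

```typescript
import { createClient } from '@supabase/supabase-js';

declare const SUPABASE_URL: string;
declare const SUPABASE_ANON_KEY: string;

const supabase = createClient(SUPABASE_URL, SUPABASE_ANON_KEY);

async function uploadDirect(file: File) {
  // 1. Ask the server to sign an upload slot (quota checks happen here).
  const { path, token } = await fetch('/api/uploads/sign', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ name: file.name, size: file.size }),
  }).then((r) => r.json());

  // 2. Send the bytes browser-to-bucket, bypassing the API route body cap.
  const { error } = await supabase.storage
    .from('user-files')
    .uploadToSignedUrl(path, token, file);
  if (error) throw error;

  // 3. Hand the server only a small JSON pointer for validation and audit.
  await fetch('/api/uploads/confirm', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ path, size: file.size }),
  });
}
```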
Streaming Read-Aloud: From Buffered Waits to Sentence-Level Pipelining
Read-aloud used to wait for the full assistant response before speaking; we rebuilt the pipeline around streaming text deltas and parallel TTS prefetch so the first sentence plays in roughly half a second instead of several seconds.
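A sketch of the pipelining idea; synthesize() and play() are stand-ins for the TTS call and the audio queue, not our actual API:

```typescript
declare function synthesize(sentence: string): Promise<ArrayBuffer>;
declare function play(audio: ArrayBuffer): Promise<void>;

async function readAloud(textDeltas: AsyncIterable<string>) {
  let buffer = '';
  let pending: Promise<ArrayBuffer> | null = null;

  for await (const delta of textDeltas) {
    buffer += delta;
    let match: RegExpMatchArray | null;
    // Cut on sentence boundaries so the first audio chunk is tiny.
    while ((match = buffer.match(/^(.+?[.!?])\s+(.*)$/s))) {
      const sentence = match[1];
      buffer = match[2];
      const next = synthesize(sentence);      // start TTS immediately
      if (pending) await play(await pending); // play n while n+1 synthesizes
      pending = next;
    }
  }
  if (pending) await play(await pending);
  if (buffer.trim()) await play(await synthesize(buffer));
}
```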
Enriching Cereby's Pinned Content Context: How We Made @ Mentions Smarter
Cereby's pinned-content feature was discarding most of what it knew about a student's study materials, so we built a modular enrichment pipeline that adds per-card mastery, note metadata, quiz time analytics, teaching content, and cross-content links without touching the prompt layer.
How We Rebuilt Cereby's Memory to Feel Like It Actually Knows You
Cereby's memory relied on the chat model choosing to save facts; it rarely did, so we replaced the tool-call approach with a parallel LLM classifier, pgvector retrieval, and tiered storage that now detects memories automatically, deduplicates by embedding similarity, and retrieves them in under 2ms.
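A sketch of the dedup step; the 0.9 threshold and helper names are illustrative, and in production the nearest-neighbor search runs in Postgres via pgvector rather than in application code:

```typescript
declare function embed(text: string): Promise<number[]>;
declare function nearestMemory(e: number[]): Promise<{ embedding: number[] } | null>;
declare function insertMemory(text: string, e: number[]): Promise<unknown>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function saveIfNovel(text: string) {
  const embedding = await embed(text);
  // pgvector's `<=>` cosine-distance operator does this lookup server-side.
  const nearest = await nearestMemory(embedding);
  if (nearest && cosine(embedding, nearest.embedding) > 0.9) {
    return nearest; // near-duplicate: keep the existing memory
  }
  return insertMemory(text, embedding);
}
```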
Six Cuts to the Cereby Orchestration Layer: Lazy Context, Smarter Budgets, and Fewer Wasted Calls
How we restructured Cereby's core request pipeline to skip unnecessary database queries on file-chat, cache compression results across follow-ups, and scale token budgets to the model, compounding six changes into a faster and cheaper system.
From First-Match-Wins to Parallel Scoring: How We Fixed Cereby's Misclassification Problem
Cereby's intent classifier was fast and accurate for clear requests, but regex patterns that short-circuited before the LLM could run were silently misclassifying ambiguous ones, so we rebuilt it as a parallel scoring pipeline where every classifier competes and the best score wins.
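The shape of the change, sketched; the classifier type and scoring are illustrative:

```typescript
interface Scored { intent: string; score: number }

type Classifier = (message: string) => Promise<Scored>;

async function classify(message: string, classifiers: Classifier[]): Promise<Scored> {
  // No classifier can short-circuit the others anymore: all of them
  // (regex scorers and the LLM alike) see every message, and the
  // highest-confidence result wins.
  const results = await Promise.all(classifiers.map((c) => c(message)));
  return results.reduce((best, r) => (r.score > best.score ? r : best));
}
```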
Building the Cereby Humanizer: A Rule-Based System That Fights AI Detectors
We built a two-pass humanizer that combines a neural paraphraser with 12 deterministic rule-based transforms, driven by detector feedback, and dropped a 68% AI-detection score to 51% while keeping the text grammatically correct and semantically faithful.
How We Reduced PDF Export Size by 98%
A three-page sparse note was generating a 46 MB PDF on download. We traced it to full-scrollHeight rasterization at 2x scale with lossless PNG encoding and fixed it in three files, bringing output down to 1 MB.
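A sketch of the encoding half of the fix, with an html2canvas-style render step standing in for the real one; the scale and quality values are illustrative:

```typescript
// Hypothetical render step; the real one comes from the export library.
declare function renderToCanvas(el: HTMLElement, opts: { scale: number }): Promise<HTMLCanvasElement>;

async function exportPage(el: HTMLElement): Promise<string> {
  const canvas = await renderToCanvas(el, { scale: 1.5 }); // was 2x

  // A lossless PNG of a mostly-white page is enormous; JPEG at modest
  // quality is visually indistinguishable for notes and far smaller.
  return canvas.toDataURL('image/jpeg', 0.8); // was 'image/png'
}
```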
Cereby AI System Design: From File Upload to Grounded Answer
We built the pipeline that turns an uploaded PDF, screenshot, or pinned note into compressed, query-relevant context the AI can actually cite, covering parsing, OCR, chunking, compression, and prompt injection.
How We Fix Typos Without Wild Guesses
We built a scoped message normalizer for Cereby's fast-path routing so pattern matchers see stable text without corrupting what the language model receives.
Hardening Our Detector API for Production Reliability
Our AI detector worked. Then we had to operate it. Here is how a key-gated VM port became a Compose stack behind Nginx on detector.cereby.ai, with TLS, rate limits, a 401-abuse jail, a concurrency cap, and a runbook short enough that on-call actually reads it.
Cereby Mini: Why We Moved to a Two-Intent Model
We replaced a growing taxonomy of product intents with a single routing fork (edit the document or answer in chat), so downstream pipelines stay rich while the top level stops fighting real language with labels.
Introducing the Standalone AI Text Detector
A single aggregate score buried in the editor told learners a draft "looked risky" but not where, and gave them nothing to compare across revisions; we shipped a standalone detector with section scoring, explicit re-grade actions, and a versioned snapshot history.
Introducing Cereby Tutor: Your AI Study Partner Inside Every Quiz
Cereby Tutor brings question-bound AI help into every quiz, with hints before you submit and honest explanations after, optionally grounded in the same source the quiz came from.
Why Cereby Supports Multiple AI Models (and How to Choose One)
A two-default model stack concentrated vendor risk and gave learners no honest way to trade quality against cost, so we moved to a gateway-backed allowlist with explicit tiers that make the tradeoffs visible without changing how the app talks to providers.
How Cereby's Four-Layer Context Makes AI Feel Continuous
Ad hoc prompt blobs drifted in shape and blew token budgets, so we replaced them with a strict ordered stack of system, session, memory, summaries, and live thread, making trimming and continuity a policy rather than guesswork.
Introducing Cereby Mini: Your Document AI
Reading in an editor, switching to an AI chat, copying suggestions, and pasting them back is slow and error-prone, so we built an in-document assistant where every proposed change goes through a diff gate before it touches a single character.
Accurate Citations with Compressed Context: A Two-Stage Verification System
Cereby's compression pipeline cuts token usage by 90-95%, so we built a two-stage citation system that verifies every quote against the full original document even when the AI only saw 5% of it.
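Stage two in miniature, with illustrative normalization; the key point is that verification runs against the full original document, not the compressed context the model saw:

```typescript
function verifyCitation(quote: string, fullDocument: string): boolean {
  // Whitespace- and case-insensitive containment check against the
  // original source, so compression artifacts can't fake a quote.
  const norm = (s: string) => s.replace(/\s+/g, ' ').trim().toLowerCase();
  return norm(fullDocument).includes(norm(quote));
}
```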
Query-Aware Smart Compression: Solving the Single-Page Truncation Problem (part 2)
Simple truncation was silently discarding the most relevant parts of oversized pages, so we replaced it with a scoring pipeline that selects chunks by query relevance instead of position.
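A sketch of relevance-first selection; the term-overlap scorer is a stand-in for the real scoring pipeline:

```typescript
function selectChunks(chunks: string[], query: string, budget: number): string[] {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const score = (chunk: string) => {
    const words = chunk.toLowerCase().split(/\W+/);
    return words.filter((w) => terms.has(w)).length / Math.max(words.length, 1);
  };

  // Rank every chunk by query relevance, then fill the budget greedily,
  // instead of keeping whatever happened to come first on the page.
  const ranked = chunks
    .map((text, i) => ({ text, i, s: score(text) }))
    .sort((a, b) => b.s - a.s);

  const picked: { text: string; i: number }[] = [];
  let used = 0;
  for (const c of ranked) {
    if (used + c.text.length > budget) continue;
    picked.push(c);
    used += c.text.length;
  }
  // Restore document order so the model reads a coherent excerpt.
  return picked.sort((a, b) => a.i - b.i).map((c) => c.text);
}
```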
Hierarchical Context Compression: Cutting AI Costs by 90% Without Losing Quality (part 1)
File Chat was fast and accurate in demos, but each query cost $1.50 and took 20 seconds once a real document was involved, so we built a two-phase compression system that cut token usage by 92% and response time by 85% without measurable accuracy loss.
Optimizing Cereby AI: From 5-8 Seconds to Sub-Second Responses
Every request to Cereby AI was taking 5-8 seconds, and we traced the problem to sequential database queries and redundant AI calls. This is how a three-tier cache, context compression, and parallelized queries brought that down to sub-second responses with 40-50% lower API costs.
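A sketch of the read-through tier order; `shared` stands in for whatever backs the middle tier (Redis or similar), and the TTL handling is illustrative:

```typescript
declare const shared: {
  get(key: string): Promise<unknown | null>;
  set(key: string, value: unknown, ttlMs: number): Promise<void>;
};

// Tier 1: in-process map, fastest and per-instance.
const local = new Map<string, { value: unknown; expires: number }>();

async function cached<T>(key: string, ttlMs: number, load: () => Promise<T>): Promise<T> {
  const hit = local.get(key);
  if (hit && hit.expires > Date.now()) return hit.value as T; // tier 1

  const warm = await shared.get(key);                          // tier 2
  const value = warm ?? (await load());                        // tier 3: database
  if (warm == null) await shared.set(key, value, ttlMs);

  local.set(key, { value, expires: Date.now() + ttlMs });
  return value as T;
}
```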
Voice-Powered Learning: Cereby's New Audio Capabilities
Keyboard-first study flows broke on mobile and in motion, so we added push-to-talk dictation and a script-plus-TTS podcast path that let learners speak requests and listen to generated review audio without leaving Cereby.
Inside Cereby's Intent Classifier: How We Route Natural Language to the Right Tool
We replaced a brittle keyword router with a six-phase intent classification pipeline that gets 95% of requests right on the first try, down from roughly one in four needing correction.
Improving Cereby Capabilities: From Plain Text to Rich Visual Learning Materials
Cereby AI generated sound ideas but delivered them as flat bullets and broken math, so we rebuilt the output layer around a shared visual stack that AI and users write through equally.
Introducing Cereby AI: Your Personal Learning Assistant
Stateless chat threw away quiz history, calendar context, and thread continuity; Cereby AI fixes that by shipping a deliberate bundle of learner state on every request, so the model can give advice that is actually grounded in what the student has been struggling with.
Mastering Spaced Repetition: The Science of Long-Term Memory
Most studying feels productive but leaves nothing behind. Spaced repetition fixes that by scheduling every review at exactly the moment your memory is about to slip.
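For the curious, the scheduling idea in miniature: recall decays roughly exponentially with time, and each successful review raises stability, so the next review can wait longer. The constants and names here are illustrative, not a specific published algorithm:

```typescript
// Probability you still recall an item t days after the last review,
// under a simple exponential forgetting model with stability s.
function recallProbability(daysSinceReview: number, stability: number): number {
  return Math.exp(-daysSinceReview / stability);
}

// Schedule the next review for the moment recall is about to dip
// below the target, e.g. 90%: t = -s * ln(target).
function nextReviewInDays(stability: number, target = 0.9): number {
  return -stability * Math.log(target);
}
```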
