Enriching Cereby's Pinned Content Context: How We Made @ Mentions Smarter
How identifying a quiet data-discard problem in our content pipeline turned pinned study materials into a genuinely rich tutoring context.
The data was there. We just weren't sending it.
Cereby's @ mention feature lets students pin specific content (notes, quizzes, flashcard sets) into a chat session. When something is pinned, Cereby enters sole-context mode: it answers only from that material.
A student pins a flashcard set they have studied five times. Cereby sees "5 sessions, avg 85%." It does not see that Card 3 ("Krebs Cycle") was missed in 4 of those 5 sessions. A student pins a note about photosynthesis. Cereby sees raw text and a creation timestamp. It does not see the tags, the folder, or the quiz generated from that note with a 72% average score. The data lived in IndexedDB and Supabase. The pipeline was discarding it.
None of this was an AI problem. It was a content-fetch problem.
Why the pipeline was throwing data away
Our fetch pipeline had two parallel paths: a client-side fetcher reading from Dexie/IndexedDB and a server-side resolver reading from Supabase.
Both paths produced the same FetchedContent shape: a type, a title, a content string, and a metadata bag. The gaps appeared in how each fetcher filled that content string.
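A minimal sketch of that shared shape; the field names and the type union are assumptions based on the description above, since the text only guarantees a type, a title, a content string, and a metadata bag:

```typescript
// Sketch of the shape both fetch paths produce (illustrative, not the real interface).
interface FetchedContent {
  type: 'note' | 'quiz' | 'flashcard';  // pinned content type
  title: string;                        // display title of the pinned item
  content: string;                      // serialized text Cereby ultimately sees
  metadata: Record<string, unknown>;    // loose metadata bag (ids, timestamps, tags, ...)
}
```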
For flashcards, aggregate session stats came through but the reviewCards array (which records exactly which cards were correct or incorrect in each session) was discarded during aggregation. For notes, tags, folder path, and any related quizzes or flashcards were ignored. Time analytics, teaching content, and cross-content links were all missing from the other types. None of the content told Cereby about relationships: a quiz generated from a note had a sourceResourceId pointing back, but neither side mentioned the other.
The enrichment layer
Inlining the missing data into each fetcher was the obvious move, but both ContentFetcher.fetchFlashcard() and resolveSelectedContent.resolveFlashcard() were already over 100 lines and existed on both client and server paths. We added a stage instead:
The ContentEnricher interface declares which content types an enricher applies to and implements a single enrich() method returning text to prepend or append. The EnrichmentCoordinator runs applicable enrichers per item and enforces a per-item token budget. The DataAdapter interface abstracts client-side (Dexie/IndexedDB) from server-side (Supabase service client) data access so enricher logic is written once, tested against a mock adapter, and connected to either real adapter without changes.
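Roughly, the three pieces fit together like this. The method names and signatures below are assumptions sketched from that description, not the actual interfaces:

```typescript
// Abstracts Dexie/IndexedDB (client) and the Supabase service client (server),
// so enricher logic is written once and tested against a mock adapter.
interface DataAdapter {
  getById(id: string): Promise<FetchedContent | null>;                 // direct lookup
  findBySourceResourceId(sourceId: string): Promise<FetchedContent[]>; // reverse lookup
}

// An enricher declares which content types it handles and returns text
// to prepend or append to the item's content string.
interface ContentEnricher {
  appliesTo: FetchedContent['type'][];
  enrich(item: FetchedContent, adapter: DataAdapter): Promise<string>;
}

// Runs every applicable enricher for an item and enforces the per-item budget.
class EnrichmentCoordinator {
  constructor(
    private enrichers: ContentEnricher[], // assumed pre-sorted in priority order
    private adapter: DataAdapter,
  ) {}

  async enrich(item: FetchedContent, budgetChars: number): Promise<FetchedContent> {
    let extra = '';
    for (const enricher of this.enrichers) {
      if (!enricher.appliesTo.includes(item.type)) continue;
      const text = await enricher.enrich(item, this.adapter);
      const remaining = budgetChars - extra.length;
      if (text.length > remaining) {
        // Budget exhausted: keep what fits and mark the cut.
        extra += text.slice(0, Math.max(0, remaining)) + '\n[enrichment truncated]';
        break;
      }
      extra += text + '\n';
    }
    return extra ? { ...item, content: `${item.content}\n\n${extra.trimEnd()}` } : item;
  }
}
```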
What each enricher adds
Per-card flashcard mastery
LearningContentFlashcardItem.attempts[].reviewCards[] was already recording { cardIndex, front, back, correct } for every session. The enricher groups by cardIndex, computes a miss rate, and classifies cards as struggling (>50% miss rate), needs work (30-50%), or mastered (<30%). What Cereby now sees:
## Per-card mastery (sorted by difficulty)
- Card 3 "Krebs Cycle": missed 4/5 sessions (80% miss rate) [struggling]
- Card 7 "ATP synthase": missed 2/5 sessions (40% miss rate) [needs work]
- Cards 1,2,4,5,6,8: consistently correct (mastered)
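The grouping and classification step is simple once the reviewCards records are in hand. A minimal sketch, assuming the { cardIndex, front, back, correct } shape above (the real enricher also collapses mastered cards into a single line, as in the example):

```typescript
// Per-card mastery from the per-session reviewCards records.
interface ReviewCard {
  cardIndex: number;
  front: string;
  back: string;
  correct: boolean;
}

function perCardMastery(attempts: { reviewCards: ReviewCard[] }[]): string {
  const stats = new Map<number, { front: string; seen: number; missed: number }>();
  for (const attempt of attempts) {
    for (const card of attempt.reviewCards) {
      const s = stats.get(card.cardIndex) ?? { front: card.front, seen: 0, missed: 0 };
      s.seen += 1;
      if (!card.correct) s.missed += 1;
      stats.set(card.cardIndex, s);
    }
  }
  const lines = [...stats.entries()]
    .map(([index, s]) => ({ index, ...s, missRate: s.missed / s.seen }))
    .sort((a, b) => b.missRate - a.missRate) // hardest cards first
    .map((c) => {
      const label = c.missRate > 0.5 ? 'struggling' : c.missRate >= 0.3 ? 'needs work' : 'mastered';
      return `- Card ${c.index} "${c.front}": missed ${c.missed}/${c.seen} sessions (${Math.round(c.missRate * 100)}% miss rate) [${label}]`;
    });
  return ['## Per-card mastery (sorted by difficulty)', ...lines].join('\n');
}
```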
Note metadata and related content
Notes get tags, folder path, and related content via a reverse lookup on sourceResourceId (IndexedDB scan on the client, user_notebooks JSONB query on the server):
Tags: biology, photosynthesis
Folder: Science / Biology
Related content:
- Quiz "Photosynthesis Test": 3 attempts, avg 72%
- Flashcard set "Plant Cells": 5 sessions, avg 85%
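As an enricher, that lookup is a few lines against the DataAdapter, which hides whether the reverse lookup is an IndexedDB scan or a user_notebooks JSONB query. A sketch, with the metadata field names as assumptions:

```typescript
// Note metadata + related-content enricher (sketch). The real output also
// includes attempt summaries for related quizzes and flashcard sets.
class NoteMetadataEnricher implements ContentEnricher {
  appliesTo: FetchedContent['type'][] = ['note'];

  async enrich(note: FetchedContent, adapter: DataAdapter): Promise<string> {
    const meta = note.metadata as { id?: string; tags?: string[]; folderPath?: string[] };
    const related = meta.id ? await adapter.findBySourceResourceId(meta.id) : [];
    return [
      `Tags: ${(meta.tags ?? []).join(', ')}`,
      `Folder: ${(meta.folderPath ?? []).join(' / ')}`,
      'Related content:',
      ...related.map((r) => `- ${r.type === 'quiz' ? 'Quiz' : 'Flashcard set'} "${r.title}"`),
    ].join('\n');
  }
}
```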
Cross-content linking
Pinned quizzes and flashcards get the reverse: the enricher reads sourceResourceId and surfaces the source note plus any siblings generated from it:
## Related content
- Generated from note: "Photosynthesis" (Science / Biology)
- Also from same note: Flashcard set "Plant Cells" (5 sessions, avg 85%)
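The linking enricher is the mirror image: follow sourceResourceId up to the note, then back down to its siblings. A sketch under the same assumed adapter interface:

```typescript
// Cross-content linking enricher (sketch) for pinned quizzes and flashcard sets.
class CrossContentLinkEnricher implements ContentEnricher {
  appliesTo: FetchedContent['type'][] = ['quiz', 'flashcard'];

  async enrich(item: FetchedContent, adapter: DataAdapter): Promise<string> {
    const sourceId = item.metadata.sourceResourceId as string | undefined;
    if (!sourceId) return ''; // item was not generated from a note
    const source = await adapter.getById(sourceId);
    const siblings = await adapter.findBySourceResourceId(sourceId);
    const lines = ['## Related content'];
    if (source) lines.push(`- Generated from note: "${source.title}"`);
    for (const sibling of siblings) {
      if (sibling.title !== item.title) lines.push(`- Also from same note: ${sibling.type} "${sibling.title}"`);
    }
    return lines.length > 1 ? lines.join('\n') : '';
  }
}
```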
Quiz per-question time tracking
The one enricher that required a UI change: quizzes had no per-question timing. We added a questionStartTimeRef to the quiz modal, capped elapsed time at 5 minutes to filter out idle time, and threaded the value through QuizAttemptQuestionResult as an optional timeMs field. The enricher then flags questions taking more than 2x the per-attempt average:
## Time analysis
Average time per question: 15s
Slow questions (>2x average):
- Q3 "Calculate molarity": avg 45s (3.0x average)
Token budget management
The coordinator enforces a per-item character budget, scaled so that one or two pinned items get roughly 2,000 tokens' worth of enrichment each, dropping to a floor of roughly 750 for five or more. When an enricher's output exceeds the remaining budget, the coordinator truncates it and appends an [enrichment truncated] marker. Enrichers run in priority order: cross-content links and metadata first (small, high-value), then per-card mastery and time analytics, then teaching content last (largest, most variable).
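The exact scaling curve is not spelled out above, so treat this as one plausible shape matching the stated endpoints; the 4-characters-per-token ratio and the linear interpolation are assumptions:

```typescript
// Per-item budget: ~2,000 tokens' worth for 1-2 pinned items, floor of ~750 at 5+.
function perItemBudgetChars(pinnedCount: number): number {
  const CHARS_PER_TOKEN = 4; // rough ratio, assumption
  const maxTokens = 2000;
  const minTokens = 750;
  if (pinnedCount <= 2) return maxTokens * CHARS_PER_TOKEN;
  if (pinnedCount >= 5) return minTokens * CHARS_PER_TOKEN;
  const t = (pinnedCount - 2) / 3; // 0 at 2 items, 1 at 5 items
  return Math.round((maxTokens - t * (maxTokens - minTokens)) * CHARS_PER_TOKEN);
}
```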
Before and after
| Dimension | Before | After |
|---|---|---|
| Flashcard mastery | Session-level aggregates only | Per-card miss rate with struggling card identification |
| Note context | Raw content plus createdAt | Tags, folder, related quizzes and flashcards |
| Quiz analytics | Score and wrong answers only | Per-question time outlier detection |
| Cross-content links | None | Bidirectional source note linking |
The layer is fully backward-compatible: an enricher with no data to add returns an empty string, and if the enrichment stage throws, the pipeline falls back to the original unenriched content.
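In code terms the fallback is just a guard around the coordinator; a sketch reusing the hypothetical names from the earlier snippets:

```typescript
// Enrichment is best-effort: a failure never blocks the pinned content itself.
async function safeEnrich(
  item: FetchedContent,
  coordinator: EnrichmentCoordinator,
  budgetChars: number,
): Promise<FetchedContent> {
  try {
    return await coordinator.enrich(item, budgetChars);
  } catch {
    return item; // fall back to the original, unenriched content
  }
}
```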
What this taught us
Everything we needed was already stored. The issue was a pipeline that treated content formatting as a one-shot serialization step rather than a composable one. Inserting an enrichment stage between fetch and format gave us a clean slot without touching either end.
The highest-value unlock was per-card data: students do not need aggregate flashcard scores, they need to know which specific cards they keep missing. The coordinator made the token tradeoff explicit: X items pinned means Y tokens per item, enforced, not aspirational. And adding timeMs?: number to QuizAttemptQuestionResult cost nothing for existing data: no migration required.
