How We Reduced PDF Export Size by 98%
Rasterized PDF exports balloon fast when scale, format, and captured height are all set to their worst defaults. Here is why it happened and the three targeted changes that fixed it.
The report that started it
A user flagged that downloading a three-page note as a PDF was producing a file over 46 MB. The note had mostly headings and short paragraphs. No embedded images. No unusual fonts. The kind of content that should fit in a few hundred kilobytes.
We traced it to the export pipeline in DocumentHeader.tsx. The path is straightforward: html2canvas captures the ProseMirror editor DOM, canvas.toDataURL encodes the result as an image, and jsPDF tiles that image across pages before saving. Nothing wrong with the shape. Three choices, however, were compounding into a disaster.
Why a sparse note produced a 46 MB file
The first problem was scale. The html2canvas call used scale: 2, which doubles resolution on each axis and quadruples pixel count. An 850 px wide note became a 1700 px wide canvas.
The second was height. The capture used documentElement.scrollHeight, which sounds right but includes everything the editor CSS adds for UX comfort: min-h-[calc(100vh-200px)] and pb-96. On a 900 px viewport, that is a minimum canvas height of roughly 700 px of content area plus 384 px of bottom padding, before any real content is measured. A sparse three-page note could sit inside a 5,000 to 8,000 px tall canvas of mostly white.
The third was format. The call used toDataURL("image/png"), which is always lossless regardless of the quality argument passed. For a 1700 by 6000 px canvas, raw RGBA pixel data is around 40 MB. jsPDF stores this as a raw pixel stream inside the PDF, so DEFLATE is the only compression working against that number.
All three were set to their worst values at once.
The fix: three independent levers
We did not replace the pipeline. The capture-then-tile approach is reasonable. We changed three parameters that determine output size: scale, format, and captured height.
Before:
html2canvas → scale: 2, height: scrollHeight
toDataURL → "image/png" (lossless, ~40 MB raw)
addImage → "PNG"
After:
html2canvas → scale: 1.5, height: contentHeight (clipped to last child)
toDataURL → "image/jpeg", 0.85 (lossy, 80-90% smaller)
addImage → "JPEG"
Dropping scale from 2 to 1.5 cuts pixel count by 44%. Switching from lossless PNG to JPEG at 0.85 quality cuts encoded size by 80 to 90 percent for text-on-white content. The height clip addresses the remaining inflation: capturing 4,000 px of white padding below three short paragraphs.
Scale and format are independent levers. Both were at maximum. Either fix alone would have made a meaningful difference. Applied together they account for nearly all of the 98% reduction.
What changed in each file
DocumentHeader.tsx (single-page PDF path). The html2canvas call moves from scale: 2 to scale: 1.5, and toDataURL changes from "image/png" to "image/jpeg", 0.85. The backgroundColor: "#ffffff" was already in place, which matters: JPEG has no alpha channel, so a guaranteed opaque white background is required before encoding.
The height clip measures the bounding box of the last actual content child rather than relying on scrollHeight:
const docRect = documentElement.getBoundingClientRect();
const children = Array.from(documentElement.children);
let contentHeight = documentElement.scrollHeight;
if (children.length > 0) {
const lastChildRect = children[children.length - 1].getBoundingClientRect();
const measured =
Math.ceil(lastChildRect.bottom - docRect.top + window.scrollY) + 48;
if (measured > 0 && measured < contentHeight) {
contentHeight = measured;
}
}
The + 48 adds a small margin below the last element (roughly py-3). If measured exceeds scrollHeight, the fallback is scrollHeight. Long notes use their full measured content height. Short sparse notes stop capturing dead space.
lib/export/imageExport.ts (multi-subpage rasterization path). This file builds an off-screen container from raw HTML, runs html2canvas, and returns a base64 image string. Same two changes: scale: 1.5 and toDataURL("image/jpeg", 0.85). Interactive drawing canvases stay at PNG. Drawn content can have semi-transparent strokes and hard edges where JPEG artifacts would be visible. Only full-page background rasterization switches.
lib/export/documentExport.ts (PDF assembly for multi-subpage exports). One line changes from pdf.addImage(pageImage, "PNG", ...) to pdf.addImage(pageImage, "JPEG", ...). jsPDF uses the format string to interpret incoming image data. Passing "PNG" when the data is JPEG-encoded can produce corrupted output in some jsPDF versions. Aligning the format string is necessary, not cosmetic.
What this taught us
canvas.toDataURL("image/png", 0.5) and canvas.toDataURL("image/png", 1.0) produce identical output. The quality argument is silently ignored for PNG. If you want lossy compression from a canvas, you must pass "image/jpeg" or "image/webp".
scrollHeight is a UX number, not a content number. It includes min-height, padding-bottom, and any other CSS that makes the live editor feel comfortable. For export, you want the last measured content bounding box.
jsPDF format strings must match the encoded data. Passing "PNG" with JPEG data is not a silent no-op in some jsPDF versions. Align them whenever the encoding changes.
Two download paths existed (DocumentHeader.tsx and imageExport.ts into documentExport.ts). Fixing only one would have left the multi-subpage route broken. Audit all callers before closing the ticket.
The PDF renders cleanly in Preview, Adobe Reader, and browser-native PDF viewers. Formatted content (headers, code blocks, tables) looks identical to the 2x PNG output at normal reading zoom.
