Hardening Our Detector API for Production Reliability
How a working scoring endpoint turned into something we could operate under real traffic.
The starting point was a port
The detector returned good scores. That part was never in doubt. What we shipped on day one, though, was the laziest possible production: a VM with the detector port exposed to the internet, an API key checked inside the app, and not much else. If you knew the URL and the key, you got a score. If you did not, you got a 401.
For a quiet integration, that is fine. As the only thing standing between the public internet and a model that costs CPU per request, it stops being fine quickly. There was no TLS guarantee at a stable hostname. There was no edge to absorb bursts, cap body size, or push back with a 429 before requests reached the app. A leaked key would have meant unbounded inference until someone noticed. A reboot meant remembering, by hand, which docker run lines to type. And our healthcheck called a key-protected route without a key, so the container marked itself "unhealthy" while serving real traffic perfectly.
None of that is a failure of the model. It is the kind of operational debt that does not show up until traffic does, and then it shows up all at once.
The shape we landed on
We kept the scoring app exactly as it was, then put a small Compose stack around it. Nginx became the only thing on 80 and 443. The detector container moved to an internal-only port. The product app started talking to https://detector.cereby.ai over TLS with a shared key, and the VM learned to bring all of this back up by itself after a power cycle.
The thing that made the rest tractable was a single split: edge concerns versus scoring concerns. Nginx owns TLS termination, request size, upstream timeouts, and coarse per-IP and per-key throttling. The container owns auth semantics and inference, with a hard ceiling on how many scoring jobs can run at once. That separation is what shrinks blast radius. Bad traffic becomes observable (429s in logs, a fail2ban jail filling up) and actionable, instead of a quiet wedge where the VM is technically up but no requests are getting through.
What "hardened" meant in concrete pieces
The Compose file owns the topology: the detector image built from the service repo with the API key passed as an environment variable, listening on a port that is never published; Nginx publishing 80 and 443 with config and cert volumes mounted in. A systemd unit calls docker compose up on boot so the topology after a reboot is byte-identical to the topology before it.
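A minimal sketch of that Compose file, with illustrative service names, ports, and paths rather than our exact values:

```yaml
services:
  detector:
    build: ./detector                 # built from the service repo
    environment:
      DETECTOR_API_KEY: ${DETECTOR_API_KEY}   # key injected at runtime, never baked into the image
    expose:
      - "8000"                        # internal-only: nginx can reach it, the internet cannot

  nginx:
    image: nginx:stable
    ports:
      - "80:80"
      - "443:443"                     # the only published ports on the VM
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro
      - ./logs/nginx:/var/log/nginx   # bind mount so host tooling can read the logs
    depends_on:
      - detector
```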
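The boot half is one unit file. A sketch under assumed names and paths (detector-stack.service, /opt/detector):

```ini
# /etc/systemd/system/detector-stack.service (hypothetical name and path)
[Unit]
Description=Detector Compose stack
Requires=docker.service
After=docker.service network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/detector
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down

[Install]
WantedBy=multi-user.target
```

With systemctl enable detector-stack, a power cycle brings the whole topology back without anyone typing docker run lines.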
For TLS, certbot writes Let's Encrypt certs to a webroot that Nginx serves. Plain HTTP stays alive only for ACME challenges and for issuing a 301 to HTTPS on everything else. The hostname is detector.cereby.ai and that is the only hostname clients are expected to use.
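The port-80 server block is small enough to show whole. A sketch, assuming a /var/www/certbot webroot; the hostname is the only real value here:

```nginx
server {
    listen 80;
    server_name detector.cereby.ai;

    # ACME challenges are the one thing allowed over plain HTTP.
    location /.well-known/acme-challenge/ {
        root /var/www/certbot;        # must match certbot's --webroot-path
    }

    # Everything else gets pushed to TLS.
    location / {
        return 301 https://$host$request_uri;
    }
}
```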
Nginx adds the rest of what an exposed app should not have to think about: a body-size cap, connect/read/send timeouts to the upstream, per-IP and per-key rate limits, connection caps, and a regex location for scoring routes that writes its own access log to a path bind-mounted from the host. That dedicated log is what makes the abuse story possible without scraping generic logs.
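Roughly what that looks like in config. Zone names, rates, caps, the upstream port, and the assumption that the key travels in an X-API-Key header are all illustrative and would need tuning against real traffic; the zone definitions live at the http level:

```nginx
# http-level zone definitions
limit_req_zone  $binary_remote_addr zone=per_ip:10m  rate=10r/s;
limit_req_zone  $http_x_api_key     zone=per_key:10m rate=30r/s;
limit_conn_zone $binary_remote_addr zone=conn_ip:10m;

server {
    listen 443 ssl;
    server_name detector.cereby.ai;
    ssl_certificate     /etc/letsencrypt/live/detector.cereby.ai/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/detector.cereby.ai/privkey.pem;

    client_max_body_size 1m;          # body-size cap, enforced before the app sees anything
    limit_req_status  429;            # throttled requests answer 429, not nginx's default 503
    limit_conn_status 429;

    location ~ ^/score {
        limit_req  zone=per_ip  burst=20 nodelay;
        limit_req  zone=per_key burst=60 nodelay;
        limit_conn conn_ip 10;
        access_log /var/log/nginx/detector_score.log;   # dedicated log, bind-mounted to the host

        proxy_pass            http://detector:8000;
        proxy_connect_timeout 5s;
        proxy_send_timeout    10s;
        proxy_read_timeout    30s;
    }
}
```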
Host posture is two fail2ban jails. One watches SSH, which is already key-only, so that jail mostly mops up brute-force noise. The second tails the dedicated scoring log, matches repeated 401s on /score, and bans the offending IP at the firewall. This is the layer that catches credential probing and the kind of stress traffic that still costs us a TLS handshake and a proxy hop before the app can say "no."
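In jail.local terms, something like the following; the detector jail's name and maxretry are illustrative, though the two-minutes-of-probing, one-hour-ban ratio is the real policy:

```ini
[sshd]
enabled = true

[detector-401]
enabled  = true
port     = http,https
filter   = detector-401           # a filter whose failregex matches 401s on /score
logpath  = /var/log/nginx/detector_score.log
findtime = 120                    # two minutes of probing...
maxretry = 5
bantime  = 3600                   # ...earns an hour at the firewall
```

The filter itself is a single failregex over the access-log format, anchored on the client IP and a 401 status for /score routes.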
Inside the app, a configurable concurrency cap (default 4) limits how many inference jobs run at once. Requests that arrive while every slot is busy come back as a 503 with a Retry-After header. The point is not to be polite. The point is to refuse work the VM cannot do, fast, instead of queueing it into oblivion.
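A minimal sketch of that cap, assuming an asyncio-based app (aiohttp here; the real framework may differ, and run_model is a stand-in for the actual scorer):

```python
import asyncio
import os

from aiohttp import web

MAX_CONCURRENT = int(os.environ.get("SCORE_CONCURRENCY", "4"))  # configurable, default 4
slots = asyncio.Semaphore(MAX_CONCURRENT)

def run_model(payload: dict) -> float:
    ...  # stand-in for the real CPU-bound detector inference

async def handle_score(request: web.Request) -> web.Response:
    if slots.locked():
        # Every inference slot is busy: refuse fast instead of queueing.
        return web.json_response(
            {"error": "saturated"},
            status=503,
            headers={"Retry-After": "2"},
        )
    async with slots:
        payload = await request.json()
        # Push the CPU-bound scorer off the event loop.
        score = await asyncio.to_thread(run_model, payload)
        return web.json_response({"score": score})

app = web.Application()
app.add_routes([web.post("/score", handle_score)])
```

The locked() check is advisory (a request can still briefly wait between the check and the acquire), which is fine here: the goal is a fast 503 under sustained saturation, not a strict admission controller.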
The threat we are actually defending against
It helps to be honest about what this stack is shaped to stop and what it is not. The detector is a CPU-bound model behind an API key. The cost of letting bad traffic through is not data exfiltration or remote code execution. It is wasted CPU on a single VM and a small bill for outbound bandwidth. That shapes the whole design.
What the stack absorbs is the steady drip of traffic that finds any exposed API: credential probes with key lists, scanners looking for .env and .git, scripted bursts that hammer /score to see what shape the response takes. None of it is sophisticated. All of it is constant. A single attacker with a key list and a loop can spend an evening on our endpoint and burn more inference time than a hundred legitimate users. The 401 jail and the per-IP limits make that uneconomical, not impossible. Two minutes of probing earns an hour of firewall block. Most opportunistic traffic gives up.
What the stack does not absorb is a distributed campaign with thousands of source IPs and valid-looking keys. We do not have the bandwidth ceiling, the IP intelligence, or the pattern recognition to defend against that. If it happens, the answer is to put Cloudflare or similar in front of detector.cereby.ai and let them own the perimeter. We have not done it because nothing of that shape has shown up. Adding a CDN today would be paying for a problem we do not have, in exchange for a vendor relationship and a control plane to learn. The day the math flips, we move.
A credential leak is a different problem with a different shape. If our API key ends up in a public commit, no amount of perimeter security helps. The mitigation is short key rotation cycles and one-line revocation, not better firewalls. The architecture does not pretend to solve that, and pretending it did would be the more dangerous bug.
Verification, and what changed
The runbook is short on purpose. After any change: docker compose ps, tail both containers' logs, curl -I HTTP and expect a 301, curl -I HTTPS /healthz and expect 401 with no key, then curl /score with the key and a real body and expect a 200 with valid JSON. We also confirm the dedicated access log is receiving lines and fail2ban-client status shows the detector jail active.
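In copy-paste form, roughly as it appears in the runbook (the header name and request body are placeholders; the hostname and jail name match the sketches above):

```sh
docker compose ps
docker compose logs --tail=50 nginx detector

curl -I http://detector.cereby.ai/            # expect 301 to HTTPS
curl -I https://detector.cereby.ai/healthz    # expect 401: no key sent
curl -s https://detector.cereby.ai/score \
     -H "X-API-Key: $DETECTOR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"text": "sample input"}'            # expect 200 with valid JSON

tail -n 5 /var/log/nginx/detector_score.log   # dedicated log is receiving lines
sudo fail2ban-client status detector-401      # detector jail is active
```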
Side by side, before and after look like this:
| Area | Before | After |
|---|---|---|
| Public surface | App port exposed | Nginx on 80/443 only |
| TLS | Not standardized | HTTPS default on detector.cereby.ai |
| Overload behavior | Best-effort | 429 at the edge, 503 when inference slots are full |
| 401 / abuse | Rate limits only | Dedicated log plus fail2ban ban after repeated 401s |
| Boot | Manual / ad hoc | systemd plus Compose |
| Post-deploy proof | Tribal knowledge | Copy-paste curl checks in a runbook |
What this taught us
Healthchecks have to match the auth reality of the routes they call. If production needs a key, the probe needs the same key. Otherwise you will spend an afternoon chasing a healthy container that Docker insists is sick.
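In Compose terms the fix is one line of the probe, sketched here with an assumed header name and internal port, and assuming curl exists in the image:

```yaml
healthcheck:
  # Probe with the same key production requires; curl -f turns a 401 into a
  # nonzero exit, so an unauthenticated probe would mark the container unhealthy.
  test: ["CMD", "curl", "-fsS", "-H", "X-API-Key: ${DETECTOR_API_KEY}", "http://localhost:8000/healthz"]
  interval: 30s
  timeout: 5s
  retries: 3
```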
The boring failures get the runbook entry. Cert ordering, missing compose files, hairpin 502s, broken build contexts. They account for most of the actual incident time, and a one-paragraph cause-and-fix beats heroics at 2 a.m.
The rest is what you already know but skip: a reverse proxy pays for itself on day one, logs the host can read are worth the bind mount, rate limits and bans are different tools, and if three curls cannot prove the deploy worked, operators will not trust it.
Follow-up: 2026-04-19 audit
Three weeks in, we ran an independent audit against everything above. Most of it held. Two findings did not, and a few smaller things were worth tightening on the way through.
The biggest one was that the original key-gated path was still running. The listener that pre-dated the Compose stack had been added to, not replaced by, the new topology: its plaintext port was still reachable from the public internet, completely bypassing Nginx, TLS, the rate limits, the body-size cap, and the 401 jail. The app's API key check was the only thing left in front of it. The classic shape of a half-finished migration: v2 shipped, v1 never shut down.
The second one was that nothing was renewing the cert. The Let's Encrypt cert was still valid, but no timer, cron entry, or certbot container existed to renew it. Months from now it would have expired silently, with no alert in front of it.
The smaller findings (access-log scope, unbounded container logs, missing logrotate, SSH posture drift, and a build context that had diverged from the tracked source tree) are covered in the table below.
| Gap | Fix |
|---|---|
| Legacy plaintext listener | Disabled at the init system, port revoked at the firewall, now TCP-refused from the public internet. |
| Cert renewal | A twice-daily systemd timer runs a throwaway certbot/certbot container against the same bind mounts nginx already uses, then reloads nginx in place on success (sketched after this table). |
| Access-log asymmetry | Moved the access log declaration up to the server {} level so every request, 443 and 80, hits the same bind-mounted file the jails read. |
| Scanner noise | Added a custom nginx-botsearch jail whose filter matches common attack paths (.git, .env, wp-admin, backup and dump files) against our log format. Ten hits in five minutes earns a 24-hour firewall ban. |
| Unbounded container logs | Per-container size cap in the Docker daemon config. Logs rotate automatically instead of growing without bound. |
| Nginx dedicated log growth | Weekly logrotate with gzip, coordinated with nginx -s reopen and a fail2ban reload in postrotate (also sketched below). |
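The renewal fix, sketched as the pair of unit files it amounts to. Unit names, paths, and the exact calendar spec are illustrative; the shape (throwaway certbot/certbot container, shared bind mounts, in-place reload on success) is what the table describes:

```ini
# /etc/systemd/system/certbot-renew.service (hypothetical name)
[Unit]
Description=Renew certs via a throwaway certbot container

[Service]
Type=oneshot
ExecStart=/usr/bin/docker run --rm \
    -v /etc/letsencrypt:/etc/letsencrypt \
    -v /var/www/certbot:/var/www/certbot \
    certbot/certbot renew --webroot --webroot-path /var/www/certbot
# Runs only if ExecStart succeeded: reload nginx in place, no restart.
ExecStartPost=/usr/bin/docker compose -f /opt/detector/docker-compose.yml exec -T nginx nginx -s reload

# /etc/systemd/system/certbot-renew.timer
[Unit]
Description=Run certbot renew twice daily

[Timer]
OnCalendar=*-*-* 03,15:30:00
RandomizedDelaySec=1h
Persistent=true

[Install]
WantedBy=timers.target
```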
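And the logrotate side, a sketch with illustrative paths and jail name:

```conf
# /etc/logrotate.d/detector-nginx (hypothetical file)
/var/log/nginx/detector_score.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        # Hand nginx fresh file descriptors, then let fail2ban re-open the log.
        docker compose -f /opt/detector/docker-compose.yml exec -T nginx nginx -s reopen
        fail2ban-client reload detector-401
    endscript
}
```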
Stitched together, the post-audit topology looks like this:
The legacy listener stays in the picture on purpose. Drawing it as a dead branch is the honest record of what the audit found, and it makes the next person to read this less likely to recreate the same v1-still-running mistake.
