Recognition Pipeline

Face recognition is the most distinctive — and most subtle — part of BrightBlur, and it runs entirely in the browser. No pixel and no embedding is ever sent to the server for inference. This chapter walks the pipeline from “an image is decoded” to “this face is suggested as Alice”, then covers the runtime that hosts the models and the cache that serves them. The code lives mostly in apps/web/src/lib/image/, with the server-side storage in apps/web/src/lib/server/api/person-data.ts and the client-side fold-in in apps/web/src/lib/pool-updater.ts / centroid-updater.ts.

The pipeline at a glance

decode → DETECT faces (SCRFD) → for each face: ALIGN (ArcFace/Umeyama) → EMBED (w600k)
       → MATCH embedding against the person's template POOL (raw cosine)
       → classify: auto-tag · suggest · unresolved

Detection and embedding are two separate TFLite models; matching is pure maths on the resulting 512-d vectors.

The TFLite runtime and the ML worker

Both models are TFLite, executed through @tensorflow/tfjs-tflite (WASM). They run in a Web Worker (src/lib/workers/ml.worker.ts) behind a facade (src/lib/workers/ml.ts) that exposes detectFaces and extractFaceEmbedding with a main-thread fallback when workers are unavailable (SSR, some test contexts).

src/lib/image/tflite-runtime.ts owns model loading. Two things there are load-bearing and have bitten production:

Integrity + sticky latch. Each model handle is created with an expectedSha256; on load the bytes are hashed and compared. A mismatch (or a deterministic load failure) latches the model “unavailable” so the app surfaces “recognition unavailable on this device” rather than silently returning zero-length embeddings on every face. The embedder’s hash is EXPECTED_MODEL_HASH in face-embedding.ts.
The importScripts polyfill (the _malloc incident). importScripts exists in module workers but throws when called; the polyfill must be gated on a zero-argument usability probe, never on typeof importScripts. Getting this wrong made face detection fail on iOS PWAs in June 2026. The branch is unreachable from Vitest by design (an XMLHttpRequest guard keeps it off Node), so any change to tflite-runtime.ts or the worker bootstrap must be validated with the heavy e2e:
```
cd apps/web && pnpm build && npx playwright test upload.test.ts --project=chromium-heavy
```
pnpm check and pnpm test:unit cannot catch a regression here.
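
The gating rule for the polyfill can be illustrated with a minimal sketch. The helper names (isImportScriptsUsable, chooseImportScripts) and the simulated worker functions are hypothetical stand-ins, not the contents of tflite-runtime.ts; only the rule itself (probe with a zero-argument call, never trust typeof) comes from the text above.

```typescript
// Hypothetical helper names; only the gating rule comes from the text above.
type ImportScriptsFn = (...urls: string[]) => void;

/**
 * Returns true only if calling the candidate with zero arguments succeeds.
 * In a classic worker, importScripts() with no URLs is a no-op. In a module
 * worker the function exists but throws, so a typeof check would wrongly pass.
 */
function isImportScriptsUsable(candidate: unknown): boolean {
  if (typeof candidate !== "function") return false;
  try {
    (candidate as ImportScriptsFn)();
    return true;
  } catch {
    return false;
  }
}

/** Picks the native function when it actually works, otherwise the polyfill. */
function chooseImportScripts(
  native: unknown,
  polyfill: ImportScriptsFn,
): ImportScriptsFn {
  return isImportScriptsUsable(native) ? (native as ImportScriptsFn) : polyfill;
}

// Simulated environments (stand-ins, not real worker globals).
const classicWorkerImportScripts: ImportScriptsFn = () => {};
const moduleWorkerImportScripts: ImportScriptsFn = () => {
  throw new TypeError("Module scripts don't support importScripts().");
};
const polyfillImportScripts: ImportScriptsFn = () => {};
```

With these stand-ins, chooseImportScripts(moduleWorkerImportScripts, polyfillImportScripts) selects the polyfill even though typeof moduleWorkerImportScripts is "function", which is exactly the case a typeof gate misses.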

Models are downloaded and SHA-256-verified at build time by scripts/setup-models.sh (pnpm setup:models) and served from static/models/.
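
The runtime integrity check and sticky latch described earlier in this section might look roughly like the following. This is a sketch under assumptions: the ModelHandle shape, the state names, and the function names are hypothetical, and the hash is passed in as a hex string so the latch logic stays synchronous. In a browser the actual hash would typically come from toHex(new Uint8Array(await crypto.subtle.digest("SHA-256", modelBytes))).

```typescript
// Hypothetical shapes and names; only expectedSha256 and the latch behaviour
// are taken from the text.
type ModelState = "unloaded" | "ready" | "unavailable";

interface ModelHandle {
  readonly url: string;
  readonly expectedSha256: string; // hex digest the model bytes must match
  state: ModelState;
}

/** Hex-encodes digest bytes, two lowercase hex characters per byte. */
function toHex(bytes: Uint8Array): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    hex += (b < 16 ? "0" : "") + b.toString(16);
  }
  return hex;
}

/** Latches the handle unavailable, e.g. after a deterministic load failure. */
function latchUnavailable(handle: ModelHandle): void {
  handle.state = "unavailable";
}

/**
 * Compares the computed hash with the expected one. Once a handle is
 * "unavailable" it stays that way: the latch is sticky, so a later call
 * cannot quietly flip the model back to "ready".
 */
function applyIntegrityResult(handle: ModelHandle, actualSha256: string): ModelState {
  if (handle.state === "unavailable") return "unavailable";
  const matches = actualSha256.toLowerCase() === handle.expectedSha256.toLowerCase();
  if (matches) {
    handle.state = "ready";
  } else {
    latchUnavailable(handle);
  }
  return handle.state;
}

/** Callers check this instead of running inference on an unverified model. */
function canRunInference(handle: ModelHandle): boolean {
  return handle.state === "ready";
}
```

The point of the latch is that callers see a definite "unavailable" state they can surface to the user, instead of a model that loads and produces unusable output.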

Detection

src/lib/image/face-detection.ts + scrfd-decode.ts run SCRFD (the SCRFD_2.5G_KPS model, ~0.78 WIDER-FACE hard AP — a large step up from the old BlazeFace at ~0.55–0.60):

The input is letterboxed into 640×640 (SCRFD_INPUT_SIZE) with a neutral-grey fill (LETTERBOX_FILL_BYTE = 114, the InsightFace reference value) so the aspect ratio is preserved and the padding hallucinates no edges; pixels are normalised (p − 127.5) / 128.
Outputs are decoded per stride ([8, 16, 32]), thresholded at CONFIDENCE_THRESHOLD (0.3), and de-duplicated by NMS at NMS_IOU_THRESHOLD (0.4), then mapped back to source-image coordinates.
A DetectedFace is { boundingBox, confidence, landmarks } where landmarks is the five points (both eyes, nose tip, both mouth corners) in the original image’s pixel space. The detector model has its own EXPECTED_DETECTOR_HASH.
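
The letterbox geometry, the pixel normalisation, and the mapping back to source coordinates can be sketched as below. SCRFD_INPUT_SIZE and LETTERBOX_FILL_BYTE are the constants named above; the function names are hypothetical, and centring the image inside the square is an assumption (an implementation that anchors the image at the top-left would simply use zero padding offsets).

```typescript
const SCRFD_INPUT_SIZE = 640;
const LETTERBOX_FILL_BYTE = 114;

interface LetterboxTransform {
  scale: number; // source pixels are multiplied by this to reach model space
  padX: number;  // grey padding on the left, in model pixels
  padY: number;  // grey padding on the top, in model pixels
}

/** Fits the source into a square input while preserving its aspect ratio. */
function computeLetterbox(
  srcWidth: number,
  srcHeight: number,
  inputSize: number = SCRFD_INPUT_SIZE,
): LetterboxTransform {
  const scale = Math.min(inputSize / srcWidth, inputSize / srcHeight);
  return {
    scale,
    // Assumption: the scaled image is centred, so the padding is split evenly.
    padX: (inputSize - srcWidth * scale) / 2,
    padY: (inputSize - srcHeight * scale) / 2,
  };
}

/** Normalises one 8-bit channel value as (p - 127.5) / 128. */
function normalisePixel(p: number): number {
  return (p - 127.5) / 128;
}

/** Maps a point from model (letterboxed) space back to source-image pixels. */
function modelToSource(
  x: number,
  y: number,
  t: LetterboxTransform,
): { x: number; y: number } {
  return { x: (x - t.padX) / t.scale, y: (y - t.padY) / t.scale };
}
```

For a 1280×640 source this gives scale 0.5 with 160 px of grey above and below, and the grey fill normalises to about −0.105, close to zero, which fits its role as a neutral value.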

SCRFD expects a face to sit within a scene with margin — a tight, edge-to-edge crop of just a face will not fire. That fact matters for the rebuild flow (below).

Alignment and embedding

Detection gives a box and five landmarks; the embedder needs a canonically-aligned 112×112 crop.

Alignment (src/lib/image/face-alignment.ts): the five landmarks are mapped onto the standard ArcFace 5-point template via a Umeyama similarity transform (rotation + uniform scale + translation, no reflection — the closed-form least-squares solution). With only the two eyes available it falls back to the exact two-point similarity onto the same template eye points. The template coordinates are the canonical insightface arcface_dst values.
Embedding (src/lib/image/face-embedding.ts): a w600k MobileFaceNet model produces a 512-d, L2-normalised vector from the aligned crop. The model’s identity is its SHA-256 (EXPECTED_MODEL_HASH), and that hash is coupled to EMBEDDER_MODEL_VERSION (embedding-payload.ts): change the model and you must bump the version, because the version (a) stamps every stored embedding so the matcher refuses to mix vector spaces, and (b) keys the service-worker model cache (below). That coupling is now CI-pinned by model-version-coupling.test.ts — a hash change that forgets the version change fails the suite.
Crop hygiene (src/lib/image/full-res-embedding.ts): production always embeds from the full-resolution source blob, never the downscaled display decode, so stored vectors aren’t quietly derived from low-res pixels. A min-size gate (isFaceTooSmallToEmbed: inter-eye < 20 px, or bounding-box min side < 40 px) skips faces too small to be discriminative — they yield an empty vector and are never stored. Real Laplacian-variance sharpness is computed on the crop and threaded into the quality record.
Payload codec (src/lib/image/embedding-payload.ts): encodeEmbeddingPayload(embedding, quality, EMBEDDER_MODEL_VERSION) packs the vector, the quality inputs, and the model-version stamp into the bytes that are then encrypted and uploaded.
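
The alignment step can be made concrete with a small sketch. In 2D, the least-squares similarity without reflection has a simple closed form (treating points as complex numbers), which yields the same optimum as the Umeyama solution for this case; whether face-alignment.ts uses this form or an SVD-based one is not stated here, so take this as an illustration only. The template values are the published insightface arcface_dst coordinates for a 112×112 crop; the function names are hypothetical.

```typescript
type Point = readonly [number, number];

// insightface arcface_dst: left eye, right eye, nose, left mouth, right mouth.
const ARCFACE_DST: readonly Point[] = [
  [38.2946, 51.6963],
  [73.5318, 51.5014],
  [56.0252, 71.7366],
  [41.5493, 92.3655],
  [70.7299, 92.2041],
];

/** Maps (x, y) to (a*x - b*y + tx, b*x + a*y + ty); a = s*cos(t), b = s*sin(t). */
interface Similarity { a: number; b: number; tx: number; ty: number; }

/** Least-squares similarity (rotation + uniform scale + translation) src -> dst. */
function estimateSimilarity(src: readonly Point[], dst: readonly Point[]): Similarity {
  const n = src.length;
  if (n < 2 || n !== dst.length) throw new Error("need >= 2 matching point pairs");
  let mx = 0, my = 0, mu = 0, mv = 0;
  for (let i = 0; i < n; i++) {
    mx += src[i][0]; my += src[i][1];
    mu += dst[i][0]; mv += dst[i][1];
  }
  mx /= n; my /= n; mu /= n; mv /= n;
  let numA = 0, numB = 0, den = 0;
  for (let i = 0; i < n; i++) {
    const x = src[i][0] - mx, y = src[i][1] - my;
    const u = dst[i][0] - mu, v = dst[i][1] - mv;
    numA += x * u + y * v; // real part of conj(z) * w
    numB += x * v - y * u; // imaginary part of conj(z) * w
    den += x * x + y * y;
  }
  if (den === 0) throw new Error("degenerate source points");
  const a = numA / den, b = numB / den;
  return { a, b, tx: mu - (a * mx - b * my), ty: mv - (b * mx + a * my) };
}

function applySimilarity(t: Similarity, p: Point): Point {
  return [t.a * p[0] - t.b * p[1] + t.tx, t.b * p[0] + t.a * p[1] + t.ty];
}

/** Uses all five landmarks when given, or just the two eyes as the fallback. */
function alignToTemplate(landmarks: readonly Point[]): Similarity {
  return estimateSimilarity(landmarks, ARCFACE_DST.slice(0, landmarks.length));
}
```

With exactly two points the same formula fits both points exactly, so the two-eye fallback needs no separate code path in this sketch. The resulting transform would then drive the warp that produces the 112×112 crop.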

Matching and template pools

All matching happens in raw-cosine space (src/lib/image/pool-matching.ts, face-thresholds.ts, threshold-calibration.ts). Each person group has an encrypted template pool (person_pools): a set of representative embeddings sealed as one envelope. A query face is scored by its maximum cosine similarity over the pool’s templates. Two scores are derived: the raw pre-penalty max cosine, and a penalised score, which is the raw score minus a per-user negative penalty. The penalty is RECOGNITION_LAMBDA (0.3) × the τ-decayed max cosine against the user’s hard negatives, with τ = RECOGNITION_TAU_SECONDS (≈ 60 days). The classification gates are:

tLow (default 0.3, SUGGEST_THRESHOLD) — the membership gate, applied to the raw pre-penalty score. A negative penalty can never push a pool below membership.
tHigh (default 0.5, AUTO_TAG_THRESHOLD) — the auto-tag gate, applied to the penalised score.
rank + margin — the top identity’s raw score must beat the next distinct identity by AUTO_TAG_MARGIN (0.08).
corroboration — CORROBORATION_MIN_TEMPLATES (2) distinct witnesses at raw cosine ≥ WITNESS_STRENGTH_THRESHOLD (0.5), with near-duplicate witnesses collapsed at CORROBORATION_DEDUP_SIMILARITY (0.97).
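
To show how the two scores and the gates above interact, here is a simplified sketch. The constants are the defaults quoted in this section; everything else is an assumption made for illustration: the function and type names are hypothetical, the τ-decay is modelled as exp(−age/τ), the penalty is floored at zero, and the caller is assumed to pass the raw score of the next distinct identity. The per-photo co-occurrence pass and per-user calibration are left out.

```typescript
const SUGGEST_THRESHOLD = 0.3;   // tLow
const AUTO_TAG_THRESHOLD = 0.5;  // tHigh
const AUTO_TAG_MARGIN = 0.08;
const RECOGNITION_LAMBDA = 0.3;
const RECOGNITION_TAU_SECONDS = 60 * 24 * 60 * 60; // roughly 60 days
const CORROBORATION_MIN_TEMPLATES = 2;
const WITNESS_STRENGTH_THRESHOLD = 0.5;
const CORROBORATION_DEDUP_SIMILARITY = 0.97;

type Vec = readonly number[];
interface HardNegative { vector: Vec; ageSeconds: number; }
type Band = "auto-tag" | "suggest" | "unresolved";

/** Cosine of two L2-normalised vectors is just their dot product. */
function cosineUnit(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function rawMaxCosine(query: Vec, pool: readonly Vec[]): number {
  let best = -1;
  for (const t of pool) best = Math.max(best, cosineUnit(query, t));
  return best;
}

/** Assumption: exponential decay with time constant tau, floored at zero. */
function negativePenalty(query: Vec, negatives: readonly HardNegative[]): number {
  let worst = 0;
  for (const n of negatives) {
    const decay = Math.exp(-n.ageSeconds / RECOGNITION_TAU_SECONDS);
    worst = Math.max(worst, decay * cosineUnit(query, n.vector));
  }
  return RECOGNITION_LAMBDA * worst;
}

/** Counts strong templates, collapsing near-duplicates into one witness. */
function countWitnesses(query: Vec, pool: readonly Vec[]): number {
  const kept: Vec[] = [];
  for (const t of pool) {
    if (cosineUnit(query, t) < WITNESS_STRENGTH_THRESHOLD) continue;
    const duplicate = kept.some((k) => cosineUnit(k, t) >= CORROBORATION_DEDUP_SIMILARITY);
    if (!duplicate) kept.push(t);
  }
  return kept.length;
}

function classify(
  query: Vec,
  pool: readonly Vec[],
  negatives: readonly HardNegative[],
  runnerUpRaw: number, // raw score of the next distinct identity
): Band {
  const raw = rawMaxCosine(query, pool);
  if (raw < SUGGEST_THRESHOLD) return "unresolved"; // membership uses the raw score
  const penalised = raw - negativePenalty(query, negatives);
  const autoTag =
    penalised >= AUTO_TAG_THRESHOLD &&
    raw - runnerUpRaw >= AUTO_TAG_MARGIN &&
    countWitnesses(query, pool) >= CORROBORATION_MIN_TEMPLATES;
  return autoTag ? "auto-tag" : "suggest";
}
```

Because membership is decided on the raw score, a fresh hard negative in this sketch can demote a face from auto-tag to suggest but cannot make it unresolved.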

These thresholds are calibrated per user (threshold-calibration.ts) against the user’s history, defaulting to the constants above. Crucially, quality and recency are ranking-only: qualityScalar (detectionConfidence × sharpness × faceSize × poseNormality, each in [0,1], faceSize saturating at 112 px native) and the recency weighting order candidates and break ties, but they never move a face across the band gates. Per-user hard negatives (personal_negative_stores) are τ-decayed and capped at 10 per identity (MAX_NEGATIVES_PER_IDENTITY) at the one append site in reject-tag.ts. Auto-tag therefore requires all of: raw rank-0, tHigh on the penalised score, the margin, the corroboration quorum, and a per-photo co-occurrence pass, so a single weak template can’t mis-tag someone.
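
The "ranking-only" property of quality can be illustrated as follows. This is a sketch with assumptions stated up front: the linear ramp for faceSize (px / 112, clamped to 1), the constant name FACE_SIZE_SATURATION_PX, the Candidate shape, and the sort order within a band are all hypothetical; recency weighting is omitted.

```typescript
const FACE_SIZE_SATURATION_PX = 112; // hypothetical name; the value is from the text

interface QualityInputs {
  detectionConfidence: number; // in [0, 1]
  sharpness: number;           // in [0, 1]
  faceSizePx: number;          // native face size in pixels
  poseNormality: number;       // in [0, 1]
}

function clamp01(v: number): number {
  return Math.min(1, Math.max(0, v));
}

/** Product of four factors in [0, 1]; face size saturates at 112 px. */
function qualityScalar(q: QualityInputs): number {
  // Assumption: a linear ramp up to the saturation point.
  const faceSize = clamp01(q.faceSizePx / FACE_SIZE_SATURATION_PX);
  return (
    clamp01(q.detectionConfidence) *
    clamp01(q.sharpness) *
    faceSize *
    clamp01(q.poseNormality)
  );
}

type Band = "auto-tag" | "suggest" | "unresolved";

interface Candidate {
  id: string;
  band: Band;      // decided by the gates alone, never by quality
  quality: number; // qualityScalar of the face
}

const BAND_ORDER: Record<Band, number> = { "auto-tag": 0, suggest: 1, unresolved: 2 };

/** Orders by band first, then by quality inside a band. */
function rankCandidates(candidates: readonly Candidate[]): Candidate[] {
  return candidates
    .slice()
    .sort((x, y) => BAND_ORDER[x.band] - BAND_ORDER[y.band] || y.quality - x.quality);
}
```

In this arrangement a high-quality "suggest" candidate still sorts after a low-quality "auto-tag" one: quality reorders faces inside a band but has no way to change the band itself.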

Pools are maintained client-side: new embeddings upload as incorporated = 0, then pool-updater.ts’s updatePoolForGroup decrypts them, appends one template each (buildAppendedPool), and marks them incorporated in one atomic upsert. The legacy per-identity person_centroids seed is still produced by centroid-updater.ts for the migration path. Fold-in is MAX-cosine, and compaction (pool-compaction.ts) selects rather than synthesises at the cap, so an extra template can only raise a match score, never lower it, and a flood of low-value templates can’t poison a centroid. The previous embedder version (BASELINE_EMBEDDER_MODEL_VERSION = uniface-mnv2-1) is pinned forever: pools still holding old-space templates stay inert under MAX-cosine rather than corrupting matches. Batch upload clustering (face-clustering.ts) is two-phase: first match against known centroids at CENTROID_MATCH_THRESHOLD (0.5), then union-find cluster the remaining faces at UNKNOWN_CLUSTER_THRESHOLD (0.45).
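
The two-phase batch clustering can be sketched like this. The two thresholds are the ones quoted above; the data shapes, the function names, and the choice of single-linkage (any pair at or above the threshold is unioned) are assumptions for illustration, not a description of face-clustering.ts.

```typescript
const CENTROID_MATCH_THRESHOLD = 0.5;
const UNKNOWN_CLUSTER_THRESHOLD = 0.45;

type Vec = readonly number[];
interface KnownCentroid { personId: string; vector: Vec; }
interface ClusterResult {
  matched: { faceIndex: number; personId: string }[];
  unknownClusters: number[][]; // each cluster is a list of face indices
}

function cosineUnit(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

/** Union-find root lookup with path halving. */
function findRoot(parent: number[], i: number): number {
  while (parent[i] !== i) {
    parent[i] = parent[parent[i]];
    i = parent[i];
  }
  return i;
}

function clusterBatch(faces: readonly Vec[], centroids: readonly KnownCentroid[]): ClusterResult {
  const matched: { faceIndex: number; personId: string }[] = [];
  const rest: number[] = [];

  // Phase 1: assign each face to its best known centroid, if good enough.
  for (let f = 0; f < faces.length; f++) {
    let bestId: string | null = null;
    let bestScore = CENTROID_MATCH_THRESHOLD;
    for (const c of centroids) {
      const s = cosineUnit(faces[f], c.vector);
      if (s >= bestScore) { bestScore = s; bestId = c.personId; }
    }
    if (bestId !== null) matched.push({ faceIndex: f, personId: bestId });
    else rest.push(f);
  }

  // Phase 2: union every sufficiently similar pair among the leftovers.
  const parent = rest.map((_, i) => i);
  for (let i = 0; i < rest.length; i++) {
    for (let j = i + 1; j < rest.length; j++) {
      if (cosineUnit(faces[rest[i]], faces[rest[j]]) >= UNKNOWN_CLUSTER_THRESHOLD) {
        const ri = findRoot(parent, i);
        const rj = findRoot(parent, j);
        if (ri !== rj) parent[rj] = ri;
      }
    }
  }

  // Group leftovers by root, keeping first-seen order.
  const clusterOfRoot: { [root: number]: number } = {};
  const unknownClusters: number[][] = [];
  for (let i = 0; i < rest.length; i++) {
    const root = findRoot(parent, i);
    if (clusterOfRoot[root] === undefined) {
      clusterOfRoot[root] = unknownClusters.length;
      unknownClusters.push([]);
    }
    unknownClusters[clusterOfRoot[root]].push(rest[i]);
  }
  return { matched, unknownClusters };
}
```

Note that single-linkage can chain: two faces that are dissimilar to each other end up in one cluster if a third face is similar to both. Whether that is acceptable depends on how the unknown clusters are presented to the user.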

The service-worker model cache

The models are ~3 MB (detector) and ~13 MB (embedder), so the service worker (src/service-worker.ts) serves them cache-first. The cache name is keyed to the embedder version:

const MODEL_CACHE = `brightblur-models-${EMBEDDER_MODEL_VERSION}`;

This is not cosmetic. The models are served from stable URLs (/models/face-embedder.tflite), so when the binary is swapped the URL doesn’t change. A fixed cache name once meant every device that had cached the old model kept serving it forever after a swap; on-device the stale bytes failed the integrity check, latched the embedder unavailable, and silently broke all recognition (rebuild and new uploads) on existing installs. Keying the cache to EMBEDDER_MODEL_VERSION makes a model swap rename the cache; the activate handler drops the stale brightblur-models-* entry and the device re-fetches the correct model. When you change the model, the version bump (already required for the matcher) busts this cache for free — and CI now enforces that coupling.
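
The stale-cache cleanup might be sketched as below. The version string, the helper names, and the CacheStorageLike interface are hypothetical; the interface mirrors only the two CacheStorage methods used here (keys and delete) so the logic can run outside a service worker.

```typescript
// Hypothetical value; in the app this comes from embedding-payload.ts.
const EMBEDDER_MODEL_VERSION = "example-embedder-v2";
const MODEL_CACHE_PREFIX = "brightblur-models-";
const MODEL_CACHE = `${MODEL_CACHE_PREFIX}${EMBEDDER_MODEL_VERSION}`;

/** Model caches that belong to any version other than the current one. */
function staleModelCaches(names: readonly string[], current: string): string[] {
  return names.filter((n) => n.indexOf(MODEL_CACHE_PREFIX) === 0 && n !== current);
}

/** Minimal subset of the CacheStorage API that this sketch relies on. */
interface CacheStorageLike {
  keys(): Promise<string[]>;
  delete(cacheName: string): Promise<boolean>;
}

/** Deletes every stale model cache and returns the names that were dropped. */
async function dropStaleModelCaches(
  storage: CacheStorageLike,
  current: string = MODEL_CACHE,
): Promise<string[]> {
  const stale = staleModelCaches(await storage.keys(), current);
  await Promise.all(stale.map((n) => storage.delete(n)));
  return stale;
}

// Inside a service worker this would typically be wired to activation:
//   self.addEventListener("activate", (event) => {
//     event.waitUntil(dropStaleModelCaches(caches));
//   });
```

Unrelated caches are left alone because only names with the brightblur-models- prefix are considered, so the cleanup cannot evict, say, an app-shell cache.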

Rebuild and untag flows

Two recently-added flows are where the per-face encryption and the deletion semantics meet recognition:

Rebuild from tagged photos (src/routes/people/[id]/slice-rebuild.ts + the GET /api/face-slices/unembedded endpoint). After the embedder swap wiped the pools (migration 0041), recognition can be reseeded from the stored face slices without re-tagging: decrypt each slice on-device, recover landmarks, embed, encrypt to the group’s current epoch key, and POST. Because the stored slice is a tight crop (the bounding box + 1 px), SCRFD won’t fire on it directly — so the rebuild pads the crop with neutral-grey margin before detection and maps the landmarks back to slice space. When even the padded crop yields no detection, it embeds the whole crop unaligned (resize-only) rather than skip; that’s safe under the MAX-cosine pool but leans on the multi-gated auto-tag for correctness. The flow is idempotent (a LEFT JOIN excludes already-embedded photos) and remembers attempted slices within a run.
Untag and photo deletion. person_embeddings.photo_id has no foreign key (see Data Model), so cascade is done in code: removing a person from a photo (removePersonFromPhoto) and deleting a photo both delete the embeddings by (person_group_id, photo_id) and, when an incorporated embedding is removed, drop the affected person_pools row and un-incorporate the survivors so the next rebuild re-ingests them. Statement order within that batch is load-bearing (deletes before un-incorporation). The client-side pool processed-cache is busted only on /people/[id] — any new untag surface must replicate that.
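
The padding step in the rebuild flow comes down to simple coordinate bookkeeping, sketched here. The margin fraction (0.5 of the longer side) and all names are hypothetical; the text only says that a neutral-grey margin is added before detection and that landmarks are mapped back to slice space.

```typescript
type Point = readonly [number, number];

interface Size { width: number; height: number; }

interface PaddedCrop {
  width: number;
  height: number;
  offsetX: number; // where the original slice starts inside the padded canvas
  offsetY: number;
}

/**
 * Computes a padded canvas around a tight face slice so the detector sees
 * the face with margin. marginFraction is a hypothetical tuning knob.
 */
function padGeometry(slice: Size, marginFraction: number = 0.5): PaddedCrop {
  const pad = Math.round(Math.max(slice.width, slice.height) * marginFraction);
  return {
    width: slice.width + 2 * pad,
    height: slice.height + 2 * pad,
    offsetX: pad,
    offsetY: pad,
  };
}

/** Maps landmarks detected on the padded canvas back to slice coordinates. */
function toSliceSpace(landmarks: readonly Point[], padded: PaddedCrop): Point[] {
  return landmarks.map((p): Point => [p[0] - padded.offsetX, p[1] - padded.offsetY]);
}
```

For a 100×120 slice with the default fraction, the canvas becomes 220×240 and a landmark detected at (110, 120) on the canvas maps back to (50, 60) in the slice.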

Where to look

image/face-detection.ts + scrfd-decode.ts (detect), image/face-alignment.ts (align), image/face-embedding.ts + full-res-embedding.ts + embedding-payload.ts (embed), image/pool-matching.ts + face-thresholds.ts + threshold-calibration.ts + quality.ts + template-pool.ts + face-clustering.ts (match), image/tflite-runtime.ts + workers/ml*.ts (runtime), pool-updater.ts + centroid-updater.ts + embedding-decrypt.ts (fold-in), service-worker.ts (model cache), routes/people/[id]/slice-rebuild.ts (rebuild).