2026-05-18

Reading the trainer's source before writing the runbook

We had a recipe for validating PUP-3DGS pruning on our own captures. Building the harness to support the recipe revealed the recipe was wrong about which trainer to use.

docs/CAPTURE_AND_PRUNE_PIPELINE.md was drafted a day before this post. It read clean: capture your own video, train in Nerfstudio (the doc said "best PUP compatibility"), prune at retain=0.15, A/B the result against the original in a viewer. PRD v5 §12.1 calls the hero-NPC capture pipeline the build's #1 unknown — the doc was the de-risking plan.

The harness landed next. /compare?a=original.ply&b=pruned.ply: both files preload through the existing SplatCanvas LOD ladder, the swap is an entity-enable flip, the camera is shared. Auto-flick the variant at 2 Hz and any region that diverges between original and pruned jumps out — your visual system can't help but see motion. The snapshot button burns both halves into a side-by-side PNG with labels for the post-spike numbers table.

Then we went to verify the install steps the runbook would invoke. The canonical PUP-3DGS repo (j-alex-hanson/gaussian-splatting-pup, CVPR 2025, the paper authors' own implementation) consumes the on-disk directory shape of the *original* graphdeco-inria 3DGS trainer: a folder containing point_cloud/iteration_30000/point_cloud.ply plus cameras.json and cfg_args. Not Nerfstudio's output. Not gsplat's. Not Postshot's. None of the three trainers our doc had listed as PUP-compatible feeds the PUP pipeline directly.
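
A preflight check for that directory shape is cheap to write. The sketch below is not part of the PUP-3DGS repo or our harness. It only encodes the three paths named above, and the names `REQUIRED_FILES`, `missing_inputs`, and `describe` are hypothetical.

```python
from pathlib import Path

# Relative paths this post lists as the expected input layout.
# iteration_30000 is copied from the path quoted above; adjust if your run differs.
REQUIRED_FILES = (
    "point_cloud/iteration_30000/point_cloud.ply",
    "cameras.json",
    "cfg_args",
)


def missing_inputs(model_dir):
    """Return the required relative paths that are not regular files under model_dir."""
    root = Path(model_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


def describe(model_dir):
    """Build a one-line, human-readable preflight verdict for model_dir."""
    missing = missing_inputs(model_dir)
    if not missing:
        return f"{model_dir}: layout looks like graphdeco-inria output"
    return f"{model_dir}: missing " + ", ".join(missing)
```

Calling `describe("output/my_capture")` on a Nerfstudio or gsplat export should report the missing files up front, rather than leaving the failure to surface inside the prune script.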

Worse: the prune ratio isn't a single --retain flag the way the doc described. It's a fixed two-round bash script: round one drops 80% and fine-tunes for 35k iterations, round two drops half of what remains. That leaves 20% × 50% = 10%, a net 90% reduction, so the retain=0.15 the doc called for is not what the stock script produces. Anyone who had followed the original runbook would have spent ~30 min standing up Nerfstudio in Colab, opened the PUP CLI, and hit a file-not-found error because the directory shape was wrong.
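
The two-round arithmetic is easy to get wrong when editing ratios by hand, so here is a minimal sketch of it. The helper names are hypothetical and nothing here calls the PUP-3DGS script. It only multiplies the per-round keep fractions, and solves for the second-round drop that would land on a chosen final retain.

```python
def retained_fraction(drop_fractions):
    """Fraction left after applying each round's drop fraction in order."""
    kept = 1.0
    for drop in drop_fractions:
        if not 0.0 <= drop < 1.0:
            raise ValueError(f"drop fraction must be in [0, 1), got {drop}")
        kept *= 1.0 - drop
    return kept


def second_round_drop(target_retain, first_drop):
    """Second-round drop fraction needed to end at target_retain after first_drop."""
    after_first = 1.0 - first_drop
    if not 0.0 < target_retain <= after_first:
        raise ValueError("target must be positive and no larger than what round one keeps")
    return 1.0 - target_retain / after_first


# Ratios described in this post: drop 80%, then drop half of the remainder.
stock = retained_fraction([0.80, 0.50])
print(f"stock script keeps about {stock:.0%}")

# Hypothetical edit: leave round one at 80% and solve for a 15% final retain.
print(f"round two would need to drop about {second_round_drop(0.15, 0.80):.0%}")
```

Under these assumptions the stock ratios keep about 10%, and reaching the doc's original retain=0.15 would mean changing round two to drop roughly 25% instead of 50%.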

We rewrote the doc. Three sections changed: §4 (trainer compatibility table; the recommendation flipped from "Nerfstudio on a T4" to "graphdeco-inria on a RunPod A4000"), §5.3 (the prune ratio is two passes, not one flag; edit the shell script to change ratios), and §7 (realistic wall-clock is 1.5–2 hours for a first run, not the "30 min" figure the original quoted). docs/SPIKE_PUP_VALIDATION.md now has copy-paste commands that match what the actual repo runs.

The harness still works. /compare doesn't care where the .ply came from. /capture-rules — the mobile-friendly preflight checklist that lives on the phone while you record — was correct from the start because the §3 rules (manual shutter, locked exposure, loops not lines) sit below the trainer choice. /spike-status composes the audit scripts into a dashboard; the dashboard doesn't know about trainers either. The half of the work that turned out to be wrong was strictly the half that named specific external tools.

The lesson is narrower than "always read the source." The doc was drafted in good faith from project pages and paper abstracts — exactly the sources you should read to plan a spike. The miss was that the recipe encoded specific CLI invocations, and the only authoritative source for CLI invocations is the script the CLI runs. The harness work forced us to verify each command before documenting it, which caught the gap before any user followed the runbook.

Validating before validating. The 30 minutes saved aren't ours; they belong to whoever runs the spike next.